Question about classNet

Open Marh32 opened this issue 2 years ago • 9 comments

Hello,

I got this error when running netClass: "can't find mysql connection info for database GCF_000001405.40_GRCh38.p14_genomic.fna in hg.conf or ~/.hg.conf, should have a default profile named "db", so values for at least db.host, db.user and db.password. See http://genomewiki.ucsc.edu/index.php/Hg.conf". I also don't know what tDb and qDb are. Could you tell me?

Do I need to set qRepeats and tRepeats, as in the screenshot below? Or can I run netSyntenic and go straight to netFilter, skipping this step?

[Screenshot 2024-02-23 at 12 57 36]

Best regards, Hao

Marh32, Feb 23 '24 04:02

Good Evening Marh32: The 'netClass' command expects to find repeat information about your query and target genomes in database tables. Run 'netClass' without any arguments to see a description of what tDb and qDb refer to. I'm very curious what procedure you are running that got you to 'netClass'. If you do not have databases for the genomes you are using, you don't need to run netClass; you can use your noClass.net file directly without that extra information:

netFilter -minGap=10 noClass.net | hgLoadNet -test -noBin -warn -verbose=0 $tDb net$QDb stdin
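
A minimal sketch of the "skip netClass" route, assuming no local browser database is available, so the filtered net is written to a file instead of being piped into hgLoadNet. The output name `filtered.net` is a hypothetical placeholder; `noClass.net` and `-minGap=10` come from the command above.

```shell
#!/bin/sh
# File names are placeholders; adjust them to your own pipeline outputs.
in_net="noClass.net"     # output of netSyntenic (name taken from the reply above)
out_net="filtered.net"   # hypothetical output name, not from the thread

# Build the command as a string so it can be inspected before running.
filter_cmd="netFilter -minGap=10 $in_net > $out_net"
echo "$filter_cmd"

# Only execute when the kent tool is installed and the input file exists.
if command -v netFilter >/dev/null 2>&1 && [ -f "$in_net" ]; then
    netFilter -minGap=10 "$in_net" > "$out_net"
fi
```

netFilter writes to standard output, so redirecting to a file keeps the result available for later steps without needing MySQL.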

NullModel, Feb 23 '24 05:02

Thank you so much for your reply. I was processing the results of a lastz pairwise alignment run, and then ran axtChain, chainPreNet, chainNet, and netSyntenic. The error above occurred when I ran netClass with this command: netClass -noAr noclass.net GCF_000001405.40_GRCh38.p14_genomic.fna GCF_000164805.1_Tarsius_syrichta-2.0.1_genomic.fna class.net. I want to continue with multiz and phastCons to identify conserved elements in the genomes.
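
For orientation, the chain/net steps named here can be sketched as a dry run that only prints the commands. This assumes the lastz output is in axt format, that 2bit and chrom.sizes files exist for both genomes, and that `-linearGap=medium` suits this species pair; every file name is a hypothetical placeholder, not taken from the thread.

```shell
#!/bin/sh
# All file names below are hypothetical placeholders.
t2bit="target.2bit";   q2bit="query.2bit"     # genome sequences in 2bit format
tsizes="target.sizes"; qsizes="query.sizes"   # chromosome size tables

# Print each step of the chain/net pipeline, one command per line (dry run).
print_pipeline() {
    echo "axtChain -linearGap=medium lastz.axt $t2bit $q2bit all.chain"
    echo "chainPreNet all.chain $tsizes $qsizes preNet.chain"
    echo "chainNet preNet.chain $tsizes $qsizes target.net query.net"
    echo "netSyntenic target.net noClass.net"
}

steps=$(print_pipeline)
echo "$steps"
```

The last step produces the noClass.net file that the netFilter command earlier in the thread consumes.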

Marh32, Feb 23 '24 06:02

The usage message says "tDb - database to fetch target repeat masker table information". Could you also tell me the specific format of this database? Thank you very much.

Marh32, Feb 23 '24 06:02

The database tables are the RepeatMasker tables (rmsk) in the UCSC browser database system. You can access these databases with the public MySQL server: http://genome.ucsc.edu/goldenPath/help/mysql.html

I'm assuming you are reproducing the primate chains and nets: genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&c=chr7&g=primateChainNet

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/vsTarSyr2/
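
To see what such a table looks like, here is a minimal sketch that queries a few rmsk rows from the public server. The host and user follow the UCSC MySQL help page linked above; `hg38` is the UCSC database name for GRCh38, which is the kind of name tDb and qDb expect, rather than a FASTA file name. The `RUN_QUERY` switch is a hypothetical convenience added for this sketch.

```shell
#!/bin/sh
# Public UCSC MySQL server settings, as given on the UCSC MySQL help page.
host="genome-mysql.soe.ucsc.edu"
user="genome"
db="hg38"   # UCSC database name for GRCh38, not a FASTA file name

# Peek at a few rows of the RepeatMasker table (rmsk) to see its layout.
query="SELECT genoName, genoStart, genoEnd, strand, repName, repClass FROM rmsk LIMIT 5"
rmsk_cmd="mysql --user=$user --host=$host -A -e \"$query\" $db"
echo "$rmsk_cmd"

# Set RUN_QUERY=1 to contact the server (needs a mysql client and network access).
if [ "${RUN_QUERY:-0}" = "1" ]; then
    mysql --user="$user" --host="$host" -A -e "$query" "$db"
fi
```

Note that these tables use UCSC sequence names such as chr7, so an alignment built on RefSeq accession names from an NCBI FASTA file would probably need its sequence names mapped before the repeat annotation could match.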

NullModel, Feb 23 '24 06:02

Yes, I'm reproducing the primate chains and nets. If I want to complete the chain and net steps with my own sequencing data, can I skip the netClass step and run netFilter directly with the -syn flag (and then do multiz and phastCons)? I see that you have also run reciprocal best. Is that step also done with netFilter, or do you have a specific process? I didn't find a detailed description on the UCSC browser and was a bit confused. Thank you very much for your answer.

Marh32, Feb 23 '24 07:02

https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/utils/automation/doRecipBest.pl

NullModel, Feb 23 '24 07:02

Thank you very much.

Marh32, Feb 23 '24 07:02

It isn't necessary to make all the different net files. Only one type is used in the multiple alignment, depending on the distance between the genomes. See: https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/multiz30way.txt

NullModel, Feb 23 '24 07:02

Ok, thanks for your reply.

Marh32, Feb 23 '24 07:02