GO slim?

Open grace-ac opened this issue 3 months ago • 37 comments

I am hoping to get GO slims so I can categorize GO terms into higher-level groups.

The link to the one I used a long time ago no longer exists: http://owl.fish.washington.edu/halfshell/bu-alanine-wd/17-07-20/GO-GOslim.sorted

grace-ac avatar Nov 04 '25 02:11 grace-ac

I see there's a bunch of options on this website: https://geneontology.org/docs/go-subset-guide/

grace-ac avatar Nov 04 '25 02:11 grace-ac

i did not...

just did! found it in the handbook!

http://current.geneontology.org/ontology/subsets/goslim_generic.obo
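A minimal sketch of what "categorizing into higher level groups" means with an OBO file, assuming the slim terms are tagged with `subset: goslim_generic` and that parent links are `is_a` lines. The ontology below is a toy with made-up IDs, and the function names are hypothetical. Walking ancestors needs the parent links of the full ontology, so the slim file on its own is probably not enough for this step.

```python
from collections import defaultdict

# Toy OBO text: IDs and names are made up for illustration only.
TOY_OBO = """\
format-version: 1.2

[Term]
id: GO:9000001
name: toy root process
subset: goslim_generic

[Term]
id: GO:9000002
name: toy metabolic process
subset: goslim_generic
is_a: GO:9000001 ! toy root process

[Term]
id: GO:9000003
name: toy specific process
is_a: GO:9000002 ! toy metabolic process
"""


def parse_obo(text, subset="goslim_generic"):
    """Return (parents, slim_ids) read from the [Term] stanzas of OBO text."""
    parents = defaultdict(set)
    slim_ids = set()
    in_term = False
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are read.
            in_term = line == "[Term]"
            current = None
        elif not in_term:
            continue
        elif line.startswith("id: "):
            current = line[4:]
        elif current and line.startswith("is_a: "):
            # Drop the trailing "! name" comment, keep the parent ID.
            parents[current].add(line[6:].split(" ! ")[0].strip())
        elif current and line.startswith("subset: ") and line[8:].strip() == subset:
            slim_ids.add(current)
    return parents, slim_ids


def slim_terms_for(go_id, parents, slim_ids):
    """Collect every slim term found among go_id and its is_a ancestors."""
    found, seen, stack = set(), set(), [go_id]
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        if term in slim_ids:
            found.add(term)
        stack.extend(parents.get(term, ()))
    return found


parents, slim_ids = parse_obo(TOY_OBO)
print(sorted(slim_terms_for("GO:9000003", parents, slim_ids)))
```

This sketch follows only `is_a` links and returns all slim ancestors rather than the nearest one; a real mapping tool may treat other relations (such as `part_of`) differently.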

grace-ac avatar Nov 04 '25 02:11 grace-ac

following the guide in the handbook

Code: (start at 275): 33-annot-DEGlists.Rmd

Issue with code chunk at line 342

Image

grace-ac avatar Nov 04 '25 02:11 grace-ac

Check line 309 (and 310). You're assigning paths to the variables, not the column names (which is what you want, I believe).
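The Rmd lines in question are not shown in this thread, so the following is only an illustration of the distinction, written in Python rather than R. The table contents, the path, and the column name `GO_ids` are all hypothetical.

```python
import csv
import io

# Stand-in for a tab-delimited file on disk; contents are hypothetical.
TABLE = io.StringIO(
    "transcript_id\tGO_ids\n"
    "g1.t1\tGO:9000001\n"
    "g2.t1\tGO:9000002\n"
)

# Not what is wanted: this variable holds a path string, not a column.
go_column = "../output/annot_table.tab"

# What is wanted: read the table first, then refer to the column by name.
rows = list(csv.DictReader(TABLE, delimiter="\t"))
go_column_name = "GO_ids"
go_values = [row[go_column_name] for row in rows]

print(go_values)
```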

kubu4 avatar Nov 04 '25 03:11 kubu4

oooh good catch!! I fixed it, but still got the same error

Image

grace-ac avatar Nov 04 '25 03:11 grace-ac

Need to see what that data frame looks like. Run head() and/or str() on the data frame and post the output here.

kubu4 avatar Nov 04 '25 03:11 kubu4

'data.frame':	30969 obs. of  56 variables:
 $ transcript_id                     : chr  "g19583.t1" "g12808.t1" "g5589.t1" "g12024.t1" ...
 $ PSC.423                           : int  166 440 172 34 0 0 11012 205 0 0 ...
 $ PSC.426                           : int  78 638 0 15 0 0 7564 106 0 0 ...
 $ PSC.429                           : int  99 882 0 38 9 0 11153 172 0 0 ...
 $ PSC.432                           : int  110 788 0 3 16 0 10104 158 0 0 ...
 $ PSC.435                           : int  109 1090 38 34 32 0 8493 92 0 0 ...
 $ PSC.438                           : int  127 666 290 0 55 0 9920 160 0 0 ...
 $ PSC.441                           : int  125 494 114 8 18 0 10497 175 0 0 ...
 $ PSC.444                           : int  65 658 170 0 0 0 8941 201 0 0 ...
 $ PSC.453                           : int  165 702 1283 50 242 0 5443 280 0 0 ...
 $ PSC.456                           : int  65 834 57 20 35 0 8554 278 0 0 ...
 $ PSC.465                           : int  92 1058 14 0 8 0 10118 139 0 0 ...
 $ PSC.468                           : int  164 663 16 0 4 0 7373 160 0 0 ...
 $ PSC.519                           : int  191 818 56 117 13 0 11788 108 0 0 ...
 $ PSC.522                           : int  62 732 219 13 46 0 10008 240 0 0 ...
 $ PSC.525                           : int  292 1051 247 46 16 0 12181 468 0 0 ...
 $ PSC.528                           : int  62 701 1843 26 458 0 4815 381 0 0 ...
 $ PSC.531                           : int  208 1957 0 20 0 0 13609 47 0 0 ...
 $ PSC.534                           : int  124 1004 1241 62 518 0 6212 506 0 0 ...
 $ PSC.537                           : int  170 1376 45 17 13 0 11634 53 0 0 ...
 $ PSC.540                           : int  146 616 0 13 8 0 11590 151 0 0 ...
 $ PSC.549                           : int  219 1193 0 0 0 0 9150 115 0 0 ...
 $ PSC.552                           : int  98 373 0 15 0 0 9515 125 0 0 ...
 $ PSC.561                           : int  349 755 0 0 0 0 13106 67 0 0 ...
 $ PSC.564                           : int  138 407 94 28 45 0 5866 255 0 0 ...
 $ baseMean                          : num  140.9 NA NA NA 49.7 ...
 $ log2FoldChange                    : num  0.759 NA NA NA -3.055 ...
 $ lfcSE                             : num  0.241 NA NA NA 1.052 ...
 $ stat                              : num  3.15 NA NA NA -2.91 ...
 $ pvalue                            : num  0.00162 NA NA NA 0.00367 ...
 $ padj                              : num  0.0232 NA NA NA 0.039 ...
 $ V2                                : chr  "sp" NA "sp" "sp" ...
 $ Entry                             : chr  "P59644" NA "P80146" "Q09143" ...
 $ gene_name                         : chr  "PI5PA_MOUSE" NA "SEPR_THESR" "CTR1_MOUSE" ...
 $ V5                                : num  37.9 NA 38.5 27 NA ...
 $ V6                                : int  467 NA 314 385 NA NA 413 226 NA NA ...
 $ V7                                : int  232 NA 161 249 NA NA 272 95 NA NA ...
 $ V8                                : int  8 NA 7 8 NA NA 7 1 NA NA ...
 $ V9                                : int  4 NA 81 23 NA NA 10 1 NA NA ...
 $ V10                               : int  462 NA 377 381 NA NA 420 223 NA NA ...
 $ V11                               : int  420 NA 109 80 NA NA 4 100 NA NA ...
 $ V12                               : int  836 NA 407 458 NA NA 402 325 NA NA ...
 $ V13                               : num  8.30e-95 NA 7.71e-56 2.48e-32 NA ...
 $ V14                               : num  311 NA 191 132 NA NA 194 271 NA NA ...
 $ From                              : chr  "P59644" NA "P80146" "Q09143" ...
 $ Reviewed                          : chr  "reviewed" NA "reviewed" "reviewed" ...
 $ Entry.Name                        : chr  "PI5PA_MOUSE" NA "SEPR_THESR" "CTR1_MOUSE" ...
 $ Protein.names                     : chr  "Phosphatidylinositol 4,5-bisphosphate 5-phosphatase A (EC 3.1.3.36) (Inositol polyphosphate 5-phosphatase J) (P"| __truncated__ NA "Extracellular serine proteinase (EC 3.4.21.-)" "High affinity cationic amino acid transporter 1 (CAT-1) (CAT1) (Ecotropic retroviral leukemia receptor) (Ecotro"| __truncated__ ...
 $ Gene.Names                        : chr  "Inpp5j Pib5pa" NA "" "Slc7a1 Atrc1 Rec-1" ...
 $ Organism                          : chr  "Mus musculus (Mouse)" NA "Thermus sp. (strain Rt41A)" "Mus musculus (Mouse)" ...
 $ Length                            : int  1003 NA 410 622 NA NA 522 327 NA NA ...
 $ Gene.Ontology..biological.process.: chr  "negative regulation of neuron projection development [GO:0010977]; phosphatidylinositol dephosphorylation [GO:0046856]" NA "proteolysis [GO:0006508]" "L-arginine transmembrane transport [GO:1903826]; L-histidine import across plasma membrane [GO:1903810]; L-orni"| __truncated__ ...
 $ Gene.Ontology..cellular.component.: chr  "cytoplasm [GO:0005737]; dendritic shaft [GO:0043198]; growth cone [GO:0030426]; plasma membrane [GO:0005886]; r"| __truncated__ NA "extracellular space [GO:0005615]" "apical plasma membrane [GO:0016324]; basolateral plasma membrane [GO:0016323]; membrane [GO:0016020]; protein-c"| __truncated__ ...
 $ Gene.Ontology..GO.                : chr  "cytoplasm [GO:0005737]; dendritic shaft [GO:0043198]; growth cone [GO:0030426]; plasma membrane [GO:0005886]; r"| __truncated__ NA "extracellular space [GO:0005615]; serine-type endopeptidase activity [GO:0004252]; proteolysis [GO:0006508]" "apical plasma membrane [GO:0016324]; basolateral plasma membrane [GO:0016323]; membrane [GO:0016020]; protein-c"| __truncated__ ...
 $ Gene.Ontology..molecular.function.: chr  "inositol-1,3,4,5-tetrakisphosphate 5-phosphatase activity [GO:0052659]; inositol-1,4,5-trisphosphate 5-phosphat"| __truncated__ NA "serine-type endopeptidase activity [GO:0004252]" "L-arginine transmembrane transporter activity [GO:0061459]; L-histidine transmembrane transporter activity [GO:"| __truncated__ ...
 $ Gene.Ontology.IDs                 : chr  "GO:0001726; GO:0004439; GO:0004445; GO:0005737; GO:0005886; GO:0010977; GO:0017124; GO:0030426; GO:0032587; GO:"| __truncated__ NA "GO:0004252; GO:0005615; GO:0006508" "GO:0000064; GO:0001618; GO:0005290; GO:0015189; GO:0015819; GO:0016020; GO:0016323; GO:0016324; GO:0032991; GO:"| __truncated__ ...
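In the `str()` output above, `Gene.Ontology.IDs` holds several IDs per row, joined with `"; "`, and `NA` where there was no hit. Mapping to GO slim usually needs one ID per row first. A minimal sketch of that split, in Python rather than R; the two example rows are taken from the output above, and the function names are hypothetical.

```python
def split_go_ids(cell):
    """Split a 'GO:...; GO:...' cell into a list of IDs; empty or NA gives []."""
    if cell is None or cell.strip() in ("", "NA"):
        return []
    return [part.strip() for part in cell.split(";") if part.strip()]


def to_long(rows):
    """Turn (transcript_id, go_cell) pairs into one (transcript_id, go_id) pair per ID."""
    return [(tid, go_id) for tid, cell in rows for go_id in split_go_ids(cell)]


rows = [
    ("g5589.t1", "GO:0004252; GO:0005615; GO:0006508"),
    ("g12808.t1", None),  # shown as NA in the str() output
]

for tid, go_id in to_long(rows):
    print(tid, go_id)
```

Rows without annotation simply drop out of the long table, so the count of annotated transcripts should be taken from the long table, not from the original row count.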

grace-ac avatar Nov 04 '25 03:11 grace-ac

Also, is that the appropriate way to read in a column of data? Do you have to read in the entire file as a CSV first and then you can reference the column in subsequent calls?

kubu4 avatar Nov 04 '25 04:11 kubu4

I would recommend using https://github.com/sr320/workflow-annotation

sr320 avatar Nov 04 '25 14:11 sr320

I would recommend using https://github.com/sr320/workflow-annotation

this would have me re-run blast, though, right? i already ran blast and just want to annotate to GO slim

grace-ac avatar Nov 04 '25 19:11 grace-ac

https://gannet.fish.washington.edu/seashell/bu-github/paper-pycno-sswd-2021-2022/data/augustus.hints.codingseq

grace-ac avatar Nov 04 '25 19:11 grace-ac

and done.....

Image

sr320 avatar Nov 04 '25 20:11 sr320

Annotation Summary Report

Job Information

  • Input file: augustus.hints.codingseq
  • Start time: 2025-11-04 11:58:19
  • End time: 2025-11-04 12:01:27
  • Duration: 0h 3m 8s
  • CPUs used: 40
  • Tool: DIAMOND BLASTX (nucleotide)

Results Overview

  • Total sequences: 12,156
  • BLAST hits found: 12,156 (100.0%)
  • GO annotations: 6,206 (51.1%)
  • GO-Slim mappings: 6,173 (50.8%)

Output Files

  • Main results: annotation_with_goslim.tsv
  • Full GO data: annotation_full_go.tsv
  • Raw BLAST: augustus.hints.blast.tsv
  • Processing script: postprocess_uniprot_go.py

Top GO-Slim Categories

| GO-Slim Term | Count |
|---|---|
| anatomical structure development | 1371 |
| regulation of DNA-templated transcription | 833 |
| cell differentiation | 775 |
| lipid metabolic process | 688 |
| reproductive process | 566 |
| transmembrane transport | 512 |
| vesicle-mediated transport | 510 |
| carbohydrate derivative metabolic process | 450 |
| immune system process | 438 |
| protein-containing complex assembly | 435 |
| cytoskeleton organization | 401 |
| nervous system process | 391 |
| cell motility | 387 |
| chromatin organization | 313 |
| cell adhesion | 308 |

GO-Slim Categories

Top 15 GO-Slim categories by sequence count

Performance

  • BLAST throughput: 64.7 sequences/second
  • Annotation rate: 64.7 hits/second

Generated by blast2slim.sh on 2025-11-04 12:01:27
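The percentages and throughput in the report can be checked from its own start time, end time, and counts. This is a quick arithmetic check, assuming throughput is total sequences divided by wall-clock seconds; how blast2slim.sh actually computes it is not shown here.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2025-11-04 11:58:19", FMT)
end = datetime.strptime("2025-11-04 12:01:27", FMT)
seconds = (end - start).total_seconds()  # 3 min 8 s

total = 12156          # total sequences
go_annotated = 6206    # sequences with GO annotations
goslim_mapped = 6173   # sequences with GO-Slim mappings

throughput = round(total / seconds, 1)
go_pct = round(100 * go_annotated / total, 1)
slim_pct = round(100 * goslim_mapped / total, 1)

print(seconds, throughput, go_pct, slim_pct)
```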

sr320 avatar Nov 04 '25 20:11 sr320

please let me know if i'm on right track:

to get GOslim terms for each species, I could use the blast2slim workflow:

  1. Edit /workflow-annotation/blast2slim.sh script with files i want to use
  2. run them in bash code chunks in Rstudio on raven

grace-ac avatar Nov 05 '25 21:11 grace-ac

omg ignore above comment - i use the qmd document

grace-ac avatar Nov 05 '25 22:11 grace-ac

it's taking a lot longer than 3 mins to run

grace-ac avatar Nov 05 '25 23:11 grace-ac

diamond command not found

i'm working on raven

bash blast2slim.sh -i "https://gannet.fish.washington.edu/seashell/bu-github/paper-pycno-sswd-2021-2022/data/augustus.hints.codingseq" --diamond -o pycno --threads 40
[INFO] Output directory: output/pycno/run_20251105_152508
[INFO] Downloading input FASTA from URL: https://gannet.fish.washington.edu/seashell/bu-github/paper-pycno-sswd-2021-2022/data/augustus.hints.codingseq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 61 36.6M   61 22.5M    0     0  90.0M      0 --:--:-- --:--:-- --:--:-- 89.7M[INFO] Saved URL FASTA to output/pycno/run_20251105_152508/augustus.hints.codingseq
[INFO] Building DIAMOND protein DB...
100 36.6M  100 36.6M    0     0  96.9M      0 --:--:-- --:--:-- --:--:-- 96.8M
blast2slim.sh: line 114: diamond: command not found
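The last line means the shell could not find a `diamond` executable on `PATH` at the point the script called it. A small pre-flight check, sketched in Python with the standard library, can report that before any download or database build starts. The helper names are hypothetical, and this is not part of blast2slim.sh.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)


def missing_tools(names):
    """Return the subset of names that cannot be found on PATH."""
    return [name for name in names if find_tool(name) is None]


missing = missing_tools(["diamond"])
if missing:
    print("Not on PATH:", ", ".join(missing))
else:
    print("diamond found at", find_tool("diamond"))
```

If the tool is installed somewhere outside `PATH` (for example in a conda environment or a shared programs directory), adding that directory to `PATH` in the same shell or code chunk that runs the script is the usual remedy.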

grace-ac avatar Nov 05 '25 23:11 grace-ac

URLs for three files needing GO annotation?

On Wed, Nov 5, 2025 at 3:28 PM Grace Crandall @.***> wrote:

grace-ac left a comment (RobertsLab/resources#2369) https://github.com/RobertsLab/resources/issues/2369#issuecomment-3494046978

sr320 avatar Nov 05 '25 23:11 sr320

pycno:
on raven:

downloaded from NCBI and on raven
/home/shared/16TB_HDD_01/graceac9/project-pycno-multispecies-2023/data/ncbi_dataset/data/GCA_032158295.1/GCA_032158295.1_ASM3215829v1_genomic.fna

pisaster and dermasterias fastas are on raven (not publicly available bc unpublished)

pisaster:
fasta --> /home/shared/16TB_HDD_01/graceac9/project-pycno-multispecies-2023/data/pisaster_clean.fasta

dermasterias:
fasta --> /home/shared/16TB_HDD_01/graceac9/project-pycno-multispecies-2023/data/Newest_Derm_files/derm_imbr_genome.fa

grace-ac avatar Nov 05 '25 23:11 grace-ac

You should not be annotating genomes.... just genes

l https://d.pr/UZHX8l

sr320 avatar Nov 05 '25 23:11 sr320

Fastas….

On Wed, Nov 5, 2025 at 3:52 PM Grace Crandall @.***> wrote:

grace-ac left a comment (RobertsLab/resources#2369) https://github.com/RobertsLab/resources/issues/2369#issuecomment-3494112798

pycno: 29-deseq2-pycno/DEGlist_transcripts_pycno_controlVexposed.tab https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/output/29-deseq2-pycno/DEGlist_transcripts_pycno_controlVexposed.tab

pisaster: 30-deseq2-pisaster/DEGlist_transcripts_pisaster_controlVexposed.tab https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/output/30-deseq2-pisaster/DEGlist_transcripts_pisaster_controlVexposed.tab

dermasterias: 31-deseq2-derm/DEGlist_transcripts_derm_controlVexposed.tab https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/output/31-deseq2-derm/DEGlist_transcripts_derm_controlVexposed.tab

sr320 avatar Nov 05 '25 23:11 sr320

to get fastas, would i try to get sequences of the degs from the genomes?

i tried doing this:

seqkit grep -f ../output/33-annot-DEGlists/pisaster_deg_list.txt ../data/pisaster_clean.fasta > ../data/pisa_deg_seq.fasta

where ../output/33-annot-DEGlists/pisaster_deg_list.txt is a text file of just the degs

won't work because pisaster fasta seq are called "Scaffold_"

head ../output/33-annot-DEGlists/pisaster_deg_list.txt 
g7932.t1
g8539.t1
g16117.t1
g28597.t2
g10045.t1
g16115.t1
g18026.t1
g4177.t1
g13103.t1
g23730.t1

pisaster fasta

head ../data/pisaster_clean.fasta
>Scaffold_1
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC

deg list to annotate

head DEGlist_transcripts_pisaster_controlVexposed.tab 
baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
g7932.t1	7151.49853952956	-0.947771539025138	0.324903499923841	-2.91708627099216	0.00353318033676956	0.0424457760650037
g8539.t1	2829.19181553788	-0.300096395519226	0.0983024341428447	-3.05278702542758	0.00226726788320537	0.0325388063982281
g16117.t1	453.65519899295	3.87615792350444	0.609878802035088	6.35562001920741	2.07587594169686e-10	9.15310317492556e-08
g28597.t2	27.6318763920616	5.1202641391409	1.73969427502474	2.94319767136561	0.00324840931125055	0.0402215816151529
g10045.t1	332.875209635289	0.856107660156239	0.254348222810544	3.36588811471243	0.000762976487868391	0.0168822470869492
g16115.t1	24.589890392466	2.13803170746172	0.511383343234492	4.18087866127727	2.9038479961146e-05	0.00176937733049686
g18026.t1	12.0502907243087	1.21136877083293	0.399080171019733	3.03540205402245	0.00240215231644605	0.0335176773737187
g4177.t1	93.3178975316799	1.29621160050945	0.416375432246556	3.11308377037457	0.001851434

grace-ac avatar Nov 06 '25 15:11 grace-ac

For every species you should have a FASTA file of genes (coding sequences). For pisaster it should have headers like g7932.t1.
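A quick way to see why an ID-based extraction comes back empty against a genome FASTA, and works against a coding-sequence FASTA, is to compare the wanted IDs with the FASTA headers. This is a minimal sketch, not a replacement for seqkit. The sequences below are invented, `g7932.t1` and `g8539.t1` come from the DEG list above, and the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (record_id, sequence); the ID is the header text up to the first space."""
    rec_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if rec_id is not None:
                yield rec_id, "".join(chunks)
            parts = line[1:].split()
            rec_id, chunks = (parts[0] if parts else ""), []
        elif rec_id is not None:
            chunks.append(line)
    if rec_id is not None:
        yield rec_id, "".join(chunks)


def select_records(fasta_lines, wanted_ids):
    """Return (hits, missing): sequences whose ID is wanted, and wanted IDs never seen."""
    wanted = set(wanted_ids)
    hits = {rid: seq for rid, seq in read_fasta(fasta_lines) if rid in wanted}
    missing = sorted(wanted - set(hits))
    return hits, missing


deg_ids = ["g7932.t1", "g8539.t1"]

# Genome-style FASTA: headers are scaffolds, so no DEG ID can match.
genome = [">Scaffold_1", "CTAACCCTAACC"]
# Coding-sequence-style FASTA: headers are transcript IDs (sequences invented).
cds = [">g7932.t1", "ATGAAA", "TTT", ">g9999.t1", "ATGCCC"]

print(select_records(genome, deg_ids))
print(select_records(cds, deg_ids))
```

With real files, the same functions can be given open file handles instead of lists, since both iterate line by line.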

sr320 avatar Nov 06 '25 15:11 sr320

Same files that you blasted prior to get annotation of genes?

sr320 avatar Nov 06 '25 15:11 sr320

now i'm back at issue where i'm working on raven but diamond command not found

grace-ac avatar Nov 06 '25 16:11 grace-ac