GO slim?
I'm hoping to use GO slims to group GO terms into higher-level categories.
The link to the one I used a long time ago no longer works: http://owl.fish.washington.edu/halfshell/bu-alanine-wd/17-07-20/GO-GOslim.sorted
I see there's a bunch of options on this website: https://geneontology.org/docs/go-subset-guide/
Did you check handbook?
i did not...
just did! found it in the handbook!
http://current.geneontology.org/ontology/subsets/goslim_generic.obo
following the guide in the handbook
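For anyone landing here later: the goslim_generic.obo file linked above is plain text in OBO format, so the slim term IDs and names can be pulled out without special tooling. Below is a minimal Python sketch, assuming only the usual [Term] stanza layout (id:, name:, namespace: tags). The function name parse_obo_terms and the toy stanzas are illustrative and not taken from the handbook; GO:0000000 is a made-up ID used only to show the obsolete filter.

```python
from typing import Dict, Iterable, Optional, Tuple

# Toy input in OBO layout (illustrative; the real file is goslim_generic.obo).
SAMPLE_OBO = """format-version: 1.2

[Term]
id: GO:0006629
name: lipid metabolic process
namespace: biological_process

[Term]
id: GO:0000000
name: made-up obsolete term
namespace: biological_process
is_obsolete: true

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return {GO id: (name, namespace)} for each non-obsolete [Term] stanza."""
    terms: Dict[str, Tuple[str, str]] = {}
    stanza: Optional[Dict[str, str]] = None  # tags of the current [Term], else None

    def store(tags: Optional[Dict[str, str]]) -> None:
        # Keep a finished stanza only if it has an id and is not obsolete.
        if tags and "id" in tags and tags.get("is_obsolete") != "true":
            terms[tags["id"]] = (tags.get("name", ""), tags.get("namespace", ""))

    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            store(stanza)
            # Only [Term] stanzas are collected; [Typedef] and others are skipped.
            stanza = {} if line == "[Term]" else None
        elif stanza is not None and ": " in line:
            key, value = line.split(": ", 1)
            stanza.setdefault(key, value)  # keep the first value of each tag
    store(stanza)
    return terms


slim_terms = parse_obo_terms(SAMPLE_OBO.splitlines())
```

With the real file, the same function could be called on an open file handle, e.g. `parse_obo_terms(open("goslim_generic.obo"))`. Note that the slim file only lists the slim terms themselves; mapping arbitrary GO IDs up to a slim term needs the parent relationships from the full ontology.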
Code (starting at line 275): 33-annot-DEGlists.Rmd
The issue is with the code chunk at line 342
Check line 309 (and 310). You're assigning paths to the variables, not the column names (which is what you want, I believe).
oooh good catch!! I fixed it, but still got the same error
Need to see what that data frame looks like. Run head() and/or str() on the data frame and post the output here.
'data.frame': 30969 obs. of 56 variables:
$ transcript_id : chr "g19583.t1" "g12808.t1" "g5589.t1" "g12024.t1" ...
$ PSC.423 : int 166 440 172 34 0 0 11012 205 0 0 ...
$ PSC.426 : int 78 638 0 15 0 0 7564 106 0 0 ...
$ PSC.429 : int 99 882 0 38 9 0 11153 172 0 0 ...
$ PSC.432 : int 110 788 0 3 16 0 10104 158 0 0 ...
$ PSC.435 : int 109 1090 38 34 32 0 8493 92 0 0 ...
$ PSC.438 : int 127 666 290 0 55 0 9920 160 0 0 ...
$ PSC.441 : int 125 494 114 8 18 0 10497 175 0 0 ...
$ PSC.444 : int 65 658 170 0 0 0 8941 201 0 0 ...
$ PSC.453 : int 165 702 1283 50 242 0 5443 280 0 0 ...
$ PSC.456 : int 65 834 57 20 35 0 8554 278 0 0 ...
$ PSC.465 : int 92 1058 14 0 8 0 10118 139 0 0 ...
$ PSC.468 : int 164 663 16 0 4 0 7373 160 0 0 ...
$ PSC.519 : int 191 818 56 117 13 0 11788 108 0 0 ...
$ PSC.522 : int 62 732 219 13 46 0 10008 240 0 0 ...
$ PSC.525 : int 292 1051 247 46 16 0 12181 468 0 0 ...
$ PSC.528 : int 62 701 1843 26 458 0 4815 381 0 0 ...
$ PSC.531 : int 208 1957 0 20 0 0 13609 47 0 0 ...
$ PSC.534 : int 124 1004 1241 62 518 0 6212 506 0 0 ...
$ PSC.537 : int 170 1376 45 17 13 0 11634 53 0 0 ...
$ PSC.540 : int 146 616 0 13 8 0 11590 151 0 0 ...
$ PSC.549 : int 219 1193 0 0 0 0 9150 115 0 0 ...
$ PSC.552 : int 98 373 0 15 0 0 9515 125 0 0 ...
$ PSC.561 : int 349 755 0 0 0 0 13106 67 0 0 ...
$ PSC.564 : int 138 407 94 28 45 0 5866 255 0 0 ...
$ baseMean : num 140.9 NA NA NA 49.7 ...
$ log2FoldChange : num 0.759 NA NA NA -3.055 ...
$ lfcSE : num 0.241 NA NA NA 1.052 ...
$ stat : num 3.15 NA NA NA -2.91 ...
$ pvalue : num 0.00162 NA NA NA 0.00367 ...
$ padj : num 0.0232 NA NA NA 0.039 ...
$ V2 : chr "sp" NA "sp" "sp" ...
$ Entry : chr "P59644" NA "P80146" "Q09143" ...
$ gene_name : chr "PI5PA_MOUSE" NA "SEPR_THESR" "CTR1_MOUSE" ...
$ V5 : num 37.9 NA 38.5 27 NA ...
$ V6 : int 467 NA 314 385 NA NA 413 226 NA NA ...
$ V7 : int 232 NA 161 249 NA NA 272 95 NA NA ...
$ V8 : int 8 NA 7 8 NA NA 7 1 NA NA ...
$ V9 : int 4 NA 81 23 NA NA 10 1 NA NA ...
$ V10 : int 462 NA 377 381 NA NA 420 223 NA NA ...
$ V11 : int 420 NA 109 80 NA NA 4 100 NA NA ...
$ V12 : int 836 NA 407 458 NA NA 402 325 NA NA ...
$ V13 : num 8.30e-95 NA 7.71e-56 2.48e-32 NA ...
$ V14 : num 311 NA 191 132 NA NA 194 271 NA NA ...
$ From : chr "P59644" NA "P80146" "Q09143" ...
$ Reviewed : chr "reviewed" NA "reviewed" "reviewed" ...
$ Entry.Name : chr "PI5PA_MOUSE" NA "SEPR_THESR" "CTR1_MOUSE" ...
$ Protein.names : chr "Phosphatidylinositol 4,5-bisphosphate 5-phosphatase A (EC 3.1.3.36) (Inositol polyphosphate 5-phosphatase J) (P"| __truncated__ NA "Extracellular serine proteinase (EC 3.4.21.-)" "High affinity cationic amino acid transporter 1 (CAT-1) (CAT1) (Ecotropic retroviral leukemia receptor) (Ecotro"| __truncated__ ...
$ Gene.Names : chr "Inpp5j Pib5pa" NA "" "Slc7a1 Atrc1 Rec-1" ...
$ Organism : chr "Mus musculus (Mouse)" NA "Thermus sp. (strain Rt41A)" "Mus musculus (Mouse)" ...
$ Length : int 1003 NA 410 622 NA NA 522 327 NA NA ...
$ Gene.Ontology..biological.process.: chr "negative regulation of neuron projection development [GO:0010977]; phosphatidylinositol dephosphorylation [GO:0046856]" NA "proteolysis [GO:0006508]" "L-arginine transmembrane transport [GO:1903826]; L-histidine import across plasma membrane [GO:1903810]; L-orni"| __truncated__ ...
$ Gene.Ontology..cellular.component.: chr "cytoplasm [GO:0005737]; dendritic shaft [GO:0043198]; growth cone [GO:0030426]; plasma membrane [GO:0005886]; r"| __truncated__ NA "extracellular space [GO:0005615]" "apical plasma membrane [GO:0016324]; basolateral plasma membrane [GO:0016323]; membrane [GO:0016020]; protein-c"| __truncated__ ...
$ Gene.Ontology..GO. : chr "cytoplasm [GO:0005737]; dendritic shaft [GO:0043198]; growth cone [GO:0030426]; plasma membrane [GO:0005886]; r"| __truncated__ NA "extracellular space [GO:0005615]; serine-type endopeptidase activity [GO:0004252]; proteolysis [GO:0006508]" "apical plasma membrane [GO:0016324]; basolateral plasma membrane [GO:0016323]; membrane [GO:0016020]; protein-c"| __truncated__ ...
$ Gene.Ontology..molecular.function.: chr "inositol-1,3,4,5-tetrakisphosphate 5-phosphatase activity [GO:0052659]; inositol-1,4,5-trisphosphate 5-phosphat"| __truncated__ NA "serine-type endopeptidase activity [GO:0004252]" "L-arginine transmembrane transporter activity [GO:0061459]; L-histidine transmembrane transporter activity [GO:"| __truncated__ ...
$ Gene.Ontology.IDs : chr "GO:0001726; GO:0004439; GO:0004445; GO:0005737; GO:0005886; GO:0010977; GO:0017124; GO:0030426; GO:0032587; GO:"| __truncated__ NA "GO:0004252; GO:0005615; GO:0006508" "GO:0000064; GO:0001618; GO:0005290; GO:0015189; GO:0015819; GO:0016020; GO:0016323; GO:0016324; GO:0032991; GO:"| __truncated__ ...
Also, is that the appropriate way to read in a column of data? Do you have to read in the entire file as a CSV first and then you can reference the column in subsequent calls?
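On the question above: a path is just a string, so a column can only be referenced after the table itself has been read in. A minimal sketch of that pattern in Python (not the R code from the Rmd), assuming a tab-delimited file and the column names shown in the str() output; the function name go_ids_by_transcript is illustrative. The two sample rows reuse values visible in the str() output (g12808.t1 has NA, g5589.t1 has three GO IDs).

```python
import csv
import io
from typing import Dict, List, TextIO

# Two rows copied from the str() output above; tab-delimited layout is an assumption.
SAMPLE_TSV = (
    "transcript_id\tGene.Ontology.IDs\n"
    "g12808.t1\tNA\n"
    "g5589.t1\tGO:0004252; GO:0005615; GO:0006508\n"
)


def go_ids_by_transcript(
    handle: TextIO,
    id_col: str = "transcript_id",
    go_col: str = "Gene.Ontology.IDs",
) -> Dict[str, List[str]]:
    """Read the whole table, then pull one column by name and split its GO IDs."""
    result: Dict[str, List[str]] = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        raw = row.get(go_col) or ""
        if raw in ("", "NA"):
            continue  # transcript has no GO annotation
        # GO IDs are separated by "; " in the UniProt-style column.
        result[row[id_col]] = [g.strip() for g in raw.split(";") if g.strip()]
    return result


go_map = go_ids_by_transcript(io.StringIO(SAMPLE_TSV))
```

The same idea applies in R: read the file into a data frame first, then refer to the column by its name rather than assigning a file path to the variable.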
I would recommend using https://github.com/sr320/workflow-annotation
this would have me re-run blast, though, right? i already ran blast and just want to annotate to GO slim
correct.. but in fact this might be faster. got a url to a gene fasta you can drop here?
https://gannet.fish.washington.edu/seashell/bu-github/paper-pycno-sswd-2021-2022/data/augustus.hints.codingseq
and done.....
Annotation Summary Report
Job Information
- Input file: augustus.hints.codingseq
- Start time: 2025-11-04 11:58:19
- End time: 2025-11-04 12:01:27
- Duration: 0h 3m 8s
- CPUs used: 40
- Tool: DIAMOND BLASTX (nucleotide)
Results Overview
- Total sequences: 12,156
- BLAST hits found: 12,156 (100.0%)
- GO annotations: 6,206 (51.1%)
- GO-Slim mappings: 6,173 (50.8%)
Output Files
- Main results: annotation_with_goslim.tsv
- Full GO data: annotation_full_go.tsv
- Raw BLAST: augustus.hints.blast.tsv
- Processing script: postprocess_uniprot_go.py
Top GO-Slim Categories
| GO-Slim Term | Count |
|---|---|
| anatomical structure development | 1371 |
| regulation of DNA-templated transcription | 833 |
| cell differentiation | 775 |
| lipid metabolic process | 688 |
| reproductive process | 566 |
| transmembrane transport | 512 |
| vesicle-mediated transport | 510 |
| carbohydrate derivative metabolic process | 450 |
| immune system process | 438 |
| protein-containing complex assembly | 435 |
| cytoskeleton organization | 401 |
| nervous system process | 391 |
| cell motility | 387 |
| chromatin organization | 313 |
| cell adhesion | 308 |

Top 15 GO-Slim categories by sequence count
Performance
- BLAST throughput: 64.7 sequences/second
- Annotation rate: 64.7 hits/second
Generated by blast2slim.sh on 2025-11-04 12:01:27
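As a quick sanity check, the percentages and throughput in the report can be recomputed from the counts and timestamps it prints. A small Python sketch using only numbers copied from the report above; the helper name pct is illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Values copied from the report above.
start = datetime.strptime("2025-11-04 11:58:19", FMT)
end = datetime.strptime("2025-11-04 12:01:27", FMT)
total_sequences = 12156
go_annotations = 6206
goslim_mappings = 6173


def pct(part: int, whole: int) -> float:
    """Percentage of whole, rounded to one decimal place as in the report."""
    return round(100 * part / whole, 1)


elapsed_seconds = (end - start).total_seconds()
throughput = round(total_sequences / elapsed_seconds, 1)

print(f"duration: {elapsed_seconds:.0f} s")
print(f"GO annotations: {pct(go_annotations, total_sequences)}%")
print(f"GO-Slim mappings: {pct(goslim_mappings, total_sequences)}%")
print(f"throughput: {throughput} sequences/second")
```

The recomputed values agree with the report: 188 seconds (3m 8s), 51.1%, 50.8%, and 64.7 sequences/second.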
please let me know if i'm on the right track:
to get GOslim terms for each species, I could use the blast2slim workflow.
- Edit /workflow-annotation/blast2slim.sh script with files i want to use
- run them in bash code chunks in Rstudio on raven
omg ignore above comment - i use the qmd document
it's taking a lot longer than 3 mins to run
Using diamond?
diamond command not found
i'm working on raven
bash blast2slim.sh -i "https://gannet.fish.washington.edu/seashell/bu-github/paper-pycno-sswd-2021-2022/data/augustus.hints.codingseq" --diamond -o pycno --threads 40
[INFO] Output directory: output/pycno/run_20251105_152508
[INFO] Downloading input FASTA from URL: https://gannet.fish.washington.edu/seashell/bu-github/paper-pycno-sswd-2021-2022/data/augustus.hints.codingseq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
61 36.6M 61 22.5M 0 0 90.0M 0 --:--:-- --:--:-- --:--:-- 89.7M[INFO] Saved URL FASTA to output/pycno/run_20251105_152508/augustus.hints.codingseq
[INFO] Building DIAMOND protein DB...
100 36.6M 100 36.6M 0 0 96.9M 0 --:--:-- --:--:-- --:--:-- 96.8M
blast2slim.sh: line 114: diamond: command not found
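"command not found" means the shell could not find an executable named diamond on the PATH of the session that ran the script. A minimal Python sketch for checking which required tools are visible before launching a long run; the function name missing_tools and the tool list are illustrative, and nothing here is specific to how raven is configured.

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Tools this thread relies on; adjust the list as needed.
required = ["diamond", "seqkit"]
not_found = missing_tools(required)
if not_found:
    print("Not on PATH:", ", ".join(not_found))
else:
    print("All required tools found.")
```

If diamond is installed somewhere on the machine but reported as missing, the usual fix is to add its directory to PATH in the same session (or code chunk) that calls blast2slim.sh.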
URLs for three files needing GO annotation?
pycno:
on raven:
downloaded from NCBI and on raven
/home/shared/16TB_HDD_01/graceac9/project-pycno-multispecies-2023/data/ncbi_dataset/data/GCA_032158295.1/GCA_032158295.1_ASM3215829v1_genomic.fna
pisaster and dermasterias fastas are on raven (not publicly available bc unpublished)
pisaster:
fasta --> /home/shared/16TB_HDD_01/graceac9/project-pycno-multispecies-2023/data/pisaster_clean.fasta
dermasterias:
fasta --> /home/shared/16TB_HDD_01/graceac9/project-pycno-multispecies-2023/data/Newest_Derm_files/derm_imbr_genome.fa
You should not be annotating genomes.... just genes
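pycno: 29-deseq2-pycno/DEGlist_transcripts_pycno_controlVexposed.tab https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/output/29-deseq2-pycno/DEGlist_transcripts_pycno_controlVexposed.tab
pisaster: 30-deseq2-pisaster/DEGlist_transcripts_pisaster_controlVexposed.tab https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/output/30-deseq2-pisaster/DEGlist_transcripts_pisaster_controlVexposed.tab
dermasterias: 31-deseq2-derm/DEGlist_transcripts_derm_controlVexposed.tab https://github.com/grace-ac/project-pycno-multispecies-2023/blob/main/output/31-deseq2-derm/DEGlist_transcripts_derm_controlVexposed.tab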
Fastas….
to get fastas, would i try to get sequences of the degs from the genomes?
i tried doing this:
seqkit grep -f ../output/33-annot-DEGlists/pisaster_deg_list.txt ../data/pisaster_clean.fasta > ../data/pisa_deg_seq.fasta
where ../output/33-annot-DEGlists/pisaster_deg_list.txt is a text file of just the degs
won't work because pisaster fasta seq are called "Scaffold_"
head ../output/33-annot-DEGlists/pisaster_deg_list.txt
g7932.t1
g8539.t1
g16117.t1
g28597.t2
g10045.t1
g16115.t1
g18026.t1
g4177.t1
g13103.t1
g23730.t1
pisaster fasta
head ../data/pisaster_clean.fasta
>Scaffold_1
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
deg list to annotate
head DEGlist_transcripts_pisaster_controlVexposed.tab
baseMean log2FoldChange lfcSE stat pvalue padj
g7932.t1 7151.49853952956 -0.947771539025138 0.324903499923841 -2.91708627099216 0.00353318033676956 0.0424457760650037
g8539.t1 2829.19181553788 -0.300096395519226 0.0983024341428447 -3.05278702542758 0.00226726788320537 0.0325388063982281
g16117.t1 453.65519899295 3.87615792350444 0.609878802035088 6.35562001920741 2.07587594169686e-10 9.15310317492556e-08
g28597.t2 27.6318763920616 5.1202641391409 1.73969427502474 2.94319767136561 0.00324840931125055 0.0402215816151529
g10045.t1 332.875209635289 0.856107660156239 0.254348222810544 3.36588811471243 0.000762976487868391 0.0168822470869492
g16115.t1 24.589890392466 2.13803170746172 0.511383343234492 4.18087866127727 2.9038479961146e-05 0.00176937733049686
g18026.t1 12.0502907243087 1.21136877083293 0.399080171019733 3.03540205402245 0.00240215231644605 0.0335176773737187
g4177.t1 93.3178975316799 1.29621160050945 0.416375432246556 3.11308377037457 0.001851434
For every species you should have a fasta file of genes (coding sequences). For pisaster, the headers should look like g7932.t1
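The seqkit attempt above returned nothing because the DEG IDs (g7932.t1, ...) were matched against genome headers (Scaffold_1, ...), and the two never overlap. A minimal Python sketch of the same extraction that also reports which IDs were not found, assuming the FASTA record ID is the header text up to the first whitespace. The function names and the short sequences are made up for illustration; only the IDs g7932.t1, g8539.t1, and Scaffold_1 come from this thread.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence); the ID is the header up to the first whitespace."""
    record_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line.strip())
    if record_id is not None:
        yield record_id, "".join(chunks)


def extract_by_id(
    fasta_lines: Iterable[str], wanted_ids: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (sequences found by ID, sorted list of IDs that were not found)."""
    wanted = set(wanted_ids)
    found = {rid: seq for rid, seq in read_fasta(fasta_lines) if rid in wanted}
    missing = sorted(wanted - set(found))
    return found, missing


deg_ids = ["g7932.t1", "g8539.t1"]

# Genome-style headers: none of the DEG IDs can match.
genome = [">Scaffold_1", "CTAACCCTAACCC"]
genome_found, genome_missing = extract_by_id(genome, deg_ids)

# Coding-sequence-style headers (sequences here are made up).
cds = [">g7932.t1 hypothetical description", "ATGAAA", "CCC", ">g99999.t1", "ATGTTT"]
cds_found, cds_missing = extract_by_id(cds, deg_ids)
```

If every ID ends up in the missing list, the FASTA is almost certainly the wrong file (a genome rather than the gene or transcript set the counts were generated from).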
Same files that you blasted prior to get annotation of genes?
got them!
project-pycno-multispecies-2023/data/pyc_deg_seq.fasta
now i'm back at issue where i'm working on raven but diamond command not found