
Add One Health Enteric BioSample Package + misc bugfixes

Open: erikwolfsohn opened this issue 1 year ago • 5 comments

Hi! I just found out about this fantastic submission pipeline you all built, and I'm really excited to start using it. I had to make some updates since the majority of my NCBI submissions are enteric pathogens, so I figured I'd submit a pull request in case any of these changes are useful for you all.

📋 Updates

  • Added all mandatory and optional metadata fields for the One Health Enteric BioSample Package to main_config.yaml
  • Added a template and config file for BioSample & SRA submission using that package
  • Changed handling so that optional columns left blank are removed before submission

🛠️ Fixes

  • Slightly modified the behavior of the check_submission_status workflow to prevent FTP navigation and query errors
  • Slightly modified handling of input arguments so users shouldn't be prompted for FASTA input unless they're submitting to GenBank or GISAID
  • Changed handling for mandatory vs optional columns; the pipeline should no longer require optional columns in metadata.csv prior to submission
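The mandatory-vs-optional column handling described in the lists above can be illustrated with a minimal sketch. This is not SeqSender's actual code: the helper name `prepare_metadata_rows`, the column names, and the choice to drop an optional column only when it is blank for every sample are all assumptions made for illustration. It assumes metadata.csv has already been read into dictionaries (for example with `csv.DictReader`).

```python
import csv
import io
from typing import Dict, List, Sequence


def _is_blank(value) -> bool:
    # csv.DictReader yields "" for empty cells and None for missing trailing cells.
    return value is None or str(value).strip() == ""


def prepare_metadata_rows(
    rows: List[Dict[str, str]],
    mandatory: Sequence[str],
    optional: Sequence[str],
) -> List[Dict[str, str]]:
    """Hypothetical helper (not SeqSender code): enforce mandatory columns and
    drop optional columns that are blank in every row."""
    if not rows:
        raise ValueError("metadata contains no rows")

    header = set(rows[0].keys())

    # Mandatory columns must exist in the header...
    missing = [col for col in mandatory if col not in header]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")

    # ...and must be filled in for every sample.
    for i, row in enumerate(rows, start=1):
        empty = [col for col in mandatory if _is_blank(row.get(col))]
        if empty:
            raise ValueError(f"row {i}: blank mandatory values: {empty}")

    # Optional columns may be absent entirely; if present but blank for every
    # sample, drop them so they are not included in the submission.
    blank_optional = {
        col for col in optional
        if col in header and all(_is_blank(row.get(col)) for row in rows)
    }
    return [
        {key: value for key, value in row.items() if key not in blank_optional}
        for row in rows
    ]


if __name__ == "__main__":
    # Illustrative metadata; column names are placeholders, not the
    # package's actual attribute list.
    metadata_csv = (
        "sample_name,organism,strain,serovar\n"
        "S1,Salmonella enterica,ABC1,\n"
        "S2,Salmonella enterica,ABC2,\n"
    )
    rows = list(csv.DictReader(io.StringIO(metadata_csv)))
    cleaned = prepare_metadata_rows(
        rows,
        mandatory=["sample_name", "organism"],
        optional=["strain", "serovar", "isolate"],
    )
    print(cleaned)  # the all-blank "serovar" column is dropped
```

In this sketch an optional column that is missing from metadata.csv (here `isolate`) is simply ignored rather than treated as an error, which mirrors the intent of not requiring optional columns before submission.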

erikwolfsohn, Mar 06 '24 08:03

Hey @erikwolfsohn, thanks for contributing to SeqSender and incorporating enteric pathogens. I made a couple of changes based on my review to fix certain issues I identified, but everything looks good. I'm still running a few more tests to make sure there aren't any other issues, but I should have all your changes merged within the next few days. If you have any other contributions you'd like to make, please open a pull request anytime or suggest features you'd like to see in our issues.

dthoward96, Mar 08 '24 19:03

Awesome, thank you! I saw in another issue that you were talking about implementing Pandera metadata validation as a way to support new pathogens and BioSample packages in a future release. I think that's a great idea, and I'd love to contribute if possible. I've started working on a Pandera schema for the One Health Enteric package, and I really like it as an alternative to validating against the main YAML config file.

Feel free to shoot me an email at [email protected] if you have some time to chat about your plans for that and possible ways I can contribute - I think this submission pipeline is going to be incredibly useful for our lab, so I definitely want to help in any way I can.

erikwolfsohn, Mar 15 '24 18:03

Yes, I'm working quickly to get it added. The different requirements for One Health Enteric BioSample attributes have caused some issues in testing, so instead of reinventing the wheel with a temporary fix, I'm going to move Pandera validation up to the next version update to resolve this issue. I don't think you'll need to manually create a One Health Enteric-specific schema, as I'm currently testing a way to generate it automatically from NCBI's website. I should have this available on the version update branch later this week. I've already added the Enteric XML to the test set I'm working on. I have a couple of other questions that I'm pooling together, so once the update is live I'll send you an email with those questions included. Once you get my email, if you could test the update with the automatically generated schema, that would be a major help.

dthoward96, Mar 19 '24 15:03

Hi Dakota,

I just wanted to check in and make sure I didn't miss any emails from you. I was focusing on some other projects and this dropped off my radar a little bit. Let me know if there's any way I can contribute currently or if anything is ready for testing. Pulling the metadata templates directly from NCBI sounds fantastic, I'm definitely excited for that feature. I'll be at a conference next week so I won't be available to do much testing, but I'll be back on May 13th.

erikwolfsohn, May 01 '24 22:05

Hey @erikwolfsohn,

Sorry for not reaching out sooner; it took me a bit longer than anticipated to finish working through the update. The update is mostly complete, and I'm in the process of finalizing the updated instructions for the documentation. I'll go ahead and send you an email now to connect, since I'm definitely in need of users to test this new version. The updated documentation will be available on the branch https://github.com/CDCgov/seqsender/tree/v1.2.0 by the time you're back on the 13th, and I'll send you an email when I upload it.

dthoward96, May 03 '24 19:05

SeqSender V1.2.0 is now out and supports the One Health Enteric package. Use the documentation to select the package and get the correct metadata and config files.

dthoward96, Aug 06 '24 20:08

This is awesome, thank you for all your hard work on this! We're planning to scale up sequencing significantly at our lab, so I cannot overstate how excited I am about this new release. Since we do our analysis and submission inside the Terra.bio cloud platform, I'm working on a Terra workflow to run SeqSender in that environment and will definitely share it with you when I'm done, hopefully by the end of this week.

SRA submission has worked great for me so far. I'm having some trouble with GISAID submission, but I'm not sure whether I'm encountering a bug or it's just user error. I'll open a separate issue shortly with what I've found so far.

-Erik

erikwolfsohn, Aug 14 '24 20:08