Update for SARS-CoV-2 dataset?

Open murrellb opened this issue 1 year ago • 6 comments

Hi folks,

I'd love to know about the current rough timeline for the next SARS-CoV-2 dataset release, if that is possible?

Thank you so much for this! Ben

murrellb avatar Sep 18 '24 09:09 murrellb

Thanks for the ping @murrellb and sorry for the delay in release.

I'll try to make a release today. Until then and in the future, if you want to use the latest lineages without having to wait for a release, I suggest you use the nightly builds.

It works like this for web nextclade: https://master.clades.nextstrain.org/?dataset-name=nextstrain/sars-cov-2&input-tree=https://nextstrain.org/charon/getDataset?prefix=staging/nextclade/sars-cov-2

And like this for cli:

curl -o nightly.json "https://nextstrain.org/charon/getDataset?prefix=staging/nextclade/sars-cov-2"
nextclade run -d sars-cov-2 --input-tree nightly.json -t test.tsv
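As a rough sketch, the two commands above could be wrapped in small shell functions so the download fails loudly instead of silently saving an error page. The function names (`nightly_url`, `fetch_nightly_tree`, `run_with_nightly`) and the `sequences.fasta` input file are illustrative assumptions, not part of Nextclade; only the URL pattern and the `nextclade run` flags come from the commands above.

```shell
# Build the staging URL for a dataset prefix such as "sars-cov-2".
nightly_url() {
  printf 'https://nextstrain.org/charon/getDataset?prefix=staging/nextclade/%s\n' "$1"
}

# Download the nightly tree for prefix $1 into file $2.
# --fail makes curl exit non-zero on HTTP errors rather than writing
# the error response into the output file.
fetch_nightly_tree() {
  curl --fail --silent --show-error --location -o "$2" "$(nightly_url "$1")"
}

# Run Nextclade with dataset $1 on sequences $2, swapping in the
# downloaded tree $3 and writing TSV results to $4.
run_with_nightly() {
  nextclade run -d "$1" --input-tree "$3" -t "$4" "$2"
}

# Example usage (sequences.fasta is a placeholder for your own input):
#   fetch_nightly_tree sars-cov-2 nightly.json
#   run_with_nightly sars-cov-2 sequences.fasta nightly.json test.tsv
```

Other staging prefixes can be passed to `fetch_nightly_tree` in the same way.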

The nightly tree is built automatically every day from the latest Pango lineage designations, so it may contain small issues of the kind I check for before making releases. Otherwise it's fully usable, and it's what I use myself most of the time (which is why I sometimes don't realize it's been a while since the last release).

Let me know if you have questions re the nightly tree and anything in general in the future.

corneliusroemer avatar Sep 18 '24 09:09 corneliusroemer

Ah thank you! I was 100% unaware of these, and this almost certainly solves my problem (and has been immediately handy for the Stockholm sequencing folks too). For anyone else (and my future self) the BA.2.86 dataset is here: https://nextstrain.org/charon/getDataset?prefix=staging/nextclade/sars-cov-2/BA.2.86

murrellb avatar Sep 18 '24 15:09 murrellb

Ah, in case this is the sort of "small issue" you fix, XDV.1 does not appear to be a leaf node on that tree (and my software is upset by this). You can see it on your staging tree here (https://nextstrain.org/staging/nextclade/sars-cov-2/BA.2.86).

murrellb avatar Sep 18 '24 16:09 murrellb

Ah, good that you reminded me. I had seen this before but must have forgotten to fix it. Happy to hear about these sorts of things!

corneliusroemer avatar Sep 18 '24 17:09 corneliusroemer

This is great news - thanks so much for the insight Cornelius. I've been wrestling more and more with mutation searches, but now I can just use this and rely on the Nextclade lineage calls.

Mike-Honey avatar Sep 18 '24 20:09 Mike-Honey

I also switched to the nightly builds (previously to identify XEC, now XEK, in the RKI Germany GitHub sequences). Thanks again for the info 😊

Btw @corneliusroemer, are all these I-substitutions after "V483-" correct for "S" in "XEK"? ["S"]:...,"E484I","G485I","F486I","N487I","F490I","Q493I","S494I","G496I","Q498I","P499I","N501I","Y505I","V511I","L513I","S514I","L518I","P521I","A522I","T523I","K529I","K535I","T547I","T549I","V551I","E554I","N556I","K558I","P561I","Q564I","F565I","A570I","T572I","T573I","D574I","L582I","E583I","L585I","G594I","V597I","N603I","T604I","Q607I","V608I","Q613I","D614I","E619I","P621I","V622I","A623I","A626I","Q628I","P631I","V635I","S640I","N641I","V642I","R646I","A653I","E654I","H655I","N658I","E661I","S673I","Q675I","Q677I","T678I","N679I","S680I","P681I","R683I","A684I","A688I","S689I","S691I","A694I","A701I","N703I","S704I","V705I","A706I","T716I","T719I","T723I","S730I","T732I","S735I","D745I","T747I","E748I","Q762I","N764I","A766I","G769I","V772I","E780I","T791I","P793I","D796I","F797I","G798I","P809I","S810I","P812I","A831I","Q836I","D839I","D843I","A845I","A846I","A852I","K854I","N856I","L858I","T859I","A879I","T883I","W886I","F888I","A890I","A899I","L922I","S929I","D936I","S937I","L938I","S939I","T941I","A942I","G946I","D950I","V952I","Q954I","Q957I","T961I","K964I","N969I","N978I","D979I","L981I","S982I","E990I","S1003I","T1006I","T1009I","R1014I","A1020I","S1021I","T1027I","K1045I","H1058I","T1066I","A1070I","Q1071I","K1073I","A1078I","P1079I","H1083I","D1084I","K1086I","A1087I","R1091I","E1092I","H1101I","V1104I","E1111I","Q1113I","T1116I","T1117I","D1118I","V1122I","G1124I","N1125I","V1129I","D1139I","L1141I","P1143I","E1144I","D1146I","S1147I","K1149I","E1150I","D1153I","Y1155I","H1159I","P1162I","V1164I","D1165I","G1167I","D1168I","V1176I","N1178I","K1181I","R1185I","A1190I","K1191I","D1199I","Q1201I","E1202I","K1205I","Q1208I","G1219I","V1228I","M1229I","V1230I","T1231I","L1234I","C1235I","C1243I","C1247I","G1251I","S1252I","E1258I","D1259I","D1260I","S1261I","P1263I","V1264I","L1265I","K1266I","Y1272I"]

Also if something like that is an issue and I see it, should I create a ticket here?

icestorm972 avatar Oct 02 '24 17:10 icestorm972

Cornelius released the fresh SC2 datasets to the main channel yesterday, feel free to give them a try! https://github.com/nextstrain/nextclade_data/releases/tag/2024-10-17--16-48-48Z

ivan-aksamentov avatar Oct 18 '24 16:10 ivan-aksamentov

> Also if something like that is an issue and I see it, should I create a ticket here?

@icestorm972 yes, absolutely, open an issue with as much detail as possible: expected vs. observed result, what you are running (Nextclade Web or CLI), which dataset, all additional parameters, and example sequences if possible, so that we can reproduce the issue on our side. You can open it here, in the main repo, or start a chat in the discussions forum.

ivan-aksamentov avatar Oct 18 '24 16:10 ivan-aksamentov