LAPIS
Develop preprocessing pipeline
For:
- [ ] SC2 open - it's almost finished but a few minor adjustments are needed
- [x] GISAID
- The pipeline downloads the data from the servers
- The output is an NDJSON file with the data
Bug detected by @Taepper: there is one sequence in the open dataset where strain is null. I need to fix it.
Sorry, this is a stupid bug on our end. We should just exclude that line from the metadata in ncov ingest. But yeah, this won't happen quickly.
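Until the fix lands upstream, a defensive filter on our side could skip such records while reading the NDJSON output. This is only a minimal sketch, not the actual pipeline code: the function name `iter_records_with_strain` and the `date` field in the sample are hypothetical, and it assumes each line is one JSON object with a `strain` key.

```python
import io
import json
from typing import Iterable, Iterator


def iter_records_with_strain(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed NDJSON records, skipping any whose strain is missing or null.

    Hypothetical helper: assumes one JSON object per line with a "strain" key.
    """
    for line in lines:
        line = line.strip()
        if not line:
            # Tolerate blank lines, e.g. a trailing newline at end of file.
            continue
        record = json.loads(line)
        if record.get("strain") is None:
            # This is the case reported above: strain is null.
            continue
        yield record


# Made-up sample data, for illustration only.
sample = io.StringIO(
    '{"strain": "sample-A", "date": "2021-01-01"}\n'
    '{"strain": null, "date": "2021-01-02"}\n'
)
kept = list(iter_records_with_strain(sample))
```

With a real file, the same function could be fed an open file handle, since iterating over a text file yields its lines.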
We have ingest pipelines for both datasets.