[FEATURE REQUEST]: remove prefix from sequence names
Is this a feature request for FCS-adaptor or FCS-GX? Yes
Describe the problem you'd like to be solved
At the moment, the FCS output adds the prefix lcl| to sequence names.
Describe the solution you'd like
Remove this prefix.
Describe alternatives you've considered
Make it optional with a command line argument.
Hello,
The lcl| prefix is specific to FCS-adaptor and shows up in the sequences under its cleaned_sequences directory.
We can consider implementing this in an upcoming release. In the meantime, you can strip the prefix with sed. Another option is to download the fcs.py runner script and clean the original, uncleaned FASTA using the adaptor report, like so:
curl -LO https://github.com/ncbi/fcs/raw/main/dist/fcs.py
zcat ./inputdir/uncleaned.fa.gz | python3 ./fcs.py clean genome --action-report ./outputdir/fcs_adaptor_report.txt --output clean.fasta --contam-fasta-out contam.fasta
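For the sed route, a minimal sketch could look like the following. It assumes the prefix appears immediately after the ">" of each FASTA header line; the function name strip_lcl_prefix and the sequence names in the demo are made up for illustration and are not part of FCS.

```shell
# Hypothetical helper (not part of FCS): drop a leading "lcl|" from FASTA
# header lines only. Sequence lines never start with ">", so they pass
# through unchanged. With no file arguments, sed reads from stdin.
strip_lcl_prefix() {
    sed 's/^>lcl|/>/' "$@"
}

# Small demo on an inline two-record FASTA with made-up sequence names.
printf '>lcl|contig_1\nACGT\n>lcl|contig_2\nGGCC\n' | strip_lcl_prefix
```

On a gzipped file the same function can sit in a pipeline, for example zcat in.fa.gz | strip_lcl_prefix | gzip > out.fa.gz, where in.fa.gz and out.fa.gz are placeholder names for your own files.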
Is the fcs.py script in the singularity image?
No. fcs.py is a wrapper that runs the executables inside Docker or Singularity containers. To set up with Singularity, follow the instructions here to download the runner and the Singularity image and to set the image environment variable:
https://github.com/ncbi/fcs/wiki/FCS-GX#quickstart
We have an updated wiki that demonstrates how to clean a genome with FCS-adaptor style output:
https://github.com/ncbi/fcs/wiki/FCS-adaptor-quickstart#clean-the-genome