
[feature request] seqkit subseq with --circular option?

Open · xinehc opened this issue 1 year ago · 4 comments

Hi Wei,

I wonder whether it would be possible to add a --circular option to seqkit subset as well.

I am currently using seqkit subset with BED files to extract the flanking regions of some genes. Some genes are located near the start/end boundary of circular genomes, so their flanks get truncated. With this option I would not need to manually concatenate the sequences multiple times before annotating genes. It would be even nicer if seqkit subset could distinguish linear from circular genomes using a user-specified list or BED file. For example:

seq_a    linear
seq_b    circular
...
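To make the requested behaviour concrete, here is a minimal sketch of flank extraction on a circular sequence. It is not seqkit's implementation. It assumes BED-style coordinates (0-based, half-open), non-negative flank lengths, and the plus strand only. The function names `circular_slice` and `subseq_with_flanks` are hypothetical. Wrapping indices with a modulo gives the same result as concatenating the sequence to itself and slicing, without copying it.

```python
def circular_slice(seq: str, start: int, end: int) -> str:
    """Return the half-open interval [start, end) of a circular sequence.

    start may be negative and end may exceed len(seq); positions wrap
    around the origin. An interval longer than the sequence wraps more
    than once, like concatenating the sequence several times.
    """
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    if end < start:
        raise ValueError("end must not be smaller than start")
    return "".join(seq[i % n] for i in range(start, end))


def subseq_with_flanks(seq: str, start: int, end: int,
                       upstream: int, downstream: int,
                       circular: bool) -> str:
    """Extract a BED-style interval plus flanks (plus strand only).

    Linear sequences are truncated at their ends; circular ones wrap.
    Minus-strand features (reverse complement, swapped flanks) are not
    handled in this sketch.
    """
    if upstream < 0 or downstream < 0:
        raise ValueError("flank lengths must be non-negative")
    lo, hi = start - upstream, end + downstream
    if circular:
        return circular_slice(seq, lo, hi)
    # Linear: clip to the sequence ends, as happens today.
    return seq[max(lo, 0):min(hi, len(seq))]
```

For example, on the 10 bp sequence `"ABCDEFGHIJ"`, a gene at BED interval 0-3 with 2 bp flanks gives `"IJABCDE"` when treated as circular and `"ABCDE"` when treated as linear.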

xinehc, Jun 28 '24 05:06

seqkit does not have a subset command.

seqkit subseq with --circular option

It sounds useful, but I'm busy writing a manuscript at the moment. I might implement it later.

What organism do the sequences belong to? Linear human chromosomes and circular mitochondria? Adding a new column would break the BED/GTF format specifications. You can simply run it on the linear and circular sequences separately.
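Running linear and circular sequences separately first requires splitting the FASTA input by topology. The sketch below is a hypothetical helper, not part of seqkit. It assumes a two-column, whitespace-separated topology list like the `seq_a linear` / `seq_b circular` example in the issue. It also assumes FASTA IDs are the first word of each header, and it treats IDs missing from the list as linear, so their flanks are truncated rather than wrapped.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence ID, sequence)


def parse_topology(lines: Iterable[str]) -> Dict[str, bool]:
    """Map sequence ID -> True if circular, from 'id<whitespace>linear|circular'."""
    topology: Dict[str, bool] = {}
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns")
        seq_id, kind = fields[0], fields[1].lower()
        if kind not in ("linear", "circular"):
            raise ValueError(f"line {lineno}: unknown topology {kind!r}")
        topology[seq_id] = kind == "circular"
    return topology


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Minimal FASTA reader; the ID is the first word after '>'."""
    records: List[Record] = []
    seq_id, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].strip()
            seq_id = header.split()[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line.strip())
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def split_by_topology(records: List[Record],
                      topology: Dict[str, bool]) -> Tuple[List[Record], List[Record]]:
    """Return (linear, circular); unlisted IDs default to linear."""
    linear: List[Record] = []
    circular: List[Record] = []
    for seq_id, seq in records:
        (circular if topology.get(seq_id, False) else linear).append((seq_id, seq))
    return linear, circular


def to_fasta(records: List[Record]) -> str:
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)
```

Each group could then be written to its own file and passed to seqkit subseq separately, with the circular handling applied only to the circular group.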

shenwei356, Jun 28 '24 07:06

Thanks for the quick reply. I meant seqkit grep; sorry for the confusion.

I am working with a set of plasmid sequences that are generally short and circular, but in some cases a plasmid may be incomplete and therefore linear.

xinehc, Jun 28 '24 07:06

grep does not handle flanking sequences. Do you mean subseq?

shenwei356, Jun 28 '24 07:06

You are right. I used both subseq and grep in a project and got confused.

xinehc, Jun 28 '24 07:06