json format
Hi,
I am using vg view -aM filtered.gam to view the output file of the alignments, and I am trying to interpret the JSON format. I know some basic concepts, such as that offset represents the number of bases of offset and sequence represents the mutated sequence, but I don't quite understand the meaning of rank. In some alignments, the same rank appears multiple times.
In addition, there is some node information, such as:
{"edit": [{"from_length": 6}, {"from_length": 1, "to_length": 1}], "position": {"node_id": "180334880"}, "rank": "2" }
Why does from_length appear twice?
Is there any documentation that explains the JSON format?
Thanks
Yumin
Please see here for documentation of the various formats:
https://github.com/vgteam/vg/wiki/File-Formats
In particular, that page links to the Protobuf definitions for GAM alignments:
https://github.com/vgteam/libvgio/blob/eb1fe76878aff8f26f0a2f38a1c133ec2f353e57/deps/vg.proto#L109-L151
For the duplicate ranks: that looks like a bug. However, I don't think any vg code uses ranks in alignment paths for anything.
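If you want to see how often this happens in your own output, a small Python sketch like the one below can flag repeated ranks within a single alignment. It assumes the layout of the Protobuf definitions linked above (an alignment whose path holds a list of mapping objects, each with an optional rank). Because rank is quoted in your output ("rank": "2"), the sketch converts it with int(). The helper name duplicate_ranks and the sample record are hypothetical.

```python
import json
from collections import Counter


def duplicate_ranks(alignment: dict) -> list:
    """Return the rank values that occur more than once in an alignment's path.

    Assumes alignment["path"]["mapping"] is a list of mapping objects, each
    possibly carrying a "rank" field (printed as a string, e.g. "2").
    """
    mappings = alignment.get("path", {}).get("mapping", [])
    counts = Counter(int(m["rank"]) for m in mappings if "rank" in m)
    return sorted(rank for rank, n in counts.items() if n > 1)


# Hypothetical alignment record in which two mappings both claim rank 2.
record = json.loads(
    '{"path": {"mapping": ['
    '{"position": {"node_id": "1"}, "rank": "1"},'
    '{"position": {"node_id": "2"}, "rank": "2"},'
    '{"position": {"node_id": "3"}, "rank": "2"}]}}'
)
print(duplicate_ranks(record))
```

To scan a whole file you could call it on each alignment from the vg view output, assuming each alignment is emitted as its own JSON object.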
For the two from_lengths: that's because your array has two Edits, and each one has its own from_length.
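To make this concrete, the sketch below walks the mapping from the question and prints each Edit separately. It assumes that a length missing from the JSON means 0, which is how the proto3 JSON mapping normally treats default-valued fields; whether vg's serializer always behaves this way is an assumption here, not something confirmed in this thread. Under that reading, the first Edit consumes 6 graph bases and emits no read bases (a deletion), and the second is a 1-base match.

```python
import json

# The mapping from the question, exactly as vg printed it.
mapping = json.loads(
    '{"edit": [{"from_length": 6}, {"from_length": 1, "to_length": 1}],'
    ' "position": {"node_id": "180334880"}, "rank": "2"}'
)

for i, edit in enumerate(mapping["edit"]):
    # Assumption: a field left out of the JSON has its proto3 default of 0.
    from_len = edit.get("from_length", 0)
    to_len = edit.get("to_length", 0)
    print(f"edit {i}: from_length={from_len}, to_length={to_len}")

# Total graph bases covered and read bases produced by this mapping.
graph_bases = sum(e.get("from_length", 0) for e in mapping["edit"])
read_bases = sum(e.get("to_length", 0) for e in mapping["edit"])
print(graph_bases, read_bases)
```

Summing from_length across a mapping's Edits gives the number of graph bases the mapping covers on that node, and summing to_length gives the number of read bases it accounts for.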
So what do from_length and to_length mean? I don't fully understand.
It's phrased as if you are turning the graph sequence into the read sequence: you take a stretch of graph (reference) sequence of length from_length and replace it with a stretch of read sequence of length to_length. In the case of a match, you replace it with the same sequence again.
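A minimal Python sketch of that rewriting, assuming the same conventions as above: missing lengths are 0, and an Edit without a sequence field copies the graph bases unchanged (a match), while one with a sequence supplies the replacement read bases. The function names, the node sequence, and the edit list are made up for illustration; the point is how each combination of from_length and to_length plays out.

```python
def classify(edit: dict) -> str:
    """Name the kind of Edit from its lengths (missing fields assumed to be 0)."""
    from_len = edit.get("from_length", 0)
    to_len = edit.get("to_length", 0)
    if from_len == to_len and not edit.get("sequence"):
        return "match"
    if from_len == to_len:
        return "substitution"
    if from_len == 0:
        return "insertion"
    if to_len == 0:
        return "deletion"
    return "complex"  # both lengths nonzero and different


def apply_edits(node_seq: str, offset: int, edits: list) -> str:
    """Turn graph sequence into read sequence, one Edit at a time.

    Each Edit consumes from_length bases of node_seq (starting at offset)
    and emits to_length bases of read sequence.
    """
    pos = offset
    read_parts = []
    for edit in edits:
        from_len = edit.get("from_length", 0)
        to_len = edit.get("to_length", 0)
        if to_len == 0:
            pass  # deletion: graph bases consumed, nothing emitted
        elif edit.get("sequence"):
            read_parts.append(edit["sequence"])  # substitution or insertion
        elif from_len == to_len:
            read_parts.append(node_seq[pos:pos + from_len])  # match: copy graph
        else:
            raise ValueError(f"length change without a sequence: {edit}")
        pos += from_len
    return "".join(read_parts)


node_seq = "ACGTACGTAC"  # hypothetical node sequence
edits = [
    {"from_length": 3, "to_length": 3},                    # match: GTA
    {"from_length": 1, "to_length": 1, "sequence": "T"},   # C -> T
    {"from_length": 0, "to_length": 2, "sequence": "GG"},  # insert GG
    {"from_length": 2},                                    # delete GT
    {"from_length": 1, "to_length": 1},                    # match: A
]
print([classify(e) for e in edits])
print(apply_edits(node_seq, 2, edits))
```

Starting at offset 2, this reads GTA from the graph, substitutes T for C, inserts GG, skips GT, and copies A, giving the read GTATGGA.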