json format

Open sdws1983 opened this issue 2 years ago • 3 comments

Hi,

I am using vg view -aM filtered.gam to view the alignment output, and I am trying to interpret the JSON format. I understand some basic concepts, such as offset being the number of bases offset and sequence being the mutated sequence, but I don't quite understand the meaning of rank. In some alignments, the same rank appears multiple times:

image

In addition, there are some node information, such as:

{"edit": [{"from_length": 6}, {"from_length": 1, "to_length": 1}], "position": {"node_id": "180334880"}, "rank": "2" }

Why does from_length appear twice?

Is there any documentation that clarifies the JSON format?

Thanks

Yumin

sdws1983 avatar Jul 03 '23 06:07 sdws1983

Please see here for documentation of the various formats

https://github.com/vgteam/vg/wiki/File-Formats

In particular, that page links to the Protobuf definitions for gam alignments

https://github.com/vgteam/libvgio/blob/eb1fe76878aff8f26f0a2f38a1c133ec2f353e57/deps/vg.proto#L109-L151

For the duplicate ranks: that looks like a bug. But I don't think any vg code uses ranks in alignment paths for anything.

For the two from_lengths: that's because your array has two Edits, and each one has a from_length.

glennhickey avatar Jul 03 '23 16:07 glennhickey

So, what do from_length and to_length mean? I do not fully understand them.

sdws1983 avatar Jul 04 '23 01:07 sdws1983

It's phrased as if you are modifying the graph sequence into the read sequence: you take a graph (reference) sequence of length from_length and replace it with a read sequence of length to_length. In the case of a match, you replace it with the same sequence again.
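
To make that concrete, here is a minimal Python sketch that labels each edit in the mapping quoted in the question by comparing from_length with to_length. The helper name classify_edit is hypothetical and not part of vg. The sketch also assumes that a field missing from the JSON is zero or empty, which is how proto3 JSON output normally behaves. Under that assumption, {"from_length": 6} would read as a deletion of six graph bases.

```python
import json

def classify_edit(edit):
    """Label one edit from `vg view -a` JSON output (hypothetical helper)."""
    # Assumption: fields absent from the JSON are zero or empty.
    from_length = int(edit.get("from_length", 0))  # graph bases consumed
    to_length = int(edit.get("to_length", 0))      # read bases produced
    sequence = edit.get("sequence", "")            # read bases where they differ from the graph

    if from_length == to_length:
        # Same length on both sides: a match unless replacement bases are given.
        return "substitution" if sequence else "match"
    if to_length == 0:
        return "deletion"    # graph bases with no read bases
    if from_length == 0:
        return "insertion"   # read bases with no graph bases (soft clips look the same)
    return "complex"         # unequal, non-zero lengths

# Mapping copied from the question above.
mapping = json.loads(
    '{"edit": [{"from_length": 6}, {"from_length": 1, "to_length": 1}],'
    ' "position": {"node_id": "180334880"}, "rank": "2"}'
)

for edit in mapping["edit"]:
    print(classify_edit(edit), edit)
```

The labels are only as reliable as the default-value assumption, so it is worth checking them against the comments on the Edit message in the vg.proto file linked earlier in the thread.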

jeizenga avatar Aug 02 '23 18:08 jeizenga