vscode-dvc Interactive plots with bounding boxes

In computer vision, specifically object detection, it is common for a pipeline to output images with bounding boxes displaying the area of interest for specific objects. When the image(s) are relatively small or packed with multiple objects, it can be hard to view these images.

It would be nice to have some sort of interactive plots where the user can toggle on/off different objects based on labels.

This may require some dvc or dvc-render changes first, but just opening here because it would be beneficial to have this implemented within the VS-Code extension.

Related issues:

https://github.com/iterative/dvc/issues/10198

Oct 26 '23 19:10 BradyJ27

To clarify:

Are there multiple images created by the pipeline or do you have an original image that you want to compare to the output? Can you give an example of what is produced by the model?

Oct 26 '23 22:10 mattseddon

Usually it is multiple images. For example, we would print out the labels for the detections that we have made on the validation set.

The following example is from YOLOv8's default output. The DVCLive YOLO demo notebook is a good place to reproduce this.

YOLO actually outputs both the validation labels (the ground truths) and the predicted values, which could be useful for comparing and contrasting.

Oct 26 '23 23:10 BradyJ27

Let me clarify the question. In the example above, are there multiple images available for 000000000042.jpg? Do you have an image available for each combination of labels? I.e. one for each of:

baseline
dog
motorcycle
dog + motorcycle

I am not an expert in image manipulation but AFAIK removing these labelling boxes from an image is not a trivial task.

Have you seen this done elsewhere?

Oct 27 '23 01:10 mattseddon

So the image is just a copy of the training or validation image with bounding boxes added using some library (usually matplotlib). The bounding boxes are stored in a formatted file (xml, csv, json, or some custom format), normally with one line per box like "label,x1,y1,x2,y2" (e.g. "person,..." \n "dog,...").

So the approach would not be to manipulate the image with the boxes already on it, but rather set the image as the original image from the validation set, then place the interactive bounding boxes over the original image.

In other words, we have 2 files:

image.png
image_labels.xml

And the above images are generated by ~~combining the two files~~ reading the labels and placing them on top of a copy of the original image, thus creating the third file which is the image with bounding boxes displayed. My suggestion is that we take this step and turn it into some interactive format within dvc.
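
To make the two-file idea concrete, here is a minimal sketch of reading a label file into box objects that stay separate from the image. It assumes the comma-separated "label,x1,y1,x2,y2" layout mentioned above with pixel coordinates; the `Box` type and `parseLabelLines` name are made up for illustration and are not part of dvc or the extension.

```typescript
// Hypothetical shape for one labelled box, in pixel coordinates.
interface Box {
  label: string
  x1: number
  y1: number
  x2: number
  y2: number
}

// Parse lines of the assumed form "label,x1,y1,x2,y2".
// Blank or malformed lines are skipped instead of throwing.
function parseLabelLines(text: string): Box[] {
  const boxes: Box[] = []
  for (const line of text.split(/\r?\n/)) {
    const parts = line.trim().split(',')
    if (parts.length !== 5) {
      continue
    }
    const [label, ...coords] = parts
    const [x1, y1, x2, y2] = coords.map(Number)
    if ([x1, y1, x2, y2].some(n => Number.isNaN(n))) {
      continue
    }
    boxes.push({ label: label.trim(), x1, y1, x2, y2 })
  }
  return boxes
}

// Usage: the original image is left untouched; only the boxes are derived.
const exampleBoxes = parseLabelLines('person,10,20,110,220\ndog,5,6,50,60')
```

Keeping the boxes as data rather than pixels is what would let a viewer show or hide them later.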

Oct 27 '23 01:10 BradyJ27

See https://docs.wandb.ai/guides/track/log/media#image-overlays for ideas on how others do this

Oct 27 '23 13:10 dberenbaum

One option for implementation would be a custom plot template, right?

Or is this something that's a little more in depth and actually a bigger feature?

Oct 30 '23 18:10 BradyJ27

> One option for implementation would be a custom plot template, right?

No, I do not believe that you could shoe-horn the required data/image into the current DVC plots engine.

> Or is this something that's a little more in depth and actually a bigger feature?

My opinion is that this is a larger feature given the current state of plots.

Oct 31 '23 02:10 mattseddon

> My opinion is that this is a larger feature given the current state of plots.

Ok, that makes sense. I'm sure some more discussion needs to be had regarding implementing something like this, but I would be happy to help contribute!

Oct 31 '23 16:10 BradyJ27

@BradyJ27 can you provide a concrete example of one of the XML files that you mentioned here? Is this the only format available?

Nov 02 '23 22:11 mattseddon

Looks like we might be able to get away without using a plotting library for this. One potential way would be to use https://github.com/lovell/sharp in the clients + generate SVG bounding boxes based on the definitions (XML or other files). Loading the original image with the previous package gives us the option to call image.overlayWith(svgElementBuffer, {top:0, left:0}).toBuffer() where the svgElementBuffer is an SVG full of <rect> elements (source).
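
As a rough sketch of the "SVG full of <rect> elements" part, the function below builds the overlay as a plain string. It assumes pixel-coordinate boxes and uses only string building, so the names (`PixelBox`, `boxesToSvg`) and the default stroke color are illustrative assumptions, not anything sharp or DVC provides.

```typescript
// Hypothetical box shape: corner coordinates in pixels.
interface PixelBox {
  label: string
  x1: number
  y1: number
  x2: number
  y2: number
}

// Escape characters that are not safe inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Build an SVG the same size as the image, with one unfilled <rect> per box.
function boxesToSvg(
  width: number,
  height: number,
  boxes: PixelBox[],
  stroke = 'red'
): string {
  const rects = boxes.map(
    ({ label, x1, y1, x2, y2 }) =>
      `<rect data-label="${escapeXml(label)}" x="${x1}" y="${y1}" ` +
      `width="${x2 - x1}" height="${y2 - y1}" ` +
      `fill="none" stroke="${stroke}" stroke-width="2"/>`
  )
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" ` +
    `viewBox="0 0 ${width} ${height}">${rects.join('')}</svg>`
  )
}
```

The resulting string could be wrapped in a buffer and handed to an image library as the overlay; the exact compositing call depends on the library and its version.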

Nov 02 '23 23:11 mattseddon

> @BradyJ27 can you provide a concrete example of one of the XML files that you mentioned here? Is this the only format available?

I can share an example of the default yolo labels. This is just a text file, but the idea is the same in txt, csv, xml, json, etc. It can technically be any type of file, depending on what architecture you are using, but the above are the most common.

000000000009.txt

Nov 03 '23 00:11 BradyJ27

How do you determine which class the provided data relates to?

This is the contents of the file (for anyone else reading the issue):

45 0.479492 0.688771 0.955609 0.5955
45 0.736516 0.247188 0.498875 0.476417
50 0.637063 0.732938 0.494125 0.510583
45 0.339438 0.418896 0.678875 0.7815
49 0.646836 0.132552 0.118047 0.0969375
49 0.773148 0.129802 0.0907344 0.0972292
49 0.668297 0.226906 0.131281 0.146896
49 0.642859 0.0792187 0.148063 0.148062
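
For anyone wanting to turn rows like these into drawable boxes, here is a small sketch. It assumes the usual YOLO text layout of `class x_center y_center width height`, with all four values normalized to the image size; the image dimensions passed in and the names `YoloPixelBox` and `parseYoloLabels` are placeholders for illustration.

```typescript
// Hypothetical result shape: class id plus corner coordinates in pixels.
interface YoloPixelBox {
  classId: number
  x1: number
  y1: number
  x2: number
  y2: number
}

// Convert normalized center/size rows into pixel corner coordinates.
// Assumes each non-empty line is "class x_center y_center width height".
function parseYoloLabels(
  text: string,
  imageWidth: number,
  imageHeight: number
): YoloPixelBox[] {
  return text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => {
      const [classId, cx, cy, w, h] = line.split(/\s+/).map(Number)
      return {
        classId,
        x1: (cx - w / 2) * imageWidth,
        y1: (cy - h / 2) * imageHeight,
        x2: (cx + w / 2) * imageWidth,
        y2: (cy + h / 2) * imageHeight
      }
    })
}

// Usage with the first two rows above and a made-up 640x480 image size.
const sampleBoxes = parseYoloLabels(
  '45 0.479492 0.688771 0.955609 0.5955\n45 0.736516 0.247188 0.498875 0.476417',
  640,
  480
)
```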

Nov 03 '23 01:11 mattseddon

@mattseddon the first number is a key into a dictionary containing the classes.

It's something like:

...
44: "dog",
45: "person",
46: "car",
...

This is found in a dataset configuration file (specifically for yolo), which is data.yaml.

There is often some configuration similar to this whether it be a dictionary in a training script, a data configuration file, or sometimes the labels are hard coded in the labels file.

I will say that the above is YOLO-specific; more often the file contains the actual label instead of a number that is looked up in a dictionary.
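
A small sketch of handling both cases (numeric ids looked up in a dictionary, or labels written out directly). The map contents just mirror the illustrative entries above, and `resolveLabel` is a hypothetical helper name.

```typescript
// Illustrative subset of a class dictionary, mirroring the example entries above.
const classNames: ReadonlyMap<number, string> = new Map([
  [44, 'dog'],
  [45, 'person'],
  [46, 'car']
])

// If the raw value is an integer id found in the dictionary, return its name.
// Otherwise treat the raw value as the label itself.
function resolveLabel(raw: string, names: ReadonlyMap<number, string>): string {
  const trimmed = raw.trim()
  const id = Number(trimmed)
  if (trimmed !== '' && Number.isInteger(id)) {
    return names.get(id) ?? trimmed
  }
  return trimmed
}
```

With this, a row starting with `45` and a row starting with `person` would both end up with the same display label.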

Nov 03 '23 01:11 BradyJ27

I was just coming here to revisit (was busy for the past month) this and create some issues in the data and render repos, but it looks like you guys have maybe taken another look. Should I go ahead and create some additional issues and start looking into this, or is this in progress already?

Dec 15 '23 23:12 BradyJ27

> I was just coming here to revisit (was busy for the past month) this and create some issues in the data and render repos, but it looks like you guys have maybe taken another look. Should I go ahead and create some additional issues and start looking into this, or is this in progress already?

@BradyJ27, feel free to do that, thanks. I've started to look into how Studio and VSCode are going to render these images but I'm currently not looking into dvc/dvc-render side of things.

Dec 18 '23 15:12 julieg18

While researching the UX, I took into account that while both Studio and VSCode use React for the frontend, Studio has a backend based in Python and VSCode has a backend based in NodeJS. So far, I've come up with two ideas on how the clients (VSCode/Studio) would handle this.

Ideas

1. Rely on the client backend to create images with the needed bounding boxes. The frontend would render these images. (See Matt's comment)
2. Send the box coordinates to the frontend and have the frontend render the bounding boxes onto an image using SVGs or HTML canvas (I believe W&B uses Canvas to create the bounding boxes)

Details

Rely on the client backend to create images with the needed bounding boxes. The frontend would render these images.

Pros

Both NodeJS and Python have multiple image manipulation libraries that we could use for creating images with bounding boxes. Matt has already mentioned sharp for NodeJS.

Cons

Studio and VSCode have different backends, so we would have to go about creating images in different ways. This would make keeping things consistent across products more difficult.

Send the box coordinates to the frontend and have the frontend render the bounding boxes onto an image using SVGs or HTML canvas (I believe W&B uses Canvas to create the bounding boxes)

Pros

Since both Studio and VSCode use React in the frontend, it will be easier to have consistent plots in both clients. React also has some libraries for Canvas (KonvaJS, FabricJS) and SVGs that would simplify the solution compared to using just vanilla APIs.

Cons

The solution for rendering the bounding boxes will probably be a bit more complicated than using the methods that backend libraries offer.

What do we think?

Jan 09 '24 17:01 julieg18

> It would be nice to have some sort of interactive plots where the user can toggle on/off different objects based on labels.

We will probably want some level of interactivity like this at some point, so I think it makes sense to go with option 2.
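
To illustrate why frontend rendering makes toggling cheap, here is a framework-free sketch of the state involved: a set of hidden labels and a filter over the boxes. The function names are hypothetical and nothing here is tied to React or to either client's actual code.

```typescript
// Return a new set with the label flipped: hidden becomes visible and vice versa.
// A new set is returned so the result can be stored as immutable UI state.
function toggleLabel(hidden: ReadonlySet<string>, label: string): Set<string> {
  const next = new Set(hidden)
  if (!next.delete(label)) {
    next.add(label)
  }
  return next
}

// Keep only the boxes whose label is not currently hidden.
function visibleBoxes<T extends { label: string }>(
  boxes: T[],
  hidden: ReadonlySet<string>
): T[] {
  return boxes.filter(box => !hidden.has(box.label))
}
```

Because the image itself never changes, flipping a label only changes which overlay elements get rendered.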

Jan 09 '24 18:01 dberenbaum

Started working on implementing this and, after trying HTML Canvas and SVGs, decided on using SVGs to render the plots since they are easier to create and will be more performant, especially when it comes to resizing the plots.

Design

Next, I started working on the UI design for the togglable boxes. Here is what I have so far (created in storybook):

[Screenshot 2024-01-17 at 10 20 10 AM]

[Screenshot 2024-01-17 at 9 58 01 AM]

Looking at Studio, either version could fit there as well:

Questions About Implementation

Do we want to toggle classes in all revision plots for a specific image path at once or have the toggles per single plot? I tried designs for both for now. There's also the option of toggling classes across all images in the webview at once.
What colors are we going to be using for the bounding boxes? I just chose red and blue for now but I'm assuming we want a pre-set of more muted colors?

What do we think? cc @shcheklein @iterative/vs-code

Jan 17 '24 16:01 julieg18

Looks cool, @julieg18!

> Do we want to toggle classes in all revision plots for a specific image path at once or have the toggles per single plot? I tried designs for both for now. There's also the option of toggling classes across all images in the webview at once.

My 2c: I think we should toggle all images per path at once, for now.

> What colors are we going to be using for the bounding boxes? I just chose red and blue for now but I'm assuming we want a pre-set of more muted colors?

Let's take a look at how YOLO generates colors/boxes and take it from there?
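
One simple scheme, sketched here as an assumption rather than a description of what YOLO actually does, is to index a fixed palette by class number so that the same class always gets the same color. The palette values below are placeholders.

```typescript
// Placeholder palette; the real colors are still an open question in this thread.
const PALETTE: readonly string[] = [
  '#e41a1c',
  '#377eb8',
  '#4daf4a',
  '#984ea3',
  '#ff7f00'
]

// Pick a color deterministically from the class index, wrapping around the palette.
// The double modulo keeps the result valid even for negative indices.
function colorForClass(
  classIndex: number,
  palette: readonly string[] = PALETTE
): string {
  const size = palette.length
  const index = ((Math.trunc(classIndex) % size) + size) % size
  return palette[index]
}
```

A scheme like this keeps a class's color stable across images and revisions, which matters more for comparison than the specific hues.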

Jan 17 '24 21:01 shcheklein

Is the HTML produced by the CLI (i.e. plots diff) out of scope for this?

Jan 17 '24 22:01 mattseddon

> Is the HTML produced by the CLI (i.e. plots diff) out of scope for this?

I don't think CLI support is a requirement unless it's helpful to consolidate the VS Code and Studio implementation (similar to images per step).

Jan 18 '24 16:01 dberenbaum

> > Is the HTML produced by the CLI (i.e. plots diff) out of scope for this?
>
> I don't think CLI support is a requirement unless it's helpful to consolidate the VS Code and Studio implementation

Are we referring to the DVC CLI being able to create these plots with bounding boxes?

If so, and if it is doable for the CLI to create the bounding box plot SVGs, that could help with consolidation since Studio and VS Code would only need to create logic for toggling boxes. Currently, both Studio and VSCode need to create the SVG elements from the image src and bounding box coordinates as well as the toggle logic.

Jan 18 '24 16:01 julieg18