DracoPy icon indicating copy to clipboard operation
DracoPy copied to clipboard

Support metadata parsing

Open romanov6416 opened this issue 4 years ago • 37 comments

romanov6416 avatar Oct 20 '21 21:10 romanov6416

Hi Romanov!

Thanks for this excellent contribution! I'm not as familiar with the draco standard as the last maintainer, so I'm learning as I go. Would you mind adding some examples and tests to help me review this? Excited to get this integrated!

One question I have: does this change affect the bitstream if no metadata is provided? We use DracoPy with Neuroglancer which has its own custom Draco decoder, so I want to be careful about any backwards incompatible byte streams.

I'll add some more comments per line as I see them.

william-silversmith avatar Oct 22 '21 20:10 william-silversmith

Hi William!

Thank you for the response! I have been hurry enough to make this PR. Currently I found a better idea how to provide metadata support. Will push it near time.

I suppose this change will not affect existed functionality, but yes I have to provide some tests to make sure that it works. I will let you know as soon as I add some tests and make the code more readable.

Thank you again for the great library!

romanov6416 avatar Oct 22 '21 20:10 romanov6416

Okay, I'll wait for your updated submission then. Thanks for the effort you're putting into this!

william-silversmith avatar Oct 22 '21 20:10 william-silversmith

Sorry for the long delay. I was deeping in draco to know more about geometry serialization features.

Currently I added support not only support metadata with entires but geometry attributes (that stores additional data for points) too. Also some tests were added to make it clear how to deal with new code

Unfortunately one test for point cloud doesn't work as data for geometry attributes is not serialized. I didn't investigate this issue because I don't need in point cloud interfaces. Currently the test is disabled.

It would be nice if you reviewed my changes. Please feel free to ask any questions.

romanov6416 avatar Nov 11 '21 14:11 romanov6416

Hi Romanov,

Thanks for pushing this contribution forward! I've started to review the PR. I've had some trouble compiling it on MacOS (a successful PR needs to compile on Mac, Windows, and Ubuntu). I can help you with checking that.

The first problem is:

./src/DracoPy.h:131:44: error: cannot initialize a parameter of type 'void *' with an rvalue of type 'const std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::value_type *'
      (aka 'const char *')
              attribute->GetMappedValue(v, value.data());

I had some issues fixing this with C++ casting, static_cast<void*> worked once then didn't. reinterpret_cast<void*> didn't work. A C-style cast (void*) did work.

The second problem is the use of C++17 which indirectly generates compile errors. Switch the c++11 compiler flag to c++17 in setup.py and I'm able to successfully compile.

warning: decomposition declarations are a C++17 extension [-Wc++17-extensions]

william-silversmith avatar Nov 11 '21 22:11 william-silversmith

This is kind of really basic, so basic it's kind of dumb, but can you explain a bit more about what kind of uses metadata has per object? Would it be like noting that a given model is a building or a person? Is the idea to allow the storage of a JSON-like data structure inside the draco bitstream? I see there's global vs geometric metadata. Can there be more than one model in the bitstream? I didn't see anything about that skimming the spec, but I could have missed it.

If so, is it possible to simplify the interface? It looks like in the tests that the user has to know the exact form of the internal draco data structures. Would it be possible to accept a simple list or dictionary of key-values?

Thanks so much for this excellent contribution!

william-silversmith avatar Nov 11 '21 23:11 william-silversmith

Hi Romanov,

Thanks for pushing this contribution forward! I've started to review the PR. I've had some trouble compiling it on MacOS (a successful PR needs to compile on Mac, Windows, and Ubuntu). I can help you with checking that.

The first problem is:

./src/DracoPy.h:131:44: error: cannot initialize a parameter of type 'void *' with an rvalue of type 'const std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::value_type *'
      (aka 'const char *')
              attribute->GetMappedValue(v, value.data());

I had some issues fixing this with C++ casting, static_cast<void*> worked once then didn't. reinterpret_cast<void*> didn't work. A C-style cast (void*) did work.

The second problem is the use of C++17 which indirectly generates compile errors. Switch the c++11 compiler flag to c++17 in setup.py and I'm able to successfully compile.

warning: decomposition declarations are a C++17 extension [-Wc++17-extensions]

Hope all c++11 issues fixed, please let me know if some of them I missed.

romanov6416 avatar Nov 12 '21 10:11 romanov6416

This is kind of really basic, so basic it's kind of dumb, but can you explain a bit more about what kind of uses metadata has per object? Would it be like noting that a given model is a building or a person? Is the idea to allow the storage of a JSON-like data structure inside the draco bitstream? I see there's global vs geometric metadata. Can there be more than one model in the bitstream? I didn't see anything about that skimming the spec, but I could have missed it.

If so, is it possible to simplify the interface? It looks like in the tests that the user has to know the exact form of the internal draco data structures. Would it be possible to accept a simple list or dictionary of key-values?

Thanks so much for this excellent contribution!

Hope I can explain my view.

Abstract

So far I understand correctly in Draco we have one global GeometryMetadata for each mesh or point cloud. This GeometryMetadata consists of two parts: Metadata and list of GeometryAttribute.

The Metadata contains "entries" (simple key-value storage) and "submetadatas" (list of Metadata). As a result I can say that each Metadata is represented as a tree, where each node has its own data ("entries").

The GeometryAttribute contains its own Metadata and additional data for any point of point cloud or mesh (let me call it "MappedData"). Actually we can store any labeling of mesh per point in "MappedData", i.e. color for each point.

So we have the following structure:

  • GeometryMetadata
    • List of Metadatas
      • Metadata 1
      • Metadata 2
        • Metadata 21
        • Metadata 22 ....
      • Metadata 3
        ....
    • List of GeometryAttributes
      • Geometry Attribute 1
      • Geometry Attribute 2
        • Metadata A2
          • Metadata A21
          • Metadata A22 ...
      • Geometry Attribute 3

More detailes can be found in sources of Draco.

Tree structure of Metadata

So far Metadata has a tree structure I couldn't forward it from c++ to python directly. Tree requires recursive definition that is not supported in Cython, c++ and python simultineously (Cython couldn't compile that code). So I tried to solve it with different approaches. My first approach was to save tree structure but seralize in c++ (in bytes) and deserialize (from bytes) in python. And then vice versa. But it looked terribly, moreover it took time for serialization/deserialization. The second approach (current) is convert tree structure in a list (vector) and put references between metadatas (I named it "metadata_id").

JSON like structure

I don't think I understand your questions about JSON like structure. Maybe you surprised that I used a lot of dicts in tests, don't you? Please provide more details to help me to reply=)

Interface simplification

Yes, I agree that current implementation is not so easy to understand. Actually it requires client reads Draco sources and understand metadata structure before starts using of DracoPy. I had idea to make single JSON like argument to provide custom metadata to encode/decode. But this field can not be directly converted from c++ to Cython code and vice versa because of tree structure.

Finally, sorry for my english, I hope =) And I hope my comment will help you review my code=)

romanov6416 avatar Nov 12 '21 16:11 romanov6416

Thank you for the detailed reply! It's very helpful. I have an even more basic question as someone that isn't a 3D modeler: What are some typical uses of the metadata and geometry attribute fields? My lab only uses the topological compression to represent meshes of neurons. We haven't really branched out and looked at the other Draco abilities.

william-silversmith avatar Nov 12 '21 18:11 william-silversmith

Thank you for the detailed reply! It's very helpful. I have an even more basic question as someone that isn't a 3D modeler: What are some typical uses of the metadata and geometry attribute fields? My lab only uses the topological compression to represent meshes of neurons. We haven't really branched out and looked at the other Draco abilities.

You may image that you have some mesh, in example human jaw with teeth and gingiva. And you want to save this mesh with Draco but don't want to loose the information which point is a tooth and which one is a gingiva. So you can use GeometryAttribute in GeometryMetadata that help you discover is it a gingiva or tooth point. Example of the GeometryAttribute:

{
# label can be '0' (gingiva) or '1' (tooth)
# <point index>: <label>
  0: 0,
  1: 0,
  2: 0,
  3: 1,
  ....
  987: 0
}

Also one more technical case. Maybe you noticed after decoding buffer of mesh Draco creates duplicated points. I.e. in tests I created tetrahedron with only 4 points, but after Draco decodes buffer it returns 7 points (4 original + 3 duplicated points)! Of course you can use deduplicating functions that Draco provides but they only remove points with the same coordinates. Unfortunately in my case mesh may have original points with the same coordinates so Draco deduplication does not work for me. But GeometryAttribute really helps in my case. I add custom GeometryAttribute called "DuplicateAttribute". So that original points with the same coordinates have different "DuplicateAttribute" values. And duplicated ones have the same "DuplicateAttribute" values.

romanov6416 avatar Nov 12 '21 19:11 romanov6416

Thanks! This helps me a lot.

It looks like there's another c++17 issue on MacOS:

In file included from ./src/DracoPy.cpp:653:
./src/DracoPy.h:206:38: error: no member named 'make_unique' in namespace 'std'
            auto sub_metadata = std::make_unique<draco::Metadata>();
                                ~~~~~^

Still working on a more complete review. I'll add some preliminary thoughts in a moment. Would it be possible to add some documentation for how to use the metadata/attribute features to the README.md? I think it would help people use this system much more readily.

william-silversmith avatar Nov 12 '21 21:11 william-silversmith

Nice comments, will fix them on Monday! Thank you for review!

romanov6416 avatar Nov 12 '21 22:11 romanov6416

Hi William, I have noticed that decode_buffer returns "quantization_bits", "quantization_range" and "quantization_origin" (that you pass in encode_mesh as input args) are in draco::GeometryMetadata after buffer decoding. So I suggest changing interface of encoding (encode_mesh) for a little bit by removing input arguments "quantization_bits", "quantization_range" and "quantization_origin" because you can pass it with GeometryMetadataObject. The cons of this little refactoring is interface change.

What do you think?

if (metadata) {
      metadata->GetEntryInt("quantization_bits", &(meshObject.quantization_bits));
      if (metadata->GetEntryDouble("quantization_range", &(meshObject.quantization_range)) &&
          metadata->GetEntryDoubleArray("quantization_origin", &(meshObject.quantization_origin))) {
          meshObject.encoding_options_set = true;
      }

romanov6416 avatar Nov 15 '21 18:11 romanov6416

And one more optimization: it seems we can leave only one encode_buffer. If input argument faces is None then encode_point_cloud will be invoked else encode_mesh will be invoked. Actually both encode_point_cloud_to_buffer and encode_mesh have 90% the copied paste code=)

romanov6416 avatar Nov 15 '21 19:11 romanov6416

fixed review comments. It would be nice you check new revision. Thanks for help!

romanov6416 avatar Nov 15 '21 19:11 romanov6416

So I suggest changing interface of encoding (encode_mesh) for a little bit by removing input arguments "quantization_bits", "quantization_range" and "quantization_origin" because you can pass it with GeometryMetadataObject.

So long as this only affects the C++ code, that's fine with me. We are extensively using those options at the Python level throughout our pipelines so I think change there would be not-so-great.

And one more optimization: it seems we can leave only one encode_buffer. If input argument faces is None then encode_point_cloud will be invoked else encode_mesh will be invoked. Actually both encode_point_cloud_to_buffer and encode_mesh have 90% the copied paste code=)

This seems like a good idea to me!

I'll continue reviewing the code. :D

william-silversmith avatar Nov 16 '21 01:11 william-silversmith

So I suggest changing interface of encoding (encode_mesh) for a little bit by removing input arguments "quantization_bits", "quantization_range" and "quantization_origin" because you can pass it with GeometryMetadataObject.

So long as this only affects the C++ code, that's fine with me. We are extensively using those options at the Python level throughout our pipelines so I think change there would be not-so-great.

And one more optimization: it seems we can leave only one encode_buffer. If input argument faces is None then encode_point_cloud will be invoked else encode_mesh will be invoked. Actually both encode_point_cloud_to_buffer and encode_mesh have 90% the copied paste code=)

This seems like a good idea to me!

I'll continue reviewing the code. :D

Hi William,

I have add single interface for point cloud and mesh encoding. Also I have added wrapper for draco::Encoder that is used to setup quantization settings. So now you can just set up encoder right before encoding. The legacy functions encode_mesh_to_buffer and encode_point_cloud_to_buffer should work as before (the tests pass).

Only one thing I would check additionally. It seems like I have to test if encoding options are applied correctly. I don't see any tests that checks it, so how did you check it earlier? Could you please help me to understand the expected behavior please?

romanov6416 avatar Nov 17 '21 12:11 romanov6416

Sorry for the wait, I'll start running these changes through their paces. :D

william-silversmith avatar Nov 23 '21 00:11 william-silversmith

Could you please help me to understand the expected behavior please?

I kind of doubt there were tests before... For my purposes, we're working with https://github.com/google/neuroglancer which has its own stripped down Draco encoder compiled to Web Assembly so it can decode draco meshes within a web browser. That decoder expects certain limited options. For example, the number of vertex quantization bits can only be 10 or 16. This is too much information, but you can view it here: https://github.com/google/neuroglancer/blob/056a3548abffc3c76c93c7a906f1603ce02b5fa3/src/neuroglancer/datasource/precomputed/meshes.md#multi-resolution-mesh-format

You can also see me try to use these options here: https://github.com/seung-lab/igneous/pull/112/files#diff-15fd2ba7077a3ef6422a3b10c4a3c1c662aa83ff0450ba7d2c8e2f51fbff1e72R105-R112

I'm not sure if I helped answer your question though. Let me know if I can be of more assistance!

william-silversmith avatar Nov 23 '21 00:11 william-silversmith

Could you please help me to understand the expected behavior please?

I kind of doubt there were tests before... For my purposes, we're working with https://github.com/google/neuroglancer which has its own stripped down Draco encoder compiled to Web Assembly so it can decode draco meshes within a web browser. That decoder expects certain limited options. For example, the number of vertex quantization bits can only be 10 or 16. This is too much information, but you can view it here: https://github.com/google/neuroglancer/blob/056a3548abffc3c76c93c7a906f1603ce02b5fa3/src/neuroglancer/datasource/precomputed/meshes.md#multi-resolution-mesh-format

You can also see me try to use these options here: https://github.com/seung-lab/igneous/pull/112/files#diff-15fd2ba7077a3ef6422a3b10c4a3c1c662aa83ff0450ba7d2c8e2f51fbff1e72R105-R112

I'm not sure if I helped answer your question though. Let me know if I can be of more assistance!

Looks like it is not so easy to test my changes for neuroglancer. I hope my changes will not affect your PR.

Sorry for the wait, I'll start running these changes through their paces. :D

Thank you for the efforts and your time! Do you have any updates/suggestions relating to the PR?

romanov6416 avatar Nov 25 '21 20:11 romanov6416

The latest version compiles on MacOS x86_64 👍 I also visually checked that it reproduces meshes from the fly hemibrain. I'll have to check compression too.

william-silversmith avatar Nov 26 '21 22:11 william-silversmith

Hi William,

Is it possible to activate your repo workflows for my PR?

I want to use Linux wheel, but it seems github workflow was not cloned in my account.

romanov6416 avatar Nov 30 '21 09:11 romanov6416

And one more question. Can you please estimate when you review this PR? It will help me to correct set up dependencies in my projects.

romanov6416 avatar Nov 30 '21 13:11 romanov6416

Sorry for the wait! I have to switch laptops to evaluate the code b/c compiling on my M1 Macbook is too difficult. Finally doing the last bit of validation. I'll highlight some issues I'm finding along the way.

  • metadatas: metadata is already plural in english (metadatum would be singular, though almost never seen in real usage)

This code results in a segfault for me.

import os.path
import DracoPy

metadata = {
  "entries": {b"name": b"meta1"},
  "sub_metadata_ids": {b"custom_metadata_name": 1},
}

with open(os.path.join("testdata_files", "bunny.drc"), "rb") as draco_file:
  file_content = draco_file.read()
  mesh_object = DracoPy.decode_buffer_to_mesh(file_content)

binary = DracoPy.encode_to_buffer(
  mesh_object.points, mesh_object.faces,
  metadatas=[ metadata ],
  # geometry_metadata=geometry_metadata,
)
print(binary)

william-silversmith avatar Nov 30 '21 22:11 william-silversmith

I tried:

metadata = {
  "entries": [ {b"name": b"meta1"} ],
  "sub_metadata_ids": {b"custom_metadata_name": 1},
}

Output:

(dracopy) (base) wms:~/DracoPy$ python test.py
Traceback (most recent call last):
  File "src\DracoPy.pyx", line 291, in DracoPy.encode_to_buffer
  File "stringsource", line 48, in vector.from_py.__pyx_convert_vector_from_py_struct__DracoFunctions_3a__3a_MetadataObject
  File "stringsource", line 20, in FromPyStructUtility.__pyx_convert__from_py_struct__DracoFunctions_3a__3a_MetadataObject
  File "stringsource", line 175, in map.from_py.__pyx_convert_unordered_map_from_py_std_3a__3a_string__and_std_3a__3a_string
TypeError: Expected dict, got list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 13, in <module>
    binary = DracoPy.encode_to_buffer(
  File "src\DracoPy.pyx", line 309, in DracoPy.encode_to_buffer
ValueError: Input invalid

william-silversmith avatar Nov 30 '21 23:11 william-silversmith

Thank you for the response, I will check it tomorrow and try to fix comments.

romanov6416 avatar Nov 30 '21 23:11 romanov6416

Sorry for the wait! I have to switch laptops to evaluate the code b/c compiling on my M1 Macbook is too difficult. Finally doing the last bit of validation. I'll highlight some issues I'm finding along the way.

* metadatas: metadata is already plural in english (metadatum would be singular, though almost never seen in real usage)

This code results in a segfault for me.

import os.path
import DracoPy

metadata = {
  "entries": {b"name": b"meta1"},
  "sub_metadata_ids": {b"custom_metadata_name": 1},
}

with open(os.path.join("testdata_files", "bunny.drc"), "rb") as draco_file:
  file_content = draco_file.read()
  mesh_object = DracoPy.decode_buffer_to_mesh(file_content)

binary = DracoPy.encode_to_buffer(
  mesh_object.points, mesh_object.faces,
  metadatas=[ metadata ],
  # geometry_metadata=geometry_metadata,
)
print(binary)

Sorry I didn't catch the example right away. We can't use empty geometry metadata and not empty metadatas simultineously. In your example you have to provide geometry metadata with minimal body

 {
    "metadata_id": 0,
    "generic_attributes": [],
}

Looks like I have to add additional comments to this function

romanov6416 avatar Nov 30 '21 23:11 romanov6416

We can't use empty geometry metadata and not empty metadatas simultineously.

I think I still don't understand the point of metadata. Is it only to provide a name for the geometry metadata? Could it also be used to tag objects with a name (which wouldn't be per vertex)?

william-silversmith avatar Nov 30 '21 23:11 william-silversmith

I'm sorry I didn't catch what you meant under "to tag objects with a name"

Let me remind metadata structure example that I sent before

* GeometryMetadata
  
  * List of Metadatas
    
    * Metadata 1
    * Metadata 2
      
      * Metadata 21
      * Metadata 22
        ....
    * Metadata 3
      ....
  * List of GeometryAttributes
    
    * Geometry Attribute 1
    * Geometry Attribute 2
      
      * Metadata A2
        
        * Metadata A21
        * Metadata A22
          ...
    * Geometry Attribute 3

Here you can see root object called GeometryMetadata. In draco GeometryMetadata is a subclass of Metadata so if there is metadata in draco mesh then we have to specify geometry_metadata (here we describe GeometryMetadata class fields) and metadatas (here we describe Metadata base class fields as a first element of list) in encode_mesh function.

As any Metadata can have children Metadata so actually Metadata is a tree structure object. Unfortunately, Cython doesn't allow to compile tree structure classes. So I decided to describe Metadata specific fields in a separate list, currently called metadatas, but not in GeometryMetadata class.

romanov6416 avatar Nov 30 '21 23:11 romanov6416

So I watched this video and I think I'm starting to get a better idea of the context in which this system is supposed to work. https://www.youtube.com/watch?v=R_cQogD0KJ8

Sorry this is taking me so long. The Draco structure is very generic and I am only slowly understanding what the point of each part is.

I tried compiling the pyx with cython -3 --cplus src/DracoPy.pyx and got the following errors. I think they can be fixed by adding a b in front.

Error compiling Cython file:
------------------------------------------------------------
...
    encoder.SetSpeedOptions(speed, speed)
    to_create_quantization_metadata = not metadatas.empty()
    cdef MetadataObject *metadata = NULL;
    if to_create_quantization_metadata:
        metadata = &metadatas[geometry_metadata.metadata_id]
        metadata.entries["quantization_bits"] = struct.pack(
                        ^
------------------------------------------------------------

src/DracoPy.pyx:144:25: Unicode literals do not support coercion to C types other than Py_UNICODE/Py_UCS4 (for characters) or Py_UNICODE* (for strings).

Error compiling Cython file:
------------------------------------------------------------
...
    else:
        encoder.SetAttributeExplicitQuantization(
            GeometryAttributeType.POSITION, quantization_bits,
            quantization_range, quantization_origin)
        if to_create_quantization_metadata:
            metadata.entries["quantization_range"] = struct.pack(
                            ^
------------------------------------------------------------

src/DracoPy.pyx:154:29: Unicode literals do not support coercion to C types other than Py_UNICODE/Py_UCS4 (for characters) or Py_UNICODE* (for strings).

Error compiling Cython file:
------------------------------------------------------------
...
            GeometryAttributeType.POSITION, quantization_bits,
            quantization_range, quantization_origin)
        if to_create_quantization_metadata:
            metadata.entries["quantization_range"] = struct.pack(
                "=d", quantization_range)
            metadata.entries["quantization_origin"] = struct.pack(
                            ^
------------------------------------------------------------

src/DracoPy.pyx:156:29: Unicode literals do not support coercion to C types other than Py_UNICODE/Py_UCS4 (for characters) or Py_UNICODE* (for strings).

I think improved my little example test, but I'm still getting a segfault. Could you help me modify it to make sense? Ideally, the code not segfault even if the data is screwed up though.

import os.path
import DracoPy

metadata = [
  {
    "entries": {
        b"name": b"blueness",
        b"whodat": b"1234",
    },
    "sub_metadata_ids": {},
  },
  {
    "entries": {
        b"name": b"redness",
        b"somethingelse": b"1234",
    },
    "sub_metadata_ids": {},
  },
]

geo = {
  "metadata_id": 0,
  "generic_attributes": [{
    "data": { i: int.to_bytes(i, byteorder="little", length=2, signed=True) for i in range(4) },
    "datatype": DracoPy.DataType.DT_INT32,
    "dimension": 1,
    "metadata_id": 1,
  }],
}

with open(os.path.join("testdata_files", "bunny.drc"), "rb") as draco_file:
  file_content = draco_file.read()
  mesh_object = DracoPy.decode_buffer_to_mesh(file_content)

binary = DracoPy.encode_to_buffer(
  mesh_object.points, mesh_object.faces,
  metadatas=metadata,
  geometry_metadata=geo,
)
print(binary)

Thanks again for your patience!

william-silversmith avatar Dec 02 '21 05:12 william-silversmith