jmespath.cpp

Performance issues with large amounts of dynamic data

Open sotex opened this issue 5 years ago • 1 comments

This is really a great project. It is very convenient to use and the performance is very good. But I'm having some problems with it and hope to get help here.

Question 1

I have a large number of GeoJSON objects in my program. To run JMESPath queries over them, I have to combine them into one large array, similar to the following code:

   // These GeoJSON objects live in a large map that is dynamically added to and deleted from:
   // std::map<std::string, jp::Json> mydata;

   // When I need to perform a JMESPath operation
   std::vector<jp::Json> vec;
   vec.reserve(mydata.size());
   for (const auto& kvpair : mydata) {
      vec.push_back(kvpair.second);  // copies every object on each query
   }
   jp::Json data = {
      {"data", std::move(vec)}
   };
   // Simple example, not fixed
   jp::Expression expr = "avg(data[?properties.area < `100`].properties.area)";
   auto result = jp::search(expr, data);

I could change mydata to store everything directly in a jp::Json array, which would avoid the conversion on every query. However, I wonder if there is a better way?

Question 2

Because I have a large amount of data, I tested the filtering operation on 10,000 objects; it takes about 0.34 s. But I have more than 200,000 objects. Test environment:

OS: Linux x-mini 5.3.0-45-generic
CPU: Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz x4
MEM: 8G ddr3 1333
Compiler and compilation options: g++ 10.0 with -O2

I can use multi-threading to filter in parallel, but that produces multiple partial results, which then require secondary processing. I want to know if there is a good way to do it without secondary processing? Thanks.
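For an aggregate such as avg, the secondary processing can be kept very small if each worker returns a partial sum and count instead of a filtered list. The following is only a minimal sketch of that idea and does not use jmespath.cpp at all: plain doubles stand in for properties.area, and the names Partial, reduce_chunk and parallel_avg_below are hypothetical helpers, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <optional>
#include <vector>

// Partial result of one chunk: sum and count are enough to merge averages exactly.
struct Partial {
    double sum = 0.0;
    std::size_t count = 0;
};

// Filters and reduces the index range [first, last): keeps areas below `limit`.
Partial reduce_chunk(const std::vector<double>& areas,
                     std::size_t first, std::size_t last, double limit) {
    Partial p;
    for (std::size_t i = first; i < last; ++i) {
        if (areas[i] < limit) {
            p.sum += areas[i];
            ++p.count;
        }
    }
    return p;
}

// Splits the input into at most `workers` chunks, reduces each chunk on its
// own task, then merges the partial results. Returns std::nullopt when no
// element passes the filter.
std::optional<double> parallel_avg_below(const std::vector<double>& areas,
                                         double limit, std::size_t workers) {
    assert(workers > 0);
    const std::size_t chunk = (areas.size() + workers - 1) / workers;
    std::vector<std::future<Partial>> tasks;
    for (std::size_t first = 0; first < areas.size(); first += chunk) {
        const std::size_t last = std::min(first + chunk, areas.size());
        tasks.push_back(std::async(std::launch::async, reduce_chunk,
                                   std::cref(areas), first, last, limit));
    }
    Partial total;
    for (auto& task : tasks) {
        const Partial p = task.get();  // the merge step: two additions per chunk
        total.sum += p.sum;
        total.count += p.count;
    }
    if (total.count == 0) return std::nullopt;
    return total.sum / static_cast<double>(total.count);
}
```

The merge only works this simply for aggregates that can be combined from partial results (sum, count, min, max, and avg via sum and count). Queries that return filtered lists would still need their per-chunk results concatenated.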

sotex avatar Apr 07 '20 16:04 sotex

The problem I raised above has been solved by introducing shared_hash_map to define JSONType.

Details of the changes are here: "Add shared map/vector implementation for performance optimization".

It's not an ideal solution: aspects such as thread safety are not considered, but it works for now.
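To illustrate the general idea of sharing rather than copying, here is a minimal sketch that is not the implementation from the change mentioned above. It assumes a hypothetical Feature struct in place of a GeoJSON object and a hypothetical FeatureStore class; values are held through std::shared_ptr so that building the array for a query copies pointers only, and a mutex covers the thread-safety gap.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GeoJSON feature.
struct Feature {
    double area = 0.0;
};

// Hypothetical store: entries are immutable and shared, so a snapshot
// copies one pointer per entry instead of the feature data itself.
class FeatureStore {
public:
    using Ptr = std::shared_ptr<const Feature>;

    void put(const std::string& key, Feature f) {
        assert(!key.empty());  // this sketch expects non-empty keys
        Ptr ptr = std::make_shared<Feature>(std::move(f));
        std::lock_guard<std::mutex> lock(mutex_);
        items_[key] = std::move(ptr);
    }

    bool erase(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.erase(key) > 0;
    }

    // The returned vector stays valid even if entries are erased or
    // replaced afterwards, because it shares ownership of each feature.
    std::vector<Ptr> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<Ptr> out;
        out.reserve(items_.size());
        for (const auto& kv : items_) out.push_back(kv.second);
        return out;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, Ptr> items_;
};
```

Because stored features are never modified in place, readers can work on a snapshot without holding the lock while writers continue to add and delete entries.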

Thanks to @robertmrk for a great contribution; it's an excellent project worth learning from. I see that robertmrk's last activity was on May 15, 2019, with nothing since. I'm not sure whether he has moved to other platforms to continue his open-source work.

sotex avatar Apr 15 '20 06:04 sotex