
Ideas for Improvements and Optimizations

Open: wilwade opened this issue 3 years ago • 4 comments

This is an issue to keep track of ideas for improvements and optimizations that we might want to make, but getting to alpha is the first priority.

General

  • [ ] We should consider removing all the events and just do off-chain websocket event filter registrations.
  • [ ] Save block space by returning a more accurate weight when hitting errors #812

Schema Related

  • [ ] We should have a document that helps determine what options a schema should have. We currently have a TTL and a "batch" option. Without a guide to what we want, I can imagine this quickly turning into feature creep.
  • [ ] Should schema IDs be incremental, or a hash of the schema (or some variant of a hash)? A hash, however, means larger schema references, which is bad since there are a lot of them.
  • [x] What if there were a way to do schema discovery? Say I know a certain public key; I could then look up the schemas it published by "name". #1693
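To make the size trade-off behind the incremental-vs-hash question concrete, here is a minimal sketch. It assumes a 2-byte incremental ID and a 32-byte hash ID; both type names are hypothetical, and `DefaultHasher` is only a non-cryptographic stand-in used to show that a content-derived ID is stable for identical schema text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::mem::size_of;

// Hypothetical ID types for illustration only.
// Incremental: the chain assigns the next integer when a schema is registered.
type IncrementalSchemaId = u16;
// Hash-based: the ID is derived from the schema content. A real chain would
// likely use a 32-byte cryptographic hash, so 32 bytes is assumed here.
type HashSchemaId = [u8; 32];

/// Bytes spent on schema references alone across `message_count` messages.
fn reference_overhead(id_size: usize, message_count: usize) -> usize {
    id_size * message_count
}

/// Non-cryptographic stand-in for a content hash: identical schema text
/// always yields the identical ID, which is what makes hash IDs appealing.
fn content_id(schema_model: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    schema_model.hash(&mut hasher);
    hasher.finish()
}

/// Reference overhead for both schemes, as (incremental, hashed) bytes.
fn compare_overhead(message_count: usize) -> (usize, usize) {
    (
        reference_overhead(size_of::<IncrementalSchemaId>(), message_count),
        reference_overhead(size_of::<HashSchemaId>(), message_count),
    )
}
```

Under these assumptions, `compare_overhead(1_000_000)` gives 2,000,000 bytes of references for incremental IDs versus 32,000,000 bytes for hashed IDs, a 16x difference before any other encoding.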

Message Storage Related

  • [ ] For some messages we only want to store the most recent version. Could (should) a schema be set up to trigger a TTL for prior messages based on a key? (Example: DSNP GraphChange, with the key being the target of the graph change.)
  • [ ] Should we allow a user to target a specific message for complete removal? (This breaks immutability, which has issues. Perhaps this should be a schema option?)
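The keyed "most recent version only" idea could behave roughly like the sketch below. This is a hypothetical model, not Frequency's storage layout. It assumes messages arrive in block order and that the key (for example, the target of a GraphChange) is supplied alongside the message. Storing a newer message under the same (schema, key) replaces the older one, which acts like an immediate TTL on prior versions.

```rust
use std::collections::HashMap;

// Hypothetical identifiers for illustration; not Frequency's actual types.
type SchemaId = u16;
type MessageKey = Vec<u8>; // e.g. the target of a DSNP GraphChange
type BlockNumber = u32;

#[derive(Debug, Clone, PartialEq)]
struct Message {
    payload: Vec<u8>,
    block: BlockNumber,
}

/// Keeps only the latest message per (schema, key).
#[derive(Default)]
struct LatestOnlyStore {
    latest: HashMap<(SchemaId, MessageKey), Message>,
}

impl LatestOnlyStore {
    /// Stores `msg`, returning the older message it superseded, if any.
    /// Assumes callers insert messages in block order.
    fn insert(&mut self, schema: SchemaId, key: MessageKey, msg: Message) -> Option<Message> {
        self.latest.insert((schema, key), msg)
    }

    /// Looks up the current message for a (schema, key) pair.
    fn get(&self, schema: SchemaId, key: &[u8]) -> Option<&Message> {
        self.latest.get(&(schema, key.to_vec()))
    }
}
```

A design consequence worth noting: unlike the targeted-removal idea, this never deletes an arbitrary message on request. It only ever keeps one message per key, so the schema option would define exactly what gets discarded.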

Message Retrieval Related

  • [ ] Sometimes I only want messages from a specific user; could something help make that possible? (Perhaps an off-chain worker.)
  • [ ] I want the custom RPC to validate messages against their schema before passing them to me, so that every message I get from the custom RPC is known to be valid.
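Both retrieval wishes amount to a filter applied before messages reach the caller. The sketch below is hypothetical: `StoredMessage` and `messages_for_user` are illustrative names, not Frequency's RPC types. The `is_valid` callback stands in for real schema validation (decoding the payload against the registered schema), which is not shown.

```rust
// Hypothetical types for illustration; a real RPC would use the chain's own definitions.
type MsaId = u64;

#[derive(Debug, Clone, PartialEq)]
struct StoredMessage {
    msa_id: MsaId,
    schema_id: u16,
    payload: Vec<u8>,
}

/// Returns only messages sent by `msa_id` whose payload passes `is_valid`.
/// `is_valid` is a placeholder for decoding the payload against its schema.
fn messages_for_user<F>(messages: &[StoredMessage], msa_id: MsaId, is_valid: F) -> Vec<StoredMessage>
where
    F: Fn(&StoredMessage) -> bool,
{
    messages
        .iter()
        .filter(|&m| m.msa_id == msa_id && is_valid(m))
        .cloned()
        .collect()
}
```

Doing the per-user filter in a node-side component, whether an off-chain worker or the RPC itself, would spare every client from downloading and discarding other users' messages. Validating inside the same pass means invalid payloads never leave the node.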

Batch Related

  • [ ] Could the node retrieve the batch for me via IPFS?
  • [ ] Could the node store and pin the batch for me via IPFS?

Parallelization

  • [ ] Separate message storage from the rest of the active state
  • [ ] Shard the message storage by schema
  • [ ] Process messages in parallel after assigning slots and checking capacity
  • [ ] Group submissions by the origin's MSA and then process the groups in parallel
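The "group by origin's MSA, then process in parallel" item could look roughly like this sketch. It is hypothetical: `Submission` and `process_grouped` are illustrative names, and "processing" here just sums payload sizes as a placeholder. Each MSA's submissions stay sequential within their group, while groups that share no state run on separate scoped threads.

```rust
use std::collections::BTreeMap;
use std::thread;

// Hypothetical submission type; `msa_id` is the origin's MSA.
#[derive(Debug, Clone)]
struct Submission {
    msa_id: u64,
    payload_len: usize,
}

/// Groups submissions by origin MSA, then processes each group on its own
/// scoped thread. Order is preserved within a group; groups run in parallel.
/// Placeholder "processing": total payload bytes per MSA.
fn process_grouped(subs: &[Submission]) -> BTreeMap<u64, usize> {
    let mut groups: BTreeMap<u64, Vec<&Submission>> = BTreeMap::new();
    for s in subs {
        groups.entry(s.msa_id).or_default().push(s);
    }

    thread::scope(|scope| {
        let handles: Vec<_> = groups
            .iter()
            .map(|(&msa, group)| {
                // Each group is handled sequentially inside its own thread.
                scope.spawn(move || (msa, group.iter().map(|s| s.payload_len).sum::<usize>()))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

One thread per MSA is only for illustration. A runtime would more likely bound concurrency, and it would still need per-MSA state (such as capacity) to be independent for the groups to be safely parallel.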

CI

  • [ ] Reduce build job run times from the current 1.5-2 h (caching, better hardware, parallelization, ...)
  • [ ] Review how Parity currently does its builds and borrow good techniques from them, such as rusty-cachier
  • [ ] Consider organizing each job around the single-responsibility principle (SRP) and making jobs as independent as possible for better parallelization (recommend sketching it in a diagram first before changing any code).
  • [ ] Consider moving jobs that don't need to run frequently (e.g. cargo audit) to scheduled jobs
  • [ ] Automatically run a quick benchmark on PRs before kicking off the full benchmarks

wilwade, May 06 '22 12:05

  • SchemaZero: optimize and standardize #60

saraswatpuneet, May 06 '22 16:05

We should consider removing all the events and just do off-chain websocket event filter registrations.

I'd like to know more about this. I ran some benchmarks, and it seems that events are not counted as a DB write. So I think the overhead is in writing the block, since events are part of the block's properties.

aramikm, May 06 '22 16:05

@aramikm I think they are counted as a DB write. Their impact might not be noticeable, perhaps because we are not setting up the state in the benchmark.

I read that events are pruned after every block, so the cost does not compound. Also, since storage access is cached, it might be just one read and one write per block?

https://github.com/paritytech/substrate/blob/master/frame/system/src/lib.rs#L597

Found where events are pruned after every block: https://stackoverflow.com/questions/57219830/what-is-the-cost-of-event-storage-in-substrate
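The pruning behaviour described above can be modelled with a small sketch. This is a hypothetical, simplified model, not `frame_system`'s actual code. Events are cleared when each block is initialized, so event storage never holds more than one block's worth of events, even though the total number ever deposited keeps growing.

```rust
// Hypothetical, simplified model of per-block event storage.
#[derive(Default)]
struct EventStore {
    block: u32,
    events: Vec<String>,
    // Cumulative number of events ever deposited, for comparison only.
    total_deposited: usize,
}

impl EventStore {
    /// Start of a new block: events from the previous block are cleared.
    fn initialize_block(&mut self, block: u32) {
        self.block = block;
        self.events.clear();
    }

    /// Appends an event to the current block's event storage.
    fn deposit_event(&mut self, event: &str) {
        self.events.push(event.to_string());
        self.total_deposited += 1;
    }
}
```

For example, depositing two events in each of three blocks leaves `events.len()` at 2 while `total_deposited` reaches 6. The stored size is bounded per block rather than compounding, which matches the point above. It also suggests why a benchmark that never populates events may under-report their per-block cost.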

enddynayn, May 06 '22 18:05

@enddynayn Yeah, in that case we need to be really careful about how our benchmarks are set up, since there might be hidden costs that are not reflected there.

aramikm, May 06 '22 19:05