Partition / RowKey Schema + Performance Efficiency
Since the current version stores all errors in a single partition, performance deteriorates as the row count grows. A better solution would be to keep the number of rows per partition down to a few hundred.
From the guidelines on designing a scalable table solution: https://msdn.microsoft.com/en-us/library/azure/hh508997.aspx
"A highly uneven distribution of entities across partitions may limit the performance of the larger and more active partitions"
A better solution might be to either:
- Partition on a day or an hour, using a numeric representation like 20150616 that also works in range queries
- Use a fixed partition size with a counter (000001, 000002, ...) plus an additional table holding pointer info that records which dates fall into which partition buckets.
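To make the two options concrete, here is a minimal sketch of how the partition keys could be computed. The function names and the bucket size of 500 are made up for illustration and are not part of the library. Both schemes produce fixed-width, zero-padded strings, so comparing the keys as strings gives the same order as comparing the dates or counters they encode.

```python
from datetime import datetime, timezone


def day_partition_key(ts: datetime) -> str:
    """Option 1: one partition per UTC day, e.g. '20150616'.

    A naive datetime is interpreted as local time by astimezone().
    """
    return ts.astimezone(timezone.utc).strftime("%Y%m%d")


def hour_partition_key(ts: datetime) -> str:
    """Option 1 variant: one partition per UTC hour, e.g. '2015061613'."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%d%H")


def bucket_partition_key(row_index: int, bucket_size: int = 500) -> str:
    """Option 2: fixed-size buckets numbered 000001, 000002, ...

    row_index is the zero-based position of the error in insertion order
    (hypothetical; something would have to maintain this counter).
    """
    return f"{row_index // bucket_size + 1:06d}"
```

With option 2, the extra pointer table would still be needed to map a date range to the bucket numbers that cover it.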
But those are just two ideas. We're using this in production, and now that our table has grown, performance is dramatically worse (lookups take up to 30 seconds!).
Thoughts?
True. Can you create a PR and set the partition key to something like option 1?
I think the only problem with that is the paging functionality. There'll have to be some internal logic that figures out how to assemble a page and keep track of a cursor window depending on how large / how many partitions to pull. Ideas?
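One possible shape for that logic, as a rough sketch only: treat the cursor as a (partition key, offset) pair and keep pulling from successive day partitions, newest first, until the page is full. The `fetch_page` function and the in-memory dict standing in for the table are hypothetical; against the real service the offset would presumably be replaced by whatever continuation information the query returns.

```python
from typing import Dict, List, Optional, Tuple

# Cursor = (partition key to resume in, number of rows already consumed from it)
Cursor = Tuple[str, int]


def fetch_page(
    partitions: Dict[str, List[str]],
    page_size: int,
    cursor: Optional[Cursor] = None,
) -> Tuple[List[str], Optional[Cursor]]:
    """Assemble one page by walking day partitions from newest to oldest.

    `partitions` maps a key like '20150616' to its rows, already in display order.
    Returns the page and the cursor for the next call (None when exhausted).
    """
    keys = sorted(partitions, reverse=True)  # newest day first
    if cursor is None:
        idx, offset = 0, 0
    else:
        idx, offset = keys.index(cursor[0]), cursor[1]

    page: List[str] = []
    while idx < len(keys) and len(page) < page_size:
        rows = partitions[keys[idx]]
        take = rows[offset:offset + page_size - len(page)]
        page.extend(take)
        offset += len(take)
        if offset >= len(rows):
            # This partition is used up; continue in the next (older) one.
            idx, offset = idx + 1, 0

    next_cursor = (keys[idx], offset) if idx < len(keys) else None
    return page, next_cursor
```

Storing the partition key in the cursor rather than a list index means the cursor still points at the right place if newer partitions appear between requests.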
Hmmm... the partition key is already being set to the application name. I think your best bet would be to either delete or migrate older errors elsewhere.
I'm surprised that it is taking that long, though, since Azure Tables has a default sort that we use (PartitionKey ASC, RowKey ASC).