Rework StorageMetrics space reporting to make sense
The fields in StorageMetrics are poorly named and have confusing meanings.
# This is the amount of space available for taking on new data. This is the sum of free space on the filesystem that the data dir is on, plus the internal free space within the storage engine files that can be reused.
"kvstore_available_bytes" : 914406592512,
# This is the amount of free space reported by the filesystem that the data dir is on
"kvstore_free_bytes" : 914406592512,
# This is the size of the filesystem that the data dir is on
"kvstore_total_bytes" : 1056755048448,
# This is meant to be the amount of space on the filesystem that the storage engine's files consume.
# For Redwood before FDB 71.3 or 7.3, this is only the amount of space within the `.redwood` file that is in use by the BTree or internal data structures. For 7.3 / 71.3 and beyond, this matches the definition above (space usage of the Redwood file on the filesystem).
"kvstore_used_bytes" : 88564836272,
# No idea what these are; they might be related to log spill space or specifically to the memory storage engine?
"kvstore_inline_keys" : 0,
"kvstore_total_nodes" : 0,
"kvstore_total_size" : 0,
The last three fields seem to only apply to Memory engines but the names are misleadingly generic.
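As a quick sanity check on the legacy fields, the sample values above can be plugged into the definitions from the comments. This is a minimal Python sketch; the dictionary simply copies the numbers from the excerpt, and nothing here reads a live cluster. It shows that this sample reports zero reusable space inside the storage engine files, and that part of the volume is explained by neither "used" nor "free".

```python
# Sample values copied from the status JSON excerpt above.
sample = {
    "kvstore_available_bytes": 914406592512,
    "kvstore_free_bytes": 914406592512,
    "kvstore_total_bytes": 1056755048448,
    "kvstore_used_bytes": 88564836272,
}

# Per the first comment, available = filesystem free + reusable space inside
# the storage engine files, so the difference is the reusable part.
reusable_inside_files = sample["kvstore_available_bytes"] - sample["kvstore_free_bytes"]

# Volume space that is neither "used" nor "free"; the legacy fields alone
# cannot say what occupies it.
unaccounted = (
    sample["kvstore_total_bytes"]
    - sample["kvstore_used_bytes"]
    - sample["kvstore_free_bytes"]
)

print(reusable_inside_files)  # 0
print(unaccounted)            # 53783619664
```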
For the first four (Available, Free, Total, and Used), a diagram of a storage server filesystem's usage, where each line represents the full size of the volume, would look approximately like this:
|                     TOTAL                     |
|          USED          |         FREE         |
| XFXFXXFXFXXXFFFFXFFFXX |         FREE         |
|  |                 AVAILABLE                  |
where
- TOTAL = the total filesystem size
- USED = the sum of the sizes of all files on the filesystem
- FREE = the free space on the filesystem
- X = a block inside a Storage Engine File that is in use
- F = a block inside a Storage Engine File which is free to be used again
- AVAILABLE = the total usable space the Storage Engine can expand to, both internal and external
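To make the diagram concrete, here is a minimal Python model of it, under stated assumptions: the `XF` string is copied from the diagram, each letter is one block of a hypothetical 4096-byte size, the hypothetical `other_file_bytes` stands for any unrelated files on the volume, and FREE is simplified to TOTAL minus USED (ignoring filesystem metadata and reserved blocks). The `Volume` class is purely illustrative and not part of FoundationDB.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    total_bytes: int           # TOTAL: filesystem size
    block_size: int            # hypothetical block size of the storage engine file
    engine_blocks: str         # X = in-use block, F = reusable block
    other_file_bytes: int = 0  # hypothetical unrelated files on the same volume

    def in_use_bytes(self) -> int:
        # Sum of all X block space (not reported today).
        return self.engine_blocks.count("X") * self.block_size

    def reusable_bytes(self) -> int:
        # Sum of all F block space (not reported today).
        return self.engine_blocks.count("F") * self.block_size

    def used_bytes(self) -> int:
        # USED: sizes of all files, i.e. the whole engine file plus other files.
        return len(self.engine_blocks) * self.block_size + self.other_file_bytes

    def free_bytes(self) -> int:
        # FREE: simplified as whatever no file occupies.
        return self.total_bytes - self.used_bytes()

    def available_bytes(self) -> int:
        # AVAILABLE: external free space plus internal reusable space.
        return self.free_bytes() + self.reusable_bytes()


vol = Volume(total_bytes=40 * 4096, block_size=4096,
             engine_blocks="XFXFXXFXFXXXFFFFXFFFXX")
print(vol.used_bytes() // 4096)       # 22
print(vol.free_bytes() // 4096)       # 18
print(vol.available_bytes() // 4096)  # 29
```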
Currently:
- The sum of all X block space is not reported
- The sum of all F block space is not reported
- TOTAL, USED, FREE, and AVAILABLE are reported
- AVAILABLE cannot be computed from the other reported values, because the math would be either `FREE + (sum of all F block space)` or `TOTAL - (sum of all X block space)`, and neither block sum is reported
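The two candidate formulas can be compared with a small Python sketch. The block counts are hypothetical and FREE is again simplified to TOTAL minus the sizes of all files. Both formulas need one of the unreported block sums. Under these assumptions they agree only when the storage engine file is the only thing on the volume; once unrelated files share the volume, `TOTAL - X` also counts those files' space as available.

```python
def available_from_free(free_bytes: int, reusable_bytes: int) -> int:
    # FREE + (sum of all F block space)
    return free_bytes + reusable_bytes


def available_from_total(total_bytes: int, in_use_bytes: int) -> int:
    # TOTAL - (sum of all X block space)
    return total_bytes - in_use_bytes


BLOCK = 4096                 # hypothetical block size
total = 40 * BLOCK           # TOTAL
in_use = 11 * BLOCK          # sum of X blocks (not currently reported)
reusable = 11 * BLOCK        # sum of F blocks (not currently reported)
engine_file = in_use + reusable

# Case 1: the storage engine file is the only file on the volume.
free_alone = total - engine_file
print(available_from_free(free_alone, reusable) // BLOCK)   # 29
print(available_from_total(total, in_use) // BLOCK)         # 29

# Case 2: 5 blocks of unrelated files share the volume.
free_shared = total - engine_file - 5 * BLOCK
print(available_from_free(free_shared, reusable) // BLOCK)  # 24
print(available_from_total(total, in_use) // BLOCK)         # 29
```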
I propose that we begin reporting the following fields, whose names are hopefully clearer and give a more complete picture of storage space in use and available. These fields would be published in addition to the old fields, since I'm sure a lot of tooling depends on them.
- DiskVolumeCapacityBytes: size of the filesystem
- DiskVolumeFreeBytes: free space on the filesystem
- DiskVolumeUID: a unique identifier of the filesystem, useful to avoid double-counting quantities when multiple StorageServers share the same disk volume
- StorageFileSizeBytes: size of the storage engine file(s)
- StorageFileReusableBytes: space within the storage engine file(s) that is ready to be reused now
- StorageFileUsedBytes: bytes of the storage file(s) that are in use, equal to `StorageFileSizeBytes - StorageFileReusableBytes`
- StorageAvailableBytes: space the storage engine can expand to, equal to `DiskVolumeFreeBytes + StorageFileReusableBytes`
- StorageKVBytes: estimate of total logical key and value bytes in the storage engine
- StorageMemoryBytes: estimate of the bulk of memory used by the storage engine. For SQLite this is the page cache, for Redwood it is the page cache + decode cache, for Memory engines it is the size of the in-memory structure, etc.
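A minimal Python sketch of how the proposed fields fit together, assuming each storage server reports them as plain integers plus a string volume UID. The class, the snake_case attribute names, and `cluster_totals` are hypothetical illustrations, not an existing FoundationDB API. The two derived fields follow the formulas in the list, and the aggregation keys volume-level quantities by DiskVolumeUID so that servers sharing a disk are counted once.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StorageSpaceMetrics:
    # Hypothetical per-server report using the proposed fields.
    disk_volume_capacity_bytes: int
    disk_volume_free_bytes: int
    disk_volume_uid: str
    storage_file_size_bytes: int
    storage_file_reusable_bytes: int

    @property
    def storage_file_used_bytes(self) -> int:
        # StorageFileSizeBytes - StorageFileReusableBytes
        return self.storage_file_size_bytes - self.storage_file_reusable_bytes

    @property
    def storage_available_bytes(self) -> int:
        # DiskVolumeFreeBytes + StorageFileReusableBytes
        return self.disk_volume_free_bytes + self.storage_file_reusable_bytes


def cluster_totals(servers: Iterable[StorageSpaceMetrics]) -> dict:
    """Sum space across servers, counting each disk volume only once."""
    volumes = {}
    file_size = reusable = 0
    for s in servers:
        # File-level quantities belong to each server and are summed directly.
        file_size += s.storage_file_size_bytes
        reusable += s.storage_file_reusable_bytes
        # Volume-level quantities are keyed by UID; if servers on one volume
        # report slightly different free space, the last report wins here.
        volumes[s.disk_volume_uid] = (s.disk_volume_capacity_bytes,
                                      s.disk_volume_free_bytes)
    capacity = sum(c for c, _ in volumes.values())
    free = sum(f for _, f in volumes.values())
    return {
        "DiskVolumeCapacityBytes": capacity,
        "DiskVolumeFreeBytes": free,
        "StorageFileSizeBytes": file_size,
        "StorageFileReusableBytes": reusable,
        "StorageFileUsedBytes": file_size - reusable,
        "StorageAvailableBytes": free + reusable,
    }


# Two servers share "vol-a"; a third has "vol-b" to itself.
a = StorageSpaceMetrics(1000, 400, "vol-a", 300, 50)
b = StorageSpaceMetrics(1000, 400, "vol-a", 200, 20)
c = StorageSpaceMetrics(500, 100, "vol-b", 350, 0)
totals = cluster_totals([a, b, c])
print(totals["DiskVolumeCapacityBytes"])  # 1500
print(totals["StorageAvailableBytes"])    # 570
```

Summing DiskVolumeFreeBytes per server instead would count vol-a's free space twice, which is the double-counting DiskVolumeUID is meant to prevent.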
- It's a matter of context, but in my mind disks and filesystems have 'capacity' while files have 'size'.
- The distinction between 'free' and 'available' is confusing, especially as we cross boundaries between 'Storage', 'StorageFile' and 'DiskVolume'. I had to scratch my head a bit to figure it out.
- +1 to the 'Bytes' suffix to make units clear.
- +1 to definitions, so at least we can figure out what we're looking at without resorting to code.
- +1 to consistent names, provided they're clear and unsurprising.
- It's a matter of context, but in my mind disks and filesystems have 'capacity' while files have 'size'.
I like this. So DiskVolumeCapacityBytes?
- The distinction between 'free' and 'available' is confusing, especially as we cross boundaries between 'Storage', 'StorageFile' and 'DiskVolume'. I had to scratch my head a bit to figure it out.
I see it too, but I can't seem to find the perfect names. "Usable" instead of "Available" comes to mind, where "usable" means "can be used but is not being used"; however, to some people "usable" could mean "theoretically usable, with some of it already used in practice", because that is what it means in other contexts.
Yeah, I think your definitions of 'free' and 'available' are consistent with how Linux uses them for memory, where 'free' means not used at all, and 'available' means unused, or used for caching but reclaimable. I personally think that at a glance, 'free' and 'available' are functionally synonyms, but practically speaking 'free' came first and the need to express the additional 'available' but not 'free' memory came later. That's roughly the same thing that's going on here, so go for it.