Add get_values_from_label()
I apologize for the formatting changes. I have Black integrated into my Neovim workflow, and it autoformats on save. If it helps, I can point out the areas with actual code changes.
Problem
I need to retrieve the original data values associated with a specific cluster. Currently, my system discards these original values after processing. This is problematic in scenarios like batch insertions of streaming data, where I require access to the full data for post-insertion analysis or further processing.
Solution
To address this, I propose replacing the current hashing mechanism with a data encoding approach. Instead of hashing each data value and storing the hash in object.id, the system will encode the complete value and store the encoded representation there, so the original value can be recovered by decoding it.
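As a rough illustration of the difference (not the code in this PR), the sketch below contrasts a one-way hash with a reversible encoding. The names `hash_id`, `encode_id`, `decode_id`, and the `labels_by_id` mapping are hypothetical, and JSON plus base64 is only an assumed encoding scheme; the `get_values_from_label` shown here is a simplified stand-in that takes the mapping as an argument.

```python
import base64
import hashlib
import json


def hash_id(value):
    """One-way: the original value cannot be recovered from the digest."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def encode_id(value):
    """Reversible: the id itself carries the full value (assumed JSON + base64)."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_id(encoded):
    """Inverse of encode_id."""
    return json.loads(base64.urlsafe_b64decode(encoded.encode("ascii")))


def get_values_from_label(labels_by_id, label):
    """Decode every id assigned to `label` back into its original value.

    `labels_by_id` is a hypothetical mapping of encoded id -> cluster label.
    """
    return [decode_id(obj_id) for obj_id, lbl in labels_by_id.items() if lbl == label]


# Hypothetical clustering result: three points, two clusters.
labels_by_id = {
    encode_id([1.0, 2.0]): 0,
    encode_id([5.0, 6.0]): 1,
    encode_id([1.5, 2.5]): 0,
}
cluster_zero = get_values_from_label(labels_by_id, 0)
```

With hashed ids the same lookup could only return digests, which is why the original values are lost today.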
This change trades hashing for encoding, which means higher memory usage and possible performance overhead. I anticipate that the impact on both will be minimal, but thorough profiling is needed to measure it accurately.
The profiling should focus on:
- Memory Usage: Comparing the memory footprint of storing encoded values versus hash values.
- Performance Overhead: Measuring the time taken for encoding/decoding operations compared to hashing.
- Overall System Performance: Assessing the impact of this change on the overall system's throughput and latency.
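A minimal sketch of how the first two measurements could be taken is shown below. It assumes the same hypothetical JSON + base64 encoding and SHA-256 hashing as an example, not the scheme actually used in this PR, and `profile_ids` is an illustrative helper name. It does not cover overall throughput and latency, which would need to be measured against the real system.

```python
import base64
import hashlib
import json
import sys
import timeit


def hash_id(value):
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def encode_id(value):
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_id(encoded):
    return json.loads(base64.urlsafe_b64decode(encoded.encode("ascii")))


def profile_ids(values, repeat=100):
    """Compare id sizes and hashing/encoding/decoding time for `values`."""
    hashed = [hash_id(v) for v in values]
    encoded = [encode_id(v) for v in values]
    return {
        # Size of the stored id strings only, not of any containing structure.
        "hash_bytes": sum(sys.getsizeof(h) for h in hashed),
        "encoded_bytes": sum(sys.getsizeof(e) for e in encoded),
        # Total wall-clock seconds for `repeat` passes over all values.
        "hash_seconds": timeit.timeit(
            lambda: [hash_id(v) for v in values], number=repeat
        ),
        "encode_seconds": timeit.timeit(
            lambda: [encode_id(v) for v in values], number=repeat
        ),
        "decode_seconds": timeit.timeit(
            lambda: [decode_id(e) for e in encoded], number=repeat
        ),
    }


# Synthetic data: 1,000 points with 8 features each (an arbitrary choice).
sample = [[float(i + j) for j in range(8)] for i in range(1000)]
report = profile_ids(sample, repeat=10)
for name, result in report.items():
    print(f"{name}: {result}")
```

One point to keep in mind when reading the numbers: a hash digest has a fixed length regardless of input, while an encoded id grows with the size of the value, so the memory comparison depends heavily on how large the stored values are.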