deephaven-core

feat: Utilities to assist in retrieving data from tables

Open rbasralian opened this issue 3 years ago • 2 comments

Adds utilities to make it easier to pull data from a table.

There is a KeyedRecordAdapter that leverages the ToMapListener to allow pulling data by key (e.g. USym):

final KeyedRecordAdapter<String, MyTradeHolder> keyedRecordAdapter = KeyedRecordAdapter.makeKeyedRecordAdapter(
                tradesTable,
                myTradeHolderRecordAdapterDescriptor,
                "USym",
                String.class
        );
final MyTradeHolder lastAaplTrade = keyedRecordAdapter.getRecord("AAPL");
final Map<String, MyTradeHolder> lastTradesBySymbol = keyedRecordAdapter.getRecords("AAPL", "CAT", "SPY");

And a TableToRecordListener that lets you listen to updates as objects instead of just row keys:

final Consumer<MyTradeHolder> tradeConsumer = trade -> System.out.println(trade.toString());
TableToRecordListener.create(tradesTable, myTradeHolderRecordAdapterDescriptor, tradeConsumer);

The mapping of table columns to fields in the target object is handled by a RecordAdapterDescriptor:

RecordAdapterDescriptor<MyTradeHolder> myTradeHolderRecordAdapterDescriptor = RecordAdapterDescriptorBuilder
            .create(MyTradeHolder::new)
            .addColumnAdapter("Sym", RecordUpdater.getStringUpdater(MyTradeHolder::setSym))
            .addColumnAdapter("Price", RecordUpdater.getDoubleUpdater(MyTradeHolder::setPrice))
            .addColumnAdapter("Size", RecordUpdater.getIntUpdater(MyTradeHolder::setSize))
            .addColumnAdapter("Timestamp", RecordUpdater.getReferenceTypeUpdater(DBDateTime.class, MyTradeHolder::setTimestamp))
            .build();
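As a rough illustration of the idea only (not the actual RecordAdapterDescriptor implementation), a descriptor can be thought of as a record factory plus a map from column name to setter. The sketch below is plain Python with no Deephaven dependency; `DescriptorSketch`, `TradeHolder`, `add_column_adapter`, and `to_record` are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, Generic, TypeVar

R = TypeVar("R")


class DescriptorSketch(Generic[R]):
    """Hypothetical stand-in: a record factory plus column-name -> setter map."""

    def __init__(self, new_record: Callable[[], R]):
        self.new_record = new_record
        self.column_setters: Dict[str, Callable[[R, Any], None]] = {}

    def add_column_adapter(self, column: str, setter: Callable[[R, Any], None]):
        self.column_setters[column] = setter
        return self  # return self so calls can be chained, builder-style

    def to_record(self, row: Dict[str, Any]) -> R:
        # Create an empty record, then apply each registered setter.
        # Columns without a registered setter (e.g. "Size" below) are ignored.
        record = self.new_record()
        for column, setter in self.column_setters.items():
            setter(record, row[column])
        return record


class TradeHolder:
    """Hypothetical record type with setter methods."""

    def __init__(self):
        self.sym = None
        self.price = None

    def set_sym(self, sym):
        self.sym = sym

    def set_price(self, price):
        self.price = price


descriptor = (
    DescriptorSketch(TradeHolder)
    .add_column_adapter("Sym", TradeHolder.set_sym)
    .add_column_adapter("Price", TradeHolder.set_price)
)
holder = descriptor.to_record({"Sym": "AAPL", "Price": 1.1, "Size": 100})
```

Keeping the mapping separate from the code that reads the table is what lets the same descriptor back different adapters.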

The RecordAdapterDescriptor can be used to create a SingleRowRecordAdapter or a MultiRowRecordAdapter. A single-row adapter creates an object (MyTradeHolder or JsonNode or whatever) and reads data from the columns directly into that object. A multi-row adapter has more steps but is intended to be efficient and chunk-oriented. It will:

  1. Create arrays to hold data for all of the columns.
  2. Read the data for all columns into those arrays (by chunk).
  3. Create an array of records (e.g. MyTradeHolder[], JsonNode[], whatever) and fill it with new empty records.
  4. Populate all of the records with the data from the arrays, one column at a time.
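The four steps above can be sketched in plain Python, independent of the Deephaven API. This is a minimal illustration under stated assumptions, not the MultiRowRecordAdapter code: the "table" is a dict of column lists, and `TradeRecord`, `read_records`, and `chunk_size` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class TradeRecord:
    # Hypothetical record type; defaults allow creating "empty" records.
    sym: str = ""
    price: float = 0.0
    size: int = 0


def read_records(
    columns: Dict[str, Sequence],
    row_indices: Sequence[int],
    chunk_size: int = 2,
) -> List[TradeRecord]:
    n = len(row_indices)

    # Step 1: create arrays to hold the data for every column.
    buffers = {name: [None] * n for name in columns}

    # Step 2: read the data for all columns into those arrays, chunk by chunk.
    for start in range(0, n, chunk_size):
        chunk = row_indices[start:start + chunk_size]
        for name, source in columns.items():
            for offset, row in enumerate(chunk):
                buffers[name][start + offset] = source[row]

    # Step 3: create the array of records and fill it with new empty records.
    records = [TradeRecord() for _ in range(n)]

    # Step 4: populate the records from the arrays, one column at a time.
    for name, values in buffers.items():
        for record, value in zip(records, values):
            setattr(record, name, value)

    return records


table = {
    "sym": ["AAPL", "CAT", "SPY"],
    "price": [1.1, 2.2, 3.3],
    "size": [10, 20, 30],
}
trades = read_records(table, [0, 2])
```

Separating the column reads (step 2) from record population (step 4) keeps the time spent touching the table short; converting to row-oriented objects then works only on the copied arrays.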

A record adapter instance can be generated based on the descriptor. For example, com.illumon.iris.db.util.dataadapter.rec.json.JsonRecordAdapterUtil#createJsonRecordAdapterDescriptor(t, "Col1", "Col2", "Col3").createMultiRowRecordAdapter(t) would figure out the types of "Col1"/"Col2"/"Col3" and generate a class that populates ObjectNodes with those fields.

There is also a draft Python version that lets you create data structures by passing data to either a function or straight to some type's constructor:

class StockTrade:
    sym: str
    price: float
    size: int
    exch: str

    def __init__(self, sym: str, price: float, size: int, exch: str):
        self.sym = sym
        self.price = price
        self.size = size
        self.exch = exch

keyed_record_adapter: KeyedRecordAdapter[str, StockTrade] = KeyedRecordAdapter(
    source,
    StockTrade,
    'Sym',
    ["Price", "Size", "Exch"]
)

records = keyed_record_adapter.get_records(["AAPL", None])

aapl_trade = records['AAPL']
self.assertEqual('AAPL', aapl_trade.sym)
self.assertEqual(1.1, aapl_trade.price)
self.assertEqual(10_000_000_000, aapl_trade.size)
self.assertEqual('ARCA', aapl_trade.exch)
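To make "passing data to either a function or straight to some type's constructor" concrete, here is a small self-contained sketch that does not use the draft Python API. It assumes the key is passed first, followed by the data columns in the order listed, that the last row seen for a key wins, and that keys with no matching row are simply omitted; none of these behaviors is confirmed by the description above. `build_records` and `Trade` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Sequence


@dataclass
class Trade:
    # Hypothetical record type mirroring the StockTrade example.
    sym: str
    price: float
    size: int
    exch: str


def build_records(
    rows: Iterable[Dict[str, Any]],
    factory: Callable[..., Any],
    key_col: str,
    data_cols: Sequence[str],
    keys: Sequence[Any],
) -> Dict[Any, Any]:
    # Index rows by key; later rows overwrite earlier ones (assumed semantics).
    last_row_by_key: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        last_row_by_key[row[key_col]] = row

    result: Dict[Any, Any] = {}
    for key in keys:
        row = last_row_by_key.get(key)
        if row is not None:
            # The factory may be a class or a plain function; both are
            # called as factory(key, *data_values).
            result[key] = factory(key, *(row[col] for col in data_cols))
    return result


rows: List[Dict[str, Any]] = [
    {"Sym": "AAPL", "Price": 1.0, "Size": 5, "Exch": "ARCA"},
    {"Sym": "AAPL", "Price": 1.1, "Size": 10, "Exch": "ARCA"},
]
cols = ["Price", "Size", "Exch"]

# Pass a type: its constructor receives the values.
by_type = build_records(rows, Trade, "Sym", cols, ["AAPL", None])

# Pass a function: it receives the same values.
by_func = build_records(
    rows, lambda sym, price, size, exch: f"{sym}@{exch}", "Sym", cols, ["AAPL"]
)
```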

Random notes:

  • This is a generic version of a utility we previously used for latency monitoring. That is where the multi-row approach comes from: create arrays to hold data, consistently read chunks from columns and copy the chunks into the arrays, then copy the data from the arrays back into row-oriented data structures.
  • The MultiRowRecordAdapter is responsible for the real work of pulling data out of tables; both KeyedRecordAdapter and TableToRecordListener use that. (KeyedRecordAdapter also uses SingleRowRecordAdapter when fetching single keys.)
  • The Python version (keyed_record_adapter.py/PyKeyedRecordAdapter.java) only supports strings as keys and has no listening component.
  • The Java unit tests pass. The Python unit tests fail on primitive keys.

rbasralian Sep 14 '22 21:09

https://github.com/deephaven/deephaven.io/issues/1715

margaretkennedy Sep 21 '22 11:09

I have not looked at this PR recently, so I don't remember the contents. DHC has added functionality to help users extract data. I'm noting it here in case it makes this PR no longer needed.

https://github.com/deephaven/deephaven-core/pull/5595 https://deephaven.io/core/docs/how-to-guides/iterate-table-data/

chipkent Nov 01 '24 18:11