quickwit-datasource icon indicating copy to clipboard operation
quickwit-datasource copied to clipboard

Multisearch response unmarshalling loses nano precision from numeric timestamps

Open ddelemeny opened this issue 1 year ago • 1 comments

This issue proves complex to tackle, better keep a description of things here .

The problem :

https://github.com/quickwit-oss/quickwit-datasource/blob/8a3c88b0301b4e68f5fc8c8b8c38730dd0d99173/pkg/quickwit/response_parser.go#L296

When creating a data frame from the response to a search query, the response parser finds numeric timestamp values cast as float64. This cast is not wanted, as float64 can't represent nanosecond precision timestamps.

Where does it happen

https://github.com/quickwit-oss/quickwit-datasource/blob/8a3c88b0301b4e68f5fc8c8b8c38730dd0d99173/pkg/quickwit/client/client.go#L188

The json response from the API gets unmarshalled early in the processing pipeline. The json decoder used in ExecuteMultisearch casts number to float64 by default

The solution :

The decoder can be told to unmarshal numbers to a "polymorphic" type : json.Number, which allows further processing to decide the actual type of the specific datum down the line.

The other problem :

Changing from a simple type to a polymorphic one in a complex multi-branched processing pipeline tends to break a lot of stuff. Reworking the whole call-tree of a Multisearch to handle json.Number instead of float64 numbers is an absurd chore.

The more complete solution ?

A reasonable approach would be to perform a shallow unmarshal early in the process, only to be able to dispatch responses to sub-handlers. Then do a second unmarshal of the response in subhandlers, when it's appropriate to decide if we want the polymorphic type (when timestamps are used). response_parser.go:parseResponse looks like a good candidate

ddelemeny avatar Apr 10 '24 23:04 ddelemeny

sort key seems to suffer from precision loss too, resulting in a possible shuffled ascending order : the dataframe's nanos field received from the backend has incorrect values.

Need more love.

Edit : dataframe nanos don't come from the sort key but from the timestamp. Available timestamp output formats currently do not represent nanos accurately in Quickwit, this needs to be addressed first.

ddelemeny avatar May 15 '24 17:05 ddelemeny