databricks-sql-go
ArrowIPCStreamIterator.SchemaBytes() returns zero bytes after a cold start
Context
Given a Databricks SQL Warehouse that is currently stopped and this code:
// ctx must be declared before it is passed to Connect.
ctx := context.Background()
driverCntor, driverErr := dbsql.NewConnector(options...)
if driverErr != nil {
return driverErr
}
conn, connErr := driverCntor.Connect(ctx)
if connErr != nil {
return connErr
}
rows, queryErr := conn.(driver.QueryerContext).QueryContext(ctx, "SELECT 1", []driver.NamedValue{})
if queryErr != nil {
return queryErr
}
defer rows.Close()
ipcStreamIt, ipcStreamErr := rows.(dbsqlrows.Rows).GetArrowIPCStreams(ctx)
if ipcStreamErr != nil {
return ipcStreamErr
}
defer ipcStreamIt.Close()
schemaBytes, schemaErr := ipcStreamIt.SchemaBytes()
if schemaErr != nil {
return schemaErr
}
if len(schemaBytes) == 0 {
return fmt.Errorf("no schema bytes")
}
Expected result
No error
Actual result
Error: no schema bytes
Notes
- This only happens when the Databricks SQL Warehouse is stopped
- When the Databricks SQL Warehouse is active, schemaBytes contains correct data
This works when the Databricks SQL Warehouse is either stopped or active:
streamReader, streamReaderErr := ipcStreamIt.Next()
if streamReaderErr != nil {
return streamReaderErr
}
arrowReader, arrowReaderErr := ipc.NewReader(streamReader)
if arrowReaderErr != nil {
return arrowReaderErr
}
defer arrowReader.Release()
arrowSchema = arrowReader.Schema()
However:
- It complicates the usage where the schema must be known before iterating on the Arrow RecordBatches
- It does not work when there are 0 rows in the result (cases 2 and 4 below)
Summary
| Case | Warehouse | Result | ArrowIPCStreamIterator.SchemaBytes() | ArrowIPCStreamIterator.Next() |
|---|---|---|---|---|
| 1 | Stopped (Cold start) | >= 1 rows | ❌ 0 bytes | ✅ not EOF -> Can use ipc.NewReader().ArrowSchema() |
| 2 | Stopped (Cold start) | 0 rows | ❌ 0 bytes | ❌ EOF |
| 3 | Active | >= 1 rows | ✅ valid bytes -> Can use ipc.NewReader().ArrowSchema() | ✅ not EOF -> Can use ipc.NewReader().ArrowSchema() |
| 4 | Active | 0 rows | ✅ valid bytes -> Can use ipc.NewReader().ArrowSchema() | ❌ EOF |
I can combine ArrowIPCStreamIterator.SchemaBytes() and ArrowIPCStreamIterator.Next() to work around cases 1 and 4, but I haven't found any workaround for case 2 (cold start with 0 rows), where neither call yields a schema.