databricks-sql-go

ArrowIPCStreamIterator.SchemaBytes() returns zero bytes after a cold start

Open datbth opened this issue 2 months ago • 1 comments

Context

Given a Databricks SQL Warehouse that is currently stopped and this code:

	// ctx must exist before Connect is called.
	ctx := context.Background()

	driverCntor, driverErr := dbsql.NewConnector(options...)
	if driverErr != nil {
		return driverErr
	}

	conn, connErr := driverCntor.Connect(ctx)
	if connErr != nil {
		return connErr
	}
	defer conn.Close()

	rows, queryErr := conn.(driver.QueryerContext).QueryContext(ctx, "SELECT 1", []driver.NamedValue{})
	if queryErr != nil {
		return queryErr
	}
	defer rows.Close()

	ipcStreamIt, ipcStreamErr := rows.(dbsqlrows.Rows).GetArrowIPCStreams(ctx)
	if ipcStreamErr != nil {
		return ipcStreamErr
	}
	defer ipcStreamIt.Close()

	schemaBytes, schemaErr := ipcStreamIt.SchemaBytes()
	if schemaErr != nil {
		return schemaErr
	}

	if len(schemaBytes) == 0 {
		return fmt.Errorf("no schema bytes")
	}

Expected result

No error

Actual result

Error: no schema bytes

Notes

  • This only happens when the Databricks SQL Warehouse is stopped
  • When the Databricks SQL Warehouse is active, schemaBytes contains correct data

datbth, Dec 24 '25 03:12

Reading the schema from the first IPC stream works whether the Databricks SQL Warehouse is stopped or active:

	streamReader, streamReaderErr := ipcStreamIt.Next()
	if streamReaderErr != nil {
		return streamReaderErr
	}
	arrowReader, arrowReaderErr := ipc.NewReader(streamReader)
	if arrowReaderErr != nil {
		return arrowReaderErr
	}
	defer arrowReader.Release()

	arrowSchema = arrowReader.Schema()

However:

  • It complicates usage when the schema must be known before iterating over the Arrow RecordBatches
  • It does not work when there are 0 rows in the result (cases 2 and 4 below)

Summary

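| Case | Warehouse            | Result    | ArrowIPCStreamIterator.SchemaBytes()              | ArrowIPCStreamIterator.Next()                  |
|------|----------------------|-----------|---------------------------------------------------|------------------------------------------------|
| 1    | Stopped (cold start) | >= 1 rows | ❌ 0 bytes                                        | ✅ not EOF -> can use ipc.NewReader().Schema() |
| 2    | Stopped (cold start) | 0 rows    | ❌ 0 bytes                                        | ❌ EOF                                         |
| 3    | Active               | >= 1 rows | ✅ valid bytes -> can use ipc.NewReader().Schema() | ✅ not EOF -> can use ipc.NewReader().Schema() |
| 4    | Active               | 0 rows    | ✅ valid bytes -> can use ipc.NewReader().Schema() | ❌ EOF                                         |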

I could combine ArrowIPCStreamIterator.SchemaBytes() and ArrowIPCStreamIterator.Next() to work around cases 1 and 4, but I haven't found any workaround for case 2.

datbth, Dec 24 '25 04:12