evidence icon indicating copy to clipboard operation
evidence copied to clipboard

Boolean column with all nulls causes `npm run sources` sources to fail

Open archiewood opened this issue 2 years ago • 7 comments

Steps To Reproduce

Create a source query with a boolean column containing only nulls

select null::bool as country, 100 as sales 
union all 
select null::bool as country, 200 as sales
union all 
select null::bool as country, 300 as sales

Environment

  • Database: postgres, duckdb (assume also others but not tested)

Expected Behavior

A column with nulls should be returned

Actual Behaviour

Invalid argument error: Need at least 1 bytes for bitmap in buffers[0] in array of type Boolean, but got 0

image

Workarounds

Don't select this column with nulls as it is contains no information anyway

archiewood avatar Jan 18 '24 22:01 archiewood

Only occurs on bool columns

mcrascal avatar Jan 22 '24 20:01 mcrascal

https://github.com/apache/arrow-rs/tree/master/arrow-data/src/data.rs#L778

Comes from arrow-rs, which is a dependency of parquet-wasm, so it's something in multipart parquet creation

csjh avatar Jan 22 '24 20:01 csjh

Awaiting apache/arrow#39776

csjh avatar Jan 23 '24 22:01 csjh

@csjh in the meantime can we provide something more informative in the error message?

mcrascal avatar Jan 23 '24 22:01 mcrascal

Yeah, but something that just occurred to me is that we're still pinned on version 11.0.0, so even if it is fixed upstream we'll still have the faulty version (at least until duckdb-wasm bumps their versions?). Not sure what the best message to put in the error/warning would be considering that.

csjh avatar Jan 23 '24 22:01 csjh

How bad would it be to just convert boolean columns to true/false? Or maybe having a special case when the column is fully null?

csjh avatar Jan 23 '24 23:01 csjh

Special case when column is fully null feels not too bad.

I'd probably advocate for one of the following

  1. Error Message along the lines of:
  • Error: Column "column_name" is of type Boolean and contains only null values, which is not allowed. Consider removing the column from the query or casting it to another type
  1. Cast it to a string by default, and print a warning:
  • Warning: Column "column_name" (type Boolean) contains only null values so has been cast to varchar
  1. Fill with false by default, and print a warning:
  • Warning: Column "column_name" (type Boolean) contains only null values so has been infilled with false values.

archiewood avatar Jan 23 '24 23:01 archiewood

@csjh is this done?

archiewood avatar Mar 04 '24 23:03 archiewood

Fixed by #1521

csjh avatar Mar 04 '24 23:03 csjh