
Can you support CSV format as input?

Open · sangensong opened this issue 3 years ago • 2 comments

Hi. I want to run full-text search over data that I store in ClickHouse, and the source files I insert into ClickHouse are CSV. I chose CSV for these inserts because CSV files take up less space than JSON files, insert faster, and are also faster to generate. Converting the CSV files to JSON is too slow.
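Much of the size gap comes from newline-delimited JSON repeating every field name in every record, while CSV names the columns once in a header line. The Python sketch below serializes the same made-up rows both ways and compares byte counts; the field names and data are hypothetical, and real ratios depend on the dataset.

```python
import csv
import io
import json

# Hypothetical sample rows; field names are illustrative only.
FIELDS = ["id", "repo", "event_type"]
ROWS = [[i, f"org/repo-{i}", "PushEvent"] for i in range(1000)]


def as_csv(fields, rows):
    """Serialize rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


def as_ndjson(fields, rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(dict(zip(fields, row))) + "\n" for row in rows)


csv_bytes = len(as_csv(FIELDS, ROWS).encode("utf-8"))
json_bytes = len(as_ndjson(FIELDS, ROWS).encode("utf-8"))
print(f"CSV: {csv_bytes} bytes, NDJSON: {json_bytes} bytes")
```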

I would like you to support an ingest method like this:

cat *.csv | ./quickwit index ingest --input-format csv --index gh-archive

OR

cat *.csv | ./quickwit index ingest --index gh-archive
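Until a CSV input format exists, one stopgap is to convert CSV to newline-delimited JSON on the fly and pipe it straight into the existing command, so no intermediate JSON files are written to disk. This is a minimal Python sketch, assuming each file starts with a header row whose column names match the index's fields, and assuming `quickwit index ingest` reads NDJSON from stdin when no input path is given. The script name `csv2ndjson.py` is hypothetical.

```python
import csv
import json
import sys
from typing import Iterable, Iterator, Sequence


def csv_to_ndjson(lines: Iterable[str], delimiter: str = ",") -> Iterator[str]:
    """Yield one JSON document per CSV record, keyed by the header row.

    All values are emitted as strings; no type conversion is attempted.
    """
    for record in csv.DictReader(lines, delimiter=delimiter):
        yield json.dumps(record, ensure_ascii=False) + "\n"


def main(paths: Sequence[str]) -> None:
    # Each file is parsed on its own, so every file's header row is consumed
    # as a header rather than emitted as a data record.
    for path in paths:
        # newline="" is what the csv module expects, so quoted fields that
        # contain line breaks are parsed correctly.
        with open(path, newline="", encoding="utf-8") as f:
            for doc in csv_to_ndjson(f):
                sys.stdout.write(doc)


if __name__ == "__main__":
    # Hypothetical usage:
    #   python3 csv2ndjson.py *.csv | ./quickwit index ingest --index gh-archive
    main(sys.argv[1:])
```

Taking file paths instead of `cat *.csv` on stdin lets each file's header be parsed separately; with `cat`, the headers of the second and later files would become data rows. Since every value stays a string, numeric fields may need coercion in the doc mapping or an extra conversion step.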

Looking forward to your reply. Thank you!

sangensong · May 20 '22 06:05

Hey, I'm also looking for CSV support. It would be great to allow a custom separator as well.
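With a real CSV parser, a custom separator is mostly a one-parameter change, whereas splitting lines on the separator by hand breaks as soon as a quoted value contains it. Here is a short Python comparison using the standard `csv` module's `delimiter` argument (which must be a single character); the sample line is made up.

```python
import csv
import io
from typing import List


def split_naive(line: str, sep: str) -> List[str]:
    """Plain str.split: wrong when a quoted value contains the separator."""
    return line.rstrip("\n").split(sep)


def split_csv(line: str, sep: str) -> List[str]:
    """Quote-aware split: the csv module honours double-quoted fields."""
    return next(csv.reader(io.StringIO(line), delimiter=sep))


line = '123;"hello; world";def\n'
print(split_naive(line, ";"))  # → ['123', '"hello', ' world"', 'def']
print(split_csv(line, ";"))    # → ['123', 'hello; world', 'def']
```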

leofvo · Jun 13 '23 18:06

You can use VRL (Vector Remap Language) to feed CSV into Quickwit and transform it:

# Your source config here
# ...
input_format: plain_text
transform:
  script: |
    # csv looks like: "123;abc;def"
    parsed_csv = parse_csv!(.plain_text, ";")
    .my_field1 = to_int!(parsed_csv[0])
    .my_field2 = parsed_csv[1]
    .my_field3 = parsed_csv[2]
    .original_csv = .plain_text
    del(.plain_text)
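To check sample lines before deploying the source, the VRL logic can be mirrored in a few lines of Python. This is only an offline approximation, not how Quickwit executes the script. It assumes that the `plain_text` input format delivers each line as a document with a single `plain_text` field, as the script above relies on, and that values are separated by `;` with standard double-quote quoting.

```python
import csv
import io


def transform(doc: dict) -> dict:
    """Python mirror of the VRL script, for checking sample lines offline.

    Input is a plain_text document, e.g. {"plain_text": "123;abc;def"}.
    Malformed lines raise here (ValueError, IndexError), loosely like the
    aborting `!` variants in VRL.
    """
    out = dict(doc)
    # parse_csv!(.plain_text, ";")
    parsed_csv = next(csv.reader(io.StringIO(out["plain_text"]), delimiter=";"))
    out["my_field1"] = int(parsed_csv[0])  # to_int!(parsed_csv[0])
    out["my_field2"] = parsed_csv[1]
    out["my_field3"] = parsed_csv[2]
    out["original_csv"] = out["plain_text"]
    del out["plain_text"]  # del(.plain_text)
    return out


print(transform({"plain_text": "123;abc;def"}))
```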

Currently, this can be made to work with the ingest API, but it isn't very user-friendly: the ingest API is a data source that's created automatically, and there is no supported way to edit it. Manually modifying the config stored in the metastore does work.

trinity-1686a · Feb 12 '24 11:02