sqlite-xsv icon indicating copy to clipboard operation
sqlite-xsv copied to clipboard

csv_reader declaration to include columns with spaces

Open opustecnica opened this issue 2 years ago • 3 comments

While in agreement that spaces should never be used in a column name, at times, when dealing with externally sourced csvs, spaces are indeed present.

When attempting to create a virtual table that includes spaces, a 'vtable constructor failed: iavm_reader' error is returned.

e.g. CREATE VIRTUAL TABLE temp.iavm_reader using csv_reader(plugin, [plugin name])

Is there a way to work around this?

PS When creating a virtual table directly there are no issues with the header.

e.g. CREATE VIRTUAL TABLE temp.iavm_csv using csv(filename=iavm.csv)

opustecnica avatar Sep 21 '23 19:09 opustecnica

not yet, this is a limitation with the custom parser sqlite-xsv has that tries to parse column names + definitions on that case. It splits by word and doesn't take into account spaces/[] brackets/"" quotes, which the SQLite core library does.

But it should! Like you mentioned there are workaround if you use filename=, but that's not always possible. Will look into seeing how much effort it'll be to update that naive parser.

asg017 avatar Sep 21 '23 20:09 asg017

Alex,

To the best of my understanding, columns with (unfortunate) spaces can be handled in SQLite in 3 ways: .1 'Item Type' .2 "Item Type" .3 [Item Type] (this is my preferred way)

When creating a VT like:

CREATE VIRTUAL TABLE IF NOT EXISTS temp.xsv_5m_Sales_Records_Test USING csv(filename='D:\Temp\5m-Sales-Records.csv.zst', header=false,Region TEXT, Country TEXT, [Item Type] TEXT

it fails. When replacing "[]" with single quotes, the creation goes through but the results are:

Region Country Item
------ ------- ----
Europe Poland  Beverages
Europe Poland  Clothes

Using alternate delimiters like '`' (backtick) allows the table creation to complete but the column will assume the generic name of 'Column1'

Region Country Column1
------ ------- -------
Europe Poland  Beverages
Europe Poland  Clothes
Europe Poland  Cosmetics
Europe Poland  Snacks

opustecnica avatar Nov 19 '23 01:11 opustecnica

Alex, if I understand correctly the code despite my turkey induced drowsiness, the column space issue should be resolved in the sqlite-loadable-rs (vtab_argparse.rs). This is an attempt. Let me know your thoughts.

pub fn arg_is_column_declaration(arg: &str) -> Result<Option<ColumnDeclaration>, String> {
    if arg.trim().is_empty() {
        return Ok(None);
    }

    // TODO this is a bit of a hack, but it works for now
    // NOTE Test if 'arg' contains characters that imply the column name has a space.
    let regex_check = Regex::new(r#"[\['"]"#).unwrap();
    // NOTE Assign an appropriate regex string based on the above check.
    let regex_string = if regex_check.is_match(arg) {
        r#"^[\['"](?<name>.*?)[\]'"]\s+(?<declared_type>.*?)\s+(?<constraints>.*)$"#
    } else {
        r#"^(?<name>.*?)\s+(?<declared_type>.*?)\s+(?<constraints>.*)$"#
    };
    // NOTE Compile the regex string into a regex object.
    let regex_result = Regex::new(regex_string);
    // NOTE Check if the regex compiled successfully and assign the results to variables.
    match regex_result {
        Ok(re) => match re.captures(arg) {
            Some(caps) => {
                /* 
               // TEST
                println!(
                    "Found match: Name: {0} - Declared Type: {1} - Constraints: {2}",
                    &caps["name"].trim(),
                    &caps["declared_type"].trim(),
                    &caps["constraints"].trim()
                );
                */
                // TODO Verify scope.
                name = &caps["name"].trim();
                declared_type = &caps["declared_type"].trim();
                constraints = &caps["constraints"].trim();
            }
            None => {
                println!("No match found!");
            }
        },
        Err(err) => {
            println!("Could not compile regex: {}", err);
        }
    }

    Ok(Some(ColumnDeclaration::new(
        name,
        declared_type,
        constraints,
    )))
}

Happy TG!

opustecnica avatar Nov 25 '23 18:11 opustecnica