
Generating `WARNING: No newline at ending of file 'nonewline.fasta'` during construction results in a broken/incorrect index

Open tmaklin opened this issue 1 year ago • 3 comments

I ran fulgor on broken input: some of the files triggered `WARNING: No newline at ending of file 'nonewline.fasta'` from ggcat, and I noticed that the index fulgor builds is wrong when this happens.

For example, a large .fur index was 206G on disk when built from the broken inputs; after the inputs were fixed, it grew to 281G, which is closer to what I expected. Queries on the first index ran, but the results contained no matches to the broken input files.

The size difference also reproduces on artificial data containing a file that triggers the warning.

So, just as a heads-up, it might be better to abort if the inputs have this error. I've also reported this to ggcat and suggested that the warning should be an error.

tmaklin avatar Aug 21 '24 06:08 tmaklin

Hi Tommi, and thanks for the suggestion. I agree: if the input is broken, GGCAT should abort the construction (and Fulgor too, in turn). Right now, I don't think there is a way to fix this in Fulgor, as the warning is just a printed message. Right?

jermp avatar Aug 21 '24 07:08 jermp

Yeah, it's just text printed from Rust using `eprintln` (https://github.com/algbio/ggcat/blob/a91ecc97f286b737b37195c0a86f0e11ad6bfc3b/crates/io/src/lines_reader.rs#L155), so detecting it would require either capturing and parsing that output, or checking the input files somewhere within the fulgor code. I don't think this is a very common error to run into, though, so it's probably OK to wait and see if ggcat changes this.
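
In the meantime, the second option (checking the input files) can also be done outside fulgor, before construction. The following is only a rough sketch of such a pre-flight check, not code from fulgor or ggcat. The function names are made up for illustration, and it assumes plain uncompressed FASTA files (a gzipped input would need to be decompressed first). It also flags empty files, which may or may not be what you want.

```python
import os


def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            # Assumption: an empty file is treated as a problem too.
            return False
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def find_unterminated(paths):
    """Return the input paths that do not end with a newline, in order."""
    return [str(p) for p in paths if not ends_with_newline(p)]
```

Running `find_unterminated` over the list of input files and aborting if it returns anything would catch the problem before the index is built.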

tmaklin avatar Aug 21 '24 07:08 tmaklin

Ok, as I thought. I'll leave this issue open anyway as a reminder.

thanks!

jermp avatar Aug 21 '24 07:08 jermp