Add API for ReadOnlySpan for efficient value handling
In .NET Core 2.1 and the upcoming netstandard 2.1, ReadOnlySpan<char> is available and makes it possible to access field values without allocating a string per field.
To support this, it is enough to add a ProcessValueInBuffer overload, or perhaps a method ReadOnlySpan<char> GetSpanValue(int idx).
Using ReadOnlySpan can give roughly a 10% additional speedup.
NOTE: the benchmark needs to be run in Release mode.
public ReadOnlySpan<char> GetReadOnlySpan(int idx)
{
    if (idx >= fieldsCount) throw new IndexOutOfRangeException();
    var f = fields[idx];
    if ((f.Quoted && f.EscapedQuotesCount > 0) || f.End >= bufferLength)
    {
        // escaped quotes (or a field that crosses the buffer boundary) cannot be
        // exposed as a slice of the internal buffer, so fall back to the materialized value
        var chArr = f.GetValue(buffer).ToCharArray();
        return new ReadOnlySpan<char>(chArr, 0, chArr.Length);
    }
    else if (f.Quoted)
    {
        // plain quoted field: slice the buffer without the surrounding quotes
        return new ReadOnlySpan<char>(buffer, f.Start + 1, f.Length - 2);
    }
    else
    {
        // unquoted field: a direct view over the internal char buffer, no allocation
        return new ReadOnlySpan<char>(buffer, f.Start, f.Length);
    }
}
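For illustration, a minimal usage sketch, assuming the proposed GetReadOnlySpan accessor were added to CsvReader; the column layout (an int in column 0, a double in column 1) is hypothetical. The int.Parse / double.Parse overloads that take ReadOnlySpan<char> exist since .NET Core 2.1, so numeric fields can be parsed without creating a string per field:

    using System;
    using System.Globalization;

    // hypothetical reading loop over a CsvReader that exposes GetReadOnlySpan(int)
    while (csvReader.Read())
    {
        ReadOnlySpan<char> idSpan = csvReader.GetReadOnlySpan(0);     // assumed int column
        ReadOnlySpan<char> priceSpan = csvReader.GetReadOnlySpan(1);  // assumed double column

        int id = int.Parse(idSpan, NumberStyles.Integer, CultureInfo.InvariantCulture);
        double price = double.Parse(priceSpan, NumberStyles.Float, CultureInfo.InvariantCulture);
        // use id/price here without having allocated a string per field
    }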
@skyyearxp have you performed any performance tests?
Yes, but the performance is not good enough. I am processing simple CSV files, so I am trying to use the simplest possible way to handle the data. The file may be 6 GB and contains time-format strings, double strings and int strings, and I am trying to parse the ints/doubles myself: read the buffer, scan it, find ',' and parse an int/double when needed, and start a new line when '\r''\n' is met. A sketch of that per-line scan is shown below.
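A minimal sketch of that approach (the method name and the numeric-column assumption are illustrative, and quoted fields are not handled): once a line is available as a ReadOnlySpan<char>, fields are sliced out at each ',' and parsed in place.

    // hypothetical per-line scanner: slice fields at ',' and parse numeric ones in place
    static void ParseLine(ReadOnlySpan<char> line)
    {
        while (true)
        {
            int comma = line.IndexOf(',');
            ReadOnlySpan<char> field = comma >= 0 ? line.Slice(0, comma) : line;

            // when the column is known to be numeric, parse directly from the span, e.g.
            // double d = double.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture);

            if (comma < 0) break;
            line = line.Slice(comma + 1);
        }
    }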
@skyyearxp The NReco.Csv parser's efficiency should be close to the maximum performance possible for CSV parsing (one that handles all valid CSVs) with C# on a single thread. A column value accessor that returns ReadOnlySpan<char> would avoid unnecessary allocations, but most likely the processing time would not change significantly.
Have you tried increasing the buffer size (CsvReader.BufferSize is 32 KB by default)? The performance of the underlying TextReader is also very important. For example, if you know that your CSV doesn't contain Unicode chars, use ASCII encoding instead of UTF-8, and try wrapping your input stream in a BufferedStream with a rather large buffer size.
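For example, something like this (a sketch only; the file name and the 1 MB / 256 KB buffer sizes are illustrative, and it assumes an ASCII-only file):

    using System.IO;
    using System.Text;
    using NReco.Csv;

    using (var fs = new FileStream("data.csv", FileMode.Open, FileAccess.Read))
    using (var bs = new BufferedStream(fs, 1 << 20))          // 1 MB read-ahead buffer
    using (var rdr = new StreamReader(bs, Encoding.ASCII))    // ASCII: no UTF-8 decoding cost
    {
        var csv = new CsvReader(rdr);
        csv.BufferSize = 1 << 18;   // raise the internal buffer above the 32 KB default
        while (csv.Read())
        {
            // access fields by index: csv[0], csv[1], ...
        }
    }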
If the parse speed is still not acceptable, the only remaining option is a multi-threaded implementation.
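One possible shape for that (a sketch only, not something NReco.Csv provides): keep the file read single-threaded and fan the parsing out to worker tasks through a bounded channel. Note that this still allocates one string per line, and it only helps if parsing rather than I/O is the bottleneck.

    using System.IO;
    using System.Text;
    using System.Threading.Channels;
    using System.Threading.Tasks;

    // one producer reads lines, several consumers parse them
    static async Task ParsePipelineAsync(string path, int workers)
    {
        var lines = Channel.CreateBounded<string>(10_000);

        var producer = Task.Run(async () =>
        {
            using (var rdr = new StreamReader(path, Encoding.ASCII))
            {
                string line;
                while ((line = rdr.ReadLine()) != null)
                    await lines.Writer.WriteAsync(line);
            }
            lines.Writer.Complete();
        });

        var consumers = new Task[workers];
        for (int w = 0; w < workers; w++)
        {
            consumers[w] = Task.Run(async () =>
            {
                while (await lines.Reader.WaitToReadAsync())
                {
                    while (lines.Reader.TryRead(out var line))
                    {
                        // split on ',' (line.AsSpan(), Slice) and parse numeric columns here
                    }
                }
            });
        }

        await producer;
        await Task.WhenAll(consumers);
    }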