
Add API for ReadOnlySpan for efficient values handling

Open VitaliyMF opened this issue 7 years ago • 4 comments

In .NET Core 2.1 and the upcoming netstandard 2.1, ReadOnlySpan can be used for zero-allocation handling of string values, without the need to create a 'string' object from the buffer.

To support this, it is enough to add a ProcessValueInBuffer overload, or maybe to add a method ReadOnlySpan<char> GetSpanValue(int idx).
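
To illustrate what such an accessor would enable, here is a minimal sketch of parsing a number directly from a slice of a char buffer. The class and method names (SpanValueDemo, ParseIntField) and the sample buffer are hypothetical and not part of NReco.Csv; only the BCL overload int.Parse(ReadOnlySpan<char>, ...) is relied on.

```csharp
using System;
using System.Globalization;

public static class SpanValueDemo
{
    // Parses the field located at [start, start + length) in the buffer as an int.
    // No intermediate string is created: int.Parse accepts a ReadOnlySpan<char>
    // on .NET Core 2.1+ / netstandard 2.1.
    public static int ParseIntField(char[] buffer, int start, int length)
    {
        var value = new ReadOnlySpan<char>(buffer, start, length);
        return int.Parse(value, NumberStyles.Integer, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // Hypothetical buffer content; the second field starts at index 3.
        char[] buffer = "id,42,3.5".ToCharArray();
        Console.WriteLine(ParseIntField(buffer, 3, 2));
    }
}
```

A span-returning accessor on the reader would let the caller hand the returned value to such a parse call without the string allocation in between.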

VitaliyMF avatar Nov 28 '18 14:11 VitaliyMF

Using ReadOnlySpan can give roughly a 10% speedup.

NOTE: this needs to be run in Release mode!

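		public ReadOnlySpan<char> GetReadOnlySpan(int idx)
		{
			if (idx < 0 || idx >= fieldsCount) throw new IndexOutOfRangeException();

			var f = fields[idx];

			if ((f.Quoted && f.EscapedQuotesCount > 0) || f.End >= bufferLength)
			{
				// Escaped quotes must be unescaped, and a field whose End reaches past
				// bufferLength cannot be exposed as one contiguous slice of the buffer:
				// fall back to building the string. AsSpan() wraps that string directly,
				// so no extra char[] copy is needed.
				return f.GetValue(buffer).AsSpan();
			}
			else if (f.Quoted)
			{
				// Quoted without escapes: skip the opening and closing quote.
				return new ReadOnlySpan<char>(buffer, f.Start + 1, f.Length - 2);
			}
			else
			{
				// Plain field: slice the buffer in place, no allocation.
				return new ReadOnlySpan<char>(buffer, f.Start, f.Length);
			}
		}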



skyyearxp avatar May 05 '20 10:05 skyyearxp

@skyyearxp have you performed any performance tests?

VitaliyMF avatar May 07 '20 09:05 VitaliyMF

Yes, but the performance is still not good enough. I am processing a simple CSV file, so I am trying to use the simplest possible way to handle the CSV data. The file may be 6 GB and contains time-format strings, doubles and ints as text. I am trying to parse the int/double values myself: read the buffer, scan it, find ',' and parse int/double when needed, and start a new line when '\r' '\n' is met.
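
A rough sketch of the manual approach described above, assuming a simple CSV with no quoted fields and ',' as the only delimiter. The names ManualCsvScan and SumColumn are made up for illustration; this is not skyyearxp's actual code.

```csharp
using System;
using System.Globalization;

public static class ManualCsvScan
{
    // Sums the numeric column `column` over all lines in `data`.
    // Assumption: no quoted fields, ',' delimiter, "\n" or "\r\n" line ends.
    public static double SumColumn(ReadOnlySpan<char> data, int column)
    {
        double sum = 0;
        while (!data.IsEmpty)
        {
            // Cut one line off the front of the remaining data.
            int eol = data.IndexOf('\n');
            ReadOnlySpan<char> line = eol < 0 ? data : data.Slice(0, eol);
            data = eol < 0 ? ReadOnlySpan<char>.Empty : data.Slice(eol + 1);
            if (!line.IsEmpty && line[line.Length - 1] == '\r')
                line = line.Slice(0, line.Length - 1);
            if (line.IsEmpty) continue;

            // Skip the fields in front of the requested column.
            for (int i = 0; i < column; i++)
            {
                int comma = line.IndexOf(',');
                if (comma < 0) { line = ReadOnlySpan<char>.Empty; break; }
                line = line.Slice(comma + 1);
            }

            // Parse the field in place; values that do not parse are ignored.
            int end = line.IndexOf(',');
            ReadOnlySpan<char> field = end < 0 ? line : line.Slice(0, end);
            if (double.TryParse(field, NumberStyles.Float, CultureInfo.InvariantCulture, out double v))
                sum += v;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumColumn("a,1.5,x\r\nb,2.5,y\n", 1));
    }
}
```

For a 6 GB file the data would have to be fed in chunks, with a line that straddles two chunks carried over; that part is left out of this sketch.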

skyyearxp avatar May 10 '20 13:05 skyyearxp

@skyyearxp NReco.Csv parser efficiency should be close to the maximum performance possible for CSV parsing (parsing that handles all valid CSVs) in single-threaded C#. Using a CSV column value accessor that returns ReadOnlySpan<char> should avoid unnecessary allocations, but most likely processing time will not change significantly.

Have you tried increasing the buffer size (CsvReader.BufferSize is 32 KB by default)? The performance of the underlying TextReader is also very important. For example, if you know that your CSV contains only ASCII characters, use ASCII encoding instead of UTF8, and try wrapping your input stream in a BufferedStream with a rather large buffer size.
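
The reader setup suggested above could look roughly like this. The helper name OpenAsciiReader and the buffer sizes are illustrative assumptions, not measured optimums; only standard System.IO types are used.

```csharp
using System;
using System.IO;
using System.Text;

public static class FastReaderSetup
{
    // Wraps a raw stream in a BufferedStream and decodes it as ASCII.
    // Only safe if the input is known to contain ASCII characters only.
    // The default sizes below are arbitrary illustrative values.
    public static TextReader OpenAsciiReader(Stream input, int streamBufferSize = 1 << 20)
    {
        var buffered = new BufferedStream(input, streamBufferSize);
        return new StreamReader(
            buffered,
            Encoding.ASCII,
            detectEncodingFromByteOrderMarks: false,
            bufferSize: 64 * 1024);
    }

    public static void Main()
    {
        // In-memory sample instead of a real file, to keep the sketch self-contained.
        var bytes = Encoding.ASCII.GetBytes("a,b\n1,2\n");
        using (var reader = OpenAsciiReader(new MemoryStream(bytes)))
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
}
```

The resulting TextReader would then be passed to the CSV reader, with CsvReader.BufferSize raised above its default as mentioned above; for a real file the MemoryStream would be replaced by a FileStream.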

If parse speed is still not acceptable, the only way is to use a multi-threaded implementation.
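
One possible shape of a multi-threaded variant, as a sketch only: lines are read on one thread and parsed in parallel with PLINQ. It assumes no quoted fields and no line breaks inside values, so lines can be handled independently. ParallelCsvSum and its methods are hypothetical names, and this is not how NReco.Csv works internally.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class ParallelCsvSum
{
    // Sums column `column` over all lines, parsing lines on multiple threads.
    // Assumption: simple CSV, so every line can be parsed on its own.
    public static double SumColumn(TextReader reader, int column)
    {
        return ReadLines(reader)
            .AsParallel()
            .Select(line => ParseField(line, column))
            .Sum();
    }

    // Reading stays sequential; only the parsing is spread across threads.
    private static IEnumerable<string> ReadLines(TextReader reader)
    {
        while (reader.ReadLine() is string line)
            yield return line;
    }

    // Returns the value of the requested field, or 0 if it is missing or not numeric.
    private static double ParseField(string line, int column)
    {
        ReadOnlySpan<char> rest = line.AsSpan();
        for (int i = 0; i < column; i++)
        {
            int comma = rest.IndexOf(',');
            if (comma < 0) return 0;
            rest = rest.Slice(comma + 1);
        }
        int end = rest.IndexOf(',');
        ReadOnlySpan<char> field = end < 0 ? rest : rest.Slice(0, end);
        return double.TryParse(field, NumberStyles.Float, CultureInfo.InvariantCulture, out double v) ? v : 0;
    }

    public static void Main()
    {
        Console.WriteLine(SumColumn(new StringReader("a,1.5\nb,2.5\nc,4\n"), 1));
    }
}
```

This version still allocates one string per line, and for very cheap per-line work the coordination overhead may outweigh the gain; handing out larger batches of lines per task is a likely next step.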

VitaliyMF avatar May 11 '20 08:05 VitaliyMF