Write to a file directly from a DataReader
I recently had a need to write to a CSV file directly from a SqlDataReader. The data sets were large, in some cases more than 5 GB, so I wanted to stream directly to a file instead of materializing a DataTable. I also didn't want to create a wrapper class for each output file, because there were a lot of them and more could be added. I didn't see any existing way to do this, but I was able to do it pretty easily by wrapping the SqlDataReader in a dynamic object and passing that into the CsvWriter.
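Roughly, the wrapper looks like this (a simplified, untested sketch; ToDynamicRecords is my own helper, not part of CsvHelper). Each row becomes an ExpandoObject whose property names are the reader's column names, and CsvWriter writes the header from the first record:

using System.Collections.Generic;
using System.Data;
using System.Dynamic;
using System.Globalization;
using System.IO;
using CsvHelper;

static IEnumerable<dynamic> ToDynamicRecords(IDataReader reader)
{
    while (reader.Read())
    {
        // ExpandoObject implements IDictionary<string, object>, so columns
        // can be added by name.
        IDictionary<string, object> record = new ExpandoObject();
        for (var i = 0; i < reader.FieldCount; i++)
        {
            record[reader.GetName(i)] = reader.GetValue(i);
        }
        yield return record;
    }
}

// Usage: rows are streamed one at a time, never materialized as a whole.
using var writer = new StreamWriter("output.csv");
using var csv = new CsvWriter(writer, CultureInfo.InvariantCulture);
csv.WriteRecords(ToDynamicRecords(sqlDataReader));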
If this functionality already exists and I just missed it, let me know. Otherwise, would you be interested in this capability as a new feature?
This is covered in the "old" documentation. Also asked (#234).
It's not an intrinsic feature, but it's quite simple to stitch together. Using the WriteField API also lets you avoid the cost of dynamic objects. Note that async variants are available too, if your situation can benefit from them. Look out for #967 if you use Flush().
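For reference, the basic shape is just this (an untested sketch, assuming an already-open IDataReader named reader):

using System.Data;
using System.Globalization;
using System.IO;
using CsvHelper;

using var writer = new StreamWriter("output.csv");
using var csv = new CsvWriter(writer, CultureInfo.InvariantCulture);

// Header row from the column names.
for (var i = 0; i < reader.FieldCount; i++)
{
    csv.WriteField(reader.GetName(i));
}
csv.NextRecord();

// One CSV record per data row, with no dynamic objects involved.
while (reader.Read())
{
    for (var i = 0; i < reader.FieldCount; i++)
    {
        csv.WriteField(reader.GetValue(i));
    }
    csv.NextRecord();
}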
Thanks for the quick reply. You're right, it is simple to implement this (and other extensions), and there are multiple ways to do it, which in my opinion is one of the marks of a well-built library, so kudos to everyone who's worked on this. What I'm asking is whether there would be any interest in adding this as a native feature, since I already have the code done. I would just have to migrate it from my project to this one and add the unit tests. If there is interest but you would prefer not to use dynamic objects, I think it could also be added pretty easily by creating a DataReaderRecordWriter in the Expressions namespace.
I'm currently investigating this solution as well. If anyone can provide a code example that already does this, it would be greatly appreciated. I'll keep investigating in the meantime and report back here with a solution. I actually got here from the first page of a Google search, if you can believe that.
Here's the solution I came up with and have tested. It's only part of the body of a larger method that needs refactoring, but I think the code is pretty understandable and can easily be adapted.
This snippet of code is also available as a Gist if you want to share it.
try
{
    await using var sqlDataReader = await sqlCommand.ExecuteReaderAsync(cancellationToken);
    await using var streamWriter = new StreamWriter($"{reportName}_{startDate:dd-MM-yyyy}-{endDate:dd-MM-yyyy}.csv");
    await using var csvWriter = new CsvWriter(streamWriter, _csvConfiguration);

    var columnSchema = sqlDataReader.GetColumnSchema();
    var columnCount = columnSchema.Count;

    // Write the header row from the reader's column schema.
    foreach (var columnName in columnSchema.Select(column => column.ColumnName))
    {
        csvWriter.WriteField(columnName);
    }
    await csvWriter.NextRecordAsync();

    // Write one CSV record per data row.
    while (await sqlDataReader.ReadAsync(cancellationToken))
    {
        var values = new object[columnCount];
        sqlDataReader.GetValues(values);
        foreach (var value in values)
        {
            csvWriter.WriteField(value);
        }
        await csvWriter.NextRecordAsync();
    }

    if (streamWriter.BaseStream is FileStream fileStream)
    {
        _logger.LogInformation("Wrote report file at {0}", fileStream.Name);
    }
}
catch (Exception e)
{
    _logger.LogError("Error while reading SQL data and writing the CSV file: {0}", GetExceptionMessageFirstLine(e));
}
An improvement to @brokenthorn's solution that avoids allocating an array for every row is to iterate through the columns of the row that has just been read, like so:
await using (var db = new SqlConnection(Connection.ConnectionString))
await using (var cmd = db.CreateCommand())
{
    await db.OpenAsync();

    cmd.CommandText = @"
        select *
        from MyTable";
    cmd.CommandTimeout = 5 * 60;

    await using (var reader = await cmd.ExecuteReaderAsync())
    await using (var writer = new StreamWriter(Path.Combine(MyExtensions.TempFolder, "MyFile.csv")))
    await using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
    {
        // Write the header row from the column schema.
        var schema = await reader.GetColumnSchemaAsync();
        foreach (var columnName in schema.Select(c => c.ColumnName))
        {
            csv.WriteField(columnName);
        }
        await csv.NextRecordAsync();

        // Read fields by ordinal so no per-row array is allocated.
        while (await reader.ReadAsync())
        {
            for (int i = 0; i < reader.FieldCount; i++)
            {
                csv.WriteField(reader[i]);
            }
            await csv.NextRecordAsync();
        }
    }
}
@c0shea, ah! nice catch! good point!
I noticed you've also removed the exception handling.
One other thing: perhaps reading each field with reader[i] in a loop has more overhead than reading the whole row at once? I'm not sure...
You've removed the per-row allocations entirely. But maybe my array allocation could instead have been moved out of the while loop and the array reused for each row, clearing it before setting new values? Maybe that would be more performant?
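Something like this (an untested sketch, reusing the names from my earlier snippet):

var values = new object[sqlDataReader.FieldCount];
while (await sqlDataReader.ReadAsync(cancellationToken))
{
    // Clear stale values before refilling the shared buffer.
    Array.Clear(values, 0, values.Length);
    sqlDataReader.GetValues(values);
    foreach (var value in values)
    {
        csvWriter.WriteField(value);
    }
    await csvWriter.NextRecordAsync();
}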
I would benchmark but I don't have the incentive anymore. 🤷