Not working correctly on files with too many lines
Environment
- Ubuntu 18.04
- CLion 2019.1.4
Issue
I tested the program on a CSV file, 20k_rows_data.csv.txt, with 20K lines, and it does not work correctly. (I renamed the file with a .txt extension because GitHub issues do not support uploading .csv files.)
#include <csv/reader.hpp>
#include <iostream>
#include <string>

int main() {
    csv::Reader csv;
    csv.read("../tests/inputs/20k_rows_data.csv.txt");
    auto rows = csv.rows();
    auto cols = csv.cols();
    int row_count = 0;
    for (auto row : rows) {
        // Prefix each output line with its 1-based row number.
        std::string s = std::to_string(++row_count);
        for (auto col : cols) {
            s += ' ' + (std::string)(row[col]);
        }
        std::cout << s << std::endl;
    }
}
Part of the output looks like this (copied from my console):
5332
5333
5334 1 1 1 1 1
5335
5336
5337 1 1 1 1 1
5338
5339
5340
5341 1 1 1 1 1
5342
5343
Note that the output differs each time I run it.
I also have this issue. Is there a fix planned soon?
@p-ranav perhaps it makes sense to provide a simpler, single-threaded version of the reader implementation and make the threaded one opt-in via flags, at runtime or compile time? This issue kind of discouraged me.
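To make the single-threaded suggestion concrete, here is a rough sketch of what such a reader could look like. It is not the library's code: `split_line`, `read_rows`, and `Row` are hypothetical names, it assumes a single-character delimiter with a header on the first line, and it does not handle quoted fields or `\r\n` line endings.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type: column name -> value.
using Row = std::map<std::string, std::string>;

// Hypothetical helper: split one line on a single-character delimiter.
// Quoted fields are NOT handled; this is only a sketch.
std::vector<std::string> split_line(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string field;
    for (char c : line) {
        if (c == delim) {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field (possibly empty)
    return fields;
}

// Read every row on the calling thread: no queues, no worker threads.
// The first line is treated as the header row.
std::vector<Row> read_rows(std::istream& in, char delim = ',') {
    std::vector<Row> rows;
    std::string line;
    if (!std::getline(in, line)) return rows;  // empty input
    const std::vector<std::string> headers = split_line(line, delim);
    assert(!headers.empty());  // split_line always yields at least one field
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        const std::vector<std::string> values = split_line(line, delim);
        Row row;
        for (std::size_t i = 0; i < headers.size(); ++i) {
            // Missing trailing values become empty strings.
            row[headers[i]] = i < values.size() ? values[i] : std::string();
        }
        rows.push_back(std::move(row));
    }
    return rows;
}
```

Calling `read_rows` on a `std::istringstream` or `std::ifstream` returns rows in file order, and the result is the same on every run because nothing is shared between threads.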
Just fooling around with these changes, it looks like the test passes with std::map or std::unordered_map on my computer, but going back to unordered_flat_map causes it to produce blank records again, so I suspect there are race conditions going on.
diff --git a/include/csv/reader.hpp b/include/csv/reader.hpp
index 56542d7..0793d3b 100644
--- a/include/csv/reader.hpp
+++ b/include/csv/reader.hpp
@@ -46,8 +46,13 @@ SOFTWARE.
#include <iterator>
#include <atomic>
#include <string_view>
+#include <map>
namespace csv {
+ template<typename K, typename V>
+ using map_type = std::map<K, V>;
+ //using map_type = std::unordered_map<K, V>;
+ //using map_type = unordered_flat_map<K, V>;
class Reader {
public:
@@ -121,16 +126,16 @@ namespace csv {
bool ready() {
size_t rows = 0;
- number_of_rows_processed_.try_dequeue(rows);
- row_iterator_queue_.try_dequeue(ready_index_);
- bool result = (ready_index_ < expected_number_of_rows_ && ready_index_ < rows);
+ auto firstValid = number_of_rows_processed_.try_dequeue(rows);
+ auto secondValid = row_iterator_queue_.try_dequeue(ready_index_);
+ bool result = firstValid && secondValid && (ready_index_ < expected_number_of_rows_ && ready_index_ < rows);
return result;
}
- unordered_flat_map<std::string_view, std::string> next_row() {
+ map_type<std::string_view, std::string> next_row() {
row_iterator_queue_.enqueue(next_index_);
next_index_ += 1;
- unordered_flat_map<std::string_view, std::string> result;
+ map_type<std::string_view, std::string> result;
rows_.try_dequeue(rows_ctoken_, result);
return result;
}
@@ -218,8 +223,8 @@ namespace csv {
}
}
- std::vector<unordered_flat_map<std::string_view, std::string>> rows() {
- std::vector<unordered_flat_map<std::string_view, std::string>> rows;
+ std::vector<map_type<std::string_view, std::string>> rows() {
+ std::vector<map_type<std::string_view, std::string>> rows;
while (!done()) {
if (ready()) {
rows.push_back(next_row());
@@ -448,9 +453,9 @@ namespace csv {
std::string filename_;
std::ifstream stream_;
std::vector<std::string> headers_;
- unordered_flat_map<std::string_view, std::string> current_row_;
+ map_type<std::string_view, std::string> current_row_;
std::string current_value_;
- ConcurrentQueue<unordered_flat_map<std::string_view, std::string>> rows_;
+ ConcurrentQueue<map_type<std::string_view, std::string>> rows_;
ProducerToken rows_ptoken_;
ConsumerToken rows_ctoken_;
ConcurrentQueue<size_t> number_of_rows_processed_;
@@ -473,7 +478,7 @@ namespace csv {
ProducerToken values_ptoken_;
ConsumerToken values_ctoken_;
std::string current_dialect_name_;
- unordered_flat_map<std::string, Dialect> dialects_;
+ map_type<std::string, Dialect> dialects_;
Dialect current_dialect_;
size_t done_index_;
size_t ready_index_;
I noticed that try_dequeue returns a bool, but it is never checked. I'm also not sure what is happening in the next_row pathways: somehow ready() can succeed while next_row() still returns an empty record from the concurrent queue.
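As an illustration of why the unchecked return value matters, here is a small self-contained sketch. `SimpleQueue` is a hypothetical mutex-based stand-in for the concurrent queue, not the library's type, and `next_row_unchecked` and `try_next_row` are made-up names. The only assumption is that a failed `try_dequeue` returns false and leaves the output argument untouched.

```cpp
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

using Row = std::map<std::string, std::string>;

// Hypothetical stand-in for the concurrent queue used by the reader.
class SimpleQueue {
public:
    void enqueue(Row row) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(std::move(row));
    }

    // Returns false and leaves `out` untouched when the queue is empty.
    bool try_dequeue(Row& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<Row> items_;
};

// Mirrors the pattern in next_row(): the bool is ignored, so an empty
// queue silently yields a default-constructed (blank) row.
Row next_row_unchecked(SimpleQueue& q) {
    Row result;
    q.try_dequeue(result);
    return result;
}

// Checked variant: the caller learns whether a row was actually dequeued
// and can retry instead of storing a blank record.
bool try_next_row(SimpleQueue& q, Row& out) {
    return q.try_dequeue(out);
}
```

With the checked variant, a loop like the one in rows() could push a row only when `try_next_row` returns true, which would at least turn the blank records into retries, though it would not by itself explain why the queue is empty at that point.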
This code also produces the wrong answer: 0 instead of 1.
csv::Writer csvFile("Test.csv");
csvFile.configure_dialect()
    .delimiter(", ")
    .column_names("D", "O", "H", "L", "C", "V", "M");
csvFile.write_row("1", "2", "3", "4", "5", "6", "7");
csvFile.close();
csv::Reader csv;
csv.read("Test.csv");
auto rows = csv.rows();
cout << rows.size() << "\n";
If I write another row, then the answer is correct (2).
Edit: I see that the issue is closed, but the problem still exists, at least in the version provided by vcpkg.
Hello,
I'm working on a second implementation of this library: https://github.com/p-ranav/csv2. The reader is ready for use. Check it out. Hopefully it works better. I'm planning to archive this repo in favor of csv2.
Sorry again for all the issues you've faced with this library.