
Wrong stats on repeated attempts of read after transform

Open chenweiss opened this issue 2 years ago • 2 comments

Frictionless version: 5.10.1

Overview

Attempts to read rows from a transformed CSV file fail once the rows have been exhausted for the first time (either via read_rows or just by running validate on it).

Steps to reproduce

Consider the following CSV file:

a,b
1,2

And the following code snippet:

import frictionless
from frictionless import Pipeline, Resource, steps

r = Resource("test.csv")
print(len(r.read_rows()))  # prints 1
print(len(r.read_rows()))  # prints 1

# Pack field "a" into a new object field named "test"
pipeline = Pipeline(
    steps=[steps.field_pack(name="test", from_names=["a"], as_object=True)]
)
r2 = frictionless.transform(r, pipeline=pipeline)
print(len(r2.read_rows()))  # prints 1
print(len(r2.read_rows()))  # prints 0 (expected 1)
print(len(r.read_rows()))  # prints 0 (expected 1)

I'd expect the number of rows to remain the same until the very end.
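To make the symptom concrete, here is a minimal standard-library sketch of one way counts like 1 followed by 0 can arise: a row stream that is created once and then shared, so a second read finds it already exhausted. This is only an assumption about the kind of mechanism involved, not frictionless's actual implementation. `OneShotSource` and `ReopeningSource` are hypothetical names invented for the illustration.

```python
import csv
import io

CSV_TEXT = "a,b\n1,2\n"


class OneShotSource:
    """Hypothetical stand-in: the row stream is created once and shared."""

    def __init__(self, text):
        self._reader = csv.DictReader(io.StringIO(text))

    def read_rows(self):
        # Consumes the shared reader; later calls find it exhausted.
        return list(self._reader)


class ReopeningSource:
    """Hypothetical stand-in: every read starts from a fresh stream."""

    def __init__(self, text):
        self._text = text

    def read_rows(self):
        # A new reader per call, so repeated reads see the same rows.
        return list(csv.DictReader(io.StringIO(self._text)))


one_shot = OneShotSource(CSV_TEXT)
one_shot_counts = [len(one_shot.read_rows()) for _ in range(2)]

reopening = ReopeningSource(CSV_TEXT)
reopening_counts = [len(reopening.read_rows()) for _ in range(2)]

print(one_shot_counts)   # [1, 0]
print(reopening_counts)  # [1, 1]
```

The second pattern is the behaviour I'd expect from read_rows on both the original and the transformed resource.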

More generally, I have run into rows disappearing after basic transformations. Perhaps some test cases should be added to cover this? :thinking:

chenweiss avatar Mar 29 '23 01:03 chenweiss

When I change the step to field_add instead of field_pack, the problem does not occur, so there is probably a bug in the field_pack step.

chenweiss avatar Mar 29 '23 01:03 chenweiss

Thanks for reporting!

roll avatar Apr 03 '23 11:04 roll