uproot5 icon indicating copy to clipboard operation
uproot5 copied to clipboard

Aryan forth reader latest

Open aryan26roy opened this issue 3 years ago • 1 comments

aryan26roy avatar Jul 06 '22 18:07 aryan26roy

@tamasgal, the backdoor that you rely on that calls specialized C++ for doubly nested lists (std::vector<float>[], if I remember right) will be closed in Awkward v2, which corresponds to Uproot v5. The last time that we accidentally closed this door was in issue #572. At that time, I told you about the project that @aryan26roy is working on now (see https://github.com/scikit-hep/uproot5/issues/572#issuecomment-1066963522).

As you can see in this PR, that work has started. Actually, this PR is after @aryan26roy has already settled on a way to proceed, and he is now adding AwkwardForth to everything. In the near future, it would be a good time to test your custom Interpretations to be sure that the right AwkwardForth is generated for them. That will probably be after this PR is closed, I'm just giving you a heads-up.

Maybe a first step would be to ensure that we're planning to generate AwkwardForth along all the code paths that your Interpretations touch. These tests were made to cover all the code paths, and those sites where AwkwardForth need to be generated are annotated with comments like

https://github.com/scikit-hep/uproot5/blob/92b0b08552fa6f5797fb8b373ee8b660388e9e58/src/uproot/streamers.py#L1225

However, there were a few of these that didn't seem to have any test cases, like this one:

https://github.com/scikit-hep/uproot5/blob/92b0b08552fa6f5797fb8b373ee8b660388e9e58/src/uproot/streamers.py#L1333

If your data and Interpretations help us increase our coverage, that would be great! (If not, these sites would toggle a flag on AwkwardForth generation saying to give up and run slow, which is no worse than they are right now.)

I should also let you know that the current main branch is for Uproot v5 and it already has your backdoor removed. The main-v4 branch is what we use to make new bug-fix releases for Uproot v4.

jpivarski avatar Jul 19 '22 19:07 jpivarski

Sorry for the very late response, I was very busy and abroad and just got a little time window to look over the notifications.

These are awesome news and it seems to be a huge achievement!

I am very curious about the new AwkwardForth implementation and will test them as soon as possible. However, please do not expect any extensive feedback before mid September.

tamasgal avatar Aug 16 '22 15:08 tamasgal

Alright, I had a little exploration, but not sure what exactly I should expect, so let me boldly post something which I guess should work out of the box. This uses a test file sample available in the km3net_testdata Python package (I attached the file here: km3net_online.root.zip). I think this file should also be part of the uproot testdata package.

In the below REPL session, I am trying to access f["KM3NET_EVENT/triggeredHits"].array() which is a vector of vectors of JDAQTriggeredHit, a simple structure which I used to interpret with the following uproot3 code:

            triggered_hits = f["KM3NET_EVENT/triggeredHits"].array(
                uproot3.asjagged(
                    uproot3.astable(
                        uproot3.asdtype(
                            [
                                ("dom_id", ">i4"),
                                ("channel_id", "u1"),
                                ("time", "<u4"),
                                ("tot", "u1"),
                                (" cnt", "u4"),
                                (" vers", "u2"),
                                ("trigger_mask", ">u8"),
                            ]
                        )
                    ),
                    skipbytes=10,
                )
            )

Here is the sample session with the current aryan-forth-reader-latest branch of uproot5:

In [1]: import uproot
uproot.
In [2]: uproot.__version__
Out[2]: '5.0.0rc1'

In [3]: uproot.__file__
Out[3]: '/Users/tamasgal/Dev/uproot5/src/uproot/__init__.py'

In [4]: from km3net_testdata import data_path

In [5]: f = uproot.open(data_path("online/km3net_online.root"))

In [6]: f.keys()
Out[6]:
['JTRIGGER::JTriggerParameters;1',
 'META;1',
 'META/JMeta;1',
 'META/JDataWriter;1',
 'E;1',
 'KM3NET_TIMESLICE;1',
 'KM3NET_TIMESLICE_L0;1',
 'KM3NET_TIMESLICE_L1;1',
 'KM3NET_TIMESLICE_L2;1',
 'KM3NET_TIMESLICE_SN;1',
 'KM3NET_EVENT;1',
 'KM3NET_SUMMARYSLICE;1']

In [7]: f["KM3NET_EVENT"].keys()
Out[7]:
['KM3NET_EVENT',
 'KM3NET_EVENT/KM3NETDAQ::JDAQPreamble',
 'KM3NET_EVENT/KM3NETDAQ::JDAQEventHeader',
 'KM3NET_EVENT/triggeredHits',
 'KM3NET_EVENT/snapshotHits']

In [8]: f["KM3NET_EVENT/triggeredHits"].keys()
Out[8]: []

In [9]: f["KM3NET_EVENT/triggeredHits"].array()
---------------------------------------------------------------------------
CannotBeAwkward                           Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 f["KM3NET_EVENT/triggeredHits"].array()

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:2025, in TBranch.array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
   2019             for (
   2020                 basket_num,
   2021                 range_or_basket,
   2022             ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
   2023                 ranges_or_baskets.append((branch, basket_num, range_or_basket))
-> 2025 _ranges_or_baskets_to_arrays(
   2026     self,
   2027     ranges_or_baskets,
   2028     branchid_interpretation,
   2029     entry_start,
   2030     entry_stop,
   2031     decompression_executor,
   2032     interpretation_executor,
   2033     library,
   2034     arrays,
   2035     False,
   2036 )
   2038 _fix_asgrouped(
   2039     arrays, expression_context, branchid_interpretation, library, None
   2040 )
   2042 if array_cache is not None:

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3316, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
   3313     pass
   3315 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3316     uproot.source.futures.delayed_raise(*obj)
   3318 else:
   3319     raise AssertionError(obj)

File ~/Dev/uproot5/src/uproot/source/futures.py:36, in delayed_raise(exception_class, exception_value, traceback)
     32 def delayed_raise(exception_class, exception_value, traceback):
     33     """
     34     Raise an exception from a background thread on the main thread.
     35     """
---> 36     raise exception_value.with_traceback(traceback)

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3260, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
   3257 context = dict(branch.context)
   3258 context["forth"] = forth_context
-> 3260 basket_arrays[basket.basket_num] = interpretation.basket_array(
   3261     basket.data,
   3262     basket.byte_offsets,
   3263     basket,
   3264     branch,
   3265     context,
   3266     basket.member("fKeylen"),
   3267     library,
   3268 )
   3269 if basket.num_entries != len(basket_arrays[basket.basket_num]):
   3270     raise ValueError(
   3271         """basket {} in tree/branch {} has the wrong number of entries """
   3272         """(expected {}, obtained {}) when interpreted as {}
   (...)
   3280         )
   3281     )

File ~/Dev/uproot5/src/uproot/interpretation/objects.py:153, in AsObjects.basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
    151 output = None
    152 if isinstance(library, uproot.interpretation.library.Awkward):
--> 153     form = self.awkward_form(branch.file)
    155     if awkward_can_optimize(self, form):
    156         import awkward._connect._uproot

File ~/Dev/uproot5/src/uproot/interpretation/objects.py:129, in AsObjects.awkward_form(self, file, context, index_format, header, tobject_header, breadcrumbs)
    127     return self._model.awkward_form(self._branch.file, context)
    128 else:
--> 129     return self._model.awkward_form(self._branch.file, context)

File ~/Dev/uproot5/src/uproot/containers.py:1102, in AsVector.awkward_form(self, file, context)
   1098 def awkward_form(self, file, context):
   1099     awkward = uproot.extras.awkward()
   1100     return awkward.forms.ListOffsetForm(
   1101         context["index_format"],
-> 1102         uproot._util.awkward_form(self._values, file, context),
   1103         parameters={"uproot": {"as": "vector", "header": self._header}},
   1104     )

File ~/Dev/uproot5/src/uproot/_util.py:584, in awkward_form(model, file, context)
    581     return _primitive_awkward_form[model]
    583 else:
--> 584     return model.awkward_form(file, context)

File ~/Dev/uproot5/src/uproot/model.py:661, in Model.awkward_form(cls, file, context)
    636 @classmethod
    637 def awkward_form(cls, file, context):
    638     """
    639     Args:
    640         cls (subclass of :doc:`uproot.model.Model`): This class.
   (...)
    659     Awkward Array.
    660     """
--> 661     raise uproot.interpretation.objects.CannotBeAwkward(
    662         classname_decode(cls.__name__)[0]
    663     )

CannotBeAwkward: JDAQTriggeredHit

Another example is the summary slices, which is slightly larger struct. The uproot4 interpretation works with this one:

uproot.interpretation.numerical.AsDtype(
                [
                    (" cnt", "u4"),
                    (" vers", "u2"),
                    (" cnt2", "u4"),
                    (" vers2", "u2"),
                    (" cnt3", "u4"),
                    (" vers3", "u2"),
                    ("detector_id", ">i4"),
                    ("run", ">i4"),
                    ("frame_index", ">i4"),
                    (" cnt4", "u4"),
                    (" vers4", "u2"),
                    ("UTC_seconds", ">u4"),
                    ("UTC_16nanosecondcycles", ">u4"),
                ]
            )

Here is the continuation of the session above with the current aryan-forth-reader-latest branch of uproot5, reading the summary slice data:

In [11]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE"].keys()
Out[11]:
['KM3NETDAQ::JDAQPreamble',
 'KM3NETDAQ::JDAQSummarysliceHeader',
 'vector<KM3NETDAQ::JDAQSummaryFrame>']

In [12]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"]
Out[12]: <TBranchElement 'vector<KM3NETDAQ::JDAQSummaryFrame>' at 0x000106b20dc0>

In [13]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"].keys()
Out[13]: []

In [14]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"].array()
---------------------------------------------------------------------------
CannotBeAwkward                           Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"].array()

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:2025, in TBranch.array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
   2019             for (
   2020                 basket_num,
   2021                 range_or_basket,
   2022             ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
   2023                 ranges_or_baskets.append((branch, basket_num, range_or_basket))
-> 2025 _ranges_or_baskets_to_arrays(
   2026     self,
   2027     ranges_or_baskets,
   2028     branchid_interpretation,
   2029     entry_start,
   2030     entry_stop,
   2031     decompression_executor,
   2032     interpretation_executor,
   2033     library,
   2034     arrays,
   2035     False,
   2036 )
   2038 _fix_asgrouped(
   2039     arrays, expression_context, branchid_interpretation, library, None
   2040 )
   2042 if array_cache is not None:

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3316, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
   3313     pass
   3315 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3316     uproot.source.futures.delayed_raise(*obj)
   3318 else:
   3319     raise AssertionError(obj)

File ~/Dev/uproot5/src/uproot/source/futures.py:36, in delayed_raise(exception_class, exception_value, traceback)
     32 def delayed_raise(exception_class, exception_value, traceback):
     33     """
     34     Raise an exception from a background thread on the main thread.
     35     """
---> 36     raise exception_value.with_traceback(traceback)

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3260, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
   3257 context = dict(branch.context)
   3258 context["forth"] = forth_context
-> 3260 basket_arrays[basket.basket_num] = interpretation.basket_array(
   3261     basket.data,
   3262     basket.byte_offsets,
   3263     basket,
   3264     branch,
   3265     context,
   3266     basket.member("fKeylen"),
   3267     library,
   3268 )
   3269 if basket.num_entries != len(basket_arrays[basket.basket_num]):
   3270     raise ValueError(
   3271         """basket {} in tree/branch {} has the wrong number of entries """
   3272         """(expected {}, obtained {}) when interpreted as {}
   (...)
   3280         )
   3281     )

File ~/Dev/uproot5/src/uproot/interpretation/objects.py:153, in AsObjects.basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
    151 output = None
    152 if isinstance(library, uproot.interpretation.library.Awkward):
--> 153     form = self.awkward_form(branch.file)
    155     if awkward_can_optimize(self, form):
    156         import awkward._connect._uproot

File ~/Dev/uproot5/src/uproot/interpretation/objects.py:129, in AsObjects.awkward_form(self, file, context, index_format, header, tobject_header, breadcrumbs)
    127     return self._model.awkward_form(self._branch.file, context)
    128 else:
--> 129     return self._model.awkward_form(self._branch.file, context)

File ~/Dev/uproot5/src/uproot/containers.py:1102, in AsVector.awkward_form(self, file, context)
   1098 def awkward_form(self, file, context):
   1099     awkward = uproot.extras.awkward()
   1100     return awkward.forms.ListOffsetForm(
   1101         context["index_format"],
-> 1102         uproot._util.awkward_form(self._values, file, context),
   1103         parameters={"uproot": {"as": "vector", "header": self._header}},
   1104     )

File ~/Dev/uproot5/src/uproot/_util.py:584, in awkward_form(model, file, context)
    581     return _primitive_awkward_form[model]
    583 else:
--> 584     return model.awkward_form(file, context)

File ~/Dev/uproot5/src/uproot/model.py:661, in Model.awkward_form(cls, file, context)
    636 @classmethod
    637 def awkward_form(cls, file, context):
    638     """
    639     Args:
    640         cls (subclass of :doc:`uproot.model.Model`): This class.
   (...)
    659     Awkward Array.
    660     """
--> 661     raise uproot.interpretation.objects.CannotBeAwkward(
    662         classname_decode(cls.__name__)[0]
    663     )

CannotBeAwkward: KM3NETDAQ::JDAQSummaryFrame

tamasgal avatar Sep 01 '22 13:09 tamasgal

Did these CannotBeAwkward errors happen before we changed anything? As far I know, we haven't touched the awkward_form method. The new procedure generates a Form while walking through the read-step of the first entry (and it's another issue about how we're going to maintain consistency between that Form with the Form that comes from the awkward_form method).

This PR is not intended to add interpretation capabilities, just to replace the implementation. The awkward_form method is called before invoking the read-implementation that has changed. That's why I'm confused about seeing a difference.


Hmm... Actually, all of your interpretations are structured array AsDtype. All of @aryan26roy's work has been on AsObjects and AsStrings. There's no reason that an AsDtype can't be Awkward. In fact, it wouldn't even need AwkwardForth to generate it, even if it's within an AsJagged, as in your first example. The AwkwardForth work in this branch is irrelevant to these particular Interpretations. There is still the doubly-jagged data from #572 that will depend on the AwkwardForth work, but that's not these examples.

So, is it possible to read the two cases you described above in Uproot 5 main? As library="np" and/or library="ak"? Because those two cases shouldn't have anything to do with developments in this branch.

jpivarski avatar Sep 01 '22 16:09 jpivarski

Ah, my bad, I thought this branch also includes new interpretation mechanics. Sorry ;) I also realised I picked the wrong branch, which does not contain double nested structures.

Anyways, I can confirm that the doubly nested things work absolutely fine and return the correct type. Here is an example with a single branch which is doubly nested:

In [1]: import uproot

In [2]: from km3net_testdata import data_path

In [3]: f = uproot.open(data_path("offline/km3net_offline.root"))

In [4]: f["E/Evt/trks/trks.rec_stages"].array()
Out[4]: <Array [[[1, 3, 5, 4], [1, ... 1], [1], [1]]] type='10 * var * var * int32'>

In [5]: uproot.__version__
Out[5]: '5.0.0rc1'

░ tamasgal@silentbox-(5):uproot5  aryan-forth-reader-latest uproot5 py-system took 1m 37s
░ 21:08:09 > git describe
fatal: No annotated tags can describe '7f35e2c3c8202cfc6f8e9a35146f455be9c50517'.

Did these CannotBeAwkward errors happen before we changed anything?

Yes, similar CannotBeAwkward errors appear with uproot4 with the two examples in https://github.com/scikit-hep/uproot5/pull/644#issuecomment-1234262493 (I roughly checked the traceback and the code snippets seem to be the same for both uproot4 and uproot5, including the AssertionErrors), so it's the same behaviour on this branch.

So, is it possible to read the two cases you described above in Uproot 5 main? As library="np" and/or library="ak"?

No, on the main branch I get a ValueError with the message

unknown class JDAQTriggeredHit that cannot be skipped because its number of bytes is unknown

See below. Do you want me to open a new issue and follow it up there?

In [9]: f["KM3NET_EVENT/triggeredHits"].array(library="ak")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 f["KM3NET_EVENT/triggeredHits"].array(library="ak")

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:1895, in TBranch.array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
   1889             for (
   1890                 basket_num,
   1891                 range_or_basket,
   1892             ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
   1893                 ranges_or_baskets.append((branch, basket_num, range_or_basket))
-> 1895 _ranges_or_baskets_to_arrays(
   1896     self,
   1897     ranges_or_baskets,
   1898     branchid_interpretation,
   1899     entry_start,
   1900     entry_stop,
   1901     decompression_executor,
   1902     interpretation_executor,
   1903     library,
   1904     arrays,
   1905     False,
   1906 )
   1908 _fix_asgrouped(
   1909     arrays, expression_context, branchid_interpretation, library, None
   1910 )
   1912 if array_cache is not None:

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3180, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
   3177     pass
   3179 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3180     uproot.source.futures.delayed_raise(*obj)
   3182 else:
   3183     raise AssertionError(obj)

File ~/Dev/uproot5/src/uproot/source/futures.py:36, in delayed_raise(exception_class, exception_value, traceback)
     32 def delayed_raise(exception_class, exception_value, traceback):
     33     """
     34     Raise an exception from a background thread on the main thread.
     35     """
---> 36     raise exception_value.with_traceback(traceback)

File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3124, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
   3121 interpretation = branchid_interpretation[branch.cache_key]
   3122 basket_arrays = branchid_arrays[branch.cache_key]
-> 3124 basket_arrays[basket.basket_num] = interpretation.basket_array(
   3125     basket.data,
   3126     basket.byte_offsets,
   3127     basket,
   3128     branch,
   3129     branch.context,
   3130     basket.member("fKeylen"),
   3131     library,
   3132 )
   3133 if basket.num_entries != len(basket_arrays[basket.basket_num]):
   3134     raise ValueError(
   3135         """basket {} in tree/branch {} has the wrong number of entries """
   3136         """(expected {}, obtained {}) when interpreted as {}
   (...)
   3144         )
   3145     )

File ~/Dev/uproot5/src/uproot/interpretation/objects.py:133, in AsObjects.basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
    129     output = self.basket_array_forth(
    130         data, byte_offsets, basket, branch, context, cursor_offset, library
    131     )
    132 else:
--> 133     output = ObjectArray(
    134         self._model, branch, context, byte_offsets, data, cursor_offset
    135     ).to_numpy()
    137 self.hook_after_basket_array(
    138     data=data,
    139     byte_offsets=byte_offsets,
   (...)
    145     library=library,
    146 )
    148 return output

File ~/Dev/uproot5/src/uproot/interpretation/objects.py:696, in ObjectArray.to_numpy(self)
    694 output = numpy.empty(len(self), dtype=numpy.dtype(object))
    695 for i in range(len(self)):
--> 696     output[i] = self[i]
    697 return output

File ~/Dev/uproot5/src/uproot/interpretation/objects.py:711, in ObjectArray.__getitem__(self, where)
    707     chunk = uproot.source.chunk.Chunk.wrap(self._branch.file.source, data)
    708     cursor = uproot.source.cursor.Cursor(
    709         0, origin=-(byte_start + self._cursor_offset)
    710     )
--> 711     return self._model.read(
    712         chunk,
    713         cursor,
    714         self._context,
    715         self._branch.file,
    716         self._detached_file,
    717         self._branch,
    718     )
    720 elif isinstance(where, slice):
    721     return ObjectArray(
    722         self._model,
    723         self._branch,
   (...)
    727         self._cursor_offset,
    728     )

File ~/Dev/uproot5/src/uproot/containers.py:1102, in AsVector.read(self, chunk, cursor, context, file, selffile, parent, header)
   1099     if length == 0 and helper_obj.is_forth():
   1100         forth_obj.var_set = True
-> 1102     values = _read_nested(
   1103         self._values, length, chunk, cursor, context, file, selffile, parent
   1104     )
   1106 if helper_obj.is_forth():
   1107     forth_obj.go_to(temp)

File ~/Dev/uproot5/src/uproot/containers.py:64, in _read_nested(model, length, chunk, cursor, context, file, selffile, parent, header)
     62 else:
     63     for i in range(length):
---> 64         values[i] = model.read(chunk, cursor, context, file, selffile, parent)
     65 return values

File ~/Dev/uproot5/src/uproot/model.py:824, in Model.read(cls, chunk, cursor, context, file, selffile, parent, concrete)
    813 if helper_obj.is_forth():
    814     temp = forth_obj.add_node(
    815         "pass",
    816         helper_obj.get_pre(),
   (...)
    822         {},
    823     )
--> 824 self.read_members(chunk, cursor, context, file)
    825 if helper_obj.is_forth():
    826     forth_obj.go_to(temp)

File ~/Dev/uproot5/src/uproot/model.py:1481, in UnknownClass.read_members(self, chunk, cursor, context, file)
   1478             cursor.skip(self._num_bytes - cursor.displacement(self._cursor))
   1480         else:
-> 1481             raise ValueError(
   1482                 """unknown class {} that cannot be skipped because its """
   1483                 """number of bytes is unknown
   1484 in file {}""".format(
   1485                     self.classname, file.file_path
   1486                 )
   1487             )

ValueError: unknown class JDAQTriggeredHit that cannot be skipped because its number of bytes is unknown
in file /Users/tamasgal/Dev/uproot5/venv/lib/python3.9/site-packages/km3net_testdata/data/online/km3net_online.root

tamasgal avatar Sep 02 '22 19:09 tamasgal

No, on the main branch I get a ValueError with the message

unknown class JDAQTriggeredHit that cannot be skipped because its number of bytes is unknown

See below. Do you want me to open a new issue and follow it up there?

If this is a new issue (i.e. it used to be interpreted and now it is not), then yes, I should look at it and find out what broke. If this type couldn't be read before, then it's not likely that I'll be able to solve it—these issues got progressively more complex and I got progressively less time. In the past, you found custom Interpretations that worked for your classes; we can keep using that solution as long as nothing has gotten worse.

The AwkwardForth implementation (which is porting the Python implementation, not adding new capabilities) should be invoked on hand-made Interpretations as well as automatically detected ones. That's what I want to make sure will work for you: it's possible that some of your hand-made Interpretations are providing coverage over parts of the codebase that our built-in test suite isn't. I think there are approximately 4 places in streamers.py where we should be adding AwkwardForth code, but we don't have any sample ROOT files + automatically identified Interpretations that cover these parts of the code, so we can't be certain that our implementations are correct. (In fact, I think we're leaving them as NotImplementedErrors.) If your Interpretations cover those parts, then (1) we'll be able to actually test them and (2) our final product won't be broken when you start using Uproot v5.

That's what we were looking for.

And, wait a minute—@aryan26roy, doesn't he need to set a _use_forth = True flag to the AsObjects Interpretation for it to actually use the AwkwardForth, at the moment?

jpivarski avatar Sep 02 '22 19:09 jpivarski

@jpivarski yes, the exact flag that @tamasgal needs to set is "interpretation._forth=True" to actually trigger the forth generation and reading process.

aryan26roy avatar Sep 02 '22 19:09 aryan26roy

My custom interpretations seem to work fine! I added a test (https://github.com/scikit-hep/uproot5/pull/701) and here is the covereage in streamers.py before:

src/uproot/streamers.py                       546    191    65%   69, 71, 73, 75, 77, 81, 83, 85, 87, 89, 91, 95, 100, 102, 104, 106, 108, 112, 114, 116, 118, 120, 122, 126, 146, 164-174, 189, 213, 286, 288, 314, 364, 385, 423-429, 437-439, 465, 481, 503, 534-535, 548, 563-566, 574, 587-601, 621-633, 664, 669, 737-750, 770, 790-860, 915-918, 922-925, 960-961, 1018-1042, 1049, 1055, 1060, 1070, 1144-1148, 1173, 1194-1213, 1216-1235, 1238-1242, 1262, 1269, 1343-1351, 1408-1444, 1447-1451, 1464-1478, 1556-1560

and after adding the test:

src/uproot/streamers.py                       546     65    88%   89, 95, 102, 120, 126, 146, 189, 286, 385, 423-429, 503, 534-535, 548, 563-566, 587-601, 621-633, 741, 743, 797-806, 915-918, 922-925, 1055, 1060, 1148, 1173, 1194-1213, 1238-1242, 1262, 1269, 1348-1351, 1439-1442, 1447-1451

I think that's already a good jump in coverage (from 65% to 88%), but I will check if I can squeeze out more.

tamasgal avatar Sep 02 '22 20:09 tamasgal

Yeah, if this provides more coverage, then it's great to add it! I'm a little confused as to how it could add more coverage to the AwkwardForth if the Interpretation is an AsDtype. (I'm talking about what I see in #701.)

@aryan26roy, am I missing something?

jpivarski avatar Sep 02 '22 20:09 jpivarski

I have not checked the exact lines, so those might not be related to AwkwardForth ;)

Btw. I just realised that I don't have any AsObject interpretations in my codebase (anymore). Sorry for the confusion!

Edit: I'd move the tests to a different place then...

tamasgal avatar Sep 02 '22 20:09 tamasgal

Getting that increase in coverage is a boon, anyway. If the coverage increase is still there without the interpretation._forth = True line, or when applied to the main branch, then we'd want to include it nevertheless (maybe in a PR that goes directly to main, and hopefully avoid whatever pre-commit was doing). That resolves what I considered issue (1), Uproot coverage.

For issue (2), if you don't have any custom AsObjects Interpretations, then that's great news: nothing will break!

The doubly-nested data that performance-regressed in issue #572 is relevant to AwkwardForth, though. Since it's a doubly nested list, it must be AsObjects, and we'll want to test its performance with AwkwardForth. The AwkwardForth implementation might be a little slower than the custom back-door we implemented with C++, but that custom back-door has to go away (it was always intended to be temporary—though "temporary" has been years by now). What we need to ensure is that the AwkwardForth is no more than a little slower, like 50%, not catastrophically slower, like factors of 10× or 100×.

jpivarski avatar Sep 02 '22 21:09 jpivarski

Yes I see, OK I'll put those two tests into a PR towards main. A tiny test for the doubly-nested data is coming :)

tamasgal avatar Sep 02 '22 21:09 tamasgal

I just checked the coverage with those two tests on main (without forth) but there is no increase -- the coverage of streamers.py there is 88%, so that was a dud. It might be that I missed to install some dependency which skipped some other tests while switching between the virtualenvs.

Anyways, the doubly-nested structure test is in-place, so feel free to copy it over since the pre-commit bombed my PR ;D

This will do it in tests/test_0637-setup-tests-for-AwkwardForth.py:

def test_82():
    with uproot.open(skhep_testdata.data_path("uproot-issue-214.root")) as file:
        branch = file["E/Evt/trks/trks.rec_stages"]
        interp = uproot.interpretation.identify.interpretation_of(branch, {}, False)
        interp._forth = True
        py = branch.array(interpretation=interp, library="ak")
        assert py[0][0][0] == 1

All in all, I am pretty sure that at least KM3NeT is good to go with uproot5 without the need to change existing code, thanks :)

I will report back on the performance once I get my hands on a larger dataset!

tamasgal avatar Sep 02 '22 21:09 tamasgal

I checked a few files which I had on my local drive and I don't see any significant performance difference in the read-out of doubly-nested data between uproot4 and this branch. For whatever reason, this branch performs even slightly better ;)

For example with the file from https://github.com/scikit-hep/uproot5/issues/572 and reading the whole dataset via f["E/Evt/trks/trks.rec_stages"].array() I get

this branch: 222 ms ± 5.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
uproot4: 279 ms ± 6.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

tamasgal avatar Sep 02 '22 23:09 tamasgal

Great! There were other changes, especially in how Awkward Arrays are constructed, so there are plenty of candidates to credit the speed-up with. (It's probably a balance between some things being faster and others being slower.) If the AwkwardForth wasn't working for some reason, it would be a lot worse:

awkwardforth-in-uproot-performance

jpivarski avatar Sep 02 '22 23:09 jpivarski