Aryan forth reader latest
@tamasgal, the backdoor that you rely on that calls specialized C++ for doubly nested lists (std::vector<float>[], if I remember right) will be closed in Awkward v2, which corresponds to Uproot v5. The last time that we accidentally closed this door was in issue #572. At that time, I told you about the project that @aryan26roy is working on now (see https://github.com/scikit-hep/uproot5/issues/572#issuecomment-1066963522).
As you can see in this PR, that work has started. Actually, this PR is after @aryan26roy has already settled on a way to proceed, and he is now adding AwkwardForth to everything. In the near future, it would be a good time to test your custom Interpretations to be sure that the right AwkwardForth is generated for them. That will probably be after this PR is closed, I'm just giving you a heads-up.
Maybe a first step would be to ensure that we're planning to generate AwkwardForth along all the code paths that your Interpretations touch. These tests were made to cover all the code paths, and those sites where AwkwardForth need to be generated are annotated with comments like
https://github.com/scikit-hep/uproot5/blob/92b0b08552fa6f5797fb8b373ee8b660388e9e58/src/uproot/streamers.py#L1225
However, there were a few of these that didn't seem to have any test cases, like this one:
https://github.com/scikit-hep/uproot5/blob/92b0b08552fa6f5797fb8b373ee8b660388e9e58/src/uproot/streamers.py#L1333
If your data and Interpretations help us increase our coverage, that would be great! (If not, these sites would toggle a flag on AwkwardForth generation saying to give up and run slow, which is no worse than they are right now.)
I should also let you know that the current main branch is for Uproot v5 and it already has your backdoor removed. The main-v4 branch is what we use to make new bug-fix releases for Uproot v4.
Sorry for the very late response, I was very busy and abroad and just got a little time window to look over the notifications.
These are awesome news and it seems to be a huge achievement!
I am very curious about the new AwkwardForth implementation and will test them as soon as possible. However, please do not expect any extensive feedback before mid September.
Alright, I had a little exploration, but not sure what exactly I should expect, so let me boldly post something which I guess should work out of the box. This uses a test file sample available in the km3net_testdata Python package (I attached the file here: km3net_online.root.zip). I think this file should also be part of the uproot testdata package.
In the below REPL session, I am trying to access f["KM3NET_EVENT/triggeredHits"].array() which is a vector of vectors of JDAQTriggeredHit, a simple structure which I used to interpret with the following uproot3 code:
triggered_hits = f["KM3NET_EVENT/triggeredHits"].array(
uproot3.asjagged(
uproot3.astable(
uproot3.asdtype(
[
("dom_id", ">i4"),
("channel_id", "u1"),
("time", "<u4"),
("tot", "u1"),
(" cnt", "u4"),
(" vers", "u2"),
("trigger_mask", ">u8"),
]
)
),
skipbytes=10,
)
)
Here is the sample session with the current aryan-forth-reader-latest branch of uproot5:
In [1]: import uproot
uproot.
In [2]: uproot.__version__
Out[2]: '5.0.0rc1'
In [3]: uproot.__file__
Out[3]: '/Users/tamasgal/Dev/uproot5/src/uproot/__init__.py'
In [4]: from km3net_testdata import data_path
In [5]: f = uproot.open(data_path("online/km3net_online.root"))
In [6]: f.keys()
Out[6]:
['JTRIGGER::JTriggerParameters;1',
'META;1',
'META/JMeta;1',
'META/JDataWriter;1',
'E;1',
'KM3NET_TIMESLICE;1',
'KM3NET_TIMESLICE_L0;1',
'KM3NET_TIMESLICE_L1;1',
'KM3NET_TIMESLICE_L2;1',
'KM3NET_TIMESLICE_SN;1',
'KM3NET_EVENT;1',
'KM3NET_SUMMARYSLICE;1']
In [7]: f["KM3NET_EVENT"].keys()
Out[7]:
['KM3NET_EVENT',
'KM3NET_EVENT/KM3NETDAQ::JDAQPreamble',
'KM3NET_EVENT/KM3NETDAQ::JDAQEventHeader',
'KM3NET_EVENT/triggeredHits',
'KM3NET_EVENT/snapshotHits']
In [8]: f["KM3NET_EVENT/triggeredHits"].keys()
Out[8]: []
In [9]: f["KM3NET_EVENT/triggeredHits"].array()
---------------------------------------------------------------------------
CannotBeAwkward Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 f["KM3NET_EVENT/triggeredHits"].array()
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:2025, in TBranch.array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
2019 for (
2020 basket_num,
2021 range_or_basket,
2022 ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
2023 ranges_or_baskets.append((branch, basket_num, range_or_basket))
-> 2025 _ranges_or_baskets_to_arrays(
2026 self,
2027 ranges_or_baskets,
2028 branchid_interpretation,
2029 entry_start,
2030 entry_stop,
2031 decompression_executor,
2032 interpretation_executor,
2033 library,
2034 arrays,
2035 False,
2036 )
2038 _fix_asgrouped(
2039 arrays, expression_context, branchid_interpretation, library, None
2040 )
2042 if array_cache is not None:
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3316, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
3313 pass
3315 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3316 uproot.source.futures.delayed_raise(*obj)
3318 else:
3319 raise AssertionError(obj)
File ~/Dev/uproot5/src/uproot/source/futures.py:36, in delayed_raise(exception_class, exception_value, traceback)
32 def delayed_raise(exception_class, exception_value, traceback):
33 """
34 Raise an exception from a background thread on the main thread.
35 """
---> 36 raise exception_value.with_traceback(traceback)
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3260, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
3257 context = dict(branch.context)
3258 context["forth"] = forth_context
-> 3260 basket_arrays[basket.basket_num] = interpretation.basket_array(
3261 basket.data,
3262 basket.byte_offsets,
3263 basket,
3264 branch,
3265 context,
3266 basket.member("fKeylen"),
3267 library,
3268 )
3269 if basket.num_entries != len(basket_arrays[basket.basket_num]):
3270 raise ValueError(
3271 """basket {} in tree/branch {} has the wrong number of entries """
3272 """(expected {}, obtained {}) when interpreted as {}
(...)
3280 )
3281 )
File ~/Dev/uproot5/src/uproot/interpretation/objects.py:153, in AsObjects.basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
151 output = None
152 if isinstance(library, uproot.interpretation.library.Awkward):
--> 153 form = self.awkward_form(branch.file)
155 if awkward_can_optimize(self, form):
156 import awkward._connect._uproot
File ~/Dev/uproot5/src/uproot/interpretation/objects.py:129, in AsObjects.awkward_form(self, file, context, index_format, header, tobject_header, breadcrumbs)
127 return self._model.awkward_form(self._branch.file, context)
128 else:
--> 129 return self._model.awkward_form(self._branch.file, context)
File ~/Dev/uproot5/src/uproot/containers.py:1102, in AsVector.awkward_form(self, file, context)
1098 def awkward_form(self, file, context):
1099 awkward = uproot.extras.awkward()
1100 return awkward.forms.ListOffsetForm(
1101 context["index_format"],
-> 1102 uproot._util.awkward_form(self._values, file, context),
1103 parameters={"uproot": {"as": "vector", "header": self._header}},
1104 )
File ~/Dev/uproot5/src/uproot/_util.py:584, in awkward_form(model, file, context)
581 return _primitive_awkward_form[model]
583 else:
--> 584 return model.awkward_form(file, context)
File ~/Dev/uproot5/src/uproot/model.py:661, in Model.awkward_form(cls, file, context)
636 @classmethod
637 def awkward_form(cls, file, context):
638 """
639 Args:
640 cls (subclass of :doc:`uproot.model.Model`): This class.
(...)
659 Awkward Array.
660 """
--> 661 raise uproot.interpretation.objects.CannotBeAwkward(
662 classname_decode(cls.__name__)[0]
663 )
CannotBeAwkward: JDAQTriggeredHit
Another example is the summary slices, which is slightly larger struct. The uproot4 interpretation works with this one:
uproot.interpretation.numerical.AsDtype(
[
(" cnt", "u4"),
(" vers", "u2"),
(" cnt2", "u4"),
(" vers2", "u2"),
(" cnt3", "u4"),
(" vers3", "u2"),
("detector_id", ">i4"),
("run", ">i4"),
("frame_index", ">i4"),
(" cnt4", "u4"),
(" vers4", "u2"),
("UTC_seconds", ">u4"),
("UTC_16nanosecondcycles", ">u4"),
]
)
Here is the continuation of the session above with the current aryan-forth-reader-latest branch of uproot5, reading the summary slice data:
In [11]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE"].keys()
Out[11]:
['KM3NETDAQ::JDAQPreamble',
'KM3NETDAQ::JDAQSummarysliceHeader',
'vector<KM3NETDAQ::JDAQSummaryFrame>']
In [12]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"]
Out[12]: <TBranchElement 'vector<KM3NETDAQ::JDAQSummaryFrame>' at 0x000106b20dc0>
In [13]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"].keys()
Out[13]: []
In [14]: f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"].array()
---------------------------------------------------------------------------
CannotBeAwkward Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 f["KM3NET_SUMMARYSLICE/KM3NET_SUMMARYSLICE/vector<KM3NETDAQ::JDAQSummaryFrame>"].array()
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:2025, in TBranch.array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
2019 for (
2020 basket_num,
2021 range_or_basket,
2022 ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
2023 ranges_or_baskets.append((branch, basket_num, range_or_basket))
-> 2025 _ranges_or_baskets_to_arrays(
2026 self,
2027 ranges_or_baskets,
2028 branchid_interpretation,
2029 entry_start,
2030 entry_stop,
2031 decompression_executor,
2032 interpretation_executor,
2033 library,
2034 arrays,
2035 False,
2036 )
2038 _fix_asgrouped(
2039 arrays, expression_context, branchid_interpretation, library, None
2040 )
2042 if array_cache is not None:
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3316, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
3313 pass
3315 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3316 uproot.source.futures.delayed_raise(*obj)
3318 else:
3319 raise AssertionError(obj)
File ~/Dev/uproot5/src/uproot/source/futures.py:36, in delayed_raise(exception_class, exception_value, traceback)
32 def delayed_raise(exception_class, exception_value, traceback):
33 """
34 Raise an exception from a background thread on the main thread.
35 """
---> 36 raise exception_value.with_traceback(traceback)
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3260, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
3257 context = dict(branch.context)
3258 context["forth"] = forth_context
-> 3260 basket_arrays[basket.basket_num] = interpretation.basket_array(
3261 basket.data,
3262 basket.byte_offsets,
3263 basket,
3264 branch,
3265 context,
3266 basket.member("fKeylen"),
3267 library,
3268 )
3269 if basket.num_entries != len(basket_arrays[basket.basket_num]):
3270 raise ValueError(
3271 """basket {} in tree/branch {} has the wrong number of entries """
3272 """(expected {}, obtained {}) when interpreted as {}
(...)
3280 )
3281 )
File ~/Dev/uproot5/src/uproot/interpretation/objects.py:153, in AsObjects.basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
151 output = None
152 if isinstance(library, uproot.interpretation.library.Awkward):
--> 153 form = self.awkward_form(branch.file)
155 if awkward_can_optimize(self, form):
156 import awkward._connect._uproot
File ~/Dev/uproot5/src/uproot/interpretation/objects.py:129, in AsObjects.awkward_form(self, file, context, index_format, header, tobject_header, breadcrumbs)
127 return self._model.awkward_form(self._branch.file, context)
128 else:
--> 129 return self._model.awkward_form(self._branch.file, context)
File ~/Dev/uproot5/src/uproot/containers.py:1102, in AsVector.awkward_form(self, file, context)
1098 def awkward_form(self, file, context):
1099 awkward = uproot.extras.awkward()
1100 return awkward.forms.ListOffsetForm(
1101 context["index_format"],
-> 1102 uproot._util.awkward_form(self._values, file, context),
1103 parameters={"uproot": {"as": "vector", "header": self._header}},
1104 )
File ~/Dev/uproot5/src/uproot/_util.py:584, in awkward_form(model, file, context)
581 return _primitive_awkward_form[model]
583 else:
--> 584 return model.awkward_form(file, context)
File ~/Dev/uproot5/src/uproot/model.py:661, in Model.awkward_form(cls, file, context)
636 @classmethod
637 def awkward_form(cls, file, context):
638 """
639 Args:
640 cls (subclass of :doc:`uproot.model.Model`): This class.
(...)
659 Awkward Array.
660 """
--> 661 raise uproot.interpretation.objects.CannotBeAwkward(
662 classname_decode(cls.__name__)[0]
663 )
CannotBeAwkward: KM3NETDAQ::JDAQSummaryFrame
Did these CannotBeAwkward errors happen before we changed anything? As far I know, we haven't touched the awkward_form method. The new procedure generates a Form while walking through the read-step of the first entry (and it's another issue about how we're going to maintain consistency between that Form with the Form that comes from the awkward_form method).
This PR is not intended to add interpretation capabilities, just to replace the implementation. The awkward_form method is called before invoking the read-implementation that has changed. That's why I'm confused about seeing a difference.
Hmm... Actually, all of your interpretations are structured array AsDtype. All of @aryan26roy's work has been on AsObjects and AsStrings. There's no reason that an AsDtype can't be Awkward. In fact, it wouldn't even need AwkwardForth to generate it, even if it's within an AsJagged, as in your first example. The AwkwardForth work in this branch is irrelevant to these particular Interpretations. There is still the doubly-jagged data from #572 that will depend on the AwkwardForth work, but that's not these examples.
So, is it possible to read the two cases you described above in Uproot 5 main? As library="np" and/or library="ak"? Because those two cases shouldn't have anything to do with developments in this branch.
Ah, my bad, I thought this branch also includes new interpretation mechanics. Sorry ;) I also realised I picked the wrong branch, which does not contain double nested structures.
Anyways, I can confirm that the doubly nested things work absolutely fine and return the correct type. Here is an example with a single branch which is doubly nested:
In [1]: import uproot
In [2]: from km3net_testdata import data_path
In [3]: f = uproot.open(data_path("offline/km3net_offline.root"))
In [4]: f["E/Evt/trks/trks.rec_stages"].array()
Out[4]: <Array [[[1, 3, 5, 4], [1, ... 1], [1], [1]]] type='10 * var * var * int32'>
In [5]: uproot.__version__
Out[5]: '5.0.0rc1'
░ tamasgal@silentbox-(5):uproot5 aryan-forth-reader-latest uproot5 py-system took 1m 37s
░ 21:08:09 > git describe
fatal: No annotated tags can describe '7f35e2c3c8202cfc6f8e9a35146f455be9c50517'.
Did these CannotBeAwkward errors happen before we changed anything?
Yes, similar CannotBeAwkward errors appear with uproot4 with the two examples in https://github.com/scikit-hep/uproot5/pull/644#issuecomment-1234262493 (I roughly checked the traceback and the code snippets seem to be the same for both uproot4 and uproot5, including the AssertionErrors), so it's the same behaviour on this branch.
So, is it possible to read the two cases you described above in Uproot 5 main? As library="np" and/or library="ak"?
No, on the main branch I get a ValueError with the message
unknown class JDAQTriggeredHit that cannot be skipped because its number of bytes is unknown
See below. Do you want me to open a new issue and follow it up there?
In [9]: f["KM3NET_EVENT/triggeredHits"].array(library="ak")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 f["KM3NET_EVENT/triggeredHits"].array(library="ak")
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:1895, in TBranch.array(self, interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, array_cache, library)
1889 for (
1890 basket_num,
1891 range_or_basket,
1892 ) in branch.entries_to_ranges_or_baskets(entry_start, entry_stop):
1893 ranges_or_baskets.append((branch, basket_num, range_or_basket))
-> 1895 _ranges_or_baskets_to_arrays(
1896 self,
1897 ranges_or_baskets,
1898 branchid_interpretation,
1899 entry_start,
1900 entry_stop,
1901 decompression_executor,
1902 interpretation_executor,
1903 library,
1904 arrays,
1905 False,
1906 )
1908 _fix_asgrouped(
1909 arrays, expression_context, branchid_interpretation, library, None
1910 )
1912 if array_cache is not None:
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3180, in _ranges_or_baskets_to_arrays(hasbranches, ranges_or_baskets, branchid_interpretation, entry_start, entry_stop, decompression_executor, interpretation_executor, library, arrays, update_ranges_or_baskets)
3177 pass
3179 elif isinstance(obj, tuple) and len(obj) == 3:
-> 3180 uproot.source.futures.delayed_raise(*obj)
3182 else:
3183 raise AssertionError(obj)
File ~/Dev/uproot5/src/uproot/source/futures.py:36, in delayed_raise(exception_class, exception_value, traceback)
32 def delayed_raise(exception_class, exception_value, traceback):
33 """
34 Raise an exception from a background thread on the main thread.
35 """
---> 36 raise exception_value.with_traceback(traceback)
File ~/Dev/uproot5/src/uproot/behaviors/TBranch.py:3124, in _ranges_or_baskets_to_arrays.<locals>.basket_to_array(basket)
3121 interpretation = branchid_interpretation[branch.cache_key]
3122 basket_arrays = branchid_arrays[branch.cache_key]
-> 3124 basket_arrays[basket.basket_num] = interpretation.basket_array(
3125 basket.data,
3126 basket.byte_offsets,
3127 basket,
3128 branch,
3129 branch.context,
3130 basket.member("fKeylen"),
3131 library,
3132 )
3133 if basket.num_entries != len(basket_arrays[basket.basket_num]):
3134 raise ValueError(
3135 """basket {} in tree/branch {} has the wrong number of entries """
3136 """(expected {}, obtained {}) when interpreted as {}
(...)
3144 )
3145 )
File ~/Dev/uproot5/src/uproot/interpretation/objects.py:133, in AsObjects.basket_array(self, data, byte_offsets, basket, branch, context, cursor_offset, library)
129 output = self.basket_array_forth(
130 data, byte_offsets, basket, branch, context, cursor_offset, library
131 )
132 else:
--> 133 output = ObjectArray(
134 self._model, branch, context, byte_offsets, data, cursor_offset
135 ).to_numpy()
137 self.hook_after_basket_array(
138 data=data,
139 byte_offsets=byte_offsets,
(...)
145 library=library,
146 )
148 return output
File ~/Dev/uproot5/src/uproot/interpretation/objects.py:696, in ObjectArray.to_numpy(self)
694 output = numpy.empty(len(self), dtype=numpy.dtype(object))
695 for i in range(len(self)):
--> 696 output[i] = self[i]
697 return output
File ~/Dev/uproot5/src/uproot/interpretation/objects.py:711, in ObjectArray.__getitem__(self, where)
707 chunk = uproot.source.chunk.Chunk.wrap(self._branch.file.source, data)
708 cursor = uproot.source.cursor.Cursor(
709 0, origin=-(byte_start + self._cursor_offset)
710 )
--> 711 return self._model.read(
712 chunk,
713 cursor,
714 self._context,
715 self._branch.file,
716 self._detached_file,
717 self._branch,
718 )
720 elif isinstance(where, slice):
721 return ObjectArray(
722 self._model,
723 self._branch,
(...)
727 self._cursor_offset,
728 )
File ~/Dev/uproot5/src/uproot/containers.py:1102, in AsVector.read(self, chunk, cursor, context, file, selffile, parent, header)
1099 if length == 0 and helper_obj.is_forth():
1100 forth_obj.var_set = True
-> 1102 values = _read_nested(
1103 self._values, length, chunk, cursor, context, file, selffile, parent
1104 )
1106 if helper_obj.is_forth():
1107 forth_obj.go_to(temp)
File ~/Dev/uproot5/src/uproot/containers.py:64, in _read_nested(model, length, chunk, cursor, context, file, selffile, parent, header)
62 else:
63 for i in range(length):
---> 64 values[i] = model.read(chunk, cursor, context, file, selffile, parent)
65 return values
File ~/Dev/uproot5/src/uproot/model.py:824, in Model.read(cls, chunk, cursor, context, file, selffile, parent, concrete)
813 if helper_obj.is_forth():
814 temp = forth_obj.add_node(
815 "pass",
816 helper_obj.get_pre(),
(...)
822 {},
823 )
--> 824 self.read_members(chunk, cursor, context, file)
825 if helper_obj.is_forth():
826 forth_obj.go_to(temp)
File ~/Dev/uproot5/src/uproot/model.py:1481, in UnknownClass.read_members(self, chunk, cursor, context, file)
1478 cursor.skip(self._num_bytes - cursor.displacement(self._cursor))
1480 else:
-> 1481 raise ValueError(
1482 """unknown class {} that cannot be skipped because its """
1483 """number of bytes is unknown
1484 in file {}""".format(
1485 self.classname, file.file_path
1486 )
1487 )
ValueError: unknown class JDAQTriggeredHit that cannot be skipped because its number of bytes is unknown
in file /Users/tamasgal/Dev/uproot5/venv/lib/python3.9/site-packages/km3net_testdata/data/online/km3net_online.root
No, on the
mainbranch I get aValueErrorwith the messageunknown class JDAQTriggeredHit that cannot be skipped because its number of bytes is unknown
See below. Do you want me to open a new issue and follow it up there?
If this is a new issue (i.e. it used to be interpreted and now it is not), then yes, I should look at it and find out what broke. If this type couldn't be read before, then it's not likely that I'll be able to solve it—these issues got progressively more complex and I got progressively less time. In the past, you found custom Interpretations that worked for your classes; we can keep using that solution as long as nothing has gotten worse.
The AwkwardForth implementation (which is porting the Python implementation, not adding new capabilities) should be invoked on hand-made Interpretations as well as automatically detected ones. That's what I want to make sure will work for you: it's possible that some of your hand-made Interpretations are providing coverage over parts of the codebase that our built-in test suite isn't. I think there are approximately 4 places in streamers.py where we should be adding AwkwardForth code, but we don't have any sample ROOT files + automatically identified Interpretations that cover these parts of the code, so we can't be certain that our implementations are correct. (In fact, I think we're leaving them as NotImplementedErrors.) If your Interpretations cover those parts, then (1) we'll be able to actually test them and (2) our final product won't be broken when you start using Uproot v5.
That's what we were looking for.
And, wait a minute—@aryan26roy, doesn't he need to set a _use_forth = True flag to the AsObjects Interpretation for it to actually use the AwkwardForth, at the moment?
@jpivarski yes, the exact flag that @tamasgal needs to set is "interpretation._forth=True" to actually trigger the forth generation and reading process.
My custom interpretations seem to work fine! I added a test (https://github.com/scikit-hep/uproot5/pull/701) and here is the covereage in streamers.py before:
src/uproot/streamers.py 546 191 65% 69, 71, 73, 75, 77, 81, 83, 85, 87, 89, 91, 95, 100, 102, 104, 106, 108, 112, 114, 116, 118, 120, 122, 126, 146, 164-174, 189, 213, 286, 288, 314, 364, 385, 423-429, 437-439, 465, 481, 503, 534-535, 548, 563-566, 574, 587-601, 621-633, 664, 669, 737-750, 770, 790-860, 915-918, 922-925, 960-961, 1018-1042, 1049, 1055, 1060, 1070, 1144-1148, 1173, 1194-1213, 1216-1235, 1238-1242, 1262, 1269, 1343-1351, 1408-1444, 1447-1451, 1464-1478, 1556-1560
and after adding the test:
src/uproot/streamers.py 546 65 88% 89, 95, 102, 120, 126, 146, 189, 286, 385, 423-429, 503, 534-535, 548, 563-566, 587-601, 621-633, 741, 743, 797-806, 915-918, 922-925, 1055, 1060, 1148, 1173, 1194-1213, 1238-1242, 1262, 1269, 1348-1351, 1439-1442, 1447-1451
I think that's already a good jump in coverage (from 65% to 88%), but I will check if I can squeeze out more.
Yeah, if this provides more coverage, then it's great to add it! I'm a little confused as to how it could add more coverage to the AwkwardForth if the Interpretation is an AsDtype. (I'm talking about what I see in #701.)
@aryan26roy, am I missing something?
I have not checked the exact lines, so those might not be related to AwkwardForth ;)
Btw. I just realised that I don't have any AsObject interpretations in my codebase (anymore). Sorry for the confusion!
Edit: I'd move the tests to a different place then...
Getting that increase in coverage is a boon, anyway. If the coverage increase is still there without the interpretation._forth = True line, or when applied to the main branch, then we'd want to include it nevertheless (maybe in a PR that goes directly to main, and hopefully avoid whatever pre-commit was doing). That resolves what I considered issue (1), Uproot coverage.
For issue (2), if you don't have any custom AsObjects Interpretations, then that's great news: nothing will break!
The doubly-nested data that performance-regressed in issue #572 is relevant to AwkwardForth, though. Since it's a doubly nested list, it must be AsObjects, and we'll want to test its performance with AwkwardForth. The AwkwardForth implementation might be a little slower than the custom back-door we implemented with C++, but that custom back-door has to go away (it was always intended to be temporary—though "temporary" has been years by now). What we need to ensure is that the AwkwardForth is no more than a little slower, like 50%, not catastrophically slower, like factors of 10× or 100×.
Yes I see, OK I'll put those two tests into a PR towards main. A tiny test for the doubly-nested data is coming :)
I just checked the coverage with those two tests on main (without forth) but there is no increase -- the coverage of streamers.py there is 88%, so that was a dud. It might be that I missed to install some dependency which skipped some other tests while switching between the virtualenvs.
Anyways, the doubly-nested structure test is in-place, so feel free to copy it over since the pre-commit bombed my PR ;D
This will do it in tests/test_0637-setup-tests-for-AwkwardForth.py:
def test_82():
with uproot.open(skhep_testdata.data_path("uproot-issue-214.root")) as file:
branch = file["E/Evt/trks/trks.rec_stages"]
interp = uproot.interpretation.identify.interpretation_of(branch, {}, False)
interp._forth = True
py = branch.array(interpretation=interp, library="ak")
assert py[0][0][0] == 1
All in all, I am pretty sure that at least KM3NeT is good to go with uproot5 without the need to change existing code, thanks :)
I will report back on the performance once I get my hands on a larger dataset!
I checked a few files which I had on my local drive and I don't see any significant performance difference in the read-out of doubly-nested data between uproot4 and this branch. For whatever reason, this branch performs even slightly better ;)
For example with the file from https://github.com/scikit-hep/uproot5/issues/572 and reading the whole dataset via f["E/Evt/trks/trks.rec_stages"].array() I get
this branch: 222 ms ± 5.55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
uproot4: 279 ms ± 6.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Great! There were other changes, especially in how Awkward Arrays are constructed, so there are plenty of candidates to credit the speed-up with. (It's probably a balance between some things being faster and others being slower.) If the AwkwardForth wasn't working for some reason, it would be a lot worse:
