Researcher file breaks read_raw_eyelink
Description of the problem
A researcher has reported that their file breaks read_raw_eyelink. I'm reporting the details below, so apologies in advance for the lengthy explanation:
Problem
In the specification for their ASCII format, EyeLink states that recording periods (where gaze/pupil data are actually being recorded) should always be demarcated by “START” and “END” lines, as shown below:
START
xpos ypos pupil
xpos ypos pupil
END
...
START
xpos ypos pupil
xpos ypos pupil
END
...
START
xpos ypos pupil
xpos ypos pupil
END
For us, this is important because lines in the ASCII file that occur outside these START…END recording sections are unstructured and, IMO, are difficult to parse.
In other words, read_raw_eyelink looks for the START events, and parses lines until it hits an END event. Per EyeLink's specification, we always assume that any given START event will eventually be followed by an END event (before another START event occurs).
This assumption has held up until now. In the problematic file that the researcher shared, it looks like one of the recording blocks in the file is missing an “END” event, resulting in a format like:
START
xpos ypos pupil
xpos ypos pupil
END
...
START
xpos ypos pupil
xpos ypos pupil
...
START
xpos ypos pupil
xpos ypos pupil
END
So for the block that is missing an END event, read_raw_eyelink tries to parse lines that would normally occur outside recording blocks (specifically, lines containing information about an eye-tracking calibration) and that it is not prepared to handle. This breaks the reader.
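To make the failure mode concrete, here is a minimal sketch of the START…END assumption. It is a toy stand-in and not the actual MNE parser: the function name `collect_sample_lines` and the file contents are hypothetical, and real ASCII files carry timestamps and more columns than shown here.

```python
def collect_sample_lines(lines):
    """Toy parser: treat every line between START and END as a sample line."""
    samples = []
    is_recording_block = False
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "START":
            is_recording_block = True
        elif tokens[0] == "END":
            is_recording_block = False
        elif is_recording_block:
            samples.append(tokens)
    return samples


# Hypothetical file contents: the first block has no END before the calibration.
malformed = """\
START
100.0 200.0 1500.0
>>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<
START
101.0 201.0 1501.0
END
""".splitlines()

samples = collect_sample_lines(malformed)
# The calibration header is wrongly collected as if it were a sample line.
leaked = [tokens for tokens in samples if tokens[0].startswith(">")]
```

In this toy version the calibration header simply ends up among the samples. In the real reader, that kind of unexpected line is what leads to the error shown under "Actual results".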
I'm not sure how easy it will be to make our reader robust to this case. I might try some other EyeLink ASCII readers out there to see if they are able to read the file. In the meantime I'm opening this ticket so that we have a record of it.
Steps to reproduce
# Get the link to the file from the MNE forum (linked below)
from pathlib import Path
import mne
fname = Path.home() / "path" / "to" / "downloaded" / "file"
raw = mne.io.read_raw_eyelink(fname)
Link to data
https://drive.google.com/drive/folders/15SpQuoXZlmH6ZBLcOEoc4nzEA7ewoHuK
Expected results
a raw object
Actual results
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/eyelink.py", line 62, in read_raw_eyelink
raw_eyelink = RawEyelink(
File "<decorator-gen-202>", line 12, in __init__
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/eyelink.py", line 107, in __init__
eye_ch_data, info, raw_extras = _parse_eyelink_ascii(
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/_utils.py", line 71, in _parse_eyelink_ascii
raw_extras["dfs"]["samples"] = _adjust_times(
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/mne/io/eyelink/_utils.py", line 509, in _adjust_times
return pd.merge_asof(
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 708, in merge_asof
return op.get_result()
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1926, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 1151, in _get_join_info
(left_indexer, right_indexer) = self._get_join_indexers()
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 2239, in _get_join_indexers
right_values = self._convert_values_for_libjoin(right_values, "right")
File "/Users/teichmanna2/anaconda3/envs/occ_beh/lib/python3.9/site-packages/pandas/core/reshape/merge.py", line 2182, in _convert_values_for_libjoin
raise ValueError(f"{side} keys must be sorted")
ValueError: right keys must be sorted
Additional information
https://mne.discourse.group/t/mne-io-read-raw-eyelink-failure-adjust-times-sub-function-does-not-work-cant-merge/9012
Naively, I would expect our reader to proceed line by line looking for START and parse until it hits a START or END (formerly just END, but we can assume that hitting a START is like hitting an END followed by another START). In other words, it seems like it should be fairly easy to handle this case?
I don't think that would quite work because our reader can't currently parse the lines that are written between an END block and the next START block. So if there is no END block, our reader will try to parse these lines and error out (as in the case of the aforementioned researcher).
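A rough sketch of why treating START as an implicit END is not enough on its own. The function `split_blocks` and the input lines are hypothetical and only mimic the layout described in this issue; this is not MNE code.

```python
def split_blocks(lines):
    """Toy reader: a block ends at END, or implicitly at the next START."""
    blocks, current = [], None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "START":
            if current is not None:  # implicit END of the previous block
                blocks.append(current)
            current = []
        elif tokens[0] == "END":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:  # file ended while a block was still open
        blocks.append(current)
    return blocks


# Hypothetical lines: the first block is missing its END before a calibration.
lines = [
    "START",
    "100.0 200.0 1500.0",
    ">>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<",
    "MSG 7446696 !CAL Calibration points:",
    "START",
    "101.0 201.0 1501.0",
    "END",
]
blocks = split_blocks(lines)
```

The two blocks are separated correctly, but the first one still contains the calibration lines, so anything that then tries to parse every line in a block as a sample would still fail.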
Usually the lines that are written in these non-recording blocks contain system information and/or information about a calibration sequence (EyeLink will always stop recording gaze/pupil samples during a calibration sequence. So if a user kicks out of an experiment to re-calibrate, an END block should occur, followed by information about the calibration).
Calibration blocks in ASCII files look like this:
>>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<
MSG 7446696 !CAL Calibration points:
MSG 7446696 !CAL -46.3, -67.7 -0, 400
MSG 7446696 !CAL -48.8, -97.0 -0, -2854
MSG 7446696 !CAL -44.1, -38.3 -0, 3436
MSG 7446696 !CAL -111.0, -64.9 -5990, 400
MSG 7446696 !CAL 14.6, -61.2 5990, 400
MSG 7446696 !CAL eye check box: (L,R,T,B)
-124 27 -103 -32
MSG 7446696 !CAL href cal range: (L,R,T,B)
-8985 8985 -4427 5009
MSG 7446696 !CAL Cal coeff:(X=a+bx+cy+dxx+eyy,Y=f+gx+goaly+ixx+jyy)
-0 95.801 -7.7092 0.054973 0.022975
400.06 -3.604 107.47 -0.12785 -0.13718
In the case of our user, an END block is missing right before they initiated a calibration. Knowing that calibrations occur outside of recording blocks, one idea is to adjust this if-condition to check if the line is the start of a calibration block. Something like:
# tokens[1:2] avoids an IndexError on lines that contain a single token
if tokens[0] == "END" or tokens[1:2] == ["CALIBRATION"]:  # end of recording block
    is_recording_block = False
Which should solve the problem for the researcher, at least.
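As a self-contained illustration of that idea, here is a toy version of the loop with the extra check. The function name `collect_sample_lines` and the input lines are hypothetical; only the `tokens` and `is_recording_block` names mirror the snippet above, and the slice `tokens[1:2]` is an assumption added to stay safe on one-token lines.

```python
def collect_sample_lines(lines):
    """Toy parser: a calibration header also closes the recording block."""
    samples = []
    is_recording_block = False
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "START":
            is_recording_block = True
        elif tokens[0] == "END" or tokens[1:2] == ["CALIBRATION"]:
            is_recording_block = False  # end of recording block
        elif is_recording_block:
            samples.append(tokens)
    return samples


# Hypothetical lines: the first block is missing its END before a calibration.
lines = [
    "START",
    "100.0 200.0 1500.0",
    ">>>>>>> CALIBRATION (HV5,P-CR) FOR LEFT: <<<<<<<<<",
    "MSG 7446696 !CAL eye check box: (L,R,T,B)",
    "-124 27 -103 -32",
    "START",
    "101.0 201.0 1501.0",
    "END",
]
samples = collect_sample_lines(lines)
```

With the extra check, the calibration lines are skipped and only the two real sample lines are collected. This only covers a missing END that is followed by a calibration; other kinds of non-recording lines would need their own handling.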
Sure, if it fixes that file then that could work too