pyprep icon indicating copy to clipboard operation
pyprep copied to clipboard

[WIP] Split PrepPipeline into separate methods, make final interpolation optional

Open a-hurst opened this issue 4 years ago β€’ 8 comments

PR Description

(Eventually) closes #73. Currently, this PR:

  • Adds separate remove_line_noise and robust_reference methods to PrepPipeline.
  • Makes final bad channel interpolation optional.
  • Removes a bunch of unused matplotlib-based MATLAB comparison code in the PrepPipeline unit tests.
  • Removes support for notch filter types other than spectrum_fit, since you can now just filter line noise however you want and then run PrepPipeline.robust_reference if you want to use another kind. This vastly simplifies the demands on the filter_kwargs API and should make the PrepSettings filtering options much easier to document.

There's still definitely more cleanup to do, but I figured it'd be good to get the core of this up for review sooner than later!

Merge Checklist

  • [ ] the PR has been reviewed and all comments are resolved
  • [ ] all CI checks pass
  • [ ] (if applicable): the PR description includes the phrase closes #<issue-number> to automatically close an issue
  • [ ] (if applicable): bug fixes, new features, or API changes are documented in whats_new.rst

a-hurst avatar Jul 03 '21 04:07 a-hurst

Whoops, looks like one of the examples relies on an undocumented attribute I removed for the sake of RAM (self.EEG_new). Will address that tomorrow.

a-hurst avatar Jul 03 '21 04:07 a-hurst

Codecov Report

Merging #99 (0688a5c) into master (ec44384) will decrease coverage by 0.35%. The diff coverage is 95.34%.

:exclamation: Current head 0688a5c differs from pull request most recent head 6ff2755. Consider uploading reports for the commit 6ff2755 to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master      #99      +/-   ##
==========================================
- Coverage   99.04%   98.68%   -0.36%     
==========================================
  Files           7        7              
  Lines         734      762      +28     
==========================================
+ Hits          727      752      +25     
- Misses          7       10       +3     
Impacted Files Coverage Ξ”
pyprep/reference.py 97.88% <90.00%> (-1.30%) :arrow_down:
pyprep/prep_pipeline.py 98.59% <100.00%> (-1.41%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Ξ” = absolute <relative> (impact), ΓΈ = not affected, ? = missing data Powered by Codecov. Last update ec44384...6ff2755. Read the comment docs.

codecov-commenter avatar Jul 03 '21 04:07 codecov-commenter

It is possible that I introduced these conflicts :confounded: it's because I worked on dbe2062d3c7d50a0d87775e1111062ef854c366f before pulling master ... the pulled master, rebased, and force pushed. That was a bad idea - I hope nothing serious got destroyed.

sappelhoff avatar Jul 15 '21 08:07 sappelhoff

It is possible that I introduced these conflicts πŸ˜– it's because I worked on dbe2062 before pulling master ... the pulled master, rebased, and force pushed. That was a bad idea - I hope nothing serious got destroyed.

No worries, I'll rebase before I push any more work! I've set this aside for a few days while tackling some other projects, but I'll try and get this wrapped up within the next week or so :)

a-hurst avatar Jul 17 '21 00:07 a-hurst

Okay @sappelhoff, I've gotten the PrepPipeline API rewrite to a point I think I'm happy with (though I'd love to hear your thoughts and @yjmantilla's). Some of the attributes aren't documented yet and much of it is untested, but I figured it'd be best to handle that side of things once I was sure we were all happy with the changes.

My main changes here are:

  • Renamed EEG_before_interpolation to EEG_post_reference and reference_before_interpolation to robust_reference_signal
  • Added a getter method current_reference_signal that returns reference_before_interpolation if interpolation hasn't been done, or the post-interpolation average reference (previously reference_after_interpolation) if it has (and None if neither have been performed, though that could be changed).
  • Moved all the loose noisy_channels_original, noisy_channels_before_interpolation, bad_before_interpolation, still_noisy_channels, etc. attributes into two dicts: noisy_info (which contains the full noisy channel dicts) and bad_channels (which just contains the bad channel names), each with the keys 'original', 'post-reference', and 'post-interpolation'. This makes things easier to document and has the advantage of a more consistent/informative naming scheme.
  • Added current_noisy_info and remaining_bad_channels attributes that get the current noisy info and bad channel names for the pipeline (will retrieve post-interpolation values if interpolation was used, otherwise returns the pre-interpolation values). These allow for people to try enabling/disabling interpolation for their data without having to rewrite their code to access different attributes.
  • Added a new get_raw method, which is a more flexible version of the current raw getter that allows retrieving the full mne.io.Raw object with the EEG data from any given stage in the pipeline.

Let me know what you think!

a-hurst avatar Aug 04 '21 22:08 a-hurst

I didn't take a look at the diff because I am not sure what diff will stay after https://github.com/sappelhoff/pyprep/pull/99#discussion_r683235569 is resolved, but I still have concerns over these two points (I am commenting only based on your summary comment):

returns reference_before_interpolation if interpolation hasn't been done, or the post-interpolation average reference (previously reference_after_interpolation) if it has

and

(will retrieve post-interpolation values if interpolation was used, otherwise returns the pre-interpolation values

Could we not just return a dict in both cases, with the keys pre_interpolation and post_interpolation ... and the values for these keys being None or empty dicts if these values have not yet been computed? I think that'd be clearer. An alternative would be to send a log message upon the getter call that says something like: "you are getting the post_interpolation values". The minimum would be to have an attribute that can clearly tell you whether the prep object is "pre" or "post" interpolation, but we might already have something like that, not sure right now.

What do you think?

sappelhoff avatar Aug 05 '21 08:08 sappelhoff

Could we not just return a dict in both cases, with the keys pre_interpolation and post_interpolation ... and the values for these keys being None or empty dicts if these values have not yet been computed? I think that'd be clearer. An alternative would be to send a log message upon the getter call that says something like: "you are getting the post_interpolation values". The minimum would be to have an attribute that can clearly tell you whether the prep object is "pre" or "post" interpolation, but we might already have something like that, not sure right now.

That could work, though I still think it'd be useful to have an attribute like current_reference_signal so there's a consistent way to get the final reference signal regardless if interpolation was enabled or disabled. For the last project we did with PyPREP we tried the analysis pipeline both with and without final bad channel interpolation, so I think that having an API where you can toggle interpolation on and off without needing to change any attributes you access afterwards would be handy.

Instead of attributes, maybe methods in the style of the new get_raw function would be clearer? For example, a get_noisy_info method that returns the most recent noisy info available by default (like current_noisy_info does now), but also has an argument that lets you specify explicitly whether you want the original, post-reference, or post-interpolation noisy info.

Keep in mind that for the PrepPipeline API I decided against exposing interpolation as a separate method and instead made it an optional flag to robust_reference(), so it should hopefully be clear to users based on the settings they chose (and the documentation) whether the "current reference signal" reflects interpolated data or not.

a-hurst avatar Aug 05 '21 15:08 a-hurst

yes! That sounds good to me, I like the new get_raw πŸ‘ But I think the default should be "current", not "None", to be more explicit.

Great! I'll get to work on this, as well as proper tests and docs for everything new. In that case, for the sake of simplicity I'm going to leave the internal noisy info and reference signal attributes undocumented so that the new get_x methods are the one clear official way for users to get at that data.

I approve of the remaining changes, but I am slowly losing my grasp on the potential workflows we can have with our different classes and their methods πŸ˜‡ it's been too long that I actually worked with this code, I think.

Hopefully cleaning up the example scripts for 0.4 will refresh us all on that front!

a-hurst avatar Aug 07 '21 15:08 a-hurst