General convention for handling TextStims with empty values

Open • tyarkoni opened this issue 8 years ago • 3 comments

I feel like this has come up before, but I can't find any record of it here. In any case: some Transformers that take TextStim inputs fail if handed an empty value. For example, the IndicoAPITextExtractor raises an exception when given a TextStim whose value is ''. There are probably others like this.

The question is what to do about this (if anything) at the package level. We could handle it case by case (i.e., leave every _transform implementation responsible for coping with an empty TextStim), but that doesn't seem very principled and would result in a lot of redundant effort. Alternatively, we could provide a global config option (e.g., drop_empty_text_stims) that filters out any empty TextStim objects returned by any Transformer (much as we currently do for None returns).
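A minimal sketch of the global-option idea, assuming a hypothetical module-level config dict and a stand-in TextStim with a text attribute (neither is pliers' actual config or class). The filter drops None results, as already happens, and additionally drops empty TextStims when drop_empty_text_stims is enabled. Treating whitespace-only text as empty is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

# Hypothetical global config; the option name comes from the proposal above,
# but this dict is a stand-in, not pliers' actual config module.
config = {"drop_empty_text_stims": True}


@dataclass
class TextStim:
    """Minimal stand-in for a text stimulus holding a string value."""
    text: str


def filter_outputs(results: List[Optional[Any]]) -> List[Any]:
    """Drop None results and, if enabled, empty TextStims from a Transformer's outputs."""
    kept = []
    for r in results:
        if r is None:
            # Mirrors the existing behavior of discarding None returns.
            continue
        if (config["drop_empty_text_stims"] and isinstance(r, TextStim)
                and not r.text.strip()):
            # Assumption: whitespace-only text counts as empty too.
            continue
        kept.append(r)
    return kept


if __name__ == "__main__":
    outputs = [TextStim("hello"), TextStim(""), None, TextStim("  ")]
    print(filter_outputs(outputs))
```

Because the filter runs on Transformer outputs, downstream Transformers never see the empty stims, but a Transformer that is handed an empty TextStim directly by the user would still need its own guard.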

A more systematic approach, though one requiring more work, would be to implement configurable Stim-level filtering options. For example, we could add an abstract _validate method to the base Stim class, which would be called at the beginning of transform() every time a Stim is about to be passed to the internal _transform(); processing would proceed only if it returns True. This would make it easy for users to implement their own arbitrary filtering routines just by subclassing the Stim class they're working with and overriding _validate. It would also give us a natural place to consolidate the filtering logic when it's controlled by global options (such as the drop_empty_text_stims option suggested above).
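The Stim-level hook could look roughly like the sketch below. It assumes simplified, hypothetical Stim, TextStim, and Transformer base classes (not the real pliers ones): the base _validate returns True, a user subclass overrides it, and transform() only forwards stims that pass to _transform(). WordCountExtractor is a toy stand-in for an extractor that, like IndicoAPITextExtractor, fails on empty input.

```python
from typing import Any, List


class Stim:
    """Hypothetical base Stim: _validate passes unless a subclass overrides it."""

    def _validate(self) -> bool:
        return True


class TextStim(Stim):
    def __init__(self, text: str):
        self.text = text


class NonEmptyTextStim(TextStim):
    """User-defined subclass that rejects empty (or whitespace-only) text."""

    def _validate(self) -> bool:
        return bool(self.text.strip())


class Transformer:
    def transform(self, stims: List[Stim]) -> List[Any]:
        results = []
        for stim in stims:
            # Only stims that pass validation reach the internal _transform().
            if stim._validate():
                results.append(self._transform(stim))
        return results

    def _transform(self, stim: Stim) -> Any:
        raise NotImplementedError


class WordCountExtractor(Transformer):
    """Toy extractor standing in for one that cannot handle empty input."""

    def _transform(self, stim: TextStim) -> int:
        words = stim.text.split()
        if not words:
            # Simulates an API call that raises on ''.
            raise ValueError("empty text")
        return len(words)


if __name__ == "__main__":
    stims = [NonEmptyTextStim("a b c"), NonEmptyTextStim("")]
    print(WordCountExtractor().transform(stims))
```

Here rejected stims are silently skipped; logging or recording them instead would be an equally reasonable design choice.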

tyarkoni, Jan 02 '18 23:01

Yeah, I like the _validate method approach; it could simply be called from the base Transformer's _validate method.

qmac, Jan 03 '18 23:01

We could also do this at the Transformer level, where each Transformer can override the base Transformer _validate method. One scenario we ran into in the past was #184, where some extractors required RGB images (while others may not need, or may not be able to handle, them).

I'm not sure which is preferable; we'd need to experiment with it during implementation.
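To make the Transformer-level option concrete, here is a minimal sketch in the spirit of #184. It assumes a hypothetical Transformer base class whose _validate(stim) defaults to True, with a subclass overriding it to require 3-channel (RGB) input. ImageStim, n_channels, and MeanBrightnessExtractor are illustrative stand-ins, not pliers classes, and images are plain nested lists to keep the example dependency-free.

```python
from typing import Any, List, Tuple


class ImageStim:
    """Stand-in image stim: pixels is a list of rows, each a list of channel tuples."""

    def __init__(self, pixels: List[List[Tuple[int, ...]]]):
        self.pixels = pixels

    @property
    def n_channels(self) -> int:
        # Number of values per pixel (3 for RGB, 1 for grayscale).
        return len(self.pixels[0][0]) if self.pixels and self.pixels[0] else 0


class Transformer:
    def _validate(self, stim: Any) -> bool:
        # Default: accept every stim unless a subclass says otherwise.
        return True

    def transform(self, stims: List[Any]) -> List[Any]:
        return [self._transform(s) for s in stims if self._validate(s)]

    def _transform(self, stim: Any) -> Any:
        raise NotImplementedError


class MeanBrightnessExtractor(Transformer):
    """Hypothetical extractor that only accepts 3-channel (RGB) images."""

    def _validate(self, stim: ImageStim) -> bool:
        return stim.n_channels == 3

    def _transform(self, stim: ImageStim) -> float:
        values = [c for row in stim.pixels for px in row for c in px]
        return sum(values) / len(values)


if __name__ == "__main__":
    rgb = ImageStim([[(0, 0, 0), (255, 255, 255)]])
    gray = ImageStim([[(10,), (20,)]])
    print(MeanBrightnessExtractor().transform([rgb, gray]))
```

Skipping non-RGB input is only one option; an extractor could instead convert the image inside _transform, which is why putting the check on the Transformer rather than the Stim can make sense for cases like this.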

qmac, Jan 04 '18 00:01

I think it probably makes sense to have hooks for both. That is, for _validate in the base Transformer to pass, both the Stim's and the Transformer subclass's _validate methods have to pass, assuming they're implemented (if not, the base classes always return True).
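One way to wire up both hooks is sketched below, again with hypothetical stand-in classes rather than pliers' own. Because a subclass overriding _validate would replace the combining logic, this sketch assumes a separately named Transformer-level hook, _validate_input (a hypothetical name), and has the base Transformer._validate require both stim._validate() and self._validate_input(stim). Each hook defaults to True when not overridden.

```python
from typing import Any, List


class Stim:
    def _validate(self) -> bool:
        # Base Stim hook: passes unless a subclass overrides it.
        return True


class TextStim(Stim):
    def __init__(self, text: str):
        self.text = text

    def _validate(self) -> bool:
        # Stim-level rule: reject empty text.
        return self.text != ""


class Transformer:
    def _validate(self, stim: Stim) -> bool:
        # Combined check: the Stim hook and the Transformer hook must both pass.
        return stim._validate() and self._validate_input(stim)

    def _validate_input(self, stim: Stim) -> bool:
        # Transformer-level hook (hypothetical name); passes unless overridden.
        return True

    def transform(self, stims: List[Stim]) -> List[Any]:
        return [self._transform(s) for s in stims if self._validate(s)]

    def _transform(self, stim: Stim) -> Any:
        raise NotImplementedError


class ShortTextLengthExtractor(Transformer):
    """Toy extractor that additionally refuses texts longer than max_chars."""

    def __init__(self, max_chars: int = 10):
        self.max_chars = max_chars

    def _validate_input(self, stim: TextStim) -> bool:
        return len(stim.text) <= self.max_chars

    def _transform(self, stim: TextStim) -> int:
        return len(stim.text)


if __name__ == "__main__":
    ext = ShortTextLengthExtractor(max_chars=5)
    print(ext.transform([TextStim("hi"), TextStim(""), TextStim("too long")]))
```

In the usage example, "" is rejected by the Stim hook and "too long" by the Transformer hook. An alternative that keeps the single _validate name is to have subclasses call super()._validate(stim) and add their own check, at the cost of relying on every subclass to remember the super() call.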

tyarkoni, Jan 04 '18 01:01