How to handle "non-feature" results from Extractors

Open rbroc opened this issue 5 years ago • 0 comments

For extractors such as the Bert encoding extractors, we stumbled upon the question of whether and how to return results which are not, strictly speaking, "features", but which may nonetheless be of interest to the user and cannot be retrieved from the Stim itself.

An example of this is Bert tokens. The ComplexTextStim fed into the Bert extractor is first tokenized into sub-word tokens, then encoded by the Bert model. Here, the high-dimensional encodings returned by the model are what one would properly consider "features". The tokens into which the stimulus is split are not, strictly speaking, features, but it might be nice to retain them in the result object or even in the result data frame, as this would let the user keep track of which token each embedding encodes.

The ExtractorResult object does not currently handle this kind of information, which is neither a feature nor a stimulus attribute. One potential fix could be adding a field to the ExtractorResult object where this kind of extra information can be stored, so that it is a) accessible to the user from the Result object itself; b) retrievable by extractor-specific to_df methods and added to the dataframe.
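
To make the proposal concrete, here is a rough sketch of what such a field could look like. It is not the actual ExtractorResult implementation: the class `SketchExtractorResult`, the `extras` field, and the `to_rows` method are all hypothetical names, and the tokens and numbers are made-up illustrative values rather than real Bert output. The sketch uses plain lists of dicts so it runs without dependencies; the rows could be passed to `pandas.DataFrame` to get a data frame.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class SketchExtractorResult:
    """Hypothetical stand-in for ExtractorResult (not the real class)."""

    data: List[Sequence[float]]  # one row of feature values per token
    features: List[str]          # feature (column) names
    # Proposed extra field: per-row information that is neither a feature
    # nor a Stim attribute, e.g. the sub-word token behind each embedding.
    extras: Dict[str, List[Any]] = field(default_factory=dict)

    def __post_init__(self):
        # Each extra must line up with the feature rows one-to-one.
        for name, values in self.extras.items():
            if len(values) != len(self.data):
                raise ValueError(
                    f"extra '{name}' has {len(values)} values, "
                    f"expected {len(self.data)}"
                )

    def to_rows(self, include_extras: bool = True) -> List[Dict[str, Any]]:
        """Return one dict per row, optionally with the extras as columns."""
        rows = []
        for i, values in enumerate(self.data):
            row = dict(zip(self.features, values))
            if include_extras:
                for name, extra in self.extras.items():
                    row[name] = extra[i]
            rows.append(row)
        return rows


# Illustrative values only: three sub-word pieces with 2-d "embeddings".
result = SketchExtractorResult(
    data=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    features=["dim_0", "dim_1"],
    extras={"token": ["em", "##bed", "##ding"]},
)
rows = result.to_rows()
```

With this layout, `result.extras["token"]` covers point a), and `to_rows` (standing in for an extractor-specific to_df) covers point b). Whether extras should appear in the data frame by default is an open design choice; the `include_extras` flag is one way to leave that to the caller.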

Mar 19 '20 10:03 rbroc