Using Edison FeatureExtractor

Open Rahgooy opened this issue 9 years ago • 3 comments

I have a problem with the FeatureExtractor design. The interface exposes a getFeatures method that accepts a single Constituent as input. The implication of this design is that you cannot use it for features that are based on multiple Constituents, unless the other Constituents are implicitly determined or can be inferred.

For example, an extractor named PathFromTheFirstWord is fine with this design: as the name suggests, you provide one Constituent and expect to get back the path from the first word of the sentence to the current word. But an extractor named just Path is not right, because a path by its nature needs two Constituents, and you cannot infer them from the name either.

There are a handful of features that belong to this category, and under the hood they assume that there is a Relation attached to the provided Constituent. They use the other end of that Relation as the second Constituent to extract the value. I think this is absolutely wrong and confuses the user.

Another shortcoming of this design, taking Path as the example again: why should we restrict the user to a particular path? We could let the user choose the two ends of the path. I think that would be far more flexible.
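To make the proposal concrete, here is a minimal sketch of an extractor that takes both ends of the path explicitly. Everything in it is an assumption for illustration: `TreeNode`, `PairFeatureExtractor`, and `TreePath` are hypothetical names and are not part of Edison, and the `^` (up) and `>` (down) separators are an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse-tree node; this is NOT Edison's Constituent.
final class TreeNode {
    final String label;
    final TreeNode parent; // null for the root

    TreeNode(String label, TreeNode parent) {
        this.label = label;
        this.parent = parent;
    }
}

// Hypothetical two-argument extractor: both ends are explicit parameters.
interface PairFeatureExtractor<T> {
    String extract(T source, T target);
}

// Path between two nodes of the same tree, through their lowest common ancestor.
final class TreePath implements PairFeatureExtractor<TreeNode> {

    // Chain of nodes from n up to the root, inclusive.
    private static List<TreeNode> ancestors(TreeNode n) {
        List<TreeNode> chain = new ArrayList<>();
        for (TreeNode cur = n; cur != null; cur = cur.parent) {
            chain.add(cur);
        }
        return chain;
    }

    @Override
    public String extract(TreeNode source, TreeNode target) {
        List<TreeNode> up = ancestors(source);
        List<TreeNode> down = ancestors(target);
        int i = up.size() - 1;
        int j = down.size() - 1;
        if (up.get(i) != down.get(j)) {
            throw new IllegalArgumentException("nodes belong to different trees");
        }
        // Walk down from the shared root while both chains still agree;
        // afterwards up.get(i) == down.get(j) is the lowest common ancestor.
        while (i > 0 && j > 0 && up.get(i - 1) == down.get(j - 1)) {
            i--;
            j--;
        }
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k <= i; k++) {          // source up to the common ancestor
            if (k > 0) sb.append('^');
            sb.append(up.get(k).label);
        }
        for (int k = j - 1; k >= 0; k--) {      // common ancestor down to target
            sb.append('>').append(down.get(k).label);
        }
        return sb.toString();
    }
}
```

With this shape the caller decides which path to ask for, and nothing about the second end is hidden behind an attached Relation.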

Rahgooy avatar Aug 08 '16 14:08 Rahgooy

I had the same issue with relational features when working on SRL; maybe @christos-c can help with this. We need a flexible solution for this; otherwise we should see how we can solve it on the Saul side without using Edison's Relation(?).

kordjamshidi avatar Aug 08 '16 14:08 kordjamshidi

I agree that this is a limitation of the interface, but it has been working so far (we never thought of the expressiveness of Saul when designing Edison). I suppose it depends on who the user is: in the case of a system designer, we can "suffer" through this design, but if we're talking about the "naive user", maybe even the proposed design would be too complicated? I would think that we would want to provide automatic ways of accessing those features directly from the nodes-and-edges interface. Another alternative would be to design a Saul "skin" for Edison, where all the input is just nodes and edges and internally we handle the nasty work of converting to Constituent-based feature extractors. I'm open to the idea of allowing Relation-based extractors, but that would require approval from Mark and Dan.
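As a rough illustration of the "skin" idea, the sketch below wraps a single-argument extractor that reads its second end from an attached link, and exposes an explicit (source, target) call instead. All names here (`Item`, `LinkedPairFeature`, `EdgeSkin`) are hypothetical and do not correspond to Edison or Saul classes; the real conversion to Constituent-based extractors would likely be more involved.

```java
import java.util.function.Function;

// Hypothetical stand-in for a constituent that may carry one outgoing link.
final class Item {
    final String text;
    final Item linked; // null when no relation is attached

    Item(String text, Item linked) {
        this.text = text;
        this.linked = linked;
    }
}

// Single-argument extractor in the style under discussion:
// the second end is read from the link hidden inside the argument.
final class LinkedPairFeature implements Function<Item, String> {
    @Override
    public String apply(Item c) {
        if (c.linked == null) {
            throw new IllegalStateException("no relation attached");
        }
        return c.text + "->" + c.linked.text;
    }
}

// The "skin": callers pass an edge as two explicit ends, and the wrapper
// builds the linked view that the single-argument extractor expects.
final class EdgeSkin {
    private final Function<Item, String> wrapped;

    EdgeSkin(Function<Item, String> wrapped) {
        this.wrapped = wrapped;
    }

    String extract(Item source, Item target) {
        // Copy the source with the link set, so the inputs are left untouched.
        Item view = new Item(source.text, target);
        return wrapped.apply(view);
    }
}
```

The design choice in this sketch is that the wrapper never mutates its inputs, so the same node can take part in many edges without carrying stale links.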

christos-c avatar Aug 08 '16 20:08 christos-c

Thanks @christos-c. Actually, I think returning relational features like path is an essential functionality that is used in many tasks nowadays, such as ER, SRL, etc. So having such a strict assumption about one part of the pair (that makes up one relation) hardly makes sense. But I am in favor of your suggestion of accessing those features directly from nodes and edges.

kordjamshidi avatar Aug 08 '16 20:08 kordjamshidi