
Question about use of nn_utils.MLP in srl_bilinear

Open McKracken opened this issue 7 years ago • 2 comments

Hi! Really nice and interesting work! I was having a look at the code to better understand the bilinear classification step for SRL (the srl_bilinear method in output_fns, line 169). Why do you use a single MLP layer to project both the predicate vectors and the word (role) vectors (line 195), and then take two slices of the result (line 196) to do all the computation, instead of using two separate MLPs for predicates and roles? Is it because this way the projection of the roles also affects that of the predicates, and vice versa (at least, that's what I'd expect, since it should be a fully connected layer)?

many thanks!

McKracken avatar Jan 16 '19 15:01 McKracken

There's no computational difference between a single MLP and two distinct MLPs. It's purely for speed on the GPU (and perhaps better memory locality).
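A minimal NumPy sketch of why the two are equivalent (the dimensions and variable names here are illustrative, not the ones used in LISA): concatenating the two weight matrices into one fused projection and slicing the output gives exactly the same values as two separate projections, but runs as a single larger matmul.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 8, 4, 5          # input dim, per-head output dim, number of tokens
X = rng.standard_normal((n, d))

# Single fused MLP: one weight matrix producing both projections at once.
W = rng.standard_normal((d, 2 * h))
fused = X @ W
pred_fused, role_fused = fused[:, :h], fused[:, h:]

# Two separate MLPs: the same weights, split into two matrices.
W_pred, W_role = W[:, :h], W[:, h:]
pred_sep = X @ W_pred
role_sep = X @ W_role

# The slices of the fused projection equal the separate projections.
assert np.allclose(pred_fused, pred_sep)
assert np.allclose(role_fused, role_sep)
print("fused projection slices match separate projections")
```

So the roles' projection does not influence the predicates' projection (or vice versa): each output slice depends only on its own columns of the fused weight matrix.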


strubell avatar Jan 18 '19 00:01 strubell

Oh yes, now I see it! Thank you very much! Just another quick thing (so I don't have to open a separate issue): theoretically speaking, what's the actual difference between the bilinear classifier used for per-word role labels (in srl_bilinear) and the biaffine classifier from Dozat and Manning, used (I suppose) in conditional_bilinear for the syntax? Correct me if I'm wrong, but shouldn't the only difference between a linear classifier and an affine classifier just be the addition of a shift (the bias in a NN)? I read both your paper and Dozat and Manning's, but I haven't understood whether there's a real difference between the two classifiers, or whether it's just a naming convention.

McKracken avatar Jan 18 '19 13:01 McKracken
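The thread ends without a reply to this last question, so here is a hedged sketch of the distinction as usually stated (all names and dimensions here are illustrative): a bilinear scorer computes only the interaction term u'Uv, while the biaffine scorer of Dozat and Manning adds linear terms in each input plus a scalar bias, which is exactly the "affine shift" the question describes. Appending a constant 1 to each input folds the affine parts back into a single, larger bilinear map.

```python
import numpy as np

rng = np.random.default_rng(1)
h = 4
u = rng.standard_normal(h)   # e.g. a predicate representation
v = rng.standard_normal(h)   # e.g. a role/dependent representation

# Bilinear: a single interaction term through a weight matrix U.
U = rng.standard_normal((h, h))
bilinear = u @ U @ v

# Biaffine: the bilinear term plus linear terms in u and v and a bias.
w_u = rng.standard_normal(h)
w_v = rng.standard_normal(h)
b = rng.standard_normal()
biaffine = u @ U @ v + w_u @ u + w_v @ v + b

# Subtracting the affine parts recovers the plain bilinear score.
assert np.isclose(biaffine - (w_u @ u + w_v @ v + b), bilinear)

# Equivalently: append a constant 1 to each input, and the linear terms
# and bias fold into one (h+1) x (h+1) bilinear weight matrix.
u1 = np.append(u, 1.0)
v1 = np.append(v, 1.0)
U1 = np.zeros((h + 1, h + 1))
U1[:h, :h] = U      # interaction term
U1[:h, h] = w_u     # linear term in u (picked up by v1's trailing 1)
U1[h, :h] = w_v     # linear term in v (picked up by u1's trailing 1)
U1[h, h] = b        # scalar bias
assert np.isclose(u1 @ U1 @ v1, biaffine)
print("biaffine = bilinear over inputs augmented with a constant 1")
```

On this reading the difference is real but small: "biaffine" is "bilinear plus per-input linear terms and a bias", analogous to how an affine map is a linear map plus a shift.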