Dmitry Nikitko
Did you use the default learning rate (0.01)? If you use only one GPU for training, try setting lr = 0.00125.
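The suggested value is consistent with the linear scaling rule, under the assumption (not stated in the comment) that the default of 0.01 was tuned for 8 GPUs with the same per-GPU batch size: 0.01 / 8 = 0.00125. A minimal sketch of that arithmetic, where `scale_learning_rate` and its parameters are hypothetical names used only for illustration:

```python
def scale_learning_rate(base_lr: float, base_gpus: int, num_gpus: int) -> float:
    """Scale the learning rate in proportion to the number of GPUs.

    Assumes the per-GPU batch size stays fixed, so the total batch size
    (and therefore the learning rate) shrinks with the GPU count.
    """
    if base_gpus <= 0 or num_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return base_lr * num_gpus / base_gpus


# Assumption: the default lr of 0.01 corresponds to an 8-GPU setup.
single_gpu_lr = scale_learning_rate(0.01, base_gpus=8, num_gpus=1)
print(single_gpu_lr)
```

If the default was tuned for a different GPU count, change `base_gpus` accordingly.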
I'm using this approach. You can also calculate the mean of the last hidden state, but don't forget to apply L2 normalization after that. It might work better than the EOS embedding...
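Mean pooling followed by L2 normalization can be sketched roughly as below. This is an illustration with NumPy and random data, not the code used in the thread. The attention mask that excludes padding tokens is an added assumption, and `mean_pool_l2` is a hypothetical helper name.

```python
import numpy as np


def mean_pool_l2(hidden: np.ndarray, mask: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Average the last hidden state over real tokens, then L2-normalize.

    hidden: (batch, seq_len, dim) last hidden states.
    mask:   (batch, seq_len), 1 for real tokens and 0 for padding (assumption).
    """
    mask = mask.astype(hidden.dtype)[..., None]      # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)             # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1.0, None)    # avoid division by zero
    mean = summed / counts
    norms = np.linalg.norm(mean, axis=1, keepdims=True)
    return mean / np.clip(norms, eps, None)          # unit-length rows


# Placeholder data standing in for real model outputs.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 8))
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
embeddings = mean_pool_l2(hidden, mask)
```

After normalization, a dot product between two embeddings equals their cosine similarity.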
> @Puzer Thanks. What's your take on the quality of sentence representations using this method?

I'm not sure the model manages to do that very well. I tried to embed...
> The effective thing to do now is
>
> with dspy.settings.context(lm=…): …. Inside this block, the lm is different (you can also nest this pattern)

Yep, it works. I...
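The scoped, nestable override described in the quote can be illustrated with the standard library alone. The sketch below is not DSPy's implementation and does not call DSPy. `lm_context`, `active_lm`, and the string values are hypothetical stand-ins that only show how an inner block overrides a setting and the outer value is restored on exit.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical global setting; the default value is a placeholder string.
_current_lm = contextvars.ContextVar("current_lm", default="default-lm")


@contextmanager
def lm_context(lm):
    """Temporarily override the active lm; restore the previous one on exit."""
    token = _current_lm.set(lm)
    try:
        yield
    finally:
        _current_lm.reset(token)


def active_lm():
    return _current_lm.get()


seen = [active_lm()]
with lm_context("lm-a"):
    seen.append(active_lm())
    with lm_context("lm-b"):      # nested override
        seen.append(active_lm())
    seen.append(active_lm())      # back to the outer override
seen.append(active_lm())          # back to the default
```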
> Hi!
>
> First, thanks for your work!
>
> I tried to interpolate between 2 faces in the dlatent space (18, 512) and the result seems to be...
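For reference, a linear interpolation between two dlatents of shape (18, 512) might look like the sketch below. The quote does not say which interpolation scheme was used, so linear blending is an assumption. The random arrays are placeholders for real encoded faces, and `lerp_dlatents` is a hypothetical helper name.

```python
import numpy as np


def lerp_dlatents(a: np.ndarray, b: np.ndarray, steps: int) -> np.ndarray:
    """Return `steps` dlatents blended linearly from `a` to `b` (endpoints included)."""
    if a.shape != b.shape:
        raise ValueError("dlatents must have the same shape")
    if steps < 2:
        raise ValueError("need at least 2 steps to include both endpoints")
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1.0 - t) * a + t * b for t in ts])


# Placeholder dlatents; real ones would come from an encoder or optimization.
rng = np.random.default_rng(0)
dlatent_a = rng.normal(size=(18, 512))
dlatent_b = rng.normal(size=(18, 512))
frames = lerp_dlatents(dlatent_a, dlatent_b, steps=5)
```

Each entry of `frames` has the same (18, 512) shape as the inputs, so it can be passed to the generator in place of a single dlatent.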
@gradient-dissenter @stas-sl @tals @sam598 Thanks for your meaningful comments! My current status: 1) I'm actually playing with training an actual encoder which can predict dlatent (without optimization trick) - I...