Dmitry Nikitko comments

Results: 6 comments of Dmitry Nikitko

FPN training: divide by zero, RPNL1Loss explodes

Did you use the default learning rate (0.01)? If you use only one GPU for training, try setting lr = 0.00125.
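The suggested value is consistent with the linear scaling rule if the default of 0.01 was tuned for 8 GPUs, since 0.01 / 8 = 0.00125. The 8-GPU baseline is an assumption here (the comment does not state it), and `scale_lr` with its parameter names is a hypothetical helper, not part of any training framework. A minimal sketch:

```python
def scale_lr(base_lr: float, base_gpus: int, gpus: int) -> float:
    """Scale a learning rate linearly with the number of GPUs in use."""
    if base_gpus <= 0 or gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return base_lr * gpus / base_gpus


# Assumption: the default lr of 0.01 was chosen for an 8-GPU setup.
single_gpu_lr = scale_lr(0.01, base_gpus=8, gpus=1)
print(single_gpu_lr)
```

If the default was tuned for a different number of GPUs, change `base_gpus` accordingly; the rule itself is only a starting point and may still need tuning.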

hidden state of the eos token?

I'm using this approach. You can also calculate the mean of the last hidden state, but don't forget to apply L2 normalization after that. It might work better than the EOS embedding...
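The mean-then-normalize idea can be sketched without any model or framework. The function name `mean_pool_l2`, the plain-list inputs, and the use of an attention mask to skip padding tokens are assumptions made for illustration; the comment itself only mentions averaging the last hidden state and applying the L2 norm.

```python
import math


def mean_pool_l2(hidden_states, attention_mask):
    """Average the last-layer token vectors of one sequence, then L2-normalize.

    hidden_states: list of per-token vectors (each a list of floats).
    attention_mask: list of 0/1 flags; tokens flagged 0 (padding) are skipped.
    """
    kept = [vec for vec, keep in zip(hidden_states, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask excludes every token")
    dim = len(kept[0])
    mean = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        return mean  # a zero vector cannot be normalized; return it unchanged
    return [x / norm for x in mean]


# Toy input: three tokens with 2-dimensional states; the last one is padding.
states = [[3.0, 0.0], [3.0, 8.0], [100.0, 100.0]]
mask = [1, 1, 0]
embedding = mean_pool_l2(states, mask)
print(embedding)
```

After normalization every embedding has unit length, so the dot product of two embeddings is their cosine similarity.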

hidden state of the eos token?

> @Puzer Thanks. What's your take on the quality of sentence representations using this method? I'm not sure the model manages to do that very well. I tried to embed...

Serialization of LLM params and local\global LLM assignment

> The effective thing to do now is
>
> with dspy.settings.context(lm=…): ….
>
> Inside this block, the lm is different (you can also nest this pattern)

Yep, it works. I...
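The nesting behaviour described in the quote (a local override that is undone when the block exits) can be illustrated with a plain context manager. This is a sketch of the general pattern only, not DSPy's implementation; `lm_context`, `current_lm`, and the string placeholders standing in for language models are all hypothetical.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a global setting; the bottom entry is the global LM.
_lm_stack = ["global-lm"]


def current_lm():
    """Return the LM that is active in the innermost enclosing block."""
    return _lm_stack[-1]


@contextmanager
def lm_context(lm):
    """Temporarily override the active LM; restore the previous one on exit."""
    _lm_stack.append(lm)
    try:
        yield
    finally:
        _lm_stack.pop()


seen = [current_lm()]
with lm_context("local-lm-a"):
    seen.append(current_lm())
    with lm_context("local-lm-b"):  # nested override
        seen.append(current_lm())
    seen.append(current_lm())  # back to the outer override
seen.append(current_lm())  # back to the global setting
print(seen)
```

The `try`/`finally` ensures the previous LM is restored even if the block raises, which is what makes nesting safe.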

Interpolation between 2 faces in dlatent space not as meaningful as it is in qlatent space

> Hi!
>
> First, thanks for your work!
>
> I tried to interpolate between 2 faces in the dlatent space (18, 512) and the result seems to be...

Interpolation between 2 faces in dlatent space not as meaningful as it is in qlatent space

@gradient-dissenter @stas-sl @tals @sam598 Thanks for your meaningful comments! My current status: 1) I'm actually playing with training an actual encoder which can predict dlatent (without optimization trick) - I...