Alykhan Tejani

15 comments by Alykhan Tejani

Just ran into this myself. The change is pretty simple; this should do it:
```
import torch
from torch.utils.model_zoo import load_url
from torchvision import models

sd = load_url("https://s3-us-west-2.amazonaws.com/jcjohns-models/vgg19-d01eb7cb.pth")
sd['classifier.0.weight'] =...
```
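The snippet above edits entries of the downloaded state dict `sd` before loading it, but the actual key changes are cut off. A minimal sketch of the general pattern (renaming keys in a state dict) follows, using plain strings in place of tensors; the helper `remap_state_dict` and the key mapping are hypothetical illustrations, not the fix from the comment.

```python
from collections import OrderedDict


def remap_state_dict(sd, key_map):
    """Return a copy of `sd` with keys renamed according to `key_map`.

    Keys not listed in `key_map` are copied unchanged. If a renamed key
    collides with an existing one, the later entry wins.
    """
    out = OrderedDict()
    for key, value in sd.items():
        out[key_map.get(key, key)] = value
    return out


# Hypothetical keys; strings stand in for weight tensors.
old_sd = OrderedDict([
    ("features.0.weight", "w0"),
    ("old_classifier.weight", "w1"),
    ("old_classifier.bias", "b1"),
])
new_sd = remap_state_dict(old_sd, {
    "old_classifier.weight": "classifier.0.weight",
    "old_classifier.bias": "classifier.0.bias",
})
print(list(new_sd.keys()))
```

With a real checkpoint, the remapped dict would then be passed to the model's `load_state_dict(...)`.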

Hi @lim0606 - did you manage to reproduce results with SGD? Thanks, Aly

@lim0606 Thanks for sharing the results!

Hi @alexgkendall, in the original paper it states:
```
It was trained using stochastic gradient descent with a base learning rate of 10⁻⁵, reduced by 90% every 80 epochs...
```
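The quoted schedule is a step decay: start at 10⁻⁵ and multiply the rate by 0.1 (a 90% reduction) every 80 epochs. A minimal sketch, assuming epochs are counted from 0 and the first drop happens at epoch 80 (the quote does not say which convention the paper uses). The helper `step_lr` is hypothetical; in PyTorch the same shape is usually expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=80, gamma=0.1)`, stepping the scheduler once per epoch.

```python
def step_lr(epoch, base_lr=1e-5, gamma=0.1, step=80):
    """Learning rate at a 0-indexed `epoch` for step decay.

    The rate is multiplied by `gamma` (0.1, i.e. reduced by 90%)
    once every `step` epochs.
    """
    return base_lr * gamma ** (epoch // step)


for e in (0, 79, 80, 160):
    print(e, step_lr(e))
```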

This makes a lot of sense, thanks. I will make some changes.

On Fri, Feb 9, 2024 at 5:21 PM Haytham Abuelfutuh ***@***.***> wrote: > ***@***.**** commented on this pull...

Any update on this? I am also facing this issue.

@MoFHeka It's weird: I can't produce an MWE (minimal working example), as it only occurs in some settings for me. According to @rhdong here: https://github.com/tensorflow/recommenders-addons/issues/414 it is an issue that the dev team are...

> @alykhantejani Most TFRA users use GPU sync training without PS, so few people are aware of this issue. If this issue occurs only some of the time,...

I should not. I'm using an in-memory cluster for testing with 2 PS and passing these device names to `BasicEmbedding`.
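The comment does not show the device names themselves. A minimal sketch, assuming the standard TensorFlow device-string format and a job named `ps` (both assumptions), of building one name per parameter-server task for a 2-PS test cluster. The helper `ps_device_names` is hypothetical, and how `BasicEmbedding` consumes such a list is not shown here.

```python
def ps_device_names(num_ps, job="ps"):
    """Build TensorFlow-style device strings, one per PS task (CPU:0 each)."""
    return [f"/job:{job}/replica:0/task:{i}/device:CPU:0" for i in range(num_ps)]


devices = ps_device_names(2)
print(devices)
```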

> @alykhantejani Don’t worry about the memory: the DE all-to-all embedding layer will shard the entire embedding across different worker ranks. You can also use a CPU embedding table, but DE HKV backend...
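The memory argument rests on sharding: each worker rank holds only the embedding rows it owns, and an all-to-all exchange routes each lookup ID to its owner. A minimal sketch of the routing step, assuming simple `id % num_ranks` partitioning; this is an illustrative scheme, not necessarily how TFRA's dynamic embedding layer partitions keys. The helper `shard_ids` is hypothetical.

```python
from collections import defaultdict


def shard_ids(ids, num_ranks):
    """Group embedding IDs by the rank that owns them (id % num_ranks).

    In an all-to-all lookup, each worker would send the IDs in bucket r
    to rank r and receive the corresponding embedding rows back.
    """
    shards = defaultdict(list)
    for i in ids:
        shards[i % num_ranks].append(i)
    # Include empty buckets so every rank appears in the result.
    return {rank: shards[rank] for rank in range(num_ranks)}


print(shard_ids([0, 1, 2, 3, 4, 5, 7], num_ranks=4))
```

Because each rank stores roughly 1/`num_ranks` of the rows, per-worker memory for the table shrinks as ranks are added.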