Julio Perez
Hello @tottenjordan, what is the base driver version on the machine? Was the original picture of the nvidia-smi output taken on bare metal or inside the actual container?
@tottenjordan CUDA artifacts are loaded via /opt/nvidia/nvidia_entrypoint.sh. I built your Dockerfile, and it does not change your entrypoint. This means you should be loading the correct CUDA version and...
This is not the correct way to use the Merlin dataloader with Horovod. Explaining why requires a lot more background information. You should never be creating dataloaders in a for...
I just ran this unit test: `pytest tests/unit/loader/test_tf_dataloader.py::test_horovod_multigpu`, and it runs as expected. There are five partitions spread across two workers, so naturally one worker will get more partitions...
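The uneven split mentioned above can be illustrated with a minimal sketch. It assumes a simple round-robin assignment of partitions to worker ranks; the function name `assign_partitions` is hypothetical, and the scheme the Merlin dataloader actually uses may differ. Only the conclusion (five partitions cannot divide evenly across two workers) comes from the comment itself.

```python
def assign_partitions(num_partitions, num_workers):
    """Assign partition indices to worker ranks round-robin (illustrative assumption)."""
    assignments = {rank: [] for rank in range(num_workers)}
    for part in range(num_partitions):
        # Partition i goes to rank i modulo the number of workers.
        assignments[part % num_workers].append(part)
    return assignments


# Five partitions across two workers, as in the test described above.
assignments = assign_partitions(5, 2)
counts = {rank: len(parts) for rank, parts in assignments.items()}
print(counts)  # {0: 3, 1: 2}
```

Whatever the assignment rule, one of the two workers ends up with at least three partitions, so the workers will not see the same number of batches.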
@sejal9507 Can you try a more recent container? It seems you're not able to load the CUDA version in the Docker container, so it tries to rely on the...
@sejal9507 OK, so based on the information you gave, I think your main issue is that your version of CUDA is too old. You need to be on CUDA 10.1...
This is blocked on the following ticket: https://github.com/NVIDIA-Merlin/Merlin/issues/343. We need to refactor the way we leverage asvdb to accommodate testbooks and non-notebook integration tests.
We should also decide what to migrate to other repos and what to remove altogether.
- rename to allocate_tensorflow_memory
- add kw `type=dynamic | fixed | None`
- if default None, it will use the best option based on the tf version
- if fixed, force use of tf_memory_allocation
- if dynamic, try...
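The proposed keyword could be dispatched roughly as sketched below. This is only an illustration of the three-way choice, not the library's implementation. The helper name `resolve_memory_mode` and the `tf_supports_dynamic` flag are hypothetical. The rule used for the `None` default is an assumption, since the proposal does not say how "best based on tf version" is decided. What the dynamic mode actually tries is elided in the source and is not modelled here.

```python
from typing import Optional

# The three values named in the proposal.
VALID_TYPES = ("dynamic", "fixed", None)


def resolve_memory_mode(type: Optional[str] = None,
                        tf_supports_dynamic: bool = True) -> str:
    """Return the allocation mode that would be used (hypothetical helper)."""
    if type not in VALID_TYPES:
        raise ValueError(f"type must be 'dynamic', 'fixed', or None, got {type!r}")
    if type is None:
        # Assumption: the TF-version check is reduced to a boolean flag here.
        return "dynamic" if tf_supports_dynamic else "fixed"
    # An explicit choice is honoured as given.
    return type


print(resolve_memory_mode())                           # dynamic
print(resolve_memory_mode(tf_supports_dynamic=False))  # fixed
print(resolve_memory_mode("fixed"))                    # fixed
```

Keeping the mode resolution separate from the allocation call makes the default behaviour testable without a GPU or a TensorFlow install.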
https://github.com/NVIDIA-Merlin/Merlin/pull/474