Edward Kim issues

Results: 32 issues of Edward Kim

Add select_by_tag to ParallelBlock

Implements #694

enhancement

[Task] Research/prototype torch module for TorchScript compatibility

### Description The team discussed porting Transformers4Rec to Merlin Models and supporting the model in Triton Inference Server as one of the goals for 22.09. However, there needs...

[WIP] Add horovod for distributed training

WIP. Moved from #783 because CI was stuck.

[WIP] Add initial draft of example notebook using horovod

A draft PR that shows the workflow. Depends on #783. It currently uses a workaround that repartitions the dataset, i.e., `ddf = train.to_ddf().repartition(npartitions=hvd.size())`. After some preprocessing with NVTabular, the training code...
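To illustrate the idea behind the repartitioning workaround, here is a minimal pure-Python sketch. It assumes (the snippet above does not confirm this) that the goal is one partition per Horovod worker, so that each worker can read its own shard. The helpers `repartition` and `shard_for_rank` are hypothetical names for illustration only; `world_size` stands in for `hvd.size()` and `rank` for `hvd.rank()`. It does not use Horovod, Dask, or Merlin.

```python
# Pure-Python illustration; no Horovod, Dask, or Merlin required.
# Assumption: `world_size` plays the role of hvd.size(), `rank` of hvd.rank().


def repartition(rows, npartitions):
    """Split rows into exactly `npartitions` contiguous, near-equal chunks."""
    base, extra = divmod(len(rows), npartitions)
    chunks, start = [], 0
    for i in range(npartitions):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


def shard_for_rank(rows, rank, world_size):
    """Return the single partition that worker `rank` would read."""
    return repartition(rows, world_size)[rank]


rows = list(range(10))
world_size = 4  # hypothetical number of workers
shards = [shard_for_rank(rows, r, world_size) for r in range(world_size)]
```

With the partition count equal to the number of workers, every row lands in exactly one shard and no worker is left without data, which is presumably what the `repartition(npartitions=hvd.size())` call is meant to guarantee.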

Update merlin-tensorflow image to 23.08 in sagemaker example

This PR updates the SageMaker example to use the latest `merlin-tensorflow:23.08` image. Due to changes in triton/systems, the [serve](https://github.com/triton-inference-server/server/blob/b5c2e38e3cfdac19d089d1254286aa714cb2b7b7/docker/sagemaker/serve) file that ships with the Triton container is patched and we build...

documentation

enhancement

examples

Add LlamaBlock

Ports [lit-llama](https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/model.py) as a `Block`.

Add HybridQA dataset

Depends on #1206.

Add SentencePieceTokenizer and LlamaTokenizer

Introduce distributed embeddings

Part of https://github.com/NVIDIA-Merlin/Merlin/issues/733. ### Goals :soccer: There is a package called [distributed-embeddings](https://github.com/NVIDIA-Merlin/distributed-embeddings), a library for building large embedding-based (e.g. recommender) models in TensorFlow. It's an alternative approach to SOK....

enhancement

Device assignment does not work in PyTorch

As of 12372f4c6562f296c510f6734e748ef54c375c33, device assignment in the PyTorch dataloader does not work correctly with multiple GPUs. ```python import os import pandas as pd from merlin.dataloader.torch import Loader from merlin.io.dataset import...