ladi-pomsar
This may also spark another debate: at the moment, the experiments are executed via .ipynb notebooks. I don't think that is an approach that could ever actually be used in...
Hi, thank you for doing this. The motivation behind this is that when one is working on a secure system, it is important to open only as many ports as are needed,...
@ESWZY Were you able to use TensorFlow Federated with clients communicating over the internet?
This might be related to [Issue 341](https://github.com/huggingface/text-embeddings-inference/issues/341). Try using the [cpu-latest](https://github.com/huggingface/text-embeddings-inference/pkgs/container/text-embeddings-inference/275472037?tag=cpu-latest) tag instead of 1.5.
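For reference, a minimal sketch of switching to that tag; the model id, port mapping, and volume below are placeholders for illustration, not values taken from this thread:

```bash
# Sketch only: run the cpu-latest image of text-embeddings-inference
# instead of the 1.5 tag. Model id, port, and volume are placeholders.
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-latest
docker run -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id BAAI/bge-reranker-v2-m3
```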
> facing similar error when trying cpu-latest image with bge-reranker-v2-m3;
>
> ```
> 2024-11-19T06:44:07.554912Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None,...
> ```
I did double-check; this issue is indeed caused by the lack of flash attention support on V100s. There is no such problem on the Ada generation, but once you turn flash attention off,...
This doesn't seem to be the case with a flash attention-enabled Ada generation GPU, so it appears to be specific to the lack of flash attention.
For anyone wondering about this: it is caused by pad_token not being present in Llama's tokenizer_config.json. Something as simple as adding "pad_token": "" to the end of...
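As an illustration only (not the exact snippet from the truncated comment above), one way to make that edit with jq; the token value "</s>" is an assumption and should be replaced with a token that already exists in the model's vocabulary:

```bash
# Sketch only: append a "pad_token" entry to tokenizer_config.json.
# "</s>" is a placeholder assumption, not the value from the original comment.
jq '. + {"pad_token": "</s>"}' tokenizer_config.json > tmp.json \
  && mv tmp.json tokenizer_config.json
```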
> @sywangyi : thank you for pointing this out. I missed this warning. Indeed `ghcr.io/huggingface/text-generation-inference:latest-intel-xpu` works for me. This also correlates with @ladi-pomsar's assumption that this issue is specific to...
Hello, I can confirm. This also breaks offline deployments utilizing HF_HUB_OFFLINE=1. When HF_HUB_OFFLINE=0:

```bash
releasellm.internal | 2025-03-21T16:03:00.370524Z WARN text_generation_launcher: Could not import Flash Attention enabled models:...
```
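For comparison, a minimal sketch of the kind of offline deployment being described, assuming the model weights were already downloaded into the mounted volume; the paths, model id, and port are placeholders, not values from this deployment:

```bash
# Sketch only: offline TGI deployment with HF_HUB_OFFLINE=1.
# Assumes the model was pre-downloaded into /opt/models on the host;
# all paths and the model id are placeholders.
docker run --gpus all -p 8080:80 \
  -v /opt/models:/data \
  -e HF_HUB_OFFLINE=1 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/my-model
```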