Dean Wyatte comments

Results 9 comments of


                                            Dean Wyatte

Deploy to Google Cloud Platform (GCP)

This issue looks like it used to be focused on Google Cloud Run, but I'm interested in deploying using GCP AI Platform's [custom container](https://cloud.google.com/ai-platform/prediction/docs/use-custom-container) functionality which looks to be more...

Support cloud storage in load_dataset

I could use this functionality, so I put together a PR using @kyamagu's suggestion to use `fsspec` in `datasets.utils.file_utils` https://github.com/huggingface/datasets/pull/5580

Support cloud storage in load_dataset

The current implementation depends on gcsfs/s3fs being able to authenticate through some other means e.g., environmental variables. For AWS, it looks like you can set `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` Note...

Support cloud storage in load_dataset

> Note that while testing this just now, I did note a discrepancy between gcsfs and s3fs that we might want to address where gcsfs passes the timeout from storage_options...

Support cloud storage in load_dataset

@lhoestq I've been using this feature for the last year on GCS without problem, but I think we need to fix an issue with S3 and then document the supported...

Can't install model_training

> In the meantime you can install ray manually first. Turns out this doesn't work because of the pinned version of trlx from git. We should be able to pin...

Add parquet-specific dataset IO for reduced memory usage

After digging in, I think blocks is doing the optimal thing here by reading all datafile frames into a list and then concatenating. Pandas will always require 2x memory when...

CUDA: an illegal memory access was encountered with Mistral FP8 Marlin kernels on NVIDIA driver 535.216.01 (AWS Sagemaker Real-time Inference)

Assuming **text-generation-inference 3.0.0** from here unless otherwise noted. With `CUDA_LAUNCH_BLOCKING=1`, the source of the error looks to be `flashinfer`/`BatchPrefillWithPagedKVCache` ``` #033[2m2025-01-17T15:08:27.088671Z#033[0m #033[32m INFO#033[0m #033[2mtext_generation_launcher#033[0m#033[2m:#033[0m Args { model_id: "/tmp/tgi/model", revision: None,...

Add `response_format` input parameter to `v1/chat/completions` endpoint

It looks like this may have been added in https://github.com/huggingface/text-generation-inference/pull/2046