uni2ts

Bug when preparing data for finetuning

Open: marcopeix opened this issue 1 year ago

I've run into a bug that I can't fix when trying to prepare a dataset for finetuning.

Here's the code:

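from pathlib import Path
from typing import Any, Generator

import pandas as pd
from datasets import Dataset, Features, Sequence, Value

# df is the Store 1 sales data with Date set as the index (described further down).
def data_generator() -> Generator[dict[str, Any], None, None]:
    yield {
        "target": df['Weekly_Sales'].to_numpy(),
        "start": df.index[0],
        "freq": pd.infer_freq(df.index),
        "item_id": "1",
    }

features = Features(
    dict(
        target=Sequence(Value("float32")),
        start=Value("date32"),
        freq=Value("string"),
        item_id=Value("string"),
    )
)

hf_dataset = Dataset.from_generator(data_generator, features=features)

hf_dataset.save_to_disk(Path("sales_dataset/"))

# Note: this overwrites the original df with the one-row Hugging Face dataset.
df = hf_dataset.to_pandas()

df.to_csv('sales_dataset/sales_data.csv', index=False)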

Then, when I run python -m uni2ts.data.builder.simple sales_data sales_dataset/sales_data.csv --offset 40 --dataset_type long, I get the error:

IndexError: index 0 is out of bounds for axis 0 with size 0

I'm not sure why that happens, as my df is not empty and the .csv is not empty either.

Here's the CSV I'm using: https://raw.githubusercontent.com/marcopeix/FoundationModelsForTimeSeriesForecasting/main/data/walmart_sales_small.csv

I'm only using data for Store==1 (143 rows of data) and the first three columns only (Store, Date, Weekly_Sales). Prior to running the function, I set the index as the Date column.
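
For reference, the preprocessing described above might look like the following sketch. The inline rows are made up for illustration and only stand in for the real file; the `Holiday_Flag` column name and all values are assumptions, not taken from the actual CSV.

```python
import io

import pandas as pd

# Made-up sample standing in for walmart_sales_small.csv (illustrative values only).
raw = io.StringIO(
    "Store,Date,Weekly_Sales,Holiday_Flag\n"
    "1,2010-02-05,100.0,0\n"
    "1,2010-02-12,110.0,1\n"
    "1,2010-02-19,120.0,0\n"
    "2,2010-02-05,200.0,0\n"
)

df = pd.read_csv(raw, parse_dates=["Date"])

# Keep Store 1 and the first three columns only.
df = df.loc[df["Store"] == 1, ["Store", "Date", "Weekly_Sales"]]

# Use the Date column as the index so that df.index[0] and
# pd.infer_freq(df.index) operate on timestamps.
df = df.set_index("Date")

print(df.shape)
print(pd.infer_freq(df.index))
```

With the real file, the same steps should leave 143 rows for Store 1.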

What am I missing?

marcopeix, Sep 11 '24 17:09

I didn't look too deeply into this, but I'm guessing it's due to the format (column names) of your data frame.

https://github.com/SalesforceAIResearch/uni2ts/blob/2ba614de8878d350c62835c942b450d2f4d5a711/src/uni2ts/data/builder/simple.py#L58
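
One way to test this guess without running the builder is to check how a CSV reads back in pandas. The sketch below is not the builder's code. It assumes that the file is loaded with the first column as a datetime index and that, for --dataset_type long, rows are selected by comparing an item_id column against string values. Both are assumptions about simple.py, and the CSV contents are made up.

```python
import io

import pandas as pd

# Hypothetical long-format CSV: timestamp first, then item_id and one value column.
csv_text = (
    "Date,item_id,Weekly_Sales\n"
    "2010-02-05,1,100.0\n"
    "2010-02-12,1,110.0\n"
    "2010-02-19,1,120.0\n"
)

# Default parsing turns item_id into integers.
as_int = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Comparing integers with the string "1" matches nothing, so the selection is
# empty; taking index[0] of an empty frame raises an IndexError.
empty = as_int[as_int["item_id"] == "1"]

# Reading item_id as a string makes the same selection non-empty.
as_str = pd.read_csv(
    io.StringIO(csv_text), index_col=0, parse_dates=True, dtype={"item_id": str}
)
selected = as_str[as_str["item_id"] == "1"]

print(len(empty), len(selected))
```

If the builder behaves as assumed, a numeric-looking item_id such as "1" would be one possible source of the empty selection, alongside the column layout itself.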

gorold, Sep 30 '24 03:09

Hi @marcopeix, have you solved this issue? If so, I will close it.

chenghaoliu89, Dec 04 '24 07:12