get_historical_features is super slow and memory inefficient
Expected Behavior
I have a feature service with 10 feature views and up to 1000 features. I would like to get all the features, so I use get_historical_features with an entity frame of 17000 rows. In previous Feast versions this worked without problems, and I got the features quite fast.
Current Behavior
In the current version (and in version 0.22 too) it takes up to half an hour to receive all features and uses 50 GB of memory.
Steps to reproduce
My entity frame has the following types: datetime64[ns], string, bool. The column with the string type is the entity key column. The feature store consists of 10 feature views. In total there are 1000 features and 17000 rows. I'm using the local Feast version with Parquet files. The Parquet files are only 37 MB in total.
Specifications
- Version: 0.23
- Platform: WSL Ubuntu 20.04
- Subsystem:
Possible Solution
Maybe the string column as the entity column is the problem; it might be solved by using a categorical dtype for the joins.
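Something like this is what I have in mind (untested sketch; the column names are made up to match the setup above):

```python
import pandas as pd

# Hypothetical entity frame matching the setup above:
# datetime64[ns] timestamp, string entity key, bool column.
entity_df = pd.DataFrame(
    {
        "event_timestamp": pd.to_datetime(["2022-01-01"] * 3),
        "entity_key": pd.Series(["a", "b", "c"], dtype="string"),
        "flag": [True, False, True],
    }
)

# Casting the string entity key to a categorical dtype before the
# point-in-time join might reduce memory and speed up the pandas merge.
entity_df["entity_key"] = entity_df["entity_key"].astype("category")
```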
Hey!
Do you remember which version of Feast was faster with this setup? Keep in mind also that we don't really recommend using Feast in local mode; it was designed more as a way to learn Feast.
It's a bit hard for us to reproduce this, but if you could help troubleshoot it, that would be great! (e.g., do some quick profiling to see what is taking so long)
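Even a rough cProfile dump of the call would help, for example (sketch; store, entity_df, and the service name are whatever you use locally):

```python
import cProfile
import pstats

def profile_retrieval(store, entity_df, service_name):
    """Profile one get_historical_features call and dump the stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    df = store.get_historical_features(
        entity_df=entity_df,
        features=store.get_feature_service(service_name),
    ).to_df()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(30)  # 30 most expensive calls by cumulative time
    stats.dump_stats("get_historical_features.prof")  # raw stats, attachable here
    return df
```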
In version 0.18 everything worked fine. Is there a possibility of attaching a file to a comment? If so, I would love to attach the cProfile output.
One thing that is maybe worth mentioning: in version 0.18 it was fine if the entity key column was a string of dtype object; this doesn't work anymore in the current version.
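To illustrate what I mean (small sketch; whether the explicit cast is a valid workaround is just my guess):

```python
import pandas as pd

# In 0.18 an object-dtype column like this was accepted as the entity key:
entity_df = pd.DataFrame({"entity_key": ["a", "b", "c"]})
print(entity_df["entity_key"].dtype)  # object

# On the current version I have to cast it explicitly first:
entity_df["entity_key"] = entity_df["entity_key"].astype("string")
print(entity_df["entity_key"].dtype)  # string
```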
Yeah I think you can attach a file. That would be very helpful!
Same experience when using BigQuery as the offline store. The generated SQL scripts don't seem to be well optimized.
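If your Feast version exposes to_sql() on the returned retrieval job (the BigQuery offline store should, as far as I can tell), you can dump the generated query and inspect it yourself (sketch; store, entity_df, and the service name are placeholders):

```python
def show_generated_sql(store, entity_df, service_name):
    # Sketch: to_sql() is available on the BigQuery retrieval job in
    # recent Feast versions, as far as I can tell.
    job = store.get_historical_features(
        entity_df=entity_df,
        features=store.get_feature_service(service_name),
    )
    print(job.to_sql())  # paste into the BigQuery console to check the query plan
```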
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I am having the same issue with a Postgres data source, a string entity column, and a table with roughly 10000 entries. The first query is super fast, roughly 0.8 seconds. After that, performance breaks down and the query takes 30 seconds.
Specifications
- Version: 0.30.2
- Database: PostgreSQL 15.1 (Debian 15.1-1.pgdg110+1) on x86_64-pc-linux-gnu (Docker container run on macOS)
- id column type: VARCHAR
- Number of features: 26
- Number of rows: 10000-30000
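A simple timing loop like this reproduces it (sketch; store, entity_df, and the service name are placeholders for my setup):

```python
import time

def time_repeated_retrievals(store, entity_df, service_name, n=5):
    """Time n identical get_historical_features calls back to back."""
    for i in range(n):
        start = time.perf_counter()
        store.get_historical_features(
            entity_df=entity_df,
            features=store.get_feature_service(service_name),
        ).to_df()
        print(f"call {i}: {time.perf_counter() - start:.1f}s")

# In my case call 0 finishes in ~0.8 s and every later call takes ~30 s.
```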