get_historical_features is super slow and memory inefficient
Expected Behavior
I have a feature service with 10 feature views and up to 1000 features. I would like to get all the features, so I use get_historical_features with an entity frame of 17000 rows. In previous Feast versions this worked without problems, and I got the features quite fast.
Current Behavior
In the current version (and in version 0.22 too) it takes up to half an hour to receive all features and uses 50 GB of memory.
Steps to reproduce
My entity frame has the following types: datetime64[ns], string, bool. The column with the string type is the entity key column. The feature store consists of 10 feature views. In total there are 1000 features and 17000 rows. I'm using the local Feast version with Parquet files. The Parquet files are only 37 MB in total.
Specifications
- Version: 0.23
- Platform: WSL Ubuntu 20.04
- Subsystem:
Possible Solution
Maybe the string column as the entity column is the problem; it might be solved by using a categorical dtype for the joins.
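Something like this is what I have in mind (untested sketch; the column names are made up to match the setup above):

```python
import pandas as pd

# Hypothetical entity frame matching the setup above:
# datetime64[ns] timestamp, string entity key, bool column.
entity_df = pd.DataFrame(
    {
        "event_timestamp": pd.to_datetime(["2022-01-01"] * 3),
        "entity_key": pd.Series(["a", "b", "c"], dtype="string"),
        "flag": [True, False, True],
    }
)

# Casting the string entity key to a categorical dtype before the
# point-in-time join might reduce memory and speed up the pandas merge.
entity_df["entity_key"] = entity_df["entity_key"].astype("category")
```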
Hey!
Do you remember which version of Feast was faster with this setup? Keep in mind also that we don't really recommend using Feast in local mode; it was designed more as a way to learn Feast.
It's a bit hard for us to reproduce this, but if you could help troubleshoot it, that would be great! (e.g., do some quick profiling to see what is taking so long)
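Even a rough cProfile dump of the call would help, for example (sketch; store, entity_df, and the service name are whatever you use locally):

```python
import cProfile
import pstats

def profile_retrieval(store, entity_df, service_name):
    """Profile one get_historical_features call and dump the stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    df = store.get_historical_features(
        entity_df=entity_df,
        features=store.get_feature_service(service_name),
    ).to_df()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(30)  # 30 most expensive calls by cumulative time
    stats.dump_stats("get_historical_features.prof")  # raw stats, attachable here
    return df
```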
In version 0.18 everything worked fine. Is there a possibility of attaching a file to a comment? If so, I would love to attach the cProfile output.
One thing that is maybe worth mentioning: in version 0.18 it was fine if the entity key column was a string of dtype object; this doesn't work anymore in the current version.
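To illustrate what I mean (small sketch; whether the explicit cast is a valid workaround is just my guess):

```python
import pandas as pd

# In 0.18 an object-dtype column like this was accepted as the entity key:
entity_df = pd.DataFrame({"entity_key": ["a", "b", "c"]})
print(entity_df["entity_key"].dtype)  # object

# On the current version I have to cast it explicitly first:
entity_df["entity_key"] = entity_df["entity_key"].astype("string")
print(entity_df["entity_key"].dtype)  # string
```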
Yeah I think you can attach a file. That would be very helpful!
Same experience when using BigQuery as the offline store. The generated SQL scripts don't seem to be well optimized.
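If your Feast version exposes to_sql() on the returned retrieval job (the BigQuery offline store should, as far as I can tell), you can dump the generated query and inspect it yourself (sketch; store, entity_df, and the service name are placeholders):

```python
def show_generated_sql(store, entity_df, service_name):
    # Sketch: to_sql() is available on the BigQuery retrieval job in
    # recent Feast versions, as far as I can tell.
    job = store.get_historical_features(
        entity_df=entity_df,
        features=store.get_feature_service(service_name),
    )
    print(job.to_sql())  # paste into the BigQuery console to check the query plan
```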
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I am having the same issue with a Postgres data source, a string entity column, and a table with roughly 10000 entries. The first query is super fast, roughly 0.8 seconds. After that, performance breaks down and the query takes 30 seconds.
Specifications
- Version: 0.30.2
- Database: PostgreSQL 15.1 (Debian 15.1-1.pgdg110+1) on x86_64-pc-linux-gnu (Docker container run on macOS)
- id column type: VARCHAR
- Number of features: 26
- Number of rows: 10000-30000
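A simple timing loop like this reproduces it (sketch; store, entity_df, and the service name are placeholders for my setup):

```python
import time

def time_repeated_retrievals(store, entity_df, service_name, n=5):
    """Time n identical get_historical_features calls back to back."""
    for i in range(n):
        start = time.perf_counter()
        store.get_historical_features(
            entity_df=entity_df,
            features=store.get_feature_service(service_name),
        ).to_df()
        print(f"call {i}: {time.perf_counter() - start:.1f}s")

# In my case call 0 finishes in ~0.8 s and every later call takes ~30 s.
```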