AI (AutoML) feature not working on recent builds - hash mismatch
Recent changes seem to have 'broken' the AI feature. Regular ML algorithms can be run, but the "AI" button in the upper right corner of each database on the "Databases" dashboard seems to be permanently inactive.
I have tested this on both a macOS laptop and a Raspberry Pi 400. Interestingly, although the AI feature doesn't work on either of them, an informative error message is only given on the Raspberry Pi.
Excerpt from the Raspberry Pi logs:
...
lab_1 | 1|ai | surprise_recommenders: INFO: setting training data...
lab_1 | 1|ai | base: INFO: updating hash_2_param...
lab_1 | 1|ai | base: INFO: storing parameter hash...
lab_1 | 1|ai | surprise_recommenders: INFO: append and drop dupes
lab_1 | 1|ai | surprise_recommenders: INFO: load_from_df
lab_1 | 1|ai | surprise_recommenders: ERROR: the results_df hash from the pickle is different
lab_1 | 1|ai | Traceback (most recent call last):
lab_1 | 1|ai | File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
lab_1 | 1|ai | "__main__", mod_spec)
lab_1 | 1|ai | File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
lab_1 | 1|ai | exec(code, run_globals)
lab_1 | 1|ai | File "/appsrc/ai/ai.py", line 662, in <module>
lab_1 | 1|ai | main()
lab_1 | 1|ai | File "/appsrc/ai/ai.py", line 631, in main
lab_1 | 1|ai | term_condition=args.TERM_COND, max_time=args.MAX_TIME)
lab_1 | 1|ai | File "/appsrc/ai/ai.py", line 186, in __init__
lab_1 | 1|ai | self.initialize_recommenders(rec_class) # set self.rec_engines
lab_1 | 1|ai | File "/appsrc/ai/ai.py", line 247, in initialize_recommenders
lab_1 | 1|ai | self.rec_engines[pred_type] = rec_class(**recArgs)
lab_1 | 1|ai | File "/appsrc/ai/recommender/surprise_recommenders.py", line 126, in __init__
lab_1 | 1|ai | random_state=random_state)
lab_1 | 1|ai | File "/appsrc/ai/recommender/base.py", line 165, in __init__
lab_1 | 1|ai | serialized_rec_filename)
lab_1 | 1|ai | File "/appsrc/ai/recommender/base.py", line 195, in _train_empty_rec
lab_1 | 1|ai | self.load(self.serialized_rec_path, knowledgebase_results)
lab_1 | 1|ai | File "/appsrc/ai/recommender/surprise_recommenders.py", line 175, in load
lab_1 | 1|ai | source='knowledgebase')
lab_1 | 1|ai | File "/appsrc/ai/recommender/surprise_recommenders.py", line 162, in _reconstruct_training_data
lab_1 | 1|ai | raise ValueError(error_msg)
lab_1 | 1|ai | ValueError: the results_df hash from the pickle is different
lab_1 | PM2 | App [ai:1] exited with code [1] via signal [SIGINT]
lab_1 | PM2 | App [ai:1] starting in -fork mode-
lab_1 | PM2 | App [ai:1] online
lab_1 | 1|ai | ======= Penn AI =======
lab_1 | 0|lab | POST /api/projects 200 - - 4.529 ms
lab_1 | 0|lab | serverSocket.emitEvent('recommenderStatusUpdated', '[object Object]')
lab_1 | 0|lab | {}
lab_1 | 0|lab | =socketServer:recommenderStatusUpdated(initializing)
lab_1 | 0|lab | POST /api/recommender/status 200 54 - 4.591 ms
lab_1 | 1|ai | ai: INFO: loading pmlb knowledgebase
lab_1 | 1|ai | knowledgebase_utils: INFO: load_default_knowledgebases('True', 'data/knowledgebases/user/results', 'data/knowledgebases/user/metafeatures'
lab_1 | 1|ai | knowledgebase_utils: INFO: load_knowledgebase('['data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz', 'data/knowledgebases/pmlb_regression_results.pkl.gz']', ['data/knowledgebases/pmlb_classification_metafeatures.csv.gz', 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz']', '')
lab_1 | 1|ai | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz)
lab_1 | 1|ai | knowledgebase_utils: INFO: returning 52249 results from data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz
lab_1 | 1|ai | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/pmlb_regression_results.pkl.gz)
lab_1 | 1|ai | knowledgebase_utils: INFO: concatenating results....
lab_1 | 1|ai | knowledgebase_utils: INFO: load metafeatures...
lab_1 | 1|ai | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_classification_metafeatures.csv.gz
lab_1 | 1|ai | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz
lab_1 | 1|ai | ai: INFO: updating AI with classification knowledgebase (52249 results)
lab_1 | 1|ai | ai: INFO: pmlb classification knowledgebase loaded
...
And similarly, from macOS:
...
lab_1 | 1|ai | base: WARNING: algo changing from <surprise.prediction... to <surprise.prediction...
lab_1 | 1|ai | base: WARNING: first_fit changing from True... to False...
lab_1 | 1|ai | base: WARNING: reader changing from <surprise.reader.Rea... to <surprise.reader.Rea...
lab_1 | 1|ai | base: WARNING: hash_2_param changing from {'c65edfb84911c2647a... to {'c65edfb84911c2647a...
lab_1 | 1|ai | base: WARNING: adding trainset=<surprise.trainset.T...
lab_1 | 1|ai | base: WARNING: adding results_df_hash=9213096e6869a9a4d9ea...
lab_1 | 1|ai | base: WARNING: adding ml_p_hash=31fa2d17c46be017c19f...
lab_1 | 1|ai | base: INFO: updating internal state
lab_1 | 1|ai | base: INFO: ml_p hashes match
lab_1 | 1|ai | surprise_recommenders: INFO: setting training data...
lab_1 | 1|ai | base: INFO: updating hash_2_param...
lab_1 | PM2 | App [ai:1] exited with code [0] via signal [SIGKILL]
lab_1 | PM2 | App [ai:1] starting in -fork mode-
lab_1 | PM2 | App [ai:1] online
lab_1 | 1|ai | ======= Penn AI =======
lab_1 | 0|lab | POST /api/projects 200 - - 23.081 ms
lab_1 | 0|lab | serverSocket.emitEvent('recommenderStatusUpdated', '[object Object]')
lab_1 | 0|lab | {}
lab_1 | 0|lab | =socketServer:recommenderStatusUpdated(initializing)
lab_1 | 0|lab | POST /api/recommender/status 200 54 - 15.614 ms
lab_1 | 1|ai | ai: INFO: loading pmlb knowledgebase
lab_1 | 1|ai | knowledgebase_utils: INFO: load_default_knowledgebases('True', 'data/knowledgebases/user/results', 'data/knowledgebases/user/metafeatures'
lab_1 | 1|ai | knowledgebase_utils: INFO: load_knowledgebase('['data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz', 'data/knowledgebases/pmlb_regression_results.pkl.gz']', ['data/knowledgebases/pmlb_classification_metafeatures.csv.gz', 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz']', '')
lab_1 | 1|ai | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz)
lab_1 | 0|lab | results:
lab_1 | 0|lab | [ { _id: 5fe3e870e2c61a175b7b7928,
lab_1 | 0|lab | type: 'recommender',
lab_1 | 0|lab | status: 'initializing' } ]
lab_1 | 0|lab | GET /api/recommender 201 79 - 4.873 ms
lab_1 | 1|ai | knowledgebase_utils: INFO: returning 52249 results from data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz
lab_1 | 1|ai | knowledgebase_utils: INFO: _load_results_from_file(data/knowledgebases/pmlb_regression_results.pkl.gz)
lab_1 | 1|ai | knowledgebase_utils: INFO: concatenating results....
lab_1 | 1|ai | knowledgebase_utils: INFO: load metafeatures...
lab_1 | 1|ai | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_classification_metafeatures.csv.gz
lab_1 | 1|ai | knowledgebase_utils: INFO: Loading metadata from file 'data/knowledgebases/pmlb_regression_metafeatures.csv.gz
...
The Raspberry Pi logs suggest there is an issue loading the knowledge base: the results_df hash stored in the pickled recommender doesn't match the hash of the freshly loaded knowledgebase results.
Possibly an issue introduced by running BFG Repo-Cleaner, or related to Git LFS?
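One quick way to test the Git LFS hypothesis is to check that the knowledgebase files inside the container are real data rather than LFS pointer stubs. A minimal sketch in Python, using one of the file paths from the log above (the check itself is generic and not part of pennai):

from pathlib import Path

def looks_like_lfs_pointer(path):
    # Git LFS pointer stubs are tiny text files that start with this spec line.
    with open(path, "rb") as f:
        return f.read(200).startswith(b"version https://git-lfs.github.com/spec")

kb = Path("data/knowledgebases/sklearn-benchmark-data-knowledgebase-r6.tsv.gz")
print(kb, kb.stat().st_size, "bytes, LFS pointer?", looks_like_lfs_pointer(kb))

If the file turns out to be a pointer stub, a plain git lfs pull in the repo before building the images would be the thing to try.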
Steps to reproduce:
$ git clone https://github.com/epistasislab/pennai
$ cd pennai
$ cp config/ai.env-template config/ai.env
$ docker-compose build
$ docker-compose up
What happens when you click the AI button on the Mac?
@lacava The AI button is grayed out, and in its place there is a spinning gray progress wheel. This is the case on both the Mac and the Raspberry Pi.
@JDRomano2, for the Mac, I think we need a little more information. Could you post a larger excerpt of the log?
Then could you:
- Try starting pennai, waiting a few minutes, and seeing if anything changes (if the SVD recommender couldn't be loaded and is being trained, that can take a few minutes)
- Try starting with a different recommender (in config/ai.env, change AI_RECOMMENDER to "random" and restart pennai) and see if the AI button becomes active (see the snippet after this list)
- Check your Docker runtime memory settings. What are they currently? We recommend at least 6 GB of memory.
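For reference, the recommender swap is a one-line edit to the env file created in the reproduction steps above (illustrative; the variable name is the one mentioned in the list):

# config/ai.env (copied from config/ai.env-template)
AI_RECOMMENDER=random

Restart pennai after editing the file so the new setting is picked up.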
From the logs, there might be a different issue with the Raspberry Pi. A good first step for that might be for us to get the unit tests running on the Pi to check that they all pass.
These three suggestions seemed to do the trick on the Mac, so the remaining problem must be isolated to images running on the Pi.
I'll close this issue and continue work on the raspberrypi branch to get this up and running. As recommended, I'll focus on the unit tests. Given the constrained resources on the Pi, this may require some creative tweaking to convince everything to work correctly.
Hi @JDRomano2, Excellent! Do you know which of these fixed it? Just to check, are you now able to run the SVD recommender on your Mac?
I suspect it was increasing the available RAM that did the trick. I just went in and re-enabled the SVD recommender and it still works correctly, so no problem there.
Great, thanks!
Encountered this same issue on arm64. Setting the recommender to svd displays an error when starting up Aliro. Steps to reproduce:
- On an arm64 machine, set RECOMMENDER=svd and run docker compose up
The following error is displayed:
aliro-lab-1 | 1|ai | newHash d617b188ab49492d3c37bb083a37bd31cbcf3acc077d7bd3ab697115196c617c
aliro-lab-1 | 1|ai | test_newHash: 031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406
aliro-lab-1 | 1|ai | self.results_df_hash 5a246d759bb571dbd867344ef8f282ca7b0cce46347f6db58986ffec8985eb34
aliro-lab-1 | 1|ai | newHash == self.results_df_hash False
aliro-lab-1 | 1|ai | surprise_recommenders: ERROR: the results_df hash from the pickle is different
aliro-lab-1 | 1|ai | Traceback (most recent call last):
aliro-lab-1 | 1|ai | File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
aliro-lab-1 | 1|ai | "__main__", mod_spec)
aliro-lab-1 | 1|ai | File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
aliro-lab-1 | 1|ai | exec(code, run_globals)
aliro-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 658, in <module>
aliro-lab-1 | 1|ai | main()
aliro-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 627, in main
aliro-lab-1 | 1|ai | term_condition=args.TERM_COND, max_time=args.MAX_TIME)
aliro-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 182, in __init__
aliro-lab-1 | 1|ai | self.initialize_recommenders(rec_class) # set self.rec_engines
aliro-lab-1 | 1|ai | File "/appsrc/ai/ai.py", line 243, in initialize_recommenders
aliro-lab-1 | 1|ai | self.rec_engines[pred_type] = rec_class(**recArgs)
aliro-lab-1 | 1|ai | File "/appsrc/ai/recommender/surprise_recommenders.py", line 126, in __init__
aliro-lab-1 | 1|ai | random_state=random_state)
aliro-lab-1 | 1|ai | File "/appsrc/ai/recommender/base.py", line 165, in __init__
aliro-lab-1 | 1|ai | serialized_rec_filename)
aliro-lab-1 | 1|ai | File "/appsrc/ai/recommender/base.py", line 195, in _train_empty_rec
aliro-lab-1 | 1|ai | self.load(self.serialized_rec_path, knowledgebase_results)
aliro-lab-1 | 1|ai | File "/appsrc/ai/recommender/surprise_recommenders.py", line 212, in load
aliro-lab-1 | 1|ai | source='knowledgebase')
aliro-lab-1 | 1|ai | File "/appsrc/ai/recommender/surprise_recommenders.py", line 199, in _reconstruct_training_data
aliro-lab-1 | 1|ai | raise ValueError(error_msg)
aliro-lab-1 | 1|ai | ValueError: the results_df hash from the pickle is different
aliro-lab-1 | PM2 | App [ai:1] exited with code [1] via signal [SIGINT]
aliro-lab-1 | PM2 | App [ai:1] starting in -fork mode-
aliro-lab-1 | PM2 | App [ai:1] online
aliro-lab-1 | 1|ai | ======= Aliro =======
aliro-lab-1 | 0|lab | POST /api/projects 200 - - 2.737 ms
aliro-lab-1 | 0|lab | serverSocket.emitEvent('recommenderStatusUpdated', '[object Object]')
aliro-lab-1 | 0|lab | {}
aliro-lab-1 | 0|lab | POST /api/recommender/status 200 54 - 1.747 ms
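The printed hashes show the failure mode directly: the hash recomputed from the freshly loaded knowledgebase (newHash) does not match the results_df hash stored in the serialized recommender (self.results_df_hash), so the recommender refuses to load. As a rough illustration of this kind of content hash over a results DataFrame (a sketch assuming pandas; not necessarily the project's actual hashing scheme, and the file path is just one of the knowledgebase files from the logs):

import hashlib
import pandas as pd

def results_df_hash(df):
    # Deterministic SHA-256 over pandas' per-row hashes of the DataFrame.
    row_hashes = pd.util.hash_pandas_object(df, index=True).values
    return hashlib.sha256(row_hashes.tobytes()).hexdigest()

# Hypothetical diagnostic: recompute the hash of a knowledgebase file and
# compare it with the value the serialized recommender expects.
results = pd.read_pickle("data/knowledgebases/pmlb_regression_results.pkl.gz")
print(results_df_hash(results))

Anything that changes how the knowledgebase deserializes on a given platform (file contents, dtypes, row order) would change such a hash and trigger the ValueError seen above.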