openml-python

list_tasks() returns fewer tasks than shown on the website

Open xieleo5 opened this issue 2 years ago • 2 comments

Description

I'm trying to list all the tasks in the OpenML database. I tried task_list = openml.tasks.list_tasks(), but it only returns a list of length 46779, whereas the official OpenML website shows 261.0k tasks. Is there an API that can retrieve all of these tasks?

I also tried passing a task type, e.g. task_list = openml.tasks.list_tasks(openml.tasks.TaskType.SUPERVISED_REGRESSION), but the number of returned tasks is still lower than the filtered count on the website. I get only 3939 supervised classification tasks where the website shows 4345, and only 2600 supervised regression tasks where the website shows 19459.
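To make the gap concrete, the figures quoted above can be tabulated per task type. The sketch below is only illustrative: `count_by_type` and `shortfall` are hypothetical helpers, not part of openml-python, and the counts are simply the numbers reported in this issue, not re-measured. It assumes `list_tasks` accepts a `task_type` keyword and returns a container that supports `len()`.

```python
from typing import Callable, Dict, Iterable


def count_by_type(list_fn: Callable, task_types: Iterable) -> Dict:
    """Call a list_tasks-like function once per task type and count the results."""
    return {t: len(list_fn(task_type=t)) for t in task_types}


def shortfall(api_counts: Dict[str, int], website_counts: Dict[str, int]) -> Dict[str, int]:
    """Number of tasks the website shows but the API did not return."""
    return {name: website_counts[name] - api_counts[name] for name in website_counts}


# Figures reported in this issue (hypothetical key names, not re-measured).
api_counts = {"supervised_classification": 3939, "supervised_regression": 2600}
website_counts = {"supervised_classification": 4345, "supervised_regression": 19459}

gaps = shortfall(api_counts, website_counts)
for name, missing in gaps.items():
    print(f"{name}: {missing} tasks missing from the API result")
```

With the library installed, the API side could presumably be filled in with `count_by_type(openml.tasks.list_tasks, [openml.tasks.TaskType.SUPERVISED_CLASSIFICATION, openml.tasks.TaskType.SUPERVISED_REGRESSION])`, in which case the keys would be `TaskType` members rather than the strings used here.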

Steps/Code to Reproduce

import openml
task_list = openml.tasks.list_tasks()
print(task_list)

Expected Results

task_list contains the IDs and metadata of all 261.0k tasks.

Actual Results

It only contains 46779 tasks.

xieleo5 avatar Feb 25 '23 00:02 xieleo5

Heyho,

Thanks for pointing this out.

This might be a server problem, or the numbers on the website might be wrong. The API endpoint that list_tasks calls also returns only 2600 entries (https://api.openml.org/api/v1/json/task/list/type/2).
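To check the raw endpoint independently of openml-python, something like the following could be used. It is a sketch under assumptions: the helper names are hypothetical, and the JSON layout (`{"tasks": {"task": [...]}}`) is assumed rather than confirmed in this thread. Only the URL pattern comes from the link above.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://api.openml.org/api/v1/json/task/list"


def task_list_url(task_type_id: int) -> str:
    """Build the listing URL for one numeric task type id."""
    return f"{BASE_URL}/type/{task_type_id}"


def count_tasks(payload: dict) -> int:
    """Count entries in a listing payload.

    Assumed layout: {"tasks": {"task": [ {...}, ... ]}}.
    Returns 0 if either key is missing.
    """
    return len(payload.get("tasks", {}).get("task", []))


def fetch_task_count(task_type_id: int) -> int:
    """Download the listing for one task type and count its entries (needs network)."""
    with urlopen(task_list_url(task_type_id)) as response:
        return count_tasks(json.load(response))
```

If the layout assumption holds, `fetch_task_count(2)` should reproduce the 2600 entries mentioned above, which would point to the server rather than the Python client as the source of the lower count.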

@PGijsbers do you know more about this?

LennartPurucker avatar Mar 01 '23 08:03 LennartPurucker

No, I was under the impression that the website internally uses the same API to get its data (plus Elasticsearch), so based on that I can't explain the discrepancy. @joaquinvanschoren ?

PGijsbers avatar Mar 01 '23 09:03 PGijsbers