[BUG] API Token authentication not working when loading a dataset requiring the acceptance of an agreement
🐛🐛 Bug Report
⚗️ Current Behavior
When trying to load the ImageNet dataset from hub, the exception NotLoggedInError is raised even if a valid API token is given to the function. However, the agreement is successfully presented to the user if the user is logged in using the activeloop login CLI command.
Input Code
import hub
ds = hub.load('hub://activeloop/imagenet-val', token='<ACTIVELOOP_API_TOKEN>')
Opening dataset in read-only mode as you don't have write permissions.
hub://activeloop/imagenet-val loaded successfully.
---------------------------------------------------------------------------
NotLoggedInError Traceback (most recent call last)
Input In [6], in <cell line: 3>()
1 import hub
----> 3 ds = hub.load('hub://activeloop/imagenet-val', token='<ACTIVELOOP_API_TOKEN>')
File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/api/dataset.py:279, in dataset.load(path, read_only, memory_cache_size, local_cache_size, creds, token, verbose)
271 return dataset_factory(
272 path=path,
273 storage=cache_chain,
(...)
276 verbose=verbose,
277 )
278 except AgreementError as e:
--> 279 raise e from None
File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/api/dataset.py:271, in dataset.load(path, read_only, memory_cache_size, local_cache_size, creds, token, verbose)
269 try:
270 read_only = storage.read_only
--> 271 return dataset_factory(
272 path=path,
273 storage=cache_chain,
274 read_only=read_only,
275 token=token,
276 verbose=verbose,
277 )
278 except AgreementError as e:
279 raise e from None
File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/core/dataset/__init__.py:22, in dataset_factory(path, *args, **kwargs)
19 clz = Dataset
21 if clz in {Dataset, HubCloudDataset}:
---> 22 ds = clz(path=path, *args, **kwargs)
23 if ds.info.get("virtual-datasource", False):
24 ds = ds._get_view()
File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/core/dataset/dataset.py:175, in Dataset.__init__(self, storage, index, group_index, read_only, public, token, verbose, version_state, path, is_iteration, **kwargs)
173 self.__dict__.update(d)
174 self._set_derived_attributes()
--> 175 self._first_load_init()
176 self._initial_autoflush: List[
177 bool
178 ] = [] # This is a stack to support nested with contexts
179 self._is_filtered_view = False
File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/core/dataset/hub_cloud_dataset.py:28, in HubCloudDataset._first_load_init(self)
26 self._set_org_and_name()
27 if self.is_actually_cloud:
---> 28 handle_dataset_agreement(
29 self.agreement, self.path, self.ds_name, self.org_id
30 )
31 if self.verbose:
32 logger.info(
33 f"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/{self.org_id}/{self.ds_name}"
34 )
File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/util/agreement.py:52, in handle_dataset_agreement(agreement, path, ds_name, org_id)
50 user_name = get_user_name()
51 if user_name == "public":
---> 52 raise NotLoggedInError()
53 if user_name == "org_id":
54 return
NotLoggedInError: This dataset includes an agreement that needs to be accepted before you can use it.
You need to be signed in to accept this agreement.
You can login using 'activeloop login' on the command line if you have an account or using 'activeloop register' if you don't have one.
Expected behavior/code
First, hub shouldn't report that the dataset was loaded successfully when an authentication problem prevents it from loading (see the second line printed by the function: hub://activeloop/imagenet-val loaded successfully.).
Second, NotLoggedInError shouldn't be raised when a valid API token is passed to the function; the behavior should be the same as when the user is logged in through the activeloop login CLI command.
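To make the second point concrete, here is a minimal sketch of the lookup order I would expect. It is not hub's actual code: `resolve_user_name`, `check_agreement_access`, the `stored_token` argument, and the `lookup` callable are all hypothetical names introduced for illustration, and `NotLoggedInError` is a local stand-in for hub's exception. The idea is only that a token passed explicitly should be tried before any token saved by the CLI, and that "public" is the result only when neither exists.

```python
from typing import Callable, Optional


class NotLoggedInError(Exception):
    """Local stand-in for hub's exception of the same name."""


def resolve_user_name(
    token: Optional[str],
    stored_token: Optional[str],
    lookup: Callable[[str], str],
) -> str:
    """Return the user behind the first available token, else "public".

    `token` is the value passed to the load call, `stored_token` is whatever
    a CLI login may have saved locally, and `lookup` maps a token to a user
    name. All three are assumptions made for this sketch.
    """
    for candidate in (token, stored_token):
        if candidate:
            return lookup(candidate)
    return "public"


def check_agreement_access(
    token: Optional[str],
    stored_token: Optional[str],
    lookup: Callable[[str], str],
) -> str:
    """Raise only when no identity can be resolved from either token."""
    user_name = resolve_user_name(token, stored_token, lookup)
    if user_name == "public":
        raise NotLoggedInError(
            "You need to be signed in to accept this agreement."
        )
    return user_name
```

With a fake lookup such as `{"tok-a": "alice"}.__getitem__`, calling `check_agreement_access("tok-a", None, lookup)` returns the user name without a prior CLI login, which is the situation described in this report.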
⚙️ Environment
- Python version(s): 3.8.10
- OS: Ubuntu 20.04.3 LTS
- IDE: VS-Code (1.66.2) + Jupyter Lab (3.3.0)
- Packages: hub==2.3.4 (latest)
To reproduce the bug, it is useful to mention that I had never used the CLI command activeloop login on my machine before trying to load the dataset using only the API token.
Hey @LucasVandroux, thanks for bringing up this issue. Our implementation of the ImageNet agreement is still fairly primitive, because it identifies individual machines and not users.
We are updating the implementation in our next sprint to track the agreement based on users, and we will release it in 3 weeks. Do you need this to be fixed sooner?
@istranic thank you for your prompt reply. No, this is not urgent. #1612 and #1613 are way more urgent and important to me.
Hey @LucasVandroux This issue has been resolved. Apologies for the delay in the fix.