
[BUG] API Token authentication not working when loading a dataset requiring the acceptance of an agreement

Open LucasVandroux opened this issue 4 years ago • 3 comments

🐛🐛 Bug Report

⚗️ Current Behavior

When trying to load the ImageNet dataset from hub, the exception NotLoggedInError is raised even if a valid API token is given to the function. However, the agreement is successfully presented to the user if the user is logged in using the activeloop login CLI command.

Input Code

import hub

ds = hub.load('hub://activeloop/imagenet-val', token='<ACTIVELOOP_API_TOKEN>')
Opening dataset in read-only mode as you don't have write permissions.
hub://activeloop/imagenet-val loaded successfully.
---------------------------------------------------------------------------
NotLoggedInError                          Traceback (most recent call last)
Input In [6], in <cell line: 3>()
      1 import hub
----> 3 ds = hub.load('hub://activeloop/imagenet-val', token='eyJhbGciOiJIUzUxMiIsImlhdCI6MTY1MDg3MDQ5NiwiZXhwIjoxNjU0MDY3MjE5fQ.eyJpZCI6Im1hcms0In0.kYjsO8Kht3A3wH2mgGQm0E5uuJFGAjsRaZbFD6utTb9oWxC0Pb5YdD-I-kfP7UiBPM9gGB-GSw-Rj2_Xq9sirw')

File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/api/dataset.py:279, in dataset.load(path, read_only, memory_cache_size, local_cache_size, creds, token, verbose)
    271     return dataset_factory(
    272         path=path,
    273         storage=cache_chain,
   (...)
    276         verbose=verbose,
    277     )
    278 except AgreementError as e:
--> 279     raise e from None

File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/api/dataset.py:271, in dataset.load(path, read_only, memory_cache_size, local_cache_size, creds, token, verbose)
    269 try:
    270     read_only = storage.read_only
--> 271     return dataset_factory(
    272         path=path,
    273         storage=cache_chain,
    274         read_only=read_only,
    275         token=token,
    276         verbose=verbose,
    277     )
    278 except AgreementError as e:
    279     raise e from None

File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/core/dataset/__init__.py:22, in dataset_factory(path, *args, **kwargs)
     19     clz = Dataset
     21 if clz in {Dataset, HubCloudDataset}:
---> 22     ds = clz(path=path, *args, **kwargs)
     23     if ds.info.get("virtual-datasource", False):
     24         ds = ds._get_view()

File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/core/dataset/dataset.py:175, in Dataset.__init__(self, storage, index, group_index, read_only, public, token, verbose, version_state, path, is_iteration, **kwargs)
    173 self.__dict__.update(d)
    174 self._set_derived_attributes()
--> 175 self._first_load_init()
    176 self._initial_autoflush: List[
    177     bool
    178 ] = []  # This is a stack to support nested with contexts
    179 self._is_filtered_view = False

File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/core/dataset/hub_cloud_dataset.py:28, in HubCloudDataset._first_load_init(self)
     26 self._set_org_and_name()
     27 if self.is_actually_cloud:
---> 28     handle_dataset_agreement(
     29         self.agreement, self.path, self.ds_name, self.org_id
     30     )
     31     if self.verbose:
     32         logger.info(
     33             f"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/{self.org_id}/{self.ds_name}"
     34         )

File /media/storage/lucas/workspace/ml-support/dataset-tracking/.env/lib/python3.8/site-packages/hub/util/agreement.py:52, in handle_dataset_agreement(agreement, path, ds_name, org_id)
     50 user_name = get_user_name()
     51 if user_name == "public":
---> 52     raise NotLoggedInError()
     53 if user_name == "org_id":
     54     return

NotLoggedInError: This dataset includes an agreement that needs to be accepted before you can use it.
You need to be signed in to accept this agreement.
You can login using 'activeloop login' on the command line if you have an account or using 'activeloop register' if you don't have one.

Expected behavior/code

First, hub shouldn't notify the user that the dataset was successfully loaded if there is an authentication problem preventing the loading of the dataset (see 2nd line printed by the function: hub://activeloop/imagenet-val loaded successfully.)

Second, the exception NotLoggedInError shouldn't be raised when a valid API token is passed to the function; the behavior should be the same as when the user is logged in using the activeloop login CLI command.
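
To make the two expectations concrete, here is a minimal, self-contained sketch of the desired control flow. It is not hub's actual implementation: `user_name_from_token`, `require_login_for_agreement`, and `load_sketch` are hypothetical names, the local `NotLoggedInError` is only a stand-in for hub's exception, and the code assumes the API token is a JWT-style string whose payload carries an `id` claim.

```python
import base64
import json
from typing import Optional


class NotLoggedInError(Exception):
    """Stand-in for hub's exception of the same name (not imported from hub)."""


def user_name_from_token(token: Optional[str]) -> str:
    """Hypothetical helper: read the "id" claim from a JWT-style token.

    Assumption: the token has three dot-separated parts and a JSON payload
    with an "id" field. The signature is NOT verified here; a real client
    would leave verification to the backend.
    """
    if not token:
        return "public"
    parts = token.split(".")
    if len(parts) != 3:
        return "public"
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return "public"
    if not isinstance(claims, dict):
        return "public"
    return claims.get("id", "public")


def require_login_for_agreement(token: Optional[str] = None) -> str:
    """Raise only when neither a token nor a login identifies the user."""
    user_name = user_name_from_token(token)
    if user_name == "public":
        raise NotLoggedInError("You need to be signed in to accept this agreement.")
    return user_name


def load_sketch(path: str, token: Optional[str] = None) -> str:
    """Hypothetical load flow: check the agreement first, report success last."""
    user_name = require_login_for_agreement(token)
    # The success message is printed only after the agreement check passed.
    print(f"{path} loaded successfully.")
    return user_name
```

With this ordering, a call without a usable token raises before any success message is printed, and a call with a token identifies the user without relying on a prior activeloop login on the machine.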

⚙️ Environment

  • Python version(s): 3.8.10
  • OS: Ubuntu 20.04.3 LTS
  • IDE: VS-Code (1.66.2) + Jupyter Lab (3.3.0)
  • Packages: hub==2.3.4 - latest

LucasVandroux, Apr 25 '22 07:04

To reproduce the bug, it is useful to mention that I had never used the CLI command activeloop login on my machine before trying to load the dataset using the API token only.

LucasVandroux, Apr 25 '22 07:04

Hey @LucasVandroux, thanks for bringing up this issue. Our implementation of the ImageNet agreement is still fairly primitive because it identifies individual machines rather than users.

We are updating the implementation in our next sprint to track the agreement based on users, and we will release it in 3 weeks. Do you need this to be fixed sooner?

istranic, Apr 25 '22 12:04

@istranic thank you for your prompt reply. No, this is not urgent. #1612 and #1613 are way more urgent and important to me.

LucasVandroux, Apr 25 '22 13:04

Hey @LucasVandroux, this issue has been resolved. Apologies for the delay in the fix.

istranic, Aug 13 '22 13:08