
suppress_autotime to allow `created_at` and `updated_at` Annotation timestamps to persist in import operation

Open mpesavento opened this issue 10 months ago • 2 comments

Is your feature request related to a problem? Please describe.

When I use storage sync (e.g. S3) to import tasks to a project, the Annotation importer overwrites the existing created_at and updated_at timestamps for each annotation.

The JSON task saved in S3 looks like this:

{
  "annotations": [
  {
    "id": 4206,
    "created_at": "2025-02-12T22:24:30.974817Z",
    "updated_at": "2025-02-12T22:24:30.974829Z",
    "created_username": " [email protected], 3",
    "completed_by": 3,
    "result": [{...}]
  }]
}

showing the annotation was created on 2025-02-12.

When I sync the S3 tasks to the Label Studio project, the task content shows:

{
  "annotations": [
  {
    "id": 6292,  
    "created_username": " [email protected], 3",
    "created_ago": "4 minutes",
    "completed_by": {
      "id": 3,
      "first_name": "",
      "last_name": "",
      "avatar": null,
      "email": "[email protected]",
      "initials": "cp"
    },
    "was_cancelled": false,
    "ground_truth": false,
    "created_at": "2025-03-25T16:31:13.558753Z",
    "updated_at": "2025-03-25T16:31:13.558766Z",
    "draft_created_at": null,
    "lead_time": null,
    "import_id": null,
    "last_action": null,
    "task": 44118,
    "project": 8,
    "updated_by": 1,
    "parent_prediction": null,
    "parent_annotation": null,
    "last_created_by": null,
    "result": [{...}]
  }]
}

with the sync time as the created_at and updated_at timestamps.

There are many circumstances in which a project built from hand-written JSON tasks, with annotations carried over from prior projects, would benefit from keeping the original timestamps, e.g. sorting annotations by when they were actually created.

Describe the solution you'd like

I would like the timestamps of imported Annotations to persist, whether they are imported via SDK/API or via storage.

core/old_ls_migration.py has a suppress_autotime context manager for this purpose. https://github.com/HumanSignal/label-studio/blob/develop/label_studio/core/old_ls_migration.py#L21-L36 To avoid a circular import in the solution, I made a copy of this context manager in label_studio/core/context_processors.py (which seemed a reasonable location).
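For readers who have not opened the linked file, here is a minimal sketch of the general pattern such a context manager follows. It is written from the description above and is not copied from the repository: it temporarily clears the `auto_now` / `auto_now_add` flags on the named fields and restores them afterwards. `FakeAnnotation` is a hypothetical stand-in for a Django model class, used only so the example runs without Django.

```python
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def suppress_autotime(model, field_names):
    """Temporarily switch off auto_now / auto_now_add on the named fields."""
    saved = {}
    for field in model._meta.local_fields:
        if field.name in field_names:
            saved[field.name] = (field.auto_now, field.auto_now_add)
            field.auto_now = False
            field.auto_now_add = False
    try:
        yield
    finally:
        # Restore the original flags even if the body raised.
        for field in model._meta.local_fields:
            if field.name in saved:
                field.auto_now, field.auto_now_add = saved[field.name]


# Hypothetical stand-in for a Django model class (assumption, not Label Studio code).
FakeAnnotation = SimpleNamespace(
    _meta=SimpleNamespace(
        local_fields=[
            SimpleNamespace(name="created_at", auto_now=False, auto_now_add=True),
            SimpleNamespace(name="updated_at", auto_now=True, auto_now_add=False),
        ]
    )
)

with suppress_autotime(FakeAnnotation, ["created_at", "updated_at"]):
    # Inside the block, a save() would keep whatever timestamps were supplied.
    flags_inside = [(f.auto_now, f.auto_now_add) for f in FakeAnnotation._meta.local_fields]

flags_after = [(f.auto_now, f.auto_now_add) for f in FakeAnnotation._meta.local_fields]
```

One caveat with this pattern: the field objects live on the model class, so the flags are changed for the whole process while the block is active, not just for the calling thread.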

I've made a working update to label_studio.io_storages.base_models.ImportStorage that uses the suppress_autotime context manager when importing via S3. https://gist.github.com/mpesavento/cc23be6a963ee642aaa8650016e68d81#file-base_models-py-L361-L403 At the top of the file is a constant that reads an environment variable to enable suppression of autotime.
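The environment-variable gate could look roughly like the sketch below. The variable name `EXAMPLE_SUPPRESS_AUTOTIME` and both helper functions are hypothetical, chosen for illustration; the gist may use different names and parsing rules.

```python
import contextlib
import os


def env_flag(name, default=False):
    """Parse a boolean-like environment variable (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def autotime_guard(enabled, make_guard):
    """Return make_guard() when enabled, otherwise a no-op context manager."""
    return make_guard() if enabled else contextlib.nullcontext()


# Hypothetical variable name; read once at import time, like a module constant.
SUPPRESS_AUTOTIME = env_flag("EXAMPLE_SUPPRESS_AUTOTIME")
```

The import code can then wrap its save call in `with autotime_guard(SUPPRESS_AUTOTIME, lambda: ...):`, passing a callable that builds the real context manager, so the default behavior is unchanged when the variable is unset.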

This solution works for all storage imports, but has not been tested for SDK/API imports. I'm not sure where best to put it for that path, since I don't know where the Annotation object is created during the API import call. Ideally the solution would cover every place where Annotation.save() is called. The challenge is that AnnotationSerializer.save() is now called on lists of annotations, and I'm not sure how the context manager would work with the serializer interface.

Describe alternatives you've considered

I looked for other places to put the suppress_autotime context manager, and only found ImportStorage. Someone who knows the codebase better may find a better location, though the solution would likely remain the same.

Testing whether the context manager works with AnnotationSerializer() calls is another required step; if it does, the code could be simplified a bit.

mpesavento avatar Mar 27 '25 17:03 mpesavento

Hello,

Thank you for contacting Label Studio,

I am checking internally with our engineering team and we will get back to you as soon as possible!

Best Regards!

Comment by Oussama Assili Workflow Run

heidi-humansignal avatar Apr 04 '25 11:04 heidi-humansignal

Hello,

Thank you for your patience while we looked into this.

Currently, importing created_at, updated_at, and other system-managed fields for annotations is not officially supported. These fields are intentionally auto-managed by the system to maintain consistency and traceability across annotations, which is why the import process overwrites them with the sync timestamp. That said, a practical workaround would be to include any historical metadata, including the original timestamps, in the task data. While this won’t affect how Label Studio internally sets the annotation timestamps, it will preserve the original information for reference, filtering, or display purposes if needed.
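A minimal sketch of that workaround, applied to each task JSON before it is written to storage. The helper name, the `original_annotation_times` key, and the `"text"` data field are assumptions for illustration; any key that does not collide with the fields your labeling config reads should do.

```python
import copy


def stash_annotation_timestamps(task, key="original_annotation_times"):
    """Return a copy of task with annotation timestamps copied into task['data']."""
    out = copy.deepcopy(task)
    stamps = [
        {
            "id": ann.get("id"),  # id from the source project, not the re-imported one
            "created_at": ann.get("created_at"),
            "updated_at": ann.get("updated_at"),
        }
        for ann in out.get("annotations", [])
    ]
    out.setdefault("data", {})[key] = stamps
    return out


# Example task; the "data" content is made up, the annotation mirrors the issue above.
task = {
    "data": {"text": "example"},
    "annotations": [
        {
            "id": 4206,
            "created_at": "2025-02-12T22:24:30.974817Z",
            "updated_at": "2025-02-12T22:24:30.974829Z",
            "completed_by": 3,
            "result": [],
        }
    ],
}
prepared = stash_annotation_timestamps(task)
```

After import, the annotation's own created_at and updated_at will still show the sync time, but the original values remain readable from the task data.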

Best Regards!

Comment by Oussama Assili Workflow Run

heidi-humansignal avatar Apr 17 '25 18:04 heidi-humansignal