unstract icon indicating copy to clipboard operation
unstract copied to clipboard

feat: Pipeline data migration to correct frequent cron strings

Open chandrasekharan-zipstack opened this issue 1 year ago • 6 comments

What

  • Data migration to correct frequent cron strings which have a frequency of less than an hour

Why

  • Since no limits were placed initially, this migration is needed to correct existing pipelines

How

  • Iterate through all pipelines
  • For the ones that have a cron string, check if its valid
  • In case of invalid ones, correct the minute field to be a single minute alone
  • Save the pipeline and call the function to update the public tables managed by celery-beat (period_task, crontab)

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • Yes, since its a data migration on existing pipelines

Database Migrations

  • Data migration on the pipeline app (0005)

Related Issues or PRs

  • #652

Notes on Testing

  • Populated system with some invalid cron strings (commented out the error raising part for this)
  • Ran the migration and checked the DB, able to run these changed tasks as well

Screenshots


  1. pipeline_pipeline before migration image

  2. django_celery_beat_crontabschedule before migration (might contain some stray records for previous runs, ignore those - need to be addressed separately) image

  3. pipeline_pipeline after migration image

  4. django_celery_beat__crontabschedule after migration (notice the new rows added - from 27) image

  5. django_celery_beat__periodictask after migration (only refers to the new rows added in the above table) image

  6. In application after migration image

Output during migration


(backend-3.9) *[feat/tool-container-name-validation][~/zipstuff/unstract/backend]$ python manage.py migrate pipeline 0005                                                          [4:47:18] 
WARNING : [2024-09-17 11:24:13,082] [public:None]{module:register process:565701 thread:140245695654784 request_id:none} :- No connector found.
WARNING : [2024-09-17 11:24:14,060] [public:None]{module:register process:565701 thread:140245695654784 request_id:none} :- Unable to import embedding adapters : No module named 'llama_index.embeddings.huggingface'
WARNING : [2024-09-17 11:24:14,161] [public:None]{module:register process:565701 thread:140245695654784 request_id:none} :- Unable to import embedding adapters : No module named 'llama_index.embeddings.fastembed'
WARNING : [2024-09-17 11:24:16,806] [public:None]{module:register process:565701 thread:140245695654784 request_id:none} :- Unable to import vectorDB adapters : No module named 'llama_index.vector_stores.supabase'
/home/chandru/zipstuff/unstract/backend/.venv/lib/python3.9/site-packages/environs/__init__.py:58: DeprecationWarning: The '__version_info__' attribute is deprecated and will be removed in in a future version. Use feature detection or 'packaging.Version(importlib.metadata.version("marshmallow")).release' instead.
  _SUPPORTS_LOAD_DEFAULT = ma.__version_info__ >= (3, 13)
INFO : [2024-09-17 11:24:17,695] [public:None]{module:execution_log_utils process:565701 thread:140245695654784 request_id:none} :- Log consumer scheduler updated successfully.
WARNING : [2024-09-17 11:24:17,706] [public:None]{module:authentication_plugin_registry process:565701 thread:140245695654784 request_id:none} :- Metadata is not active for auth_sample authentication module.
WARNING : [2024-09-17 11:24:17,706] [public:None]{module:authentication_plugin_registry process:565701 thread:140245695654784 request_id:none} :- No authentication modules found.Application will start without authentication module
INFO : [2024-09-17 11:24:17,707] [public:None]{module:subscription_loader process:565701 thread:140245695654784 request_id:none} :- Loaded subscription plugin: time_trials, is_active: True
INFO : [2024-09-17 11:24:17,739] [public:None]{module:helper process:565701 thread:140245695654784 request_id:none} :- Loaded tools from registry YAML: ['docker:unstract/tool-classifier:1.0.29']
INFO : [2024-09-17 11:24:17,765] [public:None]{module:subscription_loader process:565701 thread:140245695654784 request_id:none} :- Loaded subscription plugin: time_trials, is_active: True
INFO : [2024-09-17 11:24:17,787] [public:None]{module:processor_loader process:565701 thread:140245695654784 request_id:none} :- No processor plugins found.
[standard:public] === Starting migration
[standard:public] Operations to perform:
[standard:public]   Target specific migration: 0005_remove_crons_within_hour, from pipeline
[standard:public] Running migrations:
[standard:public]   Applying pipeline.0005_remove_crons_within_hour...
[standard:public]  OK
[1/2 (50%) standard:mock_org] === Starting migration
[1/2 (50%) standard:mock_org] Operations to perform:
[1/2 (50%) standard:mock_org]   Target specific migration: 0005_remove_crons_within_hour, from pipeline
[1/2 (50%) standard:mock_org] Running migrations:
[1/2 (50%) standard:mock_org]   Applying pipeline.0005_remove_crons_within_hour...
INFO : [2024-09-17 11:24:18,309] [mock_org:None]{module:0005_remove_crons_within_hour process:565701 thread:140245695654784 request_id:none} :- Updating pipeline 97290806-cecd-4142-acf2-0a8a0425d9f5 which has cron 6-11 * * * *
INFO : [2024-09-17 11:24:18,311] [mock_org:None]{module:helper process:565701 thread:140245695654784 request_id:none} :- Scheduling job for Pipeline object (97290806-cecd-4142-acf2-0a8a0425d9f5)
INFO : [2024-09-17 11:24:18,362] [mock_org:None]{module:tasks process:565701 thread:140245695654784 request_id:none} :- Updated periodic task 97290806-cecd-4142-acf2-0a8a0425d9f5: 6 * * * * (m/h/dM/MY/d) UTC
INFO : [2024-09-17 11:24:18,362] [mock_org:None]{module:0005_remove_crons_within_hour process:565701 thread:140245695654784 request_id:none} :- Updating pipeline c7967ad5-9de6-471e-afb4-d195fc6192a2 which has cron 1,6,11,16,21,26,31,36,41,46,56 * * * *

INFO : [2024-09-17 11:24:18,364] [mock_org:None]{module:helper process:565701 thread:140245695654784 request_id:none} :- Scheduling job for Pipeline object (c7967ad5-9de6-471e-afb4-d195fc6192a2)
INFO : [2024-09-17 11:24:18,386] [mock_org:None]{module:tasks process:565701 thread:140245695654784 request_id:none} :- Updated periodic task c7967ad5-9de6-471e-afb4-d195fc6192a2: 1 * * * * (m/h/dM/MY/d) UTC
INFO : [2024-09-17 11:24:18,386] [mock_org:None]{module:0005_remove_crons_within_hour process:565701 thread:140245695654784 request_id:none} :- Updating pipeline 17b37980-4ec6-4e67-9893-785bbeb2dbf9 which has cron 0-1,11-12,21 * * * *
INFO : [2024-09-17 11:24:18,388] [mock_org:None]{module:helper process:565701 thread:140245695654784 request_id:none} :- Scheduling job for Pipeline object (17b37980-4ec6-4e67-9893-785bbeb2dbf9)
INFO : [2024-09-17 11:24:18,410] [mock_org:None]{module:tasks process:565701 thread:140245695654784 request_id:none} :- Updated periodic task 17b37980-4ec6-4e67-9893-785bbeb2dbf9: 0 * * * * (m/h/dM/MY/d) UTC
INFO : [2024-09-17 11:24:18,410] [mock_org:None]{module:0005_remove_crons_within_hour process:565701 thread:140245695654784 request_id:none} :- Updating pipeline 700042d0-d5f0-4650-b696-7bdc1e010cf0 which has cron 5,7-9 * * * *
INFO : [2024-09-17 11:24:18,412] [mock_org:None]{module:helper process:565701 thread:140245695654784 request_id:none} :- Scheduling job for Pipeline object (700042d0-d5f0-4650-b696-7bdc1e010cf0)
INFO : [2024-09-17 11:24:18,430] [mock_org:None]{module:tasks process:565701 thread:140245695654784 request_id:none} :- Updated periodic task 700042d0-d5f0-4650-b696-7bdc1e010cf0: 5 * * * * (m/h/dM/MY/d) UTC
INFO : [2024-09-17 11:24:18,430] [mock_org:None]{module:0005_remove_crons_within_hour process:565701 thread:140245695654784 request_id:none} :- Updating pipeline dda1aaca-c1a2-4413-86b3-9fd830503b4f which has cron * * * * *
INFO : [2024-09-17 11:24:18,432] [mock_org:None]{module:helper process:565701 thread:140245695654784 request_id:none} :- Scheduling job for Pipeline object (dda1aaca-c1a2-4413-86b3-9fd830503b4f)
INFO : [2024-09-17 11:24:18,452] [mock_org:None]{module:tasks process:565701 thread:140245695654784 request_id:none} :- Updated periodic task dda1aaca-c1a2-4413-86b3-9fd830503b4f: 0 * * * * (m/h/dM/MY/d) UTC
[1/2 (50%) standard:mock_org]  OK
[2/2 (100%) standard:org_Q4qgjLWIbaJlfSts] === Starting migration
[2/2 (100%) standard:org_Q4qgjLWIbaJlfSts] Operations to perform:
[2/2 (100%) standard:org_Q4qgjLWIbaJlfSts]   Target specific migration: 0005_remove_crons_within_hour, from pipeline
[2/2 (100%) standard:org_Q4qgjLWIbaJlfSts] Running migrations:
[2/2 (100%) standard:org_Q4qgjLWIbaJlfSts]   Applying pipeline.0005_remove_crons_within_hour...
[2/2 (100%) standard:org_Q4qgjLWIbaJlfSts]  OK

Checklist

I have read and understood the Contribution Guidelines.

This data migration involves updates to a tenant table pipeline_pipeline and to few public tables managed by celery-beat. Unsure if we'll run into locking related concerns when run for multiple orgs, could be tested when we take it to staging

@chandrasekharan-zipstack Please check the new sonar issue related to variable name.

Deepak-Kesavan avatar Sep 19 '24 23:09 Deepak-Kesavan

filepath function $$\textcolor{#23d18b}{\tt{passed}}$$ SUBTOTAL
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_logs}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_client\_init}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_run\_container}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$ $$\textcolor{#23d18b}{\tt{9}}$$ $$\textcolor{#23d18b}{\tt{9}}$$

github-actions[bot] avatar Sep 19 '24 23:09 github-actions[bot]

@chandrasekharan-zipstack Please check the new sonar issue related to variable name.

@Deepak-Kesavan @hari-kuriakose do you think this is something that can be ignored? This variable is intentionally named that way to improve readability and remain in-line with django's query model

@chandrasekharan-zipstack LGTM overall.

However since typically there won't be a path for reverse data migration and we don''t know all the possible existing values for involved columns, suggest to separate out the transformation logic into a separate function and add unit tests for the same.

We can follow this in future to increase the test coverage as much as we can.

@hari-kuriakose The conversion that happens here is only limited to this migration. Also this migration would be a one-time activity and its impact has been analyzed here. Since we have other priority items of the v2 migration, I prefer not investing time to write unit tests for the function(s) used in this migration. If you feel otherwise, will be happy to include some tests once I have some bandwidth

@hari-kuriakose since this migration is part of the V1 version - do you think these changes are required? Can we close this PR assuming that all existing frequently deployed pipelines in prod were already corrected and we already ensure that we don't allow new frequent pipelines to come in

@hari-kuriakose since this migration is part of the V1 version - do you think these changes are required? Can we close this PR assuming that all existing frequently deployed pipelines in prod were already corrected and we already ensure that we don't allow new frequent pipelines to come in

Closing this since these data migrations only apply to the deprecated v1 version