How to share data between different workers?

Open megachester opened this issue 5 years ago • 18 comments

Suppose I've done some preprocessing on raw data and want to train several models via grid search. Tasks assigned to the same worker, which has multiple GPUs (gpu: 1 in config), completed without problems. However, a task assigned to another worker with a GPU failed because the data was not present at the path specified in the catalyst config. So how do I share data from previous tasks of the DAG between different workers? Is the data folder in ROOT_FOLDER synchronised across workers?

megachester avatar Mar 05 '20 18:03 megachester

@megachester , Earlier there were 2 sync folders: data and models.

They were synchronized after each task.

Now, we have turned off the synchronization of the data folder temporarily.

I think this should be configurable in the project settings editor.

I will think about that tomorrow. If you want to turn it on right away, you can change the lines to

            ignore_folders = [
                [join('data', project.name), []],
                [join('models', project.name), []],
            ]

at https://github.com/catalyst-team/mlcomp/blob/master/mlcomp/worker/sync.py#L103

lightforever avatar Mar 05 '20 18:03 lightforever

Besides that, you can either use rsync or perform mlcomp sync COMPUTER_NAME command. That command expects the name of another computer.

lightforever avatar Mar 05 '20 18:03 lightforever

OK, thanks. Do you mean syncing manually between tasks, before the ones that fail?

megachester avatar Mar 06 '20 12:03 megachester

Suppose I've done some preprocessing on raw data and want to train several models via grid search.

If the preprocessing is performed only once, it could be easy to sync the computers manually (via rsync, for example) after the preprocessing tasks, i.e. to split the one DAG in two.

Then run the grid search (train tasks). By that point, every machine has the up-to-date files.

Of course, if you are experimenting with different preprocessing versions, auto-sync is crucial. I will make it configurable today or tomorrow.
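
The manual approach described above can be sketched roughly as follows. This is a minimal sketch, not part of mlcomp: the helper names, the host names, and the folder paths are all hypothetical, and it assumes rsync is installed and the workers are reachable over SSH. The relative remote path is resolved against the remote user's home directory.

```python
import subprocess
from pathlib import Path


def build_rsync_command(local_dir, remote_host, remote_dir):
    """Build an rsync command that mirrors local_dir into remote_dir on remote_host."""
    # Trailing slash on the source: copy the directory's contents, not the directory itself.
    source = str(Path(local_dir).expanduser()) + "/"
    destination = f"{remote_host}:{remote_dir}"
    # -a: archive mode (recursive, keeps permissions and timestamps); -z: compress in transit.
    return ["rsync", "-az", source, destination]


def sync_to_workers(local_dir, remote_dir, hosts, dry_run=True):
    """Push local_dir to every host; with dry_run=True only return the commands."""
    commands = [build_rsync_command(local_dir, host, remote_dir) for host in hosts]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if rsync exits with an error.
            subprocess.run(command, check=True)
    return commands
```

For example, `sync_to_workers("~/mlcomp/data/my_project", "mlcomp/data/my_project", ["worker-2", "worker-3"])` returns the two commands without running them; pass `dry_run=False` between the preprocessing DAG and the grid-search DAG to actually copy the files.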

lightforever avatar Mar 06 '20 13:03 lightforever

Great!

megachester avatar Mar 06 '20 13:03 megachester

Besides that, you can either use rsync or perform mlcomp sync COMPUTER_NAME command. That command expects the name of another computer.

Btw, it expects the name of the project, not the computer name.

megachester avatar Mar 06 '20 15:03 megachester

Yes, you are right!

mlcomp 20.2.4d is released.

File sync is configurable: https://catalyst-team.github.io/mlcomp/filesync.html

What do you think about that functionality?

lightforever avatar Mar 07 '20 14:03 lightforever

Great functionality, thanks! However, auto sync doesn't work: the folders don't sync. Also, there are overall problems with the 20.3 version, e.g. somehow there is a catalyst folder along with catalyst_ in mlcomp/worker/executors, which causes mlcomp to import catalyst functions from the wrong paths (pre-20.3 refactored catalyst.utils.config).

megachester avatar Mar 13 '20 14:03 megachester

Thank you for letting me know!

I will fix that problem ASAP.

lightforever avatar Mar 13 '20 18:03 lightforever

About auto-sync:

  1. Have you ensured that every computer can reach others?

  2. Have you tried running the rsync commands manually to check point 1?

  3. Are there any messages in the Logs panel related to rsync commands?
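
A quick way to check point 1 from each machine is sketched below. It assumes rsync runs over SSH on port 22, which is an assumption about the setup rather than something stated in this thread; the function names are hypothetical. It only tests that a TCP connection can be opened, not that SSH keys or rsync itself are configured correctly.

```python
import socket


def can_reach(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def unreachable_hosts(hosts, port=22):
    """Return the hosts from the given list that do not accept connections on port."""
    return [host for host in hosts if not can_reach(host, port)]
```

Running `unreachable_hosts([...])` with the other workers' host names on every computer should return an empty list if each machine can reach the others.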

lightforever avatar Mar 13 '20 18:03 lightforever

  1. Yes, definitely.
  2. Yep, I’ve executed the exact commands from the installation guide.
  3. No, I didn’t notice any.

(Manual mlcomp sync PROJ_NAME works)

megachester avatar Mar 13 '20 21:03 megachester

Also, there are overall problems with the 20.3 version, e.g. somehow there is a catalyst folder along with catalyst_ in mlcomp/worker/executors, which causes mlcomp to import catalyst functions from the wrong paths (pre-20.3 refactored catalyst.utils.config).

The repository does not have a catalyst folder right now, only catalyst_. You could remove the installed version and try installing mlcomp again.

lightforever avatar Mar 16 '20 14:03 lightforever

About auto-sync:

  1. Have you filled in the sync folders in your project settings?

  2. Has any task finished? I mean, auto-sync runs only after successfully executed tasks.

lightforever avatar Mar 16 '20 15:03 lightforever

Can you please provide a full pipeline to test this functionality?

  • Should sync folders: [data, models] be in MLComp's YAML config file?
  • How do I reference files in these folders from local executors, e.g. ./data/myfile?

megachester avatar Mar 16 '20 16:03 megachester

In the Project tab choose Edit your project

The folders are synced after successful tasks. Or you can sync them manually from the UI as described here: https://catalyst-team.github.io/mlcomp/filesync.html

./data/myfile - yeah, data and models are automatically linked to the appropriate folders. So, yes, data/myfile is enough.

lightforever avatar Mar 16 '20 18:03 lightforever

~/mlcomp/tasks/TASK_NUMBER - mlcomp downloads the files of your DAG here and links ~/mlcomp/data/PROJECT_NAME to ~/mlcomp/tasks/TASK_NUMBER/data.
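
To illustrate why a relative path such as data/myfile is enough, here is a small sketch that recreates the described layout in a temporary directory. It only mimics the folder structure from the comment above and is not mlcomp's own code; the project name, task number, and helper names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def build_layout(root, project_name, task_number):
    """Recreate the described folder layout under root and return the task folder."""
    project_data = Path(root) / "data" / project_name
    task_dir = Path(root) / "tasks" / str(task_number)
    project_data.mkdir(parents=True)
    task_dir.mkdir(parents=True)
    # tasks/TASK_NUMBER/data is a symlink to the shared data/PROJECT_NAME folder.
    os.symlink(project_data, task_dir / "data", target_is_directory=True)
    return task_dir


def read_from_task(task_dir, relative_path):
    """Read a file using a path relative to the task folder, as an executor would."""
    return (Path(task_dir) / relative_path).read_text()


with tempfile.TemporaryDirectory() as root:
    task_dir = build_layout(root, "my_project", 42)
    # Write into the shared project folder ...
    (Path(root) / "data" / "my_project" / "myfile").write_text("preprocessed")
    # ... and read it back through the task-local relative path.
    content = read_from_task(task_dir, "data/myfile")
    print(content)  # prints "preprocessed"
```

Because the task folder only holds a link, a file is visible to a task on a given worker only if it already exists in that worker's data/PROJECT_NAME folder, which is why the sync step matters.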

lightforever avatar Mar 16 '20 18:03 lightforever

@megachester, have you managed to sort that out? Does it work as expected?

lightforever avatar Mar 26 '20 17:03 lightforever

Unfortunately, it didn't work: no files appear automatically in ~/mlcomp/data/PROJECT_NAME. They show up only after syncing manually via mlcomp sync PROJECT_NAME.

megachester avatar Apr 08 '20 10:04 megachester