Issue running the simulation examples.
I would like to run some simulations with the FedML library.
I was previously able to install fedml locally, but I hit the following error:
python3 torch_fedavg_cifar10_resnet56_step_by_step_example.py --cf fedml_config.yaml
======== FedML (https://fedml.ai) ========
FedML version: 0.7.222
Execution path:/home/lianke/.local/lib/python3.8/site-packages/fedml/__init__.py
======== Running Environment ========
OS: Linux-5.15.0-41-generic-x86_64-with-glibc2.29
Hardware: x86_64
Python version: 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0]
PyTorch version: 1.12.0+cu102
No protocol specified
MPI4py is installed
======== CPU Configuration ========
The CPU usage is : 3%
Available CPU Memory: 57.1 G / 62.460079193115234G
======== GPU Configuration ========
No GPU devices
Traceback (most recent call last):
File "torch_fedavg_cifar10_resnet56_step_by_step_example.py", line 6, in <module>
args = fedml.init()
File "/home/lianke/.local/lib/python3.8/site-packages/fedml/__init__.py", line 87, in init
raise Exception("no such setting")
Exception: no such setting
I do not know why it cannot find my RTX 3090 GPU. nvidia-smi runs successfully and shows the card.
If I run the example under the python/examples/simulation/sp_fedavg_cifar10_cnn_example directory, I hit a different issue:
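Since nvidia-smi only confirms the driver side, it may help to ask PyTorch directly what it can see from the same interpreter that runs the example. The snippet below is a minimal sketch, not part of FedML; the helper name `describe_torch_cuda` is made up for illustration. The log above reports PyTorch 1.12.0+cu102, and an Ampere card such as the RTX 3090 generally needs a CUDA 11 build, so the `built_with_cuda` field may be worth a look.

```python
import importlib.util


def describe_torch_cuda():
    """Report what PyTorch can see (hypothetical helper, not a FedML API)."""
    # Guard the import so the script still runs where torch is not installed.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the installed wheel was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


report = describe_torch_cuda()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `cuda_available` is False here while nvidia-smi shows the card, the problem is likely in the PyTorch/CUDA installation rather than in FedML.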
_test_local = 1000
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:51] [INFO] [efficient_loader.py:380:efficient_load_partition_data_cifar10] client_idx = 98, local_sample_number = 542
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:51] [INFO] [efficient_loader.py:389:efficient_load_partition_data_cifar10] client_idx = 98, batch_num_train_local = 54, batch_num_test_local = 1000
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:51] [INFO] [efficient_loader.py:380:efficient_load_partition_data_cifar10] client_idx = 99, local_sample_number = 395
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:51] [INFO] [efficient_loader.py:389:efficient_load_partition_data_cifar10] client_idx = 99, batch_num_train_local = 39, batch_num_test_local = 1000
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:51] [INFO] [model_hub.py:23:create] create_model. model_name = cnn, output_dim = 10
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:52] [INFO] [fedavg_api.py:42:__init__] model = LogisticRegression(
(linear): Linear(in_features=784, out_features=10, bias=True)
)
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:52] [INFO] [fedavg_api.py:52:__init__] self.model_trainer = <fedml.simulation.sp.fedavg.my_model_trainer_classification.MyModelTrainer object at 0x7f0c183e2610>
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:52] [INFO] [fedavg_api.py:68:_setup_clients] ############setup_clients (START)#############
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:52] [INFO] [fedavg_api.py:80:_setup_clients] ############setup_clients (END)#############
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:52] [INFO] [fedavg_api.py:83:train] self.model_trainer = <fedml.simulation.sp.fedavg.my_model_trainer_classification.MyModelTrainer object at 0x7f0c183e2610>
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:06:52] [ERROR] [mlops_runtime_log.py:32:handle_exception] Uncaught exception
Traceback (most recent call last):
File "torch_fedavg_cifar10_cnn_step_by_step_example.py", line 19, in <module>
fedml_runner.run()
File "/home/lianke/.local/lib/python3.8/site-packages/fedml/runner.py", line 100, in run
self.runner.run()
File "/home/lianke/.local/lib/python3.8/site-packages/fedml/simulation/simulator.py", line 56, in run
self.fl_trainer.train()
File "/home/lianke/.local/lib/python3.8/site-packages/fedml/simulation/sp/fedavg/fedavg_api.py", line 85, in train
mlops.log_training_status(mlops.ClientStatus.MSG_MLOPS_CLIENT_STATUS_TRAINING)
AttributeError: module 'fedml.mlops' has no attribute 'ClientStatus'
I then followed this tutorial: https://doc.fedml.ai/simulation/examples/sp_fedavg_mnist_lr_example.html, but I cannot even install fedml within a conda environment:
Collecting fedml
Using cached fedml-0.7.221.tar.gz (422 kB)
Preparing metadata (setup.py) ... done
Using cached fedml-0.7.220.tar.gz (422 kB)
Preparing metadata (setup.py) ... done
Using cached fedml-0.7.218.tar.gz (419 kB)
Preparing metadata (setup.py) ... done
Using cached fedml-0.7.216.tar.gz (395 kB)
Preparing metadata (setup.py) ... done
Using cached fedml-0.7.0.tar.gz (301 kB)
Preparing metadata (setup.py) ... done
Collecting joblib
Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting grpcio
Using cached grpcio-1.47.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB)
Collecting mpi4py
Using cached mpi4py-3.1.3.tar.gz (2.5 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
ERROR: Cannot install fedml==0.7.0, fedml==0.7.216, fedml==0.7.218, fedml==0.7.220, fedml==0.7.221 and fedml==0.7.222 because these package versions have conflicting dependencies.
The conflict is caused by:
fedml 0.7.222 depends on MNN==1.1.6
fedml 0.7.221 depends on MNN==1.1.6
fedml 0.7.220 depends on MNN==1.1.6
fedml 0.7.218 depends on MNN==1.1.6
fedml 0.7.216 depends on MNN==1.1.6
fedml 0.7.0 depends on MNN==1.1.6
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
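One detail in the log above: the grpcio wheel is tagged cp310, which suggests the conda environment runs Python 3.10, whereas the local install that worked used Python 3.8.10. Whether that is what makes MNN==1.1.6 unresolvable is not confirmed here; it is only an assumption worth checking. A minimal sketch for printing the wheel tag of the active interpreter (the helper name `cpython_tag` is made up for illustration):

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Return the CPython wheel tag, e.g. 'cp38', for a version tuple."""
    major, minor = version_info[0], version_info[1]
    return f"cp{major}{minor}"


# Tag of the interpreter that is running this script.
print("active interpreter tag:", cpython_tag())

# Tags for the two interpreters that appear in the logs above.
print("local install:", cpython_tag((3, 8, 10)))
print("conda env (assumed from the grpcio wheel):", cpython_tag((3, 10, 0)))
```

If the tags differ, creating the conda environment with the same Python version as the working local install is one thing to try.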
Has anyone run into this problem before?
Please change this "single_process" value to "sp".
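For anyone applying this to an already loaded config, the change can be sketched as a small search-and-replace over a nested dictionary. This is an illustration only: the key names in the example config are assumptions, not taken from this thread, and the helper `replace_value` is hypothetical rather than part of FedML.

```python
def replace_value(config, old="single_process", new="sp"):
    """Replace every value equal to `old` with `new`; return the paths changed."""
    changed = []

    def walk(node, path):
        if isinstance(node, dict):
            items = list(node.items())
        elif isinstance(node, list):
            items = list(enumerate(node))
        else:
            return
        for key, value in items:
            here = path + [str(key)]
            if value == old:
                node[key] = new  # in-place update of the matching entry
                changed.append(".".join(here))
            else:
                walk(value, here)

    walk(config, [])
    return changed


# Example config; the section and key names are assumed for illustration.
example_config = {
    "common_args": {"training_type": "simulation"},
    "comm_args": {"backend": "single_process"},
}

changed_paths = replace_value(example_config)
print("updated:", changed_paths)
print(example_config)
```

Scanning by value rather than by key avoids guessing where the setting lives in a given fedml_config.yaml.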
After that change, I get the following error:
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:12:24] [INFO] [fedavg_api.py:52:__init__] self.model_trainer = <fedml.simulation.sp.fedavg.my_model_trainer_classification.MyModelTrainer object at 0x7fd66805d190>
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:12:24] [INFO] [fedavg_api.py:68:_setup_clients] ############setup_clients (START)#############
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:12:24] [INFO] [fedavg_api.py:80:_setup_clients] ############setup_clients (END)#############
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:12:24] [INFO] [fedavg_api.py:83:train] self.model_trainer = <fedml.simulation.sp.fedavg.my_model_trainer_classification.MyModelTrainer object at 0x7fd66805d190>
[FedML-Server(0) @device-id-0] [Tue, 26 Jul 2022 15:12:24] [ERROR] [mlops_runtime_log.py:32:handle_exception] Uncaught exception
Traceback (most recent call last):
File "torch_fedavg_cifar10_resnet56_step_by_step_example.py", line 19, in <module>
fedml_runner.run()
File "/home/lianke/.local/lib/python3.8/site-packages/fedml/runner.py", line 100, in run
self.runner.run()
File "/home/lianke/.local/lib/python3.8/site-packages/fedml/simulation/simulator.py", line 56, in run
self.fl_trainer.train()
File "/home/lianke/.local/lib/python3.8/site-packages/fedml/simulation/sp/fedavg/fedavg_api.py", line 85, in train
mlops.log_training_status(mlops.ClientStatus.MSG_MLOPS_CLIENT_STATUS_TRAINING)
AttributeError: module 'fedml.mlops' has no attribute 'ClientStatus'
Please try our latest version, 0.7.272.