
[INVESTIGATION] Why does multi-process mode allow adding multiple nodes with the same name (label)?

Open • OmaymaMahjoub opened this issue 3 years ago • 1 comment

What do you want to investigate?

In mava/systems/jax/launcher.py, the launcher allows adding multiple nodes with the same name in multi-process mode, but does not allow it in single-process (non-multi-process) mode.

In the case of multi-process:

  • We are free to give the nodes any name we want.
  • We can add two nodes with the same name using the launcher.add method. In this case the new node is appended to the existing group (launcher._program._groups[label].append(node)), because launcher.add calls launchpad's add_node(node_type(node_fn, *arguments)). For example, in the snippet below, groups["parameter_server_test"] ends up holding two nodes with different functionality:
# Both nodes are registered under the same label, "parameter_server_test".
parameter_server_1 = launcher.add(
    mock_parameter_server_fn,
    node_type=NodeType.courier,
    name="parameter_server_test",
)
parameter_server_2 = launcher.add(
    mock_parameter_server_second_fn,
    node_type=NodeType.courier,
    name="parameter_server_test",
)

  • We are not allowed to call the launcher.get_nodes method.
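To make the grouping behaviour concrete, here is a minimal, self-contained model of it. It is not launchpad's or Mava's code: GroupedProgram and its add_node(node, label) signature are hypothetical, and the two mock functions only stand in for node factories. The point it illustrates is that storing nodes in a per-label list, as launcher._program._groups[label].append(node) does, never rejects a repeated label.

```python
from collections import defaultdict
from typing import Any, Dict, List


class GroupedProgram:
    """Hypothetical stand-in for a launchpad-style program (not the real API).

    Nodes are stored in lists keyed by label, so re-using a label appends a
    new node to the existing group instead of replacing or rejecting it.
    """

    def __init__(self) -> None:
        self._groups: Dict[str, List[Any]] = defaultdict(list)

    def add_node(self, node: Any, label: str) -> Any:
        # No check for an existing label: duplicates are simply appended.
        self._groups[label].append(node)
        return node


# Mock node factories; here the functions themselves play the role of nodes.
def mock_parameter_server_fn() -> str:
    return "first parameter server"


def mock_parameter_server_second_fn() -> str:
    return "second parameter server"


program = GroupedProgram()
program.add_node(mock_parameter_server_fn, label="parameter_server_test")
program.add_node(mock_parameter_server_second_fn, label="parameter_server_test")

group = program._groups["parameter_server_test"]
print(len(group))  # 2
print([fn() for fn in group])  # ['first parameter server', 'second parameter server']
```

In this model one label can map to several nodes, so any lookup by name alone has to decide which node to return; that ambiguity is one possible reason a name-based get_nodes is harder to support in the multi-process case.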

In the case of single-process:

  • Node names are predefined, and we cannot use any name other than ["data_server", "parameter_server", "executor", "evaluator", "trainer"].
  • We cannot add two nodes with the same name (label) via launcher.add, because of the following checks:
# Excerpt from the single-process branch of launcher.add.
if name not in self._node_dict:
    raise ValueError(
        f"{name} is not a valid node name. "
        + "Single process currently only supports "
        + f"nodes named: {list(self._node_dict.keys())}"
    )
elif self._node_dict[name] is not None:
    raise ValueError(
        f"Node named {name} initialised more than once. "
        + "Single process currently only supports one node per type."
    )

process = node_fn(*arguments)
if node_type == lp.ReverbNode:
    # Assigning server to self to keep it alive.
    self._replay_server = reverb.Server(process, port=None)
    process = reverb.Client(f"localhost:{self._replay_server.port}")
self._nodes.append(process)
self._node_dict[name] = process
return process
  • We are allowed to call the launcher.get_nodes method.
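For comparison, the single-process checks can be modelled as a registry with exactly one slot per allowed name. This is a hypothetical simplification, not Mava's implementation: SingleProcessRegistry is an invented name, the ReverbNode / reverb.Server handling is left out, and get_nodes here just returns the stored processes.

```python
from typing import Any, Callable, Dict, List, Optional

# Node names accepted in single-process mode, as listed in this issue.
ALLOWED_NAMES = ["data_server", "parameter_server", "executor", "evaluator", "trainer"]


class SingleProcessRegistry:
    """Hypothetical model of the single-process checks in launcher.add."""

    def __init__(self) -> None:
        # One slot per allowed name; None means "no node added yet".
        self._node_dict: Dict[str, Optional[Any]] = {n: None for n in ALLOWED_NAMES}
        self._nodes: List[Any] = []

    def add(self, node_fn: Callable[..., Any], *arguments: Any, name: str) -> Any:
        # Reject names outside the predefined set.
        if name not in self._node_dict:
            raise ValueError(
                f"{name} is not a valid node name. "
                f"Single process currently only supports nodes named: "
                f"{list(self._node_dict.keys())}"
            )
        # Reject a second node under an already-used name.
        if self._node_dict[name] is not None:
            raise ValueError(
                f"Node named {name} initialised more than once. "
                "Single process currently only supports one node per type."
            )
        process = node_fn(*arguments)
        self._nodes.append(process)
        self._node_dict[name] = process
        return process

    def get_nodes(self) -> List[Any]:
        # Each name holds at most one node, so this list is unambiguous.
        return self._nodes


registry = SingleProcessRegistry()
registry.add(lambda: "trainer process", name="trainer")

errors = []
for bad_name in ["trainer", "parameter_server_test"]:
    try:
        registry.add(lambda: "extra process", name=bad_name)
    except ValueError as err:
        errors.append(str(err))

print(registry.get_nodes())  # ['trainer process']
print(len(errors))  # 2
```

Both rejected calls fail before node_fn is invoked, so no extra process is ever created; the duplicate "trainer" hits the one-node-per-type check, and "parameter_server_test" hits the valid-name check.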

Definition of done

A joint decision is reached and the changes (if any) are made.

OmaymaMahjoub, Jul 20 '22 09:07

Should be closed by https://github.com/instadeepai/Mava/issues?q=single @OmaymaMahjoub

AsadJeewa, Aug 15 '22 12:08