The AIDojoCoordinator is ending the game incorrectly

Open eldraco opened this issue 9 months ago • 11 comments

I was testing

  • Aidojo scenario 1
  • Defender heuristic-random
  • Attacker qlearning

The problem is that the attacker ends by timeout (correct), but the env sends the defender the message AgentStatus.Success with reward = 99, signaling to the defender that it won. But it didn't: it actually sent only 1 action and the block list is empty.

NSG_coordinator_2.log

2025-04-27 16:04:04  AIDojo-AgentServer INFO Sending response to agent ('127.0.0.1', 39326):
    Status: GameStatus.OK
    End Reason: AgentStatus.TimeoutReached
    To Agent: 127.0.0.1:39326
    Known Networks: 192.168.3.0/24, 192.168.1.0/24, 192.168.2.0/24
    Known Hosts: 192.168.2.2, 192.168.2.4, 192.168.2.6, 192.168.2.5, 213.47.23.195, 192.168.2.1, 192.168.2.3
    Controlled Hosts: 213.47.23.195, 192.168.2.6, 192.168.2.3
    Services on 213.47.23.195: bash, listener
    Services on 192.168.2.3: powershell, ms-wbt-server
    Services on 192.168.2.6: bash
    Services on 192.168.2.4: ssh
    Data on 192.168.2.6: logfile
    Data on 213.47.23.195: logfile
    Blocked Hosts: None
    Reward: -11
    End: True
2025-04-27 16:04:04  AIDojo-AgentServer INFO Sending response to agent ('127.0.0.1', 48786):
    Status: GameStatus.OK
    End Reason: AgentStatus.Success
    To Agent: 127.0.0.1:48786
    Known Networks: 192.168.0.0/24, 192.168.1.0/24, 192.168.3.0/24, 192.168.2.0/24
    Known Hosts: 192.168.2.2, 192.168.1.3, 192.168.1.4, 192.168.1.5, 192.168.2.4, 192.168.1.1, 192.168.2.6, 192.168.2.5, 192.168.2.1, 192.168.1.6, 192.168.1.2, 192.168.2.3
    Controlled Hosts: 192.168.2.2, 192.168.1.3, 192.168.1.4, 192.168.1.5, 192.168.2.4, 192.168.1.1, 192.168.2.6, 192.168.2.5, 192.168.2.1, 192.168.1.6, 192.168.1.2, 192.168.2.3
    Data on 192.168.1.6: logfile
    Blocked Hosts: None
    Reward: 99
    End: True

The full log file of the env is attached here.

To reproduce

Run env as netsecenv

  • Commit: 5facdd4047f6ed04587a31738572dd63329758b6 branch add-logs
  • Config: Config netsecenv 2
  • python3 -m AIDojoCoordinator.worlds.NSEGameCoordinator --task_config=./AIDojoCoordinator/netsecenv_conf.yaml -gp 10002
  • Logs are in NSG_coordinator_2.log

Run defender

  • Commit: 964de6dc9a86cc490454b8aa2fb643eb0959a73c Adding-defender-agent
  • cd NetSecGameAgents
  • python3 agents/defenders/stochastic-random/stochastic_random_agent.py --port 10002 --episodes 100

Run attacker

  • Commit: 964de6dc9a86cc490454b8aa2fb643eb0959a73c Adding-defender-agent
  • Model: https://github.com/stratosphereips/NetSecGameModels
  • cd NetSecGameAgents
  • python3 agents/attackers/q_learning/q_agent.py --episodes 100 --port 10002 --experiment_id 3 --store_actions True --env_conf ../AIDojoCoordinator/netsecenv_conf.yaml --previous_model /data/AIDojo/Models/q_agent_marl.experiment2-episodes-40000.pickle --logdir agents/attackers/q_learning/logs/q_agent_2

eldraco avatar Apr 27 '25 14:04 eldraco

This is expected behavior. The defender should also be penalized for any FP.

ondrej-lukas avatar Apr 28 '25 08:04 ondrej-lukas

We can talk about this, but this should not be expected behavior. The defender cannot win when it didn't win. A win for the defender means blocking all the IPs of the attacker, i.e. fulfilling its goal. This is giving the defender a win when the attacker lost, which is not the same. Note that the defender's blocked IPs can be empty and it is still given a win. Also, this way the defender cannot learn.

eldraco avatar Apr 28 '25 08:04 eldraco

What about this:

  • If the goal of the defender is fulfilled: the defender wins
  • If the goal of the attacker is fulfilled: the attacker wins
  • If the goal of the defender is NOT fulfilled: the defender loses
  • If the goal of the attacker is NOT fulfilled: the attacker loses

Maybe they can both lose or both win simultaneously.

eldraco avatar Apr 28 '25 08:04 eldraco

That is what we had initially, but the problem is in the definition of the goal for the defender. The latest version was the following:

  • The attacker plays with a timeout (max allowed steps)
  • The defender does not have the timeout
  • The episode ends if a) the attacker reaches the goal or b) the attacker reaches the timeout
  • When the episode ends, the rewards are distributed as follows:
  1. Each attacker who reaches the goal gets the goal reward (win)
  2. The defender gets the reward ONLY if NO attacker reached the goal
  3. The defender's reward is lowered for any false positives
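
A minimal sketch of the three reward rules above, for discussion only. The names `AgentResult` and `final_rewards` and the values `GOAL_REWARD` and `FALSE_POSITIVE_PENALTY` are assumptions made for illustration and are not taken from the AIDojoCoordinator code; per-step penalties are left out.

```python
from dataclasses import dataclass

# Illustrative values only; the real rewards come from the task configuration.
GOAL_REWARD = 100
FALSE_POSITIVE_PENALTY = 1


@dataclass
class AgentResult:
    role: str                 # "Attacker" or "Defender"
    reached_goal: bool = False
    false_positives: int = 0  # only meaningful for defenders


def final_rewards(results):
    """Distribute end-of-episode rewards following the three rules above."""
    any_attacker_won = any(
        r.role == "Attacker" and r.reached_goal for r in results.values()
    )
    rewards = {}
    for name, r in results.items():
        if r.role == "Attacker":
            # Rule 1: each attacker that reached the goal gets the goal reward.
            rewards[name] = GOAL_REWARD if r.reached_goal else 0
        else:
            # Rule 2: the defender is rewarded only if no attacker won.
            base = 0 if any_attacker_won else GOAL_REWARD
            # Rule 3: every false positive lowers the defender's reward.
            rewards[name] = base - FALSE_POSITIVE_PENALTY * r.false_positives
    return rewards
```

Note that under these rules the defender's own goal is never consulted: a defender that did nothing still receives the reward whenever the attacker times out.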

ondrej-lukas avatar Apr 28 '25 08:04 ondrej-lukas

That is fine, but some points need clarification. The steps to adjust are:

  • "The defender does not have the timeout". This cannot be true anymore. It has already happened to me that when the attacker is stopped due to timeout, the defender keeps playing forever and the env does not stop it. Solution: at the start of every episode, check that the correct number and type of agents are present. If not, stop the game for everyone.
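
A rough sketch of that start-of-episode check, assuming the env can list the role of every connected agent. `REQUIRED_ROLES`, `roster_is_complete` and `start_episode` are hypothetical names, and the required roster shown is only an example.

```python
from collections import Counter

# Hypothetical roster for a scenario; a real configuration may differ.
REQUIRED_ROLES = {"Attacker": 1, "Defender": 1}


def roster_is_complete(connected_roles, required=REQUIRED_ROLES):
    """Return True if the connected agents match the required count per role."""
    return Counter(connected_roles) == Counter(required)


def start_episode(connected_roles):
    """Decide what to do at the start of an episode."""
    if not roster_is_complete(connected_roles):
        # Wrong number or type of agents: stop the game for everyone.
        return "stop"
    return "start"
```

For example, `start_episode(["Defender"])` returns `"stop"` because the attacker is missing, so the defender is not left playing alone.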

eldraco avatar Apr 28 '25 08:04 eldraco

  • "The defender gets the reward ONLY if NO attacker reached the goal". This is true, but it should be "The defender gets the reward ONLY if NO attacker reached the goal AND the defender reached its goal".

We always need to check the goal of the defender; if not, it cannot learn!
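
As a sketch of the amended rule, assuming the defender's goal is to block every IP of the attacker. The function name `defender_wins` and the way attacker IPs are passed in are assumptions, not the coordinator's API.

```python
def defender_wins(any_attacker_reached_goal, attacker_ips, blocked_ips):
    """Defender wins only if no attacker won AND every attacker IP is blocked."""
    attacker_ips = set(attacker_ips)
    blocked_ips = set(blocked_ips)
    # An empty block list can never satisfy the defender's goal.
    defender_reached_goal = bool(attacker_ips) and attacker_ips <= blocked_ips
    return (not any_attacker_reached_goal) and defender_reached_goal
```

With the episode from the log (the attacker timed out, Blocked Hosts: None), `defender_wins(False, {"213.47.23.195"}, set())` is `False`, so the defender would not be sent a win.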

eldraco avatar Apr 28 '25 08:04 eldraco

For the first one: if that is the case, there is an error in the stopping of the episode. Whenever a timeout is reached, the environment should check whether there is any other active player that can still reach the timeout (another attacker). If not, it should terminate the episode. Do you know in which setup this happened? I will try to replicate it.
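
The intended check could be sketched roughly like this. `PlayState` and `episode_should_end` are hypothetical names used for illustration, not the coordinator's own types.

```python
from enum import Enum


class PlayState(Enum):
    """Hypothetical per-agent state; not the coordinator's own enum."""
    PLAYING = "playing"
    TIMED_OUT = "timed_out"
    REACHED_GOAL = "reached_goal"


def episode_should_end(agents):
    """agents maps an agent id to a (role, PlayState) pair.

    Only attackers have a step limit, so the episode ends as soon as
    no attacker is still playing, regardless of what defenders do.
    """
    return not any(
        role == "Attacker" and state is PlayState.PLAYING
        for role, state in agents.values()
    )
```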

ondrej-lukas avatar Apr 28 '25 08:04 ondrej-lukas

What do you mean setup?

Check the logs and confs in the description of this issue on top. That should be all the data.

eldraco avatar Apr 28 '25 08:04 eldraco

So from the logs above:

This is the attacker's action that leads to the timeout:

2025-04-27 16:01:50 AIDojo-GameCoordinator INFO Coordinator received from agent ('127.0.0.1', 39326): {"action_type": "ActionType.FindServices", "parameters": {"target_host": {"ip": "213.47.23.195"}, "source_host": {"ip": "192.168.2.6"}}}.
2025-04-27 16:01:50 AIDojo-GameCoordinator INFO Updating log file in host outside_node
2025-04-27 16:01:50 AIDojo-GameCoordinator INFO Agent ('127.0.0.1', 39326)('QAgent', 'Attacker') reached timeout (25 steps).

Afterwards, the coordinator receives the next action from the defender:

2025-04-27 16:04:04 AIDojo-GameCoordinator INFO Coordinator received from agent ('127.0.0.1', 48786): {"action_type": "ActionType.FindData", "parameters": {"target_host": {"ip": "192.168.1.6"}, "source_host": {"ip": "192.168.1.6"}}}.

which triggers the final rewards assignment of the episode:

2025-04-27 16:04:04 AIDojo-GameCoordinator INFO Stopping episode for ('127.0.0.1', 48786) because the is no ACTIVE agent playing.
2025-04-27 16:04:04 AIDojo-GameCoordinator INFO Episode finished. Assigning final rewards to agents.

So the issue is that the defender's action should not be processed, or at least not included in the defender's new GameState. It matters for the computation of the FP, as it will not influence the attacker anymore (the attacker is in timeout and is waiting for the final reward for the episode).

I propose we split this into 2 issues:

  1. The episode termination
  2. The reward assignment

ondrej-lukas avatar Apr 28 '25 09:04 ondrej-lukas

I think yes. The defender is not stopped and given a win reward. I agree with the solution

eldraco avatar Apr 28 '25 09:04 eldraco

It is stopped and given a reward:

2025-04-27 16:04:04 AIDojo-AgentServer INFO Sending response to agent ('127.0.0.1', 48786): {"to_agent": ["127.0.0.1", 48786], "observation": {"state": {"known_networks": [{"ip": "192.168.0.0", "mask": 24}, {"ip": "192.168.1.0", "mask": 24}, {"ip": "192.168.3.0", "mask": 24}, {"ip": "192.168.2.0", "mask": 24}], "known_hosts": [{"ip": "192.168.2.2"}, {"ip": "192.168.1.3"}, {"ip": "192.168.1.4"}, {"ip": "192.168.1.5"}, {"ip": "192.168.2.4"}, {"ip": "192.168.1.1"}, {"ip": "192.168.2.6"}, {"ip": "192.168.2.5"}, {"ip": "192.168.2.1"}, {"ip": "192.168.1.6"}, {"ip": "192.168.1.2"}, {"ip": "192.168.2.3"}], "controlled_hosts": [{"ip": "192.168.2.2"}, {"ip": "192.168.1.3"}, {"ip": "192.168.1.4"}, {"ip": "192.168.1.5"}, {"ip": "192.168.2.4"}, {"ip": "192.168.1.1"}, {"ip": "192.168.2.6"}, {"ip": "192.168.2.5"}, {"ip": "192.168.2.1"}, {"ip": "192.168.1.6"}, {"ip": "192.168.1.2"}, {"ip": "192.168.2.3"}], "known_services": {}, "known_data": {"192.168.1.6": [{"owner": "system", "id": "logfile", "size": 70, "type": "log", "content": "[{"source_host": "192.168.1.6", "action_type": "ActionType.FindData"}]"}]}, "known_blocks": {}}, "reward": 99, "end": true, "info": {"end_reason": "AgentStatus.Success"}}, "status": "GameStatus.OK"}

But this happens 1 step later. I'll open an issue for this. The end-of-episode check for no active player should also be done BEFORE each step.
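
A sketch of what checking before each step could look like. The function `process_step` and its parameters are hypothetical and only illustrate the ordering, not the coordinator's actual step handling.

```python
def process_step(attackers_playing, defender_state, action, apply_action):
    """Return (new_state, episode_ended) for one defender action.

    The end-of-episode check runs first, so an action that arrives after
    the last attacker stopped is dropped instead of changing the state.
    """
    if attackers_playing == 0:
        # No active attacker: end the episode without applying the action.
        return defender_state, True
    return apply_action(defender_state, action), False
```

With this ordering, the defender's late FindData action from the log would not be applied to its state, so it would not count toward the FP computation.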

ondrej-lukas avatar Apr 28 '25 09:04 ondrej-lukas