tuned icon indicating copy to clipboard operation
tuned copied to clipboard

Expose TuneD API to the Unix Domain Socket.

Open CZerta opened this issue 3 years ago • 9 comments

TuneD is listening on paths specified in config in option unix_socket_paths and send signals to paths in option unix_socket_signal_paths. Example call:

ssock_file = "/tmp/tuned_socket" csock_file = "/tmp/tuned_socket_client" if os.path.exists(csock_file): os.remove(csock_file) s=socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) s.bind(csock_file) try: s.sendto(pickle.dumps(("switch_profile", "balanced")), ssock_file) print(pickle.loads(s.recvfrom(4096)[0])) except Exception as e: print("E: %s" % e) s.close()

This PR also introduce possibility to disable dbus API in main TuneD config.

Resolves: rhbz#2113900

Signed-off-by: Jan Zerdik [email protected]

CZerta avatar Oct 26 '22 14:10 CZerta

This pull request introduces 3 alerts when merging 9e71402032e4c262602db7d8a44c77aced56fed7 into 420267fa3463ef4a06885847c7ca4cd24c781dc8 - view on LGTM.com

new alerts:

  • 3 for Unused import

lgtm-com[bot] avatar Oct 26 '22 14:10 lgtm-com[bot]

This pull request introduces 3 alerts when merging 1fa7493c6db7135d251ac61cc9111884debda753 into 420267fa3463ef4a06885847c7ca4cd24c781dc8 - view on LGTM.com

new alerts:

  • 3 for Unused import

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine :gear: that powers LGTM.com. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] avatar Nov 21 '22 09:11 lgtm-com[bot]

During testing, I noticed I can DoS TuneD over the socket like this:

printf '\x00\x00\x00\x03' | nc -u -U /run/tuned/tuned.sock
printf 'DoS' | nc -u -U /run/tuned/tuned.sock
2022-11-22 11:07:55,987 ERROR    tuned.exports.unix_socket_exporter: Failed to load json data 'b'DoS'': Expecting value: line 1 column 1 (char 0)
Exception in thread Thread-1 (_thread_code):
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/tuned/exports/unix_socket_exporter.py", line 185, in period_check
    data = json.loads(data.decode("utf-8"))
  File "/usr/lib64/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/site-packages/tuned/daemon/daemon.py", line 221, in _thread_code
    exports.period_check()
  File "/usr/lib/python3.10/site-packages/tuned/exports/__init__.py", line 46, in period_check
    return ctl.period_check()
  File "/usr/lib/python3.10/site-packages/tuned/exports/controller.py", line 54, in period_check
    exporter.period_check()
  File "/usr/lib/python3.10/site-packages/tuned/exports/unix_socket_exporter.py", line 188, in period_check
    self._send_data(self._socket_object, address, "err", "Failed to load json data '%s': %s" % (data, e))
  File "/usr/lib/python3.10/site-packages/tuned/exports/unix_socket_exporter.py", line 131, in _send_data
    s.sendto(struct.pack("!sL", type.encode("utf-8"), len(j)), address)
FileNotFoundError: [Errno 2] No such file or directory

While the client should be trusted and send valid input, it would be nice if the server (TuneD) wouldn't fall over that easily.

jmencak avatar Nov 22 '22 10:11 jmencak

...

While the client should be trusted and send valid input, it would be nice if the server (TuneD) wouldn't fall over that easily.

@jmencak Did you pull the newest version? I added some tries around json loading yesterday, so it should handle invalid requests better.

CZerta avatar Nov 22 '22 10:11 CZerta

Did you pull the newest version? I added some tries around json loading yesterday, so it should handle invalid requests better.

Yes, I did, you cannot reproduce this?

jmencak avatar Nov 22 '22 10:11 jmencak

It would also help documenting the API a bit more. It looks like I cannot send both the payload length header and JSON payload in one go (over one sendto call).

jmencak avatar Nov 22 '22 11:11 jmencak

You can, I'll write better example. I found out what's problem with the exception. ncat does not wait for response, deletes it's socket before we process the request and try to send response to non existing file, so it fails.

CZerta avatar Nov 22 '22 11:11 CZerta

You can, I'll write better example.

Great, in the end though, we'll probably need a golang client. I managed to write one.

I found out what's problem with the exception. ncat does not wait for response, deletes it's socket before we process the request and try to send response to non existing file, so it fails

Yeah, I can see that now. However, that's no reason for the server to fail like that. Imagine the client doesn't care about the response like the netcat example or just crashes before getting a response back. That's a valid use case for me in my point of view and the server should not fail.

jmencak avatar Nov 24 '22 08:11 jmencak

@jmencak Definitely, I added warning log when sending msg fails with reason. Also changed default path and permissions and added ownership option.

CZerta avatar Nov 24 '22 10:11 CZerta

Definitely, I added warning log when sending msg fails with reason. Also changed default path and permissions and added ownership option.

@CZerta Have you tried testing with socat? There are basically two native tools we can use in our environment with this. nc and socat. nc seems to work for sending, but I do not believe it is possible to receive data through it at the same time. socat seems a lot more complex tool that might be able to do this, but at the moment I'm only able to do the same thing as with nc -- just send data:

socat -ddd - UNIX-CONNECT:/run/tuned/tuned.sock

Edit: additional info. With HAProxy socket, for example, the interaction is as simple as:

echo show env | socat -ddd - UNIX-CONNECT:/var/lib/haproxy/run/haproxy.sock

jmencak avatar Dec 13 '22 09:12 jmencak

Good news, sort of. I was able to make this work with socat:

send() {
  local data="$1"
  local len=${#data}
  local b0=$(printf '%02x' ${len})	# TODO: this is limited to the payload length of 255 B
  local b1=00
  local b2=00
  local b3=00
  local socket=/run/tuned/tuned.sock

  printf "\x$b3\x$b2\x$b1\x$b0" | socat - UNIX-SENDTO:$socket,bind=/tmp/client.sock
  printf "$data" | socat - UNIX-SENDTO:$socket,bind=/tmp/client.sock
}

send '["profiles"]'

However, it seems not to always succeed. One example of a failure from TuneD:

2022-12-14 16:55:00,507 ERROR    tuned.exports.unix_socket_exporter: Failed to load data of message: [Errno 11] Resource temporarily unavailable

This needs to be 100% reliable and working with netcat or socat. I wonder if we should switch from SOCK_DGRAM to SOCK_STREAM as HAProxy does, for example, for their stats socket.

/cc @MarSik

jmencak avatar Dec 14 '22 16:12 jmencak

Thank you for the latest commit, Honzo! This is great progress not having to use the payload length and just:

printf '["active_profile"]' | nc -U /run/tuned/tuned.sock  # or
printf '["active_profile"]' | socat - UNIX-CONNECT:/run/tuned/tuned.sock

The default socat call often fails to output any result (with WARNING tuned.exports.unix_socket_exporter: Failed send data '<profile-name>' type 'res': [Errno 32] Broken pipe in TuneD logs). This is due to client disconnects, where nc seems to wait (longer?) for a response, whereas the default 0.5s timeout for socat is just not sufficient for this server-side implementation.

printf '["active_profile"]' | socat -t5 - UNIX-CONNECT:/run/tuned/tuned.sock

seems to work every time.

Testing against HAProxy implementation I had no such issues. So at this point I'm wondering whether:

  • it is worth/possible/practical improving the server-side (TuneD) implementation to improve the client (user) experience
  • we should just document this

I think I'm probably fine with the latter. @MarSik , thoughts?

jmencak avatar Jan 17 '23 09:01 jmencak

Reporting a traceback during testing:

Traceback (most recent call last):
  File "/usr/sbin/tuned", line 98, in <module>
    app.run(args.daemon)
  File "/usr/lib/python3.6/site-packages/tuned/daemon/application.py", line 214, in run
    result = self._controller.run()
  File "/usr/lib/python3.6/site-packages/tuned/daemon/controller.py", line 65, in run
    exports.period_check()
  File "/usr/lib/python3.6/site-packages/tuned/exports/__init__.py", line 46, in period_check
    return ctl.period_check()
  File "/usr/lib/python3.6/site-packages/tuned/exports/controller.py", line 54, in period_check
    exporter.period_check()
  File "/usr/lib/python3.6/site-packages/tuned/exports/unix_socket_exporter.py", line 162, in period_check
    r, _, _ = select.select([self._socket_object], (), (), 0)
TypeError: argument must be an int, or have a fileno() method.

So far I've seen it only once

jmencak avatar Jan 18 '23 10:01 jmencak

Thank you for the latest commit, Honzo! This is great progress not having to use the payload length and just:

printf '["active_profile"]' | nc -U /run/tuned/tuned.sock  # or
printf '["active_profile"]' | socat - UNIX-CONNECT:/run/tuned/tuned.sock

The default socat call often fails to output any result (with WARNING tuned.exports.unix_socket_exporter: Failed send data '<profile-name>' type 'res': [Errno 32] Broken pipe in TuneD logs). This is due to client disconnects, where nc seems to wait (longer?) for a response, whereas the default 0.5s timeout for socat is just not sufficient for this server-side implementation.

printf '["active_profile"]' | socat -t5 - UNIX-CONNECT:/run/tuned/tuned.sock

seems to work every time.

Testing against HAProxy implementation I had no such issues. So at this point I'm wondering whether:

* it is worth/possible/practical improving the server-side (TuneD) implementation to improve the client (user) experience

* we should just document this

I think I'm probably fine with the latter. @MarSik , thoughts?

Can add different logging, but I don't think socket allow to check if other side is still connected.

CZerta avatar Jan 18 '23 16:01 CZerta

Reporting a traceback during testing:

Traceback (most recent call last):
  File "/usr/sbin/tuned", line 98, in <module>
    app.run(args.daemon)
  File "/usr/lib/python3.6/site-packages/tuned/daemon/application.py", line 214, in run
    result = self._controller.run()
  File "/usr/lib/python3.6/site-packages/tuned/daemon/controller.py", line 65, in run
    exports.period_check()
  File "/usr/lib/python3.6/site-packages/tuned/exports/__init__.py", line 46, in period_check
    return ctl.period_check()
  File "/usr/lib/python3.6/site-packages/tuned/exports/controller.py", line 54, in period_check
    exporter.period_check()
  File "/usr/lib/python3.6/site-packages/tuned/exports/unix_socket_exporter.py", line 162, in period_check
    r, _, _ = select.select([self._socket_object], (), (), 0)
TypeError: argument must be an int, or have a fileno() method.

So far I've seen it only once

Could you replicate it or dis it happen again? I couldn't replicate it

CZerta avatar Jan 18 '23 17:01 CZerta

So the good news is that after the latest commit I can no longer see the trace above and that the requests are being served reliably. However, one per second. Is this expected?

jmencak avatar Jan 19 '23 18:01 jmencak

So the good news is that after the latest commit I can no longer see the trace above and that the requests are being served reliably. However, one per second. Is this expected?

You're right, we can do better.

CZerta avatar Jan 20 '23 09:01 CZerta

... we can do better.

To give you some sense of what is expected. On a 112 CPU r750, with the default 50 QPS I created 100 pods in parallel. They were all PodScheduled within 1s delta and the average "startup" time between PodScheduled and Ready was 5.19s . I may try tweaking the QPS, but I doubt it will have much significance in reducing the startup times. I used CPUManager static policy and all containers were assigned to its own CPU.

jmencak avatar Jan 20 '23 09:01 jmencak

Thank you for the changes, Honzo! I was able to test them with success, however, it needs a bit more work:

printf "DoS" | nc --send-only -U /run/tuned/tuned.sock
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/tuned/exports/unix_socket_exporter.py", line 220, in period_check
    data = json.loads(data)
  File "/usr/lib64/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/tuned/exports/unix_socket_exporter.py", line 140, in _send_data
    s.send(json.dumps(data).encode("utf-8"))
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/sbin/tuned", line 98, in <module>
    app.run(args.daemon)
  File "/usr/lib/python3.10/site-packages/tuned/daemon/application.py", line 215, in run
    result = self._controller.run()
  File "/usr/lib/python3.10/site-packages/tuned/daemon/controller.py", line 65, in run
    exports.period_check()
  File "/usr/lib/python3.10/site-packages/tuned/exports/__init__.py", line 46, in period_check
    return ctl.period_check()
  File "/usr/lib/python3.10/site-packages/tuned/exports/controller.py", line 54, in period_check
    exporter.period_check()
  File "/usr/lib/python3.10/site-packages/tuned/exports/unix_socket_exporter.py", line 223, in period_check
    self._send_data(conn, self._create_error_responce(-32700, "Parse error", str(e)))
  File "/usr/lib/python3.10/site-packages/tuned/exports/unix_socket_exporter.py", line 142, in _send_data
    log.warning("Failed to send data '%s' type '%s': %s" % (data, e))
TypeError: not enough arguments for format string

Good catch, should be corrected

CZerta avatar Feb 01 '23 09:02 CZerta