Migrate hd_monitor.py to ROS2
Hi there!
As part of my work at Heudiasyc, I'd like to migrate some common diagnostics to ROS2, including hd_monitor.py.
I plan on submitting a Pull Request based on #94 (file) when the time is right but first I have several questions:
- I am unsure why the previous PR on the matter (#94) was closed. Seeing that #242 was merged, is it still OK for me to migrate this?
- `hddtemp` is a system daemon used to retrieve the HDD temperature. It is discontinued and won't be shipped with bookworm or jammy according to the Debian bug tracker. What should we do going forward? From what I scavenged, we have only three options:
    - Use the `sensors` command from the `lm-sensors` package or `psutil.sensors_temperatures`. However, it is not possible with these tools to differentiate between HDD temperatures and other sensors. For example, here is the output on my system, which has an SSD. The last entries are for the CPU, the rest is... something else:

            acpitz-acpi-0
            Adapter: ACPI interface
            temp1:        +27.8°C  (crit = +105.0°C)
            temp2:        +29.8°C  (crit = +105.0°C)

            coretemp-isa-0000
            Adapter: ISA adapter
            Package id 0:  +29.0°C  (high = +80.0°C, crit = +100.0°C)
            Core 0:        +28.0°C  (high = +80.0°C, crit = +100.0°C)
            Core 1:        +26.0°C  (high = +80.0°C, crit = +100.0°C)
            Core 2:        +28.0°C  (high = +80.0°C, crit = +100.0°C)
            Core 3:        +30.0°C  (high = +80.0°C, crit = +100.0°C)

    - Go the S.M.A.R.T. way with `smartctl`, whose output can easily be parsed and could open the door to more checks. However, this requires `sudo` powers or at least for users to be in the `disk` group (though according to this article it does not even work). I believe these operations are too cumbersome for the average use of this package, which brings me to the final possibility.
    - Drop this check altogether. IMHO temperature is not the most important variable about storage, and using other sources of temperature information (e.g. CPU, motherboard) is more relevant.
- Disk usage is currently only checked if a home directory is provided, and only for the corresponding disk. Wouldn't it be a better approach to provide diagnostics about the usage of all (physical) disks unless specified or blacklisted? This would match the behavior of the temperature checks and better match what I would expect from a hard drive monitor.
- What is the `diag_hostname` argument used for? It only seems to be used for the diag name. In particular, what cases is having separate `hostname` and `diag_hostname` meant to cover?
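
For reference, the `smartctl` option mentioned above could look roughly like the sketch below. It assumes smartmontools 7.0 or later (for the `--json` flag) and that the report exposes the reading under `temperature.current`; the function names are hypothetical and not part of `hd_monitor.py`. The call still needs the privileges discussed above, so the sketch simply returns `None` when no temperature can be read.

```python
import json
import subprocess
from typing import Optional


def parse_smartctl_temperature(raw_json: str) -> Optional[int]:
    """Return the current drive temperature in °C, or None if not reported.

    Assumes the JSON layout of `smartctl --json`, where the reading is
    stored under the "temperature" -> "current" keys.
    """
    report = json.loads(raw_json)
    temperature = report.get("temperature")
    if not isinstance(temperature, dict):
        return None
    return temperature.get("current")


def read_drive_temperature(device: str) -> Optional[int]:
    """Query `device` (e.g. /dev/sda) with smartctl; hypothetical helper."""
    try:
        # smartctl uses its exit status as a bitmask, so a non-zero
        # value is not treated as a failure here.
        result = subprocess.run(
            ["smartctl", "--json", "-a", device],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # smartctl is not installed on this machine.
        return None
    try:
        return parse_smartctl_temperature(result.stdout)
    except json.JSONDecodeError:
        # Output was not valid JSON (e.g. an older smartctl version).
        return None
```

Keeping the parsing separate from the subprocess call means the parsing can be tested without root access or a physical drive.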
If any user or maintainer has at least partial answers, I'd be glad to hear and discuss them :)
Hi @limaanto
Thanks for your interest in working on this.
- I was not involved with that, but I think the main point was that the PR was unmanageably big, so nobody was against the porting per se. I think porting the hd_monitor is a good idea.
- From porting the NTP monitor, I learned that finding the right dependency is probably the hardest task in this. I don't have an overview of HDD tools, but it would be ideal if you found something that is also well supported under RHEL. Looking for existing rosdep keys is also a good idea. I agree that the temperature check could be skipped if it is not readily available. If someone needs it, please open an issue.
- I guess the partition of the home folder is the most important one, because it could cause boot issues. So I would suggest having a list of disks to monitor and, by default, populating it with just the one containing the home folder.
- I agree, one hostname is enough.
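
A minimal sketch of the disk list suggestion, using only the Python standard library. The names `default_paths` and `check_disks` are hypothetical, and how the list would be exposed (for example as a ROS2 parameter) is left open here.

```python
import os
import shutil
from typing import Dict, Iterable, List, Optional


def default_paths() -> List[str]:
    """Default list of monitored paths: only the user's home folder."""
    return [os.path.expanduser("~")]


def usage_percent(path: str) -> float:
    """Return the used space, in percent, of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a total size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def check_disks(paths: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Map each monitored path to its usage; falls back to the home folder."""
    if paths is None:
        paths = default_paths()
    return {path: usage_percent(path) for path in paths}
```

Calling `check_disks()` with no argument monitors only the home folder's partition, while `check_disks(["/", "/data"])` would cover an explicit list (`/data` being an illustrative path).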
Hi @ct2034 Thank you for your thorough answer :) I agree with your points. I will provide a merge request in the following days/weeks.