Deal with issues when the disk is full

Open patrickelectric opened this issue 2 years ago • 11 comments

Check: https://github.com/bluerobotics/BlueOS/issues/2327, https://github.com/bluerobotics/BlueOS/issues/2323, https://github.com/bluerobotics/BlueOS/issues/2326, https://github.com/bluerobotics/BlueOS/issues/1015

Docker is able to start, but everything after that results in unstable behavior. Some of the points you suggested are already tracked as issues; others are relevant for recovering the system.

  • [x] We should delete old logs when rotating them if we notice that the disk is almost full.
  • [ ] We should stop logging if the disk space is almost full.
    • This conflicts with rotation configuration in loguru
  • [ ] We should clean up old dockers that are not being used.
  • [ ] We should clean up old docker artifacts that are not being used.
  • [ ] We should allow user to delete all unused docker images.
  • [ ] We should warn the user through Cockpit that the companion computer's disk is almost full.
  • [ ] We should erase older tlog or bin files if the disk is almost full.
    • #1257
  • [ ] We should warn the user through the BlueOS header that the disk is almost full and in a critical state.
  • [ ] We may not allow the user to arm the vehicle if the disk is almost full.
  • [ ] We may do some of these steps automatically to try to recover the system once it starts.
  • [ ] We may need a page like filelight on BlueOS to help identify the root of such problems.
  • [x] We should limit journald max size
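
Several items above depend on deciding when the disk counts as "almost full" or "critical". A minimal sketch of such a check, using only the Python standard library, could look like the following. The threshold values and the names `DiskState`, `classify_usage`, and `disk_state` are assumptions made for illustration; they are not taken from BlueOS.

```python
import shutil
from enum import Enum


class DiskState(Enum):
    OK = "ok"
    WARNING = "warning"    # "almost full": warn the user
    CRITICAL = "critical"  # e.g. stop logging or block arming


# Hypothetical thresholds, expressed as the fraction of the disk in use.
WARNING_USED_FRACTION = 0.85
CRITICAL_USED_FRACTION = 0.95


def classify_usage(total_bytes: int, used_bytes: int) -> DiskState:
    """Map raw usage numbers to a coarse state."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_fraction = used_bytes / total_bytes
    if used_fraction >= CRITICAL_USED_FRACTION:
        return DiskState.CRITICAL
    if used_fraction >= WARNING_USED_FRACTION:
        return DiskState.WARNING
    return DiskState.OK


def disk_state(path: str = "/") -> DiskState:
    """Classify the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return classify_usage(usage.total, usage.used)
```

A service could poll `disk_state("/")` periodically and use the result to drive the warnings and recovery steps listed above.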

Originally posted by @patrickelectric in https://github.com/bluerobotics/BlueOS/issues/2325#issuecomment-1902117968

patrickelectric avatar Jan 22 '24 13:01 patrickelectric

I know it might be trickier to manage the installation, but another valid strategy is to put /var in another partition.

joaoantoniocardoso avatar Jan 22 '24 16:01 joaoantoniocardoso

#2359

patrickelectric avatar Feb 01 '24 20:02 patrickelectric

I installed one extension (Nortek Nucleus) that grew the Docker log to 18 GB in about a week. Maybe Kraken could limit the size of the Docker logs:

https://docs.docker.com/config/containers/logging/configure/#configure-the-default-logging-driver

--log-opt max-size=100m
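
As a rough sketch of how the same limit could be applied programmatically, the helpers below build the log settings in the two shapes Docker accepts: the `LogConfig` object of a container's `HostConfig` in the Engine API, and the `log-driver`/`log-opts` keys of `daemon.json`. How Kraken actually creates containers is not shown in this thread, so the function names and the `max-file` default are assumptions.

```python
import re

# A positive integer followed by an optional unit suffix (k, m or g).
_SIZE_PATTERN = re.compile(r"^[1-9]\d*[kmg]?$")


def _validate(max_size: str, max_file: int) -> None:
    if not _SIZE_PATTERN.match(max_size):
        raise ValueError(f"invalid max-size: {max_size!r}")
    if max_file < 1:
        raise ValueError("max_file must be at least 1")


def json_file_log_config(max_size: str = "100m", max_file: int = 3) -> dict:
    """Per-container form, placed under HostConfig.LogConfig."""
    _validate(max_size, max_file)
    # Docker expects every log option value to be a string.
    return {
        "Type": "json-file",
        "Config": {"max-size": max_size, "max-file": str(max_file)},
    }


def daemon_json_fragment(max_size: str = "100m", max_file: int = 3) -> dict:
    """Daemon-wide default, merged into /etc/docker/daemon.json."""
    _validate(max_size, max_file)
    return {
        "log-driver": "json-file",
        "log-opts": {"max-size": max_size, "max-file": str(max_file)},
    }
```

Note that a daemon-wide default only applies to containers created after the daemon is restarted; existing containers keep the settings they were created with.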

voorloopnul avatar Apr 30 '24 09:04 voorloopnul

I've just freed ~12 GB here by doing:

sudo docker system prune -a  # ~9 GB
sudo journalctl --vacuum-time=2d  # ~1 GB
sudo apt-get clean  # ~1 GB

joaoantoniocardoso avatar May 10 '24 19:05 joaoantoniocardoso

@joaoantoniocardoso how did we end up with 1 GB of unnecessary stuff in our apt cache?

patrickelectric avatar May 10 '24 19:05 patrickelectric

> I've just freed ~12 GB here by doing:
>
> sudo docker system prune -a  # ~9 GB
> sudo journalctl --vacuum-time=2d  # ~1 GB
> sudo apt-get clean  # ~1 GB

The docker prune is especially important, as there are A LOT of leftover overlays that hang around forever. It would be good to run it automatically, or at least to put a button in BlueOS that does it.
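
A minimal sketch of what such a button or scheduled job could run is shown below. It wraps the three commands quoted above; the only change is `-f` on `docker system prune`, which skips the interactive confirmation. The function name and the dry-run default are assumptions, and the commands are assumed to run with root privileges.

```python
import shlex
import subprocess
from typing import List

# The commands quoted above; they need root privileges to do anything useful.
CLEANUP_COMMANDS = [
    ["docker", "system", "prune", "-a", "-f"],  # -f: do not ask for confirmation
    ["journalctl", "--vacuum-time=2d"],
    ["apt-get", "clean"],
]


def run_cleanup(dry_run: bool = True) -> List[str]:
    """Run each cleanup command and return them as printable strings.

    With dry_run=True nothing is executed, which makes it possible to show
    the user what would happen before committing to it.
    """
    planned = []
    for command in CLEANUP_COMMANDS:
        planned.append(shlex.join(command))
        if not dry_run:
            # check=False: one failing step should not prevent the others.
            subprocess.run(command, check=False)
    return planned
```

Because `docker system prune -a` removes every image that no container is using, anything that must survive the cleanup has to be protected before this is automated.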

rafaellehmkuhl avatar May 10 '24 20:05 rafaellehmkuhl

> @joaoantoniocardoso how did we end up with 1 GB of unnecessary stuff in our apt cache?

Maybe I've installed many things on mine, but it'd be good to check how it is in a fresh install.

joaoantoniocardoso avatar May 11 '24 16:05 joaoantoniocardoso

We are discussing this subject a bit in our project, as we run robots 24/7, and they could be running for several days, maybe even weeks without restarting/power cycling.

I haven't dug into this very thoroughly, but I think it would be very nice to have some kind of parameter (maybe even user-facing) that lets you set a target age for tlog files. If I set 7 days, then any tlog files older than 7 days would be purged automatically (not sure how often that should run). This may be a bit out of scope for this issue, but it should help nonetheless.

I can make a separate issue if that is better.

EDIT: Also, before this can even happen, we would need mavlink-router to somehow split/rotate files automatically, maybe every 200 MB or 12 hours.
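
The age-based purge described above could be sketched roughly as follows. The function names, the use of file modification time as the age, and the default suffix list are assumptions for illustration; the sketch also assumes that log files are already being rotated, since a single ever-growing file would never look old.

```python
import time
from pathlib import Path
from typing import List, Optional, Sequence

SECONDS_PER_DAY = 86400


def find_old_logs(
    directory: str,
    max_age_days: float,
    suffixes: Sequence[str] = (".tlog",),
    now: Optional[float] = None,
) -> List[Path]:
    """Return log files under `directory` last modified before the cutoff."""
    current = time.time() if now is None else now
    cutoff = current - max_age_days * SECONDS_PER_DAY
    wanted = {suffix.lower() for suffix in suffixes}
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file()
        and path.suffix.lower() in wanted
        and path.stat().st_mtime < cutoff
    )


def purge_old_logs(
    directory: str, max_age_days: float, dry_run: bool = True
) -> List[Path]:
    """Delete old log files; with dry_run=True only report them."""
    old_files = find_old_logs(directory, max_age_days)
    if not dry_run:
        for path in old_files:
            path.unlink()
    return old_files
```

Passing `suffixes=(".tlog", ".bin")` to `find_old_logs` would cover both file types mentioned in the checklist at the top of the issue.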

goasChris avatar Jul 05 '24 07:07 goasChris

Hi @goasChris, thanks for your input. Indeed, tlogs are also important for us to track, and they are already on the list. Let us know if you have anything that is not being tracked at the moment.

patrickelectric avatar Jul 05 '24 16:07 patrickelectric

About tlogs: https://github.com/mavlink-router/mavlink-router/issues/426

patrickelectric avatar Jul 05 '24 17:07 patrickelectric

Some things that take a lot of space and can be removed:

  • Use docker system prune -a to delete unnecessary overlays in /var/lib/docker/overlay2. However, we first need to make sure the factory image is tagged: right now it is not, so it gets deleted along with the other overlays.
  • Limit or remove .tlog files, as they can accumulate and take up significant space.
  • Remove all unused BlueOS version images, except for the factory image and the currently running image.
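
The selection rule in the last point can be sketched as a pure function that decides which images are safe to remove. The `Image` type, the function name, and the example tags are hypothetical; how BlueOS actually identifies the factory image is not described in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Image:
    image_id: str
    tags: Tuple[str, ...] = ()  # an untagged image has no tags


def removable_images(
    images: Iterable[Image],
    protected_tags: Iterable[str],
    running_image_id: str,
) -> List[Image]:
    """Return every image that is neither running nor carries a protected tag."""
    protected = set(protected_tags)
    return [
        image
        for image in images
        if image.image_id != running_image_id
        and protected.isdisjoint(image.tags)
    ]


# Usage with made-up IDs and tags:
images = [
    Image("aaa", ("example/core:factory",)),
    Image("bbb", ("example/core:1.2.0",)),
    Image("ccc", ("example/core:1.1.0",)),
    Image("ddd"),  # untagged leftover
]
to_remove = removable_images(images, {"example/core:factory"}, running_image_id="bbb")
print([image.image_id for image in to_remove])
```

The rule only protects what is tagged, so an untagged factory image would still land in the removal list, which is the problem described in the first point.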

joaomariolago avatar Jul 05 '24 17:07 joaomariolago