openvas

Issues with scans showing as "Interrupted" and server not responding

Open • dustinbird opened this issue 4 months ago • 9 comments

Describe the bug

I have been seeing issues with scans on one of my Linux openSUSE 15.6 servers. We have 6 boxes set up similarly, but on one box 90% of the scans return "Interrupted" and the server runs at almost 100% CPU and becomes unresponsive.

To Reproduce

Steps to reproduce the behavior:

  1. The Docker container is started with a docker-compose command (yml file attached).
  2. The container starts with no issues, but multiple scans (not always the same ones) fail to complete and show "Interrupted" in the web interface.
  3. The issue has persisted for 2 weeks now.

Expected behavior

Scans complete successfully.

Environment (please complete the following information):

  • OS: openSUSE Leap 15.6
  • Memory available to OS: 8G
  • Container environment used with version: Docker

Logs (commands assume the container name is 'openvas'): attached

Additional context

The server is patched each Friday and has the same specification as the other servers that also run Docker containers with OpenVAS. The container is normally stopped with docker-compose down, but it has also been pruned with docker system prune -a. There is no pattern to when a scan fails, and it is not always the same scan that fails.

In the past week I have made the scans less demanding and stopped running multiple scans at the same time, to see whether this helps: we have gone from 2 concurrent scans 4 times a day to 1 scan 3 times a day.

There has been no improvement.

Another of the scan servers also runs at very high CPU, like this one, but has no failures, and its CPU usage returns to normal after a scan completes (as this one's does). While this server is running a scan that may have an issue, the OVH hosting panel shows it as running, but it does not respond to SSH and the web interface does not load. Once the scan succeeds or fails, access returns.

At this point I do not know if there is an issue with hardware, connections, containers or DB.

Attachments: Compose commands.txt, GVM Version.txt, Container logs 011225.txt

dustinbird, Dec 01 '25 09:12

Could this be a corrupted DB? If so, are there any commands I can use to check the DB for errors and fix them before I attempt to restore an old backup?

dustinbird, Dec 02 '25 13:12

I don't think this is an issue with a corrupt DB; I ran into what I think is a similar issue. I left a clean docker compose deployment running for a couple of days and noticed that it was in an "unhealthy" state. It seems that some processes terminate (maybe Redis, maybe Postgres). Restarting the containers works right up until the issue resurfaces.

@immauss, have a look at my logs and see what you think: openvas-logs.tar.gz

karlisk, Dec 04 '25 17:12

@dustinbird

You have piqued my curiosity ... so I must ask.

Why scan so frequently?

By default, OpenVAS does not perform any kind of configuration verification (that I'm aware of...). The feeds update at most daily, so what are you expecting to change in that time frame?

I ask because the first thing I thought of was that the scans are overlapping (a scan starts before the previous one has finished).
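The overlap theory can be checked mechanically: take each task's start time and how long it actually ran (for example, copied by hand from the report timestamps in the web interface) and flag any run that starts before an earlier run has finished. This is a minimal Python sketch under that assumption; the task names and timings are hypothetical, not taken from the thread.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# One scan run: (task name, start time, how long it actually took).
Run = Tuple[str, datetime, timedelta]

def find_overlaps(runs: List[Run]) -> List[Tuple[str, str]]:
    """Return (earlier, later) pairs where `later` starts before `earlier` ends.

    Each overlapping run is paired with the earlier run that finishes last,
    so at most one pair is reported per overlapping run.
    """
    overlaps: List[Tuple[str, str]] = []
    latest_end: Optional[datetime] = None  # end of the longest-running earlier run
    latest_name: Optional[str] = None
    for name, start, duration in sorted(runs, key=lambda r: r[1]):
        if latest_end is not None and start < latest_end:
            overlaps.append((latest_name, name))
        end = start + duration
        if latest_end is None or end > latest_end:
            latest_end, latest_name = end, name
    return overlaps

# Hypothetical task names and timings, for illustration only.
runs = [
    ("external-ips", datetime(2025, 12, 1, 0, 0), timedelta(hours=5)),
    ("internal-ips", datetime(2025, 12, 1, 4, 0), timedelta(hours=2)),
    ("full-range", datetime(2025, 12, 1, 8, 0), timedelta(hours=3)),
]
print(find_overlaps(runs))  # [('external-ips', 'internal-ips')]
```

Sorting by start time and tracking only the latest end seen so far keeps the check linear after the sort, which is plenty for a handful of scheduled tasks.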

-Scott

immauss, Dec 11 '25 00:12

Is there a recommended scanning frequency, if this is too much?

This scanning server (one of three we have) runs one scan of all our infrastructure servers via their external IP addresses; another scan checks their internal IP addresses; a third scan covers two other servers in another network; and a final scan checks every IP address in the range to confirm that no other systems have been added to the group and that no ports have been opened or changed on existing servers.

These scans run on rotating schedules so that each target is checked at least twice a week; some scans run daily. We have one day of downtime on the server for patching the server OS and updating the Docker images.
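A rotation like this can be sanity-checked by counting how often each scan appears in a weekly plan and flagging anything placed on the downtime day. This is a minimal Python sketch; the plan, the scan names, and the choice of Friday (the patch day mentioned in the issue) are illustrative assumptions, not the actual schedule.

```python
from collections import Counter
from typing import Dict, List, Tuple

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def check_week(plan: Dict[str, List[str]], scans: List[str],
               downtime_day: str, min_runs: int = 2) -> Tuple[List[str], List[str]]:
    """Return (scans run fewer than min_runs times, scans placed on the downtime day)."""
    # Count every scheduled run across the week, including the downtime day.
    counts = Counter(scan for day in DAYS for scan in plan.get(day, []))
    too_few = sorted(s for s in scans if counts[s] < min_runs)
    clashes = sorted(plan.get(downtime_day, []))
    return too_few, clashes

# Hypothetical weekly plan; Friday is kept free for patching.
plan = {
    "Mon": ["external-ips", "internal-ips"],
    "Tue": ["other-network"],
    "Wed": ["external-ips"],
    "Thu": ["internal-ips", "other-network"],
    "Fri": [],
    "Sat": ["full-range"],
    "Sun": [],
}
scans = ["external-ips", "internal-ips", "other-network", "full-range"]
print(check_week(plan, scans, "Fri"))  # (['full-range'], [])
```

Passing the full list of scans separately means a scan missing from the plan entirely is still reported, since its count is zero.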

Is this overkill, or is it about right for security but should be split over more than one server? The other two servers work in a similar way, with as many scans, but have not shown signs of strain or difficulties like this.

dustinbird, Dec 11 '25 12:12

I think I have solved the issues with the box for now with the following actions:

  • Increased the size of the data drive from 120GB to 250GB to increase the I/O count (this helped reduce CPU demand, but there were still interruptions, server hangs, and issues with the container)
  • Rolled back the data drive to an older snapshot (one scan was still being interrupted, but server and image stability improved)
  • Removed the scan of all IP addresses in the range (the only scan that seemed to be interrupted regularly).

So it seems the main culprit was the big scan, but I am not sure why a scan that has been running multiple times a week for upwards of 3 years suddenly caused such issues.

dustinbird, Dec 11 '25 12:12

My bad... I misunderstood. I thought (it must have been before coffee... or too close to bedtime) that you were running the same scan that often.

This could be due to different resource requirements after the move from notus-scanner to openvasd. I haven't seen anything on that... but...

-Scott

immauss, Dec 12 '25 22:12

Thank you.

Could you please tell me when this change of scanner was implemented? That would help me see whether it matches up with when I started seeing issues.

dustinbird, Dec 15 '25 09:12

It should have been here: #365

immauss, Dec 17 '25 08:12

Ah, that seems about right; I've been having problems with it for a few weeks now. Weirdly, just one of the boxes has this issue. Even after removing the full scan of the IP range, scans sometimes work and sometimes don't. While this happens the box is unresponsive, and I cannot even SSH onto it to see which process is taking up all the CPU and causing the hang.

I think for now I need to change the schedules further, or trigger scans manually, so I can try to catch what is happening at the time.
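Because the box stops answering SSH during a hang, one way to catch what is happening is to start a small logger on the host before the scan. It appends a timestamped line with the load averages and the busiest processes at a fixed interval and forces each line to disk. This is a minimal Python sketch, assuming Python 3 and the procps ps command on the host. The script name, log path, and 30-second interval are arbitrary, and the process names in the sample are made up. Note that ps reports %CPU averaged over each process's lifetime, so treat it as a rough pointer. A gap in the timestamps is itself a sign that the host stalled.

```python
import os
import subprocess
import sys
import time
from datetime import datetime
from typing import List, Tuple

Proc = Tuple[int, float, str]  # (pid, %cpu, command name)

def parse_ps(output: str, top_n: int = 5) -> List[Proc]:
    """Parse `ps -eo pid,pcpu,comm` output into the top_n processes by %CPU."""
    procs: List[Proc] = []
    for line in output.strip().splitlines()[1:]:  # skip the "PID %CPU COMMAND" header
        pid, cpu, comm = line.split(None, 2)      # comm may itself contain spaces
        procs.append((int(pid), float(cpu), comm))
    procs.sort(key=lambda p: p[1], reverse=True)
    return procs[:top_n]

def format_sample(when: datetime, load: Tuple[float, float, float], top: List[Proc]) -> str:
    """One log line: timestamp, 1/5/15-minute load averages, busiest processes."""
    procs = " ".join(f"{comm}[{pid}]={cpu:.1f}%" for pid, cpu, comm in top)
    return (f"{when.isoformat(timespec='seconds')} "
            f"load={load[0]:.2f},{load[1]:.2f},{load[2]:.2f} {procs}")

def run(path: str, interval: float = 30.0) -> None:
    """Append one sample per interval until interrupted."""
    with open(path, "a", encoding="utf-8") as log:
        while True:
            ps = subprocess.run(["ps", "-eo", "pid,pcpu,comm"],
                                capture_output=True, text=True, check=True).stdout
            log.write(format_sample(datetime.now(), os.getloadavg(), parse_ps(ps)) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk in case the box needs a hard reset
            time.sleep(interval)

# Made-up ps output, to show what the parser extracts.
sample = """  PID %CPU COMMAND
 1234 95.0 openvas
  987  3.5 postgres
   42  0.1 sshd
"""
print(parse_ps(sample, top_n=2))  # [(1234, 95.0, 'openvas'), (987, 3.5, 'postgres')]

if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1])
```

If saved under a hypothetical name such as scan_watch.py, it could be started with nohup python3 scan_watch.py /var/tmp/scan-watch.log & just before a manually triggered scan. The log can then be read once the box is reachable again.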

dustinbird, Dec 17 '25 08:12