occ and cron.php hang after upgrade from 19.0.3 to 19.0.4
Hi,
Today we tried to upgrade several Docker-based Nextcloud instances. Since the upgrade, running either cron.php or occ hangs forever at 100% CPU. We downgraded one of the instances by restoring it from a backup, and this fixed the problem.
Here is the output of the following command:
docker exec -it e5c2dfc4c9a5 strace php /var/www/html/occ
and the output of running cron.php:
docker exec -it e5c2dfc4c9a5 strace php -f /var/www/html/cron.php
Both commands use 100% of one CPU core the whole time. Once the strace output reaches the sequence of mremap() calls, nothing more is printed, so no other syscalls are made after that.
Running php alone does work:
# docker exec -it e5c2dfc4c9a5 php --version
PHP 7.4.11 (cli) (built: Oct 1 2020 19:35:35) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Zend OPcache v7.4.11, Copyright (c), by Zend Technologies
regards christian
I forgot to mention that, apart from these problems, the instances seemed to be working normally, and switching to AJAX-based background jobs worked.
Hi again,
Deploying a completely new instance using 19.0.4 shows the same behaviour.
regards christian
I can confirm the problem. I am running the container as a non-root user by specifying user: 123:1234 in my docker-compose.yml for the Nextcloud service. For the occ command, I traced the problem to the following lines in console.php (lines 67-76):
$user = posix_getpwuid(posix_getuid());
$configUser = posix_getpwuid(fileowner(OC::$configDir . 'config.php'));
if ($user['name'] !== $configUser['name']) {
echo "Console has to be executed with the user that owns the file config/config.php" . PHP_EOL;
echo "Current user: " . $user['name'] . PHP_EOL;
echo "Owner of config.php: " . $configUser['name'] . PHP_EOL;
echo "Try adding 'sudo -u " . $configUser['name'] . " ' to the beginning of the command (without the single quotes)" . PHP_EOL;
echo "If running with 'docker exec' try adding the option '-u " . $configUser['name'] . "' to the docker command (without the single quotes)" . PHP_EOL;
exit(1);
}
The ID of my host user does not exist in the container, and with the current 19.0.4 container posix_getpwuid(posix_getuid()) returns false instead of an array. The same is true for the $configUser line. For some reason that I do not understand, PHP then hangs on the line with the if statement for hours until the process terminates with an out-of-memory error.
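Whether a container is affected comes down to whether the UID it runs as resolves to a passwd entry, which is the lookup posix_getpwuid() performs. A quick check from inside the container, as a minimal sketch assuming a POSIX shell with getent available (glibc-based images such as the Debian-based ones should have it); the function name has_passwd_entry is hypothetical:

```shell
#!/bin/sh
# has_passwd_entry UID: succeed if UID resolves via the passwd database
# (the same kind of lookup posix_getpwuid() performs), fail otherwise.
has_passwd_entry() {
  getent passwd "$1" >/dev/null
}

uid="$(id -u)"
if has_passwd_entry "$uid"; then
  echo "uid $uid has a passwd entry"
else
  echo "uid $uid has NO passwd entry (the situation described above)"
fi
```

Run it as the same user the container runs as, e.g. with docker exec -u 123:1234 for the setup above.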
cron.php uses the same code, so I guess it has the same problem. However, the affected code in console.php has not changed in a long time, so something that changed in the 19.0.4 image must be causing it to break.
Thanks @jgerken! We too are running the container using a user that does not exist in /etc/passwd of the container.
OK, I can confirm that adding the user to /etc/passwd in the container fixes the issue.
Hi! I have a Nextcloud container running as a non-root user as well. I can confirm that adding the container's user to /etc/passwd fixes the issue. After the fix I was able to run occ commands and /var/www/html/cron.php without problems.
But the fix is only temporary. When I tried to update NC from 19.0.4 to 20.0.0, the container was recreated and the /etc/passwd line for the user was lost. As a consequence, occ upgrade failed. After adding the user again, I ran occ upgrade and it worked.
I think that we should have a better solution. Maybe creating the user, if needed, in the Dockerfile or in the entrypoint.sh script?
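One way an entrypoint could do this without baking a fixed UID into the image is to append an entry for whatever UID the container was started with, only when none exists. A minimal sketch under loud assumptions: the passwd file must be writable by the container user (the stock image does not do this), and the function name ensure_passwd_entry and the user name nextcloud-runtime are hypothetical:

```shell
#!/bin/sh
# Hypothetical entrypoint helper: if UID has no line in PASSWD_FILE, append
# one so that name lookups such as posix_getpwuid() succeed.
# ASSUMPTION: PASSWD_FILE is writable by the container user; the user name
# "nextcloud-runtime" is illustrative, not part of the official image.
ensure_passwd_entry() {
  passwd_file="$1"
  uid="$2"
  gid="$3"
  # Field 3 of each passwd line is the numeric UID; exit 0 if it is found.
  if awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$passwd_file"; then
    return 0
  fi
  printf 'nextcloud-runtime:x:%s:%s::/var/www/html:/usr/sbin/nologin\n' \
    "$uid" "$gid" >> "$passwd_file"
}

# Usage in an entrypoint:
#   ensure_passwd_entry /etc/passwd "$(id -u)" "$(id -g)"
```

Because the entry is written at container start, it is recreated every time the container is, which avoids the line getting lost on upgrade.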
Hi! The problem with adding the user in the upstream entrypoint or Dockerfile is that everyone would then have to use the same user ID, since it is baked into the image. In our deployment environment I have worked around the issue by adding a custom image build step whenever a new image gets deployed, using something like this:
FROM nextcloud:19.0.4
RUN set -x \
&& addgroup --gid 950 nc-app \
&& adduser --uid 950 --gid 950 --system --no-create-home --home /var/www/html --disabled-login --disabled-password nc-app
For now we deploy the Nextcloud instances using a standalone kubelet and Ansible, which means it is enough to build those images locally on the machine they will run on. Something similar could also be done using docker-compose.
However, things get a bit more complicated once you want to run Nextcloud inside a full-fledged Kubernetes cluster, or want to use another container runtime (e.g. containerd) in your Kubernetes setup. In that case you would need an additional service that builds the custom images and pushes them to a registry that all your cluster nodes can pull from. This is why being able to run containers as "unknown" users is still a desirable feature.
Thanks for your help @equinox0815, I really appreciate it!
I noticed that 13 days ago posix_getpwuid() was removed from console.php and cron.php. Nextcloud now compares only the numeric user IDs:
$user = posix_getuid();
$configUser = fileowner(OC::$configDir . 'config.php');
if ($user !== $configUser) {
https://github.com/nextcloud/server/commit/563f1318cdaf8681c4977fb32d7ea2441269aa1c#diff-d9b6845260189235fbd69eea77adbd0d45ef27391914158d90c2abb8e4527c8d
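Since the new check compares numeric IDs only, it no longer depends on the UID resolving to a name. The same comparison can be made from the shell before calling occ, e.g. in a wrapper script. A minimal sketch, assuming GNU/busybox stat (-c) or BSD stat (-f); the function names owner_uid and can_run_occ are hypothetical:

```shell
#!/bin/sh
# owner_uid FILE: print the numeric UID that owns FILE.
# GNU/busybox stat use -c; BSD stat (e.g. FreeBSD) uses -f instead.
owner_uid() {
  stat -c %u "$1" 2>/dev/null || stat -f %u "$1"
}

# can_run_occ CONFIG_PHP: succeed when the current UID owns config.php,
# i.e. the same numeric comparison console.php now makes.
can_run_occ() {
  [ "$(id -u)" = "$(owner_uid "$1")" ]
}
```

For example: can_run_occ /var/www/html/config/config.php && php /var/www/html/occ status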
Is this included in a Nextcloud 20 hotfix release? I'm hitting this hang as well, and after debugging I can confirm @jgerken's findings.
I haven't tested it yet, but it looks like the changes were included in the NC 20.0.1 release:
$user = posix_getuid();
$configUser = fileowner(OC::$configDir . 'config.php');
if ($user !== $configUser) {
echo "Console has to be executed with the user that owns the file config/config.php" . PHP_EOL;
echo "Current user id: " . $user . PHP_EOL;
echo "Owner id of config.php: " . $configUser . PHP_EOL;
echo "Try adding 'sudo -u #" . $configUser . "' to the beginning of the command (without the single quotes)" . PHP_EOL;
echo "If running with 'docker exec' try adding the option '-u " . $configUser . "' to the docker command (without the single quotes)" . PHP_EOL;
exit(1);
}
https://github.com/nextcloud/server/blob/v20.0.1/console.php#L67
$user = posix_getuid();
$configUser = fileowner(OC::$configDir . 'config.php');
if ($user !== $configUser) {
echo "Console has to be executed with the user that owns the file config/config.php" . PHP_EOL;
echo "Current user id: " . $user . PHP_EOL;
echo "Owner id of config.php: " . $configUser . PHP_EOL;
exit(1);
}
https://github.com/nextcloud/server/blob/v20.0.1/cron.php#L97
I cloned the server repository and did:
git tag --contains 563f1318cdaf8681c4977fb32d7ea2441269aa1c
No tags came up (the SHA1 is from this comment). But then I realized their release process cherry-picks commits back to release branches instead of merging them, so commands like this are useless for auditing version control (which is also why cherry-picking for release hotfixes is a terrible practice).
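A cherry-pick creates a new commit with its own SHA1, so git tag --contains cannot find it. Limiting git log to a tag range and the affected files still surfaces the backported commit. A minimal sketch; the function name commits_touching is hypothetical, and the tag names in the example assume the v20.0.x tags of the server repository:

```shell
#!/bin/sh
# commits_touching REPO FROM TO PATH...: list commits reachable from TO but
# not from FROM that modified any of the given paths. A cherry-picked
# backport shows up here even though its SHA1 differs from the original.
commits_touching() {
  repo="$1"; from="$2"; to="$3"
  shift 3
  git -C "$repo" log --oneline "$from..$to" -- "$@"
}
```

For example: commits_touching ./server v20.0.0 v20.0.1 console.php cron.php. The output still has to be matched against the original commit by hand, e.g. by its message.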
Thanks for checking for me.
This topic seems related to https://github.com/nextcloud/server/pull/23158 I hope this can help.
Perhaps not. The 20.0.3 image still has the same problem. As with 19, I had to create my own image with the user/group explicitly added for the uid/gid nextcloud runs as.
I have the same issue on FreeBSD with Nextcloud 26.0.1 (without any containers, just a plain Nginx + PHP-FPM setup). I see that console.php does use posix_getuid(), but the if block still makes occ hang. When I comment out the if ($user !== $configUser) block, occ finishes OK.
Upstream: Fixed in nextcloud/server#23436