Kubernetes - Redeploying not working when using S3 as default storage backend

Open kev-mas opened this issue 6 years ago • 22 comments

Nextcloud version (eg, 12.0.2): 16.0.4
Operating system and version (eg, Ubuntu 17.04): Kubernetes / Docker
Apache or nginx version (eg, Apache 2.4.25): Docker Image nextcloud:16.0.4-apache
PHP version (eg, 7.1): Docker Image nextcloud:16.0.4-apache

The issue you are facing:

  • We are deploying Nextcloud on Kubernetes using the Helm chart from https://github.com/helm/charts/tree/master/stable/nextcloud.
  • We changed the Docker image to nextcloud:16.0.4-apache.
  • We use the s3.config.php option to store our files on S3 (see the sketch after this list).
  • We use the external database option to use a MariaDB server we already have.
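
For reference, the relevant part of our values.yaml looks roughly like this (credentials redacted; the exact parameter names may differ between chart versions, so treat this as a sketch rather than the literal file):

image:
  repository: nextcloud
  tag: 16.0.4-apache

internalDatabase:
  enabled: false

externalDatabase:
  enabled: true
  host: mariadb.mariadb
  database: nextcloud
  user: oc_user129
  password: "********"

nextcloud:
  configs:
    s3.config.php: |-
      <?php
      $CONFIG = array(
        'objectstore' => array(
          'class' => '\\OC\\Files\\ObjectStore\\S3',
          'arguments' => array(
            'bucket' => 'nextcloud-files',
            'autocreate' => true,
            'key' => '********',
            'secret' => '********',
            'region' => 'eu-west-1',
            'use_ssl' => true,
          ),
        ),
      );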

We launch Nextcloud for the first time and it creates the DB correctly, creates the first user correctly, and starts up as expected. We can log in and create / upload files. To verify our files are secure and retrievable after a major failure, we re-deploy the Nextcloud deployment (scale to 0, scale to 1). After this, the logs show the following during startup:

Initializing nextcloud 16.0.4.1 ...
Initializing finished
New nextcloud instance
Installing with MySQL database
starting nextcloud installation
The username is already being used
retrying install...
The username is already being used
retrying install...
The username is already being used
retrying install...

This is the first issue: why does it try to re-install? The database is still there, and so is the previous user. Why does it not just connect and re-use what is already there?

After a couple of minutes the container dies and starts again, this time without failure. BUT, when trying to browse to nextcloud, we are greeted with the following message:

Error
It looks like you are trying to reinstall your Nextcloud. However the file CAN_INSTALL is missing from your config directory. Please create the file CAN_INSTALL in your config folder to continue.

If I create the CAN_INSTALL file, I am prompted with the installation/setup screen and am told that the admin account I want to use already exists.

Is this the first time you've seen this error? (Y/N): Y

The output of your config.php file in /path/to/nextcloud (make sure you remove any identifiable information!):

<?php
$CONFIG = array (
  'debug' => true,
  'htaccess.RewriteBase' => '/',
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'apps_paths' => 
  array (
    0 => 
    array (
      'path' => '/var/www/html/apps',
      'url' => '/apps',
      'writable' => false,
    ),
    1 => 
    array (
      'path' => '/var/www/html/custom_apps',
      'url' => '/custom_apps',
      'writable' => true,
    ),
  ),
  'objectstore' => 
  array (
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' => 
    array (
      'bucket' => 'nextcloud-files',
      'autocreate' => true,
      'key' => '**************',
      'secret' => '****************',
      'region' => 'eu-west-1',
      'use_ssl' => true,
    ),
  ),
  'passwordsalt' => '********************',
  'secret' => '******************',
  'trusted_domains' => 
  array (
    0 => 'localhost',
  ),
  'datadirectory' => '/var/www/html/data',
  'dbtype' => 'mysql',
  'version' => '16.0.4.1',
  'overwrite.cli.url' => 'http://localhost',
  'dbname' => 'nextcloud',
  'dbhost' => 'mariadb.mariadb',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'oc_user129',
  'dbpassword' => '*****************',
  'instanceid' => '************',
);

Any idea on how to solve this issue?

kev-mas avatar Sep 23 '19 08:09 kev-mas

Same problem!

cuihaikuo avatar Sep 25 '19 05:09 cuihaikuo

Exactly the same problem, even with a manual YAML manifest (not Helm) and PersistentVolumeClaims. I tried deleting the user in the database; it works for that step, but the next problem is:

Command "maintenance:install" is not defined.

  Did you mean one of these?
      app:install
      maintenance:data-fingerprint
      maintenance:mimetype:update-db
      maintenance:mimetype:update-js
      maintenance:mode
      maintenance:repair
      maintenance:theme:update
      maintenance:update:htaccess


retrying install...

fle108 avatar Nov 05 '19 14:11 fle108

Just out of curiosity what do you use as storage backend? I never got it working with nfs backed persistent volumes. The rsync happening in the entrypoint.sh didn't fully finish for some reason. Also it took a pretty long time until the "install" was finished. And when I killed the pod the new one was trying to install nextcloud again.

JasperZ avatar Nov 05 '19 16:11 JasperZ

Just out of curiosity what do you use as storage backend? I never got it working with nfs backed persistent volumes. The rsync happening in the entrypoint.sh didn't fully finish for some reason. Also it took a pretty long time until the "install" was finished. And when I killed the pod the new one was trying to install nextcloud again.

Hi, I use Azure File storage, but I have to mount it with specific mount options (uid 33 for www-data) in my PersistentVolume manifest, otherwise it doesn't work:

  mountOptions:
  - dir_mode=0770
  - file_mode=0770
  - uid=33
  - gid=33

Currently I'm battling with initContainers to be able to push a ConfigMap file (.user.ini) to set PHP options like upload_max_filesize.
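
Roughly what I'm attempting looks like this; the ConfigMap name, volume names and the upload limit are placeholders, and I haven't got it working reliably yet:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nextcloud-php-ini   # placeholder name
data:
  .user.ini: |
    upload_max_filesize=10G
    post_max_size=10G
---
# fragment of the nextcloud Deployment pod spec
spec:
  initContainers:
    - name: copy-user-ini
      image: busybox
      command: ["sh", "-c", "cp /tmp/php/.user.ini /var/www/html/.user.ini"]
      volumeMounts:
        - name: php-ini
          mountPath: /tmp/php
        - name: nextcloud-data        # the volume that backs /var/www/html
          mountPath: /var/www/html
  volumes:
    - name: php-ini
      configMap:
        name: nextcloud-php-ini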

sources: https://github.com/rabbitmq/rabbitmq-peer-discovery-k8s/issues/37 https://docs.nextcloud.com/server/13.0.0/admin_manual/configuration_files/big_file_upload_configuration.html

fle108 avatar Nov 05 '19 18:11 fle108

Anybody figure out a fix to this problem?

I have all data persisted on a NAS and just wiped my Kubernetes host to re-start my containers from scratch.

When I launch Nextcloud, I get the same "It looks like you are trying to reinstall your Nextcloud. However the file CAN_INSTALL is missing from your config directory. Please create the file CAN_INSTALL in your config folder to continue." error.

I cannot find any documentation on this, nor many other threads about it.

GoingOffRoading avatar Aug 30 '20 22:08 GoingOffRoading

@GoingOffRoading You need to make sure that instanceid persists across the pod lifecycle. Do this by making sure that /var/www/html/config is on a persistent vol.

see: https://github.com/nextcloud/docker/issues/1006#issuecomment-682276451
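
In chart terms that boils down to something like the following (a minimal sketch; check the values.yaml of your chart version for the exact persistence keys):

persistence:
  # this PVC backs /var/www/html, which includes config/ (and therefore the instanceid)
  enabled: true
  # storageClass: your-storage-class
  accessMode: ReadWriteOnce
  size: 8Gi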

kquinsland avatar Aug 31 '20 02:08 kquinsland

This seems to be a bug when you use S3 as primary storage. We don't want any persistent storage at all; we want to use S3 only.

robertoschwald avatar Jan 05 '24 11:01 robertoschwald

@robertoschwald just get the file out of the config folder and put it into your helm chart as well?

Or use an init container that creates that file from something you stored in S3...
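
As a rough sketch of that second idea (the image, bucket path and credential handling are all placeholders, not something I've tested):

initContainers:
  - name: fetch-config
    image: amazon/aws-cli            # placeholder image
    command: ["sh", "-c", "aws s3 cp s3://my-bootstrap-bucket/config.php /var/www/html/config/config.php"]
    # AWS credentials would come from a Secret or an IAM role; omitted here
    volumeMounts:
      - name: nextcloud-main         # the same volume the nextcloud container mounts at /var/www/html
        mountPath: /var/www/html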

agowa avatar Jan 07 '24 07:01 agowa

For right now, you still need a persistent volume for the config directory as well, even when using S3. That's been my experience, at least. You can set persistence in the helm chart, but we probably still need to separate out the config dir persistence entirely from the data dir. I'll see if I can find the other issue mentioning this and link it back here.

jessebot avatar Jul 11 '24 20:07 jessebot

you still need a persistent volume for the config directory as well

You don't; a ConfigMap works as well. At least last time I checked, the files within the config folder weren't dynamically updated at runtime by the application itself...

Alternatively, an init container could just bootstrap the config directory using a script or something...

agowa avatar Jul 12 '24 10:07 agowa

I haven't tested this in about 6 months, but I thought there was something that changed in the config directory that prevented this from working. I can't remember what it was though. Oh, maybe it was the /var/www/html directory itself or something else in the /var/www/html/data directory?

Either way, I haven't had time to test this again in a while, so I'm open to anyone else in the community testing installing the latest version of this helm chart, enabling s3 as the default storage backend via the nextcloud.configs parameter (which should create a ConfigMap), and verifying whether it's still broken. If it is still broken, we need to know precisely which directory needs to be persisted to fix this and why. From there, we can figure out what needs to be done, including the suggestions you've made, @agowa, to see if there's anything we can mount or script away to solve this. 🙏
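
For orientation: whatever goes under nextcloud.configs should be rendered into a ConfigMap by the chart and end up in /var/www/html/config, so the test setup would produce something roughly like this (the ConfigMap name below is a placeholder; the chart templates the real one):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nextcloud-config          # placeholder; the actual name comes from the chart template
data:
  s3.config.php: |-
    <?php
    $CONFIG = array(
      'objectstore' => array(
        'class' => '\\OC\\Files\\ObjectStore\\S3',
        'arguments' => array(
          'bucket' => 'nextcloud-files',
          'region' => 'eu-west-1',
          'key' => '********',
          'secret' => '********',
        ),
      ),
    );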

jessebot avatar Jul 12 '24 12:07 jessebot

This is still broken, I've installed the latest version and hit this after a few redeploys of the pod. Any workarounds to get it back running, preferably without wiping the DB and S3 and starting from scratch?

vtmocanu avatar Sep 07 '24 18:09 vtmocanu

@jessebot I stopped using nextcloud years ago because of this and another S3-backend-related issue. I was just still subscribed to this issue...

agowa avatar Sep 11 '24 19:09 agowa

I haven't had a chance to test this again because I was waiting for the following to be merged:

  • https://github.com/nextcloud/docker/pull/2271
  • https://github.com/nextcloud/helm/pull/614

In the meantime, @wrenix have you used s3 as a primary object store and done a restore successfully yet? I plan on testing this again soonish, but not before the above are merged. @provokateurin, @joshtrichards not sure if either of you use s3 either? 🤔

jessebot avatar Sep 20 '24 09:09 jessebot

Maybe the installed version also needs to be persisted, not just the instanceid? I made the instanceid static via nextcloud.config, but it still did not work, because of this:

            # Install
            if [ "$installed_version" = "0.0.0.0" ]; then
                echo "New nextcloud instance"

https://github.com/nextcloud/docker/blob/30b570f0b553736d63dc63cf487ff1e5e5331474/docker-entrypoint.sh#L183
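
For reference, "static" here means shipping the instanceid in an extra config file, roughly like this (the file name and value are placeholders):

nextcloud:
  configs:
    instanceid.config.php: |-
      <?php
      $CONFIG = array(
        'instanceid' => 'oc_static_example_id',   // placeholder
      );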

vtmocanu avatar Sep 21 '24 09:09 vtmocanu

@jessebot sorry, I do not currently use S3 in my setup and have no time to build a test setup with S3

wrenix avatar Sep 21 '24 11:09 wrenix

So I think you, @WladyX, and @kquinsland are onto something with the installed_version note. I also took a look at https://github.com/nextcloud/docker/issues/1006 again.

So in the docker-entrypoint.sh script, we're looking for installed_version in /var/www/html/version.php:

        installed_version="0.0.0.0"
        if [ -f /var/www/html/version.php ]; then
            # shellcheck disable=SC2016
            installed_version="$(php -r 'require "/var/www/html/version.php"; echo implode(".", $OC_Version);')"
        fi

Which, as @WladyX pointed out, then hits this conditional further down:

            if [ "$installed_version" = "0.0.0.0" ]; then
                echo "New nextcloud instance"

I checked on my instance and version.php looks like this:

<?php
$OC_Version = array(29,0,7,1);
$OC_VersionString = '29.0.7';
$OC_Edition = '';
$OC_Channel = 'stable';
$OC_VersionCanBeUpgradedFrom = array (
  'nextcloud' =>
  array (
    '28.0' => true,
    '29.0' => true,
  ),
  'owncloud' =>
  array (
    '10.13' => true,
  ),
);
$OC_Build = '2024-09-12T12:35:46+00:00 873a4d0e1db10a5ae0e50133c7ef39e00750015b';
$vendor = 'nextcloud';

The issue is that I'm not sure how to persist that file, without just using our normal PVC setup, since it's not created by nextcloud/helm or nextcloud/docker. I think it's created by nextcloud/server 🤔

Perhaps we can do some sort of check to see if s3 is already enabled? 🤔 Maybe checking if $OBJECTSTORE_S3_BUCKET is set? Open to ideas and suggestions. Will cross post to the other thread in nextcloud/docker too.

jessebot avatar Sep 24 '24 07:09 jessebot

The issue is that I'm not sure how to persist that file

The file is part of the source code and not generated at runtime. See https://github.com/nextcloud/server/blob/master/version.php

provokateurin avatar Sep 24 '24 07:09 provokateurin

So then the question is: is there a way to accommodate not having to manage PVCs while using S3? 🤔 Could we maybe add some sort of configmap with a simple php script like:

<?php
$S3_INSTALLED = true;

and then we tweak docker-entrypoint.sh upstream in nextcloud/docker to check there? I'm just throwing out suggestions, as I haven't tested anything on a live system yet, but want to try and help.

jessebot avatar Sep 24 '24 07:09 jessebot

Just thinking out loud:

  • Explain how to generate an instanceid for the helm values and populate the config with that
  • If an S3_INSTALLED variable (or something similar) is defined, update the upstream docker-entrypoint to check the DB for whether the instance was installed or not, instead of looking at the version, for the S3 case.

Or, since I think I also saw the version in the DB, maybe docker-entrypoint should check the DB instead of the config for the version to decide whether Nextcloud was installed or not. Thank you for looking into this one; I went for a static PVC for the time being.

vtmocanu avatar Sep 25 '24 10:09 vtmocanu

I face this problem. How can I solve it?

Steps:

  1. helm install with the values below
  2. The first time, the pod starts and runs
  3. Log in and upload some files
  4. Delete the nextcloud pod
  5. Wait for the pod to start, then go to the home page and see "The Login is already being used"

nextcloud:
  existingSecret:
    enabled: true
    secretName: nextcloud-secret
    usernameKey: nextcloud-username
    passwordKey: nextcloud-password
  objectStore:
    s3:
      enabled: true
      accessKey: "xxxxx"
      secretKey: "xxxxx"
      region: xxxxx
      bucket: "xxxxx"

replicaCount: 1

internalDatabase:
  enabled: false

externalDatabase:
  enabled: true
  existingSecret:
    enabled: true
    secretName: nextcloud-secret
    hostKey: externaldb-host
    databaseKey: externaldb-database
    usernameKey: externaldb-username
    passwordKey: externaldb-password

mariadb:
  enabled: true
  auth:
    rootPassword: test

minkbear avatar Oct 04 '24 13:10 minkbear

So then the question is: is there a way to accommodate not having to manage PVCs while using S3? 🤔 Could we maybe add some sort of configmap with a simple php script like: [...] and then we tweak docker-entrypoint.sh upstream in nextcloud/docker to check there? I'm just throwing out suggestions, as I haven't tested anything on a live system yet, but want to try and help.

Just some Sunday afternoon thoughts...

What problem are we actually trying to solve here? If the aim is to eliminate persistent storage, that's not feasible at this juncture. That's a much larger discussion (that touches on a re-design of the image and/or Nextcloud Server itself).

I guess OP didn't have any persistent storage for /var/www/html in place? Then this sounds like expected behavior. At the risk of putting my foot in my mouth, because I'm coming from the docker repo and less familiar with the Helm side of things, it seems the issue is that it is very important that persistence.enabled be on (so maybe there's room for doc enhancements or examples or something).

But you definitely need to have version.php around. It's part of the app itself, as Kate said. If it's not available, it's not a valid deployment. It means you don't have the persistent storage in place that the image expects to be around for /var/www/html/.

Context

The version check in the entry point is used by the image to determine whether there is already a version of Server installed on the container's persistent storage, and then:

  • if not detected, it installs it
  • if detected, see if it needs to be upgraded to match the new version from the image

The key here is that Server doesn't technically run from the image itself. The image installs a version of Server on persistent storage (i.e. the contents of /var/www/html/ within a running container).

This is due to a mixture of how Nextcloud Server functions historically + how the image currently functions. But the bottom line is:

  • S3 Primary Storage + a database alone are not sufficient for a Nextcloud deployment. The former is only for user home directories/etc.

So /var/www/html/ + config (/var/www/html/config) + datadirectory (/var/www/html/data by default) are still expected to be available to any containers that boot the image.

If there are challenges like nextcloud/docker#1006, those need to be tackled directly. The OP in that one may have hit a weird NFS issue or similar. In part that's why I recently did nextcloud/docker#2311 (increasing verbosity of the rsync part of the entrypoint, which is the typical culprit for NFS interaction problems; the locking in the entrypoint to prevent multiple containers from updating simultaneously being another).

joshtrichards avatar Oct 27 '24 18:10 joshtrichards