SolrCloud backup to GCS
I created a SolrCloud using the YAML below:
```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example
  namespace: dev-backend
spec:
  backupRepositories:
    - name: "gcs-backups-1"
      gcs:
        bucket: "vipul-test-bucket" # Required
        gcsCredentialSecret:
          name: "newsecret"
          key: "my-key.json"
  replicas: 3
  solrImage:
    tag: 9.0.0
```
I am backing up my Solr collections to a GCS bucket.
My SolrBackup YAML is as follows:
```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: gcs-backup
  namespace: dev-backend
spec:
  solrCloud: example
  repositoryName: "gcs-backups-1"
  collections:
    - demo
    - test
    - new
```
My backup starts, but nothing is written to my bucket; it stays empty. The SolrBackup also never completes. Its status is as follows:
```
NAME           CLOUD     STARTED   FINISHED   SUCCESSFUL   NEXTBACKUP   AGE
gcs-backup     example   60m                                            60m
local-backup   example   4h20m     true       true                      4h20m
```
I first created a PVC backup, which completed. My GCS backup, however, started but never finishes, and there is no data in my bucket.
As mentioned in your Slack thread, you are seeing the following error:
```
ERROR controller-runtime.manager.controller.solrbackup Error while taking SolrCloud backup {"reconciler group": "solr.apache.org", "reconciler kind": "SolrBackup", "name": "gcs-backup", "namespace": "dev-backend", "error": "Recieved bad response code of 500 from solr with response: {\n \"responseHeader\":{\n \"status\":500,\n \"QTime\":258},\n \"error\":{\n \"metadata\":[\n \"error-class\",\"org.apache.solr.common.SolrException\",\n \"root-error-class\",\"org.apache.solr.common.SolrException\"],\n \"msg\":\"specified location / does not exist.\",\n
```
The Backup command requires a location field, and the operator uses "/" as the default location. This works nicely with S3, since "/" can be used as the root node. With GCS, "/" and "" are both valid starts to paths, so "/" is not automatically the bucket root and you would need to create the "/" path yourself.
Another option is to specify a real "location" in the backup, or in the GCS repo spec, and manually create that path in GCS before starting everything.
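As a rough sketch of that second option, the manifests below set an explicit path in both places. The path name `solr-backups` is hypothetical and must already exist in the bucket. `baseLocation` is the field used in the GCS repository spec later in this thread; `location` on the SolrBackup is my understanding of the per-backup equivalent, so check it against the CRD of your operator version. The path is written without a leading slash, as a later reply in this thread reports is necessary.

```yaml
# Sketch only: "solr-backups" is a hypothetical path that you create in the
# bucket before applying these manifests.
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example
  namespace: dev-backend
spec:
  backupRepositories:
    - name: "gcs-backups-1"
      gcs:
        bucket: "vipul-test-bucket"
        gcsCredentialSecret:
          name: "newsecret"
          key: "my-key.json"
        # Default path for backups in this repository (no leading slash).
        baseLocation: "solr-backups"
---
apiVersion: solr.apache.org/v1beta1
kind: SolrBackup
metadata:
  name: gcs-backup
  namespace: dev-backend
spec:
  solrCloud: example
  repositoryName: "gcs-backups-1"
  # Assumption: per-backup path; omit it to rely on the repository's baseLocation.
  location: "solr-backups"
  collections:
    - demo
```

Setting only one of the two should be enough; both are shown here to illustrate where each field lives.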
I encountered the same problem with GCS backups. I solved the first part as you suggested, by first creating a path and then pointing baseLocation at it. Counter-intuitively, the path needs to be written without a leading slash. The backup is now created (files and folders appear in the bucket), but it still doesn't complete: it remains in a pending state indefinitely. I see no errors in the log, but the created files do not seem to be a complete backup. The installation is on a GKE Autopilot cluster, with no TLS between pods but TLS on the ingress.
This is my backup file:
```yaml
kind: SolrBackup
metadata:
  name: local-backup9
  namespace: sop030
spec:
  repositoryName: "gcs-backups-1"
  solrCloud: explore
  collections:
    - dsearch
```
And this is my gcs repository from the original setup:
```yaml
spec:
  backupRepositories:
    - name: "gcs-backups-1"
      gcs:
        bucket: "backupbx"
        gcsCredentialSecret:
          name: "gcssecret1"
          key: "service-account-key.json"
        baseLocation: "d"
```
And this is the describe result on the backup:
```
Name:         local-backup9
Namespace:    sop030
Labels:       <none>
Annotations:  <none>
API Version:  solr.apache.org/v1beta1
Kind:         SolrBackup
Metadata:
  Creation Timestamp:  2023-04-08T12:37:18Z
  Generation:          1
  Managed Fields:
    API Version:  solr.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:collections:
        f:repositoryName:
        f:solrCloud:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-04-08T12:37:18Z
    API Version:  solr.apache.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:collectionBackupStatuses:
        f:solrVersion:
        f:startTimestamp:
    Manager:         solr-operator
    Operation:       Update
    Subresource:     status
    Time:            2023-04-08T12:37:18Z
  Resource Version:  330672
  UID:               23ea0f0b-0db7-412d-943f-e3ca58bda906
Spec:
  Collections:
    dsearch
  Repository Name:  gcs-backups-1
  Solr Cloud:       explore
Status:
  Collection Backup Statuses:
    Async Backup Status:  notfound
    Backup Name:          local-backup9-dsearch
    Collection:           dsearch
    In Progress:          true
    Start Timestamp:      2023-04-08T12:37:18Z
  Solr Version:           8.11.0
  Start Timestamp:        2023-04-08T12:37:18Z
Events:                   <none>
```
Hi, I am observing exactly the same symptoms described here, and in the same order: I am using GCS, I was first missing a location, and after adding it I got stuck at the same next step as the OP. I believe this is the same issue as https://github.com/apache/solr-operator/issues/547, which also fully matches my situation.
To sum up the issue, when using backups users sometimes have backups that succeed in Solr, but the SolrBackup status has the following two properties:
- In Progress: true
- Async Backup Status: notfound
From what I can discern, there are two possible reasons why the backup isn't being "finished" by the Solr Operator:
- An error was never handled when the backup was started. Thus an asyncId was never actually created, which is why we get "notfound" when querying the status of the async command. (This can happen and needs to be fixed.)
- The backup was finished and the Solr operator deleted the asyncId in Solr, however the backup status failed to update, and thus on the next iteration of the reconcile loop, it could not find the backup status anymore.
Given that the backup succeeded in Solr, the second option is more likely for the failures that we are seeing listed here.
There are known issues with status updates failing because of conflicts, but these should happen much, much less often starting in v0.7.0 because of #544.
However, given that this is happening to users every single time, I am less confident in that explanation: since it's a race condition, it should only happen sporadically.
I will create a PR that starts to address these issues, so we can have y'all test it out and see what works.