
Solr Restore Space Considerations in Kubernetes

mchennupati opened this issue on Oct 11, 2024 · 0 comments

I am restoring a large index (655G), currently stored in Google Cloud Storage, to a new SolrCloud instance on Kubernetes, and I am trying to understand how much space I need to allocate to each of my nodes' PVCs.

I am currently using the Collections API, with the async parameter, to restore a collection saved in GCS.
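Concretely, the restore call I'm running looks roughly like this (the backup name, location, host, and request id below are placeholders rather than my exact values; the repository name is guessed from my config):

```sh
# Async restore of a collection from a GCS backup repository via the
# Collections API. All names/values here are placeholders for my setup.
curl "http://localhost:8983/solr/admin/collections\
?action=RESTORE\
&name=my-backup\
&collection=mycoll\
&repository=gcs-backups\
&location=my-backup-location\
&async=restore-mycoll-1"
```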

When I check the disk usage for /var/solr/data on each of the nodes, it looks like the output pasted at the end of this post. So each node appears to be downloading the entire index. I initially allocated 500G to each of the PVCs, but that turned out to be too little, so I am now retrying with 700G.

Is this expected behaviour, or am I doing something wrong? I would have expected the backup metadata to contain enough information to download the index in parts rather than 655G x 3. It has already cost me a fair bit in network charges as I retry :)

In general, how would one restore a large index? I did not find a SolrRestore CRD similar to SolrBackup among the Solr Operator CRDs.

So I ran an async job using the Solr Collections API instead.
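To track its progress I poll the async request status, roughly like this (restore-mycoll-1 is the placeholder request id passed via async= in the call above):

```sh
# Poll the status of the async restore request submitted earlier.
curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=restore-mycoll-1"
```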

Thanks!

```
/var/solr/data$ du
4         ./userfiles
4         ./backup-restore/gcs-backups/gcscredential/..2024_10_11_06_16_24.1266852566
4         ./backup-restore/gcs-backups/gcscredential
8         ./backup-restore/gcs-backups
12        ./backup-restore
4         ./filestore
4         ./mycoll_shard3_replica_n3/data/tlog
4         ./mycoll_shard3_replica_n3/data/snapshot_metadata
8         ./mycoll_shard3_replica_n3/data/index
85744132  ./mycoll_shard3_replica_n3/data/restore.20241011062904489
85744152  ./mycoll_shard3_replica_n3/data
85744160  ./mycoll_shard3_replica_n3
85744192  .
solr@mycoll-solrcloud-0:/var/solr/data$ du -sh
```
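For completeness, this is roughly how I spot-check usage across all the pods (only -0 is shown above; the -1 and -2 pod names are assumed from the StatefulSet naming and my replica count):

```sh
# Report /var/solr/data usage on every Solr pod in the StatefulSet.
# Pod names beyond mycoll-solrcloud-0 are assumed, not confirmed.
for i in 0 1 2; do
  kubectl exec "mycoll-solrcloud-$i" -- du -sh /var/solr/data
done
```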
