[FEATURE] Remove reliance on disk storage, store everything in the database
Describe the feature you'd like
Currently most information is stored in the database, with the exception of a few additional things like:
- Secrets (encryption and API keys)
- File storage
- Logs
Additional context
Deploying with a database is easy and common with Kubernetes and container setups, but persistent volume claims are not as simple. Moving the remaining items into the database would simplify deployment a great deal.
The keys would be an easy move as one is a string (and already has an override which can be pulled from a Kubernetes secret) and the other is a JSON string.
The files could be kept in blob storage; there's some additional consideration around file size, but it isn't terribly complex.
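For reference, a minimal sketch of the env-var override mentioned above. `FLOWISE_SECRETKEY_OVERWRITE` is the Flowise override for the encryption key; the secret and key names here are made-up examples:

```yaml
# Example Secret holding the Flowise encryption key (name is hypothetical)
apiVersion: v1
kind: Secret
metadata:
  name: flowise-secrets
type: Opaque
stringData:
  encryption-key: "change-me"
---
# Container spec excerpt: pull the key from the Secret instead of disk.
# FLOWISE_SECRETKEY_OVERWRITE is the override referenced above.
env:
  - name: FLOWISE_SECRETKEY_OVERWRITE
    valueFrom:
      secretKeyRef:
        name: flowise-secrets
        key: encryption-key
```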
Actually, the uploaded files are currently stored as base64 strings in the database, but we are moving away from this and storing them in a folder specified by BLOB_STORAGE_PATH.
Log files, however, can't be stored in the database because of their size.
Secrets could be stored in the database, but that doesn't seem secure enough.
Thanks for the reply, Henry. The reason is that PVCs have their own complications when scaling: for example, N Flowise pods sharing a PVC would require read/write access on all of them, and there may be issues as they move between nodes. A database, especially a managed one like RDS, is a lot easier to manage and often cheaper.
For logs, I suspect it's actually cheaper per byte to store them in a database than on disk. I'd have to run the numbers, but on cloud providers I think this is the case.
For secrets, encrypting them in the DB is not a problem, and the salt or hash can be accepted as a secret in the .env (a single-string Kubernetes secret), just like the override key is today.
Files are more difficult; as you mentioned, it depends on size. At a minimum, an option to keep everything in the database would be very nice and would make scaling with Helm charts easier (at least for me).
Cool, how can we persist the API keys? I didn't find it in the docs. In my case this would fit well, because every time I update the Docker image to a newer version, my API keys are reset.
You can create a PVC and set the environment variable path to the PVC. Then it'll persist.
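For anyone landing here, a minimal sketch of that setup. The claim name, mount path, and size are examples; `BLOB_STORAGE_PATH` is the variable mentioned earlier in the thread:

```yaml
# Example PersistentVolumeClaim for Flowise file storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flowise-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
# Container spec excerpt: mount the PVC and point the storage path at it.
env:
  - name: BLOB_STORAGE_PATH
    value: /data/storage
volumeMounts:
  - name: data
    mountPath: /data/storage
# ...and in the pod spec:
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: flowise-data
```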
+1 for this.
Deploying to Render.com, it seems there's no way to enable a zero-downtime deploy because of the dependency on an attached persistent disk.
This has me wondering how other people are achieving zero-downtime deploys (with Render, or other deployment targets).
+1.
Switching to a stateless system would greatly improve how we deploy flows to production. Of course we can use Flowise for prototyping and Langchain for moving to production. However, being able to quickly deploy flows and make live changes with Flowise in prod would be very beneficial.
For logs, storing them on disk is fine. In K8S clusters, logs are usually collected and kept by systems like Loki or ELK, so it’s not a problem if we lose disk logs when a pod is terminated.
Agree. We have maybe 6 instances of Flowise deployed in k8s, and all have challenges with PVC.
Would still greatly prefer everything (except logs, which as you mention are optional) to be stored in the database.
Could you explain a bit more how you have those 6 instances deployed?
Some questions that come to my mind:
- Is it a single deployment with 6 instances for HA or are those 6 independent deployments?
- If it's a single deployment, are those sharing the same PV? or do you have a different PV for each one?
How I'm thinking of solving it for now:
- For API keys, I'm thinking of storing the api.json file in a secret and mounting it as a file in the pods. Yes, API keys won't be manageable through Flowise, but for me that's not important in a prod environment.
- For files, I'm considering running two deployments of Flowise: one with a single replica for the admin panel and a PVC for the files, and another with at least 3 replicas used only to serve the flows through the API, without access to the admin panel.
And I might even consider not attaching a PVC to the admin replica at all. We upload new files every time we run the upsert process, so I guess we don't really need the old ones.
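A rough sketch of the api.json-in-a-secret idea. `APIKEY_PATH` is the Flowise variable for the directory containing api.json, but treat the exact name as something to verify for your version; all resource names are examples:

```yaml
# Create the secret from an existing api.json first, e.g.:
#   kubectl create secret generic flowise-apikeys --from-file=api.json
---
# Container spec excerpt: mount api.json read-only and point Flowise at it.
env:
  - name: APIKEY_PATH
    value: /secrets/apikeys
volumeMounts:
  - name: apikeys
    mountPath: /secrets/apikeys
    readOnly: true
# ...and in the pod spec:
volumes:
  - name: apikeys
    secret:
      secretName: flowise-apikeys
```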
Thanks btw! It's great to know there's more people on the same boat.
It's N distinct deployments, but each instance has HPA on it (we deploy via Helm). I'm not DevOps, but I believe they put ReadWriteMany on the PVCs to solve the scaling issue. However, the PVC name can be a problem: it's often lost when we upgrade an instance, requiring manual intervention.
It's a pain and frequently causes us issues.
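For context, ReadWriteMany is just an access mode on the claim, but it needs a storage class that actually supports RWX (NFS/EFS-style backends); block-storage classes like standard EBS typically only offer ReadWriteOnce, which is part of why this gets fragile:

```yaml
# PVC sketch with ReadWriteMany so several pods can mount it simultaneously.
# The claim name and size are examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flowise-shared-data
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 5Gi
```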
I just noticed S3 is supported as storage for the files; it's mentioned in the documentation. I don't know how I missed it.
So from the list:
- Secrets:
  - Encryption key: can be overridden with an env variable.
  - API keys: can potentially be overridden with a k8s secret mounted as a file.
- File storage: S3.
- Logs: stored in ephemeral storage.
If this works, we can deploy Flowise without PV.
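For completeness, a sketch of the S3 side. The variable names below are taken from the Flowise environment-variables docs, but verify them against the docs for your version; bucket, region, and secret names are examples:

```yaml
# Container spec excerpt: point Flowise file storage at S3 instead of disk.
env:
  - name: STORAGE_TYPE
    value: "s3"
  - name: S3_STORAGE_BUCKET_NAME
    value: "my-flowise-bucket"   # example bucket
  - name: S3_STORAGE_REGION
    value: "us-east-1"           # example region
  - name: S3_STORAGE_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: flowise-s3         # example secret
        key: access-key-id
  - name: S3_STORAGE_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: flowise-s3
        key: secret-access-key
```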
I can confirm the approach mentioned in my previous comment works great, without the need for attached storage.
The only downside is that API keys are not manageable through the UI; they need to be modified via the k8s secret or the secret-manager tool of your choice.
So as a next improvement, it would be great to have API keys stored in the database with the rest of the configuration. I know it was mentioned in the thread that they're kept out for security reasons, but if access to the DB is managed correctly (non-public access + secure credentials), I don't think this is a real problem.
You can now store API keys in the database as well! https://docs.flowiseai.com/configuration/environment-variables#for-flowise-api-keys
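Per that docs page, this appears to be an env switch, something like the excerpt below (verify the exact variable name and value against the linked docs):

```yaml
# Container spec excerpt: keep API keys in the database instead of api.json.
env:
  - name: APIKEY_STORAGE_TYPE
    value: "db"
```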
Sorry for commenting on a closed topic, but I have a side question: one of the comments (https://github.com/FlowiseAI/Flowise/issues/1858#issuecomment-1975325431) implies that we can run multiple Flowise pods. The Helm chart asks us not to change the number of replicas from 1. If I back Flowise with Postgres instead of SQLite, why can't I run more replicas of the Flowise pods? Does anyone have an idea about this?