
[FEATURE] Remove reliance on disk storage, store everything in the database

Open automaton82 opened this issue 1 year ago • 10 comments

Describe the feature you'd like
Currently most information is stored in the database, with the exception of a few additional things:

  • Secrets (encryption and API keys)
  • File storage
  • Logs

Additional context
Deploying with a database is easy and common with Kubernetes and container setups; persistent volume claims, however, are not as simple. Moving the remaining items into the database would simplify deployment a great deal.

The keys would be an easy move, as one is a string (and already has an override that can be pulled from a Kubernetes secret) and the other is a JSON string.

The files could be kept in blob storage; there's some additional consideration around file size, but it's not terribly complex.

automaton82 avatar Mar 03 '24 01:03 automaton82

Actually, the uploaded files are stored as base64 strings in the database, but we are moving away from this and storing them in a folder specified by BLOB_STORAGE_PATH.
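
For illustration, a hypothetical container env fragment (the path is made up; BLOB_STORAGE_PATH is the variable named above):

```yaml
env:
  - name: BLOB_STORAGE_PATH
    value: /data/flowise/storage   # should sit on persistent storage, e.g. a PVC mount
```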

Log files, however, are not practical to store in the database because of their size.

Secrets could be stored in the database, but that doesn't seem very secure.

HenryHengZJ avatar Mar 03 '24 01:03 HenryHengZJ

Thanks for the reply, Henry. The reason is that PVCs have their own complications when scaling: for example, N Flowise pods sharing a PVC would require read/write access on all of them, and there may be issues as pods move between nodes. A database, especially a managed one like RDS, is a lot easier to manage and is often cheaper.

For logs, I suspect it's actually cheaper per byte to store them in a database than on disk, at least in the cloud. I'd have to run the numbers, but I think that might be the case.

For secrets, you'd encrypt them in the DB; that's not a problem. The salt or hash could be accepted as a value in the .env, supplied as a Kubernetes secret (a single string), just like the override key today.
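
As a sketch of that existing override-key pattern (the secret name is illustrative; FLOWISE_SECRETKEY_OVERWRITE is the documented encryption-key override):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: flowise-encryption-key
stringData:
  encryptionKey: "change-me"        # single-string secret
---
# In the Flowise container spec:
env:
  - name: FLOWISE_SECRETKEY_OVERWRITE
    valueFrom:
      secretKeyRef:
        name: flowise-encryption-key
        key: encryptionKey
```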

Files are more difficult; as you mentioned, it depends on size. At a minimum, an option to keep everything in the database would be very nice and would make scaling with Helm charts easier (at least for me).

automaton82 avatar Mar 03 '24 21:03 automaton82

> Secrets could be stored in the database, but that doesn't seem very secure.

Cool, how can we persist the API keys? I didn't find it in the docs. In my case that would fit well, because every time I update the Docker image to a newer version, my API keys are reset.

mateusluizfb avatar Mar 25 '24 17:03 mateusluizfb

You can create a PVC and point the path environment variable at it. Then the keys will persist.
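
A minimal sketch of that setup, assuming the documented APIKEY_PATH variable and illustrative resource names:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flowise-keys
spec:
  accessModes: ["ReadWriteOnce"]    # ReadWriteMany if several replicas share it
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flowise
spec:
  replicas: 1
  selector:
    matchLabels: { app: flowise }
  template:
    metadata:
      labels: { app: flowise }
    spec:
      containers:
        - name: flowise
          image: flowiseai/flowise
          env:
            - name: APIKEY_PATH     # folder where Flowise keeps api.json
              value: /data/keys
          volumeMounts:
            - name: keys
              mountPath: /data/keys
      volumes:
        - name: keys
          persistentVolumeClaim:
            claimName: flowise-keys
```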

automaton82 avatar Mar 25 '24 17:03 automaton82

+1 for this.

Deploying to Render.com, it seems there's no way to enable a zero-downtime deploy because of the dependency on an attached persistent disk.

This has me wondering how other people are achieving zero-downtime deploys (with Render, or other deployment targets).

jonhilt avatar May 01 '24 08:05 jonhilt


+1.

Switching to a stateless system would greatly improve how we deploy flows to production. Of course we can use Flowise for prototyping and Langchain for moving to production. However, being able to quickly deploy flows and make live changes with Flowise in prod would be very beneficial.

For logs, storing them on disk is fine. In K8S clusters, logs are usually collected and kept by systems like Loki or ELK, so it’s not a problem if we lose disk logs when a pod is terminated.

danieldabate avatar May 10 '24 23:05 danieldabate

Agree. We have maybe 6 instances of Flowise deployed in k8s, and all have challenges with PVC.

Would still greatly prefer everything (except logs, which as you mention are optional) to be stored in the database.

automaton82 avatar May 10 '24 23:05 automaton82

Could you explain a bit more how you have those 6 instances deployed?

Some questions that come to my mind:

  • Is it a single deployment with 6 instances for HA or are those 6 independent deployments?
  • If it's a single deployment, are they sharing the same PV, or do you have a different PV for each one?

How I'm thinking of solving it for now:

  • For API keys, I'm thinking of storing the api.json file in a secret and mounting it as a file for the pods (see the sketch below). Yes, API keys won't be manageable through Flowise, but for me that's not important in a prod environment.
  • For files, I'm considering having two deployments of Flowise running: one with a single replica for the admin panel and a PVC for the files, and another with at least 3 replicas used only to serve the flows through the API, without access to the admin panel.

And I might even consider not attaching a PVC at all for the admin replica. We will upload new files every time we run the upsert process, so we don't really need the old ones I guess.
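
A rough sketch of the api.json-in-a-Secret idea (names are illustrative, and APIKEY_PATH is assumed to be the variable Flowise reads the key file's folder from):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: flowise-api-keys
stringData:
  api.json: |
    []    # paste the contents Flowise generated for your keys here
---
# In the Flowise pod spec:
containers:
  - name: flowise
    env:
      - name: APIKEY_PATH
        value: /etc/flowise-keys     # folder that should contain api.json
    volumeMounts:
      - name: api-keys
        mountPath: /etc/flowise-keys
        readOnly: true
volumes:
  - name: api-keys
    secret:
      secretName: flowise-api-keys
```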

Thanks btw! It's great to know there are more people in the same boat.

danieldabate avatar May 11 '24 00:05 danieldabate

It's N distinct deployments, but each instance has an HPA on it (we deploy via Helm). I'm not DevOps, but I believe they put ReadWriteMany on the PVCs to solve the scaling issue. However, the name of the PVC can be a problem and is often lost when we upgrade an instance, requiring manual intervention.

It's a pain and frequently causes us issues.

automaton82 avatar May 11 '24 03:05 automaton82

I just noticed S3 is supported as storage for the files; it's mentioned in the documentation. I don't know how I missed it.

So from the list:

  • Secrets:
    • Encryption: Can be overwritten with an env variable
    • API keys: Can potentially be provided via a k8s secret mounted as a file.
  • File storage: S3
  • Logs: Stored in ephemeral storage.

If this works, we can deploy Flowise without PV.
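
A rough env block for that setup (variable names as documented at the time of writing; verify them against the current Flowise docs):

```yaml
env:
  - name: STORAGE_TYPE
    value: s3
  - name: S3_STORAGE_BUCKET_NAME
    value: my-flowise-bucket        # illustrative bucket name
  - name: S3_STORAGE_REGION
    value: us-east-1
  - name: S3_STORAGE_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef: { name: flowise-s3, key: accessKeyId }
  - name: S3_STORAGE_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef: { name: flowise-s3, key: secretAccessKey }
```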

danieldabate avatar May 14 '24 16:05 danieldabate

I can confirm the approach mentioned in my previous comment works great, without the need for attached storage.

The only downside is that API keys are not manageable through the UI; they need to be modified through the k8s secret or the secret-management tool of your choice.

So as a next improvement, it would be great to have API keys stored in the database with the rest of the configuration. I know it was mentioned in the thread that keeping them out is for security reasons, but if access to the DB is managed correctly (non-public access plus secure credentials), I don't think this is a real problem.

danieldabate avatar May 23 '24 15:05 danieldabate

You can now store API keys in the database as well! https://docs.flowiseai.com/configuration/environment-variables#for-flowise-api-keys
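
Per the linked docs, this is controlled by an env variable; a minimal sketch (check the docs for the exact current name and values):

```yaml
env:
  - name: APIKEY_STORAGE_TYPE
    value: db      # "json" keeps the old file-based behavior
```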

HenryHengZJ avatar Sep 19 '24 17:09 HenryHengZJ

Sorry for commenting on a closed topic, but I have a side question: one of the comments (https://github.com/FlowiseAI/Flowise/issues/1858#issuecomment-1975325431) implies that we can run multiple pods of Flowise. The Helm chart asks us not to change the number of replicas from 1. If I back Flowise with Postgres instead of SQLite, why can't I run more replicas of the Flowise pods? Does anyone have an idea about this?

prasanna4742 avatar Dec 11 '24 13:12 prasanna4742