Deploying the Data Formulator app
Dear community, we'd like to deploy this excellent app to a cloud service (for example, Google Cloud Run for the backend and Vercel for the frontend). Guidance and instructions are welcome. Thank you.
[vercel.json](https://github.com/user-attachments/files/22358026/vercel.json)
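For the Vercel side, a minimal `vercel.json` could proxy API calls from the Vercel-hosted frontend to the Cloud Run backend. This is only a hedged sketch, not the attached file's contents: the Cloud Run URL and the `/api` path prefix are placeholders you would replace with your own service URL and the routes the backend actually exposes.

```json
{
  "rewrites": [
    {
      "source": "/api/:path*",
      "destination": "https://<your-cloud-run-service-url>/api/:path*"
    }
  ]
}
```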
Dockerfile

```dockerfile
# Stage 1: build frontend
FROM node:18 AS frontend-build
WORKDIR /app/frontend

# copy frontend package and source (project stores package.json inside Scripts/)
COPY Scripts/package.json Scripts/package-lock.json* ./
COPY src ./src
COPY Scripts/public ./public
COPY Scripts/vite.config.ts ./

RUN npm ci --silent
RUN npm run build

# Stage 2: build backend image
FROM python:3.11-slim
WORKDIR /app

# system deps needed for psycopg2 / building wheels
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential libpq-dev gcc && \
    rm -rf /var/lib/apt/lists/*

# install python deps
COPY Scripts/requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# copy backend source
COPY py-src/data_formulator ./data_formulator
COPY py-src/pyproject.toml ./pyproject.toml

# copy built frontend into Flask static directory (adjust if Flask static path differs)
RUN mkdir -p /app/data_formulator/static
COPY --from=frontend-build /app/frontend/dist /app/data_formulator/static

ENV PORT=8080
EXPOSE 8080

# run with gunicorn; adjust module path if your Flask app entry differs
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "data_formulator.app:app", "--workers", "2", "--timeout", "120"]
```
deploy-cloud-run.ps1

```powershell
param(
    [string]$PROJECT_ID = "spheric-crow-424609-t1",
    [string]$REGION = "asia-southeast1",
    [string]$SERVICE_NAME = "data-formulator",
    [string]$IMAGE = "",
    [string]$INSTANCE_CONNECTION_NAME = "spheric-crow-424609-t1:asia-southeast1:data-formulator"
)

# Check gcloud is available
if (-not (Get-Command gcloud -ErrorAction SilentlyContinue)) {
    Write-Error "gcloud CLI not found. Install Google Cloud SDK and ensure 'gcloud' is on PATH. See https://cloud.google.com/sdk/docs/install"
    exit 1
}

# Check Docker is available
if (-not (Get-Command docker -ErrorAction SilentlyContinue)) {
    Write-Error "Docker CLI not found. Install Docker and ensure 'docker' is on PATH. See https://docs.docker.com/get-docker/"
    exit 1
}

# prepare image variable
if (-not $IMAGE) { $IMAGE = "gcr.io/$PROJECT_ID/$SERVICE_NAME:latest" }

Write-Host "Project: $PROJECT_ID"
Write-Host "Region: $REGION"
Write-Host "Service: $SERVICE_NAME"
Write-Host "Image: $IMAGE"
Write-Host "Cloud SQL instance: $INSTANCE_CONNECTION_NAME"
Write-Host ""

# 1. ensure gcloud is configured
gcloud auth login --quiet
gcloud config set project $PROJECT_ID

# 2. enable required APIs
gcloud services enable run.googleapis.com cloudbuild.googleapis.com sqladmin.googleapis.com secretmanager.googleapis.com --project $PROJECT_ID

# 3. read DB password securely and store in Secret Manager
Write-Host "Enter the Postgres DB password (will be stored in Secret Manager 'data-formulator-db-password'):"
$secure = Read-Host -AsSecureString

# convert SecureString to plain text (used only to create the secret; avoid storing plain text in files)
$ptr = [Runtime.InteropServices.Marshal]::SecureStringToBSTR($secure)
$plain = [Runtime.InteropServices.Marshal]::PtrToStringAuto($ptr)
[Runtime.InteropServices.Marshal]::ZeroFreeBSTR($ptr)
$tempFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tempFile -Value $plain -NoNewline
Remove-Variable plain, ptr

# create secret or add a new version
$secretName = "data-formulator-db-password"
$exists = & gcloud secrets describe $secretName --project $PROJECT_ID 2>$null
if ($LASTEXITCODE -ne 0) {
    Write-Host "Creating secret $secretName in Secret Manager..."
    gcloud secrets create $secretName --replication-policy="automatic" --data-file="$tempFile" --project $PROJECT_ID
} else {
    Write-Host "Adding new version to secret $secretName..."
    gcloud secrets versions add $secretName --data-file="$tempFile" --project $PROJECT_ID
}
Remove-Item $tempFile -Force

# 4. build and push container image using Cloud Build
Write-Host "Building container and submitting to Google Cloud Build..."
gcloud builds submit --tag $IMAGE --project $PROJECT_ID

# 5. deploy to Cloud Run, attach Cloud SQL, and mount secret as env var
Write-Host "Deploying to Cloud Run..."
gcloud run deploy $SERVICE_NAME --image $IMAGE `
    --region $REGION --platform managed `
    --allow-unauthenticated --add-cloudsql-instances $INSTANCE_CONNECTION_NAME `
    --set-env-vars "DB_HOST=/cloudsql/$INSTANCE_CONNECTION_NAME,DB_USER=postgres,DB_DATABASE=postgres,EXTERNAL_DB=true" `
    --set-secrets "DB_PASSWORD=$secretName:latest" `
    --project $PROJECT_ID

# 6. grant service account access to Cloud SQL and Secret Manager
$serviceSa = gcloud run services describe $SERVICE_NAME --region $REGION --format="value(spec.template.spec.serviceAccountName)" --project $PROJECT_ID
if ($serviceSa) {
    Write-Host "Granting roles/cloudsql.client and roles/secretmanager.secretAccessor to $serviceSa ..."
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:$serviceSa" --role="roles/cloudsql.client"
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:$serviceSa" --role="roles/secretmanager.secretAccessor"
} else {
    Write-Host "Could not determine Cloud Run service account. Please grant roles/cloudsql.client and roles/secretmanager.secretAccessor to the Cloud Run service account manually."
}

# 7. show service URL
$svcUrl = gcloud run services describe $SERVICE_NAME --region $REGION --format="value(status.url)" --project $PROJECT_ID
Write-Host ""
Write-Host "Deployment complete. Service URL: $svcUrl"
Write-Host "Note: Ensure py-src/data_formulator/db_manager.py treats DB_HOST starting with '/cloudsql/' as a unix socket path (e.g. pass host='/cloudsql/INSTANCE_CONNECTION_NAME' to psycopg2)."
```
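The note about `db_manager.py` can be sketched in Python. This is a minimal illustration, not the project's actual code: the helper name `build_db_conn_kwargs` and the env-var defaults are assumptions. It relies on the fact that psycopg2 treats any `host` value beginning with `/` as a unix socket directory, which is exactly what Cloud Run's `/cloudsql/...` mount is.

```python
import os

def build_db_conn_kwargs(env=None):
    """Build psycopg2 connection kwargs from environment variables.

    A DB_HOST starting with '/cloudsql/' is a Cloud SQL unix socket
    directory; psycopg2 accepts it directly as `host`, and no port
    should be supplied in that case.
    """
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")
    kwargs = {
        "dbname": env.get("DB_DATABASE", "postgres"),
        "user": env.get("DB_USER", "postgres"),
        "password": env.get("DB_PASSWORD", ""),
        "host": host,
    }
    if not host.startswith("/"):
        # TCP connection: only include a port for non-socket hosts
        kwargs["port"] = int(env.get("DB_PORT", "5432"))
    return kwargs

# usage (hypothetical): psycopg2.connect(**build_db_conn_kwargs())
```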
That would be very exciting! We have some scripts for deploying to Azure App Service, using gunicorn to run the app. I believe they can be adapted to other web services.
```shell
# build frontend
yarn build

# build new deployment zip
zip -r dist/data-formulator.zip py-src/data_formulator api-keys.env pyproject.toml requirements.txt -x '.??*' node_modules/\* venv/\* __pycache__/\*

RESOURCE_GROUP_NAME=  # Azure resource group name
APP_SERVICE_NAME=     # Azure web app service name

az webapp config appsettings set \
    --resource-group $RESOURCE_GROUP_NAME \
    --name $APP_SERVICE_NAME \
    --settings \
        DISABLE_DATABASE="true" \
        SCM_DO_BUILD_DURING_DEPLOYMENT="true"

az webapp config set \
    --resource-group $RESOURCE_GROUP_NAME \
    --name $APP_SERVICE_NAME \
    --startup-file "pip install -r requirements.txt && pip install --force-reinstall . && gunicorn --bind=0.0.0.0 --timeout 600 --chdir py-src/data_formulator app:app"

az webapp deploy \
    --name $APP_SERVICE_NAME \
    --resource-group $RESOURCE_GROUP_NAME \
    --src-path dist/data-formulator.zip
```
Modified route files: `agent_routes.py`, `sse_routes.py`, `tables_routes.py`
Dockerfile

```dockerfile
# Stage 1: build frontend
FROM node:18 AS frontend-build
WORKDIR /app/frontend

# package.json is in Scripts/
COPY Scripts/package.json Scripts/package-lock.json* ./

# copy index.html (Vite expects it at project root)
COPY Scripts/index.html ./

# frontend sources live in Scripts/src
COPY Scripts/src ./src
COPY Scripts/public ./public
COPY Scripts/vite.config.ts ./

# use npm ci when a lockfile exists, otherwise fall back to npm install
# RUN bash -lc 'if [ -f package-lock.json ]; then npm ci --silent; else npm install --silent; fi'
RUN npm install --legacy-peer-deps
RUN npm run build

# Stage 2: build backend image
FROM python:3.11-slim
WORKDIR /app

# system deps needed for psycopg2 / building wheels
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential libpq-dev gcc && \
    rm -rf /var/lib/apt/lists/*

# install python deps
COPY Scripts/requirements.txt ./requirements.txt
RUN pip install --upgrade pip setuptools wheel
RUN pip install --no-cache-dir -r requirements.txt

# ensure gunicorn is available even if not in requirements
RUN pip install --no-cache-dir gunicorn

# copy backend source
COPY Scripts/py-src/data_formulator ./data_formulator

# copy built frontend into Flask static directory (adjust if your vite outDir differs)
RUN mkdir -p /app/data_formulator/static
COPY --from=frontend-build /app/frontend/py-src/data_formulator/dist/ /app/data_formulator/static/

ENV PORT=8080
EXPOSE 8080

# run with gunicorn; adjust module path if your Flask app entry differs
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "data_formulator.app:app", "--workers", "2", "--timeout", "120"]
```
Build and deploy to the Cloud Run service:

```powershell
docker build -t gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest -f .\Dockerfile .

# quick sanity check that psycopg2 imports inside the image
docker run --rm gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest python -c "import psycopg2; print('ok', psycopg2.__version__)"

docker push gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest

gcloud run deploy data-formulator --image gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest `
    --region REGION --platform managed --allow-unauthenticated `
    --add-cloudsql-instances GOOGLE_CLOUD_PROJECT_ID:REGION:data-formulator `
    --set-env-vars DB_HOST=/cloudsql/GOOGLE_CLOUD_PROJECT_ID:REGION:data-formulator,DB_USER=postgres,DB_DATABASE=postgres,EXTERNAL_DB=true `
    --set-secrets DB_PASSWORD=data-formulator-db-password:latest `
    --project GOOGLE_CLOUD_PROJECT_ID
```
On success, Cloud Run prints the service URL.

Note: code modifications were made to handle exceptions around `session_id`.
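For reference, the `session_id` exception handling could look something like the following. This is a hypothetical sketch, not the actual patch: the helper name `extract_session_id` and the error messages are invented for illustration. The idea is to reject a missing or empty `session_id` with an error a route can map to HTTP 400, instead of letting a `KeyError` surface as a 500.

```python
def extract_session_id(payload):
    """Return a non-empty session_id from a request payload dict.

    Raises ValueError (which a Flask route can translate into a 400
    response) instead of letting a missing key raise an unhandled
    KeyError deep inside the handler.
    """
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    session_id = payload.get("session_id")
    if not session_id:
        raise ValueError("missing session_id in request")
    return session_id
```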
Let me learn more about Azure App Service. Thank you.
I feel either cloud service should be fine, whichever is easier (and more cost-effective) :)
I'm wondering if we can get open-source project credits from a cloud provider to host Data Formulator.