
Deploying the Data Formulator app

Open JohnTan38 opened this issue 4 months ago • 4 comments

Dear Community, We'd like to deploy this excellent app to a cloud service (for example, Google Cloud Run for the backend and Vercel for the frontend). Guidance and instructions are welcome. Thank you.

[vercel.json](https://github.com/user-attachments/files/22358026/vercel.json)

Dockerfile

# Stage 1: build frontend
FROM node:18 AS frontend-build
WORKDIR /app/frontend

# copy frontend package and source (project stores package.json inside Scripts/)
COPY Scripts/package.json Scripts/package-lock.json* ./
COPY src ./src
COPY Scripts/public ./public
COPY Scripts/vite.config.ts ./

RUN npm ci --silent
RUN npm run build

# Stage 2: build backend image
FROM python:3.11-slim
WORKDIR /app

# system deps needed for psycopg2 / building wheels
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential libpq-dev gcc && \
    rm -rf /var/lib/apt/lists/*

# install python deps
COPY Scripts/requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# copy backend source
COPY py-src/data_formulator ./data_formulator
COPY py-src/pyproject.toml ./pyproject.toml

# copy built frontend into Flask static directory (adjust if Flask static path differs)
RUN mkdir -p /app/data_formulator/static
COPY --from=frontend-build /app/frontend/dist /app/data_formulator/static

ENV PORT=8080
EXPOSE 8080

# run with gunicorn; adjust module path if your Flask app entry differs
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "data_formulator.app:app", "--workers", "2", "--timeout", "120"]

deploy-cloud-run.ps1

param(
    [string]$PROJECT_ID = "spheric-crow-424609-t1",
    [string]$REGION = "asia-southeast1",
    [string]$SERVICE_NAME = "data-formulator",
    [string]$IMAGE = "",
    [string]$INSTANCE_CONNECTION_NAME = "spheric-crow-424609-t1:asia-southeast1:data-formulator"
)

# Check gcloud is available
if (-not (Get-Command gcloud -ErrorAction SilentlyContinue)) {
    Write-Error "gcloud CLI not found. Install Google Cloud SDK and ensure 'gcloud' is on PATH. See https://cloud.google.com/sdk/docs/install"
    exit 1
}

# Check Docker is available
if (-not (Get-Command docker -ErrorAction SilentlyContinue)) {
    Write-Error "Docker CLI not found. Install Docker and ensure 'docker' is on PATH. See https://docs.docker.com/get-docker/"
    exit 1
}

# prepare image variable
if (-not $IMAGE) { $IMAGE = "gcr.io/$PROJECT_ID/$SERVICE_NAME:latest" }

Write-Host "Project: $PROJECT_ID"
Write-Host "Region: $REGION"
Write-Host "Service: $SERVICE_NAME"
Write-Host "Image: $IMAGE"
Write-Host "Cloud SQL instance: $INSTANCE_CONNECTION_NAME"
Write-Host ""

# 1. ensure gcloud is configured
gcloud auth login --quiet
gcloud config set project $PROJECT_ID

# 2. enable required APIs
gcloud services enable run.googleapis.com cloudbuild.googleapis.com sqladmin.googleapis.com secretmanager.googleapis.com --project $PROJECT_ID

# 3. read DB password securely and store in Secret Manager
Write-Host "Enter the Postgres DB password (will be stored in Secret Manager 'data-formulator-db-password'):"
$secure = Read-Host -AsSecureString

# convert SecureString to plain (used only to create secret; avoid storing plain in files)
$ptr = [Runtime.InteropServices.Marshal]::SecureStringToBSTR($secure)
$plain = [Runtime.InteropServices.Marshal]::PtrToStringAuto($ptr)
[Runtime.InteropServices.Marshal]::ZeroFreeBSTR($ptr)

$tempFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tempFile -Value $plain -NoNewline
Remove-Variable plain, ptr

# create or add secret
$secretName = "data-formulator-db-password"
$exists = & gcloud secrets describe $secretName --project $PROJECT_ID 2>$null
if ($LASTEXITCODE -ne 0) {
    Write-Host "Creating secret $secretName in Secret Manager..."
    gcloud secrets create $secretName --replication-policy="automatic" --data-file="$tempFile" --project $PROJECT_ID
} else {
    Write-Host "Adding new version to secret $secretName..."
    gcloud secrets versions add $secretName --data-file="$tempFile" --project $PROJECT_ID
}
Remove-Item $tempFile -Force

# 4. build and push container image using Cloud Build
Write-Host "Building container and submitting to Google Cloud Build..."
gcloud builds submit --tag $IMAGE --project $PROJECT_ID

# 5. deploy to Cloud Run, attach Cloud SQL, and mount secret as env var
Write-Host "Deploying to Cloud Run..."
gcloud run deploy $SERVICE_NAME `
    --image $IMAGE `
    --region $REGION `
    --platform managed `
    --allow-unauthenticated `
    --add-cloudsql-instances $INSTANCE_CONNECTION_NAME `
    --set-env-vars "DB_HOST=/cloudsql/$INSTANCE_CONNECTION_NAME,DB_USER=postgres,DB_DATABASE=postgres,EXTERNAL_DB=true" `
    --set-secrets "DB_PASSWORD=$secretName:latest" `
    --project $PROJECT_ID

# 6. grant service account access to Cloud SQL and Secret Manager
$serviceSa = gcloud run services describe $SERVICE_NAME --region $REGION --format="value(spec.template.spec.serviceAccountName)" --project $PROJECT_ID
if ($serviceSa) {
    Write-Host "Granting roles/cloudsql.client and roles/secretmanager.secretAccessor to $serviceSa ..."
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:$serviceSa" --role="roles/cloudsql.client" --project $PROJECT_ID
    gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:$serviceSa" --role="roles/secretmanager.secretAccessor" --project $PROJECT_ID
} else {
    Write-Host "Could not determine Cloud Run service account. Please grant roles/cloudsql.client and roles/secretmanager.secretAccessor to the Cloud Run service account manually."
}

# 7. show service URL
$svcUrl = gcloud run services describe $SERVICE_NAME --region $REGION --format="value(status.url)" --project $PROJECT_ID
Write-Host ""
Write-Host "Deployment complete. Service URL: $svcUrl"
Write-Host "Note: Ensure py-src/data_formulator/db_manager.py treats DB_HOST starting with '/cloudsql/' as a unix socket path (e.g. pass host='/cloudsql/INSTANCE_CONNECTION_NAME' to psycopg2)."
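
On that last note, here is a minimal sketch of the connection logic (a hypothetical helper, not the actual db_manager.py code): libpq, and therefore psycopg2, treats a host value beginning with "/" as a unix socket directory, which is exactly how Cloud Run exposes Cloud SQL at /cloudsql/INSTANCE_CONNECTION_NAME.

# hypothetical helper showing the unix-socket connection path
import os
import psycopg2

def connect_external_db():
    host = os.environ["DB_HOST"]  # e.g. /cloudsql/project:region:instance
    return psycopg2.connect(
        host=host,  # treated as a socket directory when it starts with '/'
        user=os.environ.get("DB_USER", "postgres"),
        password=os.environ["DB_PASSWORD"],  # injected via --set-secrets
        dbname=os.environ.get("DB_DATABASE", "postgres"),
    )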


JohnTan38 avatar Sep 16 '25 07:09 JohnTan38

That would be very exciting! We have some scripts for deploying to Azure App Service, using gunicorn to run it. I believe they can be adapted to different web services.

# build frontend
yarn build

# build new deployment zip
zip -r dist/data-formulator.zip py-src/data_formulator api-keys.env pyproject.toml requirements.txt -x '.??*' node_modules/\* venv/\* __pycache__/\*

RESOURCE_GROUP_NAME= #AZURE RESOURCE GROUP NAME
APP_SERVICE_NAME= #AZURE WEB APP SERVICE NAME

az webapp config appsettings set \
    --resource-group $RESOURCE_GROUP_NAME \
    --name $APP_SERVICE_NAME \
    --settings \
    DISABLE_DATABASE="true" \
    SCM_DO_BUILD_DURING_DEPLOYMENT="true"

az webapp config set \
    --resource-group $RESOURCE_GROUP_NAME \
    --name $APP_SERVICE_NAME \
    --startup-file "pip install -r requirements.txt && pip install --force-reinstall . && gunicorn --bind=0.0.0.0 --timeout 600 --chdir py-src/data_formulator app:app"

az webapp deploy \
    --name $APP_SERVICE_NAME \
    --resource-group $RESOURCE_GROUP_NAME \
    --src-path dist/data-formulator.zip
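
For reference, the app:app target in the gunicorn startup command assumes py-src/data_formulator/app.py exposes a module-level Flask instance, roughly like this (illustrative; the real app.py registers the project's routes):

# rough shape of the module gunicorn imports as `app:app`
from flask import Flask

app = Flask(__name__)

if __name__ == "__main__":
    # local development only; gunicorn imports `app` directly in production
    app.run(host="0.0.0.0", port=5000)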

Chenglong-MS avatar Sep 16 '25 19:09 Chenglong-MS

Modified files: agent_routes.py, sse_routes.py, tables_routes.py, app.py

Dockerfile

# Stage 1: build frontend
FROM node:18 AS frontend-build
WORKDIR /app/frontend

# package.json is in Scripts/
COPY Scripts/package.json Scripts/package-lock.json* ./
COPY package.json package-lock.json* ./

# copy index.html (Vite expects it at project root)
COPY Scripts/index.html ./

# frontend sources live in Scripts/src
COPY Scripts/src ./src
COPY Scripts/public ./public
COPY Scripts/vite.config.ts ./

# Use npm ci when lockfile exists, otherwise fall back to npm install
# RUN bash -lc 'if [ -f package-lock.json ]; then npm ci --silent; else npm install --silent; fi'
RUN npm install --legacy-peer-deps
RUN npm run build

# Stage 2: build backend image
FROM python:3.11-slim
WORKDIR /app

# system deps needed for psycopg2 / building wheels
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential libpq-dev gcc && \
    rm -rf /var/lib/apt/lists/*

# install python deps
COPY Scripts/requirements.txt ./requirements.txt
RUN pip install --upgrade pip setuptools wheel
RUN pip install --no-cache-dir -r requirements.txt

# ensure gunicorn is available even if not in requirements
RUN pip install --no-cache-dir gunicorn

# copy backend source
COPY Scripts/py-src/data_formulator ./data_formulator

# copy built frontend into Flask static directory
RUN mkdir -p /app/data_formulator/static
COPY --from=frontend-build /app/frontend/py-src/data_formulator/dist/ /app/data_formulator/static/

ENV PORT=8080
EXPOSE 8080

# run with gunicorn; adjust module path if your Flask app entry differs
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "data_formulator.app:app", "--workers", "2", "--timeout", "120"]

Build and deploy to the Cloud Run service:

docker build -t gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest -f .\Dockerfile .

docker run --rm gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest python -c "import psycopg2; print('ok', psycopg2.__version__)"

docker push gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest

gcloud run deploy data-formulator `
    --image gcr.io/GOOGLE_CLOUD_PROJECT_ID/data-formulator:latest `
    --region REGION `
    --platform managed `
    --allow-unauthenticated `
    --add-cloudsql-instances GOOGLE_CLOUD_PROJECT_ID:REGION:data-formulator `
    --set-env-vars DB_HOST=/cloudsql/GOOGLE_CLOUD_PROJECT_ID:REGION:data-formulator,DB_USER=postgres,DB_DATABASE=postgres,EXTERNAL_DB=true `
    --set-secrets DB_PASSWORD=data-formulator-db-password:latest `
    --project GOOGLE_CLOUD_PROJECT_ID

Google Cloud Run returns the service URL on successful deployment.

Note: code modifications were made to the files listed above to handle exceptions with session_id; a rough sketch of the idea is below.
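
This is a hypothetical sketch of the kind of session_id hardening described (the actual changes to agent_routes.py / app.py are not shown in this thread):

# hypothetical helper: fall back to a fresh UUID instead of raising
# when a request arrives without a session_id
import uuid
from flask import request

def get_session_id() -> str:
    payload = request.get_json(silent=True) or {}
    return payload.get("session_id") or str(uuid.uuid4())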

JohnTan38 avatar Sep 20 '25 11:09 JohnTan38

> That would be very exciting! We have some scripts for deploying to Azure App Service, using gunicorn to run it. [...]

Let me learn more about Azure App Service. Thank you.

JohnTan38 avatar Sep 20 '25 11:09 JohnTan38

I feel either cloud service should be fine, whichever one is easier (and more cost-effective) :)

I'm wondering if we can get some open-source project credits from a cloud provider to host Data Formulator.

Chenglong-MS avatar Sep 23 '25 21:09 Chenglong-MS