Add AgentOps project type and vector search data preparation workflows
Overview
This PR adds a new project type "AgentOps" to the existing MLOps Stacks template. Users can now select between two project types when initializing a stack:
- `mlops` - Existing template; traditional ML pipeline for model training and batch inference
- `agentops` - NEW template; agent-specific workflows for data ingestion and, eventually, agent development/deployment

This PR also adds the vector search data ingestion pipeline for AgentOps projects.
Features
AgentOps Template Updates
1. Project type selection
- Added `input_project_type` parameter to `databricks_template_schema.json`
  - Options: `mlops` (default) or `agentops`
  - First-order parameter in template initialization
- Updated minimum Databricks CLI version to `v0.266.0` to support new features
- Default project name now reflects the selected project type: `my_{{ .input_project_type }}_project`
- Other changes:
  - Reordered parameters with `input_project_type` as order 1 and updated all subsequent parameter orders
  - Conditional parameter display (e.g., `input_include_models_in_unity_catalog` is skipped for `agentops`)
  - Updated default values to be project-type aware
2. Updating project structure layout
- Added conditional logic to `update_layout.tmpl` to generate the appropriate project structure based on `input_project_type`
  - Ensures MLOps-specific files are only generated for MLOps projects
- Added conditional logic to certain files:
  - Separate code structure sections for MLOps vs. AgentOps, rendered conditionally based on `input_project_type`
  - `requirements.txt.tmpl` - Adds dependencies (e.g., the vector search SDK)
  - `README.md.tmpl` - Adds basic documentation for AgentOps projects
  - `databricks.yml.tmpl` - Extends the bundle configuration to support AgentOps resources and adds data preparation workflow targets
  - All CI/CD pipelines (more on this below)
3. Updating CI/CD workflows
- Extended CI/CD pipelines to handle AgentOps projects and test the correct workflows:
  - GitHub Actions (`.github/workflows/{{.input_project_name}}-run-tests.yml.tmpl`)
  - Azure DevOps (`.azure/devops-pipelines/{{.input_project_name}}-tests-ci.yml.tmpl`)
  - GitLab CI (`.gitlab/pipelines/{{.input_project_name}}-bundle-ci.yml.tmpl`)
Data preparation with vector search for AgentOps
1. Data preparation code
- Notebook: `DataIngestion.py.tmpl` - Processes raw documentation from data source URLs and stores the data in Unity Catalog
  - Uses the utility function in `fetch_data.py.tmpl` for retrieval
- Notebook: `DataPreprocessing.py.tmpl` - Cleans and chunks the documentation to prepare it for vector search
  - Uses the utility function in `create_chunk.py.tmpl` for the chunking logic (see the sketch after this list)
  - Chunking configs are defined in `config.py.tmpl`
- Notebook: `VectorSearch.py.tmpl` - Creates the Vector Search endpoint and index using Delta Sync (TRIGGERED mode)
  - Uses utility functions in `vector_search_utils.py.tmpl` for endpoint management and waiting for the endpoint to be ready
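
For illustration, a minimal sketch of the kind of chunking helper `create_chunk.py.tmpl` could provide is shown below. The function name, config constants, and overlap strategy are assumptions for this example, not the exact template contents.

```python
# Illustrative sketch only; the real create_chunk.py.tmpl / config.py.tmpl may differ.
from typing import List

# Hypothetical config values (in the template these would live in config.py.tmpl).
CHUNK_SIZE = 1000      # max characters per chunk
CHUNK_OVERLAP = 200    # characters shared between consecutive chunks


def create_chunks(text: str,
                  chunk_size: int = CHUNK_SIZE,
                  chunk_overlap: int = CHUNK_OVERLAP) -> List[str]:
    """Split a document into overlapping character windows for vector search."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - chunk_overlap
    return chunks


# Example: each chunk would become one row in the preprocessed data table.
doc_chunks = create_chunks("Databricks documentation page contents ...")
```

Fixed-size windows with overlap keep neighboring chunks contextually connected, which generally helps retrieval quality once the chunks are indexed.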
2. Workflow resource configuration
- Defined the data preparation workflow in `data-preparation-resource.yml.tmpl`, which includes each notebook as a separate task (sequential execution)
  - Notebook parameters are provided here (see the sketch below for how a task reads them)
  - Scheduled to run daily at 5:00 AM
  - Serverless environment and dependencies are also defined here
- Included the resource in the list of resources in `databricks.yml.tmpl`
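
As a minimal sketch of how the workflow's parameters reach the notebooks, assuming they are passed as task base parameters and read as notebook widgets (the parameter names mirror the bundle variables listed in the next section):

```python
# Runs inside a Databricks notebook task, where `dbutils` is predefined.
# The widget names below mirror the workflow parameters; the exact wiring
# in the template may differ.
catalog_name = dbutils.widgets.get("catalog_name")
schema = dbutils.widgets.get("schema")
raw_data_table = dbutils.widgets.get("raw_data_table")

# Fully qualified Unity Catalog table the ingestion step writes to.
raw_table_fqn = f"{catalog_name}.{schema}.{raw_data_table}"
print(f"Writing raw documentation to {raw_table_fqn}")
```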
3. Defined variables in databricks.yml
- Included variables that feed into the data preparation workflow parameters (see the sketch below for how the vector search values are consumed):
  - `catalog_name`
    - Defined uniquely for each deployment target using template input (e.g. `databricks_staging_workspace_host`)
  - `schema`
    - Defined the same for each deployment target using template input `input_schema_name`
  - `raw_data_table`
    - Automatically populated as "raw_documentation"
  - `preprocessed_data_table`
    - Automatically populated as "databricks_documentation"
  - `eval_table`
    - Automatically populated as "databricks_documentation_eval"
  - `vector_search_endpoint`
    - Automatically populated as "ai_agent_endpoint"
  - `vector_search_index`
    - Automatically populated as "databricks_documentation_vs_index"
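
To show how the vector search variables are consumed downstream, here is a minimal sketch of endpoint and index creation with the `databricks-vectorsearch` SDK, along the lines of what `vector_search_utils.py.tmpl` and `VectorSearch.py.tmpl` do. The catalog/schema qualifiers, primary key and text columns, embedding model endpoint, and status-polling fields are assumptions for this example.

```python
# Illustrative sketch only; the template's vector_search_utils.py.tmpl may differ.
import time
from databricks.vector_search.client import VectorSearchClient

# Inside a Databricks notebook the client picks up credentials from the context.
client = VectorSearchClient()

endpoint_name = "ai_agent_endpoint"  # ${var.vector_search_endpoint}
# Catalog and schema qualifiers below are hypothetical placeholders.
index_name = "main.agent_schema.databricks_documentation_vs_index"  # ${var.vector_search_index}
source_table = "main.agent_schema.databricks_documentation"         # ${var.preprocessed_data_table}

# Create the endpoint if it does not already exist.
try:
    client.get_endpoint(name=endpoint_name)
except Exception:
    client.create_endpoint(name=endpoint_name, endpoint_type="STANDARD")

# Wait until the endpoint reports it is ready (response field names are assumed).
for _ in range(60):  # up to ~30 minutes
    state = client.get_endpoint(name=endpoint_name).get("endpoint_status", {}).get("state")
    if state == "ONLINE":
        break
    time.sleep(30)

# Create a Delta Sync index in TRIGGERED mode over the preprocessed table.
index = client.create_delta_sync_index(
    endpoint_name=endpoint_name,
    index_name=index_name,
    source_table_name=source_table,
    pipeline_type="TRIGGERED",
    primary_key="chunk_id",                                    # assumed primary key column
    embedding_source_column="chunk_text",                      # assumed text column
    embedding_model_endpoint_name="databricks-gte-large-en",   # assumed embedding endpoint
)
```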
What I have tested:
- Validated project generation for both `mlops` and `agentops` project types
- Tested the original mlops-stacks project and confirmed that the default behavior is unchanged
- Validated that the data preparation pipeline works end-to-end
- Validated that bundle variables are used properly by resources