Add sql2dbx: LLM-powered SQL to Databricks notebook converter
Add sql2dbx tool to databrickslabs/sandbox
Overview
This PR adds sql2dbx to the databrickslabs/sandbox repository. sql2dbx is an automation tool designed to convert SQL files into Databricks notebooks. It leverages Large Language Models (LLMs) to perform the conversion based on system prompts tailored for various SQL dialects. sql2dbx consists of a series of Databricks notebooks.
Features
- Batch processing workflow for SQL file conversion
- Extensible prompt-based architecture for SQL dialect handling
- LLM-powered conversion with syntax validation
- Automatic error correction and cell splitting
- Direct output as ready-to-use Databricks notebook files (.py format)
- Support for multiple language models (Claude, Azure OpenAI, etc.)
Sample SQL Dialect Prompts
The tool includes sample YAML-based conversion prompts for:
- T-SQL (SQL Server, Azure Synapse)
- Oracle
- Teradata
- MySQL/MariaDB
- PostgreSQL
- Snowflake
- Redshift
- Netezza
Each prompt file contains a system message and few-shot examples tailored to the specific SQL dialect's syntax and semantics.
Documentation
The main notebook (00_main) serves as the entry point with documentation on the conversion workflow and instructions for creating custom dialect prompts or extending the existing samples.
@nakazax please sign your commits - PR can't be merged until this condition is fulfilled
Commits must have verified signatures.
@alexott Thanks for your comment. I've added a verified signature to the commit.
Hi @grusin-db @alexott This PR has been open for over a month now. I've also addressed the GPG verified signature requirement. Could you please let me know if there's anything else that needs to be done or if you have any feedback? Thanks!
Closing this PR to contribute to other repositories.