Add support for Microsoft Fabric
Microsoft Fabric is becoming increasingly popular with our clients, and most of the time these are new projects or migrations, i.e., the perfect time to implement SQLMesh.
I know there are some issues with their lakehouse entity around creating schemas/tables, which is why dbt only works with their warehouse entity.
Anyway, I hope this is something you'd be willing to look into!
Hello and thanks for the suggestion. Unfortunately, Fabric seems to have a number of limitations relative to SQL Server, and some of those are used in core SQLMesh operations. If there is sufficient demand, it may be possible to add Fabric once those limitations are addressed.
@treysp which limitations do you think are critical? It seems like if the user avoids certain operations, we won't hit these limitations?
The limits on ALTER TABLE ADD/ALTER/DROP COLUMN were the main ones that concerned me.
But you're right that those limitations would be feature-specific (primarily on forward-only), so there's no inherent problem if we just tell people the affected features aren't supported.
Sorry for the premature closing @mattiasthalen!
@treysp No worries!
I'm curious though, do those apply to the warehouse object as well? I was under the impression that they only applied to the lakehouse.
Totally possible I'm misinterpreting that page - this text made me think it was platform-wide:
Applies to: ✅ SQL analytics endpoint and Warehouse in Microsoft Fabric
This article covers the T-SQL language syntax capabilities of Microsoft Fabric, when querying the SQL analytics endpoint or Warehouse.
@treysp you are absolutely right, I missed that part!
It sure does have its quirks, e.g., supporting UUID as a data type in the lakehouse/warehouse but not in Power BI. So much for consistency 😅
+1
Our company has to use Fabric, and if a connector for Fabric is created we will use SQLMesh.
we're planning to work on this soon after the holidays
Update on this:
SQLMesh needs some fundamental constructs in order to work properly, namely - schemas, tables and views.
It turns out the Lakehouse entity in Fabric does not support schemas by default. However, if we enable the preview feature for schemas - it stops supporting views. This is documented as a limitation in the public preview of schemas.
We need schemas for the physical layer and views for the virtual layer, so SQLMesh cannot work with Lakehouse until Microsoft resolves this. According to their documentation:
They'll be resolved in the coming releases before General Availability.
So we can revisit this after schema-enabled Lakehouses become generally available and theoretically support views.
Of course, the Warehouse endpoints that are T-SQL-only are unaffected.
I'm curious - is there any demand for Warehouse / TSQL or are people mostly wanting to use Lakehouse / Spark ?
IIRC, dbt uses Warehouse for this reason. And you can reference the lakehouse tables as sources (external tables)...
So our workflow is that the bronze layer is the lakehouse, where the EL lands. Then we do all our silver/gold (transform) in the warehouse.
That's how I'd want to use SQLMesh as well.
And tbh, while being able to use a lakehouse would be nice, it's not a deal breaker. A deal breaker would be not being able to use SQLMesh at all 😉
So to answer your question, (pretty) please proceed with supporting the warehouse endpoint! ❤️
+1 agree with @mattiasthalen above
We also would like to use SQLMesh with Warehouse. Lakehouse is a nice to have. Our workflow is also using Lakehouse as a landing zone, and Warehouse for transforms.
And for reference, you can target the lakehouse via the warehouse endpoint, it's "just" a different database.
E.g.: `lakehouse__landing_zone.dbo.raw__table` → `warehouse.silver.stg__table`
+1 also agree with @mattiasthalen above.
We'll be transitioning to a Fabric architecture for primary data products in the coming year, and would like to take advantage of SQLMesh across our multi-cloud enterprise. It will be fine to start with the warehouse endpoint. As Microsoft irons out the limitations you mention @erindru, support for Lakehouse/Spark will be great long term!
+1 with @mattiasthalen here!
For our clients we offer either a Databricks- or Fabric-centric approach, often with dbt. It would be great if SQLMesh would support the warehouse endpoint, since you advertise solving some of the limitations/annoyances we faced with dbt.
Fabric (Microsoft/Azure) has quite a dominant presence here in the Netherlands for data engineering capabilities. I figure it would be an interesting user base.
With regards to the lakehouse vs. warehouse discussion, ideally SQLMesh would support both eventually (when technically feasible). However, the warehouse endpoint is currently by far the most resource efficient transform solution within Fabric, and would always have my preference for cost reasons.
+1 for warehouse
Our implementation is lakehouse for bronze, warehouse for silver and gold.
we could definitely live with the limitation
Any news yet? ☺️
@mattiasthalen this is not a priority for us at the moment. However, if you're interested, we do offer feature prioritization as part of our enterprise / paid offering.
It recently dawned on me that the SQL Database object in Fabric is a regular MS SQL server. I've previously been talking about it with regard to storing state, but why not use it for the full warehouse?
Said and done: I spent the day adding pyodbc support to the mssql engine and have a draft PR for that.
I just tried it out in fabric, and it works!
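Since the Fabric SQL Database object speaks regular TDS, the standard SQL Server ODBC driver works against it. Here is a minimal sketch of building such a connection string; the server and database names are placeholders, and `fabric_sqldb_connection_string` is a hypothetical helper, not part of SQLMesh:

```python
def fabric_sqldb_connection_string(server: str, database: str) -> str:
    """Build an ODBC connection string for a Fabric SQL Database endpoint.

    Assumes the 'ODBC Driver 18 for SQL Server' is installed and that
    Entra ID interactive auth is acceptable; swap the Authentication
    keyword for your environment (e.g. a service principal) as needed.
    """
    return (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Authentication=ActiveDirectoryInteractive;"
        "Encrypt=yes;"
    )

conn_str = fabric_sqldb_connection_string(
    "myworkspace.database.fabric.microsoft.com",  # placeholder endpoint
    "my_sqlmesh_db",                              # placeholder database
)
# To actually connect (requires the ODBC driver to be installed):
# import pyodbc
# conn = pyodbc.connect(conn_str)
print(conn_str)
```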
+1 for Fabric Data Warehouse as well!
We can now use sqlmesh with the sql db object: https://github.com/TobikoData/sqlmesh/pull/4686
I've tried it with a warehouse object, but since that's not pure T-SQL, there is still some stuff to take care of. Now it's complaining about MERGE.
I'll probably have a look soonish.
@mattiasthalen @tobymao how about orchestrating sqlmesh with fabric python notebooks?
you could save state (sqlmesh/ducklake) in "sql database" and warehouse it in duckdb/ducklake
I can do %init successfully but %context fails at:
`Creating new DuckDB adapter for data files: {db.db}`
`os._exit called with value 0. The interpreter will be restarted.`
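For reference, the split suggested above (state in the Fabric SQL Database, compute in DuckDB) could look roughly like this in a SQLMesh `config.yaml`. This is a hedged sketch only: the gateway name, file path, host, and database are all placeholders, and the exact mssql connection fields should be checked against the current SQLMesh docs:

```yaml
# Hypothetical sketch -- all names, paths, and hosts are placeholders.
gateways:
  fabric_notebook:
    connection:
      type: duckdb
      database: /lakehouse/default/Files/sqlmesh.duckdb  # placeholder path
    state_connection:
      type: mssql
      host: myserver.database.fabric.microsoft.com  # placeholder Fabric SQL DB endpoint
      database: sqlmesh_state
default_gateway: fabric_notebook
```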
For what it's worth, MERGE isn't currently supported with the Fabric Warehouse, but it's estimated to land sometime in the next 3 months (see the Fabric Warehouse roadmap).
Linked to PR in #4751
We'd also consider Fabric for future implementations, though the SQLMesh component matters more to us in a warehouse/SQL database context than in the lakehouse.
That said, there are issues with SQLMesh compatibility with MS SQL and Azure SQL, and therefore presumably Fabric DWH + Fabric SQL DB. Specifically relating to macro syntax compatibility (all) and multi-part object naming (Azure SQL).
@fresioAS and I are working on an engine adapter specifically using the SQL endpoint for the warehouse and have come quite far; just some integration tests need to pass. Those failures are mostly because we added some new config inputs (workspace ID, IIRC) that aren't set up in the CI environment. But I think @erindru was going to take a look next week or the week after.
I'm also vibe coding a Spark adapter for the Lakehouse using the Livy API.
I'm curious, what are the macro issues? Could you perhaps log an issue and tag me in it?
The multi-part thing is wonky in Fabric. DB switching doesn't really work, so MS recommends three-part identifiers. But DROP SCHEMA and CREATE VIEW don't support multi-part identifiers. 🙈
Hey @mattiasthalen , I've linked you into https://github.com/TobikoData/sqlmesh/issues/5034 for more context on the macro syntax issue.
We're not in a rush for Fabric in production due to the numerous bugs and limitations we have encountered in PoCs, but we believe these will be resolved in 6-12 months.
Awesome that you are working towards it though. Are you considering Fabric SQL DB as well, or just Fabric DWH? In theory it should be the same endpoint, but the limitations/features of SQL DB may differ from DWH.
The SQL DB already works via mssql-odbc ☺️
Initial support has been merged in https://github.com/TobikoData/sqlmesh/pull/4751 and should make it to a release soon.
Thanks to @mattiasthalen and @fresioAS !