spline-spark-agent: support setting the ArangoDB name in the configuration

Hi, I am using Spline to capture lineage from Databricks notebooks. I put the following in the cluster's advanced settings:

spark.spline.mode ENABLED
spark.spline.lineageDispatcher.http.producer.url http://10.0.19.4:8080/producer
spark.spline.lineageDispatcher http

Since I have several customers, I don't want to keep all of their data in the same ArangoDB database, so I want a way to store the lineage data in a separate DB per customer.

Can we also send the ArangoDB name as a parameter, so that the execution plan lineage data is kept in a different DB for each cluster I use?

Thanks in advance.

Dec 05 '23 07:12 zacayd

No, this isn't possible. The database is an internal part of the system and is not something you can easily select on a request basis.

My recommendation for your use case would be to simply augment your execution plan and event objects with the DBR cluster name stored as an extra parameter, or a tag, and filter the lineage in the UI based on that (a beta of this feature is available in the develop version of the server and the UI).
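A rough sketch of what tagging by cluster name could look like, in Python. It assumes the agent's `userExtraMeta` post-processing filter and its `rules` property, with the property names and JSON layout written from memory of the agent README, so please verify them against the agent version you build. It also assumes the rules can be passed as an inline JSON string. The label key `cluster` and the helper names are hypothetical.

```python
import json


def user_extra_meta_rules(cluster_name: str) -> str:
    """Build a rules document that labels plans and events with the cluster name.

    ASSUMPTION: the "executionPlan"/"executionEvent" -> "labels" layout follows
    the agent's userExtraMeta filter; check it against your agent version.
    """
    labels = {"cluster": [cluster_name]}  # "cluster" is a hypothetical label key
    rules = {
        "executionPlan": {"labels": labels},
        "executionEvent": {"labels": labels},
    }
    return json.dumps(rules)


def spark_conf_with_cluster_label(cluster_name: str) -> dict:
    """Spark conf entries to add to the cluster's advanced settings.

    ASSUMPTION: property names as recalled from the agent README.
    """
    return {
        "spark.spline.postProcessingFilter": "userExtraMeta",
        "spark.spline.postProcessingFilter.userExtraMeta.rules":
            user_extra_meta_rules(cluster_name),
    }


if __name__ == "__main__":
    # Print the entries in the "key value" form used in the cluster settings.
    for key, value in spark_conf_with_cluster_label("customer-a").items():
        print(key, value)
```

Each cluster would get its own name in the label, and the UI filter would then select on that label.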

Alternatively, you may augment the URIs for the input/output sources to include the cluster name as a part of the name. That is another way to logically separate the lineage data.

If you absolutely want to use different DBs then you can run separate Spline instances, put a custom proxy gateway in front of the Spline Producer REST API (or implement a custom LineageDispatcher wrapper) and route your requests to different Spline instances based on your custom conditions.
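If every cluster belongs to exactly one customer, the routing can also be done at configuration time instead of in a proxy: each cluster simply gets the producer URL of its own Spline instance. A minimal sketch in Python, assuming one Spline instance per customer. The host names, the `PRODUCER_URLS` mapping and the helper names are hypothetical; the Spark property names are the ones from the question above.

```python
from typing import Dict, Mapping

# Hypothetical mapping: cluster (customer) name -> Producer API URL of the
# Spline instance (and therefore the ArangoDB database) dedicated to it.
PRODUCER_URLS: Dict[str, str] = {
    "customer-a": "http://spline-a.example.internal:8080/producer",
    "customer-b": "http://spline-b.example.internal:8080/producer",
}


def producer_url_for(cluster_name: str,
                     routes: Mapping[str, str] = PRODUCER_URLS) -> str:
    """Return the producer URL for a cluster, failing loudly if none is set."""
    try:
        return routes[cluster_name]
    except KeyError:
        raise ValueError(
            f"no Spline instance configured for cluster {cluster_name!r}"
        ) from None


def spline_spark_conf(cluster_name: str,
                      routes: Mapping[str, str] = PRODUCER_URLS) -> Dict[str, str]:
    """Build the Spline entries for one cluster's advanced settings."""
    return {
        "spark.spline.mode": "ENABLED",
        "spark.spline.lineageDispatcher": "http",
        "spark.spline.lineageDispatcher.http.producer.url":
            producer_url_for(cluster_name, routes),
    }


if __name__ == "__main__":
    for key, value in spline_spark_conf("customer-a").items():
        print(key, value)
```

Raising on an unknown cluster name is deliberate: silently falling back to a default instance would mix one customer's lineage into another customer's database. A proxy gateway or a custom LineageDispatcher wrapper is only needed when a single cluster has to send to different instances depending on the request.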

Dec 05 '23 16:12 wajda

About "the DBR cluster name stored as an extra parameter, or a tag, and filter the stuff on the UI based on that (the feature beta is available in the develop version of the server and the UI)":

Do you mean that the name of the cluster is in the execution plan?

Dec 05 '23 17:12 zacayd

Is the beta feature available as a Maven package in Databricks?

Dec 05 '23 17:12 zacayd

No. You need to build and install from the latest development branch.

Dec 06 '23 00:12 wajda

Any chance that it will be available on Databricks soon? I am having trouble building and installing it.

Dec 06 '23 07:12 zacayd

No ETA, unfortunately. The team has no capacity and the business priorities have changed, so the project is on hold at the moment.

Dec 06 '23 15:12 wajda

Hi, I managed to compile the project, build a JAR, and load it via DBFS. However, when I run the notebook I get lineage data, but the notebook info is missing. I used the develop branch: https://github.com/AbsaOSS/spline-spark-agent/tree/develop. Can you advise?

Jan 07 '24 07:01 zacayd