
Microsoft.Azure.Databricks.Client library on NuGet doesn't expose '/fs/files' API

Open VivendoByte opened this issue 1 year ago • 6 comments

Hello everybody!

I'm using this library from NuGet: https://www.nuget.org/packages/Microsoft.Azure.Databricks.Client/ because I need to connect to, and ingest data into, a Databricks workspace hosted on Azure.

In my particular case, I need to upload a JSON file into a volume. According to this documentation: https://docs.databricks.com/api/workspace/files/upload I need to use the PUT method on the endpoint /api/2.0/fs/files{file_path}
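For reference, the upload endpoint in that documentation embeds the target file path directly in the URL, takes the overwrite flag as a query parameter, and expects the raw file bytes (not a JSON envelope) as the PUT body. A minimal sketch of the URL shape, using a hypothetical workspace URL and volume path for illustration:

```python
from urllib.parse import quote

def files_api_upload_url(workspace_url: str, file_path: str, overwrite: bool = True) -> str:
    """Build the Files API upload URL described in the Databricks docs:
    PUT /api/2.0/fs/files{file_path}?overwrite=...
    The volume path is part of the URL itself; the file contents go in
    the request body as raw bytes."""
    # Keep '/' separators intact, percent-encode any other special characters.
    encoded_path = quote(file_path, safe="/")
    return f"{workspace_url}/api/2.0/fs/files{encoded_path}?overwrite={str(overwrite).lower()}"

# Hypothetical workspace URL and volume path, for illustration only.
url = files_api_upload_url(
    "https://adb-1234567890123456.7.azuredatabricks.net",
    "/Volumes/my_catalog/my_schema/my_volume/data.json",
)
```

The actual request would then be a PUT to that URL with the file bytes as the body and a bearer token in the `Authorization` header.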

It seems that this endpoint is not exposed in the latest version of Microsoft.Azure.Databricks.Client (currently 2.6.0). Am I wrong?

VivendoByte avatar Nov 05 '24 16:11 VivendoByte

For now, just to run a test on my local machine, I downloaded the source code from GitHub and added the required new method to the IDbfsApi interface plus the DbfsApiClient class implementation. Using the endpoint specified in the documentation, I'm able to upload the file into the correct volume on Databricks. It seems strange to me that Microsoft.Azure.Databricks.Client on NuGet doesn't support this kind of operation. Can anyone help me?

VivendoByte avatar Nov 05 '24 16:11 VivendoByte

Can you use DbfsApiClient.Upload to upload the file? Azure Databricks supports dbfs format for volume paths: dbfs:/Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path>/<file_name>

memoryz avatar Nov 06 '24 07:11 memoryz

Thanks for your reply @memoryz. I tried the DbfsApiClient.Upload method, but it doesn't work. I get this error:

[screenshot of the error message]

I'm using this format to specify the remote filename: var remoteFilename = "dbfs:/Volumes/<catalog_name>/<schema_name>/<volume_name>/" + fi.Name;

This Upload method internally uses the Create method, which builds and calls the endpoint $"{ApiVersion}/dbfs/create", and I get the error above.
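For context, the DBFS route that Upload wraps is a handle-based, three-step protocol per the public DBFS API docs: open a stream handle, append base64-encoded blocks, then close the handle. That is quite different from the single-PUT Files API. A rough sketch of the payloads involved (no HTTP is performed here; the handle value is a placeholder for what /dbfs/create would return):

```python
import base64

def dbfs_upload_requests(path: str, data: bytes, overwrite: bool = True):
    """Yield (method, endpoint, payload) triples for a DBFS streaming upload:
    create a handle, append base64-encoded data blocks, close the handle."""
    yield ("POST", "/api/2.0/dbfs/create", {"path": path, "overwrite": overwrite})
    # Real code would loop here, one add-block call per chunk of data.
    yield ("POST", "/api/2.0/dbfs/add-block",
           {"handle": "<handle>", "data": base64.b64encode(data).decode("ascii")})
    yield ("POST", "/api/2.0/dbfs/close", {"handle": "<handle>"})

requests_made = list(dbfs_upload_requests("/tmp/data.json", b'{"k": 1}'))
```

So even with a dbfs:-style volume path, the request still goes through the /dbfs/* endpoints, which is where the error above originates.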

Just as a test, I implemented a Create2 method:

public async Task Create2(string path, Stream contents, bool overwrite, CancellationToken cancellationToken = default)
{
    // The Files API takes the raw file contents as the request body and
    // the overwrite flag as a query parameter; it returns no JSON body.
    var endpoint = $"/api/{ApiVersion}/fs/files{path}?overwrite={overwrite.ToString().ToLowerInvariant()}";
    using var content = new StreamContent(contents);
    var response = await this.HttpClient.PutAsync(endpoint, content, cancellationToken).ConfigureAwait(false);
    response.EnsureSuccessStatusCode();
}

which uses the PUT method on a different endpoint, following the documentation here: https://docs.databricks.com/api/workspace/files/upload. Using this, I'm able to upload the JSON file into my volume. Where is my mistake?

VivendoByte avatar Nov 06 '24 08:11 VivendoByte

I see. Maybe the DBFS API doesn't support volumes, given that the volumes feature was released much later than DBFS. I'll see if I can set up an environment with Unity Catalog enabled and give it a try.

memoryz avatar Nov 06 '24 08:11 memoryz

I'm trying to push a new branch to this repo with my temporary fix for this issue, but it seems that I don't have the required permissions. :-)

VivendoByte avatar Nov 06 '24 10:11 VivendoByte

Can you fork the repo and send a PR from your fork?

memoryz avatar Nov 07 '24 07:11 memoryz

Hello! I stumbled upon the same need discussed in this thread and started working on it. I opened a pull request (https://github.com/Azure/azure-databricks-client/pull/266) if you have time to check it out 🙂

Flavinou avatar May 13 '25 09:05 Flavinou