Microsoft.Azure.Databricks.Client library on NuGet doesn't expose '/fs/files' API
Hello everybody!
I'm using this library from NuGet: https://www.nuget.org/packages/Microsoft.Azure.Databricks.Client/ because I need to connect to a Databricks service hosted on Azure and ingest data into it.
In my particular case, I need to upload a JSON file into a volume. According to this documentation: https://docs.databricks.com/api/workspace/files/upload, I need to call the endpoint /api/2.0/fs/files{file_path} with the PUT method.
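For reference, calling that endpoint directly with a plain HttpClient looks roughly like this (a sketch only; the workspace URL, token, and paths are placeholders):

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Sketch only: workspace URL, token, and paths are placeholders.
using var http = new HttpClient { BaseAddress = new Uri("https://adb-1234567890123456.7.azuredatabricks.net") };
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", "<personal-access-token>");

var volumePath = "/Volumes/my_catalog/my_schema/my_volume/data.json";
await using var file = File.OpenRead("data.json");

// PUT /api/2.0/fs/files{file_path}?overwrite=true with the raw file bytes as the body.
var response = await http.PutAsync($"/api/2.0/fs/files{volumePath}?overwrite=true", new StreamContent(file));
response.EnsureSuccessStatusCode();
```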
It seems that this endpoint is not exposed in the latest version of Microsoft.Azure.Databricks.Client (currently 2.6.0). Am I wrong?
For now, just to run a test on my local machine, I downloaded the source code from GitHub, added the required new method to the IDbfsApi interface, and implemented it in the DbfsApiClient class. Using the endpoint specified in the documentation, I'm able to upload the file to the correct volume on Databricks. It seems strange to me that Microsoft.Azure.Databricks.Client on NuGet doesn't support this kind of operation. Can anyone help me?
Can you use DbfsApiClient.Upload to upload the file? Azure Databricks supports the dbfs path format for volumes:
```
dbfs:/Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path>/<file_name>
```
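Roughly like this (a sketch; the workspace URL, token, and path are placeholders, and I'm assuming the Upload(path, overwrite, stream) overload):

```csharp
using System.IO;
using Microsoft.Azure.Databricks.Client;

// Sketch only: workspace URL, token, and path are placeholders.
using var client = DatabricksClient.CreateClient(
    "https://adb-1234567890123456.7.azuredatabricks.net", "<personal-access-token>");

await using var stream = File.OpenRead("data.json");

// Upload streams the local file to the given dbfs:/ path.
await client.Dbfs.Upload(
    "dbfs:/Volumes/my_catalog/my_schema/my_volume/data.json",
    overwrite: true,
    stream);
```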
Thanks for your reply @memoryz. I tried the DbfsApiClient.Upload method, but it doesn't work; the call fails with an error.
I'm using this format to specify the remote filename:
```
var remoteFilename = "dbfs:/Volumes/
```
The Upload method internally uses the Create method, which builds and calls the endpoint `$"{ApiVersion}/dbfs/create"`, and that is where I get the error.
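If I'm reading the client source correctly, Upload wraps the handle-based DBFS flow, roughly like this (a sketch using the library's Create/AddBlock/Close methods; URL, token, path, and payload are placeholders):

```csharp
using System.Text;
using Microsoft.Azure.Databricks.Client;

// Sketch of the handle-based DBFS flow that Upload wraps; all values are placeholders.
using var client = DatabricksClient.CreateClient(
    "https://adb-1234567890123456.7.azuredatabricks.net", "<personal-access-token>");

var bytes = Encoding.UTF8.GetBytes("{\"name\":\"test\"}");

var handle = await client.Dbfs.Create("dbfs:/tmp/data.json", overwrite: true); // POST {ApiVersion}/dbfs/create
await client.Dbfs.AddBlock(handle, bytes);                                     // POST {ApiVersion}/dbfs/add-block
await client.Dbfs.Close(handle);                                               // POST {ApiVersion}/dbfs/close
```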
Just as a test, I implemented a Create2 method:
```csharp
public async Task<long> Create2(string path, bool overwrite, CancellationToken cancellationToken = default)
{
    var request = new { path, overwrite };
    var endpoint = $"/api/{ApiVersion}/fs/files{path}";
    var response = await HttpPut<dynamic, FileHandle>(this.HttpClient, endpoint, request, cancellationToken).ConfigureAwait(false);
    return response.Handle;
}
```
which uses the PUT method on a different endpoint, following the documentation here: https://docs.databricks.com/api/workspace/files/upload. Using this, I'm able to upload the JSON file into my volume. Where is my mistake?
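One note on the documented contract: the Files API upload sends the raw file bytes as the PUT body, passes overwrite as a query parameter, and returns an empty response, so a Files-API-shaped method might look more like this (a sketch only; the method name is my own, not from the library):

```csharp
// Hypothetical method, not in the NuGet package: PUT the raw file bytes to
// /api/{ApiVersion}/fs/files{file_path}?overwrite=<bool> per the Files API docs.
public async Task UploadVolumeFile(string path, bool overwrite, Stream contents,
    CancellationToken cancellationToken = default)
{
    var endpoint = $"/api/{ApiVersion}/fs/files{path}?overwrite={(overwrite ? "true" : "false")}";
    using var body = new StreamContent(contents);
    var response = await this.HttpClient.PutAsync(endpoint, body, cancellationToken).ConfigureAwait(false);
    response.EnsureSuccessStatusCode(); // per the docs, a successful upload has no response body
}
```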
I see. Maybe the DBFS API doesn't support volumes, given that the volumes feature was released much later than DBFS. I'll see if I can set up an environment with catalog enabled and give it a try.
I'm trying to push a new branch to this repo with my temporary fix for the issue, but it seems that I don't have the necessary permissions. :-)
Can you fork the repo and send a PR from your fork?
Hello! I stumbled upon the same need discussed in this thread and started working on it. I opened a pull request (https://github.com/Azure/azure-databricks-client/pull/266) if you have time to check it out 🙂