
Minio support for ModelDB

Open Atharex opened this issue 5 years ago • 24 comments

Can the S3 storage adapter support a Minio backend?

Atharex avatar Jun 25 '20 07:06 Atharex

Hi, @Atharex!

Currently the artifacts go directly to S3 via signed URLs. To my knowledge, Minio supports such calls, so it should work out of the box, but we have never tested against it. Are you getting some specific error? Maybe we can help figure out what's going on.

conradoverta avatar Jun 26 '20 00:06 conradoverta

Hi, @conradoverta!

Probably there are not many changes needed for it. Could be I'm missing something in the configuration or there is no capability yet to specify a custom endpoint in the S3 configuration (like a local Minio installation).

I've got the S3 artifact store type in my config.yaml configured like this:

artifactStoreConfig:
  artifactStoreType: S3
  S3:
    cloudAccessKey: {{ minio_access_key }}
    cloudSecretKey: {{ minio_secret_key }}
    cloudBucketName: {{ modeldb_minio_bucket }}

And I get the following error: error: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 83474DE39F314335; S3 Extended Request ID: wq4SSxhJMqBpyR+TgtoK3TCRLXylajG+x7iuCuOoOS8RP6XJIU5UI1WzViU9u8WR06qb054PWn8=)

So it seems that ModelDB tries to use those credentials to save the data into AWS, instead of my local Minio installation. Is there a way to configure the endpoint for the S3 calls?

Atharex avatar Jun 26 '20 06:06 Atharex

Oh, that is a fair point. I don't think we have any configuration for the custom endpoint. It should be easy to add a configuration and pass it around, but we don't have a Minio setup currently to test.

Would you be willing to contribute a PR with that new configuration? We'd be happy to point you to useful information for this. Otherwise, I need to discuss with the team and put this in one of our coming sprints.

conradoverta avatar Jun 26 '20 16:06 conradoverta

OK, I guess I could give it a try :)

Send me the information you have and I'll see what I can do.

Atharex avatar Jun 26 '20 17:06 Atharex

@Atharex: I believe modifying https://github.com/VertaAI/modeldb/blob/master/backend/src/main/java/ai/verta/modeldb/artifactStore/storageservice/S3Service.java#L34-L51 should get you unblocked. If it doesn't, it would help me if you could share a few more lines from the stack trace.

ravishetye avatar Jun 26 '20 18:06 ravishetye

@ravishetye @conradoverta

I started from where you pointed me and got a working example up and running for my Minio installation. I was able to log datasets into Minio successfully with it. I have also opened a pull request (#889) with my proposed changes.

The changes also support setting the config:

      artifactStoreType: S3
      S3:
        cloudAccessKey: {{ minio_access }}
        cloudSecretKey: {{ minio_secret }}
        cloudBucketName: {{ modeldb_minio_bucket }}
        minioEndpoint: {{ minio_endpoint }}
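
A minimal sketch of how an optional endpoint key like the one above could be resolved, assuming the S3 section has already been loaded into a dict. The function name, the default endpoint constant and the validation rule are hypothetical and are not taken from the ModelDB code or from PR #889; only the minioEndpoint key comes from the config above.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical default for illustration; real AWS endpoints depend on the region.
DEFAULT_AWS_ENDPOINT = "https://s3.amazonaws.com"


def resolve_s3_endpoint(s3_config: dict) -> str:
    """Return the endpoint an S3 client should talk to.

    Without a minioEndpoint key the client falls back to AWS, which is
    consistent with the InvalidAccessKeyId error reported earlier.
    """
    endpoint: Optional[str] = s3_config.get("minioEndpoint")
    if not endpoint:
        return DEFAULT_AWS_ENDPOINT

    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"minioEndpoint must be an http(s) URL, got {endpoint!r}")

    # Drop a trailing slash so bucket paths can be appended consistently.
    return endpoint.rstrip("/")
```

With the key absent the sketch returns the AWS default, so existing S3 setups would keep working unchanged.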

Atharex avatar Jun 27 '20 21:06 Atharex

Awesome! That was fast =) We'll take a look tomorrow.

conradoverta avatar Jun 28 '20 18:06 conradoverta

Thanks @Atharex for the request and the fix. Could you close the ticket if things are functional for you?

ravishetye avatar Jul 02 '20 01:07 ravishetye

My pleasure @ravishetye :)

I would rather keep this ticket open, as the support is not yet 100% complete (changes are still needed in the DB artifact storage path). You can show me where the changes should be made, but I cannot guarantee I will have time for another pull request in the near future :/

Atharex avatar Jul 02 '20 06:07 Atharex

@ravishetye I got some time to take another look at this. Could someone on your side point me to the code that creates the frontend links?

Atharex avatar Sep 24 '20 10:09 Atharex

@ravishetye I see you guys are doing loads of refactoring on the codebase. I presume you are planning for a new release, where Minio support will already be completed by someone from your side?

Atharex avatar Oct 24 '20 17:10 Atharex

Hi, @Atharex! Could you clarify what you mean by links? I might be missing something here.

conradoverta avatar Oct 26 '20 20:10 conradoverta

Might have been misled... I thought the DB stores direct links to the artifacts, which the frontend uses for downloads. I've tried a build directly from the master branch now to try and debug my problem.

I installed ModelDB with this config:

    artifactStoreConfig:
      artifactStoreType: S3
      S3:
        cloudAccessKey: [my-access-key]
        cloudSecretKey: [my-secret-key]
        cloudBucketName: modeldb-bucket
        minioEndpoint: http://minio-storage.minio.svc.cluster.local:9000

Then I followed this example: https://github.com/VertaAI/modeldb/blob/master/client/workflows/demos/census-end-to-end-local-data-example.ipynb

This is my postgres DB output when I tried your latest modeldb version (initially thought the column artifacts stores the full S3 signed URLs of the artifacts).

select * from artifact;

 10 | 4 | ExperimentRunEntity | artifacts | json | model_api.json | | 0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json | f | | 75f807db-c6fc-462d-9438-e39f0b0d7ee0 | | s3://modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json | 7763c9d7-be7c-4b36-be09-3c1a40e68537 | t
  4 | 4 | ExperimentRunEntity | artifacts | zip | custom_modules | | 5f95561f29a9f81f637fa50237d3729542b45c76ac47018b56dbfb16b277b37c/custom_modules.zip | f | | c2d12f87-2529-45be-bb8b-84828b4f35d1 | | s3://modeldb-bucket/5f95561f29a9f81f637fa50237d3729542b45c76ac47018b56dbfb16b277b37c/custom_modules.zip | 61a7315e-d608-4e42-aab2-a207954fdb6f | t
...
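
The rows above hold an s3://bucket/key path rather than a signed URL. A small sketch of splitting such a path into bucket and key, using only the Python standard library; the function name is hypothetical and this is not how ModelDB itself reads the column.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The bucket is the authority part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")
```

Since the stored value carries no hostname, the host seen by the browser has to come from the endpoint that is used when the link is signed.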

The URL request (seen in the network analyzer of the browser) when I click on the download artifact button in the ModelDB web UI seems correct: GET http://minio-storage.minio.svc.cluster.local:9000/modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20201102T103803Z&X-Amz-SignedHeaders=host&X-Amz-Expires=299&X-Amz-Credential=[my-credential]/20201102/us-east-1/s3/aws4_request&X-Amz-Signature=8e1ce4a94757d3d9d4a40be37629cca4a791c882125e78a75483bc0ce3224b33
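
To check which host a signed link points at and how long it stays valid, the query string of a request like the one above can be decoded with the standard library. This is only a sketch: the function name and the returned dict layout are made up for illustration, and it assumes the X-Amz-Date, X-Amz-Expires and X-Amz-SignedHeaders parameters are present, as they are in the URL above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def describe_presigned_url(url: str) -> dict:
    """Extract host, path, signed headers and validity window from a presigned URL."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)

    # X-Amz-Date is a UTC timestamp in the form YYYYMMDDTHHMMSSZ.
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    expires_in = int(query["X-Amz-Expires"][0])

    return {
        "host": parsed.netloc,
        "path": parsed.path,
        "signed_headers": query["X-Amz-SignedHeaders"][0].split(";"),
        "signed_at": signed_at,
        "expires_at": signed_at + timedelta(seconds=expires_in),
    }
```

For the request above this would report the cluster-internal host minio-storage.minio.svc.cluster.local:9000 and a validity window of 299 seconds.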

When I look up my local Minio instance, I see the artifacts correctly stored there and I can download them directly: [my-minio-url]/modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json

Even "docker exec-ing" into the backend container and fetching the artifact links from there works. But somehow when I try to download that same file from the web UI I get an error message:

b1edc8f80de6c050e00debb2e3b401f15bec77650351f433923f61a85490a34c/custom_modules.zip
Error in downloading file: Something went wrong!

The webapp log seems fine...

/api/v1/modeldb/experiment-run/getUrlForArtifact
Requesting /api/v1/modeldb/experiment-run/getUrlForArtifact
Returning 200 OK; 433b sent

Also the modeldb-backend logs don't look suspicious

{"thread":"grpc-default-executor-6","level":"INFO","loggerName":"ai.verta.modeldb.ModelDBAuthInterceptor","message":"methodName: ai.verta.modeldb.ExperimentRunService/getUrlForArtifact","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1604316775,"nanoOfSecond":195000000},"threadId":455,"threadPriority":5,"hostName":"modeldb-backend-0","kubernetes.podIP":""}
{"thread":"grpc-default-executor-6","level":"DEBUG","loggerName":"ai.verta.modeldb.experimentRun.ExperimentRunDAORdbImpl","message":"Got ProjectId by ExperimentRunId ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1604316775,"nanoOfSecond":215000000},"threadId":455,"threadPriority":5,"hostName":"modeldb-backend-0","kubernetes.podIP":""}

But now I'm out of ideas on how to investigate further... Where can I get more debug information? Why would only the frontend have problems downloading the artifact, when all other approaches work?

Atharex avatar Nov 02 '20 11:11 Atharex

Is http://minio-storage.minio.svc.cluster.local:9000 the same as [my-minio-url]?

My current suspicion is that you have different DNS resolution for things running in the cluster than when you access from your other machine. What happens is that the webapp tries to fetch the URL http://minio-storage.minio.svc.cluster.local:9000/... since that's the URL that ModelDB is aware of.

Could you verify if you can resolve that hostname? You can usually do dig minio-storage.minio.svc.cluster.local or ping minio-storage.minio.svc.cluster.local, depending on your setup.
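
If dig or ping are not available, the same check can be sketched in Python with the standard socket module. The function name is hypothetical. The result only reflects the machine the script runs on, so it would need to be run on the machine where the browser runs, not inside the cluster.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if this machine can resolve the hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Name resolution failed from this machine's point of view.
        return False
    return True


if __name__ == "__main__":
    print(can_resolve("minio-storage.minio.svc.cluster.local"))
```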

conradoverta avatar Nov 02 '20 16:11 conradoverta

No, [my-minio-url] is not http://minio-storage.minio.svc.cluster.local:9000. It is the URL of the web UI of my Minio instance, which is reachable from outside my Kubernetes cluster.

Though that external URL should not be used by ModelDB at all, since all of its traffic happens inside the Kubernetes cluster, where it has access to the http://minio-storage.minio.svc.cluster.local:9000 service (I presume this config at installation time is used by both backend and frontend services). Also, as I mentioned, if I go into the model-backend container and download the generated URL of the artifact, it works fine, and DNS resolution inside that container with nslookup minio-storage.minio.svc.cluster.local also works correctly.

Atharex avatar Nov 02 '20 16:11 Atharex

The problem here seems to be that ModelDB and your browser are seeing different hostnames for the same system. So when ModelDB asks minio for the link to the artifact, the link comes back with ModelDB's hostname perspective. When the backend sends to the webapp, the webapp tries to make the request and it fails because it's a different name.

Would you mind configuring ModelDB to use the same hostname you use internally?

conradoverta avatar Nov 02 '20 20:11 conradoverta

Aha, I see your point!

I thought the GET request I see in the traffic analyzer happens on the web app side (the web app transfers the file from the artifact storage and then lets me download that cached copy), but it actually gives me a direct link to the storage using its internally resolved DNS address http://minio-storage.minio.svc.cluster.local:9000

where on the user side I want the externally defined DNS address: https://minio.my-own-domain.net

Got confused because deleting an artifact did not throw an error (later realized it's because the webapp invokes its REST API to perform the step, e.g. /api/v1/modeldb/experiment-run/deleteArtifact {"id":"8c248b70-f001-452e-8ed0-9d3616eb4e81","key":"model_api.json"}).

With this it deletes the entry from ModelDB, but leaves the artifact in MinIO intact (guess that is so by design also with other artifact stores? Or should the delete also happen inside the store?)

I guess some URL rewriting would need to take place to correctly resolve address handling on the web UI for this particular use-case (an external storage service, which has both an internal (cluster) and external (ingress) DNS name). Maybe an optional "AlternativeStoreURL" parameter supplied in the ModelDB configuration file to rewrite the generated links on the webapp side?
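
A rough sketch of the rewrite idea, assuming an optional alternative base URL is passed in next to the generated link. The function and parameter names are hypothetical and nothing like this exists in ModelDB as far as this thread shows. One caveat: the signed URL above lists X-Amz-SignedHeaders=host, which means the host is part of the signature, so a link rewritten after signing may be rejected unless it was signed for the external host in the first place. The sketch only illustrates the string rewrite.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def rewrite_store_url(url: str, alternative_store_url: Optional[str]) -> str:
    """Swap scheme and host of an artifact URL for an alternative base URL.

    Path, query string and fragment of the original URL are kept as they are.
    """
    if not alternative_store_url:
        # Option not set: hand the generated link back unchanged.
        return url

    original = urlsplit(url)
    alt = urlsplit(alternative_store_url)
    if not alt.scheme or not alt.netloc:
        raise ValueError(f"alternative store URL needs scheme and host: {alternative_store_url!r}")

    # Keep an optional path prefix of the alternative base (e.g. an ingress sub-path).
    path = alt.path.rstrip("/") + original.path
    return urlunsplit((alt.scheme, alt.netloc, path, original.query, original.fragment))
```

Usage, with the two hostnames mentioned in this thread:

```python
print(rewrite_store_url(
    "http://minio-storage.minio.svc.cluster.local:9000/modeldb-bucket/model_api.json?X-Amz-Expires=299",
    "https://minio.my-own-domain.net",
))
```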

Just a thought... Not sure how other projects handle similar situations. Configuring ModelDB to the external name might not be easy, as there is a port in the internal service name and I would not be able to CNAME an external entry onto an internal address with a port, if I reconfigured my internal kubernetes DNS resolver.

Atharex avatar Nov 03 '20 19:11 Atharex

We use the direct link because it's usually much faster (since their services are built for big downloads and uploads). I think adding an alternative base makes sense to me to simplify the process. Usually we handle this by adding the CNAME entries in the right place, but it might be a high barrier to use.

If we pointed you to the right places for the change, would you be willing to contribute a PR with support for this feature? It would be greatly appreciated!

conradoverta avatar Nov 03 '20 21:11 conradoverta

Sure, I'd go for it! This feature would help me out nicely.

Atharex avatar Nov 04 '20 07:11 Atharex

Great!

@ad-47 @ravishetye could you share some pointers on how we could add a config field AlternativeStoreURL that would replace the base url for artifacts? The context is that the user browser and ModelDB need to see different hostnames for the minio endpoint.

conradoverta avatar Nov 04 '20 16:11 conradoverta

@Atharex Would setting the minio endpoint to https://minio.my-own-domain.net work and not require more code change?

ravishetye avatar Nov 04 '20 19:11 ravishetye

Sadly no. There is a port in my service name and I cannot get DNS to resolve https://minio.my-own-domain.net to the internal address http://minio-storage.minio.svc.cluster.local:9000. The ingress controller also does not enable me to rewrite response URLs (only request URLs), so having this as an optional configuration step would be easiest to solve the problem.

Atharex avatar Nov 06 '20 09:11 Atharex

The challenge that Ravi correctly pointed out when I discussed this with him is that MDB would always use that alternative URL, even if the client was running inside the cluster. Would that be an issue for you?

conradoverta avatar Nov 08 '20 18:11 conradoverta

Would be cool if the ModelDB team created an example for Minio, so future users can just refer to it.

samru-rai avatar Nov 26 '20 13:11 samru-rai