
Endpoint URL error

nttg8100 opened this issue 2 years ago • 5 comments

Bug report

Nextflow file main.nf

#!/usr/bin/env nextflow

params.values = Channel.from(1)

process echoValue {
    publishDir "${params.outdir}/echoValue/", mode: 'copy'

    input:
    val value

    output:
    path "*_echoValue.txt"

    script:
    """
    echo "Value: $value" > ${value}_echoValue.txt
    """
}

workflow {
    echoValue(params.values)
}

nextflow.config

aws {
    accessKey = "***"
    secretKey = "***"
    client {
        endpoint = 'https://s3-hcm-r1.s3cloud.vn'
        s3PathStyleAccess = true
    }
}

command

nextflow run main.nf --outdir s3://project_1

Versions: Nextflow 23.10.1, nf-amazon 2.1.4

Expected behavior and actual behavior

I expected the file to be uploaded to the S3 bucket, just as with the MinIO S3 server that I tested successfully (a MinIO image with endpoint="http://localhost:9000").

I tested with the AWS CLI, which showed that I have permission to put objects. The error showed that the endpoint failed, with no additional information.

com.amazonaws.SdkClientException: Unable to execute HTTP request: s3.hcm.amazonaws.com

I think it failed because the nf-amazon 2.1.4 plugin, or the AWS SDK it depends on, fails to parse the endpoint.
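
A side observation (my own illustration, not SDK code): the host in the error message matches AWS's default regional endpoint template, s3.<region>.amazonaws.com, with a fragment of the custom hostname in the region slot. That hints the endpoint is being rebuilt from a default pattern rather than merely mis-parsed. A minimal sketch of the comparison:

```python
# Illustration only (not SDK code): compare the configured endpoint host
# with the host reported in the SdkClientException.
from urllib.parse import urlparse

custom_endpoint = "https://s3-hcm-r1.s3cloud.vn"
failing_host = "s3.hcm.amazonaws.com"  # host from the error message

def default_aws_s3_host(region: str) -> str:
    # AWS's standard regional endpoint template for S3
    return f"s3.{region}.amazonaws.com"

# The configured host is nothing like the failing one...
assert urlparse(custom_endpoint).hostname == "s3-hcm-r1.s3cloud.vn"
# ...but the failing host fits the default AWS template exactly,
# with "hcm" in the region slot:
assert failing_host == default_aws_s3_host("hcm")
```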

Steps to reproduce the problem

I cannot share the real endpoint URL and its credentials; they follow the patterns shown above.

Program output

N E X T F L O W  ~  version 23.10.1
Launching `tmp/main.nf` [jovial_boltzmann] DSL2 - revision: 7325aaaf63
executor >  local (1)
[98/0715e2] process > echoValue (1) [  0%] 0 of 1
ERROR ~ Error executing process > 'echoValue (1)'

Caused by:
  s3.hcm.amazonaws.com


 -- Check '.nextflow.log' file for details

Environment

  • Nextflow version: 23.10.1
  • Java version: 11.0.13 2021-10-19
  • Operating system: macOS
  • Bash version: zsh 5.9 (x86_64-apple-darwin23.0)

Additional context

Is there any documentation for recompiling the nf-amazon plugin? I tried to compile Nextflow after modifying the nf-amazon plugin, but it created a build folder structure quite different from that of the nf-amazon@2.1.4 downloaded by Nextflow. I cloned the Nextflow repo at tag v23.10.1, then ran

make compile

nttg8100 avatar Mar 12 '24 10:03 nttg8100

Can be reproduced with a custom S3 endpoint hosted on Scaleway.

  • Nextflow version: 24.04.2.5914
  • Java version: 17.0.11 2024-04-16
  • Operating system: Linux Ubuntu 22.04.4 LTS
aws {
    accessKey = 'ACCESSKEY'
    secretKey = 'SECRETKEY'
    region = 'fr-par'
    client {
        endpoint = 'https://s3.fr-par.scw.cloud'
        protocol = 'https'
        s3PathStyleAccess = true
    }
}

rjb32 avatar Jun 19 '24 17:06 rjb32

Support for this is very important for using private S3 implementations in Europe, for example when analyzing data from hospitals that are forbidden from using AWS services for GDPR and regulatory reasons.

rjb32 avatar Jun 20 '24 11:06 rjb32

The issue is that, at some point, something adds back the "amazonaws.com" suffix instead of using the custom S3 endpoint URI that was provided.

./launch.sh -trace nextflow run ../../hello.nf -work-dir s3://turing/test

The stack trace is as follows:

com.amazonaws.SdkClientException: Unable to execute HTTP request: s3.fr-par.amazonaws.com
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1165)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5558)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5505)
	at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:423)
	at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6639)
	at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1892)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1852)
	at nextflow.cloud.aws.nio.S3Client.putObject(S3Client.java:209)
	at nextflow.cloud.aws.nio.S3FileSystemProvider.createDirectory(S3FileSystemProvider.java:492)
	at java.base/java.nio.file.Files.createDirectory(Files.java:700)
	at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:807)
	at java.base/java.nio.file.Files.createDirectories(Files.java:753)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.extension.FilesEx.mkdirs(FilesEx.groovy:493)
	at nextflow.Session.init(Session.groovy:406)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:129)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:372)
	at nextflow.cli.Launcher.run(Launcher.groovy:503)
	at nextflow.cli.Launcher.main(Launcher.groovy:657)
Caused by: java.net.UnknownHostException: s3.fr-par.amazonaws.com
	at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:801)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1385)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1306)
	at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
	at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
	at com.amazonaws.http.conn.$Proxy27.connect(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
	... 25 common frames omitted
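
The mismatch visible in this trace can be sketched in a few lines (an illustration using the hostnames from the config and the exception above, not SDK internals): with s3PathStyleAccess enabled, a request for s3://turing/test should target the configured endpoint host, yet the DNS lookup goes to the default AWS regional host.

```python
# Illustration: the URL a correctly configured path-style client should
# build for s3://turing/test, versus the host seen in the stack trace.
from urllib.parse import urlparse

endpoint = "https://s3.fr-par.scw.cloud"   # from the nextflow config
observed_host = "s3.fr-par.amazonaws.com"  # from the UnknownHostException

def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    # Path-style addressing keeps the endpoint host and puts the
    # bucket in the path, which is what s3PathStyleAccess asks for.
    u = urlparse(endpoint)
    return f"{u.scheme}://{u.netloc}/{bucket}/{key}"

expected = path_style_url(endpoint, "turing", "test")
assert expected == "https://s3.fr-par.scw.cloud/turing/test"
# The request instead resolves a different host entirely:
assert urlparse(expected).hostname != observed_host
```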

We can look at what is happening in nextflow.cloud.aws.nio.S3Client.putObject(S3Client.java:209), which sits at the boundary between the Nextflow package and the AWS SDK.

	public PutObjectResult putObject(String bucket, String keyName, InputStream inputStream, ObjectMetadata metadata, List<Tag> tags, String contentType) {
		PutObjectRequest req = new PutObjectRequest(bucket, keyName, inputStream, metadata);
		if( cannedAcl != null ) {
			req.withCannedAcl(cannedAcl);
		}
		if( tags != null && tags.size()>0 ) {
			req.setTagging(new ObjectTagging(tags));
		}
		if( kmsKeyId != null ) {
			req.withSSEAwsKeyManagementParams( new SSEAwsKeyManagementParams(kmsKeyId) );
		}
		if( storageEncryption!=null ) {
			metadata.setSSEAlgorithm(storageEncryption.toString());
		}
		if( contentType!=null ) {
			metadata.setContentType(contentType);
		}
		if( log.isTraceEnabled() ) {
			log.trace("S3 PutObject request {}", req);
		}
		return client.putObject(req);
	}

The exception is raised at the last line, by the call to the AWS SDK's client.putObject(req). I did a small experiment to determine whether the S3 client configuration is already wrong at this point, by trying a few other calls to the S3 SDK:

	public PutObjectResult putObject(String bucket, String keyName, InputStream inputStream, ObjectMetadata metadata, List<Tag> tags, String contentType) {
		PutObjectRequest req = new PutObjectRequest(bucket, keyName, inputStream, metadata);
		if( cannedAcl != null ) {
			req.withCannedAcl(cannedAcl);
		}
		if( tags != null && tags.size()>0 ) {
			req.setTagging(new ObjectTagging(tags));
		}
		if( kmsKeyId != null ) {
			req.withSSEAwsKeyManagementParams( new SSEAwsKeyManagementParams(kmsKeyId) );
		}
		if( storageEncryption!=null ) {
			metadata.setSSEAlgorithm(storageEncryption.toString());
		}
		if( contentType!=null ) {
			metadata.setContentType(contentType);
		}
		if( log.isTraceEnabled() ) {
			log.trace("S3 PutObject request {}", req);
		}

                for (Bucket b : client.listBuckets()) {
                    System.out.println("bucket "+b.getName());
                }

		return client.putObject(req);
	}

We can see that the accessible buckets are in fact correctly listed on standard output:

bucket lucl
bucket martina
bucket maxime
bucket turing

So the AWS S3 client can in fact reach the custom S3 endpoint and gets correct answers, at least enough to list buckets. How strange! Let's now try to list objects inside a bucket, at the same point in the code.

for (S3ObjectSummary obj : client.listObjects("turing", "db").getObjectSummaries()) {
    System.out.println("object "+obj.getKey());
}

This time it fails, and we get an exception raised inside the AWS SDK from the listObjects method.

com.amazonaws.SdkClientException: Unable to execute HTTP request: s3.fr-par.amazonaws.com
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1165)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5558)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5505)
	at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:950)
	at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:915)
	at nextflow.cloud.aws.nio.S3Client.putObject(S3Client.java:210)
	at nextflow.cloud.aws.nio.S3FileSystemProvider.createDirectory(S3FileSystemProvider.java:492)
	at java.base/java.nio.file.Files.createDirectory(Files.java:700)
	at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:807)
	at java.base/java.nio.file.Files.createDirectories(Files.java:753)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.extension.FilesEx.mkdirs(FilesEx.groovy:493)
	at nextflow.Session.init(Session.groovy:406)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:129)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:372)
	at nextflow.cli.Launcher.run(Launcher.groovy:503)
	at nextflow.cli.Launcher.main(Launcher.groovy:657)
Caused by: java.net.UnknownHostException: s3.fr-par.amazonaws.com
	at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:801)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1385)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1306)
	at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
	at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
	at com.amazonaws.http.conn.$Proxy27.connect(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
	... 23 common frames omitted

Conclusion

My guess at this point is that this is a bug inside the AWS SDK: the S3 client appears to be correctly configured with the right endpoint URI, and it works for listing buckets and for any request that does not read or write objects inside buckets. Some piece of the AWS SDK must be overwriting parts of the endpoint URI for some reason.
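
The observed split can be captured in a toy model (pure speculation about the SDK's behavior, not its actual code): account-level calls appear to honor the configured endpoint, while bucket-addressing calls seem to fall back to the default regional template.

```python
# Toy model of the hypothesis; CONFIGURED_ENDPOINT_HOST and REGION mirror
# the nextflow config above. This is NOT how the SDK is implemented; it
# merely reproduces the observed behavior split.
CONFIGURED_ENDPOINT_HOST = "s3.fr-par.scw.cloud"
REGION = "fr-par"

def resolve_host(operation: str) -> str:
    if operation == "listBuckets":
        # account-level call: custom endpoint honored (works in practice)
        return CONFIGURED_ENDPOINT_HOST
    # bucket-addressing call: hypothesized fallback to the AWS template
    return f"s3.{REGION}.amazonaws.com"

assert resolve_host("listBuckets") == "s3.fr-par.scw.cloud"      # succeeds
assert resolve_host("listObjects") == "s3.fr-par.amazonaws.com"  # DNS fails
assert resolve_host("putObject") == "s3.fr-par.amazonaws.com"    # DNS fails
```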

Why not AWS SDK v2?

Looking at the Gradle dependencies, a question: is there a particular reason why Nextflow still uses AWS SDK 1.12.70, even though it is clearly stated to be deprecated and AWS SDK v2 has been available for years?

rjb32 avatar Jun 20 '24 13:06 rjb32

@rjb32 thanks for the triage. SDK v2 is on our roadmap but we just haven't gotten to it yet. It is not a trivial change.

bentsherman avatar Jun 26 '24 13:06 bentsherman

See #4741

bentsherman avatar Jun 26 '24 13:06 bentsherman

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '25 06:04 stale[bot]

@rjb32 Nextflow is now using AWS Java SDK v2 as of 25.06.0-edge. Give it a try and let us know if this error still persists

tagging @jorgee for visibility

bentsherman avatar Jul 15 '25 00:07 bentsherman

It is solved with my LocalStack S3 bucket. Thank you for your support @bentsherman. I will close it now.

nttg8100 avatar Jul 15 '25 10:07 nttg8100