pathway icon indicating copy to clipboard operation
pathway copied to clipboard

[Bug]: support for S3-compatible services other than AWS

Open 0xhanh opened this issue 1 year ago • 5 comments

Steps to reproduce

if let s3::Region::Custom { endpoint, .. } = &s3_settings.region {

if let s3::Region::Custom { endpoint, .. } = &s3_settings.region {
    if endpoint.starts_with("https://") || endpoint.starts_with("http://") {
        storage_options.insert("AWS_ENDPOINT_URL".to_string(), endpoint.to_string());
    } else {
        storage_options.insert(
            "AWS_ENDPOINT_URL".to_string(),
            format!("https://{endpoint}"),
        );
    }
    // TODO: add support for S3-compatible services other than AWS
    storage_options.insert("AWS_ALLOW_HTTP".to_string(), "True".to_string());
    storage_options.insert("AWS_STORAGE_ALLOW_HTTP".to_string(), "True".to_string());
} else {
    storage_options.insert("AWS_REGION".to_string(), s3_settings.region.to_string());
}

Relevant log output

No log

What did you expect to happen?

def read_results(): table = pw.io.deltalake.read( base_path + "timezone_unified", schema=InputStreamSchema, autocommit_duration_ms=100, s3_connection_settings=s3_connection_settings, ) pw.io.csv.write(table, "./results.csv") pw.run(monitoring_level=pw.MonitoringLevel.NONE)

Version

0.15.0

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

0xhanh avatar Oct 08 '24 08:10 0xhanh

Hi @0xhanh thanks for the report. While there is indeed a big internal TODO around variable name cleanup for internal variable names starting with AWS_..., have you checked that what you need actually does not work with non-AWS S3 setup, despite the misleading names? If so, could you specify the setup you are using?

Please see also:

  • https://pathway.com/developers/api-docs/pathway-io/deltalake -> s3_connection_settings (AwsS3Settings | MinIOSettings | WasabiS3Settings | DigitalOceanS3Settings | None) – Configuration for S3 credentials when using S3 storage
  • https://pathway.com/developers/api-docs/pathway-io/s3/ for a list of possible settings.

dxtrous avatar Oct 08 '24 09:10 dxtrous

I tested with a non-AWS setup, I used a local minio, I ran the example in examples/projects/kafka-alternatives/minio-ETL in the local-minio environment

--base.py

from dotenv import load_dotenv

import pathway as pw

load_dotenv()

bucket = "lab1"
base_path = f"s3://{bucket}/"
str_repr = "%Y-%m-%d %H:%M:%S.%f %z"

s3_connection_settings = pw.io.minio.MinIOSettings(
    bucket_name=bucket,
    access_key=os.environ["MINIO_S3_ACCESS_KEY"],
    secret_access_key=os.environ["MINIO_S3_SECRET_ACCESS_KEY"],
    endpoint="http://localhost:9090",
    region = "us-east-1",
)

custom_settings = pw.io.s3.AwsS3Settings(
    endpoint="http://localhost:9090",
    bucket_name=,
    access_key=os.environ["MINIO_S3_ACCESS_KEY"],
    secret_access_key=os.environ["MINIO_S3_SECRET_ACCESS_KEY"],
    with_path_style=True,
    region="us-east-1",
)
$ python read-results.py 
[2024-10-09T02:22:27]:INFO:Preparing Pathway computation
[2024-10-09T02:22:27]:INFO:Telemetry enabled
[2024-10-09T02:22:27]:WARNING:{"AWS_BUCKET_NAME": "lab1", "AWS_ACCESS_KEY_ID": "minio", "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "False", "AWS_S3_ALLOW_UNSAFE_RENAME": "True", "AWS_SECRET_ACCESS_KEY": "minio123", "AWS_ENDPOINT_URL": "http://localhost:9090"}
[2024-10-09T02:22:27]:INFO:Using Static credential provider
thread 'pathway:work-0' panicked at src/engine/report_error.rs:83:50:
OSError: Failed to connect to DeltaLake: Failed to read delta log object: Generic S3 error: Error after 0 retries in 3.668µs, max_retries:10, retry_timeout:180s, source:builder error for url (http://localhost:9090/lab1/timezone_unified/_delta_log/_last_checkpoint)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

0xhanh avatar Oct 09 '24 02:10 0xhanh

run within fixed venv

 python read-results.py 
[2024-10-09T09:34:50]:INFO:Preparing Pathway computation
[2024-10-09T09:34:50]:WARNING:{"AWS_ALLOW_HTTP": "True", "AWS_BUCKET_NAME": "lab1", "AWS_STORAGE_ALLOW_HTTP": "True", "AWS_ACCESS_KEY_ID": "minio", "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "False", "AWS_ENDPOINT_URL": "http://localhost:9090", "AWS_SECRET_ACCESS_KEY": "minio123", "AWS_S3_ALLOW_UNSAFE_RENAME": "True"}
[2024-10-09T09:34:50]:INFO:Using Static credential provider
[2024-10-09T09:34:50]:INFO:DeltaTableReader-0: 0 entries (1 minibatch(es)) have been sent to the engine
[2024-10-09T09:34:50]:INFO:FileWriter-0: Done writing 0 entries, time 1728441290884. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-10-09T09:34:55]:INFO:DeltaTableReader-0: 71 entries (51 minibatch(es)) have been sent to the engine
[2024-10-09T09:34:55]:INFO:FileWriter-0: Done writing 71 entries, time 1728441295984. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-10-09T09:35:01]:INFO:DeltaTableReader-0: 0 entries (51 minibatch(es)) have been sent to the engine
[2024-10-09T09:35:01]:INFO:FileWriter-0: Done writing 0 entries, time 1728441301084. Current batch writes took: 0 ms. All writes so far took: 0 ms.

0xhanh avatar Oct 09 '24 02:10 0xhanh

@0xhanh I am not sure how to read your second comment - did you manage to fix the issue? If so, how?

dxtrous avatar Oct 09 '24 10:10 dxtrous

I tried fixing it, it seems ok, comment at the top see

0xhanh avatar Oct 10 '24 02:10 0xhanh

Hello @0xhanh Please note that the PR supporting this change has been merged for a while. Please let us know if there are any other improvements required.

zxqfd555 avatar Feb 17 '25 12:02 zxqfd555