[Bug]: support for S3-compatible services other than AWS
Steps to reproduce
```rust
if let s3::Region::Custom { endpoint, .. } = &s3_settings.region {
    if endpoint.starts_with("https://") || endpoint.starts_with("http://") {
        storage_options.insert("AWS_ENDPOINT_URL".to_string(), endpoint.to_string());
    } else {
        storage_options.insert(
            "AWS_ENDPOINT_URL".to_string(),
            format!("https://{endpoint}"),
        );
    }
    // TODO: add support for S3-compatible services other than AWS
    storage_options.insert("AWS_ALLOW_HTTP".to_string(), "True".to_string());
    storage_options.insert("AWS_STORAGE_ALLOW_HTTP".to_string(), "True".to_string());
} else {
    storage_options.insert("AWS_REGION".to_string(), s3_settings.region.to_string());
}
```
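The endpoint handling above can be sketched in Python to make the logic easier to follow. Note that `build_storage_options` is a hypothetical helper for illustration only, not part of Pathway's API; it mirrors the Rust branch that normalizes a scheme-less endpoint to `https://` and then adds the proposed allow-http flags:

```python
def build_storage_options(endpoint: str) -> dict:
    """Illustrative mirror of the Rust logic: normalize the endpoint URL,
    then permit plain-HTTP access for S3-compatible services."""
    storage_options = {}
    if endpoint.startswith(("https://", "http://")):
        # A scheme was given explicitly: keep it as-is.
        storage_options["AWS_ENDPOINT_URL"] = endpoint
    else:
        # No scheme: default to https, as the Rust code does.
        storage_options["AWS_ENDPOINT_URL"] = f"https://{endpoint}"
    # Proposed fix: allow http endpoints such as a local MinIO.
    storage_options["AWS_ALLOW_HTTP"] = "True"
    storage_options["AWS_STORAGE_ALLOW_HTTP"] = "True"
    return storage_options
```

Without the two allow-http entries, an `http://` endpoint is rejected by the underlying object store, which is what the reproduction below runs into.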
Relevant log output
No log
What did you expect to happen?
```python
def read_results():
    table = pw.io.deltalake.read(
        base_path + "timezone_unified",
        schema=InputStreamSchema,
        autocommit_duration_ms=100,
        s3_connection_settings=s3_connection_settings,
    )
    pw.io.csv.write(table, "./results.csv")
    pw.run(monitoring_level=pw.MonitoringLevel.NONE)
```
Version
0.15.0
Docker Versions (if used)
No response
OS
Linux
On which CPU architecture did you run Pathway?
x86-64
Hi @0xhanh, thanks for the report. While there is indeed a big internal TODO around cleaning up the internal variable names starting with AWS_..., have you checked whether what you need actually does not work with a non-AWS S3 setup, despite the misleading names? If so, could you specify the setup you are using?
Please see also:
- https://pathway.com/developers/api-docs/pathway-io/deltalake -> s3_connection_settings (AwsS3Settings | MinIOSettings | WasabiS3Settings | DigitalOceanS3Settings | None) – Configuration for S3 credentials when using S3 storage
- https://pathway.com/developers/api-docs/pathway-io/s3/ for a list of possible settings.
I tested with a non-AWS setup: a local MinIO. I ran the example in examples/projects/kafka-alternatives/minio-ETL against the local MinIO environment.
base.py:

```python
import os

from dotenv import load_dotenv
import pathway as pw

load_dotenv()

bucket = "lab1"
base_path = f"s3://{bucket}/"
str_repr = "%Y-%m-%d %H:%M:%S.%f %z"

s3_connection_settings = pw.io.minio.MinIOSettings(
    bucket_name=bucket,
    access_key=os.environ["MINIO_S3_ACCESS_KEY"],
    secret_access_key=os.environ["MINIO_S3_SECRET_ACCESS_KEY"],
    endpoint="http://localhost:9090",
    region="us-east-1",
)

custom_settings = pw.io.s3.AwsS3Settings(
    endpoint="http://localhost:9090",
    bucket_name=bucket,
    access_key=os.environ["MINIO_S3_ACCESS_KEY"],
    secret_access_key=os.environ["MINIO_S3_SECRET_ACCESS_KEY"],
    with_path_style=True,
    region="us-east-1",
)
```
$ python read-results.py
[2024-10-09T02:22:27]:INFO:Preparing Pathway computation
[2024-10-09T02:22:27]:INFO:Telemetry enabled
[2024-10-09T02:22:27]:WARNING:{"AWS_BUCKET_NAME": "lab1", "AWS_ACCESS_KEY_ID": "minio", "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "False", "AWS_S3_ALLOW_UNSAFE_RENAME": "True", "AWS_SECRET_ACCESS_KEY": "minio123", "AWS_ENDPOINT_URL": "http://localhost:9090"}
[2024-10-09T02:22:27]:INFO:Using Static credential provider
thread 'pathway:work-0' panicked at src/engine/report_error.rs:83:50:
OSError: Failed to connect to DeltaLake: Failed to read delta log object: Generic S3 error: Error after 0 retries in 3.668µs, max_retries:10, retry_timeout:180s, source:builder error for url (http://localhost:9090/lab1/timezone_unified/_delta_log/_last_checkpoint)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Run within the fixed venv:

$ python read-results.py
[2024-10-09T09:34:50]:INFO:Preparing Pathway computation
[2024-10-09T09:34:50]:WARNING:{"AWS_ALLOW_HTTP": "True", "AWS_BUCKET_NAME": "lab1", "AWS_STORAGE_ALLOW_HTTP": "True", "AWS_ACCESS_KEY_ID": "minio", "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "False", "AWS_ENDPOINT_URL": "http://localhost:9090", "AWS_SECRET_ACCESS_KEY": "minio123", "AWS_S3_ALLOW_UNSAFE_RENAME": "True"}
[2024-10-09T09:34:50]:INFO:Using Static credential provider
[2024-10-09T09:34:50]:INFO:DeltaTableReader-0: 0 entries (1 minibatch(es)) have been sent to the engine
[2024-10-09T09:34:50]:INFO:FileWriter-0: Done writing 0 entries, time 1728441290884. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-10-09T09:34:55]:INFO:DeltaTableReader-0: 71 entries (51 minibatch(es)) have been sent to the engine
[2024-10-09T09:34:55]:INFO:FileWriter-0: Done writing 71 entries, time 1728441295984. Current batch writes took: 0 ms. All writes so far took: 0 ms.
[2024-10-09T09:35:01]:INFO:DeltaTableReader-0: 0 entries (51 minibatch(es)) have been sent to the engine
[2024-10-09T09:35:01]:INFO:FileWriter-0: Done writing 0 entries, time 1728441301084. Current batch writes took: 0 ms. All writes so far took: 0 ms.
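Comparing the storage-option dumps from the two WARNING log lines above shows that the only difference between the failing run and the working run is the pair of allow-http flags, which is consistent with the builder error on the `http://` endpoint. A quick sketch of the diff (dicts copied verbatim from the logs):

```python
# Storage options from the failing run (02:22), copied from its WARNING line.
failing = {
    "AWS_BUCKET_NAME": "lab1",
    "AWS_ACCESS_KEY_ID": "minio",
    "AWS_VIRTUAL_HOSTED_STYLE_REQUEST": "False",
    "AWS_S3_ALLOW_UNSAFE_RENAME": "True",
    "AWS_SECRET_ACCESS_KEY": "minio123",
    "AWS_ENDPOINT_URL": "http://localhost:9090",
}
# The working run (09:34) has the same options plus two extra keys.
working = {
    **failing,
    "AWS_ALLOW_HTTP": "True",
    "AWS_STORAGE_ALLOW_HTTP": "True",
}
# The only delta is the pair of allow-http flags.
print(sorted(set(working) - set(failing)))
```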
@0xhanh I am not sure how to read your second comment - did you manage to fix the issue? If so, how?
I tried fixing it and it seems OK; see my comment at the top.
Hello @0xhanh Please note that the PR supporting this change has been merged for a while. Please let us know if there are any other improvements required.