PyAirbyte icon indicating copy to clipboard operation
PyAirbyte copied to clipboard

Feat: Add an Executor for Docker sources

Open Tarekk opened this issue 1 year ago • 8 comments

What

Adding a DockerExecutor class to the get_source factory.

Why

I absolutely love this library as it allows me to rapidly pull data from a variety of different sources and load them into a "local DW" in duckDB for quick analysis using SQL. All this without having to deploy or startup any servers!! However, on numerous occasions I needed to pull in data from Java based Airbyte sources such as PostgreSQL and MySQL and couldn't use PyAirbyte to do that. Therefore, it would be great if the library supported the execution of Docker images that adhere to the Airbyte protocol. This will greatly increase the number of compatible sources and allow users to develop additional sources in their language of choice.

How

DockerExecutor is almost identical to PathExecutor but instead of receiving a path to the executable it expects a list of docker commands. These commands are passed down from the factory and include the mounting of the the temp_dir used for materializing the JSON config files.

if docker_executable:
    version = version or "latest"

    temp_dir = tempfile.gettempdir()
    return Source(
        name=name,
        config=config,
        streams=streams,
        executor=DockerExecutor(
            name=name,
            executable=[
                "docker",
                "run",
                "--rm",
                "-i",
                "--volume",
                f"{temp_dir}:{temp_dir}",
                f"airbyte/{name}:{version}",
            ],
        ),
    )

Example Usage

source = ab.get_source(
    "source-faker",
    docker_executable=True,
    config={
        "count": 10,
    },
)

Tested via test_docker_executable.py which is a replica of test_source_faker_integration.py with docker_executable=True

P.S. This is my first ever open source contribution 😃

Tarekk avatar Apr 26 '24 11:04 Tarekk

/test-pr

PR test job started... Check job output.

❌ Tests failed.

aaronsteers avatar May 07 '24 03:05 aaronsteers

@Tarekk - Thanks for your patience. We're going to move forward and merge this, but we'll keep this functionality (for a short while) under a new airbyte.experimental module. No need to make any changes here - I'll tackle that work post-merge of this PR.

I did see when invoking tests that the we're seeing failures on the Windows runners - but Ubuntu runners are passing all tests 🎉 . We already skip other docker-dependent user tests on Windows, so I would suggest the following:

  1. We should try to catch docker-executor failures when on Windows is detected, we report these under an Exception class that messages that Windows isn't supported.
    • Technically it is possible with WSL, but by default, Windows docker backends will be Windows-based containers.
  2. We can skip integration tests related to docker executable if we detect we are running on Windows. We have this pattern in place for other tests, so we can replicate it for this case also.

aaronsteers avatar May 07 '24 05:05 aaronsteers

Thank you @aaronsteers for reviewing this PR and bringing it up for review with the team, it's great to hear that it will be merged in 😃 And thank you fro the valuable feedback, I replied to your comments above and they all make sense to me!

I am not very familiar with the testing infra but sounds like we would exclude running tests on Windows somewhere here?

Tarekk avatar May 07 '24 11:05 Tarekk

Thank you for the guidance @aaronsteers! Updated to add the docker fixtures to the exclusion list.

Tarekk avatar May 08 '24 00:05 Tarekk

/test-pr

PR test job started... Check job output.

✅ Tests passed.

aaronsteers avatar May 08 '24 00:05 aaronsteers

@Tarekk - I just ran the test suite and all tests are green ✅ 🎉 🙌

We'll try to merge this in within the next 24-48 hours. As noted above, I'll take a follow-on PR to move this into an experimental module while we iterate and gather feedback from the community, but this won't disrupt your ability to use the functionality in the meanwhile post-release.

CC @bindipankhudi - Do you mind taking a pass in the code review also? Any follow-ups we can optionally take internally before release.

aaronsteers avatar May 08 '24 01:05 aaronsteers

That's great to hear @aaronsteers 🥳 appreciate your help with getting this landed!!

Tarekk avatar May 08 '24 01:05 Tarekk

@aaronsteers, just took a pass at this PR. Looks great to me! No additional comments from me. The implementation feels very clean and loving the integration tests :). Nicely done @Tarekk

bindipankhudi avatar May 08 '24 17:05 bindipankhudi