arm64 Docker images openly available

Open igorvpcleao opened this issue 4 years ago • 29 comments

Description

apache/druid repository on Dockerhub stores amd64 images only. I'd like to suggest generating official arm64 images.

Motivation

arm64 workloads are becoming more and more popular. If possible, Docker images should also be available for this architecture.

igorvpcleao avatar Oct 20 '21 21:10 igorvpcleao

@igorvpcleao

https://github.com/arm64-compat/apache-druid

I have implemented a workaround for this. If you are using a Mac with an M1 chip, these images should greatly speed things up (in my experience). Note that the images are for development purposes only.

Not to be used in production.

I haven't thoroughly tested them, so I would love to hear your comments.

anuragagarwal561994 avatar Apr 15 '22 14:04 anuragagarwal561994

Woho~.

Did you have to do anything special to get the arm build up and running?

2bethere avatar Apr 15 '22 15:04 2bethere

Nothing special as such. To summarise, I think only two changes are really required:

  • change amd64/busybox to just busybox, so it picks arm or amd according to the available architecture
  • change the base image from distroless, since distroless java 8 is amd-only; I used adoptopenjdk instead
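
As a rough illustration of the first point: an unqualified multi-arch tag such as busybox is resolved to the variant matching the host platform, whereas amd64/busybox pins one architecture. The sketch below only shows which platform string a host maps to; the function name docker_platform is hypothetical and not part of Docker or of the repo.

```shell
# Map a machine name (as printed by `uname -m`) to the Docker platform
# string that a multi-arch tag would be resolved for on that host.
# docker_platform is a hypothetical helper, for illustration only.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Print the platform for the current host.
docker_platform "$(uname -m)"
```

On an Apple Silicon Mac, uname -m reports arm64, so the function prints linux/arm64.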

In my repo I made a few more changes, though: I moved the mvn build step out of the image and into CI, so that it can use the maven build cache on the next run.

I also made some related changes.

However, the image that I pushed is not currently working. I tested it after I posted the comment; when I repeated the same steps locally it worked fine. I should be able to figure out the issue and fix it over the next 2 days.

The UI is not loading and returns a 404 instead. My initial guess is that the console didn't compile properly in the CI environment and nothing was reported during the build. I will drill down.

anuragagarwal561994 avatar Apr 15 '22 18:04 anuragagarwal561994

OK, thanks for trying this out. I think it'll be awesome to get an arm build to run well on travis.

2bethere avatar Apr 15 '22 22:04 2bethere

@2bethere https://github.com/arm64-compat/apache-druid/issues/1 I identified the issue and fixed it.

If you want, you can give this image a try as well.

I can also contribute the Dockerfile fixes to the apache/druid repo. I saw that the .travis build in the checks already tests for ARM.

We might just have to decide on a base image.

anuragagarwal561994 avatar Apr 16 '22 13:04 anuragagarwal561994

Yeah, on travis the console is skipped to improve build speed right now. We should probably try to push https://github.com/apache/druid/pull/11109 forward to get an arm build on docker hub.

Double checking this is what you are asking for?

2bethere avatar Apr 18 '22 16:04 2bethere

Yes, I may have mixed up the commands while building on travis; it is now fixed in my image. I was asking whether you would want to test my image and see if everything works correctly, or whether there is any way I can contribute to the project to serve an arm-compatible docker image. So far I haven't seen any issues.

anuragagarwal561994 avatar Apr 18 '22 17:04 anuragagarwal561994

The ARM64 image can now be built on the master branch on both Linux and Mac M1/M2. But I don't know when an official ARM64 image will be provided on Docker Hub.

FrankChen021 avatar Sep 24 '22 04:09 FrankChen021

Any update on this?

m17kea avatar Jul 07 '23 21:07 m17kea

Let's add 2024 to the mix.

Any update on this? I'm pulling apache/druid:28.0.1 and still not seeing ARM support in docker hub.

I see the docs show how to manually build, but is it that much of a hassle to host them? (legitimately curious)

dudo avatar Feb 10 '24 05:02 dudo

You can pull an unofficial arm64-compatible image from https://github.com/arm64-compat/apache-druid

Let me know if you want me to upgrade the version or help contribute.

anuragagarwal561994 avatar Feb 10 '24 11:02 anuragagarwal561994

Thanks, @anuragagarwal561994! It looks like there is "official support" now, you just have to build it yourself.

Unfortunately I wasn't able to compile main on my M1 last night. This is the error I'm getting, if anyone is more familiar with Java.


-------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.1.0:exec (generate-binary-license) on project distribution: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.1.0:exec (generate-binary-license) on project distribution: Command execution failed.

dudo avatar Feb 10 '24 22:02 dudo

@dudo since it is more or less all Java, it should actually be compilable on multiple platforms. I will try to build it myself; if there is more to look into, I will try to invest some time this week. Can you also tell me which version or tag you are trying to build?

anuragagarwal561994 avatar Feb 12 '24 05:02 anuragagarwal561994

@dudo I have built the latest version 28.0.1

https://github.com/arm64-compat/apache-druid/pkgs/container/apache%2Fdruid/177604533?tag=28.0.1

You can try it for your local / staging setup. Please refrain from using it in production: I am just the maintainer of this repo / organization and can't actively test these images and their features myself. I hope this makes your life easier :)

anuragagarwal561994 avatar Feb 12 '24 05:02 anuragagarwal561994

I second this. Would be very handy if we could run an "out-of-box" Druid image on AWS Graviton. Makes a lot of sense both for small and large scale setups, especially if sharing a K8s cluster with other ARM workloads.

dmitry-livchak-qco avatar Feb 12 '24 17:02 dmitry-livchak-qco

I have built the latest version, 30.0.0, locally, but when I use it the middleManager container crashes during ingestion.

fabricebaranski avatar Jun 20 '24 08:06 fabricebaranski

Any specific issue you are seeing? Maybe some error logs will help.

2bethere avatar Jun 20 '24 15:06 2bethere

No issue is visible, as my container disappears when it crashes. I just get exit code 137.

fabricebaranski avatar Jun 20 '24 15:06 fabricebaranski

OK, can you do a docker logs #container_id and provide some details there? It's hard to know what's crashing.
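
For reference, exit codes above 128 conventionally mean the process was terminated by signal (code - 128), so 137 corresponds to signal 9 (SIGKILL). Under Docker that is often, though not always, the kernel killing the container for exceeding its memory limit. A minimal sketch of the decoding follows; the function name signal_from_exit_code is hypothetical, and the docker inspect line assumes the stopped container is still listed by docker ps -a.

```shell
# Decode an exit code: values above 128 conventionally mean
# "terminated by signal (code - 128)"; otherwise print 0.
# signal_from_exit_code is a hypothetical helper, for illustration only.
signal_from_exit_code() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    echo $((code - 128))
  else
    echo 0
  fi
}

signal_from_exit_code 137   # prints 9, i.e. SIGKILL

# With Docker available, the state of the stopped container can be checked
# as well (replace <container_id>; commented out so the snippet runs anywhere):
# docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container_id>
```

If OOMKilled is reported as true, raising the memory available to the container (or to Docker Desktop) would be the first thing to try.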

2bethere avatar Jun 20 '24 15:06 2bethere

Here are the last lines of the logs:

2024-06-20 17:30:27 2024-06-20T15:30:27,303 DEBUG [qtp141697265-160] org.apache.druid.jetty.RequestLog - 192.168.192.11 GET //192.168.192.11:8091/druid/worker/v1/chat/query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0/counters HTTP/1.1 200
2024-06-20 17:30:27 2024-06-20T15:30:27,305 INFO [qtp141697265-148] org.apache.druid.msq.exec.WorkerImpl - Finish received for task [query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0]
2024-06-20 17:30:27 2024-06-20T15:30:27,305 DEBUG [qtp141697265-148] org.apache.druid.jetty.RequestLog - 192.168.192.11 POST //192.168.192.11:8091/druid/worker/v1/chat/query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0/finish HTTP/1.1 202
2024-06-20 17:30:27 2024-06-20T15:30:27,306 INFO [[query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0]-threading-task-runner-executor-1] org.apache.druid.msq.exec.LoadedSegmentDataProviderFactory - Waiting for any data server queries to be canceled.
2024-06-20 17:30:27 2024-06-20T15:30:27,306 WARN [controller-status-checker-0] org.apache.druid.msq.indexing.IndexerWorkerContext - Periodic fetch of controller location returned [ServiceLocations{locations=[], closed=true}]. Worker task [query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0] will exit.
2024-06-20 17:30:27 2024-06-20T15:30:27,306 INFO [controller-status-checker-0] org.apache.druid.msq.exec.WorkerImpl - Stopping gracefully for taskId [query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0]
2024-06-20 17:30:27 2024-06-20T15:30:27,308 INFO [threading-task-runner-executor-1] org.apache.druid.indexing.overlord.ThreadingTaskRunner - Removed task directory: var/druid/task/slot3/query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0
2024-06-20 17:30:27 2024-06-20T15:30:27,347 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-worker0_0] completed with status [SUCCESS].
2024-06-20 17:30:27 2024-06-20T15:30:27,471 DEBUG [qtp141697265-136] org.apache.druid.jetty.RequestLog - 127.0.0.1 GET //localhost:8091/status/health HTTP/1.1 200
2024-06-20 17:30:27 2024-06-20T15:30:27,514 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted in-memory data for segment[19b98542-5e68-4bfd-982c-8c45356fd76b_vertex_-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z_2024-06-20T15:30:26.849Z] spill[0] to disk in [401] ms (23,321 rows).
2024-06-20 17:30:27 2024-06-20T15:30:27,522 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted in-memory data for segments: 19b98542-5e68-4bfd-982c-8c45356fd76b_vertex_-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z_2024-06-20T15:30:26.849Z
2024-06-20 17:30:27 2024-06-20T15:30:27,522 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted stats: processed rows: [46494], persisted rows[23321], persisted sinks: [1], persisted fireHydrants (across sinks): [1]
2024-06-20 17:30:27 2024-06-20T15:30:27,522 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted rows[23,321] and bytes[30,990,856] and removed all sinks & hydrants from memory in[408] millis
2024-06-20 17:30:27 2024-06-20T15:30:27,522 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persist is done.
2024-06-20 17:30:27 2024-06-20T15:30:27,522 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Spawning intermediate persist
2024-06-20 17:30:27 2024-06-20T15:30:27,565 INFO [[query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108]-threading-task-runner-executor-0] org.apache.druid.msq.exec.ControllerImpl - Controller will now wait for segments to be loaded. The query has already finished executing, and results will be included once the segments are loaded, even if this query is cancelled now.
2024-06-20 17:30:27 2024-06-20T15:30:27,567 INFO [query-e6f6ac5a-b2c1-41d8-bf4c-9798ef360108-segment-load-waiter-0] org.apache.druid.msq.exec.SegmentLoadStatusFetcher - Fetching segment load status for datasource[19b98542-5e68-4bfd-982c-8c45356fd76b] from broker
2024-06-20 17:30:27 2024-06-20T15:30:27,737 INFO [processing-0] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Incremental persist to disk because bytesCurrentlyInMemory[30995920] is greater than maxBytesInMemory[30994978].
2024-06-20 17:30:27 2024-06-20T15:30:27,897 DEBUG [qtp141697265-148] org.apache.druid.jetty.RequestLog - 192.168.192.8 GET //192.168.192.11:8091/druid/listen/v1/lookups HTTP/1.1 200
2024-06-20 17:30:27 2024-06-20T15:30:27,992 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted in-memory data for segment[19b98542-5e68-4bfd-982c-8c45356fd76b_vertex_-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z_2024-06-20T15:30:26.849Z] spill[1] to disk in [468] ms (23,173 rows).
2024-06-20 17:30:28 2024-06-20T15:30:28,001 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted in-memory data for segments: 19b98542-5e68-4bfd-982c-8c45356fd76b_vertex_-146136543-09-08T08:23:32.096Z_146140482-04-24T15:36:27.903Z_2024-06-20T15:30:26.849Z
2024-06-20 17:30:28 2024-06-20T15:30:28,002 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted stats: processed rows: [68525], persisted rows[23173], persisted sinks: [1], persisted fireHydrants (across sinks): [1]
2024-06-20 17:30:28 2024-06-20T15:30:28,002 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persisted rows[23,173] and bytes[30,990,704] and removed all sinks & hydrants from memory in[478] millis
2024-06-20 17:30:28 2024-06-20T15:30:28,002 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Persist is done.
2024-06-20 17:30:28 2024-06-20T15:30:28,002 INFO [[19e455bc-a674-4f9b-a550-436ff843709f_0_0:0]-batch-appenderator-persist] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Spawning intermediate persist
2024-06-20 17:30:28 2024-06-20T15:30:28,232 INFO [processing-4] org.apache.druid.segment.realtime.appenderator.BatchAppenderator - Incremental persist to disk because bytesCurrentlyInMemory[30995490] is greater than maxBytesInMemory[30994978].

fabricebaranski avatar Jun 20 '24 15:06 fabricebaranski

I don't see any errors in the logs. What's the underlying machine you are testing this on? I assume this is native batch ingestion? (index_parallel task)? I can try to reproduce it.

2bethere avatar Jun 20 '24 15:06 2bethere

MSQ ingestion using a parquet file. I launch 3 MSQ ingestions at the same time.

fabricebaranski avatar Jun 20 '24 15:06 fabricebaranski

My machine is an Apple M2 Max

fabricebaranski avatar Jun 20 '24 15:06 fabricebaranski

Dope, will try a repro with docker.

2bethere avatar Jun 20 '24 16:06 2bethere

An alternative is to build for the amd64 platform using buildx:

    docker buildx build --platform linux/amd64 .

To do that, I have to deactivate 'Use Rosetta for x86_64/amd64 emulation on Apple Silicon'. Afterwards, reactivate 'Use Rosetta' and, in your docker-compose, add

    platform: linux/amd64
    cpuset: '0'

for all druid containers. It works, but it is quite slow.

fabricebaranski avatar Jun 21 '24 09:06 fabricebaranski

I think I got an arm64 build partially working locally in docker, but I'm running into some service discovery issues. Will continue to chip away at this.

I'm also trying to figure out how docker images are published to docker hub from this project. Will report back as I make progress.

2bethere avatar Jun 21 '24 15:06 2bethere