Maytas Monsereenusorn comments

Results: 10 comments by Maytas Monsereenusorn

Add Spark Writer support.

Hi @JulianJaffePinterest, I saw that https://github.com/apache/druid/pull/11474 and https://github.com/apache/druid/pull/11823 were already merged into `apache:spark_druid_connector` and that this PR (write support) is the only piece left. My aim is to review this...

Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty

@jon-wei @capistrant Is this PR good to merge? @jon-wei, I tried clicking on https://github.com/apache/druid/pull/12599/files#r940734962 in your previous comment, but I'm not seeing any comment when I open the link.

Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty

Should the `used_flag_last_updated` column be part of the index in the segment table?

"Too Many Open Files" error when running GroupBy query against large system due to poor file handling

I think it's worth reopening this issue. Even if you increase the maximum number of open file descriptors at the system level, opening many tmp files can cause your historical to OOM....

Optimize isOvershadowed when there is a unique minor version for an interval

On a cluster with 600k active segments, this patch reduces the time to build the timeline from 160,000 ms to 2,000 ms. On a cluster with ~7 million active segments, this patch reduces...

Kafka ingestion lag spikes up whenever tasks are rolling

One more thing to add to (2) above: the PR https://github.com/apache/druid/pull/14533 may also help with (2). This may help if the supervisor is configured with a lot of...

Multi-cluster Stream (Kafka/Kinesis) Druid Ingest Proposal

+1 on this feature. @abhishekagarwal87 and I had this discussion a while back at https://github.com/apache/druid/pull/14424#issuecomment-1933738231. Looking forward to the PR!

Implement per-segment query timeout on data nodes

@abhishekagarwal87 This idea came from my discussion with @gianm, who suggested this change (see: https://apachedruidworkspace.slack.com/archives/C030CMF6B70/p1746578390954639?thread_ts=1745436989.786489&cid=C030CMF6B70). We have also observed some queries that take on average 1-2 minutes to process...

How do I find if there is residual in the table scan/plan files?

Seems like we used to have something like https://github.com/apache/iceberg-python/commit/4f0a5c6203888ff105c1f09f41c17245f477d2ab but it's gone? @Fokko @TGooch44

How do I find if there is residual in the table scan/plan files?

@Fokko Thanks for getting back to me. I can look into contributing. I am not too familiar with the new pyiceberg rewrite (current state of this library) but was wondering...