
[SNAP-2231] Limit maximum cores for a job to physical cores on a node

Open sumwale opened this issue 8 years ago • 0 comments

See some details in the JIRA https://jira.snappydata.io/browse/SNAP-2231

These changes limit the maximum cores given to a job to the physical cores on a machine. Since the default executor cores is (2 * physical cores) in the cluster, this leaves the remaining cores free for other concurrent jobs, which is especially important for short point-lookup queries.

Additionally, these changes improve performance for disk-intensive queries. For example, a 30-50% improvement was measured in TPC-H load and in some queries when cores were limited to physical cores and a lot of data had overflowed to disk.
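The capping rule described above can be sketched as follows. This is a minimal, hypothetical illustration (the names `maxCoresForJob`, `physicalCores`, and `configuredCores` are my own, not from the patch): with executor cores configured at the default of (2 * physical cores), a single job is capped at the physical core count, leaving the rest free for concurrent jobs.

```scala
// Hypothetical sketch of the per-job core cap, not the actual patch code.
object JobCoreCap {
  // When limiting is enabled, a job may use at most the physical cores
  // on the node, even if the executor advertises more (e.g. 2x) cores.
  def maxCoresForJob(physicalCores: Int,
                     configuredCores: Int,
                     limitJobCores: Boolean): Int =
    if (limitJobCores) math.min(physicalCores, configuredCores)
    else configuredCores

  def main(args: Array[String]): Unit = {
    // executor configured with 2 * physical cores (the default)
    println(maxCoresForJob(physicalCores = 8, configuredCores = 16, limitJobCores = true))
    println(maxCoresForJob(physicalCores = 8, configuredCores = 16, limitJobCores = false))
  }
}
```

With limiting on, the job gets 8 of the 16 configured cores; with limiting off it gets all 16, which reproduces the previous behaviour.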

Question: should the default cores in ExecutorInitiator be increased to (4 * physical cores) to allow for more concurrency?

Changes proposed in this pull request

  • overrides in SnappyTaskSchedulerImpl to track the per-executor cores used by a job and cap them at the number of physical cores on a node
  • combined some maps in TaskSchedulerImpl to recover the performance lost to the above tracking, and to improve further over the base TaskSchedulerImpl
  • the property "spark.scheduler.limitJobCores=false" can be set to revert to the previous behaviour
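The tracking described in the first bullet might look roughly like the sketch below. This is illustrative only, not the actual SnappyTaskSchedulerImpl code: the class name `JobCoreTracker` and the methods `tryAcquire`/`release` are assumptions made for the example, standing in for the real scheduler overrides.

```scala
import scala.collection.mutable

// Illustrative-only sketch: track the cores a job currently holds on each
// executor and reject resource offers that would push it past the node's
// physical core count (unless limiting is disabled via the property).
class JobCoreTracker(physicalCores: Int, limitJobCores: Boolean) {
  // (jobId, executorId) -> cores currently in use by that job on that executor
  private val coresInUse =
    mutable.Map.empty[(Int, String), Int].withDefaultValue(0)

  /** Returns true if the job may take `requested` more cores on the executor. */
  def tryAcquire(jobId: Int, executorId: String, requested: Int): Boolean = {
    val key = (jobId, executorId)
    if (limitJobCores && coresInUse(key) + requested > physicalCores) false
    else { coresInUse(key) += requested; true }
  }

  /** Called when the job's tasks finish on the executor, freeing its cores. */
  def release(jobId: Int, executorId: String, cores: Int): Unit = {
    val key = (jobId, executorId)
    coresInUse(key) = math.max(0, coresInUse(key) - cores)
  }
}
```

Keeping this state in a single map keyed by (job, executor), rather than nested per-job maps, is in the spirit of the second bullet: fewer lookups per resource offer on the scheduler's hot path.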

Patch testing

precheckin -Pstore -Pspark

TODO: working on porting Spark's TaskScheduler unit tests

ReleaseNotes.txt changes

Document the new property and behaviour.

Other PRs

https://github.com/SnappyDataInc/spark/pull/96

sumwale avatar Feb 27 '18 09:02 sumwale