[SNAP-2231] Limit maximum cores for a job to physical cores on a node
See details in the JIRA ticket: https://jira.snappydata.io/browse/SNAP-2231
These changes limit the maximum cores given to a single job to the number of physical cores on a machine. With the default of (2 * physical cores) configured per node in the cluster, this leaves the remaining cores free for other concurrent jobs, which is especially important for short point-lookup queries.
These changes also improve performance for disk-intensive queries. For example, a 30-50% improvement was measured in TPCH load and in some queries when cores were limited to physical cores and a lot of data had overflowed to disk.
Question: should the default number of cores in ExecutorInitiator be increased to (4 * physical cores) to allow for more concurrency?
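The core-capping policy described above can be sketched as below. This is an illustrative example only, not the actual patch; the class and method names are hypothetical, and it assumes executors advertise (2 * physical cores) by default while a single job is capped at the physical core count.

```java
// Hypothetical sketch of the capping policy (names are illustrative,
// not from the actual SnappyData patch).
public final class CoreCapSketch {
    /** Maximum cores a single job may take on one executor node. */
    static int maxJobCores(int physicalCores, boolean limitJobCores) {
        int advertised = 2 * physicalCores; // default executor core count
        // With the limit enabled, a job gets at most the physical cores,
        // leaving the remaining advertised cores free for concurrent jobs.
        return limitJobCores ? physicalCores : advertised;
    }

    public static void main(String[] args) {
        // A 16-physical-core node advertises 32 cores; one job gets 16.
        System.out.println(maxJobCores(16, true));  // 16
        System.out.println(maxJobCores(16, false)); // 32
    }
}
```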
Changes proposed in this pull request
- overrides in SnappyTaskSchedulerImpl to track the per-executor cores used by a job and cap them at the number of physical cores on a node
- combined some maps in TaskSchedulerImpl to recover the performance lost to the above tracking, and to improve further over the base TaskSchedulerImpl
- the property "spark.scheduler.limitJobCores=false" can be set to revert to the previous behaviour
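The per-executor bookkeeping described in the first bullet can be sketched as follows. This is a minimal, hypothetical illustration, not the actual SnappyTaskSchedulerImpl code: the class, method names, and the single combined map keyed by (jobId, executorId) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-executor core tracking for a job, with a cap
// at the node's physical cores (illustrative names, not the real API).
public final class JobCoreTracker {
    private final int physicalCores;     // cap per executor per job
    private final boolean limitJobCores; // spark.scheduler.limitJobCores
    // (jobId, executorId) -> cores currently used by that job on that executor;
    // one combined map rather than separate per-job and per-executor maps.
    private final Map<String, Integer> usedCores = new HashMap<>();

    JobCoreTracker(int physicalCores, boolean limitJobCores) {
        this.physicalCores = physicalCores;
        this.limitJobCores = limitJobCores;
    }

    /** Try to grant one more core to a job on an executor; false if capped. */
    boolean tryAcquire(int jobId, String executorId) {
        String key = jobId + ":" + executorId;
        int used = usedCores.getOrDefault(key, 0);
        if (limitJobCores && used >= physicalCores) {
            return false; // job already holds all physical cores on this node
        }
        usedCores.put(key, used + 1);
        return true;
    }

    /** Return a core when the job's task finishes on that executor. */
    void release(int jobId, String executorId) {
        String key = jobId + ":" + executorId;
        usedCores.computeIfPresent(key, (k, v) -> v > 1 ? v - 1 : null);
    }
}
```

Note how a different job gets its own budget on the same executor, so short concurrent queries are not starved by one large job.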
Patch testing
precheckin -Pstore -Pspark
TODO: working on porting Spark's TaskScheduler unit tests
ReleaseNotes.txt changes
- document the new property and behaviour
Other PRs
https://github.com/SnappyDataInc/spark/pull/96