Unable to start task on agent with role defined.
I have an agent running with the following args --
/usr/sbin/mesos-slave --master=zk://prod-zookeeper-1.aws.xxx.com:2181,prod-zookeeper-2.aws.orchardplatform.com:2181,prod-zookeeper-3.aws.xxx.com:2181/mesos --log_dir=/var/log/mesos --attributes=rack:sparkr --cgroups_limit_swap=true --containerizers=mesos,docker --default_role=sparkr --executor_registration_timeout=5mins --hostname=prod-rstudio-1.aws.xxx.com --ip=172.1.34.13 --isolation=cgroups/cpu,cgroups/mem --resources=cpus(sparkr):15;mem(sparkr):80000;disk(sparkr):79000 --switch_user=true
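Note that the `--resources` flag above reserves every CPU, all memory, and all disk to the `sparkr` role, so this agent advertises nothing under the default role `*`. For comparison, a hypothetical mixed reservation (illustrative values, not from this cluster) would look like:

```shell
# Hypothetical: reserve part of the agent to the sparkr role and leave the
# rest unreserved (role "*"), so frameworks without a role still get offers.
/usr/sbin/mesos-slave \
  --resources='cpus(sparkr):10;cpus(*):5;mem(sparkr):60000;mem(*):20000;disk(sparkr):50000;disk(*):29000' \
  ...
```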
When I try to run the following job, it goes into a queued state and never runs:
{
  "name": "PROD_PySpark_DatabaseBidding",
  "command": "env && /opt/spark-1.6.1-bin-hadoop2.6/bin/spark-submit --jars /opt/spark-1.6.1-bin-hadoop2.6/jars/elasticsearch-hadoop-2.2.0-rc1.jar --master mesos://leader.mesos:5050 --name DatabaseBidding --conf spark.mesos.role=sparkr --conf spark.cores.max=3 --driver-memory 2g --executor-memory 2g /datascience/spark-jobs/pyspark/elasticsearch.py",
  "shell": true,
  "epsilon": "PT30M",
  "executor": "",
  "executorFlags": "",
  "retries": 3,
  "owner": "[email protected]",
  "ownerName": "mesos",
  "description": "",
  "async": false,
  "successCount": 0,
  "errorCount": 0,
  "lastSuccess": "",
  "lastError": "",
  "cpus": 0.1,
  "disk": 10,
  "mem": 6000,
  "disabled": false,
  "softError": false,
  "dataProcessingJobType": false,
  "errorsSinceLastSuccess": 0,
  "uris": [],
  "environmentVariables": [],
  "arguments": [],
  "highPriority": false,
  "runAsUser": "mesos",
  "constraints": [["rack", "EQUALS", "sparkr"]],
  "schedule": "R/2016-6-14T12:00:00.000Z/PT1H"
}
In the logs I see the following:
Jun 15 03:24:15 prod-mesos-m-3.aws.xxx.com chronos[25910]: [2016-06-15 03:24:15,376] WARN Insufficient resources remaining for task 'ct:1465959600000:0:PROD_PySpark_DatabaseBidding:', will append to queue. (Needed: [cpus: 0.2 mem: 100.0 disk: 10.0], Found: [cpus: 6.3 mem: 50602.0 disk: 192376.0]) (org.apache.mesos.chronos.scheduler.mesos.MesosJobFramework:155)
Jun 15 03:24:15 prod-mesos-m-3.aws.orchardplatform.com chronos[25910]: [2016-06-15 03:24:15,376] INFO JobNotificationObserver does not handle JobQueued(ScheduleBasedJob(R/2016-6-14T12:00:00.000Z/PT1H,PROD_PySpark_DatabaseBidding,env && /opt/spark-1.6.1-bin-hadoop2.6/bin/spark-submit --jars /opt/spark-1.6.1-bin-hadoop2.6/jars/elasticsearch-hadoop-2.2.0-rc1.jar --master mesos://leader.mesos:5050 --name DatabaseBidding --conf spark.mesos.constraints='rack:sparkr' --conf spark.cores.max=5 --driver-memory 2g --executor-memory 2g /datascience/spark-jobs/pyspark/elasticsearch.py,PT30M,0,0,,,3,[email protected],mesos,,,,false,0.2,10.0,100.0,false,0,ListBuffer(),true,mesos,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer(EqualsConstraint(rack,sparkr))),ct:1465959600000:0:PROD_PySpark_DatabaseBidding:,0) (org.apache.mesos.chronos.scheduler.jobs.JobsObserver$:27)
Jun 15 03:24:15 prod-mesos-m-3.aws.orchardplatform.com chronos[25910]: [2016-06-15 03:24:15,376] INFO Updating state for job (PROD_PySpark_DatabaseBidding) to queued (org.apache.mesos.chronos.scheduler.jobs.stats.JobStats:62)
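Note the contradiction in the WARN line: the "Found" resources (cpus 6.3, mem 50602) are far larger than the "Needed" ones (cpus 0.2, mem 100), yet the task still queues. A minimal sketch of the matching logic (not Chronos's actual code; names and structures are illustrative) shows how that can happen when an offer must satisfy both the resource ask and the attribute constraint:

```python
# Sketch: an offer is usable only if it covers the needed resources AND
# comes from an agent whose attributes satisfy every EQUALS constraint.

def offer_matches(offer, needed, constraints):
    # Resource check: every needed resource must be present in the offer.
    if any(offer["resources"].get(name, 0) < amount
           for name, amount in needed.items()):
        return False
    # Constraint check: agent attributes must match the job's constraints.
    return all(offer["attributes"].get(attr) == value
               for attr, op, value in constraints if op == "EQUALS")

needed = {"cpus": 0.2, "mem": 100.0, "disk": 10.0}
constraints = [("rack", "EQUALS", "sparkr")]

# Offer from some other agent: plenty of resources, but no rack attribute.
other_agent = {"resources": {"cpus": 6.3, "mem": 50602.0, "disk": 192376.0},
               "attributes": {}}

# The sparkr agent: right attribute, but all of its resources are reserved
# to the sparkr role, so a framework registered under "*" sees none of them.
sparkr_agent = {"resources": {}, "attributes": {"rack": "sparkr"}}

print(offer_matches(other_agent, needed, constraints))   # False: constraint fails
print(offer_matches(sparkr_agent, needed, constraints))  # False: no visible resources
```

Under this reading, no single offer ever passes both checks, so the job queues forever even though the cluster as a whole has capacity.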
Running the same job from the CLI works fine. Does anyone know what could be going on here?
I have the same issue: placing constraints on a job leads to it being queued but never executed.
Because the mesos-slave configured with those attributes and roles does not have enough resources available to your framework.
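Expanding on the answer above: the agent's resources are all reserved to the `sparkr` role, while Chronos registers under the default role `*` unless told otherwise, so offers from the matching agent carry no usable resources (the large "Found" numbers in the log come from agents that fail the `rack` constraint). If that is the situation here, registering Chronos under the role should help, e.g. (flag name per the Chronos docs; how you pass it depends on your deployment):

```shell
# Assumption: Chronos is launched with command-line flags; on Mesosphere
# package installs the same flag can be set via a file such as
# /etc/chronos/conf/mesos_role containing the value "sparkr".
--mesos_role sparkr
```

After restarting Chronos with the role set, the `sparkr`-reserved resources should appear in its offers and the constrained job should leave the queue.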