[BSP model in ps-lite] Discussion of BSP model implementations using ps-lite
I have surveyed many projects that use ps-lite to implement the BSP model. Most of them simply do the following:
kv.Wait(kv.Push(keys, vals));
kv.Wait(kv.Pull(keys, &vals));
I do not think this is a true BSP model, because each worker only waits for the completion of its own push, not for the pushes of the other workers.
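To make the difference concrete, here is a toy timing model (my own simplification, not ps-lite code): worker i needs `durations[i]` time units per iteration, and the hypothetical `ProgressAt` returns how many iterations each worker has completed by time `t`, either when each worker only waits on its own push or when a global barrier is enforced.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy timing model (not ps-lite code): worker i needs durations[i] time units
// per iteration. Returns how many iterations each worker has completed at time t.
std::vector<int> ProgressAt(const std::vector<int>& durations, int t, bool barrier) {
  std::vector<int> done(durations.size(), 0);
  if (durations.empty()) return done;
  for (int d : durations) assert(d > 0);
  if (barrier) {
    // BSP: no worker starts iteration k + 1 before the slowest worker has
    // finished iteration k, so everyone advances at the slowest worker's pace.
    int slowest = *std::max_element(durations.begin(), durations.end());
    std::fill(done.begin(), done.end(), t / slowest);
  } else {
    // Each worker only waits for its own push, so it runs at its own pace.
    for (std::size_t i = 0; i < durations.size(); ++i) done[i] = t / durations[i];
  }
  return done;
}
```

With a fast worker (1 unit per iteration) and a slow one (3 units), after 6 units the "wait on my own push" pattern leaves the workers 4 iterations apart ({6, 2}), while the barrier keeps them in lockstep ({2, 2}).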
Based on test_simple_app and docs/overview.md, a BSP implementation should look like this:
Scheduler
/* This code also shows why the scheduler cannot easily implement SSP or other more complicated models: it relies on Wait to learn the progress of each worker.
In fact, you could use a big table to store all s*N timestamps (s iterations by N workers); when entering the (s+1)-th iteration, you would need to wait for the timestamps of all workers from the 1st iteration. This is similar to the SSP model, but it is not efficient.
*/
if (IsScheduler()) {
  std::vector<int> ts;
  for (int i = 0; i < n; ++i) {  // one round per iteration
    ts.clear();
    for (int rank = 0; rank < NumWorkers(); ++rank) {
      // worker node ID = rank * 2 + 9, see Postoffice::WorkerRankToID; this step needs to be confirmed.
      int receive_id = Postoffice::WorkerRankToID(rank);
      ts.push_back(app.Request(head, "body", receive_id));
    }
    for (int t : ts)
      app.Wait(t);  // block until every worker has responded
    // If Request can broadcast to all workers, these two steps may simply be rewritten as:
    // app.Wait(app.Request(head, "body", kWorkerGroup));
  }
}
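The comment at the top of the scheduler code says SSP would require tracking how far each worker has progressed. A minimal sketch of that bookkeeping, assuming a hypothetical `ClockTable` kept by a coordinator: it stores one completed-iteration count per worker instead of the table of request timestamps described in the comment, and it is not part of ps-lite. A worker may start its next iteration only if it is at most `staleness` iterations ahead of the slowest worker, so `staleness == 0` reduces to BSP.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-worker clock table (not a ps-lite API).
// clock_[w] = number of iterations worker w has finished.
class ClockTable {
 public:
  ClockTable(int num_workers, int staleness)
      : clock_(num_workers, 0), staleness_(staleness) {
    assert(num_workers > 0 && staleness >= 0);
  }

  // True if `worker` may begin its next iteration: it must not be more than
  // staleness_ iterations ahead of the slowest worker (0 means BSP).
  bool CanStart(int worker) const {
    assert(worker >= 0 && worker < static_cast<int>(clock_.size()));
    int slowest = *std::min_element(clock_.begin(), clock_.end());
    return clock_[worker] - slowest <= staleness_;
  }

  // Record that `worker` has finished one more iteration.
  void Finish(int worker) {
    assert(worker >= 0 && worker < static_cast<int>(clock_.size()));
    ++clock_[worker];
  }

 private:
  std::vector<int> clock_;
  int staleness_;
};
```

In use, the coordinator would call `Finish(w)` when worker w responds and check `CanStart(w)` before sending it the next request.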
Server
server->set_request_handle(KVServerDefaultHandle<float>());  // use the default handle
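As far as I can tell, the default handle sums each pushed value into an in-memory key-value store, answers pulls from that store, and replies to each request on its own, so it does not by itself wait for the other workers. A minimal single-process model of those semantics, assuming `uint64_t` keys and `float` values; `AccumulatingStore` is a hypothetical name, not ps-lite's code.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified model of an accumulating push/pull server handle.
// Each request is applied immediately; there is no cross-worker waiting.
class AccumulatingStore {
 public:
  // Push: add each value into the stored value for its key.
  void Push(const std::vector<uint64_t>& keys, const std::vector<float>& vals) {
    assert(keys.size() == vals.size());
    for (std::size_t i = 0; i < keys.size(); ++i) store_[keys[i]] += vals[i];
  }

  // Pull: return the current stored values (0 for keys never pushed).
  std::vector<float> Pull(const std::vector<uint64_t>& keys) {
    std::vector<float> out;
    out.reserve(keys.size());
    for (uint64_t k : keys) out.push_back(store_[k]);
    return out;
  }

 private:
  std::unordered_map<uint64_t, float> store_;
};
```

Because a pull sees whatever has been pushed so far, one worker's update is visible before the other workers have pushed, which is why the synchronization has to come from elsewhere (here, the scheduler).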
Worker
void ReqHandle(const SimpleData& req, SimpleApp* app) {
  // we can check req.head and req.body sent from the scheduler
  Read(&X, &Y);                  // read a minibatch with b / num_workers examples
  kv.Wait(kv.Pull(keys, &w));    // pull the latest weights from the servers
  ComputeGrad(X, Y, w, &grad);   // compute the gradient
  kv.Wait(kv.Push(keys, grad));  // push my update to the servers
  app->Response(req);            // respond to the scheduler
}
worker->set_request_handle(ReqHandle);
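The worker's `Read(&X, &Y)` step reads b / num_workers examples. When b is not a multiple of the number of workers, each worker still needs a well-defined slice. A small sketch, assuming a hypothetical `MinibatchRange` helper and contiguous slicing by worker rank (my choice, not something ps-lite prescribes):

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: half-open range [begin, end) of a global batch of b
// examples assigned to worker `rank` out of `num_workers`. If b is not
// divisible by num_workers, the first (b % num_workers) workers get one extra.
std::pair<int, int> MinibatchRange(int b, int num_workers, int rank) {
  assert(b >= 0 && num_workers > 0 && rank >= 0 && rank < num_workers);
  int base = b / num_workers;
  int extra = b % num_workers;
  int begin = rank * base + (rank < extra ? rank : extra);
  int size = base + (rank < extra ? 1 : 0);
  return {begin, begin + size};
}
```

For example, a batch of 10 split over 3 workers gives the slices [0, 4), [4, 7) and [7, 10).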
I think the overall logic is similar to the BSP SGD described in docs/overview.md.
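As a sanity check of that logic, here is a toy in-process simulation, assuming `std::async` futures stand in for `app.Request`/`app.Wait` and a mutex-protected `ToyServer` stands in for the servers (both hypothetical, not ps-lite). Each worker owns one number x and minimizes 0.5 * (w - x)^2, so its gradient is w - x. To keep the run deterministic, the sketch buffers the pushed gradients and applies their average once per iteration after the barrier; that models the "server merges all workers' data" description rather than the default handle.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the servers (not ps-lite code).
class ToyServer {
 public:
  double Pull() {
    std::lock_guard<std::mutex> lock(mu_);
    return w_;
  }
  void Push(double grad) {
    std::lock_guard<std::mutex> lock(mu_);
    grad_sum_ += grad;  // buffered until the barrier
  }
  // Assumption: the merged (averaged) update is applied once per iteration,
  // so every pull within an iteration sees the same w.
  void ApplyMerged(double lr, int num_workers) {
    std::lock_guard<std::mutex> lock(mu_);
    w_ -= lr * grad_sum_ / num_workers;
    grad_sum_ = 0.0;
  }

 private:
  std::mutex mu_;
  double w_ = 0.0;
  double grad_sum_ = 0.0;
};

// One worker step: pull w, compute the gradient of 0.5 * (w - x)^2, push it.
void WorkerStep(ToyServer* server, double x) {
  double w = server->Pull();
  server->Push(w - x);
}

// Scheduler loop: per iteration, start every worker, then wait for all of them.
double RunBsp(const std::vector<double>& data, int iters, double lr) {
  assert(!data.empty());
  ToyServer server;
  int n = static_cast<int>(data.size());
  for (int i = 0; i < iters; ++i) {
    std::vector<std::future<void>> ts;
    for (double x : data) {
      ts.push_back(std::async(std::launch::async, WorkerStep, &server, x));
    }
    for (auto& t : ts) t.get();  // barrier, analogous to app.Wait(t)
    server.ApplyMerged(lr, n);
  }
  return server.Pull();
}
```

With data {1, 2, 3} and learning rate 1.0, the weight reaches the mean, 2.0, after the first iteration and stays there; with learning rate 0.5 it moves halfway toward 2.0 each iteration.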
On the server side, it waits for all workers' data and merges it before sending back ACKs for the workers' requests.
Exactly, but that is the description of the BSP model (it means the server needs to wait for all workers' data).
In the actual implementation, we need to use the scheduler to manage data synchronization (see here) without changing KVServerDefaultHandle.
// WaitAllFinished();
for (int t : ts)
  app.Wait(t);  // wait until all workers have finished their pushes
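A `WaitAllFinished()` helper would amount to waiting on every outstanding request timestamp, as the loop does. To make "wait for timestamp t" concrete, here is a simplified, hypothetical tracker (not ps-lite's internals): each request records how many responses it expects (1 for a single worker, N if it was sent to a group of N), and `Wait(ts)` blocks on a `std::condition_variable` until all of them have arrived.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical request tracker illustrating timestamp-based waiting.
class RequestTracker {
 public:
  // Register a request expecting `expected_responses` replies; returns its timestamp.
  int NewRequest(int expected_responses) {
    std::lock_guard<std::mutex> lock(mu_);
    pending_.push_back(expected_responses);
    return static_cast<int>(pending_.size()) - 1;
  }

  // Called when one reply for request `ts` arrives.
  void AddResponse(int ts) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      --pending_[ts];
    }
    cv_.notify_all();
  }

  // Block until every expected reply for request `ts` has arrived.
  void Wait(int ts) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return pending_[ts] <= 0; });
  }

  int Remaining(int ts) {
    std::lock_guard<std::mutex> lock(mu_);
    return pending_[ts];
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<int> pending_;
};
```

Sending one request per worker and waiting on each timestamp is then equivalent to sending a single group request that expects N replies and waiting on that one timestamp.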