mli edited this page Dec 30, 2014 · 4 revisions

The parameter server architecture

Parameter server nodes are grouped into a server group and one or several worker groups.

Each server node in the server group maintains a partition of the globally shared parameters. Server nodes communicate with each other to replicate or migrate parameters for reliability and scaling.
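As a rough illustration of partitioning, the sketch below splits an integer key space into contiguous ranges, one per server node, and looks up which server owns a key. The range-based scheme and the names `make_ranges` and `server_for_key` are assumptions made for this example; the page does not say how parameters are actually partitioned.

```python
from bisect import bisect_right


def make_ranges(num_keys, num_servers):
    """Return the start key of each server's contiguous range over [0, num_keys).

    Hypothetical helper: range partitioning is assumed here for illustration.
    """
    base, extra = divmod(num_keys, num_servers)
    starts = []
    start = 0
    for server in range(num_servers):
        starts.append(start)
        # The first `extra` servers take one additional key each.
        start += base + (1 if server < extra else 0)
    return starts


def server_for_key(starts, key):
    """Return the index of the server whose range contains `key`."""
    return bisect_right(starts, key) - 1


starts = make_ranges(num_keys=10, num_servers=3)
owner = server_for_key(starts, 5)
```

Migrating parameters for scaling would, under this assumption, amount to changing the range boundaries and moving the affected keys between servers.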

Each worker group runs an application. A worker typically stores a portion of the training data locally and uses it to compute local statistics such as gradients. Workers communicate only with the server nodes, not with each other, updating and retrieving the shared parameters via push and pull.
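The push and pull interaction can be sketched with a small in-memory model. This is not the project's API: the `ServerNode` and `Worker` classes, the learning rate, and the choice to apply pushed gradients as a plain gradient-descent step are all assumptions made for illustration. Each worker pulls the current value of a single parameter `w`, computes a least-squares gradient on its local data, and pushes that gradient back.

```python
class ServerNode:
    """Hypothetical single server holding shared parameters in a dict."""

    def __init__(self, lr=0.1):
        self.params = {}
        self.lr = lr  # assumed step size, not taken from the page

    def push(self, grads):
        # Apply each pushed gradient as a gradient-descent update.
        for key, grad in grads.items():
            self.params[key] = self.params.get(key, 0.0) - self.lr * grad

    def pull(self, keys):
        # Unknown keys default to 0.0 in this sketch.
        return {key: self.params.get(key, 0.0) for key in keys}


class Worker:
    """Hypothetical worker holding a local shard of (x, y) training pairs."""

    def __init__(self, server, data):
        self.server = server
        self.data = data

    def step(self):
        w = self.server.pull(["w"])["w"]
        # Gradient of the mean squared error of the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in self.data) / len(self.data)
        self.server.push({"w": grad})


server = ServerNode()
workers = [Worker(server, [(1.0, 2.0)]), Worker(server, [(2.0, 4.0)])]
for _ in range(50):
    for worker in workers:
        worker.step()
```

Note that the workers never exchange data directly; the only shared state is what they push to and pull from the server.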

There is a scheduler node for each worker group. It assigns tasks to workers and monitors their progress. If workers are added or removed, it reschedules unfinished tasks.
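A minimal sketch of this scheduling behavior follows. The `Scheduler` class and its method names are hypothetical, and the policy of returning a removed worker's unfinished task to the front of the queue is an assumption; the page only states that unfinished tasks are rescheduled.

```python
from collections import deque


class Scheduler:
    """Hypothetical scheduler for one worker group."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.running = {}  # worker id -> current task, or None if idle
        self.done = []

    def add_worker(self, worker):
        self.running.setdefault(worker, None)
        self._assign()

    def remove_worker(self, worker):
        # Put the removed worker's unfinished task back in the queue.
        task = self.running.pop(worker, None)
        if task is not None:
            self.pending.appendleft(task)
        self._assign()

    def finish(self, worker):
        # Record progress reported by a worker and hand it more work.
        task = self.running.get(worker)
        if task is not None:
            self.done.append(task)
            self.running[worker] = None
        self._assign()

    def _assign(self):
        # Give each idle worker the next pending task, if any remain.
        for worker, task in self.running.items():
            if task is None and self.pending:
                self.running[worker] = self.pending.popleft()


sched = Scheduler(["t0", "t1", "t2"])
sched.add_worker("a")      # "a" starts t0
sched.add_worker("b")      # "b" starts t1
sched.remove_worker("a")   # t0 returns to the queue
while sched.running["b"] is not None:
    sched.finish("b")      # "b" works through the remaining tasks
```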
