Arch

The parameter server architecture

Parameter server nodes are grouped into one server group and one or more worker groups.

Each server node in the server group maintains a partition of the globally shared parameters. Server nodes communicate with each other to replicate and/or migrate parameters for reliability and scaling.
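As a rough illustration of what "a partition of the globally shared parameters" can look like, the sketch below splits an integer key space into contiguous ranges, one per server node. Range partitioning is an assumption made here for simplicity; the page does not say how keys are mapped to servers. The names `key_ranges` and `owner` are hypothetical, and replication and migration are not modeled.

```python
def key_ranges(key_space: int, num_servers: int) -> list[tuple[int, int]]:
    """Split [0, key_space) into num_servers contiguous half-open ranges."""
    bounds = [key_space * i // num_servers for i in range(num_servers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def owner(key: int, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the server node whose range contains key."""
    for index, (low, high) in enumerate(ranges):
        if low <= key < high:
            return index
    raise KeyError(f"key {key} is outside the key space")


# Hypothetical setup: 10 parameter keys spread over 3 server nodes.
ranges = key_ranges(10, 3)
print(ranges)
print(owner(5, ranges))
```

With a mapping like this, a worker can work out which server node to contact for any given key without asking the other nodes.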

Each worker group runs an application. A worker typically stores a portion of the training data locally and uses it to compute local statistics such as gradients. Workers communicate only with the server nodes (not among themselves), updating and retrieving the shared parameters via push and pull.
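The push and pull interaction can be sketched with a single in-memory server and a worker that fits one weight by gradient descent. Everything here is illustrative: `ServerNode`, `Worker`, the learning rate, and the choice to apply gradients on the server are assumptions, not the project's actual API. Note that the worker only ever calls the server, never another worker.

```python
from collections import defaultdict


class ServerNode:
    """Holds shared parameters and applies pushed gradients (assumed SGD update)."""

    def __init__(self, learning_rate: float = 0.1):
        self.params = defaultdict(float)  # missing keys start at 0.0
        self.learning_rate = learning_rate

    def push(self, grads: dict[int, float]) -> None:
        # Apply one gradient step for every pushed key.
        for key, grad in grads.items():
            self.params[key] -= self.learning_rate * grad

    def pull(self, keys: list[int]) -> dict[int, float]:
        # Return the current values of the requested keys.
        return {key: self.params[key] for key in keys}


class Worker:
    """Keeps a local shard of (x, y) pairs and talks only to the server."""

    def __init__(self, server: ServerNode, data: list[tuple[float, float]]):
        self.server = server
        self.data = data

    def step(self) -> None:
        w = self.server.pull([0])[0]  # retrieve the shared weight
        # Gradient of the mean squared error of the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in self.data) / len(self.data)
        self.server.push({0: grad})   # send the local statistic


server = ServerNode(learning_rate=0.1)
worker = Worker(server, data=[(1.0, 2.0)])
for _ in range(2):
    worker.step()
print(server.pull([0]))
```

Adding a second `Worker` with its own data shard and the same `server` would give two workers that share parameters without ever communicating directly.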

There is a scheduler node for each worker group. It assigns tasks to workers and monitors their progress. If workers are added or removed, it reschedules unfinished tasks.
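A minimal sketch of the scheduler's rescheduling behaviour, assuming a round-robin assignment policy (the page does not specify one). The `Scheduler` class and its method names are hypothetical; progress monitoring is reduced to a set of finished tasks, and only worker removal is shown.

```python
class Scheduler:
    """Assigns tasks to workers and reassigns unfinished ones when a worker leaves."""

    def __init__(self, workers: list[str]):
        self.workers = list(workers)
        self.assignment: dict[str, str] = {}  # task -> worker
        self.finished: set[str] = set()

    def assign(self, tasks: list[str]) -> None:
        # Round-robin over the current workers (assumed policy).
        for i, task in enumerate(tasks):
            self.assignment[task] = self.workers[i % len(self.workers)]

    def mark_finished(self, task: str) -> None:
        self.finished.add(task)

    def remove_worker(self, worker: str) -> list[str]:
        """Drop a worker and hand its unfinished tasks to the remaining workers."""
        self.workers.remove(worker)
        orphaned = [
            task for task, owner in self.assignment.items()
            if owner == worker and task not in self.finished
        ]
        for i, task in enumerate(orphaned):
            self.assignment[task] = self.workers[i % len(self.workers)]
        return orphaned


scheduler = Scheduler(["w0", "w1"])
scheduler.assign(["t0", "t1", "t2", "t3"])
scheduler.mark_finished("t1")
print(scheduler.remove_worker("w1"))
```

Finished tasks keep their original assignment, so only work that is still outstanding moves to another worker.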
