Data
The parameter server supports flexible text formats, in which each example is represented by one ASCII line. For example:
label feature_id:weight feature_id:weight feature_id:weight ...
- label: float
- feature_id: int32
- weight: float
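As an illustration of the line layout above, here is a minimal Python sketch that splits one such line into a label and a feature map. The function name `parse_labeled_line` is hypothetical and is not part of the parameter server; the sketch assumes whitespace-separated tokens and does not check the int32 range.

```python
from typing import Dict, Tuple


def parse_labeled_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse 'label feature_id:weight ...' into (label, {feature_id: weight}).

    Hypothetical helper for illustration only.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])  # first token is the float label
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        # Each remaining token has the form feature_id:weight.
        feature_id, weight = token.split(":", 1)
        features[int(feature_id)] = float(weight)
    return label, features


label, features = parse_labeled_line("1 3:0.5 7:1.25 42:-2")
```

Here `label` is a float and `features` maps each integer feature id to its float weight.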
A more general format splits each line into slots:
label ...; group_id feature[:weight] feature[:weight] ...; group_id ...; ...
Each example has several slots, which are separated by semicolons. The first slot contains the label. There may be several labels (multi-label learning) or no label (unsupervised learning). The remaining slots each represent a feature group. A group starts with a nonzero int32 group id (0 is reserved for the label), followed by multiple feature-weight pairs. The meaning of these pairs depends on the data format:
- SPARSE_BINARY: the example is a sparse binary vector. feature is a 64-bit unsigned integer ID, and there is no weight.
- SPARSE: the example is a general sparse vector. feature is a 64-bit unsigned integer ID, and weight is a float value.
- DENSE: the example is a dense vector. feature is the float value, and there is no weight.
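The slot layout and the three data formats can be sketched in Python as follows. This is only an illustration under stated assumptions: `parse_slot_line` is a hypothetical name, the data format is passed as a plain string, tokens inside a slot are assumed to be whitespace-separated, and the result is returned as ordinary lists and tuples rather than the project's internal types.

```python
from typing import List, Tuple

Group = Tuple[int, List[int], List[float]]  # (group_id, keys, values)


def parse_slot_line(line: str, data_format: str) -> Tuple[List[float], List[Group]]:
    """Split a semicolon-separated example into labels and feature groups.

    Hypothetical helper for illustration only.
    """
    slots = line.split(";")
    # The first slot holds zero or more labels.
    labels = [float(tok) for tok in slots[0].split()]
    groups: List[Group] = []
    for slot in slots[1:]:
        tokens = slot.split()
        if not tokens:
            continue  # tolerate an empty slot, e.g. after a trailing ';'
        group_id = int(tokens[0])
        if group_id == 0:
            raise ValueError("group id 0 is reserved for the label")
        keys: List[int] = []
        values: List[float] = []
        for tok in tokens[1:]:
            if data_format == "SPARSE_BINARY":
                keys.append(int(tok))  # feature ID only, no weight
            elif data_format == "SPARSE":
                key, weight = tok.split(":", 1)
                keys.append(int(key))
                values.append(float(weight))
            elif data_format == "DENSE":
                values.append(float(tok))  # the token is the value itself
            else:
                raise ValueError(f"unknown data format: {data_format}")
        groups.append((group_id, keys, values))
    return labels, groups


sparse = parse_slot_line("1; 1 10:0.5 20:2; 2 7:1", "SPARSE")
binary = parse_slot_line("; 1 3 4", "SPARSE_BINARY")
dense = parse_slot_line("0 1; 1 0.5 0.25", "DENSE")
```

The three calls cover a single-label sparse example, an unlabeled binary example, and a two-label dense example.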
Internally, the parameter server uses protobuf to store the example:
message Slot {
  optional int32 id = 1;
  repeated uint64 key = 2 [packed=true];
  repeated float val = 3 [packed=true];
}
message Example {
  repeated Slot slot = 1;
}
To add a new text format, one first needs to add the format name to DataFormat, and then add a function to the class ExampleParser that converts a line of text into an Example.
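The two steps can be illustrated with a Python analogue. This is a sketch of the pattern, not the project's actual code: the Python classes below only mirror the names used on this page, and the `COMMA_DENSE` format, its `label,v1,v2,...` layout, the method `parse_comma_dense`, and the choice to store the label in slot 0 and the features in slot 1 are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


@dataclass
class Slot:  # mirrors the fields of message Slot
    id: int = 0
    key: List[int] = field(default_factory=list)
    val: List[float] = field(default_factory=list)


@dataclass
class Example:  # mirrors message Example
    slot: List[Slot] = field(default_factory=list)


class DataFormat(Enum):
    SPARSE = "sparse"
    COMMA_DENSE = "comma_dense"  # step 1: register the (hypothetical) new format name


class ExampleParser:
    def parse(self, fmt: DataFormat, line: str) -> Example:
        # Dispatch on the format name; only the new format is implemented here.
        if fmt is DataFormat.COMMA_DENSE:
            return self.parse_comma_dense(line)
        raise NotImplementedError(f"no parser for {fmt}")

    def parse_comma_dense(self, line: str) -> Example:
        # Step 2: convert one line of the hypothetical "label,v1,v2,..." format.
        fields = [float(x) for x in line.strip().split(",")]
        label_slot = Slot(id=0, val=fields[:1])    # assumed: label kept in slot 0
        feature_slot = Slot(id=1, val=fields[1:])  # assumed: dense values in slot 1
        return Example(slot=[label_slot, feature_slot])


example = ExampleParser().parse(DataFormat.COMMA_DENSE, "1,0.5,0.25")
```

Keeping one parsing function per format means that adding a format touches only the enum and one new method.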