
Large requests cause data corruption because of UDP message size limit #2

Open

pfons opened this issue Apr 22, 2016 · 1 comment

pfons commented Apr 22, 2016

There is a bug in the Chapar implementation that lets a client issue requests that cause corrupted packets to be sent to, and accepted by, servers. The problem is triggered when a client attempts to send a request larger than the UDP message size limit. When this happens, headers and partial request data remain in internal buffers and can be concatenated with subsequent requests.

This bug has several harmful consequences: (1) large requests are silently ignored; (2) small requests can be discarded because of preceding large requests; and (3) servers can accept requests that no client issued.

The following steps, which are tuned for Algorithm 2, reproduce the bug:

a. Construct three strings of sizes 60000, 65791, and 5711 bytes, denoted str1, str2, and str3.
b. It is convenient, though not strictly necessary, to change the "value" type to string to make the bug easier to reproduce.
c. Issue the following requests:

       Req 1: PUT  k0 "Hello World"
       Req 2: PUT  k1 str1
       Req 3: PUT  k1 str2
       Req 4: PUT  k1 str3

d. Simulate reordering of the two network packets sent when invoking Req 2 and Req 4.
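The setup in steps (a) and (c) can be sketched as follows. Here `put` is a hypothetical stand-in for the real Chapar client call; only the string sizes come from the reproduction recipe above.

```ocaml
(* Sketch of the reproduction setup; [put] is a hypothetical client
   call standing in for the actual Chapar client API. *)
let str1 = String.make 60000 'a'   (* fits in one UDP packet *)
let str2 = String.make 65791 'b'   (* exceeds the UDP payload limit *)
let str3 = String.make 5711  'c'

let reproduce put =
  put "k0" "Hello World";  (* Req 1: dummy request *)
  put "k1" str1;           (* Req 2: sent correctly *)
  put "k1" str2;           (* Req 3: too large, leaves garbage behind *)
  put "k1" str3            (* Req 4: concatenated with Req 3's leftovers *)
```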

After this sequence of steps, the receiver will concatenate the two packets and interpret them as a single request (one that does not match any of the PUTs issued). Furthermore, the server will wrongly consider the request valid and consequently update the key-value store incorrectly.

The root of this problem lies in the use of the function Marshal.to_channel (from the OCaml standard library) to send messages to other servers (file runtime.ml). Internally, Marshal.to_channel works as follows:

  1. While marshaling the components of the message, it makes several calls to the function caml_putblock [1], through the sequence of invocations: to_channel [3] -> caml_output_value [2] -> caml_output_val [2] -> caml_really_putblock [1] -> caml_putblock [1].
  2. On each invocation, caml_putblock appends the input data to an internal buffer.
  3. This buffer is flushed when (1) flush is explicitly called on the channel, or (2) inside caml_putblock, when new contents cannot be appended because the buffer would exceed its maximum capacity. Flushing the buffer consists of calling the write system call on the socket file descriptor. The internal buffer is larger than the UDP packet limit.
  4. When caml_putblock tries to flush the internal buffer while it is filled beyond the UDP limit, the write system call fails (as expected) with the error EMSGSIZE. A Unix_error exception is then thrown and caught by an exception handler, but crucially two things happen: (1) the internal buffer is not cleared (i.e., it retains the prefix of the message), and (2) nothing resends the suffix of the message that was discarded.
In practice, during runtime, the following happens behind the scenes for each of the reproduction steps:

     Req 1 is a dummy request that initializes internal data.
     Req 2 sends a correct packet (which the receiver subsequently appends to the packet sent by Req 4).
     Req 3 does not send any packet because the message is too long, and garbage data (headers + key) is left behind in the OCaml library's buffer.
     Req 4 sends an incorrect packet that concatenates (1) the garbage data left behind by Req 3 and (2) the data from Req 4.

Note that the sizes of the strings are carefully chosen to ensure that the packet headers match. In addition, reordering the packets is only necessary to ensure that the request delivered to the receiving server carries the causally consistent metadata that is expected.

Given the above, it is hard to use the to_channel function correctly without explicitly checking, before invoking it, that the marshaled message fits within the payload limit of a UDP packet.
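One way to implement such a check is to marshal into an in-memory string first and only write it to the channel if it fits, so that an oversized message is rejected before any bytes reach the channel's internal buffer. A minimal sketch (not the project's actual code; `send_chan_checked` and `Message_too_large` are hypothetical names):

```ocaml
(* Sketch of a size-checked send: marshal to a string first, so a
   too-large message can be rejected before polluting the channel's
   internal buffer. *)
let udp_max_payload = 65507

exception Message_too_large of int

let send_chan_checked chan m =
  let buf = Marshal.to_string m [] in
  let len = String.length buf in
  if len > udp_max_payload then raise (Message_too_large len)
  else begin
    output_string chan buf;  (* whole message fits in one datagram *)
    flush chan
  end
```

Because the marshaled bytes never enter the channel buffer when the check fails, no stale prefix can be concatenated with a later request.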

References:
[1] https://github.com/ocaml/ocaml/blob/trunk/byterun/io.c
[2] https://github.com/ocaml/ocaml/blob/trunk/byterun/extern.c
[3] https://github.com/ocaml/ocaml/blob/trunk/stdlib/marshal.ml

@MohsenLesani

The packet sizes in the experiments are small, as the value type used is int.

type valu = int

Further, the calls to Marshal.to_channel are immediately followed by a call to flush.

let send_chan env chan m nid =
  Marshal.to_channel chan m [];
  flush chan
