This repo contains the datasets and code for our paper:
"Near-Optimal Constrained Padding for Object Retrievals with Dependencies"
Pranay Jain, Andrew C. Reed, and Michael K. Reiter
33rd USENIX Security Symposium, August 2024
All of our executable code is written in Python, either in .py scripts or in Jupyter notebooks. To set up the required packages, we recommend using a Python virtual environment, such as conda
, virtualenv
or venv
. Once you have activated your virtual environment of choice, run the command:
pip install -r requirements.txt
For python3
environments, replace pip
with pip3
if needed. This will install all the packages required by the code in this repository.
Our code uses the Gurobi Optimizer. If you do not already have access to Gurobi, our recommendation is to obtain an Academic Named-User License. Instructions for how to obtain a Gurobi academic license can be found at:
Gurobi Academic License Program
Note on Gurobi License Installation:
The final step of setting up the Gurobi license requires you to run the command grbgetkey
with the serial number provided to you by Gurobi. This command must be executed while your computer is connected to your academic network.
If your computer is not connected to your academic network, then using a VPN connected to the academic network will work. However, sometimes VPNs are only configured for split-tunneling, i.e., only traffic destined for the academic network itself will be routed through the VPN. In those instances, if you have SSH access to a server located on the academic network, then here is a potential workaround:
- Connect to your academic network using your VPN.
- In a terminal window, use a program such as
sshuttle
to route all internet traffic through SSH. This can be done using the following command:~/anaconda3/bin/sshuttle -r username@servername 0/0 -x servername
. In the preceding command, adjust the path tosshuttle
according to your own development environment, and replaceusername
andservername
with your own credentials for the SSH server that you will be using. - In a second terminal window, run the command
grbgetkey
with the serial number provided to you by Gurobi. - Once the Gurobi license has been activated, you can close the
sshuttle
connection usingCTRL-C
in the first terminal window, and then disconnect from the VPN.
This repo is organized as follows:
- data/ - Our three datasets are located in subdirectories here.
- experiments/ - The Jupyter notebooks that produce our Figures (from the paper) are here.
- repo root dir - The implementation of our algorithm, as well as our implementations of the algorithms to which we compare, are located in the root directory of this repo.
We provide Jupyter notebooks that run each of our primary experiments from the paper. To run them, first start an instance of Jupyter Notebook and then navigate to the experiments/
directory and open the .ipynb file named after the figure that you'd like to run. Each notebook is designed to be run as-is, i.e., Kernel > Restart & Run All
.
Descriptions of our datasets can be found in Section 5.1 of our paper.
The function load_dataset
located in load_dataset.py
loads each of our datasets. For the purposes of running our experiments, everything is automated within the Jupyter notebooks (described above).
This is the proposed algorithm for producing near-optimal padding scheme for sequences. The experiments
directory shows how the model (and baselines) can be executed in notebooks. This section describes how to execute the PFS algorithm as a python executable. To run this algorithm, execute the main.py
code. Sample execution:
python main.py -d wikipedia -c 1.05
-
[Required] Dataset (use flag
-d
or--dataset
) : This selects the dataset to run the model against. The choices for this parameter are{'wikipedia', 'linode_from_index', 'autocomplete', 'synthetic'}
. Each of these datasets are explained below. The datasets are stored inside thedata
directory. -
[Required] Padding factor
c
(use flag-c
or--pad_factor
): This value sets the maximum padding factor allowed to the PFS or PWOD algorithms. The padding factor for an object of sizex
is defined aspad(x) / x
. We recommend a value forc
in{1.05, 1.25, 1.50, 2.00}
. -
[Optional] Stride count
k
(use flag-k
or--stride
): -
[Optional] Prefix closed (use flag
--prefix-closed
): This is a boolean argument which selects if the prefix-closed set of sequences is used as input to the model. The definition for prefix closure is provided in the paper.