This repository contains the official PyTorch implementation of the paper "A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data", accepted at ICML 2024.
[2024/10/12] DySymNet can now be installed via `pip install DySymNet`. A single command is all you need to start exploring expressions!
- DySymNet is a new search paradigm for symbolic regression (SR) that searches over symbolic networks with various architectures instead of searching for expressions in the large functional space.
- DySymNet has promising capabilities for solving high-dimensional problems and optimizing coefficients, which current SR methods lack.
- DySymNet outperforms state-of-the-art baselines on several standard SR benchmark datasets and on the well-known SRBench, whose problems involve more variables.
Install DySymNet using only one command:

```shell
pip install DySymNet
```
You can create and run the following script in any directory:

```python
# Demo.py
import numpy as np
from DySymNet import SymbolicRegression
from DySymNet.scripts.params import Params
from DySymNet.scripts.functions import *

# You can customize hyperparameters through the Params configuration
config = Params()

# such as the operators
funcs = [Identity(), Sin(), Cos(), Square(), Plus(), Sub(), Product()]
config.funcs_avail = funcs

# Example 1: Input ground truth expression
SR = SymbolicRegression.SymboliRegression(config=config, func="x_1**3 + x_1**2 + x_1", func_name="Nguyen-1")
eq, R2, error, relative_error = SR.solve_environment()
print('Expression: ', eq)
print('R2: ', R2)
print('error: ', error)
print('relative_error: ', relative_error)
print('log(1 + MSE): ', np.log(1 + error))
```
After the script runs, a folder named `results` is created in the current directory. It contains one subfolder per `func_name`, which records the logs of the run.
The main running script is `SymbolicRegression.py`, and runs are configured via `params.py`, which includes various hyperparameters of the controller RNN and the symbolic network. You can configure the following hyperparameters as required:
Symbolic network structure:

Parameters | Description | Example Values |
---|---|---|
funcs_avail | Operator library | See params.py |
n_layers | Candidate numbers of symbolic network layers | [2, 3, 4, 5] |
num_func_layer | Candidate numbers of neurons per symbolic network layer | [2, 3, 4, 5, 6] |
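To make the search space described by these three parameters concrete, the sketch below draws one architecture: a layer count from n_layers, a width for each layer from num_func_layer, and an operator for each neuron from funcs_avail. It assumes this is the shape of what the controller chooses, and it replaces the controller RNN with uniform random sampling purely for illustration. The operator names are plain strings standing in for the objects in functions.py, and sample_architecture is a hypothetical helper.

```python
import random
from dataclasses import dataclass

# Example values from the table above; operator names stand in for the
# operator objects (Identity(), Sin(), ...) defined in functions.py.
N_LAYERS = [2, 3, 4, 5]
NUM_FUNC_LAYER = [2, 3, 4, 5, 6]
FUNCS_AVAIL = ["identity", "sin", "cos", "square", "plus", "sub", "product"]


@dataclass
class Architecture:
    layers: list  # one list of operator names per symbolic network layer


def sample_architecture(rng: random.Random) -> Architecture:
    """Uniformly sample one architecture from the search space.

    Hypothetical stand-in for the controller RNN: every choice is uniform,
    so nothing is learned here.
    """
    n_layers = rng.choice(N_LAYERS)
    layers = []
    for _ in range(n_layers):
        width = rng.choice(NUM_FUNC_LAYER)  # number of neurons in this layer
        layers.append([rng.choice(FUNCS_AVAIL) for _ in range(width)])
    return Architecture(layers)


arch = sample_architecture(random.Random(0))
for i, layer in enumerate(arch.layers, start=1):
    print(f"layer {i}: {layer}")
```

In DySymNet itself, the controller RNN is trained with the reward of each sampled network so that it gradually favours better architectures.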
Note: You can add new operators to `functions.py`, using the existing operators as templates, and place them in `funcs_avail` to use them.
Controller RNN:

Parameters | Description | Example Values |
---|---|---|
num_epochs | Number of sampling epochs | 500 |
batch_size | Batch size for sampling | 10 |
optimizer | Optimizer for training the RNN | Adam |
hidden_size | Hidden dimension of the RNN layer | 32 |
embedding_size | Embedding dimension | 16 |
learning_rate1 | Learning rate for training the RNN | 0.0006 |
risk_seeking | Whether to use the risk-seeking policy gradient | True |
risk_factor | Risk factor | 0.5 |
entropy_weight | Entropy weight | 0.005 |
reward_type | Loss type for computing the reward | mse |
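risk_seeking and risk_factor refer to a risk-seeking policy gradient. The sketch below follows the formulation popularised by Deep Symbolic Regression (DSR): only samples whose reward reaches the (1 - ε) quantile of the batch contribute, each weighted by how far it exceeds that threshold, and an entropy bonus scaled by entropy_weight is added. Treating risk_factor as ε is an assumption (with the example value 0.5 the threshold is the batch median either way). The function names are hypothetical, and plain floats stand in for the autograd tensors a real training loop would use.

```python
import math


def risk_seeking_threshold(rewards, risk_factor=0.5):
    """Empirical (1 - risk_factor) quantile of a batch of rewards."""
    ordered = sorted(rewards)
    idx = min(len(ordered) - 1, math.floor((1 - risk_factor) * len(ordered)))
    return ordered[idx]


def risk_seeking_loss(rewards, log_probs, entropies,
                      risk_factor=0.5, entropy_weight=0.005):
    """DSR-style risk-seeking surrogate loss (to be minimised).

    Only samples whose reward reaches the threshold contribute, each weighted
    by (reward - threshold); the entropy bonus scaled by entropy_weight is
    subtracted to encourage exploration.
    """
    threshold = risk_seeking_threshold(rewards, risk_factor)
    kept = [i for i, r in enumerate(rewards) if r >= threshold]
    pg_term = sum((rewards[i] - threshold) * log_probs[i] for i in kept) / len(kept)
    entropy_term = sum(entropies[i] for i in kept) / len(kept)
    return -pg_term - entropy_weight * entropy_term


# A batch of 10 sampled architectures (batch_size = 10) with made-up rewards.
rewards = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2, 0.8, 0.5, 0.6, 1.0]
print(risk_seeking_threshold(rewards, risk_factor=0.5))  # 0.6: the top half is kept
```

Focusing the gradient on the best samples optimises for best-case rather than average performance, which suits a search where only the single best expression is reported.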
Symbolic network training:

Parameters | Description | Example Values |
---|---|---|
learning_rate2 | Learning rate for training the symbolic network | 0.01 |
reg_weight | Regularization weight | 5e-3 |
threshold | Pruning threshold | 0.05 |
trials | Number of training trials for the symbolic network | 1 |
n_epochs1 | Epochs for the first training stage | 10001 |
n_epochs2 | Epochs for the second training stage | 10001 |
summary_step | Print a summary every n training steps | 1000 |
clip_grad | Whether to use adaptive gradient clipping | True |
max_norm | Norm threshold for gradient clipping | 1.0 |
window_size | Window size for adaptive gradient clipping | 50 |
refine_constants | Whether to refine constants | True |
n_restarts | Number of restarts for BFGS optimization | 1 |
add_bias | Whether to add bias terms | False |
verbose | Whether to print the training process | True |
use_gpu | Whether to use CUDA | False |
plot_reward | Whether to plot the reward curve | False |
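The table does not spell out the adaptive clipping rule, so the sketch below is one plausible reading rather than DySymNet's implementation: remember the gradient norms of the last window_size steps and clip each new gradient to their mean, falling back to the fixed max_norm until the window has filled. AdaptiveGradClipper is a hypothetical name, and gradients are represented as plain lists of floats.

```python
import math
from collections import deque


class AdaptiveGradClipper:
    """Hypothetical adaptive gradient clipper (not DySymNet's exact rule).

    Keeps the norms of the last `window_size` gradients and clips each new
    gradient to their mean; until the window is full, `max_norm` is used.
    """

    def __init__(self, max_norm=1.0, window_size=50):
        self.max_norm = max_norm
        self.history = deque(maxlen=window_size)  # recent gradient norms

    def current_limit(self):
        if len(self.history) < self.history.maxlen:
            return self.max_norm
        return sum(self.history) / len(self.history)

    def clip(self, grad):
        """Return a (possibly rescaled) copy of `grad` and its original L2 norm."""
        norm = math.sqrt(sum(g * g for g in grad))
        limit = self.current_limit()
        if norm > limit:
            grad = [g * (limit / norm) for g in grad]  # rescale to norm == limit
        self.history.append(norm)
        return list(grad), norm


clipper = AdaptiveGradClipper(max_norm=1.0, window_size=50)
clipped, norm = clipper.clip([3.0, 4.0])  # norm 5.0 exceeds max_norm, rescaled to 1.0
```

In a PyTorch training loop, the value of current_limit() could instead be passed to `torch.nn.utils.clip_grad_norm_`, which rescales the parameters' gradients in place and returns their total norm.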
Note: `threshold` controls the complexity of the final expression and trades off complexity against precision, so you can customize it according to your actual requirements.
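A minimal sketch of how a magnitude threshold can control complexity, assuming pruning simply zeroes every weight with |w| < threshold. prune_weights is a hypothetical helper, and the symbolic network's weights are represented as nested lists. Roughly speaking, each surviving weight keeps a term in the extracted expression, so a larger threshold gives a shorter, but possibly less accurate, expression.

```python
def prune_weights(weights, threshold=0.05):
    """Zero out weights whose magnitude is below `threshold`.

    Returns the pruned matrix and the number of non-zero weights left, a rough
    proxy for the complexity of the extracted expression.
    """
    pruned = [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]
    survivors = sum(1 for row in pruned for w in row if w != 0.0)
    return pruned, survivors


W = [[1.2, 0.03, -0.4],
     [0.049, -0.06, 0.0]]
print(prune_weights(W, threshold=0.05)[1])  # 3 weights survive
print(prune_weights(W, threshold=0.5)[1])   # 1 weight survives
```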
Dataset:

Parameters | Description | Example Values |
---|---|---|
N_TRAIN | Size of the input (training) data | 100 |
N_VAL | Size of the validation dataset | 100 |
NOISE | Standard deviation of the noise in the input data | 0 |
DOMAIN | Domain of the input data | (-1, 1) |
N_TEST | Size of the test dataset | 100 |
DOMAIN_TEST | Domain of the test dataset | (-1, 1) |
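For intuition about N_TRAIN, DOMAIN and NOISE, here is a minimal sketch of how such a dataset can be sampled: inputs are drawn uniformly from the domain, targets are computed from a known function, and zero-mean Gaussian noise with the given standard deviation is added. Adding the noise to the targets (rather than to the inputs) is an assumption, and sample_dataset is a hypothetical helper, not DySymNet's own data generator.

```python
import random


def sample_dataset(f, n_vars, n_samples=100, domain=(-1, 1), noise=0.0, seed=None):
    """Sample inputs uniformly from `domain` and evaluate the target `f`.

    Gaussian noise with standard deviation `noise` is added to each target
    (an assumption; noise=0 gives exact targets).
    """
    rng = random.Random(seed)
    low, high = domain
    X = [[rng.uniform(low, high) for _ in range(n_vars)] for _ in range(n_samples)]
    y = [f(*x) + (rng.gauss(0.0, noise) if noise > 0 else 0.0) for x in X]
    return X, y


def nguyen1(x1):
    return x1**3 + x1**2 + x1


X_train, y_train = sample_dataset(nguyen1, n_vars=1, n_samples=100, domain=(-1, 1), seed=0)
X_test, y_test = sample_dataset(nguyen1, n_vars=1, n_samples=100, domain=(-1, 1), seed=1)
```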
`results_dir` configures the save path for all results.
We provide two ways to perform symbolic regression tasks.
When you want to discover an expression whose ground truth is known, for example to test a standard benchmark, you can edit the script `SymbolicRegression.py` as follows:
```python
# SymbolicRegression.py
params = Params()  # configuration for a specific task
ground_truth_eq = "x_1 + x_2"  # variable names should be written as x_i, where i >= 1
eq_name = "x_1+x_2"
# A new folder named after func_name will be created to store the result files.
SR = SymbolicRegression(config=params, func=ground_truth_eq, func_name=eq_name)
eq, R2, error, relative_error = SR.solve_environment()  # return results
```
In this way, the function `generate_data` is used to automatically generate the corresponding dataset.
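Because the ground truth is passed as a string with variables named x_1, x_2, ..., something has to turn that string into a function before data can be generated from it. The sketch below shows one way to do that with only the standard library; it is not DySymNet's `generate_data`, and the helper name expr_to_callable is hypothetical. It uses `eval` with a restricted namespace, so it should only be given trusted expressions.

```python
import math
import re


def expr_to_callable(expr):
    """Turn a ground-truth string such as "x_1**3 + x_1**2 + x_1" into a function.

    Variables must be named x_1, x_2, ... (i >= 1). Returns the function, which
    takes the variables positionally, and the number of variables.
    """
    indices = sorted({int(i) for i in re.findall(r"\bx_(\d+)\b", expr)})
    if not indices or indices[0] < 1:
        raise ValueError("variables must be named x_1, x_2, ...")
    n_vars = indices[-1]
    code = compile(expr, "<ground_truth>", "eval")
    # Only a few math functions are exposed; builtins are disabled.
    allowed = {name: getattr(math, name) for name in ("sin", "cos", "exp", "log", "sqrt")}

    def f(*xs):
        if len(xs) != n_vars:
            raise ValueError(f"expected {n_vars} inputs, got {len(xs)}")
        env = {f"x_{i}": v for i, v in enumerate(xs, start=1)}
        return eval(code, {"__builtins__": {}}, {**allowed, **env})

    return f, n_vars


f, n_vars = expr_to_callable("x_1 + x_2")
print(n_vars, f(1.0, 2.0))  # 2 3.0
```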
Then, you can run `SymbolicRegression.py` directly, or you can run it in the terminal as follows:

```shell
python SymbolicRegression.py
```
After running this script, the results will be stored in `./results/test/func_name`.
When you only have observed data and do not know the ground truth, you can perform symbolic regression by entering the path to the csv data file:
```python
# SymbolicRegression.py
params = Params()  # configuration for a specific task
data_path = './data/Nguyen-1.csv'  # the data file should be in CSV format
# func_name can be any name you like; it names the results folder.
SR = SymbolicRegression(config=params, func_name='Nguyen-1', data_path=data_path)
eq, R2, error, relative_error = SR.solve_environment()  # return results
```
Note: the data file should contain (
Then, you can run `SymbolicRegression.py` directly, or you can run it in the terminal as follows:

```shell
python SymbolicRegression.py
```
After running this script, the results will be stored in `./results/test/func_name`.
Once the script stops early or finishes, it prints output like the following (here for the ground truth `x_1 + x_2`):

```text
Expression: x_1 + x_2
R2: 1.0
error: 4.3591795754679974e-13
relative_error: 2.036015757767018e-06
log(1 + MSE): 4.3587355946774144e-13
```
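The R2 and error values can be reproduced from predictions with the standard definitions; the demo script treats `error` as the MSE and prints `np.log(1 + error)`. The sketch below assumes nothing else about DySymNet's internals, regression_metrics is a hypothetical helper, and relative_error is left out because its definition is not given here. It uses `math.log1p`, which stays accurate for tiny errors: in the output above, `np.log(1 + error)` prints about 4.35874e-13 rather than about 4.35918e-13 because `1 + error` is rounded to double precision before the logarithm is taken.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, MSE, log(1 + MSE)) using the standard definitions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse, math.log1p(mse)  # log1p avoids rounding in 1 + mse


r2, mse, log_mse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, mse)  # 0.8 0.25
```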
If you find our work and this codebase helpful, please consider starring this repo and citing:
```bibtex
@inproceedings{li2024a,
  title={A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data},
  author={Wenqiang Li and Weijun Li and Lina Yu and Min Wu and Linjun Sun and Jingyi Liu and Yanjie Li and Shu Wei and Deng Yusong and Meilan Hao},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=IejxxE9DO2}
}
```