Merge pull request #22 from roycoding/0.4.0
0.4.0
roycoding authored Jan 13, 2020
2 parents 16cfb43 + 6f99968 commit 38f7681
Showing 8 changed files with 235 additions and 137 deletions.
49 changes: 34 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,46 +1,56 @@
# slots
### *A multi-armed bandit library for Python*

## *A multi-armed bandit library for Python*

Slots is intended to be a basic, very easy-to-use multi-armed bandit library for Python.

[![PyPI](https://img.shields.io/pypi/v/slots)](https://pypi.org/project/slots/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/slots)](https://pypi.org/project/slots/)
[![Downloads](https://pepy.tech/badge/slots)](https://pepy.tech/project/slots)

#### Author
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![type hints with mypy](https://img.shields.io/badge/type%20hints-mypy-brightgreen)](http://mypy-lang.org/)

### Author

[Roy Keyes](https://roycoding.github.io) -- roy.coding@gmail

#### License: MIT
See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)
### License: MIT

See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)

### Introduction

slots is a Python library designed to allow the user to explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with *n* choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest result over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration", testing different arms in order to find the best, and "exploitation", using the best known choice. There are many variations of this problem; see [here](https://en.wikipedia.org/wiki/Multi-armed_bandit) for more background.
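
To make the exploration/exploitation trade-off concrete, here is a minimal, library-independent sketch of an epsilon greedy rule. It is not the slots implementation; the function names, the `epsilon` default, and the treatment of untried arms are assumptions made for illustration.

```python
import random


def epsilon_greedy_choice(wins, pulls, epsilon=0.1, rng=random):
    """Return an arm index: explore with probability epsilon, else exploit."""
    n_arms = len(pulls)
    if rng.random() < epsilon:
        # Exploration: pick any arm uniformly at random.
        return rng.randrange(n_arms)
    # Exploitation: pick the arm with the highest estimated win rate.
    # Assumption: arms that were never pulled get an estimate of 0.0.
    rates = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
    return max(range(n_arms), key=rates.__getitem__)


def simulate(true_probs, n_trials=5000, epsilon=0.1, seed=0):
    """Pull simulated arms with the given win probabilities."""
    rng = random.Random(seed)
    wins = [0] * len(true_probs)
    pulls = [0] * len(true_probs)
    for _ in range(n_trials):
        arm = epsilon_greedy_choice(wins, pulls, epsilon, rng)
        pulls[arm] += 1
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
    return wins, pulls
```

Calling `simulate([0.8, 0.7, 0.2])` returns per-arm win and pull counts; a larger `epsilon` spends more pulls on exploration, a smaller one commits to the current best estimate sooner.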

slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:

Using slots to determine the best of 3 variations on a live website.

```Python
import slots

mab = slots.MAB(3, live=True)
```

Make the first choice randomly, record the response, and input the reward. In this example, arm 2 was chosen. Run the online trial (inputting the most recent result) until the test criterion is met.

```Python
mab.online_trial(bandit=2,payout=1)
```

The response of `mab.online_trial()` is a dict of the form:

```Python
{'new_trial': boolean, 'choice': int, 'best': int}
```

Where:

- If the criterion is met, `new_trial` = `False`.
- `choice` is the current choice of arm to try.
- `best` is the current best estimate of the highest payout arm.
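
The request/response cycle described above could be driven by a loop like the following sketch. It relies only on the `online_trial(bandit=..., payout=...)` call and the three response keys shown above; `run_live_test`, `observe_reward`, `max_rounds`, and the `StubMAB` class are hypothetical names introduced here so the example runs without a live site.

```python
def run_live_test(mab, observe_reward, first_choice, max_rounds=1000):
    """Feed results back into the bandit until it reports that no new trial is needed."""
    result = {"new_trial": True, "choice": first_choice, "best": first_choice}
    for _ in range(max_rounds):
        arm = result["choice"]
        # observe_reward is a caller-supplied function: show variation `arm`
        # to a visitor and return the payout (e.g. 1 for a win, 0 otherwise).
        result = mab.online_trial(bandit=arm, payout=observe_reward(arm))
        if not result["new_trial"]:
            break  # the test criterion has been met
    return result["best"]


class StubMAB:
    """Hypothetical stand-in for slots.MAB, used only to exercise the loop."""

    def __init__(self, stop_after=5):
        self.calls = []
        self.stop_after = stop_after

    def online_trial(self, bandit, payout):
        self.calls.append((bandit, payout))
        return {
            "new_trial": len(self.calls) < self.stop_after,
            "choice": (bandit + 1) % 3,  # simply cycle through 3 arms
            "best": 0,
        }
```

In real use, the stub would be replaced by the `mab` object created with `slots.MAB(3, live=True)`, and `observe_reward` by whatever records a visitor's response.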


To test strategies on arms with pre-set probabilities:

```Python
@@ -50,28 +60,31 @@ b.run()
```

To inspect the results and compare the estimated win probabilities versus the true win probabilities:

```Python
# Current best guess
b.best()
> 0

# Assuming payout of 1.0 for all "wins"
b.est_payouts()
# Estimate of the payout probabilities
b.est_probs()
> array([ 0.83888149, 0.78534031, 0.32786885])

# Ground truth payout probabilities (if known)
b.bandits.probs
> [0.8020877268854065, 0.7185844454955193, 0.16348877912363646]
```

By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax, upper confidence bound (UCB1), and Bayesian bandit strategies are also implemented.
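
For comparison with epsilon greedy, the textbook UCB1 selection rule can be sketched as below. This is the standard formulation (mean reward plus a `sqrt(2 ln N / n_i)` bonus), written independently of slots; how the library implements it may differ, and the function name is an assumption.

```python
import math


def ucb1_choice(wins, pulls):
    """Return the arm index with the highest UCB1 score."""
    # Play every arm once before applying the confidence bound.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    total = sum(pulls)
    scores = [
        w / n + math.sqrt(2 * math.log(total) / n)  # mean + exploration bonus
        for w, n in zip(wins, pulls)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus term shrinks as an arm is pulled more often, so rarely tried arms are revisited without a separate random exploration step.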

#### Regret analysis

A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects the fraction of payouts (wins) that have been lost by using the actual sequence of pulls instead of always pulling the currently best known arm. The current regret value can be calculated by calling the `mab.regret()` method.
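
As a rough illustration of the idea, regret can be computed from per-arm win and pull counts as in the sketch below. The function name, the use of the best *estimated* win rate, and the normalization by total pulls are assumptions for this example, not a statement of what `mab.regret()` does internally.

```python
def estimated_regret(wins, pulls):
    """Fraction of payouts lost relative to always pulling the best known arm."""
    total_pulls = sum(pulls)
    if total_pulls == 0:
        return 0.0  # nothing has been pulled yet, so nothing has been lost
    # Best estimated win rate among arms that have been tried at least once.
    best_rate = max(w / p for w, p in zip(wins, pulls) if p)
    expected_best = total_pulls * best_rate  # payout if only the best arm were pulled
    return (expected_best - sum(wins)) / total_pulls
```

With `wins=[8, 2]` and `pulls=[10, 10]`, the best estimated rate is 0.8, the expected payout from the best arm alone is 16, the actual payout is 10, and the regret is 6 / 20 = 0.3.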

For example, the regret curves for several different MAB strategies can be generated as follows:
```Python

```Python
import matplotlib.pyplot as plt
import seaborn as sns
import slots

# Test multiple strategies for the same bandit probabilities
@@ -97,8 +110,7 @@ for t in range(10000):
s['regret'].append(s['mab'].regret())

# Pretty plotting
sns.set_style('whitegrid')
sns.set_context('poster')
plt.style.use(['seaborn-poster','seaborn-whitegrid'])

plt.figure(figsize=(15,4))

@@ -111,22 +123,29 @@ plt.ylabel('Regret')
plt.title('Multi-armed bandit strategy performance (slots)')
plt.ylim(0,0.2);
```
![](./misc/regret_plot.png)

![Regret plot](./misc/regret_plot.png)

### API documentation
For documentation on the slots API, see [slots-docs.md](https://github.com/roycoding/slots/blob/master/docs/slots-docs.md).

For documentation on the slots API, see [slots-docs.md](https://github.com/roycoding/slots/blob/master/docs/slots-docs.md).

### Todo list:

- More MAB strategies
- Argument to save regret values after each trial in an array.
- TESTS!

### Contributing

I welcome contributions, though the pace of development is highly variable. Please file issues and sumbit pull requests as makes sense.
I welcome contributions, though the pace of development is highly variable. Please file issues and submit pull requests as makes sense.

The current development environment uses:

- pytest >= 5.3 (5.3.2)
- black >= 19.1 (19.10b0)
- mypy = 0.761

You can pip install these easily by including `dev-requirements.txt`.

For mypy config, see `mypy.ini`. For black config, see `pyproject.toml`.
3 changes: 3 additions & 0 deletions dev-requirements.txt
@@ -0,0 +1,3 @@
mypy>=0.761
black>=19.10b0
pytest>=5.3.2
35 changes: 19 additions & 16 deletions docs/slots-docs.md
@@ -13,46 +13,49 @@ This documents details the current and planned API for slots. Non-implemented fe
1. Current choice
2. number of trials completed for each arm
3. scores for each arm
4. average payout per arm (payout*wins/trials?)
4. average payout per arm (wins/trials?)
5. Current regret. Regret = Trials*mean_max - sum^T_t=1(reward_t)
- See [ref](http://research.microsoft.com/en-us/um/people/sebubeck/SurveyBCB12.pdf)
6. Use sane defaults.
7. Be obvious and clean.
8. For the time being handle only binary payouts.

### Library API ideas:
#### Running slots with a live website
```Python
# Using slots to determine the best of 3 variations on a live website. 3 is the default.
# Using slots to determine the best of 3 variations on a live website. 3 is the default number of bandits and epsilon greedy is the default strategy.
mab = slots.MAB(3, live=True)

# Make the first choice randomly, record responses, and input reward
# 2 was chosen.
# Run online trial (input most recent result) until test criteria is met.
# Update online trial (input most recent result) until test criteria is met.
mab.online_trial(bandit=2,payout=1)

# Response of mab.online_trial() is a dict of the form:
{'new_trial': boolean, 'choice': int, 'best': int}

# Where:
# If the criterion is met, new_trial = False.
# choice is the current choice of arm to try.
# choice is the current choice of arm to try next.
# best is the current best estimate of the highest payout arm.
```

#### Creating a MAB test instance:

```Python
# Default: 3 bandits with random p_i and pay_i = 1
mab = slots.MAB(live=False)
# Default: 3 bandits with random probabilities, p_i.
mab = slots.MAB()

# Set up 4 bandits with random p_i and pay_i
mab = slots.MAB(4, live=False)
# Set up 4 bandits with random p_i.
mab = slots.MAB(4)

# 4 bandits with specified p_i
mab = slots.MAB(probs = [0.2,0.1,0.4,0.1], live=False)
mab = slots.MAB(probs = [0.2,0.1,0.4,0.1])

# 3 bandits with specified pay_i
mab = slots.MAB(payouts = [1,10,15], live=False)
# Creating 3 bandits with historical payout data
mab = slots.MAB(3, hist_payouts = np.array([[0,0,1,...],
[1,0,0,...],
[0,0,0,...]]))
```

#### Running tests with strategy, S
@@ -98,8 +101,8 @@ mab.bandits.reset()

# Set probabilities or payouts
# (NOT YET IMPLEMENTED)
mab.bandits.probs_set([0.1,0.05,0.2,0.15])
mab.bandits.payouts_set([1,1.5,0.5,0.8])
mab.bandits.set_probs([0.1,0.05,0.2,0.15])
mab.bandits.set_hist_payouts([[1,1,0,0],[0,1,0,0]])
```

#### Displaying / retrieving test info
@@ -114,10 +117,10 @@ mab.prob_est()

# Retrieve bandit probability estimate of bandit i
# (NOT YET IMPLEMENTED)
mab.prob_est(i)
mab.est_prob(i)

# Retrieve bandit payout estimates (p * payout)
mab.est_payout()
# Retrieve bandit probability estimates
mab.est_probs()

# Retrieve current bandit choice
# (NOT YET IMPLEMENTED, use mab.choices[-1])
6 changes: 6 additions & 0 deletions mypy.ini
@@ -0,0 +1,6 @@
[mypy]
disallow_untyped_calls = True
disallow_untyped_defs = True

[mypy-numpy]
ignore_missing_imports = True
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -0,0 +1,2 @@
[tool.black]
line-length = 79
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,4 +1,4 @@
[bdist_wheel]
# This flag says that the code is written to work on both Python 2 and Python
# 3.
universal=1

6 changes: 3 additions & 3 deletions setup.py
@@ -16,7 +16,7 @@
setup(
name='slots',

version='0.3.1',
version='0.4.0',

description='A multi-armed bandit library for Python',
long_description=long_description,
@@ -50,9 +50,9 @@

# Specify the Python versions you support here. In particular, ensure
# that you indicate whether you support Python 2, Python 3 or both.
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
],

# What does your project relate to?