Merge pull request #22 from roycoding/0.4.0

0.4.0
roycoding · Jan 13, 2020 · 38f7681
2 parents 16cfb43 + 6f99968

Showing 8 changed files with 235 additions and 137 deletions.
diff --git a/README.md b/README.md
@@ -1,46 +1,56 @@
 # slots
-### *A multi-armed bandit library for Python*
+
+## *A multi-armed bandit library for Python*
 
 Slots is intended to be a basic, very easy-to-use multi-armed bandit library for Python.
 
 [![PyPI](https://img.shields.io/pypi/v/slots)](https://pypi.org/project/slots/)
-[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/slots)](https://pypi.org/project/slots/)
 [![Downloads](https://pepy.tech/badge/slots)](https://pepy.tech/project/slots)
 
-#### Author
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![type hints with mypy](https://img.shields.io/badge/type%20hints-mypy-brightgreen)](http://mypy-lang.org/)
+
+### Author
+
 [Roy Keyes](https://roycoding.github.io) -- roy.coding@gmail
 
-#### License: MIT
-See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)
+### License: MIT
 
+See [LICENSE.txt](https://github.com/roycoding/slots/blob/master/LICENSE.txt)
 
 ### Introduction
+
 slots is a Python library designed to allow the user to explore and use simple multi-armed bandit (MAB) strategies. The basic concept behind the multi-armed bandit problem is that you are faced with *n* choices (e.g. slot machines, medicines, or UI/UX designs), each of which results in a "win" with some unknown probability. Multi-armed bandit strategies are designed to let you quickly determine which choice will yield the highest result over time, while reducing the number of tests (or arm pulls) needed to make this determination. Typically, MAB strategies attempt to strike a balance between "exploration", testing different arms in order to find the best, and "exploitation", using the best known choice. There are many variations of this problem; see [here](https://en.wikipedia.org/wiki/Multi-armed_bandit) for more background.
 
 slots provides a hopefully simple API to allow you to explore, test, and use these strategies. Basic usage looks like this:
 
 Using slots to determine the best of 3 variations on a live website.
+
 ```Python
 import slots
 
 mab = slots.MAB(3, live=True)
 ```
 
 Make the first choice randomly, record the response, and input the reward (here, arm 2 was chosen). Run the online trial (inputting the most recent result) until the test criterion is met.
+
 ```Python
 mab.online_trial(bandit=2,payout=1)
 ```
 
 The response of `mab.online_trial()` is a dict of the form:
+
 ```Python
 {'new_trial': boolean, 'choice': int, 'best': int}
 ```
+
 Where:
+
 - If the criterion is met, `new_trial` = `False`.
 - `choice` is the current choice of arm to try.
 - `best` is the current best estimate of the highest payout arm.
 
-
 To test strategies on arms with pre-set probabilities:
 
 ```Python
@@ -50,28 +60,31 @@ b.run()
 ```
 
 To inspect the results and compare the estimated win probabilities versus the true win probabilities:
+
 ```Python
+# Current best guess
 b.best()
 > 0
 
-# Assuming payout of 1.0 for all "wins"
-b.est_payouts()
+# Estimate of the payout probabilities
+b.est_probs()
 > array([ 0.83888149,  0.78534031,  0.32786885])
 
+# Ground truth payout probabilities (if known)
 b.bandits.probs
 > [0.8020877268854065, 0.7185844454955193, 0.16348877912363646]
 ```
 
 By default, slots uses the epsilon greedy strategy. Besides epsilon greedy, the softmax, upper confidence bound (UCB1), and Bayesian bandit strategies are also implemented.
 
 #### Regret analysis
+
 A common metric used to evaluate the relative success of a MAB strategy is "regret". This reflects the fraction of payouts (wins) that have been lost by using the sequence of pulls versus the currently best known arm. The current regret value can be calculated by calling the `mab.regret()` method.
 
 For example, the regret curves for several different MAB strategies can be generated as follows:
-```Python
 
+```Python
 import matplotlib.pyplot as plt
-import seaborn as sns
 import slots
 
 # Test multiple strategies for the same bandit probabilities
@@ -97,8 +110,7 @@ for t in range(10000):
         s['regret'].append(s['mab'].regret())
 
 # Pretty plotting
-sns.set_style('whitegrid')
-sns.set_context('poster')
+plt.style.use(['seaborn-poster','seaborn-whitegrid'])
 
 plt.figure(figsize=(15,4))
 
@@ -111,22 +123,29 @@ plt.ylabel('Regret')
 plt.title('Multi-armed bandit strategy performance (slots)')
 plt.ylim(0,0.2);
 ```
-![](./misc/regret_plot.png)
+
+![Regret plot](./misc/regret_plot.png)
 
 ### API documentation
-For documentation on the slots API, see [slots-docs.md](https://github.com/roycoding/slots/blob/master/docs/slots-docs.md).
 
+For documentation on the slots API, see [slots-docs.md](https://github.com/roycoding/slots/blob/master/docs/slots-docs.md).
 
 ### Todo list:
+
 - More MAB strategies
 - Argument to save regret values after each trial in an array.
 - TESTS!
 
 ### Contributing
 
-I welcome contributions, though the pace of development is highly variable. Please file issues and sumbit pull requests as makes sense.
+I welcome contributions, though the pace of development is highly variable. Please file issues and submit pull requests as makes sense.
 
 The current development environment uses:
 
 - pytest >= 5.3 (5.3.2)
 - black >= 19.1 (19.10b0)
+- mypy >= 0.761
+
+You can install these with `pip install -r dev-requirements.txt`.
+
+For mypy config, see `mypy.ini`. For black config, see `pyproject.toml`.
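
Note on the README text above: it states that slots uses the epsilon greedy strategy by default. The following is a minimal, standalone sketch of that general technique for binary payouts, not the code in slots itself. The names `epsilon_greedy_choice` and `simulate`, the `epsilon=0.1` default, and the rule of trying each arm once before exploiting are all assumptions made for illustration.

```python
import random


def epsilon_greedy_choice(wins, pulls, epsilon=0.1, rng=random):
    """Return the index of the arm to pull next (hypothetical helper).

    wins[i] and pulls[i] are the win count and pull count of arm i.
    """
    n_arms = len(pulls)
    # Assumption: try every arm once before trusting any estimate.
    untried = [i for i in range(n_arms) if pulls[i] == 0]
    if untried:
        return rng.choice(untried)
    # Explore: with probability epsilon, pick an arm uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(n_arms)
    # Exploit: otherwise pick the arm with the highest estimated win rate.
    return max(range(n_arms), key=lambda i: wins[i] / pulls[i])


def simulate(probs, trials=5000, epsilon=0.1, seed=0):
    """Run epsilon greedy against arms with known win probabilities."""
    rng = random.Random(seed)
    wins = [0] * len(probs)
    pulls = [0] * len(probs)
    for _ in range(trials):
        arm = epsilon_greedy_choice(wins, pulls, epsilon, rng)
        payout = 1 if rng.random() < probs[arm] else 0  # binary payout
        wins[arm] += payout
        pulls[arm] += 1
    return wins, pulls


if __name__ == "__main__":
    wins, pulls = simulate([0.8, 0.5, 0.2])
    print("pulls per arm:", pulls)
```

With `epsilon=0` the helper always exploits, which makes the exploit branch easy to check by hand; larger values trade some payout for more exploration.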
diff --git a/dev-requirements.txt b/dev-requirements.txt
@@ -0,0 +1,3 @@
+mypy>=0.761
+black>=19.10b0
+pytest>=5.3.2
diff --git a/docs/slots-docs.md b/docs/slots-docs.md
@@ -13,46 +13,49 @@ This documents details the current and planned API for slots. Non-implemented fe
     1. Current choice
     2. number of trials completed for each arm
     3. scores for each arm
-    4. average payout per arm (payout*wins/trials?)
+    4. average payout per arm (wins/trials?)
     5. Current regret.  Regret = Trials*mean_max - sum^T_t=1(reward_t)
         - See [ref](http://research.microsoft.com/en-us/um/people/sebubeck/SurveyBCB12.pdf)
 6. Use sane defaults.
 7. Be obvious and clean.
+8. For the time being handle only binary payouts.
 
 ### Library API ideas:
 #### Running slots with a live website
 ```Python
-# Using slots to determine the best of 3 variations on a live website. 3 is the default.
+# Using slots to determine the best of 3 variations on a live website. 3 is the default number of bandits and epsilon greedy is the default strategy.
 mab = slots.MAB(3, live=True)
 
 # Make the first choice randomly, record responses, and input reward
 # 2 was chosen.
-# Run online trial (input most recent result) until test criteria is met.
+# Update online trial (input most recent result) until test criteria is met.
 mab.online_trial(bandit=2,payout=1)
 
 # Response of mab.online_trial() is a dict of the form:
 {'new_trial': boolean, 'choice': int, 'best': int}
 
 # Where:
 #   If the criterion is met, new_trial = False.
-#   choice is the current choice of arm to try.
+#   choice is the current choice of arm to try next.
 #   best is the current best estimate of the highest payout arm.
 ```
 
 #### Creating a MAB test instance:
 
 ```Python
-# Default: 3 bandits with random p_i and pay_i = 1
-mab = slots.MAB(live=False)
+# Default: 3 bandits with random probabilities, p_i.
+mab = slots.MAB()
 
-# Set up 4 bandits with random p_i and pay_i
-mab = slots.MAB(4, live=False)
+# Set up 4 bandits with random p_i.
+mab = slots.MAB(4)
 
 # 4 bandits with specified p_i
-mab = slots.MAB(probs = [0.2,0.1,0.4,0.1], live=False)
+mab = slots.MAB(probs = [0.2,0.1,0.4,0.1])
 
-# 3 bandits with specified pay_i
-mab = slots.MAB(payouts = [1,10,15], live=False)
+# Creating 3 bandits with historical payout data
+mab = slots.MAB(3, hist_payouts = np.array([[0,0,1,...],
+                                            [1,0,0,...],
+                                            [0,0,0,...]]))
 ```
 
 #### Running tests with strategy, S
@@ -98,8 +101,8 @@ mab.bandits.reset()
 
 # Set probabilities or payouts
 # (NOT YET IMPLEMENTED)
-mab.bandits.probs_set([0.1,0.05,0.2,0.15])
-mab.bandits.payouts_set([1,1.5,0.5,0.8])
+mab.bandits.set_probs([0.1,0.05,0.2,0.15])
+mab.bandits.set_hist_payouts([[1,1,0,0],[0,1,0,0]])
 ```
 
 #### Displaying / retrieving test info
@@ -114,10 +117,10 @@ mab.prob_est()
 
 # Retrieve bandit probability estimate of bandit i
 # (NOT YET IMPLEMENTED)
-mab.prob_est(i)
+mab.est_prob(i)
 
-# Retrieve bandit payout estimates (p * payout)
-mab.est_payout()
+# Retrieve bandit probability estimates
+mab.est_probs()
 
 # Retrieve current bandit choice
 # (NOT YET IMPLEMENTED, use mab.choices[-1])
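
Note on the regret definition in the design goals of slots-docs.md (Regret = Trials*mean_max - sum of rewards): a minimal sketch of that formula for binary payouts might look like the following. The function names are hypothetical, and dividing by the number of trials to get a per-trial fraction is an assumption based on the README's description of regret as a fraction of lost payouts, not a statement of how `mab.regret()` is implemented.

```python
def regret(probs, payouts):
    """Regret = T * max(p) - sum of observed rewards (hypothetical helper).

    probs: true win probability of each arm (must be known).
    payouts: observed reward (0 or 1) for each trial so far.
    """
    trials = len(payouts)
    return trials * max(probs) - sum(payouts)


def normalized_regret(probs, payouts):
    """Regret per trial; assumed normalization, returns 0.0 before any trial."""
    trials = len(payouts)
    if trials == 0:
        return 0.0
    return regret(probs, payouts) / trials


if __name__ == "__main__":
    # Best arm pays out half the time; two wins were observed in four trials.
    print(regret([0.5, 0.25], [1, 0, 1, 0]))
    print(normalized_regret([0.5, 0.25], [0, 0, 0, 0]))
```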

diff --git a/mypy.ini b/mypy.ini
@@ -0,0 +1,6 @@
+[mypy]
+disallow_untyped_calls = True
+disallow_untyped_defs = True
+
+[mypy-numpy]
+ignore_missing_imports = True
diff --git a/pyproject.toml b/pyproject.toml
@@ -0,0 +1,2 @@
+[tool.black]
+line-length = 79
diff --git a/setup.cfg b/setup.cfg
@@ -1,4 +1,4 @@
 [bdist_wheel]
 # This flag says that the code is written to work on both Python 2 and Python
 # 3.
-universal=1
+
diff --git a/setup.py b/setup.py
@@ -16,7 +16,7 @@
 setup(
     name='slots',
 
-    version='0.3.1',
+    version='0.4.0',
 
     description='A multi-armed bandit library for Python',
     long_description=long_description,
@@ -50,9 +50,9 @@
 
         # Specify the Python versions you support here. In particular, ensure
         # that you indicate whether you support Python 2, Python 3 or both.
-        'Programming Language :: Python :: 2.7',
-        'Programming Language :: Python :: 3.4',
         'Programming Language :: Python :: 3.5',
+        'Programming Language :: Python :: 3.6',
+        'Programming Language :: Python :: 3.7',
     ],
 
     # What does your project relate to?