
Commit

Use $ instead of $$ for formulae (fixes #106)
dustalov committed Jun 8, 2024
1 parent faa071f commit fb67391
Showing 7 changed files with 44 additions and 63 deletions.
23 changes: 8 additions & 15 deletions crowdkit/aggregation/classification/dawid_skene.py
@@ -24,15 +24,9 @@ class DawidSkene(BaseClassificationAggregator):
     ![Dawid-Skene latent label model](https://tlk.s3.yandex.net/crowd-kit/docs/ds_llm.png)

-    Here the prior true label probability is
-    $$
-    \operatorname{Pr}(z_j = c) = p[c],
-    $$
-    and the probability distribution of the worker responses with the true label $c$ is represented by the
-    corresponding column of the error matrix:
-    $$
-    \operatorname{Pr}(y_j^w = k | z_j = c) = e^w[k, c].
-    $$
+    Here the prior true label probability is $\operatorname{Pr}(z_j = c) = p[c]$, and the probability distribution
+    of the worker responses with the true label $c$ is represented by the corresponding column of the error matrix:
+    $\operatorname{Pr}(y_j^w = k | z_j = c) = e^w[k, c]$.

     Parameters $p$, $e^w$, and latent variables $z$ are optimized with the Expectation-Maximization algorithm:

     1. **E-step**. Estimates the true task label probabilities using the specified workers' responses,
@@ -239,15 +233,14 @@ class OneCoinDawidSkene(DawidSkene):
     at the M-step of the algorithm.

     For the one-coin model, a worker confusion (error) matrix is parameterized by a single parameter $s_w$:
-    $$
-    e^w_{j,z_j} = \begin{cases}
+    $e^w_{j,z_j} = \begin{cases}
         s_{w} & y^w_j = z_j \\
         \frac{1 - s_{w}}{K - 1} & y^w_j \neq z_j
-    \end{cases}
-    $$
+    \end{cases}$,
     where $e^w$ is a worker confusion (error) matrix of size $K \times K$ in the case of $K$-class classification,
-    $z_j$ be a true task label, $y^w_j$ is a worker
-    response to the task $j$, and $s_w$ is a worker skill (accuracy).
+    $z_j$ is a true task label, $y^w_j$ is a worker response to the task $j$, and $s_w$ is a worker skill (accuracy).

     In other words, the worker $w$ uses a single coin flip to decide their assignment. No matter what the true label is,
     the worker has the $s_w$ probability to assign the correct label, and
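The one-coin parameterization is compact enough to sanity-check numerically. Below is a minimal NumPy sketch of the error matrix it induces; it illustrates the formula above and is not crowd-kit's internal code (the function name and shapes are assumptions):

```python
import numpy as np

def one_coin_error_matrix(skill: float, n_classes: int) -> np.ndarray:
    """K x K error matrix e^w implied by a single skill s_w: the worker
    answers correctly with probability `skill`, and the remaining mass
    is spread uniformly over the K - 1 wrong labels."""
    off_diagonal = (1.0 - skill) / (n_classes - 1)
    e_w = np.full((n_classes, n_classes), off_diagonal)
    np.fill_diagonal(e_w, skill)
    return e_w

e_w = one_coin_error_matrix(skill=0.8, n_classes=4)
# Each column c is the distribution Pr(y^w = k | z = c) and sums to one.
assert np.allclose(e_w.sum(axis=0), 1.0)
```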
19 changes: 8 additions & 11 deletions crowdkit/aggregation/classification/glad.py
@@ -33,20 +33,17 @@ class GLAD(BaseClassificationAggregator):
     ![GLAD latent label model](https://tlk.s3.yandex.net/crowd-kit/docs/glad_llm.png)

-    The prior probability of $z_j$ being equal to $c$ is
-    $$
-    \operatorname{Pr}(z_j = c) = p[c],
-    $$
+    The prior probability of $z_j$ being equal to $c$ is $\operatorname{Pr}(z_j = c) = p[c]$,
     and the probability distribution of the worker responses with the true label $c$ follows the
     single coin Dawid-Skene model where the true label probability is a sigmoid function of the product of the
     worker ability and the inverse task difficulty:
-    $$
-    \operatorname{Pr}(y^i_j = k | z_j = c) = \begin{cases}a(i, j), & k = c \\ \frac{1 - a(i,j)}{K-1}, & k \neq c\end{cases},
-    $$
-    where
-    $$
-    a(i,j) = \frac{1}{1 + \exp(-\alpha_i\beta_j)}.
-    $$
+    $\operatorname{Pr}(y^i_j = k | z_j = c) = \begin{cases}
+        a(i, j), & k = c \\
+        \frac{1 - a(i,j)}{K-1}, & k \neq c
+    \end{cases}$,
+    where $a(i,j) = \frac{1}{1 + \exp(-\alpha_i\beta_j)}$.

     Parameters $p$, $\alpha$, $\beta$, and latent variables $z$ are optimized with the Expectation-Maximization algorithm:

     1. **E-step**. Estimates the true task label probabilities using the alpha parameters of workers' abilities,
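For intuition, the response model above fits in a few lines. A sketch assuming scalar parameters `alpha` (worker ability) and `beta` (inverse task difficulty); the names are illustrative, not the library's API:

```python
import math

def glad_response_prob(alpha: float, beta: float, k: int, c: int, n_classes: int) -> float:
    """Pr(y^i_j = k | z_j = c) under GLAD: the correct label gets a
    sigmoid of the ability-difficulty product, and the wrong labels
    split the remaining probability evenly."""
    a = 1.0 / (1.0 + math.exp(-alpha * beta))
    return a if k == c else (1.0 - a) / (n_classes - 1)

# A skilled worker (alpha = 2.0) on an easy task (beta = 1.5)
# picks the correct label with probability ~0.95.
print(glad_response_prob(2.0, 1.5, k=0, c=0, n_classes=3))
```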
7 changes: 3 additions & 4 deletions crowdkit/aggregation/classification/kos.py
@@ -24,10 +24,9 @@ class KOS(BaseClassificationAggregator):
     how reliable worker $j$ is.

     At the $k$-th iteration, the values are updated as follows:
-    $$
-    x_{i \rightarrow j}^{(k)} = \sum_{j^{'} \in \partial i \backslash j} A_{ij^{'}} y_{j^{'} \rightarrow i}^{(k-1)} \\
-    y_{j \rightarrow i}^{(k)} = \sum_{i^{'} \in \partial j \backslash i} A_{i^{'}j} x_{i^{'} \rightarrow j}^{(k-1)}
-    $$
+    $x_{i \rightarrow j}^{(k)} = \sum_{j^{'} \in \partial i \backslash j} A_{ij^{'}} y_{j^{'} \rightarrow i}^{(k-1)}$
+    and
+    $y_{j \rightarrow i}^{(k)} = \sum_{i^{'} \in \partial j \backslash i} A_{i^{'}j} x_{i^{'} \rightarrow j}^{(k-1)}$.

     David R. Karger, Sewoong Oh, and Devavrat Shah. Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems.
     *Operations Research 62.1 (2014)*, 1-38.
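The message-passing updates translate almost directly into code. A toy sketch on a dense assignment matrix of $\pm 1$ answers, with random Gaussian initialization of the messages loosely following the paper; the variable names and the alternating update order are assumptions, not crowd-kit internals:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_workers = 5, 4
A = rng.choice([-1, 1], size=(n_tasks, n_workers))  # binary answers coded as +/-1

# y[j, i] is worker j's message to task i; x[i, j] is task i's message to worker j.
y = rng.normal(loc=1.0, scale=1.0, size=(n_workers, n_tasks))

for _ in range(10):
    x = np.empty((n_tasks, n_workers))
    for i in range(n_tasks):
        for j in range(n_workers):
            # Sum over every other worker who answered task i.
            x[i, j] = sum(A[i, jp] * y[jp, i] for jp in range(n_workers) if jp != j)
    for j in range(n_workers):
        for i in range(n_tasks):
            # Alternating sweep: sum over every other task answered by worker j.
            y[j, i] = sum(A[ip, j] * x[ip, j] for ip in range(n_tasks) if ip != i)

# Final decision: sign of the reliability-weighted vote for each task.
labels = np.sign((A * y.T).sum(axis=1))
print(labels)
```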
15 changes: 8 additions & 7 deletions crowdkit/aggregation/classification/m_msr.py
@@ -16,13 +16,14 @@

 @attr.s
 class MMSR(BaseClassificationAggregator):
-    r"""The **Matrix Mean-Subsequence-Reduced Algorithm** (M-MSR) model assumes that workers have different expertise levels and are represented
-    as a vector of "skills" $s$ which entries $s_i$ show the probability
-    that the worker $i$ will answer the given task correctly. Having that, we can estimate the probability of each worker via solving a rank-one matrix completion problem as follows:
-    $$
-    \mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right]
-    = \boldsymbol{s}\boldsymbol{s}^T,
-    $$
+    r"""The **Matrix Mean-Subsequence-Reduced Algorithm** (M-MSR) model assumes that workers have different
+    expertise levels and are represented as a vector of "skills" $s$ whose entries $s_i$ show the probability
+    that the worker $i$ will answer the given task correctly. Having that, we can estimate the skill of
+    each worker by solving a rank-one matrix completion problem as follows:
+    $\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] =
+    \boldsymbol{s}\boldsymbol{s}^T$,
     where $M$ is the total number of classes, $\widetilde{C}$ is a covariance matrix between
     workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix which has the same
     size as $\widetilde{C}$.
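The rank-one identity can be checked numerically. A small sketch assuming the parameterization in which worker $i$ answers correctly with probability $((M-1)s_i + 1)/M$, so that two independent workers agree with probability $\frac{M-1}{M}s_i s_j + \frac{1}{M}$; this parameterization is an assumption for illustration, not code from the library:

```python
import numpy as np

M = 4                                  # number of classes
s = np.array([0.9, 0.6, 0.4, 0.75])    # hypothetical worker skills

# Expected pairwise-agreement matrix under the assumed noise model.
# In practice the diagonal is unobserved, which is why M-MSR treats
# recovering s s^T as a matrix completion problem.
C = (M - 1) / M * np.outer(s, s) + 1.0 / M

ones = np.ones_like(C)
recovered = M / (M - 1) * C - ones / (M - 1)
assert np.allclose(recovered, np.outer(s, s))  # the rank-one identity above
```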
30 changes: 11 additions & 19 deletions crowdkit/aggregation/embeddings/hrrasa.py
@@ -33,23 +33,18 @@ class HRRASA(BaseClassificationAggregator):
     **Step 2**. Estimate the *local* workers' reliabilities that represent how well a
     worker responds to one particular task. The local reliability of the worker $k$ on the task $i$ is
     denoted by $\gamma_i^k$ and is calculated by incorporating both types of representations:
-    $$
-    \gamma_i^k = \lambda_{emb}\gamma_{i,emb}^k + \lambda_{seq}\gamma_{i,seq}^k, \; \lambda_{emb} + \lambda_{seq} = 1,
-    $$
-    where the $\gamma_{i,emb}^k$ value is a reliability calculated on `embedding`, and the $\gamma_{i,seq}^k$ value is a
-    reliability calculated on `output`.
+    $\gamma_i^k = \lambda_{emb}\gamma_{i,emb}^k + \lambda_{seq}\gamma_{i,seq}^k, \; \lambda_{emb} + \lambda_{seq} = 1$,
+    where the $\gamma_{i,emb}^k$ value is a reliability calculated on `embedding`, and the $\gamma_{i,seq}^k$
+    value is a reliability calculated on `output`.

     The $\gamma_{i,emb}^k$ value is calculated by the following equation:
-    $$
-    \gamma_{i,emb}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i, k \neq k'}
-    \exp\left(\frac{\|e_i^k-e_i^{k'}\|^2}{\|e_i^k\|^2\|e_i^{k'}\|^2}\right),
-    $$
+    $\gamma_{i,emb}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i, k \neq k'}
+    \exp\left(\frac{\|e_i^k-e_i^{k'}\|^2}{\|e_i^k\|^2\|e_i^{k'}\|^2}\right)$,
     where $\mathcal{U_i}$ is a set of workers' responses on task $i$.

-    The $\gamma_{i,seq}^k$ value uses some similarity measure $sim$ on the `output` data, e.g. GLEU similarity on texts:
-    $$
-    \gamma_{i,seq}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i, k \neq k'}sim(a_i^k, a_i^{k'}).
-    $$
+    The $\gamma_{i,seq}^k$ value uses some similarity measure $sim$ on the `output` data,
+    e.g. GLEU similarity on texts:
+    $\gamma_{i,seq}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i, k \neq k'}sim(a_i^k, a_i^{k'})$.

     **Step 3**. Estimate the *global* workers' reliabilities $\beta$ by iteratively performing two steps:
     1. For each task, estimate the aggregated embedding: $\hat{e}_i = \frac{\sum_k \gamma_i^k
@@ -58,12 +53,9 @@ class HRRASA(BaseClassificationAggregator):
     |\mathcal{V}_k|)}}{\sum_i\left(\|e_i^k - \hat{e}_i\|^2/\gamma_i^k\right)}$, where $\mathcal{V}_k$
     is a set of tasks completed by the worker $k$.

-    **Step 4**. Estimate the aggregated result. It is the output which embedding is
-    the closest one to $\hat{e}_i$. If `calculate_ranks` is true, the method also calculates ranks for
-    each worker response as
-    $$
-    s_i^k = \beta_k \exp\left(-\frac{\|e_i^k - \hat{e}_i\|^2}{\|e_i^k\|^2\|\hat{e}_i\|^2}\right) + \gamma_i^k.
-    $$
+    **Step 4**. Estimate the aggregated result. It is the output whose embedding is the closest one to
+    $\hat{e}_i$. If `calculate_ranks` is true, the method also calculates ranks for each worker response as
+    $s_i^k = \beta_k \exp\left(-\frac{\|e_i^k - \hat{e}_i\|^2}{\|e_i^k\|^2\|\hat{e}_i\|^2}\right) + \gamma_i^k$.

     Jiyi Li. Crowdsourced Text Sequence Aggregation based on Hybrid Reliability and Representation.
     In *Proceedings of the 43rd International ACM SIGIR Conference on Research and Development
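As a concrete reading of Step 2, here is a sketch of the $\gamma_{i,emb}^k$ computation that follows the formula above verbatim, with one embedding per worker response on a single task; the names and shapes are assumptions, not the library's internals:

```python
import numpy as np

def local_embedding_reliability(embeddings: np.ndarray, k: int) -> float:
    """gamma_{i,emb}^k for worker k on one task: the average over the
    other responses of exp(||e^k - e^k'||^2 / (||e^k||^2 * ||e^k'||^2)),
    transcribed directly from the docstring formula.

    embeddings: one response embedding per row for a single task i."""
    e_k = embeddings[k]
    terms = [
        np.exp(np.sum((e_k - e) ** 2) / (np.sum(e_k ** 2) * np.sum(e ** 2)))
        for kp, e in enumerate(embeddings)
        if kp != k
    ]
    return float(np.mean(terms))

responses = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(local_embedding_reliability(responses, k=0))
```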
11 changes: 5 additions & 6 deletions crowdkit/aggregation/pairwise/bradley_terry.py
@@ -18,14 +18,13 @@ class BradleyTerry(BasePairwiseAggregator):
     The algorithm constructs a ranking of items based on pairwise comparisons. Given
     a pair of items $i$ and $j$, the probability that $i$ is ranked higher is,
     according to the Bradley-Terry probabilistic model,
-    $$
-    P(i > j) = \frac{p_i}{p_i + p_j}.
-    $$
+    $P(i > j) = \frac{p_i}{p_i + p_j}$.

     Here $\boldsymbol{p}$ is a vector of positive real-valued parameters that the algorithm optimizes. This
     optimization process maximizes the log-likelihood of the observed comparison outcomes using the MM algorithm:
-    $$
-    L(\boldsymbol{p}) = \sum_{i=1}^n\sum_{j=1}^n[w_{ij}\ln p_i - w_{ij}\ln (p_i + p_j)],
-    $$
+    $L(\boldsymbol{p}) = \sum_{i=1}^n\sum_{j=1}^n[w_{ij}\ln p_i - w_{ij}\ln (p_i + p_j)]$,
     where $w_{ij}$ denotes the number of comparisons of $i$ and $j$ "won" by $i$.

     {% note info %}
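The log-likelihood above is typically maximized with Hunter's MM updates, which admit a compact sketch. This is an illustrative implementation of the textbook update, not the crowd-kit class itself; the toy win matrix and iteration count are assumptions:

```python
import numpy as np

def bradley_terry_mm(wins: np.ndarray, n_iter: int = 200) -> np.ndarray:
    """MM updates for Bradley-Terry scores: p_i <- W_i / sum_{j != i}
    (w_ij + w_ji) / (p_i + p_j), where W_i is the total number of
    comparisons won by item i, renormalized at every step."""
    n = wins.shape[0]
    p = np.ones(n)
    for _ in range(n_iter):
        total_wins = wins.sum(axis=1)
        denom = np.zeros(n)
        for i in range(n):
            for j in range(n):
                if i != j:
                    denom[i] += (wins[i, j] + wins[j, i]) / (p[i] + p[j])
        p = total_wins / denom
        p /= p.sum()  # scores are defined only up to a scale factor
    return p

# Item 0 usually beats 1, and 1 usually beats 2.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
print(bradley_terry_mm(wins))  # scores decrease from item 0 to item 2
```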
2 changes: 1 addition & 1 deletion crowdkit/metrics/data/_classification.py
@@ -113,7 +113,7 @@ def uncertainty(
 ) -> Union[float, "pd.Series[Any]"]:
     r"""Label uncertainty metric: entropy of the label probability distribution.
     Computed as Shannon's Entropy with label probabilities computed either for tasks or workers:
-    $$H(L) = -\sum_{label_i \in L} p(label_i) \cdot \log(p(label_i))$$
+    $H(L) = -\sum_{label_i \in L} p(label_i) \cdot \log(p(label_i))$.

     Args:
         answers: A data frame containing `task`, `worker` and `label` columns.
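The reformatted formula is easy to reproduce. A sketch of the entropy of one set of labels, assuming the natural logarithm (the formula above does not fix the base):

```python
import numpy as np
import pandas as pd

def label_entropy(labels: pd.Series) -> float:
    """Shannon entropy of the empirical label distribution,
    H(L) = -sum_i p(label_i) * log(p(label_i))."""
    p = labels.value_counts(normalize=True).to_numpy()
    return float(-np.sum(p * np.log(p)))

labels = pd.Series(["yes", "yes", "yes", "no"])
print(label_entropy(labels))  # ~0.562 nats; 0.0 means perfect agreement
```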
