DOC: Remove computation.rst in favor of better docstrings (pandas-dev…

…#46170) * DOC: Remove computation.rst in favor of better docstrings: * Remove other ref
me-kbs · Feb 28, 2022 · 21a3b2f · 21a3b2f
1 parent 367f8a1
commit 21a3b2f
Show file tree

Hide file tree

Showing 9 changed files with 74 additions and 225 deletions.
diff --git a/doc/source/user_guide/computation.rst b/doc/source/user_guide/computation.rst
diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst
@@ -76,7 +76,6 @@ Guides
     boolean
     visualization
     style
-    computation
     groupby
     window
     timeseries

diff --git a/doc/source/user_guide/window.rst b/doc/source/user_guide/window.rst
@@ -427,10 +427,16 @@ can even be omitted:
 .. note::
 
     Missing values are ignored and each entry is computed using the pairwise
-    complete observations.  Please see the :ref:`covariance section
-    <computation.covariance>` for :ref:`caveats
-    <computation.covariance.caveats>` associated with this method of
-    calculating covariance and correlation matrices.
+    complete observations.
+
+    Assuming the missing data are missing at random this results in an estimate
+    for the covariance matrix which is unbiased. However, for many applications
+    this estimate may not be acceptable because the estimated covariance matrix
+    is not guaranteed to be positive semi-definite. This could lead to
+    estimated correlations having absolute values which are greater than one,
+    and/or a non-invertible covariance matrix. See `Estimation of covariance
+    matrices <https://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices>`_
+    for more details.
 
 .. ipython:: python
 

diff --git a/doc/source/whatsnew/v0.6.0.rst b/doc/source/whatsnew/v0.6.0.rst
@@ -24,7 +24,7 @@ New features
 - :ref:`Added <groupby.multiindex>` multiple levels to groupby (:issue:`103`)
 - :ref:`Allow <basics.sorting>` multiple columns in ``by`` argument of ``DataFrame.sort_index`` (:issue:`92`, :issue:`362`)
 - :ref:`Added <indexing.basics.get_value>` fast ``get_value`` and ``put_value`` methods to DataFrame (:issue:`360`)
-- :ref:`Added <computation.covariance>` ``cov`` instance methods to Series and DataFrame (:issue:`194`, :issue:`362`)
+- Added ``cov`` instance methods to Series and DataFrame (:issue:`194`, :issue:`362`)
 - :ref:`Added <visualization.barplot>` ``kind='bar'`` option to ``DataFrame.plot`` (:issue:`348`)
 - :ref:`Added <basics.idxmin>` ``idxmin`` and ``idxmax`` to Series and DataFrame (:issue:`286`)
 - :ref:`Added <io.clipboard>` ``read_clipboard`` function to parse DataFrame from clipboard (:issue:`300`)

diff --git a/doc/source/whatsnew/v0.6.1.rst b/doc/source/whatsnew/v0.6.1.rst
@@ -7,7 +7,7 @@ Version 0.6.1 (December 13, 2011)
 New features
 ~~~~~~~~~~~~
 - Can append single rows (as Series) to a DataFrame
-- Add Spearman and Kendall rank :ref:`correlation <computation.correlation>`
+- Add Spearman and Kendall rank correlation
   options to Series.corr and DataFrame.corr (:issue:`428`)
 - :ref:`Added <indexing.basics.get_value>` ``get_value`` and ``set_value`` methods to
   Series, DataFrame, and Panel for very low-overhead access (>2x faster in many
@@ -19,7 +19,7 @@ New features
 - Implement new :ref:`SparseArray <sparse.array>` and ``SparseList``
   data structures. SparseSeries now derives from SparseArray (:issue:`463`)
 - :ref:`Better console printing options <basics.console_output>` (:issue:`453`)
-- Implement fast :ref:`data ranking <computation.ranking>` for Series and
+- Implement fast data ranking for Series and
   DataFrame, fast versions of scipy.stats.rankdata (:issue:`428`)
 - Implement ``DataFrame.from_items`` alternate
   constructor (:issue:`444`)

diff --git a/doc/source/whatsnew/v0.8.0.rst b/doc/source/whatsnew/v0.8.0.rst
@@ -145,7 +145,7 @@ Other new features
 - Add :ref:`'kde' <visualization.kde>` plot option for density plots
 - Support for converting DataFrame to R data.frame through rpy2
 - Improved support for complex numbers in Series and DataFrame
-- Add :ref:`pct_change <computation.pct_change>` method to all data structures
+- Add ``pct_change`` method to all data structures
 - Add max_colwidth configuration option for DataFrame console output
 - :ref:`Interpolate <missing_data.interpolate>` Series values using index values
 - Can select multiple columns from GroupBy

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -9592,6 +9592,14 @@ def corr(
             DataFrame or Series.
         Series.corr : Compute the correlation between two Series.
 
+        Notes
+        -----
+        Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
+
+        * `Pearson correlation coefficient <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
+        * `Kendall rank correlation coefficient <https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient>`_
+        * `Spearman's rank correlation coefficient <https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient>`_
+
         Examples
         --------
         >>> def histogram_intersection(a, b):
@@ -9603,7 +9611,14 @@ def corr(
               dogs  cats
         dogs   1.0   0.3
         cats   0.3   1.0
-        """
+
+        >>> df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
+        ...                   columns=['dogs', 'cats'])
+        >>> df.corr(min_periods=3)
+              dogs  cats
+        dogs   1.0   NaN
+        cats   NaN   1.0
+        """  # noqa:E501
         numeric_df = self._get_numeric_data()
         cols = numeric_df.columns
         idx = cols.copy()
@@ -9797,7 +9812,28 @@ def corrwith(self, other, axis: Axis = 0, drop=False, method="pearson") -> Serie
         See Also
         --------
         DataFrame.corr : Compute pairwise correlation of columns.
-        """
+
+        Examples
+        --------
+        >>> index = ["a", "b", "c", "d", "e"]
+        >>> columns = ["one", "two", "three", "four"]
+        >>> df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
+        >>> df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)
+        >>> df1.corrwith(df2)
+        one      1.0
+        two      1.0
+        three    1.0
+        four     1.0
+        dtype: float64
+
+        >>> df2.corrwith(df1, axis=1)
+        a    1.0
+        b    1.0
+        c    1.0
+        d    1.0
+        e    NaN
+        dtype: float64
+        """  # noqa:E501
         axis = self._get_axis_number(axis)
         this = self._get_numeric_data()
 

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
@@ -8522,6 +8522,18 @@ def rank(
         3   spider          8.0
         4    snake          NaN
 
+        Ties are assigned the mean of the ranks (by default) for the group.
+
+        >>> s = pd.Series(range(5), index=list("abcde"))
+        >>> s["d"] = s["b"]
+        >>> s.rank()
+        a    1.0
+        b    2.5
+        c    4.0
+        d    2.5
+        e    5.0
+        dtype: float64
+
         The following example shows how the method behaves with the above
         parameters:
 
@@ -10251,7 +10263,7 @@ def pct_change(
         periods : int, default 1
             Periods to shift for forming percent change.
         fill_method : str, default 'pad'
-            How to handle NAs before computing percent changes.
+            How to handle NAs **before** computing percent changes.
         limit : int, default None
             The number of consecutive NAs to fill before stopping.
         freq : DateOffset, timedelta, or str, optional

diff --git a/pandas/core/series.py b/pandas/core/series.py
@@ -2566,6 +2566,14 @@ def corr(self, other, method="pearson", min_periods=None) -> float:
         DataFrame.corrwith : Compute pairwise correlation with another
             DataFrame or Series.
 
+        Notes
+        -----
+        Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
+
+        * `Pearson correlation coefficient <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
+        * `Kendall rank correlation coefficient <https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient>`_
+        * `Spearman's rank correlation coefficient <https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient>`_
+
         Examples
         --------
         >>> def histogram_intersection(a, b):
@@ -2575,7 +2583,7 @@ def corr(self, other, method="pearson", min_periods=None) -> float:
         >>> s2 = pd.Series([.3, .6, .0, .1])
         >>> s1.corr(s2, method=histogram_intersection)
         0.3
-        """
+        """  # noqa:E501
         this, other = self.align(other, join="inner", copy=False)
         if len(this) == 0:
             return np.nan
-Original file line number
+Diff line change
@@ Expand Up / @@ -76,7 +76,6 @@ Guides @@
         boolean
         visualization
         style
-        computation
         groupby
         window
         timeseries
@@ Expand Down @@