pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way towards this goal.
Here are just a few of the things that pandas does well:
- Easy handling of [missing data][missing-data] (represented as `NaN`) in floating point as well as non-floating point data
- Size mutability: columns can be [inserted and deleted][insertion-deletion] from DataFrame and higher dimensional objects
- Automatic and explicit [data alignment][alignment]: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let `Series`, `DataFrame`, etc. automatically align the data for you in computations
- Powerful, flexible [group by][groupby] functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
- Make it [easy to convert][conversion] ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
- Intelligent label-based [slicing][slicing], [fancy indexing][fancy-indexing], and [subsetting][subsetting] of large data sets
- Intuitive [merging][merging] and [joining][joining] of data sets
- Flexible [reshaping][reshape] and [pivoting][pivot-table] of data sets
- [Hierarchical][mi] labeling of axes (possible to have multiple labels per tick)
- Robust IO tools for loading data from [flat files][flat-files] (CSV and delimited), [Excel files][excel], [databases][db], and saving/loading data from the ultrafast [HDF5 format][hdfstore]
- [Time series][timeseries]-specific functionality: date range generation and frequency conversion, moving window statistics, date shifting and lagging
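
A few of the features above can be sketched in a handful of lines. This is a minimal illustration, not an excerpt from the pandas documentation: the data and the variable names (`a`, `b`, `total`, `per_team`) are made up for the example.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Automatic alignment: values are matched by label, not by position.
# "x" has no counterpart in b, so the result at "x" is NaN.
total = a + b

# Missing data handling: replace NaN with a default value.
filled = total.fillna(0.0)

# Split-apply-combine: group rows by a key column and sum each group.
df = pd.DataFrame(
    {"team": ["red", "blue", "red", "blue"], "score": [1, 2, 3, 4]}
)
per_team = df.groupby("team")["score"].sum()

# Size mutability: add a column, then delete it again.
df["bonus"] = df["score"] * 2
del df["bonus"]

print(total)
print(filled)
print(per_team)
```

Note that alignment is what produces the missing value here: adding `a` and `b` never raises on mismatched labels, it simply leaves `NaN` where one side has no data.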

```sh
# conda
conda install pandas

# or PyPI
pip install pandas
```
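
After either command finishes, one quick way to check the install is to import the package and print its version. This is just a sanity-check sketch; the version string you see depends on what was installed.

```python
import pandas as pd

# If the import succeeds, pandas is available in this environment.
installed_version = pd.__version__
print(installed_version)
```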