Concurrent Pandas is a Python Library that allows you to use Pandas to concurrently download bulk data using threads or processes.

briwilcox/Concurrent-Pandas

Concurrent Pandas

Concurrent Pandas is a Python library that lets you use Pandas and/or Quandl to download bulk data concurrently using threads or processes. What does concurrency do for you? Your data downloads simultaneously instead of one key at a time. Concurrent Pandas automatically spawns an optimal number of processes or threads based on the number of processors available on your machine.
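
As a rough illustration of what sizing a pool from the machine's processor count can look like, here is a minimal sketch using only the Python 3 standard library. It is not Concurrent Pandas' actual implementation: `fetch_all`, the `fetch` callable, and the four-threads-per-CPU heuristic are all assumptions made for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_all(keys, fetch, max_workers=None):
    """Call fetch(key) for every key on a thread pool and return {key: result}."""
    keys = list(keys)
    if max_workers is None:
        # Assumed heuristic: downloads are I/O bound, so allow a few threads
        # per CPU, but never more threads than there are keys (minimum 1).
        max_workers = min(len(keys), (os.cpu_count() or 1) * 4) or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        return dict(zip(keys, pool.map(fetch, keys)))


# Stand-in for a real download function (hypothetical).
print(fetch_all(["aapl", "xom"], str.upper))  # {'aapl': 'AAPL', 'xom': 'XOM'}
```

Threads suit network-bound downloads like this; a process pool would follow the same shape with `ProcessPoolExecutor`, provided `fetch` can be pickled.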

Note: Concurrent Pandas is not associated with Quandl or Python Pandas; it just lets you access them faster.


#### Features

  • Sequential downloading of keys
  • Concurrent downloading of keys using thread or process pools
  • All concurrent downloads automatically pick an optimal number of threads or processes for your system
  • Recursive data structure unpacking for key insertion
    • Pass one or many:
      • Lists
      • Sets
      • Deques
      • Any other data structure that inherits from the abstract base class Container, provided it supports iteration and does not also inherit from Python's basestring
  • Automatic re-attempts if the download fails or times out
    • The wait before each retry grows with each successive failure
  • Variety of data sources supported
    • Quandl
    • Federal Reserve Economic Data
    • Google Finance
    • Yahoo Finance
    • More coming soon!
  • Data is returned in a hash map (a Python dict) for fast lookups (O(1) average case)
    • Keys are the strings you entered for lookup; each value is the corresponding Pandas DataFrame
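
Two of the features above, recursive unpacking of nested key containers and retries that wait longer after each failure, can be sketched in plain Python 3. This is a minimal illustration, not the library's own code: `flatten_keys`, `download_with_retries`, the `fetch` callable, and the linear delay schedule are all hypothetical, and the Python 2 `basestring` check is replaced here by `str`.

```python
import time
from collections import deque
from collections.abc import Container, Iterable


def flatten_keys(item):
    """Yield every string key found in arbitrarily nested containers."""
    if isinstance(item, str):
        # Strings are containers too, so handle them first to avoid
        # iterating over their characters.
        yield item
    elif isinstance(item, Container) and isinstance(item, Iterable):
        for child in item:
            yield from flatten_keys(child)
    else:
        raise TypeError("unsupported key type: %r" % type(item))


def download_with_retries(fetch, key, max_attempts=4, base_delay=1.0,
                          sleep=time.sleep):
    """Call fetch(key), waiting longer after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(key)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Assumed schedule: the delay grows linearly (1x, 2x, 3x ...).
            sleep(base_delay * attempt)


nested = ["aapl", ("xom", deque(["msft", ["goog"]]))]
print(list(flatten_keys(nested)))  # ['aapl', 'xom', 'msft', 'goog']
```

Passing `sleep` as a parameter keeps the retry loop easy to test without real waiting; an exponential schedule would work equally well if failures tend to come in bursts.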

#### Easy to use

import concurrentpandas

# Define your keys
yahoo_keys = ["aapl", "xom", "msft", "goog", "brk-b", "TSLA", "IRBT"]
# Instantiate Concurrent Pandas
fast_panda = concurrentpandas.ConcurrentPandas()
# Set your data source
fast_panda.set_source_yahoo_finance()
# Insert your keys
fast_panda.insert_keys(yahoo_keys)
# Choose either asynchronous threads, processes, or a single sequential download
fast_panda.consume_keys_asynchronous_threads()
# The Concurrent Pandas object now contains a dict of your results
mymap = fast_panda.return_map()
# Easily pull the data out of the map for your research
# (head is a method, so call it to print the first rows)
print(mymap["aapl"].head())

##### Installation Instructions

Note: only tested on Linux.

To install, run:

pip install ConcurrentPandas

##### Updates

New in 0.1.2: ability to interact with stock options.

Now requires BeautifulSoup4, and Pandas 0.16 or newer.


##### Misc

Tested on Python 2.7.6 and Python 3.4.0

To see what else I'm building, or to follow or contact me, check out my GitHub, Twitter, and personal site.
