Apify

133 repositories

crawlee-python
Public
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
python crawler scraper automation web-crawler headless scraping crawling pip web-scraping
Python • Apache License 2.0 • 325 • 4.9k • 77 • 7 • Updated Dec 20, 2024
apify-cli
Public
Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.
command-line headless-chrome puppeteer serveless apify
TypeScript • 19 • 122 • 36 • 4 • Updated Dec 20, 2024
apify-sdk-python
Public
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.
automation scraping apify python sdk
Python • Apache License 2.0 • 10 • 120 • 13 • 3 • Updated Dec 20, 2024
apify-client-python
Public
Apify API client for Python
api client scraping apify python
Python • Apache License 2.0 • 13 • 51 • 9 • 3 • Updated Dec 20, 2024
apify-docs
Public
This project is the home of Apify's documentation.
API Blueprint • Apache License 2.0 • 80 • 29 • 76 • 34 • Updated Dec 20, 2024
actor-vector-database-integrations
Public
Transfer data from Apify Actors to vector databases (Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate)
Python • Apache License 2.0 • 4 • 4 • 1 • 0 • Updated Dec 20, 2024
apify-eslint-config
Public
Apify ESLint preset to be shared between projects
JavaScript • Apache License 2.0 • 0 • 2 • 1 • 1 • Updated Dec 20, 2024
apify-shared-js
Public
Utilities and constants shared across Apify projects.
TypeScript • Apache License 2.0 • 11 • 12 • 5 • 2 • Updated Dec 20, 2024
crawlee
Public
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping
TypeScript • Apache License 2.0 • 707 • 16k • 126 • 17 • Updated Dec 20, 2024
apify-client-js
Public
Apify API client for JavaScript / Node.js.
TypeScript • Apache License 2.0 • 27 • 69 • 17 • 6 • Updated Dec 19, 2024
actor-whitepaper
Public
This whitepaper describes a new concept for building serverless microapps called Actors, which are easy to develop, share, integrate, and build upon. Actors are a reincarnation of the UNIX philosophy for programs running in the cloud.
python automation serverless scraping node-js agents
Apache License 2.0 • 0 • 5 • 6 • 6 • Updated Dec 19, 2024
apify-sdk-js
Public
Apify SDK monorepo
actor apify nodejs javascript typescript sdk
TypeScript • Apache License 2.0 • 39 • 128 • 11 • 9 • Updated Dec 19, 2024
workflows
Public
Apify's reusable GitHub workflows
Python • 4 • 7 • 4 • 6 • Updated Dec 19, 2024
mcp-server-rag-web-browser
Public
An MCP Server for the RAG Web Browser Actor
JavaScript • Apache License 2.0 • 2 • 12 • 0 • 1 • Updated Dec 18, 2024
actor-templates
Public
This project is the 🏠 home of Apify actor template projects to help users quickly get started.
Python • 18 • 25 • 9 • 1 • Updated Dec 18, 2024
fingerprint-suite
Public
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
scraping fingerprinting playwright typescript puppeteer
TypeScript • Apache License 2.0 • 112 • 1.1k • 20 • 8 • Updated Dec 16, 2024
docusaurus-plugin-typedoc-api
Public
Apify's fork of `docusaurus-plugin-typedoc-api`, customized for our Python documentation.
TypeScript • 28 • 0 • 0 • 0 • Updated Dec 16, 2024
apify-actor-docker
Public
Base Docker images for Apify actors.
Dockerfile • Apache License 2.0 • 23 • 70 • 9 • 3 • Updated Dec 16, 2024
.github
Public
Repository for defining organization-wide (or team-wide) GitHub Actions workflows
0 • 0 • 0 • 0 • Updated Dec 13, 2024
homebrew-tap
Public
A Homebrew tap for Apify tools
Ruby • 1 • 8 • 0 • 4 • Updated Dec 12, 2024
rag-web-browser
Public
RAG Web Browser is an Apify Actor to feed your LLM applications and RAG pipelines with up-to-date text content scraped from the web.
scraper ai crawling serp rag llm
TypeScript • Apache License 2.0 • 1 • 11 • 3 • 0 • Updated Dec 11, 2024
actor-aws-costs-to-slack
Public
This tool integrates with AWS to monitor service usage costs and posts a summary of these costs to a Slack channel. The summary includes costs for various AWS services along with a chart that provides a visual breakdown of the costs over time.
TypeScript • MIT License • 0 • 0 • 0 • 1 • Updated Dec 10, 2024
apify-haystack
Public
The official integration for Apify and Haystack 2.0
apify rag haystack-ai
Python • Apache License 2.0 • 0 • 2 • 0 • 0 • Updated Dec 9, 2024
cypress-test-runner-actor
Public
JavaScript • 0 • 0 • 0 • 1 • Updated Dec 7, 2024
push-actor-action
Public
A GitHub Action to push an Actor to the Apify platform
Apache License 2.0 • 0 • 15 • 0 • 0 • Updated Dec 6, 2024
apify-shared-python
Public
Constants and utilities shared across Apify's Python libraries and projects.
Python • Apache License 2.0 • 1 • 0 • 1 • 0 • Updated Dec 6, 2024
proxy-chain
Public
Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
javascript-library headless-chrome proxy-server proxychains
JavaScript • Apache License 2.0 • 146 • 857 • 7 • 11 • Updated Dec 3, 2024
apify-zapier-integration
Public
Apify integration for Zapier
api zapier web-scraping apify
JavaScript • Apache License 2.0 • 1 • 8 • 4 • 0 • Updated Nov 29, 2024
pull-request-toolkit-action
Public
A GitHub Action that makes sure each PR is correctly set up and has a milestone set.
TypeScript • Apache License 2.0 • 1 • 1 • 1 • 0 • Updated Nov 29, 2024
super-scraper
Public
Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!
nodejs javascript api typescript cheerio scraping web-scraping apify playwright
TypeScript • 5 • 18 • 0 • 0 • Updated Nov 29, 2024