Extract web data using GraphQL and the DOM API.
- Support DOM-like APIs
- Custom HTTP headers and cookies
- Emulate devices and User-Agent
- Render JavaScript
- Support robots.txt
- Expand short URLs
- Protect your privacy using incognito mode
The following example extracts the attributes of featured presentations (title, URL, thumbnail URL, and meta) from Speaker Deck.
{
  page(url: "https://speakerdeck.com/p/featured") {
    decks: querySelectorAll(selector: "div.container div.mb-5") {
      title: querySelector(selector: "a") {
        text: getAttribute(name: "title")
      }
      link: querySelector(selector: "a") {
        href: getAttribute(name: "href")
      }
      image: querySelector(selector: "div.deck-preview") {
        thumbnail: getAttribute(name: "data-cover-image")
      }
      meta: querySelectorAll(selector: "div.deck-preview-meta > div.py-3") {
        value: innerText
      }
    }
  }
}
See the examples for more detailed queries.
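As a minimal sketch of how a client might run the query above, the following Python snippet posts it to a GraphDOM endpoint and flattens the nested result into one record per deck. It assumes a local instance at http://localhost:8080/graphql. The helper names `build_payload`, `flatten_decks`, and `fetch_decks` are hypothetical and not part of GraphDOM. The response shape is inferred from the aliases used in the query.

```python
import json
import urllib.request

# Assumed local endpoint; adjust to wherever GraphDOM is running.
ENDPOINT = "http://localhost:8080/graphql"

QUERY = """
{
  page(url: "https://speakerdeck.com/p/featured") {
    decks: querySelectorAll(selector: "div.container div.mb-5") {
      title: querySelector(selector: "a") { text: getAttribute(name: "title") }
      link: querySelector(selector: "a") { href: getAttribute(name: "href") }
      image: querySelector(selector: "div.deck-preview") {
        thumbnail: getAttribute(name: "data-cover-image")
      }
      meta: querySelectorAll(selector: "div.deck-preview-meta > div.py-3") {
        value: innerText
      }
    }
  }
}
"""


def build_payload(query):
    """Encode a GraphQL query as a JSON request body."""
    return json.dumps({"query": query}).encode("utf-8")


def flatten_decks(response):
    """Turn the nested GraphQL response into one flat dict per deck."""
    decks = response["data"]["page"]["decks"]
    return [
        {
            # A selector that matched nothing may come back as null.
            "title": (deck.get("title") or {}).get("text"),
            "url": (deck.get("link") or {}).get("href"),
            "thumbnail": (deck.get("image") or {}).get("thumbnail"),
            "meta": [m.get("value") for m in deck.get("meta") or []],
        }
        for deck in decks
    ]


def fetch_decks(endpoint=ENDPOINT):
    """POST the query and return flattened decks (needs a running server)."""
    request = urllib.request.Request(
        endpoint,
        data=build_payload(QUERY),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return flatten_decks(json.loads(reply.read().decode("utf-8")))
```

Calling `fetch_decks()` requires a running GraphDOM instance; `flatten_decks` can be exercised on its own with any dictionary of the same shape.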
title
head
body
children
childNodes
innerText
getElementById
getElementsByClassName
getElementsByTagName
querySelector
querySelectorAll
attributes
children
childNodes
innerText
innerHTML
outerHTML
getAttribute
getElementById
getElementsByClassName
getElementsByTagName
querySelector
querySelectorAll
GraphQL is introspective. You can query a GraphQL schema using __schema and __type as below.
__schema lists all types defined in the schema.
query {
  __schema {
    types {
      name
      kind
      description
      fields {
        name
      }
    }
  }
}
__type gets details about a specific type.
query {
  __type(name: "Document") {
    name
    kind
    description
    fields {
      name
    }
  }
}
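The `__type` query can also be built and read programmatically. The sketch below is one possible approach, not part of GraphDOM: `type_query` and `field_names` are hypothetical helper names. It relies only on standard GraphQL introspection behavior, where `__type` is null for an unknown type name and `fields` is null for types that have no fields.

```python
import json


def type_query(type_name):
    """Build a __type introspection query for the given type name."""
    # json.dumps yields a double-quoted, escaped literal, which is also
    # a valid GraphQL string for plain type names such as "Document".
    return (
        "query { __type(name: %s) { name kind description fields { name } } }"
        % json.dumps(type_name)
    )


def field_names(response):
    """Return the field names from a __type response, or [] if none."""
    type_info = response["data"]["__type"]
    if type_info is None or type_info.get("fields") is None:
        return []
    return [field["name"] for field in type_info["fields"]]
```

For example, `type_query("Document")` produces a single-line equivalent of the query shown above, and `field_names` reduces the server's reply to a plain list of names.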
docker-compose up
GraphDOM will be running at http://localhost:8080 and the endpoint below will be available. The Playground will also be running at http://localhost:8080/graphql if NODE_ENV is development.
http://localhost:8080/graphql
The 'ping' query is useful to check whether GraphDOM works.
curl \
  -H 'Content-Type: application/json' \
  -X POST http://localhost:8080/graphql \
  -d '{"query":"{ping}"}'
The request should receive the following response if GraphDOM is working properly.
{
  "data": {
    "ping": "pong"
  }
}
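The same check can be scripted, for instance as a readiness probe before running queries. The following is a minimal sketch using only the Python standard library; the function names `build_ping_request`, `is_pong`, and `graphdom_is_up` are hypothetical, and the default endpoint and 5-second timeout are assumptions.

```python
import json
import urllib.request


def build_ping_request(endpoint="http://localhost:8080/graphql"):
    """Build the same POST request that the curl command sends."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"query": "{ping}"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_pong(body):
    """Return True if a response body matches the expected ping reply."""
    try:
        return json.loads(body)["data"]["ping"] == "pong"
    except (ValueError, KeyError, TypeError):
        # Not JSON, or JSON without the expected data.ping field.
        return False


def graphdom_is_up(endpoint="http://localhost:8080/graphql", timeout=5):
    """Send the ping query; any network or HTTP error counts as 'not up'."""
    try:
        request = build_ping_request(endpoint)
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            return is_pong(reply.read().decode("utf-8"))
    except OSError:
        return False
```

Keeping request construction and response parsing in separate functions lets both be checked without a running server.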
The following environment variables are available; every variable is optional.
- NODE_ENV: development or production. (Defaults to development)
- SERVER_PORT: Port that GraphDOM listens on. (Defaults to 8080)
- LOG_LEVEL: DEBUG, INFO, WARN, ERROR or TRACE. (Defaults to INFO)
- APOLLO_API_KEY: API key for the Apollo Graph Manager.
- APOLLO_SCHEMA_TAG: Tag name of a GraphQL schema.
- BROWSER_PATH: Path to a browser. (Detected automatically by default)
- BROWSER_HEADLESS: Whether to launch the browser in headless mode. (Defaults to true)
- QUERY_COMPLEXITY_LIMIT: Maximum allowed complexity for a query. (Defaults to 15)
- QUERY_DEPTH_LIMIT: Maximum allowed depth for a query. (Defaults to 5)
- REDIS_URL: URL used to connect to Redis. If this variable is not set, GraphDOM uses an in-memory cache.
See .env.example for more details on the variables.
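A client script that shares its environment with GraphDOM might resolve the same variables with the documented defaults, for example to work out which local port to call. The sketch below is illustrative only and does not describe how GraphDOM itself reads its configuration; `resolve_settings` and `local_endpoint` are hypothetical names, and the /graphql path on localhost is an assumption taken from the Docker example.

```python
import os

# Defaults as documented above; variables without a documented
# default (API keys, paths, Redis URL) are deliberately left out.
DEFAULTS = {
    "NODE_ENV": "development",
    "SERVER_PORT": "8080",
    "LOG_LEVEL": "INFO",
    "BROWSER_HEADLESS": "true",
    "QUERY_COMPLEXITY_LIMIT": "15",
    "QUERY_DEPTH_LIMIT": "5",
}


def resolve_settings(env=None):
    """Overlay the given environment on the documented defaults."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


def local_endpoint(env=None):
    """Build the local GraphQL endpoint URL from SERVER_PORT."""
    return "http://localhost:%s/graphql" % resolve_settings(env)["SERVER_PORT"]
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to try out, e.g. `local_endpoint({"SERVER_PORT": "9000"})`.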