POST /api/v1/scrape
Scrape webpage
curl --request POST \
  --url https://app.dumplingai.com/api/v1/scrape \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://example.com/article",
  "format": "markdown",
  "cleaned": true,
  "renderJs": true
}'
{
  "title": "<string>",
  "url": "<string>",
  "content": "<string>",
  "metadata": {}
}

Description

This endpoint scrapes the page at a specified URL and returns its content in the requested format (markdown, html, or screenshot), optionally cleaning the content first.

Endpoint

POST https://app.dumplingai.com/api/v1/scrape

Headers

  • Content-Type: application/json
  • Authorization: Bearer <API_KEY> (required)

Request Body

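{
  "url": "string", // Required. The URL to scrape.
  "format": "string", // Optional. The format of the output. Valid values: "markdown", "html", "screenshot". Default is "markdown".
  "cleaned": "boolean", // Optional. Whether to return cleaned/simplified content. Default is true.
  "renderJs": "boolean", // Optional. Whether to render JavaScript before scraping. Default is true.
  "requestSource": "string" // Optional. Identifier describing where the request originated. See Body for valid values.
}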
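
The request body can be assembled and checked on the client before it is sent. The following is a minimal Python sketch, not an official client: the helper name build_scrape_payload is hypothetical, the defaults mirror those listed in the Body reference on this page, and the restriction to absolute http(s) URLs is an assumption rather than a documented rule.

```python
import json
from urllib.parse import urlparse

# Output formats listed on this page.
VALID_FORMATS = {"markdown", "html", "screenshot"}


def build_scrape_payload(url, fmt="markdown", cleaned=True, render_js=True):
    """Return the JSON request body as a dict (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are useful to scrape.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL: {url!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}: {fmt!r}")
    return {
        "url": url,
        "format": fmt,
        "cleaned": bool(cleaned),
        "renderJs": bool(render_js),
    }


if __name__ == "__main__":
    # Print the body that would be sent for a typical request.
    print(json.dumps(build_scrape_payload("https://example.com/article"), indent=2))
```

Validating locally avoids sending a request that is likely to be rejected, which matters when each request consumes a credit.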

Responses

Success (200)

Returns the scraped data in the specified format.
{
  "title": "string",
  "metadata": "object",
  "url": "string",
  "format": "string", // "markdown", "html", "screenshot"
  "cleaned": "boolean",
  "content": "string"
}
Response headers:
  • Content-Type: application/json
  • X-RateLimit-Limit: The rate limit that applies to the user.
  • X-RateLimit-Remaining: The number of requests the user has remaining.

Example Request

curl -X POST https://app.dumplingai.com/api/v1/scrape \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com",
    "format": "markdown",
    "cleaned": true,
    "renderJs": true
  }'
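
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The function names make_scrape_request and scrape are hypothetical, YOUR_API_KEY is a placeholder, and the 30-second timeout is an arbitrary choice. This page does not document error responses, so the sketch simply lets urllib raise urllib.error.HTTPError on a non-2xx status.

```python
import json
import urllib.request

SCRAPE_URL = "https://app.dumplingai.com/api/v1/scrape"


def make_scrape_request(api_key, payload):
    """Build (but do not send) the POST request for the scrape endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def scrape(api_key, payload, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    req = make_scrape_request(api_key, payload)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    result = scrape(
        "YOUR_API_KEY",
        {"url": "https://example.com", "format": "markdown",
         "cleaned": True, "renderJs": True},
    )
    print(result.get("title"))
```

Separating request construction from sending makes the headers and body easy to inspect without spending a credit.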

Rate Limiting

Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
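
A client can read these two headers to decide whether to slow down. The sketch below is illustrative only: parse_rate_limit is a hypothetical helper, and it assumes both header values are plain integers, which this page does not state explicitly. Values that are missing or not numeric are returned as None.

```python
def parse_rate_limit(headers):
    """Extract rate limit information from a mapping of response headers.

    Header names are matched case-insensitively.
    """
    lowered = {key.lower(): value for key, value in headers.items()}

    def to_int(name):
        raw = lowered.get(name)
        try:
            return int(raw)
        except (TypeError, ValueError):
            # Header absent, or not an integer (assumption violated).
            return None

    return {
        "limit": to_int("x-ratelimit-limit"),
        "remaining": to_int("x-ratelimit-remaining"),
    }


if __name__ == "__main__":
    sample = {"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "7"}
    status = parse_rate_limit(sample)
    if status["remaining"] is not None and status["remaining"] < 10:
        print("Approaching the rate limit:", status)
```

The sample header values above are made up for illustration; real limits depend on the account.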

Notes

  • This endpoint uses 1 credit per request.
  • If the page does not need JavaScript to render its content, set renderJs to false for faster results.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url
string<uri>
required

The URL to scrape

format
enum<string>
default:markdown

Output format for the scraped content

Available options: markdown, html, screenshot

cleaned
boolean
default:true

Whether to return cleaned/simplified content

renderJs
boolean
default:true

Whether to execute JavaScript on the page before scraping

requestSource
enum<string>

Optional identifier describing where the API request originated.

Available options: API, WEB, MAKE_DOT_COM, ZAPIER, N8N, PLAYGROUND, DEFAULT_AUTOMATION, AGENT_PREVIEW, AGENT_LIVE, AUTOPILOT, STUDIO

Response

Scraped content returned.

title
string

Page title

url
string<uri>

Final URL after redirects

content
string

Scraped content in requested format

metadata
object

Additional page metadata
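
The response fields listed above map naturally onto a small typed structure. The following Python sketch is one possible way to parse the body; ScrapeResult and parse_scrape_response are hypothetical names, and treating absent fields as empty values is an assumption, since this page does not say whether every field is always present.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ScrapeResult:
    title: str      # Page title
    url: str        # Final URL after redirects
    content: str    # Scraped content in the requested format
    metadata: dict = field(default_factory=dict)  # Additional page metadata


def parse_scrape_response(body):
    """Parse a JSON response body (str or bytes) into a ScrapeResult."""
    data = json.loads(body)
    return ScrapeResult(
        title=data.get("title", ""),
        url=data.get("url", ""),
        content=data.get("content", ""),
        metadata=data.get("metadata") or {},
    )


if __name__ == "__main__":
    # Made-up response body, for illustration only.
    sample = '{"title": "Example", "url": "https://example.com/", "content": "# Example", "metadata": {}}'
    print(parse_scrape_response(sample))
```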
