Crawl website

POST /api/v1/crawl

Description

This endpoint crawls a website and returns structured content from multiple pages.

Endpoint

POST https://app.dumplingai.com/api/v1/crawl

Headers

  • Content-Type: application/json
  • Authorization: Bearer <API_KEY> (required)

Request Body

{
  "url": "string", // Required. The website URL to crawl
  "limit": "number", // Optional. Max pages to crawl (default: 5)
  "depth": "number", // Optional. Crawl depth (default: 2)
  "format": "string" // Optional. Output format: "markdown", "text", or "raw" (default: "markdown")
}

Responses

Success (200)

{
  "url": "string",
  "format": "string",
  "depth": "number",
  "limit": "number",
  "pages": "number",
  "results": [
    {
      "content": "string",
      "url": "string",
      "status": "number"
    }
  ],
  "creditUsage": "number"
}
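
For illustration only, the success schema above can be written as Python type hints. This is a sketch, not an official SDK type, and the exact numeric types (for example, whether creditUsage can be fractional) are assumptions.

from typing import List, TypedDict

class CrawlResult(TypedDict):
    content: str  # page content in the requested format
    url: str      # URL of the crawled page
    status: int   # HTTP status code returned for that page

class CrawlResponse(TypedDict):
    url: str
    format: str
    depth: int
    limit: int
    pages: int
    results: List[CrawlResult]
    creditUsage: int  # assumed integer, since 1 credit is used per crawled page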

Example Request

curl -X POST https://app.dumplingai.com/api/v1/crawl \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com",
    "limit": 10,
    "depth": 3,
    "format": "markdown"
  }'
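
The same request can be made from Python with the requests library. This is a minimal sketch, assuming the API key is stored in an environment variable named DUMPLING_API_KEY (that name is illustrative, not part of the API).

import os
import requests

api_key = os.environ["DUMPLING_API_KEY"]  # illustrative env var name

response = requests.post(
    "https://app.dumplingai.com/api/v1/crawl",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
    json={
        "url": "https://example.com",
        "limit": 10,
        "depth": 3,
        "format": "markdown",
    },
    timeout=300,  # crawls can take a while; this timeout value is arbitrary
)
response.raise_for_status()
data = response.json()

# Walk the per-page results described in the Success (200) schema.
for page in data["results"]:
    print(page["url"], page["status"], len(page["content"]))
print("Credits used:", data["creditUsage"])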

Notes

  • Uses 1 credit per crawled page
  • Uses anti-bot measures and stealth crawling techniques
  • Limit is the max number of pages to crawl
  • Depth refers to the path distance between the base URL and crawled sub-paths (for example, with a depth of 2, https://example.com/a/b is in range but https://example.com/a/b/c is not)

Rate Limiting

Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response.
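
A rough sketch of checking these headers from the Python example above (response is the requests.Response object); the back-off interval is arbitrary, since no reset window is documented here.

import time

limit = int(response.headers.get("X-RateLimit-Limit", 0))
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
print(f"Rate limit: {remaining}/{limit} requests remaining")

if remaining == 0:
    # Pause before the next crawl; adjust to your own needs.
    time.sleep(60)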

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body (application/json)

Parameters controlling a crawl job.

url (string<uri>, required)

Root URL to crawl.

depth (integer, default: 2)

Maximum crawl depth. Required range: x >= 1.

limit (integer, default: 5)

Maximum number of pages to fetch. Required range: x >= 1.

format (enum<string>, default: markdown)

Output format for the crawled pages. Available options: markdown, text, raw.

requestSource (string)

Optional request source identifier.

Response

Crawl job accepted or results returned. The response body is a flexible JSON structure; fields differ per endpoint.