POST /api/v1/extract
Extract structured content
curl --request POST \
  --url https://app.dumplingai.com/api/v1/extract \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://example.com/product-page",
  "schema": {
    "title": "string",
    "description": "string",
    "price": "number",
    "rating": "number"
  }
}'
{
  "screenshotUrl": "<string>",
  "results": {}
}

Description

This endpoint extracts structured data from a specified URL based on a user-defined schema. It uses visual and textual information from the webpage to extract the requested data.

Endpoint

POST https://app.dumplingai.com/api/v1/extract

Headers

  • Content-Type: application/json
  • Authorization: Bearer <API_KEY> (required)

Request Body

{
  "url": "string", // Required. The URL to extract data from.
  "schema": "object" // Required. The schema defining the data to extract.
}
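As an illustration of the headers and body described above, here is a minimal Python sketch that assembles the request with the standard library. The helper name build_extract_request is hypothetical and not part of any official client; the function only builds the request, and sending it is left to the caller.

```python
import json
import urllib.request

API_URL = "https://app.dumplingai.com/api/v1/extract"


def build_extract_request(api_key: str, url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the extract endpoint.

    Hypothetical helper for illustration; only the endpoint URL, headers,
    and the two body fields come from the documentation.
    """
    body = json.dumps({"url": url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


if __name__ == "__main__":
    req = build_extract_request(
        "YOUR_API_KEY",
        "https://example.com",
        {"title": "string", "price": "number"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (uses 25 credits per the notes in this page):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes the payload easy to inspect before any credits are spent.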

Responses

Success (200)

Returns the extracted data based on the provided schema, along with a URL to the screenshot of the page.
{
  "screenshotUrl": "string", // URL of the captured screenshot
  "results": "object" // Extracted data matching the provided schema
}
Response headers:
  • Content-Type: application/json
  • X-RateLimit-Limit: The rate limit for the user.
  • X-RateLimit-Remaining: The number of requests the user has remaining.

Bad Request (400)

Returned if the request is invalid.
{
  "error": "Error message describing the issue"
}

Internal Server Error (500)

Returned if there’s an error during the extraction process.
{
  "error": "Failed to extract URL: [error details]"
}
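The three documented outcomes (200, 400, 500) can be handled with one small dispatcher. The sketch below assumes the body is always JSON in the shapes shown above; the function name interpret_response and the "retryable" flag are illustrative choices, not behavior the API documents.

```python
import json


def interpret_response(status: int, body: str) -> dict:
    """Map a status code and raw JSON body to a simple result dict.

    Assumption: treating 5xx as retryable and 4xx as not is a client-side
    convention chosen for this sketch, not something the API specifies.
    """
    payload = json.loads(body)
    if status == 200:
        return {
            "ok": True,
            "results": payload["results"],
            "screenshot_url": payload["screenshotUrl"],
        }
    return {
        "ok": False,
        "retryable": status >= 500,
        "error": payload.get("error", "unknown error"),
    }


if __name__ == "__main__":
    print(interpret_response(400, '{"error": "Error message describing the issue"}'))
```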

Example Request

curl -X POST https://app.dumplingai.com/api/v1/extract \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
  "url": "https://example.com",
  "schema": {
    "title": "string",
    "description": "string",
    "price": "number",
    "rating": "number"
  }
}'

Example Response

{
  "screenshotUrl": "https://storage.example.com/screenshots/abcdef123456.png",
  "results": {
    "title": "Example Product",
    "description": "This is an example product description.",
    "price": 29.99,
    "rating": 4.5
  }
}
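Because extraction is model-driven, it may be worth checking that the returned results actually match the types requested in the schema. The following sketch does this for the two type names used in the examples on this page ("string" and "number"); the helper mismatched_fields is hypothetical, and whether the API itself guarantees type conformance is not stated here.

```python
# Type names seen in this page's examples; other names are skipped, not rejected.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def mismatched_fields(schema: dict, results: dict) -> list:
    """Return schema fields that are missing from results or have the wrong type."""
    problems = []
    for field, type_name in schema.items():
        if field not in results:
            problems.append(field)
            continue
        check = TYPE_CHECKS.get(type_name)
        if check is not None and not check(results[field]):
            problems.append(field)
    return problems


if __name__ == "__main__":
    schema = {"title": "string", "description": "string", "price": "number", "rating": "number"}
    results = {
        "title": "Example Product",
        "description": "This is an example product description.",
        "price": 29.99,
        "rating": 4.5,
    }
    print(mismatched_fields(schema, results))
```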

Notes

  • The extraction process uses both visual (screenshot) and textual (HTML) information from the webpage.
  • The extracted data is limited to a single object. Extracting lists of objects may require separate handling.
  • The maximum allowed content for extraction is 100,000 tokens. If the content exceeds this limit, it will be truncated.
  • If the URL doesn’t include a protocol, https:// will be used by default.
  • This endpoint uses 25 credits per request.
  • The extraction is performed using a combination of web scraping and AI-powered analysis (Claude 3 Haiku model).
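Two of the notes above lend themselves to small client-side helpers: the https:// default and the per-request credit cost. This sketch mirrors them locally; the exact rule the server uses to detect a missing protocol is not documented, so the scheme check below is an assumption, and both function names are hypothetical.

```python
import re

CREDITS_PER_REQUEST = 25  # from the notes above

# Assumption: a URL "includes a protocol" if it starts with "<scheme>://".
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def with_default_scheme(url: str) -> str:
    """Prefix https:// when the URL has no scheme, mirroring the documented default."""
    return url if _SCHEME_RE.match(url) else "https://" + url


def credits_for(request_count: int) -> int:
    """Estimate credits used by a batch of extract requests."""
    return request_count * CREDITS_PER_REQUEST


if __name__ == "__main__":
    print(with_default_scheme("example.com/product-page"))
    print(credits_for(4))
```

Normalizing the URL before sending is optional, since the endpoint applies the default itself, but it keeps logs and cache keys consistent.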

Rate Limiting

Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
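A client can read these two headers after each call to decide whether to slow down. The sketch below assumes the header values are plain integers, which this page does not state explicitly; parse_rate_limit is a hypothetical helper name.

```python
def parse_rate_limit(headers: dict) -> dict:
    """Extract rate limit values from response headers (case-insensitive).

    Assumption: values are integer strings; anything else yields None.
    """
    lowered = {key.lower(): value for key, value in headers.items()}

    def to_int(name: str):
        try:
            return int(lowered.get(name))
        except (TypeError, ValueError):
            return None

    return {
        "limit": to_int("x-ratelimit-limit"),
        "remaining": to_int("x-ratelimit-remaining"),
    }


if __name__ == "__main__":
    status = parse_rate_limit({"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "3"})
    if status["remaining"] is not None and status["remaining"] < 5:
        print("Close to the rate limit; consider pausing before the next request.")
```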

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
url (string<uri>, required)

Fully-qualified URL to extract content from.

schema (object, required)

JSON schema describing the structure of the extracted data.

requestSource (enum<string>)

Optional identifier describing where the API request originated.

Available options: API, WEB, MAKE_DOT_COM, ZAPIER, N8N, PLAYGROUND, DEFAULT_AUTOMATION, AGENT_PREVIEW, AGENT_LIVE, AUTOPILOT, STUDIO

Response

Extracted content returned.

screenshotUrl (string<uri>, required)

Temporary URL pointing to the captured screenshot of the source page.

results (object, required)

Data extracted according to the provided schema.
