Extract

Description

This endpoint extracts structured data from a specified URL based on a user-defined schema. It uses visual and textual information from the webpage to extract the requested data.

Endpoint

POST https://app.dumplingai.com/api/v1/extract

Headers

Content-Type: application/json
Authorization: Bearer <API_KEY> (required)

Request Body

{
  "url": "string", // Required. The URL to extract data from.
  "schema": "object" // Required. The schema defining the data to extract.
}
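
Both body fields are required. The minimal Python sketch below checks for empty values on the client before a request is sent. The helper name `build_extract_body` is hypothetical and not part of any official SDK. The server still performs its own validation and responds with 400 if the request is invalid.

```python
from typing import Any


def build_extract_body(url: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Assemble the JSON body for POST /api/v1/extract (hypothetical helper).

    Both fields are documented as required, so obviously invalid values
    are rejected locally instead of costing a round trip.
    """
    if not isinstance(url, str) or not url.strip():
        raise ValueError("'url' is required and must be a non-empty string")
    if not isinstance(schema, dict) or not schema:
        raise ValueError("'schema' is required and must be a non-empty object")
    return {"url": url.strip(), "schema": schema}


body = build_extract_body("https://example.com", {"title": "string", "price": "number"})
print(body)
```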

Responses

Success (200)

Returns the data extracted according to the provided schema, along with a URL to a screenshot of the page.

{
  "screenshotUrl": "string", // URL of the captured screenshot
  "results": "object" // Extracted data matching the provided schema
}

Response Headers

Content-Type: application/json
X-RateLimit-Limit: The user's rate limit.
X-RateLimit-Remaining: The number of requests remaining for the user.
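
A success response carries the two documented fields, `screenshotUrl` and `results`. This sketch decodes the body into a small typed object. `ExtractResult` and `parse_extract_response` are hypothetical names. The check for missing fields is a defensive assumption, not documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ExtractResult:
    """Typed view of a 200 response (hypothetical client type)."""
    screenshot_url: str
    results: dict[str, Any]


def parse_extract_response(raw: str) -> ExtractResult:
    """Decode the JSON body and confirm both documented fields are present."""
    data = json.loads(raw)
    if "screenshotUrl" not in data or "results" not in data:
        raise ValueError("response is missing 'screenshotUrl' or 'results'")
    return ExtractResult(screenshot_url=data["screenshotUrl"], results=data["results"])


sample = '{"screenshotUrl": "https://storage.example.com/s.png", "results": {"title": "Example Product"}}'
result = parse_extract_response(sample)
print(result.results["title"])  # Example Product
```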

Bad Request (400)

Returned if the request is invalid.

{
  "error": "Error message describing the issue"
}

Internal Server Error (500)

Returned if there’s an error during the extraction process.

{
  "error": "Failed to extract URL: [error details]"
}
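
Both error responses use the same shape: a JSON object with an `error` string. The sketch below shows client-side handling and assumes the HTTP status code and the raw body text are already available. `ExtractError` and `raise_for_extract_error` are hypothetical names. Treating 500 as retryable and 400 as final is an assumption, not documented policy.

```python
import json


class ExtractError(Exception):
    """Raised for non-200 responses (hypothetical client exception)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        # Assumption: a 400 means the request itself must be fixed, while a
        # 500 (extraction failure) might succeed on a later attempt.
        self.retryable = status >= 500


def raise_for_extract_error(status: int, body: str) -> None:
    """Return on 200; otherwise raise ExtractError carrying the 'error' message."""
    if status == 200:
        return
    try:
        message = json.loads(body).get("error", body)
    except (json.JSONDecodeError, AttributeError):
        # Body was not a JSON object; fall back to the raw text.
        message = body
    raise ExtractError(status, message)
```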

Example Request

curl -X POST https://app.dumplingai.com/api/v1/extract \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
  "url": "https://example.com",
  "schema": {
    "title": "string",
    "description": "string",
    "price": "number",
    "rating": "number"
  }
}'
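
The same request can also be sent with Python's standard library. This sketch uses `urllib.request`. The function names and the 60-second timeout are assumptions. For 400 and 500 responses, `urlopen` raises `urllib.error.HTTPError`, and the error's `.read()` still returns the documented `error` body.

```python
import json
import urllib.request

API_URL = "https://app.dumplingai.com/api/v1/extract"


def make_extract_request(api_key: str, url: str, schema: dict) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    payload = json.dumps({"url": url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract(api_key: str, url: str, schema: dict) -> dict:
    """Send the request and return the decoded JSON body (timeout is an assumption)."""
    req = make_extract_request(api_key, url, schema)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


schema = {"title": "string", "description": "string", "price": "number", "rating": "number"}
req = make_extract_request("YOUR_API_KEY", "https://example.com", schema)
# data = extract("YOUR_API_KEY", "https://example.com", schema)  # performs the network call
```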

Example Response

{
  "screenshotUrl": "https://storage.example.com/screenshots/abcdef123456.png",
  "results": {
    "title": "Example Product",
    "description": "This is an example product description.",
    "price": 29.99,
    "rating": 4.5
  }
}
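
Because `results` is meant to mirror the schema, a client can check the response before using it. This sketch assumes the schema uses simple type names such as `"string"` and `"number"`, as in the example request. Other schema forms are not covered here, so the sketch only checks that fields with unrecognized type names are present. The function name `schema_mismatches` is hypothetical.

```python
from typing import Any, Callable

# Assumed mapping from the type names used in the example schema to Python checks.
TYPE_CHECKS: dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def schema_mismatches(schema: dict[str, str], results: dict[str, Any]) -> list[str]:
    """Return schema fields that are missing from results or have the wrong type."""
    problems = []
    for field, type_name in schema.items():
        check = TYPE_CHECKS.get(type_name)
        if field not in results or (check is not None and not check(results[field])):
            problems.append(field)
    return problems


example_schema = {"title": "string", "description": "string", "price": "number", "rating": "number"}
example_results = {
    "title": "Example Product",
    "description": "This is an example product description.",
    "price": 29.99,
    "rating": 4.5,
}
print(schema_mismatches(example_schema, example_results))  # []
```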

Notes

The extraction process uses both visual (screenshot) and textual (HTML) information from the webpage.
Each request returns a single results object. Extracting a list of objects may require separate handling.
Page content is limited to 100,000 tokens; content beyond this limit is truncated.
If the URL doesn’t include a protocol, https:// will be used by default.
This endpoint uses 25 credits per request.
Extraction combines web scraping with AI-powered analysis using the Claude 3 Haiku model.
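
A batch job can apply two of these notes on the client side. First, it can prepend `https://` to URLs that lack a protocol. The server already does this, so the step only keeps logs and deduplication consistent. Second, it can budget 25 credits per request. The helper names are hypothetical. The estimate assumes every request is billed at the flat rate, since these notes do not say whether failed requests consume credits.

```python
CREDITS_PER_REQUEST = 25  # documented cost per extract request


def normalize_url(url: str) -> str:
    """Prepend https:// when the URL has no protocol, matching the documented default."""
    url = url.strip()
    return url if "://" in url else f"https://{url}"


def estimate_credits(urls: list[str]) -> int:
    """Credits for one request per distinct normalized URL (assumes flat billing)."""
    unique = {normalize_url(u) for u in urls}
    return len(unique) * CREDITS_PER_REQUEST


batch = ["example.com", "https://example.com", "http://example.org/page"]
print([normalize_url(u) for u in batch])
print(estimate_credits(batch))  # 50
```

Deduplicating after normalization means `example.com` and `https://example.com` are counted once, because the API would treat them as the same page.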

Rate Limiting

Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
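
Clients can read these headers after each call to pace their requests. The sketch below assumes both headers carry plain integer values. It looks the headers up case-insensitively, as HTTP header names are case-insensitive. The function name `rate_limit_status` is hypothetical.

```python
from typing import Mapping, Optional, Tuple


def rate_limit_status(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (limit, remaining) from the X-RateLimit-* headers; None if absent or non-numeric."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def to_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is None or not value.strip().isdigit():
            return None
        return int(value.strip())

    return to_int("x-ratelimit-limit"), to_int("x-ratelimit-remaining")


limit, remaining = rate_limit_status({"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "3"})
print(limit, remaining)  # 100 3
```

With `urllib.request`, the response's `headers` object provides `items()`, so it can be passed in directly.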
