Endpoints
Extract
Description
This endpoint extracts structured data from a specified URL based on a user-defined schema. It uses visual and textual information from the webpage to extract the requested data.
Endpoint
POST /api/v1/extract
Headers
- Content-Type: application/json
- Authorization: Bearer <API_KEY> (required)
Request Body
{
"url": "string", // Required. The URL to extract data from.
"schema": "object" // Required. The schema defining the data to extract.
}
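As a rough illustration of the headers and body described above, the following Python sketch builds the request with the standard library only. The helper name `build_extract_request` and its client-side checks are assumptions made for this example, not part of the API; the endpoint URL is the one used in the example request in this document.

```python
import json
import urllib.request

# Endpoint URL as it appears in the example request in this document.
EXTRACT_URL = "https://app.dumplingai.com/api/v1/extract"


def build_extract_request(api_key: str, url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the extract endpoint.

    Hypothetical helper: the checks below are client-side assumptions that
    mirror the "Required" markers in the request body description.
    """
    if not url:
        raise ValueError("url is required")
    if not isinstance(schema, dict):
        raise ValueError("schema is required and must be an object")
    body = json.dumps({"url": url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_extract_request(
    "YOUR_API_KEY",
    "https://example.com",
    {"title": "string", "price": "number"},
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the sketch easy to inspect without making a network call or spending credits.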
Responses
Success (200)
Returns the extracted data based on the provided schema, along with a URL to the screenshot of the page.
{
"screenshotUrl": "string", // URL of the captured screenshot
"results": "object" // Extracted data matching the provided schema
}
- Content-Type: application/json
- X-RateLimit-Limit: The user's rate limit.
- X-RateLimit-Remaining: The number of requests the user has left under that limit.
Bad Request (400)
Returned if the request is invalid.
{
"error": "Error message describing the issue"
}
Internal Server Error (500)
Returned if there’s an error during the extraction process.
{
"error": "Failed to extract URL: [error details]"
}
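The three response shapes above could be handled on the client roughly as follows. This is a minimal sketch: `ExtractError` and `parse_extract_response` are hypothetical names, and falling back to the raw body when the error payload is not JSON is an assumption, since the document only describes the 200, 400, and 500 cases.

```python
import json


class ExtractError(Exception):
    """Hypothetical exception carrying the status code and the "error" message."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def parse_extract_response(status: int, body: str) -> dict:
    """Return the parsed payload on 200; raise ExtractError otherwise."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None  # Assumption: some failures may not carry a JSON body.
    if status == 200 and isinstance(payload, dict):
        return payload
    if isinstance(payload, dict) and "error" in payload:
        message = str(payload["error"])
    else:
        message = body
    raise ExtractError(status, message)
```

A caller can then branch on `exc.status` to tell an invalid request (400) from an extraction failure (500).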
Example Request
curl -X POST https://app.dumplingai.com/api/v1/extract \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"url": "https://example.com",
"schema": {
"title": "string",
"description": "string",
"price": "number",
"rating": "number"
}
}'
Example Response
{
"screenshotUrl": "https://storage.example.com/screenshots/abcdef123456.png",
"results": {
"title": "Example Product",
"description": "This is an example product description.",
"price": 29.99,
"rating": 4.5
}
}
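Because the results are produced by AI-powered analysis, a caller may want to check them against the schema it sent. The sketch below does this for the two type names used in the example request ("string" and "number"); the helper `mismatched_fields` is hypothetical, and any other type name is simply not checked, since this document does not list the supported types.

```python
# Checks for the two type names that appear in the example request.
# Booleans are excluded from "number" because bool is a subclass of int in Python.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def mismatched_fields(schema: dict, results: dict) -> list:
    """Return schema fields that are missing from results or have the wrong type."""
    problems = []
    for field, type_name in schema.items():
        check = TYPE_CHECKS.get(type_name)
        if field not in results:
            problems.append(field)
        elif check is not None and not check(results[field]):
            problems.append(field)
    return problems


schema = {"title": "string", "description": "string", "price": "number", "rating": "number"}
results = {
    "title": "Example Product",
    "description": "This is an example product description.",
    "price": 29.99,
    "rating": 4.5,
}
problems = mismatched_fields(schema, results)  # empty list for the example response
```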
Notes
- The extraction process uses both visual (screenshot) and textual (HTML) information from the webpage.
- The extracted data is limited to a single object. Extracting lists of objects may require separate handling.
- The maximum allowed content for extraction is 100,000 tokens. If the content exceeds this limit, it will be truncated.
- If the URL doesn’t include a protocol, https:// will be used by default.
- This endpoint uses 25 credits per request.
- The extraction is performed using a combination of web scraping and AI-powered analysis (Claude 3 Haiku model).
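Two of the notes above lend themselves to small client-side helpers: the default protocol and the per-request credit cost. The sketch below is illustrative only. Both function names are hypothetical, and detecting a protocol by looking for "://" is an assumption, because the document does not say how the server decides that a protocol is missing.

```python
CREDITS_PER_REQUEST = 25  # Per-request cost stated in the notes.


def with_default_protocol(url: str) -> str:
    """Prefix https:// when the URL has no protocol, mirroring the documented default."""
    if "://" in url:  # Assumption: a protocol is present if "://" appears.
        return url
    return "https://" + url


def credits_needed(num_requests: int) -> int:
    """Estimate the credits consumed by a batch of extract requests."""
    return num_requests * CREDITS_PER_REQUEST
```

For instance, `with_default_protocol("example.com")` gives `"https://example.com"`, and a batch of 4 pages would need `credits_needed(4)`, that is 100 credits.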
Rate Limiting
Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
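A client might read these two headers along the following lines. This sketch assumes the header values are plain integers, which the document does not state explicitly, and `read_rate_limit` is a hypothetical helper name. Header names are matched case-insensitively, as HTTP header names are not case-sensitive.

```python
from typing import Mapping, Optional, Tuple


def read_rate_limit(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (limit, remaining) from response headers, or None where absent or non-numeric."""
    lowered = {name.lower(): value for name, value in headers.items()}

    def to_int(name: str) -> Optional[int]:
        try:
            return int(lowered[name])
        except (KeyError, ValueError):
            return None

    return to_int("x-ratelimit-limit"), to_int("x-ratelimit-remaining")


limit, remaining = read_rate_limit(
    {"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "97"}
)
```

When `remaining` reaches 0, a caller would typically pause before sending further requests; how long to wait is not specified here.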