Scrape
Use this feature when you need either the content of a single URL or structured fields extracted from that page.
What it does
- Fetches a page by URL
- Returns content as markdown, HTML, or screenshot output
- Can clean the page output for easier downstream use
- Can render JavaScript for sites that need it
- Can also return structured JSON when you switch from `scrape` to `extract`
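
The options listed above map naturally onto a single request body. The following is a minimal sketch, not the documented API: the endpoint URL, the `Bearer` authorization scheme, and the field names `format`, `cleaned`, and `renderJs` are all assumptions made for illustration, so check the DumplingAI API reference for the real names. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical values: replace with the real endpoint and your own API key.
API_URL = "https://api.example.com/v1/scrape"
API_KEY = "YOUR_API_KEY"

# The three output types named in the feature list.
ALLOWED_FORMATS = {"markdown", "html", "screenshot"}


def build_scrape_request(url, output_format="markdown", clean=True, render_js=False):
    """Build (but do not send) a POST request for a single-page scrape."""
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_format!r}")
    # Field names below are assumptions, not confirmed API parameters.
    payload = {
        "url": url,
        "format": output_format,
        "cleaned": clean,
        "renderJs": render_js,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_scrape_request("https://example.com/blog/post", render_js=True)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction separate from sending makes the payload easy to inspect before spending any API credits.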
Common use cases
- Pull article content for summarization
- Capture product pages for structured extraction
- Clean blog posts before sending them to an LLM
- Turn live web pages into inputs for automations
- Extract fields like product title, price, or availability as JSON
Why use it
This is the core web extraction workflow: start with raw page content, then move to structured extraction when you need JSON instead of text.
Scrape vs structured extraction
- Use `scrape` when you want the page content back as markdown, HTML, or screenshot output
- Use `extract` when you want DumplingAI to return specific fields that match a schema
- In practice, most users start with `scrape` to validate the page, then use `extract` for production workflows
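
To make the schema idea concrete, here is a small sketch of what an extraction request body and a follow-up check might look like. The schema shape, the `url` and `schema` parameter names, and the sample response are hypothetical and chosen only to mirror the product fields mentioned earlier. The actual schema format is defined by the DumplingAI API reference.

```python
import json

# Hypothetical schema: field names mirror the examples in the text.
PRODUCT_SCHEMA = {
    "title": "string",
    "price": "number",
    "availability": "string",
}


def build_extract_payload(url, schema):
    """Return the JSON body for a schema-driven extraction request."""
    # "url" and "schema" are assumed parameter names.
    return json.dumps({"url": url, "schema": schema})


def missing_fields(result, schema):
    """List the schema fields that are absent from an extraction result."""
    return [name for name in schema if name not in result]


if __name__ == "__main__":
    print(build_extract_payload("https://example.com/product/42", PRODUCT_SCHEMA))
    # A made-up response, used only to demonstrate the check.
    sample = {"title": "Desk lamp", "price": 39.5}
    print(missing_fields(sample, PRODUCT_SCHEMA))  # ['availability']
```

A check like `missing_fields` is one way to catch pages where extraction silently drops a field before the result reaches a production workflow.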