Scrape

Use Scrape when you need the content of a single URL, or structured fields extracted from that page.

What it does

  • Fetches a page by URL
  • Returns content as markdown, HTML, or screenshot output
  • Can clean the page output for easier downstream use
  • Can render JavaScript for sites that need it
  • Can also return structured JSON when you switch from scrape to extract
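
The options above can be sketched as a single request builder. This is only an illustration: the endpoint URL, the payload field names (`format`, `cleaned`, `renderJs`), and the bearer-token header are assumptions made for the example, not confirmed details of the DumplingAI API. The request is built but not sent.

```python
import json
from urllib import request

# Hypothetical endpoint, used for illustration only.
SCRAPE_ENDPOINT = "https://api.example.com/v1/scrape"

# The three output types named in the list above.
ALLOWED_FORMATS = {"markdown", "html", "screenshot"}


def build_scrape_request(url, api_key, output_format="markdown",
                         clean=True, render_js=False):
    """Build (but do not send) a POST request for a single-URL scrape.

    All payload field names here are assumed, not taken from the API reference.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_format!r}")

    payload = {
        "url": url,                # page to fetch
        "format": output_format,   # markdown, html, or screenshot
        "cleaned": clean,          # ask for cleaned output
        "renderJs": render_js,     # render JavaScript before capture
    }
    return request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Example: a JavaScript-heavy page returned as cleaned markdown.
req = build_scrape_request("https://example.com/blog/post",
                           api_key="YOUR_API_KEY", render_js=True)
```

Passing `req` to `urllib.request.urlopen` would send it. Check the real endpoint and parameter names against the API reference before doing so.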

Common use cases

  • Pull article content for summarization
  • Capture product pages for structured extraction
  • Clean blog posts before sending them to an LLM
  • Turn live web pages into inputs for automations
  • Extract fields like product title, price, or availability as JSON

Why use it

This is the core web extraction workflow: start with raw page content, then move to structured extraction when you need JSON instead of text.

Scrape vs structured extraction

  • Use scrape when you want the page content back as markdown, HTML, or screenshot output
  • Use extract when you want DumplingAI to return specific fields that match a schema
  • A common pattern is to start with scrape to confirm the page returns the content you expect, then switch to extract for production workflows
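
To make the distinction concrete, here is a minimal sketch of the extract side: a schema describing the fields you want, and a check that a returned JSON object matches it. The schema shape (field name mapped to a Python type), the payload keys, and both helper names are hypothetical and chosen for this example. The sample result is hand-written, not real API output.

```python
# Hypothetical schema: field name -> expected Python type.
PRODUCT_SCHEMA = {
    "title": str,
    "price": float,
    "availability": str,
}


def build_extract_payload(url, schema):
    """Describe an extract job: which page, and which fields to return.

    The payload keys are assumptions made for illustration.
    """
    return {
        "url": url,
        "schema": {name: typ.__name__ for name, typ in schema.items()},
    }


def schema_problems(result, schema):
    """Return the field names that are missing from `result` or have the wrong type."""
    problems = []
    for name, expected_type in schema.items():
        if name not in result or not isinstance(result[name], expected_type):
            problems.append(name)
    return problems


payload = build_extract_payload("https://example.com/product/42", PRODUCT_SCHEMA)

# Hand-written sample of what a structured result might look like.
sample_result = {"title": "Ceramic mug", "price": 12.5, "availability": "in stock"}
print(schema_problems(sample_result, PRODUCT_SCHEMA))  # prints []
```

A check like this is useful when moving from scrape to extract, since a production workflow can reject or retry a result that does not match the schema instead of passing incomplete data downstream.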