
Scrape

Use this feature when you need either the content of a single URL or structured fields extracted from that page.

What it does

  • Fetches a page by URL
  • Returns the page as markdown, HTML, or a screenshot
  • Can clean the page output for easier downstream use
  • Can render JavaScript for sites that need it
  • Can also return structured JSON when you switch from scrape to extract
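The sketch below shows what a single-URL scrape request could look like in Python, using only the standard library. It rests on several assumptions. The endpoint URL, the Bearer-token header, and the body fields `url`, `format`, `cleaned`, and `renderJs` are illustrative, so confirm the real names in the Scrape API reference. Building the request is kept separate from sending it, which means the payload can be inspected without a network call.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the Scrape API reference.
SCRAPE_URL = "https://app.dumplingai.com/api/v1/scrape"

# Output formats listed in "What it does".
VALID_FORMATS = {"markdown", "html", "screenshot"}


def build_scrape_request(api_key: str, url: str, fmt: str = "markdown",
                         cleaned: bool = True,
                         render_js: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for a single-URL scrape.

    The body field names (url, format, cleaned, renderJs) are assumptions.
    """
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}, got {fmt!r}")
    body = {"url": url, "format": fmt, "cleaned": cleaned, "renderJs": render_js}
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def scrape(api_key: str, url: str, **options) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_scrape_request(api_key, url, **options)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Under these assumptions, `scrape(api_key, "https://example.com/blog/post", render_js=True)` would request cleaned markdown from a page that needs JavaScript rendering. The shape of the response is not assumed here, so check the Scrape API reference for it.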

Common use cases

  • Pull article content for summarization
  • Capture product pages for structured extraction
  • Clean blog posts before sending them to an LLM
  • Turn live web pages into inputs for automations
  • Extract fields like product title, price, or availability as JSON
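For the last use case, the fields you want can be described as a JSON Schema. `PRODUCT_SCHEMA` below is a hypothetical example for a product page. `check_product` is a hypothetical helper that checks an extracted object before it flows into an automation. It inspects only the extracted fields and assumes nothing about how DumplingAI wraps its response.

```python
# Hypothetical JSON Schema for the product fields mentioned above.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "availability": {"type": "string"},
    },
    "required": ["title", "price", "availability"],
}

# Map JSON Schema type names to Python types (only the ones used here).
_PY_TYPES = {"string": str, "number": (int, float), "object": dict}


def check_product(data: dict, schema: dict = PRODUCT_SCHEMA) -> list:
    """Return a list of problems; an empty list means the object matches.

    A deliberately small checker covering only `required` and top-level
    property types; a full JSON Schema validator is safer in production.
    """
    problems = []
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field in data:
            value = data[field]
            # bool is a subclass of int in Python; reject it so True is
            # not accepted as a number (it never matches str or dict anyway).
            if isinstance(value, bool) or not isinstance(value, _PY_TYPES[spec["type"]]):
                problems.append(f"wrong type for {field}: expected {spec['type']}")
    return problems
```

For instance, `{"title": "Desk lamp", "price": "24.99"}` yields two problems: the missing `availability` field and a `price` that arrived as a string instead of a number.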

Why use it

This is the core web extraction workflow: start with raw page content, then move to structured extraction when you need JSON instead of text.

Scrape vs structured extraction

  • Use scrape when you want the page content back as markdown, HTML, or a screenshot
  • Use extract when you want DumplingAI to return specific fields that match a schema
  • In practice, most users start with scrape to confirm the page returns the content they expect, then use extract for production workflows
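The scrape-first, extract-later pattern can be sketched as two small steps. First, a cheap check on scraped markdown confirms the page holds what you expect. Second, a schema switches the call from scrape to extract. The base URL, the endpoint paths, and the extract body field `schema` are assumptions for illustration. The real parameter names are in the Scrape API and Extract references.

```python
from typing import Optional, Tuple

# Assumed base URL and endpoint paths; confirm in the API reference.
BASE_URL = "https://app.dumplingai.com/api/v1"


def page_looks_right(markdown: str, must_contain: list) -> bool:
    """Cheap validation on scraped markdown before committing to extract."""
    text = markdown.lower()
    return all(term.lower() in text for term in must_contain)


def plan_request(url: str, schema: Optional[dict] = None) -> Tuple[str, dict]:
    """Return (endpoint, JSON body) for a scrape or an extract call.

    Without a schema, ask for page content (scrape); with a schema, ask
    for fields matching it (extract). Body field names are assumptions.
    """
    if schema is None:
        return f"{BASE_URL}/scrape", {"url": url, "format": "markdown", "cleaned": True}
    return f"{BASE_URL}/extract", {"url": url, "schema": schema}
```

Keeping the validation step separate means a blocked or empty page (for example, an "Access denied" response) can be caught before it reaches a production extraction workflow.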

Scrape API

Parameters, request examples, and response format

Web Scraping Tutorial

Learn how to use scraping in a larger workflow

Extract

Pull structured fields from a page using AI

Crawl

Capture multiple pages from a site instead of just one URL