POST /api/v1/doc-to-text
Convert document to text
curl --request POST \
  --url https://app.dumplingai.com/api/v1/doc-to-text \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "inputMethod": "url",
  "file": "https://example.com/report.pdf",
  "pages": "1-3,5"
}'
{
  "text": "<string>"
}

Description

This endpoint converts PDF or DOCX documents into plain text. It supports input via URL or base64-encoded file content.

Endpoint

POST https://app.dumplingai.com/api/v1/doc-to-text

Headers

  • Content-Type: application/json
  • Authorization: Bearer <API_KEY> (required)

Request Body

{
  "inputMethod": "string", // Required. Either "url" or "base64".
  "file": "string", // Required. URL or base64-encoded file content.
  "pages": "string" // Optional. Specify pages to process.
}
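
As a rough illustration of the request body above, the following Python sketch assembles the JSON for both input methods. The helper names `build_payload` and `encode_file_bytes` are hypothetical and not part of any official SDK, and the sketch assumes that the "base64" method expects plain standard base64 without a data-URI prefix, which the documentation does not state explicitly.

```python
import base64
import json


def build_payload(file_ref, input_method="url", pages=None):
    """Assemble the JSON body for POST /api/v1/doc-to-text (hypothetical helper)."""
    if input_method not in ("url", "base64"):
        raise ValueError('inputMethod must be "url" or "base64"')
    payload = {"inputMethod": input_method, "file": file_ref}
    if pages is not None:
        # The docs say "pages" must be a string, so coerce defensively.
        payload["pages"] = str(pages)
    return payload


def encode_file_bytes(data):
    """Base64-encode raw document bytes (assumed format for the "base64" method)."""
    return base64.b64encode(data).decode("ascii")


url_body = build_payload("https://example.com/report.pdf", pages="1-3,5")
b64_body = build_payload(encode_file_bytes(b"%PDF-1.4 ..."), input_method="base64")
print(json.dumps(url_body, indent=2))
```

Omitting `pages` leaves the key out of the body entirely, which matches the documented default of processing all pages.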

Responses

Success (200)

Returns the extracted text from the document.
{
  "text": "string" // Extracted text content
}

Response headers:
  • X-RateLimit-Limit: The rate limit for the user.
  • X-RateLimit-Remaining: The number of requests the user has remaining.

Bad Request (400)

Returned if the request is invalid or the file format is unsupported.
{
  "error": "Error message describing the issue"
}

Internal Server Error (500)

Returned if an error occurs during document processing.
{
  "error": "Error processing document"
}
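
One possible way to handle the three documented outcomes on the client side is sketched below in Python. `DocToTextError` and `parse_response` are hypothetical names introduced for illustration only; the sketch assumes that error bodies carry the "error" key shown above and falls back to a generic message when they do not.

```python
import json


class DocToTextError(Exception):
    """Hypothetical exception type for this sketch; not part of any SDK."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status, body):
    """Return the extracted text on 200, otherwise raise DocToTextError."""
    try:
        data = json.loads(body)
    except ValueError:
        # A non-JSON body is treated as carrying no details.
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status == 200 and "text" in data:
        return data["text"]
    raise DocToTextError(status, data.get("error", "unknown error"))
```

A caller might treat a 400 as a problem with the request (for example, an unsupported file format) and a 500 as a candidate for a later retry; whether retrying is appropriate is not specified by the documentation.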

Example Request

curl -X POST https://app.dumplingai.com/api/v1/doc-to-text \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
  "inputMethod": "url",
  "file": "https://example.com/sample.pdf"
}'

Example Response

{
  "text": "This is the extracted text content from the document..."
}

Notes

  • Supported file formats: PDF and DOCX.
  • Maximum file size may be limited (refer to your plan details).
  • If using the URL method, ensure the file is publicly accessible.
  • This endpoint uses 2 credits per request.
  • The file type is automatically detected based on the file content.
  • The “pages” field allows you to specify which pages to process:
    • Use comma-separated values or ranges (e.g., “1, 2-” or “1, 2, 3-7”).
    • The first page index is 1.
    • Use “!” before a number to count pages from the end (e.g., “!1” for the last page).
    • If not specified, all pages will be processed by default.
    • The input must be in string format.
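
To make the "pages" syntax above more concrete, here is a small Python sketch that formats a page selection into such a string. The helper `format_pages` and its input conventions are assumptions of this example only: a negative integer stands for an inverted page number, and a tuple whose end is `None` produces an open-ended range like the "2-" shown above, which this sketch assumes means "through the last page". The service's actual parsing rules may differ.

```python
def format_pages(items):
    """Format page selections as a "pages" string (hypothetical helper).

    items may contain:
      - positive ints: a single page (1-based)
      - negative ints: inverted pages, so -1 becomes "!1" (helper convention)
      - (start, end) tuples: a range; end=None leaves the range open
    """
    parts = []
    for item in items:
        if isinstance(item, tuple):
            start, end = item
            parts.append(f"{start}-" if end is None else f"{start}-{end}")
        elif item == 0:
            # The docs state that the first page index is 1.
            raise ValueError("page numbers start at 1")
        elif item < 0:
            parts.append(f"!{-item}")
        else:
            parts.append(str(item))
    return ",".join(parts)


print(format_pages([1, (2, 3), 5]))
```

The result is always a string, in line with the note that the "pages" value must be in string format.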

Rate Limiting

Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
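
A client could read these headers along the following lines. This Python sketch uses a hypothetical helper, `rate_limit_status`, and assumes the header values are plain integers, which the documentation does not confirm; values that cannot be parsed are returned as `None`.

```python
def rate_limit_status(headers):
    """Return (limit, remaining) as ints, or None where a value is missing or non-numeric."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        try:
            return int(lowered.get(name))
        except (TypeError, ValueError):
            return None

    return as_int("x-ratelimit-limit"), as_int("x-ratelimit-remaining")


limit, remaining = rate_limit_status(
    {"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "42"}
)
if remaining is not None and remaining == 0:
    print("Rate limit exhausted; pause before sending more requests.")
```

The header values in the usage lines are made-up sample numbers, not real limits.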

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
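
For illustration, the Bearer header can be attached to a request using only Python's standard library, as sketched below. `build_request` is a hypothetical helper and `YOUR_API_KEY` is a placeholder; the request is prepared but deliberately not sent, so the sketch has no network side effects.

```python
import json
import urllib.request

ENDPOINT = "https://app.dumplingai.com/api/v1/doc-to-text"


def build_request(api_key, payload):
    """Prepare (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "YOUR_API_KEY",
    {"inputMethod": "url", "file": "https://example.com/sample.pdf"},
)
# To actually send the request, something like the following could be used:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so the 400 and 500 cases would surface as exceptions rather than return values.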

Body

Content type: application/json

inputMethod (enum<string>, required)

Indicates whether the document is supplied as a URL or as a base64-encoded string.

Available options: url, base64

file (string, required)

Document URL or base64-encoded file content to convert to text.

pages (string)

Optional page ranges to extract (e.g., "1-3,5").

requestSource (enum<string>)

Optional identifier describing where the API request originated.

Available options: API, WEB, MAKE_DOT_COM, ZAPIER, N8N, PLAYGROUND, DEFAULT_AUTOMATION, AGENT_PREVIEW, AGENT_LIVE, AUTOPILOT, STUDIO

Response

Plain text representation returned.

text (string, required)

Plain text extracted from the source document.
