Description

This endpoint converts PDF or DOCX documents into plain text. It supports input via URL or base64-encoded file content.

Endpoint

POST /api/v1/doc-to-text

Headers

  • Content-Type: application/json
  • Authorization: Bearer <API_KEY> (required)

Request Body

{
  "inputMethod": "string", // Required. Either "url" or "base64".
  "file": "string", // Required. URL or base64-encoded file content.
  "pages": "string" // Optional. Specify pages to process.
}
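
As a rough illustration of the body above, the following Python sketch builds the JSON payload for each input method. The helper names payload_from_url and payload_from_bytes are hypothetical and not part of the API; only the field names inputMethod, file, and pages come from this reference.

```python
import base64
from typing import Optional


def payload_from_url(url: str, pages: Optional[str] = None) -> dict:
    """Build a request body that points at a publicly accessible file."""
    payload = {"inputMethod": "url", "file": url}
    if pages is not None:
        payload["pages"] = pages  # must be a string, e.g. "1, 2, 3-7"
    return payload


def payload_from_bytes(data: bytes, pages: Optional[str] = None) -> dict:
    """Build a request body that carries the file as base64 text."""
    payload = {
        "inputMethod": "base64",
        "file": base64.b64encode(data).decode("ascii"),
    }
    if pages is not None:
        payload["pages"] = pages
    return payload
```

For a local file, the bytes could be read with open(path, "rb").read() and passed to payload_from_bytes.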

Responses

Success (200)

Returns the extracted text from the document.

{
  "text": "string" // Extracted text content
}

The response also includes these headers:

  • X-RateLimit-Limit: The rate limit for the user.
  • X-RateLimit-Remaining: The number of requests remaining for the user.

Bad Request (400)

Returned if the request is invalid or the file format is unsupported.

{
  "error": "Error message describing the issue"
}

Internal Server Error (500)

Returned if an error occurs during document processing.

{
  "error": "Error processing document"
}
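
One way a client might tell the three documented responses apart is sketched below in Python. The exception class DocToTextError and the function extract_text are hypothetical names; the sketch assumes only what is listed above, namely a "text" field on 200 and an "error" field on 400 and 500. Any other status or body shape is treated as an error.

```python
import json


class DocToTextError(Exception):
    """Client-side exception for non-success responses (hypothetical)."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def extract_text(status: int, body: str) -> str:
    """Return the extracted text for a 200 response, otherwise raise."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    if status == 200 and "text" in data:
        return data["text"]
    # 400 and 500 responses carry an "error" message in the body.
    raise DocToTextError(status, data.get("error", "unexpected response"))
```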

Example Request

curl -X POST https://app.dumplingai.com/api/v1/doc-to-text \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
  "inputMethod": "url",
  "file": "https://example.com/sample.pdf"
}'

Example Response

{
  "text": "This is the extracted text content from the document..."
}
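
The curl request above could be issued from Python along these lines, using only the standard library. This is a minimal sketch: the function names make_request and doc_to_text are hypothetical, and YOUR_API_KEY remains a placeholder for a real key.

```python
import json
import urllib.request

API_URL = "https://app.dumplingai.com/api/v1/doc-to-text"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request with the documented headers and JSON body."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def doc_to_text(api_key: str, payload: dict) -> str:
    """Send the request and return the "text" field of a 200 response.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(make_request(api_key, payload)) as resp:
        return json.load(resp)["text"]
```

A call such as doc_to_text("YOUR_API_KEY", {"inputMethod": "url", "file": "https://example.com/sample.pdf"}) mirrors the curl example.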

Notes

  • Supported file formats: PDF and DOCX.
  • Maximum file size may be limited (refer to your plan details).
  • If using the URL method, ensure the file is publicly accessible.
  • This endpoint uses 2 credits per request.
  • The file type is automatically detected based on the file content.
  • The “pages” field allows you to specify which pages to process:
    • Use comma-separated values or ranges (e.g., “1, 2-” or “1, 2, 3-7”).
    • The first page index is 1.
    • Prefix a number with “!” to count from the end of the document (e.g., “!1” for the last page).
    • If not specified, all pages will be processed by default.
    • The value must be a string.
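
The page syntax described above can be assembled programmatically. The Python sketch below is one possible approach; the helper pages_spec is hypothetical, and representing "from the end" pages as negative integers is a convention chosen for this example, not something the API defines.

```python
from typing import Iterable, Tuple, Union

PagePart = Union[int, Tuple[int, int]]


def pages_spec(parts: Iterable[PagePart]) -> str:
    """Join pages and ranges into a "pages" string.

    A positive int n selects page n, a negative int -n is written "!n"
    (n-th page from the end), and a tuple (a, b) becomes the range "a-b".
    """
    tokens = []
    for part in parts:
        if isinstance(part, tuple):
            start, end = part
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part}")
            tokens.append(f"{start}-{end}")
        elif part > 0:
            tokens.append(str(part))
        elif part < 0:
            tokens.append(f"!{-part}")
        else:
            raise ValueError("the first page index is 1")
    return ", ".join(tokens)
```

For instance, pages_spec([1, 2, (3, 7)]) yields "1, 2, 3-7", and pages_spec([-1]) yields "!1" for the last page.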

Rate Limiting

Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
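
A client could read these headers roughly as follows. This Python sketch assumes the header values are plain integers, which this reference does not state explicitly; the function name rate_limit_status is hypothetical.

```python
from typing import Mapping, Optional, Tuple


def rate_limit_status(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (limit, remaining) from response headers, or None when absent."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        if value is not None and value.strip().isdigit():
            return int(value)
        return None

    return as_int("x-ratelimit-limit"), as_int("x-ratelimit-remaining")
```

When the remaining count reaches 0, the caller may want to pause before sending further requests.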