Extract Audio API Documentation

Description

This endpoint extracts structured data from audio files based on a user-defined prompt. It accepts audio as a URL or as base64-encoded content and uses a Large Language Model (LLM) to interpret the audio and extract the relevant information.

Endpoint

POST /api/v1/extract-audio

Headers

  • Content-Type: application/json
  • Authorization: Bearer <API_KEY> (required)
  • Request-Source: string (optional)

Request Body

{
  "inputMethod": "string", // Required. Either "url" or "base64".
  "audio": "string", // Required. URL or base64-encoded audio content.
  "prompt": "string", // Required. The prompt describing the data to extract.
  "jsonMode": boolean // Optional. Whether to return the result in JSON format. Default: false.
}
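
As a minimal sketch of assembling this body, the Python helper below builds the payload for either input method. The function name build_payload and its arguments are hypothetical, and it assumes the base64 method expects plain standard base64 with no data-URI prefix, which this document does not confirm.

```python
import base64


def build_payload(prompt, audio_url=None, audio_bytes=None, json_mode=False):
    """Build the JSON body for POST /api/v1/extract-audio (illustrative helper)."""
    # Exactly one audio source must be supplied.
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("provide exactly one of audio_url or audio_bytes")

    if audio_url is not None:
        input_method, audio = "url", audio_url
    else:
        # Assumption: plain standard base64, no "data:" prefix.
        input_method = "base64"
        audio = base64.b64encode(audio_bytes).decode("ascii")

    return {
        "inputMethod": input_method,
        "audio": audio,
        "prompt": prompt,
        "jsonMode": json_mode,
    }
```

Passing audio_url produces a body with "inputMethod": "url"; passing raw bytes produces "inputMethod": "base64" with the encoded content in "audio".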

Responses

Success (200)

Returns the extracted data based on the provided prompt, along with the original prompt, the audio duration, and the credits used.

{
  "results": "string", // Extracted data based on the prompt
  "prompt": "string", // The original prompt used for extraction
  "audioDuration": number, // Duration of the audio in seconds
  "creditUsage": number // Total credits used for this request
}

Response headers:

  • Content-Type: application/json
  • X-RateLimit-Limit: The rate limit for the user.
  • X-RateLimit-Remaining: The number of requests the user has remaining.

Bad Request (400)

Returned if the request is invalid or the audio file exceeds the size or duration limit.

{
  "error": "Error message describing the issue"
}

Unauthorized (401)

Returned if the API key is invalid or missing.

{
  "error": "Invalid or missing Authorization header"
}

Internal Server Error (500)

Returned if there’s an error during the audio extraction process.

{
  "error": "Failed to extract audio data: [error details]"
}
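
One possible way to handle these responses on the client side is sketched below in Python. The names ExtractAudioError and parse_response are hypothetical; the sketch only assumes what is documented above, namely that a 200 response carries the result object and that error responses carry an "error" field.

```python
import json


class ExtractAudioError(Exception):
    """Raised for any non-200 response (hypothetical client-side type)."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def parse_response(status, body_text):
    """Return the parsed result on 200, otherwise raise ExtractAudioError."""
    if status == 200:
        return json.loads(body_text)
    try:
        # Documented error bodies look like {"error": "..."}.
        message = json.loads(body_text).get("error", "unknown error")
    except (json.JSONDecodeError, AttributeError):
        # Fall back to the raw text if the body is not a JSON object.
        message = body_text or "unknown error"
    raise ExtractAudioError(status, message)
```

Callers can then branch on the status attribute, for example treating 401 as a configuration problem and 500 as a candidate for retrying.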

Example Request

curl -X POST https://app.dumplingai.com/api/v1/extract-audio \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Request-Source: API" \
-d '{
  "inputMethod": "url",
  "audio": "https://example.com/sample-audio.mp3",
  "prompt": "Summarize the main points discussed in this audio.",
  "jsonMode": false
}'
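
The same request can be sketched in Python using only the standard library. The helper names make_request and extract_audio and the 300-second timeout are assumptions chosen for illustration; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://app.dumplingai.com/api/v1/extract-audio"


def make_request(api_key, payload):
    """Build (but do not send) the POST request for the extract-audio endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "Request-Source": "API",  # optional header
        },
        method="POST",
    )


def extract_audio(api_key, payload, timeout=300):
    """Send the request and return the decoded JSON response body."""
    # The timeout is an assumed value; long audio may need more time.
    with urllib.request.urlopen(make_request(api_key, payload), timeout=timeout) as resp:
        return json.load(resp)
```

Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so the documented error bodies would be read from that exception rather than from the return value.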

Notes

  • The maximum file size for an audio file is 100MB.
  • The maximum audio duration is 9.5 hours (34,200 seconds).
  • Supported audio formats: wav, mp3, aiff, aac, ogg, flac
  • Credit usage:
    • Base cost: 10 credits
    • Additional 2 credits per minute of audio duration (rounded up)
  • The total credit usage is returned in the response as creditUsage.
  • If using the URL method, ensure the audio file is publicly accessible.
  • The jsonMode parameter determines whether the output is formatted as JSON (true) or plain text (false).
  • The endpoint uses the Gemini 1.5 Flash model for audio analysis and data extraction.
  • Temporary files are created during processing and are deleted after use.
  • You can get a list of supported audio formats by calling GET /api/v1/extract-audio.
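
The credit rule in the notes above can be worked through with a short Python sketch. It assumes that "rounded up" means the duration is rounded up to the next whole minute before the per-minute charge is applied; the function name estimate_credits is hypothetical, and the authoritative figure is always the creditUsage value in the response.

```python
import math

BASE_CREDITS = 10        # base cost per request
CREDITS_PER_MINUTE = 2   # charged per minute of audio


def estimate_credits(duration_seconds):
    """Estimate credit usage for audio of the given length in seconds."""
    # Assumption: partial minutes are rounded up to a whole minute.
    minutes = math.ceil(duration_seconds / 60)
    return BASE_CREDITS + CREDITS_PER_MINUTE * minutes
```

Under this reading, a 61-second clip counts as 2 minutes and costs 10 + 2 × 2 = 14 credits, and audio at the 34,200-second maximum (570 minutes) costs 10 + 2 × 570 = 1,150 credits.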

Rate Limiting

Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
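
A minimal Python sketch for reading these headers from a response follows. It assumes the header values are plain integers, which this document does not state, and returns None for anything missing or unparseable. The function name rate_limit_status is hypothetical.

```python
def rate_limit_status(headers):
    """Extract rate limit information from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def to_int(name):
        raw = lowered.get(name)
        if raw is None:
            return None
        try:
            return int(raw)  # assumption: values are integer counts
        except ValueError:
            return None

    return {
        "limit": to_int("x-ratelimit-limit"),
        "remaining": to_int("x-ratelimit-remaining"),
    }
```

A client could pause or slow down when "remaining" reaches 0; how and when the limit resets is not described here, so any back-off interval would be a guess.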

Error Handling

  • If a required parameter (such as audio or prompt) is missing, a 400 Bad Request error is returned.
  • If the audio file size exceeds 100MB, a 400 Bad Request error is returned.
  • If the audio duration exceeds 9.5 hours, a 400 Bad Request error is returned.
  • If there’s an error during extraction, a 500 Internal Server Error is returned with details about the failure.
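
The 400 conditions listed above can be checked before sending a request, which avoids a round trip for inputs that would be rejected anyway. The Python sketch below is a client-side approximation only: the name preflight_errors is hypothetical, it assumes 100MB means 100 × 1024 × 1024 bytes, and the server remains the authority on what is accepted.

```python
MAX_FILE_BYTES = 100 * 1024 * 1024   # assumption: "100MB" interpreted as MiB
MAX_DURATION_SECONDS = 9.5 * 3600    # 34,200 seconds


def preflight_errors(payload, size_bytes=None, duration_seconds=None):
    """Return a list of problems likely to cause a 400 Bad Request."""
    problems = []
    # Fields marked as required in the request body.
    for field in ("inputMethod", "audio", "prompt"):
        if not payload.get(field):
            problems.append(f"missing required parameter: {field}")
    # Size and duration are only checked when the caller knows them.
    if size_bytes is not None and size_bytes > MAX_FILE_BYTES:
        problems.append("audio file exceeds 100MB")
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio duration exceeds 9.5 hours")
    return problems
```

An empty list means none of the documented 400 conditions were detected locally; it does not guarantee that the request will succeed.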

Security and Privacy

  • Uploaded audio files are temporarily stored and then deleted after processing.
  • Audio metadata (including duration) is checked using a separate Python service before processing.