Extract Audio API Documentation
Description
This endpoint extracts structured data from audio files based on a user-defined prompt. It supports input via URL or base64-encoded audio content and uses Large Language Models (LLMs) to interpret and extract relevant information from the audio.
Endpoint
Headers
- Content-Type: application/json
- Authorization: Bearer <API_KEY> (required)
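As a small illustration, the two headers above could be assembled in Python as follows. The helper name `build_headers` is hypothetical and not part of the API; only the header names and the Bearer scheme come from this page.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Return the request headers listed in this section.

    `build_headers` is a hypothetical helper name used for illustration.
    """
    if not api_key:
        # The Authorization header is marked as required.
        raise ValueError("an API key is required")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

The resulting dictionary can be passed to whichever HTTP client you use.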
Request Body
Responses
Success (200)
Returns the extracted data based on the provided prompt, along with additional information such as the total credit usage (creditUsage). The response includes the following headers:
- Content-Type: application/json
- X-RateLimit-Limit: The rate limit for the user.
- X-RateLimit-Remaining: The remaining number of requests for the user.
Bad Request (400)
Returned if the request is invalid or the audio file exceeds size or duration limits.
Unauthorized (401)
Returned if the API key is invalid or missing.
Internal Server Error (500)
Returned if there’s an error during the audio extraction process.
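One way a client might act on the status codes above is sketched below. The names `STATUS_MEANINGS`, `ExtractAudioError`, and `check_status` are hypothetical; only the status codes and their meanings are taken from this section.

```python
# Meanings of the documented status codes, paraphrased from this section.
STATUS_MEANINGS = {
    200: "success",
    400: "bad request: invalid request, or audio exceeds size or duration limits",
    401: "unauthorized: API key is invalid or missing",
    500: "internal server error: the audio extraction process failed",
}


class ExtractAudioError(Exception):
    """Hypothetical exception type for non-200 responses."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def check_status(status_code: int) -> None:
    """Return silently on 200, otherwise raise ExtractAudioError."""
    if status_code == 200:
        return
    # Codes not listed on this page fall back to a generic message.
    message = STATUS_MEANINGS.get(status_code, "unexpected status code")
    raise ExtractAudioError(status_code, message)
```

Raising on every non-200 code keeps the calling code simple: a 401 points to the API key, a 400 points to the request or the audio file, and a 500 may be worth reporting along with the failure details returned by the server.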
Example Request
Notes
- The maximum file size for an audio file is 100MB.
- The maximum audio duration is 9.5 hours (34,200 seconds).
- Supported audio formats: wav, mp3, aiff, aac, ogg, flac
- Credit usage:
  - Base cost: 10 credits
  - Additional 2 credits per minute of audio duration (rounded up)
  - The total credit usage is returned in the response as creditUsage.
- If using the URL method, ensure the audio file is publicly accessible.
- The jsonMode parameter determines whether the output is formatted as JSON (true) or plain text (false).
- The endpoint uses the Gemini 1.5 Flash model for audio analysis and data extraction.
- Temporary files are created during processing and are deleted after use.
- You can get a list of supported audio formats by calling:
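To illustrate the credit usage rule from these notes, the following sketch estimates the cost of a request from the audio duration. It assumes that "rounded up" means the duration is rounded up to a whole number of minutes before the per-minute rate is applied; the function name `estimate_credits` is hypothetical, and the authoritative figure is the creditUsage value returned in the response.

```python
import math

BASE_CREDITS = 10        # base cost per request
CREDITS_PER_MINUTE = 2   # additional cost per minute of audio


def estimate_credits(duration_seconds: float) -> int:
    """Estimate credit usage for audio of the given duration.

    Assumption: partial minutes are rounded up to the next whole minute.
    """
    if duration_seconds < 0:
        raise ValueError("duration must not be negative")
    minutes = math.ceil(duration_seconds / 60)
    return BASE_CREDITS + CREDITS_PER_MINUTE * minutes
```

Under this assumption, a 90-second clip counts as 2 minutes and is estimated at 10 + 2 × 2 = 14 credits, and audio at the maximum duration of 34,200 seconds (570 minutes) is estimated at 10 + 2 × 570 = 1,150 credits.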
Rate Limiting
Rate limit headers (X-RateLimit-Limit and X-RateLimit-Remaining) are included in the response to indicate the user’s current rate limit status.
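A client could read these two headers along the lines of the sketch below. It assumes the header values are integers, which this page does not state explicitly, and the helper name `read_rate_limit` is hypothetical.

```python
from typing import Mapping, Optional


def read_rate_limit(headers: Mapping[str, str]) -> tuple[Optional[int], Optional[int]]:
    """Return (limit, remaining) parsed from response headers.

    Header names are matched case-insensitively. A missing or
    non-integer value yields None for that position.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        raw = lowered.get(name)
        if raw is None:
            return None
        try:
            return int(raw)
        except ValueError:
            # Assumption: values are integers; anything else is ignored.
            return None

    return as_int("x-ratelimit-limit"), as_int("x-ratelimit-remaining")
```

For example, a caller might pause or slow down further requests once the second value reaches zero.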
Error Handling
- If a required parameter (audio or prompt) is missing, a 400 Bad Request error is returned.
- If the audio file size exceeds 100MB, a 400 Bad Request error is returned.
- If the audio duration exceeds 9.5 hours, a 400 Bad Request error is returned.
- If there’s an error during extraction, a 500 Internal Server Error is returned with details about the failure.
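Several of the 400 conditions above can be caught on the client before a request is sent. The sketch below is one possible pre-flight check, not the server's own validation. The function name `preflight_problems` and its arguments are hypothetical, and the code assumes 100MB means 100,000,000 bytes, the more conservative reading; the supported formats are taken from the Notes section.

```python
MAX_BYTES = 100 * 1000 * 1000   # assumption: 100MB read as 100,000,000 bytes
MAX_SECONDS = 34_200            # 9.5 hours
SUPPORTED_FORMATS = {"wav", "mp3", "aiff", "aac", "ogg", "flac"}


def preflight_problems(
    payload: dict,
    size_bytes: int,
    duration_seconds: float,
    extension: str,
) -> list[str]:
    """Return a list of likely reasons the request would be rejected.

    An empty list means no problem was detected locally.
    """
    problems = []
    # Both audio and prompt are required parameters.
    for field in ("audio", "prompt"):
        if not payload.get(field):
            problems.append(f"missing required parameter: {field}")
    if size_bytes > MAX_BYTES:
        problems.append("audio file exceeds 100MB")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio duration exceeds 9.5 hours")
    # Accept extensions written with or without a leading dot.
    if extension.lower().lstrip(".") not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format: {extension}")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem at once. The server's response remains the final word, since its exact limits and checks may differ from these assumptions.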
Security and Privacy
- Uploaded audio files are temporarily stored and then deleted after processing.
- Audio metadata (including duration) is checked using a separate Python service before processing.