Jobs
List Jobs
Run Job
Get Job
Run Job On File
Get Job Result
Models
ExtractJob = object { id, extraction_agent, status, 3 more }
Schema for an extraction job.
id: string
The ID of the extraction job.
extraction_agent: object
The extraction agent that the job was run on.
id: string
The id of the extraction agent.
config: object
The configuration parameters for the extraction agent.
chunk_mode: optional "PAGE" or "SECTION"
The mode to use for chunking the document.
citation_bbox: optional boolean (deprecated)
Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
cite_sources: optional boolean
Whether to cite sources for the extraction.
confidence_scores: optional boolean
Whether to fetch confidence scores for the extraction.
extract_model: optional "openai-gpt-4-1" or "openai-gpt-4-1-mini" or "openai-gpt-4-1-nano" or 8 more or string
The extract model to use for data extraction. If not provided, uses the default for the extraction mode.
ExtractModels = "openai-gpt-4-1" or "openai-gpt-4-1-mini" or "openai-gpt-4-1-nano" or 8 more
Extract model options.
extraction_mode: optional "FAST" or "BALANCED" or "PREMIUM" or "MULTIMODAL"
The extraction mode to use: FAST, BALANCED, PREMIUM, or MULTIMODAL.
extraction_target: optional "PER_DOC" or "PER_PAGE" or "PER_TABLE_ROW"
The extraction target: one result per document (PER_DOC), per page (PER_PAGE), or per table row (PER_TABLE_ROW).
high_resolution_mode: optional boolean
Whether to use high resolution mode for the extraction.
invalidate_cache: optional boolean
Whether to invalidate the cache for the extraction.
multimodal_fast_mode: optional boolean
DEPRECATED: Whether to use fast mode for multimodal extraction.
num_pages_context: optional number
Number of pages to pass as context when extracting from long documents.
page_range: optional string
Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
parse_model: optional "openai-gpt-4o" or "openai-gpt-4o-mini" or "openai-gpt-4-1" or 23 more
The model to use for parsing the document (one of the public model names).
priority: optional "low" or "medium" or "high" or "critical"
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
system_prompt: optional string
The system prompt to use for the extraction.
use_reasoning: optional boolean
Whether to use reasoning for the extraction.
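The `page_range` option above takes comma-separated, 1-based page numbers and inclusive ranges. The sketch below expands such a string into explicit page numbers so a client can sanity-check a value before sending it. `parse_page_range` is a hypothetical local helper, not part of any SDK, and its rules are assumptions: whitespace around entries is tolerated, while empty entries, page 0, and reversed ranges are rejected. The service's own parser may be stricter or more lenient.

```python
def parse_page_range(page_range: str) -> list[int]:
    """Expand a page_range string such as '1,3,5-7,9' into sorted 1-based page numbers.

    Hypothetical client-side check; the server's own parsing rules may differ.
    """
    pages: set[int] = set()
    for raw_part in page_range.split(","):
        part = raw_part.strip()
        if not part:
            raise ValueError(f"empty entry in page_range {page_range!r}")
        if "-" in part:
            # A range such as '5-7' is inclusive at both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"page numbers are 1-based, got {part!r}")
            pages.add(page)
    # Overlapping entries such as '1-3,2' collapse into one sorted list.
    return sorted(pages)


print(parse_page_range("1,3,5-7,9"))  # [1, 3, 5, 6, 7, 9]
print(parse_page_range("1-3,8-10"))   # [1, 2, 3, 8, 9, 10]
```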
data_schema: map[map[unknown] or array of unknown or string or 2 more]
The JSON Schema describing the data to extract.
name: string
The name of the extraction agent.
project_id: string
The ID of the project that the extraction agent belongs to.
created_at: optional string
The creation time of the extraction agent.
custom_configuration: optional "default"
Custom configuration type for the extraction agent. Currently supports 'default'.
updated_at: optional string
The last update time of the extraction agent.
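Two notes in the agent's configuration listing lend themselves to a client-side check: `citation_bbox` is deprecated and now synonymous with `cite_sources`, and `chunk_mode`, `extraction_mode`, `extraction_target`, and `priority` accept only the fixed values listed. The sketch below, `normalize_config`, is a hypothetical helper rather than an SDK function. It assumes the configuration is handled as a plain dictionary keyed by the field names above, and that omitting an option left as `None` is equivalent to leaving it at its default.

```python
from typing import Any

# Fixed value sets copied from the configuration listing (each is listed in full there).
ALLOWED_VALUES: dict[str, set[str]] = {
    "chunk_mode": {"PAGE", "SECTION"},
    "extraction_mode": {"FAST", "BALANCED", "PREMIUM", "MULTIMODAL"},
    "extraction_target": {"PER_DOC", "PER_PAGE", "PER_TABLE_ROW"},
    "priority": {"low", "medium", "high", "critical"},
}


def normalize_config(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: clean up an extraction-agent config before sending it."""
    # Drop options left as None so that the service applies its defaults (assumption).
    cleaned = {key: value for key, value in config.items() if value is not None}

    # citation_bbox is deprecated and synonymous with cite_sources;
    # an explicit cite_sources value wins if both are present.
    if "citation_bbox" in cleaned:
        legacy_value = cleaned.pop("citation_bbox")
        cleaned.setdefault("cite_sources", legacy_value)

    # Reject values outside the documented enumerations.
    for key, allowed in ALLOWED_VALUES.items():
        if key in cleaned and cleaned[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {cleaned[key]!r}")
    return cleaned


print(normalize_config({"extraction_mode": "BALANCED", "citation_bbox": True, "system_prompt": None}))
# {'extraction_mode': 'BALANCED', 'cite_sources': True}
```

Letting an explicit `cite_sources` take precedence over the legacy flag is a choice made for this sketch; the listing only states that the two options are synonymous.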
status: "PENDING" or "SUCCESS" or "ERROR" or 2 more
The status of the extraction job
error: optional string
The error that occurred during extraction
file: optional object
Schema for a file.
id: string
Unique identifier
project_id: string
The ID of the project that the file belongs to
created_at: optional string
Creation datetime
data_source_id: optional string
The ID of the data source that the file belongs to
expires_at: optional string
The expiration date for the file. Files past this date can be deleted.
external_file_id: optional string
The ID of the file in the external system
file_size: optional number
Size of the file in bytes
file_type: optional string
File type (e.g., pdf, docx)
last_modified_at: optional string
The last modified time of the file
permission_info: optional map[map[unknown] or array of unknown or string or 2 more]
Permission information for the file
purpose: optional string
The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')
resource_info: optional map[map[unknown] or array of unknown or string or 2 more]
Resource information for the file
updated_at: optional string
Update datetime
file_id: optional string
The ID of the file that the data was extracted from.
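To show how the nested objects fit together, the sketch below reads an ExtractJob that has already been decoded from JSON into a dictionary. `JobSummary` and `parse_timestamp` are hypothetical names. The sketch assumes the JSON keys match the field names above, that the agent and file arrive under `extraction_agent` and `file` keys, and that timestamps are ISO 8601 strings, treated as UTC when no offset is given. Because files past `expires_at` can be deleted, it also exposes a check for whether the source file may no longer be available.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating values without an offset as UTC (assumption)."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo is not None else parsed.replace(tzinfo=timezone.utc)


@dataclass
class JobSummary:
    """Hypothetical flattened view of an ExtractJob."""

    id: str
    agent_id: str
    status: str
    error: Optional[str]
    file_id: Optional[str]
    file_expires_at: Optional[datetime]

    @classmethod
    def from_dict(cls, job: dict[str, Any]) -> "JobSummary":
        file_obj = job.get("file") or {}
        expires_at = file_obj.get("expires_at")
        return cls(
            id=job["id"],
            agent_id=job["extraction_agent"]["id"],
            status=job["status"],
            error=job.get("error"),
            # Prefer the top-level file_id and fall back to the nested file object.
            file_id=job.get("file_id") or file_obj.get("id"),
            file_expires_at=parse_timestamp(expires_at) if expires_at else None,
        )

    def file_may_be_deleted(self, now: datetime) -> bool:
        """Files past expires_at can be deleted, so they may no longer be available."""
        return self.file_expires_at is not None and now >= self.file_expires_at


example = JobSummary.from_dict(
    {
        "id": "job-123",  # hypothetical IDs, message, and timestamp
        "extraction_agent": {"id": "agent-456"},
        "status": "ERROR",
        "error": "example error message",
        "file": {"id": "file-789", "project_id": "proj-1", "expires_at": "2025-01-01T00:00:00Z"},
    }
)
print(example.status, example.error, example.file_id)
print(example.file_may_be_deleted(datetime(2025, 6, 1, tzinfo=timezone.utc)))
```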
WebhookConfiguration = object { webhook_events, webhook_headers, webhook_output_format, webhook_url }
Allows the user to configure webhook options for notifications and callbacks.
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 13 more
List of event names to subscribe to
webhook_headers: optional map[string]
Custom HTTP headers to include with webhook requests.
webhook_output_format: optional string
The output format to use for the webhook. Currently supported values: string, json. Defaults to string if none is supplied.
webhook_url: optional string
The URL to send webhook notifications to.
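A WebhookConfiguration is a small object, so a helper can assemble it and reject obviously invalid values before it is sent. `build_webhook_configuration` below is a hypothetical helper. It treats `webhook_url` as required (a configuration without a destination has nowhere to deliver notifications), requires an absolute `http://` or `https://` URL as a local assumption, checks `webhook_output_format` against the two supported values, and omits unset options so that server-side defaults still apply. Event names are passed through unchecked because the full list of supported events is not reproduced here.

```python
from typing import Optional

# Values listed as currently supported for webhook_output_format.
SUPPORTED_OUTPUT_FORMATS = {"string", "json"}


def build_webhook_configuration(
    url: str,
    events: Optional[list[str]] = None,
    headers: Optional[dict[str, str]] = None,
    output_format: Optional[str] = None,
) -> dict[str, object]:
    """Hypothetical helper: assemble a WebhookConfiguration dict, omitting unset options."""
    # Local assumption: require an absolute HTTP(S) URL.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"webhook_url must be an absolute HTTP(S) URL, got {url!r}")
    if output_format is not None and output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(
            f"webhook_output_format must be one of {sorted(SUPPORTED_OUTPUT_FORMATS)}"
        )

    config: dict[str, object] = {"webhook_url": url}
    if events:
        # Event names such as "extract.success" are passed through unchecked.
        config["webhook_events"] = list(events)
    if headers:
        config["webhook_headers"] = dict(headers)
    if output_format is not None:
        config["webhook_output_format"] = output_format
    return config


webhook = build_webhook_configuration(
    "https://example.com/hooks/extract",  # hypothetical endpoint
    events=["extract.success", "extract.error"],
    headers={"X-Webhook-Secret": "replace-me"},  # hypothetical header
    output_format="json",
)
print(webhook)
```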