Runs
List Extract Runs
Get Run
Delete Extraction Run
Get Run By Job Id
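A run moves through the `status` values documented on ExtractRun below (CREATED, PENDING, SUCCESS, ERROR), so a client may need to call Get Run repeatedly until the run settles. The TypeScript sketch below shows one way to poll. It assumes that Get Run returns an ExtractRun and that SUCCESS and ERROR are the only terminal states. `fetchRun`, `waitForRun`, and `ExtractRunLike` are hypothetical names, and no specific SDK method is implied.

```typescript
// Status values as documented for ExtractRun.status.
type ExtractRunStatus = "CREATED" | "PENDING" | "SUCCESS" | "ERROR";

// Minimal local view of ExtractRun: only the fields this sketch reads.
interface ExtractRunLike {
  id: string;
  status: ExtractRunStatus;
  error?: string | null;
}

// Assumption: SUCCESS and ERROR are final; CREATED and PENDING may still change.
function isTerminal(status: ExtractRunStatus): boolean {
  return status === "SUCCESS" || status === "ERROR";
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls a run until it reaches a terminal status, or rejects after maxAttempts.
// `fetchRun` is a hypothetical callback standing in for a "Get Run" call.
async function waitForRun(
  runId: string,
  fetchRun: (id: string) => Promise<ExtractRunLike>,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<ExtractRunLike> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const run = await fetchRun(runId);
    if (isTerminal(run.status)) {
      return run;
    }
    // Wait between attempts, but not after the last one.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Run ${runId} did not finish after ${maxAttempts} attempts`);
}
```

Injecting `fetchRun` keeps the polling logic independent of any particular client library and makes it easy to test with a fake. An ERROR run is returned rather than thrown, so the caller can read its `error` field.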
Models
ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }
Configuration parameters for the extraction agent.
chunk_mode?: "PAGE" | "SECTION"
The mode to use for chunking the document.
citation_bbox?: boolean (deprecated)
Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
cite_sources?: boolean
Whether to cite sources for the extraction.
confidence_scores?: boolean
Whether to fetch confidence scores for the extraction.
extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null
The extract model to use for data extraction. If not provided, uses the default for the extraction mode.
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"
The extraction mode to use: FAST, BALANCED, PREMIUM, or MULTIMODAL.
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"
The extraction target: one result per document (PER_DOC), per page (PER_PAGE), or per table row (PER_TABLE_ROW).
high_resolution_mode?: boolean
Whether to use high resolution mode for the extraction.
invalidate_cache?: boolean
Whether to invalidate the cache for the extraction.
multimodal_fast_mode?: boolean
DEPRECATED: Whether to use fast mode for multimodal extraction.
num_pages_context?: number | null
Number of pages to pass as context when extracting from long documents.
page_range?: string | null
Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null
The model to use for parsing the document, given as one of the public model names.
priority?: "low" | "medium" | "high" | "critical" | null
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
system_prompt?: string | null
The system prompt to use for the extraction.
use_reasoning?: boolean
Whether to use reasoning for the extraction.
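The `page_range` format (comma-separated 1-based pages and ranges) can be checked client-side before a config is submitted. The sketch below pairs a hypothetical helper, `parsePageRange`, with an illustrative `ExtractConfigSubset` type that mirrors only a few of the fields above; neither is part of any SDK. It assumes ranges are inclusive ("5-7" means pages 5, 6, and 7) and tolerates spaces around segments, which the service itself may not accept.

```typescript
// Partial local mirror of a few documented ExtractConfig fields.
// Illustrative type only, not an SDK export.
interface ExtractConfigSubset {
  extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL";
  extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW";
  cite_sources?: boolean;
  confidence_scores?: boolean;
  page_range?: string | null;
}

// Hypothetical helper: expands a page_range string such as "1,3,5-7,9"
// into sorted, de-duplicated 1-based page numbers, or throws on bad input.
function parsePageRange(pageRange: string): number[] {
  const pages = new Set<number>();
  for (const rawSegment of pageRange.split(",")) {
    // Assumption: whitespace around segments is tolerated.
    const segment = rawSegment.trim();
    const match = /^(\d+)(?:-(\d+))?$/.exec(segment);
    if (match === null) {
      throw new Error(`Invalid page_range segment: "${segment}"`);
    }
    const start = Number(match[1]);
    // A bare number like "9" is treated as the range 9-9.
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page_range segment: "${segment}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

const config: ExtractConfigSubset = {
  extraction_mode: "BALANCED",
  extraction_target: "PER_DOC",
  cite_sources: true,
  page_range: "1-3,8-10",
};

// Validate the range locally before sending the config anywhere.
const pages = config.page_range ? parsePageRange(config.page_range) : [];
// pages is [1, 2, 3, 8, 9, 10]
```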
ExtractRun { id, config, data_schema, 12 more }
Schema for an extraction run.
id: string
The ID of the extraction run.
config: ExtractConfig
The configuration used for the extraction run, with the same fields as ExtractConfig.
data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>
The schema used for extraction.
extraction_agent_id: string
The ID of the extraction agent.
from_ui: boolean
Whether this extraction run was triggered from the UI.
project_id: string
The ID of the project that the extraction run belongs to.
status: "CREATED" | "PENDING" | "SUCCESS" | "ERROR"
The status of the extraction run.
created_at?: string | null
Creation datetime
data?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | Array<Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>> | null
The data extracted from the file.
error?: string | null
The error that occurred during extraction, if any.
extraction_metadata?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
The metadata extracted from the file.
file
Schema for a file.
id: string
Unique identifier
project_id: string
The ID of the project that the file belongs to
created_at?: string | null
Creation datetime
data_source_id?: string | null
The ID of the data source that the file belongs to
expires_at?: string | null
The expiration date for the file. Files past this date can be deleted.
external_file_id?: string | null
The ID of the file in the external system
file_size?: number | null
Size of the file in bytes
file_type?: string | null
File type (e.g., pdf, docx).
last_modified_at?: string | null
The last modified time of the file
permission_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Permission information for the file
purpose?: string | null
The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')
resource_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Resource information for the file
updated_at?: string | null
Update datetime
file_id?: string | null
The ID of the file that the data was extracted from.
job_id?: string | null
The ID of the job that the extraction run belongs to.
updated_at?: string | null
Update datetime
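Because `data` on an ExtractRun may hold either a single record or an array of records, consuming code has to handle both shapes. The sketch below normalizes them into one array. `extractedRecords`, `ExtractedRecord`, and `ExtractRunResult` are hypothetical local names. It assumes that `data` is only meaningful once `status` is SUCCESS; which shape a run returns may depend on `extraction_target`, but that mapping is not stated here and is not relied on.

```typescript
// Stand-in for the documented value type of an extracted record.
type ExtractedRecord = Record<string, unknown>;

// Minimal local view of the ExtractRun fields this sketch reads.
interface ExtractRunResult {
  status: "CREATED" | "PENDING" | "SUCCESS" | "ERROR";
  data?: ExtractedRecord | ExtractedRecord[] | null;
  error?: string | null;
}

// Hypothetical helper: returns extracted records as an array whether `data`
// holds a single object or an array of objects. Throws for ERROR runs.
function extractedRecords(run: ExtractRunResult): ExtractedRecord[] {
  if (run.status === "ERROR") {
    throw new Error(`Extraction failed: ${run.error ?? "unknown error"}`);
  }
  // Assumption: data is only meaningful once the run has succeeded.
  const data = run.data;
  if (run.status !== "SUCCESS" || data === null || data === undefined) {
    return [];
  }
  return Array.isArray(data) ? data : [data];
}
```

Returning an empty array for unfinished runs keeps the helper total; a caller that needs to distinguish "not finished" from "finished with no data" should check `status` first.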