Extraction
Extract Stateless
Extraction Jobs
Run Job On File
Get Job Result
Models
class ExtractJob: …
Schema for an extraction job.
id: str
The id of the extraction job
The agent that the job was run on.
id: str
The id of the extraction agent.
The configuration parameters for the extraction agent.
chunk_mode: Optional[Literal["PAGE", "SECTION"]]
The mode to use for chunking the document.
citation_bbox: Optional[bool] (deprecated)
Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
cite_sources: Optional[bool]
Whether to cite sources for the extraction.
confidence_scores: Optional[bool]
Whether to fetch confidence scores for the extraction.
extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]
The extract model to use for data extraction. If not provided, uses the default for the extraction mode.
Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]
Extract model options.
extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]
The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).
extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]
The extraction target specified.
high_resolution_mode: Optional[bool]
Whether to use high resolution mode for the extraction.
invalidate_cache: Optional[bool]
Whether to invalidate the cache for the extraction.
multimodal_fast_mode: Optional[bool]
DEPRECATED: Whether to use fast mode for multimodal extraction.
num_pages_context: Optional[int]
Number of pages to pass as context on long document extraction.
page_range: Optional[str]
Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]
The model to use for parsing, given as one of the public model names.
priority: Optional[Literal["low", "medium", "high", "critical"]]
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
system_prompt: Optional[str]
The system prompt to use for the extraction.
use_reasoning: Optional[bool]
Whether to use reasoning for the extraction.
data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]
The schema of the data.
name: str
The name of the extraction agent.
project_id: str
The ID of the project that the extraction agent belongs to.
created_at: Optional[datetime]
The creation time of the extraction agent.
custom_configuration: Optional[Literal["default"]]
Custom configuration type for the extraction agent. Currently supports 'default'.
updated_at: Optional[datetime]
The last update time of the extraction agent.
status: Literal["PENDING", "SUCCESS", "ERROR", 2 more]
The status of the extraction job
error: Optional[str]
The error that occurred during extraction
file: Optional[File] (deprecated)
Schema for a file.
id: str
Unique identifier
project_id: str
The ID of the project that the file belongs to
created_at: Optional[datetime]
Creation datetime
data_source_id: Optional[str]
The ID of the data source that the file belongs to
expires_at: Optional[datetime]
The expiration date for the file. Files past this date can be deleted.
external_file_id: Optional[str]
The ID of the file in the external system
file_size: Optional[int]
Size of the file in bytes
file_type: Optional[str]
File type (e.g. pdf, docx, etc.)
last_modified_at: Optional[datetime]
The last modified time of the file
permission_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]
Permission information for the file
purpose: Optional[str]
The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')
resource_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]
Resource information for the file
updated_at: Optional[datetime]
Update datetime
file_id: Optional[str]
The id of the file that the data was extracted from
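The status and error fields above lend themselves to a small dispatch when inspecting a finished or in-flight job. The sketch below is illustrative only: `JobView` and `describe_job` are hypothetical names, not part of the SDK, and the class carries just three of the ExtractJob fields. Only the three status values listed in the reference are handled explicitly; the two elided values fall through to a generic branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobView:
    """Hypothetical stand-in carrying three ExtractJob fields: id, status, error."""
    id: str
    status: str
    error: Optional[str] = None


def describe_job(job: JobView) -> str:
    """Return a one-line summary for the status values named in the reference."""
    if job.status == "SUCCESS":
        return f"job {job.id} succeeded"
    if job.status == "ERROR":
        # `error` is Optional, so fall back to a generic message.
        return f"job {job.id} failed: {job.error or 'no error message'}"
    if job.status == "PENDING":
        return f"job {job.id} is still pending"
    # The reference elides two further status values; report them verbatim.
    return f"job {job.id} has status {job.status}"
```

In real code the same branching would presumably apply to the object returned by the job endpoints, but that mapping is an assumption rather than something this reference states.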
class WebhookConfiguration: …
Allows the user to configure webhook options for notifications and callbacks.
webhook_events: Optional[List[Literal["extract.pending", "extract.success", "extract.error", 13 more]]]
List of event names to subscribe to
webhook_headers: Optional[Dict[str, str]]
Custom HTTP headers to include with webhook requests.
webhook_output_format: Optional[str]
The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json
webhook_url: Optional[str]
The URL to send webhook notifications to.
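As a rough illustration of how the four WebhookConfiguration fields fit together, the sketch below assembles them into a plain dictionary and rejects output formats other than the two the reference lists (string, json). `build_webhook_config` is a hypothetical helper, the URL and header values in the usage note are placeholders, and whether the service validates the format in the same way is an assumption.

```python
from typing import Dict, List, Optional

# The reference lists these as the currently supported output formats.
SUPPORTED_OUTPUT_FORMATS = {"string", "json"}


def build_webhook_config(
    url: str,
    events: List[str],
    headers: Optional[Dict[str, str]] = None,
    output_format: Optional[str] = None,
) -> Dict[str, object]:
    """Assemble a dict using the WebhookConfiguration field names."""
    if output_format is not None and output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported webhook_output_format: {output_format!r}")
    config: Dict[str, object] = {
        "webhook_url": url,
        "webhook_events": list(events),
    }
    if headers:
        config["webhook_headers"] = dict(headers)
    if output_format is not None:
        # When omitted, the reference says the format defaults to string.
        config["webhook_output_format"] = output_format
    return config
```

For example, `build_webhook_config("https://example.com/hook", ["extract.success", "extract.error"], output_format="json")` yields a dictionary with three keys, leaving out `webhook_headers`.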
Extraction Runs
List Extract Runs
Get Run
Delete Extraction Run
Get Run By Job Id
Models
class ExtractConfig: …
Configuration parameters for the extraction agent.
chunk_mode: Optional[Literal["PAGE", "SECTION"]]
The mode to use for chunking the document.
citation_bbox: Optional[bool] (deprecated)
Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
cite_sources: Optional[bool]
Whether to cite sources for the extraction.
confidence_scores: Optional[bool]
Whether to fetch confidence scores for the extraction.
extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]
The extract model to use for data extraction. If not provided, uses the default for the extraction mode.
Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]
Extract model options.
extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]
The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).
extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]
The extraction target specified.
high_resolution_mode: Optional[bool]
Whether to use high resolution mode for the extraction.
invalidate_cache: Optional[bool]
Whether to invalidate the cache for the extraction.
multimodal_fast_mode: Optional[bool]
DEPRECATED: Whether to use fast mode for multimodal extraction.
num_pages_context: Optional[int]
Number of pages to pass as context on long document extraction.
page_range: Optional[str]
Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]
The model to use for parsing, given as one of the public model names.
priority: Optional[Literal["low", "medium", "high", "critical"]]
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
system_prompt: Optional[str]
The system prompt to use for the extraction.
use_reasoning: Optional[bool]
Whether to use reasoning for the extraction.
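The page_range format described above (comma-separated, 1-based page numbers or ranges) can be checked locally before a request is sent. The following is a minimal sketch of one way to expand such a string into page numbers; `expand_page_range` is a hypothetical helper, and how the service itself treats whitespace, duplicates, or reversed ranges is not stated in this reference.

```python
from typing import List


def expand_page_range(page_range: str) -> List[int]:
    """Expand a string such as '1,3,5-7,9' into a sorted list of 1-based pages."""
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Pages are 1-based; this sketch also rejects reversed ranges.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range part: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

Applied to the two examples from the field description, `'1,3,5-7,9'` expands to pages 1, 3, 5, 6, 7, 9 and `'1-3,8-10'` to pages 1, 2, 3, 8, 9, 10.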
class ExtractRun: …
Schema for an extraction run.
id: str
The id of the extraction run
The config used for extraction
chunk_mode: Optional[Literal["PAGE", "SECTION"]]
The mode to use for chunking the document.
citation_bbox: Optional[bool] (deprecated)
Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
cite_sources: Optional[bool]
Whether to cite sources for the extraction.
confidence_scores: Optional[bool]
Whether to fetch confidence scores for the extraction.
extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]
The extract model to use for data extraction. If not provided, uses the default for the extraction mode.
Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]
Extract model options.
extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]
The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).
extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]
The extraction target specified.
high_resolution_mode: Optional[bool]
Whether to use high resolution mode for the extraction.
invalidate_cache: Optional[bool]
Whether to invalidate the cache for the extraction.
multimodal_fast_mode: Optional[bool]
DEPRECATED: Whether to use fast mode for multimodal extraction.
num_pages_context: Optional[int]
Number of pages to pass as context on long document extraction.
page_range: Optional[str]
Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]
The model to use for parsing, given as one of the public model names.
priority: Optional[Literal["low", "medium", "high", "critical"]]
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
system_prompt: Optional[str]
The system prompt to use for the extraction.
use_reasoning: Optional[bool]
Whether to use reasoning for the extraction.
data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]
The schema used for extraction
extraction_agent_id: str
The id of the extraction agent
from_ui: bool
Whether this extraction run was triggered from the UI
project_id: str
The id of the project that the extraction run belongs to
status: Literal["CREATED", "PENDING", "SUCCESS", "ERROR"]
The status of the extraction run
created_at: Optional[datetime]
Creation datetime
data: Optional[Union[Dict[str, Union[Dict[str, object], List[object], str, 3 more]], List[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]], null]]
The data extracted from the file
Dict[str, Union[Dict[str, object], List[object], str, 3 more]]
List[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]
error: Optional[str]
The error that occurred during extraction
extraction_metadata: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]
The metadata extracted from the file
file: Optional[File] (deprecated)
Schema for a file.
id: str
Unique identifier
project_id: str
The ID of the project that the file belongs to
created_at: Optional[datetime]
Creation datetime
data_source_id: Optional[str]
The ID of the data source that the file belongs to
expires_at: Optional[datetime]
The expiration date for the file. Files past this date can be deleted.
external_file_id: Optional[str]
The ID of the file in the external system
file_size: Optional[int]
Size of the file in bytes
file_type: Optional[str]
File type (e.g. pdf, docx, etc.)
last_modified_at: Optional[datetime]
The last modified time of the file
permission_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]
Permission information for the file
purpose: Optional[str]
The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')
resource_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]
Resource information for the file
updated_at: Optional[datetime]
Update datetime
file_id: Optional[str]
The id of the file that the data was extracted from
job_id: Optional[str]
The id of the job that the extraction run belongs to
updated_at: Optional[datetime]
Update datetime
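Because the data field of ExtractRun can be a single dictionary, a list of dictionaries, or null, calling code may find it convenient to normalize it to one shape. The sketch below shows one possible approach; `records_from_run_data` and the `RunData` alias are hypothetical names, and the reference does not say which settings produce which shape.

```python
from typing import Any, Dict, List, Optional, Union

# Mirrors the three shapes listed for ExtractRun.data in the reference.
RunData = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def records_from_run_data(data: RunData) -> List[Dict[str, Any]]:
    """Normalize ExtractRun.data to a list of record dictionaries."""
    if data is None:
        # No data is available, for example when the run did not succeed.
        return []
    if isinstance(data, dict):
        # A single extracted object becomes a one-element list.
        return [data]
    # Already a list of objects; copy it so callers can mutate safely.
    return list(data)
```

Downstream code can then iterate over the result without branching on the original shape.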
Extraction Agents
Create Extraction Agent
List Extraction Agents
Get Extraction Agent
Delete Extraction Agent
Update Extraction Agent
Models
class ExtractAgent: …
Schema and configuration for an extraction agent.
id: str
The id of the extraction agent.
The configuration parameters for the extraction agent.
chunk_mode: Optional[Literal["PAGE", "SECTION"]]
The mode to use for chunking the document.
citation_bbox: Optional[bool] (deprecated)
Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
cite_sources: Optional[bool]
Whether to cite sources for the extraction.
confidence_scores: Optional[bool]
Whether to fetch confidence scores for the extraction.
extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]
The extract model to use for data extraction. If not provided, uses the default for the extraction mode.
Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]
Extract model options.
extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]
The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).
extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]
The extraction target specified.
high_resolution_mode: Optional[bool]
Whether to use high resolution mode for the extraction.
invalidate_cache: Optional[bool]
Whether to invalidate the cache for the extraction.
multimodal_fast_mode: Optional[bool]
DEPRECATED: Whether to use fast mode for multimodal extraction.
num_pages_context: Optional[int]
Number of pages to pass as context on long document extraction.
page_range: Optional[str]
Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]
The model to use for parsing, given as one of the public model names.
priority: Optional[Literal["low", "medium", "high", "critical"]]
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
system_prompt: Optional[str]
The system prompt to use for the extraction.
use_reasoning: Optional[bool]
Whether to use reasoning for the extraction.
data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]
The schema of the data.
name: str
The name of the extraction agent.
project_id: str
The ID of the project that the extraction agent belongs to.
created_at: Optional[datetime]
The creation time of the extraction agent.
custom_configuration: Optional[Literal["default"]]
Custom configuration type for the extraction agent. Currently supports 'default'.
updated_at: Optional[datetime]
The last update time of the extraction agent.
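Several configuration fields on an extraction agent accept only a fixed set of literal values, and two are marked deprecated. A local pre-flight check along the following lines can surface mistakes before a request is made. This is a sketch under stated assumptions: `check_config`, `ALLOWED_VALUES`, and `DEPRECATED_FIELDS` are hypothetical names, the configuration is modeled as a plain dictionary, and only fields whose full value list appears in this reference are checked.

```python
from typing import Dict, List

# Literal values copied from the field listings in this reference.
ALLOWED_VALUES = {
    "chunk_mode": {"PAGE", "SECTION"},
    "extraction_mode": {"FAST", "BALANCED", "PREMIUM", "MULTIMODAL"},
    "extraction_target": {"PER_DOC", "PER_PAGE", "PER_TABLE_ROW"},
    "priority": {"low", "medium", "high", "critical"},
}

# Fields the reference marks as deprecated.
DEPRECATED_FIELDS = {"citation_bbox", "multimodal_fast_mode"}


def check_config(config: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for field, value in config.items():
        if field in DEPRECATED_FIELDS:
            problems.append(f"{field} is deprecated")
        allowed = ALLOWED_VALUES.get(field)
        # None is accepted because every config field is Optional.
        if allowed is not None and value is not None and value not in allowed:
            problems.append(f"{field}={value!r} is not one of {sorted(allowed)}")
    return problems
```

An empty list means nothing was flagged; it does not guarantee the service will accept the configuration, since fields such as extract_model and parse_model have value lists that are elided here.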