
Extraction Agents

Create Extraction Agent
extraction.extraction_agents.create(**kwargs: ExtractionAgentCreateParams) -> ExtractAgent
POST /api/v1/extraction/extraction-agents
List Extraction Agents
extraction.extraction_agents.list(**kwargs: ExtractionAgentListParams) -> ExtractionAgentListResponse
GET /api/v1/extraction/extraction-agents
Get Extraction Agent
extraction.extraction_agents.get(extraction_agent_id: str) -> ExtractAgent
GET /api/v1/extraction/extraction-agents/{extraction_agent_id}
Delete Extraction Agent
extraction.extraction_agents.delete(extraction_agent_id: str) -> object
DELETE /api/v1/extraction/extraction-agents/{extraction_agent_id}
Update Extraction Agent
extraction.extraction_agents.update(extraction_agent_id: str, **kwargs: ExtractionAgentUpdateParams) -> ExtractAgent
PUT /api/v1/extraction/extraction-agents/{extraction_agent_id}
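Each of the five SDK methods above corresponds to exactly one HTTP verb and path. As a minimal sketch, the mapping documented on this page can be captured in a small lookup table; the route_for helper and the _ROUTES dictionary are hypothetical and not part of any SDK.

```python
from typing import Optional, Tuple

# Hypothetical lookup table: verbs and paths are copied from the method list above.
_ROUTES = {
    "create": ("POST", "/api/v1/extraction/extraction-agents"),
    "list": ("GET", "/api/v1/extraction/extraction-agents"),
    "get": ("GET", "/api/v1/extraction/extraction-agents/{extraction_agent_id}"),
    "delete": ("DELETE", "/api/v1/extraction/extraction-agents/{extraction_agent_id}"),
    "update": ("PUT", "/api/v1/extraction/extraction-agents/{extraction_agent_id}"),
}


def route_for(method: str, extraction_agent_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP verb, path) for an extraction_agents method name."""
    verb, template = _ROUTES[method]
    if "{extraction_agent_id}" in template:
        # get, delete and update all address a single agent by id.
        if extraction_agent_id is None:
            raise ValueError(f"{method!r} requires an extraction_agent_id")
        return verb, template.format(extraction_agent_id=extraction_agent_id)
    return verb, template


print(route_for("list"))  # ('GET', '/api/v1/extraction/extraction-agents')
print(route_for("update", "123e4567-e89b-12d3-a456-426614174000"))
```

Note that create and list share a path and differ only by verb, while get, delete and update share the per-agent path.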
Models
class ExtractAgent:

Schema and configuration for an extraction agent.

id: str

The id of the extraction agent.

format: uuid

The configuration parameters for the extraction agent. The fields from chunk_mode through use_reasoning below belong to this configuration object.

chunk_mode: Optional[Literal["PAGE", "SECTION"]]

The mode to use for chunking the document.

Accepts one of the following:
"PAGE"
"SECTION"
citation_bbox: Optional[bool] (deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.
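Because citation_bbox is documented as synonymous with cite_sources, older configurations can be migrated by folding the legacy flag into the new one. This is a minimal sketch assuming the configuration is handled as a plain dict; the normalize_citation_flags helper is hypothetical.

```python
from typing import Any, Dict


def normalize_citation_flags(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with the deprecated citation_bbox flag
    folded into cite_sources. The input dict is not modified."""
    cleaned = dict(config)
    legacy = cleaned.pop("citation_bbox", None)
    # Only carry the legacy value over when cite_sources is not already set,
    # so an explicit cite_sources always takes precedence.
    if legacy is not None and cleaned.get("cite_sources") is None:
        cleaned["cite_sources"] = legacy
    return cleaned


print(normalize_citation_flags({"citation_bbox": True, "extraction_mode": "PREMIUM"}))
# {'extraction_mode': 'PREMIUM', 'cite_sources': True}
```

Letting an explicit cite_sources win over the legacy flag is a design choice of this sketch, not documented server behavior.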

cite_sources: Optional[bool]

Whether to cite sources for the extraction.

confidence_scores: Optional[bool]

Whether to fetch confidence scores for the extraction.

extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str]]

The model to use for data extraction. If not provided, the default model for the selected extraction mode is used.

Accepts one of the following:
Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]

Extract model options.

Accepts one of the following:
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"gemini-2.0-flash"
"gemini-2.5-flash"
"gemini-2.5-flash-lite"
"gemini-2.5-pro"
"openai-gpt-4o"
"openai-gpt-4o-mini"
str
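The type of extract_model covers three cases: one of the eleven documented literals, any other string, or no value at all. A small sketch that distinguishes them client-side follows; KNOWN_EXTRACT_MODELS and describe_extract_model are hypothetical names, and the sketch does not check which strings the server actually accepts.

```python
from typing import Optional

# The 11 literal values documented for extract_model above.
KNOWN_EXTRACT_MODELS = frozenset({
    "openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano",
    "openai-gpt-5", "openai-gpt-5-mini",
    "gemini-2.0-flash", "gemini-2.5-flash", "gemini-2.5-flash-lite", "gemini-2.5-pro",
    "openai-gpt-4o", "openai-gpt-4o-mini",
})


def describe_extract_model(model: Optional[str]) -> str:
    """Classify an extract_model value according to the Union documented above."""
    if model is None:
        # Unset: the default model for the extraction mode applies.
        return "default for extraction mode"
    if model in KNOWN_EXTRACT_MODELS:
        return "documented literal"
    # Any other str is allowed by the type but not listed on this page.
    return "free-form string"


print(describe_extract_model("openai-gpt-5"))  # documented literal
```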
extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]

The extraction mode to use.

Accepts one of the following:
"FAST"
"BALANCED"
"PREMIUM"
"MULTIMODAL"
extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]

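The granularity of the extraction results: one result per document (PER_DOC), per page (PER_PAGE), or per table row (PER_TABLE_ROW).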

Accepts one of the following:
"PER_DOC"
"PER_PAGE"
"PER_TABLE_ROW"
high_resolution_mode: Optional[bool]

Whether to use high resolution mode for the extraction.

invalidate_cache: Optional[bool]

Whether to invalidate the cache for the extraction.

multimodal_fast_mode: Optional[bool]

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context: Optional[int]

Number of pages to pass as context when extracting from long documents.

minimum: 1
page_range: Optional[str]

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
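To check which pages a page_range string selects before sending it, the 1-based format above can be expanded locally. This is a minimal sketch; expand_page_range is a hypothetical helper, and its choices to deduplicate, sort, and reject reversed ranges are assumptions rather than documented server behavior.

```python
from typing import List


def expand_page_range(page_range: str) -> List[int]:
    """Expand a 1-based page_range string such as '1,3,5-7,9' into page numbers."""
    pages: List[int] = []
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # An inclusive range such as '5-7'.
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"page numbers are 1-based: {part!r}")
            pages.append(page)
    # Deduplicate overlapping entries and return pages in ascending order.
    return sorted(set(pages))


print(expand_page_range("1,3,5-7,9"))  # [1, 3, 5, 6, 7, 9]
print(expand_page_range("1-3,8-10"))   # [1, 2, 3, 8, 9, 10]
```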

parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]

The model to use for parsing the document, given as one of the public model names.

Accepts one of the following:
"openai-gpt-4o"
"openai-gpt-4o-mini"
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"openai-gpt-5-nano"
"openai-text-embedding-3-large"
"openai-text-embedding-3-small"
"openai-whisper-1"
"anthropic-sonnet-3.5"
"anthropic-sonnet-3.5-v2"
"anthropic-sonnet-3.7"
"anthropic-sonnet-4.0"
"anthropic-sonnet-4.5"
"anthropic-haiku-3.5"
"anthropic-haiku-4.5"
"gemini-2.5-flash"
"gemini-3.0-pro"
"gemini-2.5-pro"
"gemini-2.0-flash"
"gemini-2.0-flash-lite"
"gemini-2.5-flash-lite"
"gemini-1.5-flash"
"gemini-1.5-pro"
priority: Optional[Literal["low", "medium", "high", "critical"]]

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
system_prompt: Optional[str]

The system prompt to use for the extraction.

use_reasoning: Optional[bool]

Whether to use reasoning for the extraction.
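Since every configuration field above is optional but constrained to a fixed set of values, a configuration dict can be checked locally before it is sent. The sketch below copies the allowed values from this page; check_config and its helper tables are hypothetical, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List

# Allowed literal values, copied from the field list above.
_ALLOWED = {
    "chunk_mode": {"PAGE", "SECTION"},
    "extraction_mode": {"FAST", "BALANCED", "PREMIUM", "MULTIMODAL"},
    "extraction_target": {"PER_DOC", "PER_PAGE", "PER_TABLE_ROW"},
    "priority": {"low", "medium", "high", "critical"},
}
# Fields documented as Optional[bool], including the deprecated ones.
_BOOL_FIELDS = {
    "citation_bbox", "cite_sources", "confidence_scores", "high_resolution_mode",
    "invalidate_cache", "multimodal_fast_mode", "use_reasoning",
}


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return the problems found in an extraction config dict (empty if none).

    Every field is Optional, so a missing key or a None value is accepted.
    """
    problems: List[str] = []
    for key, allowed in _ALLOWED.items():
        value = config.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    for key in sorted(_BOOL_FIELDS):
        value = config.get(key)
        if value is not None and not isinstance(value, bool):
            problems.append(f"{key} must be a bool")
    pages = config.get("num_pages_context")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if pages is not None and (isinstance(pages, bool) or not isinstance(pages, int) or pages < 1):
        problems.append("num_pages_context must be an integer >= 1")
    return problems


print(check_config({"chunk_mode": "PAGE", "num_pages_context": 0}))
# ['num_pages_context must be an integer >= 1']
```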

data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]

The schema describing the data to extract.

Accepts one of the following:
Dict[str, object]
List[object]
str
float
bool
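The type of data_schema only constrains the top-level values: each key is a string and each value is a dict, list, str, float, or bool. The sketch below assumes a JSON Schema-style object, which is a common choice; this page does not itself name the schema dialect, and the invoice fields and the top_level_values_ok helper are hypothetical.

```python
from typing import Any, Dict

# Hypothetical JSON Schema-style definition; the field names are illustrative only.
invoice_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Grand total"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total"],
}

# int is accepted alongside float because Python ints serialize as JSON numbers.
_ALLOWED_VALUE_TYPES = (dict, list, str, int, float, bool)


def top_level_values_ok(schema: Dict[str, Any]) -> bool:
    """Check that every key is a str and every top-level value has a listed type."""
    return all(
        isinstance(key, str) and isinstance(value, _ALLOWED_VALUE_TYPES)
        for key, value in schema.items()
    )


print(top_level_values_ok(invoice_schema))  # True
```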
name: str

The name of the extraction agent.

project_id: str

The ID of the project that the extraction agent belongs to.

format: uuid
created_at: Optional[datetime]

The creation time of the extraction agent.

format: date-time
custom_configuration: Optional[Literal["default"]]

Custom configuration type for the extraction agent. Currently the only supported value is "default".

updated_at: Optional[datetime]

The last update time of the extraction agent.

format: date-time
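When working with raw JSON responses rather than SDK objects, the uuid and date-time formats above can be enforced while parsing. This is a minimal sketch: ExtractAgentRecord is a hypothetical local mirror that keeps only the fields whose names appear on this page, so the configuration object is deliberately left out.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


def _parse_datetime(value: Optional[str]) -> Optional[datetime]:
    if value is None:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class ExtractAgentRecord:
    """Hypothetical local mirror of the named ExtractAgent fields documented above."""
    id: str
    name: str
    project_id: str
    data_schema: Dict[str, Any]
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    custom_configuration: Optional[str] = None

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "ExtractAgentRecord":
        # id and project_id are documented with format: uuid; invalid values raise ValueError.
        agent_id = str(uuid.UUID(payload["id"]))
        project_id = str(uuid.UUID(payload["project_id"]))
        return cls(
            id=agent_id,
            name=payload["name"],
            project_id=project_id,
            data_schema=payload["data_schema"],
            created_at=_parse_datetime(payload.get("created_at")),
            updated_at=_parse_datetime(payload.get("updated_at")),
            custom_configuration=payload.get("custom_configuration"),
        )


record = ExtractAgentRecord.from_json({
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "name": "invoice-agent",
    "project_id": "9f8e7d6c-5b4a-4321-8fed-cba987654321",
    "data_schema": {"type": "object"},
    "created_at": "2024-05-01T12:00:00Z",
})
print(record.created_at.year)  # 2024
```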

Extraction Agents: Schema

Validate Extraction Schema
extraction.extraction_agents.schema.validate_schema(**kwargs: SchemaValidateSchemaParams) -> SchemaValidateSchemaResponse
POST /api/v1/extraction/extraction-agents/schema/validation
Generate Extraction Schema
extraction.extraction_agents.schema.generate_schema(**kwargs: SchemaGenerateSchemaParams) -> SchemaGenerateSchemaResponse
POST /api/v1/extraction/extraction-agents/schema/generate
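For callers using plain HTTP instead of the SDK, the validation endpoint is a JSON POST. The sketch below only builds the request with the standard library and never sends it. The host is a placeholder, authentication is omitted, and the body key data_schema is an assumption; the actual field names are defined by SchemaValidateSchemaParams, which this page does not expand.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder host (assumption); substitute the real API base URL.
BASE_URL = "https://api.example.com"
VALIDATE_PATH = "/api/v1/extraction/extraction-agents/schema/validation"


def build_validate_request(data_schema: Dict[str, Any]) -> urllib.request.Request:
    """Build, but do not send, a POST to the schema validation endpoint.

    The {"data_schema": ...} body shape is an assumption, not a documented contract.
    """
    body = json.dumps({"data_schema": data_schema}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + VALIDATE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_validate_request({"type": "object", "properties": {}})
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen once a real host and credentials are in place.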