Extraction Agents

Create Extraction Agent

client.extraction.extractionAgents.create(, ?): ExtractAgent { id, config, data_schema, 5 more }

POST /api/v1/extraction/extraction-agents

List Extraction Agents

client.extraction.extractionAgents.list(?, ?): ExtractionAgentListResponse { id, config, data_schema, 5 more }

GET /api/v1/extraction/extraction-agents

Get Extraction Agent

client.extraction.extractionAgents.get(, ?): ExtractAgent { id, config, data_schema, 5 more }

GET /api/v1/extraction/extraction-agents/{extraction_agent_id}

Delete Extraction Agent

client.extraction.extractionAgents.delete(, ?): ExtractionAgentDeleteResponse

DELETE /api/v1/extraction/extraction-agents/{extraction_agent_id}

Update Extraction Agent

client.extraction.extractionAgents.update(, , ?): ExtractAgent { id, config, data_schema, 5 more }

PUT /api/v1/extraction/extraction-agents/{extraction_agent_id}
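The five operations above share one base path and differ only in HTTP method and in whether they take an `extraction_agent_id`. The sketch below restates that mapping as a small lookup function. `extractionAgentRoute` and `AGENTS_PATH` are hypothetical names used for illustration; they are not part of the SDK, and URL-encoding the ID is an assumption.

```typescript
// Illustrative only: a hand-written mapping of the documented operations
// to their HTTP method and path. Not part of the SDK.
type HttpMethod = "GET" | "POST" | "PUT" | "DELETE";

interface Route {
  method: HttpMethod;
  path: string;
}

const AGENTS_PATH = "/api/v1/extraction/extraction-agents";

// Hypothetical helper: returns the route for one of the five operations.
function extractionAgentRoute(
  op: "create" | "list" | "get" | "delete" | "update",
  extractionAgentId?: string,
): Route {
  switch (op) {
    case "create":
      return { method: "POST", path: AGENTS_PATH };
    case "list":
      return { method: "GET", path: AGENTS_PATH };
    case "get":
    case "delete":
    case "update": {
      // These three operations address a single agent by ID.
      if (!extractionAgentId) {
        throw new Error(`"${op}" requires an extraction_agent_id`);
      }
      const method: HttpMethod =
        op === "get" ? "GET" : op === "delete" ? "DELETE" : "PUT";
      // Assumption: the ID is URL-encoded before being placed in the path.
      return {
        method,
        path: `${AGENTS_PATH}/${encodeURIComponent(extractionAgentId)}`,
      };
    }
  }
}
```

For example, `extractionAgentRoute("update", id)` yields a `PUT` on the per-agent path, while `extractionAgentRoute("list")` yields a `GET` on the collection path.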

Models

ExtractAgent { id, config, data_schema, 5 more }

Schema and configuration for an extraction agent.

id: string

The id of the extraction agent.

format: uuid

config: ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

The configuration parameters for the extraction agent.

chunk_mode?: "PAGE" | "SECTION"

The mode to use for chunking the document.

Accepts one of the following:

"PAGE"

"SECTION"

citation_bbox?: boolean (Deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources?: boolean

Whether to cite sources for the extraction.

confidence_scores?: boolean

Whether to fetch confidence scores for the extraction.

extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"gemini-2.0-flash"

"gemini-2.5-flash"

"gemini-2.5-flash-lite"

"gemini-2.5-pro"

"openai-gpt-4o"

"openai-gpt-4o-mini"

(string & {})

extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:

"FAST"

"BALANCED"

"PREMIUM"

"MULTIMODAL"

extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"

The extraction target specified.

Accepts one of the following:

"PER_DOC"

"PER_PAGE"

"PER_TABLE_ROW"

high_resolution_mode?: boolean

Whether to use high resolution mode for the extraction.

invalidate_cache?: boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode?: boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context?: number | null

Number of pages to pass as context on long document extraction.

minimum: 1

page_range?: string | null

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
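To make the `page_range` format concrete, here is a minimal client-side sketch that expands such a string into explicit 1-based page numbers. `parsePageRange` is a hypothetical helper, not an SDK function, and its strictness (trimming whitespace, rejecting descending ranges) is an assumption; the server's own parsing rules may differ.

```typescript
// Hypothetical helper: expand a page_range string such as "1,3,5-7,9"
// into a sorted list of unique 1-based page numbers.
function parsePageRange(pageRange: string): number[] {
  const pages: number[] = [];
  for (const rawPart of pageRange.split(",")) {
    const part = rawPart.trim(); // assumption: surrounding whitespace is tolerated
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    // Pages are 1-based; a range must not run backwards.
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.push(page);
    }
  }
  // Sort numerically, then drop duplicates from overlapping segments.
  pages.sort((a, b) => a - b);
  return pages.filter((page, i) => i === 0 || page !== pages[i - 1]);
}

// parsePageRange("1,3,5-7,9") returns [1, 3, 5, 6, 7, 9]
```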

parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null

The model to use for parsing, given as one of the public model names.

Accepts one of the following:

"openai-gpt-4o"

"openai-gpt-4o-mini"

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"openai-gpt-5-nano"

"openai-text-embedding-3-large"

"openai-text-embedding-3-small"

"openai-whisper-1"

"anthropic-sonnet-3.5"

"anthropic-sonnet-3.5-v2"

"anthropic-sonnet-3.7"

"anthropic-sonnet-4.0"

"anthropic-sonnet-4.5"

"anthropic-haiku-3.5"

"anthropic-haiku-4.5"

"gemini-2.5-flash"

"gemini-3.0-pro"

"gemini-2.5-pro"

"gemini-2.0-flash"

"gemini-2.0-flash-lite"

"gemini-2.5-flash-lite"

"gemini-1.5-flash"

"gemini-1.5-pro"

priority?: "low" | "medium" | "high" | "critical" | null

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

system_prompt?: string | null

The system prompt to use for the extraction.

use_reasoning?: boolean

Whether to use reasoning for the extraction.
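Two of the constraints listed above can be checked before a config is sent: `citation_bbox` is deprecated in favor of `cite_sources`, and `num_pages_context` has a minimum of 1. The sketch below shows one way a caller might normalize a config accordingly. `ExtractConfigSketch` is a partial, hand-written mirror of a few fields rather than the SDK's `ExtractConfig` type, and `normalizeConfig` is a hypothetical helper; whether the server performs the same mapping itself is not stated in this reference.

```typescript
// Partial, hand-written mirror of a few ExtractConfig fields (not the SDK type).
interface ExtractConfigSketch {
  chunk_mode?: "PAGE" | "SECTION";
  citation_bbox?: boolean; // deprecated: synonymous with cite_sources
  cite_sources?: boolean;
  extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL";
  extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW";
  num_pages_context?: number | null;
}

// Hypothetical helper: fold the deprecated flag into cite_sources and
// enforce the documented minimum for num_pages_context.
function normalizeConfig(config: ExtractConfigSketch): ExtractConfigSketch {
  const { citation_bbox, ...rest } = config;
  const normalized: ExtractConfigSketch = { ...rest };

  // Assumption: an explicit cite_sources value wins over the deprecated flag.
  if (citation_bbox !== undefined && normalized.cite_sources === undefined) {
    normalized.cite_sources = citation_bbox;
  }

  const pagesContext = normalized.num_pages_context;
  if (pagesContext !== undefined && pagesContext !== null && pagesContext < 1) {
    throw new Error("num_pages_context must be at least 1");
  }
  return normalized;
}
```

Dropping the deprecated key on the client keeps new code from depending on it; callers who need to preserve the original object can keep a copy, since the helper returns a new one.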

data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>

The schema of the data.

Accepts one of the following:

Record<string, unknown>

Array<unknown>

string

number

boolean

name: string

The name of the extraction agent.

project_id: string

The ID of the project that the extraction agent belongs to.

format: uuid

created_at?: string | null

The creation time of the extraction agent.

format: date-time

custom_configuration?: "default" | null

Custom configuration type for the extraction agent. Currently supports 'default'.

updated_at?: string | null

The last update time of the extraction agent.

format: date-time
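The `ExtractAgent` model above has five required fields (`id`, `config`, `data_schema`, `name`, `project_id`) and three optional ones. As a rough illustration, a runtime check for the required fields could look like the following. `ExtractAgentSketch` and `isExtractAgent` are hypothetical names written for this example, not SDK exports, and the check deliberately does not validate the nested `config` or UUID formats.

```typescript
// Hand-written approximation of the ExtractAgent shape (not the SDK type).
interface ExtractAgentSketch {
  id: string;
  config: Record<string, unknown>;
  data_schema: Record<string, unknown>;
  name: string;
  project_id: string;
  created_at?: string | null;
  custom_configuration?: "default" | null;
  updated_at?: string | null;
}

// True for plain (non-null, non-array) objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical type guard: checks only that the required fields are present
// with the expected primitive or object types.
function isExtractAgent(value: unknown): value is ExtractAgentSketch {
  if (!isPlainObject(value)) return false;
  return (
    typeof value["id"] === "string" &&
    typeof value["name"] === "string" &&
    typeof value["project_id"] === "string" &&
    isPlainObject(value["config"]) &&
    isPlainObject(value["data_schema"])
  );
}
```

A guard like this is mainly useful when agent data arrives as untyped JSON, for example from a cache or a raw HTTP call, rather than through the typed client.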

Extraction Agents Schema

Validate Extraction Schema

client.extraction.extractionAgents.schema.validateSchema(, ?): SchemaValidateSchemaResponse { data_schema }

POST /api/v1/extraction/extraction-agents/schema/validation

Generate Extraction Schema

client.extraction.extractionAgents.schema.generateSchema(, ?): SchemaGenerateSchemaResponse { data_schema }

POST /api/v1/extraction/extraction-agents/schema/generate