Extraction

Extract Stateless
client.extraction.run(ExtractionRunParams { config, data_schema, organization_id, 5 more } params, RequestOptions options?): ExtractJob { id, extraction_agent, status, 3 more }
POST /api/v1/extraction/run

Extraction > Jobs

List Jobs
client.extraction.jobs.list(JobListParams { extraction_agent_id } query, RequestOptions options?): JobListResponse { id, extraction_agent, status, 3 more }
GET /api/v1/extraction/jobs
Run Job
client.extraction.jobs.create(JobCreateParams { extraction_agent_id, file_id, from_ui, 4 more } params, RequestOptions options?): ExtractJob { id, extraction_agent, status, 3 more }
POST /api/v1/extraction/jobs
Get Job
client.extraction.jobs.get(string jobID, RequestOptions options?): ExtractJob { id, extraction_agent, status, 3 more }
GET /api/v1/extraction/jobs/{job_id}
Run Job On File
client.extraction.jobs.file(JobFileParams { extraction_agent_id, file, from_ui, 2 more } params, RequestOptions options?): ExtractJob { id, extraction_agent, status, 3 more }
POST /api/v1/extraction/jobs/file
Get Job Result
client.extraction.jobs.getResult(string jobID, JobGetResultParams { organization_id, project_id } query?, RequestOptions options?): JobGetResultResponse { data, extraction_agent_id, extraction_metadata, run_id }
GET /api/v1/extraction/jobs/{job_id}/result
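
The job endpoints above suggest a create, poll, then fetch-result workflow. The following is a minimal sketch of the polling step only. `JobsApiLike`, `waitForJob`, the poll interval, and the attempt limit are hypothetical names and values introduced for illustration; the sketch also assumes that `PENDING` is the only non-terminal status, which this reference does not state explicitly.

```typescript
// Status values copied from the ExtractJob model in this reference.
type JobStatus = "PENDING" | "SUCCESS" | "ERROR" | "PARTIAL_SUCCESS" | "CANCELLED";

interface JobLike {
  id: string;
  status: JobStatus;
  error?: string | null;
}

// Hypothetical minimal view of the jobs client: only a `get` method is needed here.
interface JobsApiLike {
  get(jobID: string): Promise<JobLike>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job leaves PENDING (assumed to be the only non-terminal status),
// or throws after `maxAttempts` polls.
async function waitForJob(
  jobs: JobsApiLike,
  jobID: string,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<JobLike> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await jobs.get(jobID);
    if (job.status !== "PENDING") return job;
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobID} was still PENDING after ${maxAttempts} polls`);
}
```

Passing the client in as an interface keeps the sketch independent of any particular SDK version; a real client object whose `jobs.get` resolves to an object with `id` and `status` would fit the same shape.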
Models
ExtractJob { id, extraction_agent, status, 3 more }

Schema for an extraction job.

id: string

The id of the extraction job

format: uuid
extraction_agent: ExtractAgent { id, config, data_schema, 5 more }

The agent that the job was run on.

id: string

The id of the extraction agent.

format: uuid
config: ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

The configuration parameters for the extraction agent.

chunk_mode?: "PAGE" | "SECTION"

The mode to use for chunking the document.

Accepts one of the following:
"PAGE"
"SECTION"
citation_bbox?: boolean (deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources?: boolean

Whether to cite sources for the extraction.

confidence_scores?: boolean

Whether to fetch confidence scores for the extraction.

extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:
"openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"gemini-2.0-flash"
"gemini-2.5-flash"
"gemini-2.5-flash-lite"
"gemini-2.5-pro"
"openai-gpt-4o"
"openai-gpt-4o-mini"
(string & {})
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:
"FAST"
"BALANCED"
"PREMIUM"
"MULTIMODAL"
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"

The extraction target specified.

Accepts one of the following:
"PER_DOC"
"PER_PAGE"
"PER_TABLE_ROW"
high_resolution_mode?: boolean

Whether to use high resolution mode for the extraction.

invalidate_cache?: boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode?: boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context?: number | null

Number of pages to pass as context on long document extraction.

minimum: 1
page_range?: string | null

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
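
As an illustration of the `page_range` format described above, here is a small client-side sketch that expands such a string into explicit page numbers. `parsePageRange` is a hypothetical helper, not part of the SDK, and the server's own parsing rules (for example around whitespace or reversed ranges) may differ.

```typescript
// Expands a 1-based page_range string such as "1,3,5-7,9"
// into a sorted list of unique page numbers.
function parsePageRange(pageRange: string): number[] {
  const pages: number[] = [];
  for (const rawPart of pageRange.split(",")) {
    const part = rawPart.trim();
    // Each segment is either a single page ("3") or an inclusive range ("5-7").
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid page range segment: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      if (pages.indexOf(page) === -1) pages.push(page);
    }
  }
  return pages.sort((a, b) => a - b);
}
```

With the examples from the field description, `parsePageRange("1,3,5-7,9")` yields pages 1, 3, 5, 6, 7, 9 and `parsePageRange("1-3,8-10")` yields pages 1, 2, 3, 8, 9, 10.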

parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null

Public model names.

Accepts one of the following:
"openai-gpt-4o"
"openai-gpt-4o-mini"
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"openai-gpt-5-nano"
"openai-text-embedding-3-large"
"openai-text-embedding-3-small"
"openai-whisper-1"
"anthropic-sonnet-3.5"
"anthropic-sonnet-3.5-v2"
"anthropic-sonnet-3.7"
"anthropic-sonnet-4.0"
"anthropic-sonnet-4.5"
"anthropic-haiku-3.5"
"anthropic-haiku-4.5"
"gemini-2.5-flash"
"gemini-3.0-pro"
"gemini-2.5-pro"
"gemini-2.0-flash"
"gemini-2.0-flash-lite"
"gemini-2.5-flash-lite"
"gemini-1.5-flash"
"gemini-1.5-pro"
priority?: "low" | "medium" | "high" | "critical" | null

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
system_prompt?: string | null

The system prompt to use for the extraction.

use_reasoning?: boolean

Whether to use reasoning for the extraction.

data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>

The schema of the data.

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
name: string

The name of the extraction agent.

project_id: string

The ID of the project that the extraction agent belongs to.

format: uuid
created_at?: string | null

The creation time of the extraction agent.

format: date-time
custom_configuration?: "default" | null

Custom configuration type for the extraction agent. Currently supports 'default'.

updated_at?: string | null

The last update time of the extraction agent.

format: date-time
status: "PENDING" | "SUCCESS" | "ERROR" | 2 more

The status of the extraction job

Accepts one of the following:
"PENDING"
"SUCCESS"
"ERROR"
"PARTIAL_SUCCESS"
"CANCELLED"
error?: string | null

The error that occurred during extraction
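
The `status` and `error` fields above can be combined into a single decision about what to do with a job. The sketch below uses an exhaustive `switch` so the compiler flags any status that is not handled. `summarizeJob` and its return shape are hypothetical; treating `PARTIAL_SUCCESS` as having a usable result is an assumption, not something this reference confirms.

```typescript
type ExtractJobStatus = "PENDING" | "SUCCESS" | "ERROR" | "PARTIAL_SUCCESS" | "CANCELLED";

interface JobSummary {
  finished: boolean;      // false only while the job is PENDING (assumption)
  resultExpected: boolean; // whether fetching the job result is worth attempting
  message: string;
}

function summarizeJob(status: ExtractJobStatus, error?: string | null): JobSummary {
  switch (status) {
    case "PENDING":
      return { finished: false, resultExpected: false, message: "still running" };
    case "SUCCESS":
      return { finished: true, resultExpected: true, message: "completed" };
    case "PARTIAL_SUCCESS":
      // Assumption: a partial result is still retrievable.
      return { finished: true, resultExpected: true, message: error || "completed with gaps" };
    case "ERROR":
      return { finished: true, resultExpected: false, message: error || "failed" };
    case "CANCELLED":
      return { finished: true, resultExpected: false, message: "cancelled" };
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a status is added.
      const unreachable: never = status;
      throw new Error(`Unhandled status: ${String(unreachable)}`);
    }
  }
}
```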

file?: File { id, name, project_id, 11 more } | null (deprecated)

Schema for a file.

id: string

Unique identifier

format: uuid
name: string
project_id: string

The ID of the project that the file belongs to

format: uuid
created_at?: string | null

Creation datetime

format: date-time
data_source_id?: string | null

The ID of the data source that the file belongs to

format: uuid
expires_at?: string | null

The expiration date for the file. Files past this date can be deleted.

format: date-time
external_file_id?: string | null

The ID of the file in the external system

file_size?: number | null

Size of the file in bytes

minimum: 0
file_type?: string | null

File type (e.g. pdf, docx, etc.)

maxLength: 3000
minLength: 1
last_modified_at?: string | null

The last modified time of the file

format: date-time
permission_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

Permission information for the file

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
purpose?: string | null

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

Resource information for the file

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
updated_at?: string | null

Update datetime

format: date-time
file_id?: string | null

The id of the file that the data was extracted from

format: uuid
WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }

Allows the user to configure webhook options for notifications and callbacks.

webhook_events?: Array<"extract.pending" | "extract.success" | "extract.error" | 13 more> | null

List of event names to subscribe to

Accepts one of the following:
"extract.pending"
"extract.success"
"extract.error"
"extract.partial_success"
"extract.cancelled"
"parse.pending"
"parse.success"
"parse.error"
"parse.partial_success"
"parse.cancelled"
"classify.pending"
"classify.success"
"classify.error"
"classify.partial_success"
"classify.cancelled"
"unmapped_event"
webhook_headers?: Record<string, string> | null

Custom HTTP headers to include with webhook requests.

webhook_output_format?: string | null

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url?: string | null

The URL to send webhook notifications to.
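
To make the `WebhookConfiguration` fields concrete, the sketch below builds a configuration that subscribes only to the finished-state extraction events. `extractWebhook` is a hypothetical helper, the `X-Webhook-Secret` header name is an illustrative assumption, and the interface here mirrors only the four fields listed above.

```typescript
// Subset of the event names listed above: only the extraction events.
type ExtractWebhookEvent =
  | "extract.pending"
  | "extract.success"
  | "extract.error"
  | "extract.partial_success"
  | "extract.cancelled";

interface WebhookConfigurationSketch {
  webhook_events?: ExtractWebhookEvent[] | null;
  webhook_headers?: Record<string, string> | null;
  webhook_output_format?: string | null;
  webhook_url?: string | null;
}

// Builds a configuration that notifies `url` when an extraction finishes,
// requesting the "json" output format instead of the default "string".
function extractWebhook(url: string, secret: string): WebhookConfigurationSketch {
  return {
    webhook_url: url,
    webhook_events: ["extract.success", "extract.partial_success", "extract.error"],
    // Hypothetical header name; use whatever the receiving service expects.
    webhook_headers: { "X-Webhook-Secret": secret },
    webhook_output_format: "json",
  };
}
```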

Extraction > Runs

List Extract Runs
client.extraction.runs.list(RunListParams { extraction_agent_id, limit, skip } query, RequestOptions options?): PaginatedExtractRuns<ExtractRun { id, config, data_schema, 12 more }>
GET /api/v1/extraction/runs
Get Run
client.extraction.runs.get(string runID, RunGetParams { organization_id, project_id } query?, RequestOptions options?): ExtractRun { id, config, data_schema, 12 more }
GET /api/v1/extraction/runs/{run_id}
Delete Extraction Run
client.extraction.runs.delete(string runID, RunDeleteParams { organization_id, project_id } params?, RequestOptions options?): RunDeleteResponse
DELETE /api/v1/extraction/runs/{run_id}
Get Run By Job Id
client.extraction.runs.getByJob(string jobID, RunGetByJobParams { organization_id, project_id } query?, RequestOptions options?): ExtractRun { id, config, data_schema, 12 more }
GET /api/v1/extraction/runs/by-job/{job_id}
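
The `limit` and `skip` parameters of the run listing above look like offset pagination. Under that assumption (that `skip` is a zero-based count of runs to skip, which this reference does not spell out), the query parameters for each page can be computed as in this sketch. `runListPages` is a hypothetical helper and the total number of runs is assumed to be known from elsewhere.

```typescript
interface RunListParamsSketch {
  extraction_agent_id: string;
  limit: number;
  skip: number;
}

// Builds one query-params object per page, assuming `skip` is a zero-based offset.
function runListPages(
  agentId: string,
  totalRuns: number,
  pageSize: number,
): RunListParamsSketch[] {
  if (pageSize < 1) throw new Error("pageSize must be at least 1");
  const pages: RunListParamsSketch[] = [];
  for (let skip = 0; skip < totalRuns; skip += pageSize) {
    pages.push({ extraction_agent_id: agentId, limit: pageSize, skip });
  }
  return pages;
}
```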
Models
ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

Configuration parameters for the extraction agent.

chunk_mode?: "PAGE" | "SECTION"

The mode to use for chunking the document.

Accepts one of the following:
"PAGE"
"SECTION"
citation_bbox?: boolean (deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources?: boolean

Whether to cite sources for the extraction.
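
Because `citation_bbox` is described as deprecated and synonymous with `cite_sources`, older configs can be migrated on the client before they are sent. The sketch below is one possible way to do that. `normalizeCitationFlags` is a hypothetical helper, and letting an explicit `cite_sources` value take precedence over `citation_bbox` is a design assumption.

```typescript
interface CitationFlags {
  citation_bbox?: boolean; // deprecated alias
  cite_sources?: boolean;
}

// Returns only the non-deprecated flag. An explicit cite_sources wins;
// otherwise the deprecated citation_bbox value is carried over.
function normalizeCitationFlags(config: CitationFlags): { cite_sources?: boolean } {
  const wantsCitations =
    config.cite_sources !== undefined ? config.cite_sources : config.citation_bbox;
  return wantsCitations === undefined ? {} : { cite_sources: wantsCitations };
}
```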

confidence_scores?: boolean

Whether to fetch confidence scores for the extraction.

extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:
"openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"gemini-2.0-flash"
"gemini-2.5-flash"
"gemini-2.5-flash-lite"
"gemini-2.5-pro"
"openai-gpt-4o"
"openai-gpt-4o-mini"
(string & {})
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:
"FAST"
"BALANCED"
"PREMIUM"
"MULTIMODAL"
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"

The extraction target specified.

Accepts one of the following:
"PER_DOC"
"PER_PAGE"
"PER_TABLE_ROW"
high_resolution_mode?: boolean

Whether to use high resolution mode for the extraction.

invalidate_cache?: boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode?: boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context?: number | null

Number of pages to pass as context on long document extraction.

minimum: 1
page_range?: string | null

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null

Public model names.

Accepts one of the following:
"openai-gpt-4o"
"openai-gpt-4o-mini"
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"openai-gpt-5-nano"
"openai-text-embedding-3-large"
"openai-text-embedding-3-small"
"openai-whisper-1"
"anthropic-sonnet-3.5"
"anthropic-sonnet-3.5-v2"
"anthropic-sonnet-3.7"
"anthropic-sonnet-4.0"
"anthropic-sonnet-4.5"
"anthropic-haiku-3.5"
"anthropic-haiku-4.5"
"gemini-2.5-flash"
"gemini-3.0-pro"
"gemini-2.5-pro"
"gemini-2.0-flash"
"gemini-2.0-flash-lite"
"gemini-2.5-flash-lite"
"gemini-1.5-flash"
"gemini-1.5-pro"
priority?: "low" | "medium" | "high" | "critical" | null

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
system_prompt?: string | null

The system prompt to use for the extraction.

use_reasoning?: boolean

Whether to use reasoning for the extraction.

ExtractRun { id, config, data_schema, 12 more }

Schema for an extraction run.

id: string

The id of the extraction run

format: uuid
config: ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

The config used for extraction

chunk_mode?: "PAGE" | "SECTION"

The mode to use for chunking the document.

Accepts one of the following:
"PAGE"
"SECTION"
citation_bbox?: boolean (deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources?: boolean

Whether to cite sources for the extraction.

confidence_scores?: boolean

Whether to fetch confidence scores for the extraction.

extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:
"openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"gemini-2.0-flash"
"gemini-2.5-flash"
"gemini-2.5-flash-lite"
"gemini-2.5-pro"
"openai-gpt-4o"
"openai-gpt-4o-mini"
(string & {})
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:
"FAST"
"BALANCED"
"PREMIUM"
"MULTIMODAL"
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"

The extraction target specified.

Accepts one of the following:
"PER_DOC"
"PER_PAGE"
"PER_TABLE_ROW"
high_resolution_mode?: boolean

Whether to use high resolution mode for the extraction.

invalidate_cache?: boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode?: boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context?: number | null

Number of pages to pass as context on long document extraction.

minimum: 1
page_range?: string | null

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null

Public model names.

Accepts one of the following:
"openai-gpt-4o"
"openai-gpt-4o-mini"
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"openai-gpt-5-nano"
"openai-text-embedding-3-large"
"openai-text-embedding-3-small"
"openai-whisper-1"
"anthropic-sonnet-3.5"
"anthropic-sonnet-3.5-v2"
"anthropic-sonnet-3.7"
"anthropic-sonnet-4.0"
"anthropic-sonnet-4.5"
"anthropic-haiku-3.5"
"anthropic-haiku-4.5"
"gemini-2.5-flash"
"gemini-3.0-pro"
"gemini-2.5-pro"
"gemini-2.0-flash"
"gemini-2.0-flash-lite"
"gemini-2.5-flash-lite"
"gemini-1.5-flash"
"gemini-1.5-pro"
priority?: "low" | "medium" | "high" | "critical" | null

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
system_prompt?: string | null

The system prompt to use for the extraction.

use_reasoning?: boolean

Whether to use reasoning for the extraction.

data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>

The schema used for extraction

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
extraction_agent_id: string

The id of the extraction agent

format: uuid
from_ui: boolean

Whether this extraction run was triggered from the UI

project_id: string

The id of the project that the extraction run belongs to

format: uuid
status: "CREATED" | "PENDING" | "SUCCESS" | "ERROR"

The status of the extraction run

Accepts one of the following:
"CREATED"
"PENDING"
"SUCCESS"
"ERROR"
created_at?: string | null

Creation datetime

format: date-time
data?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | Array<Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>> | null

The data extracted from the file

Accepts one of the following:
Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>
Record<string, unknown>
Array<unknown>
string
number
boolean
Array<Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>>
Record<string, unknown>
Array<unknown>
string
number
boolean
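
Since `data` may be a single record, an array of records, or null, downstream code often benefits from normalizing it to one shape first. A minimal sketch, where `toRecords` is a hypothetical helper and not part of the SDK:

```typescript
type ExtractedRecord = Record<string, unknown>;

// Normalizes the three possible shapes of `data` into an array of records:
// null/undefined -> [], single record -> [record], array -> returned as-is.
function toRecords(
  data: ExtractedRecord | ExtractedRecord[] | null | undefined,
): ExtractedRecord[] {
  if (data === null || data === undefined) return [];
  return Array.isArray(data) ? data : [data];
}
```

Callers can then iterate over `toRecords(run.data)` without branching on the shape of the result.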
error?: string | null

The error that occurred during extraction

extraction_metadata?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

The metadata extracted from the file

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
file?: File { id, name, project_id, 11 more } | null (deprecated)

Schema for a file.

id: string

Unique identifier

format: uuid
name: string
project_id: string

The ID of the project that the file belongs to

format: uuid
created_at?: string | null

Creation datetime

format: date-time
data_source_id?: string | null

The ID of the data source that the file belongs to

format: uuid
expires_at?: string | null

The expiration date for the file. Files past this date can be deleted.

format: date-time
external_file_id?: string | null

The ID of the file in the external system

file_size?: number | null

Size of the file in bytes

minimum: 0
file_type?: string | null

File type (e.g. pdf, docx, etc.)

maxLength: 3000
minLength: 1
last_modified_at?: string | null

The last modified time of the file

format: date-time
permission_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

Permission information for the file

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
purpose?: string | null

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

Resource information for the file

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
updated_at?: string | null

Update datetime

format: date-time
file_id?: string | null

The id of the file that the data was extracted from

format: uuid
job_id?: string | null

The id of the job that the extraction run belongs to

format: uuid
updated_at?: string | null

Update datetime

format: date-time

Extraction > Extraction Agents

Create Extraction Agent
client.extraction.extractionAgents.create(ExtractionAgentCreateParams { config, data_schema, name, 2 more } params, RequestOptions options?): ExtractAgent { id, config, data_schema, 5 more }
POST /api/v1/extraction/extraction-agents
List Extraction Agents
client.extraction.extractionAgents.list(ExtractionAgentListParams { include_default, organization_id, project_id } query?, RequestOptions options?): ExtractionAgentListResponse { id, config, data_schema, 5 more }
GET /api/v1/extraction/extraction-agents
Get Extraction Agent
client.extraction.extractionAgents.get(string extractionAgentID, RequestOptions options?): ExtractAgent { id, config, data_schema, 5 more }
GET /api/v1/extraction/extraction-agents/{extraction_agent_id}
Delete Extraction Agent
client.extraction.extractionAgents.delete(string extractionAgentID, RequestOptions options?): ExtractionAgentDeleteResponse
DELETE /api/v1/extraction/extraction-agents/{extraction_agent_id}
Update Extraction Agent
client.extraction.extractionAgents.update(string extractionAgentID, ExtractionAgentUpdateParams { config, data_schema } body, RequestOptions options?): ExtractAgent { id, config, data_schema, 5 more }
PUT /api/v1/extraction/extraction-agents/{extraction_agent_id}
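
To illustrate the `config`, `data_schema`, and `name` parameters that the create call above takes, here is a sketch of a request payload for a hypothetical invoice agent. The interface mirrors only a few of the documented fields; `buildInvoiceAgentParams` and the field names inside the schema are invented for illustration, and writing `data_schema` in JSON Schema style is an assumption, since this reference only types it as a generic record.

```typescript
type JsonValue = Record<string, unknown> | unknown[] | string | number | boolean | null;

interface AgentCreateParamsSketch {
  name: string;
  data_schema: Record<string, JsonValue>;
  config: {
    extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL";
    extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW";
    cite_sources?: boolean;
    page_range?: string | null;
  };
}

// Builds create-params for an agent that pulls two fields from each document.
function buildInvoiceAgentParams(name: string): AgentCreateParamsSketch {
  return {
    name,
    // Assumption: the schema is expressed in JSON Schema style.
    data_schema: {
      type: "object",
      properties: {
        invoice_number: { type: "string" },
        total: { type: "number" },
      },
      required: ["invoice_number"],
    },
    config: {
      extraction_mode: "BALANCED",
      extraction_target: "PER_DOC",
      cite_sources: true,
      page_range: "1-3",
    },
  };
}
```

The resulting object has the same top-level keys as `ExtractionAgentCreateParams`, so it could be passed to the create call once the remaining optional parameters are filled in as needed.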
Models
ExtractAgent { id, config, data_schema, 5 more }

Schema and configuration for an extraction agent.

id: string

The id of the extraction agent.

format: uuid
config: ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

The configuration parameters for the extraction agent.

chunk_mode?: "PAGE" | "SECTION"

The mode to use for chunking the document.

Accepts one of the following:
"PAGE"
"SECTION"
citation_bbox?: boolean (deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources?: boolean

Whether to cite sources for the extraction.

confidence_scores?: boolean

Whether to fetch confidence scores for the extraction.

extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:
"openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"gemini-2.0-flash"
"gemini-2.5-flash"
"gemini-2.5-flash-lite"
"gemini-2.5-pro"
"openai-gpt-4o"
"openai-gpt-4o-mini"
(string & {})
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:
"FAST"
"BALANCED"
"PREMIUM"
"MULTIMODAL"
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"

The extraction target specified.

Accepts one of the following:
"PER_DOC"
"PER_PAGE"
"PER_TABLE_ROW"
high_resolution_mode?: boolean

Whether to use high resolution mode for the extraction.

invalidate_cache?: boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode?: boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context?: number | null

Number of pages to pass as context on long document extraction.

minimum: 1
page_range?: string | null

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null

Public model names.

Accepts one of the following:
"openai-gpt-4o"
"openai-gpt-4o-mini"
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"openai-gpt-5-nano"
"openai-text-embedding-3-large"
"openai-text-embedding-3-small"
"openai-whisper-1"
"anthropic-sonnet-3.5"
"anthropic-sonnet-3.5-v2"
"anthropic-sonnet-3.7"
"anthropic-sonnet-4.0"
"anthropic-sonnet-4.5"
"anthropic-haiku-3.5"
"anthropic-haiku-4.5"
"gemini-2.5-flash"
"gemini-3.0-pro"
"gemini-2.5-pro"
"gemini-2.0-flash"
"gemini-2.0-flash-lite"
"gemini-2.5-flash-lite"
"gemini-1.5-flash"
"gemini-1.5-pro"
priority?: "low" | "medium" | "high" | "critical" | null

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
system_prompt?: string | null

The system prompt to use for the extraction.

use_reasoning?: boolean

Whether to use reasoning for the extraction.

data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>

The schema of the data.

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
name: string

The name of the extraction agent.

project_id: string

The ID of the project that the extraction agent belongs to.

format: uuid
created_at?: string | null

The creation time of the extraction agent.

format: date-time
custom_configuration?: "default" | null

Custom configuration type for the extraction agent. Currently supports 'default'.

updated_at?: string | null

The last update time of the extraction agent.

format: date-time

Extraction > Extraction Agents > Schema

Validate Extraction Schema
client.extraction.extractionAgents.schema.validateSchema(SchemaValidateSchemaParams { data_schema } body, RequestOptions options?): SchemaValidateSchemaResponse { data_schema }
POST /api/v1/extraction/extraction-agents/schema/validation
Generate Extraction Schema
client.extraction.extractionAgents.schema.generateSchema(SchemaGenerateSchemaParams { organization_id, project_id, data_schema, 2 more } params, RequestOptions options?): SchemaGenerateSchemaResponse { data_schema }
POST /api/v1/extraction/extraction-agents/schema/generate