Runs

List Extract Runs

GET /api/v1/extraction/runs

Get Run

GET /api/v1/extraction/runs/{run_id}

Delete Extraction Run

DELETE /api/v1/extraction/runs/{run_id}

Get Run By Job Id

GET /api/v1/extraction/runs/by-job/{job_id}
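The four endpoints above differ only in HTTP method and path. The following minimal Python sketch builds (but does not send) the corresponding requests with the standard library. The base URL `https://api.cloud.llamaindex.ai` and the `Authorization: Bearer` header are assumptions not stated on this page, and the helper names are hypothetical. Any query parameters the list endpoint accepts are not documented here, so they are omitted.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: base URL and bearer-token auth are illustrative; confirm for your deployment.
BASE_URL = "https://api.cloud.llamaindex.ai"


def build_request(method: str, path: str, api_key: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) an authenticated request for an extraction-run endpoint."""
    return Request(
        url=base_url.rstrip("/") + path,
        method=method,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )


def list_runs(api_key: str) -> Request:
    # GET /api/v1/extraction/runs
    return build_request("GET", "/api/v1/extraction/runs", api_key)


def get_run(run_id: str, api_key: str) -> Request:
    # GET /api/v1/extraction/runs/{run_id}; the ID is percent-encoded for safety.
    return build_request("GET", f"/api/v1/extraction/runs/{quote(run_id, safe='')}", api_key)


def delete_run(run_id: str, api_key: str) -> Request:
    # DELETE /api/v1/extraction/runs/{run_id}
    return build_request("DELETE", f"/api/v1/extraction/runs/{quote(run_id, safe='')}", api_key)


def get_run_by_job(job_id: str, api_key: str) -> Request:
    # GET /api/v1/extraction/runs/by-job/{job_id}
    return build_request("GET", f"/api/v1/extraction/runs/by-job/{quote(job_id, safe='')}", api_key)
```

To send one of these, pass it to `urllib.request.urlopen` and decode the body with `json.load`; Get Run and Get Run By Job Id presumably return an ExtractRun object as described below.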

Models

ExtractConfig = object { chunk_mode, citation_bbox, cite_sources, 13 more }

Configuration parameters for the extraction agent.

chunk_mode: optional "PAGE" or "SECTION"

The mode to use for chunking the document.

Accepts one of the following:

"PAGE"

"SECTION"

citation_bbox: optional boolean (deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources: optional boolean

Whether to cite sources for the extraction.

confidence_scores: optional boolean

Whether to fetch confidence scores for the extraction.

extract_model: optional "openai-gpt-4-1" or "openai-gpt-4-1-mini" or "openai-gpt-4-1-nano" or 8 more or string

The model to use for data extraction. If not provided, the default model for the selected extraction mode is used.

Accepts one of the following:

ExtractModels = "openai-gpt-4-1" or "openai-gpt-4-1-mini" or "openai-gpt-4-1-nano" or 8 more

Extract model options.

Accepts one of the following:

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"gemini-2.0-flash"

"gemini-2.5-flash"

"gemini-2.5-flash-lite"

"gemini-2.5-pro"

"openai-gpt-4o"

"openai-gpt-4o-mini"

UnionMember1 = string

extraction_mode: optional "FAST" or "BALANCED" or "PREMIUM" or "MULTIMODAL"

The extraction mode to use.

Accepts one of the following:

"FAST"

"BALANCED"

"PREMIUM"

"MULTIMODAL"

extraction_target: optional "PER_DOC" or "PER_PAGE" or "PER_TABLE_ROW"

The unit of extraction: per document (PER_DOC), per page (PER_PAGE), or per table row (PER_TABLE_ROW).

Accepts one of the following:

"PER_DOC"

"PER_PAGE"

"PER_TABLE_ROW"

high_resolution_mode: optional boolean

Whether to use high resolution mode for the extraction.

invalidate_cache: optional boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode: optional boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context: optional number

Number of pages to pass as context when extracting from long documents.

Minimum: 1

page_range: optional string

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').
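To illustrate the page_range format, here is a minimal Python sketch that expands a range string into explicit 1-based page numbers, for example to validate input on the client before submitting it. How the server treats whitespace, overlapping ranges, or pages beyond the end of the document is not specified here; this sketch assumes whitespace is ignored and overlaps are merged, and `parse_page_range` is a hypothetical helper.

```python
def parse_page_range(page_range: str) -> list:
    """Expand a 1-based page_range string such as '1,3,5-7,9' into sorted page numbers."""
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty item in page_range")
        if "-" in part:
            # A range like '5-7' is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
        else:
            start = end = int(part)
        if start < 1:
            raise ValueError(f"pages are 1-based: {part!r}")
        pages.update(range(start, end + 1))  # overlapping ranges are merged
    return sorted(pages)


print(parse_page_range("1,3,5-7,9"))  # [1, 3, 5, 6, 7, 9]
```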

parse_model: optional "openai-gpt-4o" or "openai-gpt-4o-mini" or "openai-gpt-4-1" or 23 more

The model to use for parsing the document, given as one of the public model names below.

Accepts one of the following:

"openai-gpt-4o"

"openai-gpt-4o-mini"

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"openai-gpt-5-nano"

"openai-text-embedding-3-large"

"openai-text-embedding-3-small"

"openai-whisper-1"

"anthropic-sonnet-3.5"

"anthropic-sonnet-3.5-v2"

"anthropic-sonnet-3.7"

"anthropic-sonnet-4.0"

"anthropic-sonnet-4.5"

"anthropic-haiku-3.5"

"anthropic-haiku-4.5"

"gemini-2.5-flash"

"gemini-3.0-pro"

"gemini-2.5-pro"

"gemini-2.0-flash"

"gemini-2.0-flash-lite"

"gemini-2.5-flash-lite"

"gemini-1.5-flash"

"gemini-1.5-pro"

priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

system_prompt: optional string

The system prompt to use for the extraction.

use_reasoning: optional boolean

Whether to use reasoning for the extraction.
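A hedged sketch of assembling an ExtractConfig body in Python. The field names and allowed values are copied from the reference above; the helper `make_extract_config` is hypothetical, and leaving optional fields out (rather than sending `null`) is an assumption about how the server applies defaults, which this page states explicitly only for extract_model.

```python
import json

# Field names and enum values copied from the ExtractConfig reference.
KNOWN_FIELDS = {
    "chunk_mode", "citation_bbox", "cite_sources", "confidence_scores",
    "extract_model", "extraction_mode", "extraction_target", "high_resolution_mode",
    "invalidate_cache", "multimodal_fast_mode", "num_pages_context", "page_range",
    "parse_model", "priority", "system_prompt", "use_reasoning",
}
ENUM_FIELDS = {
    "chunk_mode": {"PAGE", "SECTION"},
    "extraction_mode": {"FAST", "BALANCED", "PREMIUM", "MULTIMODAL"},
    "extraction_target": {"PER_DOC", "PER_PAGE", "PER_TABLE_ROW"},
    "priority": {"low", "medium", "high", "critical"},
}


def make_extract_config(**fields) -> dict:
    """Build an ExtractConfig-shaped dict, rejecting unknown fields and invalid values."""
    # Drop fields passed as None so they are omitted from the request body.
    config = {name: value for name, value in fields.items() if value is not None}
    unknown = set(config) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown ExtractConfig fields: {sorted(unknown)}")
    # extract_model also accepts arbitrary strings; parse_model is not checked, for brevity.
    for name, allowed in ENUM_FIELDS.items():
        if name in config and config[name] not in allowed:
            raise ValueError(f"{name}={config[name]!r} is not one of {sorted(allowed)}")
    if config.get("num_pages_context", 1) < 1:
        raise ValueError("num_pages_context must be at least 1")
    return config


config = make_extract_config(
    extraction_mode="BALANCED",
    extraction_target="PER_PAGE",
    cite_sources=True,
    page_range="1-3,8-10",
    system_prompt=None,  # omitted from the output
)
print(json.dumps(config, sort_keys=True))
# {"cite_sources": true, "extraction_mode": "BALANCED", "extraction_target": "PER_PAGE", "page_range": "1-3,8-10"}
```

Rejecting unknown names catches typos such as `extraction_targets` on the client; whether the server itself rejects or ignores unknown fields is not stated here.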

ExtractRun = object { id, config, data_schema, 12 more }

Schema for an extraction run.

id: string

The id of the extraction run

Format: uuid

config: ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

The config used for extraction. It has the same fields as ExtractConfig, documented above.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

The schema used for extraction

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean
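The type notation `map[map[unknown] or array of unknown or string or 2 more]` (used by data_schema, extraction_metadata, permission_info, and resource_info) describes a JSON object whose values are each a JSON object, an array, a string, a number, or a boolean. The Python sketch below checks a decoded value against that union; note that `null` is not among the listed members. The example schema is written in JSON Schema style, which is an assumption: this page specifies only the container type, not the schema dialect.

```python
from typing import Any

# Members of the union: map[unknown], array of unknown, string, number, boolean.
ALLOWED_VALUE_TYPES = (dict, list, str, int, float, bool)


def is_union_value(value: Any) -> bool:
    """True if value matches one member of the union (None/null is not a member)."""
    return isinstance(value, ALLOWED_VALUE_TYPES)


def is_json_map(obj: Any) -> bool:
    """True if obj is a map with string keys whose values all match the union."""
    return isinstance(obj, dict) and all(
        isinstance(key, str) and is_union_value(value) for key, value in obj.items()
    )


# Assumption: a JSON Schema-style data_schema for an invoice.
example_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number"],
}
print(is_json_map(example_schema))  # True
```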

extraction_agent_id: string

The id of the extraction agent

Format: uuid

from_ui: boolean

Whether this extraction run was triggered from the UI

project_id: string

The id of the project that the extraction run belongs to

Format: uuid

status: "CREATED" or "PENDING" or "SUCCESS" or "ERROR"

The status of the extraction run

Accepts one of the following:

"CREATED"

"PENDING"

"SUCCESS"

"ERROR"
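Because a run moves through CREATED and PENDING before ending in SUCCESS or ERROR, a client typically polls Get Run until the status is terminal. The following Python sketch assumes that SUCCESS and ERROR are the only terminal states (inferred from the names) and takes a hypothetical `fetch_run` callable that, for example, sends the Get Run request and decodes the JSON response. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

# Assumption: these are the only statuses after which a run no longer changes.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_run(
    fetch_run: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_run until the run's status is terminal, then return the run."""
    deadline = time.monotonic() + timeout_s
    while True:
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run['status']} after {timeout_s}s")
        sleep(interval_s)  # injectable so tests need not actually wait
```

When the returned run has status ERROR, its error field describes what went wrong.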

created_at: optional string

Creation datetime

Format: date-time

data: optional map[map[unknown] or array of unknown or string or 2 more] or array of map[map[unknown] or array of unknown or string or 2 more]

The data extracted from the file

Accepts one of the following:

UnionMember0 = map[map[unknown] or array of unknown or string or 2 more]

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean

UnionMember1 = array of map[map[unknown] or array of unknown or string or 2 more]

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean
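Since data may be either a single map or an array of maps, consuming code has to handle both shapes. This page does not say which shape corresponds to which extraction_target, so the hypothetical helper below simply normalizes either form to a list of records.

```python
from typing import Any


def data_records(run: dict) -> list:
    """Return run["data"] as a list of records, whether it is one map or an array of maps."""
    data: Any = run.get("data")
    if data is None:
        # data is optional; treat a missing value as no records.
        return []
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list):
        return list(data)
    raise TypeError(f"unexpected data type: {type(data).__name__}")


print(data_records({"data": {"total": 42}}))  # [{'total': 42}]
```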

error: optional string

The error that occurred during extraction

extraction_metadata: optional map[map[unknown] or array of unknown or string or 2 more]

The metadata extracted from the file

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean

file: optional File { id, name, project_id, 11 more } (deprecated)

Schema for a file.

id: string

Unique identifier

Format: uuid

name: string

project_id: string

The ID of the project that the file belongs to

Format: uuid

created_at: optional string

Creation datetime

Format: date-time

data_source_id: optional string

The ID of the data source that the file belongs to

Format: uuid

expires_at: optional string

The expiration date for the file. Files past this date can be deleted.

Format: date-time

external_file_id: optional string

The ID of the file in the external system

file_size: optional number

Size of the file in bytes

Minimum: 0

file_type: optional string

File type (e.g., pdf, docx).

Max length: 3000

Min length: 1

last_modified_at: optional string

The last modified time of the file

Format: date-time

permission_info: optional map[map[unknown] or array of unknown or string or 2 more]

Permission information for the file

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean

purpose: optional string

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info: optional map[map[unknown] or array of unknown or string or 2 more]

Resource information for the file

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean

updated_at: optional string

Update datetime

Format: date-time
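expires_at uses the date-time format, so checking whether a file is past its expiration date is a timestamp comparison. A minimal Python sketch follows; it assumes ISO 8601 strings and treats a trailing `Z` and offset-less timestamps as UTC (assumptions). It only reports whether the date has passed, since the reference says such files can be deleted, not that they already are. `is_file_expired` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_datetime(text: str) -> datetime:
    """Parse an ISO 8601 date-time string, treating a trailing 'Z' as UTC."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    # Assumption: timestamps without an offset are UTC.
    return parsed if parsed.tzinfo is not None else parsed.replace(tzinfo=timezone.utc)


def is_file_expired(file_info: dict, now: Optional[datetime] = None) -> bool:
    """True if file_info["expires_at"] is in the past; a file without expires_at is never expired."""
    expires_at = file_info.get("expires_at")
    if expires_at is None:
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return parse_datetime(expires_at) < current
```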

file_id: optional string

The ID of the file the data was extracted from

Format: uuid

job_id: optional string

The id of the job that the extraction run belongs to

Format: uuid

updated_at: optional string

Update datetime

Format: date-time
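Putting the ExtractRun fields together, the following Python sketch models a run as a dataclass. Which fields are required is inferred from the reference above (those without the optional qualifier). The class is a hypothetical client-side model rather than an official SDK type, and it drops the deprecated file object and any unrecognized keys.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional

# Fields listed above without the "optional" qualifier.
REQUIRED_FIELDS = ("id", "config", "data_schema", "extraction_agent_id", "from_ui", "project_id", "status")


@dataclass
class ExtractRun:
    """Hypothetical client-side model of an ExtractRun response."""
    id: str
    config: dict
    data_schema: dict
    extraction_agent_id: str
    from_ui: bool
    project_id: str
    status: str
    created_at: Optional[str] = None
    data: Any = None
    error: Optional[str] = None
    extraction_metadata: Optional[dict] = None
    file_id: Optional[str] = None
    job_id: Optional[str] = None
    updated_at: Optional[str] = None

    @classmethod
    def from_json(cls, payload: dict) -> "ExtractRun":
        missing = [name for name in REQUIRED_FIELDS if name not in payload]
        if missing:
            raise ValueError(f"ExtractRun payload missing required fields: {missing}")
        known = {f.name for f in fields(cls)}
        # Ignore unknown keys (including the deprecated `file`) so new server fields don't break parsing.
        return cls(**{key: value for key, value in payload.items() if key in known})
```

Timestamps are kept as strings here; they can be parsed on demand where a datetime is needed.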