Extract

Models

ExtractConfiguration object { data_schema, cite_sources, confidence_scores, 8 more }

Extract configuration combining parse and extract settings.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema defining the fields to extract. Validate with the /schema/validate endpoint first.

One of the following:

map[unknown]

array of unknown

string

number

boolean

cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:

"per_doc"

"per_page"

"per_table_row"

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum: 1

parse_config_id: optional string

Saved parse configuration ID to control how the document is parsed before extraction

parse_tier: optional string

Parse tier to use before extraction. Defaults to the extract tier if not specified.

system_prompt: optional string

Custom system prompt to guide extraction behavior

target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
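
As a rough illustration of the target_pages format, the sketch below expands a spec into explicit 1-based page numbers on the client side. The hyphen range syntax ("5-7") and the helper name parse_target_pages are assumptions; the reference only says "page numbers or ranges", so check the accepted syntax before relying on this.

```python
def parse_target_pages(spec: str) -> list[int]:
    """Expand a spec such as "1,3,5-7" into sorted, de-duplicated 1-based pages.

    Assumption: ranges are written "start-end" and are inclusive.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page or range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_target_pages("1,3,5-7"))
```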

tier: optional "agentic" or "agentic_plus" or "cost_effective"

Extract tier: cost_effective (5 credits/page), agentic (15 credits/page), or agentic_plus (50 credits/page)

One of the following:

"agentic"

"agentic_plus"

"cost_effective"
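
The per-page rates quoted for tier make a quick cost estimate possible. This is only a sketch: estimate_credits is a hypothetical helper, and actual billing follows num_pages_billed ("effective pages"), which may differ from a raw page count.

```python
# Credits per page, taken from the tier description above.
CREDITS_PER_PAGE = {
    "cost_effective": 5,
    "agentic": 15,
    "agentic_plus": 50,
}


def estimate_credits(num_pages: int, tier: str) -> int:
    """Estimate extract credits for a page count at a given tier."""
    if tier not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown tier: {tier!r}")
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return num_pages * CREDITS_PER_PAGE[tier]


# A 12-page document on the agentic tier: 12 * 15 = 180 credits.
print(estimate_credits(12, "agentic"))
```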

version: optional string

Use ‘latest’ for the latest release of the selected tier, or a date string (YYYY-MM-DD format) to pin to the nearest release at or before that date. Job responses always report the concrete resolved version the job runs, which is fixed at job creation; saved configurations keep the value as provided.
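
The "nearest release at or before that date" rule can be pictured with a small client-side sketch. Resolution actually happens on the server; the release dates below are made up, resolve_version is a hypothetical name, and raising an error when the date precedes every release is an assumption.

```python
from bisect import bisect_right
from datetime import date


def resolve_version(requested: str, releases: list[date]) -> date:
    """Pick the release a 'latest' or YYYY-MM-DD version string would pin to."""
    if not releases:
        raise ValueError("no releases available")
    ordered = sorted(releases)
    if requested == "latest":
        return ordered[-1]
    pinned = date.fromisoformat(requested)
    # Index of the first release strictly after the pinned date.
    idx = bisect_right(ordered, pinned)
    if idx == 0:
        raise ValueError(f"no release at or before {requested}")
    return ordered[idx - 1]


# Hypothetical release history for one tier.
releases = [date(2025, 1, 10), date(2025, 3, 1), date(2025, 6, 15)]
print(resolve_version("2025-04-01", releases))  # resolves to the 2025-03-01 release
```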

ExtractJobMetadata object { field_metadata, parse_job_id, parse_tier }

Extraction metadata.

field_metadata: optional ExtractedFieldMetadata { document_metadata, page_metadata, row_metadata }

Metadata for extracted fields including document, page, and row level info.

document_metadata: optional map[map[unknown] or array of unknown or string or 2 more]

Per-field metadata keyed by field name from your schema. Scalar fields (e.g. vendor) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. items) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.

One of the following:

map[unknown]

array of unknown

string

number

boolean
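
To make the nesting of document_metadata concrete, here is a hedged sketch that flattens it into (field path, confidence) pairs. It assumes a FieldMetadataEntry is a mapping with a "confidence" key; that key name is inferred from the description and not confirmed here, and a schema field literally named "confidence" would confuse this walker.

```python
from typing import Any, Iterator


def iter_confidences(node: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (field_path, confidence) for every entry found in the metadata tree."""
    if isinstance(node, dict):
        conf = node.get("confidence")
        # Assumption: a scalar "confidence" value marks a FieldMetadataEntry leaf.
        if "confidence" in node and not isinstance(conf, (dict, list)):
            yield path, conf
            return
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            yield from iter_confidences(value, child)
    elif isinstance(node, list):
        # Array fields: one element per array position.
        for index, item in enumerate(node):
            yield from iter_confidences(item, f"{path}[{index}]")


# Illustrative metadata only; real entries may carry more keys.
document_metadata = {
    "vendor": {"confidence": 0.98, "citation": []},
    "items": [{"sku": {"confidence": 0.91}, "qty": {"confidence": 0.62}}],
    "address": {"city": {"confidence": 0.8}},
}
for field_path, confidence in iter_confidences(document_metadata):
    print(field_path, confidence)
```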

page_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-page metadata when extraction_target is per_page

One of the following:

map[unknown]

array of unknown

string

number

boolean

row_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-row metadata when extraction_target is per_table_row

One of the following:

map[unknown]

array of unknown

string

number

boolean

parse_job_id: optional string

Reference to the ParseJob ID used for parsing

parse_tier: optional string

Parse tier used for parsing the document

ExtractJobUsage object { num_pages_billed, num_pages_extracted }

Extraction usage metrics.

num_pages_billed: optional number

Number of effective pages billed

num_pages_extracted: optional number

Number of pages extracted

ExtractV2Job object { id, created_at, file_input, 9 more }

An extraction job.

id: string

Unique job identifier (job_id)

created_at: string

Creation timestamp

format: date-time

file_input: string

File ID or parse job ID that was extracted

project_id: string

Project this job belongs to

status: string

Current job status.

PENDING — queued, not yet started
RUNNING — actively processing
COMPLETED — finished successfully
FAILED — terminated with an error
CANCELLED — cancelled by user
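
Of the statuses listed above, COMPLETED, FAILED, and CANCELLED are terminal, which suggests a simple polling loop. The sketch below is client-agnostic: fetch_job is a hypothetical callable you supply (for example, a wrapper around the Get Extract Job endpoint) that returns the job as a dict, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "CANCELLED"})


def wait_for_job(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reaches a terminal status, then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after timeout")
        sleep(interval_seconds)


# Simulated job that finishes on the third poll.
_statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
job = wait_for_job(
    lambda job_id: {"id": job_id, "status": next(_statuses)},
    "job_123",
    sleep=lambda _seconds: None,  # skip real sleeping in this demo
)
print(job["status"])
```

When the returned status is FAILED, error_message carries the details.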

updated_at: string

Last update timestamp

format: date-time

configuration: optional ExtractConfiguration { data_schema, cite_sources, confidence_scores, 8 more }

Extract configuration combining parse and extract settings.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema defining the fields to extract. Validate with the /schema/validate endpoint first.

One of the following:

map[unknown]

array of unknown

string

number

boolean

cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:

"per_doc"

"per_page"

"per_table_row"

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum: 1

parse_config_id: optional string

Saved parse configuration ID to control how the document is parsed before extraction

parse_tier: optional string

Parse tier to use before extraction. Defaults to the extract tier if not specified.

system_prompt: optional string

Custom system prompt to guide extraction behavior

target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

tier: optional "agentic" or "agentic_plus" or "cost_effective"

Extract tier: cost_effective (5 credits/page), agentic (15 credits/page), or agentic_plus (50 credits/page)

One of the following:

"agentic"

"agentic_plus"

"cost_effective"

version: optional string

configuration_id: optional string

Saved extract configuration ID used for this job, if any

error_message: optional string

Error details when status is FAILED

extract_metadata: optional ExtractJobMetadata { field_metadata, parse_job_id, parse_tier }

Extraction metadata.

field_metadata: optional ExtractedFieldMetadata { document_metadata, page_metadata, row_metadata }

Metadata for extracted fields including document, page, and row level info.

document_metadata: optional map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean

page_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-page metadata when extraction_target is per_page

One of the following:

map[unknown]

array of unknown

string

number

boolean

row_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-row metadata when extraction_target is per_table_row

One of the following:

map[unknown]

array of unknown

string

number

boolean

parse_job_id: optional string

Reference to the ParseJob ID used for parsing

parse_tier: optional string

Parse tier used for parsing the document

extract_result: optional map[map[unknown] or array of unknown or string or 2 more] or array of map[map[unknown] or array of unknown or string or 2 more]

Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.

One of the following:

map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean

array of map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean
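
Because extract_result is a single object for per_doc but an array for per_page and per_table_row, downstream code often wants one uniform shape. A minimal sketch, with as_records as a hypothetical helper name:

```python
from typing import Any, Optional, Union

Record = dict[str, Any]


def as_records(extract_result: Optional[Union[Record, list[Record]]]) -> list[Record]:
    """Normalize extract_result to a list regardless of extraction_target."""
    if extract_result is None:  # field is optional, e.g. job not finished
        return []
    if isinstance(extract_result, list):
        return extract_result
    return [extract_result]


print(as_records({"vendor": "Acme"}))             # per_doc shape
print(as_records([{"page": 1}, {"page": 2}]))     # per_page / per_table_row shape
```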

metadata: optional object { usage }

Job-level metadata.

usage: optional ExtractJobUsage { num_pages_billed, num_pages_extracted }

Extraction usage metrics.

num_pages_billed: optional number

Number of effective pages billed

num_pages_extracted: optional number

Number of pages extracted

ExtractV2JobCreate object { file_input, configuration, configuration_id, webhook_configurations }

Request to create an extraction job. Provide configuration_id or inline configuration.

file_input: string

File ID or parse job ID to extract from

maxLength: 200
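
The constraints on this request (a file_input of at most 200 characters, plus a configuration_id or an inline configuration) can be checked before sending. The sketch below builds the JSON body only; build_create_request is a hypothetical helper, rejecting an empty file_input is an assumption, and whether the API accepts both configuration fields together is not stated here, so the helper passes through whatever is given.

```python
from typing import Any, Optional

MAX_FILE_INPUT_LENGTH = 200


def build_create_request(
    file_input: str,
    configuration: Optional[dict[str, Any]] = None,
    configuration_id: Optional[str] = None,
) -> dict[str, Any]:
    """Validate inputs and return a body shaped like ExtractV2JobCreate."""
    if not file_input or len(file_input) > MAX_FILE_INPUT_LENGTH:
        raise ValueError("file_input must be 1 to 200 characters")
    if configuration is None and configuration_id is None:
        raise ValueError("provide configuration_id or an inline configuration")
    # data_schema is the only non-optional field of ExtractConfiguration.
    if configuration is not None and "data_schema" not in configuration:
        raise ValueError("inline configuration requires data_schema")
    body: dict[str, Any] = {"file_input": file_input}
    if configuration is not None:
        body["configuration"] = configuration
    if configuration_id is not None:
        body["configuration_id"] = configuration_id
    return body


# Illustrative values; the file ID and schema are made up.
body = build_create_request(
    "file_abc123",
    configuration={
        "data_schema": {
            "type": "object",
            "properties": {"vendor": {"type": "string"}},
        },
        "tier": "agentic",
        "extraction_target": "per_doc",
    },
)
print(sorted(body))
```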

configuration: optional ExtractConfiguration { data_schema, cite_sources, confidence_scores, 8 more }

Extract configuration combining parse and extract settings.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema defining the fields to extract. Validate with the /schema/validate endpoint first.

One of the following:

map[unknown]

array of unknown

string

number

boolean

cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:

"per_doc"

"per_page"

"per_table_row"

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum: 1

parse_config_id: optional string

Saved parse configuration ID to control how the document is parsed before extraction

parse_tier: optional string

Parse tier to use before extraction. Defaults to the extract tier if not specified.

system_prompt: optional string

Custom system prompt to guide extraction behavior

target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

tier: optional "agentic" or "agentic_plus" or "cost_effective"

Extract tier: cost_effective (5 credits/page), agentic (15 credits/page), or agentic_plus (50 credits/page)

One of the following:

"agentic"

"agentic_plus"

"cost_effective"

version: optional string

configuration_id: optional string

Saved configuration ID

webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_output_format, 2 more }

Outbound webhook endpoints to notify on job status changes

webhook_events: optional array of "classify.cancelled" or "classify.error" or "classify.partial_success" or 25 more

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:

"classify.cancelled"

"classify.error"

"classify.partial_success"

"classify.pending"

"classify.running"

"classify.success"

"extract.cancelled"

"extract.error"

"extract.partial_success"

"extract.pending"

"extract.success"

"parse.cancelled"

"parse.error"

"parse.partial_success"

"parse.pending"

"parse.running"

"parse.success"

"sheets.cancelled"

"sheets.error"

"sheets.partial_success"

"sheets.pending"

"sheets.success"

"split.cancelled"

"split.error"

"split.pending"

"split.processing"

"split.success"

"unmapped_event"

webhook_headers: optional map[string]

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

webhook_output_format: optional string

Response format sent to the webhook: ‘string’ (default) or ‘json’

webhook_signing_secret: optional string

Shared signing secret used to sign webhook deliveries. When set, each request includes an HMAC-SHA256 signature of the request body in the ‘LC-Signature’ header, prefixed with ‘sha256=’. To verify that a delivery is authentic, recompute the HMAC over the raw request body with this secret and compare it with the header value.
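
A receiver-side check for the signature described above might look like the following sketch. It assumes the digest after the "sha256=" prefix is hex-encoded and that the secret is used as UTF-8 bytes; neither detail is confirmed in this reference, and verify_signature is a hypothetical helper name.

```python
import hashlib
import hmac

SIGNATURE_PREFIX = "sha256="


def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Return True if the LC-Signature header matches the raw request body."""
    if not header_value.startswith(SIGNATURE_PREFIX):
        return False
    received = header_value[len(SIGNATURE_PREFIX):]
    # Assumption: the signature is the hex digest of HMAC-SHA256(secret, body).
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Round trip with a locally computed signature.
body = b'{"event": "extract.success"}'
secret = "s3cret"
header = SIGNATURE_PREFIX + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, header, secret))
```

Verify against the raw bytes as received; re-serializing parsed JSON can change whitespace or key order and break the match.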

webhook_url: optional string

URL to receive webhook POST notifications

ExtractV2JobQueryResponse object { items, next_page_token, total_size }

Paginated list of extraction jobs.

items: array of ExtractV2Job { id, created_at, file_input, 9 more }

The list of items.

id: string

Unique job identifier (job_id)

created_at: string

Creation timestamp

format: date-time

file_input: string

File ID or parse job ID that was extracted

project_id: string

Project this job belongs to

status: string

Current job status.

PENDING — queued, not yet started
RUNNING — actively processing
COMPLETED — finished successfully
FAILED — terminated with an error
CANCELLED — cancelled by user

updated_at: string

Last update timestamp

format: date-time

configuration: optional ExtractConfiguration { data_schema, cite_sources, confidence_scores, 8 more }

Extract configuration combining parse and extract settings.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema defining the fields to extract. Validate with the /schema/validate endpoint first.

One of the following:

map[unknown]

array of unknown

string

number

boolean

cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:

"per_doc"

"per_page"

"per_table_row"

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum: 1

parse_config_id: optional string

Saved parse configuration ID to control how the document is parsed before extraction

parse_tier: optional string

Parse tier to use before extraction. Defaults to the extract tier if not specified.

system_prompt: optional string

Custom system prompt to guide extraction behavior

target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

tier: optional "agentic" or "agentic_plus" or "cost_effective"

Extract tier: cost_effective (5 credits/page), agentic (15 credits/page), or agentic_plus (50 credits/page)

One of the following:

"agentic"

"agentic_plus"

"cost_effective"

version: optional string

configuration_id: optional string

Saved extract configuration ID used for this job, if any

error_message: optional string

Error details when status is FAILED

extract_metadata: optional ExtractJobMetadata { field_metadata, parse_job_id, parse_tier }

Extraction metadata.

field_metadata: optional ExtractedFieldMetadata { document_metadata, page_metadata, row_metadata }

Metadata for extracted fields including document, page, and row level info.

document_metadata: optional map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean

page_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-page metadata when extraction_target is per_page

One of the following:

map[unknown]

array of unknown

string

number

boolean

row_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-row metadata when extraction_target is per_table_row

One of the following:

map[unknown]

array of unknown

string

number

boolean

parse_job_id: optional string

Reference to the ParseJob ID used for parsing

parse_tier: optional string

Parse tier used for parsing the document

extract_result: optional map[map[unknown] or array of unknown or string or 2 more] or array of map[map[unknown] or array of unknown or string or 2 more]

Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.

One of the following:

map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean

array of map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean

metadata: optional object { usage }

Job-level metadata.

usage: optional ExtractJobUsage { num_pages_billed, num_pages_extracted }

Extraction usage metrics.

num_pages_billed: optional number

Number of effective pages billed

num_pages_extracted: optional number

Number of pages extracted

next_page_token: optional string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

total_size: optional number

The total number of items available. This is only populated when specifically requested. The value may be an estimate and can be used for display purposes only.

ExtractV2SchemaGenerateRequest object { data_schema, file_id, name, prompt }

Request schema for generating an extraction schema.

data_schema: optional map[map[unknown] or array of unknown or string or 2 more]

Optional schema to validate, refine, or extend

One of the following:

map[unknown]

array of unknown

string

number

boolean

file_id: optional string

Optional file ID to analyze for schema generation

name: optional string

Name for the generated configuration (auto-generated if omitted)

maxLength: 255

prompt: optional string

Natural language description of the data structure to extract

ExtractV2SchemaValidateRequest object { data_schema }

Request schema for validating an extraction schema.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema to validate for use with extract jobs

One of the following:

map[unknown]

array of unknown

string

number

boolean

ExtractV2SchemaValidateResponse object { data_schema }

Response schema for schema validation.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

Validated JSON Schema, ready for use in extract jobs

One of the following:

map[unknown]

array of unknown

string

number

boolean

ExtractedFieldMetadata object { document_metadata, page_metadata, row_metadata }

Metadata for extracted fields including document, page, and row level info.

document_metadata: optional map[map[unknown] or array of unknown or string or 2 more]

One of the following:

map[unknown]

array of unknown

string

number

boolean

page_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-page metadata when extraction_target is per_page

One of the following:

map[unknown]

array of unknown

string

number

boolean

row_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-row metadata when extraction_target is per_table_row

One of the following:

map[unknown]

array of unknown

string

number

boolean

ExtractDeleteResponse = unknown

Extract

Create Extract Job

List Extract Jobs

Get Extract Job

Delete Extract Job

Validate Extraction Schema

Generate Extraction Schema
