Framework Docs

Extract

Create Extract Job
client.Extract.New(ctx, params) (*ExtractV2Job, error)
POST /api/v2/extract
List Extract Jobs
client.Extract.List(ctx, query) (*PaginatedCursor[ExtractV2Job], error)
GET /api/v2/extract
Get Extract Job
client.Extract.Get(ctx, jobID, query) (*ExtractV2Job, error)
GET /api/v2/extract/{job_id}
Delete Extract Job
client.Extract.Delete(ctx, jobID, body) (*ExtractDeleteResponse, error)
DELETE /api/v2/extract/{job_id}
Validate Extraction Schema
client.Extract.ValidateSchema(ctx, body) (*ExtractV2SchemaValidateResponse, error)
POST /api/v2/extract/schema/validation
Generate Extraction Schema
client.Extract.GenerateSchema(ctx, params) (*ConfigurationCreate, error)
POST /api/v2/extract/schema/generate
Models
type ExtractConfiguration struct{…}

Extract configuration combining parse and extract settings.

DataSchema map[string, ExtractConfigurationDataSchemaUnion]

JSON Schema defining the fields to extract. Validate it first with the schema validation endpoint (POST /api/v2/extract/schema/validation).

One of the following:
type ExtractConfigurationDataSchemaMap map[string, any]
type ExtractConfigurationDataSchemaArray []any
string
float64
bool
CiteSources bool (optional)

Include citations in results

ConfidenceScores bool (optional)

Include confidence scores in results

ExtractVersion string (optional)

Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.

ExtractionTarget ExtractConfigurationExtractionTarget (optional)

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:
const ExtractConfigurationExtractionTargetPerDoc ExtractConfigurationExtractionTarget = "per_doc"
const ExtractConfigurationExtractionTargetPerPage ExtractConfigurationExtractionTarget = "per_page"
const ExtractConfigurationExtractionTargetPerTableRow ExtractConfigurationExtractionTarget = "per_table_row"
MaxPages int64 (optional)

Maximum number of pages to process. Omit for no limit.

minimum: 1
ParseConfigID string (optional)

Saved parse configuration ID to control how the document is parsed before extraction

ParseTier string (optional)

Parse tier to use before extraction. Defaults to the extract tier if not specified.

SystemPrompt string (optional)

Custom system prompt to guide extraction behavior

TargetPages string (optional)

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
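As a rough illustration of the TargetPages format, the sketch below expands a page spec into individual 1-based page numbers. The helper name parseTargetPages and the hyphenated range form (e.g. "5-7") are assumptions; the reference only states that entries are comma-separated page numbers or ranges.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTargetPages expands a spec such as "1,3,5-7" into individual pages.
// Assumption: ranges are written "start-end" and are inclusive; pages are 1-based.
func parseTargetPages(spec string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty entry in %q", spec)
		}
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, fmt.Errorf("invalid page %q: %w", lo, err)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(strings.TrimSpace(hi))
			if err != nil {
				return nil, fmt.Errorf("invalid page %q: %w", hi, err)
			}
		}
		if start < 1 || end < start {
			return nil, fmt.Errorf("invalid range %q", part)
		}
		for p := start; p <= end; p++ {
			pages = append(pages, p)
		}
	}
	return pages, nil
}

func main() {
	pages, err := parseTargetPages("1,3,5-7")
	fmt.Println(pages, err) // prints: [1 3 5 6 7] <nil>
}
```

A client-side check like this can catch malformed specs before a job is submitted; the server remains the authority on what it accepts.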

Tier ExtractConfigurationTier (optional)

Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)

One of the following:
const ExtractConfigurationTierCostEffective ExtractConfigurationTier = "cost_effective"
const ExtractConfigurationTierAgentic ExtractConfigurationTier = "agentic"
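The per-page rates quoted for Tier make a quick cost estimate straightforward. The sketch below is a hypothetical helper (estimateCredits is not part of the SDK) that multiplies the page count by the documented rate; it assumes no other charges apply, which the reference does not confirm.

```go
package main

import "fmt"

// creditsPerPage mirrors the per-page rates quoted in the Tier description.
var creditsPerPage = map[string]int64{
	"cost_effective": 5,
	"agentic":        15,
}

// estimateCredits is a hypothetical helper: credits = rate * pages.
// It ignores any charges other than the documented extract rate.
func estimateCredits(tier string, pages int64) (int64, error) {
	rate, ok := creditsPerPage[tier]
	if !ok {
		return 0, fmt.Errorf("unknown tier %q", tier)
	}
	if pages < 0 {
		return 0, fmt.Errorf("negative page count %d", pages)
	}
	return rate * pages, nil
}

func main() {
	for _, tier := range []string{"cost_effective", "agentic"} {
		credits, err := estimateCredits(tier, 40)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// 40 pages: 200 credits for cost_effective, 600 for agentic.
		fmt.Printf("%s: %d credits\n", tier, credits)
	}
}
```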
type ExtractJobMetadata struct{…}

Extraction metadata.

FieldMetadata ExtractedFieldMetadata (optional)

Metadata for extracted fields including document, page, and row level info.

DocumentMetadata map[string, ExtractedFieldMetadataDocumentMetadataUnion] (optional)

Per-field metadata keyed by field name from your schema. Scalar fields (e.g. vendor) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. items) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.

One of the following:
type ExtractedFieldMetadataDocumentMetadataMap map[string, any]
type ExtractedFieldMetadataDocumentMetadataArray []any
string
float64
bool
PageMetadata []map[string, ExtractedFieldMetadataPageMetadataUnion] (optional)

Per-page metadata when extraction_target is per_page

One of the following:
type ExtractedFieldMetadataPageMetadataMap map[string, any]
type ExtractedFieldMetadataPageMetadataArray []any
string
float64
bool
RowMetadata []map[string, ExtractedFieldMetadataRowMetadataUnion] (optional)

Per-row metadata when extraction_target is per_table_row

One of the following:
type ExtractedFieldMetadataRowMetadataMap map[string, any]
type ExtractedFieldMetadataRowMetadataArray []any
string
float64
bool
ParseJobID string (optional)

Reference to the ParseJob ID used for parsing

ParseTier string (optional)

Parse tier used for parsing the document

type ExtractJobUsage struct{…}

Extraction usage metrics.

NumDocumentTokens int64 (optional)

Number of document tokens

NumOutputTokens int64 (optional)

Number of output tokens

NumPagesExtracted int64 (optional)

Number of pages extracted

type ExtractV2Job struct{…}

An extraction job.

ID string

Unique job identifier (job_id)

CreatedAt Time

Creation timestamp

format: date-time
FileInput string

File ID or parse job ID that was extracted

ProjectID string

Project this job belongs to

Status string

Current job status.

  • PENDING — queued, not yet started
  • RUNNING — actively processing
  • COMPLETED — finished successfully
  • FAILED — terminated with an error
  • CANCELLED — cancelled by user
UpdatedAt Time

Last update timestamp

format: date-time
Configuration ExtractConfiguration (optional)

Extract configuration combining parse and extract settings.

DataSchema map[string, ExtractConfigurationDataSchemaUnion]

JSON Schema defining the fields to extract. Validate it first with the schema validation endpoint (POST /api/v2/extract/schema/validation).

One of the following:
type ExtractConfigurationDataSchemaMap map[string, any]
type ExtractConfigurationDataSchemaArray []any
string
float64
bool
CiteSources bool (optional)

Include citations in results

ConfidenceScores bool (optional)

Include confidence scores in results

ExtractVersion string (optional)

Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.

ExtractionTarget ExtractConfigurationExtractionTarget (optional)

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:
const ExtractConfigurationExtractionTargetPerDoc ExtractConfigurationExtractionTarget = "per_doc"
const ExtractConfigurationExtractionTargetPerPage ExtractConfigurationExtractionTarget = "per_page"
const ExtractConfigurationExtractionTargetPerTableRow ExtractConfigurationExtractionTarget = "per_table_row"
MaxPages int64 (optional)

Maximum number of pages to process. Omit for no limit.

minimum: 1
ParseConfigID string (optional)

Saved parse configuration ID to control how the document is parsed before extraction

ParseTier string (optional)

Parse tier to use before extraction. Defaults to the extract tier if not specified.

SystemPrompt string (optional)

Custom system prompt to guide extraction behavior

TargetPages string (optional)

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

Tier ExtractConfigurationTier (optional)

Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)

One of the following:
const ExtractConfigurationTierCostEffective ExtractConfigurationTier = "cost_effective"
const ExtractConfigurationTierAgentic ExtractConfigurationTier = "agentic"
ConfigurationID string (optional)

Saved extract configuration ID used for this job, if any

ErrorMessage string (optional)

Error details when status is FAILED

ExtractMetadata ExtractJobMetadata (optional)

Extraction metadata.

FieldMetadata ExtractedFieldMetadata (optional)

Metadata for extracted fields including document, page, and row level info.

DocumentMetadata map[string, ExtractedFieldMetadataDocumentMetadataUnion] (optional)

Per-field metadata keyed by field name from your schema. Scalar fields (e.g. vendor) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. items) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.

One of the following:
type ExtractedFieldMetadataDocumentMetadataMap map[string, any]
type ExtractedFieldMetadataDocumentMetadataArray []any
string
float64
bool
PageMetadata []map[string, ExtractedFieldMetadataPageMetadataUnion] (optional)

Per-page metadata when extraction_target is per_page

One of the following:
type ExtractedFieldMetadataPageMetadataMap map[string, any]
type ExtractedFieldMetadataPageMetadataArray []any
string
float64
bool
RowMetadata []map[string, ExtractedFieldMetadataRowMetadataUnion] (optional)

Per-row metadata when extraction_target is per_table_row

One of the following:
type ExtractedFieldMetadataRowMetadataMap map[string, any]
type ExtractedFieldMetadataRowMetadataArray []any
string
float64
bool
ParseJobID string (optional)

Reference to the ParseJob ID used for parsing

ParseTier string (optional)

Parse tier used for parsing the document

ExtractResult ExtractV2JobExtractResultUnion (optional)

Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.

One of the following:
type ExtractV2JobExtractResultMap map[string, ExtractV2JobExtractResultMapItemUnion]
One of the following:
type ExtractV2JobExtractResultMapItemMap map[string, any]
type ExtractV2JobExtractResultMapItemArray []any
string
float64
bool
type ExtractV2JobExtractResultArray []map[string, ExtractV2JobExtractResultArrayItemUnion]
One of the following:
type ExtractV2JobExtractResultArrayItemMap map[string, any]
type ExtractV2JobExtractResultArrayItemArray []any
string
float64
bool
Metadata ExtractV2JobMetadata (optional)

Job-level metadata.

Usage ExtractJobUsage (optional)

Extraction usage metrics.

NumDocumentTokens int64 (optional)

Number of document tokens

NumOutputTokens int64 (optional)

Number of output tokens

NumPagesExtracted int64 (optional)

Number of pages extracted

type ExtractV2JobCreate struct{…}

Request to create an extraction job. Provide configuration_id or inline configuration.

FileInput string

File ID or parse job ID to extract from

maxLength: 200
Configuration ExtractConfiguration (optional)

Extract configuration combining parse and extract settings.

DataSchema map[string, ExtractConfigurationDataSchemaUnion]

JSON Schema defining the fields to extract. Validate it first with the schema validation endpoint (POST /api/v2/extract/schema/validation).

One of the following:
type ExtractConfigurationDataSchemaMap map[string, any]
type ExtractConfigurationDataSchemaArray []any
string
float64
bool
CiteSources bool (optional)

Include citations in results

ConfidenceScores bool (optional)

Include confidence scores in results

ExtractVersion string (optional)

Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.

ExtractionTarget ExtractConfigurationExtractionTarget (optional)

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:
const ExtractConfigurationExtractionTargetPerDoc ExtractConfigurationExtractionTarget = "per_doc"
const ExtractConfigurationExtractionTargetPerPage ExtractConfigurationExtractionTarget = "per_page"
const ExtractConfigurationExtractionTargetPerTableRow ExtractConfigurationExtractionTarget = "per_table_row"
MaxPages int64 (optional)

Maximum number of pages to process. Omit for no limit.

minimum: 1
ParseConfigID string (optional)

Saved parse configuration ID to control how the document is parsed before extraction

ParseTier string (optional)

Parse tier to use before extraction. Defaults to the extract tier if not specified.

SystemPrompt string (optional)

Custom system prompt to guide extraction behavior

TargetPages string (optional)

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

Tier ExtractConfigurationTier (optional)

Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)

One of the following:
const ExtractConfigurationTierCostEffective ExtractConfigurationTier = "cost_effective"
const ExtractConfigurationTierAgentic ExtractConfigurationTier = "agentic"
ConfigurationID string (optional)

Saved configuration ID
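The constraints on a create request (a required FileInput of at most 200 characters, plus either a saved configuration ID or an inline configuration) can be checked before sending. The sketch below uses local stand-in types rather than the SDK's own; the JSON names file_input, configuration, and tier are assumptions, while data_schema, extraction_target, and configuration_id appear in this reference. Whether the API accepts both a saved ID and an inline configuration together is not stated, so the check does not reject that case.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"unicode/utf8"
)

// extractConfiguration is a trimmed, hypothetical mirror of ExtractConfiguration.
type extractConfiguration struct {
	DataSchema       map[string]any `json:"data_schema"`
	ExtractionTarget string         `json:"extraction_target,omitempty"`
	Tier             string         `json:"tier,omitempty"` // JSON name assumed
}

// createRequest is a hypothetical mirror of ExtractV2JobCreate.
type createRequest struct {
	FileInput       string                `json:"file_input"`              // JSON name assumed
	Configuration   *extractConfiguration `json:"configuration,omitempty"` // JSON name assumed
	ConfigurationID string                `json:"configuration_id,omitempty"`
}

// validate applies the documented constraints on the client side.
func (r createRequest) validate() error {
	if r.FileInput == "" {
		return errors.New("file_input is required")
	}
	if utf8.RuneCountInString(r.FileInput) > 200 {
		return errors.New("file_input exceeds 200 characters")
	}
	if r.Configuration == nil && r.ConfigurationID == "" {
		return errors.New("provide configuration_id or an inline configuration")
	}
	return nil
}

func main() {
	req := createRequest{
		FileInput: "file-123", // placeholder ID
		Configuration: &extractConfiguration{
			DataSchema: map[string]any{
				"type": "object",
				"properties": map[string]any{
					"vendor": map[string]any{"type": "string"},
				},
			},
			ExtractionTarget: "per_doc",
			Tier:             "cost_effective",
		},
	}
	if err := req.validate(); err != nil {
		fmt.Println("invalid request:", err)
		return
	}
	body, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(body))
}
```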

WebhookConfigurations []ExtractV2JobCreateWebhookConfiguration (optional)

Outbound webhook endpoints to notify on job status changes

WebhookEvents []string (optional)

Events to subscribe to (e.g. 'parse.success', 'extract.error'). If null, all events are delivered.

One of the following:
const ExtractV2JobCreateWebhookConfigurationWebhookEventExtractPending ExtractV2JobCreateWebhookConfigurationWebhookEvent = "extract.pending"
const ExtractV2JobCreateWebhookConfigurationWebhookEventExtractSuccess ExtractV2JobCreateWebhookConfigurationWebhookEvent = "extract.success"
const ExtractV2JobCreateWebhookConfigurationWebhookEventExtractError ExtractV2JobCreateWebhookConfigurationWebhookEvent = "extract.error"
const ExtractV2JobCreateWebhookConfigurationWebhookEventExtractPartialSuccess ExtractV2JobCreateWebhookConfigurationWebhookEvent = "extract.partial_success"
const ExtractV2JobCreateWebhookConfigurationWebhookEventExtractCancelled ExtractV2JobCreateWebhookConfigurationWebhookEvent = "extract.cancelled"
const ExtractV2JobCreateWebhookConfigurationWebhookEventParsePending ExtractV2JobCreateWebhookConfigurationWebhookEvent = "parse.pending"
const ExtractV2JobCreateWebhookConfigurationWebhookEventParseRunning ExtractV2JobCreateWebhookConfigurationWebhookEvent = "parse.running"
const ExtractV2JobCreateWebhookConfigurationWebhookEventParseSuccess ExtractV2JobCreateWebhookConfigurationWebhookEvent = "parse.success"
const ExtractV2JobCreateWebhookConfigurationWebhookEventParseError ExtractV2JobCreateWebhookConfigurationWebhookEvent = "parse.error"
const ExtractV2JobCreateWebhookConfigurationWebhookEventParsePartialSuccess ExtractV2JobCreateWebhookConfigurationWebhookEvent = "parse.partial_success"
const ExtractV2JobCreateWebhookConfigurationWebhookEventParseCancelled ExtractV2JobCreateWebhookConfigurationWebhookEvent = "parse.cancelled"
const ExtractV2JobCreateWebhookConfigurationWebhookEventClassifyPending ExtractV2JobCreateWebhookConfigurationWebhookEvent = "classify.pending"
const ExtractV2JobCreateWebhookConfigurationWebhookEventClassifySuccess ExtractV2JobCreateWebhookConfigurationWebhookEvent = "classify.success"
const ExtractV2JobCreateWebhookConfigurationWebhookEventClassifyError ExtractV2JobCreateWebhookConfigurationWebhookEvent = "classify.error"
const ExtractV2JobCreateWebhookConfigurationWebhookEventClassifyPartialSuccess ExtractV2JobCreateWebhookConfigurationWebhookEvent = "classify.partial_success"
const ExtractV2JobCreateWebhookConfigurationWebhookEventClassifyCancelled ExtractV2JobCreateWebhookConfigurationWebhookEvent = "classify.cancelled"
const ExtractV2JobCreateWebhookConfigurationWebhookEventUnmappedEvent ExtractV2JobCreateWebhookConfigurationWebhookEvent = "unmapped_event"
WebhookHeaders map[string, string] (optional)

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

WebhookOutputFormat string (optional)

Response format sent to the webhook: 'string' (default) or 'json'

WebhookURL string (optional)

URL to receive webhook POST notifications
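The WebhookEvents rule (a null list means every event is delivered) can be modelled as a small filter. The sketch below is only an illustration of that rule; shouldDeliver is a hypothetical name, and treating an empty, non-null list as "no events" is an assumption the reference does not address.

```go
package main

import "fmt"

// shouldDeliver models the documented subscription rule:
// a nil list means all events are delivered; otherwise only listed events are.
// Assumption: an empty, non-nil list matches nothing.
func shouldDeliver(subscribed []string, event string) bool {
	if subscribed == nil {
		return true
	}
	for _, e := range subscribed {
		if e == event {
			return true
		}
	}
	return false
}

func main() {
	only := []string{"extract.success", "extract.error"}
	fmt.Println(shouldDeliver(nil, "parse.success"))  // prints: true
	fmt.Println(shouldDeliver(only, "extract.error")) // prints: true
	fmt.Println(shouldDeliver(only, "parse.success")) // prints: false
}
```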

type ExtractV2JobQueryResponse struct{…}

Paginated list of extraction jobs.

Items []ExtractV2Job

The list of items.

ID string

Unique job identifier (job_id)

CreatedAt Time

Creation timestamp

format: date-time
FileInput string

File ID or parse job ID that was extracted

ProjectID string

Project this job belongs to

Status string

Current job status.

  • PENDING — queued, not yet started
  • RUNNING — actively processing
  • COMPLETED — finished successfully
  • FAILED — terminated with an error
  • CANCELLED — cancelled by user
UpdatedAt Time

Last update timestamp

format: date-time
Configuration ExtractConfiguration (optional)

Extract configuration combining parse and extract settings.

DataSchema map[string, ExtractConfigurationDataSchemaUnion]

JSON Schema defining the fields to extract. Validate it first with the schema validation endpoint (POST /api/v2/extract/schema/validation).

One of the following:
type ExtractConfigurationDataSchemaMap map[string, any]
type ExtractConfigurationDataSchemaArray []any
string
float64
bool
CiteSources bool (optional)

Include citations in results

ConfidenceScores bool (optional)

Include confidence scores in results

ExtractVersion string (optional)

Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.

ExtractionTarget ExtractConfigurationExtractionTarget (optional)

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

One of the following:
const ExtractConfigurationExtractionTargetPerDoc ExtractConfigurationExtractionTarget = "per_doc"
const ExtractConfigurationExtractionTargetPerPage ExtractConfigurationExtractionTarget = "per_page"
const ExtractConfigurationExtractionTargetPerTableRow ExtractConfigurationExtractionTarget = "per_table_row"
MaxPages int64 (optional)

Maximum number of pages to process. Omit for no limit.

minimum: 1
ParseConfigID string (optional)

Saved parse configuration ID to control how the document is parsed before extraction

ParseTier string (optional)

Parse tier to use before extraction. Defaults to the extract tier if not specified.

SystemPrompt string (optional)

Custom system prompt to guide extraction behavior

TargetPages string (optional)

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

Tier ExtractConfigurationTier (optional)

Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)

One of the following:
const ExtractConfigurationTierCostEffective ExtractConfigurationTier = "cost_effective"
const ExtractConfigurationTierAgentic ExtractConfigurationTier = "agentic"
ConfigurationID string (optional)

Saved extract configuration ID used for this job, if any

ErrorMessage string (optional)

Error details when status is FAILED

ExtractMetadata ExtractJobMetadata (optional)

Extraction metadata.

FieldMetadata ExtractedFieldMetadata (optional)

Metadata for extracted fields including document, page, and row level info.

DocumentMetadata map[string, ExtractedFieldMetadataDocumentMetadataUnion] (optional)

Per-field metadata keyed by field name from your schema. Scalar fields (e.g. vendor) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. items) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.

One of the following:
type ExtractedFieldMetadataDocumentMetadataMap map[string, any]
type ExtractedFieldMetadataDocumentMetadataArray []any
string
float64
bool
PageMetadata []map[string, ExtractedFieldMetadataPageMetadataUnion] (optional)

Per-page metadata when extraction_target is per_page

One of the following:
type ExtractedFieldMetadataPageMetadataMap map[string, any]
type ExtractedFieldMetadataPageMetadataArray []any
string
float64
bool
RowMetadata []map[string, ExtractedFieldMetadataRowMetadataUnion] (optional)

Per-row metadata when extraction_target is per_table_row

One of the following:
type ExtractedFieldMetadataRowMetadataMap map[string, any]
type ExtractedFieldMetadataRowMetadataArray []any
string
float64
bool
ParseJobID string (optional)

Reference to the ParseJob ID used for parsing

ParseTier string (optional)

Parse tier used for parsing the document

ExtractResult ExtractV2JobExtractResultUnion (optional)

Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.

One of the following:
type ExtractV2JobExtractResultMap map[string, ExtractV2JobExtractResultMapItemUnion]
One of the following:
type ExtractV2JobExtractResultMapItemMap map[string, any]
type ExtractV2JobExtractResultMapItemArray []any
string
float64
bool
type ExtractV2JobExtractResultArray []map[string, ExtractV2JobExtractResultArrayItemUnion]
One of the following:
type ExtractV2JobExtractResultArrayItemMap map[string, any]
type ExtractV2JobExtractResultArrayItemArray []any
string
float64
bool
Metadata ExtractV2JobMetadata (optional)

Job-level metadata.

Usage ExtractJobUsage (optional)

Extraction usage metrics.

NumDocumentTokens int64 (optional)

Number of document tokens

NumOutputTokens int64 (optional)

Number of output tokens

NumPagesExtracted int64 (optional)

Number of pages extracted

NextPageToken string (optional)

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

TotalSize int64 (optional)

The total number of items available. This is only populated when specifically requested. The value may be an estimate and can be used for display purposes only.
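The NextPageToken contract (send it back as page_token; an omitted token means no further pages) leads to a simple loop. The sketch below shows the pattern against a stubbed fetch function; the page type, listAll, and the sample tokens are hypothetical and stand in for real list calls.

```go
package main

import "fmt"

// page is a minimal stand-in for a paginated response.
type page struct {
	Items         []string
	NextPageToken string
}

// listAll calls fetch with successive page tokens until none is returned.
func listAll(fetch func(pageToken string) (page, error)) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.Items...)
		if p.NextPageToken == "" {
			return all, nil // no token means this was the last page
		}
		token = p.NextPageToken
	}
}

func main() {
	// Stubbed pages keyed by the token that requests them.
	pages := map[string]page{
		"":   {Items: []string{"job-1", "job-2"}, NextPageToken: "t1"},
		"t1": {Items: []string{"job-3"}},
	}
	fetch := func(token string) (page, error) { return pages[token], nil }

	ids, err := listAll(fetch)
	fmt.Println(ids, err) // prints: [job-1 job-2 job-3] <nil>
}
```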

type ExtractV2SchemaGenerateRequest struct{…}

Request schema for generating an extraction schema.

DataSchema map[string, ExtractV2SchemaGenerateRequestDataSchemaUnion] (optional)

Optional schema to validate, refine, or extend

One of the following:
map[string, any]
[]any
string
float64
bool
FileID string (optional)

Optional file ID to analyze for schema generation

Name string (optional)

Name for the generated configuration (auto-generated if omitted)

maxLength: 255
Prompt string (optional)

Natural language description of the data structure to extract

type ExtractV2SchemaValidateRequest struct{…}

Request schema for validating an extraction schema.

DataSchema map[string, ExtractV2SchemaValidateRequestDataSchemaUnion]

JSON Schema to validate for use with extract jobs

One of the following:
map[string, any]
[]any
string
float64
bool
type ExtractV2SchemaValidateResponse struct{…}

Response schema for schema validation.

DataSchema map[string, ExtractV2SchemaValidateResponseDataSchemaUnion]

Validated JSON Schema, ready for use in extract jobs

One of the following:
type ExtractV2SchemaValidateResponseDataSchemaMap map[string, any]
type ExtractV2SchemaValidateResponseDataSchemaArray []any
string
float64
bool
type ExtractedFieldMetadata struct{…}

Metadata for extracted fields including document, page, and row level info.

DocumentMetadata map[string, ExtractedFieldMetadataDocumentMetadataUnion] (optional)

Per-field metadata keyed by field name from your schema. Scalar fields (e.g. vendor) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. items) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.

One of the following:
type ExtractedFieldMetadataDocumentMetadataMap map[string, any]
type ExtractedFieldMetadataDocumentMetadataArray []any
string
float64
bool
PageMetadata []map[string, ExtractedFieldMetadataPageMetadataUnion] (optional)

Per-page metadata when extraction_target is per_page

One of the following:
type ExtractedFieldMetadataPageMetadataMap map[string, any]
type ExtractedFieldMetadataPageMetadataArray []any
string
float64
bool
RowMetadata []map[string, ExtractedFieldMetadataRowMetadataUnion] (optional)

Per-row metadata when extraction_target is per_table_row

One of the following:
type ExtractedFieldMetadataRowMetadataMap map[string, any]
type ExtractedFieldMetadataRowMetadataArray []any
string
float64
bool