# Extract
## Create Extract Job
`client.Extract.New(ctx, params) (*ExtractV2Job, error)`
**post** `/api/v2/extract`
Create an extraction job.
Extracts structured data from a document using either a saved
configuration or an inline JSON Schema.
### Input
Provide exactly one of:
- `configuration_id` — reference a saved extraction config
- `configuration` — inline configuration with a `data_schema`
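The "exactly one of" rule can be checked on the client before a request is sent. The sketch below is a minimal illustration using only the standard library; the `createRequest` struct and its `validate` method are hypothetical helpers, not SDK types, and the JSON field names are taken from this section.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// createRequest mirrors the JSON body fields named in this section.
// It is a hypothetical helper for illustration, not an SDK type.
type createRequest struct {
	FileInput       string         `json:"file_input"`
	ConfigurationID string         `json:"configuration_id,omitempty"`
	Configuration   map[string]any `json:"configuration,omitempty"`
}

// validate enforces the "exactly one of" rule before any request is sent.
func (r createRequest) validate() error {
	hasID := r.ConfigurationID != ""
	hasInline := r.Configuration != nil
	if hasID == hasInline {
		return errors.New("provide exactly one of configuration_id or configuration")
	}
	return nil
}

func main() {
	req := createRequest{
		FileInput: "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
		Configuration: map[string]any{
			"data_schema": map[string]any{"type": "object"},
		},
	}
	if err := req.validate(); err != nil {
		panic(err)
	}
	body, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Comparing the two booleans rejects both the "neither" and the "both" case with a single condition.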
### Document input
Set `file_input` to a file ID (`dfl-...`) or a
completed parse job ID (`pjb-...`).
The job runs asynchronously. Poll `GET /api/v2/extract/{job_id}` or
register a webhook to monitor completion.
### Parameters
- `params ExtractNewParams`
- `ExtractV2JobCreate param.Field[ExtractV2JobCreate]`
Body param: Request to create an extraction job. Provide exactly one of `configuration_id` or an inline `configuration`.
- `OrganizationID param.Field[string]`
Query param
- `ProjectID param.Field[string]`
Query param
### Returns
- `type ExtractV2Job struct{…}`
An extraction job.
- `ID string`
Unique job identifier (job_id)
- `CreatedAt Time`
Creation timestamp
- `FileInput string`
File ID or parse job ID that was extracted
- `ProjectID string`
Project this job belongs to
- `Status string`
Current job status.
- `PENDING` — queued, not yet started
- `RUNNING` — actively processing
- `COMPLETED` — finished successfully
- `FAILED` — terminated with an error
- `CANCELLED` — cancelled by user
- `UpdatedAt Time`
Last update timestamp
- `Configuration ExtractConfiguration`
Extract configuration combining parse and extract settings.
- `DataSchema map[string, ExtractConfigurationDataSchemaUnion]`
JSON Schema defining the fields to extract. Validate it with the `/api/v2/extract/schema/validation` endpoint first.
- `type ExtractConfigurationDataSchemaMap map[string, any]`
- `type ExtractConfigurationDataSchemaArray []any`
- `string`
- `float64`
- `bool`
- `CiteSources bool`
Include citations in results
- `ConfidenceScores bool`
Include confidence scores in results
- `ExtractVersion string`
Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.
- `ExtractionTarget ExtractConfigurationExtractionTarget`
Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row
- `const ExtractConfigurationExtractionTargetPerDoc ExtractConfigurationExtractionTarget = "per_doc"`
- `const ExtractConfigurationExtractionTargetPerPage ExtractConfigurationExtractionTarget = "per_page"`
- `const ExtractConfigurationExtractionTargetPerTableRow ExtractConfigurationExtractionTarget = "per_table_row"`
- `MaxPages int64`
Maximum number of pages to process. Omit for no limit.
- `ParseConfigID string`
Saved parse configuration ID to control how the document is parsed before extraction
- `ParseTier string`
Parse tier to use before extraction. Defaults to the extract tier if not specified.
- `SystemPrompt string`
Custom system prompt to guide extraction behavior
- `TargetPages string`
Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
- `Tier ExtractConfigurationTier`
Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)
- `const ExtractConfigurationTierCostEffective ExtractConfigurationTier = "cost_effective"`
- `const ExtractConfigurationTierAgentic ExtractConfigurationTier = "agentic"`
- `ConfigurationID string`
Saved extract configuration ID used for this job, if any
- `ErrorMessage string`
Error details when status is FAILED
- `ExtractMetadata ExtractJobMetadata`
Extraction metadata.
- `FieldMetadata ExtractedFieldMetadata`
Metadata for extracted fields including document, page, and row level info.
- `DocumentMetadata map[string, ExtractedFieldMetadataDocumentMetadataUnion]`
Per-field metadata keyed by field name from your schema. Scalar fields (e.g. `vendor`) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. `items`) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.
- `type ExtractedFieldMetadataDocumentMetadataMap map[string, any]`
- `type ExtractedFieldMetadataDocumentMetadataArray []any`
- `string`
- `float64`
- `bool`
- `PageMetadata []map[string, ExtractedFieldMetadataPageMetadataUnion]`
Per-page metadata when extraction_target is per_page
- `type ExtractedFieldMetadataPageMetadataMap map[string, any]`
- `type ExtractedFieldMetadataPageMetadataArray []any`
- `string`
- `float64`
- `bool`
- `RowMetadata []map[string, ExtractedFieldMetadataRowMetadataUnion]`
Per-row metadata when extraction_target is per_table_row
- `type ExtractedFieldMetadataRowMetadataMap map[string, any]`
- `type ExtractedFieldMetadataRowMetadataArray []any`
- `string`
- `float64`
- `bool`
- `ParseJobID string`
Reference to the ParseJob ID used for parsing
- `ParseTier string`
Parse tier used for parsing the document
- `ExtractResult ExtractV2JobExtractResultUnion`
Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.
- `type ExtractV2JobExtractResultMap map[string, ExtractV2JobExtractResultMapItemUnion]`
- `type ExtractV2JobExtractResultMapItemMap map[string, any]`
- `type ExtractV2JobExtractResultMapItemArray []any`
- `string`
- `float64`
- `bool`
- `type ExtractV2JobExtractResultArray []map[string, ExtractV2JobExtractResultArrayItemUnion]`
- `type ExtractV2JobExtractResultArrayItemMap map[string, any]`
- `type ExtractV2JobExtractResultArrayItemArray []any`
- `string`
- `float64`
- `bool`
- `Metadata ExtractV2JobMetadata`
Job-level metadata.
- `Usage ExtractJobUsage`
Extraction usage metrics.
- `NumDocumentTokens int64`
Number of document tokens
- `NumOutputTokens int64`
Number of output tokens
- `NumPagesExtracted int64`
Number of pages extracted
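Because `extract_result` is a single object for `per_doc` and an array for `per_page` / `per_table_row`, callers may find it convenient to normalize both shapes into one slice. The sketch below works on the raw JSON with the standard library only; `decodeExtractResult` is a hypothetical helper and does not use the SDK's union types. The sample values are made up.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// decodeExtractResult normalizes raw extract_result JSON into a slice:
// one element for per_doc, one element per page or row otherwise.
func decodeExtractResult(raw []byte) ([]map[string]any, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return nil, errors.New("empty extract_result")
	}
	switch trimmed[0] {
	case '{':
		var obj map[string]any
		if err := json.Unmarshal(trimmed, &obj); err != nil {
			return nil, err
		}
		return []map[string]any{obj}, nil
	case '[':
		var arr []map[string]any
		if err := json.Unmarshal(trimmed, &arr); err != nil {
			return nil, err
		}
		return arr, nil
	default:
		return nil, fmt.Errorf("unexpected extract_result JSON starting with %q", trimmed[0])
	}
}

func main() {
	// Illustrative payloads; the field name "vendor" is only an example.
	perDoc, _ := decodeExtractResult([]byte(`{"vendor": "Acme"}`))
	perPage, _ := decodeExtractResult([]byte(`[{"vendor": "Acme"}, {"vendor": "Globex"}]`))
	fmt.Println(len(perDoc), len(perPage))
}
```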
### Example
```go
package main

import (
	"context"
	"fmt"

	"github.com/stainless-sdks/llamacloud-prod-go"
	"github.com/stainless-sdks/llamacloud-prod-go/option"
)

func main() {
	client := llamacloudprod.NewClient(
		option.WithAPIKey("My API Key"),
	)
	// FileInput accepts a file ID (dfl-...) or a completed parse job ID (pjb-...).
	extractV2Job, err := client.Extract.New(context.TODO(), llamacloudprod.ExtractNewParams{
		ExtractV2JobCreate: llamacloudprod.ExtractV2JobCreateParam{
			FileInput: "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
		},
	})
	if err != nil {
		panic(err.Error())
	}
	// The job runs asynchronously; keep the ID to poll for completion.
	fmt.Printf("%+v\n", extractV2Job.ID)
}
```
#### Response
```json
{
"id": "ext-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"created_at": "2019-12-27T18:11:19.117Z",
"file_input": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"status": "COMPLETED",
"updated_at": "2019-12-27T18:11:19.117Z",
"configuration": {
"data_schema": {
"foo": {
"foo": "bar"
}
},
"cite_sources": true,
"confidence_scores": true,
"extract_version": "latest",
"extraction_target": "per_doc",
"max_pages": 10,
"parse_config_id": "cfg-11111111-2222-3333-4444-555555555555",
"parse_tier": "fast",
"system_prompt": "Extract all monetary values in USD. If a currency is not specified, assume USD.",
"target_pages": "1,3,5-7",
"tier": "cost_effective"
},
"configuration_id": "cfg-11111111-2222-3333-4444-555555555555",
"error_message": "error_message",
"extract_metadata": {
"field_metadata": {
"document_metadata": {
"items": [
{
"amount": {
"citation": [
{
"matching_text": "$10.00",
"page": 1
}
],
"confidence": 1
},
"description": {
"citation": [
{
"matching_text": "$10/month",
"page": 1
}
],
"confidence": 0.998
}
}
],
"total": {
"citation": "bar",
"confidence": "bar"
},
"vendor": {
"citation": "bar",
"confidence": "bar",
"extraction_confidence": "bar",
"parsing_confidence": "bar"
}
},
"page_metadata": [
{
"foo": {
"foo": "bar"
}
}
],
"row_metadata": [
{
"foo": {
"foo": "bar"
}
}
]
},
"parse_job_id": "parse_job_id",
"parse_tier": "parse_tier"
},
"extract_result": {
"foo": {
"foo": "bar"
}
},
"metadata": {
"usage": {
"num_document_tokens": 0,
"num_output_tokens": 0,
"num_pages_extracted": 0
}
}
}
```
## List Extract Jobs
`client.Extract.List(ctx, query) (*PaginatedCursor[ExtractV2Job], error)`
**get** `/api/v2/extract`
List extraction jobs with optional filtering and pagination.
Filter by `configuration_id`, `status`, `file_input`,
or creation date range. Results are returned newest-first.
Use `expand=configuration` to include the full configuration used,
and `expand=extract_metadata` for per-field metadata.
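Walking through every page of results might be sketched as follows. The `page` struct and the `fetch` callback are hypothetical stand-ins for the SDK's list response and a call such as `client.Extract.List` with `PageToken` set. Treating an empty `next_page_token` as the end of the listing is an assumption.

```go
package main

import "fmt"

// page is a hypothetical, trimmed-down view of the list response:
// only the job IDs and the next_page_token field are kept.
type page struct {
	IDs           []string
	NextPageToken string
}

// listAll follows next_page_token until it is empty and collects every ID.
// fetch is a stand-in for an SDK list call that takes a page token.
func listAll(fetch func(pageToken string) (page, error)) ([]string, error) {
	var all []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return nil, err
		}
		all = append(all, p.IDs...)
		if p.NextPageToken == "" {
			// Assumption: an empty token marks the last page.
			return all, nil
		}
		token = p.NextPageToken
	}
}

func main() {
	// Two fake pages keyed by the token that requests them.
	pages := map[string]page{
		"":   {IDs: []string{"ext-1", "ext-2"}, NextPageToken: "t1"},
		"t1": {IDs: []string{"ext-3"}},
	}
	ids, err := listAll(func(tok string) (page, error) { return pages[tok], nil })
	fmt.Println(ids, err)
}
```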
### Parameters
- `query ExtractListParams`
- `ConfigurationID param.Field[string]`
Filter by configuration ID
- `CreatedAtOnOrAfter param.Field[Time]`
Include items created at or after this timestamp (inclusive)
- `CreatedAtOnOrBefore param.Field[Time]`
Include items created at or before this timestamp (inclusive)
- `DocumentInputType param.Field[string]`
Filter by document input type (file_id or parse_job_id)
- `DocumentInputValue param.Field[string]`
Deprecated: use file_input instead
- `Expand param.Field[[]string]`
Additional fields to include: configuration, extract_metadata
- `FileInput param.Field[string]`
Filter by file input value
- `JobIDs param.Field[[]string]`
Filter by specific job IDs
- `OrganizationID param.Field[string]`
- `PageSize param.Field[int64]`
Number of items per page
- `PageToken param.Field[string]`
Token for pagination
- `ProjectID param.Field[string]`
- `Status param.Field[ExtractListParamsStatus]`
Filter by status
- `const ExtractListParamsStatusPending ExtractListParamsStatus = "PENDING"`
- `const ExtractListParamsStatusThrottled ExtractListParamsStatus = "THROTTLED"`
- `const ExtractListParamsStatusRunning ExtractListParamsStatus = "RUNNING"`
- `const ExtractListParamsStatusCompleted ExtractListParamsStatus = "COMPLETED"`
- `const ExtractListParamsStatusFailed ExtractListParamsStatus = "FAILED"`
- `const ExtractListParamsStatusCancelled ExtractListParamsStatus = "CANCELLED"`
### Returns
- `type ExtractV2Job struct{…}`
An extraction job.
- `ID string`
Unique job identifier (job_id)
- `CreatedAt Time`
Creation timestamp
- `FileInput string`
File ID or parse job ID that was extracted
- `ProjectID string`
Project this job belongs to
- `Status string`
Current job status.
- `PENDING` — queued, not yet started
- `RUNNING` — actively processing
- `COMPLETED` — finished successfully
- `FAILED` — terminated with an error
- `CANCELLED` — cancelled by user
- `UpdatedAt Time`
Last update timestamp
- `Configuration ExtractConfiguration`
Extract configuration combining parse and extract settings.
- `DataSchema map[string, ExtractConfigurationDataSchemaUnion]`
JSON Schema defining the fields to extract. Validate it with the `/api/v2/extract/schema/validation` endpoint first.
- `type ExtractConfigurationDataSchemaMap map[string, any]`
- `type ExtractConfigurationDataSchemaArray []any`
- `string`
- `float64`
- `bool`
- `CiteSources bool`
Include citations in results
- `ConfidenceScores bool`
Include confidence scores in results
- `ExtractVersion string`
Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.
- `ExtractionTarget ExtractConfigurationExtractionTarget`
Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row
- `const ExtractConfigurationExtractionTargetPerDoc ExtractConfigurationExtractionTarget = "per_doc"`
- `const ExtractConfigurationExtractionTargetPerPage ExtractConfigurationExtractionTarget = "per_page"`
- `const ExtractConfigurationExtractionTargetPerTableRow ExtractConfigurationExtractionTarget = "per_table_row"`
- `MaxPages int64`
Maximum number of pages to process. Omit for no limit.
- `ParseConfigID string`
Saved parse configuration ID to control how the document is parsed before extraction
- `ParseTier string`
Parse tier to use before extraction. Defaults to the extract tier if not specified.
- `SystemPrompt string`
Custom system prompt to guide extraction behavior
- `TargetPages string`
Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
- `Tier ExtractConfigurationTier`
Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)
- `const ExtractConfigurationTierCostEffective ExtractConfigurationTier = "cost_effective"`
- `const ExtractConfigurationTierAgentic ExtractConfigurationTier = "agentic"`
- `ConfigurationID string`
Saved extract configuration ID used for this job, if any
- `ErrorMessage string`
Error details when status is FAILED
- `ExtractMetadata ExtractJobMetadata`
Extraction metadata.
- `FieldMetadata ExtractedFieldMetadata`
Metadata for extracted fields including document, page, and row level info.
- `DocumentMetadata map[string, ExtractedFieldMetadataDocumentMetadataUnion]`
Per-field metadata keyed by field name from your schema. Scalar fields (e.g. `vendor`) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. `items`) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.
- `type ExtractedFieldMetadataDocumentMetadataMap map[string, any]`
- `type ExtractedFieldMetadataDocumentMetadataArray []any`
- `string`
- `float64`
- `bool`
- `PageMetadata []map[string, ExtractedFieldMetadataPageMetadataUnion]`
Per-page metadata when extraction_target is per_page
- `type ExtractedFieldMetadataPageMetadataMap map[string, any]`
- `type ExtractedFieldMetadataPageMetadataArray []any`
- `string`
- `float64`
- `bool`
- `RowMetadata []map[string, ExtractedFieldMetadataRowMetadataUnion]`
Per-row metadata when extraction_target is per_table_row
- `type ExtractedFieldMetadataRowMetadataMap map[string, any]`
- `type ExtractedFieldMetadataRowMetadataArray []any`
- `string`
- `float64`
- `bool`
- `ParseJobID string`
Reference to the ParseJob ID used for parsing
- `ParseTier string`
Parse tier used for parsing the document
- `ExtractResult ExtractV2JobExtractResultUnion`
Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.
- `type ExtractV2JobExtractResultMap map[string, ExtractV2JobExtractResultMapItemUnion]`
- `type ExtractV2JobExtractResultMapItemMap map[string, any]`
- `type ExtractV2JobExtractResultMapItemArray []any`
- `string`
- `float64`
- `bool`
- `type ExtractV2JobExtractResultArray []map[string, ExtractV2JobExtractResultArrayItemUnion]`
- `type ExtractV2JobExtractResultArrayItemMap map[string, any]`
- `type ExtractV2JobExtractResultArrayItemArray []any`
- `string`
- `float64`
- `bool`
- `Metadata ExtractV2JobMetadata`
Job-level metadata.
- `Usage ExtractJobUsage`
Extraction usage metrics.
- `NumDocumentTokens int64`
Number of document tokens
- `NumOutputTokens int64`
Number of output tokens
- `NumPagesExtracted int64`
Number of pages extracted
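The per-page rates quoted for `Tier` allow a rough cost estimate from a page count such as `NumPagesExtracted`. The sketch below assumes the charge is simply pages multiplied by the tier rate, which the reference does not confirm; `estimateCredits` is a hypothetical helper.

```go
package main

import "fmt"

// creditsPerPage holds the per-page rates quoted for the Tier field.
var creditsPerPage = map[string]int64{
	"cost_effective": 5,
	"agentic":        15,
}

// estimateCredits multiplies the page count by the tier's per-page rate.
// Assumption: no other charges (for example, parsing) are included.
func estimateCredits(tier string, pages int64) (int64, error) {
	rate, ok := creditsPerPage[tier]
	if !ok {
		return 0, fmt.Errorf("unknown tier %q", tier)
	}
	if pages < 0 {
		return 0, fmt.Errorf("negative page count %d", pages)
	}
	return rate * pages, nil
}

func main() {
	for _, tier := range []string{"cost_effective", "agentic"} {
		credits, err := estimateCredits(tier, 10)
		fmt.Println(tier, credits, err)
	}
}
```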
### Example
```go
package main

import (
	"context"
	"fmt"

	"github.com/stainless-sdks/llamacloud-prod-go"
	"github.com/stainless-sdks/llamacloud-prod-go/option"
)

func main() {
	client := llamacloudprod.NewClient(
		option.WithAPIKey("My API Key"),
	)
	// Empty params apply no filters; results are returned newest-first.
	page, err := client.Extract.List(context.TODO(), llamacloudprod.ExtractListParams{})
	if err != nil {
		panic(err.Error())
	}
	fmt.Printf("%+v\n", page)
}
```
#### Response
```json
{
"items": [
{
"id": "ext-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"created_at": "2019-12-27T18:11:19.117Z",
"file_input": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"status": "COMPLETED",
"updated_at": "2019-12-27T18:11:19.117Z",
"configuration": {
"data_schema": {
"foo": {
"foo": "bar"
}
},
"cite_sources": true,
"confidence_scores": true,
"extract_version": "latest",
"extraction_target": "per_doc",
"max_pages": 10,
"parse_config_id": "cfg-11111111-2222-3333-4444-555555555555",
"parse_tier": "fast",
"system_prompt": "Extract all monetary values in USD. If a currency is not specified, assume USD.",
"target_pages": "1,3,5-7",
"tier": "cost_effective"
},
"configuration_id": "cfg-11111111-2222-3333-4444-555555555555",
"error_message": "error_message",
"extract_metadata": {
"field_metadata": {
"document_metadata": {
"items": [
{
"amount": {
"citation": [
{
"matching_text": "$10.00",
"page": 1
}
],
"confidence": 1
},
"description": {
"citation": [
{
"matching_text": "$10/month",
"page": 1
}
],
"confidence": 0.998
}
}
],
"total": {
"citation": "bar",
"confidence": "bar"
},
"vendor": {
"citation": "bar",
"confidence": "bar",
"extraction_confidence": "bar",
"parsing_confidence": "bar"
}
},
"page_metadata": [
{
"foo": {
"foo": "bar"
}
}
],
"row_metadata": [
{
"foo": {
"foo": "bar"
}
}
]
},
"parse_job_id": "parse_job_id",
"parse_tier": "parse_tier"
},
"extract_result": {
"foo": {
"foo": "bar"
}
},
"metadata": {
"usage": {
"num_document_tokens": 0,
"num_output_tokens": 0,
"num_pages_extracted": 0
}
}
}
],
"next_page_token": "next_page_token",
"total_size": 0
}
```
## Get Extract Job
`client.Extract.Get(ctx, jobID, query) (*ExtractV2Job, error)`
**get** `/api/v2/extract/{job_id}`
Get a single extraction job by ID.
Returns the job status and results when complete.
Use `expand=configuration` to include the full configuration used,
and `expand=extract_metadata` for per-field metadata.
### Parameters
- `jobID string`
- `query ExtractGetParams`
- `Expand param.Field[[]string]`
Additional fields to include: configuration, extract_metadata
- `OrganizationID param.Field[string]`
- `ProjectID param.Field[string]`
### Returns
- `type ExtractV2Job struct{…}`
An extraction job.
- `ID string`
Unique job identifier (job_id)
- `CreatedAt Time`
Creation timestamp
- `FileInput string`
File ID or parse job ID that was extracted
- `ProjectID string`
Project this job belongs to
- `Status string`
Current job status.
- `PENDING` — queued, not yet started
- `RUNNING` — actively processing
- `COMPLETED` — finished successfully
- `FAILED` — terminated with an error
- `CANCELLED` — cancelled by user
- `UpdatedAt Time`
Last update timestamp
- `Configuration ExtractConfiguration`
Extract configuration combining parse and extract settings.
- `DataSchema map[string, ExtractConfigurationDataSchemaUnion]`
JSON Schema defining the fields to extract. Validate it with the `/api/v2/extract/schema/validation` endpoint first.
- `type ExtractConfigurationDataSchemaMap map[string, any]`
- `type ExtractConfigurationDataSchemaArray []any`
- `string`
- `float64`
- `bool`
- `CiteSources bool`
Include citations in results
- `ConfidenceScores bool`
Include confidence scores in results
- `ExtractVersion string`
Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.
- `ExtractionTarget ExtractConfigurationExtractionTarget`
Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row
- `const ExtractConfigurationExtractionTargetPerDoc ExtractConfigurationExtractionTarget = "per_doc"`
- `const ExtractConfigurationExtractionTargetPerPage ExtractConfigurationExtractionTarget = "per_page"`
- `const ExtractConfigurationExtractionTargetPerTableRow ExtractConfigurationExtractionTarget = "per_table_row"`
- `MaxPages int64`
Maximum number of pages to process. Omit for no limit.
- `ParseConfigID string`
Saved parse configuration ID to control how the document is parsed before extraction
- `ParseTier string`
Parse tier to use before extraction. Defaults to the extract tier if not specified.
- `SystemPrompt string`
Custom system prompt to guide extraction behavior
- `TargetPages string`
Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
- `Tier ExtractConfigurationTier`
Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)
- `const ExtractConfigurationTierCostEffective ExtractConfigurationTier = "cost_effective"`
- `const ExtractConfigurationTierAgentic ExtractConfigurationTier = "agentic"`
- `ConfigurationID string`
Saved extract configuration ID used for this job, if any
- `ErrorMessage string`
Error details when status is FAILED
- `ExtractMetadata ExtractJobMetadata`
Extraction metadata.
- `FieldMetadata ExtractedFieldMetadata`
Metadata for extracted fields including document, page, and row level info.
- `DocumentMetadata map[string, ExtractedFieldMetadataDocumentMetadataUnion]`
Per-field metadata keyed by field name from your schema. Scalar fields (e.g. `vendor`) map to a FieldMetadataEntry with citation and confidence. Array fields (e.g. `items`) map to a list where each element contains per-sub-field FieldMetadataEntry objects, indexed by array position. Nested objects contain sub-field entries recursively.
- `type ExtractedFieldMetadataDocumentMetadataMap map[string, any]`
- `type ExtractedFieldMetadataDocumentMetadataArray []any`
- `string`
- `float64`
- `bool`
- `PageMetadata []map[string, ExtractedFieldMetadataPageMetadataUnion]`
Per-page metadata when extraction_target is per_page
- `type ExtractedFieldMetadataPageMetadataMap map[string, any]`
- `type ExtractedFieldMetadataPageMetadataArray []any`
- `string`
- `float64`
- `bool`
- `RowMetadata []map[string, ExtractedFieldMetadataRowMetadataUnion]`
Per-row metadata when extraction_target is per_table_row
- `type ExtractedFieldMetadataRowMetadataMap map[string, any]`
- `type ExtractedFieldMetadataRowMetadataArray []any`
- `string`
- `float64`
- `bool`
- `ParseJobID string`
Reference to the ParseJob ID used for parsing
- `ParseTier string`
Parse tier used for parsing the document
- `ExtractResult ExtractV2JobExtractResultUnion`
Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.
- `type ExtractV2JobExtractResultMap map[string, ExtractV2JobExtractResultMapItemUnion]`
- `type ExtractV2JobExtractResultMapItemMap map[string, any]`
- `type ExtractV2JobExtractResultMapItemArray []any`
- `string`
- `float64`
- `bool`
- `type ExtractV2JobExtractResultArray []map[string, ExtractV2JobExtractResultArrayItemUnion]`
- `type ExtractV2JobExtractResultArrayItemMap map[string, any]`
- `type ExtractV2JobExtractResultArrayItemArray []any`
- `string`
- `float64`
- `bool`
- `Metadata ExtractV2JobMetadata`
Job-level metadata.
- `Usage ExtractJobUsage`
Extraction usage metrics.
- `NumDocumentTokens int64`
Number of document tokens
- `NumOutputTokens int64`
Number of output tokens
- `NumPagesExtracted int64`
Number of pages extracted
### Example
```go
package main

import (
	"context"
	"fmt"

	"github.com/stainless-sdks/llamacloud-prod-go"
	"github.com/stainless-sdks/llamacloud-prod-go/option"
)

func main() {
	client := llamacloudprod.NewClient(
		option.WithAPIKey("My API Key"),
	)
	// Replace "job_id" with the ID returned when the job was created.
	extractV2Job, err := client.Extract.Get(
		context.TODO(),
		"job_id",
		llamacloudprod.ExtractGetParams{},
	)
	if err != nil {
		panic(err.Error())
	}
	fmt.Printf("%+v\n", extractV2Job.ID)
}
```
#### Response
```json
{
"id": "ext-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"created_at": "2019-12-27T18:11:19.117Z",
"file_input": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"status": "COMPLETED",
"updated_at": "2019-12-27T18:11:19.117Z",
"configuration": {
"data_schema": {
"foo": {
"foo": "bar"
}
},
"cite_sources": true,
"confidence_scores": true,
"extract_version": "latest",
"extraction_target": "per_doc",
"max_pages": 10,
"parse_config_id": "cfg-11111111-2222-3333-4444-555555555555",
"parse_tier": "fast",
"system_prompt": "Extract all monetary values in USD. If a currency is not specified, assume USD.",
"target_pages": "1,3,5-7",
"tier": "cost_effective"
},
"configuration_id": "cfg-11111111-2222-3333-4444-555555555555",
"error_message": "error_message",
"extract_metadata": {
"field_metadata": {
"document_metadata": {
"items": [
{
"amount": {
"citation": [
{
"matching_text": "$10.00",
"page": 1
}
],
"confidence": 1
},
"description": {
"citation": [
{
"matching_text": "$10/month",
"page": 1
}
],
"confidence": 0.998
}
}
],
"total": {
"citation": "bar",
"confidence": "bar"
},
"vendor": {
"citation": "bar",
"confidence": "bar",
"extraction_confidence": "bar",
"parsing_confidence": "bar"
}
},
"page_metadata": [
{
"foo": {
"foo": "bar"
}
}
],
"row_metadata": [
{
"foo": {
"foo": "bar"
}
}
]
},
"parse_job_id": "parse_job_id",
"parse_tier": "parse_tier"
},
"extract_result": {
"foo": {
"foo": "bar"
}
},
"metadata": {
"usage": {
"num_document_tokens": 0,
"num_output_tokens": 0,
"num_pages_extracted": 0
}
}
}
```
## Delete Extract Job
`client.Extract.Delete(ctx, jobID, body) (*ExtractDeleteResponse, error)`
**delete** `/api/v2/extract/{job_id}`
Delete an extraction job and its results.
### Parameters
- `jobID string`
- `body ExtractDeleteParams`
- `OrganizationID param.Field[string]`
- `ProjectID param.Field[string]`
### Returns
- `type ExtractDeleteResponse interface{…}`
### Example
```go
package main

import (
	"context"
	"fmt"

	"github.com/stainless-sdks/llamacloud-prod-go"
	"github.com/stainless-sdks/llamacloud-prod-go/option"
)

func main() {
	client := llamacloudprod.NewClient(
		option.WithAPIKey("My API Key"),
	)
	// Deleting a job also deletes its results.
	extract, err := client.Extract.Delete(
		context.TODO(),
		"job_id",
		llamacloudprod.ExtractDeleteParams{},
	)
	if err != nil {
		panic(err.Error())
	}
	fmt.Printf("%+v\n", extract)
}
```
#### Response
```json
{}
```
## Validate Extraction Schema
`client.Extract.ValidateSchema(ctx, body) (*ExtractV2SchemaValidateResponse, error)`
**post** `/api/v2/extract/schema/validation`
Validate a JSON schema for extraction.
### Parameters
- `body ExtractValidateSchemaParams`
- `ExtractV2SchemaValidateRequest param.Field[ExtractV2SchemaValidateRequest]`
Request schema for validating an extraction schema.
### Returns
- `type ExtractV2SchemaValidateResponse struct{…}`
Response schema for schema validation.
- `DataSchema map[string, ExtractV2SchemaValidateResponseDataSchemaUnion]`
Validated JSON Schema, ready for use in extract jobs
- `type ExtractV2SchemaValidateResponseDataSchemaMap map[string, any]`
- `type ExtractV2SchemaValidateResponseDataSchemaArray []any`
- `string`
- `float64`
- `bool`
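Before calling the validation endpoint, a few cheap local checks can catch obvious mistakes in a schema. The rules in this sketch (top-level `type` is `object`, `properties` is non-empty, every `required` name exists in `properties`) are assumptions chosen for illustration; the endpoint's own validation remains authoritative, and `precheckSchema` is a hypothetical helper.

```go
package main

import "fmt"

// precheckSchema runs a few local sanity checks on a data_schema map.
// It expects the JSON-decoded shape (map[string]any and []any values).
// These rules are illustrative assumptions, not the server's rules.
func precheckSchema(schema map[string]any) error {
	if t, _ := schema["type"].(string); t != "object" {
		return fmt.Errorf(`top-level "type" must be "object", got %v`, schema["type"])
	}
	props, ok := schema["properties"].(map[string]any)
	if !ok || len(props) == 0 {
		return fmt.Errorf(`"properties" must be a non-empty object`)
	}
	required, _ := schema["required"].([]any)
	for _, r := range required {
		name, ok := r.(string)
		if !ok {
			return fmt.Errorf(`"required" entries must be strings, got %v`, r)
		}
		if _, found := props[name]; !found {
			return fmt.Errorf("required field %q is not defined in properties", name)
		}
	}
	return nil
}

func main() {
	schema := map[string]any{
		"type": "object",
		"properties": map[string]any{
			"vendor_name":  map[string]any{"type": "string"},
			"total_amount": map[string]any{"type": "number"},
		},
		"required": []any{"vendor_name", "total_amount"},
	}
	fmt.Println(precheckSchema(schema))
}
```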
### Example
```go
package main

import (
	"context"
	"fmt"

	"github.com/stainless-sdks/llamacloud-prod-go"
	"github.com/stainless-sdks/llamacloud-prod-go/option"
)

func main() {
	client := llamacloudprod.NewClient(
		option.WithAPIKey("My API Key"),
	)
	// Each top-level schema key is wrapped in a union param (map, array, or string).
	// The "bar" values are placeholders for real property definitions.
	extractV2SchemaValidateResponse, err := client.Extract.ValidateSchema(context.TODO(), llamacloudprod.ExtractValidateSchemaParams{
		ExtractV2SchemaValidateRequest: llamacloudprod.ExtractV2SchemaValidateRequestParam{
			DataSchema: map[string]llamacloudprod.ExtractV2SchemaValidateRequestDataSchemaUnionParam{
				"properties": llamacloudprod.ExtractV2SchemaValidateRequestDataSchemaUnionParam{
					OfAnyMap: map[string]any{
						"vendor_name":    "bar",
						"invoice_number": "bar",
						"total_amount":   "bar",
						"line_items":     "bar",
					},
				},
				"required": llamacloudprod.ExtractV2SchemaValidateRequestDataSchemaUnionParam{
					OfAnyArray: []any{"vendor_name", "invoice_number", "total_amount"},
				},
				"type": llamacloudprod.ExtractV2SchemaValidateRequestDataSchemaUnionParam{
					OfString: llamacloudprod.String("object"),
				},
			},
		},
	})
	if err != nil {
		panic(err.Error())
	}
	fmt.Printf("%+v\n", extractV2SchemaValidateResponse.DataSchema)
}
```
#### Response
```json
{
"data_schema": {
"foo": {
"foo": "bar"
}
}
}
```
## Generate Extraction Schema
`client.Extract.GenerateSchema(ctx, params) (*ConfigurationCreate, error)`
**post** `/api/v2/extract/schema/generate`
Generate a JSON schema and return a product configuration request.
### Parameters
- `params ExtractGenerateSchemaParams`
- `ExtractV2SchemaGenerateRequest param.Field[ExtractV2SchemaGenerateRequest]`
Body param: Request schema for generating an extraction schema.
- `OrganizationID param.Field[string]`
Query param
- `ProjectID param.Field[string]`
Query param
### Returns
- `type ConfigurationCreate struct{…}`
Request body for creating a product configuration.
- `Name string`
Human-readable name for this configuration.
- `Parameters ConfigurationCreateParametersUnion`
Product-specific configuration parameters.
- `type SplitV1ParametersResp struct{…}`
Typed parameters for a *split v1* product configuration.
- `Categories []SplitCategory`
Categories to split documents into.
- `Name string`
Name of the category.
- `Description string`
Optional description of what content belongs in this category.
- `ProductType SplitV1`
Product type.
- `const SplitV1SplitV1 SplitV1 = "split_v1"`
- `SplittingStrategy SplitV1ParametersSplittingStrategyResp`
Strategy for splitting documents.
- `AllowUncategorized string`
Controls handling of pages that don't match any category. 'include': pages can be grouped as 'uncategorized' and included in results. 'forbid': all pages must be assigned to a defined category. 'omit': pages can be classified as 'uncategorized' but are excluded from results.
- `const SplitV1ParametersSplittingStrategyAllowUncategorizedInclude SplitV1ParametersSplittingStrategyAllowUncategorized = "include"`
- `const SplitV1ParametersSplittingStrategyAllowUncategorizedForbid SplitV1ParametersSplittingStrategyAllowUncategorized = "forbid"`
- `const SplitV1ParametersSplittingStrategyAllowUncategorizedOmit SplitV1ParametersSplittingStrategyAllowUncategorized = "omit"`
- `type ExtractV2ParametersResp struct{…}`
Typed parameters for an *extract v2* product configuration.
- `DataSchema map[string, ExtractV2ParametersDataSchemaUnionResp]`
JSON Schema defining the fields to extract. Validate it with the `/api/v2/extract/schema/validation` endpoint first.
- `type ExtractV2ParametersDataSchemaMap map[string, any]`
- `type ExtractV2ParametersDataSchemaArray []any`
- `string`
- `float64`
- `bool`
- `ProductType ExtractV2`
Product type.
- `const ExtractV2ExtractV2 ExtractV2 = "extract_v2"`
- `CiteSources bool`
Include citations in results
- `ConfidenceScores bool`
Include confidence scores in results
- `ExtractVersion string`
Extract algorithm version. Use 'latest' for the default pipeline or a date string (e.g. '2026-01-08') to pin to a specific release.
- `ExtractionTarget ExtractV2ParametersExtractionTarget`
Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row
- `const ExtractV2ParametersExtractionTargetPerDoc ExtractV2ParametersExtractionTarget = "per_doc"`
- `const ExtractV2ParametersExtractionTargetPerPage ExtractV2ParametersExtractionTarget = "per_page"`
- `const ExtractV2ParametersExtractionTargetPerTableRow ExtractV2ParametersExtractionTarget = "per_table_row"`
- `MaxPages int64`
Maximum number of pages to process. Omit for no limit.
- `ParseConfigID string`
Saved parse configuration ID to control how the document is parsed before extraction
- `ParseTier string`
Parse tier to use before extraction. Defaults to the extract tier if not specified.
- `SystemPrompt string`
Custom system prompt to guide extraction behavior
- `TargetPages string`
Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
- `Tier ExtractV2ParametersTier`
Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)
- `const ExtractV2ParametersTierCostEffective ExtractV2ParametersTier = "cost_effective"`
- `const ExtractV2ParametersTierAgentic ExtractV2ParametersTier = "agentic"`
- `type ClassifyV2ParametersResp struct{…}`
Typed parameters for a *classify v2* product configuration.
- `ProductType ClassifyV2`
Product type.
- `const ClassifyV2ClassifyV2 ClassifyV2 = "classify_v2"`
- `Rules []ClassifyV2ParametersRuleResp`
Classify rules to evaluate against the document (at least one required)
- `Description string`
Natural language criteria for matching this rule
- `Type string`
Document type to assign when rule matches
- `Mode ClassifyV2ParametersMode`
Classify execution mode
- `const ClassifyV2ParametersModeFast ClassifyV2ParametersMode = "FAST"`
- `ParsingConfiguration ClassifyV2ParametersParsingConfigurationResp`
Parsing configuration for classify jobs.
- `Lang string`
ISO 639-1 language code for the document
- `MaxPages int64`
Maximum number of pages to process. Omit for no limit.
- `TargetPages string`
Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
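A `TargetPages` value can be checked locally before it is sent. The following is a minimal sketch, assuming ranges are written `start-end` and are inclusive; this reference does not spell out the exact grammar the service accepts, and `expandTargetPages` is a hypothetical helper rather than an SDK function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandTargetPages expands a spec such as "1,3-5" into individual 1-based
// page numbers. Assumption: ranges use "start-end" and include both ends.
func expandTargetPages(spec string) ([]int, error) {
	var pages []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty entry in %q", spec)
		}
		startStr, endStr, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(startStr))
		if err != nil {
			return nil, fmt.Errorf("invalid page %q: %w", part, err)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(strings.TrimSpace(endStr))
			if err != nil {
				return nil, fmt.Errorf("invalid range %q: %w", part, err)
			}
		}
		if start < 1 || end < start {
			return nil, fmt.Errorf("invalid page or range %q (pages are 1-based)", part)
		}
		for p := start; p <= end; p++ {
			pages = append(pages, p)
		}
	}
	return pages, nil
}

func main() {
	pages, err := expandTargetPages("1, 3-5, 8")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pages)
}
```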
- `type ParseV2ParametersResp struct{…}`
Configuration for LlamaParse v2 document parsing.
Includes tier selection, processing options, output formatting,
page targeting, and webhook delivery. Refer to the LlamaParse
documentation for details on each field.
- `ProductType ParseV2`
Product type.
- `const ParseV2ParseV2 ParseV2 = "parse_v2"`
- `Tier ParseV2ParametersTier`
Parsing tier: 'fast' (rule-based, cheapest), 'cost_effective' (balanced), 'agentic' (AI-powered with custom prompts), or 'agentic_plus' (premium AI with highest accuracy)
- `const ParseV2ParametersTierFast ParseV2ParametersTier = "fast"`
- `const ParseV2ParametersTierCostEffective ParseV2ParametersTier = "cost_effective"`
- `const ParseV2ParametersTierAgentic ParseV2ParametersTier = "agentic"`
- `const ParseV2ParametersTierAgenticPlus ParseV2ParametersTier = "agentic_plus"`
- `Version ParseV2ParametersVersion`
Tier version. Use 'latest' for the current stable version, or pin a specific version (e.g., '2026-01-08') for reproducible results
- `type ParseV2ParametersVersion string`
Tier version. Use 'latest' for the current stable version, or pin a specific version (e.g., '2026-01-08') for reproducible results
- `const ParseV2ParametersVersion2025_12_11 ParseV2ParametersVersion = "2025-12-11"`
- `const ParseV2ParametersVersion2025_12_18 ParseV2ParametersVersion = "2025-12-18"`
- `const ParseV2ParametersVersion2025_12_31 ParseV2ParametersVersion = "2025-12-31"`
- `const ParseV2ParametersVersion2026_01_08 ParseV2ParametersVersion = "2026-01-08"`
- `const ParseV2ParametersVersion2026_01_09 ParseV2ParametersVersion = "2026-01-09"`
- `const ParseV2ParametersVersion2026_01_16 ParseV2ParametersVersion = "2026-01-16"`
- `const ParseV2ParametersVersion2026_01_21 ParseV2ParametersVersion = "2026-01-21"`
- `const ParseV2ParametersVersion2026_01_22 ParseV2ParametersVersion = "2026-01-22"`
- `const ParseV2ParametersVersion2026_01_24 ParseV2ParametersVersion = "2026-01-24"`
- `const ParseV2ParametersVersion2026_01_29 ParseV2ParametersVersion = "2026-01-29"`
- `const ParseV2ParametersVersion2026_01_30 ParseV2ParametersVersion = "2026-01-30"`
- `const ParseV2ParametersVersion2026_02_03 ParseV2ParametersVersion = "2026-02-03"`
- `const ParseV2ParametersVersion2026_02_18 ParseV2ParametersVersion = "2026-02-18"`
- `const ParseV2ParametersVersion2026_02_20 ParseV2ParametersVersion = "2026-02-20"`
- `const ParseV2ParametersVersion2026_02_24 ParseV2ParametersVersion = "2026-02-24"`
- `const ParseV2ParametersVersion2026_02_26 ParseV2ParametersVersion = "2026-02-26"`
- `const ParseV2ParametersVersion2026_03_02 ParseV2ParametersVersion = "2026-03-02"`
- `const ParseV2ParametersVersion2026_03_03 ParseV2ParametersVersion = "2026-03-03"`
- `const ParseV2ParametersVersion2026_03_04 ParseV2ParametersVersion = "2026-03-04"`
- `const ParseV2ParametersVersion2026_03_05 ParseV2ParametersVersion = "2026-03-05"`
- `const ParseV2ParametersVersion2026_03_09 ParseV2ParametersVersion = "2026-03-09"`
- `const ParseV2ParametersVersion2026_03_10 ParseV2ParametersVersion = "2026-03-10"`
- `const ParseV2ParametersVersion2026_03_11 ParseV2ParametersVersion = "2026-03-11"`
- `const ParseV2ParametersVersion2026_03_12 ParseV2ParametersVersion = "2026-03-12"`
- `const ParseV2ParametersVersion2026_03_17 ParseV2ParametersVersion = "2026-03-17"`
- `const ParseV2ParametersVersion2026_03_19 ParseV2ParametersVersion = "2026-03-19"`
- `const ParseV2ParametersVersion2026_03_20 ParseV2ParametersVersion = "2026-03-20"`
- `const ParseV2ParametersVersion2026_03_22 ParseV2ParametersVersion = "2026-03-22"`
- `const ParseV2ParametersVersion2026_03_23 ParseV2ParametersVersion = "2026-03-23"`
- `const ParseV2ParametersVersion2026_03_24 ParseV2ParametersVersion = "2026-03-24"`
- `const ParseV2ParametersVersion2026_03_25 ParseV2ParametersVersion = "2026-03-25"`
- `const ParseV2ParametersVersion2026_03_26 ParseV2ParametersVersion = "2026-03-26"`
- `const ParseV2ParametersVersion2026_03_27 ParseV2ParametersVersion = "2026-03-27"`
- `const ParseV2ParametersVersion2026_03_30 ParseV2ParametersVersion = "2026-03-30"`
- `const ParseV2ParametersVersion2026_03_31 ParseV2ParametersVersion = "2026-03-31"`
- `const ParseV2ParametersVersion2026_04_02 ParseV2ParametersVersion = "2026-04-02"`
- `const ParseV2ParametersVersion2026_04_06 ParseV2ParametersVersion = "2026-04-06"`
- `const ParseV2ParametersVersion2026_04_09 ParseV2ParametersVersion = "2026-04-09"`
- `const ParseV2ParametersVersion2026_04_14 ParseV2ParametersVersion = "2026-04-14"`
- `const ParseV2ParametersVersion2026_04_19 ParseV2ParametersVersion = "2026-04-19"`
- `const ParseV2ParametersVersion2026_04_22 ParseV2ParametersVersion = "2026-04-22"`
- `const ParseV2ParametersVersion2026_04_27 ParseV2ParametersVersion = "2026-04-27"`
- `const ParseV2ParametersVersionLatest ParseV2ParametersVersion = "latest"`
- `string`
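Because `Version` accepts both the listed constants and arbitrary strings, it can help to check whether a configured value is actually pinned. The sketch below is a hypothetical helper, not an SDK function; it only inspects the shape of the string and does not confirm that the service recognizes the release.

```go
package main

import (
	"fmt"
	"time"
)

// versionKind classifies a version string by shape:
//   "latest" - the floating default
//   "dated"  - a YYYY-MM-DD string, like the dated constants listed above
//   "other"  - anything else
func versionKind(v string) string {
	if v == "latest" {
		return "latest"
	}
	if _, err := time.Parse("2006-01-02", v); err == nil {
		return "dated"
	}
	return "other"
}

func main() {
	for _, v := range []string{"latest", "2026-01-08", "not-a-date"} {
		fmt.Printf("%-12s -> %s\n", v, versionKind(v))
	}
}
```

A deployment that needs reproducible output could refuse to start when its configured version is not `"dated"`.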
- `AgenticOptions ParseV2ParametersAgenticOptionsResp`
Options for AI-powered parsing tiers (cost_effective, agentic, agentic_plus).
These options customize how the AI processes and interprets document content.
Only applicable when using non-fast tiers.
- `CustomPrompt string`
Custom instructions for the AI parser. Use to guide extraction behavior, specify output formatting, or provide domain-specific context. Example: 'Extract financial tables with currency symbols. Format dates as YYYY-MM-DD.'
- `ClientName string`
Identifier for the client/application making the request. Used for analytics and debugging. Example: 'my-app-v2'
- `CropBox ParseV2ParametersCropBoxResp`
Crop boundaries to process only a portion of each page. Values are ratios 0-1 from page edges
- `Bottom float64`
Bottom boundary as ratio (0-1). 0=top edge, 1=bottom edge. Content below this line is excluded
- `Left float64`
Left boundary as ratio (0-1). 0=left edge, 1=right edge. Content left of this line is excluded
- `Right float64`
Right boundary as ratio (0-1). 0=left edge, 1=right edge. Content right of this line is excluded
- `Top float64`
Top boundary as ratio (0-1). 0=top edge, 1=bottom edge. Content above this line is excluded
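The four crop ratios can be sanity-checked before a request is built. This sketch assumes the per-field reading above: `Top` and `Bottom` are measured from the top edge, `Left` and `Right` from the left edge, so a non-empty box needs `Top < Bottom` and `Left < Right`. The `cropBox` type and its methods are local stand-ins, not the SDK type.

```go
package main

import (
	"errors"
	"fmt"
)

// cropBox is a local stand-in for the four CropBox ratios (not the SDK type).
type cropBox struct {
	Top, Bottom, Left, Right float64
}

// validate checks that every ratio lies in [0, 1] and that the box is
// non-empty under the assumed coordinate convention.
func (c cropBox) validate() error {
	fields := []struct {
		name  string
		value float64
	}{{"top", c.Top}, {"bottom", c.Bottom}, {"left", c.Left}, {"right", c.Right}}
	for _, f := range fields {
		if f.value < 0 || f.value > 1 {
			return fmt.Errorf("%s ratio %v is outside [0, 1]", f.name, f.value)
		}
	}
	if c.Top >= c.Bottom || c.Left >= c.Right {
		return errors.New("crop box is empty: need top < bottom and left < right")
	}
	return nil
}

// keptFraction returns the share of the page area left after cropping.
func (c cropBox) keptFraction() float64 {
	return (c.Bottom - c.Top) * (c.Right - c.Left)
}

func main() {
	// Drop the top and bottom 10% of each page, keep the full width.
	box := cropBox{Top: 0.1, Bottom: 0.9, Left: 0, Right: 1}
	if err := box.validate(); err != nil {
		fmt.Println("invalid crop box:", err)
		return
	}
	fmt.Printf("kept %.0f%% of the page\n", box.keptFraction()*100)
}
```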
- `DisableCache bool`
Bypass result caching and force re-parsing. Use when document content may have changed or you need fresh results
- `FastOptions any`
Options for fast tier parsing (rule-based, no AI).
Fast tier uses deterministic algorithms for text extraction without AI enhancement.
It is the fastest and cheapest option, best suited for simple documents
with standard layouts. It currently has no configurable options but is reserved for
future expansion.
- `InputOptions ParseV2ParametersInputOptionsResp`
Format-specific options (HTML, PDF, spreadsheet, presentation). Applied based on detected input file type
- `HTML ParseV2ParametersInputOptionsHTMLResp`
HTML/web page parsing options (applies to .html, .htm files)
- `MakeAllElementsVisible bool`
Force all HTML elements to be visible by overriding CSS display/visibility properties. Useful for parsing pages with hidden content or collapsed sections
- `RemoveFixedElements bool`
Remove fixed-position elements (headers, footers, floating buttons) that appear on every page render
- `RemoveNavigationElements bool`
Remove navigation elements (nav bars, sidebars, menus) to focus on main content
- `Pdf any`
PDF-specific parsing options (applies to .pdf files)
- `Presentation ParseV2ParametersInputOptionsPresentationResp`
Presentation parsing options (applies to .pptx, .ppt, .odp, .key files)
- `OutOfBoundsContent bool`
Extract content positioned outside the visible slide area. Some presentations have hidden notes or content that extends beyond slide boundaries
- `SkipEmbeddedData bool`
Skip extraction of embedded chart data tables. When true, only the visual representation of charts is captured, not the underlying data
- `Spreadsheet ParseV2ParametersInputOptionsSpreadsheetResp`
Spreadsheet parsing options (applies to .xlsx, .xls, .csv, .ods files)
- `DetectSubTablesInSheets bool`
Detect and extract multiple tables within a single sheet. Useful when spreadsheets contain several data regions separated by blank rows/columns
- `ForceFormulaComputationInSheets bool`
Compute formula results instead of extracting formula text. Use when you need calculated values rather than formula definitions
- `IncludeHiddenSheets bool`
Parse hidden sheets in addition to visible ones. By default, hidden sheets are skipped
- `OutputOptions ParseV2ParametersOutputOptionsResp`
Output formatting options for markdown, text, and extracted images
- `ExtractPrintedPageNumber bool`
Extract the printed page number as it appears in the document (e.g., 'Page 5 of 10', 'v', 'A-3'). Useful for referencing original page numbers
- `ImagesToSave []string`
Image categories to extract and save. Options: 'screenshot' (full page renders useful for visual QA), 'embedded' (images found within the document), 'layout' (cropped regions from layout detection like figures and diagrams). Empty list saves no images
- `const ParseV2ParametersOutputOptionsImagesToSaveScreenshot ParseV2ParametersOutputOptionsImagesToSave = "screenshot"`
- `const ParseV2ParametersOutputOptionsImagesToSaveEmbedded ParseV2ParametersOutputOptionsImagesToSave = "embedded"`
- `const ParseV2ParametersOutputOptionsImagesToSaveLayout ParseV2ParametersOutputOptionsImagesToSave = "layout"`
- `Markdown ParseV2ParametersOutputOptionsMarkdownResp`
Markdown formatting options including table styles and link annotations
- `AnnotateLinks bool`
Add link annotations to markdown output in the format [text](url). When false, only the link text is included
- `InlineImages bool`
Embed images directly in markdown as base64 data URIs instead of extracting them as separate files. Useful for self-contained markdown output
- `Tables ParseV2ParametersOutputOptionsMarkdownTablesResp`
Table formatting options including markdown vs HTML format and merging behavior
- `CompactMarkdownTables bool`
Remove extra whitespace padding in markdown table cells for more compact output
- `MarkdownTableMultilineSeparator string`
Separator string for multiline cell content in markdown tables. Example: '\n' to preserve line breaks, ' ' to join with spaces
- `MergeContinuedTables bool`
Automatically merge tables that span multiple pages into a single table. The merged table appears on the first page with merged_from_pages metadata
- `OutputTablesAsMarkdown bool`
Output tables as markdown pipe tables instead of HTML
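To make the effect of the multiline separator option concrete, the sketch below joins the lines of a single table cell with a chosen separator. It only illustrates the described behavior; how the service actually splits and trims cell content is an assumption, and `joinCellLines` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// joinCellLines replaces the line breaks inside one table cell with sep,
// trimming surrounding whitespace from each line first (an assumption).
func joinCellLines(cell, sep string) string {
	lines := strings.Split(cell, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimSpace(line)
	}
	return strings.Join(lines, sep)
}

func main() {
	cell := "Net revenue\n(in thousands)"
	fmt.Printf("%q\n", joinCellLines(cell, " "))  // joined with a space
	fmt.Printf("%q\n", joinCellLines(cell, "\n")) // line break preserved
}
```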