Extract Stateless

client.extraction.run(params: ExtractionRunParams { config, data_schema, organization_id, 5 more }, options?: RequestOptions): ExtractJob { id, extraction_agent, status, 3 more }
POST /api/v1/extraction/run

Stateless extraction endpoint that uses a default extraction agent in the user's default project. Requires data_schema, config, and one of file_id, text, or base64-encoded file data.
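
The sketch below shows the text and inline-file variants of this request, reusing the client setup from the example at the bottom of this page. The JSON-Schema-style data_schema, the sample file name, and the sample values are illustrative placeholders, not taken from this reference.

import fs from 'node:fs';
import LlamaCloud from '@llamaindex/llama-cloud';

const client = new LlamaCloud({ apiKey: process.env['LLAMA_CLOUD_API_KEY'] });

// Extract from raw text.
const fromText = await client.extraction.run({
  config: {},
  data_schema: { type: 'object', properties: { total: { type: 'string' } } },
  text: 'Invoice #1234. Total due: $99.00',
});

// Extract from an inline file: base64-encoded content plus its MIME type.
const fromFile = await client.extraction.run({
  config: {},
  data_schema: { type: 'object', properties: { total: { type: 'string' } } },
  file: {
    data: fs.readFileSync('invoice.pdf').toString('base64'),
    mime_type: 'application/pdf',
  },
});

// Alternatively, pass file_id to reference a file already stored in LlamaCloud.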

Parameters
params: ExtractionRunParams { config, data_schema, organization_id, 5 more }
config: ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

Body param: The configuration parameters for the extraction

chunk_mode?: "PAGE" | "SECTION"

The mode to use for chunking the document.

Accepts one of the following:
"PAGE"
"SECTION"
citation_bbox?: boolean (Deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources?: boolean

Whether to cite sources for the extraction.

confidence_scores?: boolean

Whether to fetch confidence scores for the extraction.

extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:
"openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"gemini-2.0-flash"
"gemini-2.5-flash"
"gemini-2.5-flash-lite"
"gemini-2.5-pro"
"openai-gpt-4o"
"openai-gpt-4o-mini"
(string & {})
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:
"FAST"
"BALANCED"
"PREMIUM"
"MULTIMODAL"
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"

The extraction target specified.

Accepts one of the following:
"PER_DOC"
"PER_PAGE"
"PER_TABLE_ROW"
high_resolution_mode?: boolean

Whether to use high resolution mode for the extraction.

invalidate_cache?: boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode?: boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context?: number | null

Number of pages to pass as context on long document extraction.

minimum: 1
page_range?: string | null

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null

Public model names.

Accepts one of the following:
"openai-gpt-4o"
"openai-gpt-4o-mini"
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"openai-gpt-5-nano"
"openai-text-embedding-3-large"
"openai-text-embedding-3-small"
"openai-whisper-1"
"anthropic-sonnet-3.5"
"anthropic-sonnet-3.5-v2"
"anthropic-sonnet-3.7"
"anthropic-sonnet-4.0"
"anthropic-sonnet-4.5"
"anthropic-haiku-3.5"
"anthropic-haiku-4.5"
"gemini-2.5-flash"
"gemini-3.0-pro"
"gemini-2.5-pro"
"gemini-2.0-flash"
"gemini-2.0-flash-lite"
"gemini-2.5-flash-lite"
"gemini-1.5-flash"
"gemini-1.5-pro"
priority?: "low" | "medium" | "high" | "critical" | null

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
system_prompt?: string | null

The system prompt to use for the extraction.

use_reasoning?: boolean

Whether to use reasoning for the extraction.

data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | string

Body param: The schema of the data to extract

Accepts one of the following:
Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>
Record<string, unknown>
Array<unknown>
string
number
boolean
string
organization_id?: string | null

Query param

format: uuid
project_id?: string | null

Query param

format: uuid
file?: File | null

Body param: Schema for file data with base64 content and MIME type.

data: string

The file content as base64-encoded string

mime_type: string

The MIME type of the file (e.g., 'application/pdf', 'text/plain')

file_id?: string | null

Body param: The ID of the file to extract from

format: uuid
text?: string | null

Body param: The text content to extract from

webhook_configurations?: Array<WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url } > | null

Body param: The outbound webhook configurations

webhook_events?: Array<"extract.pending" | "extract.success" | "extract.error" | 13 more> | null

List of event names to subscribe to

Accepts one of the following:
"extract.pending"
"extract.success"
"extract.error"
"extract.partial_success"
"extract.cancelled"
"parse.pending"
"parse.success"
"parse.error"
"parse.partial_success"
"parse.cancelled"
"classify.pending"
"classify.success"
"classify.error"
"classify.partial_success"
"classify.cancelled"
"unmapped_event"
webhook_headers?: Record<string, string> | null

Custom HTTP headers to include with webhook requests.

webhook_output_format?: string | null

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url?: string | null

The URL to send webhook notifications to.
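
As a fuller sketch that combines several of the optional config fields, a webhook configuration, and a previously uploaded file, a request might look like the following. Only field names documented above are used; the concrete values (schema, file_id, page range, webhook URL, header) are placeholders, and the client is the one constructed in the example below.

const job = await client.extraction.run({
  config: {
    extraction_mode: 'PREMIUM',
    extraction_target: 'PER_DOC',
    cite_sources: true,
    confidence_scores: true,
    page_range: '1-3,8',
    system_prompt: 'Extract values exactly as written in the document.',
  },
  data_schema: {
    type: 'object',
    properties: {
      vendor: { type: 'string' },
      total: { type: 'number' },
    },
  },
  file_id: '182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e', // ID of a previously uploaded file
  webhook_configurations: [
    {
      webhook_url: 'https://example.com/hooks/extract',
      webhook_events: ['extract.success', 'extract.error'],
      webhook_output_format: 'json',
      webhook_headers: { 'X-Signature': 'shared-secret' },
    },
  ],
});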

Returns
ExtractJob { id, extraction_agent, status, 3 more }

Schema for an extraction job.

id: string

The id of the extraction job

format: uuid
extraction_agent: ExtractAgent { id, config, data_schema, 5 more }

The agent that the job was run on.

id: string

The id of the extraction agent.

format: uuid
config: ExtractConfig { chunk_mode, citation_bbox, cite_sources, 13 more }

The configuration parameters for the extraction agent.

chunk_mode?: "PAGE" | "SECTION"

The mode to use for chunking the document.

Accepts one of the following:
"PAGE"
"SECTION"
citation_bbox?: boolean (Deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources?: boolean

Whether to cite sources for the extraction.

confidence_scores?: boolean

Whether to fetch confidence scores for the extraction.

extract_model?: "openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more | (string & {}) | null

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:
"openai-gpt-4-1" | "openai-gpt-4-1-mini" | "openai-gpt-4-1-nano" | 8 more
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"gemini-2.0-flash"
"gemini-2.5-flash"
"gemini-2.5-flash-lite"
"gemini-2.5-pro"
"openai-gpt-4o"
"openai-gpt-4o-mini"
(string & {})
extraction_mode?: "FAST" | "BALANCED" | "PREMIUM" | "MULTIMODAL"

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:
"FAST"
"BALANCED"
"PREMIUM"
"MULTIMODAL"
extraction_target?: "PER_DOC" | "PER_PAGE" | "PER_TABLE_ROW"

The extraction target specified.

Accepts one of the following:
"PER_DOC"
"PER_PAGE"
"PER_TABLE_ROW"
high_resolution_mode?: boolean

Whether to use high resolution mode for the extraction.

invalidate_cache?: boolean

Whether to invalidate the cache for the extraction.

multimodal_fast_mode?: boolean

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context?: number | null

Number of pages to pass as context on long document extraction.

minimum: 1
page_range?: string | null

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model?: "openai-gpt-4o" | "openai-gpt-4o-mini" | "openai-gpt-4-1" | 23 more | null

Public model names.

Accepts one of the following:
"openai-gpt-4o"
"openai-gpt-4o-mini"
"openai-gpt-4-1"
"openai-gpt-4-1-mini"
"openai-gpt-4-1-nano"
"openai-gpt-5"
"openai-gpt-5-mini"
"openai-gpt-5-nano"
"openai-text-embedding-3-large"
"openai-text-embedding-3-small"
"openai-whisper-1"
"anthropic-sonnet-3.5"
"anthropic-sonnet-3.5-v2"
"anthropic-sonnet-3.7"
"anthropic-sonnet-4.0"
"anthropic-sonnet-4.5"
"anthropic-haiku-3.5"
"anthropic-haiku-4.5"
"gemini-2.5-flash"
"gemini-3.0-pro"
"gemini-2.5-pro"
"gemini-2.0-flash"
"gemini-2.0-flash-lite"
"gemini-2.5-flash-lite"
"gemini-1.5-flash"
"gemini-1.5-pro"
priority?: "low" | "medium" | "high" | "critical" | null

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
system_prompt?: string | null

The system prompt to use for the extraction.

use_reasoning?: boolean

Whether to use reasoning for the extraction.

data_schema: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null>

The schema of the data.

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
name: string

The name of the extraction agent.

project_id: string

The ID of the project that the extraction agent belongs to.

format: uuid
created_at?: string | null

The creation time of the extraction agent.

format: date-time
custom_configuration?: "default" | null

Custom configuration type for the extraction agent. Currently supports 'default'.

updated_at?: string | null

The last update time of the extraction agent.

format: date-time
status: "PENDING" | "SUCCESS" | "ERROR" | 2 more

The status of the extraction job

Accepts one of the following:
"PENDING"
"SUCCESS"
"ERROR"
"PARTIAL_SUCCESS"
"CANCELLED"
error?: string | null

The error that occurred during extraction

file?: File { id, name, project_id, 11 more } | null (Deprecated)

Schema for a file.

id: string

Unique identifier

format: uuid
name: string
project_id: string

The ID of the project that the file belongs to

format: uuid
created_at?: string | null

Creation datetime

format: date-time
data_source_id?: string | null

The ID of the data source that the file belongs to

format: uuid
expires_at?: string | null

The expiration date for the file. Files past this date can be deleted.

format: date-time
external_file_id?: string | null

The ID of the file in the external system

file_size?: number | null

Size of the file in bytes

minimum: 0
file_type?: string | null

File type (e.g. pdf, docx, etc.)

maxLength: 3000
minLength: 1
last_modified_at?: string | null

The last modified time of the file

format: date-time
permission_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

Permission information for the file

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
purpose?: string | null

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

Resource information for the file

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
updated_at?: string | null

Update datetime

format: date-time
file_id?: string | null

The id of the file that the extract was extracted from

format: uuid

Extract Stateless

import LlamaCloud from '@llamaindex/llama-cloud';

const client = new LlamaCloud({
  apiKey: process.env['LLAMA_CLOUD_API_KEY'], // This is the default and can be omitted
});

const extractJob = await client.extraction.run({
  config: {},
  data_schema: { foo: { foo: 'bar' } },
});

console.log(extractJob.id);
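
The returned ExtractJob may still be PENDING when the call returns, as in the example response below. A minimal sketch for branching on the documented status values (PENDING, SUCCESS, ERROR, PARTIAL_SUCCESS, CANCELLED), using only fields documented under Returns:

switch (extractJob.status) {
  case 'SUCCESS':
  case 'PARTIAL_SUCCESS':
    console.log('extraction finished for job', extractJob.id);
    break;
  case 'ERROR':
    console.error('extraction failed:', extractJob.error);
    break;
  case 'CANCELLED':
    console.warn('extraction was cancelled');
    break;
  default: // 'PENDING': poll for completion or rely on a webhook_configuration
    console.log('extraction still pending');
}
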
Returns Examples
{
  "id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
  "extraction_agent": {
    "id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "config": {
      "chunk_mode": "PAGE",
      "citation_bbox": true,
      "cite_sources": true,
      "confidence_scores": true,
      "extract_model": "openai-gpt-4-1",
      "extraction_mode": "FAST",
      "extraction_target": "PER_DOC",
      "high_resolution_mode": true,
      "invalidate_cache": true,
      "multimodal_fast_mode": true,
      "num_pages_context": 1,
      "page_range": "page_range",
      "parse_model": "openai-gpt-4o",
      "priority": "low",
      "system_prompt": "system_prompt",
      "use_reasoning": true
    },
    "data_schema": {
      "foo": {
        "foo": "bar"
      }
    },
    "name": "name",
    "project_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "created_at": "2019-12-27T18:11:19.117Z",
    "custom_configuration": "default",
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "status": "PENDING",
  "error": "error",
  "file": {
    "id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "name": "x",
    "project_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "created_at": "2019-12-27T18:11:19.117Z",
    "data_source_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "expires_at": "2019-12-27T18:11:19.117Z",
    "external_file_id": "external_file_id",
    "file_size": 0,
    "file_type": "x",
    "last_modified_at": "2019-12-27T18:11:19.117Z",
    "permission_info": {
      "foo": {
        "foo": "bar"
      }
    },
    "purpose": "purpose",
    "resource_info": {
      "foo": {
        "foo": "bar"
      }
    },
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "file_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e"
}
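
For callers not using the TypeScript SDK, the same request can be issued directly against POST /api/v1/extraction/run. The sketch below assumes the https://api.cloud.llamaindex.ai base URL and a bearer API key in the Authorization header, neither of which is specified on this page; adjust both to your deployment.

const response = await fetch('https://api.cloud.llamaindex.ai/api/v1/extraction/run', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env['LLAMA_CLOUD_API_KEY']}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    config: { extraction_mode: 'FAST' },
    data_schema: { type: 'object', properties: { title: { type: 'string' } } },
    text: 'Hello from a stateless extraction request.',
  }),
});
const extractJob = await response.json(); // ExtractJob, as documented under Returns
console.log(extractJob.id, extractJob.status);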