
Parse File

client.parsing.create(ParsingCreateParams { tier, version, organization_id, 15 more } params, RequestOptions options?): ParsingCreateResponse { id, project_id, status, 4 more }
POST /api/v2/parse

Parse a file by file ID or URL.
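
The input is given either as `file_id` or as `source_url`, both body params in the list that follows. A minimal sketch of a helper that builds the request body from one or the other. The `ParseSource` type and the `buildParseBody` name are hypothetical, and the idea that exactly one source should be set per request is an assumption that this page does not state.

```typescript
// Hypothetical helper types; only the field names (tier, version, file_id,
// source_url, http_proxy) come from the parameter list on this page.
type Tier = "fast" | "cost_effective" | "agentic" | "agentic_plus";

type ParseSource =
  | { kind: "file"; fileId: string }
  | { kind: "url"; url: string; httpProxy?: string };

interface ParseBody {
  tier: Tier;
  version: string;
  file_id?: string;
  source_url?: string;
  http_proxy?: string;
}

// Build a request body that sets exactly one of file_id / source_url.
// Assumption: the API expects a single source per request.
function buildParseBody(tier: Tier, version: string, source: ParseSource): ParseBody {
  const body: ParseBody = { tier, version };
  if (source.kind === "file") {
    body.file_id = source.fileId;
  } else {
    body.source_url = source.url;
    // http_proxy is documented as only used together with source_url.
    if (source.httpProxy !== undefined) {
      body.http_proxy = source.httpProxy;
    }
  }
  return body;
}

const byUrl = buildParseBody("fast", "latest", {
  kind: "url",
  url: "https://example.com/report.pdf",
});
console.log(byUrl);
```

The resulting object has the shape of the body params accepted by `client.parsing.create`.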

Parameters
params: ParsingCreateParams { tier, version, organization_id, 15 more }
tier: "fast" | "cost_effective" | "agentic" | "agentic_plus"

Body param: The parsing tier to use

Accepts one of the following:
"fast"
"cost_effective"
"agentic"
"agentic_plus"
version: "2026-01-08" | "2025-12-31" | "2025-12-18" | 6 more | (string & {})

Body param: Version of the tier configuration

Accepts one of the following:
"2026-01-08" | "2025-12-31" | "2025-12-18" | 6 more
"2026-01-08"
"2025-12-31"
"2025-12-18"
"2025-12-11"
"2026-01-16"
"2026-01-21"
"2026-01-22"
"2026-01-24"
"latest"
(string & {})
organization_id?: string | null

Query param

format: uuid
project_id?: string | null

Query param

format: uuid
agentic_options?: AgenticOptions | null

Body param: Options for agentic tier parsing (with AI agents).

custom_prompt?: string | null

Custom prompt for AI-powered parsing

client_name?: string | null

Body param: Name of the client making the parsing request

crop_box?: CropBox

Body param: Document crop box boundaries

bottom?: number | null

Bottom boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
left?: number | null

Left boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
right?: number | null

Right boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
top?: number | null

Top boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
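
Each `crop_box` boundary is a ratio constrained to the range 0-1. A small client-side check along those lines could look like the sketch below; the `invalidCropBoxFields` helper is hypothetical and only mirrors the documented bounds, it does not reproduce any server-side validation.

```typescript
// Shape mirrors the crop_box fields listed above; every field is optional.
interface CropBox {
  top?: number | null;
  bottom?: number | null;
  left?: number | null;
  right?: number | null;
}

// Return the names of the fields that fall outside the documented 0-1 range.
// Unset (null / undefined) fields are treated as valid.
function invalidCropBoxFields(box: CropBox): string[] {
  const entries: Array<[string, number | null | undefined]> = [
    ["top", box.top],
    ["bottom", box.bottom],
    ["left", box.left],
    ["right", box.right],
  ];
  return entries
    .filter(([, v]) => v != null && (Number.isNaN(v) || v < 0 || v > 1))
    .map(([name]) => name);
}

// "bottom" is reported because 1.2 exceeds the maximum of 1.
console.log(invalidCropBoxFields({ top: 0.1, bottom: 1.2 }));
```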
disable_cache?: boolean | null

Body param: Whether to disable caching for this parsing job

fast_options?: unknown

Body param: Options for fast tier parsing (without AI).

file_id?: string | null

Body param: ID of an existing file in the project to parse

http_proxy?: string | null

Body param: HTTP proxy URL for network requests (only used with source_url)

input_options?: InputOptions

Body param: Input format-specific parsing options

html?: HTML { make_all_elements_visible, remove_fixed_elements, remove_navigation_elements }

HTML-specific parsing options

make_all_elements_visible?: boolean | null

Make all HTML elements visible during parsing

remove_fixed_elements?: boolean | null

Remove fixed position elements from HTML

remove_navigation_elements?: boolean | null

Remove navigation elements from HTML

pdf?: unknown

PDF-specific parsing options

presentation?: Presentation { out_of_bounds_content, skip_embedded_data }

Presentation-specific parsing options

out_of_bounds_content?: boolean | null

Extract out of bounds content in presentation slides

skip_embedded_data?: boolean | null

Skip extraction of embedded data for charts in presentation slides

spreadsheet?: Spreadsheet { detect_sub_tables_in_sheets, force_formula_computation_in_sheets }

Spreadsheet-specific parsing options

detect_sub_tables_in_sheets?: boolean | null

Detect and extract sub-tables within spreadsheet cells

force_formula_computation_in_sheets?: boolean | null

Force re-computation of spreadsheet cells containing formulas

output_options?: OutputOptions

Body param: Output format and styling options

extract_printed_page_number?: boolean | null

Extract printed page numbers from the document

images_to_save?: Array<"screenshot" | "embedded" | "layout">

Image categories to save: 'screenshot' (full page), 'embedded' (images in document), 'layout' (cropped images from layout detection). Empty list means no images are saved.

Accepts one of the following:
"screenshot"
"embedded"
"layout"
markdown?: Markdown { annotate_links, inline_images, tables }

Markdown output formatting options

annotate_links?: boolean | null

Add annotations to links in markdown output

inline_images?: boolean | null

Instead of transcribing images, inline them in the markdown output

tables?: Tables { compact_markdown_tables, markdown_table_multiline_separator, merge_continued_tables, output_tables_as_markdown }

Table formatting options for markdown

compact_markdown_tables?: boolean | null

Use compact formatting for markdown tables

markdown_table_multiline_separator?: string | null

Separator for multiline content in markdown tables

merge_continued_tables?: boolean | null

Merge tables that continue across or within pages. Affects markdown and items

output_tables_as_markdown?: boolean | null

Output tables in markdown format

spatial_text?: SpatialText { do_not_unroll_columns, preserve_layout_alignment_across_pages, preserve_very_small_text }

Spatial text output options

do_not_unroll_columns?: boolean | null

Keep column structure intact without unrolling

preserve_layout_alignment_across_pages?: boolean | null

Preserve text alignment across page boundaries

preserve_very_small_text?: boolean | null

Include very small text in spatial output

tables_as_spreadsheet?: TablesAsSpreadsheet { enable, guess_sheet_name }

Table export as spreadsheet options

enable?: boolean | null

Whether this option is enabled

guess_sheet_name?: boolean

Automatically guess sheet names when exporting tables

page_ranges?: PageRanges

Body param: Page range selection options

max_pages?: number | null

Maximum number of pages to process

minimum: 1
target_pages?: string | null

Specific pages to process (e.g., '1,3,5-8') using 1-based indexing
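
To preview which pages a `target_pages` string selects, it can be expanded client-side. The sketch below assumes the syntax is comma-separated page numbers and inclusive `a-b` ranges, as the documented example `'1,3,5-8'` suggests; the actual server-side grammar may accept more or less than this, and `expandTargetPages` is a hypothetical helper.

```typescript
// Expand a target_pages string such as "1,3,5-8" into explicit 1-based page numbers.
// Assumption: comma-separated pages and inclusive "a-b" ranges only.
function expandTargetPages(spec: string): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) {
      throw new Error(`Invalid page token: "${token}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    // Pages are 1-based, and a range must not run backwards.
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }
  // Deduplicated and sorted ascending.
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(expandTargetPages("1,3,5-8")); // pages 1, 3, 5, 6, 7, 8
```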

processing_control?: ProcessingControl

Body param: Job processing control and failure handling

job_failure_conditions?: JobFailureConditions { allowed_page_failure_ratio, fail_on_buggy_font, fail_on_image_extraction_error, 2 more }

Conditions that determine job failure

allowed_page_failure_ratio?: number | null

Maximum ratio of pages allowed to fail (0-1)

maximum: 1
exclusiveMinimum: 0
fail_on_buggy_font?: boolean | null

Fail job if buggy fonts are detected

fail_on_image_extraction_error?: boolean | null

Fail job if image extraction encounters errors

fail_on_image_ocr_error?: boolean | null

Fail job if image OCR encounters errors

fail_on_markdown_reconstruction_error?: boolean | null

Fail job if markdown reconstruction encounters errors

timeouts?: Timeouts { base_in_seconds, extra_time_per_page_in_seconds }

Timeout configuration for parsing jobs

base_in_seconds?: number | null

Base timeout in seconds (max 30 minutes)

maximum: 1800
exclusiveMinimum: 0
extra_time_per_page_in_seconds?: number | null

Additional timeout per page in seconds (max 5 minutes)

maximum: 300
exclusiveMinimum: 0
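
The numeric fields under `processing_control` share one constraint shape: strictly greater than 0, with an inclusive maximum (1 for the ratio, 1800 and 300 seconds for the timeouts). A rough pre-flight check built only from those documented bounds might look like this; the interfaces are partial hand-written mirrors and `checkProcessingControl` is a hypothetical helper, not part of the SDK.

```typescript
// Partial mirrors of the documented shapes; only the numeric fields are modelled.
interface Timeouts {
  base_in_seconds?: number | null;
  extra_time_per_page_in_seconds?: number | null;
}

interface ProcessingControl {
  job_failure_conditions?: { allowed_page_failure_ratio?: number | null };
  timeouts?: Timeouts;
}

// True when the value is unset, or lies in the half-open interval (0, max].
function inRange(value: number | null | undefined, max: number): boolean {
  return value == null || (value > 0 && value <= max);
}

// Collect human-readable problems instead of throwing on the first one.
function checkProcessingControl(pc: ProcessingControl): string[] {
  const problems: string[] = [];
  if (!inRange(pc.job_failure_conditions?.allowed_page_failure_ratio, 1)) {
    problems.push("allowed_page_failure_ratio must be > 0 and <= 1");
  }
  if (!inRange(pc.timeouts?.base_in_seconds, 1800)) {
    problems.push("base_in_seconds must be > 0 and <= 1800");
  }
  if (!inRange(pc.timeouts?.extra_time_per_page_in_seconds, 300)) {
    problems.push("extra_time_per_page_in_seconds must be > 0 and <= 300");
  }
  return problems;
}

console.log(
  checkProcessingControl({
    job_failure_conditions: { allowed_page_failure_ratio: 0 },
    timeouts: { base_in_seconds: 600 },
  }),
);
```

Returning a list rather than throwing lets a caller report every out-of-range field at once.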
processing_options?: ProcessingOptions

Body param: Processing options shared across all tiers

aggressive_table_extraction?: boolean | null

Whether to use aggressive table extraction

auto_mode_configuration?: Array<AutoModeConfiguration> | null

Configuration for auto mode parsing with triggers and parsing options

parsing_conf: ParsingConf { adaptive_long_table, aggressive_table_extraction, crop_box, 11 more }

Configuration for parsing in auto mode (V2 format).

This uses V2 API naming conventions. The backend service will convert these to the V1 format expected by the llamaparse worker.

adaptive_long_table?: boolean | null

Whether to use adaptive long table handling

aggressive_table_extraction?: boolean | null

Whether to use aggressive table extraction

crop_box?: CropBox | null

Crop box options for auto mode parsing configuration.

bottom?: number | null

Bottom boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
left?: number | null

Left boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
right?: number | null

Right boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
top?: number | null

Top boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
custom_prompt?: string | null

Custom prompt for AI-powered parsing

extract_layout?: boolean | null

Whether to extract layout information

high_res_ocr?: boolean | null

Whether to use high resolution OCR

ignore?: Ignore | null

Ignore options for auto mode parsing configuration.

ignore_diagonal_text?: boolean | null

Whether to ignore diagonal text in the document

ignore_hidden_text?: boolean | null

Whether to ignore hidden text in the document

language?: string | null

Primary language of the document

outlined_table_extraction?: boolean | null

Whether to use outlined table extraction

presentation?: Presentation | null

Presentation-specific options for auto mode parsing configuration.

out_of_bounds_content?: boolean | null

Extract out of bounds content in presentation slides

skip_embedded_data?: boolean | null

Skip extraction of embedded data for charts in presentation slides

spatial_text?: SpatialText | null

Spatial text options for auto mode parsing configuration.

do_not_unroll_columns?: boolean | null

Keep column structure intact without unrolling

preserve_layout_alignment_across_pages?: boolean | null

Preserve text alignment across page boundaries

preserve_very_small_text?: boolean | null

Include very small text in spatial output

specialized_chart_parsing?: "agentic_plus" | "agentic" | "efficient" | null

Enable specialized chart parsing with the specified mode

Accepts one of the following:
"agentic_plus"
"agentic"
"efficient"
tier?: "fast" | "cost_effective" | "agentic" | "agentic_plus" | null

The parsing tier to use

Accepts one of the following:
"fast"
"cost_effective"
"agentic"
"agentic_plus"
version?: "2026-01-08" | "2025-12-31" | "2025-12-18" | 6 more | (string & {}) | null

Version of the tier configuration

Accepts one of the following:
"2026-01-08" | "2025-12-31" | "2025-12-18" | 6 more
"2026-01-08"
"2025-12-31"
"2025-12-18"
"2025-12-11"
"2026-01-16"
"2026-01-21"
"2026-01-22"
"2026-01-24"
"latest"
(string & {})
filename_match_glob?: string | null

Single glob pattern to match against filename

filename_match_glob_list?: Array<string> | null

List of glob patterns to match against filename

filename_regexp?: string | null

Regex pattern to match against filename

filename_regexp_mode?: string | null

Regex mode flags (e.g., 'i' for case-insensitive)

full_page_image_in_page?: boolean | null

Trigger if page contains a full-page image (scanned page detection)

full_page_image_in_page_threshold?: number | string | null

Threshold for full page image detection (0.0-1.0, default 0.8)

Accepts one of the following:
number
string
image_in_page?: boolean | null

Trigger if page contains non-screenshot images

layout_element_in_page?: string | null

Trigger if page contains this layout element type

layout_element_in_page_confidence_threshold?: number | string | null

Confidence threshold for layout element detection

Accepts one of the following:
number
string
page_contains_at_least_n_charts?: number | string | null

Trigger if page has more than N charts

Accepts one of the following:
number
string
page_contains_at_least_n_images?: number | string | null

Trigger if page has more than N images

Accepts one of the following:
number
string
page_contains_at_least_n_layout_elements?: number | string | null

Trigger if page has more than N layout elements

Accepts one of the following:
number
string
page_contains_at_least_n_lines?: number | string | null

Trigger if page has more than N lines

Accepts one of the following:
number
string

page_contains_at_least_n_links?: number | string | null

Trigger if page has more than N links

Accepts one of the following:
number
string
page_contains_at_least_n_numbers?: number | string | null

Trigger if page has more than N numeric words

Accepts one of the following:
number
string
page_contains_at_least_n_percent_numbers?: number | string | null

Trigger if page has more than N% numeric words

Accepts one of the following:
number
string
page_contains_at_least_n_tables?: number | string | null

Trigger if page has more than N tables

Accepts one of the following:
number
string
page_contains_at_least_n_words?: number | string | null

Trigger if page has more than N words

Accepts one of the following:
number
string
page_contains_at_most_n_charts?: number | string | null

Trigger if page has fewer than N charts

Accepts one of the following:
number
string
page_contains_at_most_n_images?: number | string | null

Trigger if page has fewer than N images

Accepts one of the following:
number
string
page_contains_at_most_n_layout_elements?: number | string | null

Trigger if page has fewer than N layout elements

Accepts one of the following:
number
string
page_contains_at_most_n_lines?: number | string | null

Trigger if page has fewer than N lines

Accepts one of the following:
number
string

page_contains_at_most_n_links?: number | string | null

Trigger if page has fewer than N links

Accepts one of the following:
number
string
page_contains_at_most_n_numbers?: number | string | null

Trigger if page has fewer than N numeric words

Accepts one of the following:
number
string
page_contains_at_most_n_percent_numbers?: number | string | null

Trigger if page has fewer than N% numeric words

Accepts one of the following:
number
string
page_contains_at_most_n_tables?: number | string | null

Trigger if page has fewer than N tables

Accepts one of the following:
number
string
page_contains_at_most_n_words?: number | string | null

Trigger if page has fewer than N words

Accepts one of the following:
number
string
page_longer_than_n_chars?: number | string | null

Trigger if page has more than N characters

Accepts one of the following:
number
string
page_md_error?: boolean | null

Trigger on pages with markdown extraction errors

page_shorter_than_n_chars?: number | string | null

Trigger if page has fewer than N characters

Accepts one of the following:
number
string
regexp_in_page?: string | null

Regex pattern to match in page content

regexp_in_page_mode?: string | null

Regex mode flags for regexp_in_page

table_in_page?: boolean | null

Trigger if page contains a table

text_in_page?: string | null

Trigger if page text/markdown contains this string

trigger_mode?: string | null

How to combine multiple trigger conditions: 'and' (all must match, default) or 'or' (any can match)
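
Each `auto_mode_configuration` entry pairs a required `parsing_conf` with one or more trigger fields. As an illustration only, the sketch below declares two entries using field names from the list above. The `AutoModeRule` interface is a partial, hand-written type (the SDK exports its own), and the comments describe the intended reading of each rule rather than confirmed backend behavior.

```typescript
// Partial, hand-written types: only a few of the documented fields are modelled.
type Tier = "fast" | "cost_effective" | "agentic" | "agentic_plus";

interface AutoModeRule {
  parsing_conf: {
    tier?: Tier | null;
    version?: string | null;
    high_res_ocr?: boolean | null;
  };
  table_in_page?: boolean | null;
  full_page_image_in_page?: boolean | null;
  full_page_image_in_page_threshold?: number | string | null;
  page_md_error?: boolean | null;
  trigger_mode?: string | null;
}

const autoModeConfiguration: AutoModeRule[] = [
  {
    // Single trigger: the page contains a table.
    table_in_page: true,
    parsing_conf: { tier: "agentic", version: "latest" },
  },
  {
    // Two triggers combined with 'or': either a full-page image is detected
    // (0.8 is the documented default threshold) or markdown extraction failed.
    full_page_image_in_page: true,
    full_page_image_in_page_threshold: 0.8,
    page_md_error: true,
    trigger_mode: "or",
    parsing_conf: { high_res_ocr: true },
  },
];

// Nest the rules under processing_options, as in the parameter tree above.
const processingOptions = { auto_mode_configuration: autoModeConfiguration };
console.log(JSON.stringify(processingOptions, null, 2));
```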

cost_optimizer?: CostOptimizer | null

Cost optimizer parameters for parsing configuration.

enable?: boolean | null

Use cost-optimized parsing for the document. May negatively impact parsing speed and quality.

disable_heuristics?: boolean | null

Whether to disable heuristics like outlined table extraction and adaptive long table handling

ignore?: Ignore { ignore_diagonal_text, ignore_hidden_text, ignore_text_in_image }

Options for ignoring specific text types

ignore_diagonal_text?: boolean | null

Whether to ignore diagonal text in the document

ignore_hidden_text?: boolean | null

Whether to ignore hidden text in the document

ignore_text_in_image?: boolean | null

Whether to ignore text that appears within images

ocr_parameters?: OcrParameters { languages }

OCR configuration parameters

languages?: Array<ParsingLanguages> | null

List of languages to use for OCR processing

Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
specialized_chart_parsing?: "agentic_plus" | "agentic" | "efficient" | null

Enable specialized chart parsing with the specified mode

Accepts one of the following:
"agentic_plus"
"agentic"
"efficient"
source_url?: string | null

Body param: Source URL to fetch document from

webhook_configurations?: Array<WebhookConfiguration>

Body param: List of webhook configurations for notifications

webhook_events?: Array<string> | null

List of events that trigger webhook notifications

webhook_headers?: Record<string, unknown> | null

Custom headers to include in webhook requests

webhook_url?: string | null

Webhook URL for receiving parsing notifications

Returns
ParsingCreateResponse { id, project_id, status, 4 more }

Response schema for a parse job.

id: string

Unique identifier for the parse job

project_id: string

Project this job belongs to

status: "PENDING" | "RUNNING" | "COMPLETED" | 2 more

Current status of the job (e.g., pending, running, completed, failed, cancelled)

Accepts one of the following:
"PENDING"
"RUNNING"
"COMPLETED"
"FAILED"
"CANCELLED"
created_at?: string | null

Creation datetime

format: date-time
error_message?: string | null

Error message if job failed

name?: string | null

User friendly name

updated_at?: string | null

Update datetime

format: date-time
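
Callers typically need to tell whether a job's `status` can still change. A minimal sketch, assuming that `COMPLETED`, `FAILED`, and `CANCELLED` are final states and that `PENDING` and `RUNNING` are not; this page lists the values but does not spell out the transitions, and `isTerminal` is a hypothetical helper.

```typescript
// The five status values documented for ParsingCreateResponse.
type JobStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED" | "CANCELLED";

// Assumption: COMPLETED, FAILED and CANCELLED are final; the others may still change.
function isTerminal(status: JobStatus): boolean {
  switch (status) {
    case "COMPLETED":
    case "FAILED":
    case "CANCELLED":
      return true;
    case "PENDING":
    case "RUNNING":
      return false;
  }
}

// Only a finished job is expected to carry an error_message worth reporting.
function describe(job: { status: JobStatus; error_message?: string | null }): string {
  if (!isTerminal(job.status)) return "in progress";
  return job.status === "FAILED" ? `failed: ${job.error_message ?? "unknown error"}` : job.status.toLowerCase();
}

console.log(describe({ status: "PENDING" }));
```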

Parse File

import LlamaCloud from '@llamaindex/llama-cloud';

const client = new LlamaCloud({
  apiKey: process.env['LLAMA_CLOUD_API_KEY'], // This is the default and can be omitted
});

const parsing = await client.parsing.create({ tier: 'fast', version: '2026-01-08' });

console.log(parsing.id);
Returns Examples
{
  "id": "id",
  "project_id": "project_id",
  "status": "PENDING",
  "created_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "name": "name",
  "updated_at": "2019-12-27T18:11:19.117Z"
}