Parse File

POST /api/v2/parse

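Parse a file identified by its file ID (file_id) or by a URL (source_url). The request creates a parse job and returns the job's ID and current status.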

Query Parameters

organization_id: optional string

project_id: optional string

Cookie Parameters

session: optional string

Body Parameters

tier: "fast" or "cost_effective" or "agentic" or "agentic_plus"

The parsing tier to use

Accepts one of the following:

"fast"

"cost_effective"

"agentic"

"agentic_plus"

version: "2025-12-11" or "2025-12-18" or "2025-12-31" or 21 more or string

Version of the tier configuration

Accepts one of the following:

UnionMember0 = "2025-12-11" or "2025-12-18" or "2025-12-31" or 21 more

Accepts one of the following:

"2025-12-11"

"2025-12-18"

"2025-12-31"

"2026-01-08"

"2026-01-09"

"2026-01-16"

"2026-01-21"

"2026-01-22"

"2026-01-24"

"2026-01-29"

"2026-01-30"

"2026-02-03"

"2026-02-18"

"2026-02-20"

"2026-02-24"

"2026-02-26"

"2026-03-02"

"2026-03-03"

"2026-03-04"

"2026-03-05"

"2026-03-09"

"2026-03-10"

"2026-03-11"

"latest"

UnionMember1 = string
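The two required body fields can be assembled as below. This is a minimal Python sketch using only the standard library; the helper name base_body is hypothetical, and the rule that exactly one of file_id or source_url must be given is an assumption drawn from the endpoint summary rather than a documented constraint. version is passed through unchecked because the field also accepts arbitrary strings.

```python
import json
from typing import Optional

# Parsing tiers accepted by the `tier` field.
TIERS = ("fast", "cost_effective", "agentic", "agentic_plus")


def base_body(tier: str, version: str = "latest",
              file_id: Optional[str] = None,
              source_url: Optional[str] = None) -> dict:
    """Build the required part of a parse request body (hypothetical helper)."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    # Assumption: exactly one document source is supplied per request.
    if (file_id is None) == (source_url is None):
        raise ValueError("pass exactly one of file_id or source_url")
    # `version` is not validated: besides the listed dates and "latest",
    # the field also accepts any string (UnionMember1).
    body = {"tier": tier, "version": version}
    if file_id is not None:
        body["file_id"] = file_id
    else:
        body["source_url"] = source_url
    return body


print(json.dumps(base_body("agentic", "2026-03-11", file_id="file-123")))
```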

agentic_options: optional object { custom_prompt }

Options for agentic tier parsing (with AI agents).

custom_prompt: optional string

Custom prompt for AI-powered parsing

client_name: optional string

Name of the client making the parsing request

crop_box: optional object { bottom, left, right, top }

Document crop box boundaries

bottom: optional number

Bottom boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1

left: optional number

Left boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1

right: optional number

Right boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1

top: optional number

Top boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1
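A small helper can check a crop_box before it is sent. This Python sketch only enforces the documented 0-1 range on each side; the reference does not say how the four ratios relate to one another (for example, whether top must be less than bottom), so no such check is attempted. The helper name crop_box and the sample values are illustrative.

```python
from typing import Dict

# Field names of the crop_box object.
SIDES = ("top", "bottom", "left", "right")


def crop_box(**sides: float) -> Dict[str, float]:
    """Return a crop_box dict after checking that every ratio lies in [0, 1]."""
    unknown = set(sides) - set(SIDES)
    if unknown:
        raise ValueError(f"unknown crop_box fields: {sorted(unknown)}")
    for name, value in sides.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Only the sides that were passed are included; all four are optional.
    return dict(sides)


# Illustrative values only.
print({"crop_box": crop_box(top=0.1, bottom=0.9)})
```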

disable_cache: optional boolean

Whether to disable caching for this parsing job

fast_options: optional unknown

Options for fast tier parsing (without AI).

file_id: optional string

ID of an existing file in the project to parse

http_proxy: optional string

HTTP proxy URL for network requests (only used with source_url)

input_options: optional object { html, pdf, presentation, spreadsheet }

Input format-specific parsing options

html: optional object { make_all_elements_visible, remove_fixed_elements, remove_navigation_elements }

HTML-specific parsing options

make_all_elements_visible: optional boolean

Make all HTML elements visible during parsing

remove_fixed_elements: optional boolean

Remove fixed position elements from HTML

remove_navigation_elements: optional boolean

Remove navigation elements from HTML

pdf: optional unknown

PDF-specific parsing options

presentation: optional object { out_of_bounds_content, skip_embedded_data }

Presentation-specific parsing options

out_of_bounds_content: optional boolean

Extract out-of-bounds content from presentation slides

skip_embedded_data: optional boolean

Skip extraction of embedded data for charts in presentation slides

spreadsheet: optional object { detect_sub_tables_in_sheets, force_formula_computation_in_sheets, include_hidden_sheets }

Spreadsheet-specific parsing options

detect_sub_tables_in_sheets: optional boolean

Detect and extract sub-tables within each spreadsheet sheet

force_formula_computation_in_sheets: optional boolean

Force re-computation of spreadsheet cells containing formulas

include_hidden_sheets: optional boolean

Include hidden sheets when parsing spreadsheet files
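As an illustration of how the nested input_options groups fit into a request body, the Python sketch below enables a few of the options above. The particular choices, the tier, and the placeholder source_url are examples only; this reference does not say whether options for formats other than the input file's are ignored.

```python
import json

# Example choices; every field shown is optional.
input_options = {
    "html": {
        "remove_fixed_elements": True,       # drop fixed-position elements
        "remove_navigation_elements": True,  # drop navigation elements
    },
    "presentation": {"skip_embedded_data": True},
    "spreadsheet": {
        "include_hidden_sheets": False,
        "force_formula_computation_in_sheets": True,
    },
}

body = {
    "tier": "cost_effective",
    "version": "latest",
    "source_url": "https://example.com/report.xlsx",  # placeholder URL
    "input_options": input_options,
}

print(json.dumps(body, indent=2))
```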

output_options: optional object { extract_printed_page_number, images_to_save, markdown, 2 more }

Output format and styling options

extract_printed_page_number: optional boolean

Extract printed page numbers from the document

images_to_save: optional array of "screenshot" or "embedded" or "layout"

Image categories to save: 'screenshot' (full page), 'embedded' (images in document), 'layout' (cropped images from layout detection). Empty list means no images are saved.

Accepts one of the following:

"screenshot"

"embedded"

"layout"

markdown: optional object { annotate_links, inline_images, tables }

Markdown output formatting options

annotate_links: optional boolean

Add annotations to links in markdown output

inline_images: optional boolean

Instead of transcribing images, inline them in the markdown output

tables: optional object { compact_markdown_tables, markdown_table_multiline_separator, merge_continued_tables, output_tables_as_markdown }

Table formatting options for markdown

compact_markdown_tables: optional boolean

Use compact formatting for markdown tables

markdown_table_multiline_separator: optional string

Separator for multiline content in markdown tables

merge_continued_tables: optional boolean

Merge tables that continue across or within pages. Affects both the markdown and the items output.

output_tables_as_markdown: optional boolean

Output tables in markdown format

spatial_text: optional object { do_not_unroll_columns, preserve_layout_alignment_across_pages, preserve_very_small_text }

Spatial text output options

do_not_unroll_columns: optional boolean

Keep column structure intact without unrolling

preserve_layout_alignment_across_pages: optional boolean

Preserve text alignment across page boundaries

preserve_very_small_text: optional boolean

Include very small text in spatial output

tables_as_spreadsheet: optional object { enable, guess_sheet_name }

Table export as spreadsheet options

enable: optional boolean

Whether to export tables as spreadsheets

guess_sheet_name: optional boolean

Automatically guess sheet names when exporting tables
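The sketch below builds an output_options object and rejects unknown images_to_save categories before the request is sent. It is a hypothetical Python helper, not part of any SDK; the default values it chooses are illustrative and are not the API's defaults.

```python
from typing import Iterable

IMAGE_CATEGORIES = {"screenshot", "embedded", "layout"}


def output_options(images: Iterable[str] = (),
                   tables_as_markdown: bool = True,
                   merge_continued_tables: bool = False,
                   tables_to_spreadsheet: bool = False) -> dict:
    """Assemble output_options (hypothetical helper with illustrative defaults)."""
    images = list(images)
    unknown = set(images) - IMAGE_CATEGORIES
    if unknown:
        raise ValueError(f"unknown image categories: {sorted(unknown)}")
    return {
        "images_to_save": images,  # an empty list means no images are saved
        "markdown": {
            "tables": {
                "output_tables_as_markdown": tables_as_markdown,
                "merge_continued_tables": merge_continued_tables,
            }
        },
        "tables_as_spreadsheet": {"enable": tables_to_spreadsheet},
    }


print(output_options(["screenshot", "layout"], merge_continued_tables=True))
```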

page_ranges: optional object { max_pages, target_pages }

Page range selection options

max_pages: optional number

Maximum number of pages to process

minimum: 1

target_pages: optional string

Specific pages to process (e.g., '1,3,5-8') using 1-based indexing
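To preview which pages a target_pages string selects, the spec can be expanded locally. This Python sketch assumes the format shown in the example above (comma-separated 1-based page numbers and inclusive a-b ranges); it does not replicate any server-side validation, and how max_pages interacts with target_pages is not covered.

```python
from typing import List


def expand_target_pages(spec: str) -> List[int]:
    """Expand a 1-based page spec such as '1,3,5-8' into page numbers."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"pages are 1-based: {part!r}")
            pages.append(page)
    return pages


print(expand_target_pages("1,3,5-8"))  # [1, 3, 5, 6, 7, 8]
```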

processing_control: optional object { job_failure_conditions, timeouts }

Job processing control and failure handling

job_failure_conditions: optional object { allowed_page_failure_ratio, fail_on_buggy_font, fail_on_image_extraction_error, 2 more }

Conditions that determine job failure

allowed_page_failure_ratio: optional number

Maximum ratio of pages allowed to fail (0-1)

exclusive minimum: 0, maximum: 1

fail_on_buggy_font: optional boolean

Fail job if buggy fonts are detected

fail_on_image_extraction_error: optional boolean

Fail job if image extraction encounters errors

fail_on_image_ocr_error: optional boolean

Fail job if image OCR encounters errors

fail_on_markdown_reconstruction_error: optional boolean

Fail job if markdown reconstruction encounters errors

timeouts: optional object { base_in_seconds, extra_time_per_page_in_seconds }

Timeout configuration for parsing jobs

base_in_seconds: optional number

Base timeout in seconds (max 30 minutes)

exclusive minimum: 0, maximum: 1800

extra_time_per_page_in_seconds: optional number

Additional timeout per page in seconds (max 5 minutes)

exclusive minimum: 0, maximum: 300
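The reference gives a base timeout and an extra per-page allowance but does not spell out how they combine. Assuming the effective limit is base_in_seconds plus extra_time_per_page_in_seconds multiplied by the number of pages, a 40-page document with a 600 s base and 30 s per page would get 600 + 30 × 40 = 1800 s. The hypothetical Python helpers below compute that budget and check the documented bounds.

```python
def timeout_budget(pages: int, base_in_seconds: float,
                   extra_time_per_page_in_seconds: float) -> float:
    """Assumed total timeout: base + per-page allowance x page count."""
    if not 0 < base_in_seconds <= 1800:
        raise ValueError("base_in_seconds must be > 0 and <= 1800")
    if not 0 < extra_time_per_page_in_seconds <= 300:
        raise ValueError("extra_time_per_page_in_seconds must be > 0 and <= 300")
    return base_in_seconds + extra_time_per_page_in_seconds * pages


def check_failure_ratio(ratio: float) -> float:
    """allowed_page_failure_ratio must be greater than 0 and at most 1."""
    if not 0 < ratio <= 1:
        raise ValueError("allowed_page_failure_ratio must be in (0, 1]")
    return ratio


processing_control = {
    "job_failure_conditions": {
        "allowed_page_failure_ratio": check_failure_ratio(0.05),
        "fail_on_buggy_font": False,
    },
    "timeouts": {"base_in_seconds": 600, "extra_time_per_page_in_seconds": 30},
}

print(timeout_budget(40, **processing_control["timeouts"]))  # 1800
```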

processing_options: optional object { aggressive_table_extraction, auto_mode_configuration, cost_optimizer, 4 more }

Processing options shared across all tiers

aggressive_table_extraction: optional boolean

Whether to use aggressive table extraction

auto_mode_configuration: optional array of object { parsing_conf, filename_match_glob, filename_match_glob_list, 33 more }

Configuration for auto mode parsing with triggers and parsing options

parsing_conf: object { adaptive_long_table, aggressive_table_extraction, crop_box, 11 more }

Configuration for parsing in auto mode (V2 format).

This uses V2 API naming conventions. The backend service will convert these to the V1 format expected by the llamaparse worker.

adaptive_long_table: optional boolean

Whether to use adaptive long table handling

aggressive_table_extraction: optional boolean

Whether to use aggressive table extraction

crop_box: optional object { bottom, left, right, top }

Crop box options for auto mode parsing configuration.

bottom: optional number

Bottom boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1

left: optional number

Left boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1

right: optional number

Right boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1

top: optional number

Top boundary of the crop box, as a ratio (0-1)

minimum: 0, maximum: 1

custom_prompt: optional string

Custom prompt for AI-powered parsing

extract_layout: optional boolean

Whether to extract layout information

high_res_ocr: optional boolean

Whether to use high resolution OCR

ignore: optional object { ignore_diagonal_text, ignore_hidden_text }

Ignore options for auto mode parsing configuration.

ignore_diagonal_text: optional boolean

Whether to ignore diagonal text in the document

ignore_hidden_text: optional boolean

Whether to ignore hidden text in the document

language: optional string

Primary language of the document

outlined_table_extraction: optional boolean

Whether to use outlined table extraction

presentation: optional object { out_of_bounds_content, skip_embedded_data }

Presentation-specific options for auto mode parsing configuration.

out_of_bounds_content: optional boolean

Extract out-of-bounds content from presentation slides

skip_embedded_data: optional boolean

Skip extraction of embedded data for charts in presentation slides

spatial_text: optional object { do_not_unroll_columns, preserve_layout_alignment_across_pages, preserve_very_small_text }

Spatial text options for auto mode parsing configuration.

do_not_unroll_columns: optional boolean

Keep column structure intact without unrolling

preserve_layout_alignment_across_pages: optional boolean

Preserve text alignment across page boundaries

preserve_very_small_text: optional boolean

Include very small text in spatial output

specialized_chart_parsing: optional "agentic_plus" or "agentic" or "efficient"

Enable specialized chart parsing with the specified mode

Accepts one of the following:

"agentic_plus"

"agentic"

"efficient"

tier: optional "fast" or "cost_effective" or "agentic" or "agentic_plus"

The parsing tier to use

Accepts one of the following:

"fast"

"cost_effective"

"agentic"

"agentic_plus"

version: optional "2025-12-11" or "2025-12-18" or "2025-12-31" or 21 more or string

Version of the tier configuration

Accepts one of the following:

UnionMember0 = "2025-12-11" or "2025-12-18" or "2025-12-31" or 21 more

Accepts one of the following:

"2025-12-11"

"2025-12-18"

"2025-12-31"

"2026-01-08"

"2026-01-09"

"2026-01-16"

"2026-01-21"

"2026-01-22"

"2026-01-24"

"2026-01-29"

"2026-01-30"

"2026-02-03"

"2026-02-18"

"2026-02-20"

"2026-02-24"

"2026-02-26"

"2026-03-02"

"2026-03-03"

"2026-03-04"

"2026-03-05"

"2026-03-09"

"2026-03-10"

"2026-03-11"

"latest"

UnionMember1 = string

filename_match_glob: optional string

Single glob pattern to match against filename

filename_match_glob_list: optional array of string

List of glob patterns to match against filename

filename_regexp: optional string

Regex pattern to match against filename

filename_regexp_mode: optional string

Regex mode flags (e.g., 'i' for case-insensitive)

full_page_image_in_page: optional boolean

Trigger if page contains a full-page image (scanned page detection)

full_page_image_in_page_threshold: optional number or string

Threshold for full page image detection (0.0-1.0, default 0.8)

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

image_in_page: optional boolean

Trigger if page contains non-screenshot images

layout_element_in_page: optional string

Trigger if page contains this layout element type

layout_element_in_page_confidence_threshold: optional number or string

Confidence threshold for layout element detection

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_charts: optional number or string

Trigger if page has at least N charts

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_images: optional number or string

Trigger if page has at least N images

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_layout_elements: optional number or string

Trigger if page has at least N layout elements

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_lines: optional number or string

Trigger if page has at least N lines

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_links: optional number or string

Trigger if page has at least N links

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_numbers: optional number or string

Trigger if page has at least N numeric words

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_percent_numbers: optional number or string

Trigger if at least N% of the page's words are numeric

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_tables: optional number or string

Trigger if page has at least N tables

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_least_n_words: optional number or string

Trigger if page has at least N words

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_charts: optional number or string

Trigger if page has at most N charts

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_images: optional number or string

Trigger if page has at most N images

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_layout_elements: optional number or string

Trigger if page has at most N layout elements

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_lines: optional number or string

Trigger if page has at most N lines

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_links: optional number or string

Trigger if page has at most N links

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_numbers: optional number or string

Trigger if page has at most N numeric words

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_percent_numbers: optional number or string

Trigger if at most N% of the page's words are numeric

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_tables: optional number or string

Trigger if page has at most N tables

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_contains_at_most_n_words: optional number or string

Trigger if page has at most N words

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_longer_than_n_chars: optional number or string

Trigger if page has more than N characters

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

page_md_error: optional boolean

Trigger on pages with markdown extraction errors

page_shorter_than_n_chars: optional number or string

Trigger if page has fewer than N characters

Accepts one of the following:

UnionMember0 = number

UnionMember1 = string

regexp_in_page: optional string

Regex pattern to match in page content

regexp_in_page_mode: optional string

Regex mode flags for regexp_in_page

table_in_page: optional boolean

Trigger if page contains a table

text_in_page: optional string

Trigger if page text/markdown contains this string

trigger_mode: optional string

How to combine multiple trigger conditions: 'and' (all must match, default) or 'or' (any can match)
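The sketch below expresses one auto mode rule in Python: pages that contain a table or an image are re-parsed with the agentic tier. The field names come from this reference; the rule itself is an example. The rule_matches helper is hypothetical and only models the documented trigger_mode semantics ('and' requires every condition, 'or' requires any one); it is not how the service evaluates pages.

```python
import json
from typing import List

# One example rule: switch to the agentic tier on pages with a table or an image.
rule = {
    "table_in_page": True,
    "image_in_page": True,
    "trigger_mode": "or",
    "parsing_conf": {"tier": "agentic", "version": "latest"},
}
processing_options = {"auto_mode_configuration": [rule]}


def rule_matches(condition_results: List[bool], trigger_mode: str = "and") -> bool:
    """Combine per-condition results as trigger_mode describes ('and' is the default)."""
    if trigger_mode == "and":
        return all(condition_results)
    if trigger_mode == "or":
        return any(condition_results)
    raise ValueError(f"unknown trigger_mode: {trigger_mode!r}")


# Hypothetical page: it has a table but no images.
print(rule_matches([True, False], rule["trigger_mode"]))  # True
print(json.dumps(processing_options))
```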

cost_optimizer: optional object { enable }

Cost optimizer parameters for parsing configuration.

enable: optional boolean

Use cost-optimized parsing for the document. May negatively impact parsing speed and quality.

disable_heuristics: optional boolean

Whether to disable heuristics like outlined table extraction and adaptive long table handling

ignore: optional object { ignore_diagonal_text, ignore_hidden_text, ignore_text_in_image }

Options for ignoring specific text types

ignore_diagonal_text: optional boolean

Whether to ignore diagonal text in the document

ignore_hidden_text: optional boolean

Whether to ignore hidden text in the document

ignore_text_in_image: optional boolean

Whether to ignore text that appears within images

ocr_parameters: optional object { languages }

OCR configuration parameters

languages: optional array of ParsingLanguages

List of languages to use for OCR processing

Accepts one of the following:

"af"

"az"

"bs"

"cs"

"cy"

"da"

"de"

"en"

"es"

"et"

"fr"

"ga"

"hr"

"hu"

"id"

"is"

"it"

"ku"

"la"

"lt"

"lv"

"mi"

"ms"

"mt"

"nl"

"no"

"oc"

"pi"

"pl"

"pt"

"ro"

"rs_latin"

"sk"

"sl"

"sq"

"sv"

"sw"

"tl"

"tr"

"uz"

"vi"

"ar"

"fa"

"ug"

"ur"

"bn"

"as"

"mni"

"ru"

"rs_cyrillic"

"be"

"bg"

"uk"

"mn"

"abq"

"ady"

"kbd"

"ava"

"dar"

"inh"

"che"

"lbe"

"lez"

"tab"

"tjk"

"hi"

"mr"

"ne"

"bh"

"mai"

"ang"

"bho"

"mah"

"sck"

"new"

"gom"

"sa"

"bgc"

"th"

"ch_sim"

"ch_tra"

"ja"

"ko"

"ta"

"te"

"kn"

specialized_chart_parsing: optional "agentic_plus" or "agentic" or "efficient"

Enable specialized chart parsing with the specified mode

Accepts one of the following:

"agentic_plus"

"agentic"

"efficient"

source_url: optional string

URL to fetch the document from (an alternative to file_id)
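A request that fetches the document from source_url can combine it with the processing_options described above. In the Python sketch below, the URL is a placeholder, the helper name is hypothetical, and the language set is an abbreviated subset of the ParsingLanguages codes kept short for space; extend it with the remaining codes as needed.

```python
import json
from typing import Iterable, Optional

# Abbreviated subset of the ParsingLanguages codes listed above.
OCR_LANGUAGES = {"en", "fr", "de", "es", "it", "pt", "ja", "ko", "ch_sim", "ch_tra", "ar", "hi"}
CHART_MODES = {"agentic_plus", "agentic", "efficient"}


def processing_options(languages: Iterable[str],
                       chart_mode: Optional[str] = None,
                       ignore_hidden_text: bool = True) -> dict:
    """Build processing_options (hypothetical helper)."""
    languages = list(languages)
    unknown = [code for code in languages if code not in OCR_LANGUAGES]
    if unknown:
        raise ValueError(f"language codes not in the local subset: {unknown}")
    options = {
        "ocr_parameters": {"languages": languages},
        "ignore": {"ignore_hidden_text": ignore_hidden_text},
        "cost_optimizer": {"enable": False},
    }
    if chart_mode is not None:
        if chart_mode not in CHART_MODES:
            raise ValueError(f"unknown chart mode: {chart_mode!r}")
        options["specialized_chart_parsing"] = chart_mode
    return options


body = {
    "tier": "agentic",
    "version": "latest",
    "source_url": "https://example.com/annual-report.pdf",  # placeholder URL
    "processing_options": processing_options(["en", "fr"], chart_mode="efficient"),
}
print(json.dumps(body, indent=2))
```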

webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_url }

List of webhook configurations for notifications

webhook_events: optional array of string

List of events that trigger webhook notifications

webhook_headers: optional map[unknown]

Custom headers to include in webhook requests

webhook_url: optional string

Webhook URL for receiving parsing notifications
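The Python sketch below builds one webhook_configurations entry. The endpoint URL and the X-Webhook-Token header are placeholders, and the check that the URL is an absolute http(s) address is an assumption added for illustration. This reference does not list valid webhook_events names, so the example leaves that field out.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse


def webhook_configuration(url: str,
                          events: Optional[List[str]] = None,
                          headers: Optional[Dict[str, str]] = None) -> dict:
    """Build one webhook_configurations entry (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are useful webhook targets.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    config: dict = {"webhook_url": url}
    if events:
        config["webhook_events"] = list(events)
    if headers:
        config["webhook_headers"] = dict(headers)
    return config


body_fragment = {
    "webhook_configurations": [
        webhook_configuration(
            "https://example.com/hooks/parse",          # placeholder endpoint
            headers={"X-Webhook-Token": "replace-me"},  # placeholder header
        )
    ]
}
print(body_fragment)
```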

Returns

id: string

Unique identifier for the parse job

project_id: string

Project this job belongs to

status: "PENDING" or "RUNNING" or "COMPLETED" or 2 more

Current status of the job (e.g., pending, running, completed, failed, cancelled)

Accepts one of the following:

"PENDING"

"RUNNING"

"COMPLETED"

"FAILED"

"CANCELLED"

created_at: optional string

Creation datetime

format: date-time

error_message: optional string

Error message if job failed

name: optional string

User-friendly name

updated_at: optional string

Update datetime

format: date-time
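A client can load this response into a typed object and tell whether the job is still in progress. In the Python sketch below, treating COMPLETED, FAILED and CANCELLED as final states is an assumption based on the status names; the ParseJob class and parse_job function are hypothetical, and unknown response fields are ignored so the sketch tolerates additions.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional

# Assumption: these statuses are final; PENDING and RUNNING are not.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


@dataclass
class ParseJob:
    id: str
    project_id: str
    status: str
    created_at: Optional[str] = None     # date-time string
    error_message: Optional[str] = None  # set when the job failed
    name: Optional[str] = None
    updated_at: Optional[str] = None     # date-time string

    @property
    def is_done(self) -> bool:
        return self.status in TERMINAL_STATUSES


def parse_job(payload: str) -> ParseJob:
    """Decode a response body, keeping only the fields documented above."""
    data = json.loads(payload)
    known = {f.name for f in fields(ParseJob)}
    return ParseJob(**{k: v for k, v in data.items() if k in known})


job = parse_job('{"id": "id", "project_id": "project_id", "status": "PENDING"}')
print(job.status, job.is_done)  # PENDING False
```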

Example request:

```shell
curl https://api.cloud.llamaindex.ai/api/v2/parse \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
    -d '{
          "tier": "fast",
          "version": "2025-12-11",
          "source_url": "https://example.com/document.pdf"
        }'
```

Example response:

```json
{
  "id": "id",
  "project_id": "project_id",
  "status": "PENDING",
  "created_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "name": "name",
  "updated_at": "2019-12-27T18:11:19.117Z"
}
```
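For callers not using curl, the same request can be built with Python's standard library. This is a sketch under assumptions: the document URL is a placeholder, the API key is read from the LLAMA_CLOUD_API_KEY environment variable as in the curl example, and the optional project_id query parameter is appended only when given. build_parse_request only constructs the request; calling send performs the network call.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

API_URL = "https://api.cloud.llamaindex.ai/api/v2/parse"


def build_parse_request(body: dict, api_key: str,
                        project_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    url = API_URL
    if project_id:
        url += "?" + urllib.parse.urlencode({"project_id": project_id})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the returned job description."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_parse_request(
    {
        "tier": "fast",
        "version": "2025-12-11",
        "source_url": "https://example.com/document.pdf",  # placeholder
    },
    os.environ.get("LLAMA_CLOUD_API_KEY", ""),
)
# job = send(request)  # uncomment to perform the network call
```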
