Parse File
Parse a file by file ID or URL.
Query Parameters
Cookie Parameters
Body Parameters
tier: "fast" or "cost_effective" or "agentic" or "agentic_plus"
The parsing tier to use
version: "2026-01-08" or "2025-12-31" or "2025-12-18" or 6 more or string
Version of the tier configuration
UnionMember0 = "2026-01-08" or "2025-12-31" or "2025-12-18" or 6 more
Version of the tier configuration
agentic_options: optional object { custom_prompt }
Options for agentic tier parsing (with AI agents).
custom_prompt: optional string
Custom prompt for AI-powered parsing
client_name: optional string
Name of the client making the parsing request
crop_box: optional object { bottom, left, right, top }
Document crop box boundaries
bottom: optional number
Bottom boundary of crop box as ratio (0-1)
left: optional number
Left boundary of crop box as ratio (0-1)
right: optional number
Right boundary of crop box as ratio (0-1)
top: optional number
Top boundary of crop box as ratio (0-1)
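The four crop_box values are ratios between 0 and 1. The following is a minimal sketch of a client-side range check before sending a request. The helper name make_crop_box is hypothetical, and this page does not say which edge each ratio is measured from, so the sketch checks only the documented range.

```python
from typing import Optional


def make_crop_box(
    top: Optional[float] = None,
    bottom: Optional[float] = None,
    left: Optional[float] = None,
    right: Optional[float] = None,
) -> dict:
    """Build a crop_box object, keeping only the edges that were given.

    Hypothetical helper: it enforces only the documented 0-1 range.
    """
    box = {}
    edges = (("top", top), ("bottom", bottom), ("left", left), ("right", right))
    for name, value in edges:
        if value is None:
            continue  # every edge is optional
        if not 0 <= value <= 1:
            raise ValueError(f"crop_box.{name} must be a ratio in [0, 1], got {value}")
        box[name] = value
    return box


# Example request body that uses the helper (values are illustrative).
payload = {
    "tier": "fast",
    "version": "2026-01-08",
    "crop_box": make_crop_box(top=0.1, bottom=0.05),
}
```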
disable_cache: optional boolean
Whether to disable caching for this parsing job
fast_options: optional unknown
Options for fast tier parsing (without AI).
file_id: optional string
ID of an existing file in the project to parse
http_proxy: optional string
HTTP proxy URL for network requests (only used with source_url)
input_options: optional object { html, pdf, presentation, spreadsheet }
Input format-specific parsing options
html: optional object { make_all_elements_visible, remove_fixed_elements, remove_navigation_elements }
HTML-specific parsing options
make_all_elements_visible: optional boolean
Make all HTML elements visible during parsing
remove_fixed_elements: optional boolean
Remove fixed position elements from HTML
remove_navigation_elements: optional boolean
Remove navigation elements from HTML
pdf: optional unknown
PDF-specific parsing options
presentation: optional object { out_of_bounds_content, skip_embedded_data }
Presentation-specific parsing options
out_of_bounds_content: optional boolean
Extract out of bounds content in presentation slides
skip_embedded_data: optional boolean
Skip extraction of embedded data for charts in presentation slides
spreadsheet: optional object { detect_sub_tables_in_sheets, force_formula_computation_in_sheets }
Spreadsheet-specific parsing options
detect_sub_tables_in_sheets: optional boolean
Detect and extract sub-tables within spreadsheet cells
force_formula_computation_in_sheets: optional boolean
Force re-computation of spreadsheet cells containing formulas
output_options: optional object { extract_printed_page_number, images_to_save, markdown, 2 more }
Output format and styling options
extract_printed_page_number: optional boolean
Extract printed page numbers from the document
images_to_save: optional array of "screenshot" or "embedded" or "layout"
Image categories to save: 'screenshot' (full page), 'embedded' (images in document), 'layout' (cropped images from layout detection). Empty list means no images are saved.
markdown: optional object { annotate_links, inline_images, tables }
Markdown output formatting options
annotate_links: optional boolean
Add annotations to links in markdown output
inline_images: optional boolean
Instead of transcribing images, inline them in the markdown output
tables: optional object { compact_markdown_tables, markdown_table_multiline_separator, merge_continued_tables, output_tables_as_markdown }
Table formatting options for markdown
compact_markdown_tables: optional boolean
Use compact formatting for markdown tables
markdown_table_multiline_separator: optional string
Separator for multiline content in markdown tables
merge_continued_tables: optional boolean
Merge tables that continue across or within pages. Affects markdown and items
output_tables_as_markdown: optional boolean
Output tables in markdown format
spatial_text: optional object { do_not_unroll_columns, preserve_layout_alignment_across_pages, preserve_very_small_text }
Spatial text output options
do_not_unroll_columns: optional boolean
Keep column structure intact without unrolling
preserve_layout_alignment_across_pages: optional boolean
Preserve text alignment across page boundaries
preserve_very_small_text: optional boolean
Include very small text in spatial output
tables_as_spreadsheet: optional object { enable, guess_sheet_name }
Table export as spreadsheet options
enable: optional boolean
Whether this option is enabled
guess_sheet_name: optional boolean
Automatically guess sheet names when exporting tables
page_ranges: optional object { max_pages, target_pages }
Page range selection options
max_pages: optional number
Maximum number of pages to process
target_pages: optional string
Specific pages to process (e.g., '1,3,5-8') using 1-based indexing
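The target_pages format ('1,3,5-8', 1-based) can be checked locally before a job is submitted. The sketch below expands such a string into explicit page numbers. The function name expand_target_pages is hypothetical, and the service's own handling of whitespace, duplicates, or malformed ranges is not documented here, so treat this as a local sanity check only.

```python
from typing import List


def expand_target_pages(spec: str) -> List[int]:
    """Expand a string such as '1,3,5-8' into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(expand_target_pages("1,3,5-8"))  # [1, 3, 5, 6, 7, 8]
```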
processing_control: optional object { job_failure_conditions, timeouts }
Job processing control and failure handling
job_failure_conditions: optional object { allowed_page_failure_ratio, fail_on_buggy_font, fail_on_image_extraction_error, 2 more }
Conditions that determine job failure
allowed_page_failure_ratio: optional number
Maximum ratio of pages allowed to fail (0-1)
fail_on_buggy_font: optional boolean
Fail job if buggy fonts are detected
fail_on_image_extraction_error: optional boolean
Fail job if image extraction encounters errors
fail_on_image_ocr_error: optional boolean
Fail job if image OCR encounters errors
fail_on_markdown_reconstruction_error: optional boolean
Fail job if markdown reconstruction encounters errors
timeouts: optional object { base_in_seconds, extra_time_per_page_in_seconds }
Timeout configuration for parsing jobs
base_in_seconds: optional number
Base timeout in seconds (max 30 minutes)
extra_time_per_page_in_seconds: optional number
Additional timeout per page in seconds (max 5 minutes)
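The two timeout fields have documented caps of 30 minutes and 5 minutes. The sketch below validates them and estimates a total budget for a document. The helper names are hypothetical, and the formula (base plus per-page extra times page count) is an assumption inferred from the field descriptions, not confirmed behavior.

```python
MAX_BASE_SECONDS = 30 * 60           # documented cap: 30 minutes
MAX_EXTRA_PER_PAGE_SECONDS = 5 * 60  # documented cap: 5 minutes


def make_timeouts(base_in_seconds: float, extra_time_per_page_in_seconds: float) -> dict:
    """Build a timeouts object after checking the documented caps."""
    if not 0 <= base_in_seconds <= MAX_BASE_SECONDS:
        raise ValueError("base_in_seconds must be between 0 and 1800")
    if not 0 <= extra_time_per_page_in_seconds <= MAX_EXTRA_PER_PAGE_SECONDS:
        raise ValueError("extra_time_per_page_in_seconds must be between 0 and 300")
    return {
        "base_in_seconds": base_in_seconds,
        "extra_time_per_page_in_seconds": extra_time_per_page_in_seconds,
    }


def estimated_budget_seconds(timeouts: dict, page_count: int) -> float:
    """Assumed formula: base timeout plus the per-page extra for every page."""
    return (
        timeouts["base_in_seconds"]
        + timeouts["extra_time_per_page_in_seconds"] * page_count
    )


timeouts = make_timeouts(600, 30)
print(estimated_budget_seconds(timeouts, page_count=20))  # 1200
```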
processing_options: optional object { aggressive_table_extraction, auto_mode_configuration, cost_optimizer, 4 more }
Processing options shared across all tiers
aggressive_table_extraction: optional boolean
Whether to use aggressive table extraction
auto_mode_configuration: optional array of object { parsing_conf, filename_match_glob, filename_match_glob_list, 33 more }
Configuration for auto mode parsing with triggers and parsing options
parsing_conf: object { adaptive_long_table, aggressive_table_extraction, crop_box, 11 more }
Configuration for parsing in auto mode (V2 format).
This uses V2 API naming conventions. The backend service will convert these to the V1 format expected by the llamaparse worker.
adaptive_long_table: optional boolean
Whether to use adaptive long table handling
aggressive_table_extraction: optional boolean
Whether to use aggressive table extraction
crop_box: optional object { bottom, left, right, top }
Crop box options for auto mode parsing configuration.
bottom: optional number
Bottom boundary of crop box as ratio (0-1)
left: optional number
Left boundary of crop box as ratio (0-1)
right: optional number
Right boundary of crop box as ratio (0-1)
top: optional number
Top boundary of crop box as ratio (0-1)
custom_prompt: optional string
Custom prompt for AI-powered parsing
extract_layout: optional boolean
Whether to extract layout information
high_res_ocr: optional boolean
Whether to use high resolution OCR
ignore: optional object { ignore_diagonal_text, ignore_hidden_text }
Ignore options for auto mode parsing configuration.
ignore_diagonal_text: optional boolean
Whether to ignore diagonal text in the document
ignore_hidden_text: optional boolean
Whether to ignore hidden text in the document
language: optional string
Primary language of the document
outlined_table_extraction: optional boolean
Whether to use outlined table extraction
presentation: optional object { out_of_bounds_content, skip_embedded_data }
Presentation-specific options for auto mode parsing configuration.
out_of_bounds_content: optional boolean
Extract out of bounds content in presentation slides
skip_embedded_data: optional boolean
Skip extraction of embedded data for charts in presentation slides
spatial_text: optional object { do_not_unroll_columns, preserve_layout_alignment_across_pages, preserve_very_small_text }
Spatial text options for auto mode parsing configuration.
do_not_unroll_columns: optional boolean
Keep column structure intact without unrolling
preserve_layout_alignment_across_pages: optional boolean
Preserve text alignment across page boundaries
preserve_very_small_text: optional boolean
Include very small text in spatial output
specialized_chart_parsing: optional "agentic_plus" or "agentic" or "efficient"
Enable specialized chart parsing with the specified mode
tier: optional "fast" or "cost_effective" or "agentic" or "agentic_plus"
The parsing tier to use
version: optional "2026-01-08" or "2025-12-31" or "2025-12-18" or 6 more or string
Version of the tier configuration
UnionMember0 = "2026-01-08" or "2025-12-31" or "2025-12-18" or 6 more
Version of the tier configuration
filename_match_glob: optional string
Single glob pattern to match against filename
filename_match_glob_list: optional array of string
List of glob patterns to match against filename
filename_regexp: optional string
Regex pattern to match against filename
filename_regexp_mode: optional string
Regex mode flags (e.g., 'i' for case-insensitive)
full_page_image_in_page: optional boolean
Trigger if page contains a full-page image (scanned page detection)
full_page_image_in_page_threshold: optional number or string
Threshold for full page image detection (0.0-1.0, default 0.8)
image_in_page: optional boolean
Trigger if page contains non-screenshot images
layout_element_in_page: optional string
Trigger if page contains this layout element type
layout_element_in_page_confidence_threshold: optional number or string
Confidence threshold for layout element detection
page_contains_at_least_n_charts: optional number or string
Trigger if page has more than N charts
page_contains_at_least_n_images: optional number or string
Trigger if page has more than N images
page_contains_at_least_n_layout_elements: optional number or string
Trigger if page has more than N layout elements
page_contains_at_least_n_lines: optional number or string
Trigger if page has more than N lines
page_contains_at_least_n_links: optional number or string
Trigger if page has more than N links
page_contains_at_least_n_numbers: optional number or string
Trigger if page has more than N numeric words
page_contains_at_least_n_percent_numbers: optional number or string
Trigger if page has more than N% numeric words
page_contains_at_least_n_tables: optional number or string
Trigger if page has more than N tables
page_contains_at_least_n_words: optional number or string
Trigger if page has more than N words
page_contains_at_most_n_charts: optional number or string
Trigger if page has fewer than N charts
page_contains_at_most_n_images: optional number or string
Trigger if page has fewer than N images
page_contains_at_most_n_layout_elements: optional number or string
Trigger if page has fewer than N layout elements
page_contains_at_most_n_lines: optional number or string
Trigger if page has fewer than N lines
page_contains_at_most_n_links: optional number or string
Trigger if page has fewer than N links
page_contains_at_most_n_numbers: optional number or string
Trigger if page has fewer than N numeric words
page_contains_at_most_n_percent_numbers: optional number or string
Trigger if page has fewer than N% numeric words
page_contains_at_most_n_tables: optional number or string
Trigger if page has fewer than N tables
page_contains_at_most_n_words: optional number or string
Trigger if page has fewer than N words
page_longer_than_n_chars: optional number or string
Trigger if page has more than N characters
page_md_error: optional boolean
Trigger on pages with markdown extraction errors
page_shorter_than_n_chars: optional number or string
Trigger if page has fewer than N characters
regexp_in_page: optional string
Regex pattern to match in page content
regexp_in_page_mode: optional string
Regex mode flags for regexp_in_page
table_in_page: optional boolean
Trigger if page contains a table
text_in_page: optional string
Trigger if page text/markdown contains this string
trigger_mode: optional string
How to combine multiple trigger conditions: 'and' (all must match, default) or 'or' (any can match)
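An auto_mode_configuration entry pairs trigger fields with a parsing_conf. The sketch below shows one possible entry and a local illustration of how trigger_mode combines individual trigger results. The threshold values are illustrative, the triggers_match helper is hypothetical, and the reading that parsing_conf is applied to pages matching the triggers is an assumption based on the field descriptions.

```python
from typing import Iterable

auto_mode_configuration = [
    {
        # Triggers: with trigger_mode 'and', all of them must match.
        "table_in_page": True,
        "page_contains_at_least_n_words": 200,
        "trigger_mode": "and",
        # Assumed to apply to pages where the triggers match.
        "parsing_conf": {
            "tier": "agentic",
            "aggressive_table_extraction": True,
        },
    }
]


def triggers_match(trigger_mode: str, results: Iterable[bool]) -> bool:
    """Combine per-trigger results the way trigger_mode is described."""
    results = list(results)
    if trigger_mode == "and":
        return all(results)  # all must match (the documented default)
    if trigger_mode == "or":
        return any(results)  # any can match
    raise ValueError(f"unknown trigger_mode: {trigger_mode!r}")


print(triggers_match("and", [True, False]))  # False
print(triggers_match("or", [True, False]))   # True
```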
cost_optimizer: optional object { enable }
Cost optimizer parameters for parsing configuration.
enable: optional boolean
Use cost-optimized parsing for the document. May negatively impact parsing speed and quality.
disable_heuristics: optional boolean
Whether to disable heuristics like outlined table extraction and adaptive long table handling
ignore: optional object { ignore_diagonal_text, ignore_hidden_text, ignore_text_in_image }
Options for ignoring specific text types
ignore_diagonal_text: optional boolean
Whether to ignore diagonal text in the document
ignore_hidden_text: optional boolean
Whether to ignore hidden text in the document
ignore_text_in_image: optional boolean
Whether to ignore text that appears within images
ocr_parameters: optional object { languages }
OCR configuration parameters
languages: optional array
List of languages to use for OCR processing
specialized_chart_parsing: optional "agentic_plus" or "agentic" or "efficient"
Enable specialized chart parsing with the specified mode
source_url: optional string
Source URL to fetch document from
webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_url }
List of webhook configurations for notifications
webhook_events: optional array of string
List of events that trigger webhook notifications
webhook_headers: optional map[unknown]
Custom headers to include in webhook requests
webhook_url: optional string
Webhook URL for receiving parsing notifications
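The body parameters above can be assembled into a single JSON object. The sketch below builds one from the two required fields plus a document source. The helper name build_parse_body and the example URLs are hypothetical, and the rule that exactly one of file_id or source_url must be set is an assumption drawn from the endpoint summary ("by file ID or URL"). webhook_events is omitted because the event names are not listed on this page.

```python
import json
from typing import Optional


def build_parse_body(
    tier: str,
    version: str,
    file_id: Optional[str] = None,
    source_url: Optional[str] = None,
    **options,
) -> dict:
    """Assemble a request body; unset (None) options are left out."""
    # Assumption: the job needs exactly one document source.
    if (file_id is None) == (source_url is None):
        raise ValueError("provide exactly one of file_id or source_url")
    body = {"tier": tier, "version": version}
    if file_id is not None:
        body["file_id"] = file_id
    else:
        body["source_url"] = source_url
    body.update({key: value for key, value in options.items() if value is not None})
    return body


body = build_parse_body(
    "fast",
    "2026-01-08",
    source_url="https://example.com/report.pdf",  # placeholder URL
    disable_cache=True,
    webhook_configurations=[
        {
            "webhook_url": "https://example.com/hooks/parse",  # placeholder URL
            "webhook_headers": {"X-Request-Source": "docs-example"},
        }
    ],
)
print(json.dumps(body, indent=2))
```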
Returns
id: string
Unique identifier for the parse job
project_id: string
Project this job belongs to
status: "PENDING" or "RUNNING" or "COMPLETED" or 2 more
Current status of the job (e.g., pending, running, completed, failed, cancelled)
created_at: optional string
Creation datetime
error_message: optional string
Error message if job failed
name: optional string
User friendly name
updated_at: optional string
Update datetime
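A client usually needs to tell whether a returned job is still in progress. The sketch below does this using only the status values spelled out above (PENDING, RUNNING, COMPLETED). Anything else is treated as finished without success, so the two elided values do not need to be guessed. The helper names are hypothetical.

```python
IN_PROGRESS_STATUSES = {"PENDING", "RUNNING"}


def is_finished(job: dict) -> bool:
    """A job is finished once it is no longer pending or running."""
    return job["status"] not in IN_PROGRESS_STATUSES


def describe(job: dict) -> str:
    """Summarize a job object in one line."""
    status = job["status"]
    if status in IN_PROGRESS_STATUSES:
        return f"job {job['id']} is still {status}"
    if status == "COMPLETED":
        return f"job {job['id']} completed"
    # Any other status: report it, with error_message when present.
    detail = job.get("error_message") or "no error message"
    return f"job {job['id']} ended with status {status}: {detail}"


job = {"id": "id", "project_id": "project_id", "status": "PENDING"}
print(describe(job))  # job id is still PENDING
```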
Parse File
curl https://api.cloud.llamaindex.ai/api/v2/parse \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
    -d '{
          "tier": "fast",
          "version": "2026-01-08"
        }'
Returns Examples
{
  "id": "id",
  "project_id": "project_id",
  "status": "PENDING",
  "created_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "name": "name",
  "updated_at": "2019-12-27T18:11:19.117Z"
}
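The same request can be issued from Python using only the standard library. The sketch below mirrors the curl example: same URL, headers, and JSON body. The function names build_request and submit are hypothetical. Nothing is sent until submit is called, and the response is assumed to be a JSON object shaped like the example above.

```python
import json
import os
import urllib.request

PARSE_URL = "https://api.cloud.llamaindex.ai/api/v2/parse"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare a POST request equivalent to the curl example."""
    return urllib.request.Request(
        PARSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON job object from the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_request(
    {"tier": "fast", "version": "2026-01-08"},
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY", ""),
)
# job = submit(request)  # uncomment to perform the network call
```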