Parse File

parsing.create(**kwargs: ParsingCreateParams) -> ParsingCreateResponse
POST /api/v2/parse

Parse a file by file ID or URL.

Parameters
tier: Literal["fast", "cost_effective", "agentic", "agentic_plus"]

The parsing tier to use

Accepts one of the following:
"fast"
"cost_effective"
"agentic"
"agentic_plus"
version: Union[Literal["2026-01-08", "2025-12-31", "2025-12-18", 6 more], str]

Version of the tier configuration

Accepts one of the following:
Literal["2026-01-08", "2025-12-31", "2025-12-18", 6 more]

Version of the tier configuration

Accepts one of the following:
"2026-01-08"
"2025-12-31"
"2025-12-18"
"2025-12-11"
"2026-01-16"
"2026-01-21"
"2026-01-22"
"2026-01-24"
"latest"
str
organization_id: Optional[str]
project_id: Optional[str]
agentic_options: Optional[AgenticOptions]

Options for agentic tier parsing (with AI agents).

custom_prompt: Optional[str]

Custom prompt for AI-powered parsing

client_name: Optional[str]

Name of the client making the parsing request

crop_box: Optional[CropBox]

Document crop box boundaries

bottom: Optional[float]

Bottom boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
left: Optional[float]

Left boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
right: Optional[float]

Right boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
top: Optional[float]

Top boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
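
As a rough illustration of the ratio constraints above, the sketch below builds a `crop_box` dictionary and checks that each boundary lies between 0 and 1 before it is sent. The helper name `make_crop_box` is hypothetical and not part of the SDK; only the four field names and the 0-1 bounds come from this reference.

```python
from typing import Dict, Optional


def make_crop_box(
    top: Optional[float] = None,
    bottom: Optional[float] = None,
    left: Optional[float] = None,
    right: Optional[float] = None,
) -> Dict[str, float]:
    """Hypothetical helper: build a crop_box dict, keeping only the fields that were set."""
    box: Dict[str, float] = {}
    for name, value in (("top", top), ("bottom", bottom), ("left", left), ("right", right)):
        if value is None:
            continue  # every crop_box field is optional
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a ratio between 0 and 1, got {value}")
        box[name] = float(value)
    return box


crop_box = make_crop_box(top=0.1, bottom=0.1)
print(crop_box)
```

The resulting dictionary could then be passed as `crop_box=crop_box` to `parsing.create`.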
disable_cache: Optional[bool]

Whether to disable caching for this parsing job

fast_options: Optional[object]

Options for fast tier parsing (without AI).

file_id: Optional[str]

ID of an existing file in the project to parse

http_proxy: Optional[str]

HTTP proxy URL for network requests (only used with source_url)
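
The endpoint parses a file "by file ID or URL", so a caller will usually want to choose one source up front. The sketch below is one way to do that on the client side. The helper `source_kwargs` is hypothetical, and treating `file_id` and `source_url` as mutually exclusive is an assumption, not something this reference states.

```python
from typing import Any, Dict, Optional


def source_kwargs(
    file_id: Optional[str] = None,
    source_url: Optional[str] = None,
    http_proxy: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: select the document source for parsing.create."""
    # ASSUMPTION: exactly one of file_id / source_url should be supplied.
    if (file_id is None) == (source_url is None):
        raise ValueError("pass exactly one of file_id or source_url")
    if http_proxy is not None and source_url is None:
        # The reference says http_proxy is only used with source_url.
        raise ValueError("http_proxy only applies when source_url is set")
    kwargs = {"file_id": file_id, "source_url": source_url, "http_proxy": http_proxy}
    return {key: value for key, value in kwargs.items() if value is not None}


# Placeholder URL for illustration only.
print(source_kwargs(source_url="https://example.com/report.pdf"))
```

The returned dictionary can be unpacked into the call, for example `client.parsing.create(tier="fast", version="latest", **source_kwargs(file_id="..."))`.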

input_options: Optional[InputOptions]

Input format-specific parsing options

html: Optional[InputOptionsHTML]

HTML-specific parsing options

make_all_elements_visible: Optional[bool]

Make all HTML elements visible during parsing

remove_fixed_elements: Optional[bool]

Remove fixed position elements from HTML

remove_navigation_elements: Optional[bool]

Remove navigation elements from HTML

pdf: Optional[object]

PDF-specific parsing options

presentation: Optional[InputOptionsPresentation]

Presentation-specific parsing options

out_of_bounds_content: Optional[bool]

Extract out of bounds content in presentation slides

skip_embedded_data: Optional[bool]

Skip extraction of embedded data for charts in presentation slides

spreadsheet: Optional[InputOptionsSpreadsheet]

Spreadsheet-specific parsing options

detect_sub_tables_in_sheets: Optional[bool]

Detect and extract sub-tables within spreadsheet cells

force_formula_computation_in_sheets: Optional[bool]

Force re-computation of spreadsheet cells containing formulas

output_options: Optional[OutputOptions]

Output format and styling options

extract_printed_page_number: Optional[bool]

Extract printed page numbers from the document

images_to_save: Optional[List[Literal["screenshot", "embedded", "layout"]]]

Image categories to save: 'screenshot' (full page), 'embedded' (images in document), 'layout' (cropped images from layout detection). Empty list means no images are saved.

Accepts one of the following:
"screenshot"
"embedded"
"layout"
markdown: Optional[OutputOptionsMarkdown]

Markdown output formatting options

Add annotations to links in markdown output

inline_images: Optional[bool]

Instead of transcribing images, inline them in the markdown output

tables: Optional[OutputOptionsMarkdownTables]

Table formatting options for markdown

compact_markdown_tables: Optional[bool]

Use compact formatting for markdown tables

markdown_table_multiline_separator: Optional[str]

Separator for multiline content in markdown tables

merge_continued_tables: Optional[bool]

Merge tables that continue across or within pages. Affects markdown and items

output_tables_as_markdown: Optional[bool]

Output tables in markdown format

spatial_text: Optional[OutputOptionsSpatialText]

Spatial text output options

do_not_unroll_columns: Optional[bool]

Keep column structure intact without unrolling

preserve_layout_alignment_across_pages: Optional[bool]

Preserve text alignment across page boundaries

preserve_very_small_text: Optional[bool]

Include very small text in spatial output

tables_as_spreadsheet: Optional[OutputOptionsTablesAsSpreadsheet]

Table export as spreadsheet options

enable: Optional[bool]

Whether this option is enabled

guess_sheet_name: Optional[bool]

Automatically guess sheet names when exporting tables

page_ranges: Optional[PageRanges]

Page range selection options

max_pages: Optional[int]

Maximum number of pages to process

minimum: 1
target_pages: Optional[str]

Specific pages to process (e.g., '1,3,5-8') using 1-based indexing
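
To make the `target_pages` format concrete, here is a small local sketch that expands a string such as '1,3,5-8' into the 1-based page numbers it names. The function `expand_target_pages` is hypothetical and only illustrates the notation; how the service itself interprets edge cases (whitespace, overlapping ranges) is not specified here.

```python
from typing import List


def expand_target_pages(spec: str) -> List[int]:
    """Hypothetical helper: expand '1,3,5-8' into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(expand_target_pages("1,3,5-8"))  # [1, 3, 5, 6, 7, 8]

# The same string would be sent unchanged in the request:
page_ranges = {"target_pages": "1,3,5-8", "max_pages": 10}
```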

processing_control: Optional[ProcessingControl]

Job processing control and failure handling

job_failure_conditions: Optional[ProcessingControlJobFailureConditions]

Conditions that determine job failure

allowed_page_failure_ratio: Optional[float]

Maximum ratio of pages allowed to fail (0-1)

maximum: 1
exclusiveMinimum: 0
fail_on_buggy_font: Optional[bool]

Fail job if buggy fonts are detected

fail_on_image_extraction_error: Optional[bool]

Fail job if image extraction encounters errors

fail_on_image_ocr_error: Optional[bool]

Fail job if image OCR encounters errors

fail_on_markdown_reconstruction_error: Optional[bool]

Fail job if markdown reconstruction encounters errors

timeouts: Optional[ProcessingControlTimeouts]

Timeout configuration for parsing jobs

base_in_seconds: Optional[int]

Base timeout in seconds (max 30 minutes)

maximum: 1800
exclusiveMinimum: 0
extra_time_per_page_in_seconds: Optional[int]

Additional timeout per page in seconds (max 5 minutes)

maximum: 300
exclusiveMinimum: 0
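
The following sketch assembles a `processing_control` value and checks the documented timeout bounds locally. The helpers `make_timeouts` and `estimated_budget_seconds` are hypothetical. In particular, the formula "base plus per-page extra times page count" is an assumption drawn from the field descriptions; the reference does not spell out how the service combines the two values.

```python
from typing import Any, Dict


def make_timeouts(base_in_seconds: int, extra_time_per_page_in_seconds: int) -> Dict[str, int]:
    """Hypothetical helper: validate the documented bounds before sending."""
    if not 0 < base_in_seconds <= 1800:
        raise ValueError("base_in_seconds must be in (0, 1800]")
    if not 0 < extra_time_per_page_in_seconds <= 300:
        raise ValueError("extra_time_per_page_in_seconds must be in (0, 300]")
    return {
        "base_in_seconds": base_in_seconds,
        "extra_time_per_page_in_seconds": extra_time_per_page_in_seconds,
    }


def estimated_budget_seconds(timeouts: Dict[str, int], page_count: int) -> int:
    # ASSUMPTION: total budget = base + per-page extra * number of pages.
    return timeouts["base_in_seconds"] + timeouts["extra_time_per_page_in_seconds"] * page_count


timeouts = make_timeouts(base_in_seconds=600, extra_time_per_page_in_seconds=30)
processing_control: Dict[str, Any] = {
    "timeouts": timeouts,
    "job_failure_conditions": {"allowed_page_failure_ratio": 0.1},
}
print(estimated_budget_seconds(timeouts, page_count=20))  # 600 + 30 * 20 = 1200
```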
processing_options: Optional[ProcessingOptions]

Processing options shared across all tiers

aggressive_table_extraction: Optional[bool]

Whether to use aggressive table extraction

auto_mode_configuration: Optional[Iterable[ProcessingOptionsAutoModeConfiguration]]

Configuration for auto mode parsing with triggers and parsing options

parsing_conf: ProcessingOptionsAutoModeConfigurationParsingConf

Configuration for parsing in auto mode (V2 format).

This uses V2 API naming conventions. The backend service will convert these to the V1 format expected by the llamaparse worker.

adaptive_long_table: Optional[bool]

Whether to use adaptive long table handling

aggressive_table_extraction: Optional[bool]

Whether to use aggressive table extraction

crop_box: Optional[ProcessingOptionsAutoModeConfigurationParsingConfCropBox]

Crop box options for auto mode parsing configuration.

bottom: Optional[float]

Bottom boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
left: Optional[float]

Left boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
right: Optional[float]

Right boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
top: Optional[float]

Top boundary of crop box as ratio (0-1)

maximum: 1
minimum: 0
custom_prompt: Optional[str]

Custom prompt for AI-powered parsing

extract_layout: Optional[bool]

Whether to extract layout information

high_res_ocr: Optional[bool]

Whether to use high resolution OCR

ignore: Optional[ProcessingOptionsAutoModeConfigurationParsingConfIgnore]

Ignore options for auto mode parsing configuration.

ignore_diagonal_text: Optional[bool]

Whether to ignore diagonal text in the document

ignore_hidden_text: Optional[bool]

Whether to ignore hidden text in the document

language: Optional[str]

Primary language of the document

outlined_table_extraction: Optional[bool]

Whether to use outlined table extraction

presentation: Optional[ProcessingOptionsAutoModeConfigurationParsingConfPresentation]

Presentation-specific options for auto mode parsing configuration.

out_of_bounds_content: Optional[bool]

Extract out of bounds content in presentation slides

skip_embedded_data: Optional[bool]

Skip extraction of embedded data for charts in presentation slides

spatial_text: Optional[ProcessingOptionsAutoModeConfigurationParsingConfSpatialText]

Spatial text options for auto mode parsing configuration.

do_not_unroll_columns: Optional[bool]

Keep column structure intact without unrolling

preserve_layout_alignment_across_pages: Optional[bool]

Preserve text alignment across page boundaries

preserve_very_small_text: Optional[bool]

Include very small text in spatial output

specialized_chart_parsing: Optional[Literal["agentic_plus", "agentic", "efficient"]]

Enable specialized chart parsing with the specified mode

Accepts one of the following:
"agentic_plus"
"agentic"
"efficient"
tier: Optional[Literal["fast", "cost_effective", "agentic", "agentic_plus"]]

The parsing tier to use

Accepts one of the following:
"fast"
"cost_effective"
"agentic"
"agentic_plus"
version: Optional[Union[Literal["2026-01-08", "2025-12-31", "2025-12-18", 6 more], str, null]]

Version of the tier configuration

Accepts one of the following:
Literal["2026-01-08", "2025-12-31", "2025-12-18", 6 more]

Version of the tier configuration

Accepts one of the following:
"2026-01-08"
"2025-12-31"
"2025-12-18"
"2025-12-11"
"2026-01-16"
"2026-01-21"
"2026-01-22"
"2026-01-24"
"latest"
str
filename_match_glob: Optional[str]

Single glob pattern to match against filename

filename_match_glob_list: Optional[SequenceNotStr[str]]

List of glob patterns to match against filename

filename_regexp: Optional[str]

Regex pattern to match against filename

filename_regexp_mode: Optional[str]

Regex mode flags (e.g., 'i' for case-insensitive)

full_page_image_in_page: Optional[bool]

Trigger if page contains a full-page image (scanned page detection)

full_page_image_in_page_threshold: Optional[Union[float, str, null]]

Threshold for full page image detection (0.0-1.0, default 0.8)

Accepts one of the following:
float
str
image_in_page: Optional[bool]

Trigger if page contains non-screenshot images

layout_element_in_page: Optional[str]

Trigger if page contains this layout element type

layout_element_in_page_confidence_threshold: Optional[Union[float, str, null]]

Confidence threshold for layout element detection

Accepts one of the following:
float
str
page_contains_at_least_n_charts: Optional[Union[int, str, null]]

Trigger if page has more than N charts

Accepts one of the following:
int
str
page_contains_at_least_n_images: Optional[Union[int, str, null]]

Trigger if page has more than N images

Accepts one of the following:
int
str
page_contains_at_least_n_layout_elements: Optional[Union[int, str, null]]

Trigger if page has more than N layout elements

Accepts one of the following:
int
str
page_contains_at_least_n_lines: Optional[Union[int, str, null]]

Trigger if page has more than N lines

Accepts one of the following:
int
str

Trigger if page has more than N links

Accepts one of the following:
page_contains_at_least_n_numbers: Optional[Union[int, str, null]]

Trigger if page has more than N numeric words

Accepts one of the following:
int
str
page_contains_at_least_n_percent_numbers: Optional[Union[int, str, null]]

Trigger if page has more than N% numeric words

Accepts one of the following:
int
str
page_contains_at_least_n_tables: Optional[Union[int, str, null]]

Trigger if page has more than N tables

Accepts one of the following:
int
str
page_contains_at_least_n_words: Optional[Union[int, str, null]]

Trigger if page has more than N words

Accepts one of the following:
int
str
page_contains_at_most_n_charts: Optional[Union[int, str, null]]

Trigger if page has fewer than N charts

Accepts one of the following:
int
str
page_contains_at_most_n_images: Optional[Union[int, str, null]]

Trigger if page has fewer than N images

Accepts one of the following:
int
str
page_contains_at_most_n_layout_elements: Optional[Union[int, str, null]]

Trigger if page has fewer than N layout elements

Accepts one of the following:
int
str
page_contains_at_most_n_lines: Optional[Union[int, str, null]]

Trigger if page has fewer than N lines

Accepts one of the following:
int
str

Trigger if page has fewer than N links

Accepts one of the following:
page_contains_at_most_n_numbers: Optional[Union[int, str, null]]

Trigger if page has fewer than N numeric words

Accepts one of the following:
int
str
page_contains_at_most_n_percent_numbers: Optional[Union[int, str, null]]

Trigger if page has fewer than N% numeric words

Accepts one of the following:
int
str
page_contains_at_most_n_tables: Optional[Union[int, str, null]]

Trigger if page has fewer than N tables

Accepts one of the following:
int
str
page_contains_at_most_n_words: Optional[Union[int, str, null]]

Trigger if page has fewer than N words

Accepts one of the following:
int
str
page_longer_than_n_chars: Optional[Union[int, str, null]]

Trigger if page has more than N characters

Accepts one of the following:
int
str
page_md_error: Optional[bool]

Trigger on pages with markdown extraction errors

page_shorter_than_n_chars: Optional[Union[int, str, null]]

Trigger if page has fewer than N characters

Accepts one of the following:
int
str
regexp_in_page: Optional[str]

Regex pattern to match in page content

regexp_in_page_mode: Optional[str]

Regex mode flags for regexp_in_page

table_in_page: Optional[bool]

Trigger if page contains a table

text_in_page: Optional[str]

Trigger if page text/markdown contains this string

trigger_mode: Optional[str]

How to combine multiple trigger conditions: 'and' (all must match, default) or 'or' (any can match)

cost_optimizer: Optional[ProcessingOptionsCostOptimizer]

Cost optimizer parameters for parsing configuration.

enable: Optional[bool]

Use cost-optimized parsing for the document. May negatively impact parsing speed and quality.

disable_heuristics: Optional[bool]

Whether to disable heuristics like outlined table extraction and adaptive long table handling

ignore: Optional[ProcessingOptionsIgnore]

Options for ignoring specific text types

ignore_diagonal_text: Optional[bool]

Whether to ignore diagonal text in the document

ignore_hidden_text: Optional[bool]

Whether to ignore hidden text in the document

ignore_text_in_image: Optional[bool]

Whether to ignore text that appears within images

ocr_parameters: Optional[ProcessingOptionsOcrParameters]

OCR configuration parameters

languages: Optional[List[ParsingLanguages]]

List of languages to use for OCR processing

Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
specialized_chart_parsing: Optional[Literal["agentic_plus", "agentic", "efficient"]]

Enable specialized chart parsing with the specified mode

Accepts one of the following:
"agentic_plus"
"agentic"
"efficient"
source_url: Optional[str]

Source URL to fetch document from

webhook_configurations: Optional[Iterable[WebhookConfiguration]]

List of webhook configurations for notifications

webhook_events: Optional[SequenceNotStr[str]]

List of events that trigger webhook notifications

webhook_headers: Optional[Dict[str, object]]

Custom headers to include in webhook requests

webhook_url: Optional[str]

Webhook URL for receiving parsing notifications

Returns
class ParsingCreateResponse:

Response schema for a parse job.

id: str

Unique identifier for the parse job

project_id: str

Project this job belongs to

status: Literal["PENDING", "RUNNING", "COMPLETED", 2 more]

Current status of the job (e.g., pending, running, completed, failed, cancelled)

Accepts one of the following:
"PENDING"
"RUNNING"
"COMPLETED"
"FAILED"
"CANCELLED"
created_at: Optional[datetime]

Creation datetime

format: date-time
error_message: Optional[str]

Error message if job failed

name: Optional[str]

User friendly name

updated_at: Optional[datetime]

Update datetime

format: date-time
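
When handling the response, it can help to separate statuses that are still in flight from those that will not change. The sketch below works on a plain dictionary shaped like the response. Treating COMPLETED, FAILED, and CANCELLED as final states is an assumption based on their names, and `is_finished` and `summarize` are hypothetical helpers, not SDK functions.

```python
from typing import Any, Dict

# ASSUMPTION: these three statuses are final; PENDING and RUNNING are not.
TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "CANCELLED"})


def is_finished(status: str) -> bool:
    """Return True when the job status is one of the assumed final states."""
    return status in TERMINAL_STATUSES


def summarize(job: Dict[str, Any]) -> str:
    """Build a one-line, human-readable summary of a parse job."""
    status = job["status"]
    if status == "FAILED":
        reason = job.get("error_message") or "no error message"
        return f"job {job['id']} failed: {reason}"
    if is_finished(status):
        return f"job {job['id']} finished with status {status}"
    return f"job {job['id']} still in progress ({status})"


print(summarize({"id": "id", "project_id": "project_id", "status": "PENDING"}))
```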

Parse File

import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
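# Create a parse job. The URL below is a placeholder; pass file_id=... instead
# to parse a file that already exists in the project.
parsing = client.parsing.create(
    tier="fast",
    version="2026-01-08",
    source_url="https://example.com/document.pdf",
)
print(parsing.id)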
Example response
{
  "id": "id",
  "project_id": "project_id",
  "status": "PENDING",
  "created_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "name": "name",
  "updated_at": "2019-12-27T18:11:19.117Z"
}