# Parsing

## Parse File

`ParsingCreateResponse parsing().create(ParsingCreateParams params, RequestOptions requestOptions = RequestOptions.none())`

**post** `/api/v2/parse`

Parse a file by file ID or URL.

Provide either `file_id` (a previously uploaded file) or `source_url` (a publicly accessible URL). Configure parsing with options like `tier`, `target_pages`, and `lang`.

### Tiers

- `fast`: rule-based, cheapest, no AI
- `cost_effective`: balanced speed and quality
- `agentic`: full AI-powered parsing
- `agentic_plus`: premium AI with specialized features

The job runs asynchronously. Poll `GET /api/v2/parse/{job_id}` with `expand=text` or `expand=markdown` to retrieve results.

### Parameters

- `ParsingCreateParams params`
  - `Optional organizationId`
  - `Optional projectId`
  - `Tier tier` Parsing tier: 'fast' (rule-based, cheapest), 'cost_effective' (balanced), 'agentic' (AI-powered with custom prompts), or 'agentic_plus' (premium AI with highest accuracy)
    - `FAST("fast")`
    - `COST_EFFECTIVE("cost_effective")`
    - `AGENTIC("agentic")`
    - `AGENTIC_PLUS("agentic_plus")`
  - `Version version` Version for the selected tier. Use `latest`, or pin one of that tier's dated versions. Current `latest` by tier:
    - `fast`: `2025-12-11`
    - `cost_effective`: `2026-06-05`
    - `agentic`: `2026-06-04`
    - `agentic_plus`: `2026-06-04`

    Full list: `GET /api/v2/parse/versions`. Accepted values:

    - `LATEST("latest")`
    - `_2026_06_05("2026-06-05")`
    - `_2026_06_04("2026-06-04")`
    - `_2025_12_11("2025-12-11")`
  - `Optional agenticOptions` Options for AI-powered parsing tiers (cost_effective, agentic, agentic_plus). These options customize how the AI processes and interprets document content. Only applicable when using non-fast tiers.
    - `Optional customPrompt` Custom instructions for the AI parser. Use to guide extraction behavior, specify output formatting, or provide domain-specific context. Example: 'Extract financial tables with currency symbols. Format dates as YYYY-MM-DD.'
  - `Optional clientName` Identifier for the client/application making the request. Used for analytics and debugging. Example: 'my-app-v2'
  - `Optional cropBox` Crop boundaries to process only a portion of each page. Values are ratios 0-1 from page edges
    - `Optional bottom` Bottom boundary as ratio (0-1). 0=top edge, 1=bottom edge. Content below this line is excluded
    - `Optional left` Left boundary as ratio (0-1). 0=left edge, 1=right edge. Content left of this line is excluded
    - `Optional right` Right boundary as ratio (0-1). 0=left edge, 1=right edge. Content right of this line is excluded
    - `Optional top` Top boundary as ratio (0-1). 0=top edge, 1=bottom edge. Content above this line is excluded
  - `Optional disableCache` Bypass result caching and force re-parsing. Use when document content may have changed or you need fresh results
  - `Optional fastOptions` Options for fast tier parsing (rule-based, no AI). Fast tier uses deterministic algorithms for text extraction without AI enhancement. It's the fastest and most cost-effective option, best suited for simple documents with standard layouts. It currently has no configurable options and is reserved for future expansion.
  - `Optional fileId` ID of an existing file in the project to parse. Mutually exclusive with source_url
  - `Optional httpProxy` HTTP/HTTPS proxy for fetching source_url. Ignored if using file_id
  - `Optional inputOptions` Format-specific options (HTML, PDF, spreadsheet, presentation). Applied based on detected input file type
    - `Optional html` HTML/web page parsing options (applies to .html, .htm files)
      - `Optional makeAllElementsVisible` Force all HTML elements to be visible by overriding CSS display/visibility properties. Useful for parsing pages with hidden content or collapsed sections
      - `Optional removeFixedElements` Remove fixed-position elements (headers, footers, floating buttons) that appear on every page render
      - `Optional removeNavigationElements` Remove navigation elements (nav bars, sidebars, menus) to focus on main content
    - `Optional pdf` PDF-specific parsing options (applies to .pdf files)
    - `Optional presentation` Presentation parsing options (applies to .pptx, .ppt, .odp, .key files)
      - `Optional outOfBoundsContent` Extract content positioned outside the visible slide area. Some presentations have hidden notes or content that extends beyond slide boundaries
      - `Optional skipEmbeddedData` Skip extraction of embedded chart data tables. When true, only the visual representation of charts is captured, not the underlying data
    - `Optional spreadsheet` Spreadsheet parsing options (applies to .xlsx, .xls, .csv, .ods files)
      - `Optional detectSubTablesInSheets` Detect and extract multiple tables within a single sheet. Useful when spreadsheets contain several data regions separated by blank rows/columns
      - `Optional forceFormulaComputationInSheets` Compute formula results instead of extracting formula text. Use when you need calculated values rather than formula definitions
      - `Optional includeHiddenSheets` Parse hidden sheets in addition to visible ones. By default, hidden sheets are skipped
  - `Optional outputOptions` Output formatting options for markdown, text, and extracted images
    - `Optional> additionalOutputs` Optional additional output artifacts to save alongside the primary parse output. Each value opts in to generating and persisting one extra file; the empty list (default) saves none. The three accepted values are:
      - 'stripped_md': per-page markdown stripped of formatting (links, bold/italic, images, HTML), saved as JSON for full-text-search indexing; fetch via `expand=stripped_markdown_content_metadata`.
      - 'concatenated_stripped_txt': all stripped pages concatenated into a single plain-text file with `\n\n---\n\n` between pages, useful for feeding the document into search or embedding pipelines as one blob; fetch via `expand=concatenated_stripped_markdown_content_metadata`.
      - 'word_bbox': raw word-level bounding boxes (one JSON object per word, with page number and x/y/w/h coordinates) saved as JSONL, useful for highlighting or grounding extracted answers back to the source document; fetch via `expand=raw_words_content_metadata`.
    - `Optional extractPrintedPageNumber` Extract the printed page number as it appears in the document (e.g., 'Page 5 of 10', 'v', 'A-3'). Useful for referencing original page numbers
    - `Optional> granularBboxes` Bounding-box granularity levels to compute for the parse. 'word' computes one bounding box per detected word; 'line' computes one per text line; 'cell' computes one per table cell. Multiple levels can be requested. An empty list (the default) disables granular bboxes; only item-level layout boxes are returned on the result. When set, the computed boxes are not inlined on the result items; they are written to a separate `grounded_items` sidecar (JSONL, one row per page) and exposed as `result_content_metadata.grounded_items` (a presigned download URL) on the parse result. Each row matches the `GroundedJsonItem` shape.
      - `CELL("cell")`
      - `LINE("line")`
      - `WORD("word")`
    - `Optional> imagesToSave` Image categories to extract and save. Options: 'screenshot' (full page renders useful for visual QA), 'embedded' (images found within the document), 'layout' (cropped regions from layout detection like figures and diagrams). Empty list saves no images
      - `SCREENSHOT("screenshot")`
      - `EMBEDDED("embedded")`
      - `LAYOUT("layout")`
    - `Optional markdown` Markdown formatting options including table styles and link annotations
      - `Optional annotateLinks` Add link annotations to markdown output in the format `[text](url)`. When false, only the link text is included
      - `Optional inlineImages` Embed images directly in markdown as base64 data URIs instead of extracting them as separate files. Useful for self-contained markdown output
      - `Optional tables` Table formatting options including markdown vs HTML format and merging behavior
        - `Optional compactMarkdownTables` Remove extra whitespace padding in markdown table cells for more compact output
        - `Optional markdownTableMultilineSeparator` Separator string for multiline cell content in markdown tables. Example: `'<br>'` to preserve line breaks, `' '` to join with spaces
        - `Optional mergeContinuedTables` Automatically merge tables that span multiple pages into a single table. The merged table appears on the first page with merged_from_pages metadata
        - `Optional outputTablesAsMarkdown` Output tables as markdown pipe tables instead of HTML tags. Markdown tables are simpler but cannot represent complex structures like merged cells
    - `Optional spatialText` Spatial text output options for preserving document layout structure
      - `Optional doNotUnrollColumns` Keep multi-column layouts intact instead of linearizing columns into sequential text. Automatically enabled for non-fast tiers
      - `Optional preserveLayoutAlignmentAcrossPages` Maintain consistent text column alignment across page boundaries. Automatically enabled for document-level parsing modes
      - `Optional preserveVerySmallText` Include text below the normal size threshold. Useful for footnotes, watermarks, or fine print that might otherwise be filtered out
    - `Optional tablesAsSpreadsheet` Options for exporting tables as XLSX spreadsheets
      - `Optional enable` Whether this option is enabled
      - `Optional guessSheetName` Automatically generate descriptive sheet names from table context (headers, surrounding text) instead of using generic names like 'Table_1'
  - `Optional pageRanges` Page selection: limit total pages or specify exact pages to process
    - `Optional maxPages` Maximum number of pages to process. Pages are processed in order starting from page 1. If both max_pages and target_pages are set, target_pages takes precedence
    - `Optional targetPages` Comma-separated list of specific pages to process using 1-based indexing. Supports individual pages and ranges. Examples: '1,3,5' (pages 1, 3, 5), '1-5' (pages 1 through 5 inclusive), '1,3,5-8,10' (pages 1, 3, 5-8, and 10). Pages are sorted and deduplicated automatically. Duplicate pages cause an error
  - `Optional processingControl` Job execution controls including timeouts and failure thresholds
    - `Optional jobFailureConditions` Quality thresholds that determine when a job should fail vs complete with partial results
      - `Optional allowedPageFailureRatio` Maximum ratio of pages allowed to fail before the job fails (0-1). Example: 0.1 means the job fails if more than 10% of pages fail. Default is 0.05 (5%)
      - `Optional failOnBuggyFont` Fail the job if a problematic font is detected that may cause incorrect text extraction. Buggy fonts can produce garbled or missing characters
      - `Optional failOnImageExtractionError` Fail the entire job if any embedded image cannot be extracted. By default, image extraction errors are logged but don't fail the job
      - `Optional failOnImageOcrError` Fail the entire job if OCR fails on any image. By default, OCR errors result in empty text for that image
      - `Optional failOnMarkdownReconstructionError` Fail the entire job if markdown cannot be reconstructed for any page. By default, failed pages use fallback text extraction
    - `Optional timeouts` Timeout settings for job execution. Increase for large or complex documents
      - `Optional baseInSeconds` Base timeout for the job in seconds (max 7200 = 2 hours). This is the minimum time allowed regardless of document size
      - `Optional extraTimePerPageInSeconds` Additional timeout per page in seconds (max 300 = 5 minutes). Total timeout = base + (this value × page count)
  - `Optional processingOptions` Document processing options including OCR, table extraction, and chart parsing
    - `Optional aggressiveTableExtraction` Use aggressive heuristics to detect table boundaries, even without visible borders. Useful for documents with borderless or complex tables
    - `Optional> autoModeConfiguration` Conditional processing rules that apply different parsing options based on page content, document structure, or filename patterns. Each entry defines trigger conditions and the parsing configuration to apply when triggered
      - `ParsingConf parsingConf` Parsing configuration to apply when trigger conditions are met
        - `Optional adaptiveLongTable` Whether to use adaptive long table handling
        - `Optional aggressiveTableExtraction` Whether to use aggressive table extraction
        - `Optional cropBox` Crop box options for auto mode parsing configuration.
          - `Optional bottom` Bottom boundary of crop box as ratio (0-1)
          - `Optional left` Left boundary of crop box as ratio (0-1)
          - `Optional right` Right boundary of crop box as ratio (0-1)
          - `Optional top` Top boundary of crop box as ratio (0-1)
        - `Optional customPrompt` Custom AI instructions for matched pages. Overrides the base custom_prompt
        - `Optional extractLayout` Whether to extract layout information
        - `Optional highResOcr` Whether to use high resolution OCR
        - `Optional ignore` Ignore options for auto mode parsing configuration.
          - `Optional ignoreDiagonalText` Whether to ignore diagonal text in the document
          - `Optional ignoreHiddenText` Whether to ignore hidden text in the document
        - `Optional language` Primary language of the document
        - `Optional outlinedTableExtraction` Whether to use outlined table extraction
        - `Optional presentation` Presentation-specific options for auto mode parsing configuration.
          - `Optional outOfBoundsContent` Extract out of bounds content in presentation slides
          - `Optional skipEmbeddedData` Skip extraction of embedded data for charts in presentation slides
        - `Optional spatialText` Spatial text options for auto mode parsing configuration.
          - `Optional doNotUnrollColumns` Keep column structure intact without unrolling
          - `Optional preserveLayoutAlignmentAcrossPages` Preserve text alignment across page boundaries
          - `Optional preserveVerySmallText` Include very small text in spatial output
        - `Optional specializedChartParsing` Enable specialized chart parsing with the specified mode
          - `AGENTIC_PLUS("agentic_plus")`
          - `AGENTIC("agentic")`
          - `EFFICIENT("efficient")`
        - `Optional tier` Override the parsing tier for matched pages. Must be paired with version
          - `FAST("fast")`
          - `COST_EFFECTIVE("cost_effective")`
          - `AGENTIC("agentic")`
          - `AGENTIC_PLUS("agentic_plus")`
        - `Optional version` Version for the override tier. Required when `tier` is set. Use `latest`, or pin one of that tier's dated versions. Current `latest` by tier:
          - `fast`: `2025-12-11`
          - `cost_effective`: `2026-06-05`
          - `agentic`: `2026-06-04`
          - `agentic_plus`: `2026-06-04`

          Full list: `GET /api/v2/parse/versions`. Accepted values:

          - `LATEST("latest")`
          - `_2026_06_05("2026-06-05")`
          - `_2026_06_04("2026-06-04")`
          - `_2025_12_11("2025-12-11")`
      - `Optional filenameMatchGlob` Single glob pattern to match against filename
      - `Optional> filenameMatchGlobList` List of glob patterns to match against filename
      - `Optional filenameRegexp` Regex pattern to match against filename
      - `Optional filenameRegexpMode` Regex mode flags (e.g., 'i' for case-insensitive)
      - `Optional fullPageImageInPage` Trigger if page contains a full-page image (scanned page detection)
      - `Optional fullPageImageInPageThreshold` Threshold for full page image detection (0.0-1.0, default 0.8)
        - `double`
        - `String`
      - `Optional imageInPage` Trigger if page contains non-screenshot images
      - `Optional layoutElementInPage` Trigger if page contains this layout element type
      - `Optional layoutElementInPageConfidenceThreshold` Confidence threshold for layout element detection
        - `double`
        - `String`
      - `Optional pageContainsAtLeastNCharts` Trigger if page has more than N charts
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNImages` Trigger if page has more than N images
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNLayoutElements` Trigger if page has more than N layout elements
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNLines` Trigger if page has more than N lines
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNLinks` Trigger if page has more than N links
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNNumbers` Trigger if page has more than N numeric words
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNPercentNumbers` Trigger if page has more than N% numeric words
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNTables` Trigger if page has more than N tables
        - `long`
        - `String`
      - `Optional pageContainsAtLeastNWords` Trigger if page has more than N words
        - `long`
        - `String`
      - `Optional pageContainsAtMostNCharts` Trigger if page has fewer than N charts
        - `long`
        - `String`
      - `Optional pageContainsAtMostNImages` Trigger if page has fewer than N images
        - `long`
        - `String`
      - `Optional pageContainsAtMostNLayoutElements` Trigger if page has fewer than N layout elements
        - `long`
        - `String`
      - `Optional pageContainsAtMostNLines` Trigger if page has fewer than N lines
        - `long`
        - `String`
      - `Optional pageContainsAtMostNLinks` Trigger if page has fewer than N links
        - `long`
        - `String`
      - `Optional pageContainsAtMostNNumbers` Trigger if page has fewer than N numeric words
        - `long`
        - `String`
      - `Optional pageContainsAtMostNPercentNumbers` Trigger if page has fewer than N% numeric words
        - `long`
        - `String`
      - `Optional pageContainsAtMostNTables` Trigger if page has fewer than N tables
        - `long`
        - `String`
      - `Optional pageContainsAtMostNWords` Trigger if page has fewer than N words
        - `long`
        - `String`
      - `Optional pageLongerThanNChars` Trigger if page has more than N characters
        - `long`
        - `String`
      - `Optional pageMdError` Trigger on pages with markdown extraction errors
      - `Optional pageShorterThanNChars` Trigger if page has fewer than N characters
        - `long`
        - `String`
      - `Optional regexpInPage` Regex pattern to match in page content
      - `Optional regexpInPageMode` Regex mode flags for regexp_in_page
      - `Optional tableInPage` Trigger if page contains a table
      - `Optional textInPage` Trigger if page text/markdown contains this string
      - `Optional triggerMode` How to combine multiple trigger conditions: 'and' (all conditions must match, this is the default) or 'or' (any single condition can trigger)
    - `Optional costOptimizer` Cost optimizer configuration for reducing parsing costs on simpler pages. When enabled, the parser analyzes each page and routes simpler pages to faster, cheaper processing while preserving quality for complex pages. Only works with 'agentic' or 'agentic_plus' tiers.
      - `Optional enable` Enable cost-optimized parsing. Routes simpler pages to faster processing while complex pages use full AI analysis. May reduce speed on some documents. IMPORTANT: Only available with 'agentic' or 'agentic_plus' tiers
    - `Optional disableHeuristics` Disable automatic heuristics including outlined table extraction and adaptive long table handling. Use when heuristics produce incorrect results
    - `Optional ignore` Options for ignoring specific text types (diagonal, hidden, text in images)
      - `Optional ignoreDiagonalText` Skip text rotated at an angle (not horizontal/vertical). Useful for ignoring watermarks or decorative angled text
      - `Optional ignoreHiddenText` Skip text marked as hidden in the document structure. Some PDFs contain invisible text layers used for accessibility or search indexing
      - `Optional ignoreTextInImage` Skip OCR text extraction from embedded images. Use when images contain irrelevant text (watermarks, logos) that shouldn't be in the output
    - `Optional ocrParameters` OCR configuration including language detection settings
      - `Optional> languages` Languages to use for OCR text recognition. Specify multiple languages if the document contains mixed-language content. Order matters: put the primary language first. Example: ['en', 'es'] for English with Spanish
        - `AF("af")`
        - `AZ("az")`
        - `BS("bs")`
        - `CS("cs")`
        - `CY("cy")`
        - `DA("da")`
        - `DE("de")`
        - `EN("en")`
        - `ES("es")`
        - `ET("et")`
        - `FR("fr")`
        - `GA("ga")`
        - `HR("hr")`
        - `HU("hu")`
        - `ID("id")`
        - `IS("is")`
        - `IT("it")`
        - `KU("ku")`
        - `LA("la")`
        - `LT("lt")`
        - `LV("lv")`
        - `MI("mi")`
        - `MS("ms")`
        - `MT("mt")`
        - `NL("nl")`
        - `NO("no")`
        - `OC("oc")`
        - `PI("pi")`
        - `PL("pl")`
        - `PT("pt")`
        - `RO("ro")`
        - `RS_LATIN("rs_latin")`
        - `SK("sk")`
        - `SL("sl")`
        - `SQ("sq")`
        - `SV("sv")`
        - `SW("sw")`
        - `TL("tl")`
        - `TR("tr")`
        - `UZ("uz")`
        - `VI("vi")`
        - `AR("ar")`
        - `FA("fa")`
        - `UG("ug")`
        - `UR("ur")`
        - `BN("bn")`
        - `AS("as")`
        - `MNI("mni")`
        - `RU("ru")`
        - `RS_CYRILLIC("rs_cyrillic")`
        - `BE("be")`
        - `BG("bg")`
        - `UK("uk")`
        - `MN("mn")`
        - `ABQ("abq")`
        - `ADY("ady")`
        - `KBD("kbd")`
        - `AVA("ava")`
        - `DAR("dar")`
        - `INH("inh")`
        - `CHE("che")`
        - `LBE("lbe")`
        - `LEZ("lez")`
        - `TAB("tab")`
        - `TJK("tjk")`
        - `HI("hi")`
        - `MR("mr")`
        - `NE("ne")`
        - `BH("bh")`
        - `MAI("mai")`
        - `ANG("ang")`
        - `BHO("bho")`
        - `MAH("mah")`
        - `SCK("sck")`
        - `NEW("new")`
        - `GOM("gom")`
        - `SA("sa")`
        - `BGC("bgc")`
        - `TH("th")`
        - `CH_SIM("ch_sim")`
        - `CH_TRA("ch_tra")`
        - `JA("ja")`
        - `KO("ko")`
        - `TA("ta")`
        - `TE("te")`
        - `KN("kn")`
    - `Optional specializedChartParsing` Enable AI-powered chart analysis. Modes: 'efficient' (fast, lower cost), 'agentic' (balanced), 'agentic_plus' (highest accuracy). Automatically enables extract_layout and precise_bounding_box when set
      - `AGENTIC_PLUS("agentic_plus")`
      - `AGENTIC("agentic")`
      - `EFFICIENT("efficient")`
  - `Optional sourceUrl` Public URL of the document to parse. Mutually exclusive with file_id
  - `Optional> webhookConfigurations` Webhook endpoints for job status notifications. Multiple webhooks can be configured for different events or services
    - `Optional> webhookEvents` Events that trigger this webhook. Options: 'parse.success' (job completed), 'parse.error' (job failed), 'parse.partial_success' (some pages failed), 'parse.pending', 'parse.running', 'parse.cancelled'. If not specified, the webhook fires for all events
    - `Optional webhookHeaders` Custom HTTP headers to include in webhook requests. Use for authentication tokens or custom routing. Example: {'Authorization': 'Bearer xyz'}
    - `Optional webhookOutputFormat` Format of the webhook payload body. 'string' (default) sends the payload as a JSON-encoded string; 'json' sends it as a JSON object.
      - `STRING("string")`
      - `JSON("json")`
    - `Optional webhookUrl` HTTPS URL to receive webhook POST requests. Must be publicly accessible

### Returns

- `class ParsingCreateResponse:` A parse job.
  - `String id` Unique parse job identifier
  - `String projectId` Project this job belongs to
  - `Status status` Current job status: PENDING, RUNNING, COMPLETED, FAILED, or CANCELLED
    - `PENDING("PENDING")`
    - `RUNNING("RUNNING")`
    - `COMPLETED("COMPLETED")`
    - `FAILED("FAILED")`
    - `CANCELLED("CANCELLED")`
  - `Optional createdAt` Creation datetime
  - `Optional errorMessage` Error details when status is FAILED
  - `Optional name` Optional display name for this parse job
  - `Optional tier` Parsing tier used for this job
  - `Optional updatedAt` Update datetime

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.parsing.ParsingCreateParams;
import com.llamacloud_prod.api.models.parsing.ParsingCreateResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        // Build a client configured from environment variables.
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // Selects only the tier and version; a real request also supplies
        // either file_id or source_url, as described above.
        ParsingCreateParams params = ParsingCreateParams.builder()
            .tier(ParsingCreateParams.Tier.FAST)
            .version(ParsingCreateParams.Version.LATEST)
            .build();

        // Returns as soon as the job is created; poll the Get Parse Job endpoint for results.
        ParsingCreateResponse parsing = client.parsing().create(params);
    }
}
```

#### Response
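The create call returns as soon as the job is queued, so the `status` in this response (see the sample below) typically starts at `PENDING`, and results have to be fetched later from `GET /api/v2/parse/{job_id}`. The following Java sketch shows one way to wait for a terminal status. It is an illustration under stated assumptions, not SDK code: `ParseJobPoller` and its local `Status` enum are hypothetical (the enum only mirrors the documented status values), the `Supplier<Status>` stands in for whatever wraps a status lookup such as `client.parsing().get(...)`, and the exponential backoff policy is an arbitrary choice that this API does not prescribe.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical polling helper for illustration; not part of the LlamaCloud SDK. */
final class ParseJobPoller {
    /** Local mirror of the documented job statuses (assumption: not the SDK's own enum). */
    enum Status { PENDING, RUNNING, COMPLETED, FAILED, CANCELLED }

    private ParseJobPoller() {}

    /** COMPLETED, FAILED, and CANCELLED are final; PENDING and RUNNING mean keep waiting. */
    static boolean isTerminal(Status status) {
        return status == Status.COMPLETED || status == Status.FAILED || status == Status.CANCELLED;
    }

    /**
     * Calls fetchStatus until the job reaches a terminal status, doubling the wait
     * between calls up to maxDelay. Throws if the job is still not finished after
     * maxAttempts calls.
     */
    static Status pollUntilDone(Supplier<Status> fetchStatus, Duration initialDelay,
                                Duration maxDelay, int maxAttempts) {
        long delayMs = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Status status = fetchStatus.get();
            if (isTerminal(status)) {
                return status;
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the last allowed poll
            }
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("interrupted while polling", e);
            }
            delayMs = Math.min(delayMs * 2, maxDelay.toMillis());
        }
        throw new IllegalStateException("job not finished after " + maxAttempts + " polls");
    }

    public static void main(String[] args) {
        // Simulated status sequence standing in for repeated GET /api/v2/parse/{job_id} calls.
        Iterator<Status> simulated =
            List.of(Status.PENDING, Status.RUNNING, Status.COMPLETED).iterator();
        Status result = pollUntilDone(simulated::next, Duration.ofMillis(10), Duration.ofMillis(40), 10);
        System.out.println(result); // prints COMPLETED
    }
}
```

Capping the backoff keeps long jobs from being polled aggressively without making short jobs wait long; the delays and attempt limit here are placeholders and would reasonably be tuned against the job's configured `timeouts`.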
```json
{
  "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "PENDING",
  "created_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "name": "Q4 Financial Report",
  "tier": "fast",
  "updated_at": "2019-12-27T18:11:19.117Z"
}
```

## Get Parse Job

`ParsingGetResponse parsing().get(ParsingGetParams params = ParsingGetParams.none(), RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v2/parse/{job_id}`

Retrieve a parse job with optional expanded content.

By default, only job metadata is returned. Use `expand` to include parsed content:

- `text`: plain text output
- `markdown`: markdown output
- `items`: structured page-by-page output
- `job_metadata`: usage and processing details

Content metadata fields (e.g. `text_content_metadata`) return presigned URLs for downloading large results.

### Parameters

- `ParsingGetParams params`
  - `Optional jobId`
  - `Optional> expand` Fields to include: text, markdown, items, metadata, job_metadata, text_content_metadata, markdown_content_metadata, items_content_metadata, metadata_content_metadata, raw_words_content_metadata, xlsx_content_metadata, output_pdf_content_metadata, images_content_metadata. Metadata fields include presigned URLs.
  - `Optional imageFilenames` Filter to specific image filenames (optional). Example: image_0.png,image_1.jpg
  - `Optional organizationId`
  - `Optional projectId`

### Returns

- `class ParsingGetResponse:` Parse result response with job status and optional content or metadata. The job field is always included. Other fields are included based on expand parameters.
  - `Job job` Parse job status and metadata
    - `String id` Unique parse job identifier
    - `String projectId` Project this job belongs to
    - `Status status` Current job status: PENDING, RUNNING, COMPLETED, FAILED, or CANCELLED
      - `PENDING("PENDING")`
      - `RUNNING("RUNNING")`
      - `COMPLETED("COMPLETED")`
      - `FAILED("FAILED")`
      - `CANCELLED("CANCELLED")`
    - `Optional createdAt` Creation datetime
    - `Optional errorMessage` Error details when status is FAILED
    - `Optional name` Optional display name for this parse job
    - `Optional tier` Parsing tier used for this job
    - `Optional updatedAt` Update datetime
  - `Optional imagesContentMetadata` Metadata for all extracted images.
    - `List images` List of image metadata with presigned URLs
      - `String filename` Image filename (e.g., 'image_0.png')
      - `long index` Index of the image in the extraction order
      - `Optional bbox` Bounding box for an image on its page.
        - `long h` Height of the bounding box
        - `long w` Width of the bounding box
        - `long x` X coordinate of the bounding box
        - `long y` Y coordinate of the bounding box
      - `Optional category` Image category: 'screenshot' (full page), 'embedded' (images in document), or 'layout' (cropped from layout detection)
        - `SCREENSHOT("screenshot")`
        - `EMBEDDED("embedded")`
        - `LAYOUT("layout")`
      - `Optional contentType` MIME type of the image
      - `Optional presignedUrl` Presigned URL to download the image
      - `Optional sizeBytes` Deprecated: always returns None. Will be removed in a future release.
    - `long totalCount` Total number of extracted images
  - `Optional items` Structured JSON result (if requested)
    - `List pages` List of structured pages or failed page entries
      - `class StructuredResultPage:`
        - `List items` List of structured items on the page
          - `class TextItem:`
            - `String md` Markdown representation preserving formatting
            - `String value` Text content
            - `Optional> bbox` List of bounding boxes
              - `double h` Height of the bounding box
              - `double w` Width of the bounding box
              - `double x` X coordinate of the bounding box
              - `double y` Y coordinate of the bounding box
              - `Optional confidence` Confidence score
              - `Optional endIndex` End index in the text
              - `Optional label` Label for the bounding box
              - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated.
              - `Optional startIndex` Start index in the text
            - `Optional type` Text item type
              - `TEXT("text")`
          - `class HeadingItem:`
            - `long level` Heading level (1-6)
            - `String md` Markdown representation preserving formatting
            - `String value` Heading text content
            - `Optional> bbox` List of bounding boxes
              - `double h` Height of the bounding box
              - `double w` Width of the bounding box
              - `double x` X coordinate of the bounding box
              - `double y` Y coordinate of the bounding box
              - `Optional confidence` Confidence score
              - `Optional endIndex` End index in the text
              - `Optional label` Label for the bounding box
              - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated.
              - `Optional startIndex` Start index in the text
            - `Optional type` Heading item type
              - `HEADING("heading")`
          - `class ListItem:`
            - `List items` List of nested text or list items
              - `class TextItem:`
              - `class ListItem:`
            - `String md` Markdown representation preserving formatting
            - `boolean ordered` Whether the list is ordered or unordered
            - `Optional> bbox` List of bounding boxes
              - `double h` Height of the bounding box
              - `double w` Width of the bounding box
              - `double x` X coordinate of the bounding box
              - `double y` Y coordinate of the bounding box
              - `Optional confidence` Confidence score
              - `Optional endIndex` End index in the text
              - `Optional label` Label for the bounding box
              - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated.
              - `Optional startIndex` Start index in the text
            - `Optional type` List item type
              - `LIST("list")`
          - `class CodeItem:`
            - `String md` Markdown representation preserving formatting
            - `String value` Code content
            - `Optional> bbox` List of bounding boxes
              - `double h` Height of the bounding box
              - `double w` Width of the bounding box
              - `double x` X coordinate of the bounding box
              - `double y` Y coordinate of the bounding box
              - `Optional confidence` Confidence score
              - `Optional endIndex` End index in the text
              - `Optional label` Label for the bounding box
              - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated.
              - `Optional startIndex` Start index in the text
            - `Optional language` Programming language identifier
            - `Optional type` Code block item type
              - `CODE("code")`
          - `class TableItem:`
            - `String csv` CSV representation of the table
            - `String html` HTML representation of the table
            - `String md` Markdown representation preserving formatting
            - `List> rows` Table data as array of arrays (string, number, or null)
              - `String`
              - `double`
            - `Optional> bbox` List of bounding boxes
              - `double h` Height of the bounding box
              - `double w` Width of the bounding box
              - `double x` X coordinate of the bounding box
              - `double y` Y coordinate of the bounding box
              - `Optional confidence` Confidence score
              - `Optional endIndex` End index in the text
              - `Optional label` Label for the bounding box
              - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated.
              - `Optional startIndex` Start index in the text
            - `Optional> mergedFromPages` List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4])
            - `Optional mergedIntoPage` Populated when merged into another table. Page number where the full merged table begins (used on empty tables).
            - `Optional> parseConcerns` Quality concerns detected during table extraction, indicating the table may have issues
              - `String details` Human-readable details about the concern
              - `String type` Type of parse concern (e.g. header_value_type_mismatch, inconsistent_row_cell_count)
            - `Optional type` Table item type
              - `TABLE("table")`
          - `class ImageItem:`
            - `String caption` Image caption
            - `String md` Markdown representation preserving formatting
            - `String url` URL to the image
            - `Optional> bbox` List of bounding boxes
              - `double h` Height of the bounding box
              - `double w` Width of the bounding box
              - `double x` X coordinate of the bounding box
              - `double y` Y coordinate of the bounding box
              - `Optional confidence` Confidence score
              - `Optional endIndex` End index in the text
              - `Optional label` Label for the bounding box
              - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated.
              - `Optional startIndex` Start index in the text
            - `Optional type` Image item type
              - `IMAGE("image")`
          - `class LinkItem:`
            - `String md` Markdown representation preserving formatting
            - `String text` Display text of the link
            - `String url` URL of the link
            - `Optional> bbox` List of bounding boxes
              - `double h` Height of the bounding box
              - `double w` Width of the bounding box
              - `double x` X coordinate of the bounding box
              - `double y` Y coordinate of the bounding box
              - `Optional confidence` Confidence score
              - `Optional endIndex` End index in the text
              - `Optional label` Label for the bounding box
              - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated.
- `Optional startIndex` Start index in the text - `Optional type` Link item type - `LINK("link")` - `class HeaderItem:` - `List items` List of items within the header - `class TextItem:` - `class HeadingItem:` - `class ListItem:` - `class CodeItem:` - `class TableItem:` - `class ImageItem:` - `class LinkItem:` - `String md` Markdown representation preserving formatting - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Page header container - `HEADER("header")` - `class FooterItem:` - `List items` List of items within the footer - `class TextItem:` - `class HeadingItem:` - `class ListItem:` - `class CodeItem:` - `class TableItem:` - `class ImageItem:` - `class LinkItem:` - `String md` Markdown representation preserving formatting - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional type` Page footer container - `FOOTER("footer")` - `double pageHeight` Height of the page in points - `long pageNumber` Page number of the document - `double pageWidth` Width of the page in points - `JsonValue success` Success indicator (constant `true`) - `TRUE(true)` - `class FailedStructuredPage:` - `String error` Error message describing the failure - `long pageNumber` Page number of the document - `JsonValue success` Failure indicator (constant `false`) - `FALSE(false)` - `Optional jobMetadata` Job execution metadata (if requested) - `Optional markdown` Markdown result (if requested) - `List pages` List of markdown pages or failed page entries - `class MarkdownResultPage:` - `String markdown` Markdown content of the page - `long pageNumber` Page number of the document - `JsonValue success` Success indicator (constant `true`) - `TRUE(true)` - `Optional footer` Footer of the page in markdown - `Optional header` Header of the page in markdown - `class FailedMarkdownPage:` - `String error` Error message describing the failure - `long pageNumber` Page number of the document - `JsonValue success` Failure indicator (constant `false`) - `FALSE(false)` - `Optional markdownFull` Full raw markdown content (if requested) - `Optional metadata` Result containing metadata (page-level and general) for the parsed document.
- `List pages` List of page metadata entries - `long pageNumber` Page number of the document - `Optional confidence` Confidence score for parsing the page (0-1) - `Optional costOptimized` Whether cost-optimized parsing was used for the page - `Optional originalOrientationAngle` Original orientation angle of the page in degrees - `Optional printedPageNumber` Printed page number as it appears in the document - `Optional slideSectionName` Section name from presentation slides - `Optional speakerNotes` Speaker notes from presentation slides - `Optional triggeredAutoMode` Whether auto mode was triggered for the page - `Optional rawParameters` - `Optional resultContentMetadata` Metadata including size, existence, and presigned URLs for result files - `long sizeBytes` Size of the result file in bytes - `Optional exists` Whether the result file exists in S3 - `Optional presignedUrl` Presigned URL to download the result file - `Optional text` Plain text result (if requested) - `List pages` List of text pages - `long pageNumber` Page number of the document - `String text` Plain text content of the page - `Optional textFull` Full raw text content (if requested)

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.parsing.ParsingGetParams;
import com.llamacloud_prod.api.models.parsing.ParsingGetResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        ParsingGetResponse parsing = client.parsing().get("job_id");
    }
}
```

#### Response

```json
{
  "job": {
    "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "status": "PENDING",
    "created_at": "2019-12-27T18:11:19.117Z",
    "error_message": "error_message",
    "name": "Q4 Financial Report",
    "tier": "fast",
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "images_content_metadata": {
    "images": [
      {
        "filename": "filename",
        "index": 0,
        "bbox": { "h": 0, "w": 0, "x": 0, "y": 0 },
        "category": "screenshot",
        "content_type": "content_type",
        "presigned_url": "presigned_url",
        "size_bytes": 0
      }
    ],
    "total_count": 0
  },
  "items": {
    "pages": [
      {
        "items": [
          {
            "md": "md",
            "value": "value",
            "bbox": [
              {
                "h": 0,
                "w": 0,
                "x": 0,
                "y": 0,
                "confidence": 0,
                "end_index": 0,
                "label": "label",
                "r": 0,
                "start_index": 0
              }
            ],
            "type": "text"
          }
        ],
        "page_height": 0,
        "page_number": 0,
        "page_width": 0,
        "success": true
      }
    ]
  },
  "job_metadata": { "foo": "bar" },
  "markdown": {
    "pages": [
      {
        "markdown": "markdown",
        "page_number": 0,
        "success": true,
        "footer": "footer",
        "header": "header"
      }
    ]
  },
  "markdown_full": "markdown_full",
  "metadata": {
    "pages": [
      {
        "page_number": 0,
        "confidence": 0,
        "cost_optimized": true,
        "original_orientation_angle": 0,
        "printed_page_number": "printed_page_number",
        "slide_section_name": "slide_section_name",
        "speaker_notes": "speaker_notes",
        "triggered_auto_mode": true
      }
    ]
  },
  "raw_parameters": { "foo": "bar" },
  "result_content_metadata": {
    "foo": {
      "size_bytes": 0,
      "exists": true,
      "presigned_url": "presigned_url"
    }
  },
  "text": {
    "pages": [
      { "page_number": 0, "text": "text" }
    ]
  },
  "text_full": "text_full"
}
```

## List Parse Jobs

`ParsingListPage parsing().list(ParsingListParams params = ParsingListParams.none(), RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v2/parse`

List parse jobs for the current project. Filter by `status` or creation date range. Results are paginated: pass the `next_page_token` from one response as `page_token` (`pageToken` in `ParsingListParams`) to fetch the next page.
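Because results are paginated, collecting every job means requesting pages until a response stops carrying a `next_page_token`. The following is a minimal, SDK-free sketch of that loop under stated assumptions: `Page`, `PageDrain`, and the `fetchPage` function are hypothetical stand-ins for `ParsingListPage` and a `client.parsing().list(...)` call with `pageToken` set, and an empty or blank token is assumed to mark the last page. The generated SDK may already ship an auto-paging helper, so check it before hand-rolling this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for one page of list results: the page's items plus the
// next_page_token from the response (empty when the server sends none).
record Page<T>(List<T> items, Optional<String> nextPageToken) {}

// Hypothetical helper, not part of the LlamaCloud SDK.
final class PageDrain {
    private PageDrain() {}

    /**
     * Requests pages until no next_page_token is returned and concatenates their items.
     *
     * @param fetchPage called with the page_token to send (empty for the first request)
     * @param maxPages  upper bound on requests, so a server that keeps returning
     *                  tokens cannot make this loop forever
     */
    static <T> List<T> drain(Function<Optional<String>, Page<T>> fetchPage, int maxPages) {
        List<T> all = new ArrayList<>();
        Optional<String> token = Optional.empty();
        for (int i = 0; i < maxPages; i++) {
            Page<T> page = fetchPage.apply(token);
            all.addAll(page.items());
            // Assumption: a missing or blank token means this was the last page.
            token = page.nextPageToken().filter(t -> !t.isBlank());
            if (token.isEmpty()) {
                return all;
            }
        }
        throw new IllegalStateException("Stopped after " + maxPages + " pages without reaching the end");
    }
}
```

For example, `PageDrain.drain(token -> fetchJobs(token), 100)` would return the jobs from all pages, where `fetchJobs` is a hypothetical wrapper around the real list call. The `maxPages` cap is a defensive design choice rather than anything the API requires.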
### Parameters

- `ParsingListParams params` - `Optional createdAtOnOrAfter` Include items created at or after this timestamp (inclusive) - `Optional createdAtOnOrBefore` Include items created at or before this timestamp (inclusive) - `Optional> jobIds` Filter by specific job IDs - `Optional organizationId` - `Optional pageSize` Number of items per page - `Optional pageToken` Pagination token: the `next_page_token` value from the previous response - `Optional projectId` - `Optional status` Filter by job status (PENDING, RUNNING, COMPLETED, FAILED, CANCELLED) - `PENDING("PENDING")` - `RUNNING("RUNNING")` - `COMPLETED("COMPLETED")` - `FAILED("FAILED")` - `CANCELLED("CANCELLED")`

### Returns

- `class ParsingListResponse:` A parse job. - `String id` Unique parse job identifier - `String projectId` Project this job belongs to - `Status status` Current job status: PENDING, RUNNING, COMPLETED, FAILED, or CANCELLED - `PENDING("PENDING")` - `RUNNING("RUNNING")` - `COMPLETED("COMPLETED")` - `FAILED("FAILED")` - `CANCELLED("CANCELLED")` - `Optional createdAt` Creation datetime - `Optional errorMessage` Error details when status is FAILED - `Optional name` Display name for this parse job - `Optional tier` Parsing tier used for this job - `Optional updatedAt` Last-update datetime

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.parsing.ParsingListPage;
import com.llamacloud_prod.api.models.parsing.ParsingListParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        ParsingListPage page = client.parsing().list();
    }
}
```

#### Response

```json
{
  "items": [
    {
      "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "status": "PENDING",
      "created_at": "2019-12-27T18:11:19.117Z",
      "error_message": "error_message",
      "name": "Q4 Financial Report",
      "tier": "fast",
      "updated_at": "2019-12-27T18:11:19.117Z"
    }
  ],
  "next_page_token": "next_page_token",
  "total_size": 0
}
```

## Domain Types

### BBox

- `class BBox:` Bounding box with coordinates and optional metadata. - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text

### Code Item

- `class CodeItem:` - `String md` Markdown representation preserving formatting - `String value` Code content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional language` Programming language identifier - `Optional type` Code block item type - `CODE("code")`

### Fail Page Mode

- `enum FailPageMode:` Enum for the available page error-handling modes.
- `RAW_TEXT("raw_text")` - `BLANK_PAGE("blank_page")` - `ERROR_MESSAGE("error_message")` ### Footer Item - `class FooterItem:` - `List items` List of items within the footer - `class TextItem:` - `String md` Markdown representation preserving formatting - `String value` Text content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Text item type - `TEXT("text")` - `class HeadingItem:` - `long level` Heading level (1-6) - `String md` Markdown representation preserving formatting - `String value` Heading text content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional type` Heading item type - `HEADING("heading")` - `class ListItem:` - `List items` List of nested text or list items - `class TextItem:` - `class ListItem:` - `String md` Markdown representation preserving formatting - `boolean ordered` Whether the list is ordered or unordered - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` List item type - `LIST("list")` - `class CodeItem:` - `String md` Markdown representation preserving formatting - `String value` Code content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional language` Programming language identifier - `Optional type` Code block item type - `CODE("code")` - `class TableItem:` - `String csv` CSV representation of the table - `String html` HTML representation of the table - `String md` Markdown representation preserving formatting - `List> rows` Table data as array of arrays (string, number, or null) - `String` - `double` - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional> mergedFromPages` List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4]) - `Optional mergedIntoPage` Populated when merged into another table. Page number where the full merged table begins (used on empty tables). - `Optional> parseConcerns` Quality concerns detected during table extraction, indicating the table may have issues - `String details` Human-readable details about the concern - `String type` Type of parse concern (e.g. 
header_value_type_mismatch, inconsistent_row_cell_count) - `Optional type` Table item type - `TABLE("table")` - `class ImageItem:` - `String caption` Image caption - `String md` Markdown representation preserving formatting - `String url` URL to the image - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Image item type - `IMAGE("image")` - `class LinkItem:` - `String md` Markdown representation preserving formatting - `String text` Display text of the link - `String url` URL of the link - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Link item type - `LINK("link")` - `String md` Markdown representation preserving formatting - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional type` Page footer container - `FOOTER("footer")` ### Header Item - `class HeaderItem:` - `List items` List of items within the header - `class TextItem:` - `String md` Markdown representation preserving formatting - `String value` Text content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Text item type - `TEXT("text")` - `class HeadingItem:` - `long level` Heading level (1-6) - `String md` Markdown representation preserving formatting - `String value` Heading text content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional type` Heading item type - `HEADING("heading")` - `class ListItem:` - `List items` List of nested text or list items - `class TextItem:` - `class ListItem:` - `String md` Markdown representation preserving formatting - `boolean ordered` Whether the list is ordered or unordered - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` List item type - `LIST("list")` - `class CodeItem:` - `String md` Markdown representation preserving formatting - `String value` Code content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional language` Programming language identifier - `Optional type` Code block item type - `CODE("code")` - `class TableItem:` - `String csv` CSV representation of the table - `String html` HTML representation of the table - `String md` Markdown representation preserving formatting - `List> rows` Table data as array of arrays (string, number, or null) - `String` - `double` - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional> mergedFromPages` List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4]) - `Optional mergedIntoPage` Populated when merged into another table. Page number where the full merged table begins (used on empty tables). - `Optional> parseConcerns` Quality concerns detected during table extraction, indicating the table may have issues - `String details` Human-readable details about the concern - `String type` Type of parse concern (e.g. 
header_value_type_mismatch, inconsistent_row_cell_count) - `Optional type` Table item type - `TABLE("table")` - `class ImageItem:` - `String caption` Image caption - `String md` Markdown representation preserving formatting - `String url` URL to the image - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Image item type - `IMAGE("image")` - `class LinkItem:` - `String md` Markdown representation preserving formatting - `String text` Display text of the link - `String url` URL of the link - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Link item type - `LINK("link")` - `String md` Markdown representation preserving formatting - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional type` Page header container - `HEADER("header")` ### Heading Item - `class HeadingItem:` - `long level` Heading level (1-6) - `String md` Markdown representation preserving formatting - `String value` Heading text content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Heading item type - `HEADING("heading")` ### Image Item - `class ImageItem:` - `String caption` Image caption - `String md` Markdown representation preserving formatting - `String url` URL to the image - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional type` Image item type - `IMAGE("image")` ### Link Item - `class LinkItem:` - `String md` Markdown representation preserving formatting - `String text` Display text of the link - `String url` URL of the link - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Link item type - `LINK("link")` ### List Item - `class ListItem:` - `List items` List of nested text or list items - `class TextItem:` - `String md` Markdown representation preserving formatting - `String value` Text content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. 
- `Optional startIndex` Start index in the text - `Optional type` Text item type - `TEXT("text")` - `class ListItem:` - `String md` Markdown representation preserving formatting - `boolean ordered` Whether the list is ordered or unordered - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` List item type - `LIST("list")` ### Llama Parse Supported File Extensions - `enum LlamaParseSupportedFileExtensions:` Enum for supported file extensions. - `PDF(".pdf")` - `ABW(".abw")` - `AWT(".awt")` - `CGM(".cgm")` - `CWK(".cwk")` - `DOC(".doc")` - `DOCM(".docm")` - `DOCX(".docx")` - `DOT(".dot")` - `DOTM(".dotm")` - `DOTX(".dotx")` - `FODG(".fodg")` - `FODP(".fodp")` - `FOPD(".fopd")` - `FODT(".fodt")` - `FB2(".fb2")` - `HWP(".hwp")` - `LWP(".lwp")` - `MCW(".mcw")` - `MW(".mw")` - `MWD(".mwd")` - `ODF(".odf")` - `ODT(".odt")` - `OTG(".otg")` - `OTT(".ott")` - `PAGES(".pages")` - `PBD(".pbd")` - `PSW(".psw")` - `RTF(".rtf")` - `SDA(".sda")` - `SDD(".sdd")` - `SDP(".sdp")` - `SDW(".sdw")` - `SGL(".sgl")` - `STD(".std")` - `STW(".stw")` - `SXD(".sxd")` - `SXG(".sxg")` - `SXM(".sxm")` - `SXW(".sxw")` - `UOF(".uof")` - `UOP(".uop")` - `UOT(".uot")` - `VOR(".vor")` - `WPD(".wpd")` - `WPS(".wps")` - `WPT(".wpt")` - `WRI(".wri")` - `WN(".wn")` - `XML(".xml")` - `ZABW(".zabw")` - `KEY(".key")` - `ODP(".odp")` - `ODG(".odg")` - `OTP(".otp")` - `POT(".pot")` - `POTM(".potm")` - `POTX(".potx")` - `PPT(".ppt")` - `PPTM(".pptm")` - `PPTX(".pptx")` - `STI(".sti")` - `SXI(".sxi")` - `VSD(".vsd")` - `VSDM(".vsdm")` - `VSDX(".vsdx")` - `VDX(".vdx")` - 
`BMP(".bmp")` - `GIF(".gif")` - `HEIC(".heic")` - `HEIF(".heif")` - `JPG(".jpg")` - `JPEG(".jpeg")` - `PNG(".png")` - `SVG(".svg")` - `TIF(".tif")` - `TIFF(".tiff")` - `WEBP(".webp")` - `HTM(".htm")` - `HTML(".html")` - `XHTM(".xhtm")` - `CSV(".csv")` - `DBF(".dbf")` - `DIF(".dif")` - `ET(".et")` - `ETH(".eth")` - `FODS(".fods")` - `NUMBERS(".numbers")` - `ODS(".ods")` - `OTS(".ots")` - `PRN(".prn")` - `QPW(".qpw")` - `SLK(".slk")` - `STC(".stc")` - `SXC(".sxc")` - `SYLK(".sylk")` - `TSV(".tsv")` - `UOS1(".uos1")` - `UOS2(".uos2")` - `UOS(".uos")` - `WB1(".wb1")` - `WB2(".wb2")` - `WB3(".wb3")` - `WK1(".wk1")` - `WK2(".wk2")` - `WK3(".wk3")` - `WK4(".wk4")` - `WKS(".wks")` - `WQ1(".wq1")` - `WQ2(".wq2")` - `XLR(".xlr")` - `XLS(".xls")` - `XLSB(".xlsb")` - `XLSM(".xlsm")` - `XLSX(".xlsx")` - `XLW(".xlw")` - `AZW(".azw")` - `AZW3(".azw3")` - `AZW4(".azw4")` - `CB7(".cb7")` - `CBC(".cbc")` - `CBR(".cbr")` - `CBZ(".cbz")` - `CHM(".chm")` - `DJVU(".djvu")` - `EPUB(".epub")` - `FBZ(".fbz")` - `HTMLZ(".htmlz")` - `LIT(".lit")` - `LRF(".lrf")` - `MD(".md")` - `MOBI(".mobi")` - `PDB(".pdb")` - `PML(".pml")` - `PRC(".prc")` - `RB(".rb")` - `SNB(".snb")` - `TCR(".tcr")` - `TXTZ(".txtz")` - `M4A(".m4a")` - `MP3(".mp3")` - `MP4(".mp4")` - `MPEG(".mpeg")` - `MPGA(".mpga")` - `WAV(".wav")` - `WEBM(".webm")` - `YXMD(".yxmd")` ### Parsing Job - `class ParsingJob:` A parse job (v1). - `String id` Unique parse job identifier - `StatusEnum status` Current job status - `PENDING("PENDING")` - `SUCCESS("SUCCESS")` - `ERROR("ERROR")` - `PARTIAL_SUCCESS("PARTIAL_SUCCESS")` - `CANCELLED("CANCELLED")` - `Optional errorCode` Machine-readable error code when failed - `Optional errorMessage` Human-readable error details when failed ### Parsing Languages - `enum ParsingLanguages:` Enum for representing the languages supported by the parser. 
- `AF("af")` - `AZ("az")` - `BS("bs")` - `CS("cs")` - `CY("cy")` - `DA("da")` - `DE("de")` - `EN("en")` - `ES("es")` - `ET("et")` - `FR("fr")` - `GA("ga")` - `HR("hr")` - `HU("hu")` - `ID("id")` - `IS("is")` - `IT("it")` - `KU("ku")` - `LA("la")` - `LT("lt")` - `LV("lv")` - `MI("mi")` - `MS("ms")` - `MT("mt")` - `NL("nl")` - `NO("no")` - `OC("oc")` - `PI("pi")` - `PL("pl")` - `PT("pt")` - `RO("ro")` - `RS_LATIN("rs_latin")` - `SK("sk")` - `SL("sl")` - `SQ("sq")` - `SV("sv")` - `SW("sw")` - `TL("tl")` - `TR("tr")` - `UZ("uz")` - `VI("vi")` - `AR("ar")` - `FA("fa")` - `UG("ug")` - `UR("ur")` - `BN("bn")` - `AS("as")` - `MNI("mni")` - `RU("ru")` - `RS_CYRILLIC("rs_cyrillic")` - `BE("be")` - `BG("bg")` - `UK("uk")` - `MN("mn")` - `ABQ("abq")` - `ADY("ady")` - `KBD("kbd")` - `AVA("ava")` - `DAR("dar")` - `INH("inh")` - `CHE("che")` - `LBE("lbe")` - `LEZ("lez")` - `TAB("tab")` - `TJK("tjk")` - `HI("hi")` - `MR("mr")` - `NE("ne")` - `BH("bh")` - `MAI("mai")` - `ANG("ang")` - `BHO("bho")` - `MAH("mah")` - `SCK("sck")` - `NEW("new")` - `GOM("gom")` - `SA("sa")` - `BGC("bgc")` - `TH("th")` - `CH_SIM("ch_sim")` - `CH_TRA("ch_tra")` - `JA("ja")` - `KO("ko")` - `TA("ta")` - `TE("te")` - `KN("kn")` ### Parsing Mode - `enum ParsingMode:` Enum for representing the mode of parsing to be used. 
- `PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")` - `PARSE_PAGE_WITH_LLM("parse_page_with_llm")` - `PARSE_PAGE_WITH_LVM("parse_page_with_lvm")` - `PARSE_PAGE_WITH_AGENT("parse_page_with_agent")` - `PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")` - `PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")` - `PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")` - `PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")` ### Status Enum - `enum StatusEnum:` Enum for representing the status of a job - `PENDING("PENDING")` - `SUCCESS("SUCCESS")` - `ERROR("ERROR")` - `PARTIAL_SUCCESS("PARTIAL_SUCCESS")` - `CANCELLED("CANCELLED")` ### Table Item - `class TableItem:` - `String csv` CSV representation of the table - `String html` HTML representation of the table - `String md` Markdown representation preserving formatting - `List> rows` Table data as array of arrays (string, number, or null) - `String` - `double` - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional> mergedFromPages` List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4]) - `Optional mergedIntoPage` Populated when merged into another table. Page number where the full merged table begins (used on empty tables). - `Optional> parseConcerns` Quality concerns detected during table extraction, indicating the table may have issues - `String details` Human-readable details about the concern - `String type` Type of parse concern (e.g. 
header_value_type_mismatch, inconsistent_row_cell_count) - `Optional type` Table item type - `TABLE("table")` ### Text Item - `class TextItem:` - `String md` Markdown representation preserving formatting - `String value` Text content - `Optional> bbox` List of bounding boxes - `double h` Height of the bounding box - `double w` Width of the bounding box - `double x` X coordinate of the bounding box - `double y` Y coordinate of the bounding box - `Optional confidence` Confidence score - `Optional endIndex` End index in the text - `Optional label` Label for the bounding box - `Optional r` Optional visual text rotation angle in degrees. Omitted when unrotated. - `Optional startIndex` Start index in the text - `Optional type` Text item type - `TEXT("text")`
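Table Item exposes `rows` as an array of arrays whose cells are strings, numbers, or null. If you consume `rows` directly instead of the `csv`, `html`, or `md` renderings, a cheap client-side guard against ragged tables can help; it is similar in spirit to the `inconsistent_row_cell_count` parse concern, though the service's actual criteria are not documented here. The sketch below is hypothetical and SDK-free: `TableRowCheck` is an illustrative name, and cells are modeled as plain `Object` values (`String`, `Double`, or `null`), which is an assumption about how you have unpacked the SDK's row type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the LlamaCloud SDK. Each cell is assumed to be
// a String, a Double, or null, mirroring the documented "string, number, or null"
// shape of TableItem.rows.
final class TableRowCheck {
    private TableRowCheck() {}

    /** Returns the indices of rows whose cell count differs from the first row's. */
    static List<Integer> raggedRows(List<? extends List<?>> rows) {
        List<Integer> ragged = new ArrayList<>();
        if (rows.isEmpty()) {
            return ragged;
        }
        int expected = rows.get(0).size();
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i).size() != expected) {
                ragged.add(i);
            }
        }
        return ragged;
    }

    /** Renders one cell as text: null becomes "" and whole-number doubles drop the ".0". */
    static String cellText(Object cell) {
        if (cell == null) {
            return "";
        }
        // The magnitude bound keeps the long conversion exact for integral values.
        if (cell instanceof Double d && Math.abs(d) < 1e15 && d == Math.rint(d)) {
            return Long.toString(d.longValue());
        }
        return cell.toString();
    }
}
```

Comparing against the first row assumes it is a header or otherwise representative; for tables with merged header cells, comparing against the most common row length may be a better baseline.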