Parse a file by file ID or URL.
Provide either file_id (a previously uploaded file) or
source_url (a publicly accessible URL). Configure parsing
with options like tier, target_pages, and lang.
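The "either file_id or source_url" rule can be checked on the client before a request is sent. The following is a minimal sketch; `validateSource` is a hypothetical helper and not part of the SDK, and the server may apply its own validation regardless.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSource is a hypothetical client-side check (not an SDK function).
// It enforces that exactly one of file_id and source_url is provided.
func validateSource(fileID, sourceURL string) error {
	switch {
	case fileID == "" && sourceURL == "":
		return errors.New("one of file_id or source_url is required")
	case fileID != "" && sourceURL != "":
		return errors.New("file_id and source_url are mutually exclusive")
	}
	return nil
}

func main() {
	// Only source_url is set, so the check passes.
	fmt.Println(validateSource("", "https://example.com/report.pdf"))
	// Both are set, so the check reports an error.
	fmt.Println(validateSource("file-123", "https://example.com/report.pdf"))
}
```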
Tiers
fast: rule-based, cheapest, no AI
cost_effective: balanced speed and quality
agentic: full AI-powered parsing
agentic_plus: premium AI with specialized features
The job runs asynchronously. Poll GET /parse/{job_id} with
expand=text or expand=markdown to retrieve results.
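As a rough illustration of the polling request, the sketch below only builds the URL for GET /parse/{job_id} with the expand query parameter. The base URL is an assumption made for the example (substitute the API host you actually use), and `pollURL` is a hypothetical helper rather than an SDK function.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// pollURL builds the URL for GET /parse/{job_id}?expand=<expand>.
// baseURL is an assumed placeholder; it is not taken from the API reference.
func pollURL(baseURL, jobID, expand string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	// Append /parse/{job_id} to whatever path prefix the base URL carries.
	u.Path = path.Join("/", u.Path, "parse", jobID)
	q := u.Query()
	q.Set("expand", expand) // "text" or "markdown"
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	u, err := pollURL("https://api.example.com/v1", "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "markdown")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```

A real poller would issue this request repeatedly with a delay until the job leaves the PENDING status.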
Parameters
params ParsingNewParams
Tier param.Field[ParsingNewParamsTier]
Version param.Field[ParsingNewParamsVersion]
Body param: Tier version. Use 'latest' for the current stable version, or pin a specific version (e.g., '1.0', '2.0') for reproducible results
Body param: Options for AI-powered parsing tiers (cost_effective, agentic, agentic_plus).
These options customize how the AI processes and interprets document content. Only applicable when using non-fast tiers.
ClientName param.Field[string]optional
Body param: Identifier for the client/application making the request. Used for analytics and debugging. Example: 'my-app-v2'
Body param: Crop boundaries to process only a portion of each page. Values are ratios 0-1 from page edges
Bottom float64optional
Bottom boundary as ratio (0-1). 0=top edge, 1=bottom edge. Content below this line is excluded
Left float64optional
Left boundary as ratio (0-1). 0=left edge, 1=right edge. Content left of this line is excluded
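To make the ratio semantics concrete, the sketch below converts the Bottom and Left ratios described above into cut-off positions for a page of a given size. `cropCutoffs` and the page dimensions are assumptions for illustration; the parser's internal handling is not documented here.

```go
package main

import "fmt"

// cropCutoffs is a hypothetical helper that maps the Bottom and Left ratios
// to positions on a page. Content below bottomY or left of leftX would be
// excluded, following the descriptions of the two fields.
func cropCutoffs(bottom, left, pageWidth, pageHeight float64) (bottomY, leftX float64, err error) {
	if bottom < 0 || bottom > 1 || left < 0 || left > 1 {
		return 0, 0, fmt.Errorf("crop ratios must be within 0-1, got bottom=%v left=%v", bottom, left)
	}
	// 0 is the top (or left) edge and 1 is the bottom (or right) edge.
	return bottom * pageHeight, left * pageWidth, nil
}

func main() {
	// An assumed 800x1000 page: keep the upper half, drop the leftmost quarter.
	bottomY, leftX, err := cropCutoffs(0.5, 0.25, 800, 1000)
	if err != nil {
		panic(err)
	}
	fmt.Println(bottomY, leftX)
}
```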
DisableCache param.Field[bool]optional
Body param: Bypass result caching and force re-parsing. Use when document content may have changed or you need fresh results
Body param: Options for fast tier parsing (rule-based, no AI).
The fast tier uses deterministic algorithms for text extraction without AI enhancement. It is the fastest and most cost-effective option, best suited for simple documents with standard layouts. It currently has no configurable options; the field is reserved for future expansion.
FileID param.Field[string]optional
Body param: ID of an existing file in the project to parse. Mutually exclusive with source_url
HTTPProxy param.Field[string]optional
Body param: HTTP/HTTPS proxy for fetching source_url. Ignored if using file_id
Body param: Format-specific options (HTML, PDF, spreadsheet, presentation). Applied based on detected input file type
HTML ParsingNewParamsInputOptionsHTMLoptional
HTML/web page parsing options (applies to .html, .htm files)
MakeAllElementsVisible booloptional
Force all HTML elements to be visible by overriding CSS display/visibility properties. Useful for parsing pages with hidden content or collapsed sections
Presentation ParsingNewParamsInputOptionsPresentationoptional
Spreadsheet ParsingNewParamsInputOptionsSpreadsheetoptional
Spreadsheet parsing options (applies to .xlsx, .xls, .csv, .ods files)
DetectSubTablesInSheets booloptional
Detect and extract multiple tables within a single sheet. Useful when spreadsheets contain several data regions separated by blank rows/columns
Body param: Output formatting options for markdown, text, and extracted images
ExtractPrintedPageNumber booloptional
Extract the printed page number as it appears in the document (e.g., 'Page 5 of 10', 'v', 'A-3'). Useful for referencing original page numbers
ImagesToSave []stringoptional
Image categories to extract and save. Options: 'screenshot' (full page renders useful for visual QA), 'embedded' (images found within the document), 'layout' (cropped regions from layout detection like figures and diagrams). Empty list saves no images
Markdown ParsingNewParamsOutputOptionsMarkdownoptional
Markdown formatting options including table styles and link annotations
AnnotateLinks booloptional
Add link annotations to markdown output in the format [text](url). When false, only the link text is included
InlineImages booloptional
Embed images directly in markdown as base64 data URIs instead of extracting them as separate files. Useful for self-contained markdown output
Tables ParsingNewParamsOutputOptionsMarkdownTablesoptional
Table formatting options including markdown vs HTML format and merging behavior
CompactMarkdownTables booloptional
Remove extra whitespace padding in markdown table cells for more compact output
MarkdownTableMultilineSeparator stringoptional
Separator string for multiline cell content in markdown tables. Example: '<br>' to preserve line breaks, ' ' to join with spaces
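The effect of MarkdownTableMultilineSeparator can be pictured as joining the lines of a cell with the chosen separator. This is only a sketch of the visible result, with `joinCellLines` as a hypothetical helper; it does not claim to mirror how the parser implements the option.

```go
package main

import (
	"fmt"
	"strings"
)

// joinCellLines is a hypothetical illustration: it collapses a multiline
// cell value into a single line by joining its lines with sep.
func joinCellLines(cell, sep string) string {
	lines := strings.Split(cell, "\n")
	return strings.Join(lines, sep)
}

func main() {
	cell := "Revenue\n(in millions)"
	fmt.Println(joinCellLines(cell, "<br>")) // separator that preserves the line break in rendered markdown
	fmt.Println(joinCellLines(cell, " "))    // separator that joins the lines with a space
}
```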
SpatialText ParsingNewParamsOutputOptionsSpatialTextoptional
Spatial text output options for preserving document layout structure
DoNotUnrollColumns booloptional
Keep multi-column layouts intact instead of linearizing columns into sequential text. Automatically enabled for non-fast tiers
Body param: Page selection: limit total pages or specify exact pages to process
MaxPages int64optional
Maximum number of pages to process. Pages are processed in order starting from page 1. If both max_pages and target_pages are set, target_pages takes precedence
TargetPages stringoptional
Comma-separated list of specific pages to process using 1-based indexing. Supports individual pages and ranges. Examples: '1,3,5' (pages 1, 3, 5), '1-5' (pages 1 through 5 inclusive), '1,3,5-8,10' (pages 1, 3, 5-8, and 10). Pages are sorted and deduplicated automatically. Duplicate pages cause an error
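The TargetPages syntax can be expanded into an explicit page list as sketched below. `parseTargetPages` is a hypothetical client-side helper that sorts and deduplicates; because the description above also mentions that duplicate pages cause an error, treat the server's behavior as authoritative and this code only as an illustration of the syntax.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseTargetPages is a hypothetical helper that expands a spec such as
// "1,3,5-8,10" into a sorted, deduplicated list of 1-based page numbers.
func parseTargetPages(spec string) ([]int, error) {
	seen := map[int]bool{}
	var pages []int
	for _, part := range strings.Split(spec, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("invalid page %q: %w", part, err)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(hi)
			if err != nil {
				return nil, fmt.Errorf("invalid range %q: %w", part, err)
			}
		}
		// Pages are 1-based and a range must not run backwards.
		if start < 1 || end < start {
			return nil, fmt.Errorf("invalid page selection %q", part)
		}
		for p := start; p <= end; p++ {
			if !seen[p] {
				seen[p] = true
				pages = append(pages, p)
			}
		}
	}
	sort.Ints(pages)
	return pages, nil
}

func main() {
	pages, err := parseTargetPages("1,3,5-8,10")
	fmt.Println(pages, err)
}
```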
Body param: Job execution controls including timeouts and failure thresholds
JobFailureConditions ParsingNewParamsProcessingControlJobFailureConditionsoptional
Quality thresholds that determine when a job should fail vs complete with partial results
AllowedPageFailureRatio float64optional
Maximum ratio of pages allowed to fail before the job fails (0-1). Example: 0.1 means job fails if more than 10% of pages fail. Default is 0.05 (5%)
FailOnBuggyFont booloptional
Fail the job if a problematic font is detected that may cause incorrect text extraction. Buggy fonts can produce garbled or missing characters
FailOnImageExtractionError booloptional
Fail the entire job if any embedded image cannot be extracted. By default, image extraction errors are logged but don't fail the job
Timeouts ParsingNewParamsProcessingControlTimeoutsoptional
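The AllowedPageFailureRatio rule described above amounts to a simple comparison, sketched here. `jobFails` is a hypothetical function written from the field description ("job fails if more than 10% of pages fail"); how the service counts failed pages is an assumption.

```go
package main

import "fmt"

// jobFails is a hypothetical illustration of the threshold rule: the job
// fails when the share of failed pages exceeds the allowed ratio.
func jobFails(failedPages, totalPages int, allowedRatio float64) bool {
	if totalPages == 0 {
		return false // nothing was processed, so there is no ratio to exceed
	}
	return float64(failedPages)/float64(totalPages) > allowedRatio
}

func main() {
	// With the documented default of 0.05, 6 failed pages out of 100 fail the job.
	fmt.Println(jobFails(6, 100, 0.05))
	// A single failed page out of 100 stays under the threshold.
	fmt.Println(jobFails(1, 100, 0.05))
}
```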
Body param: Document processing options including OCR, table extraction, and chart parsing
AggressiveTableExtraction booloptional
Use aggressive heuristics to detect table boundaries, even without visible borders. Useful for documents with borderless or complex tables
AutoModeConfiguration []ParsingNewParamsProcessingOptionsAutoModeConfigurationoptional
Conditional processing rules that apply different parsing options based on page content, document structure, or filename patterns. Each entry defines trigger conditions and the parsing configuration to apply when triggered
ParsingConf ParsingNewParamsProcessingOptionsAutoModeConfigurationParsingConf
Parsing configuration to apply when trigger conditions are met
CropBox ParsingNewParamsProcessingOptionsAutoModeConfigurationParsingConfCropBoxoptional
Crop box options for auto mode parsing configuration.
CustomPrompt stringoptional
Custom AI instructions for matched pages. Overrides the base custom_prompt
SpatialText ParsingNewParamsProcessingOptionsAutoModeConfigurationParsingConfSpatialTextoptional
Spatial text options for auto mode parsing configuration.
SpecializedChartParsing stringoptional
Enable specialized chart parsing with the specified mode
Tier stringoptional
Override the parsing tier for matched pages. Must be paired with version
Version stringoptional
Tier version when overriding tier. Required when tier is specified
string
FullPageImageInPageThreshold ParsingNewParamsProcessingOptionsAutoModeConfigurationFullPageImageInPageThresholdUnionoptional
LayoutElementInPageConfidenceThreshold ParsingNewParamsProcessingOptionsAutoModeConfigurationLayoutElementInPageConfidenceThresholdUnionoptional
PageContainsAtLeastNCharts ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNChartsUnionoptional
PageContainsAtLeastNImages ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNImagesUnionoptional
PageContainsAtLeastNLayoutElements ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNLayoutElementsUnionoptional
PageContainsAtLeastNLines ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNLinesUnionoptional
PageContainsAtLeastNLinks ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNLinksUnionoptional
PageContainsAtLeastNNumbers ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNNumbersUnionoptional
PageContainsAtLeastNPercentNumbers ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNPercentNumbersUnionoptional
PageContainsAtLeastNTables ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNTablesUnionoptional
PageContainsAtLeastNWords ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtLeastNWordsUnionoptional
PageContainsAtMostNCharts ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNChartsUnionoptional
PageContainsAtMostNImages ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNImagesUnionoptional
PageContainsAtMostNLayoutElements ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNLayoutElementsUnionoptional
PageContainsAtMostNLines ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNLinesUnionoptional
PageContainsAtMostNLinks ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNLinksUnionoptional
PageContainsAtMostNNumbers ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNNumbersUnionoptional
PageContainsAtMostNPercentNumbers ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNPercentNumbersUnionoptional
PageContainsAtMostNTables ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNTablesUnionoptional
PageContainsAtMostNWords ParsingNewParamsProcessingOptionsAutoModeConfigurationPageContainsAtMostNWordsUnionoptional
PageLongerThanNChars ParsingNewParamsProcessingOptionsAutoModeConfigurationPageLongerThanNCharsUnionoptional
CostOptimizer ParsingNewParamsProcessingOptionsCostOptimizeroptional
Cost optimizer configuration for reducing parsing costs on simpler pages.
When enabled, the parser analyzes each page and routes simpler pages to faster, cheaper processing while preserving quality for complex pages. Only works with 'agentic' or 'agentic_plus' tiers.
DisableHeuristics booloptional
Disable automatic heuristics including outlined table extraction and adaptive long table handling. Use when heuristics produce incorrect results
Ignore ParsingNewParamsProcessingOptionsIgnoreoptional
Options for ignoring specific text types (diagonal, hidden, text in images)
IgnoreDiagonalText booloptional
Skip text rotated at an angle (not horizontal/vertical). Useful for ignoring watermarks or decorative angled text
OcrParameters ParsingNewParamsProcessingOptionsOcrParametersoptional
SpecializedChartParsing stringoptional
Enable AI-powered chart analysis. Modes: 'efficient' (fast, lower cost), 'agentic' (balanced), 'agentic_plus' (highest accuracy). Automatically enables extract_layout and precise_bounding_box when set
SourceURL param.Field[string]optional
Body param: Public URL of the document to parse. Mutually exclusive with file_id
WebhookConfigurations param.Field[[]ParsingNewParamsWebhookConfiguration]optional
Body param: Webhook endpoints for job status notifications. Multiple webhooks can be configured for different events or services
WebhookEvents []stringoptional
Events that trigger this webhook. Options: 'parse.success' (job completed), 'parse.failure' (job failed), 'parse.partial' (some pages failed). If not specified, webhook fires for all events
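The WebhookEvents filter can be read as: an empty list matches every event, otherwise only the listed events match. The sketch below encodes that reading; `webhookFires` is a hypothetical helper, and the event names are the ones listed in the field description.

```go
package main

import "fmt"

// webhookFires is a hypothetical illustration of the event filter: a webhook
// with no configured events fires for all events, otherwise only for the
// events it lists.
func webhookFires(configured []string, event string) bool {
	if len(configured) == 0 {
		return true
	}
	for _, e := range configured {
		if e == event {
			return true
		}
	}
	return false
}

func main() {
	onlyFailures := []string{"parse.failure", "parse.partial"}
	fmt.Println(webhookFires(onlyFailures, "parse.success"))
	fmt.Println(webhookFires(nil, "parse.success"))
}
```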
Parse File
package main

import (
	"context"
	"fmt"

	"github.com/stainless-sdks/llamacloud-prod-go"
	"github.com/stainless-sdks/llamacloud-prod-go/option"
)

func main() {
	// Create a client authenticated with your API key.
	client := llamacloudprod.NewClient(
		option.WithAPIKey("My API Key"),
	)
	// Submit a parse job on the fast tier with a pinned tier version.
	parsing, err := client.Parsing.New(context.TODO(), llamacloudprod.ParsingNewParams{
		Tier:    llamacloudprod.ParsingNewParamsTierFast,
		Version: llamacloudprod.ParsingNewParamsVersion2025_12_11,
	})
	if err != nil {
		panic(err.Error())
	}
	// Print the job ID, which is used to poll for results.
	fmt.Printf("%+v\n", parsing.ID)
}
{
  "id": "pjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "PENDING",
  "created_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "name": "Q4 Financial Report",
  "tier": "fast",
  "updated_at": "2019-12-27T18:11:19.117Z"
}