Beta

Beta Agent Data

Models

AgentData = object { data, deployment_name, id, 4 more }

API Result for a single agent data item

data: map[unknown]

deployment_name: string

id: optional string

collection: optional string

created_at: optional string

project_id: optional string

updated_at: optional string
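
As a rough illustration of the field list above, the sketch below mirrors AgentData as a Python dataclass. The class, the `from_json` helper, and the sample payload are hypothetical and not part of any official SDK; `map[unknown]` is assumed to decode to a dict with string keys.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentData:
    """Illustrative mirror of the AgentData fields listed above."""

    data: dict[str, Any]            # map[unknown]; string keys are an assumption
    deployment_name: str
    id: Optional[str] = None
    collection: Optional[str] = None
    created_at: Optional[str] = None
    project_id: Optional[str] = None
    updated_at: Optional[str] = None

    @classmethod
    def from_json(cls, payload: dict[str, Any]) -> "AgentData":
        # Keep only the documented keys so unexpected fields do not raise.
        known = {k: payload[k] for k in cls.__dataclass_fields__ if k in payload}
        return cls(**known)


# Hypothetical payload; the "extra" key is ignored by from_json.
item = AgentData.from_json(
    {"data": {"score": 0.9}, "deployment_name": "demo", "extra": 1}
)
print(item.deployment_name, item.id)
```

Optional fields default to None, which matches how the schema marks them as optional.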

Beta Parse Configurations

Get Parse Configuration

GET /api/v1/beta/parse-configurations/{config_id}

Update Parse Configuration

PUT /api/v1/beta/parse-configurations/{config_id}

Delete Parse Configuration

DELETE /api/v1/beta/parse-configurations/{config_id}
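
The three routes above share one path and differ only in the HTTP method. The following sketch builds (but does not send) the corresponding requests with the standard library. The host, the API key, the Bearer authorization scheme, and the shape of the PUT body are all assumptions made for illustration; only the path comes from this reference.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host (assumption)
API_KEY = "YOUR_API_KEY"              # placeholder credential (assumption)


def parse_config_request(
    method: str, config_id: str, body: Optional[dict] = None
) -> urllib.request.Request:
    """Build a request for /api/v1/beta/parse-configurations/{config_id}."""
    url = f"{BASE_URL}/api/v1/beta/parse-configurations/{quote(config_id, safe='')}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Bearer authentication is assumed here, not confirmed by this reference.
    headers = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


get_req = parse_config_request("GET", "cfg-123")
# Hypothetical update body; the accepted fields are not listed in this section.
put_req = parse_config_request(
    "PUT", "cfg-123", {"parameters": {"parse_mode": "parse_page_with_llm"}}
)
del_req = parse_config_request("DELETE", "cfg-123")
print(get_req.get_method(), get_req.full_url)
```

Passing a request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.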

Models

ParseConfiguration = object { id, created_at, name, 6 more }

Parse configuration schema.

id: string

Unique identifier for the parse configuration

created_at: string

Creation timestamp

format: date-time

name: string

Name of the parse configuration

parameters: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 116 more }

LlamaParseParameters configuration

adaptive_long_table: optional boolean

aggressive_table_extraction: optional boolean

annotate_links: optional boolean

auto_mode: optional boolean

auto_mode_configuration_json: optional string

auto_mode_trigger_on_image_in_page: optional boolean

auto_mode_trigger_on_regexp_in_page: optional string

auto_mode_trigger_on_table_in_page: optional boolean

auto_mode_trigger_on_text_in_page: optional string

azure_openai_api_version: optional string

azure_openai_deployment_name: optional string

azure_openai_endpoint: optional string

azure_openai_key: optional string

bbox_bottom: optional number

bbox_left: optional number

bbox_right: optional number

bbox_top: optional number

bounding_box: optional string

compact_markdown_table: optional boolean

complemental_formatting_instruction: optional string

content_guideline_instruction: optional string

continuous_mode: optional boolean

disable_image_extraction: optional boolean

disable_ocr: optional boolean

disable_reconstruction: optional boolean

do_not_cache: optional boolean

do_not_unroll_columns: optional boolean

enable_cost_optimizer: optional boolean

extract_charts: optional boolean

extract_layout: optional boolean

extract_printed_page_number: optional boolean

fast_mode: optional boolean

formatting_instruction: optional string

gpt4o_api_key: optional string

gpt4o_mode: optional boolean

guess_xlsx_sheet_name: optional boolean

hide_footers: optional boolean

hide_headers: optional boolean

high_res_ocr: optional boolean

html_make_all_elements_visible: optional boolean

html_remove_fixed_elements: optional boolean

html_remove_navigation_elements: optional boolean

http_proxy: optional string

ignore_document_elements_for_layout_detection: optional boolean

images_to_save: optional array of "screenshot" or "embedded" or "layout"

Accepts one of the following:

"screenshot"

"embedded"

"layout"

inline_images_in_markdown: optional boolean

input_s3_path: optional string

input_s3_region: optional string

input_url: optional string

internal_is_screenshot_job: optional boolean

invalidate_cache: optional boolean

is_formatting_instruction: optional boolean

job_timeout_extra_time_per_page_in_seconds: optional number

job_timeout_in_seconds: optional number

keep_page_separator_when_merging_tables: optional boolean

languages: optional array of ParsingLanguages

Accepts one of the following:

"af"

"az"

"bs"

"cs"

"cy"

"da"

"de"

"en"

"es"

"et"

"fr"

"ga"

"hr"

"hu"

"id"

"is"

"it"

"ku"

"la"

"lt"

"lv"

"mi"

"ms"

"mt"

"nl"

"no"

"oc"

"pi"

"pl"

"pt"

"ro"

"rs_latin"

"sk"

"sl"

"sq"

"sv"

"sw"

"tl"

"tr"

"uz"

"vi"

"ar"

"fa"

"ug"

"ur"

"bn"

"as"

"mni"

"ru"

"rs_cyrillic"

"be"

"bg"

"uk"

"mn"

"abq"

"ady"

"kbd"

"ava"

"dar"

"inh"

"che"

"lbe"

"lez"

"tab"

"tjk"

"hi"

"mr"

"ne"

"bh"

"mai"

"ang"

"bho"

"mah"

"sck"

"new"

"gom"

"sa"

"bgc"

"th"

"ch_sim"

"ch_tra"

"ja"

"ko"

"ta"

"te"

"kn"

layout_aware: optional boolean

line_level_bounding_box: optional boolean

markdown_table_multiline_header_separator: optional string

max_pages: optional number

max_pages_enforced: optional number

merge_tables_across_pages_in_markdown: optional boolean

model: optional string

outlined_table_extraction: optional boolean

output_pdf_of_document: optional boolean

output_s3_path_prefix: optional string

output_s3_region: optional string

output_tables_as_HTML: optional boolean

page_error_tolerance: optional number

page_footer_prefix: optional string

page_footer_suffix: optional string

page_header_prefix: optional string

page_header_suffix: optional string

page_prefix: optional string

page_separator: optional string

page_suffix: optional string

parse_mode: optional ParsingMode

Enum for representing the mode of parsing to be used.

Accepts one of the following:

"parse_page_without_llm"

"parse_page_with_llm"

"parse_page_with_lvm"

"parse_page_with_agent"

"parse_page_with_layout_agent"

"parse_document_with_llm"

"parse_document_with_lvm"

"parse_document_with_agent"

parsing_instruction: optional string

precise_bounding_box: optional boolean

premium_mode: optional boolean

presentation_out_of_bounds_content: optional boolean

presentation_skip_embedded_data: optional boolean

preserve_layout_alignment_across_pages: optional boolean

preserve_very_small_text: optional boolean

preset: optional string

priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

project_id: optional string

remove_hidden_text: optional boolean

replace_failed_page_mode: optional FailPageMode

Enum for representing the different available page error handling modes.

Accepts one of the following:

"raw_text"

"blank_page"

"error_message"

replace_failed_page_with_error_message_prefix: optional string

replace_failed_page_with_error_message_suffix: optional string

save_images: optional boolean

skip_diagonal_text: optional boolean

specialized_chart_parsing_agentic: optional boolean

specialized_chart_parsing_efficient: optional boolean

specialized_chart_parsing_plus: optional boolean

specialized_image_parsing: optional boolean

spreadsheet_extract_sub_tables: optional boolean

spreadsheet_force_formula_computation: optional boolean

spreadsheet_include_hidden_sheets: optional boolean

strict_mode_buggy_font: optional boolean

strict_mode_image_extraction: optional boolean

strict_mode_image_ocr: optional boolean

strict_mode_reconstruction: optional boolean

structured_output: optional boolean

structured_output_json_schema: optional string

structured_output_json_schema_name: optional string

system_prompt: optional string

system_prompt_append: optional string

take_screenshot: optional boolean

target_pages: optional string

tier: optional string

use_vendor_multimodal_model: optional boolean

user_prompt: optional string

vendor_multimodal_api_key: optional string

vendor_multimodal_model_name: optional string

version: optional string

webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }

The outbound webhook configurations

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more

List of event names to subscribe to

Accepts one of the following:

"extract.pending"

"extract.success"

"extract.error"

"extract.partial_success"

"extract.cancelled"

"parse.pending"

"parse.running"

"parse.success"

"parse.error"

"parse.partial_success"

"parse.cancelled"

"classify.pending"

"classify.success"

"classify.error"

"classify.partial_success"

"classify.cancelled"

"unmapped_event"

webhook_headers: optional map[string]

Custom HTTP headers to include with webhook requests.

webhook_output_format: optional string

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url: optional string

The URL to send webhook notifications to.

webhook_url: optional string

source_id: string

ID of the source

source_type: string

Type of the source (e.g., 'project')

updated_at: string

Last update timestamp

format: date-time

version: string

Version of the configuration

creator: optional string

Creator of the configuration
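
Several LlamaParseParameters fields accept only the enumerated values listed above. A minimal client-side check, assuming the parameters are held in a plain dict, might look like the sketch below. The function name and the error strings are hypothetical; the allowed values are copied from this reference.

```python
PARSE_MODES = {
    "parse_page_without_llm", "parse_page_with_llm", "parse_page_with_lvm",
    "parse_page_with_agent", "parse_page_with_layout_agent",
    "parse_document_with_llm", "parse_document_with_lvm",
    "parse_document_with_agent",
}
PRIORITIES = {"low", "medium", "high", "critical"}
FAIL_PAGE_MODES = {"raw_text", "blank_page", "error_message"}
IMAGE_KINDS = {"screenshot", "embedded", "layout"}


def check_parameters(params: dict) -> list:
    """Return a list of problems found in enum-valued fields (empty if none)."""
    problems = []
    single_valued = {
        "parse_mode": PARSE_MODES,
        "priority": PRIORITIES,
        "replace_failed_page_mode": FAIL_PAGE_MODES,
    }
    for field, allowed in single_valued.items():
        value = params.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: unsupported value {value!r}")
    # images_to_save is an array, so each element is checked separately.
    for kind in params.get("images_to_save") or []:
        if kind not in IMAGE_KINDS:
            problems.append(f"images_to_save: unsupported value {kind!r}")
    return problems


print(check_parameters({"parse_mode": "parse_page_with_llm", "priority": "urgent"}))
```

The server remains the authority on validation; a check like this only catches obvious typos before a request is made.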

ParseConfigurationCreate = object { name, parameters, version, 3 more }

Schema for creating a new parse configuration (API boundary).

name: string

Name of the parse configuration

parameters: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 116 more }

LlamaParseParameters configuration

adaptive_long_table: optional boolean

aggressive_table_extraction: optional boolean

annotate_links: optional boolean

auto_mode: optional boolean

auto_mode_configuration_json: optional string

auto_mode_trigger_on_image_in_page: optional boolean

auto_mode_trigger_on_regexp_in_page: optional string

auto_mode_trigger_on_table_in_page: optional boolean

auto_mode_trigger_on_text_in_page: optional string

azure_openai_api_version: optional string

azure_openai_deployment_name: optional string

azure_openai_endpoint: optional string

azure_openai_key: optional string

bbox_bottom: optional number

bbox_left: optional number

bbox_right: optional number

bbox_top: optional number

bounding_box: optional string

compact_markdown_table: optional boolean

complemental_formatting_instruction: optional string

content_guideline_instruction: optional string

continuous_mode: optional boolean

disable_image_extraction: optional boolean

disable_ocr: optional boolean

disable_reconstruction: optional boolean

do_not_cache: optional boolean

do_not_unroll_columns: optional boolean

enable_cost_optimizer: optional boolean

extract_charts: optional boolean

extract_layout: optional boolean

extract_printed_page_number: optional boolean

fast_mode: optional boolean

formatting_instruction: optional string

gpt4o_api_key: optional string

gpt4o_mode: optional boolean

guess_xlsx_sheet_name: optional boolean

hide_footers: optional boolean

hide_headers: optional boolean

high_res_ocr: optional boolean

html_make_all_elements_visible: optional boolean

html_remove_fixed_elements: optional boolean

html_remove_navigation_elements: optional boolean

http_proxy: optional string

ignore_document_elements_for_layout_detection: optional boolean

images_to_save: optional array of "screenshot" or "embedded" or "layout"

Accepts one of the following:

"screenshot"

"embedded"

"layout"

inline_images_in_markdown: optional boolean

input_s3_path: optional string

input_s3_region: optional string

input_url: optional string

internal_is_screenshot_job: optional boolean

invalidate_cache: optional boolean

is_formatting_instruction: optional boolean

job_timeout_extra_time_per_page_in_seconds: optional number

job_timeout_in_seconds: optional number

keep_page_separator_when_merging_tables: optional boolean

languages: optional array of ParsingLanguages

Accepts one of the following:

"af"

"az"

"bs"

"cs"

"cy"

"da"

"de"

"en"

"es"

"et"

"fr"

"ga"

"hr"

"hu"

"id"

"is"

"it"

"ku"

"la"

"lt"

"lv"

"mi"

"ms"

"mt"

"nl"

"no"

"oc"

"pi"

"pl"

"pt"

"ro"

"rs_latin"

"sk"

"sl"

"sq"

"sv"

"sw"

"tl"

"tr"

"uz"

"vi"

"ar"

"fa"

"ug"

"ur"

"bn"

"as"

"mni"

"ru"

"rs_cyrillic"

"be"

"bg"

"uk"

"mn"

"abq"

"ady"

"kbd"

"ava"

"dar"

"inh"

"che"

"lbe"

"lez"

"tab"

"tjk"

"hi"

"mr"

"ne"

"bh"

"mai"

"ang"

"bho"

"mah"

"sck"

"new"

"gom"

"sa"

"bgc"

"th"

"ch_sim"

"ch_tra"

"ja"

"ko"

"ta"

"te"

"kn"

layout_aware: optional boolean

line_level_bounding_box: optional boolean

markdown_table_multiline_header_separator: optional string

max_pages: optional number

max_pages_enforced: optional number

merge_tables_across_pages_in_markdown: optional boolean

model: optional string

outlined_table_extraction: optional boolean

output_pdf_of_document: optional boolean

output_s3_path_prefix: optional string

output_s3_region: optional string

output_tables_as_HTML: optional boolean

page_error_tolerance: optional number

page_footer_prefix: optional string

page_footer_suffix: optional string

page_header_prefix: optional string

page_header_suffix: optional string

page_prefix: optional string

page_separator: optional string

page_suffix: optional string

parse_mode: optional ParsingMode

Enum for representing the mode of parsing to be used.

Accepts one of the following:

"parse_page_without_llm"

"parse_page_with_llm"

"parse_page_with_lvm"

"parse_page_with_agent"

"parse_page_with_layout_agent"

"parse_document_with_llm"

"parse_document_with_lvm"

"parse_document_with_agent"

parsing_instruction: optional string

precise_bounding_box: optional boolean

premium_mode: optional boolean

presentation_out_of_bounds_content: optional boolean

presentation_skip_embedded_data: optional boolean

preserve_layout_alignment_across_pages: optional boolean

preserve_very_small_text: optional boolean

preset: optional string

priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

project_id: optional string

remove_hidden_text: optional boolean

replace_failed_page_mode: optional FailPageMode

Enum for representing the different available page error handling modes.

Accepts one of the following:

"raw_text"

"blank_page"

"error_message"

replace_failed_page_with_error_message_prefix: optional string

replace_failed_page_with_error_message_suffix: optional string

save_images: optional boolean

skip_diagonal_text: optional boolean

specialized_chart_parsing_agentic: optional boolean

specialized_chart_parsing_efficient: optional boolean

specialized_chart_parsing_plus: optional boolean

specialized_image_parsing: optional boolean

spreadsheet_extract_sub_tables: optional boolean

spreadsheet_force_formula_computation: optional boolean

spreadsheet_include_hidden_sheets: optional boolean

strict_mode_buggy_font: optional boolean

strict_mode_image_extraction: optional boolean

strict_mode_image_ocr: optional boolean

strict_mode_reconstruction: optional boolean

structured_output: optional boolean

structured_output_json_schema: optional string

structured_output_json_schema_name: optional string

system_prompt: optional string

system_prompt_append: optional string

take_screenshot: optional boolean

target_pages: optional string

tier: optional string

use_vendor_multimodal_model: optional boolean

user_prompt: optional string

vendor_multimodal_api_key: optional string

vendor_multimodal_model_name: optional string

version: optional string

webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }

The outbound webhook configurations

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more

List of event names to subscribe to

Accepts one of the following:

"extract.pending"

"extract.success"

"extract.error"

"extract.partial_success"

"extract.cancelled"

"parse.pending"

"parse.running"

"parse.success"

"parse.error"

"parse.partial_success"

"parse.cancelled"

"classify.pending"

"classify.success"

"classify.error"

"classify.partial_success"

"classify.cancelled"

"unmapped_event"

webhook_headers: optional map[string]

Custom HTTP headers to include with webhook requests.

webhook_output_format: optional string

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url: optional string

The URL to send webhook notifications to.

webhook_url: optional string

version: string

Version of the configuration

creator: optional string

Creator of the configuration

source_id: optional string

ID of the source

source_type: optional string

Type of the source (e.g., 'project')
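
To make the required and optional fields of ParseConfigurationCreate concrete, the sketch below assembles a request body as a plain dict. The helper name, the configuration name, the version string "1", and the webhook URL are hypothetical values chosen for illustration; the field names and enum values come from the schema above.

```python
import json
from typing import Optional


def build_create_body(
    name: str,
    parameters: dict,
    version: str,
    creator: Optional[str] = None,
    source_id: Optional[str] = None,
    source_type: Optional[str] = None,
) -> dict:
    """Assemble a ParseConfigurationCreate body, omitting unset optional fields."""
    body = {"name": name, "parameters": parameters, "version": version}
    optional = {"creator": creator, "source_id": source_id, "source_type": source_type}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_create_body(
    name="invoices-default",  # hypothetical configuration name
    parameters={
        "parse_mode": "parse_page_with_llm",
        "languages": ["en", "de"],
        "webhook_configurations": [
            {
                "webhook_url": "https://example.com/hooks/parse",  # placeholder URL
                "webhook_events": ["parse.success", "parse.error"],
                "webhook_output_format": "json",
            }
        ],
    },
    version="1",  # the expected version format is not specified here
    source_type="project",
)
print(json.dumps(body, indent=2))
```

Leaving optional keys out, rather than sending null, is a design choice of this sketch and not a documented requirement.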

ParseConfigurationQueryResponse = object { items, next_page_token, total_size }

Response schema for paginated parse configuration queries.

items: array of ParseConfiguration { id, created_at, name, 6 more }

The list of items.

id: string

Unique identifier for the parse configuration

created_at: string

Creation timestamp

format: date-time

name: string

Name of the parse configuration

parameters: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 116 more }

LlamaParseParameters configuration

adaptive_long_table: optional boolean

aggressive_table_extraction: optional boolean

annotate_links: optional boolean

auto_mode: optional boolean

auto_mode_configuration_json: optional string

auto_mode_trigger_on_image_in_page: optional boolean

auto_mode_trigger_on_regexp_in_page: optional string

auto_mode_trigger_on_table_in_page: optional boolean

auto_mode_trigger_on_text_in_page: optional string

azure_openai_api_version: optional string

azure_openai_deployment_name: optional string

azure_openai_endpoint: optional string

azure_openai_key: optional string

bbox_bottom: optional number

bbox_left: optional number

bbox_right: optional number

bbox_top: optional number

bounding_box: optional string

compact_markdown_table: optional boolean

complemental_formatting_instruction: optional string

content_guideline_instruction: optional string

continuous_mode: optional boolean

disable_image_extraction: optional boolean

disable_ocr: optional boolean

disable_reconstruction: optional boolean

do_not_cache: optional boolean

do_not_unroll_columns: optional boolean

enable_cost_optimizer: optional boolean

extract_charts: optional boolean

extract_layout: optional boolean

extract_printed_page_number: optional boolean

fast_mode: optional boolean

formatting_instruction: optional string

gpt4o_api_key: optional string

gpt4o_mode: optional boolean

guess_xlsx_sheet_name: optional boolean

hide_footers: optional boolean

hide_headers: optional boolean

high_res_ocr: optional boolean

html_make_all_elements_visible: optional boolean

html_remove_fixed_elements: optional boolean

html_remove_navigation_elements: optional boolean

http_proxy: optional string

ignore_document_elements_for_layout_detection: optional boolean

images_to_save: optional array of "screenshot" or "embedded" or "layout"

Accepts one of the following:

"screenshot"

"embedded"

"layout"

inline_images_in_markdown: optional boolean

input_s3_path: optional string

input_s3_region: optional string

input_url: optional string

internal_is_screenshot_job: optional boolean

invalidate_cache: optional boolean

is_formatting_instruction: optional boolean

job_timeout_extra_time_per_page_in_seconds: optional number

job_timeout_in_seconds: optional number

keep_page_separator_when_merging_tables: optional boolean

languages: optional array of ParsingLanguages

Accepts one of the following:

"af"

"az"

"bs"

"cs"

"cy"

"da"

"de"

"en"

"es"

"et"

"fr"

"ga"

"hr"

"hu"

"id"

"is"

"it"

"ku"

"la"

"lt"

"lv"

"mi"

"ms"

"mt"

"nl"

"no"

"oc"

"pi"

"pl"

"pt"

"ro"

"rs_latin"

"sk"

"sl"

"sq"

"sv"

"sw"

"tl"

"tr"

"uz"

"vi"

"ar"

"fa"

"ug"

"ur"

"bn"

"as"

"mni"

"ru"

"rs_cyrillic"

"be"

"bg"

"uk"

"mn"

"abq"

"ady"

"kbd"

"ava"

"dar"

"inh"

"che"

"lbe"

"lez"

"tab"

"tjk"

"hi"

"mr"

"ne"

"bh"

"mai"

"ang"

"bho"

"mah"

"sck"

"new"

"gom"

"sa"

"bgc"

"th"

"ch_sim"

"ch_tra"

"ja"

"ko"

"ta"

"te"

"kn"

layout_aware: optional boolean

line_level_bounding_box: optional boolean

markdown_table_multiline_header_separator: optional string

max_pages: optional number

max_pages_enforced: optional number

merge_tables_across_pages_in_markdown: optional boolean

model: optional string

outlined_table_extraction: optional boolean

output_pdf_of_document: optional boolean

output_s3_path_prefix: optional string

output_s3_region: optional string

output_tables_as_HTML: optional boolean

page_error_tolerance: optional number

page_footer_prefix: optional string

page_footer_suffix: optional string

page_header_prefix: optional string

page_header_suffix: optional string

page_prefix: optional string

page_separator: optional string

page_suffix: optional string

parse_mode: optional ParsingMode

Enum for representing the mode of parsing to be used.

Accepts one of the following:

"parse_page_without_llm"

"parse_page_with_llm"

"parse_page_with_lvm"

"parse_page_with_agent"

"parse_page_with_layout_agent"

"parse_document_with_llm"

"parse_document_with_lvm"

"parse_document_with_agent"

parsing_instruction: optional string

precise_bounding_box: optional boolean

premium_mode: optional boolean

presentation_out_of_bounds_content: optional boolean

presentation_skip_embedded_data: optional boolean

preserve_layout_alignment_across_pages: optional boolean

preserve_very_small_text: optional boolean

preset: optional string

priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

project_id: optional string

remove_hidden_text: optional boolean

replace_failed_page_mode: optional FailPageMode

Enum for representing the different available page error handling modes.

Accepts one of the following:

"raw_text"

"blank_page"

"error_message"

replace_failed_page_with_error_message_prefix: optional string

replace_failed_page_with_error_message_suffix: optional string

save_images: optional boolean

skip_diagonal_text: optional boolean

specialized_chart_parsing_agentic: optional boolean

specialized_chart_parsing_efficient: optional boolean

specialized_chart_parsing_plus: optional boolean

specialized_image_parsing: optional boolean

spreadsheet_extract_sub_tables: optional boolean

spreadsheet_force_formula_computation: optional boolean

spreadsheet_include_hidden_sheets: optional boolean

strict_mode_buggy_font: optional boolean

strict_mode_image_extraction: optional boolean

strict_mode_image_ocr: optional boolean

strict_mode_reconstruction: optional boolean

structured_output: optional boolean

structured_output_json_schema: optional string

structured_output_json_schema_name: optional string

system_prompt: optional string

system_prompt_append: optional string

take_screenshot: optional boolean

target_pages: optional string

tier: optional string

use_vendor_multimodal_model: optional boolean

user_prompt: optional string

vendor_multimodal_api_key: optional string

vendor_multimodal_model_name: optional string

version: optional string

webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }

The outbound webhook configurations

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more

List of event names to subscribe to

Accepts one of the following:

"extract.pending"

"extract.success"

"extract.error"

"extract.partial_success"

"extract.cancelled"

"parse.pending"

"parse.running"

"parse.success"

"parse.error"

"parse.partial_success"

"parse.cancelled"

"classify.pending"

"classify.success"

"classify.error"

"classify.partial_success"

"classify.cancelled"

"unmapped_event"

webhook_headers: optional map[string]

Custom HTTP headers to include with webhook requests.

webhook_output_format: optional string

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url: optional string

The URL to send webhook notifications to.

webhook_url: optional string

source_id: string

ID of the source

source_type: string

Type of the source (e.g., 'project')

updated_at: string

Last update timestamp

format: date-time

version: string

Version of the configuration

creator: optional string

Creator of the configuration

next_page_token: optional string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

total_size: optional number

The total number of items available. This is only populated when specifically requested. The value may be an estimate and can be used for display purposes only.
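
The next_page_token field implies a simple loop: request a page, consume its items, and repeat with the returned token until the token is absent. The sketch below shows that loop against a stubbed fetch function. `fetch_page` is a hypothetical callable standing in for whatever performs the HTTP call, and the two stub pages are invented test data.

```python
from typing import Callable, Iterator, Optional


def iter_parse_configurations(
    fetch_page: Callable[[Optional[str]], dict],
) -> Iterator[dict]:
    """Yield every item across pages, following next_page_token."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("next_page_token")
        if not token:
            # An omitted token means there are no subsequent pages.
            break


# Stub standing in for the real query call (invented data).
_PAGES = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "next_page_token": "t1"},
    "t1": {"items": [{"id": "c"}]},
}


def fake_fetch(page_token: Optional[str]) -> dict:
    return _PAGES[page_token]


ids = [config["id"] for config in iter_parse_configurations(fake_fetch)]
print(ids)
```

Because total_size may be an estimate, the loop relies only on the token to decide when to stop.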

Beta Sheets

Get Spreadsheet Job

GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}

Get Result Region

GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}/regions/{region_id}/result/{region_type}

Delete Spreadsheet Job

DELETE /api/v1/beta/sheets/jobs/{spreadsheet_job_id}
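
The Sheets routes take one to three path parameters. A small sketch of path construction with percent-encoding follows; the helper names and the sample identifiers (including the region type "table") are hypothetical, and only the path templates come from this reference.

```python
from urllib.parse import quote


def _segment(value: str) -> str:
    # Encode everything, including "/", so a value stays a single path segment.
    return quote(value, safe="")


def job_path(spreadsheet_job_id: str) -> str:
    """Path used by both the GET and DELETE job routes."""
    return f"/api/v1/beta/sheets/jobs/{_segment(spreadsheet_job_id)}"


def region_result_path(spreadsheet_job_id: str, region_id: str, region_type: str) -> str:
    """Path for fetching the result of one extracted region."""
    return (
        f"{job_path(spreadsheet_job_id)}"
        f"/regions/{_segment(region_id)}/result/{_segment(region_type)}"
    )


print(job_path("job 1"))
print(region_result_path("job-1", "region-7", "table"))
```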

Models

SheetsJob = object { id, config, created_at, 10 more }

A spreadsheet parsing job

id: string

The ID of the job

config: SheetsParsingConfig { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 5 more }

Configuration for the parsing job

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: optional string

Optional specialization mode for domain-specific extraction. Supported values: 'financial-standard', 'financial-enhanced', 'financial-precise'. Default None uses the general-purpose pipeline.

table_merge_sensitivity: optional "strong" or "weak"

Influences how likely similar-looking regions are merged into a single table. Useful for spreadsheets that either have sparse tables (strong merging) or many distinct tables close together (weak merging).

Accepts one of the following:

"strong"

"weak"

use_experimental_processing: optional boolean

Enables experimental processing. Accuracy may be impacted.

created_at: string

When the job was created

file_id: string

The ID of the input file

format: uuid

project_id: string

The ID of the project

format: uuid

status: StatusEnum

The status of the parsing job

Accepts one of the following:

"PENDING"

"SUCCESS"

"ERROR"

"PARTIAL_SUCCESS"

"CANCELLED"

updated_at: string

When the job was last updated

user_id: string

The ID of the user

errors: optional array of string

Any errors encountered

file: optional File { id, name, project_id, 11 more } (deprecated)

Schema for a file.

id: string

Unique identifier

format: uuid

name: string

project_id: string

The ID of the project that the file belongs to

format: uuid

created_at: optional string

Creation datetime

format: date-time

data_source_id: optional string

The ID of the data source that the file belongs to

format: uuid

expires_at: optional string

The expiration date for the file. Files past this date can be deleted.

format: date-time

external_file_id: optional string

The ID of the file in the external system

file_size: optional number

Size of the file in bytes

minimum: 0

file_type: optional string

File type (e.g. pdf, docx, etc.)

maxLength: 3000

minLength: 1

last_modified_at: optional string

The last modified time of the file

format: date-time

permission_info: optional map[map[unknown] or array of unknown or string or 2 more]

Permission information for the file

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean

purpose: optional string

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info: optional map[map[unknown] or array of unknown or string or 2 more]

Resource information for the file

Accepts one of the following:

UnionMember0 = map[unknown]

UnionMember1 = array of unknown

UnionMember2 = string

UnionMember3 = number

UnionMember4 = boolean

updated_at: optional string

Update datetime

format: date-time

regions: optional array of object { location, region_type, sheet_name, 3 more }

All extracted regions (populated when job is complete)

location: string

Location of the region in the spreadsheet

region_type: string

Type of the extracted region

sheet_name: string

Worksheet name where region was found

description: optional string

Generated description for the region

region_id: optional string

Unique identifier for this region within the file

title: optional string

Generated title for the region

success: optional boolean

Whether the job completed successfully

worksheet_metadata: optional array of object { sheet_name, description, title }

Metadata for each processed worksheet (populated when job is complete)

sheet_name: string

Name of the worksheet

description: optional string

Generated description of the worksheet

title: optional string

Generated title for the worksheet
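
Since regions and worksheet_metadata are populated only when the job is complete, a client would typically check status before reading them. The sketch below assumes a SheetsJob decoded into a dict and treats every status other than "PENDING" as finished, which is an interpretation of the status list above rather than a documented rule. The helper names and the sample job are hypothetical.

```python
from collections import defaultdict

# Assumption: all listed statuses except PENDING are terminal.
FINISHED_STATUSES = {"SUCCESS", "ERROR", "PARTIAL_SUCCESS", "CANCELLED"}


def is_finished(job: dict) -> bool:
    return job["status"] in FINISHED_STATUSES


def regions_by_sheet(job: dict) -> dict:
    """Group extracted regions by the worksheet they were found in."""
    grouped = defaultdict(list)
    for region in job.get("regions") or []:
        grouped[region["sheet_name"]].append(region)
    return dict(grouped)


# Invented sample data shaped like the SheetsJob fields above.
sample_job = {
    "id": "job-1",
    "status": "SUCCESS",
    "regions": [
        {"location": "A1:D20", "region_type": "table", "sheet_name": "Q1"},
        {"location": "F1:H5", "region_type": "table", "sheet_name": "Q1"},
        {"location": "A1:B9", "region_type": "table", "sheet_name": "Q2"},
    ],
}

if is_finished(sample_job):
    for sheet, regions in regions_by_sheet(sample_job).items():
        print(sheet, [region["location"] for region in regions])
```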

SheetsParsingConfig = object { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 5 more }

Configuration for spreadsheet parsing and region extraction

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

specialization: optional string

Optional specialization mode for domain-specific extraction. Supported values: 'financial-standard', 'financial-enhanced', 'financial-precise'. Default None uses the general-purpose pipeline.

table_merge_sensitivity: optional "strong" or "weak"

Accepts one of the following:

"strong"

"weak"

use_experimental_processing: optional boolean

Enables experimental processing. Accuracy may be impacted.
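
A SheetsParsingConfig can be sanity-checked locally before submission. The sketch below validates the two enumerated fields and applies a deliberately simplified A1-notation pattern to extraction_range. The pattern accepts only forms such as "B2" or "A1:D20" and is an assumption of this example; the service may accept other forms. The function name and messages are hypothetical.

```python
import re

SPECIALIZATIONS = {"financial-standard", "financial-enhanced", "financial-precise"}
MERGE_SENSITIVITIES = {"strong", "weak"}
# Simplified A1 pattern: a cell ("B2") or a cell range ("A1:D20").
A1_RANGE = re.compile(r"^[A-Za-z]{1,3}[1-9][0-9]*(:[A-Za-z]{1,3}[1-9][0-9]*)?$")


def check_sheets_config(config: dict) -> list:
    """Return a list of problems found in the config (empty if none)."""
    problems = []
    extraction_range = config.get("extraction_range")
    if extraction_range is not None and not A1_RANGE.match(extraction_range):
        problems.append(f"extraction_range: not simple A1 notation: {extraction_range!r}")
    specialization = config.get("specialization")
    if specialization is not None and specialization not in SPECIALIZATIONS:
        problems.append(f"specialization: unsupported value {specialization!r}")
    sensitivity = config.get("table_merge_sensitivity")
    if sensitivity is not None and sensitivity not in MERGE_SENSITIVITIES:
        problems.append(f"table_merge_sensitivity: unsupported value {sensitivity!r}")
    return problems


print(check_sheets_config({"extraction_range": "A1:", "specialization": "legal"}))
```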

Beta Batch Job Items

List Batch Job Items

GET /api/v1/beta/batch-processing/{job_id}/items

Get Item Processing Results

GET /api/v1/beta/batch-processing/items/{item_id}/processing-results

Models

SplitCategory = object { name, description }

Category definition for document splitting.

name: string

Name of the category.

maxLength: 200

minLength: 1

description: optional string

Optional description of what content belongs in this category.

maxLength: 2000

minLength: 1

SplitDocumentInput = object { type, value }

Document input specification.

type: string

Type of document input. Valid values are: file_id

value: string

Document identifier.

SplitResultResponse = object { segments }

Result of a completed split job.

segments: array of SplitSegmentResponse { category, confidence_category, pages }

List of document segments.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.

SplitSegmentResponse = object { category, confidence_category, pages }

A segment of the split document.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.
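
A SplitResultResponse is a flat list of segments, so a common post-processing step is to collect pages per category and optionally drop low-confidence segments. The sketch below assumes the response is decoded into a dict. The function name, the ordering of confidence levels (low < medium < high), and the sample data are assumptions of this example.

```python
# Assumed ordering of the documented confidence levels.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def pages_by_category(result: dict, min_confidence: str = "low") -> dict:
    """Map each category to its 1-indexed pages, filtered by confidence."""
    threshold = CONFIDENCE_RANK[min_confidence]
    grouped = {}
    for segment in result["segments"]:
        if CONFIDENCE_RANK[segment["confidence_category"]] >= threshold:
            grouped.setdefault(segment["category"], []).extend(segment["pages"])
    return grouped


# Invented sample shaped like SplitResultResponse.
sample_result = {
    "segments": [
        {"category": "invoice", "confidence_category": "high", "pages": [1, 2]},
        {"category": "receipt", "confidence_category": "low", "pages": [3]},
        {"category": "invoice", "confidence_category": "medium", "pages": [4]},
    ]
}

print(pages_by_category(sample_result))
print(pages_by_category(sample_result, min_confidence="medium"))
```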