
Beta

Beta / Agent Data

Get Agent Data
GET /api/v1/beta/agent-data/{item_id}
Update Agent Data
PUT /api/v1/beta/agent-data/{item_id}
Delete Agent Data
DELETE /api/v1/beta/agent-data/{item_id}
Create Agent Data
POST /api/v1/beta/agent-data
Search Agent Data
POST /api/v1/beta/agent-data/:search
Aggregate Agent Data
POST /api/v1/beta/agent-data/:aggregate
Delete Agent Data By Query
POST /api/v1/beta/agent-data/:delete
Models
AgentData = object { data, deployment_name, id, 4 more }

API Result for a single agent data item

data: map[unknown]
deployment_name: string
id: optional string
collection: optional string
created_at: optional string
project_id: optional string
updated_at: optional string
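
As a minimal sketch, the AgentData model above could be mirrored on the client side as a Python dataclass. The class, the `from_dict` helper, and the sample values are illustrative assumptions, not part of any official SDK; only the field names and their optionality come from the model listing.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AgentData:
    """Client-side mirror of the AgentData model (hypothetical helper class)."""

    # Required fields per the model listing.
    data: Dict[str, Any]
    deployment_name: str
    # Optional fields per the model listing.
    id: Optional[str] = None
    collection: Optional[str] = None
    created_at: Optional[str] = None
    project_id: Optional[str] = None
    updated_at: Optional[str] = None

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "AgentData":
        """Build an AgentData from a decoded JSON object, ignoring unknown keys."""
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in payload.items() if k in known})


# Sample payload with made-up values, for illustration only.
item = AgentData.from_dict(
    {"data": {"score": 1}, "deployment_name": "my-deployment", "extra": "ignored"}
)
```

Ignoring unknown keys keeps the sketch tolerant of fields added while the API is in beta; a stricter client might prefer to raise instead.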

Beta / Parse Configurations

Create Parse Configuration
POST /api/v1/beta/parse-configurations
List Parse Configurations
GET /api/v1/beta/parse-configurations
Get Parse Configuration
GET /api/v1/beta/parse-configurations/{config_id}
Update Parse Configuration
PUT /api/v1/beta/parse-configurations/{config_id}
Delete Parse Configuration
DELETE /api/v1/beta/parse-configurations/{config_id}
Models
ParseConfiguration = object { id, created_at, name, 6 more }

Parse configuration schema.

id: string

Unique identifier for the parse configuration

created_at: string

Creation timestamp

format: date-time
name: string

Name of the parse configuration

parameters: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }

LlamaParseParameters configuration

adaptive_long_table: optional boolean
aggressive_table_extraction: optional boolean
auto_mode: optional boolean
auto_mode_configuration_json: optional string
auto_mode_trigger_on_image_in_page: optional boolean
auto_mode_trigger_on_regexp_in_page: optional string
auto_mode_trigger_on_table_in_page: optional boolean
auto_mode_trigger_on_text_in_page: optional string
azure_openai_api_version: optional string
azure_openai_deployment_name: optional string
azure_openai_endpoint: optional string
azure_openai_key: optional string
bbox_bottom: optional number
bbox_left: optional number
bbox_right: optional number
bbox_top: optional number
bounding_box: optional string
compact_markdown_table: optional boolean
complemental_formatting_instruction: optional string
content_guideline_instruction: optional string
continuous_mode: optional boolean
disable_image_extraction: optional boolean
disable_ocr: optional boolean
disable_reconstruction: optional boolean
do_not_cache: optional boolean
do_not_unroll_columns: optional boolean
enable_cost_optimizer: optional boolean
extract_charts: optional boolean
extract_layout: optional boolean
extract_printed_page_number: optional boolean
fast_mode: optional boolean
formatting_instruction: optional string
gpt4o_api_key: optional string
gpt4o_mode: optional boolean
guess_xlsx_sheet_name: optional boolean
hide_footers: optional boolean
hide_headers: optional boolean
high_res_ocr: optional boolean
html_make_all_elements_visible: optional boolean
html_remove_fixed_elements: optional boolean
html_remove_navigation_elements: optional boolean
http_proxy: optional string
ignore_document_elements_for_layout_detection: optional boolean
images_to_save: optional array of "screenshot" or "embedded" or "layout"
Accepts one of the following:
"screenshot"
"embedded"
"layout"
inline_images_in_markdown: optional boolean
input_s3_path: optional string
input_s3_region: optional string
input_url: optional string
internal_is_screenshot_job: optional boolean
invalidate_cache: optional boolean
is_formatting_instruction: optional boolean
job_timeout_extra_time_per_page_in_seconds: optional number
job_timeout_in_seconds: optional number
keep_page_separator_when_merging_tables: optional boolean
languages: optional array of ParsingLanguages
Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
layout_aware: optional boolean
line_level_bounding_box: optional boolean
markdown_table_multiline_header_separator: optional string
max_pages: optional number
max_pages_enforced: optional number
merge_tables_across_pages_in_markdown: optional boolean
model: optional string
outlined_table_extraction: optional boolean
output_pdf_of_document: optional boolean
output_s3_path_prefix: optional string
output_s3_region: optional string
output_tables_as_HTML: optional boolean
page_error_tolerance: optional number
page_header_prefix: optional string
page_header_suffix: optional string
page_prefix: optional string
page_separator: optional string
page_suffix: optional string
parse_mode: optional ParsingMode

Enum for representing the mode of parsing to be used.

Accepts one of the following:
"parse_page_without_llm"
"parse_page_with_llm"
"parse_page_with_lvm"
"parse_page_with_agent"
"parse_page_with_layout_agent"
"parse_document_with_llm"
"parse_document_with_lvm"
"parse_document_with_agent"
parsing_instruction: optional string
precise_bounding_box: optional boolean
premium_mode: optional boolean
presentation_out_of_bounds_content: optional boolean
presentation_skip_embedded_data: optional boolean
preserve_layout_alignment_across_pages: optional boolean
preserve_very_small_text: optional boolean
preset: optional string
priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
project_id: optional string
remove_hidden_text: optional boolean
replace_failed_page_mode: optional FailPageMode

Enum for representing the different available page error handling modes.

Accepts one of the following:
"raw_text"
"blank_page"
"error_message"
replace_failed_page_with_error_message_prefix: optional string
replace_failed_page_with_error_message_suffix: optional string
save_images: optional boolean
skip_diagonal_text: optional boolean
specialized_chart_parsing_agentic: optional boolean
specialized_chart_parsing_efficient: optional boolean
specialized_chart_parsing_plus: optional boolean
specialized_image_parsing: optional boolean
spreadsheet_extract_sub_tables: optional boolean
spreadsheet_force_formula_computation: optional boolean
strict_mode_buggy_font: optional boolean
strict_mode_image_extraction: optional boolean
strict_mode_image_ocr: optional boolean
strict_mode_reconstruction: optional boolean
structured_output: optional boolean
structured_output_json_schema: optional string
structured_output_json_schema_name: optional string
system_prompt: optional string
system_prompt_append: optional string
take_screenshot: optional boolean
target_pages: optional string
tier: optional string
use_vendor_multimodal_model: optional boolean
user_prompt: optional string
vendor_multimodal_api_key: optional string
vendor_multimodal_model_name: optional string
version: optional string
webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }

The outbound webhook configurations

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 13 more

List of event names to subscribe to

Accepts one of the following:
"extract.pending"
"extract.success"
"extract.error"
"extract.partial_success"
"extract.cancelled"
"parse.pending"
"parse.success"
"parse.error"
"parse.partial_success"
"parse.cancelled"
"classify.pending"
"classify.success"
"classify.error"
"classify.partial_success"
"classify.cancelled"
"unmapped_event"
webhook_headers: optional map[string]

Custom HTTP headers to include with webhook requests.

webhook_output_format: optional string

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url: optional string

The URL to send webhook notifications to.

webhook_url: optional string
source_id: string

ID of the source

source_type: string

Type of the source (e.g., 'project')

updated_at: string

Last update timestamp

format: date-time
version: string

Version of the configuration

creator: optional string

Creator of the configuration
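
The `webhook_configurations` entries nested in `parameters` above have a small, fixed shape, so they can be assembled and checked before sending. The following is a sketch only: `make_webhook_configuration` is a hypothetical helper, the URL is a placeholder, and rejecting unknown values client-side is an assumption about what is useful, not documented server behavior. The event names and output formats are copied from the listing.

```python
from typing import Dict, List, Optional

# Event names copied from the webhook_events listing (16 in total).
WEBHOOK_EVENTS = frozenset(
    f"{job}.{state}"
    for job in ("extract", "parse", "classify")
    for state in ("pending", "success", "error", "partial_success", "cancelled")
) | {"unmapped_event"}

# Formats the webhook_output_format description names as currently supported.
WEBHOOK_OUTPUT_FORMATS = ("string", "json")


def make_webhook_configuration(
    url: str,
    events: List[str],
    headers: Optional[Dict[str, str]] = None,
    output_format: str = "string",
) -> Dict[str, object]:
    """Return a WebhookConfiguration-shaped dict (hypothetical helper)."""
    unknown = sorted(set(events) - WEBHOOK_EVENTS)
    if unknown:
        raise ValueError(f"unknown webhook events: {unknown}")
    if output_format not in WEBHOOK_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    config: Dict[str, object] = {
        "webhook_url": url,
        "webhook_events": list(events),
        "webhook_output_format": output_format,
    }
    if headers:
        config["webhook_headers"] = dict(headers)
    return config


# Placeholder URL and header, for illustration only.
hook = make_webhook_configuration(
    "https://example.com/hook",
    ["parse.success", "parse.error"],
    headers={"X-Example": "1"},
)
```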

ParseConfigurationCreate = object { name, parameters, version, 3 more }

Schema for creating a new parse configuration (API boundary).

name: string

Name of the parse configuration

parameters: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }

LlamaParseParameters configuration

adaptive_long_table: optional boolean
aggressive_table_extraction: optional boolean
auto_mode: optional boolean
auto_mode_configuration_json: optional string
auto_mode_trigger_on_image_in_page: optional boolean
auto_mode_trigger_on_regexp_in_page: optional string
auto_mode_trigger_on_table_in_page: optional boolean
auto_mode_trigger_on_text_in_page: optional string
azure_openai_api_version: optional string
azure_openai_deployment_name: optional string
azure_openai_endpoint: optional string
azure_openai_key: optional string
bbox_bottom: optional number
bbox_left: optional number
bbox_right: optional number
bbox_top: optional number
bounding_box: optional string
compact_markdown_table: optional boolean
complemental_formatting_instruction: optional string
content_guideline_instruction: optional string
continuous_mode: optional boolean
disable_image_extraction: optional boolean
disable_ocr: optional boolean
disable_reconstruction: optional boolean
do_not_cache: optional boolean
do_not_unroll_columns: optional boolean
enable_cost_optimizer: optional boolean
extract_charts: optional boolean
extract_layout: optional boolean
extract_printed_page_number: optional boolean
fast_mode: optional boolean
formatting_instruction: optional string
gpt4o_api_key: optional string
gpt4o_mode: optional boolean
guess_xlsx_sheet_name: optional boolean
hide_footers: optional boolean
hide_headers: optional boolean
high_res_ocr: optional boolean
html_make_all_elements_visible: optional boolean
html_remove_fixed_elements: optional boolean
html_remove_navigation_elements: optional boolean
http_proxy: optional string
ignore_document_elements_for_layout_detection: optional boolean
images_to_save: optional array of "screenshot" or "embedded" or "layout"
Accepts one of the following:
"screenshot"
"embedded"
"layout"
inline_images_in_markdown: optional boolean
input_s3_path: optional string
input_s3_region: optional string
input_url: optional string
internal_is_screenshot_job: optional boolean
invalidate_cache: optional boolean
is_formatting_instruction: optional boolean
job_timeout_extra_time_per_page_in_seconds: optional number
job_timeout_in_seconds: optional number
keep_page_separator_when_merging_tables: optional boolean
languages: optional array of ParsingLanguages
Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
layout_aware: optional boolean
line_level_bounding_box: optional boolean
markdown_table_multiline_header_separator: optional string
max_pages: optional number
max_pages_enforced: optional number
merge_tables_across_pages_in_markdown: optional boolean
model: optional string
outlined_table_extraction: optional boolean
output_pdf_of_document: optional boolean
output_s3_path_prefix: optional string
output_s3_region: optional string
output_tables_as_HTML: optional boolean
page_error_tolerance: optional number
page_header_prefix: optional string
page_header_suffix: optional string
page_prefix: optional string
page_separator: optional string
page_suffix: optional string
parse_mode: optional ParsingMode

Enum for representing the mode of parsing to be used.

Accepts one of the following:
"parse_page_without_llm"
"parse_page_with_llm"
"parse_page_with_lvm"
"parse_page_with_agent"
"parse_page_with_layout_agent"
"parse_document_with_llm"
"parse_document_with_lvm"
"parse_document_with_agent"
parsing_instruction: optional string
precise_bounding_box: optional boolean
premium_mode: optional boolean
presentation_out_of_bounds_content: optional boolean
presentation_skip_embedded_data: optional boolean
preserve_layout_alignment_across_pages: optional boolean
preserve_very_small_text: optional boolean
preset: optional string
priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
project_id: optional string
remove_hidden_text: optional boolean
replace_failed_page_mode: optional FailPageMode

Enum for representing the different available page error handling modes.

Accepts one of the following:
"raw_text"
"blank_page"
"error_message"
replace_failed_page_with_error_message_prefix: optional string
replace_failed_page_with_error_message_suffix: optional string
save_images: optional boolean
skip_diagonal_text: optional boolean
specialized_chart_parsing_agentic: optional boolean
specialized_chart_parsing_efficient: optional boolean
specialized_chart_parsing_plus: optional boolean
specialized_image_parsing: optional boolean
spreadsheet_extract_sub_tables: optional boolean
spreadsheet_force_formula_computation: optional boolean
strict_mode_buggy_font: optional boolean
strict_mode_image_extraction: optional boolean
strict_mode_image_ocr: optional boolean
strict_mode_reconstruction: optional boolean
structured_output: optional boolean
structured_output_json_schema: optional string
structured_output_json_schema_name: optional string
system_prompt: optional string
system_prompt_append: optional string
take_screenshot: optional boolean
target_pages: optional string
tier: optional string
use_vendor_multimodal_model: optional boolean
user_prompt: optional string
vendor_multimodal_api_key: optional string
vendor_multimodal_model_name: optional string
version: optional string
webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }

The outbound webhook configurations

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 13 more

List of event names to subscribe to

Accepts one of the following:
"extract.pending"
"extract.success"
"extract.error"
"extract.partial_success"
"extract.cancelled"
"parse.pending"
"parse.success"
"parse.error"
"parse.partial_success"
"parse.cancelled"
"classify.pending"
"classify.success"
"classify.error"
"classify.partial_success"
"classify.cancelled"
"unmapped_event"
webhook_headers: optional map[string]

Custom HTTP headers to include with webhook requests.

webhook_output_format: optional string

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url: optional string

The URL to send webhook notifications to.

webhook_url: optional string
version: string

Version of the configuration

creator: optional string

Creator of the configuration

source_id: optional string

ID of the source

source_type: optional string

Type of the source (e.g., 'project')
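
A request body matching ParseConfigurationCreate might be assembled as in the sketch below. The function name and the sample values are assumptions for illustration; the required fields (`name`, `parameters`, `version`), the optional fields, and the enum values are taken from the schema above. Checking enums before sending is a convenience of this sketch, not documented API behavior.

```python
from typing import Any, Dict, Optional

# Enum values copied from the LlamaParseParameters listing.
ALLOWED_VALUES = {
    "parse_mode": (
        "parse_page_without_llm",
        "parse_page_with_llm",
        "parse_page_with_lvm",
        "parse_page_with_agent",
        "parse_page_with_layout_agent",
        "parse_document_with_llm",
        "parse_document_with_lvm",
        "parse_document_with_agent",
    ),
    "priority": ("low", "medium", "high", "critical"),
    "replace_failed_page_mode": ("raw_text", "blank_page", "error_message"),
}


def build_parse_configuration_create(
    name: str,
    parameters: Dict[str, Any],
    version: str,
    creator: Optional[str] = None,
    source_id: Optional[str] = None,
    source_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a ParseConfigurationCreate-shaped dict (hypothetical helper)."""
    for field, allowed in ALLOWED_VALUES.items():
        value = parameters.get(field)
        if value is not None and value not in allowed:
            raise ValueError(f"{field}={value!r} is not one of {allowed}")
    body: Dict[str, Any] = {
        "name": name,
        "parameters": dict(parameters),
        "version": version,
    }
    # Optional fields are included only when supplied.
    optional = {"creator": creator, "source_id": source_id, "source_type": source_type}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# Made-up values, for illustration only.
body = build_parse_configuration_create(
    name="invoices",
    parameters={"parse_mode": "parse_page_with_llm", "priority": "low"},
    version="1",
    source_type="project",
)
```

The resulting dict would be serialized as JSON and sent to `POST /api/v1/beta/parse-configurations`; authentication and transport are left out because this page does not describe them.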

ParseConfigurationQueryResponse = object { items, next_page_token, total_size }

Response schema for paginated parse configuration queries.

items: array of ParseConfiguration { id, created_at, name, 6 more }

The list of items.

id: string

Unique identifier for the parse configuration

created_at: string

Creation timestamp

format: date-time
name: string

Name of the parse configuration

parameters: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }

LlamaParseParameters configuration

adaptive_long_table: optional boolean
aggressive_table_extraction: optional boolean
auto_mode: optional boolean
auto_mode_configuration_json: optional string
auto_mode_trigger_on_image_in_page: optional boolean
auto_mode_trigger_on_regexp_in_page: optional string
auto_mode_trigger_on_table_in_page: optional boolean
auto_mode_trigger_on_text_in_page: optional string
azure_openai_api_version: optional string
azure_openai_deployment_name: optional string
azure_openai_endpoint: optional string
azure_openai_key: optional string
bbox_bottom: optional number
bbox_left: optional number
bbox_right: optional number
bbox_top: optional number
bounding_box: optional string
compact_markdown_table: optional boolean
complemental_formatting_instruction: optional string
content_guideline_instruction: optional string
continuous_mode: optional boolean
disable_image_extraction: optional boolean
disable_ocr: optional boolean
disable_reconstruction: optional boolean
do_not_cache: optional boolean
do_not_unroll_columns: optional boolean
enable_cost_optimizer: optional boolean
extract_charts: optional boolean
extract_layout: optional boolean
extract_printed_page_number: optional boolean
fast_mode: optional boolean
formatting_instruction: optional string
gpt4o_api_key: optional string
gpt4o_mode: optional boolean
guess_xlsx_sheet_name: optional boolean
hide_footers: optional boolean
hide_headers: optional boolean
high_res_ocr: optional boolean
html_make_all_elements_visible: optional boolean
html_remove_fixed_elements: optional boolean
html_remove_navigation_elements: optional boolean
http_proxy: optional string
ignore_document_elements_for_layout_detection: optional boolean
images_to_save: optional array of "screenshot" or "embedded" or "layout"
Accepts one of the following:
"screenshot"
"embedded"
"layout"
inline_images_in_markdown: optional boolean
input_s3_path: optional string
input_s3_region: optional string
input_url: optional string
internal_is_screenshot_job: optional boolean
invalidate_cache: optional boolean
is_formatting_instruction: optional boolean
job_timeout_extra_time_per_page_in_seconds: optional number
job_timeout_in_seconds: optional number
keep_page_separator_when_merging_tables: optional boolean
languages: optional array of ParsingLanguages
Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
layout_aware: optional boolean
line_level_bounding_box: optional boolean
markdown_table_multiline_header_separator: optional string
max_pages: optional number
max_pages_enforced: optional number
merge_tables_across_pages_in_markdown: optional boolean
model: optional string
outlined_table_extraction: optional boolean
output_pdf_of_document: optional boolean
output_s3_path_prefix: optional string
output_s3_region: optional string
output_tables_as_HTML: optional boolean
page_error_tolerance: optional number
page_header_prefix: optional string
page_header_suffix: optional string
page_prefix: optional string
page_separator: optional string
page_suffix: optional string
parse_mode: optional ParsingMode

Enum for representing the mode of parsing to be used.

Accepts one of the following:
"parse_page_without_llm"
"parse_page_with_llm"
"parse_page_with_lvm"
"parse_page_with_agent"
"parse_page_with_layout_agent"
"parse_document_with_llm"
"parse_document_with_lvm"
"parse_document_with_agent"
parsing_instruction: optional string
precise_bounding_box: optional boolean
premium_mode: optional boolean
presentation_out_of_bounds_content: optional boolean
presentation_skip_embedded_data: optional boolean
preserve_layout_alignment_across_pages: optional boolean
preserve_very_small_text: optional boolean
preset: optional string
priority: optional "low" or "medium" or "high" or "critical"

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
project_id: optional string
remove_hidden_text: optional boolean
replace_failed_page_mode: optional FailPageMode

Enum for representing the different available page error handling modes.

Accepts one of the following:
"raw_text"
"blank_page"
"error_message"
replace_failed_page_with_error_message_prefix: optional string
replace_failed_page_with_error_message_suffix: optional string
save_images: optional boolean
skip_diagonal_text: optional boolean
specialized_chart_parsing_agentic: optional boolean
specialized_chart_parsing_efficient: optional boolean
specialized_chart_parsing_plus: optional boolean
specialized_image_parsing: optional boolean
spreadsheet_extract_sub_tables: optional boolean
spreadsheet_force_formula_computation: optional boolean
strict_mode_buggy_font: optional boolean
strict_mode_image_extraction: optional boolean
strict_mode_image_ocr: optional boolean
strict_mode_reconstruction: optional boolean
structured_output: optional boolean
structured_output_json_schema: optional string
structured_output_json_schema_name: optional string
system_prompt: optional string
system_prompt_append: optional string
take_screenshot: optional boolean
target_pages: optional string
tier: optional string
use_vendor_multimodal_model: optional boolean
user_prompt: optional string
vendor_multimodal_api_key: optional string
vendor_multimodal_model_name: optional string
version: optional string
webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }

The outbound webhook configurations

webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 13 more

List of event names to subscribe to

Accepts one of the following:
"extract.pending"
"extract.success"
"extract.error"
"extract.partial_success"
"extract.cancelled"
"parse.pending"
"parse.success"
"parse.error"
"parse.partial_success"
"parse.cancelled"
"classify.pending"
"classify.success"
"classify.error"
"classify.partial_success"
"classify.cancelled"
"unmapped_event"
webhook_headers: optional map[string]

Custom HTTP headers to include with webhook requests.

webhook_output_format: optional string

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url: optional string

The URL to send webhook notifications to.

webhook_url: optional string
source_id: string

ID of the source

source_type: string

Type of the source (e.g., 'project')

updated_at: string

Last update timestamp

format: date-time
version: string

Version of the configuration

creator: optional string

Creator of the configuration

next_page_token: optional string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

total_size: optional number

The total number of items available. This is only populated when specifically requested. The value may be an estimate and can be used for display purposes only.
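
The `next_page_token` contract described above lends itself to a simple loop: request a page, yield its items, and stop once the token is absent. The sketch below assumes a caller-supplied `fetch_page` function that wraps `GET /api/v1/beta/parse-configurations` and returns the decoded response; that function, and the in-memory stand-in used to exercise the loop, are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_parse_configurations(
    fetch_page: Callable[[Optional[str]], Page],
) -> Iterator[Dict[str, Any]]:
    """Yield every item across pages, following next_page_token."""
    page_token: Optional[str] = None
    while True:
        page = fetch_page(page_token)
        yield from page.get("items", [])
        # An omitted (or empty) token means there are no further pages.
        page_token = page.get("next_page_token")
        if not page_token:
            return


# In-memory stand-in for the HTTP call, keyed by the page_token it receives.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "next_page_token": "t1"},
    "t1": {"items": [{"id": "c"}]},
}

all_ids = [cfg["id"] for cfg in iter_parse_configurations(FAKE_PAGES.__getitem__)]
```

Because `total_size` may be an estimate and is only populated on request, the loop relies on the token alone to decide when to stop.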

Beta / Sheets

Create Spreadsheet Job
POST /api/v1/beta/sheets/jobs
List Spreadsheet Jobs
GET /api/v1/beta/sheets/jobs
Get Spreadsheet Job
GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}
Get Result Region
GET /api/v1/beta/sheets/jobs/{spreadsheet_job_id}/regions/{region_id}/result/{region_type}
Delete Spreadsheet Job
DELETE /api/v1/beta/sheets/jobs/{spreadsheet_job_id}
Models
SheetsJob = object { id, config, created_at, 10 more }

A spreadsheet parsing job

id: string

The ID of the job

config: SheetsParsingConfig { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 4 more }

Configuration for the parsing job

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

table_merge_sensitivity: optional "strong" or "weak"

Influences how likely similar-looking regions are merged into a single table. Useful for spreadsheets that either have sparse tables (strong merging) or many distinct tables close together (weak merging).

Accepts one of the following:
"strong"
"weak"
use_experimental_processing: optional boolean

Enables experimental processing. Accuracy may be impacted.

created_at: string

When the job was created

file_id: string

The ID of the input file

format: uuid
project_id: string

The ID of the project

format: uuid
status: StatusEnum

The status of the parsing job

Accepts one of the following:
"PENDING"
"SUCCESS"
"ERROR"
"PARTIAL_SUCCESS"
"CANCELLED"
updated_at: string

When the job was last updated

user_id: string

The ID of the user

errors: optional array of string

Any errors encountered

file: optional File { id, name, project_id, 11 more } (deprecated)

Schema for a file.

id: string

Unique identifier

format: uuid
name: string
project_id: string

The ID of the project that the file belongs to

format: uuid
created_at: optional string

Creation datetime

format: date-time
data_source_id: optional string

The ID of the data source that the file belongs to

format: uuid
expires_at: optional string

The expiration date for the file. Files past this date can be deleted.

format: date-time
external_file_id: optional string

The ID of the file in the external system

file_size: optional number

Size of the file in bytes

minimum: 0
file_type: optional string

File type (e.g. pdf, docx, etc.)

maxLength: 3000
minLength: 1
last_modified_at: optional string

The last modified time of the file

format: date-time
permission_info: optional map[map[unknown] or array of unknown or string or 2 more]

Permission information for the file

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
purpose: optional string

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info: optional map[map[unknown] or array of unknown or string or 2 more]

Resource information for the file

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
updated_at: optional string

Update datetime

format: date-time
regions: optional array of object { location, region_type, sheet_name, 3 more }

All extracted regions (populated when job is complete)

location: string

Location of the region in the spreadsheet

region_type: string

Type of the extracted region

sheet_name: string

Worksheet name where region was found

description: optional string

Generated description for the region

region_id: optional string

Unique identifier for this region within the file

title: optional string

Generated title for the region

success: optional boolean

Whether the job completed successfully

worksheet_metadata: optional array of object { sheet_name, description, title }

Metadata for each processed worksheet (populated when job is complete)

sheet_name: string

Name of the worksheet

description: optional string

Generated description of the worksheet

title: optional string

Generated title for the worksheet
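
Once a SheetsJob has been fetched, two small helpers cover the common follow-up steps: deciding whether the job is done, and deriving the Get Result Region path for each extracted region. This is a sketch under stated assumptions: treating every status other than `PENDING` as final, and using each region's `region_type` value as the `{region_type}` path segment, are both guesses based on the listing above rather than documented behavior. The helper names and sample job are hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import quote

# Assumption: every status except "PENDING" is final.
FINAL_STATUSES = frozenset({"SUCCESS", "ERROR", "PARTIAL_SUCCESS", "CANCELLED"})


def is_finished(job: Dict[str, Any]) -> bool:
    """Return True when the job status is no longer PENDING."""
    return job["status"] in FINAL_STATUSES


def result_region_paths(job: Dict[str, Any]) -> List[str]:
    """Build one Get Result Region path per region that has a region_id."""
    paths = []
    for region in job.get("regions") or []:
        region_id = region.get("region_id")
        if region_id is None:
            continue  # region_id is optional in the model
        paths.append(
            "/api/v1/beta/sheets/jobs/{}/regions/{}/result/{}".format(
                quote(job["id"], safe=""),
                quote(region_id, safe=""),
                quote(region["region_type"], safe=""),
            )
        )
    return paths


# Made-up job, for illustration only.
sample_job = {
    "id": "job-1",
    "status": "SUCCESS",
    "regions": [
        {"location": "A1:C9", "region_type": "table", "sheet_name": "Q1", "region_id": "r1"},
        {"location": "E1:F3", "region_type": "table", "sheet_name": "Q1"},
    ],
}
paths = result_region_paths(sample_job)
```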

SheetsParsingConfig = object { extraction_range, flatten_hierarchical_tables, generate_additional_metadata, 4 more }

Configuration for spreadsheet parsing and region extraction

extraction_range: optional string

A1 notation of the range to extract a single region from. If None, the entire sheet is used.

flatten_hierarchical_tables: optional boolean

Return a flattened dataframe when a detected table is recognized as hierarchical.

generate_additional_metadata: optional boolean

Whether to generate additional metadata (title, description) for each extracted region.

include_hidden_cells: optional boolean

Whether to include hidden cells when extracting regions from the spreadsheet.

sheet_names: optional array of string

The names of the sheets to extract regions from. If empty, all sheets will be processed.

table_merge_sensitivity: optional "strong" or "weak"

Influences how likely similar-looking regions are merged into a single table. Useful for spreadsheets that either have sparse tables (strong merging) or many distinct tables close together (weak merging).

Accepts one of the following:
"strong"
"weak"
use_experimental_processing: optional boolean

Enables experimental processing. Accuracy may be impacted.
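
Since every SheetsParsingConfig field is optional, a config can be built by supplying only the settings that differ from the defaults. The sketch below uses a hypothetical `build_sheets_parsing_config` helper; omitting unset fields, rather than sending explicit nulls, is an assumption of this example.

```python
from typing import Any, Dict, List, Optional

MERGE_SENSITIVITIES = ("strong", "weak")


def build_sheets_parsing_config(
    extraction_range: Optional[str] = None,
    flatten_hierarchical_tables: Optional[bool] = None,
    generate_additional_metadata: Optional[bool] = None,
    include_hidden_cells: Optional[bool] = None,
    sheet_names: Optional[List[str]] = None,
    table_merge_sensitivity: Optional[str] = None,
    use_experimental_processing: Optional[bool] = None,
) -> Dict[str, Any]:
    """Return a SheetsParsingConfig-shaped dict containing only the set fields."""
    if (
        table_merge_sensitivity is not None
        and table_merge_sensitivity not in MERGE_SENSITIVITIES
    ):
        raise ValueError(f"table_merge_sensitivity must be one of {MERGE_SENSITIVITIES}")
    candidate = {
        "extraction_range": extraction_range,
        "flatten_hierarchical_tables": flatten_hierarchical_tables,
        "generate_additional_metadata": generate_additional_metadata,
        "include_hidden_cells": include_hidden_cells,
        "sheet_names": sheet_names,
        "table_merge_sensitivity": table_merge_sensitivity,
        "use_experimental_processing": use_experimental_processing,
    }
    # False is a meaningful value, so filter on None only.
    return {key: value for key, value in candidate.items() if value is not None}


# Extract one A1-notation range from a sparse sheet, with strong merging.
config = build_sheets_parsing_config(
    extraction_range="A1:D20",
    include_hidden_cells=False,
    table_merge_sensitivity="strong",
)
```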

Beta / Directories

Create Directory
POST /api/v1/beta/directories
List Directories
GET /api/v1/beta/directories
Get Directory
GET /api/v1/beta/directories/{directory_id}
Update Directory
PATCH /api/v1/beta/directories/{directory_id}
Delete Directory
DELETE /api/v1/beta/directories/{directory_id}

Beta / Directories / Files

Add Directory File
POST /api/v1/beta/directories/{directory_id}/files
List Directory Files
GET /api/v1/beta/directories/{directory_id}/files
Get Directory File
GET /api/v1/beta/directories/{directory_id}/files/{directory_file_id}
Update Directory File
PATCH /api/v1/beta/directories/{directory_id}/files/{directory_file_id}
Delete Directory File
DELETE /api/v1/beta/directories/{directory_id}/files/{directory_file_id}
Upload File To Directory
POST /api/v1/beta/directories/{directory_id}/files/upload

Beta / Batch

Create Batch Job
POST /api/v1/beta/batch-processing
List Batch Jobs
GET /api/v1/beta/batch-processing
Get Batch Job Status
GET /api/v1/beta/batch-processing/{job_id}
Cancel Batch Job
POST /api/v1/beta/batch-processing/{job_id}/cancel

Beta / Batch / Job Items

List Batch Job Items
GET /api/v1/beta/batch-processing/{job_id}/items
Get Item Processing Results
GET /api/v1/beta/batch-processing/items/{item_id}/processing-results

Beta / Split

Create Split Job
POST /api/v1/beta/split/jobs
List Split Jobs
GET /api/v1/beta/split/jobs
Get Split Job
GET /api/v1/beta/split/jobs/{split_job_id}
Models
SplitCategory = object { name, description }

Category definition for document splitting.

name: string

Name of the category.

maxLength: 200
minLength: 1
description: optional string

Optional description of what content belongs in this category.

maxLength: 2000
minLength: 1
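
The length limits on SplitCategory can be checked locally before a split job is created. The sketch below uses a hypothetical `make_split_category` helper and measures length with Python's `len`, which counts Unicode code points; whether the server counts the same way is an assumption.

```python
from typing import Dict, Optional


def make_split_category(name: str, description: Optional[str] = None) -> Dict[str, str]:
    """Return a SplitCategory-shaped dict after checking the documented lengths."""
    if not 1 <= len(name) <= 200:
        raise ValueError("name must be 1 to 200 characters long")
    category = {"name": name}
    if description is not None:
        if not 1 <= len(description) <= 2000:
            raise ValueError("description must be 1 to 2000 characters long")
        category["description"] = description
    return category


# Made-up categories, for illustration only.
categories = [
    make_split_category("invoice", "Pages that itemize charges and totals."),
    make_split_category("contract"),
]
```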
SplitDocumentInput = object { type, value }

Document input specification.

type: string

Type of document input. Valid values are: file_id

value: string

Document identifier.

SplitResultResponse = object { segments }

Result of a completed split job.

segments: array of SplitSegmentResponse { category, confidence_category, pages }

List of document segments.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.

SplitSegmentResponse = object { category, confidence_category, pages }

A segment of the split document.

category: string

Category name this split belongs to.

confidence_category: string

Categorical confidence level. Valid values are: high, medium, low.

pages: array of number

1-indexed page numbers in this split.
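
A SplitResultResponse is often easier to consume once its segments are grouped by category. The sketch below collects the 1-indexed page numbers per category and can drop segments below a chosen confidence level. The helper name, the ordering low < medium < high, and the sample result are assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Assumed ordering of the documented confidence_category values.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def pages_by_category(
    result: Dict[str, Any], min_confidence: str = "low"
) -> Dict[str, List[int]]:
    """Map each category to its sorted 1-indexed pages, filtered by confidence."""
    threshold = CONFIDENCE_RANK[min_confidence]
    grouped: Dict[str, List[int]] = defaultdict(list)
    for segment in result["segments"]:
        if CONFIDENCE_RANK[segment["confidence_category"]] >= threshold:
            grouped[segment["category"]].extend(segment["pages"])
    return {category: sorted(pages) for category, pages in grouped.items()}


# Made-up result, for illustration only.
sample_result = {
    "segments": [
        {"category": "invoice", "confidence_category": "high", "pages": [1, 2]},
        {"category": "contract", "confidence_category": "low", "pages": [3]},
        {"category": "invoice", "confidence_category": "medium", "pages": [4]},
    ]
}
confident = pages_by_category(sample_result, min_confidence="medium")
```

Because the page numbers are 1-indexed, subtract 1 from each before using them as indices into a 0-indexed page list.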