
Create Batch Job

beta.batch.create(**kwargs: BatchCreateParams) -> BatchCreateResponse
POST /api/v1/beta/batch-processing

Create a new batch processing job for a directory.

Processes all files in the specified directory according to the job configuration. The job runs asynchronously and you can monitor progress using the job status endpoint.
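
Because the call returns immediately, a typical pattern is to poll until the job reaches a terminal status. A minimal sketch, assuming the SDK exposes a status method for batch jobs (the retrieve method name is an assumption; the terminal status values come from the response schema below):

import os
import time

from llama_cloud import LlamaCloud

client = LlamaCloud(api_key=os.environ.get("LLAMA_CLOUD_API_KEY"))

def wait_for_batch(batch_id: str, interval: float = 5.0):
    # Poll the batch job until it reaches a terminal status.
    while True:
        job = client.beta.batch.retrieve(batch_id)  # hypothetical status call; check your SDK version
        print(f"{job.status}: {job.processed_items or 0}/{job.total_items} items")
        if job.status in ("completed", "failed", "cancelled"):
            return job
        time.sleep(interval)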

Parameters
job_config: JobConfig

Job configuration for batch processing. Can be BatchParseJobRecordCreate or ClassifyJob.

Accepts one of the following:
class JobConfigBatchParseJobRecordCreate:

Batch-specific parse job record for batch processing.

This model contains the metadata and configuration for a batch parse job, but excludes file-specific information. It's used as input to the batch parent workflow and combined with DirectoryFile data to create full ParseJobRecordCreate instances for each file.

Attributes:
job_name: Must be PARSE_RAW_FILE
partitions: Partitions for job output location
parameters: Generic parse configuration (BatchParseJobConfig)
session_id: Upstream request ID for tracking
correlation_id: Correlation ID for cross-service tracking
parent_job_execution_id: Parent job execution ID if nested
user_id: User who created the job
project_id: Project this job belongs to
webhook_url: Optional webhook URL for job completion notifications
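
A minimal sketch of a job_config dict for this variant, passed as a plain dict. Field names and literal values are taken from the schema below; the specific settings chosen here are illustrative assumptions, not required defaults:

job_config = {
    "job_name": "parse_raw_file_job",
    "parameters": {
        # Generic settings applied uniformly to every file in the batch.
        "type": "parse",
        "parse_mode": "parse_page_with_llm",
        "lang": "en",
        "max_pages": 50,
    },
    "webhook_url": "https://example.com/hooks/batch-done",  # placeholder completion callback
}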

correlation_id: Optional[str]

The correlation ID for this job. Used for tracking the job across services.

format: uuid
job_name: Optional[Literal["parse_raw_file_job"]]
parameters: Optional[JobConfigBatchParseJobRecordCreateParameters]

Generic parse job configuration for batch processing.

This model contains the parsing configuration that applies to all files in a batch, but excludes file-specific fields like file_name, file_id, etc. Those file-specific fields are populated from DirectoryFile data when creating individual ParseJobRecordCreate instances for each file.

The fields in this model should be generic settings that apply uniformly to all files being processed in the batch.

adaptive_long_table: Optional[bool]
aggressive_table_extraction: Optional[bool]
auto_mode: Optional[bool]
auto_mode_configuration_json: Optional[str]
auto_mode_trigger_on_image_in_page: Optional[bool]
auto_mode_trigger_on_regexp_in_page: Optional[str]
auto_mode_trigger_on_table_in_page: Optional[bool]
auto_mode_trigger_on_text_in_page: Optional[str]
azure_openai_api_version: Optional[str]
azure_openai_deployment_name: Optional[str]
azure_openai_endpoint: Optional[str]
azure_openai_key: Optional[str]
bbox_bottom: Optional[float]
bbox_left: Optional[float]
bbox_right: Optional[float]
bbox_top: Optional[float]
bounding_box: Optional[str]
compact_markdown_table: Optional[bool]
complemental_formatting_instruction: Optional[str]
content_guideline_instruction: Optional[str]
continuous_mode: Optional[bool]
custom_metadata: Optional[Dict[str, object]]

The custom metadata to attach to the documents.

disable_image_extraction: Optional[bool]
disable_ocr: Optional[bool]
disable_reconstruction: Optional[bool]
do_not_cache: Optional[bool]
do_not_unroll_columns: Optional[bool]
enable_cost_optimizer: Optional[bool]
extract_charts: Optional[bool]
extract_layout: Optional[bool]
extract_printed_page_number: Optional[bool]
fast_mode: Optional[bool]
formatting_instruction: Optional[str]
gpt4o_api_key: Optional[str]
gpt4o_mode: Optional[bool]
guess_xlsx_sheet_name: Optional[bool]
hide_footers: Optional[bool]
hide_headers: Optional[bool]
high_res_ocr: Optional[bool]
html_make_all_elements_visible: Optional[bool]
html_remove_fixed_elements: Optional[bool]
html_remove_navigation_elements: Optional[bool]
http_proxy: Optional[str]
ignore_document_elements_for_layout_detection: Optional[bool]
images_to_save: Optional[List[Literal["screenshot", "embedded", "layout"]]]
Accepts one of the following:
"screenshot"
"embedded"
"layout"
inline_images_in_markdown: Optional[bool]
input_s3_path: Optional[str]
input_s3_region: Optional[str]

The region for the input S3 bucket.

input_url: Optional[str]
internal_is_screenshot_job: Optional[bool]
invalidate_cache: Optional[bool]
is_formatting_instruction: Optional[bool]
job_timeout_extra_time_per_page_in_seconds: Optional[float]
job_timeout_in_seconds: Optional[float]
keep_page_separator_when_merging_tables: Optional[bool]
lang: Optional[str]

The language.

languages: Optional[List[ParsingLanguages]]
Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
layout_aware: Optional[bool]
line_level_bounding_box: Optional[bool]
markdown_table_multiline_header_separator: Optional[str]
max_pages: Optional[int]
max_pages_enforced: Optional[int]
merge_tables_across_pages_in_markdown: Optional[bool]
model: Optional[str]
outlined_table_extraction: Optional[bool]
output_pdf_of_document: Optional[bool]
output_s3_path_prefix: Optional[str]

If specified, LlamaParse will save the output to the specified path. All output files will use this prefix, which should be a valid s3:// URL.

output_s3_region: Optional[str]

The region for the output S3 bucket.

output_tables_as_html: Optional[bool]
output_bucket: Optional[str]

The output bucket.

page_error_tolerance: Optional[float]
page_header_prefix: Optional[str]
page_header_suffix: Optional[str]
page_prefix: Optional[str]
page_separator: Optional[str]
page_suffix: Optional[str]
parse_mode: Optional[ParsingMode]

Enum for representing the mode of parsing to be used.

Accepts one of the following:
"parse_page_without_llm"
"parse_page_with_llm"
"parse_page_with_lvm"
"parse_page_with_agent"
"parse_page_with_layout_agent"
"parse_document_with_llm"
"parse_document_with_lvm"
"parse_document_with_agent"
parsing_instruction: Optional[str]
pipeline_id: Optional[str]

The pipeline ID.

precise_bounding_box: Optional[bool]
premium_mode: Optional[bool]
presentation_out_of_bounds_content: Optional[bool]
presentation_skip_embedded_data: Optional[bool]
preserve_layout_alignment_across_pages: Optional[bool]
preserve_very_small_text: Optional[bool]
preset: Optional[str]
priority: Optional[Literal["low", "medium", "high", "critical"]]

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:
"low"
"medium"
"high"
"critical"
project_id: Optional[str]
remove_hidden_text: Optional[bool]
replace_failed_page_mode: Optional[FailPageMode]

Enum for representing the different available page error handling modes.

Accepts one of the following:
"raw_text"
"blank_page"
"error_message"
replace_failed_page_with_error_message_prefix: Optional[str]
replace_failed_page_with_error_message_suffix: Optional[str]
resource_info: Optional[Dict[str, object]]

The resource info about the file

save_images: Optional[bool]
skip_diagonal_text: Optional[bool]
specialized_chart_parsing_agentic: Optional[bool]
specialized_chart_parsing_efficient: Optional[bool]
specialized_chart_parsing_plus: Optional[bool]
specialized_image_parsing: Optional[bool]
spreadsheet_extract_sub_tables: Optional[bool]
spreadsheet_force_formula_computation: Optional[bool]
strict_mode_buggy_font: Optional[bool]
strict_mode_image_extraction: Optional[bool]
strict_mode_image_ocr: Optional[bool]
strict_mode_reconstruction: Optional[bool]
structured_output: Optional[bool]
structured_output_json_schema: Optional[str]
structured_output_json_schema_name: Optional[str]
system_prompt: Optional[str]
system_prompt_append: Optional[str]
take_screenshot: Optional[bool]
target_pages: Optional[str]
tier: Optional[str]
type: Optional[Literal["parse"]]
use_vendor_multimodal_model: Optional[bool]
user_prompt: Optional[str]
vendor_multimodal_api_key: Optional[str]
vendor_multimodal_model_name: Optional[str]
version: Optional[str]
webhook_configurations: Optional[Iterable[WebhookConfigurationParam]]

The outbound webhook configurations

webhook_events: Optional[List[Literal["extract.pending", "extract.success", "extract.error", ...]]]

List of event names to subscribe to

Accepts one of the following:
"extract.pending"
"extract.success"
"extract.error"
"extract.partial_success"
"extract.cancelled"
"parse.pending"
"parse.success"
"parse.error"
"parse.partial_success"
"parse.cancelled"
"classify.pending"
"classify.success"
"classify.error"
"classify.partial_success"
"classify.cancelled"
"unmapped_event"
webhook_headers: Optional[Dict[str, str]]

Custom HTTP headers to include with webhook requests.

webhook_output_format: Optional[str]

The output format to use for the webhook. Defaults to string if none is supplied. Currently supported values: string, json.

webhook_url: Optional[str]

The URL to send webhook notifications to.

webhook_url: Optional[str]
parent_job_execution_id: Optional[str]

The ID of the parent job execution.

format: uuid
partitions: Optional[Dict[str, str]]

The partitions for this execution. Used for determining where to save job output.

project_id: Optional[str]

The ID of the project this job belongs to.

format: uuid
session_id: Optional[str]

The upstream request ID that created this job. Used for tracking the job across services.

format: uuid
user_id: Optional[str]

The ID of the user that created this job

webhook_url: Optional[str]

The URL that needs to be called at the end of the parsing job.
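
Webhook delivery can also be configured per endpoint through webhook_configurations inside parameters. A hedged sketch of one entry, using only the event names and fields documented above (the endpoint URL and header values are placeholders):

webhook_configurations = [
    {
        "webhook_url": "https://example.com/hooks/parse",        # placeholder endpoint
        "webhook_events": ["parse.success", "parse.error"],      # subscribe to a subset of events
        "webhook_headers": {"Authorization": "Bearer <token>"},  # custom headers sent with each request
        "webhook_output_format": "json",                         # "string" (default) or "json"
    }
]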

class ClassifyJob:

A classify job.

id: str

Unique identifier

format: uuid
project_id: str

The ID of the project

format: uuid
rules: List[ClassifierRule]

The rules to classify the files

description: str

Natural language description of what to classify. Be specific about the content characteristics that identify this document type.

maxLength: 500
minLength: 10
type: str

The document type to assign when this rule matches (e.g., 'invoice', 'receipt', 'contract')

maxLength: 50
minLength: 1
status: StatusEnum

The status of the classify job

Accepts one of the following:
"PENDING"
"SUCCESS"
"ERROR"
"PARTIAL_SUCCESS"
"CANCELLED"
user_id: str

The ID of the user

created_at: Optional[datetime]

Creation datetime

format: date-time
effective_at: Optional[datetime]
error_message: Optional[str]

Error message for the latest job attempt, if any.

job_record_id: Optional[str]

The job record ID associated with this status, if any.

mode: Optional[Literal["FAST", "MULTIMODAL"]]

The classification mode to use

Accepts one of the following:
"FAST"
"MULTIMODAL"
parsing_configuration: Optional[ClassifyParsingConfiguration]

The configuration for the parsing job

lang: Optional[ParsingLanguages]

The language to parse the files in

Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
max_pages: Optional[int]

The maximum number of pages to parse

target_pages: Optional[List[int]]

The pages to target for parsing (0-indexed, so first page is at 0)

updated_at: Optional[datetime]

Update datetime

format: date-time
organization_id: Optional[str]
project_id: Optional[str]
continue_as_new_threshold: Optional[int]

Maximum number of files to process before calling continue-as-new. If None, continue-as-new is called after every batch. (only used in directory mode)

directory_id: Optional[str]

ID of the directory containing files to process

item_ids: Optional[SequenceNotStr[str]]

List of specific item IDs to process. Either this or directory_id must be provided.

page_size: Optional[int]

Number of files to fetch per batch from the directory (only used in directory mode)

maximum: 1000
minimum: 1
temporal_namespace: Optional[str]
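
For the ClassifyJob variant, a sketch of the create call in directory mode. Rule text and IDs are placeholders; the schema above lists id, status, and user_id on the model, but this sketch assumes they are server-populated and can be omitted when creating a job:

import os
from llama_cloud import LlamaCloud

client = LlamaCloud(api_key=os.environ.get("LLAMA_CLOUD_API_KEY"))
batch = client.beta.batch.create(
    job_config={
        "rules": [
            {
                "type": "invoice",
                "description": "Billing documents with line items, totals, and payment terms.",
            },
            {
                "type": "contract",
                "description": "Legal agreements with defined parties and signature blocks.",
            },
        ],
        "mode": "FAST",  # or "MULTIMODAL"
    },
    directory_id="<directory-id>",  # or pass item_ids=[...] instead
    page_size=100,
)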
Returns
class BatchCreateResponse:

Response schema for a batch processing job.

id: str

Unique identifier for the batch job

job_type: Literal["parse", "extract", "classify"]

Type of processing operation

Accepts one of the following:
"parse"
"extract"
"classify"
project_id: str

Project this job belongs to

status: Literal["pending", "running", "dispatched", ...]

Current status of the job

Accepts one of the following:
"pending"
"running"
"dispatched"
"completed"
"failed"
"cancelled"
total_items: int

Total number of items in the job

completed_at: Optional[datetime]

Timestamp when job completed

format: date-time
created_at: Optional[datetime]

Creation datetime

format: date-time
directory_id: Optional[str]

Directory being processed

effective_at: Optional[datetime]
error_message: Optional[str]

Error message for the latest job attempt, if any.

failed_items: Optional[int]

Number of items that failed processing

job_record_id: Optional[str]

The job record ID associated with this status, if any.

processed_items: Optional[int]

Number of items processed so far

skipped_items: Optional[int]

Number of items skipped (already processed or over the size limit)

started_at: Optional[datetime]

Timestamp when job processing started

format: date-time
updated_at: Optional[datetime]

Update datetime

format: date-time
workflow_id: Optional[str]

Temporal workflow ID for this batch job

Create Batch Job

import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
batch = client.beta.batch.create(
    job_config={},  # minimal placeholder; see the job_config sketches above
)
print(batch.id)
Returns Examples
{
  "id": "id",
  "job_type": "parse",
  "project_id": "project_id",
  "status": "pending",
  "total_items": 0,
  "completed_at": "2019-12-27T18:11:19.117Z",
  "created_at": "2019-12-27T18:11:19.117Z",
  "directory_id": "directory_id",
  "effective_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "failed_items": 0,
  "job_record_id": "job_record_id",
  "processed_items": 0,
  "skipped_items": 0,
  "started_at": "2019-12-27T18:11:19.117Z",
  "updated_at": "2019-12-27T18:11:19.117Z",
  "workflow_id": "workflow_id"
}