
Create Batch Job

BatchCreateResponse beta().batch().create(BatchCreateParams params, RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/beta/batch-processing

Create a batch processing job.

Processes files from a directory or a specific list of item IDs. Supports batch parsing and classification operations.

Provide either directory_id to process all files in a directory, or item_ids for specific items. The job runs asynchronously — poll GET /batch/{job_id} for progress.

Parameters
BatchCreateParams params
Optional<String> organizationId
Optional<String> projectId
Optional<String> temporalNamespace
JobConfig jobConfig

Job configuration — either a parse or classify config

class BatchParseJobRecordCreate:

Batch-specific parse job record for batch processing.

This model contains the metadata and configuration for a batch parse job, but excludes file-specific information. It’s used as input to the batch parent workflow and combined with DirectoryFile data to create full ParseJobRecordCreate instances for each file.

Attributes:
    job_name: Must be PARSE_RAW_FILE
    partitions: Partitions for job output location
    parameters: Generic parse configuration (BatchParseJobConfig)
    session_id: Upstream request ID for tracking
    correlation_id: Correlation ID for cross-service tracking
    parent_job_execution_id: Parent job execution ID if nested
    user_id: User who created the job
    project_id: Project this job belongs to
    webhook_url: Optional webhook URL for job completion notifications

Optional<String> correlationId

The correlation ID for this job. Used for tracking the job across services.

format: uuid
Optional<JobName> jobName
Optional<Parameters> parameters

Generic parse job configuration for batch processing.

This model contains the parsing configuration that applies to all files in a batch, but excludes file-specific fields like file_name, file_id, etc. Those file-specific fields are populated from DirectoryFile data when creating individual ParseJobRecordCreate instances for each file.

The fields in this model should be generic settings that apply uniformly to all files being processed in the batch.

Optional<Boolean> adaptiveLongTable
Optional<Boolean> aggressiveTableExtraction
Optional<Boolean> autoMode
Optional<String> autoModeConfigurationJson
Optional<Boolean> autoModeTriggerOnImageInPage
Optional<String> autoModeTriggerOnRegexpInPage
Optional<Boolean> autoModeTriggerOnTableInPage
Optional<String> autoModeTriggerOnTextInPage
Optional<String> azureOpenAIApiVersion
Optional<String> azureOpenAIDeploymentName
Optional<String> azureOpenAIEndpoint
Optional<String> azureOpenAIKey
Optional<Double> bboxBottom
Optional<Double> bboxLeft
Optional<Double> bboxRight
Optional<Double> bboxTop
Optional<String> boundingBox
Optional<Boolean> compactMarkdownTable
Optional<String> complementalFormattingInstruction
Optional<String> contentGuidelineInstruction
Optional<Boolean> continuousMode
Optional<CustomMetadata> customMetadata

The custom metadata to attach to the documents.

Optional<Boolean> disableImageExtraction
Optional<Boolean> disableOcr
Optional<Boolean> disableReconstruction
Optional<Boolean> doNotCache
Optional<Boolean> doNotUnrollColumns
Optional<Boolean> enableCostOptimizer
Optional<Boolean> extractCharts
Optional<Boolean> extractLayout
Optional<Boolean> extractPrintedPageNumber
Optional<Boolean> fastMode
Optional<String> formattingInstruction
Optional<String> gpt4oApiKey
Optional<Boolean> gpt4oMode
Optional<Boolean> guessXlsxSheetName
Optional<Boolean> hideFooters
Optional<Boolean> hideHeaders
Optional<Boolean> highResOcr
Optional<Boolean> htmlMakeAllElementsVisible
Optional<Boolean> htmlRemoveFixedElements
Optional<Boolean> htmlRemoveNavigationElements
Optional<String> httpProxy
Optional<Boolean> ignoreDocumentElementsForLayoutDetection
Optional<List<ImagesToSave>> imagesToSave
One of the following:
SCREENSHOT("screenshot")
EMBEDDED("embedded")
LAYOUT("layout")
Optional<Boolean> inlineImagesInMarkdown
Optional<String> inputS3Path
Optional<String> inputS3Region

The region for the input S3 bucket.

Optional<String> inputUrl
Optional<Boolean> internalIsScreenshotJob
Optional<Boolean> invalidateCache
Optional<Boolean> isFormattingInstruction
Optional<Double> jobTimeoutExtraTimePerPageInSeconds
Optional<Double> jobTimeoutInSeconds
Optional<Boolean> keepPageSeparatorWhenMergingTables
Optional<String> lang

The language.

Optional<List<ParsingLanguages>> languages
One of the following:
AF("af")
AZ("az")
BS("bs")
CS("cs")
CY("cy")
DA("da")
DE("de")
EN("en")
ES("es")
ET("et")
FR("fr")
GA("ga")
HR("hr")
HU("hu")
ID("id")
IS("is")
IT("it")
KU("ku")
LA("la")
LT("lt")
LV("lv")
MI("mi")
MS("ms")
MT("mt")
NL("nl")
NO("no")
OC("oc")
PI("pi")
PL("pl")
PT("pt")
RO("ro")
RS_LATIN("rs_latin")
SK("sk")
SL("sl")
SQ("sq")
SV("sv")
SW("sw")
TL("tl")
TR("tr")
UZ("uz")
VI("vi")
AR("ar")
FA("fa")
UG("ug")
UR("ur")
BN("bn")
AS("as")
MNI("mni")
RU("ru")
RS_CYRILLIC("rs_cyrillic")
BE("be")
BG("bg")
UK("uk")
MN("mn")
ABQ("abq")
ADY("ady")
KBD("kbd")
AVA("ava")
DAR("dar")
INH("inh")
CHE("che")
LBE("lbe")
LEZ("lez")
TAB("tab")
TJK("tjk")
HI("hi")
MR("mr")
NE("ne")
BH("bh")
MAI("mai")
ANG("ang")
BHO("bho")
MAH("mah")
SCK("sck")
NEW("new")
GOM("gom")
SA("sa")
BGC("bgc")
TH("th")
CH_SIM("ch_sim")
CH_TRA("ch_tra")
JA("ja")
KO("ko")
TA("ta")
TE("te")
KN("kn")
Optional<Boolean> layoutAware
Optional<Boolean> lineLevelBoundingBox
Optional<String> markdownTableMultilineHeaderSeparator
Optional<Long> maxPages
Optional<Long> maxPagesEnforced
Optional<Boolean> mergeTablesAcrossPagesInMarkdown
Optional<String> model
Optional<Boolean> outlinedTableExtraction
Optional<Boolean> outputPdfOfDocument
Optional<String> outputS3PathPrefix

If specified, LlamaParse saves the output to this path. All output files use this prefix, which should be a valid s3:// URL.

Optional<String> outputS3Region

The region for the output S3 bucket.

Optional<Boolean> outputTablesAsHtml
Optional<String> outputBucket

The output bucket.

Optional<Double> pageErrorTolerance
Optional<String> pageHeaderPrefix
Optional<String> pageHeaderSuffix
Optional<String> pagePrefix
Optional<String> pageSeparator
Optional<String> pageSuffix
Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:
PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")
PARSE_PAGE_WITH_LLM("parse_page_with_llm")
PARSE_PAGE_WITH_LVM("parse_page_with_lvm")
PARSE_PAGE_WITH_AGENT("parse_page_with_agent")
PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")
PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")
PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")
PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")
Optional<String> parsingInstruction
Optional<String> pipelineId

The pipeline ID.

Optional<Boolean> preciseBoundingBox
Optional<Boolean> premiumMode
Optional<Boolean> presentationOutOfBoundsContent
Optional<Boolean> presentationSkipEmbeddedData
Optional<Boolean> preserveLayoutAlignmentAcrossPages
Optional<Boolean> preserveVerySmallText
Optional<String> preset
Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:
LOW("low")
MEDIUM("medium")
HIGH("high")
CRITICAL("critical")
Optional<String> projectId
Optional<Boolean> removeHiddenText
Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:
RAW_TEXT("raw_text")
BLANK_PAGE("blank_page")
ERROR_MESSAGE("error_message")
Optional<String> replaceFailedPageWithErrorMessagePrefix
Optional<String> replaceFailedPageWithErrorMessageSuffix
Optional<ResourceInfo> resourceInfo

The resource info about the file

Optional<Boolean> saveImages
Optional<Boolean> skipDiagonalText
Optional<Boolean> specializedChartParsingAgentic
Optional<Boolean> specializedChartParsingEfficient
Optional<Boolean> specializedChartParsingPlus
Optional<Boolean> specializedImageParsing
Optional<Boolean> spreadsheetExtractSubTables
Optional<Boolean> spreadsheetForceFormulaComputation
Optional<Boolean> spreadsheetIncludeHiddenSheets
Optional<Boolean> strictModeBuggyFont
Optional<Boolean> strictModeImageExtraction
Optional<Boolean> strictModeImageOcr
Optional<Boolean> strictModeReconstruction
Optional<Boolean> structuredOutput
Optional<String> structuredOutputJsonSchema
Optional<String> structuredOutputJsonSchemaName
Optional<String> systemPrompt
Optional<String> systemPromptAppend
Optional<Boolean> takeScreenshot
Optional<String> targetPages
Optional<String> tier
Optional<Type> type
Optional<Boolean> useVendorMultimodalModel
Optional<String> userPrompt
Optional<String> vendorMultimodalApiKey
Optional<String> vendorMultimodalModelName
Optional<String> version
Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:
EXTRACT_PENDING("extract.pending")
EXTRACT_SUCCESS("extract.success")
EXTRACT_ERROR("extract.error")
EXTRACT_PARTIAL_SUCCESS("extract.partial_success")
EXTRACT_CANCELLED("extract.cancelled")
PARSE_PENDING("parse.pending")
PARSE_RUNNING("parse.running")
PARSE_SUCCESS("parse.success")
PARSE_ERROR("parse.error")
PARSE_PARTIAL_SUCCESS("parse.partial_success")
PARSE_CANCELLED("parse.cancelled")
CLASSIFY_PENDING("classify.pending")
CLASSIFY_RUNNING("classify.running")
CLASSIFY_SUCCESS("classify.success")
CLASSIFY_ERROR("classify.error")
CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")
CLASSIFY_CANCELLED("classify.cancelled")
SHEETS_PENDING("sheets.pending")
SHEETS_SUCCESS("sheets.success")
SHEETS_ERROR("sheets.error")
SHEETS_PARTIAL_SUCCESS("sheets.partial_success")
SHEETS_CANCELLED("sheets.cancelled")
UNMAPPED_EVENT("unmapped_event")
Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookUrl

URL to receive webhook POST notifications

Optional<String> webhookUrl
Optional<String> parentJobExecutionId

The ID of the parent job execution.

format: uuid
Optional<Partitions> partitions

The partitions for this execution. Used for determining where to save job output.

Optional<String> projectId

The ID of the project this job belongs to.

format: uuid
Optional<String> sessionId

The upstream request ID that created this job. Used for tracking the job across services.

format: uuid
Optional<String> userId

The ID of the user that created this job

Optional<String> webhookUrl

The URL that needs to be called at the end of the parsing job.

class ClassifyJob:

A classify job.

String id

Unique identifier

format: uuid
String projectId

The ID of the project

format: uuid
List<ClassifierRule> rules

The rules to classify the files

String description

Natural language description of what to classify. Be specific about the content characteristics that identify this document type.

maxLength: 500
minLength: 10
String type

The document type to assign when this rule matches (e.g., ‘invoice’, ‘receipt’, ‘contract’)

maxLength: 50
minLength: 1
StatusEnum status

The status of the classify job

One of the following:
PENDING("PENDING")
SUCCESS("SUCCESS")
ERROR("ERROR")
PARTIAL_SUCCESS("PARTIAL_SUCCESS")
CANCELLED("CANCELLED")
String userId

The ID of the user

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time
Optional<LocalDateTime> effectiveAt
Optional<String> errorMessage

Error message for the latest job attempt, if any.

Optional<String> jobRecordId

The job record ID associated with this status, if any.

Optional<Mode> mode

The classification mode to use

One of the following:
FAST("FAST")
MULTIMODAL("MULTIMODAL")
Optional<ClassifyParsingConfiguration> parsingConfiguration

The configuration for the parsing job

Optional<ParsingLanguages> lang

The language to parse the files in

One of the following:
AF("af")
AZ("az")
BS("bs")
CS("cs")
CY("cy")
DA("da")
DE("de")
EN("en")
ES("es")
ET("et")
FR("fr")
GA("ga")
HR("hr")
HU("hu")
ID("id")
IS("is")
IT("it")
KU("ku")
LA("la")
LT("lt")
LV("lv")
MI("mi")
MS("ms")
MT("mt")
NL("nl")
NO("no")
OC("oc")
PI("pi")
PL("pl")
PT("pt")
RO("ro")
RS_LATIN("rs_latin")
SK("sk")
SL("sl")
SQ("sq")
SV("sv")
SW("sw")
TL("tl")
TR("tr")
UZ("uz")
VI("vi")
AR("ar")
FA("fa")
UG("ug")
UR("ur")
BN("bn")
AS("as")
MNI("mni")
RU("ru")
RS_CYRILLIC("rs_cyrillic")
BE("be")
BG("bg")
UK("uk")
MN("mn")
ABQ("abq")
ADY("ady")
KBD("kbd")
AVA("ava")
DAR("dar")
INH("inh")
CHE("che")
LBE("lbe")
LEZ("lez")
TAB("tab")
TJK("tjk")
HI("hi")
MR("mr")
NE("ne")
BH("bh")
MAI("mai")
ANG("ang")
BHO("bho")
MAH("mah")
SCK("sck")
NEW("new")
GOM("gom")
SA("sa")
BGC("bgc")
TH("th")
CH_SIM("ch_sim")
CH_TRA("ch_tra")
JA("ja")
KO("ko")
TA("ta")
TE("te")
KN("kn")
Optional<Long> maxPages

The maximum number of pages to parse

Optional<List<Long>> targetPages

The pages to target for parsing (0-indexed, so first page is at 0)

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time
Optional<Long> continueAsNewThreshold

Maximum files to process per execution cycle in directory mode. Defaults to page_size.

Optional<String> directoryId

ID of the directory containing files to process

Optional<List<String>> itemIds

List of specific item IDs to process. Either this or directory_id must be provided.

Optional<Long> pageSize

Number of files to process per batch when using directory mode

maximum: 1000
minimum: 1
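
The input constraints above (either directoryId or itemIds must be provided, and pageSize must lie between 1 and 1000) can be checked on the client before a request is sent. The following is a minimal sketch in plain Java, not part of the SDK: the class name BatchRequestCheck and the method validate are hypothetical, and treating the two selectors as mutually exclusive is an assumption drawn from the "either ... or" wording rather than documented server behavior.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical pre-flight check for batch inputs; not part of the SDK. */
public final class BatchRequestCheck {
    private BatchRequestCheck() {}

    /** Returns an error message, or an empty Optional if the inputs look acceptable. */
    public static Optional<String> validate(
            Optional<String> directoryId,
            Optional<List<String>> itemIds,
            Optional<Long> pageSize) {
        boolean hasDirectory = directoryId.isPresent() && !directoryId.get().isEmpty();
        boolean hasItems = itemIds.isPresent() && !itemIds.get().isEmpty();

        // The reference states that one of the two selectors must be provided.
        if (!hasDirectory && !hasItems) {
            return Optional.of("provide directoryId or itemIds");
        }
        // Assumption: the selectors are mutually exclusive ("either ... or").
        if (hasDirectory && hasItems) {
            return Optional.of("provide only one of directoryId or itemIds");
        }
        // Documented bounds for pageSize: minimum 1, maximum 1000.
        if (pageSize.isPresent() && (pageSize.get() < 1 || pageSize.get() > 1000)) {
            return Optional.of("pageSize must be between 1 and 1000");
        }
        return Optional.empty();
    }
}
```

A caller might run validate before building BatchCreateParams and surface the returned message instead of sending a request that is likely to be rejected.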
Returns
class BatchCreateResponse:

Response schema for a batch processing job.

String id

Unique identifier for the batch job

JobType jobType

Type of processing operation (parse, extract, or classify)

One of the following:
PARSE("parse")
EXTRACT("extract")
CLASSIFY("classify")
String projectId

Project this job belongs to

Status status

Current job status

One of the following:
PENDING("pending")
RUNNING("running")
DISPATCHED("dispatched")
COMPLETED("completed")
FAILED("failed")
CANCELLED("cancelled")
long totalItems

Total number of items in the job

Optional<LocalDateTime> completedAt

Timestamp when job completed

format: date-time
Optional<LocalDateTime> createdAt

Creation datetime

format: date-time
Optional<String> directoryId

Directory being processed

Optional<LocalDateTime> effectiveAt
Optional<String> errorMessage

Error message for the latest job attempt, if any.

Optional<Long> failedItems

Number of items that failed processing

Optional<String> jobRecordId

The job record ID associated with this status, if any.

Optional<Long> processedItems

Number of items processed so far

Optional<Long> skippedItems

Number of items skipped (already processed or size limit)

Optional<LocalDateTime> startedAt

Timestamp when job processing started

format: date-time
Optional<LocalDateTime> updatedAt

Update datetime

format: date-time
Optional<String> workflowId

Async job tracking ID
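
Because the job runs asynchronously, a caller typically polls until the status field stops changing. The following is a minimal sketch of such a loop in plain Java. It does not call the SDK: fetchStatus is a hypothetical stand-in for whatever call retrieves the job's current status string, and the class name BatchPoller is illustrative. Treating "completed", "failed", and "cancelled" as the terminal statuses is an assumption based on the enum names above.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical polling helper; not part of the SDK. */
public final class BatchPoller {
    // Assumption: these three statuses are final and the others are transient.
    private static final Set<String> TERMINAL = Set.of("completed", "failed", "cancelled");

    private BatchPoller() {}

    public static boolean isTerminal(String status) {
        return TERMINAL.contains(status);
    }

    /**
     * Calls fetchStatus until it reports a terminal status or maxAttempts
     * calls have been made, sleeping for interval between calls.
     * Returns the last status seen, which may be non-terminal.
     */
    public static String pollUntilDone(
            Supplier<String> fetchStatus, Duration interval, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String status = fetchStatus.get();
        for (int attempt = 1; attempt < maxAttempts && !isTerminal(status); attempt++) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return status;
            }
            status = fetchStatus.get();
        }
        return status;
    }
}
```

Capping the number of attempts keeps a stuck job from blocking the caller indefinitely. A caller that receives a non-terminal status back can decide whether to keep waiting or rely on a webhook instead.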

Create Batch Job

package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.beta.batch.BatchCreateParams;
import com.llamacloud_prod.api.models.beta.batch.BatchCreateResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
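        // Build a client configured from environment variables.
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // Minimal request with an empty parse config. Per the parameter notes above,
        // a real request must also identify its input via directoryId or itemIds.
        BatchCreateParams params = BatchCreateParams.builder()
            .jobConfig(BatchCreateParams.JobConfig.BatchParseJobRecordCreate.builder().build())
            .build();

        // Submit the job. It runs asynchronously, so the response describes the newly created job.
        BatchCreateResponse batch = client.beta().batch().create(params);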
    }
}
Returns Examples
{
  "id": "bjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "job_type": "parse",
  "project_id": "proj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "pending",
  "total_items": 0,
  "completed_at": "2019-12-27T18:11:19.117Z",
  "created_at": "2019-12-27T18:11:19.117Z",
  "directory_id": "dir-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "effective_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "failed_items": 0,
  "job_record_id": "job_record_id",
  "processed_items": 0,
  "skipped_items": 0,
  "started_at": "2019-12-27T18:11:19.117Z",
  "updated_at": "2019-12-27T18:11:19.117Z",
  "workflow_id": "workflow_id"
}