Create Batch Job

BatchCreateResponse beta().batch().create(BatchCreateParams params)

POST /api/v1/beta/batch-processing

Create a batch processing job.

Processes files from a directory or a specific list of item IDs. Supports batch parsing and classification operations.

Provide either directory_id to process all files in a directory, or item_ids to process specific items. The job runs asynchronously; poll GET /batch/{job_id} for progress.
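The polling loop described above can be sketched as follows. This is a minimal, SDK-agnostic sketch, not the SDK's own helper: the status fetcher is a hypothetical Supplier<String> standing in for a call to GET /batch/{job_id}, and treating completed, failed, and cancelled (from the Status enum under Returns) as the terminal states is an assumption.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.Supplier;

// Minimal polling sketch. Status strings mirror the documented Status enum;
// treating completed/failed/cancelled as terminal is an assumption.
final class BatchPoller {
    static final Set<String> TERMINAL = Set.of("completed", "failed", "cancelled");
    private static final Duration MAX_DELAY = Duration.ofSeconds(30);

    private BatchPoller() {}

    /**
     * Calls fetchStatus until it returns a terminal status or maxAttempts is reached.
     * fetchStatus is a hypothetical stand-in for GET /batch/{job_id}.
     */
    static String waitForTerminal(Supplier<String> fetchStatus, Duration interval, int maxAttempts) {
        Duration delay = interval;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = fetchStatus.get();
            if (TERMINAL.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while polling", e);
                }
                // Exponential backoff, capped at 30 seconds between polls.
                delay = delay.multipliedBy(2);
                if (delay.compareTo(MAX_DELAY) > 0) {
                    delay = MAX_DELAY;
                }
            }
        }
        throw new IllegalStateException("Job did not finish after " + maxAttempts + " polls");
    }
}
```

The backoff keeps request volume low for long-running directory jobs; pick interval and maxAttempts to match the expected batch size.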

Parameters

BatchCreateParams params

Optional<String> organizationId

Optional<String> projectId

Optional<String> temporalNamespace

JobConfig jobConfig

Job configuration: either a parse or a classify config.

class BatchParseJobRecordCreate:

Batch-specific parse job record for batch processing.

This model contains the metadata and configuration for a batch parse job, but excludes file-specific information. It’s used as input to the batch parent workflow and combined with DirectoryFile data to create full ParseJobRecordCreate instances for each file.

Attributes:
    job_name: Must be PARSE_RAW_FILE.
    partitions: Partitions for the job output location.
    parameters: Generic parse configuration (BatchParseJobConfig).
    session_id: Upstream request ID, for tracking.
    correlation_id: Correlation ID for cross-service tracking.
    parent_job_execution_id: Parent job execution ID, if nested.
    user_id: User who created the job.
    project_id: Project this job belongs to.
    webhook_url: Optional webhook URL for job completion notifications.

Optional<String> correlationId

The correlation ID for this job. Used for tracking the job across services.

format: uuid

Optional<JobName> jobName

Optional<Parameters> parameters

Generic parse job configuration for batch processing.

This model contains the parsing configuration that applies to all files in a batch, but excludes file-specific fields like file_name, file_id, etc. Those file-specific fields are populated from DirectoryFile data when creating individual ParseJobRecordCreate instances for each file.

The fields in this model should be generic settings that apply uniformly to all files being processed in the batch.

Optional<Boolean> adaptiveLongTable

Optional<Boolean> aggressiveTableExtraction

Optional<Boolean> annotateLinks

Optional<Boolean> autoMode

Optional<String> autoModeConfigurationJson

Optional<Boolean> autoModeTriggerOnImageInPage

Optional<String> autoModeTriggerOnRegexpInPage

Optional<Boolean> autoModeTriggerOnTableInPage

Optional<String> autoModeTriggerOnTextInPage

Optional<String> azureOpenAIApiVersion

Optional<String> azureOpenAIDeploymentName

Optional<String> azureOpenAIEndpoint

Optional<String> azureOpenAIKey

Optional<Double> bboxBottom

Optional<Double> bboxLeft

Optional<Double> bboxRight

Optional<Double> bboxTop

Optional<String> boundingBox

Optional<Boolean> compactMarkdownTable

Optional<String> complementalFormattingInstruction

Optional<String> confidenceScoreEffort

Optional<String> contentGuidelineInstruction

Optional<Boolean> continuousMode

Optional<CustomMetadata> customMetadata

The custom metadata to attach to the documents.

Optional<Boolean> disableImageExtraction

Optional<Boolean> disableOcr

Optional<Boolean> disableReconstruction

Optional<Boolean> doNotCache

Optional<Boolean> doNotUnrollColumns

Optional<Boolean> enableCostOptimizer

Optional<Boolean> extractCharts

Optional<Boolean> extractLayout

Optional<Boolean> extractPrintedPageNumber

Optional<Boolean> fastMode

Optional<String> formattingInstruction

Optional<String> gpt4oApiKey

Optional<Boolean> gpt4oMode

Optional<Boolean> guessXlsxSheetName

Optional<Boolean> hideFooters

Optional<Boolean> hideHeaders

Optional<Boolean> highResOcr

Optional<Boolean> htmlMakeAllElementsVisible

Optional<Boolean> htmlRemoveFixedElements

Optional<Boolean> htmlRemoveNavigationElements

Optional<String> httpProxy

Optional<Boolean> ignoreDocumentElementsForLayoutDetection

Optional<List<ImagesToSave>> imagesToSave

One of the following:

EMBEDDED("embedded")

LAYOUT("layout")

SCREENSHOT("screenshot")

Optional<Boolean> inlineImagesInMarkdown

Optional<String> inputS3Path

Optional<String> inputS3Region

The region for the input S3 bucket.

Optional<String> inputUrl

Optional<Boolean> internalIsScreenshotJob

Optional<Boolean> invalidateCache

Optional<Boolean> isFormattingInstruction

Optional<Double> jobTimeoutExtraTimePerPageInSeconds

Optional<Double> jobTimeoutInSeconds

Optional<Boolean> keepPageSeparatorWhenMergingTables

Optional<String> lang

The language.

Optional<List<ParsingLanguages>> languages

One of the following:

ABQ("abq")

ADY("ady")

AF("af")

ANG("ang")

AR("ar")

AS("as")

AVA("ava")

AZ("az")

BE("be")

BG("bg")

BGC("bgc")

BH("bh")

BHO("bho")

BN("bn")

BS("bs")

CH_SIM("ch_sim")

CH_TRA("ch_tra")

CHE("che")

CS("cs")

CY("cy")

DA("da")

DAR("dar")

DE("de")

EN("en")

ES("es")

ET("et")

FA("fa")

FR("fr")

GA("ga")

GOM("gom")

HI("hi")

HR("hr")

HU("hu")

ID("id")

INH("inh")

IS("is")

IT("it")

JA("ja")

KBD("kbd")

KN("kn")

KO("ko")

KU("ku")

LA("la")

LBE("lbe")

LEZ("lez")

LT("lt")

LV("lv")

MAH("mah")

MAI("mai")

MI("mi")

MN("mn")

MNI("mni")

MR("mr")

MS("ms")

MT("mt")

NE("ne")

NEW("new")

NL("nl")

NO("no")

OC("oc")

PI("pi")

PL("pl")

PT("pt")

RO("ro")

RS_CYRILLIC("rs_cyrillic")

RS_LATIN("rs_latin")

RU("ru")

SA("sa")

SCK("sck")

SK("sk")

SL("sl")

SQ("sq")

SV("sv")

SW("sw")

TA("ta")

TAB("tab")

TE("te")

TH("th")

TJK("tjk")

TL("tl")

TR("tr")

UG("ug")

UK("uk")

UR("ur")

UZ("uz")

VI("vi")

Optional<Boolean> layoutAware

Optional<Boolean> lineLevelBoundingBox

Optional<String> markdownTableMultilineHeaderSeparator

Optional<Long> maxPages

Optional<Long> maxPagesEnforced

Optional<Boolean> mergeTablesAcrossPagesInMarkdown

Optional<String> model

Optional<Boolean> outlinedTableExtraction

Optional<Boolean> outputPdfOfDocument

Optional<String> outputS3PathPrefix

If specified, LlamaParse saves the output under this path, and all output files use it as their prefix. Must be a valid s3:// URL.
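A client-side check for the s3:// requirement might look like this. It is a hypothetical helper that only checks the URL shape (an s3 scheme plus a bucket name); it does not check that the bucket exists or is writable.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: checks that an output prefix looks like s3://bucket/optional/path.
final class S3PrefixCheck {
    private S3PrefixCheck() {}

    static boolean isValidS3Prefix(String prefix) {
        if (prefix == null || prefix.isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(prefix);
            // The scheme must be exactly "s3" and the authority (bucket name) must be present.
            String bucket = uri.getAuthority();
            return "s3".equals(uri.getScheme()) && bucket != null && !bucket.isEmpty();
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```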

Optional<String> outputS3Region

The region for the output S3 bucket.

Optional<Boolean> outputTablesAsHtml

Optional<String> outputBucket

The output bucket.

Optional<Double> pageErrorTolerance

Optional<String> pageFooterPrefix

Optional<String> pageFooterSuffix

Optional<String> pageHeaderPrefix

Optional<String> pageHeaderSuffix

Optional<String> pagePrefix

Optional<String> pageSeparator

Optional<String> pageSuffix

Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:

PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")

PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")

PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")

PARSE_PAGE_WITH_AGENT("parse_page_with_agent")

PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")

PARSE_PAGE_WITH_LLM("parse_page_with_llm")

PARSE_PAGE_WITH_LVM("parse_page_with_lvm")

PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")

Optional<String> parsingInstruction

Optional<String> pipelineId

The pipeline ID.

Optional<Boolean> preciseBoundingBox

Optional<Boolean> premiumMode

Optional<Boolean> presentationOutOfBoundsContent

Optional<Boolean> presentationSkipEmbeddedData

Optional<Boolean> preserveLayoutAlignmentAcrossPages

Optional<Boolean> preserveVerySmallText

Optional<String> preset

Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:

CRITICAL("critical")

HIGH("high")

LOW("low")

MEDIUM("medium")

Optional<String> projectId

Optional<Boolean> removeHiddenText

Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:

BLANK_PAGE("blank_page")

ERROR_MESSAGE("error_message")

RAW_TEXT("raw_text")

Optional<String> replaceFailedPageWithErrorMessagePrefix

Optional<String> replaceFailedPageWithErrorMessageSuffix

Optional<ResourceInfo> resourceInfo

The resource info about the file

Optional<Boolean> saveImages

Optional<Boolean> skipDiagonalText

Optional<Boolean> specializedChartParsingAgentic

Optional<Boolean> specializedChartParsingEfficient

Optional<Boolean> specializedChartParsingPlus

Optional<Boolean> specializedImageParsing

Optional<Boolean> spreadsheetExtractSubTables

Optional<Boolean> spreadsheetForceFormulaComputation

Optional<Boolean> spreadsheetIncludeHiddenSheets

Optional<Boolean> strictModeBuggyFont

Optional<Boolean> strictModeImageExtraction

Optional<Boolean> strictModeImageOcr

Optional<Boolean> strictModeReconstruction

Optional<Boolean> structuredOutput

Optional<String> structuredOutputJsonSchema

Optional<String> structuredOutputJsonSchemaName

Optional<String> systemPrompt

Optional<String> systemPromptAppend

Optional<Boolean> takeScreenshot

Optional<String> targetPages

Optional<String> tier

Optional<Type> type

Optional<Boolean> useVendorMultimodalModel

Optional<String> userPrompt

Optional<String> vendorMultimodalApiKey

Optional<String> vendorMultimodalModelName

Optional<String> version

Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:

CLASSIFY_CANCELLED("classify.cancelled")

CLASSIFY_ERROR("classify.error")

CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")

CLASSIFY_PENDING("classify.pending")

CLASSIFY_RUNNING("classify.running")

CLASSIFY_SUCCESS("classify.success")

EXTRACT_CANCELLED("extract.cancelled")

EXTRACT_ERROR("extract.error")

EXTRACT_PARTIAL_SUCCESS("extract.partial_success")

EXTRACT_PENDING("extract.pending")

EXTRACT_SUCCESS("extract.success")

PARSE_CANCELLED("parse.cancelled")

PARSE_ERROR("parse.error")

PARSE_PARTIAL_SUCCESS("parse.partial_success")

PARSE_PENDING("parse.pending")

PARSE_RUNNING("parse.running")

PARSE_SUCCESS("parse.success")

SHEETS_CANCELLED("sheets.cancelled")

SHEETS_ERROR("sheets.error")

SHEETS_PARTIAL_SUCCESS("sheets.partial_success")

SHEETS_PENDING("sheets.pending")

SHEETS_SUCCESS("sheets.success")

SPLIT_CANCELLED("split.cancelled")

SPLIT_ERROR("split.error")

SPLIT_PENDING("split.pending")

SPLIT_PROCESSING("split.processing")

SPLIT_SUCCESS("split.success")

UNMAPPED_EVENT("unmapped_event")
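The event names above follow a product.state pattern, and a null webhookEvents list means every event is delivered. A receiver that wants to apply the same rule on its own side could use a small predicate like this; the class and method names are hypothetical and not part of the SDK.

```java
import java.util.List;

// Hypothetical sketch of the documented webhookEvents rule:
// a null subscription list means every event is delivered.
final class WebhookEventFilter {
    private WebhookEventFilter() {}

    static boolean isSubscribed(List<String> webhookEvents, String event) {
        return webhookEvents == null || webhookEvents.contains(event);
    }

    /** Splits an event name such as "parse.success" into {"parse", "success"}. */
    static String[] split(String event) {
        int dot = event.indexOf('.');
        if (dot < 0) {
            // e.g. "unmapped_event" has no product prefix.
            return new String[] {"", event};
        }
        return new String[] {event.substring(0, dot), event.substring(dot + 1)};
    }
}
```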

Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookSigningSecret

Shared signing secret used to sign webhook deliveries. When set, each request includes an HMAC-SHA256 signature of the request body in the ‘LC-Signature’ header (value ‘sha256=’). Recompute the HMAC over the raw request body with this secret to verify the delivery is authentic.

Optional<String> webhookUrl

URL to receive webhook POST notifications
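The signature check described for webhookSigningSecret can be sketched with the JDK's javax.crypto.Mac. The exact encoding after the "sha256=" prefix is not spelled out here, so this sketch assumes a lowercase hex digest; the class and method names are hypothetical. Pass the raw request bytes, not a re-serialized JSON body, or the HMAC will not match.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of verifying an LC-Signature header. Assumes the header value is
// "sha256=" followed by a lowercase hex HMAC-SHA256 digest of the raw body.
final class WebhookSignature {
    private static final String PREFIX = "sha256=";

    private WebhookSignature() {}

    /** Computes the expected header value for a raw body. */
    static String sign(byte[] secret, byte[] rawBody) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] digest = mac.doFinal(rawBody);
            StringBuilder hex = new StringBuilder(PREFIX);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }

    /** Returns true if the header matches the HMAC of the raw body. */
    static boolean verify(byte[] secret, byte[] rawBody, String header) {
        if (header == null) {
            return false;
        }
        byte[] expected = sign(secret, rawBody).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = header.trim().toLowerCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
        // Constant-time comparison so timing does not reveal how many characters matched.
        return MessageDigest.isEqual(expected, actual);
    }
}
```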

Optional<String> webhookUrl

Optional<String> parentJobExecutionId

The ID of the parent job execution.

format: uuid

Optional<Partitions> partitions

The partitions for this execution. Used for determining where to save job output.

Optional<String> projectId

The ID of the project this job belongs to.

format: uuid

Optional<String> sessionId

The upstream request ID that created this job. Used for tracking the job across services.

format: uuid

Optional<String> userId

The ID of the user that created this job

Optional<String> webhookUrl

The URL that needs to be called at the end of the parsing job.

class ClassifyJob:

A classify job.

String id

Unique identifier

format: uuid

String projectId

The ID of the project

format: uuid

List<ClassifierRule> rules

The rules to classify the files

String description

Natural language description of what to classify. Be specific about the content characteristics that identify this document type.

maxLength: 2000

minLength: 10

String type

The document type to assign when this rule matches (e.g., ‘invoice’, ‘receipt’, ‘contract’)

maxLength: 50

minLength: 1
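Because each classifier rule carries length limits (type: 1 to 50 characters, description: 10 to 2000 characters), checking rules before submitting a job can save a failed request. The validator below is a hypothetical client-side sketch; it assumes the limits count Unicode code points, which the schema does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check mirroring the documented ClassifierRule limits.
// Assumes the limits are measured in Unicode code points.
final class ClassifierRuleCheck {
    private ClassifierRuleCheck() {}

    /** Returns a list of problems; an empty list means the rule passes. */
    static List<String> validate(String type, String description) {
        List<String> problems = new ArrayList<>();
        int typeLen = type == null ? 0 : type.codePointCount(0, type.length());
        int descLen = description == null ? 0 : description.codePointCount(0, description.length());
        if (typeLen < 1 || typeLen > 50) {
            problems.add("type must be 1-50 characters, got " + typeLen);
        }
        if (descLen < 10 || descLen > 2000) {
            problems.add("description must be 10-2000 characters, got " + descLen);
        }
        return problems;
    }
}
```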

StatusEnum status

The status of the classify job

One of the following:

CANCELLED("CANCELLED")

ERROR("ERROR")

PARTIAL_SUCCESS("PARTIAL_SUCCESS")

PENDING("PENDING")

SUCCESS("SUCCESS")

String userId

The ID of the user

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<LocalDateTime> effectiveAt

Optional<String> errorMessage

Error message for the latest job attempt, if any.

Optional<String> jobRecordId

The job record ID associated with this status, if any.

Optional<Mode> mode

The classification mode to use

One of the following:

FAST("FAST")

MULTIMODAL("MULTIMODAL")

Optional<ClassifyParsingConfiguration> parsingConfiguration

The configuration for the parsing job

Optional<ParsingLanguages> lang

The language to parse the files in

One of the following:

ABQ("abq")

ADY("ady")

AF("af")

ANG("ang")

AR("ar")

AS("as")

AVA("ava")

AZ("az")

BE("be")

BG("bg")

BGC("bgc")

BH("bh")

BHO("bho")

BN("bn")

BS("bs")

CH_SIM("ch_sim")

CH_TRA("ch_tra")

CHE("che")

CS("cs")

CY("cy")

DA("da")

DAR("dar")

DE("de")

EN("en")

ES("es")

ET("et")

FA("fa")

FR("fr")

GA("ga")

GOM("gom")

HI("hi")

HR("hr")

HU("hu")

ID("id")

INH("inh")

IS("is")

IT("it")

JA("ja")

KBD("kbd")

KN("kn")

KO("ko")

KU("ku")

LA("la")

LBE("lbe")

LEZ("lez")

LT("lt")

LV("lv")

MAH("mah")

MAI("mai")

MI("mi")

MN("mn")

MNI("mni")

MR("mr")

MS("ms")

MT("mt")

NE("ne")

NEW("new")

NL("nl")

NO("no")

OC("oc")

PI("pi")

PL("pl")

PT("pt")

RO("ro")

RS_CYRILLIC("rs_cyrillic")

RS_LATIN("rs_latin")

RU("ru")

SA("sa")

SCK("sck")

SK("sk")

SL("sl")

SQ("sq")

SV("sv")

SW("sw")

TA("ta")

TAB("tab")

TE("te")

TH("th")

TJK("tjk")

TL("tl")

TR("tr")

UG("ug")

UK("uk")

UR("ur")

UZ("uz")

VI("vi")

Optional<Long> maxPages

The maximum number of pages to parse

Optional<List<Long>> targetPages

The pages to target for parsing (0-indexed, so the first page is 0)
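Since this targetPages list is 0-indexed, page numbers as a reader sees them (starting at 1) need to be shifted down by one. A small hypothetical helper for building the list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts human-facing 1-based page numbers into the
// 0-indexed values that targetPages expects (page 1 -> 0).
final class TargetPages {
    private TargetPages() {}

    static List<Long> fromOneBased(long... pages) {
        List<Long> zeroBased = new ArrayList<>(pages.length);
        for (long page : pages) {
            if (page < 1) {
                throw new IllegalArgumentException("1-based page numbers start at 1: " + page);
            }
            zeroBased.add(page - 1);
        }
        return zeroBased;
    }
}
```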

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

Optional<Long> continueAsNewThreshold

Maximum number of files to process per execution cycle in directory mode. Defaults to page_size.

Optional<String> directoryId

ID of the directory containing files to process

Optional<List<String>> itemIds

List of specific item IDs to process. Either this or directory_id must be provided.
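The either/or rule for directoryId and itemIds can be checked before the request is sent. This hypothetical sketch assumes exactly one of the two should be set; the text requires at least one, but does not say whether supplying both is rejected.

```java
import java.util.List;

// Hypothetical pre-flight check for the directory_id / item_ids rule.
// Assumes exactly one of the two should be supplied.
final class BatchSourceCheck {
    private BatchSourceCheck() {}

    static void requireExactlyOneSource(String directoryId, List<String> itemIds) {
        boolean hasDirectory = directoryId != null && !directoryId.trim().isEmpty();
        boolean hasItems = itemIds != null && !itemIds.isEmpty();
        if (hasDirectory == hasItems) {
            throw new IllegalArgumentException(
                hasDirectory
                    ? "Provide directory_id or item_ids, not both"
                    : "Provide either directory_id or item_ids");
        }
    }
}
```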

Optional<Long> pageSize

Number of files to process per batch when using directory mode

maximum: 1000

minimum: 1
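In directory mode, pageSize must fall between 1 and 1000. The hypothetical sketch below validates that range and estimates how many pages a directory of a given size is read in, assuming simple ceiling division; the service's actual scheduling (including continueAsNewThreshold) may differ.

```java
// Hypothetical sketch: validates pageSize against the documented bounds
// (1..1000) and estimates how many pages a directory will be read in.
final class PageSizing {
    static final long MIN_PAGE_SIZE = 1;
    static final long MAX_PAGE_SIZE = 1000;

    private PageSizing() {}

    static long pageCount(long fileCount, long pageSize) {
        if (pageSize < MIN_PAGE_SIZE || pageSize > MAX_PAGE_SIZE) {
            throw new IllegalArgumentException("pageSize must be in [1, 1000], got " + pageSize);
        }
        if (fileCount < 0) {
            throw new IllegalArgumentException("fileCount must be non-negative");
        }
        // Ceiling division without floating point: 2500 files at 1000 per page -> 3 pages.
        return (fileCount + pageSize - 1) / pageSize;
    }
}
```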

Returns

class BatchCreateResponse:

Response schema for a batch processing job.

String id

Unique identifier for the batch job

JobType jobType

Type of processing operation (parse, classify, or extract)

One of the following:

CLASSIFY("classify")

EXTRACT("extract")

PARSE("parse")

String projectId

Project this job belongs to

Status status

Current job status

One of the following:

CANCELLED("cancelled")

COMPLETED("completed")

DISPATCHED("dispatched")

FAILED("failed")

PENDING("pending")

RUNNING("running")

long totalItems

Total number of items in the job

Optional<LocalDateTime> completedAt

Timestamp when job completed

format: date-time

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<String> directoryId

Directory being processed

Optional<LocalDateTime> effectiveAt

Optional<String> errorMessage

Error message for the latest job attempt, if any.

Optional<Long> failedItems

Number of items that failed processing

Optional<String> jobRecordId

The job record ID associated with this status, if any.

Optional<Long> processedItems

Number of items processed so far

Optional<Long> skippedItems

Number of items skipped (already processed, or over the size limit)

Optional<LocalDateTime> startedAt

Timestamp when job processing started

format: date-time

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

Optional<String> workflowId

Async job tracking ID
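A rough progress figure can be derived from the counters above. The schema does not say whether failed or skipped items are included in processedItems, so this hypothetical helper assumes processedItems counts every item the job has finished with, and clamps the result in case the counters are briefly inconsistent.

```java
// Hypothetical helper for a progress bar. Assumes processedItems counts every
// item the job has finished with; the schema does not say whether failed or
// skipped items are included in that count.
final class BatchProgress {
    private BatchProgress() {}

    /** Returns completion as a percentage in [0, 100]; an empty job reports 0. */
    static double percentComplete(long totalItems, long processedItems) {
        if (totalItems <= 0) {
            return 0.0;
        }
        double pct = 100.0 * processedItems / totalItems;
        // Clamp in case counters are briefly inconsistent while the job runs.
        return Math.max(0.0, Math.min(100.0, pct));
    }
}
```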


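package ai.llamaindex.llamacloud.example;

import ai.llamaindex.llamacloud.client.LlamaCloudClient;
import ai.llamaindex.llamacloud.client.okhttp.LlamaCloudOkHttpClient;
import ai.llamaindex.llamacloud.models.beta.batch.BatchCreateParams;
import ai.llamaindex.llamacloud.models.beta.batch.BatchCreateResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        // Builds a client from environment variables (API key, base URL).
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        BatchCreateParams params = BatchCreateParams.builder()
            // Parse every file using the default parse configuration.
            .jobConfig(BatchCreateParams.JobConfig.BatchParseJobRecordCreate.builder().build())
            // Either directoryId or itemIds must be set; this directory ID is a placeholder.
            .directoryId("dir-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")
            .build();
        BatchCreateResponse batch = client.beta().batch().create(params);

        // The job runs asynchronously; keep the ID to poll for progress.
        System.out.println(batch.id() + " " + batch.status());
    }
}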

Example response

{
  "id": "bjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "job_type": "classify",
  "project_id": "proj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "cancelled",
  "total_items": 0,
  "completed_at": "2019-12-27T18:11:19.117Z",
  "created_at": "2019-12-27T18:11:19.117Z",
  "directory_id": "dir-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "effective_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "failed_items": 0,
  "job_record_id": "job_record_id",
  "processed_items": 0,
  "skipped_items": 0,
  "started_at": "2019-12-27T18:11:19.117Z",
  "updated_at": "2019-12-27T18:11:19.117Z",
  "workflow_id": "workflow_id"
}