## Create Batch Job

`BatchCreateResponse beta().batch().create(BatchCreateParams params, RequestOptions requestOptions = RequestOptions.none())`

**post** `/api/v1/beta/batch-processing`

Create a batch processing job that processes either the files in a directory or a specific list of item IDs. Batch parsing and classification operations are supported.

Provide either `directory_id` to process all files in a directory, or `item_ids` to process specific items. The job runs asynchronously; poll `GET /batch/{job_id}` for progress.

### Parameters

- `BatchCreateParams params`
  - `Optional organizationId`
  - `Optional projectId`
  - `Optional temporalNamespace`
  - `JobConfig jobConfig`
    Job configuration: either a parse config or a classify config.
    - `class BatchParseJobRecordCreate:`
      Batch-specific parse job record for batch processing. This model contains the metadata and configuration for a batch parse job but excludes file-specific information. It is used as input to the batch parent workflow and is combined with `DirectoryFile` data to create a full `ParseJobRecordCreate` instance for each file.

      Attributes: `job_name` (must be `PARSE_RAW_FILE`), `partitions` (partitions for the job output location), `parameters` (generic parse configuration, `BatchParseJobConfig`), `session_id` (upstream request ID for tracking), `correlation_id` (correlation ID for cross-service tracking), `parent_job_execution_id` (parent job execution ID, if nested), `user_id` (user who created the job), `project_id` (project this job belongs to), and `webhook_url` (optional webhook URL for job completion notifications).
      - `Optional correlationId`
        The correlation ID for this job. Used for tracking the job across services.
      - `Optional jobName`
        - `PARSE_RAW_FILE_JOB("parse_raw_file_job")`
      - `Optional parameters`
        Generic parse job configuration for batch processing. This model contains the parsing configuration that applies to all files in a batch, but excludes file-specific fields such as `file_name` and `file_id`.
        Those file-specific fields are populated from `DirectoryFile` data when individual `ParseJobRecordCreate` instances are created for each file. The fields in this model should be generic settings that apply uniformly to all files in the batch.
        - `Optional adaptiveLongTable`
        - `Optional aggressiveTableExtraction`
        - `Optional annotateLinks`
        - `Optional autoMode`
        - `Optional autoModeConfigurationJson`
        - `Optional autoModeTriggerOnImageInPage`
        - `Optional autoModeTriggerOnRegexpInPage`
        - `Optional autoModeTriggerOnTableInPage`
        - `Optional autoModeTriggerOnTextInPage`
        - `Optional azureOpenAIApiVersion`
        - `Optional azureOpenAIDeploymentName`
        - `Optional azureOpenAIEndpoint`
        - `Optional azureOpenAIKey`
        - `Optional bboxBottom`
        - `Optional bboxLeft`
        - `Optional bboxRight`
        - `Optional bboxTop`
        - `Optional boundingBox`
        - `Optional compactMarkdownTable`
        - `Optional complementalFormattingInstruction`
        - `Optional contentGuidelineInstruction`
        - `Optional continuousMode`
        - `Optional customMetadata`
          The custom metadata to attach to the documents.
        - `Optional disableImageExtraction`
        - `Optional disableOcr`
        - `Optional disableReconstruction`
        - `Optional doNotCache`
        - `Optional doNotUnrollColumns`
        - `Optional enableCostOptimizer`
        - `Optional extractCharts`
        - `Optional extractLayout`
        - `Optional extractPrintedPageNumber`
        - `Optional fastMode`
        - `Optional formattingInstruction`
        - `Optional gpt4oApiKey`
        - `Optional gpt4oMode`
        - `Optional guessXlsxSheetName`
        - `Optional hideFooters`
        - `Optional hideHeaders`
        - `Optional highResOcr`
        - `Optional htmlMakeAllElementsVisible`
        - `Optional htmlRemoveFixedElements`
        - `Optional htmlRemoveNavigationElements`
        - `Optional httpProxy`
        - `Optional ignoreDocumentElementsForLayoutDetection`
        - `Optional> imagesToSave`
          - `SCREENSHOT("screenshot")`
          - `EMBEDDED("embedded")`
          - `LAYOUT("layout")`
        - `Optional inlineImagesInMarkdown`
        - `Optional inputS3Path`
        - `Optional inputS3Region`
          The region for the input S3 bucket.
        - `Optional inputUrl`
        - `Optional internalIsScreenshotJob`
        - `Optional invalidateCache`
        - `Optional isFormattingInstruction`
        - `Optional jobTimeoutExtraTimePerPageInSeconds`
        - `Optional jobTimeoutInSeconds`
        - `Optional keepPageSeparatorWhenMergingTables`
        - `Optional lang`
          The language.
        - `Optional> languages`
          - `AF("af")`
          - `AZ("az")`
          - `BS("bs")`
          - `CS("cs")`
          - `CY("cy")`
          - `DA("da")`
          - `DE("de")`
          - `EN("en")`
          - `ES("es")`
          - `ET("et")`
          - `FR("fr")`
          - `GA("ga")`
          - `HR("hr")`
          - `HU("hu")`
          - `ID("id")`
          - `IS("is")`
          - `IT("it")`
          - `KU("ku")`
          - `LA("la")`
          - `LT("lt")`
          - `LV("lv")`
          - `MI("mi")`
          - `MS("ms")`
          - `MT("mt")`
          - `NL("nl")`
          - `NO("no")`
          - `OC("oc")`
          - `PI("pi")`
          - `PL("pl")`
          - `PT("pt")`
          - `RO("ro")`
          - `RS_LATIN("rs_latin")`
          - `SK("sk")`
          - `SL("sl")`
          - `SQ("sq")`
          - `SV("sv")`
          - `SW("sw")`
          - `TL("tl")`
          - `TR("tr")`
          - `UZ("uz")`
          - `VI("vi")`
          - `AR("ar")`
          - `FA("fa")`
          - `UG("ug")`
          - `UR("ur")`
          - `BN("bn")`
          - `AS("as")`
          - `MNI("mni")`
          - `RU("ru")`
          - `RS_CYRILLIC("rs_cyrillic")`
          - `BE("be")`
          - `BG("bg")`
          - `UK("uk")`
          - `MN("mn")`
          - `ABQ("abq")`
          - `ADY("ady")`
          - `KBD("kbd")`
          - `AVA("ava")`
          - `DAR("dar")`
          - `INH("inh")`
          - `CHE("che")`
          - `LBE("lbe")`
          - `LEZ("lez")`
          - `TAB("tab")`
          - `TJK("tjk")`
          - `HI("hi")`
          - `MR("mr")`
          - `NE("ne")`
          - `BH("bh")`
          - `MAI("mai")`
          - `ANG("ang")`
          - `BHO("bho")`
          - `MAH("mah")`
          - `SCK("sck")`
          - `NEW("new")`
          - `GOM("gom")`
          - `SA("sa")`
          - `BGC("bgc")`
          - `TH("th")`
          - `CH_SIM("ch_sim")`
          - `CH_TRA("ch_tra")`
          - `JA("ja")`
          - `KO("ko")`
          - `TA("ta")`
          - `TE("te")`
          - `KN("kn")`
        - `Optional layoutAware`
        - `Optional lineLevelBoundingBox`
        - `Optional markdownTableMultilineHeaderSeparator`
        - `Optional maxPages`
        - `Optional maxPagesEnforced`
        - `Optional mergeTablesAcrossPagesInMarkdown`
        - `Optional model`
        - `Optional outlinedTableExtraction`
        - `Optional outputPdfOfDocument`
        - `Optional outputS3PathPrefix`
          If specified, LlamaParse saves the output to the specified path.
          All output files will use this prefix. It should be a valid `s3://` URL.
        - `Optional outputS3Region`
          The region for the output S3 bucket.
        - `Optional outputTablesAsHtml`
        - `Optional outputBucket`
          The output bucket.
        - `Optional pageErrorTolerance`
        - `Optional pageFooterPrefix`
        - `Optional pageFooterSuffix`
        - `Optional pageHeaderPrefix`
        - `Optional pageHeaderSuffix`
        - `Optional pagePrefix`
        - `Optional pageSeparator`
        - `Optional pageSuffix`
        - `Optional parseMode`
          The parsing mode to use.
          - `PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")`
          - `PARSE_PAGE_WITH_LLM("parse_page_with_llm")`
          - `PARSE_PAGE_WITH_LVM("parse_page_with_lvm")`
          - `PARSE_PAGE_WITH_AGENT("parse_page_with_agent")`
          - `PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")`
          - `PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")`
          - `PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")`
          - `PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")`
        - `Optional parsingInstruction`
        - `Optional pipelineId`
          The pipeline ID.
        - `Optional preciseBoundingBox`
        - `Optional premiumMode`
        - `Optional presentationOutOfBoundsContent`
        - `Optional presentationSkipEmbeddedData`
        - `Optional preserveLayoutAlignmentAcrossPages`
        - `Optional preserveVerySmallText`
        - `Optional preset`
        - `Optional priority`
          The priority for the request. This field may be ignored or overwritten depending on the organization tier.
          - `LOW("low")`
          - `MEDIUM("medium")`
          - `HIGH("high")`
          - `CRITICAL("critical")`
        - `Optional projectId`
        - `Optional removeHiddenText`
        - `Optional replaceFailedPageMode`
          The page error handling mode to use.
          - `RAW_TEXT("raw_text")`
          - `BLANK_PAGE("blank_page")`
          - `ERROR_MESSAGE("error_message")`
        - `Optional replaceFailedPageWithErrorMessagePrefix`
        - `Optional replaceFailedPageWithErrorMessageSuffix`
        - `Optional resourceInfo`
          Resource information about the file.
        - `Optional saveImages`
        - `Optional skipDiagonalText`
        - `Optional specializedChartParsingAgentic`
        - `Optional specializedChartParsingEfficient`
        - `Optional specializedChartParsingPlus`
        - `Optional specializedImageParsing`
        - `Optional spreadsheetExtractSubTables`
        - `Optional spreadsheetForceFormulaComputation`
        - `Optional spreadsheetIncludeHiddenSheets`
        - `Optional strictModeBuggyFont`
        - `Optional strictModeImageExtraction`
        - `Optional strictModeImageOcr`
        - `Optional strictModeReconstruction`
        - `Optional structuredOutput`
        - `Optional structuredOutputJsonSchema`
        - `Optional structuredOutputJsonSchemaName`
        - `Optional systemPrompt`
        - `Optional systemPromptAppend`
        - `Optional takeScreenshot`
        - `Optional targetPages`
        - `Optional tier`
        - `Optional type`
          - `PARSE("parse")`
        - `Optional useVendorMultimodalModel`
        - `Optional userPrompt`
        - `Optional vendorMultimodalApiKey`
        - `Optional vendorMultimodalModelName`
        - `Optional version`
        - `Optional> webhookConfigurations`
          Outbound webhook endpoints to notify on job status changes.
          - `Optional> webhookEvents`
            Events to subscribe to (for example, `parse.success` or `extract.error`). If null, all events are delivered.
            - `EXTRACT_PENDING("extract.pending")`
            - `EXTRACT_SUCCESS("extract.success")`
            - `EXTRACT_ERROR("extract.error")`
            - `EXTRACT_PARTIAL_SUCCESS("extract.partial_success")`
            - `EXTRACT_CANCELLED("extract.cancelled")`
            - `PARSE_PENDING("parse.pending")`
            - `PARSE_RUNNING("parse.running")`
            - `PARSE_SUCCESS("parse.success")`
            - `PARSE_ERROR("parse.error")`
            - `PARSE_PARTIAL_SUCCESS("parse.partial_success")`
            - `PARSE_CANCELLED("parse.cancelled")`
            - `CLASSIFY_PENDING("classify.pending")`
            - `CLASSIFY_RUNNING("classify.running")`
            - `CLASSIFY_SUCCESS("classify.success")`
            - `CLASSIFY_ERROR("classify.error")`
            - `CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")`
            - `CLASSIFY_CANCELLED("classify.cancelled")`
            - `SHEETS_PENDING("sheets.pending")`
            - `SHEETS_SUCCESS("sheets.success")`
            - `SHEETS_ERROR("sheets.error")`
            - `SHEETS_PARTIAL_SUCCESS("sheets.partial_success")`
            - `SHEETS_CANCELLED("sheets.cancelled")`
            - `UNMAPPED_EVENT("unmapped_event")`
          - `Optional webhookHeaders`
            Custom HTTP headers sent with each webhook request (for example, auth tokens).
          - `Optional webhookOutputFormat`
            Response format sent to the webhook: `string` (default) or `json`.
          - `Optional webhookUrl`
            URL to receive webhook POST notifications.
        - `Optional webhookUrl`
      - `Optional parentJobExecutionId`
        The ID of the parent job execution.
      - `Optional partitions`
        The partitions for this execution. Used for determining where to save job output.
      - `Optional projectId`
        The ID of the project this job belongs to.
      - `Optional sessionId`
        The upstream request ID that created this job. Used for tracking the job across services.
      - `Optional userId`
        The ID of the user that created this job.
      - `Optional webhookUrl`
        The URL to call when the parsing job finishes.
    - `class ClassifyJob:`
      A classify job.
      - `String id`
        Unique identifier.
      - `String projectId`
        The ID of the project.
      - `List rules`
        The rules used to classify the files.
        - `String description`
          Natural language description of what to classify.
          Be specific about the content characteristics that identify this document type.
        - `String type`
          The document type to assign when this rule matches (for example, `invoice`, `receipt`, or `contract`).
      - `StatusEnum status`
        The status of the classify job.
        - `PENDING("PENDING")`
        - `SUCCESS("SUCCESS")`
        - `ERROR("ERROR")`
        - `PARTIAL_SUCCESS("PARTIAL_SUCCESS")`
        - `CANCELLED("CANCELLED")`
      - `String userId`
        The ID of the user.
      - `Optional createdAt`
        Creation datetime.
      - `Optional effectiveAt`
      - `Optional errorMessage`
        Error message for the latest job attempt, if any.
      - `Optional jobRecordId`
        The job record ID associated with this status, if any.
      - `Optional mode`
        The classification mode to use.
        - `FAST("FAST")`
        - `MULTIMODAL("MULTIMODAL")`
      - `Optional parsingConfiguration`
        The configuration for the parsing job.
        - `Optional lang`
          The language to parse the files in.
          - `AF("af")`
          - `AZ("az")`
          - `BS("bs")`
          - `CS("cs")`
          - `CY("cy")`
          - `DA("da")`
          - `DE("de")`
          - `EN("en")`
          - `ES("es")`
          - `ET("et")`
          - `FR("fr")`
          - `GA("ga")`
          - `HR("hr")`
          - `HU("hu")`
          - `ID("id")`
          - `IS("is")`
          - `IT("it")`
          - `KU("ku")`
          - `LA("la")`
          - `LT("lt")`
          - `LV("lv")`
          - `MI("mi")`
          - `MS("ms")`
          - `MT("mt")`
          - `NL("nl")`
          - `NO("no")`
          - `OC("oc")`
          - `PI("pi")`
          - `PL("pl")`
          - `PT("pt")`
          - `RO("ro")`
          - `RS_LATIN("rs_latin")`
          - `SK("sk")`
          - `SL("sl")`
          - `SQ("sq")`
          - `SV("sv")`
          - `SW("sw")`
          - `TL("tl")`
          - `TR("tr")`
          - `UZ("uz")`
          - `VI("vi")`
          - `AR("ar")`
          - `FA("fa")`
          - `UG("ug")`
          - `UR("ur")`
          - `BN("bn")`
          - `AS("as")`
          - `MNI("mni")`
          - `RU("ru")`
          - `RS_CYRILLIC("rs_cyrillic")`
          - `BE("be")`
          - `BG("bg")`
          - `UK("uk")`
          - `MN("mn")`
          - `ABQ("abq")`
          - `ADY("ady")`
          - `KBD("kbd")`
          - `AVA("ava")`
          - `DAR("dar")`
          - `INH("inh")`
          - `CHE("che")`
          - `LBE("lbe")`
          - `LEZ("lez")`
          - `TAB("tab")`
          - `TJK("tjk")`
          - `HI("hi")`
          - `MR("mr")`
          - `NE("ne")`
          - `BH("bh")`
          - `MAI("mai")`
          - `ANG("ang")`
          - `BHO("bho")`
          - `MAH("mah")`
          - `SCK("sck")`
          - `NEW("new")`
          - `GOM("gom")`
          - `SA("sa")`
          - `BGC("bgc")`
          - `TH("th")`
          - `CH_SIM("ch_sim")`
          - `CH_TRA("ch_tra")`
          - `JA("ja")`
          - `KO("ko")`
          - `TA("ta")`
          - `TE("te")`
          - `KN("kn")`
        - `Optional maxPages`
          The maximum number of pages to parse.
        - `Optional> targetPages`
          The pages to target for parsing (0-indexed, so the first page is 0).
      - `Optional updatedAt`
        Update datetime.
  - `Optional continueAsNewThreshold`
    Maximum number of files to process per execution cycle in directory mode. Defaults to `page_size`.
  - `Optional directoryId`
    ID of the directory containing the files to process.
  - `Optional> itemIds`
    List of specific item IDs to process. Either this or `directory_id` must be provided.
  - `Optional pageSize`
    Number of files to process per batch in directory mode.

### Returns

- `class BatchCreateResponse:`
  Response schema for a batch processing job.
  - `String id`
    Unique identifier for the batch job.
  - `JobType jobType`
    Type of processing operation (parse or classify).
    - `PARSE("parse")`
    - `EXTRACT("extract")`
    - `CLASSIFY("classify")`
  - `String projectId`
    Project this job belongs to.
  - `Status status`
    Current job status.
    - `PENDING("pending")`
    - `RUNNING("running")`
    - `DISPATCHED("dispatched")`
    - `COMPLETED("completed")`
    - `FAILED("failed")`
    - `CANCELLED("cancelled")`
  - `long totalItems`
    Total number of items in the job.
  - `Optional completedAt`
    Timestamp when the job completed.
  - `Optional createdAt`
    Creation datetime.
  - `Optional directoryId`
    Directory being processed.
  - `Optional effectiveAt`
  - `Optional errorMessage`
    Error message for the latest job attempt, if any.
  - `Optional failedItems`
    Number of items that failed processing.
  - `Optional jobRecordId`
    The job record ID associated with this status, if any.
  - `Optional processedItems`
    Number of items processed so far.
  - `Optional skippedItems`
    Number of items skipped (for example, because they were already processed or exceeded the size limit).
  - `Optional startedAt`
    Timestamp when job processing started.
  - `Optional updatedAt`
    Update datetime.
  - `Optional workflowId`
    Async job tracking ID.

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.beta.batch.BatchCreateParams;
import com.llamacloud_prod.api.models.beta.batch.BatchCreateResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        // Builds a client configured from environment variables.
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // A parse job config with default settings. Per the endpoint description,
        // the request must also identify the files to process, using either a
        // directory ID or a list of item IDs.
        BatchCreateParams params = BatchCreateParams.builder()
            .jobConfig(BatchCreateParams.JobConfig.BatchParseJobRecordCreate.builder().build())
            .build();

        // Returns once the job is created; processing continues asynchronously.
        BatchCreateResponse batch = client.beta().batch().create(params);
    }
}
```

#### Response

```json
{
  "id": "bjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "job_type": "parse",
  "project_id": "proj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "pending",
  "total_items": 0,
  "completed_at": "2019-12-27T18:11:19.117Z",
  "created_at": "2019-12-27T18:11:19.117Z",
  "directory_id": "dir-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "effective_at": "2019-12-27T18:11:19.117Z",
  "error_message": "error_message",
  "failed_items": 0,
  "job_record_id": "job_record_id",
  "processed_items": 0,
  "skipped_items": 0,
  "started_at": "2019-12-27T18:11:19.117Z",
  "updated_at": "2019-12-27T18:11:19.117Z",
  "workflow_id": "workflow_id"
}
```
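The endpoint description leaves two client-side duties to the caller: choose exactly one file source, and poll the asynchronous job until it finishes. The following stdlib-only Java sketch illustrates both under stated assumptions. `BatchJobSketch`, `Snapshot`, `requireExactlyOneSource`, `isTerminal`, `progress`, and `pollUntilDone` are hypothetical names, not part of the LlamaCloud SDK. The sketch assumes that sending both `directory_id` and `item_ids` is rejected, and that `completed`, `failed`, and `cancelled` are the only terminal values of `status`. Because this page does not document the SDK call that retrieves a job, the fetch step is passed in as a `Supplier`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helpers: none of these names exist in the LlamaCloud SDK.
final class BatchJobSketch {
    private BatchJobSketch() {}

    // Assumption: these are the terminal values of the response's `status` field
    // (pending, running, and dispatched are treated as still in progress).
    private static final Set<String> TERMINAL_STATUSES = Set.of("completed", "failed", "cancelled");

    // The subset of BatchCreateResponse fields this sketch reads, as a plain record.
    record Snapshot(String status, long totalItems, long processedItems) {}

    // Assumption: the service accepts exactly one of directoryId or itemIds.
    static void requireExactlyOneSource(String directoryId, List<String> itemIds) {
        boolean hasDirectory = directoryId != null && !directoryId.isBlank();
        boolean hasItems = itemIds != null && !itemIds.isEmpty();
        if (hasDirectory == hasItems) {
            throw new IllegalArgumentException("Provide either directoryId or itemIds, but not both");
        }
    }

    // Case-insensitive check against the assumed terminal statuses; null is never terminal.
    static boolean isTerminal(String status) {
        return status != null && TERMINAL_STATUSES.contains(status.toLowerCase(Locale.ROOT));
    }

    // Fraction of items processed so far, capped at 1.0; returns 0 when totalItems is zero.
    static double progress(Snapshot s) {
        if (s.totalItems() <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) s.processedItems() / s.totalItems());
    }

    // Calls fetchStatus until the job reaches a terminal status, doubling the wait
    // between calls up to maxDelayMs. fetchStatus stands in for whatever call
    // retrieves the job (for example, a wrapper around GET /batch/{job_id}).
    static Snapshot pollUntilDone(Supplier<Snapshot> fetchStatus, long initialDelayMs,
                                  long maxDelayMs, int maxAttempts) {
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Snapshot snapshot = fetchStatus.get();
            if (isTerminal(snapshot.status())) {
                return snapshot;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted while polling", e);
                }
                delayMs = Math.min(maxDelayMs, delayMs * 2);
            }
        }
        throw new IllegalStateException("Job not finished after " + maxAttempts + " polls");
    }
}
```

Doubling the delay up to a cap keeps the request rate low for long-running directory jobs; the starting delay, cap, and attempt limit are arbitrary illustration values, not service recommendations. In real code, the supplier would map the retrieved job's `status`, `total_items`, and `processed_items` fields into a `Snapshot`.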