Pipelines

Search Pipelines

Deprecated

List<Pipeline> pipelines().list(, )

GET /api/v1/pipelines

Create Pipeline

Deprecated

Pipeline pipelines().create(, )

POST /api/v1/pipelines

Get Pipeline

Deprecated

Pipeline pipelines().get(, )

GET /api/v1/pipelines/{pipeline_id}

Update Existing Pipeline

Deprecated

Pipeline pipelines().update(, )

PUT /api/v1/pipelines/{pipeline_id}

Delete Pipeline

Deprecated

pipelines().delete(, )

DELETE /api/v1/pipelines/{pipeline_id}

Get Pipeline Status

Deprecated

ManagedIngestionStatusResponse pipelines().getStatus(, )

GET /api/v1/pipelines/{pipeline_id}/status

Upsert Pipeline

Deprecated

Pipeline pipelines().upsert(, )

PUT /api/v1/pipelines

Run Search

Deprecated

PipelineRetrieveResponse pipelines().retrieve(, )

POST /api/v1/pipelines/{pipeline_id}/retrieve
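All of the endpoints above are marked Deprecated. For readers calling them over raw HTTP instead of through the SDK methods (whose parameters are not shown here), the following is a minimal sketch using the JDK's java.net.http client (Java 11+). Only the paths come from this reference; the base URL `https://api.cloud.llamaindex.ai`, the bearer-token `Authorization` header, and the `LLAMA_CLOUD_API_KEY` environment variable are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListPipelinesSketch {
    // Assumed LlamaCloud base URL; replace it if your deployment differs.
    static final String DEFAULT_BASE_URL = "https://api.cloud.llamaindex.ai";

    // GET /api/v1/pipelines, authenticated with a bearer API key (assumed auth scheme).
    static HttpRequest buildListRequest(String baseUrl, String apiKey) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/pipelines"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // GET /api/v1/pipelines/{pipeline_id}/status for a single pipeline.
    static HttpRequest buildStatusRequest(String baseUrl, String apiKey, String pipelineId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/pipelines/" + pipelineId + "/status"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical environment variable holding the API key.
        String apiKey = System.getenv("LLAMA_CLOUD_API_KEY");
        if (apiKey == null) {
            System.err.println("Set LLAMA_CLOUD_API_KEY to run this sketch.");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildListRequest(DEFAULT_BASE_URL, apiKey), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // raw JSON; parsing is left to the caller
    }
}
```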

Models

class AdvancedModeTransformConfig:

Optional<ChunkingConfig> chunkingConfig

Configuration for the chunking.

One of the following:

class NoneChunkingConfig:

Optional<Mode> mode

class CharacterChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

class TokenChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> separator

class SentenceChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> paragraphSeparator

Optional<String> separator

class SemanticChunkingConfig:

Optional<Long> breakpointPercentileThreshold

Optional<Long> bufferSize

Optional<Mode> mode

Optional<SegmentationConfig> segmentationConfig

Configuration for the segmentation.

One of the following:

class NoneSegmentationConfig:

Optional<Mode> mode

class PageSegmentationConfig:

Optional<Mode> mode

Optional<String> pageSeparator

class ElementSegmentationConfig:

Optional<Mode> mode
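CharacterChunkingConfig, TokenChunkingConfig, and SentenceChunkingConfig all expose chunkSize and chunkOverlap. The sketch below shows the usual sliding-window reading of those two numbers on plain characters: each chunk holds at most chunkSize characters and consecutive chunks share chunkOverlap characters. It illustrates that assumption and is not LlamaCloud's splitter; the class and method names are hypothetical, and requiring chunkOverlap to be smaller than chunkSize is an assumption made so the window always advances.

```java
import java.util.ArrayList;
import java.util.List;

public class CharacterChunkingSketch {
    // Splits text into windows of at most chunkSize characters, where each
    // window starts (chunkSize - chunkOverlap) characters after the previous one.
    static List<String> chunk(String text, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("chunkOverlap must be in [0, chunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - chunkOverlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reaches the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4, 1)); // prints [abcd, defg, ghij]
    }
}
```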

class AutoTransformConfig:

Optional<Long> chunkOverlap

Chunk overlap for the transformation.

Optional<Long> chunkSize

Chunk size for the transformation.

exclusiveMinimum: 0

Optional<Mode> mode

class AzureOpenAIEmbedding:

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When making large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0
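Several embedding fields carry JSON Schema-style bounds: embedBatchSize has exclusiveMinimum 0 and maximum 2048 (so 0 < value <= 2048), while maxRetries and timeout have minimum 0. A minimal client-side check of those bounds might look like the following; the class and method are hypothetical, and absent Optional values are simply not checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class EmbeddingConfigValidation {
    // Returns one message per violated bound; an empty list means every present value is in range.
    static List<String> validate(Optional<Long> embedBatchSize, Optional<Long> maxRetries, Optional<Double> timeout) {
        List<String> errors = new ArrayList<>();
        // exclusiveMinimum 0 and maximum 2048 mean 0 < value <= 2048.
        embedBatchSize.ifPresent(v -> {
            if (v <= 0 || v > 2048) {
                errors.add("embedBatchSize must satisfy 0 < value <= 2048");
            }
        });
        // minimum 0 means value >= 0.
        maxRetries.ifPresent(v -> {
            if (v < 0) {
                errors.add("maxRetries must be >= 0");
            }
        });
        timeout.ifPresent(v -> {
            if (v < 0) {
                errors.add("timeout must be >= 0");
            }
        });
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(Optional.of(4096L), Optional.of(3L), Optional.empty()));
    }
}
```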

class AzureOpenAIEmbeddingConfig:

Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When making large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class BedrockEmbedding:

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use.

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use.

Optional<String> awsSessionToken

AWS Session Token to use.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0

Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

The AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

class BedrockEmbeddingConfig:

Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use.

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use.

Optional<String> awsSessionToken

AWS Session Token to use.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0

Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

The AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

class CohereEmbedding:

Optional<String> apiKey

The Cohere API key.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.
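The truncate field accepts START, END, or NONE. As Cohere describes these modes, START discards tokens from the beginning of an over-long input, END discards them from the end, and NONE returns an error instead. The sketch below applies that reading to an already-tokenized list; it is a hypothetical illustration, not Cohere's tokenizer or API.

```java
import java.util.List;

public class TruncationSketch {
    enum Truncate { START, END, NONE }

    // Trims a token sequence to at most maxTokens according to the truncation mode.
    static <T> List<T> truncate(List<T> tokens, int maxTokens, Truncate mode) {
        if (tokens.size() <= maxTokens) {
            return tokens; // short enough, nothing to drop
        }
        switch (mode) {
            case START:
                // Drop tokens from the beginning, keeping the last maxTokens.
                return tokens.subList(tokens.size() - maxTokens, tokens.size());
            case END:
                // Drop tokens from the end, keeping the first maxTokens.
                return tokens.subList(0, maxTokens);
            default:
                // NONE: refuse over-long input instead of silently trimming it.
                throw new IllegalArgumentException(
                        "input has " + tokens.size() + " tokens, limit is " + maxTokens);
        }
    }

    public static void main(String[] args) {
        System.out.println(truncate(List.of("a", "b", "c", "d"), 2, Truncate.START)); // prints [c, d]
    }
}
```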

class CohereEmbeddingConfig:

Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class DataSinkCreate:

Schema for creating a data sink.

Component component

Component that implements the data sink.

One of the following:

class UnionMember0:

class CloudPineconeVectorStore:

Cloud Pinecone Vector Store.

This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.

Args:

api_key (str): API key for authenticating with Pinecone.

index_name (str): Name of the Pinecone index.

namespace (optional[str]): Namespace to use in the Pinecone index.

insert_kwargs (optional[dict]): Additional kwargs to pass during insertion.

String apiKey

The API key for authenticating with Pinecone

format: password

String indexName

Optional<String> className

Optional<InsertKwargs> insertKwargs

Optional<String> namespace

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudPostgresVectorStore:

String database

long embedDim

String host

String password

long port

String schemaName

String tableName

String user

Optional<String> className

Optional<PgVectorHnswSettings> hnswSettings

HNSW settings for PGVector.

Optional<DistanceMethod> distanceMethod

The distance method to use.

One of the following:

COSINE("cosine")

HAMMING("hamming")

IP("ip")

JACCARD("jaccard")

L1("l1")

L2("l2")

Optional<Long> efConstruction

The size of the dynamic candidate list used while building the HNSW graph (pgvector's ef_construction).

minimum: 1

Optional<Long> efSearch

The size of the dynamic candidate list used during search (pgvector's hnsw.ef_search). Larger values improve recall at the cost of speed.

minimum: 1

Optional<Long> m

The number of bi-directional links created for each new element.

minimum: 1

Optional<VectorType> vectorType

The type of vector to use.

One of the following:

BIT("bit")

HALF_VEC("half_vec")

SPARSE_VEC("sparse_vec")

VECTOR("vector")

Optional<Boolean> hybridSearch

Optional<Boolean> performSetup

Optional<Boolean> supportsNestedMetadataFilters
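PgVectorHnswSettings appears to correspond to pgvector's HNSW index options. Assuming that correspondence, the sketch below renders the matching SQL: m and efConstruction become index storage parameters, and efSearch becomes the hnsw.ef_search session setting. The operator-class table covers only the `vector` type (VECTOR); hamming and jaccard apply to the `bit` type and are omitted. The schema, table, and column names are hypothetical and identifiers are not quoted, so treat this as illustration rather than production SQL or LlamaCloud's own setup code.

```java
import java.util.Map;

public class PgVectorHnswSketch {
    // pgvector operator classes for the plain `vector` column type, keyed by distanceMethod value.
    private static final Map<String, String> VECTOR_OPS = Map.of(
            "cosine", "vector_cosine_ops",
            "ip", "vector_ip_ops",
            "l1", "vector_l1_ops",
            "l2", "vector_l2_ops");

    // CREATE INDEX statement carrying m and ef_construction as index parameters.
    static String createIndexSql(String schema, String table, String column,
                                 String distanceMethod, int m, int efConstruction) {
        String ops = VECTOR_OPS.get(distanceMethod);
        if (ops == null) {
            throw new IllegalArgumentException("no vector operator class for " + distanceMethod);
        }
        return String.format(
                "CREATE INDEX ON %s.%s USING hnsw (%s %s) WITH (m = %d, ef_construction = %d);",
                schema, table, column, ops, m, efConstruction);
    }

    // ef_search is a query-time session setting rather than an index parameter.
    static String efSearchSql(int efSearch) {
        return String.format("SET hnsw.ef_search = %d;", efSearch);
    }

    public static void main(String[] args) {
        System.out.println(createIndexSql("public", "docs", "embedding", "cosine", 16, 64));
        System.out.println(efSearchSql(40));
    }
}
```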

class CloudQdrantVectorStore:

Cloud Qdrant Vector Store.

This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.

Args:

collection_name (str): Name of the Qdrant collection.

url (str): URL of the Qdrant instance.

api_key (str): API key for authenticating with Qdrant.

max_retries (int): Maximum number of retries in case of a failure. Defaults to 3.

client_kwargs (dict): Additional kwargs to pass to the Qdrant client.

String apiKey

String collectionName

String url

Optional<String> className

Optional<ClientKwargs> clientKwargs

Optional<Long> maxRetries

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudAzureAiSearchVectorStore:

Cloud Azure AI Search Vector Store.

String searchServiceApiKey

String searchServiceEndpoint

Optional<String> className

Optional<String> clientId

Optional<String> clientSecret

Optional<Long> embeddingDimension

Optional<FilterableMetadataFieldKeys> filterableMetadataFieldKeys

Optional<String> indexName

Optional<String> searchServiceApiVersion

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

Optional<String> tenantId

class CloudMongoDBAtlasVectorSearch:

Cloud MongoDB Atlas Vector Store.

This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.

Args:

mongodb_uri (str): URI for connecting to MongoDB Atlas.

db_name (str): Name of the MongoDB database.

collection_name (str): Name of the MongoDB collection.

vector_index_name (str): Name of the MongoDB Atlas vector index.

fulltext_index_name (str): Name of the MongoDB Atlas full-text index.

String collectionName

String dbName

String mongoDBUri

Optional<String> className

Optional<Long> embeddingDimension

Optional<String> fulltextIndexName

Optional<Boolean> supportsNestedMetadataFilters

Optional<String> vectorIndexName

class CloudMilvusVectorStore:

Cloud Milvus Vector Store.

String uri

Optional<String> token

Optional<String> className

Optional<String> collectionName

Optional<Long> embeddingDimension

Optional<Boolean> supportsNestedMetadataFilters

class CloudAstraDbVectorStore:

Cloud AstraDB Vector Store.

This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.

Args:

token (str): The Astra DB Application Token to use.

api_endpoint (str): The Astra DB JSON API endpoint for your database.

collection_name (str): Collection name to use. If it does not exist, it will be created.

embedding_dimension (int): Length of the embedding vectors in use.

keyspace (optional[str]): The keyspace to use. If not provided, ‘default_keyspace’ is used.

String token

The Astra DB Application Token to use

format: password

String apiEndpoint

The Astra DB JSON API endpoint for your database

String collectionName

Collection name to use. If it does not exist, it will be created.

long embeddingDimension

Length of the embedding vectors in use

Optional<String> className

Optional<String> keyspace

The keyspace to use. If not provided, ‘default_keyspace’ is used.

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

String name

The name of the data sink.

SinkType sinkType

One of the following:

ASTRA_DB("ASTRA_DB")

AZUREAI_SEARCH("AZUREAI_SEARCH")

MILVUS("MILVUS")

MONGODB_ATLAS("MONGODB_ATLAS")

PINECONE("PINECONE")

POSTGRES("POSTGRES")

QDRANT("QDRANT")

class GeminiEmbedding:

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

class GeminiEmbeddingConfig:

Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbedding:

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:

String

boolean

Optional<String> className

Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:

CLS("cls")

LAST("last")

MEAN("mean")

Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task used to pick Hugging Face’s recommended model when model_name is left at its default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
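The token field is a union of String and boolean. One way to model such a union in Java 17 is a sealed interface with one record per alternative; the names below are hypothetical and the SDK's own Token type may be shaped differently. Reading `true` as "use the locally saved token" is an assumption based on the field description.

```java
public class TokenUnionSketch {
    // Hypothetical model of the String | boolean union described for `token`.
    sealed interface Token permits StringToken, BoolToken {}

    record StringToken(String value) implements Token {}

    record BoolToken(boolean value) implements Token {}

    // Explains what each alternative would mean to the client.
    static String describe(Token token) {
        if (token instanceof StringToken s) {
            return "explicit token (" + s.value().length() + " chars)";
        }
        if (token instanceof BoolToken b) {
            // false: do not send a token; true: fall back to the locally saved token (assumed).
            return b.value() ? "use locally saved token" : "send no token";
        }
        throw new IllegalStateException("unreachable: Token is sealed");
    }

    public static void main(String[] args) {
        System.out.println(describe(new BoolToken(false))); // prints send no token
    }
}
```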

class HuggingFaceInferenceApiEmbeddingConfig:

Optional<HuggingFaceInferenceApiEmbedding> component

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:

String

boolean

Optional<String> className

Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:

CLS("cls")

LAST("last")

MEAN("mean")

Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task used to pick Hugging Face’s recommended model when model_name is left at its default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

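Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.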

Optional<Type> type

Type of the embedding model.

class LlamaParseParameters:

Optional<Boolean> adaptiveLongTable

Optional<Boolean> aggressiveTableExtraction

Optional<Boolean> annotateLinks

Optional<Boolean> autoMode

Optional<String> autoModeConfigurationJson

Optional<Boolean> autoModeTriggerOnImageInPage

Optional<String> autoModeTriggerOnRegexpInPage

Optional<Boolean> autoModeTriggerOnTableInPage

Optional<String> autoModeTriggerOnTextInPage

Optional<String> azureOpenAIApiVersion

Optional<String> azureOpenAIDeploymentName

Optional<String> azureOpenAIEndpoint

Optional<String> azureOpenAIKey

Optional<Double> bboxBottom

Optional<Double> bboxLeft

Optional<Double> bboxRight

Optional<Double> bboxTop

Optional<String> boundingBox

Optional<Boolean> compactMarkdownTable

Optional<String> complementalFormattingInstruction

Optional<String> confidenceScoreEffort

Optional<String> contentGuidelineInstruction

Optional<Boolean> continuousMode

Optional<Boolean> disableImageExtraction

Optional<Boolean> disableOcr

Optional<Boolean> disableReconstruction

Optional<Boolean> doNotCache

Optional<Boolean> doNotUnrollColumns

Optional<Boolean> enableCostOptimizer

Optional<Boolean> extractCharts

Optional<Boolean> extractLayout

Optional<Boolean> extractPrintedPageNumber

Optional<Boolean> fastMode

Optional<String> formattingInstruction

Optional<String> gpt4oApiKey

Optional<Boolean> gpt4oMode

Optional<Boolean> guessXlsxSheetName

Optional<Boolean> hideFooters

Optional<Boolean> hideHeaders

Optional<Boolean> highResOcr

Optional<Boolean> htmlMakeAllElementsVisible

Optional<Boolean> htmlRemoveFixedElements

Optional<Boolean> htmlRemoveNavigationElements

Optional<String> httpProxy

Optional<Boolean> ignoreDocumentElementsForLayoutDetection

Optional<List<ImagesToSave>> imagesToSave

One of the following:

EMBEDDED("embedded")

LAYOUT("layout")

SCREENSHOT("screenshot")

Optional<Boolean> inlineImagesInMarkdown

Optional<String> inputS3Path

Optional<String> inputS3Region

Optional<String> inputUrl

Optional<Boolean> internalIsScreenshotJob

Optional<Boolean> invalidateCache

Optional<Boolean> isFormattingInstruction

Optional<Double> jobTimeoutExtraTimePerPageInSeconds

Optional<Double> jobTimeoutInSeconds

Optional<Boolean> keepPageSeparatorWhenMergingTables

Optional<List<ParsingLanguages>> languages

One of the following:

ABQ("abq")

ADY("ady")

AF("af")

ANG("ang")

AR("ar")

AS("as")

AVA("ava")

AZ("az")

BE("be")

BG("bg")

BGC("bgc")

BH("bh")

BHO("bho")

BN("bn")

BS("bs")

CH_SIM("ch_sim")

CH_TRA("ch_tra")

CHE("che")

CS("cs")

CY("cy")

DA("da")

DAR("dar")

DE("de")

EN("en")

ES("es")

ET("et")

FA("fa")

FR("fr")

GA("ga")

GOM("gom")

HI("hi")

HR("hr")

HU("hu")

ID("id")

INH("inh")

IS("is")

IT("it")

JA("ja")

KBD("kbd")

KN("kn")

KO("ko")

KU("ku")

LA("la")

LBE("lbe")

LEZ("lez")

LT("lt")

LV("lv")

MAH("mah")

MAI("mai")

MI("mi")

MN("mn")

MNI("mni")

MR("mr")

MS("ms")

MT("mt")

NE("ne")

NEW("new")

NL("nl")

NO("no")

OC("oc")

PI("pi")

PL("pl")

PT("pt")

RO("ro")

RS_CYRILLIC("rs_cyrillic")

RS_LATIN("rs_latin")

RU("ru")

SA("sa")

SCK("sck")

SK("sk")

SL("sl")

SQ("sq")

SV("sv")

SW("sw")

TA("ta")

TAB("tab")

TE("te")

TH("th")

TJK("tjk")

TL("tl")

TR("tr")

UG("ug")

UK("uk")

UR("ur")

UZ("uz")

VI("vi")

Optional<Boolean> layoutAware

Optional<Boolean> lineLevelBoundingBox

Optional<String> markdownTableMultilineHeaderSeparator

Optional<Long> maxPages

Optional<Long> maxPagesEnforced

Optional<Boolean> mergeTablesAcrossPagesInMarkdown

Optional<String> model

Optional<Boolean> outlinedTableExtraction

Optional<Boolean> outputPdfOfDocument

Optional<String> outputS3PathPrefix

Optional<String> outputS3Region

Optional<Boolean> outputTablesAsHtml

Optional<Double> pageErrorTolerance

Optional<String> pageFooterPrefix

Optional<String> pageFooterSuffix

Optional<String> pageHeaderPrefix

Optional<String> pageHeaderSuffix

Optional<String> pagePrefix

Optional<String> pageSeparator

Optional<String> pageSuffix

Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:

PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")

PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")

PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")

PARSE_PAGE_WITH_AGENT("parse_page_with_agent")

PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")

PARSE_PAGE_WITH_LLM("parse_page_with_llm")

PARSE_PAGE_WITH_LVM("parse_page_with_lvm")

PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")

Optional<String> parsingInstruction

Optional<Boolean> preciseBoundingBox

Optional<Boolean> premiumMode

Optional<Boolean> presentationOutOfBoundsContent

Optional<Boolean> presentationSkipEmbeddedData

Optional<Boolean> preserveLayoutAlignmentAcrossPages

Optional<Boolean> preserveVerySmallText

Optional<String> preset

Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:

CRITICAL("critical")

HIGH("high")

LOW("low")

MEDIUM("medium")

Optional<String> projectId

Optional<Boolean> removeHiddenText

Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:

BLANK_PAGE("blank_page")

ERROR_MESSAGE("error_message")

RAW_TEXT("raw_text")

Optional<String> replaceFailedPageWithErrorMessagePrefix

Optional<String> replaceFailedPageWithErrorMessageSuffix

Optional<Boolean> saveImages

Optional<Boolean> skipDiagonalText

Optional<Boolean> specializedChartParsingAgentic

Optional<Boolean> specializedChartParsingEfficient

Optional<Boolean> specializedChartParsingPlus

Optional<Boolean> specializedImageParsing

Optional<Boolean> spreadsheetExtractSubTables

Optional<Boolean> spreadsheetForceFormulaComputation

Optional<Boolean> spreadsheetIncludeHiddenSheets

Optional<Boolean> strictModeBuggyFont

Optional<Boolean> strictModeImageExtraction

Optional<Boolean> strictModeImageOcr

Optional<Boolean> strictModeReconstruction

Optional<Boolean> structuredOutput

Optional<String> structuredOutputJsonSchema

Optional<String> structuredOutputJsonSchemaName

Optional<String> systemPrompt

Optional<String> systemPromptAppend

Optional<Boolean> takeScreenshot

Optional<String> targetPages

Optional<String> tier

Optional<Boolean> useVendorMultimodalModel

Optional<String> userPrompt

Optional<String> vendorMultimodalApiKey

Optional<String> vendorMultimodalModelName

Optional<String> version

Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:

CLASSIFY_CANCELLED("classify.cancelled")

CLASSIFY_ERROR("classify.error")

CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")

CLASSIFY_PENDING("classify.pending")

CLASSIFY_RUNNING("classify.running")

CLASSIFY_SUCCESS("classify.success")

EXTRACT_CANCELLED("extract.cancelled")

EXTRACT_ERROR("extract.error")

EXTRACT_PARTIAL_SUCCESS("extract.partial_success")

EXTRACT_PENDING("extract.pending")

EXTRACT_SUCCESS("extract.success")

PARSE_CANCELLED("parse.cancelled")

PARSE_ERROR("parse.error")

PARSE_PARTIAL_SUCCESS("parse.partial_success")

PARSE_PENDING("parse.pending")

PARSE_RUNNING("parse.running")

PARSE_SUCCESS("parse.success")

SHEETS_CANCELLED("sheets.cancelled")

SHEETS_ERROR("sheets.error")

SHEETS_PARTIAL_SUCCESS("sheets.partial_success")

SHEETS_PENDING("sheets.pending")

SHEETS_SUCCESS("sheets.success")

SPLIT_CANCELLED("split.cancelled")

SPLIT_ERROR("split.error")

SPLIT_PENDING("split.pending")

SPLIT_PROCESSING("split.processing")

SPLIT_SUCCESS("split.success")

UNMAPPED_EVENT("unmapped_event")

Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookSigningSecret

Shared signing secret used to sign webhook deliveries. When set, each request includes an HMAC-SHA256 signature of the request body in the ‘LC-Signature’ header (value ‘sha256=’). Recompute the HMAC over the raw request body with this secret to verify the delivery is authentic.

Optional<String> webhookUrl

URL to receive webhook POST notifications

Optional<String> webhookUrl
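The webhookSigningSecret description says each delivery carries an HMAC-SHA256 of the raw request body in the LC-Signature header. The sketch below verifies such a delivery with javax.crypto.Mac and a constant-time comparison. It assumes the header value is the literal prefix `sha256=` followed by the lowercase hex digest; that encoding is not spelled out above, so check it against a real delivery before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class WebhookSignatureSketch {
    // Lowercase hex HMAC-SHA256 of the raw body, keyed with the shared signing secret.
    static String hmacSha256Hex(String secret, byte[] body) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(body);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // %x treats the byte as unsigned
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    // Assumes LC-Signature looks like "sha256=<hex digest>".
    static boolean verify(String secret, byte[] rawBody, String headerValue) {
        if (headerValue == null || !headerValue.startsWith("sha256=")) {
            return false;
        }
        byte[] expected = ("sha256=" + hmacSha256Hex(secret, rawBody)).getBytes(StandardCharsets.UTF_8);
        byte[] actual = headerValue.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual); // constant-time comparison
    }

    public static void main(String[] args) {
        byte[] body = "{\"event\":\"parse.success\"}".getBytes(StandardCharsets.UTF_8);
        String header = "sha256=" + hmacSha256Hex("my-secret", body);
        System.out.println(verify("my-secret", body, header)); // prints true
    }
}
```

Always compute the HMAC over the exact bytes received, before any JSON parsing or re-serialization, since reformatting the body changes the digest.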

class LlmParameters:

Optional<String> className

Optional<ModelName> modelName

The name of the model to use for LLM completions.

One of the following:

AZURE_OPENAI_GPT_4_O("AZURE_OPENAI_GPT_4O")

AZURE_OPENAI_GPT_4_O_MINI("AZURE_OPENAI_GPT_4O_MINI")

AZURE_OPENAI_GPT_4_1("AZURE_OPENAI_GPT_4_1")

AZURE_OPENAI_GPT_4_1_MINI("AZURE_OPENAI_GPT_4_1_MINI")

AZURE_OPENAI_GPT_4_1_NANO("AZURE_OPENAI_GPT_4_1_NANO")

BEDROCK_CLAUDE_3_5_SONNET_V1("BEDROCK_CLAUDE_3_5_SONNET_V1")

BEDROCK_CLAUDE_3_5_SONNET_V2("BEDROCK_CLAUDE_3_5_SONNET_V2")

CLAUDE_4_5_SONNET("CLAUDE_4_5_SONNET")

GPT_4_O("GPT_4O")

GPT_4_O_MINI("GPT_4O_MINI")

GPT_4_1("GPT_4_1")

GPT_4_1_MINI("GPT_4_1_MINI")

GPT_4_1_NANO("GPT_4_1_NANO")

Optional<String> systemPrompt

The system prompt to use for the completion.

maxLength: 3000

Optional<Double> temperature

The temperature value for the model.

Optional<Boolean> useChainOfThoughtReasoning

Whether to use chain of thought reasoning.

Optional<Boolean> useCitation

Whether to show citations in the response.

class ManagedIngestionStatusResponse:

Status status

Status of the ingestion.

One of the following:

CANCELLED("CANCELLED")

ERROR("ERROR")

IN_PROGRESS("IN_PROGRESS")

NOT_STARTED("NOT_STARTED")

PARTIAL_SUCCESS("PARTIAL_SUCCESS")

SUCCESS("SUCCESS")

Optional<LocalDateTime> deploymentDate

Date of the deployment.

format: date-time

Optional<LocalDateTime> effectiveAt

When the status took effect.

format: date-time

Optional<List<Error>> error

List of errors that occurred during ingestion.

String jobId

ID of the job that failed.

format: uuid

String message

Error message describing the failure.

Step step

Name of the job that failed.

One of the following:

DATA_SOURCE("DATA_SOURCE")

FILE_UPDATER("FILE_UPDATER")

INGESTION("INGESTION")

MANAGED_INGESTION("MANAGED_INGESTION")

METADATA_UPDATE("METADATA_UPDATE")

PARSE("PARSE")

TRANSFORM("TRANSFORM")

Optional<String> jobId

ID of the latest job.

format: uuid
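ManagedIngestionStatusResponse is what getStatus returns, so a client can poll that endpoint until ingestion finishes. In the sketch below the HTTP call is abstracted as a Supplier, and treating every status except NOT_STARTED and IN_PROGRESS as terminal is an assumption rather than documented behavior. Class and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class IngestionPollingSketch {
    // Mirrors the Status values listed for ManagedIngestionStatusResponse.
    enum Status { CANCELLED, ERROR, IN_PROGRESS, NOT_STARTED, PARTIAL_SUCCESS, SUCCESS }

    // Assumption: only NOT_STARTED and IN_PROGRESS can still change.
    static final Set<Status> TERMINAL =
            EnumSet.of(Status.CANCELLED, Status.ERROR, Status.PARTIAL_SUCCESS, Status.SUCCESS);

    // Calls fetchStatus until it reports a terminal status or maxAttempts is exhausted.
    static Status waitForTerminal(Supplier<Status> fetchStatus, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Status status = fetchStatus.get();
            if (TERMINAL.contains(status)) {
                return status;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("interrupted while polling", e);
                }
            }
        }
        throw new IllegalStateException("no terminal status after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Stand-in for GET /api/v1/pipelines/{pipeline_id}/status.
        Iterator<Status> fake = List.of(Status.NOT_STARTED, Status.IN_PROGRESS, Status.SUCCESS).iterator();
        System.out.println(waitForTerminal(fake::next, 10, 100)); // prints SUCCESS
    }
}
```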

enum MessageRole:

Message role.

ASSISTANT("assistant")

CHATBOT("chatbot")

DEVELOPER("developer")

FUNCTION("function")

MODEL("model")

SYSTEM("system")

TOOL("tool")

USER("user")
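Every enum in this reference pairs a Java constant with the string sent over the wire, for example DEVELOPER("developer"). A minimal sketch of that pattern for MessageRole, with a reverse lookup from the wire string; the fromWire helper is hypothetical and not part of the documented SDK.

```java
import java.util.Arrays;

public class MessageRoleSketch {
    enum MessageRole {
        ASSISTANT("assistant"), CHATBOT("chatbot"), DEVELOPER("developer"), FUNCTION("function"),
        MODEL("model"), SYSTEM("system"), TOOL("tool"), USER("user");

        private final String wireValue;

        MessageRole(String wireValue) {
            this.wireValue = wireValue;
        }

        // The string used in JSON payloads.
        String wireValue() {
            return wireValue;
        }

        // Reverse lookup from a JSON string to the constant; unknown values are rejected.
        static MessageRole fromWire(String value) {
            return Arrays.stream(values())
                    .filter(role -> role.wireValue.equals(value))
                    .findFirst()
                    .orElseThrow(() -> new IllegalArgumentException("unknown role: " + value));
        }
    }

    public static void main(String[] args) {
        System.out.println(MessageRole.fromWire("tool")); // prints TOOL
    }
}
```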

class MetadataFilters:

Metadata filters for vector stores.

List<Filter> filters

One of the following:

class MetadataFilter:

Comprehensive metadata filter for vector stores to support more operators.

The value uses strict types, because int, float, and str are mutually compatible types and were previously all converted to strings.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

String key

Optional<Value> value

One of the following:

double

String

List<String>

List<Double>

List<Long>

Optional<Operator> operator

Vector store filter operator.

One of the following:

NOT_EQUALS("!=")

LESS("<")

LESS_OR_EQUALS("<=")

EQUALS("==")

GREATER(">")

GREATER_OR_EQUALS(">=")

ALL("all")

ANY("any")

CONTAINS("contains")

IN("in")

IS_EMPTY("is_empty")

NIN("nin")

TEXT_MATCH("text_match")

TEXT_MATCH_INSENSITIVE("text_match_insensitive")

MetadataFilters (a nested filter group, so filters can be combined recursively)

Optional<Condition> condition

Vector store filter conditions to combine different filters.

One of the following:

AND("and")

NOT("not")

OR("or")
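MetadataFilter combines a key, a value, and an operator, and MetadataFilters joins several filters with a condition. The sketch below evaluates a subset of the operators against an in-memory metadata map to illustrate that structure. It is not how any vector store executes filters; the NOT condition, nested MetadataFilters groups, and the remaining operators are left out, and treating a missing key as a non-match is an assumption. Requires Java 17.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class MetadataFilterSketch {
    // Subset of the documented operators ("==", "!=", ">", "<", "in", "nin").
    enum Operator { EQUALS, NOT_EQUALS, GREATER, LESS, IN, NIN }

    // NOT is omitted because its exact semantics are not described here.
    enum Condition { AND, OR }

    record Filter(String key, Object value, Operator operator) {}

    static boolean matches(Map<String, Object> metadata, Filter filter) {
        Object actual = metadata.get(filter.key());
        if (actual == null) {
            return false; // missing keys never match in this sketch
        }
        Object expected = filter.value();
        return switch (filter.operator()) {
            case EQUALS -> Objects.equals(actual, expected);
            case NOT_EQUALS -> !Objects.equals(actual, expected);
            case GREATER -> actual instanceof Number a && expected instanceof Number b
                    && a.doubleValue() > b.doubleValue();
            case LESS -> actual instanceof Number a && expected instanceof Number b
                    && a.doubleValue() < b.doubleValue();
            case IN -> expected instanceof List<?> list && list.contains(actual);
            case NIN -> expected instanceof List<?> list && !list.contains(actual);
        };
    }

    // AND requires every filter to match; OR requires at least one.
    static boolean matchesAll(Map<String, Object> metadata, List<Filter> filters, Condition condition) {
        return condition == Condition.AND
                ? filters.stream().allMatch(f -> matches(metadata, f))
                : filters.stream().anyMatch(f -> matches(metadata, f));
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = Map.of("year", 2021.0, "lang", "en");
        List<Filter> filters = List.of(
                new Filter("year", 2020.0, Operator.GREATER),
                new Filter("lang", List.of("en", "fr"), Operator.IN));
        System.out.println(matchesAll(metadata, filters, Condition.AND)); // prints true
    }
}
```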

class OpenAIEmbedding:

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When making large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

class OpenAIEmbeddingConfig:

Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class PageFigureNodeWithScore:

Page figure metadata with score

Node node

double confidence

The confidence of the figure

maximum: 1

minimum: 0

String figureName

The name of the figure

long figureSize

The size of the figure in bytes

minimum: 0

String fileId

The ID of the file that the figure was taken from

format: uuid

long pageIndex

The index of the page for which the figure is taken (0-indexed)

minimum: 0

Optional<Boolean> isLikelyNoise

Whether the figure is likely to be noise

Optional<Metadata> metadata

Metadata for the figure

double score

The score of the figure node

Optional<String> className
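A client consuming `PageFigureNodeWithScore` results might discard figures flagged as likely noise, drop low-confidence detections, and order the rest by retrieval score. The sketch below does that on a trimmed-down, hypothetical `FigureHit` record that mirrors only the fields it uses. It treats a missing `isLikelyNoise` as `false`, which is an assumption. It also shows that `pageIndex` is 0-indexed.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical, trimmed-down mirror of PageFigureNodeWithScore (only the fields used here). */
record FigureHit(String figureName, long pageIndex, double confidence, boolean likelyNoise, double score) {}

final class FigureHits {
    private FigureHits() {}

    /** Drops likely-noise and low-confidence figures, then sorts by score, highest first. */
    static List<FigureHit> clean(List<FigureHit> hits, double minConfidence) {
        return hits.stream()
                .filter(h -> !h.likelyNoise())
                .filter(h -> h.confidence() >= minConfidence)
                .sorted(Comparator.comparingDouble(FigureHit::score).reversed())
                .toList();
    }

    /** pageIndex is 0-indexed, so add 1 for a human-readable page number. */
    static long displayPage(FigureHit hit) {
        return hit.pageIndex() + 1;
    }
}
```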

class PageScreenshotNodeWithScore:

Page screenshot metadata with score

Node node

String fileId

The ID of the file that the page screenshot was taken from

format: uuid

long imageSize

The size of the image in bytes

minimum: 0

long pageIndex

The index of the page for which the screenshot is taken (0-indexed)

minimum: 0

Optional<Metadata> metadata

Metadata for the screenshot

double score

The score of the screenshot node

Optional<String> className

class Pipeline:

Schema for a pipeline.

String id

Unique identifier

format: uuid

EmbeddingConfig embeddingConfig

One of the following:

class AzureOpenAIEmbeddingConfig:

Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class BedrockEmbeddingConfig:

Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0

Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

The AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:

Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> embeddingType

Embedding type. If not provided, the float embedding type is used when needed.

Optional<String> inputType

Model input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.
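Cohere's Embed API defines the three `truncate` values for inputs that exceed the model's maximum input length. START discards the start of the input, END discards the end, and NONE returns an error. The sketch below mimics that behaviour on an already-tokenized list. It is only an illustration, not how Cohere or LlamaCloud implement it, and `Truncate` and `Truncation` are hypothetical names.

```java
import java.util.List;

/** Hypothetical mirror of the START / END / NONE truncation choices. */
enum Truncate { START, END, NONE }

final class Truncation {
    private Truncation() {}

    static <T> List<T> apply(List<T> tokens, int maxTokens, Truncate mode) {
        if (tokens.size() <= maxTokens) {
            return tokens; // Nothing to truncate.
        }
        return switch (mode) {
            // START discards tokens from the beginning, keeping the last maxTokens.
            case START -> List.copyOf(tokens.subList(tokens.size() - maxTokens, tokens.size()));
            // END discards tokens from the end, keeping the first maxTokens.
            case END -> List.copyOf(tokens.subList(0, maxTokens));
            // NONE never truncates; an over-long input is rejected instead.
            case NONE -> throw new IllegalArgumentException(
                    "input has " + tokens.size() + " tokens, limit is " + maxTokens);
        };
    }
}
```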

class GeminiEmbeddingConfig:

Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Optional<HuggingFaceInferenceApiEmbedding> component

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:

String

boolean

Optional<String> className

Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:

CLS("cls")

LAST("last")

MEAN("mean")

Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task used to pick Hugging Face’s recommended model when model_name is left at its default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

Optional<Type> type

Type of the embedding model.
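The `pooling` choice determines how per-token vectors are reduced to a single text vector. Under the common meaning of the three options, CLS takes the first token's vector, LAST takes the final token's vector, and MEAN averages all tokens. The sketch below applies them to a `[tokens][dims]` matrix. Whether the service pools server-side or the client does it is not stated here, and `Pooling` and `Pooler` are hypothetical names.

```java
/** Hypothetical mirror of the CLS / LAST / MEAN pooling values. */
enum Pooling { CLS, LAST, MEAN }

final class Pooler {
    private Pooler() {}

    /** Reduces a [tokens][dims] matrix of token embeddings to a single [dims] vector. */
    static double[] pool(double[][] tokenEmbeddings, Pooling mode) {
        if (tokenEmbeddings.length == 0) {
            throw new IllegalArgumentException("no token embeddings to pool");
        }
        return switch (mode) {
            // CLS: the first token's vector (the [CLS] position in BERT-style models).
            case CLS -> tokenEmbeddings[0].clone();
            // LAST: the final token's vector.
            case LAST -> tokenEmbeddings[tokenEmbeddings.length - 1].clone();
            // MEAN: average each dimension across all tokens.
            case MEAN -> {
                int dims = tokenEmbeddings[0].length;
                double[] mean = new double[dims];
                for (double[] token : tokenEmbeddings) {
                    for (int d = 0; d < dims; d++) {
                        mean[d] += token[d];
                    }
                }
                for (int d = 0; d < dims; d++) {
                    mean[d] /= tokenEmbeddings.length;
                }
                yield mean;
            }
        };
    }
}
```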

class ManagedOpenAIEmbedding:

Optional<Component> component

Configuration for the Managed OpenAI embedding model.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<ModelName> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

class OpenAIEmbeddingConfig:

Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:

Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for Vertex AI.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:

CLASSIFICATION("classification")

CLUSTERING("clustering")

DEFAULT("default")

RETRIEVAL("retrieval")

SIMILARITY("similarity")

Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

String name

String projectId

Optional<ConfigHash> configHash

Hashes for the configuration of a pipeline.

Optional<String> embeddingConfigHash

Hash of the embedding config.

Optional<String> parsingConfigHash

Hash of the LlamaParse parameters.

Optional<String> transformConfigHash

Hash of the transform config.

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<DataSink> dataSink

Schema for a data sink.

String id

Unique identifier

format: uuid

Component component

Component that implements the data sink

One of the following:

class UnionMember0:

class CloudPineconeVectorStore:

Cloud Pinecone Vector Store.

This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.

String apiKey

The API key for authenticating with Pinecone

format: password

String indexName

Optional<String> className

Optional<InsertKwargs> insertKwargs

Optional<String> namespace

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudPostgresVectorStore:

String database

long embedDim

String host

String password

long port

String schemaName

String tableName

String user

Optional<String> className

Optional<PgVectorHnswSettings> hnswSettings

HNSW settings for PGVector.

Optional<DistanceMethod> distanceMethod

The distance method to use.

One of the following:

COSINE("cosine")

HAMMING("hamming")

IP("ip")

JACCARD("jaccard")

L1("l1")

L2("l2")

Optional<Long> efConstruction

The size of the dynamic candidate list used while building the index.

minimum: 1

Optional<Long> efSearch

The size of the dynamic candidate list used during search.

minimum: 1

Optional<Long> m

The number of bi-directional links created for each new element.

minimum: 1

Optional<VectorType> vectorType

The type of vector to use.

One of the following:

BIT("bit")

HALF_VEC("half_vec")

SPARSE_VEC("sparse_vec")

VECTOR("vector")

Optional<Boolean> hybridSearch

Optional<Boolean> performSetup

Optional<Boolean> supportsNestedMetadataFilters
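The HNSW settings above correspond to pgvector's HNSW index parameters. `m` and `ef_construction` are index options, and `ef_search` is the session setting `hnsw.ef_search`. The sketch below shows how such settings translate into pgvector SQL, not necessarily the DDL that LlamaCloud issues itself. It handles only the plain `vector` type and four distance methods. `HnswDdl` is a hypothetical helper, and identifiers are interpolated unquoted, so quote them in real use.

```java
import java.util.Locale;

/** Hypothetical helper: renders pgvector HNSW DDL from settings like PgVectorHnswSettings. */
final class HnswDdl {
    private HnswDdl() {}

    /** Maps a distance method to pgvector's operator class for the plain `vector` type. */
    static String opsClass(String distanceMethod) {
        return switch (distanceMethod.toLowerCase(Locale.ROOT)) {
            case "cosine" -> "vector_cosine_ops";
            case "l2" -> "vector_l2_ops";
            case "ip" -> "vector_ip_ops";
            case "l1" -> "vector_l1_ops";
            // hamming and jaccard apply to `bit` columns (bit_hamming_ops / bit_jaccard_ops).
            default -> throw new IllegalArgumentException("unsupported for vector type: " + distanceMethod);
        };
    }

    static String createIndex(String table, String column, String distanceMethod, int m, int efConstruction) {
        // The documented minimum for both settings is 1.
        if (m < 1 || efConstruction < 1) {
            throw new IllegalArgumentException("m and efConstruction must be >= 1");
        }
        return String.format(Locale.ROOT,
                "CREATE INDEX ON %s USING hnsw (%s %s) WITH (m = %d, ef_construction = %d);",
                table, column, opsClass(distanceMethod), m, efConstruction);
    }

    /** ef_search is a pgvector session setting rather than an index option. */
    static String setEfSearch(int efSearch) {
        if (efSearch < 1) {
            throw new IllegalArgumentException("efSearch must be >= 1");
        }
        return "SET hnsw.ef_search = " + efSearch + ";";
    }
}
```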

class CloudQdrantVectorStore:

Cloud Qdrant Vector Store.

This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.

String apiKey

String collectionName

String url

Optional<String> className

Optional<ClientKwargs> clientKwargs

Optional<Long> maxRetries

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudAzureAiSearchVectorStore:

Cloud Azure AI Search Vector Store.

String searchServiceApiKey

String searchServiceEndpoint

Optional<String> className

Optional<String> clientId

Optional<String> clientSecret

Optional<Long> embeddingDimension

Optional<FilterableMetadataFieldKeys> filterableMetadataFieldKeys

Optional<String> indexName

Optional<String> searchServiceApiVersion

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

Optional<String> tenantId

class CloudMongoDBAtlasVectorSearch:

Cloud MongoDB Atlas Vector Store.

This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.

String collectionName

String dbName

String mongoDBUri

Optional<String> className

Optional<Long> embeddingDimension

Optional<String> fulltextIndexName

Optional<Boolean> supportsNestedMetadataFilters

Optional<String> vectorIndexName

class CloudMilvusVectorStore:

Cloud Milvus Vector Store.

String uri

Optional<String> token

Optional<String> className

Optional<String> collectionName

Optional<Long> embeddingDimension

Optional<Boolean> supportsNestedMetadataFilters

class CloudAstraDbVectorStore:

Cloud AstraDB Vector Store.

This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.

String token

The Astra DB Application Token to use

format: password

String apiEndpoint

The Astra DB JSON API endpoint for your database

String collectionName

Collection name to use. If it does not exist, it will be created.

long embeddingDimension

Length of the embedding vectors in use

Optional<String> className

Optional<String> keyspace

The keyspace to use. If not provided, ‘default_keyspace’ is used.

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

String name

The name of the data sink.

String projectId

SinkType sinkType

One of the following:

ASTRA_DB("ASTRA_DB")

AZUREAI_SEARCH("AZUREAI_SEARCH")

MILVUS("MILVUS")

MONGODB_ATLAS("MONGODB_ATLAS")

PINECONE("PINECONE")

POSTGRES("POSTGRES")

QDRANT("QDRANT")

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

Optional<EmbeddingModelConfig> embeddingModelConfig

Schema for an embedding model config.

String id

Unique identifier

format: uuid

EmbeddingConfig embeddingConfig

The embedding configuration for the embedding model config.

One of the following:

class AzureOpenAIEmbeddingConfig:

Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class BedrockEmbeddingConfig:

Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0

Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

The AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:

Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> embeddingType

Embedding type. If not provided, the float embedding type is used when needed.

Optional<String> inputType

Model input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class GeminiEmbeddingConfig:

Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Optional<HuggingFaceInferenceApiEmbedding> component

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:

String

boolean

Optional<String> className

Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:

CLS("cls")

LAST("last")

MEAN("mean")

Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task used to pick Hugging Face’s recommended model when model_name is left at its default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

Optional<Type> type

Type of the embedding model.

class OpenAIEmbeddingConfig:

Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:

Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for Vertex AI.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:

CLASSIFICATION("classification")

CLUSTERING("clustering")

DEFAULT("default")

RETRIEVAL("retrieval")

SIMILARITY("similarity")

Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

String name

The name of the embedding model config.

String projectId

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

Optional<String> embeddingModelConfigId

The ID of the EmbeddingModelConfig this pipeline is using.

format: uuid

Optional<LlamaParseParameters> llamaParseParameters

Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.

Optional<Boolean> adaptiveLongTable

Optional<Boolean> aggressiveTableExtraction

Optional<Boolean> annotateLinks

Optional<Boolean> autoMode

Optional<String> autoModeConfigurationJson

Optional<Boolean> autoModeTriggerOnImageInPage

Optional<String> autoModeTriggerOnRegexpInPage

Optional<Boolean> autoModeTriggerOnTableInPage

Optional<String> autoModeTriggerOnTextInPage

Optional<String> azureOpenAIApiVersion

Optional<String> azureOpenAIDeploymentName

Optional<String> azureOpenAIEndpoint

Optional<String> azureOpenAIKey

Optional<Double> bboxBottom

Optional<Double> bboxLeft

Optional<Double> bboxRight

Optional<Double> bboxTop

Optional<String> boundingBox

Optional<Boolean> compactMarkdownTable

Optional<String> complementalFormattingInstruction

Optional<String> confidenceScoreEffort

Optional<String> contentGuidelineInstruction

Optional<Boolean> continuousMode

Optional<Boolean> disableImageExtraction

Optional<Boolean> disableOcr

Optional<Boolean> disableReconstruction

Optional<Boolean> doNotCache

Optional<Boolean> doNotUnrollColumns

Optional<Boolean> enableCostOptimizer

Optional<Boolean> extractCharts

Optional<Boolean> extractLayout

Optional<Boolean> extractPrintedPageNumber

Optional<Boolean> fastMode

Optional<String> formattingInstruction

Optional<String> gpt4oApiKey

Optional<Boolean> gpt4oMode

Optional<Boolean> guessXlsxSheetName

Optional<Boolean> hideFooters

Optional<Boolean> hideHeaders

Optional<Boolean> highResOcr

Optional<Boolean> htmlMakeAllElementsVisible

Optional<Boolean> htmlRemoveFixedElements

Optional<Boolean> htmlRemoveNavigationElements

Optional<String> httpProxy

Optional<Boolean> ignoreDocumentElementsForLayoutDetection

Optional<List<ImagesToSave>> imagesToSave

One of the following:

EMBEDDED("embedded")

LAYOUT("layout")

SCREENSHOT("screenshot")

Optional<Boolean> inlineImagesInMarkdown

Optional<String> inputS3Path

Optional<String> inputS3Region

Optional<String> inputUrl

Optional<Boolean> internalIsScreenshotJob

Optional<Boolean> invalidateCache

Optional<Boolean> isFormattingInstruction

Optional<Double> jobTimeoutExtraTimePerPageInSeconds

Optional<Double> jobTimeoutInSeconds

Optional<Boolean> keepPageSeparatorWhenMergingTables

Optional<List<ParsingLanguages>> languages

One of the following:

ABQ("abq")

ADY("ady")

AF("af")

ANG("ang")

AR("ar")

AS("as")

AVA("ava")

AZ("az")

BE("be")

BG("bg")

BGC("bgc")

BH("bh")

BHO("bho")

BN("bn")

BS("bs")

CH_SIM("ch_sim")

CH_TRA("ch_tra")

CHE("che")

CS("cs")

CY("cy")

DA("da")

DAR("dar")

DE("de")

EN("en")

ES("es")

ET("et")

FA("fa")

FR("fr")

GA("ga")

GOM("gom")

HI("hi")

HR("hr")

HU("hu")

ID("id")

INH("inh")

IS("is")

IT("it")

JA("ja")

KBD("kbd")

KN("kn")

KO("ko")

KU("ku")

LA("la")

LBE("lbe")

LEZ("lez")

LT("lt")

LV("lv")

MAH("mah")

MAI("mai")

MI("mi")

MN("mn")

MNI("mni")

MR("mr")

MS("ms")

MT("mt")

NE("ne")

NEW("new")

NL("nl")

NO("no")

OC("oc")

PI("pi")

PL("pl")

PT("pt")

RO("ro")

RS_CYRILLIC("rs_cyrillic")

RS_LATIN("rs_latin")

RU("ru")

SA("sa")

SCK("sck")

SK("sk")

SL("sl")

SQ("sq")

SV("sv")

SW("sw")

TA("ta")

TAB("tab")

TE("te")

TH("th")

TJK("tjk")

TL("tl")

TR("tr")

UG("ug")

UK("uk")

UR("ur")

UZ("uz")

VI("vi")

Optional<Boolean> layoutAware

Optional<Boolean> lineLevelBoundingBox

Optional<String> markdownTableMultilineHeaderSeparator

Optional<Long> maxPages

Optional<Long> maxPagesEnforced

Optional<Boolean> mergeTablesAcrossPagesInMarkdown

Optional<String> model

Optional<Boolean> outlinedTableExtraction

Optional<Boolean> outputPdfOfDocument

Optional<String> outputS3PathPrefix

Optional<String> outputS3Region

Optional<Boolean> outputTablesAsHtml

Optional<Double> pageErrorTolerance

Optional<String> pageFooterPrefix

Optional<String> pageFooterSuffix

Optional<String> pageHeaderPrefix

Optional<String> pageHeaderSuffix

Optional<String> pagePrefix

Optional<String> pageSeparator

Optional<String> pageSuffix

Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:

PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")

PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")

PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")

PARSE_PAGE_WITH_AGENT("parse_page_with_agent")

PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")

PARSE_PAGE_WITH_LLM("parse_page_with_llm")

PARSE_PAGE_WITH_LVM("parse_page_with_lvm")

PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")

Optional<String> parsingInstruction

Optional<Boolean> preciseBoundingBox

Optional<Boolean> premiumMode

Optional<Boolean> presentationOutOfBoundsContent

Optional<Boolean> presentationSkipEmbeddedData

Optional<Boolean> preserveLayoutAlignmentAcrossPages

Optional<Boolean> preserveVerySmallText

Optional<String> preset

Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:

CRITICAL("critical")

HIGH("high")

LOW("low")

MEDIUM("medium")

Optional<String> projectId

Optional<Boolean> removeHiddenText

Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:

BLANK_PAGE("blank_page")

ERROR_MESSAGE("error_message")

RAW_TEXT("raw_text")

Optional<String> replaceFailedPageWithErrorMessagePrefix

Optional<String> replaceFailedPageWithErrorMessageSuffix

Optional<Boolean> saveImages

Optional<Boolean> skipDiagonalText

Optional<Boolean> specializedChartParsingAgentic

Optional<Boolean> specializedChartParsingEfficient

Optional<Boolean> specializedChartParsingPlus

Optional<Boolean> specializedImageParsing

Optional<Boolean> spreadsheetExtractSubTables

Optional<Boolean> spreadsheetForceFormulaComputation

Optional<Boolean> spreadsheetIncludeHiddenSheets

Optional<Boolean> strictModeBuggyFont

Optional<Boolean> strictModeImageExtraction

Optional<Boolean> strictModeImageOcr

Optional<Boolean> strictModeReconstruction

Optional<Boolean> structuredOutput

Optional<String> structuredOutputJsonSchema

Optional<String> structuredOutputJsonSchemaName

Optional<String> systemPrompt

Optional<String> systemPromptAppend

Optional<Boolean> takeScreenshot

Optional<String> targetPages

Optional<String> tier

Optional<Boolean> useVendorMultimodalModel

Optional<String> userPrompt

Optional<String> vendorMultimodalApiKey

Optional<String> vendorMultimodalModelName

Optional<String> version

Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:

CLASSIFY_CANCELLED("classify.cancelled")

CLASSIFY_ERROR("classify.error")

CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")

CLASSIFY_PENDING("classify.pending")

CLASSIFY_RUNNING("classify.running")

CLASSIFY_SUCCESS("classify.success")

EXTRACT_CANCELLED("extract.cancelled")

EXTRACT_ERROR("extract.error")

EXTRACT_PARTIAL_SUCCESS("extract.partial_success")

EXTRACT_PENDING("extract.pending")

EXTRACT_SUCCESS("extract.success")

PARSE_CANCELLED("parse.cancelled")

PARSE_ERROR("parse.error")

PARSE_PARTIAL_SUCCESS("parse.partial_success")

PARSE_PENDING("parse.pending")

PARSE_RUNNING("parse.running")

PARSE_SUCCESS("parse.success")

SHEETS_CANCELLED("sheets.cancelled")

SHEETS_ERROR("sheets.error")

SHEETS_PARTIAL_SUCCESS("sheets.partial_success")

SHEETS_PENDING("sheets.pending")

SHEETS_SUCCESS("sheets.success")

SPLIT_CANCELLED("split.cancelled")

SPLIT_ERROR("split.error")

SPLIT_PENDING("split.pending")

SPLIT_PROCESSING("split.processing")

SPLIT_SUCCESS("split.success")

UNMAPPED_EVENT("unmapped_event")

Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookSigningSecret

Optional<String> webhookUrl

URL to receive webhook POST notifications
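For `webhookEvents`, a null list means every event is delivered. The sketch below expresses that rule as a delivery check. It treats an empty list as "deliver nothing", which is an assumption not covered by the field description. `WebhookFilter` is a hypothetical helper.

```java
import java.util.List;

/** Hypothetical helper expressing the documented webhookEvents delivery rule. */
final class WebhookFilter {
    private WebhookFilter() {}

    /**
     * Returns true if `event` (e.g. "parse.success") should be sent to a webhook subscribed to `subscribed`.
     * A null list means "deliver all events", as documented; an empty list delivering nothing is an assumption.
     */
    static boolean shouldDeliver(List<String> subscribed, String event) {
        return subscribed == null || subscribed.contains(event);
    }
}
```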

Optional<String> webhookUrl

Optional<String> managedPipelineId

The ID of the ManagedPipeline this playground pipeline is linked to.

format: uuid

Optional<PipelineMetadataConfig> metadataConfig

Metadata configuration for the pipeline.

Optional<List<String>> excludedEmbedMetadataKeys

List of metadata keys to exclude from embeddings

Optional<List<String>> excludedLlmMetadataKeys

List of metadata keys to exclude from LLM during retrieval

Optional<PipelineType> pipelineType

Type of pipeline. Either PLAYGROUND or MANAGED.

One of the following:

MANAGED("MANAGED")

PLAYGROUND("PLAYGROUND")

Optional<PresetRetrievalParams> presetRetrievalParameters

Preset retrieval parameters for the pipeline.

Optional<Double> alpha

Alpha value for hybrid retrieval, weighting dense against sparse retrieval: 0 is pure sparse retrieval and 1 is pure dense retrieval.

maximum: 1

minimum: 0
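The reference defines only the endpoints of alpha (0 is sparse, 1 is dense). One common way to apply such a weight is a convex combination of the two scores. The sketch below assumes that approach and also assumes both scores are already normalized to comparable ranges. LlamaCloud's actual fusion method is not documented here, and `HybridScore` is a hypothetical helper.

```java
// Hypothetical helper: convex combination of dense and sparse scores.
// Assumes both scores are pre-normalized; this is an illustration of what
// alpha means, not LlamaCloud's documented fusion algorithm.
final class HybridScore {
    private HybridScore() {}

    static double combine(double alpha, double denseScore, double sparseScore) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1], got " + alpha);
        }
        // alpha = 1 -> dense only; alpha = 0 -> sparse only
        return alpha * denseScore + (1.0 - alpha) * sparseScore;
    }
}
```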

Optional<String> className

Optional<Double> denseSimilarityCutoff

Minimum similarity score with respect to the query for a node to be retrieved.

maximum: 1

minimum: 0

Optional<Long> denseSimilarityTopK

Number of nodes for dense retrieval.

maximum: 100

minimum: 1

Optional<Boolean> enableReranking

Enable reranking for retrieval

Optional<Long> filesTopK

Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).

maximum: 5

minimum: 1

Optional<Long> rerankTopN

Number of reranked nodes to return.

maximum: 100

minimum: 1

Optional<RetrievalMode> retrievalMode

The retrieval mode for the query.

One of the following:

AUTO_ROUTED("auto_routed")

CHUNKS("chunks")

FILES_VIA_CONTENT("files_via_content")

FILES_VIA_METADATA("files_via_metadata")

Deprecated Optional<Boolean> retrieveImageNodes

Whether to retrieve image nodes.

Optional<Boolean> retrievePageFigureNodes

Whether to retrieve page figure nodes.

Optional<Boolean> retrievePageScreenshotNodes

Whether to retrieve page screenshot nodes.

Metadata filters for vector stores.

One of the following:

Comprehensive metadata filter for vector stores to support more operators.

The value uses strict types: because int, float, and str are compatible types, values of all three were previously converted to strings.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

One of the following:

Vector store filter operator.

One of the following:

MetadataFilters

Vector store filter conditions to combine different filters.

One of the following:

Optional<SearchFiltersInferenceSchema> searchFiltersInferenceSchema

JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<Long> sparseSimilarityTopK

Number of nodes for sparse retrieval.

maximum: 100

minimum: 1

Optional<SparseModelConfig> sparseModelConfig

Configuration for sparse embedding models used in hybrid search.

This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.

Optional<String> className

Optional<ModelType> modelType

The sparse model type to use. ‘bm25’ uses Qdrant’s FastEmbed BM25 model (default for new pipelines), ‘splade’ uses HuggingFace Splade model, ‘auto’ selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).

One of the following:

AUTO("auto")

BM25("bm25")

SPLADE("splade")
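The modelType description specifies how 'auto' resolves: BYOC deployments use term frequency, and Cloud uses Splade. The sketch below encodes that rule. `DeploymentMode` and the returned label strings (including `"term_frequency"`) are hypothetical names chosen for this example and do not come from the SDK.

```java
// Hypothetical helper: resolves the sparse model per the documented 'auto' rule.
// DeploymentMode and the returned labels are illustrative names only.
final class SparseModelResolver {
    enum ModelType { AUTO, BM25, SPLADE }
    enum DeploymentMode { CLOUD, BYOC }

    private SparseModelResolver() {}

    static String resolve(ModelType requested, DeploymentMode mode) {
        switch (requested) {
            case BM25:
                return "bm25";   // Qdrant FastEmbed BM25
            case SPLADE:
                return "splade"; // HuggingFace Splade
            case AUTO:
            default:
                // 'auto': BYOC uses term frequency, Cloud uses Splade
                return mode == DeploymentMode.BYOC ? "term_frequency" : "splade";
        }
    }
}
```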

Optional<Status> status

Status of the pipeline.

One of the following:

CREATED("CREATED")

DELETING("DELETING")

Optional<TransformConfig> transformConfig

Configuration for the transformation.

One of the following:

class AutoTransformConfig:

Optional<Long> chunkOverlap

Chunk overlap for the transformation.

Optional<Long> chunkSize

Chunk size for the transformation.

exclusiveMinimum: 0

Optional<Mode> mode

class AdvancedModeTransformConfig:

Optional<ChunkingConfig> chunkingConfig

Configuration for the chunking.

One of the following:

class NoneChunkingConfig:

Optional<Mode> mode

class CharacterChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

class TokenChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> separator

class SentenceChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> paragraphSeparator

Optional<String> separator

class SemanticChunkingConfig:

Optional<Long> breakpointPercentileThreshold

Optional<Long> bufferSize

Optional<Mode> mode

Optional<SegmentationConfig> segmentationConfig

Configuration for the segmentation.

One of the following:

class NoneSegmentationConfig:

Optional<Mode> mode

class PageSegmentationConfig:

Optional<Mode> mode

Optional<String> pageSeparator

class ElementSegmentationConfig:

Optional<Mode> mode
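To illustrate what chunkSize and chunkOverlap control, the sketch below applies them to plain character windows: every chunk holds at most chunkSize characters, and consecutive chunks share chunkOverlap characters. This is a generic illustration, not LlamaCloud's splitter. The token, sentence, and semantic configurations likely also respect separators or token counts. `CharacterChunker` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: fixed-size character windows with overlap, illustrating
// the meaning of chunkSize and chunkOverlap (not LlamaCloud's implementation).
final class CharacterChunker {
    private CharacterChunker() {}

    static List<String> chunk(String text, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("chunkOverlap must be in [0, chunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - chunkOverlap; // how far each window advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With `chunkSize = 4` and `chunkOverlap = 1`, the string `abcdefghij` splits into `abcd`, `defg`, and `ghij`.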

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

class PipelineCreate:

Schema for creating a pipeline.

String name

Optional<DataSinkCreate> dataSink

Schema for creating a data sink.

Component component

Component that implements the data sink

One of the following:

class UnionMember0:

class CloudPineconeVectorStore:

Cloud Pinecone Vector Store.

This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.

String apiKey

The API key for authenticating with Pinecone

format: password

String indexName

Optional<String> className

Optional<InsertKwargs> insertKwargs

Optional<String> namespace

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudPostgresVectorStore:

String database

long embedDim

String host

String password

long port

String schemaName

String tableName

String user

Optional<String> className

Optional<PgVectorHnswSettings> hnswSettings

HNSW settings for PGVector.

Optional<DistanceMethod> distanceMethod

The distance method to use.

One of the following:

COSINE("cosine")

HAMMING("hamming")

IP("ip")

JACCARD("jaccard")

L1("l1")

L2("l2")

Optional<Long> efConstruction

The number of edges to use during the construction phase.

minimum: 1

Optional<Long> efSearch

The number of edges to use during the search phase.

minimum: 1

Optional<Long> m

The number of bi-directional links created for each new element.

minimum: 1

Optional<VectorType> vectorType

The type of vector to use.

One of the following:

BIT("bit")

HALF_VEC("half_vec")

SPARSE_VEC("sparse_vec")

VECTOR("vector")
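The hnswSettings fields share names with pgvector's HNSW index options. The sketch below assumes a direct mapping: m and efConstruction become index options, efSearch corresponds to the `hnsw.ef_search` session setting, and vectorType plus distanceMethod select a pgvector operator class such as `vector_cosine_ops`. It builds the corresponding SQL text. How LlamaCloud actually creates the index is not documented here. The table and column names are hypothetical.

```java
import java.util.Set;

// Hypothetical helper: builds pgvector HNSW DDL from the settings above,
// assuming a direct mapping onto pgvector's operator classes and options.
final class PgVectorHnswDdl {
    private static final Set<String> DISTANCES = Set.of("cosine", "hamming", "ip", "jaccard", "l1", "l2");

    private PgVectorHnswDdl() {}

    static String opClass(String vectorType, String distanceMethod) {
        if (!DISTANCES.contains(distanceMethod)) {
            throw new IllegalArgumentException("unknown distanceMethod: " + distanceMethod);
        }
        String prefix;
        switch (vectorType) {
            case "vector": prefix = "vector"; break;
            case "half_vec": prefix = "halfvec"; break;
            case "sparse_vec": prefix = "sparsevec"; break;
            case "bit": prefix = "bit"; break;
            default: throw new IllegalArgumentException("unknown vectorType: " + vectorType);
        }
        // pgvector supports hamming/jaccard only on bit vectors, and l2/ip/cosine/l1 only on the others
        boolean bitMetric = distanceMethod.equals("hamming") || distanceMethod.equals("jaccard");
        if (prefix.equals("bit") != bitMetric) {
            throw new IllegalArgumentException(vectorType + " does not support " + distanceMethod);
        }
        return prefix + "_" + distanceMethod + "_ops";
    }

    static String createIndex(String table, String column, String vectorType,
                              String distanceMethod, int m, int efConstruction) {
        return "CREATE INDEX ON " + table + " USING hnsw (" + column + " "
                + opClass(vectorType, distanceMethod) + ") WITH (m = " + m
                + ", ef_construction = " + efConstruction + ")";
    }

    // efSearch is a query-time session setting in pgvector, not an index option
    static String setEfSearch(int efSearch) {
        return "SET hnsw.ef_search = " + efSearch;
    }
}
```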

Optional<Boolean> hybridSearch

Optional<Boolean> performSetup

Optional<Boolean> supportsNestedMetadataFilters

class CloudQdrantVectorStore:

Cloud Qdrant Vector Store.

This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.

String apiKey

String collectionName

String url

Optional<String> className

Optional<ClientKwargs> clientKwargs

Optional<Long> maxRetries

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudAzureAiSearchVectorStore:

Cloud Azure AI Search Vector Store.

String searchServiceApiKey

String searchServiceEndpoint

Optional<String> className

Optional<String> clientId

Optional<String> clientSecret

Optional<Long> embeddingDimension

Optional<FilterableMetadataFieldKeys> filterableMetadataFieldKeys

Optional<String> indexName

Optional<String> searchServiceApiVersion

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

Optional<String> tenantId

class CloudMongoDBAtlasVectorSearch:

Cloud MongoDB Atlas Vector Store.

This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.

String collectionName

String dbName

String mongoDBUri

Optional<String> className

Optional<Long> embeddingDimension

Optional<String> fulltextIndexName

Optional<Boolean> supportsNestedMetadataFilters

Optional<String> vectorIndexName

class CloudMilvusVectorStore:

Cloud Milvus Vector Store.

String uri

Optional<String> token

Optional<String> className

Optional<String> collectionName

Optional<Long> embeddingDimension

Optional<Boolean> supportsNestedMetadataFilters

class CloudAstraDbVectorStore:

Cloud AstraDB Vector Store.

This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.

String token

The Astra DB Application Token to use

format: password

String apiEndpoint

The Astra DB JSON API endpoint for your database

String collectionName

Collection name to use. If the collection does not exist, it will be created.

long embeddingDimension

Length of the embedding vectors in use

Optional<String> className

Optional<String> keyspace

The keyspace to use. If not provided, ‘default_keyspace’ is used.

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

String name

The name of the data sink.

SinkType sinkType

One of the following:

ASTRA_DB("ASTRA_DB")

AZUREAI_SEARCH("AZUREAI_SEARCH")

MILVUS("MILVUS")

MONGODB_ATLAS("MONGODB_ATLAS")

PINECONE("PINECONE")

POSTGRES("POSTGRES")

QDRANT("QDRANT")

Optional<String> dataSinkId

Data sink ID. When provided instead of data_sink, the data sink will be looked up by ID.

format: uuid

Optional<EmbeddingConfig> embeddingConfig

One of the following:

class AzureOpenAIEmbeddingConfig:

Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.
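embedBatchSize caps how many inputs go into each embedding call, and it is documented as greater than 0 and at most 2048. The sketch below shows how such a cap splits a list of inputs into separate calls. `EmbedBatches` is a hypothetical helper, and the actual batching happens inside the embedding integration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits inputs into batches of at most embedBatchSize,
// enforcing the documented bounds (exclusiveMinimum 0, maximum 2048).
final class EmbedBatches {
    static final int MAX_EMBED_BATCH_SIZE = 2048;

    private EmbedBatches() {}

    static <T> List<List<T>> split(List<T> inputs, int embedBatchSize) {
        if (embedBatchSize <= 0 || embedBatchSize > MAX_EMBED_BATCH_SIZE) {
            throw new IllegalArgumentException("embedBatchSize must be in (0, 2048]");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i += embedBatchSize) {
            int end = Math.min(i + embedBatchSize, inputs.size());
            // copy the subList view so each batch is independent of the input list
            batches.add(new ArrayList<>(inputs.subList(i, end)));
        }
        return batches;
    }
}
```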

class BedrockEmbeddingConfig:

Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the Bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0

Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:

Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> embeddingType

Embedding type. If not provided, float is used as the embedding_type when needed.

Optional<String> inputType

Model input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class GeminiEmbeddingConfig:

Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Optional<HuggingFaceInferenceApiEmbedding> component

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:

String

boolean

Optional<String> className

Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:

CLS("cls")

LAST("last")

MEAN("mean")

Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left as default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

Optional<Type> type

Type of the embedding model.
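The pooling field selects how per-token embeddings collapse into a single vector. The sketch below assumes the standard meanings of the three choices: CLS takes the first token's vector, LAST takes the final token's vector, and MEAN averages all token vectors. It ignores padding and attention masks, and it is not the Hugging Face server's implementation. `Pooling` is a hypothetical helper.

```java
// Hypothetical helper: applies CLS / LAST / MEAN pooling to token embeddings
// (rows = tokens, columns = dimensions). Padding masks are not handled.
final class Pooling {
    enum Mode { CLS, LAST, MEAN }

    private Pooling() {}

    static double[] pool(double[][] tokenEmbeddings, Mode mode) {
        if (tokenEmbeddings.length == 0) {
            throw new IllegalArgumentException("need at least one token embedding");
        }
        switch (mode) {
            case CLS:
                return tokenEmbeddings[0].clone(); // first token
            case LAST:
                return tokenEmbeddings[tokenEmbeddings.length - 1].clone(); // final token
            case MEAN:
            default:
                int dim = tokenEmbeddings[0].length;
                double[] mean = new double[dim];
                for (double[] token : tokenEmbeddings) {
                    for (int d = 0; d < dim; d++) {
                        mean[d] += token[d] / tokenEmbeddings.length; // running average
                    }
                }
                return mean;
        }
    }
}
```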

class OpenAIEmbeddingConfig:

Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:

Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for Vertex AI.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:

CLASSIFICATION("classification")

CLUSTERING("clustering")

DEFAULT("default")

RETRIEVAL("retrieval")

SIMILARITY("similarity")

Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

Optional<String> embeddingModelConfigId

Embedding model config ID. When provided instead of embedding_config, the embedding model config will be looked up by ID.

format: uuid

Optional<LlamaParseParameters> llamaParseParameters

Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.

Optional<Boolean> adaptiveLongTable

Optional<Boolean> aggressiveTableExtraction

Optional<Boolean> annotateLinks

Optional<Boolean> autoMode

Optional<String> autoModeConfigurationJson

Optional<Boolean> autoModeTriggerOnImageInPage

Optional<String> autoModeTriggerOnRegexpInPage

Optional<Boolean> autoModeTriggerOnTableInPage

Optional<String> autoModeTriggerOnTextInPage

Optional<String> azureOpenAIApiVersion

Optional<String> azureOpenAIDeploymentName

Optional<String> azureOpenAIEndpoint

Optional<String> azureOpenAIKey

Optional<Double> bboxBottom

Optional<Double> bboxLeft

Optional<Double> bboxRight

Optional<Double> bboxTop

Optional<String> boundingBox

Optional<Boolean> compactMarkdownTable

Optional<String> complementalFormattingInstruction

Optional<String> confidenceScoreEffort

Optional<String> contentGuidelineInstruction

Optional<Boolean> continuousMode

Optional<Boolean> disableImageExtraction

Optional<Boolean> disableOcr

Optional<Boolean> disableReconstruction

Optional<Boolean> doNotCache

Optional<Boolean> doNotUnrollColumns

Optional<Boolean> enableCostOptimizer

Optional<Boolean> extractCharts

Optional<Boolean> extractLayout

Optional<Boolean> extractPrintedPageNumber

Optional<Boolean> fastMode

Optional<String> formattingInstruction

Optional<String> gpt4oApiKey

Optional<Boolean> gpt4oMode

Optional<Boolean> guessXlsxSheetName

Optional<Boolean> hideFooters

Optional<Boolean> hideHeaders

Optional<Boolean> highResOcr

Optional<Boolean> htmlMakeAllElementsVisible

Optional<Boolean> htmlRemoveFixedElements

Optional<Boolean> htmlRemoveNavigationElements

Optional<String> httpProxy

Optional<Boolean> ignoreDocumentElementsForLayoutDetection

Optional<List<ImagesToSave>> imagesToSave

One of the following:

EMBEDDED("embedded")

LAYOUT("layout")

SCREENSHOT("screenshot")

Optional<Boolean> inlineImagesInMarkdown

Optional<String> inputS3Path

Optional<String> inputS3Region

Optional<String> inputUrl

Optional<Boolean> internalIsScreenshotJob

Optional<Boolean> invalidateCache

Optional<Boolean> isFormattingInstruction

Optional<Double> jobTimeoutExtraTimePerPageInSeconds

Optional<Double> jobTimeoutInSeconds

Optional<Boolean> keepPageSeparatorWhenMergingTables

Optional<List<ParsingLanguages>> languages

One of the following:

ABQ("abq")

ADY("ady")

AF("af")

ANG("ang")

AR("ar")

AS("as")

AVA("ava")

AZ("az")

BE("be")

BG("bg")

BGC("bgc")

BH("bh")

BHO("bho")

BN("bn")

BS("bs")

CH_SIM("ch_sim")

CH_TRA("ch_tra")

CHE("che")

CS("cs")

CY("cy")

DA("da")

DAR("dar")

DE("de")

EN("en")

ES("es")

ET("et")

FA("fa")

FR("fr")

GA("ga")

GOM("gom")

HI("hi")

HR("hr")

HU("hu")

ID("id")

INH("inh")

IS("is")

IT("it")

JA("ja")

KBD("kbd")

KN("kn")

KO("ko")

KU("ku")

LA("la")

LBE("lbe")

LEZ("lez")

LT("lt")

LV("lv")

MAH("mah")

MAI("mai")

MI("mi")

MN("mn")

MNI("mni")

MR("mr")

MS("ms")

MT("mt")

NE("ne")

NEW("new")

NL("nl")

NO("no")

OC("oc")

PI("pi")

PL("pl")

PT("pt")

RO("ro")

RS_CYRILLIC("rs_cyrillic")

RS_LATIN("rs_latin")

RU("ru")

SA("sa")

SCK("sck")

SK("sk")

SL("sl")

SQ("sq")

SV("sv")

SW("sw")

TA("ta")

TAB("tab")

TE("te")

TH("th")

TJK("tjk")

TL("tl")

TR("tr")

UG("ug")

UK("uk")

UR("ur")

UZ("uz")

VI("vi")

Optional<Boolean> layoutAware

Optional<Boolean> lineLevelBoundingBox

Optional<String> markdownTableMultilineHeaderSeparator

Optional<Long> maxPages

Optional<Long> maxPagesEnforced

Optional<Boolean> mergeTablesAcrossPagesInMarkdown

Optional<String> model

Optional<Boolean> outlinedTableExtraction

Optional<Boolean> outputPdfOfDocument

Optional<String> outputS3PathPrefix

Optional<String> outputS3Region

Optional<Boolean> outputTablesAsHtml

Optional<Double> pageErrorTolerance

Optional<String> pageFooterPrefix

Optional<String> pageFooterSuffix

Optional<String> pageHeaderPrefix

Optional<String> pageHeaderSuffix

Optional<String> pagePrefix

Optional<String> pageSeparator

Optional<String> pageSuffix

Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:

PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")

PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")

PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")

PARSE_PAGE_WITH_AGENT("parse_page_with_agent")

PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")

PARSE_PAGE_WITH_LLM("parse_page_with_llm")

PARSE_PAGE_WITH_LVM("parse_page_with_lvm")

PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")

Optional<String> parsingInstruction

Optional<Boolean> preciseBoundingBox

Optional<Boolean> premiumMode

Optional<Boolean> presentationOutOfBoundsContent

Optional<Boolean> presentationSkipEmbeddedData

Optional<Boolean> preserveLayoutAlignmentAcrossPages

Optional<Boolean> preserveVerySmallText

Optional<String> preset

Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:

CRITICAL("critical")

HIGH("high")

LOW("low")

MEDIUM("medium")

Optional<String> projectId

Optional<Boolean> removeHiddenText

Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:

BLANK_PAGE("blank_page")

ERROR_MESSAGE("error_message")

RAW_TEXT("raw_text")

Optional<String> replaceFailedPageWithErrorMessagePrefix

Optional<String> replaceFailedPageWithErrorMessageSuffix
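replaceFailedPageMode appears to select the text that stands in for a page that fails to parse. The sketch below assumes that reading, and further assumes that the prefix and suffix fields wrap the error message in ERROR_MESSAGE mode. Neither behavior is spelled out in this reference, and `FailedPageReplacement` with its parameters is hypothetical.

```java
// Hypothetical helper: chooses replacement text for a failed page, assuming
// BLANK_PAGE -> empty text, RAW_TEXT -> the unparsed page text, and
// ERROR_MESSAGE -> the error wrapped in the optional prefix/suffix.
final class FailedPageReplacement {
    enum FailPageMode { BLANK_PAGE, ERROR_MESSAGE, RAW_TEXT }

    private FailedPageReplacement() {}

    static String replacement(FailPageMode mode, String errorMessage, String rawText,
                              String prefix, String suffix) {
        switch (mode) {
            case BLANK_PAGE:
                return "";
            case RAW_TEXT:
                return rawText; // fall back to the page's raw text
            case ERROR_MESSAGE:
            default:
                String p = prefix == null ? "" : prefix;
                String s = suffix == null ? "" : suffix;
                return p + errorMessage + s;
        }
    }
}
```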

Optional<Boolean> saveImages

Optional<Boolean> skipDiagonalText

Optional<Boolean> specializedChartParsingAgentic

Optional<Boolean> specializedChartParsingEfficient

Optional<Boolean> specializedChartParsingPlus

Optional<Boolean> specializedImageParsing

Optional<Boolean> spreadsheetExtractSubTables

Optional<Boolean> spreadsheetForceFormulaComputation

Optional<Boolean> spreadsheetIncludeHiddenSheets

Optional<Boolean> strictModeBuggyFont

Optional<Boolean> strictModeImageExtraction

Optional<Boolean> strictModeImageOcr

Optional<Boolean> strictModeReconstruction

Optional<Boolean> structuredOutput

Optional<String> structuredOutputJsonSchema

Optional<String> structuredOutputJsonSchemaName

Optional<String> systemPrompt

Optional<String> systemPromptAppend

Optional<Boolean> takeScreenshot

Optional<String> targetPages

Optional<String> tier

Optional<Boolean> useVendorMultimodalModel

Optional<String> userPrompt

Optional<String> vendorMultimodalApiKey

Optional<String> vendorMultimodalModelName

Optional<String> version

Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:

CLASSIFY_CANCELLED("classify.cancelled")

CLASSIFY_ERROR("classify.error")

CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")

CLASSIFY_PENDING("classify.pending")

CLASSIFY_RUNNING("classify.running")

CLASSIFY_SUCCESS("classify.success")

EXTRACT_CANCELLED("extract.cancelled")

EXTRACT_ERROR("extract.error")

EXTRACT_PARTIAL_SUCCESS("extract.partial_success")

EXTRACT_PENDING("extract.pending")

EXTRACT_SUCCESS("extract.success")

PARSE_CANCELLED("parse.cancelled")

PARSE_ERROR("parse.error")

PARSE_PARTIAL_SUCCESS("parse.partial_success")

PARSE_PENDING("parse.pending")

PARSE_RUNNING("parse.running")

PARSE_SUCCESS("parse.success")

SHEETS_CANCELLED("sheets.cancelled")

SHEETS_ERROR("sheets.error")

SHEETS_PARTIAL_SUCCESS("sheets.partial_success")

SHEETS_PENDING("sheets.pending")

SHEETS_SUCCESS("sheets.success")

SPLIT_CANCELLED("split.cancelled")

SPLIT_ERROR("split.error")

SPLIT_PENDING("split.pending")

SPLIT_PROCESSING("split.processing")

SPLIT_SUCCESS("split.success")

UNMAPPED_EVENT("unmapped_event")

Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookSigningSecret

Optional<String> webhookUrl

URL to receive webhook POST notifications

Optional<String> webhookUrl

Optional<String> managedPipelineId

The ID of the ManagedPipeline this playground pipeline is linked to.

format: uuid

Optional<PipelineMetadataConfig> metadataConfig

Metadata configuration for the pipeline.

Optional<List<String>> excludedEmbedMetadataKeys

List of metadata keys to exclude from embeddings

Optional<List<String>> excludedLlmMetadataKeys

List of metadata keys to exclude from LLM during retrieval

Optional<PipelineType> pipelineType

Type of pipeline. Either PLAYGROUND or MANAGED.

One of the following:

MANAGED("MANAGED")

PLAYGROUND("PLAYGROUND")

Optional<PresetRetrievalParams> presetRetrievalParameters

Preset retrieval parameters for the pipeline.

Optional<Double> alpha

Alpha value for hybrid retrieval, weighting dense against sparse retrieval: 0 is pure sparse retrieval and 1 is pure dense retrieval.

maximum: 1

minimum: 0

Optional<String> className

Optional<Double> denseSimilarityCutoff

Minimum similarity score with respect to the query for a node to be retrieved.

maximum: 1

minimum: 0

Optional<Long> denseSimilarityTopK

Number of nodes for dense retrieval.

maximum: 100

minimum: 1

Optional<Boolean> enableReranking

Enable reranking for retrieval

Optional<Long> filesTopK

Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).

maximum: 5

minimum: 1

Optional<Long> rerankTopN

Number of reranked nodes to return.

maximum: 100

minimum: 1

Optional<RetrievalMode> retrievalMode

The retrieval mode for the query.

One of the following:

AUTO_ROUTED("auto_routed")

CHUNKS("chunks")

FILES_VIA_CONTENT("files_via_content")

FILES_VIA_METADATA("files_via_metadata")

Deprecated Optional<Boolean> retrieveImageNodes

Whether to retrieve image nodes.

Optional<Boolean> retrievePageFigureNodes

Whether to retrieve page figure nodes.

Optional<Boolean> retrievePageScreenshotNodes

Whether to retrieve page screenshot nodes.

Metadata filters for vector stores.

One of the following:

Comprehensive metadata filter for vector stores to support more operators.

The value uses strict types: because int, float, and str are compatible types, values of all three were previously converted to strings.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

One of the following:

Vector store filter operator.

One of the following:

MetadataFilters

Vector store filter conditions to combine different filters.

One of the following:

Optional<SearchFiltersInferenceSchema> searchFiltersInferenceSchema

JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<Long> sparseSimilarityTopK

Number of nodes for sparse retrieval.

maximum: 100

minimum: 1

Optional<SparseModelConfig> sparseModelConfig

Configuration for sparse embedding models used in hybrid search.

This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.

Optional<String> className

Optional<ModelType> modelType

One of the following:

AUTO("auto")

BM25("bm25")

SPLADE("splade")

Optional<String> status

Status of the pipeline deployment.

Optional<TransformConfig> transformConfig

Configuration for the transformation.

One of the following:

class AutoTransformConfig:

Optional<Long> chunkOverlap

Chunk overlap for the transformation.

Optional<Long> chunkSize

Chunk size for the transformation.

exclusiveMinimum: 0

Optional<Mode> mode

class AdvancedModeTransformConfig:

Optional<ChunkingConfig> chunkingConfig

Configuration for the chunking.

One of the following:

class NoneChunkingConfig:

Optional<Mode> mode

class CharacterChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

class TokenChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> separator

class SentenceChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> paragraphSeparator

Optional<String> separator

class SemanticChunkingConfig:

Optional<Long> breakpointPercentileThreshold

Optional<Long> bufferSize

Optional<Mode> mode

Optional<SegmentationConfig> segmentationConfig

Configuration for the segmentation.

One of the following:

class NoneSegmentationConfig:

Optional<Mode> mode

class PageSegmentationConfig:

Optional<Mode> mode

Optional<String> pageSeparator

class ElementSegmentationConfig:

Optional<Mode> mode

class PipelineMetadataConfig:

Optional<List<String>> excludedEmbedMetadataKeys

List of metadata keys to exclude from embeddings

Optional<List<String>> excludedLlmMetadataKeys

List of metadata keys to exclude from LLM during retrieval
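excludedEmbedMetadataKeys keeps the listed metadata keys out of the text that gets embedded, and excludedLlmMetadataKeys plays the same role for the text shown to the LLM during retrieval. The sketch below builds such a text. The "key: value" header layout and the blank line before the content are assumptions made for illustration, not LlamaCloud's documented format. `EmbedText` is a hypothetical helper.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper: renders metadata as "key: value" lines above the content,
// skipping excluded keys. The layout is an assumption for illustration.
final class EmbedText {
    private EmbedText() {}

    static String build(String content, Map<String, String> metadata, List<String> excludedKeys) {
        StringJoiner header = new StringJoiner("\n");
        // TreeMap gives a deterministic key order for the example
        for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
            if (!excludedKeys.contains(e.getKey())) {
                header.add(e.getKey() + ": " + e.getValue());
            }
        }
        String meta = header.toString();
        return meta.isEmpty() ? content : meta + "\n\n" + content;
    }
}
```

For example, excluding `file_path` from `{author=Ann, file_path=/tmp/a.pdf}` leaves only the `author: Ann` line above the content.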

enum PipelineType:

Enum for representing the type of a pipeline

MANAGED("MANAGED")

PLAYGROUND("PLAYGROUND")

class PresetRetrievalParams:

Schema for the search parameters of a retrieval execution that can be preset for a pipeline.

Optional<Double> alpha

Alpha value for hybrid retrieval, weighting dense against sparse retrieval: 0 is pure sparse retrieval and 1 is pure dense retrieval.

maximum: 1

minimum: 0

Optional<String> className

Optional<Double> denseSimilarityCutoff

Minimum similarity score with respect to the query for a node to be retrieved.

maximum: 1

minimum: 0

Optional<Long> denseSimilarityTopK

Number of nodes for dense retrieval.

maximum: 100

minimum: 1

Optional<Boolean> enableReranking

Enable reranking for retrieval

Optional<Long> filesTopK

Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).

maximum: 5

minimum: 1

Optional<Long> rerankTopN

Number of reranked nodes to return.

maximum: 100

minimum: 1

Optional<RetrievalMode> retrievalMode

The retrieval mode for the query.

One of the following:

AUTO_ROUTED("auto_routed")

CHUNKS("chunks")

FILES_VIA_CONTENT("files_via_content")

FILES_VIA_METADATA("files_via_metadata")

Deprecated Optional<Boolean> retrieveImageNodes

Whether to retrieve image nodes.

Optional<Boolean> retrievePageFigureNodes

Whether to retrieve page figure nodes.

Optional<Boolean> retrievePageScreenshotNodes

Whether to retrieve page screenshot nodes.

Metadata filters for vector stores.

One of the following:

Comprehensive metadata filter for vector stores to support more operators.

Values use strict types, because int, float, and str are compatible types and were previously all converted to strings.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

One of the following:

Vector store filter operator.

One of the following:

MetadataFilters

Vector store filter conditions to combine different filters.

One of the following:

Optional<SearchFiltersInferenceSchema> searchFiltersInferenceSchema

JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<Long> sparseSimilarityTopK

Number of nodes for sparse retrieval.

maximum: 100

minimum: 1
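Because the service enforces the bounds listed above, a client may want to check values before sending them. A minimal sketch, assuming a hypothetical `RetrievalParamsCheck` helper that mirrors those documented ranges; it is not part of the SDK, and `null` stands in for an unset (empty `Optional`) field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side check mirroring the documented PresetRetrievalParams bounds. */
final class RetrievalParamsCheck {
    private RetrievalParamsCheck() {}

    /** Returns human-readable violations; an empty list means every supplied value is in range. */
    static List<String> validate(Double alpha, Double denseSimilarityCutoff,
                                 Long denseSimilarityTopK, Long sparseSimilarityTopK,
                                 Long rerankTopN, Long filesTopK) {
        List<String> errors = new ArrayList<>();
        // null means "not set", so unset fields are skipped rather than rejected.
        checkDouble(errors, "alpha", alpha, 0.0, 1.0);
        checkDouble(errors, "denseSimilarityCutoff", denseSimilarityCutoff, 0.0, 1.0);
        checkLong(errors, "denseSimilarityTopK", denseSimilarityTopK, 1, 100);
        checkLong(errors, "sparseSimilarityTopK", sparseSimilarityTopK, 1, 100);
        checkLong(errors, "rerankTopN", rerankTopN, 1, 100);
        checkLong(errors, "filesTopK", filesTopK, 1, 5);
        return errors;
    }

    private static void checkDouble(List<String> errors, String name, Double value, double min, double max) {
        if (value != null && (value < min || value > max)) {
            errors.add(name + " must be in [" + min + ", " + max + "], got " + value);
        }
    }

    private static void checkLong(List<String> errors, String name, Long value, long min, long max) {
        if (value != null && (value < min || value > max)) {
            errors.add(name + " must be in [" + min + ", " + max + "], got " + value);
        }
    }

    public static void main(String[] args) {
        // filesTopK = 6 exceeds the documented maximum of 5.
        System.out.println(validate(0.5, null, 10L, null, 5L, 6L));
    }
}
```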

enum RetrievalMode:

AUTO_ROUTED("auto_routed")

CHUNKS("chunks")

FILES_VIA_CONTENT("files_via_content")

FILES_VIA_METADATA("files_via_metadata")
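Each constant carries the lowercase string sent on the wire. The sketch below reproduces that mapping in a standalone enum so the round trip is visible, and ties in the note above that `filesTopK` only applies to the two file-level modes. It is an illustration, not the SDK's generated `RetrievalMode` class, whose parsing method names may differ.

```java
import java.util.Locale;

/** Standalone illustration of the RetrievalMode wire values (not the SDK's generated class). */
enum RetrievalModeSketch {
    AUTO_ROUTED("auto_routed"),
    CHUNKS("chunks"),
    FILES_VIA_CONTENT("files_via_content"),
    FILES_VIA_METADATA("files_via_metadata");

    private final String wireValue;

    RetrievalModeSketch(String wireValue) {
        this.wireValue = wireValue;
    }

    String wireValue() {
        return wireValue;
    }

    /** Parses a wire value, tolerating surrounding whitespace and upper case. */
    static RetrievalModeSketch fromWire(String value) {
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (RetrievalModeSketch mode : values()) {
            if (mode.wireValue.equals(normalized)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown retrieval mode: " + value);
    }

    /** filesTopK is documented as applying only to the two file-level modes. */
    boolean usesFilesTopK() {
        return this == FILES_VIA_CONTENT || this == FILES_VIA_METADATA;
    }

    public static void main(String[] args) {
        RetrievalModeSketch mode = fromWire("files_via_content");
        System.out.println(mode + " uses filesTopK: " + mode.usesFilesTopK());
    }
}
```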

class SparseModelConfig:

Configuration for sparse embedding models used in hybrid search.

This lets users choose between SPLADE and BM25 models for sparse retrieval in managed data sinks.

Optional<String> className

Optional<ModelType> modelType

One of the following:

AUTO("auto")

BM25("bm25")

SPLADE("splade")

class VertexAiEmbeddingConfig:

Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the Vertex AI model.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:

CLASSIFICATION("classification")

CLUSTERING("clustering")

DEFAULT("default")

RETRIEVAL("retrieval")

SIMILARITY("similarity")

Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.
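`embedBatchSize` caps how many texts go into a single embedding call (strictly greater than 0, at most 2048). A sketch of the batching it implies, assuming texts are simply chunked in input order; `EmbeddingBatches` is hypothetical, and the actual request scheduling (including `numWorkers` concurrency) is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits inputs into batches no larger than embedBatchSize. */
final class EmbeddingBatches {
    static final int MAX_BATCH_SIZE = 2048; // documented maximum for embedBatchSize

    private EmbeddingBatches() {}

    static <T> List<List<T>> split(List<T> items, int embedBatchSize) {
        // exclusiveMinimum 0 means the batch size must be strictly positive.
        if (embedBatchSize <= 0 || embedBatchSize > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException("embedBatchSize must be in (0, 2048], got " + embedBatchSize);
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += embedBatchSize) {
            int end = Math.min(start + embedBatchSize, items.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("a", "b", "c", "d", "e");
        System.out.println(split(texts, 2));
    }
}
```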

class VertexTextEmbedding:

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the Vertex AI model.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:

CLASSIFICATION("classification")

CLUSTERING("clustering")

DEFAULT("default")

RETRIEVAL("retrieval")

SIMILARITY("similarity")

Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Pipelines: Sync

Sync Pipeline

Deprecated

Pipeline pipelines().sync().create(, )

POST /api/v1/pipelines/{pipeline_id}/sync

Cancel Pipeline Sync

Deprecated

Pipeline pipelines().sync().cancel(, )

POST /api/v1/pipelines/{pipeline_id}/sync/cancel

Pipelines: Data Sources

List Pipeline Data Sources

Deprecated

List<PipelineDataSource> pipelines().dataSources().getDataSources(, )

GET /api/v1/pipelines/{pipeline_id}/data-sources

Add Data Sources To Pipeline

Deprecated

List<PipelineDataSource> pipelines().dataSources().updateDataSources(, )

PUT /api/v1/pipelines/{pipeline_id}/data-sources

Update Pipeline Data Source

Deprecated

PipelineDataSource pipelines().dataSources().update(, )

PUT /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}

Get Pipeline Data Source Status

Deprecated

ManagedIngestionStatusResponse pipelines().dataSources().getStatus(, )

GET /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/status

Sync Pipeline Data Source

Deprecated

Pipeline pipelines().dataSources().sync(, )

POST /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/sync
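Each of these methods wraps a plain HTTP route, so the same endpoint can be reached without the SDK. The sketch below only builds (it does not send) a request for the data source status route using `java.net.http`. The base URL `https://api.cloud.llamaindex.ai` and bearer-token authentication are assumptions about the deployment rather than facts from this reference, and `DataSourceStatusRequest` is hypothetical. These endpoints are also marked deprecated.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

/** Hypothetical helper that builds (but does not send) a data source status request. */
final class DataSourceStatusRequest {
    private DataSourceStatusRequest() {}

    /**
     * Builds GET /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/status.
     *
     * @param baseUrl API root without a trailing slash (assumed value, see lead-in)
     * @param apiKey  sent as a bearer token (assumed authentication scheme)
     */
    static HttpRequest build(String baseUrl, String apiKey, String pipelineId, String dataSourceId) {
        // Both IDs are documented as UUIDs; parsing rejects anything else and
        // guarantees the path segments need no escaping.
        String path = "/api/v1/pipelines/" + UUID.fromString(pipelineId)
                + "/data-sources/" + UUID.fromString(dataSourceId) + "/status";
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Authorization", "Bearer " + apiKey)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("https://api.cloud.llamaindex.ai", "YOUR_API_KEY",
                "123e4567-e89b-12d3-a456-426614174000", "223e4567-e89b-12d3-a456-426614174000");
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```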

Models

class PipelineDataSource:

Schema for a data source in a pipeline.

String id

Unique identifier

format: uuid

Component component

Component that implements the data source

One of the following:

class UnionMember0:

class CloudS3DataSource:

String bucket

The name of the S3 bucket to read from.

Optional<String> awsAccessId

The AWS access ID to use for authentication.

Optional<String> awsAccessSecret

The AWS access secret to use for authentication.

format: password

Optional<String> className

Optional<String> prefix

The prefix of the S3 objects to read from.

Optional<String> regexPattern

The regex pattern to filter S3 objects. Must be a valid regex pattern.

Optional<String> s3EndpointUrl

The S3 endpoint URL to use for authentication.

Optional<Boolean> supportsAccessControl
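`prefix` and `regexPattern` together narrow which S3 objects are read. One plausible reading is sketched below, assuming the prefix is matched against the object key first and the regex is then applied with search (match-anywhere) semantics. The service's exact matching rules are not stated in this reference, and `S3KeyFilter` is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical illustration of prefix + regex filtering of S3 object keys. */
final class S3KeyFilter {
    private S3KeyFilter() {}

    static List<String> filter(List<String> keys, String prefix, String regexPattern) {
        // Pattern.compile throws PatternSyntaxException for invalid input, in line with
        // the requirement that regexPattern be a valid regex. null means "not set".
        Pattern pattern = regexPattern == null ? null : Pattern.compile(regexPattern);
        return keys.stream()
                .filter(key -> prefix == null || key.startsWith(prefix))
                .filter(key -> pattern == null || pattern.matcher(key).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of("reports/q1.pdf", "reports/q1.txt", "archive/q0.pdf");
        System.out.println(filter(keys, "reports/", "\\.pdf$"));
    }
}
```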

class CloudAzStorageBlobDataSource:

String accountUrl

The Azure Storage Blob account URL to use for authentication.

String containerName

The name of the Azure Storage Blob container to read from.

Optional<String> accountKey

The Azure Storage Blob account key to use for authentication.

format: password

Optional<String> accountName

The Azure Storage Blob account name to use for authentication.

Optional<String> blob

The blob name to read from.

Optional<String> className

Optional<String> clientId

The Azure AD client ID to use for authentication.

Optional<String> clientSecret

The Azure AD client secret to use for authentication.

format: password

Optional<String> prefix

The prefix of the Azure Storage Blob objects to read from.

Optional<Boolean> supportsAccessControl

Optional<String> tenantId

The Azure AD tenant ID to use for authentication.

class CloudGoogleDriveDataSource:

String folderId

The ID of the Google Drive folder to read from.

Optional<String> className

Optional<String> folderName

Human-readable name of the selected folder, for display.

Optional<ServiceAccountKey> serviceAccountKey

A dictionary containing secret values

Optional<Boolean> supportsAccessControl

class CloudOneDriveDataSource:

String clientId

The client ID to use for authentication.

String clientSecret

The client secret to use for authentication.

format: password

String tenantId

The tenant ID to use for authentication.

String userPrincipalName

The user principal name to use for authentication.

Optional<String> className

Optional<String> folderId

The ID of the OneDrive folder to read from.

Optional<String> folderPath

The path of the OneDrive folder to read from.

Optional<List<String>> requiredExts

The list of required file extensions.

Optional<SupportsAccessControl> supportsAccessControl

class CloudSharepointDataSource:

String clientId

The client ID to use for authentication.

String clientSecret

The client secret to use for authentication.

format: password

String tenantId

The tenant ID to use for authentication.

Optional<String> className

Optional<String> driveName

The name of the SharePoint drive to read from.

Optional<List<String>> excludePathPatterns

List of regex patterns for file paths to exclude. Files whose paths (including filename) match any pattern will be excluded. Example: ['/temp/', '/backup/', '\.git/', '\.tmp$', '^~']

Optional<String> folderId

The ID of the SharePoint folder to read from.

Optional<String> folderPath

The path of the SharePoint folder to read from.

Optional<Boolean> getPermissions

Whether to fetch permissions for the SharePoint site.

Optional<List<String>> includePathPatterns

List of regex patterns for file paths to include. Full paths (including filename) must match at least one pattern to be included. Example: ['/reports/', '/docs/.*\.pdf$', '^Report.*\.pdf$']
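The two pattern lists combine as described: a path is dropped if it matches any exclude pattern and, when include patterns are given, it is kept only if it also matches at least one of them. A sketch assuming Java regex semantics with `Matcher.find()` (a match anywhere in the path, similar to Python's `re.search`); the connector's regex dialect may differ, and `SharePointPathFilter` is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch of include/exclude path filtering (not the connector's code). */
final class SharePointPathFilter {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    SharePointPathFilter(List<String> includePathPatterns, List<String> excludePathPatterns) {
        this.includes = includePathPatterns.stream().map(Pattern::compile).collect(Collectors.toList());
        this.excludes = excludePathPatterns.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    /** Returns true if a full path (including filename) should be synced. */
    boolean accepts(String fullPath) {
        // find() matches anywhere in the path (an assumption about the service's semantics).
        boolean excluded = excludes.stream().anyMatch(p -> p.matcher(fullPath).find());
        if (excluded) {
            return false;
        }
        // With no include patterns, every non-excluded path is kept.
        return includes.isEmpty() || includes.stream().anyMatch(p -> p.matcher(fullPath).find());
    }

    public static void main(String[] args) {
        SharePointPathFilter filter = new SharePointPathFilter(
                List.of("/reports/", "\\.pdf$"),
                List.of("/temp/", "\\.tmp$"));
        for (String path : List.of("/Shared/reports/q3.docx", "/Shared/temp/q3.pdf", "/Shared/notes.txt")) {
            System.out.println(path + " -> " + filter.accepts(path));
        }
    }
}
```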

Optional<List<String>> requiredExts

The list of required file extensions.

Optional<String> siteId

The ID of the SharePoint site to download from.

Optional<String> siteName

The name of the SharePoint site to download from.

Optional<SupportsAccessControl> supportsAccessControl

class CloudSlackDataSource:

String slackToken

Slack bot token.

format: password

Optional<String> channelIds

Slack channel IDs to read from.

Optional<String> channelPatterns

Slack channel name patterns.

Optional<String> className

Optional<String> earliestDate

Earliest date.

Optional<Double> earliestDateTimestamp

Earliest date timestamp.

Optional<String> latestDate

Latest date.

Optional<Double> latestDateTimestamp

Latest date timestamp.

Optional<Boolean> supportsAccessControl

class CloudNotionPageDataSource:

String integrationToken

The integration token to use for authentication.

format: password

Optional<String> className

Optional<String> databaseIds

The Notion database IDs to read content from.

Optional<String> pageIds

The IDs of the Notion pages to read from.

Optional<Boolean> supportsAccessControl

class CloudConfluenceDataSource:

String authenticationMechanism

Type of Authentication for connecting to Confluence APIs.

String serverUrl

The server URL of the Confluence instance.

Optional<String> apiToken

The API token to use for authentication.

format: password

Optional<String> className

Optional<String> cql

The CQL query to use for fetching pages.

Optional<FailureHandlingConfig> failureHandling

Configuration for handling failures during processing: a key-value object controlling failure-handling behaviors.

Example: { "skip_list_failures": true }

Currently supports:

skip_list_failures: Skip failed batches/lists and continue processing

Optional<Boolean> skipListFailures

Whether to skip failed batches/lists and continue processing
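The `skip_list_failures` flag governs what happens when one batch of pages fails to load. A minimal sketch of the two behaviors, assuming batches are processed sequentially and a failure surfaces as an exception; `BatchRunner` and its signature are hypothetical and do not describe the connector's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of the skip_list_failures behavior. */
final class BatchRunner {
    private BatchRunner() {}

    /**
     * Loads every batch in order. With skipListFailures = true a failing batch is
     * recorded in failedBatches and skipped; with false the first failure aborts the run.
     */
    static <B, R> List<R> run(List<B> batches, Function<B, List<R>> loader,
                              boolean skipListFailures, List<B> failedBatches) {
        List<R> results = new ArrayList<>();
        for (B batch : batches) {
            try {
                results.addAll(loader.apply(batch));
            } catch (RuntimeException e) {
                if (!skipListFailures) {
                    throw e;
                }
                failedBatches.add(batch); // remember what was skipped
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> failed = new ArrayList<>();
        List<String> pages = run(List.of(1, 2, 3), n -> {
            if (n == 2) {
                throw new IllegalStateException("batch " + n + " failed");
            }
            return List.of("page-" + n);
        }, true, failed);
        System.out.println("loaded " + pages + ", skipped batches " + failed);
    }
}
```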

Optional<Boolean> indexRestrictedPages

Whether to index restricted pages.

Optional<Boolean> keepMarkdownFormat

Whether to keep the markdown format.

Optional<String> label

The label to use for fetching pages.

Optional<String> pageIds

The IDs of the Confluence pages to read from.

Optional<String> spaceKey

The space key to read from.

Optional<Boolean> supportsAccessControl

Optional<Boolean> syncPermissions

Whether to fetch space-level permissions (allowed users/groups) and attach them to document metadata for access control. Disable for Confluence Server/Data Center versions whose permission APIs are unavailable (e.g. the JSON-RPC API removed in Data Center 9.2.6+), which otherwise surface as 401 errors during sync.

Optional<String> userName

The username to use for authentication.

class CloudJiraDataSource:

Cloud Jira Data Source integrating JiraReader.

String authenticationMechanism

Type of Authentication for connecting to Jira APIs.

String query

JQL (Jira Query Language) query to search.

Optional<String> apiToken

The API or access token used for Basic, PAT, and OAuth2 authentication.

format: password

Optional<String> className

Optional<String> cloudId

The cloud ID, used in case of OAuth2.

Optional<String> email

The email address to use for authentication.

Optional<String> serverUrl

The server URL for Jira Cloud.

Optional<Boolean> supportsAccessControl

class CloudJiraDataSourceV2:

Cloud Jira Data Source integrating JiraReaderV2.

String authenticationMechanism

Type of Authentication for connecting to Jira APIs.

String query

JQL (Jira Query Language) query to search.

String serverUrl

The server URL for Jira Cloud.

Optional<String> apiToken

The API access token used for Basic, PAT, and OAuth2 authentication.

format: password

Optional<ApiVersion> apiVersion

Jira REST API version to use (2 or 3). 3 supports Atlassian Document Format (ADF).

One of the following:

_2("2")

_3("3")

Optional<String> className

Optional<String> cloudId

The cloud ID, used in case of OAuth2.

Optional<String> email

The email address to use for authentication.

Optional<String> expand

Fields to expand in the response.

Optional<List<String>> fields

List of fields to retrieve from Jira. If omitted, all fields are retrieved.

Optional<Boolean> getPermissions

Whether to fetch project role permissions and issue-level security.

Optional<Long> requestsPerMinute

Rate limit for Jira API requests per minute.
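`requestsPerMinute` caps the request rate against the Jira API. A simple way to honor such a cap is to space requests at least 60,000 / rpm milliseconds apart. The sketch below assumes evenly spaced requests with no bursting and uses a hypothetical `MinuteRateLimiter`; the connector's real throttling strategy is not documented here.

```java
/** Hypothetical fixed-interval limiter: at most requestsPerMinute calls per minute. */
final class MinuteRateLimiter {
    private final long intervalNanos;
    private long nextAllowedNanos;

    MinuteRateLimiter(long requestsPerMinute) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be positive");
        }
        this.intervalNanos = 60_000_000_000L / requestsPerMinute;
        this.nextAllowedNanos = System.nanoTime();
    }

    /** Reserves the next slot and returns how long (in nanoseconds) the caller must wait. */
    synchronized long reserve() {
        long now = System.nanoTime();
        // Compare via the difference so nanoTime wraparound cannot break the ordering.
        long wait = nextAllowedNanos - now;
        if (wait < 0) {
            wait = 0;
        }
        nextAllowedNanos = now + wait + intervalNanos;
        return wait;
    }

    /** Blocks until a request may be sent. */
    void acquire() throws InterruptedException {
        long waitNanos = reserve();
        if (waitNanos > 0) {
            Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
        }
    }

    long intervalNanos() {
        return intervalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        MinuteRateLimiter limiter = new MinuteRateLimiter(600); // one request every 100 ms
        for (int i = 0; i < 3; i++) {
            limiter.acquire();
            System.out.println("request " + i + " sent");
        }
    }
}
```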

Optional<Boolean> supportsAccessControl

class CloudBoxDataSource:

AuthenticationMechanism authenticationMechanism

The type of authentication to use (Developer Token or CCG)

One of the following:

CCG("ccg")

DEVELOPER_TOKEN("developer_token")

Optional<String> className

Optional<String> clientId

Box API key that identifies the application the user is authenticating with.

Optional<String> clientSecret

Box API secret used for making auth requests.

format: password

Optional<String> developerToken

Developer token for authentication if authentication_mechanism is 'developer_token'.

format: password

Optional<String> enterpriseId

Box Enterprise ID; if provided, authenticates as a service.

Optional<String> folderId

The ID of the Box folder to read from.

Optional<Boolean> supportsAccessControl

Optional<String> userId

Box user ID; if provided, authenticates as that user.

String dataSourceId

The ID of the data source.

format: uuid

LocalDateTime lastSyncedAt

The last time the data source was automatically synced.

format: date-time

String name

The name of the data source.

String pipelineId

The ID of the pipeline.

format: uuid

String projectId

SourceType sourceType

One of the following:

AZURE_STORAGE_BLOB("AZURE_STORAGE_BLOB")

BOX("BOX")

CONFLUENCE("CONFLUENCE")

GOOGLE_DRIVE("GOOGLE_DRIVE")

JIRA("JIRA")

JIRA_V2("JIRA_V2")

MICROSOFT_ONEDRIVE("MICROSOFT_ONEDRIVE")

MICROSOFT_SHAREPOINT("MICROSOFT_SHAREPOINT")

NOTION_PAGE("NOTION_PAGE")

S3("S3")

SLACK("SLACK")

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<CustomMetadata> customMetadata

Custom metadata that will be present on all data loaded from the data source

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<Status> status

The status of the data source in the pipeline.

One of the following:

CANCELLED("CANCELLED")

ERROR("ERROR")

IN_PROGRESS("IN_PROGRESS")

NOT_STARTED("NOT_STARTED")

SUCCESS("SUCCESS")
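`CANCELLED`, `ERROR`, and `SUCCESS` read as terminal states, while `NOT_STARTED` and `IN_PROGRESS` indicate pending work, so a client would typically poll until a terminal value appears. A sketch under that assumption: the status source is abstracted as a `Supplier`, which in practice would wrap whichever call reports the current status, and `StatusPoller` is hypothetical.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical polling helper for the data source status values listed above. */
final class StatusPoller {
    enum Status { CANCELLED, ERROR, IN_PROGRESS, NOT_STARTED, SUCCESS }

    // Assumption: these three values are final; the other two mean work is pending.
    static final Set<Status> TERMINAL = EnumSet.of(Status.CANCELLED, Status.ERROR, Status.SUCCESS);

    private StatusPoller() {}

    /** Polls until a terminal status is seen or maxAttempts reads have been made. */
    static Status awaitTerminal(Supplier<Status> current, int maxAttempts, Duration pause) {
        Status status = current.get();
        for (int attempt = 1; attempt < maxAttempts && !TERMINAL.contains(status); attempt++) {
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                return status;
            }
            status = current.get();
        }
        return status; // may still be non-terminal if attempts ran out
    }

    public static void main(String[] args) {
        Iterator<Status> updates = List.of(Status.NOT_STARTED, Status.IN_PROGRESS, Status.SUCCESS).iterator();
        System.out.println(awaitTerminal(updates::next, 10, Duration.ofMillis(10)));
    }
}
```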

Optional<LocalDateTime> statusUpdatedAt

The last time the status was updated.

format: date-time

Optional<Double> syncInterval

The interval at which the data source should be synced.

Optional<String> syncScheduleSetBy

The ID of the user who set the sync schedule.

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

Optional<DataSourceReaderVersionMetadata> versionMetadata

Version metadata for the data source

Optional<ReaderVersion> readerVersion

The version of the reader to use for this data source.

One of the following:

_1_0("1.0")

_2_0("2.0")

_2_1("2.1")

Pipelines: Images

List File Page Screenshots

List<ImageListPageScreenshotsResponse> pipelines().images().listPageScreenshots(, )

GET /api/v1/files/{id}/page_screenshots

Get File Page Screenshot

JsonValue pipelines().images().getPageScreenshot(, )

GET /api/v1/files/{id}/page_screenshots/{page_index}

Get File Page Figure

JsonValue pipelines().images().getPageFigure(, )

GET /api/v1/files/{id}/page-figures/{page_index}/{figure_name}

List File Pages Figures

List<ImageListPageFiguresResponse> pipelines().images().listPageFigures(, )

GET /api/v1/files/{id}/page-figures

Pipelines: Files

Get Pipeline File Status Counts

Deprecated

FileGetStatusCountsResponse pipelines().files().getStatusCounts(, )

GET /api/v1/pipelines/{pipeline_id}/files/status-counts

Get Pipeline File Status

Deprecated

ManagedIngestionStatusResponse pipelines().files().getStatus(, )

GET /api/v1/pipelines/{pipeline_id}/files/{file_id}/status

Add Files To Pipeline Api

Deprecated

List<PipelineFile> pipelines().files().create(, )

PUT /api/v1/pipelines/{pipeline_id}/files

Update Pipeline File

Deprecated

PipelineFile pipelines().files().update(, )

PUT /api/v1/pipelines/{pipeline_id}/files/{file_id}

Delete Pipeline File

Deprecated

pipelines().files().delete(, )

DELETE /api/v1/pipelines/{pipeline_id}/files/{file_id}

List Pipeline Files2

Deprecated

FileListPage pipelines().files().list(, )

GET /api/v1/pipelines/{pipeline_id}/files2

Models

class PipelineFile:

A file associated with a pipeline.

String id

Unique identifier for the pipeline file.

format: uuid

String pipelineId

The ID of the pipeline that the file is associated with.

format: uuid

Optional<ConfigHash> configHash

Hashes for the configuration of the pipeline.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<LocalDateTime> createdAt

When the pipeline file was created.

format: date-time

Optional<CustomMetadata> customMetadata

Custom metadata for the file.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<String> dataSourceId

The ID of the data source that the file belongs to.

format: uuid

Optional<String> externalFileId

The ID of the file in the external system.

Optional<String> fileId

The ID of the file.

format: uuid

Optional<Long> fileSize

Size of the file in bytes.

Optional<String> fileType

File type (e.g., pdf, docx).

Optional<Long> indexedPageCount

The number of pages that have been indexed for this file.

Optional<LocalDateTime> lastModifiedAt

The last modified time of the file.

format: date-time

Optional<String> name

Name of the file.

Optional<PermissionInfo> permissionInfo

Permission information for the file.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<String> projectId

The ID of the project that the file belongs to.

format: uuid

Optional<ResourceInfo> resourceInfo

Resource information for the file.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<Status> status

Status of the pipeline file.

One of the following:

CANCELLED("CANCELLED")

ERROR("ERROR")

IN_PROGRESS("IN_PROGRESS")

NOT_STARTED("NOT_STARTED")

SUCCESS("SUCCESS")

Optional<LocalDateTime> statusUpdatedAt

The last time the status was updated.

format: date-time

Optional<LocalDateTime> updatedAt

When the pipeline file was last updated.

format: date-time

Pipelines: Metadata

Import Pipeline Metadata

Deprecated

MetadataCreateResponse pipelines().metadata().create(, )

PUT /api/v1/pipelines/{pipeline_id}/metadata

Delete Pipeline Files Metadata

Deprecated

pipelines().metadata().deleteAll(, )

DELETE /api/v1/pipelines/{pipeline_id}/metadata

Pipelines: Documents

Create Batch Pipeline Documents

Deprecated

List<CloudDocument> pipelines().documents().create(, )

POST /api/v1/pipelines/{pipeline_id}/documents

Paginated List Pipeline Documents

Deprecated

DocumentListPage pipelines().documents().list(, )

GET /api/v1/pipelines/{pipeline_id}/documents/paginated

Get Pipeline Document

Deprecated

CloudDocument pipelines().documents().get(, )

GET /api/v1/pipelines/{pipeline_id}/documents/{document_id}

Delete Pipeline Document

Deprecated

pipelines().documents().delete(, )

DELETE /api/v1/pipelines/{pipeline_id}/documents/{document_id}

Get Pipeline Document Status

Deprecated

ManagedIngestionStatusResponse pipelines().documents().getStatus(, )

GET /api/v1/pipelines/{pipeline_id}/documents/{document_id}/status

Sync Pipeline Document

Deprecated

JsonValue pipelines().documents().sync(, )

POST /api/v1/pipelines/{pipeline_id}/documents/{document_id}/sync

List Pipeline Document Chunks

Deprecated

List<TextNode> pipelines().documents().getChunks(, )

GET /api/v1/pipelines/{pipeline_id}/documents/{document_id}/chunks

Upsert Batch Pipeline Documents

Deprecated

List<CloudDocument> pipelines().documents().upsert(, )

PUT /api/v1/pipelines/{pipeline_id}/documents

Models

class CloudDocument:

Cloud document stored in S3.

String id

Metadata metadata

String text

Optional<List<String>> excludedEmbedMetadataKeys

Optional<List<String>> excludedLlmMetadataKeys

Optional<List<Long>> pagePositions

Indices in CloudDocument.text where each new page begins; e.g., the second page starts at the index given by page_positions[1].
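Because `page_positions` holds the starting offset of each page in ascending order, the page containing a given character offset can be found with a binary search. A sketch assuming 0-based page indices; `PageLocator` is hypothetical.

```java
import java.util.List;

/** Hypothetical helper: maps a character offset in CloudDocument.text to a page index. */
final class PageLocator {
    private PageLocator() {}

    /**
     * Returns the 0-based index of the page containing charIndex, i.e. the largest i
     * with pagePositions.get(i) <= charIndex, or -1 if charIndex precedes the first page.
     */
    static int pageOf(List<Long> pagePositions, long charIndex) {
        int lo = 0;
        int hi = pagePositions.size() - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (pagePositions.get(mid) <= charIndex) {
                found = mid; // page mid starts at or before charIndex; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Long> positions = List.of(0L, 120L, 305L);
        System.out.println("offset 200 is on page index " + pageOf(positions, 200));
    }
}
```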

Optional<StatusMetadata> statusMetadata

class CloudDocumentCreate:

Create a new cloud document.

Metadata metadata

String text

Optional<String> id

Optional<List<String>> excludedEmbedMetadataKeys

Optional<List<String>> excludedLlmMetadataKeys

Optional<List<Long>> pagePositions

Indices in CloudDocument.text where each new page begins; e.g., the second page starts at the index given by page_positions[1].

class TextNode:

Provided for backward compatibility.

Optional<String> className

Optional<List<Double>> embedding

Embedding of the node.

Optional<Long> endCharIdx

End char index of the node.

Optional<List<String>> excludedEmbedMetadataKeys

Metadata keys that are excluded from text for the embed model.

Optional<List<String>> excludedLlmMetadataKeys

Metadata keys that are excluded from text for the LLM.

Optional<ExtraInfo> extraInfo

A flat dictionary of metadata fields

Optional<String> id

Unique ID of the node.

Optional<String> metadataSeperator

Separator between metadata fields when converting to string.

Optional<String> metadataTemplate

Template for how metadata is formatted, with {key} and {value} placeholders.

Optional<String> mimetype

MIME type of the node content.

Optional<Relationships> relationships

A mapping of relationships to other node information.

One of the following:

class RelatedNodeInfo:

String nodeId

Optional<String> className

Optional<String> hash

Optional<Metadata> metadata

Optional<NodeType> nodeType

One of the following:

_1("1")

_2("2")

_3("3")

_4("4")

_5("5")

List<RelatedNodeInfo>

String nodeId

Optional<String> className

Optional<String> hash

Optional<Metadata> metadata

Optional<NodeType> nodeType

One of the following:

_1("1")

_2("2")

_3("3")

_4("4")

_5("5")

Optional<Long> startCharIdx

Start char index of the node.

Optional<String> text

Text content of the node.

Optional<String> textTemplate

Template for how text is formatted, with {content} and {metadata_str} placeholders.
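The template fields describe how a node's text is rendered for the embed model or the LLM. Each metadata entry not in the relevant excluded-keys list is formatted with `metadataTemplate`, the entries are joined with `metadataSeperator` (that spelling is the API's own field name), and the result is substituted into `textTemplate`. A sketch of that rendering follows. The default templates (`"{key}: {value}"`, `"\n"`, `"{metadata_str}\n\n{content}"`) and the rule that an empty metadata string returns the bare text are assumed from LlamaIndex conventions, and `NodeRenderer` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical renderer mirroring the TextNode template fields described above. */
final class NodeRenderer {
    static final String DEFAULT_METADATA_TEMPLATE = "{key}: {value}";           // assumed default
    static final String DEFAULT_SEPARATOR = "\n";                                // assumed default
    static final String DEFAULT_TEXT_TEMPLATE = "{metadata_str}\n\n{content}";   // assumed default

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    private NodeRenderer() {}

    static String render(String text, Map<String, ?> metadata, List<String> excludedKeys,
                         String metadataTemplate, String metadataSeperator, String textTemplate) {
        String metadataStr = metadata.entrySet().stream()
                .filter(e -> !excludedKeys.contains(e.getKey()))
                .map(e -> fill(metadataTemplate, Map.of(
                        "key", String.valueOf(e.getKey()),
                        "value", String.valueOf(e.getValue()))))
                .collect(Collectors.joining(metadataSeperator));
        if (metadataStr.isEmpty()) {
            return text; // nothing to prepend
        }
        return fill(textTemplate, Map.of("metadata_str", metadataStr, "content", text));
    }

    /** Replaces {name} placeholders in one pass; unknown placeholders are left unchanged. */
    private static String fill(String template, Map<String, String> values) {
        // A single pass means substituted values are never re-scanned for placeholders.
        return PLACEHOLDER.matcher(template).replaceAll(m ->
                Matcher.quoteReplacement(values.getOrDefault(m.group(1), m.group())));
    }

    public static void main(String[] args) {
        System.out.println(render("Body text", Map.of("file_name", "a.pdf"), List.of(),
                DEFAULT_METADATA_TEMPLATE, DEFAULT_SEPARATOR, DEFAULT_TEXT_TEMPLATE));
    }
}
```

For embedding, the excluded list would be `excludedEmbedMetadataKeys`; for LLM context, `excludedLlmMetadataKeys`.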