Pipelines

Search Pipelines
List<Pipeline> pipelines().list(PipelineListParams params = PipelineListParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines
Create Pipeline
Pipeline pipelines().create(PipelineCreateParams params, RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/pipelines
Get Pipeline
Pipeline pipelines().get(PipelineGetParams params = PipelineGetParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}
Update Existing Pipeline
Pipeline pipelines().update(PipelineUpdateParams params = PipelineUpdateParams.none(), RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines/{pipeline_id}
Delete Pipeline
pipelines().delete(PipelineDeleteParams params = PipelineDeleteParams.none(), RequestOptions requestOptions = RequestOptions.none())
DELETE /api/v1/pipelines/{pipeline_id}
Get Pipeline Status
ManagedIngestionStatusResponse pipelines().getStatus(PipelineGetStatusParams params = PipelineGetStatusParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/status
Upsert Pipeline
Pipeline pipelines().upsert(PipelineUpsertParams params, RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines
Run Search
PipelineRetrieveResponse pipelines().retrieve(PipelineRetrieveParams params, RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/pipelines/{pipeline_id}/retrieve
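
The routes above can also be reached without the SDK. The following is a minimal sketch that builds (but does not send) two of the requests with the JDK's java.net.http API. The base URL, the bearer-token header, and the class name PipelineRoutes are assumptions made for illustration; none of them is confirmed by this reference, and the request body is left to the caller because the retrieve parameters are not listed here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class PipelineRoutes {
    // Hypothetical base URL; replace with the real API host.
    static final String BASE_URL = "https://api.example.com";

    // GET /api/v1/pipelines/{pipeline_id}/status
    static HttpRequest getStatus(String pipelineId, String apiKey) {
        return HttpRequest.newBuilder(
                URI.create(BASE_URL + "/api/v1/pipelines/" + pipelineId + "/status"))
            // Assumption: bearer-token authentication.
            .header("Authorization", "Bearer " + apiKey)
            .GET()
            .build();
    }

    // POST /api/v1/pipelines/{pipeline_id}/retrieve
    static HttpRequest retrieve(String pipelineId, String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(
                URI.create(BASE_URL + "/api/v1/pipelines/" + pipelineId + "/retrieve"))
            .header("Authorization", "Bearer " + apiKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }
}
```

Passing either request to HttpClient.send would perform the call. The SDK methods listed above remain the more convenient path when the SDK is available.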
Models
class AdvancedModeTransformConfig:
Optional<ChunkingConfig> chunkingConfig

Configuration for the chunking.

One of the following:
class NoneChunkingConfig:
Optional<Mode> mode
class CharacterChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
class TokenChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
Optional<String> separator
class SentenceChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
Optional<String> paragraphSeparator
Optional<String> separator
class SemanticChunkingConfig:
Optional<Long> breakpointPercentileThreshold
Optional<Long> bufferSize
Optional<Mode> mode
Optional<Mode> mode
Optional<SegmentationConfig> segmentationConfig

Configuration for the segmentation.

One of the following:
class NoneSegmentationConfig:
Optional<Mode> mode
class PageSegmentationConfig:
Optional<Mode> mode
Optional<String> pageSeparator
class ElementSegmentationConfig:
Optional<Mode> mode
class AutoTransformConfig:
Optional<Long> chunkOverlap

Chunk overlap for the transformation.

Optional<Long> chunkSize

Chunk size for the transformation.

exclusiveMinimum: 0
Optional<Mode> mode
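
To make the chunkSize and chunkOverlap fields concrete, here is a minimal sketch of character-based chunking with overlap. It is an illustration of the general technique only: the service's actual chunker is not described in this reference, and the class name CharacterChunker and the rule that the overlap must be smaller than the chunk size are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class CharacterChunker {
    // Splits text into windows of chunkSize characters, where each window
    // repeats the last chunkOverlap characters of the previous one.
    static List<String> chunk(String text, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be greater than 0");
        }
        // Assumption: overlap must be smaller than the chunk size so the window advances.
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("chunkOverlap must be in [0, chunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - chunkOverlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With chunkSize 4 and chunkOverlap 1, the window advances three characters at a time, so consecutive chunks share one character.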
class AzureOpenAIEmbedding:
Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Long> maxRetries

Maximum number of retries.

minimum: 0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0
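
The numeric constraints listed for embedBatchSize, maxRetries, and timeout follow JSON Schema conventions: an exclusive minimum of 0 means the value must be strictly greater than 0, while a minimum of 0 allows 0 itself. The sketch below checks those bounds on the client side and splits a list of texts into batches of embedBatchSize. The class name EmbedBatching and its methods are hypothetical helpers, not part of the SDK, and the server may validate differently.

```java
import java.util.ArrayList;
import java.util.List;

final class EmbedBatching {
    static final long MAX_BATCH_SIZE = 2048; // documented maximum for embedBatchSize

    // Mirrors the documented bounds: embedBatchSize in (0, 2048], maxRetries >= 0, timeout >= 0.
    static void validate(long embedBatchSize, long maxRetries, double timeout) {
        if (embedBatchSize <= 0 || embedBatchSize > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException("embedBatchSize must be in (0, 2048]");
        }
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        if (timeout < 0) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
    }

    // Splits texts into consecutive batches of at most embedBatchSize items.
    static List<List<String>> batches(List<String> texts, int embedBatchSize) {
        validate(embedBatchSize, 0, 0.0);
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < texts.size(); i += embedBatchSize) {
            int end = Math.min(i + embedBatchSize, texts.size());
            out.add(new ArrayList<>(texts.subList(i, end)));
        }
        return out;
    }
}
```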
class AzureOpenAIEmbeddingConfig:
Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Long> maxRetries

Maximum number of retries.

minimum: 0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0
Optional<Type> type

Type of the embedding model.

class BedrockEmbedding:
Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0
Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

class BedrockEmbeddingConfig:
Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0
Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

class CohereEmbedding:
Optional<String> apiKey

The Cohere API key.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model Input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

class CohereEmbeddingConfig:
Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model Input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class DataSinkCreate:

Schema for creating a data sink.

Component component

Component that implements the data sink

One of the following:
class UnionMember0:
class CloudPineconeVectorStore:

Cloud Pinecone Vector Store.

This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.

Args:
  api_key (str): API key for authenticating with Pinecone
  index_name (str): name of the Pinecone index
  namespace (optional[str]): namespace to use in the Pinecone index
  insert_kwargs (optional[dict]): additional kwargs to pass during insertion

String apiKey

The API key for authenticating with Pinecone

format: password
String indexName
Optional<String> className
Optional<InsertKwargs> insertKwargs
Optional<String> namespace
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
class CloudPostgresVectorStore:
String database
long embedDim
String host
String password
long port
String schemaName
String tableName
String user
Optional<String> className
Optional<PgVectorHnswSettings> hnswSettings

HNSW settings for PGVector.

Optional<DistanceMethod> distanceMethod

The distance method to use.

One of the following:
L2("l2")
IP("ip")
COSINE("cosine")
L1("l1")
HAMMING("hamming")
JACCARD("jaccard")
Optional<Long> efConstruction

The number of edges to use during the construction phase.

minimum: 1
Optional<Long> efSearch

The number of edges to use during the search phase.

minimum: 1
Optional<Long> m

The number of bi-directional links created for each new element.

minimum: 1
Optional<VectorType> vectorType

The type of vector to use.

One of the following:
VECTOR("vector")
HALF_VEC("half_vec")
BIT("bit")
SPARSE_VEC("sparse_vec")
Optional<Boolean> performSetup
Optional<Boolean> supportsNestedMetadataFilters
class CloudQdrantVectorStore:

Cloud Qdrant Vector Store.

This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.

Args:
  collection_name (str): name of the Qdrant collection
  url (str): url of the Qdrant instance
  api_key (str): API key for authenticating with Qdrant
  max_retries (int): maximum number of retries in case of a failure. Defaults to 3
  client_kwargs (dict): additional kwargs to pass to the Qdrant client

String apiKey
String collectionName
String url
Optional<String> className
Optional<ClientKwargs> clientKwargs
Optional<Long> maxRetries
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
class CloudAzureAiSearchVectorStore:

Cloud Azure AI Search Vector Store.

String searchServiceApiKey
String searchServiceEndpoint
Optional<String> className
Optional<String> clientId
Optional<String> clientSecret
Optional<Long> embeddingDimension
Optional<FilterableMetadataFieldKeys> filterableMetadataFieldKeys
Optional<String> indexName
Optional<String> searchServiceApiVersion
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
Optional<String> tenantId

Cloud MongoDB Atlas Vector Store.

This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.

Args:
  mongodb_uri (str): URI for connecting to MongoDB Atlas
  db_name (str): name of the MongoDB database
  collection_name (str): name of the MongoDB collection
  vector_index_name (str): name of the MongoDB Atlas vector index
  fulltext_index_name (str): name of the MongoDB Atlas full-text index

class CloudMilvusVectorStore:

Cloud Milvus Vector Store.

String uri
Optional<String> token
Optional<String> className
Optional<String> collectionName
Optional<Long> embeddingDimension
Optional<Boolean> supportsNestedMetadataFilters
class CloudAstraDbVectorStore:

Cloud AstraDB Vector Store.

This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.

Args:
  token (str): The Astra DB Application Token to use.
  api_endpoint (str): The Astra DB JSON API endpoint for your database.
  collection_name (str): Collection name to use. If not existing, it will be created.
  embedding_dimension (int): Length of the embedding vectors in use.
  keyspace (optional[str]): The keyspace to use. If not provided, ‘default_keyspace’ is used.

String token

The Astra DB Application Token to use

format: password
String apiEndpoint

The Astra DB JSON API endpoint for your database

String collectionName

Collection name to use. If not existing, it will be created

long embeddingDimension

Length of the embedding vectors in use

Optional<String> className
Optional<String> keyspace

The keyspace to use. If not provided, ‘default_keyspace’ is used.

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
String name

The name of the data sink.

SinkType sinkType
One of the following:
PINECONE("PINECONE")
POSTGRES("POSTGRES")
QDRANT("QDRANT")
AZUREAI_SEARCH("AZUREAI_SEARCH")
MONGODB_ATLAS("MONGODB_ATLAS")
MILVUS("MILVUS")
ASTRA_DB("ASTRA_DB")
class GeminiEmbedding:
Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

class GeminiEmbeddingConfig:
Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbedding:
Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:
String
boolean
Optional<String> className
Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:
CLS("cls")
MEAN("mean")
LAST("last")
Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left as default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.

class HuggingFaceInferenceApiEmbeddingConfig:
Optional<HuggingFaceInferenceApiEmbedding> component

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:
String
boolean
Optional<String> className
Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:
CLS("cls")
MEAN("mean")
LAST("last")
Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left as default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.

Optional<Type> type

Type of the embedding model.

class LlamaParseParameters:
Optional<Boolean> adaptiveLongTable
Optional<Boolean> aggressiveTableExtraction
Optional<Boolean> autoMode
Optional<String> autoModeConfigurationJson
Optional<Boolean> autoModeTriggerOnImageInPage
Optional<String> autoModeTriggerOnRegexpInPage
Optional<Boolean> autoModeTriggerOnTableInPage
Optional<String> autoModeTriggerOnTextInPage
Optional<String> azureOpenAIApiVersion
Optional<String> azureOpenAIDeploymentName
Optional<String> azureOpenAIEndpoint
Optional<String> azureOpenAIKey
Optional<Double> bboxBottom
Optional<Double> bboxLeft
Optional<Double> bboxRight
Optional<Double> bboxTop
Optional<String> boundingBox
Optional<Boolean> compactMarkdownTable
Optional<String> complementalFormattingInstruction
Optional<String> contentGuidelineInstruction
Optional<Boolean> continuousMode
Optional<Boolean> disableImageExtraction
Optional<Boolean> disableOcr
Optional<Boolean> disableReconstruction
Optional<Boolean> doNotCache
Optional<Boolean> doNotUnrollColumns
Optional<Boolean> enableCostOptimizer
Optional<Boolean> extractCharts
Optional<Boolean> extractLayout
Optional<Boolean> extractPrintedPageNumber
Optional<Boolean> fastMode
Optional<String> formattingInstruction
Optional<String> gpt4oApiKey
Optional<Boolean> gpt4oMode
Optional<Boolean> guessXlsxSheetName
Optional<Boolean> hideFooters
Optional<Boolean> hideHeaders
Optional<Boolean> highResOcr
Optional<Boolean> htmlMakeAllElementsVisible
Optional<Boolean> htmlRemoveFixedElements
Optional<Boolean> htmlRemoveNavigationElements
Optional<String> httpProxy
Optional<Boolean> ignoreDocumentElementsForLayoutDetection
Optional<List<ImagesToSave>> imagesToSave
One of the following:
SCREENSHOT("screenshot")
EMBEDDED("embedded")
LAYOUT("layout")
Optional<Boolean> inlineImagesInMarkdown
Optional<String> inputS3Path
Optional<String> inputS3Region
Optional<String> inputUrl
Optional<Boolean> internalIsScreenshotJob
Optional<Boolean> invalidateCache
Optional<Boolean> isFormattingInstruction
Optional<Double> jobTimeoutExtraTimePerPageInSeconds
Optional<Double> jobTimeoutInSeconds
Optional<Boolean> keepPageSeparatorWhenMergingTables
Optional<List<ParsingLanguages>> languages
One of the following:
AF("af")
AZ("az")
BS("bs")
CS("cs")
CY("cy")
DA("da")
DE("de")
EN("en")
ES("es")
ET("et")
FR("fr")
GA("ga")
HR("hr")
HU("hu")
ID("id")
IS("is")
IT("it")
KU("ku")
LA("la")
LT("lt")
LV("lv")
MI("mi")
MS("ms")
MT("mt")
NL("nl")
NO("no")
OC("oc")
PI("pi")
PL("pl")
PT("pt")
RO("ro")
RS_LATIN("rs_latin")
SK("sk")
SL("sl")
SQ("sq")
SV("sv")
SW("sw")
TL("tl")
TR("tr")
UZ("uz")
VI("vi")
AR("ar")
FA("fa")
UG("ug")
UR("ur")
BN("bn")
AS("as")
MNI("mni")
RU("ru")
RS_CYRILLIC("rs_cyrillic")
BE("be")
BG("bg")
UK("uk")
MN("mn")
ABQ("abq")
ADY("ady")
KBD("kbd")
AVA("ava")
DAR("dar")
INH("inh")
CHE("che")
LBE("lbe")
LEZ("lez")
TAB("tab")
TJK("tjk")
HI("hi")
MR("mr")
NE("ne")
BH("bh")
MAI("mai")
ANG("ang")
BHO("bho")
MAH("mah")
SCK("sck")
NEW("new")
GOM("gom")
SA("sa")
BGC("bgc")
TH("th")
CH_SIM("ch_sim")
CH_TRA("ch_tra")
JA("ja")
KO("ko")
TA("ta")
TE("te")
KN("kn")
Optional<Boolean> layoutAware
Optional<Boolean> lineLevelBoundingBox
Optional<String> markdownTableMultilineHeaderSeparator
Optional<Long> maxPages
Optional<Long> maxPagesEnforced
Optional<Boolean> mergeTablesAcrossPagesInMarkdown
Optional<String> model
Optional<Boolean> outlinedTableExtraction
Optional<Boolean> outputPdfOfDocument
Optional<String> outputS3PathPrefix
Optional<String> outputS3Region
Optional<Boolean> outputTablesAsHtml
Optional<Double> pageErrorTolerance
Optional<String> pageHeaderPrefix
Optional<String> pageHeaderSuffix
Optional<String> pagePrefix
Optional<String> pageSeparator
Optional<String> pageSuffix
Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:
PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")
PARSE_PAGE_WITH_LLM("parse_page_with_llm")
PARSE_PAGE_WITH_LVM("parse_page_with_lvm")
PARSE_PAGE_WITH_AGENT("parse_page_with_agent")
PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")
PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")
PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")
PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")
Optional<String> parsingInstruction
Optional<Boolean> preciseBoundingBox
Optional<Boolean> premiumMode
Optional<Boolean> presentationOutOfBoundsContent
Optional<Boolean> presentationSkipEmbeddedData
Optional<Boolean> preserveLayoutAlignmentAcrossPages
Optional<Boolean> preserveVerySmallText
Optional<String> preset
Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:
LOW("low")
MEDIUM("medium")
HIGH("high")
CRITICAL("critical")
Optional<String> projectId
Optional<Boolean> removeHiddenText
Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:
RAW_TEXT("raw_text")
BLANK_PAGE("blank_page")
ERROR_MESSAGE("error_message")
Optional<String> replaceFailedPageWithErrorMessagePrefix
Optional<String> replaceFailedPageWithErrorMessageSuffix
Optional<Boolean> saveImages
Optional<Boolean> skipDiagonalText
Optional<Boolean> specializedChartParsingAgentic
Optional<Boolean> specializedChartParsingEfficient
Optional<Boolean> specializedChartParsingPlus
Optional<Boolean> specializedImageParsing
Optional<Boolean> spreadsheetExtractSubTables
Optional<Boolean> spreadsheetForceFormulaComputation
Optional<Boolean> spreadsheetIncludeHiddenSheets
Optional<Boolean> strictModeBuggyFont
Optional<Boolean> strictModeImageExtraction
Optional<Boolean> strictModeImageOcr
Optional<Boolean> strictModeReconstruction
Optional<Boolean> structuredOutput
Optional<String> structuredOutputJsonSchema
Optional<String> structuredOutputJsonSchemaName
Optional<String> systemPrompt
Optional<String> systemPromptAppend
Optional<Boolean> takeScreenshot
Optional<String> targetPages
Optional<String> tier
Optional<Boolean> useVendorMultimodalModel
Optional<String> userPrompt
Optional<String> vendorMultimodalApiKey
Optional<String> vendorMultimodalModelName
Optional<String> version
Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:
EXTRACT_PENDING("extract.pending")
EXTRACT_SUCCESS("extract.success")
EXTRACT_ERROR("extract.error")
EXTRACT_PARTIAL_SUCCESS("extract.partial_success")
EXTRACT_CANCELLED("extract.cancelled")
PARSE_PENDING("parse.pending")
PARSE_RUNNING("parse.running")
PARSE_SUCCESS("parse.success")
PARSE_ERROR("parse.error")
PARSE_PARTIAL_SUCCESS("parse.partial_success")
PARSE_CANCELLED("parse.cancelled")
CLASSIFY_PENDING("classify.pending")
CLASSIFY_RUNNING("classify.running")
CLASSIFY_SUCCESS("classify.success")
CLASSIFY_ERROR("classify.error")
CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")
CLASSIFY_CANCELLED("classify.cancelled")
SHEETS_PENDING("sheets.pending")
SHEETS_SUCCESS("sheets.success")
SHEETS_ERROR("sheets.error")
SHEETS_PARTIAL_SUCCESS("sheets.partial_success")
SHEETS_CANCELLED("sheets.cancelled")
UNMAPPED_EVENT("unmapped_event")
Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookUrl

URL to receive webhook POST notifications

Optional<String> webhookUrl
class LlmParameters:
Optional<String> className
Optional<ModelName> modelName

The name of the model to use for LLM completions.

One of the following:
GPT_4_O("GPT_4O")
GPT_4_O_MINI("GPT_4O_MINI")
GPT_4_1("GPT_4_1")
GPT_4_1_NANO("GPT_4_1_NANO")
GPT_4_1_MINI("GPT_4_1_MINI")
AZURE_OPENAI_GPT_4_O("AZURE_OPENAI_GPT_4O")
AZURE_OPENAI_GPT_4_O_MINI("AZURE_OPENAI_GPT_4O_MINI")
AZURE_OPENAI_GPT_4_1("AZURE_OPENAI_GPT_4_1")
AZURE_OPENAI_GPT_4_1_MINI("AZURE_OPENAI_GPT_4_1_MINI")
AZURE_OPENAI_GPT_4_1_NANO("AZURE_OPENAI_GPT_4_1_NANO")
CLAUDE_4_5_SONNET("CLAUDE_4_5_SONNET")
BEDROCK_CLAUDE_3_5_SONNET_V1("BEDROCK_CLAUDE_3_5_SONNET_V1")
BEDROCK_CLAUDE_3_5_SONNET_V2("BEDROCK_CLAUDE_3_5_SONNET_V2")
Optional<String> systemPrompt

The system prompt to use for the completion.

maxLength: 3000
Optional<Double> temperature

The temperature value for the model.

Optional<Boolean> useChainOfThoughtReasoning

Whether to use chain of thought reasoning.

Optional<Boolean> useCitation

Whether to show citations in the response.

class ManagedIngestionStatusResponse:
Status status

Status of the ingestion.

One of the following:
NOT_STARTED("NOT_STARTED")
IN_PROGRESS("IN_PROGRESS")
SUCCESS("SUCCESS")
ERROR("ERROR")
PARTIAL_SUCCESS("PARTIAL_SUCCESS")
CANCELLED("CANCELLED")
Optional<LocalDateTime> deploymentDate

Date of the deployment.

format: date-time
Optional<LocalDateTime> effectiveAt

When the status is effective

format: date-time
Optional<List<Error>> error

List of errors that occurred during ingestion.

String jobId

ID of the job that failed.

format: uuid
String message

List of errors that occurred during ingestion.

Step step

Name of the job that failed.

One of the following:
MANAGED_INGESTION("MANAGED_INGESTION")
DATA_SOURCE("DATA_SOURCE")
FILE_UPDATER("FILE_UPDATER")
PARSE("PARSE")
TRANSFORM("TRANSFORM")
INGESTION("INGESTION")
METADATA_UPDATE("METADATA_UPDATE")
Optional<String> jobId

ID of the latest job.

format: uuid
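
A common use of the status values above is to poll until ingestion stops changing. The sketch below is self-contained and does not call the SDK: the status is supplied by a caller-provided function, and the local Status enum only mirrors the values listed above. Treating SUCCESS, ERROR, PARTIAL_SUCCESS, and CANCELLED as terminal is an assumption, and IngestionPoller is a hypothetical helper name.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

final class IngestionPoller {
    // Local mirror of the documented status values.
    enum Status { NOT_STARTED, IN_PROGRESS, SUCCESS, ERROR, PARTIAL_SUCCESS, CANCELLED }

    // Assumption: these states are final and will not change on later polls.
    static final Set<Status> TERMINAL =
        EnumSet.of(Status.SUCCESS, Status.ERROR, Status.PARTIAL_SUCCESS, Status.CANCELLED);

    // Calls fetchStatus up to maxAttempts times, sleeping delayMillis between calls.
    // Returns the first terminal status seen, or the last observed status otherwise.
    static Status waitForTerminal(Supplier<Status> fetchStatus, int maxAttempts, long delayMillis) {
        Status last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            last = fetchStatus.get();
            if (TERMINAL.contains(last)) {
                return last;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return last;
            }
        }
        return last;
    }
}
```

In real use, fetchStatus would wrap a call to the status endpoint and map the response's status field to the enum.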
enum MessageRole:

Message role.

SYSTEM("system")
DEVELOPER("developer")
USER("user")
ASSISTANT("assistant")
FUNCTION("function")
TOOL("tool")
CHATBOT("chatbot")
MODEL("model")
class MetadataFilters:

Metadata filters for vector stores.

List<Filter> filters
One of the following:
class MetadataFilter:

Comprehensive metadata filter for vector stores to support more operators.

Value uses Strict types, as int, float and str are compatible types and were all converted to string before.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

String key
Optional<Value> value
One of the following:
double
String
List<String>
List<double>
List<long>
Optional<Operator> operator

Vector store filter operator.

One of the following:
EQUALS("==")
GREATER(">")
LESS("<")
NOT_EQUALS("!=")
GREATER_OR_EQUALS(">=")
LESS_OR_EQUALS("<=")
IN("in")
NIN("nin")
ANY("any")
ALL("all")
TEXT_MATCH("text_match")
TEXT_MATCH_INSENSITIVE("text_match_insensitive")
CONTAINS("contains")
IS_EMPTY("is_empty")
MetadataFilters
Optional<Condition> condition

Vector store filter conditions to combine different filters.

One of the following:
AND("and")
OR("or")
NOT("not")
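
To illustrate how a key, an operator, and a value combine under a condition, the following sketch evaluates a few of the operators above against an in-memory metadata map. It is not the vector store's implementation: only ==, !=, >, <, and in are covered, only the and/or conditions are handled, a missing key is treated as a non-match, and all class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class MetadataFilterSketch {
    enum Operator { EQUALS, NOT_EQUALS, GREATER, LESS, IN }
    enum Condition { AND, OR }

    static final class Filter {
        final String key;
        final Operator operator;
        final Object value;

        Filter(String key, Operator operator, Object value) {
            this.key = key;
            this.operator = operator;
            this.value = value;
        }
    }

    // Evaluates one filter; a missing metadata key never matches (an assumption).
    static boolean matches(Map<String, Object> metadata, Filter f) {
        Object actual = metadata.get(f.key);
        if (actual == null) {
            return false;
        }
        switch (f.operator) {
            case EQUALS: return actual.equals(f.value);
            case NOT_EQUALS: return !actual.equals(f.value);
            case GREATER: return compare(actual, f.value) > 0;
            case LESS: return compare(actual, f.value) < 0;
            case IN: return f.value instanceof List && ((List<?>) f.value).contains(actual);
            default: throw new IllegalStateException("unsupported operator");
        }
    }

    // Numeric comparison; returns 0 (no match for > or <) when either side is not a number.
    private static int compare(Object actual, Object expected) {
        if (!(actual instanceof Number) || !(expected instanceof Number)) {
            return 0;
        }
        return Double.compare(((Number) actual).doubleValue(), ((Number) expected).doubleValue());
    }

    // Combines the per-filter results with the given condition.
    static boolean matchesAll(Map<String, Object> metadata, List<Filter> filters, Condition condition) {
        if (condition == Condition.AND) {
            return filters.stream().allMatch(f -> matches(metadata, f));
        }
        return filters.stream().anyMatch(f -> matches(metadata, f));
    }
}
```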
class OpenAIEmbedding:
Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Long> maxRetries

Maximum number of retries.

minimum: 0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0
class OpenAIEmbeddingConfig:
Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum: 2048
exclusiveMinimum: 0
Optional<Long> maxRetries

Maximum number of retries.

minimum: 0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0
Optional<Type> type

Type of the embedding model.

class PageFigureNodeWithScore:

Page figure metadata with score

Node node
double confidence

The confidence of the figure

maximum: 1
minimum: 0
String figureName

The name of the figure

long figureSize

The size of the figure in bytes

minimum: 0
String fileId

The ID of the file that the figure was taken from

formatuuid
long pageIndex

The index of the page from which the figure is taken (0-indexed)

minimum0
Optional<Boolean> isLikelyNoise

Whether the figure is likely to be noise

Optional<Metadata> metadata

Metadata for the figure

double score

The score of the figure node

Optional<String> className
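
The confidence, isLikelyNoise, and score fields above lend themselves to client-side filtering of retrieved figures. The sketch below assumes a simplified stand-in type, `FigureHit`, that carries only those fields; it is hypothetical and not the SDK's `PageFigureNodeWithScore`. Treating a missing isLikelyNoise value as "not noise" is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a retrieved figure; not the SDK type. */
final class FigureHit {
    final String figureName;
    final double confidence;     // documented range: 0 to 1
    final Boolean isLikelyNoise; // optional in the schema, so it may be null
    final double score;

    FigureHit(String figureName, double confidence, Boolean isLikelyNoise, double score) {
        this.figureName = figureName;
        this.confidence = confidence;
        this.isLikelyNoise = isLikelyNoise;
        this.score = score;
    }
}

final class FigureFilter {
    private FigureFilter() {}

    /** Drops likely-noise and low-confidence figures, then sorts by score, highest first. */
    static List<FigureHit> keepUseful(List<FigureHit> hits, double minConfidence) {
        List<FigureHit> kept = new ArrayList<>();
        for (FigureHit h : hits) {
            boolean noisy = Boolean.TRUE.equals(h.isLikelyNoise); // null counts as not noisy
            if (!noisy && h.confidence >= minConfidence) {
                kept.add(h);
            }
        }
        kept.sort((a, b) -> Double.compare(b.score, a.score));
        return kept;
    }
}
```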
class PageScreenshotNodeWithScore:

Page screenshot metadata with score

Node node
String fileId

The ID of the file that the page screenshot was taken from

formatuuid
long imageSize

The size of the image in bytes

minimum0
long pageIndex

The index of the page from which the screenshot is taken (0-indexed)

minimum0
Optional<Metadata> metadata

Metadata for the screenshot

double score

The score of the screenshot node

Optional<String> className
class Pipeline:

Schema for a pipeline.

String id

Unique identifier

formatuuid
EmbeddingConfig embeddingConfig
One of the following:
class ManagedOpenAIEmbedding:
Optional<Component> component

Configuration for the Managed OpenAI embedding model.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<ModelName> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

class AzureOpenAIEmbeddingConfig:
Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

Maximum number of retries.

minimum0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Whether to reuse the OpenAI client between requests. For large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum0
Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:
Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model Input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class GeminiEmbeddingConfig:
Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:
String
boolean
Optional<String> className
Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:
CLS("cls")
MEAN("mean")
LAST("last")
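
The listing only names the three pooling options. Under the usual reading (cls takes the first token's vector, mean averages all token vectors, last takes the final token's vector), which is assumed here rather than confirmed by this reference, the behaviors can be sketched as follows. `Pooling.pool` is a hypothetical helper, not an SDK method.

```java
/** Hypothetical illustration of the three pooling modes; not an SDK class. */
final class Pooling {
    private Pooling() {}

    /** Reduces per-token vectors (rows of equal length) to one vector using "cls", "mean", or "last". */
    static double[] pool(double[][] tokens, String mode) {
        if (tokens.length == 0) {
            throw new IllegalArgumentException("no token vectors");
        }
        switch (mode) {
            case "cls":
                return tokens[0].clone(); // first token
            case "last":
                return tokens[tokens.length - 1].clone(); // final token
            case "mean": {
                double[] out = new double[tokens[0].length];
                for (double[] t : tokens) {
                    for (int i = 0; i < out.length; i++) {
                        out[i] += t[i];
                    }
                }
                for (int i = 0; i < out.length; i++) {
                    out[i] /= tokens.length; // element-wise average
                }
                return out;
            }
            default:
                throw new IllegalArgumentException("unknown pooling: " + mode);
        }
    }
}
```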
Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left as default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.

Optional<Type> type

Type of the embedding model.

class OpenAIEmbeddingConfig:
Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

Maximum number of retries.

minimum0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Whether to reuse the OpenAI client between requests. For large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum0
Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:
Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for Vertex AI.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:
DEFAULT("default")
CLASSIFICATION("classification")
CLUSTERING("clustering")
SIMILARITY("similarity")
RETRIEVAL("retrieval")
Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

class BedrockEmbeddingConfig:
Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum0
Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

String name
String projectId
Optional<ConfigHash> configHash

Hashes for the configuration of a pipeline.

Optional<String> embeddingConfigHash

Hash of the embedding config.

Optional<String> parsingConfigHash

Hash of the LlamaParse parameters.

Optional<String> transformConfigHash

Hash of the transform config.

Optional<LocalDateTime> createdAt

Creation datetime

formatdate-time
Optional<DataSink> dataSink

Schema for a data sink.

String id

Unique identifier

formatuuid
Component component

Component that implements the data sink

One of the following:
class UnionMember0:
class CloudPineconeVectorStore:

Cloud Pinecone Vector Store.

This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.

Args:
api_key (str): API key for authenticating with Pinecone
index_name (str): name of the Pinecone index
namespace (optional[str]): namespace to use in the Pinecone index
insert_kwargs (optional[dict]): additional kwargs to pass during insertion

String apiKey

The API key for authenticating with Pinecone

formatpassword
String indexName
Optional<String> className
Optional<InsertKwargs> insertKwargs
Optional<String> namespace
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
class CloudPostgresVectorStore:
String database
long embedDim
String host
String password
long port
String schemaName
String tableName
String user
Optional<String> className
Optional<PgVectorHnswSettings> hnswSettings

HNSW settings for PGVector.

Optional<DistanceMethod> distanceMethod

The distance method to use.

One of the following:
L2("l2")
IP("ip")
COSINE("cosine")
L1("l1")
HAMMING("hamming")
JACCARD("jaccard")
Optional<Long> efConstruction

The number of edges to use during the construction phase.

minimum1

The number of edges to use during the search phase.

minimum1
Optional<Long> m

The number of bi-directional links created for each new element.

minimum1
Optional<VectorType> vectorType

The type of vector to use.

One of the following:
VECTOR("vector")
HALF_VEC("half_vec")
BIT("bit")
SPARSE_VEC("sparse_vec")
Optional<Boolean> performSetup
Optional<Boolean> supportsNestedMetadataFilters
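
To make the HNSW settings above more concrete, the sketch below shows how distanceMethod, m, and efConstruction could map onto a pgvector `CREATE INDEX` statement. Whether the service issues a statement like this is an assumption; `HnswIndexSql` is a hypothetical helper, the table and column names are placeholders, and only the plain `vector` type is covered (hamming and jaccard apply to bit vectors and are left out).

```java
/** Hypothetical illustration; not an SDK class and not the service's actual DDL. */
final class HnswIndexSql {
    private HnswIndexSql() {}

    /** Maps a distanceMethod value to the pgvector operator class for the "vector" type. */
    static String opClass(String distanceMethod) {
        switch (distanceMethod) {
            case "l2":     return "vector_l2_ops";
            case "ip":     return "vector_ip_ops";
            case "cosine": return "vector_cosine_ops";
            case "l1":     return "vector_l1_ops";
            default:
                throw new IllegalArgumentException("not covered in this sketch: " + distanceMethod);
        }
    }

    /** Builds the index statement; identifiers are not escaped, so use trusted names only. */
    static String createIndex(String table, String column, String distanceMethod,
                              long m, long efConstruction) {
        if (m < 1 || efConstruction < 1) {
            throw new IllegalArgumentException("m and efConstruction must be >= 1"); // minimum 1
        }
        return "CREATE INDEX ON " + table + " USING hnsw (" + column + " "
            + opClass(distanceMethod) + ") WITH (m = " + m
            + ", ef_construction = " + efConstruction + ")";
    }
}
```

Larger m and efConstruction values generally trade slower index builds and more memory for better recall.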
class CloudQdrantVectorStore:

Cloud Qdrant Vector Store.

This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.

Args:
collection_name (str): name of the Qdrant collection
url (str): url of the Qdrant instance
api_key (str): API key for authenticating with Qdrant
max_retries (int): maximum number of retries in case of a failure. Defaults to 3
client_kwargs (dict): additional kwargs to pass to the Qdrant client

String apiKey
String collectionName
String url
Optional<String> className
Optional<ClientKwargs> clientKwargs
Optional<Long> maxRetries
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
class CloudAzureAiSearchVectorStore:

Cloud Azure AI Search Vector Store.

String searchServiceApiKey
String searchServiceEndpoint
Optional<String> className
Optional<String> clientId
Optional<String> clientSecret
Optional<Long> embeddingDimension
Optional<FilterableMetadataFieldKeys> filterableMetadataFieldKeys
Optional<String> indexName
Optional<String> searchServiceApiVersion
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
Optional<String> tenantId

Cloud MongoDB Atlas Vector Store.

This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.

Args:
mongodb_uri (str): URI for connecting to MongoDB Atlas
db_name (str): name of the MongoDB database
collection_name (str): name of the MongoDB collection
vector_index_name (str): name of the MongoDB Atlas vector index
fulltext_index_name (str): name of the MongoDB Atlas full-text index

class CloudMilvusVectorStore:

Cloud Milvus Vector Store.

String uri
Optional<String> token
Optional<String> className
Optional<String> collectionName
Optional<Long> embeddingDimension
Optional<Boolean> supportsNestedMetadataFilters
class CloudAstraDbVectorStore:

Cloud AstraDB Vector Store.

This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.

Args:
token (str): The Astra DB Application Token to use.
api_endpoint (str): The Astra DB JSON API endpoint for your database.
collection_name (str): Collection name to use. If not existing, it will be created.
embedding_dimension (int): Length of the embedding vectors in use.
keyspace (optional[str]): The keyspace to use. If not provided, ‘default_keyspace’

String token

The Astra DB Application Token to use

formatpassword
String apiEndpoint

The Astra DB JSON API endpoint for your database

String collectionName

Collection name to use. If not existing, it will be created

long embeddingDimension

Length of the embedding vectors in use

Optional<String> className
Optional<String> keyspace

The keyspace to use. If not provided, ‘default_keyspace’ is used.

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
String name

The name of the data sink.

String projectId
SinkType sinkType
One of the following:
PINECONE("PINECONE")
POSTGRES("POSTGRES")
QDRANT("QDRANT")
AZUREAI_SEARCH("AZUREAI_SEARCH")
MONGODB_ATLAS("MONGODB_ATLAS")
MILVUS("MILVUS")
ASTRA_DB("ASTRA_DB")
Optional<LocalDateTime> createdAt

Creation datetime

formatdate-time
Optional<LocalDateTime> updatedAt

Update datetime

formatdate-time
Optional<EmbeddingModelConfig> embeddingModelConfig

Schema for an embedding model config.

String id

Unique identifier

formatuuid
EmbeddingConfig embeddingConfig

The embedding configuration for the embedding model config.

One of the following:
class AzureOpenAIEmbeddingConfig:
Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

Maximum number of retries.

minimum0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Whether to reuse the OpenAI client between requests. For large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum0
Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:
Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model Input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class GeminiEmbeddingConfig:
Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:
String
boolean
Optional<String> className
Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:
CLS("cls")
MEAN("mean")
LAST("last")
Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left as default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.

Optional<Type> type

Type of the embedding model.

class OpenAIEmbeddingConfig:
Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

Maximum number of retries.

minimum0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Whether to reuse the OpenAI client between requests. For large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum0
Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:
Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for Vertex AI.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:
DEFAULT("default")
CLASSIFICATION("classification")
CLUSTERING("clustering")
SIMILARITY("similarity")
RETRIEVAL("retrieval")
Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

class BedrockEmbeddingConfig:
Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum0
Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

String name

The name of the embedding model config.

String projectId
Optional<LocalDateTime> createdAt

Creation datetime

formatdate-time
Optional<LocalDateTime> updatedAt

Update datetime

formatdate-time
Optional<String> embeddingModelConfigId

The ID of the EmbeddingModelConfig this pipeline is using.

formatuuid
Optional<LlamaParseParameters> llamaParseParameters

Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.

Optional<Boolean> adaptiveLongTable
Optional<Boolean> aggressiveTableExtraction
Optional<Boolean> autoMode
Optional<String> autoModeConfigurationJson
Optional<Boolean> autoModeTriggerOnImageInPage
Optional<String> autoModeTriggerOnRegexpInPage
Optional<Boolean> autoModeTriggerOnTableInPage
Optional<String> autoModeTriggerOnTextInPage
Optional<String> azureOpenAIApiVersion
Optional<String> azureOpenAIDeploymentName
Optional<String> azureOpenAIEndpoint
Optional<String> azureOpenAIKey
Optional<Double> bboxBottom
Optional<Double> bboxLeft
Optional<Double> bboxRight
Optional<Double> bboxTop
Optional<String> boundingBox
Optional<Boolean> compactMarkdownTable
Optional<String> complementalFormattingInstruction
Optional<String> contentGuidelineInstruction
Optional<Boolean> continuousMode
Optional<Boolean> disableImageExtraction
Optional<Boolean> disableOcr
Optional<Boolean> disableReconstruction
Optional<Boolean> doNotCache
Optional<Boolean> doNotUnrollColumns
Optional<Boolean> enableCostOptimizer
Optional<Boolean> extractCharts
Optional<Boolean> extractLayout
Optional<Boolean> extractPrintedPageNumber
Optional<Boolean> fastMode
Optional<String> formattingInstruction
Optional<String> gpt4oApiKey
Optional<Boolean> gpt4oMode
Optional<Boolean> guessXlsxSheetName
Optional<Boolean> hideFooters
Optional<Boolean> hideHeaders
Optional<Boolean> highResOcr
Optional<Boolean> htmlMakeAllElementsVisible
Optional<Boolean> htmlRemoveFixedElements
Optional<Boolean> htmlRemoveNavigationElements
Optional<String> httpProxy
Optional<Boolean> ignoreDocumentElementsForLayoutDetection
Optional<List<ImagesToSave>> imagesToSave
One of the following:
SCREENSHOT("screenshot")
EMBEDDED("embedded")
LAYOUT("layout")
Optional<Boolean> inlineImagesInMarkdown
Optional<String> inputS3Path
Optional<String> inputS3Region
Optional<String> inputUrl
Optional<Boolean> internalIsScreenshotJob
Optional<Boolean> invalidateCache
Optional<Boolean> isFormattingInstruction
Optional<Double> jobTimeoutExtraTimePerPageInSeconds
Optional<Double> jobTimeoutInSeconds
Optional<Boolean> keepPageSeparatorWhenMergingTables
Optional<List<ParsingLanguages>> languages
One of the following:
AF("af")
AZ("az")
BS("bs")
CS("cs")
CY("cy")
DA("da")
DE("de")
EN("en")
ES("es")
ET("et")
FR("fr")
GA("ga")
HR("hr")
HU("hu")
ID("id")
IS("is")
IT("it")
KU("ku")
LA("la")
LT("lt")
LV("lv")
MI("mi")
MS("ms")
MT("mt")
NL("nl")
NO("no")
OC("oc")
PI("pi")
PL("pl")
PT("pt")
RO("ro")
RS_LATIN("rs_latin")
SK("sk")
SL("sl")
SQ("sq")
SV("sv")
SW("sw")
TL("tl")
TR("tr")
UZ("uz")
VI("vi")
AR("ar")
FA("fa")
UG("ug")
UR("ur")
BN("bn")
AS("as")
MNI("mni")
RU("ru")
RS_CYRILLIC("rs_cyrillic")
BE("be")
BG("bg")
UK("uk")
MN("mn")
ABQ("abq")
ADY("ady")
KBD("kbd")
AVA("ava")
DAR("dar")
INH("inh")
CHE("che")
LBE("lbe")
LEZ("lez")
TAB("tab")
TJK("tjk")
HI("hi")
MR("mr")
NE("ne")
BH("bh")
MAI("mai")
ANG("ang")
BHO("bho")
MAH("mah")
SCK("sck")
NEW("new")
GOM("gom")
SA("sa")
BGC("bgc")
TH("th")
CH_SIM("ch_sim")
CH_TRA("ch_tra")
JA("ja")
KO("ko")
TA("ta")
TE("te")
KN("kn")
Optional<Boolean> layoutAware
Optional<Boolean> lineLevelBoundingBox
Optional<String> markdownTableMultilineHeaderSeparator
Optional<Long> maxPages
Optional<Long> maxPagesEnforced
Optional<Boolean> mergeTablesAcrossPagesInMarkdown
Optional<String> model
Optional<Boolean> outlinedTableExtraction
Optional<Boolean> outputPdfOfDocument
Optional<String> outputS3PathPrefix
Optional<String> outputS3Region
Optional<Boolean> outputTablesAsHtml
Optional<Double> pageErrorTolerance
Optional<String> pageHeaderPrefix
Optional<String> pageHeaderSuffix
Optional<String> pagePrefix
Optional<String> pageSeparator
Optional<String> pageSuffix
Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:
PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")
PARSE_PAGE_WITH_LLM("parse_page_with_llm")
PARSE_PAGE_WITH_LVM("parse_page_with_lvm")
PARSE_PAGE_WITH_AGENT("parse_page_with_agent")
PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")
PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")
PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")
PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")
Optional<String> parsingInstruction
Optional<Boolean> preciseBoundingBox
Optional<Boolean> premiumMode
Optional<Boolean> presentationOutOfBoundsContent
Optional<Boolean> presentationSkipEmbeddedData
Optional<Boolean> preserveLayoutAlignmentAcrossPages
Optional<Boolean> preserveVerySmallText
Optional<String> preset
Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:
LOW("low")
MEDIUM("medium")
HIGH("high")
CRITICAL("critical")
Optional<String> projectId
Optional<Boolean> removeHiddenText
Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:
RAW_TEXT("raw_text")
BLANK_PAGE("blank_page")
ERROR_MESSAGE("error_message")
Optional<String> replaceFailedPageWithErrorMessagePrefix
Optional<String> replaceFailedPageWithErrorMessageSuffix
Optional<Boolean> saveImages
Optional<Boolean> skipDiagonalText
Optional<Boolean> specializedChartParsingAgentic
Optional<Boolean> specializedChartParsingEfficient
Optional<Boolean> specializedChartParsingPlus
Optional<Boolean> specializedImageParsing
Optional<Boolean> spreadsheetExtractSubTables
Optional<Boolean> spreadsheetForceFormulaComputation
Optional<Boolean> spreadsheetIncludeHiddenSheets
Optional<Boolean> strictModeBuggyFont
Optional<Boolean> strictModeImageExtraction
Optional<Boolean> strictModeImageOcr
Optional<Boolean> strictModeReconstruction
Optional<Boolean> structuredOutput
Optional<String> structuredOutputJsonSchema
Optional<String> structuredOutputJsonSchemaName
Optional<String> systemPrompt
Optional<String> systemPromptAppend
Optional<Boolean> takeScreenshot
Optional<String> targetPages
Optional<String> tier
Optional<Boolean> useVendorMultimodalModel
Optional<String> userPrompt
Optional<String> vendorMultimodalApiKey
Optional<String> vendorMultimodalModelName
Optional<String> version
Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:
EXTRACT_PENDING("extract.pending")
EXTRACT_SUCCESS("extract.success")
EXTRACT_ERROR("extract.error")
EXTRACT_PARTIAL_SUCCESS("extract.partial_success")
EXTRACT_CANCELLED("extract.cancelled")
PARSE_PENDING("parse.pending")
PARSE_RUNNING("parse.running")
PARSE_SUCCESS("parse.success")
PARSE_ERROR("parse.error")
PARSE_PARTIAL_SUCCESS("parse.partial_success")
PARSE_CANCELLED("parse.cancelled")
CLASSIFY_PENDING("classify.pending")
CLASSIFY_RUNNING("classify.running")
CLASSIFY_SUCCESS("classify.success")
CLASSIFY_ERROR("classify.error")
CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")
CLASSIFY_CANCELLED("classify.cancelled")
SHEETS_PENDING("sheets.pending")
SHEETS_SUCCESS("sheets.success")
SHEETS_ERROR("sheets.error")
SHEETS_PARTIAL_SUCCESS("sheets.partial_success")
SHEETS_CANCELLED("sheets.cancelled")
UNMAPPED_EVENT("unmapped_event")
Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookUrl

URL to receive webhook POST notifications
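
The webhookEvents description above states that a null list means every event is delivered. A receiver-side or test-side check of that rule might look like the sketch below. `WebhookFilter` is a hypothetical helper, not an SDK class, and treating an empty (non-null) list as "deliver nothing" is an assumption that this reference does not confirm.

```java
import java.util.List;

/** Hypothetical illustration of the subscription rule; not an SDK class. */
final class WebhookFilter {
    private WebhookFilter() {}

    /**
     * Decides whether an event such as "parse.success" matches a subscription.
     * A null list subscribes to everything, as documented above.
     */
    static boolean shouldDeliver(List<String> webhookEvents, String event) {
        if (webhookEvents == null) {
            return true; // documented: null means all events are delivered
        }
        return webhookEvents.contains(event); // assumption: an empty list matches nothing
    }
}
```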

Optional<String> webhookUrl
Optional<String> managedPipelineId

The ID of the ManagedPipeline this playground pipeline is linked to.

formatuuid
Optional<PipelineMetadataConfig> metadataConfig

Metadata configuration for the pipeline.

Optional<List<String>> excludedEmbedMetadataKeys

List of metadata keys to exclude from embeddings

Optional<List<String>> excludedLlmMetadataKeys

List of metadata keys to exclude from LLM during retrieval

Optional<PipelineType> pipelineType

Type of pipeline. Either PLAYGROUND or MANAGED.

One of the following:
PLAYGROUND("PLAYGROUND")
MANAGED("MANAGED")
Optional<PresetRetrievalParams> presetRetrievalParameters

Preset retrieval parameters for the pipeline.

Optional<Double> alpha

Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.

maximum1
minimum0
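
One common reading of alpha is a linear blend of the two scores. The sketch below assumes that convex combination and assumes both scores are already on a comparable scale; the fusion formula the service actually uses is not stated here, and all names are illustrative.

```java
// Illustrative only: a convex combination is one common way to weight
// dense and sparse scores; the service's real formula may differ.
final class HybridAlphaSketch {
    static double blend(double alpha, double denseScore, double sparseScore) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        // alpha = 0 keeps only the sparse score, alpha = 1 only the dense score.
        return alpha * denseScore + (1.0 - alpha) * sparseScore;
    }

    public static void main(String[] args) {
        System.out.println(blend(0.0, 0.9, 0.4)); // sparse only
        System.out.println(blend(1.0, 0.9, 0.4)); // dense only
        System.out.println(blend(0.5, 0.9, 0.4)); // equal weighting
    }
}
```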
Optional<String> className
Optional<Double> denseSimilarityCutoff

Minimum similarity score with respect to the query for retrieval

maximum1
minimum0
Optional<Long> denseSimilarityTopK

Number of nodes for dense retrieval.

maximum100
minimum1
Optional<Boolean> enableReranking

Enable reranking for retrieval

Optional<Long> filesTopK

Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).

maximum5
minimum1
Optional<Long> rerankTopN

Number of reranked nodes to return.

maximum100
minimum1
Optional<RetrievalMode> retrievalMode

The retrieval mode for the query.

One of the following:
CHUNKS("chunks")
FILES_VIA_METADATA("files_via_metadata")
FILES_VIA_CONTENT("files_via_content")
AUTO_ROUTED("auto_routed")
Deprecated Optional<Boolean> retrieveImageNodes

Whether to retrieve image nodes.

Optional<Boolean> retrievePageFigureNodes

Whether to retrieve page figure nodes.

Optional<Boolean> retrievePageScreenshotNodes

Whether to retrieve page screenshot nodes.

Optional<MetadataFilters> searchFilters

Metadata filters for vector stores.

List<Filter> filters
One of the following:
class MetadataFilter:

Comprehensive metadata filter for vector stores to support more operators.

Value uses Strict types, as int, float and str are compatible types and were all converted to string before.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

String key
Optional<Value> value
One of the following:
double
String
List<String>
List<double>
List<long>
Optional<Operator> operator

Vector store filter operator.

One of the following:
EQUALS("==")
GREATER(">")
LESS("<")
NOT_EQUALS("!=")
GREATER_OR_EQUALS(">=")
LESS_OR_EQUALS("<=")
IN("in")
NIN("nin")
ANY("any")
ALL("all")
TEXT_MATCH("text_match")
TEXT_MATCH_INSENSITIVE("text_match_insensitive")
CONTAINS("contains")
IS_EMPTY("is_empty")
MetadataFilters
Optional<Condition> condition

Vector store filter conditions to combine different filters.

One of the following:
AND("and")
OR("or")
NOT("not")
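
To make the operator and condition values above concrete, here is a small client-side sketch of how a handful of them could be read. It covers only "==", "!=", ">", and "in" combined with "and" / "or"; the Filter class and method names are hypothetical, and the service applies filters inside the vector store, so exact semantics may differ.

```java
import java.util.List;
import java.util.Map;

// Illustrative client-side reading of a few operators. Not part of the SDK.
final class MetadataFilterSketch {
    // Hypothetical stand-in for one MetadataFilter entry (key, operator, value).
    static final class Filter {
        final String key;
        final String operator;
        final Object value;

        Filter(String key, String operator, Object value) {
            this.key = key;
            this.operator = operator;
            this.value = value;
        }
    }

    static boolean matches(Filter f, Map<String, Object> metadata) {
        Object actual = metadata.get(f.key);
        switch (f.operator) {
            case "==":
                return f.value.equals(actual);
            case "!=":
                return !f.value.equals(actual);
            case ">":
                // Only numeric comparison is sketched here.
                return actual instanceof Number
                        && ((Number) actual).doubleValue() > ((Number) f.value).doubleValue();
            case "in":
                return actual != null && ((List<?>) f.value).contains(actual);
            default:
                throw new IllegalArgumentException("Operator not covered by this sketch: " + f.operator);
        }
    }

    // Combines per-filter results using the "and" / "or" condition values.
    static boolean combine(String condition, List<Filter> filters, Map<String, Object> metadata) {
        switch (condition) {
            case "and":
                return filters.stream().allMatch(f -> matches(f, metadata));
            case "or":
                return filters.stream().anyMatch(f -> matches(f, metadata));
            default:
                throw new IllegalArgumentException("Condition not covered by this sketch: " + condition);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = Map.of("author", "Ada", "year", 2021);
        List<Filter> filters = List.of(
                new Filter("author", "==", "Bob"),
                new Filter("year", ">", 2020));
        System.out.println(combine("and", filters, metadata));
        System.out.println(combine("or", filters, metadata));
    }
}
```

Note that "==" here relies on Object.equals, so an Integer 2021 and a Double 2021.0 would not compare equal in this sketch.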
Optional<SearchFiltersInferenceSchema> searchFiltersInferenceSchema

JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<Long> sparseSimilarityTopK

Number of nodes for sparse retrieval.

maximum100
minimum1
Optional<SparseModelConfig> sparseModelConfig

Configuration for sparse embedding models used in hybrid search.

This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.

Optional<String> className
Optional<ModelType> modelType

The sparse model type to use. ‘bm25’ uses Qdrant’s FastEmbed BM25 model (default for new pipelines), ‘splade’ uses HuggingFace Splade model, ‘auto’ selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).

One of the following:
SPLADE("splade")
BM25("bm25")
AUTO("auto")
Optional<Status> status

Status of the pipeline.

One of the following:
CREATED("CREATED")
DELETING("DELETING")
Optional<TransformConfig> transformConfig

Configuration for the transformation.

One of the following:
class AutoTransformConfig:
Optional<Long> chunkOverlap

Chunk overlap for the transformation.

Optional<Long> chunkSize

Chunk size for the transformation.

exclusiveMinimum0
Optional<Mode> mode
class AdvancedModeTransformConfig:
Optional<ChunkingConfig> chunkingConfig

Configuration for the chunking.

One of the following:
class NoneChunkingConfig:
Optional<Mode> mode
class CharacterChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
class TokenChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
Optional<String> separator
class SentenceChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
Optional<String> paragraphSeparator
Optional<String> separator
class SemanticChunkingConfig:
Optional<Long> breakpointPercentileThreshold
Optional<Long> bufferSize
Optional<Mode> mode
Optional<Mode> mode
Optional<SegmentationConfig> segmentationConfig

Configuration for the segmentation.

One of the following:
class NoneSegmentationConfig:
Optional<Mode> mode
class PageSegmentationConfig:
Optional<Mode> mode
Optional<String> pageSeparator
class ElementSegmentationConfig:
Optional<Mode> mode
Optional<LocalDateTime> updatedAt

Update datetime

formatdate-time
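
The chunkSize and chunkOverlap fields of the transform configs above can be pictured as a sliding window over the text. The sketch below counts characters and requires chunkOverlap to be smaller than chunkSize; both are assumptions made for illustration, since the unit and the exact splitting rules the service applies are not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative character-based sliding window; not the service's splitter.
final class ChunkOverlapSketch {
    static List<String> chunk(String text, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be greater than 0");
        }
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("chunkOverlap must be in [0, chunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        // Each window starts (chunkSize - chunkOverlap) characters after the previous one.
        int step = chunkSize - chunkOverlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Consecutive chunks share one character because chunkOverlap is 1.
        System.out.println(chunk("abcdefghij", 4, 1));
    }
}
```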
class PipelineCreate:

Schema for creating a pipeline.

String name
Optional<DataSinkCreate> dataSink

Schema for creating a data sink.

Component component

Component that implements the data sink

One of the following:
class UnionMember0:
class CloudPineconeVectorStore:

Cloud Pinecone Vector Store.

This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.

Args:
api_key (str): API key for authenticating with Pinecone
index_name (str): name of the Pinecone index
namespace (optional[str]): namespace to use in the Pinecone index
insert_kwargs (optional[dict]): additional kwargs to pass during insertion

String apiKey

The API key for authenticating with Pinecone

formatpassword
String indexName
Optional<String> className
Optional<InsertKwargs> insertKwargs
Optional<String> namespace
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
class CloudPostgresVectorStore:
String database
long embedDim
String host
String password
long port
String schemaName
String tableName
String user
Optional<String> className
Optional<PgVectorHnswSettings> hnswSettings

HNSW settings for PGVector.

Optional<DistanceMethod> distanceMethod

The distance method to use.

One of the following:
L2("l2")
IP("ip")
COSINE("cosine")
L1("l1")
HAMMING("hamming")
JACCARD("jaccard")
Optional<Long> efConstruction

The number of edges to use during the construction phase.

minimum1
Optional<Long> efSearch

The number of edges to use during the search phase.

minimum1
Optional<Long> m

The number of bi-directional links created for each new element.

minimum1
Optional<VectorType> vectorType

The type of vector to use.

One of the following:
VECTOR("vector")
HALF_VEC("half_vec")
BIT("bit")
SPARSE_VEC("sparse_vec")
Optional<Boolean> performSetup
Optional<Boolean> supportsNestedMetadataFilters
class CloudQdrantVectorStore:

Cloud Qdrant Vector Store.

This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.

Args:
collection_name (str): name of the Qdrant collection
url (str): url of the Qdrant instance
api_key (str): API key for authenticating with Qdrant
max_retries (int): maximum number of retries in case of a failure. Defaults to 3
client_kwargs (dict): additional kwargs to pass to the Qdrant client

String apiKey
String collectionName
String url
Optional<String> className
Optional<ClientKwargs> clientKwargs
Optional<Long> maxRetries
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
class CloudAzureAiSearchVectorStore:

Cloud Azure AI Search Vector Store.

String searchServiceApiKey
String searchServiceEndpoint
Optional<String> className
Optional<String> clientId
Optional<String> clientSecret
Optional<Long> embeddingDimension
Optional<FilterableMetadataFieldKeys> filterableMetadataFieldKeys
Optional<String> indexName
Optional<String> searchServiceApiVersion
Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
Optional<String> tenantId

Cloud MongoDB Atlas Vector Store.

This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.

Args:
mongodb_uri (str): URI for connecting to MongoDB Atlas
db_name (str): name of the MongoDB database
collection_name (str): name of the MongoDB collection
vector_index_name (str): name of the MongoDB Atlas vector index
fulltext_index_name (str): name of the MongoDB Atlas full-text index

class CloudMilvusVectorStore:

Cloud Milvus Vector Store.

String uri
Optional<String> token
Optional<String> className
Optional<String> collectionName
Optional<Long> embeddingDimension
Optional<Boolean> supportsNestedMetadataFilters
class CloudAstraDbVectorStore:

Cloud AstraDB Vector Store.

This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.

Args:
token (str): The Astra DB Application Token to use.
api_endpoint (str): The Astra DB JSON API endpoint for your database.
collection_name (str): Collection name to use. If it does not exist, it will be created.
embedding_dimension (int): Length of the embedding vectors in use.
keyspace (optional[str]): The keyspace to use. If not provided, ‘default_keyspace’ is used.

String token

The Astra DB Application Token to use

formatpassword
String apiEndpoint

The Astra DB JSON API endpoint for your database

String collectionName

Collection name to use. If it does not exist, it will be created.

long embeddingDimension

Length of the embedding vectors in use

Optional<String> className
Optional<String> keyspace

The keyspace to use. If not provided, ‘default_keyspace’ is used.

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters
String name

The name of the data sink.

SinkType sinkType
One of the following:
PINECONE("PINECONE")
POSTGRES("POSTGRES")
QDRANT("QDRANT")
AZUREAI_SEARCH("AZUREAI_SEARCH")
MONGODB_ATLAS("MONGODB_ATLAS")
MILVUS("MILVUS")
ASTRA_DB("ASTRA_DB")
Optional<String> dataSinkId

Data sink ID. When provided instead of data_sink, the data sink will be looked up by ID.

formatuuid
Optional<EmbeddingConfig> embeddingConfig
One of the following:
class AzureOpenAIEmbeddingConfig:
Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

Maximum number of retries.

minimum0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum0
Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:
Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model Input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class GeminiEmbeddingConfig:
Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:
String
boolean
Optional<String> className
Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:
CLS("cls")
MEAN("mean")
LAST("last")
Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left as default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.

Optional<Type> type

Type of the embedding model.

class OpenAIEmbeddingConfig:
Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className
Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

Maximum number of retries.

minimum0
Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum0
Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:
Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the Vertex client.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:
DEFAULT("default")
CLASSIFICATION("classification")
CLUSTERING("clustering")
SIMILARITY("similarity")
RETRIEVAL("retrieval")
Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

class BedrockEmbeddingConfig:
Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum0
Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.
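
Every embedding config above exposes embedBatchSize (greater than 0, at most 2048). As a rough picture of what a batch size means, the sketch below splits a list of texts into batches of that size. It is a generic partitioning helper with hypothetical names, not the SDK's or the service's batching code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative partitioning helper; names are hypothetical.
final class EmbedBatchSketch {
    static final int MAX_BATCH_SIZE = 2048; // upper bound listed for embedBatchSize

    static <T> List<List<T>> batches(List<T> items, int embedBatchSize) {
        if (embedBatchSize <= 0 || embedBatchSize > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException("embedBatchSize must be in (0, 2048]");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += embedBatchSize) {
            int end = Math.min(i + embedBatchSize, items.size());
            // Copy the view so each batch is independent of the input list.
            out.add(new ArrayList<>(items.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Five texts with a batch size of 2 give batches of 2, 2, and 1.
        System.out.println(batches(List.of("a", "b", "c", "d", "e"), 2));
    }
}
```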

Optional<String> embeddingModelConfigId

Embedding model config ID. When provided instead of embedding_config, the embedding model config will be looked up by ID.

formatuuid
Optional<LlamaParseParameters> llamaParseParameters

Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.

Optional<Boolean> adaptiveLongTable
Optional<Boolean> aggressiveTableExtraction
Optional<Boolean> autoMode
Optional<String> autoModeConfigurationJson
Optional<Boolean> autoModeTriggerOnImageInPage
Optional<String> autoModeTriggerOnRegexpInPage
Optional<Boolean> autoModeTriggerOnTableInPage
Optional<String> autoModeTriggerOnTextInPage
Optional<String> azureOpenAIApiVersion
Optional<String> azureOpenAIDeploymentName
Optional<String> azureOpenAIEndpoint
Optional<String> azureOpenAIKey
Optional<Double> bboxBottom
Optional<Double> bboxLeft
Optional<Double> bboxRight
Optional<Double> bboxTop
Optional<String> boundingBox
Optional<Boolean> compactMarkdownTable
Optional<String> complementalFormattingInstruction
Optional<String> contentGuidelineInstruction
Optional<Boolean> continuousMode
Optional<Boolean> disableImageExtraction
Optional<Boolean> disableOcr
Optional<Boolean> disableReconstruction
Optional<Boolean> doNotCache
Optional<Boolean> doNotUnrollColumns
Optional<Boolean> enableCostOptimizer
Optional<Boolean> extractCharts
Optional<Boolean> extractLayout
Optional<Boolean> extractPrintedPageNumber
Optional<Boolean> fastMode
Optional<String> formattingInstruction
Optional<String> gpt4oApiKey
Optional<Boolean> gpt4oMode
Optional<Boolean> guessXlsxSheetName
Optional<Boolean> hideFooters
Optional<Boolean> hideHeaders
Optional<Boolean> highResOcr
Optional<Boolean> htmlMakeAllElementsVisible
Optional<Boolean> htmlRemoveFixedElements
Optional<Boolean> htmlRemoveNavigationElements
Optional<String> httpProxy
Optional<Boolean> ignoreDocumentElementsForLayoutDetection
Optional<List<ImagesToSave>> imagesToSave
One of the following:
SCREENSHOT("screenshot")
EMBEDDED("embedded")
LAYOUT("layout")
Optional<Boolean> inlineImagesInMarkdown
Optional<String> inputS3Path
Optional<String> inputS3Region
Optional<String> inputUrl
Optional<Boolean> internalIsScreenshotJob
Optional<Boolean> invalidateCache
Optional<Boolean> isFormattingInstruction
Optional<Double> jobTimeoutExtraTimePerPageInSeconds
Optional<Double> jobTimeoutInSeconds
Optional<Boolean> keepPageSeparatorWhenMergingTables
Optional<List<ParsingLanguages>> languages
One of the following:
AF("af")
AZ("az")
BS("bs")
CS("cs")
CY("cy")
DA("da")
DE("de")
EN("en")
ES("es")
ET("et")
FR("fr")
GA("ga")
HR("hr")
HU("hu")
ID("id")
IS("is")
IT("it")
KU("ku")
LA("la")
LT("lt")
LV("lv")
MI("mi")
MS("ms")
MT("mt")
NL("nl")
NO("no")
OC("oc")
PI("pi")
PL("pl")
PT("pt")
RO("ro")
RS_LATIN("rs_latin")
SK("sk")
SL("sl")
SQ("sq")
SV("sv")
SW("sw")
TL("tl")
TR("tr")
UZ("uz")
VI("vi")
AR("ar")
FA("fa")
UG("ug")
UR("ur")
BN("bn")
AS("as")
MNI("mni")
RU("ru")
RS_CYRILLIC("rs_cyrillic")
BE("be")
BG("bg")
UK("uk")
MN("mn")
ABQ("abq")
ADY("ady")
KBD("kbd")
AVA("ava")
DAR("dar")
INH("inh")
CHE("che")
LBE("lbe")
LEZ("lez")
TAB("tab")
TJK("tjk")
HI("hi")
MR("mr")
NE("ne")
BH("bh")
MAI("mai")
ANG("ang")
BHO("bho")
MAH("mah")
SCK("sck")
NEW("new")
GOM("gom")
SA("sa")
BGC("bgc")
TH("th")
CH_SIM("ch_sim")
CH_TRA("ch_tra")
JA("ja")
KO("ko")
TA("ta")
TE("te")
KN("kn")
Optional<Boolean> layoutAware
Optional<Boolean> lineLevelBoundingBox
Optional<String> markdownTableMultilineHeaderSeparator
Optional<Long> maxPages
Optional<Long> maxPagesEnforced
Optional<Boolean> mergeTablesAcrossPagesInMarkdown
Optional<String> model
Optional<Boolean> outlinedTableExtraction
Optional<Boolean> outputPdfOfDocument
Optional<String> outputS3PathPrefix
Optional<String> outputS3Region
Optional<Boolean> outputTablesAsHtml
Optional<Double> pageErrorTolerance
Optional<String> pageHeaderPrefix
Optional<String> pageHeaderSuffix
Optional<String> pagePrefix
Optional<String> pageSeparator
Optional<String> pageSuffix
Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:
PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")
PARSE_PAGE_WITH_LLM("parse_page_with_llm")
PARSE_PAGE_WITH_LVM("parse_page_with_lvm")
PARSE_PAGE_WITH_AGENT("parse_page_with_agent")
PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")
PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")
PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")
PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")
Optional<String> parsingInstruction
Optional<Boolean> preciseBoundingBox
Optional<Boolean> premiumMode
Optional<Boolean> presentationOutOfBoundsContent
Optional<Boolean> presentationSkipEmbeddedData
Optional<Boolean> preserveLayoutAlignmentAcrossPages
Optional<Boolean> preserveVerySmallText
Optional<String> preset
Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:
LOW("low")
MEDIUM("medium")
HIGH("high")
CRITICAL("critical")
Optional<String> projectId
Optional<Boolean> removeHiddenText
Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:
RAW_TEXT("raw_text")
BLANK_PAGE("blank_page")
ERROR_MESSAGE("error_message")
Optional<String> replaceFailedPageWithErrorMessagePrefix
Optional<String> replaceFailedPageWithErrorMessageSuffix
Optional<Boolean> saveImages
Optional<Boolean> skipDiagonalText
Optional<Boolean> specializedChartParsingAgentic
Optional<Boolean> specializedChartParsingEfficient
Optional<Boolean> specializedChartParsingPlus
Optional<Boolean> specializedImageParsing
Optional<Boolean> spreadsheetExtractSubTables
Optional<Boolean> spreadsheetForceFormulaComputation
Optional<Boolean> spreadsheetIncludeHiddenSheets
Optional<Boolean> strictModeBuggyFont
Optional<Boolean> strictModeImageExtraction
Optional<Boolean> strictModeImageOcr
Optional<Boolean> strictModeReconstruction
Optional<Boolean> structuredOutput
Optional<String> structuredOutputJsonSchema
Optional<String> structuredOutputJsonSchemaName
Optional<String> systemPrompt
Optional<String> systemPromptAppend
Optional<Boolean> takeScreenshot
Optional<String> targetPages
Optional<String> tier
Optional<Boolean> useVendorMultimodalModel
Optional<String> userPrompt
Optional<String> vendorMultimodalApiKey
Optional<String> vendorMultimodalModelName
Optional<String> version
Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:
EXTRACT_PENDING("extract.pending")
EXTRACT_SUCCESS("extract.success")
EXTRACT_ERROR("extract.error")
EXTRACT_PARTIAL_SUCCESS("extract.partial_success")
EXTRACT_CANCELLED("extract.cancelled")
PARSE_PENDING("parse.pending")
PARSE_RUNNING("parse.running")
PARSE_SUCCESS("parse.success")
PARSE_ERROR("parse.error")
PARSE_PARTIAL_SUCCESS("parse.partial_success")
PARSE_CANCELLED("parse.cancelled")
CLASSIFY_PENDING("classify.pending")
CLASSIFY_RUNNING("classify.running")
CLASSIFY_SUCCESS("classify.success")
CLASSIFY_ERROR("classify.error")
CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")
CLASSIFY_CANCELLED("classify.cancelled")
SHEETS_PENDING("sheets.pending")
SHEETS_SUCCESS("sheets.success")
SHEETS_ERROR("sheets.error")
SHEETS_PARTIAL_SUCCESS("sheets.partial_success")
SHEETS_CANCELLED("sheets.cancelled")
UNMAPPED_EVENT("unmapped_event")
Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookUrl

URL to receive webhook POST notifications

Optional<String> webhookUrl
Optional<String> managedPipelineId

The ID of the ManagedPipeline this playground pipeline is linked to.

formatuuid
Optional<PipelineMetadataConfig> metadataConfig

Metadata configuration for the pipeline.

Optional<List<String>> excludedEmbedMetadataKeys

List of metadata keys to exclude from embeddings

Optional<List<String>> excludedLlmMetadataKeys

List of metadata keys to exclude from LLM during retrieval

Optional<PipelineType> pipelineType

Type of pipeline. Either PLAYGROUND or MANAGED.

One of the following:
PLAYGROUND("PLAYGROUND")
MANAGED("MANAGED")
Optional<PresetRetrievalParams> presetRetrievalParameters

Preset retrieval parameters for the pipeline.

Optional<Double> alpha

Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.

maximum1
minimum0
Optional<String> className
Optional<Double> denseSimilarityCutoff

Minimum similarity score with respect to the query for retrieval

maximum1
minimum0
Optional<Long> denseSimilarityTopK

Number of nodes for dense retrieval.

maximum100
minimum1
Optional<Boolean> enableReranking

Enable reranking for retrieval

Optional<Long> filesTopK

Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).

maximum5
minimum1
Optional<Long> rerankTopN

Number of reranked nodes to return.

maximum100
minimum1
Optional<RetrievalMode> retrievalMode

The retrieval mode for the query.

One of the following:
CHUNKS("chunks")
FILES_VIA_METADATA("files_via_metadata")
FILES_VIA_CONTENT("files_via_content")
AUTO_ROUTED("auto_routed")
Deprecated Optional<Boolean> retrieveImageNodes

Whether to retrieve image nodes.

Optional<Boolean> retrievePageFigureNodes

Whether to retrieve page figure nodes.

Optional<Boolean> retrievePageScreenshotNodes

Whether to retrieve page screenshot nodes.

Optional<MetadataFilters> searchFilters

Metadata filters for vector stores.

List<Filter> filters
One of the following:
class MetadataFilter:

Comprehensive metadata filter for vector stores to support more operators.

Value uses Strict types, as int, float and str are compatible types and were all converted to string before.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

String key
Optional<Value> value
One of the following:
double
String
List<String>
List<double>
List<long>
Optional<Operator> operator

Vector store filter operator.

One of the following:
EQUALS("==")
GREATER(">")
LESS("<")
NOT_EQUALS("!=")
GREATER_OR_EQUALS(">=")
LESS_OR_EQUALS("<=")
IN("in")
NIN("nin")
ANY("any")
ALL("all")
TEXT_MATCH("text_match")
TEXT_MATCH_INSENSITIVE("text_match_insensitive")
CONTAINS("contains")
IS_EMPTY("is_empty")
MetadataFilters
Optional<Condition> condition

Vector store filter conditions to combine different filters.

One of the following:
AND("and")
OR("or")
NOT("not")
Optional<SearchFiltersInferenceSchema> searchFiltersInferenceSchema

JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<Long> sparseSimilarityTopK

Number of nodes for sparse retrieval.

maximum100
minimum1
Optional<SparseModelConfig> sparseModelConfig

Configuration for sparse embedding models used in hybrid search.

This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.

Optional<String> className
Optional<ModelType> modelType

The sparse model type to use. ‘bm25’ uses Qdrant’s FastEmbed BM25 model (default for new pipelines), ‘splade’ uses HuggingFace Splade model, ‘auto’ selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).

One of the following:
SPLADE("splade")
BM25("bm25")
AUTO("auto")
Optional<String> status

Status of the pipeline deployment.

Optional<TransformConfig> transformConfig

Configuration for the transformation.

One of the following:
class AutoTransformConfig:
Optional<Long> chunkOverlap

Chunk overlap for the transformation.

Optional<Long> chunkSize

Chunk size for the transformation.

exclusiveMinimum0
Optional<Mode> mode
class AdvancedModeTransformConfig:
Optional<ChunkingConfig> chunkingConfig

Configuration for the chunking.

One of the following:
class NoneChunkingConfig:
Optional<Mode> mode
class CharacterChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
class TokenChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
Optional<String> separator
class SentenceChunkingConfig:
Optional<Long> chunkOverlap
Optional<Long> chunkSize
Optional<Mode> mode
Optional<String> paragraphSeparator
Optional<String> separator
class SemanticChunkingConfig:
Optional<Long> breakpointPercentileThreshold
Optional<Long> bufferSize
Optional<Mode> mode
Optional<Mode> mode
Optional<SegmentationConfig> segmentationConfig

Configuration for the segmentation.

One of the following:
class NoneSegmentationConfig:
Optional<Mode> mode
class PageSegmentationConfig:
Optional<Mode> mode
Optional<String> pageSeparator
class ElementSegmentationConfig:
Optional<Mode> mode
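
The numeric bounds listed for presetRetrievalParameters above (alpha and denseSimilarityCutoff in 0 to 1, the top-k fields and rerankTopN in 1 to 100, filesTopK in 1 to 5) can be checked on the client before a request is sent. The helper below is a hypothetical sketch of such a check and is not part of the SDK; null is treated as "not set".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side check of the documented bounds. Null means "not set".
final class RetrievalParamsCheckSketch {
    private static void checkRange(List<String> errors, String name, Number value,
                                   double min, double max) {
        if (value == null) {
            return;
        }
        double v = value.doubleValue();
        if (v < min || v > max) {
            errors.add(name + " must be between " + min + " and " + max + ", got " + value);
        }
    }

    // Returns one message per out-of-range field; an empty list means all set fields are valid.
    static List<String> validate(Double alpha, Double denseSimilarityCutoff,
                                 Long denseSimilarityTopK, Long sparseSimilarityTopK,
                                 Long rerankTopN, Long filesTopK) {
        List<String> errors = new ArrayList<>();
        checkRange(errors, "alpha", alpha, 0, 1);
        checkRange(errors, "denseSimilarityCutoff", denseSimilarityCutoff, 0, 1);
        checkRange(errors, "denseSimilarityTopK", denseSimilarityTopK, 1, 100);
        checkRange(errors, "sparseSimilarityTopK", sparseSimilarityTopK, 1, 100);
        checkRange(errors, "rerankTopN", rerankTopN, 1, 100);
        checkRange(errors, "filesTopK", filesTopK, 1, 5);
        return errors;
    }

    public static void main(String[] args) {
        // alpha and rerankTopN are out of range here; the other set fields are fine.
        validate(1.5, null, 30L, null, 101L, 5L).forEach(System.out::println);
    }
}
```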
class PipelineMetadataConfig:
Optional<List<String>> excludedEmbedMetadataKeys

List of metadata keys to exclude from embeddings

Optional<List<String>> excludedLlmMetadataKeys

List of metadata keys to exclude from LLM during retrieval
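
As a rough illustration of the two exclusion lists, the sketch below removes the excluded keys from a metadata map before it would be shown to the embedding model or to the LLM. How the service actually applies these lists is not described here; the helper and its names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper: returns the metadata that remains visible after exclusions.
final class MetadataExclusionSketch {
    static Map<String, String> visible(Map<String, String> metadata, List<String> excludedKeys) {
        // Copy first so the caller's map is left untouched.
        Map<String, String> result = new LinkedHashMap<>(metadata);
        if (excludedKeys != null) {
            result.keySet().removeAll(excludedKeys);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("file_name", "report.pdf");
        metadata.put("internal_id", "42");
        metadata.put("page", "3");

        // Stand-ins for excludedEmbedMetadataKeys and excludedLlmMetadataKeys.
        System.out.println(visible(metadata, List.of("internal_id")));
        System.out.println(visible(metadata, List.of("internal_id", "page")));
    }
}
```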

enum PipelineType:

Enum for representing the type of a pipeline

PLAYGROUND("PLAYGROUND")
MANAGED("MANAGED")
class PresetRetrievalParams:

Schema for the search params for a retrieval execution that can be preset for a pipeline.

Optional<Double> alpha

Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.

maximum1
minimum0
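As a rough illustration of how alpha could weight the two retrieval signals, the sketch below linearly interpolates a dense and a sparse score. The linear formula is an assumption chosen to match the stated endpoints (0 is sparse only, 1 is dense only); the fusion method the service actually uses is not documented here, and HybridScore is a hypothetical name.

```java
// Hypothetical helper, not part of the SDK.
final class HybridScore {
    // Assumed linear blend: alpha = 1 keeps only the dense score,
    // alpha = 0 keeps only the sparse score.
    static double blend(double denseScore, double sparseScore, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        return alpha * denseScore + (1.0 - alpha) * sparseScore;
    }
}
```

With alpha set to 0.5, both scores contribute equally under this assumption.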
Optional<String> className
Optional<Double> denseSimilarityCutoff

Minimum similarity score with respect to the query for retrieval.

maximum1
minimum0
Optional<Long> denseSimilarityTopK

Number of nodes for dense retrieval.

maximum100
minimum1
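The sketch below shows one plausible way denseSimilarityCutoff and denseSimilarityTopK could combine: drop candidates below the cutoff, then keep the highest-scoring ones. The order of the two steps and the inclusive comparison are assumptions, and DenseCandidates is a hypothetical helper rather than SDK code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the SDK.
final class DenseCandidates {
    // Keeps scores at or above the cutoff, sorted from best to worst,
    // and returns at most topK of them.
    static List<Double> select(List<Double> scores, double cutoff, int topK) {
        return scores.stream()
                .filter(score -> score >= cutoff)
                .sorted(Comparator.reverseOrder())
                .limit(topK)
                .collect(Collectors.toList());
    }
}
```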
Optional<Boolean> enableReranking

Enable reranking for retrieval

Optional<Long> filesTopK

Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).

maximum5
minimum1
Optional<Long> rerankTopN

Number of reranked nodes for returning.

maximum100
minimum1
Optional<RetrievalMode> retrievalMode

The retrieval mode for the query.

One of the following:
CHUNKS("chunks")
FILES_VIA_METADATA("files_via_metadata")
FILES_VIA_CONTENT("files_via_content")
AUTO_ROUTED("auto_routed")
Deprecated
Optional<Boolean> retrieveImageNodes

Whether to retrieve image nodes.

Optional<Boolean> retrievePageFigureNodes

Whether to retrieve page figure nodes.

Optional<Boolean> retrievePageScreenshotNodes

Whether to retrieve page screenshot nodes.

Optional<MetadataFilters> searchFilters

Metadata filters for vector stores.

List<Filter> filters
One of the following:
class MetadataFilter:

Comprehensive metadata filter for vector stores to support more operators.

Value uses strict types because int, float, and str are compatible types and were previously all converted to string.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

String key
Optional<Value> value
One of the following:
double
String
List<String>
List<double>
List<long>
Optional<Operator> operator

Vector store filter operator.

One of the following:
EQUALS("==")
GREATER(">")
LESS("<")
NOT_EQUALS("!=")
GREATER_OR_EQUALS(">=")
LESS_OR_EQUALS("<=")
IN("in")
NIN("nin")
ANY("any")
ALL("all")
TEXT_MATCH("text_match")
TEXT_MATCH_INSENSITIVE("text_match_insensitive")
CONTAINS("contains")
IS_EMPTY("is_empty")
MetadataFilters
Optional<Condition> condition

Vector store filter conditions to combine different filters.

One of the following:
AND("and")
OR("or")
NOT("not")
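To show how a list of filters and a condition fit together, here is a small client-side evaluator over a metadata map. It is a sketch under stated assumptions: FilterSketch and its Filter class are hypothetical stand-ins for the SDK models, only a handful of operators are covered, and the operator semantics (numeric comparison for ">" and "<", list membership for "in") are guesses at the intended behavior rather than documented server logic.

```java
import java.util.List;
import java.util.Map;

// Hypothetical evaluator, not part of the SDK. Operator semantics are assumed.
final class FilterSketch {
    // Minimal stand-in for a MetadataFilter: key, operator symbol, value.
    static final class Filter {
        final String key;
        final String operator;
        final Object value;

        Filter(String key, String operator, Object value) {
            this.key = key;
            this.operator = operator;
            this.value = value;
        }
    }

    private static boolean bothNumbers(Object a, Object b) {
        return a instanceof Number && b instanceof Number;
    }

    // Evaluates a single filter against one metadata map.
    static boolean matches(Filter f, Map<String, Object> metadata) {
        Object actual = metadata.get(f.key);
        switch (f.operator) {
            case "==":
                return actual != null && actual.equals(f.value);
            case "!=":
                return actual == null || !actual.equals(f.value);
            case ">":
                return bothNumbers(actual, f.value)
                        && ((Number) actual).doubleValue() > ((Number) f.value).doubleValue();
            case "<":
                return bothNumbers(actual, f.value)
                        && ((Number) actual).doubleValue() < ((Number) f.value).doubleValue();
            case "in":
                return actual != null && f.value instanceof List
                        && ((List<?>) f.value).contains(actual);
            default:
                throw new UnsupportedOperationException("Not covered by this sketch: " + f.operator);
        }
    }

    // Combines filters with the "and" or "or" condition; "not" is left out here.
    static boolean matchesAll(List<Filter> filters, String condition, Map<String, Object> metadata) {
        if ("and".equals(condition)) {
            return filters.stream().allMatch(f -> matches(f, metadata));
        }
        if ("or".equals(condition)) {
            return filters.stream().anyMatch(f -> matches(f, metadata));
        }
        throw new UnsupportedOperationException("Not covered by this sketch: " + condition);
    }
}
```

Note that "==" relies on Object.equals here, so an Integer never equals a Double of the same magnitude, which mirrors the strict-typing remark above only loosely.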
Optional<SearchFiltersInferenceSchema> searchFiltersInferenceSchema

JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<Long> sparseSimilarityTopK

Number of nodes for sparse retrieval.

maximum100
minimum1
enum RetrievalMode:
CHUNKS("chunks")
FILES_VIA_METADATA("files_via_metadata")
FILES_VIA_CONTENT("files_via_content")
AUTO_ROUTED("auto_routed")
class SparseModelConfig:

Configuration for sparse embedding models used in hybrid search.

This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.

Optional<String> className
Optional<ModelType> modelType

The sparse model type to use. ‘bm25’ uses Qdrant’s FastEmbed BM25 model (default for new pipelines), ‘splade’ uses HuggingFace Splade model, ‘auto’ selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).

One of the following:
SPLADE("splade")
BM25("bm25")
AUTO("auto")
class VertexAiEmbeddingConfig:
Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the Vertex.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
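The effect of embedBatchSize can be pictured as splitting the texts to embed into consecutive groups of at most that many items. The sketch below assumes exactly that behavior; EmbedBatcher is a hypothetical helper, and how the service actually schedules embedding calls is not described in this reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SDK.
final class EmbedBatcher {
    // Splits items into consecutive batches of at most embedBatchSize elements.
    // The bounds check mirrors the documented constraints (> 0, <= 2048).
    static <T> List<List<T>> partition(List<T> items, int embedBatchSize) {
        if (embedBatchSize <= 0 || embedBatchSize > 2048) {
            throw new IllegalArgumentException("embedBatchSize must be in (0, 2048]");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += embedBatchSize) {
            int end = Math.min(i + embedBatchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }
}
```

Five items with a batch size of 2 yield three batches of sizes 2, 2, and 1.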
Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:
DEFAULT("default")
CLASSIFICATION("classification")
CLUSTERING("clustering")
SIMILARITY("similarity")
RETRIEVAL("retrieval")
Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

class VertexTextEmbedding:
Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the Vertex.

Optional<String> className
Optional<Long> embedBatchSize

The batch size for embedding calls.

maximum2048
exclusiveMinimum0
Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:
DEFAULT("default")
CLASSIFICATION("classification")
CLUSTERING("clustering")
SIMILARITY("similarity")
RETRIEVAL("retrieval")
Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Pipelines Sync

Sync Pipeline
Pipeline pipelines().sync().create(SyncCreateParams params = SyncCreateParams.none(), RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/pipelines/{pipeline_id}/sync
Cancel Pipeline Sync
Pipeline pipelines().sync().cancel(SyncCancelParams params = SyncCancelParams.none(), RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/pipelines/{pipeline_id}/sync/cancel
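For readers who want to see what the sync endpoint looks like at the HTTP level, the sketch below builds (but does not send) the corresponding request with the JDK's java.net.http client. The base URL is a placeholder supplied by the caller, authentication headers are deliberately omitted because they are not described in this section, and SyncRequestSketch is a hypothetical name. In practice the SDK method listed above would normally be used instead.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the SDK.
final class SyncRequestSketch {
    // Builds a POST to /api/v1/pipelines/{pipeline_id}/sync with an empty body.
    // pipelineId is inserted as-is, so it is assumed to be URL-safe (e.g. a UUID).
    static HttpRequest syncRequest(String baseUrl, String pipelineId) {
        URI uri = URI.create(baseUrl + "/api/v1/pipelines/" + pipelineId + "/sync");
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```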

Pipelines Data Sources

List Pipeline Data Sources
List<PipelineDataSource> pipelines().dataSources().getDataSources(DataSourceGetDataSourcesParams params = DataSourceGetDataSourcesParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/data-sources
Add Data Sources To Pipeline
List<PipelineDataSource> pipelines().dataSources().updateDataSources(DataSourceUpdateDataSourcesParams params, RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines/{pipeline_id}/data-sources
Update Pipeline Data Source
PipelineDataSource pipelines().dataSources().update(DataSourceUpdateParams params, RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}
Get Pipeline Data Source Status
ManagedIngestionStatusResponse pipelines().dataSources().getStatus(DataSourceGetStatusParams params, RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/status
Sync Pipeline Data Source
Pipeline pipelines().dataSources().sync(DataSourceSyncParams params, RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/sync
Models
class PipelineDataSource:

Schema for a data source in a pipeline.

String id

Unique identifier

formatuuid
Component component

Component that implements the data source

One of the following:
class UnionMember0:
class CloudS3DataSource:
String bucket

The name of the S3 bucket to read from.

Optional<String> awsAccessId

The AWS access ID to use for authentication.

Optional<String> awsAccessSecret

The AWS access secret to use for authentication.

formatpassword
Optional<String> className
Optional<String> prefix

The prefix of the S3 objects to read from.

Optional<String> regexPattern

The regex pattern to filter S3 objects. Must be a valid regex pattern.

Optional<String> s3EndpointUrl

The S3 endpoint URL to use for authentication.

Optional<Boolean> supportsAccessControl
class CloudAzStorageBlobDataSource:
String accountUrl

The Azure Storage Blob account URL to use for authentication.

String containerName

The name of the Azure Storage Blob container to read from.

Optional<String> accountKey

The Azure Storage Blob account key to use for authentication.

formatpassword
Optional<String> accountName

The Azure Storage Blob account name to use for authentication.

Optional<String> blob

The blob name to read from.

Optional<String> className
Optional<String> clientId

The Azure AD client ID to use for authentication.

Optional<String> clientSecret

The Azure AD client secret to use for authentication.

formatpassword
Optional<String> prefix

The prefix of the Azure Storage Blob objects to read from.

Optional<Boolean> supportsAccessControl
Optional<String> tenantId

The Azure AD tenant ID to use for authentication.

class CloudGoogleDriveDataSource:
String folderId

The ID of the Google Drive folder to read from.

Optional<String> className
Optional<ServiceAccountKey> serviceAccountKey

A dictionary containing secret values

Optional<Boolean> supportsAccessControl
class CloudOneDriveDataSource:
String clientId

The client ID to use for authentication.

String clientSecret

The client secret to use for authentication.

formatpassword
String tenantId

The tenant ID to use for authentication.

String userPrincipalName

The user principal name to use for authentication.

Optional<String> className
Optional<String> folderId

The ID of the OneDrive folder to read from.

Optional<String> folderPath

The path of the OneDrive folder to read from.

Optional<List<String>> requiredExts

The list of required file extensions.

Optional<SupportsAccessControl> supportsAccessControl
class CloudSharepointDataSource:
String clientId

The client ID to use for authentication.

String clientSecret

The client secret to use for authentication.

formatpassword
String tenantId

The tenant ID to use for authentication.

Optional<String> className
Optional<String> driveName

The name of the SharePoint drive to read from.

Optional<List<String>> excludePathPatterns

List of regex patterns for file paths to exclude. Files whose paths (including filename) match any pattern will be excluded. Example: ['/temp/', '/backup/', '\.git/', '\.tmp$', '^~']

Optional<String> folderId

The ID of the SharePoint folder to read from.

Optional<String> folderPath

The path of the SharePoint folder to read from.

Optional<Boolean> getPermissions

Whether to get permissions for the SharePoint site.

Optional<List<String>> includePathPatterns

List of regex patterns for file paths to include. Full paths (including filename) must match at least one pattern to be included. Example: ['/reports/', '/docs/.*\.pdf$', '^Report.*\.pdf$']
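The include and exclude rules described for includePathPatterns and excludePathPatterns can be sketched with java.util.regex. Several details below are assumptions, not documented behavior: patterns are applied as a search anywhere in the path (Matcher.find) rather than a full match, an empty include list admits every path, and an exclude match wins over an include match. PathPatternFilter is a hypothetical helper.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SDK.
final class PathPatternFilter {
    private static boolean anyMatch(String path, List<String> patterns) {
        // find() searches for the pattern anywhere in the path.
        return patterns.stream().anyMatch(p -> Pattern.compile(p).matcher(path).find());
    }

    // A path is kept when it matches at least one include pattern (or no
    // include patterns are given) and matches no exclude pattern.
    static boolean isIncluded(String path, List<String> includePatterns, List<String> excludePatterns) {
        boolean included = includePatterns.isEmpty() || anyMatch(path, includePatterns);
        boolean excluded = anyMatch(path, excludePatterns);
        return included && !excluded;
    }
}
```

Under these assumptions, a path such as /reports/q1.pdf passes an include list of ["/reports/"], while /reports/q1.tmp is rejected when the exclude list contains a pattern for the .tmp suffix.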

Optional<List<String>> requiredExts

The list of required file extensions.

Optional<String> siteId

The ID of the SharePoint site to download from.

Optional<String> siteName

The name of the SharePoint site to download from.

Optional<SupportsAccessControl> supportsAccessControl
class CloudSlackDataSource:
String slackToken

Slack Bot Token.

formatpassword
Optional<String> channelIds

Slack Channel.

Optional<String> channelPatterns

Slack Channel name pattern.

Optional<String> className
Optional<String> earliestDate

Earliest date.

Optional<Double> earliestDateTimestamp

Earliest date timestamp.

Optional<String> latestDate

Latest date.

Optional<Double> latestDateTimestamp

Latest date timestamp.

Optional<Boolean> supportsAccessControl
class CloudNotionPageDataSource:
String integrationToken

The integration token to use for authentication.

formatpassword
Optional<String> className
Optional<String> databaseIds

The Notion database ID to read content from.

Optional<String> pageIds

The Notion page IDs to read from.

Optional<Boolean> supportsAccessControl
class CloudConfluenceDataSource:
String authenticationMechanism

Type of Authentication for connecting to Confluence APIs.

String serverUrl

The server URL of the Confluence instance.

Optional<String> apiToken

The API token to use for authentication.

formatpassword
Optional<String> className
Optional<String> cql

The CQL query to use for fetching pages.

Optional<FailureHandlingConfig> failureHandling

Configuration for handling failures during processing. Key-value object controlling failure handling behaviors.

Example: { "skip_list_failures": true }

Currently supports:

  • skip_list_failures: Skip failed batches/lists and continue processing
Optional<Boolean> skipListFailures

Whether to skip failed batches/lists and continue processing

Optional<Boolean> indexRestrictedPages

Whether to index restricted pages.

Optional<Boolean> keepMarkdownFormat

Whether to keep the markdown format.

Optional<String> label

The label to use for fetching pages.

Optional<String> pageIds

The Confluence page IDs to read from.

Optional<String> spaceKey

The space key to read from.

Optional<Boolean> supportsAccessControl
Optional<String> userName

The username to use for authentication.

class CloudJiraDataSource:

Cloud Jira Data Source integrating JiraReader.

String authenticationMechanism

Type of Authentication for connecting to Jira APIs.

String query

JQL (Jira Query Language) query to search.

Optional<String> apiToken

The API/access token used for Basic, PAT, and OAuth2 authentication.

formatpassword
Optional<String> className
Optional<String> cloudId

The cloud ID, used in case of OAuth2.

Optional<String> email

The email address to use for authentication.

Optional<String> serverUrl

The server URL for Jira Cloud.

Optional<Boolean> supportsAccessControl
class CloudJiraDataSourceV2:

Cloud Jira Data Source integrating JiraReaderV2.

String authenticationMechanism

Type of Authentication for connecting to Jira APIs.

String query

JQL (Jira Query Language) query to search.

String serverUrl

The server URL for Jira Cloud.

Optional<String> apiToken

The API access token used for Basic, PAT, and OAuth2 authentication.

formatpassword
Optional<ApiVersion> apiVersion

Jira REST API version to use (2 or 3). 3 supports Atlassian Document Format (ADF).

One of the following:
_2("2")
_3("3")
Optional<String> className
Optional<String> cloudId

The cloud ID, used in case of OAuth2.

Optional<String> email

The email address to use for authentication.

Optional<String> expand

Fields to expand in the response.

Optional<List<String>> fields

List of fields to retrieve from Jira. If omitted, retrieves all fields.

Optional<Boolean> getPermissions

Whether to fetch project role permissions and issue-level security

Optional<Long> requestsPerMinute

Rate limit for Jira API requests per minute.
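A requestsPerMinute limit translates into a minimum spacing between consecutive requests if the requests are spread evenly. The sketch below computes that spacing; even spacing is an assumption (the reader may instead allow bursts within each minute), and RateLimitSketch is a hypothetical helper.

```java
import java.time.Duration;

// Hypothetical helper, not part of the SDK.
final class RateLimitSketch {
    // Minimum gap between requests when requestsPerMinute calls are
    // distributed evenly over one minute.
    static Duration minInterval(long requestsPerMinute) {
        if (requestsPerMinute <= 0) {
            throw new IllegalArgumentException("requestsPerMinute must be > 0");
        }
        return Duration.ofMinutes(1).dividedBy(requestsPerMinute);
    }
}
```

For instance, a limit of 120 requests per minute corresponds to one request every 500 milliseconds.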

Optional<Boolean> supportsAccessControl
class CloudBoxDataSource:
AuthenticationMechanism authenticationMechanism

The type of authentication to use (Developer Token or CCG)

One of the following:
DEVELOPER_TOKEN("developer_token")
CCG("ccg")
Optional<String> className
Optional<String> clientId

Box API key used for identifying the application the user is authenticating with

Optional<String> clientSecret

Box API secret used for making auth requests.

formatpassword
Optional<String> developerToken

Developer token for authentication if authentication_mechanism is ‘developer_token’.

formatpassword
Optional<String> enterpriseId

Box Enterprise ID. If provided, authenticates as a service.

Optional<String> folderId

The ID of the Box folder to read from.

Optional<Boolean> supportsAccessControl
Optional<String> userId

Box User ID. If provided, authenticates as a user.

String dataSourceId

The ID of the data source.

formatuuid
LocalDateTime lastSyncedAt

The last time the data source was automatically synced.

formatdate-time
String name

The name of the data source.

String pipelineId

The ID of the pipeline.

formatuuid
String projectId
SourceType sourceType
One of the following:
S3("S3")
AZURE_STORAGE_BLOB("AZURE_STORAGE_BLOB")
GOOGLE_DRIVE("GOOGLE_DRIVE")
MICROSOFT_ONEDRIVE("MICROSOFT_ONEDRIVE")
MICROSOFT_SHAREPOINT("MICROSOFT_SHAREPOINT")
SLACK("SLACK")
NOTION_PAGE("NOTION_PAGE")
CONFLUENCE("CONFLUENCE")
JIRA("JIRA")
JIRA_V2("JIRA_V2")
BOX("BOX")
Optional<LocalDateTime> createdAt

Creation datetime

formatdate-time
Optional<CustomMetadata> customMetadata

Custom metadata that will be present on all data loaded from the data source

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<Status> status

The status of the data source in the pipeline.

One of the following:
NOT_STARTED("NOT_STARTED")
IN_PROGRESS("IN_PROGRESS")
SUCCESS("SUCCESS")
ERROR("ERROR")
CANCELLED("CANCELLED")
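A common way to consume these status values is to poll until the data source leaves the NOT_STARTED and IN_PROGRESS states. The sketch below assumes that SUCCESS, ERROR, and CANCELLED are terminal, which the reference does not state explicitly. StatusPoller and the fetchStatus supplier are hypothetical; in real code the supplier would wrap a call such as the getStatus method listed earlier.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the SDK.
final class StatusPoller {
    // Assumption: these three values mean no further progress will happen.
    static boolean isTerminal(String status) {
        switch (status) {
            case "SUCCESS":
            case "ERROR":
            case "CANCELLED":
                return true;
            case "NOT_STARTED":
            case "IN_PROGRESS":
                return false;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }

    // Fetches the status up to maxAttempts times, sleeping between attempts,
    // and returns the last status seen (terminal or not).
    static String pollUntilTerminal(Supplier<String> fetchStatus, int maxAttempts, long sleepMillis) {
        String status = fetchStatus.get();
        for (int attempt = 1; attempt < maxAttempts && !isTerminal(status); attempt++) {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return status;
            }
            status = fetchStatus.get();
        }
        return status;
    }
}
```

Callers should check the returned value with isTerminal, since the loop also stops when maxAttempts is exhausted.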
Optional<LocalDateTime> statusUpdatedAt

The last time the status was updated.

formatdate-time
Optional<Double> syncInterval

The interval at which the data source should be synced.

Optional<String> syncScheduleSetBy

The id of the user who set the sync schedule.

Optional<LocalDateTime> updatedAt

Update datetime

formatdate-time
Optional<DataSourceReaderVersionMetadata> versionMetadata

Version metadata for the data source

Optional<ReaderVersion> readerVersion

The version of the reader to use for this data source.

One of the following:
_1_0("1.0")
_2_0("2.0")
_2_1("2.1")

Pipelines Images

List File Page Screenshots
List<ImageListPageScreenshotsResponse> pipelines().images().listPageScreenshots(ImageListPageScreenshotsParams params = ImageListPageScreenshotsParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/files/{id}/page_screenshots
Get File Page Screenshot
JsonValue pipelines().images().getPageScreenshot(ImageGetPageScreenshotParams params, RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/files/{id}/page_screenshots/{page_index}
Get File Page Figure
JsonValue pipelines().images().getPageFigure(ImageGetPageFigureParams params, RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/files/{id}/page-figures/{page_index}/{figure_name}
List File Pages Figures
List<ImageListPageFiguresResponse> pipelines().images().listPageFigures(ImageListPageFiguresParams params = ImageListPageFiguresParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/files/{id}/page-figures

Pipelines Files

Get Pipeline File Status Counts
FileGetStatusCountsResponse pipelines().files().getStatusCounts(FileGetStatusCountsParams params = FileGetStatusCountsParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/files/status-counts
Get Pipeline File Status
ManagedIngestionStatusResponse pipelines().files().getStatus(FileGetStatusParams params, RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/files/{file_id}/status
Add Files To Pipeline Api
List<PipelineFile> pipelines().files().create(FileCreateParams params, RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines/{pipeline_id}/files
Update Pipeline File
PipelineFile pipelines().files().update(FileUpdateParams params, RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines/{pipeline_id}/files/{file_id}
Delete Pipeline File
pipelines().files().delete(FileDeleteParams params, RequestOptions requestOptions = RequestOptions.none())
DELETE /api/v1/pipelines/{pipeline_id}/files/{file_id}
List Pipeline Files2
Deprecated
FileListPage pipelines().files().list(FileListParams params = FileListParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/files2
Models
class PipelineFile:

A file associated with a pipeline.

String id

Unique identifier for the pipeline file.

formatuuid
String pipelineId

The ID of the pipeline that the file is associated with.

formatuuid
Optional<ConfigHash> configHash

Hashes for the configuration of the pipeline.

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<LocalDateTime> createdAt

When the pipeline file was created.

formatdate-time
Optional<CustomMetadata> customMetadata

Custom metadata for the file.

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<String> dataSourceId

The ID of the data source that the file belongs to.

formatuuid
Optional<String> externalFileId

The ID of the file in the external system.

Optional<String> fileId

The ID of the file.

formatuuid
Optional<Long> fileSize

Size of the file in bytes.

Optional<String> fileType

File type (e.g. pdf, docx, etc.).

Optional<Long> indexedPageCount

The number of pages that have been indexed for this file.

Optional<LocalDateTime> lastModifiedAt

The last modified time of the file.

formatdate-time
Optional<String> name

Name of the file.

Optional<PermissionInfo> permissionInfo

Permission information for the file.

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<String> projectId

The ID of the project that the file belongs to.

formatuuid
Optional<ResourceInfo> resourceInfo

Resource information for the file.

One of the following:
class UnionMember0:
List<JsonValue>
String
double
boolean
Optional<Status> status

Status of the pipeline file.

One of the following:
NOT_STARTED("NOT_STARTED")
IN_PROGRESS("IN_PROGRESS")
SUCCESS("SUCCESS")
ERROR("ERROR")
CANCELLED("CANCELLED")
Optional<LocalDateTime> statusUpdatedAt

The last time the status was updated.

formatdate-time
Optional<LocalDateTime> updatedAt

When the pipeline file was last updated.

formatdate-time

Pipelines Metadata

Import Pipeline Metadata
MetadataCreateResponse pipelines().metadata().create(MetadataCreateParams params, RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines/{pipeline_id}/metadata
Delete Pipeline Files Metadata
pipelines().metadata().deleteAll(MetadataDeleteAllParams params = MetadataDeleteAllParams.none(), RequestOptions requestOptions = RequestOptions.none())
DELETE /api/v1/pipelines/{pipeline_id}/metadata

Pipelines Documents

Create Batch Pipeline Documents
List<CloudDocument> pipelines().documents().create(DocumentCreateParams params, RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/pipelines/{pipeline_id}/documents
Paginated List Pipeline Documents
DocumentListPage pipelines().documents().list(DocumentListParams params = DocumentListParams.none(), RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/documents/paginated
Get Pipeline Document
CloudDocument pipelines().documents().get(DocumentGetParams params, RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/documents/{document_id}
Delete Pipeline Document
pipelines().documents().delete(DocumentDeleteParams params, RequestOptions requestOptions = RequestOptions.none())
DELETE /api/v1/pipelines/{pipeline_id}/documents/{document_id}
Get Pipeline Document Status
ManagedIngestionStatusResponse pipelines().documents().getStatus(DocumentGetStatusParams params, RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/documents/{document_id}/status
Sync Pipeline Document
JsonValue pipelines().documents().sync(DocumentSyncParams params, RequestOptions requestOptions = RequestOptions.none())
POST /api/v1/pipelines/{pipeline_id}/documents/{document_id}/sync
List Pipeline Document Chunks
List<TextNode> pipelines().documents().getChunks(DocumentGetChunksParams params, RequestOptions requestOptions = RequestOptions.none())
GET /api/v1/pipelines/{pipeline_id}/documents/{document_id}/chunks
Upsert Batch Pipeline Documents
List<CloudDocument> pipelines().documents().upsert(DocumentUpsertParams params, RequestOptions requestOptions = RequestOptions.none())
PUT /api/v1/pipelines/{pipeline_id}/documents
Models
class CloudDocument:

Cloud document stored in S3.

String id
Metadata metadata
String text
Optional<List<String>> excludedEmbedMetadataKeys
Optional<List<String>> excludedLlmMetadataKeys
Optional<List<Long>> pagePositions

Indices in CloudDocument.text where a new page begins. For example, the second page starts at the index given by page_positions[1].

Optional<StatusMetadata> statusMetadata
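The pagePositions field can be used to cut a document's text back into pages. The sketch below assumes the positions are sorted in ascending order, that the first entry marks the start of the first page, and that the indices line up with Java's String (UTF-16) offsets; the last assumption may not hold for text containing characters outside the Basic Multilingual Plane. PageSplitter is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SDK.
final class PageSplitter {
    // Page i spans from pagePositions[i] up to pagePositions[i + 1],
    // or to the end of the text for the last page.
    static List<String> pages(String text, List<Long> pagePositions) {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < pagePositions.size(); i++) {
            int start = Math.toIntExact(pagePositions.get(i));
            int end = (i + 1 < pagePositions.size())
                    ? Math.toIntExact(pagePositions.get(i + 1))
                    : text.length();
            pages.add(text.substring(start, end));
        }
        return pages;
    }
}
```

Given the text "page1page2!" and positions [0, 5], this returns the two pages "page1" and "page2!".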
class CloudDocumentCreate:

Create a new cloud document.

Metadata metadata
String text
Optional<String> id
Optional<List<String>> excludedEmbedMetadataKeys
Optional<List<String>> excludedLlmMetadataKeys
Optional<List<Long>> pagePositions

Indices in CloudDocument.text where a new page begins. For example, the second page starts at the index given by page_positions[1].

class TextNode:

Provided for backward compatibility.

Optional<String> className
Optional<List<Double>> embedding

Embedding of the node.

Optional<Long> endCharIdx

End char index of the node.

Optional<List<String>> excludedEmbedMetadataKeys

Metadata keys that are excluded from text for the embed model.

Optional<List<String>> excludedLlmMetadataKeys

Metadata keys that are excluded from text for the LLM.

Optional<ExtraInfo> extraInfo

A flat dictionary of metadata fields

Optional<String> id

Unique ID of the node.

Optional<String> metadataSeperator

Separator between metadata fields when converting to string.

Optional<String> metadataTemplate

Template for how metadata is formatted, with {key} and {value} placeholders.

Optional<String> mimetype

MIME type of the node content.

Optional<Relationships> relationships

A mapping of relationships to other node information.

One of the following:
class RelatedNodeInfo:
String nodeId
Optional<String> className
Optional<String> hash
Optional<Metadata> metadata
Optional<NodeType> nodeType
One of the following:
_1("1")
_2("2")
_3("3")
_4("4")
_5("5")
List<RelatedNodeInfo>
String nodeId
Optional<String> className
Optional<String> hash
Optional<Metadata> metadata
Optional<NodeType> nodeType
One of the following:
_1("1")
_2("2")
_3("3")
_4("4")
_5("5")
Optional<Long> startCharIdx

Start char index of the node.

Optional<String> text

Text content of the node.

Optional<String> textTemplate

Template for how text is formatted, with {content} and {metadata_str} placeholders.
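To illustrate how metadataTemplate, metadataSeperator, and textTemplate might work together, the sketch below renders each metadata entry with the {key} and {value} placeholders, joins the entries with the separator, and then fills {metadata_str} and {content} in the text template. The rendering order and the plain string substitution are assumptions, the template strings in the usage note are illustrative rather than documented defaults, and NodeTextFormatter is a hypothetical helper.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the SDK.
final class NodeTextFormatter {
    // Simple placeholder substitution; a key or value that itself contains a
    // placeholder string would be substituted again, which this sketch ignores.
    static String format(String content, Map<String, String> metadata,
                         String metadataTemplate, String metadataSeparator, String textTemplate) {
        String metadataStr = metadata.entrySet().stream()
                .map(e -> metadataTemplate
                        .replace("{key}", e.getKey())
                        .replace("{value}", e.getValue()))
                .collect(Collectors.joining(metadataSeparator));
        return textTemplate
                .replace("{metadata_str}", metadataStr)
                .replace("{content}", content);
    }
}
```

With an insertion-ordered map holding title=Report and page=3, a metadata template of "{key}: {value}", a newline separator, and a text template of "{metadata_str}\n\n{content}", the content "Body" is rendered as the two metadata lines, a blank line, and then Body.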