Pipelines
Search Pipelines
Create Pipeline
Get Pipeline
Update Existing Pipeline
Delete Pipeline
Get Pipeline Status
Upsert Pipeline
Run Search
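The endpoints above are plain HTTP calls. Below is a minimal TypeScript sketch of two of them, assuming the base URL https://api.cloud.llamaindex.ai/api/v1, bearer-token authentication, and the path shapes shown in the comments; confirm all of these against your deployment before relying on them.

```ts
// Minimal sketch of calling the pipeline endpoints with fetch (Node 18+).
// The base URL, paths, and bearer-token auth are assumptions; verify them
// against your LlamaCloud environment.
const BASE = "https://api.cloud.llamaindex.ai/api/v1";
const headers = {
  Authorization: `Bearer ${process.env.LLAMA_CLOUD_API_KEY}`,
  "Content-Type": "application/json",
};

// Create Pipeline: POST a PipelineCreate body (see Models below).
async function createPipeline(body: Record<string, unknown>) {
  const res = await fetch(`${BASE}/pipelines`, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Create Pipeline failed: ${res.status}`);
  return res.json(); // a Pipeline object
}

// Get Pipeline Status: returns a ManagedIngestionStatusResponse.
async function getPipelineStatus(pipelineId: string) {
  const res = await fetch(`${BASE}/pipelines/${pipelineId}/status`, { headers });
  if (!res.ok) throw new Error(`Get Pipeline Status failed: ${res.status}`);
  return res.json();
}
```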
Models
AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
chunking_config?: NoneChunkingConfig { mode } | CharacterChunkingConfig { chunk_overlap, chunk_size, mode } | TokenChunkingConfig { chunk_overlap, chunk_size, mode, separator } | 2 more
Configuration for the chunking.
NoneChunkingConfig { mode }
CharacterChunkingConfig { chunk_overlap, chunk_size, mode }
TokenChunkingConfig { chunk_overlap, chunk_size, mode, separator }
SentenceChunkingConfig { chunk_overlap, chunk_size, mode, 2 more }
SemanticChunkingConfig { breakpoint_percentile_threshold, buffer_size, mode }
segmentation_config?: NoneSegmentationConfig { mode } | PageSegmentationConfig { mode, page_separator } | ElementSegmentationConfig { mode }
Configuration for the segmentation.
NoneSegmentationConfig { mode }
PageSegmentationConfig { mode, page_separator }
ElementSegmentationConfig { mode }
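As a reference point, an AdvancedModeTransformConfig can be written as a plain object that pairs one chunking config with one segmentation config. The mode strings below are illustrative assumptions; use whatever discriminator values your API version accepts.

```ts
// Illustrative AdvancedModeTransformConfig. The "advanced", "sentence", and
// "page" mode strings are assumptions, not confirmed enum values.
const transformConfig = {
  mode: "advanced",
  segmentation_config: {
    mode: "page",             // PageSegmentationConfig
    page_separator: "\n---\n",
  },
  chunking_config: {
    mode: "sentence",         // SentenceChunkingConfig
    chunk_size: 1024,
    chunk_overlap: 200,
  },
};
```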
AutoTransformConfig { chunk_overlap, chunk_size, mode }
chunk_overlap?: number
Chunk overlap for the transformation.
chunk_size?: number
Chunk size for the transformation.
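AutoTransformConfig is the simpler alternative: only chunk size and overlap are tuned. A sketch, assuming "auto" is the mode discriminator:

```ts
// Illustrative AutoTransformConfig; the "auto" mode value is an assumption.
const autoTransformConfig = {
  mode: "auto",
  chunk_size: 1024,   // size of each chunk
  chunk_overlap: 100, // overlap shared between consecutive chunks
};
```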
AzureOpenAIEmbedding { additional_kwargs, api_base, api_key, 12 more }
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string
The base URL for Azure deployment.
api_key?: string | null
The OpenAI API key.
api_version?: string
The version for Azure OpenAI API.
azure_deployment?: string | null
The Azure deployment to use.
azure_endpoint?: string | null
The Azure endpoint to use.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
AzureOpenAIEmbeddingConfig { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string
The base URL for Azure deployment.
api_key?: string | null
The OpenAI API key.
api_version?: string
The version for Azure OpenAI API.
azure_deployment?: string | null
The Azure deployment to use.
azure_endpoint?: string | null
The Azure endpoint to use.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "AZURE_EMBEDDING"
Type of the embedding model.
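For orientation, an AzureOpenAIEmbeddingConfig wraps an AzureOpenAIEmbedding component under the AZURE_EMBEDDING type. The sketch below uses placeholder endpoint, deployment, and model names.

```ts
// Illustrative AzureOpenAIEmbeddingConfig; all values are placeholders.
const azureEmbeddingConfig = {
  type: "AZURE_EMBEDDING",
  component: {
    azure_endpoint: "https://my-resource.openai.azure.com",
    azure_deployment: "my-embedding-deployment",
    api_version: "2024-02-01",
    api_key: process.env.AZURE_OPENAI_API_KEY,
    model_name: "text-embedding-3-small",
    embed_batch_size: 10,
    max_retries: 3,
  },
};
```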
BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
additional_kwargs?: Record<string, unknown>
Additional kwargs for the bedrock client.
aws_access_key_id?: string | null
AWS Access Key ID to use
aws_secret_access_key?: string | null
AWS Secret Access Key to use
aws_session_token?: string | null
AWS Session Token to use
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
The maximum number of API retries.
model_name?: string
The modelId of the Bedrock model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
profile_name?: string | null
The name of the AWS profile to use. If not given, the default profile is used.
region_name?: string | null
The AWS region name to use. Uses the region configured in the AWS CLI if not passed.
timeout?: number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
BedrockEmbeddingConfig { component, type }
component?: BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
Configuration for the Bedrock embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the bedrock client.
aws_access_key_id?: string | null
AWS Access Key ID to use
aws_secret_access_key?: string | null
AWS Secret Access Key to use
aws_session_token?: string | null
AWS Session Token to use
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
The maximum number of API retries.
model_name?: string
The modelId of the Bedrock model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
profile_name?: string | null
The name of the AWS profile to use. If not given, the default profile is used.
region_name?: string | null
The AWS region name to use. Uses the region configured in the AWS CLI if not passed.
timeout?: number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type?: "BEDROCK_EMBEDDING"
Type of the embedding model.
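A comparable sketch for BedrockEmbeddingConfig; the model ID and region are placeholders, and the explicit credentials can be omitted if an AWS profile or instance role is used instead.

```ts
// Illustrative BedrockEmbeddingConfig; model_name and region_name are placeholders.
const bedrockEmbeddingConfig = {
  type: "BEDROCK_EMBEDDING",
  component: {
    model_name: "amazon.titan-embed-text-v2:0",
    region_name: "us-east-1",
    aws_access_key_id: process.env.AWS_ACCESS_KEY_ID,
    aws_secret_access_key: process.env.AWS_SECRET_ACCESS_KEY,
    timeout: 60, // seconds; applied to both connect and read timeouts
  },
};
```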
CohereEmbedding { api_key, class_name, embed_batch_size, 5 more }
api_key: string | null
The Cohere API key.
embed_batch_size?: number
The batch size for embedding calls.
embedding_type?: string
Embedding type. If not provided, the float embedding_type is used when needed.
input_type?: string | null
Model Input type. If not provided, search_document and search_query are used when needed.
model_name?: string
The modelId of the Cohere model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
truncate?: string
Truncation type: START, END, or NONE.
CohereEmbeddingConfig { component, type }
Configuration for the Cohere embedding model.
api_key: string | null
The Cohere API key.
embed_batch_size?: number
The batch size for embedding calls.
embedding_type?: string
Embedding type. If not provided, the float embedding_type is used when needed.
input_type?: string | null
Model Input type. If not provided, search_document and search_query are used when needed.
model_name?: string
The modelId of the Cohere model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
truncate?: string
Truncation type: START, END, or NONE.
type?: "COHERE_EMBEDDING"
Type of the embedding model.
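And for CohereEmbeddingConfig; the model name and input type are placeholders chosen for a document-ingestion use case.

```ts
// Illustrative CohereEmbeddingConfig; model_name is a placeholder.
const cohereEmbeddingConfig = {
  type: "COHERE_EMBEDDING",
  component: {
    api_key: process.env.COHERE_API_KEY,
    model_name: "embed-english-v3.0",
    input_type: "search_document", // "search_query" is typically used at query time
    truncate: "END",
  },
};
```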
DataSinkCreate { component, name, sink_type }
Schema for creating a data sink.
component: Record<string, unknown> | CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } | CloudPostgresVectorStore { database, embed_dim, host, 10 more } | 5 more
Component that implements the data sink
CloudPineconeVectorStore { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
api_key (str): API key for authenticating with Pinecone.
index_name (str): Name of the Pinecone index.
namespace (optional[str]): Namespace to use in the Pinecone index.
insert_kwargs (optional[dict]): Additional kwargs to pass during insertion.
api_key: string
The API key for authenticating with Pinecone
CloudPostgresVectorStore { database, embed_dim, host, 10 more }
Cloud Postgres Vector Store. The fields shown below are the HNSW settings for PGVector.
distance_method?: "l2" | "ip" | "cosine" | 3 more
The distance method to use.
ef_construction?: number
The number of edges to use during the construction phase.
ef_search?: number
The number of edges to use during the search phase.
m?: number
The number of bi-directional links created for each new element.
vector_type?: "vector" | "half_vec" | "bit" | "sparse_vec"
The type of vector to use.
CloudQdrantVectorStore { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
collection_name (str): Name of the Qdrant collection.
url (str): URL of the Qdrant instance.
api_key (str): API key for authenticating with Qdrant.
max_retries (int): Maximum number of retries in case of a failure. Defaults to 3.
client_kwargs (dict): Additional kwargs to pass to the Qdrant client.
CloudAzureAISearchVectorStore { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
CloudMongoDBAtlasVectorSearch { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
mongodb_uri (str): URI for connecting to MongoDB Atlas.
db_name (str): Name of the MongoDB database.
collection_name (str): Name of the MongoDB collection.
vector_index_name (str): Name of the MongoDB Atlas vector index.
fulltext_index_name (str): Name of the MongoDB Atlas full-text index.
CloudMilvusVectorStore { uri, token, class_name, 3 more }
Cloud Milvus Vector Store.
CloudAstraDBVectorStore { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
token (str): The Astra DB Application Token to use.
api_endpoint (str): The Astra DB JSON API endpoint for your database.
collection_name (str): Collection name to use. If not existing, it will be created.
embedding_dimension (int): Length of the embedding vectors in use.
keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
token: string
The Astra DB Application Token to use
api_endpoint: string
The Astra DB JSON API endpoint for your database
collection_name: string
Collection name to use. If not existing, it will be created
embedding_dimension: number
Length of the embedding vectors in use
keyspace?: string | null
The keyspace to use. If not provided, 'default_keyspace' is used.
name: string
The name of the data sink.
sink_type: "PINECONE" | "POSTGRES" | "QDRANT" | 4 more
GeminiEmbedding { api_base, api_key, class_name, 6 more }
api_base?: string | null
API base to access the model. Defaults to None.
api_key?: string | null
API key to access the model. Defaults to None.
embed_batch_size?: number
The batch size for embedding calls.
model_name?: string
The modelId of the Gemini model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
task_type?: string | null
The task for embedding model.
title?: string | null
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport?: string | null
Transport to access the model. Defaults to None.
GeminiEmbeddingConfig { component, type }
Configuration for the Gemini embedding model.
api_base?: string | null
API base to access the model. Defaults to None.
api_key?: string | null
API key to access the model. Defaults to None.
embed_batch_size?: number
The batch size for embedding calls.
model_name?: string
The modelId of the Gemini model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
task_type?: string | null
The task for embedding model.
title?: string | null
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport?: string | null
Transport to access the model. Defaults to None.
type?: "GEMINI_EMBEDDING"
Type of the embedding model.
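A sketch of GeminiEmbeddingConfig; the model name and task type are placeholders.

```ts
// Illustrative GeminiEmbeddingConfig; model_name and task_type are placeholders.
const geminiEmbeddingConfig = {
  type: "GEMINI_EMBEDDING",
  component: {
    api_key: process.env.GOOGLE_API_KEY,
    model_name: "models/text-embedding-004",
    task_type: "retrieval_document",
  },
};
```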
HuggingFaceInferenceAPIEmbedding { token, class_name, cookies, 9 more }
token?: string | boolean | null
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies?: Record<string, string> | null
Additional cookies to send to the server.
embed_batch_size?: number
The batch size for embedding calls.
headers?: Record<string, string> | null
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name?: string | null
Hugging Face model name. If None, the task will be used.
num_workers?: number | null
The number of workers to use for async embedding calls.
pooling?: "cls" | "mean" | "last" | null
Enum of possible pooling choices with pooling behaviors.
query_instruction?: string | null
Instruction to prepend during query embedding.
task?: string | null
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction?: string | null
Instruction to prepend during text embedding.
timeout?: number | null
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
HuggingFaceInferenceAPIEmbeddingConfig { component, type }
Configuration for the HuggingFace Inference API embedding model.
token?: string | boolean | null
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies?: Record<string, string> | null
Additional cookies to send to the server.
embed_batch_size?: number
The batch size for embedding calls.
headers?: Record<string, string> | null
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name?: string | null
Hugging Face model name. If None, the task will be used.
num_workers?: number | null
The number of workers to use for async embedding calls.
pooling?: "cls" | "mean" | "last" | null
Enum of possible pooling choices with pooling behaviors.
query_instruction?: string | null
Instruction to prepend during query embedding.
task?: string | null
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction?: string | null
Instruction to prepend during text embedding.
timeout?: number | null
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type?: "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
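A sketch of HuggingFaceInferenceAPIEmbeddingConfig; the model name and query instruction are placeholders.

```ts
// Illustrative HuggingFaceInferenceAPIEmbeddingConfig; model_name is a placeholder.
const hfEmbeddingConfig = {
  type: "HUGGINGFACE_API_EMBEDDING",
  component: {
    token: process.env.HF_TOKEN,
    model_name: "BAAI/bge-small-en-v1.5",
    pooling: "cls",
    query_instruction: "Represent this question for retrieving supporting documents:",
  },
};
```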
LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
images_to_save?: Array<"screenshot" | "embedded" | "layout"> | null
The types of page images to save during parsing.
priority?: "low" | "medium" | "high" | "critical" | null
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
webhook_configurations?: Array<WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url } > | null
The outbound webhook configurations
webhook_events?: Array<"extract.pending" | "extract.success" | "extract.error" | 13 more> | null
List of event names to subscribe to
webhook_headers?: Record<string, string> | null
Custom HTTP headers to include with webhook requests.
webhook_output_format?: string | null
The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json
webhook_url?: string | null
The URL to send webhook notifications to.
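Most LlamaParseParameters fields are collapsed above; the webhook settings are worth an example. The sketch below subscribes one endpoint to two of the documented event names; the URL and header values are placeholders.

```ts
// Illustrative LlamaParseParameters fragment showing webhook_configurations.
// The URL and header values are placeholders.
const llamaParseParameters = {
  priority: "high",
  webhook_configurations: [
    {
      webhook_url: "https://example.com/hooks/llamacloud",
      webhook_events: ["extract.success", "extract.error"],
      webhook_headers: { "X-Webhook-Secret": "replace-me" },
      webhook_output_format: "json", // "string" (default) or "json"
    },
  ],
};
```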
LlmParameters { class_name, model_name, system_prompt, 3 more }
model_name?: "GPT_4O" | "GPT_4O_MINI" | "GPT_4_1" | 11 more
The name of the model to use for LLM completions.
system_prompt?: string | null
The system prompt to use for the completion.
temperature?: number | null
The temperature value for the model.
use_chain_of_thought_reasoning?: boolean | null
Whether to use chain of thought reasoning.
use_citation?: boolean | null
Whether to show citations in the response.
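An LlmParameters object is a small bag of completion settings; the sketch below uses one of the documented model names.

```ts
// Illustrative LlmParameters.
const llmParameters = {
  model_name: "GPT_4O_MINI",
  system_prompt: "Answer strictly from the retrieved context.",
  temperature: 0.1,
  use_chain_of_thought_reasoning: false,
  use_citation: true,
};
```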
ManagedIngestionStatusResponse { status, deployment_date, effective_at, 2 more }
status: "NOT_STARTED" | "IN_PROGRESS" | "SUCCESS" | 3 more
Status of the ingestion.
deployment_date?: string | null
Date of the deployment.
effective_at?: string | null
When the status is effective
error?: Array<Error> | null
List of errors that occurred during ingestion.
job_id: string
ID of the job that failed.
message: string
The error message.
step: "MANAGED_INGESTION" | "DATA_SOURCE" | "FILE_UPDATER" | 4 more
The step of the ingestion process that failed.
job_id?: string | null
ID of the latest job.
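ManagedIngestionStatusResponse is what Get Pipeline Status returns, so it lends itself to a polling loop. The sketch below reuses the hypothetical getPipelineStatus helper from the endpoint sketch near the top of this page and only treats the documented NOT_STARTED, IN_PROGRESS, and SUCCESS values specially; any other status is surfaced as an error.

```ts
// Polling sketch over ManagedIngestionStatusResponse; getPipelineStatus is the
// hypothetical helper defined in the endpoint sketch above.
async function waitForIngestion(pipelineId: string, intervalMs = 5000) {
  for (;;) {
    const status = await getPipelineStatus(pipelineId);
    if (status.status === "SUCCESS") return status;
    if (status.status !== "NOT_STARTED" && status.status !== "IN_PROGRESS") {
      // Surface the per-job errors collected during ingestion.
      throw new Error(`Ingestion failed: ${JSON.stringify(status.error ?? status)}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```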
MessageRole = "system" | "developer" | "user" | 5 more
Message role.
MetadataFilters { filters, condition }
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
MetadataFilters { filters, condition }
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
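MetadataFilters nest MetadataFilter entries under a combining condition. The sketch below ANDs two filters; the keys and values are placeholders, while the operator and condition strings come from the enums above.

```ts
// Illustrative MetadataFilters; keys and values are placeholders.
const searchFilters = {
  condition: "and",
  filters: [
    { key: "author", value: "Jane Doe", operator: "==" },
    { key: "year", value: 2023, operator: ">" },
  ],
};
```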
OpenAIEmbedding { additional_kwargs, api_base, api_key, 10 more }
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string | null
The base URL for OpenAI API.
api_key?: string | null
The OpenAI API key.
api_version?: string | null
The version for OpenAI API.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
OpenAIEmbeddingConfig { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string | null
The base URL for OpenAI API.
api_key?: string | null
The OpenAI API key.
api_version?: string | null
The version for OpenAI API.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "OPENAI_EMBEDDING"
Type of the embedding model.
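A sketch of OpenAIEmbeddingConfig; the model name is a placeholder, and dimensions only applies to v3 embedding models.

```ts
// Illustrative OpenAIEmbeddingConfig; model_name is a placeholder.
const openAiEmbeddingConfig = {
  type: "OPENAI_EMBEDDING",
  component: {
    api_key: process.env.OPENAI_API_KEY,
    model_name: "text-embedding-3-small",
    dimensions: 512,       // only honored by v3 embedding models
    embed_batch_size: 100,
  },
};
```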
PageFigureNodeWithScore { node, score, class_name }
Page figure metadata with score
node: Node { confidence, figure_name, figure_size, 4 more }
confidence: number
The confidence of the figure
figure_name: string
The name of the figure
figure_size: number
The size of the figure in bytes
file_id: string
The ID of the file that the figure was taken from
page_index: number
The index of the page for which the figure is taken (0-indexed)
is_likely_noise?: boolean
Whether the figure is likely to be noise
metadata?: Record<string, unknown> | null
Metadata for the figure
score: number
The score of the figure node
PageScreenshotNodeWithScore { node, score, class_name }
Page screenshot metadata with score
node: Node { file_id, image_size, page_index, metadata }
file_id: string
The ID of the file that the page screenshot was taken from
image_size: number
The size of the image in bytes
page_index: number
The index of the page for which the screenshot is taken (0-indexed)
metadata?: Record<string, unknown> | null
Metadata for the screenshot
score: number
The score of the screenshot node
Pipeline { id, embedding_config, name, 15 more }
Schema for a pipeline.
id: string
Unique identifier
embedding_config: ManagedOpenAIEmbeddingConfig { component, type } | AzureOpenAIEmbeddingConfig { component, type } | CohereEmbeddingConfig { component, type } | 5 more
ManagedOpenAIEmbeddingConfig { component, type }
component?: Component { class_name, embed_batch_size, model_name, num_workers }
Configuration for the Managed OpenAI embedding model.
embed_batch_size?: number
The batch size for embedding calls.
model_name?: "openai-text-embedding-3-small"
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
type?: "MANAGED_OPENAI_EMBEDDING"
Type of the embedding model.
AzureOpenAIEmbeddingConfig { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string
The base URL for Azure deployment.
api_key?: string | null
The OpenAI API key.
api_version?: string
The version for Azure OpenAI API.
azure_deployment?: string | null
The Azure deployment to use.
azure_endpoint?: string | null
The Azure endpoint to use.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "AZURE_EMBEDDING"
Type of the embedding model.
CohereEmbeddingConfig { component, type }
Configuration for the Cohere embedding model.
api_key: string | null
The Cohere API key.
embed_batch_size?: number
The batch size for embedding calls.
embedding_type?: string
Embedding type. If not provided, the float embedding_type is used when needed.
input_type?: string | null
Model Input type. If not provided, search_document and search_query are used when needed.
model_name?: string
The modelId of the Cohere model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
truncate?: string
Truncation type: START, END, or NONE.
type?: "COHERE_EMBEDDING"
Type of the embedding model.
GeminiEmbeddingConfig { component, type }
Configuration for the Gemini embedding model.
api_base?: string | null
API base to access the model. Defaults to None.
api_key?: string | null
API key to access the model. Defaults to None.
embed_batch_size?: number
The batch size for embedding calls.
model_name?: string
The modelId of the Gemini model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
task_type?: string | null
The task for embedding model.
title?: string | null
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport?: string | null
Transport to access the model. Defaults to None.
type?: "GEMINI_EMBEDDING"
Type of the embedding model.
HuggingFaceInferenceAPIEmbeddingConfig { component, type }
Configuration for the HuggingFace Inference API embedding model.
token?: string | boolean | null
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies?: Record<string, string> | null
Additional cookies to send to the server.
embed_batch_size?: number
The batch size for embedding calls.
headers?: Record<string, string> | null
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name?: string | null
Hugging Face model name. If None, the task will be used.
num_workers?: number | null
The number of workers to use for async embedding calls.
pooling?: "cls" | "mean" | "last" | null
Enum of possible pooling choices with pooling behaviors.
query_instruction?: string | null
Instruction to prepend during query embedding.
task?: string | null
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction?: string | null
Instruction to prepend during text embedding.
timeout?: number | null
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type?: "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
OpenAIEmbeddingConfig { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string | null
The base URL for OpenAI API.
api_key?: string | null
The OpenAI API key.
api_version?: string | null
The version for OpenAI API.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "OPENAI_EMBEDDING"
Type of the embedding model.
VertexAIEmbeddingConfig { component, type }
Configuration for the VertexAI embedding model.
client_email: string | null
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string | null
The private key for the VertexAI credentials.
private_key_id: string | null
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string | null
The token URI for the VertexAI credentials.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the Vertex.
embed_batch_size?: number
The batch size for embedding calls.
embed_mode?: "default" | "classification" | "clustering" | 2 more
The embedding mode to use.
model_name?: string
The modelId of the VertexAI model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
type?: "VERTEXAI_EMBEDDING"
Type of the embedding model.
BedrockEmbeddingConfig { component, type }
component?: BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
Configuration for the Bedrock embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the bedrock client.
aws_access_key_id?: string | null
AWS Access Key ID to use
aws_secret_access_key?: string | null
AWS Secret Access Key to use
aws_session_token?: string | null
AWS Session Token to use
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
The maximum number of API retries.
model_name?: string
The modelId of the Bedrock model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
profile_name?: string | null
The name of the AWS profile to use. If not given, the default profile is used.
region_name?: string | null
The AWS region name to use. Uses the region configured in the AWS CLI if not passed.
timeout?: number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type?: "BEDROCK_EMBEDDING"
Type of the embedding model.
config_hash?: ConfigHash | null
Hashes for the configuration of a pipeline.
embedding_config_hash?: string | null
Hash of the embedding config.
parsing_config_hash?: string | null
Hash of the llama parse parameters.
transform_config_hash?: string | null
Hash of the transform config.
created_at?: string | null
Creation datetime
Schema for a data sink.
id: string
Unique identifier
component: Record<string, unknown> | CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } | CloudPostgresVectorStore { database, embed_dim, host, 10 more } | 5 more
Component that implements the data sink
CloudPineconeVectorStore { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
api_key (str): API key for authenticating with Pinecone.
index_name (str): Name of the Pinecone index.
namespace (optional[str]): Namespace to use in the Pinecone index.
insert_kwargs (optional[dict]): Additional kwargs to pass during insertion.
api_key: string
The API key for authenticating with Pinecone
CloudPostgresVectorStore { database, embed_dim, host, 10 more }
Cloud Postgres Vector Store. The fields shown below are the HNSW settings for PGVector.
distance_method?: "l2" | "ip" | "cosine" | 3 more
The distance method to use.
ef_construction?: number
The number of edges to use during the construction phase.
ef_search?: number
The number of edges to use during the search phase.
m?: number
The number of bi-directional links created for each new element.
vector_type?: "vector" | "half_vec" | "bit" | "sparse_vec"
The type of vector to use.
CloudQdrantVectorStore { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
collection_name (str): Name of the Qdrant collection.
url (str): URL of the Qdrant instance.
api_key (str): API key for authenticating with Qdrant.
max_retries (int): Maximum number of retries in case of a failure. Defaults to 3.
client_kwargs (dict): Additional kwargs to pass to the Qdrant client.
CloudAzureAISearchVectorStore { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
CloudMongoDBAtlasVectorSearch { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
mongodb_uri (str): URI for connecting to MongoDB Atlas.
db_name (str): Name of the MongoDB database.
collection_name (str): Name of the MongoDB collection.
vector_index_name (str): Name of the MongoDB Atlas vector index.
fulltext_index_name (str): Name of the MongoDB Atlas full-text index.
CloudMilvusVectorStore { uri, token, class_name, 3 more }
Cloud Milvus Vector Store.
CloudAstraDBVectorStore { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
token (str): The Astra DB Application Token to use.
api_endpoint (str): The Astra DB JSON API endpoint for your database.
collection_name (str): Collection name to use. If not existing, it will be created.
embedding_dimension (int): Length of the embedding vectors in use.
keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
token: string
The Astra DB Application Token to use
api_endpoint: string
The Astra DB JSON API endpoint for your database
collection_name: string
Collection name to use. If not existing, it will be created
embedding_dimension: number
Length of the embedding vectors in use
keyspace?: string | null
The keyspace to use. If not provided, 'default_keyspace' is used.
name: string
The name of the data sink.
sink_type: "PINECONE" | "POSTGRES" | "QDRANT" | 4 more
created_at?: string | null
Creation datetime
updated_at?: string | null
Update datetime
embedding_model_config?: EmbeddingModelConfig | null
Schema for an embedding model config.
id: string
Unique identifier
embedding_config: AzureOpenAIEmbeddingConfig { component, type } | CohereEmbeddingConfig { component, type } | GeminiEmbeddingConfig { component, type } | 4 more
The embedding configuration for the embedding model config.
AzureOpenAIEmbeddingConfig { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string
The base URL for Azure deployment.
api_key?: string | null
The OpenAI API key.
api_version?: string
The version for Azure OpenAI API.
azure_deployment?: string | null
The Azure deployment to use.
azure_endpoint?: string | null
The Azure endpoint to use.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "AZURE_EMBEDDING"
Type of the embedding model.
CohereEmbeddingConfig { component, type }
Configuration for the Cohere embedding model.
api_key: string | null
The Cohere API key.
embed_batch_size?: number
The batch size for embedding calls.
embedding_type?: string
Embedding type. If not provided, the float embedding_type is used when needed.
input_type?: string | null
Model Input type. If not provided, search_document and search_query are used when needed.
model_name?: string
The modelId of the Cohere model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
truncate?: string
Truncation type: START, END, or NONE.
type?: "COHERE_EMBEDDING"
Type of the embedding model.
GeminiEmbeddingConfig { component, type }
Configuration for the Gemini embedding model.
api_base?: string | null
API base to access the model. Defaults to None.
api_key?: string | null
API key to access the model. Defaults to None.
embed_batch_size?: number
The batch size for embedding calls.
model_name?: string
The modelId of the Gemini model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
task_type?: string | null
The task for embedding model.
title?: string | null
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport?: string | null
Transport to access the model. Defaults to None.
type?: "GEMINI_EMBEDDING"
Type of the embedding model.
HuggingFaceInferenceAPIEmbeddingConfig { component, type }
Configuration for the HuggingFace Inference API embedding model.
token?: string | boolean | null
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies?: Record<string, string> | null
Additional cookies to send to the server.
embed_batch_size?: number
The batch size for embedding calls.
headers?: Record<string, string> | null
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name?: string | null
Hugging Face model name. If None, the task will be used.
num_workers?: number | null
The number of workers to use for async embedding calls.
pooling?: "cls" | "mean" | "last" | null
Enum of possible pooling choices with pooling behaviors.
query_instruction?: string | null
Instruction to prepend during query embedding.
task?: string | null
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction?: string | null
Instruction to prepend during text embedding.
timeout?: number | null
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type?: "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
OpenAIEmbeddingConfig { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string | null
The base URL for OpenAI API.
api_key?: string | null
The OpenAI API key.
api_version?: string | null
The version for OpenAI API.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "OPENAI_EMBEDDING"
Type of the embedding model.
VertexAIEmbeddingConfig { component, type }
Configuration for the VertexAI embedding model.
client_email: string | null
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string | null
The private key for the VertexAI credentials.
private_key_id: string | null
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string | null
The token URI for the VertexAI credentials.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the Vertex.
embed_batch_size?: number
The batch size for embedding calls.
embed_mode?: "default" | "classification" | "clustering" | 2 more
The embedding mode to use.
model_name?: string
The modelId of the VertexAI model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
type?: "VERTEXAI_EMBEDDING"
Type of the embedding model.
BedrockEmbeddingConfig { component, type }
component?: BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
Configuration for the Bedrock embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the bedrock client.
aws_access_key_id?: string | null
AWS Access Key ID to use
aws_secret_access_key?: string | null
AWS Secret Access Key to use
aws_session_token?: string | null
AWS Session Token to use
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
The maximum number of API retries.
model_name?: string
The modelId of the Bedrock model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
profile_name?: string | null
The name of the AWS profile to use. If not given, the default profile is used.
region_name?: string | null
The AWS region name to use. Uses the region configured in the AWS CLI if not passed.
timeout?: number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type?: "BEDROCK_EMBEDDING"
Type of the embedding model.
name: string
The name of the embedding model config.
created_at?: string | null
Creation datetime
updated_at?: string | null
Update datetime
embedding_model_config_id?: string | null
The ID of the EmbeddingModelConfig this pipeline is using.
llama_parse_parameters?: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more } | null
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
images_to_save?: Array<"screenshot" | "embedded" | "layout"> | null
The types of page images to save during parsing.
priority?: "low" | "medium" | "high" | "critical" | null
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
webhook_configurations?: Array<WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url } > | null
The outbound webhook configurations
webhook_events?: Array<"extract.pending" | "extract.success" | "extract.error" | 13 more> | null
List of event names to subscribe to
webhook_headers?: Record<string, string> | null
Custom HTTP headers to include with webhook requests.
webhook_output_format?: string | null
The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json
webhook_url?: string | null
The URL to send webhook notifications to.
managed_pipeline_id?: string | null
The ID of the ManagedPipeline this playground pipeline is linked to.
metadata_config?: PipelineMetadataConfig { excluded_embed_metadata_keys, excluded_llm_metadata_keys } | null
Metadata configuration for the pipeline.
excluded_embed_metadata_keys?: Array<string>
List of metadata keys to exclude from embeddings
excluded_llm_metadata_keys?: Array<string>
List of metadata keys to exclude from LLM during retrieval
Type of pipeline. Either PLAYGROUND or MANAGED.
preset_retrieval_parameters?: PresetRetrievalParams { alpha, class_name, dense_similarity_cutoff, 11 more }
Preset retrieval parameters for the pipeline.
alpha?: number | null
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
dense_similarity_cutoff?: number | null
Minimum similarity score with respect to the query for retrieval.
dense_similarity_top_k?: number | null
Number of nodes for dense retrieval.
enable_reranking?: boolean | null
Enable reranking for retrieval
files_top_k?: number | null
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
rerank_top_n?: number | null
Number of reranked nodes for returning.
The retrieval mode for the query.
retrieve_image_nodes?: boolean (deprecated)
Whether to retrieve image nodes.
retrieve_page_figure_nodes?: boolean
Whether to retrieve page figure nodes.
retrieve_page_screenshot_nodes?: boolean
Whether to retrieve page screenshot nodes.
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
MetadataFilters { filters, condition }
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
search_filters_inference_schema?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.
sparse_similarity_top_k?: number | null
Number of nodes for sparse retrieval.
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
model_type?: "splade" | "bm25" | "auto"
The sparse model type to use. 'bm25' uses Qdrant's FastEmbed BM25 model (default for new pipelines), 'splade' uses HuggingFace Splade model, 'auto' selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).
status?: "CREATED" | "DELETING" | null
Status of the pipeline.
transform_config?: AutoTransformConfig { chunk_overlap, chunk_size, mode } | AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
Configuration for the transformation.
AutoTransformConfig { chunk_overlap, chunk_size, mode }
chunk_overlap?: number
Chunk overlap for the transformation.
chunk_size?: number
Chunk size for the transformation.
AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
chunking_config?: NoneChunkingConfig { mode } | CharacterChunkingConfig { chunk_overlap, chunk_size, mode } | TokenChunkingConfig { chunk_overlap, chunk_size, mode, separator } | 2 more
Configuration for the chunking.
NoneChunkingConfig { mode }
CharacterChunkingConfig { chunk_overlap, chunk_size, mode }
TokenChunkingConfig { chunk_overlap, chunk_size, mode, separator }
SentenceChunkingConfig { chunk_overlap, chunk_size, mode, 2 more }
SemanticChunkingConfig { breakpoint_percentile_threshold, buffer_size, mode }
segmentation_config?: NoneSegmentationConfig { mode } | PageSegmentationConfig { mode, page_separator } | ElementSegmentationConfig { mode }
Configuration for the segmentation.
NoneSegmentationConfig { mode }
PageSegmentationConfig { mode, page_separator }
ElementSegmentationConfig { mode }
updated_at?: string | null
Update datetime
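Putting the Pipeline schema to work, a Run Search call typically sends a query plus per-request overrides of the preset retrieval parameters documented above. The sketch below assumes a POST to /pipelines/{id}/retrieve and reuses BASE, headers, and searchFilters from the earlier sketches; the path and the non-documented body fields (such as query) are assumptions to verify against your API version.

```ts
// Hypothetical Run Search call; the path and the "query" body field are
// assumptions, while the retrieval parameters mirror PresetRetrievalParams.
async function runSearch(pipelineId: string, query: string) {
  const res = await fetch(`${BASE}/pipelines/${pipelineId}/retrieve`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      query,
      dense_similarity_top_k: 8,
      sparse_similarity_top_k: 8,
      alpha: 0.5,                    // 0 = sparse only, 1 = dense only
      enable_reranking: true,
      rerank_top_n: 4,
      retrieve_page_screenshot_nodes: false,
      search_filters: searchFilters, // MetadataFilters, see above
    }),
  });
  if (!res.ok) throw new Error(`Run Search failed: ${res.status}`);
  return res.json();
}
```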
PipelineCreate { name, data_sink, data_sink_id, 10 more }
Schema for creating a pipeline.
Schema for creating a data sink.
component: Record<string, unknown> | CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } | CloudPostgresVectorStore { database, embed_dim, host, 10 more } | 5 more
Component that implements the data sink
CloudPineconeVectorStore { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
api_key (str): API key for authenticating with Pinecone.
index_name (str): Name of the Pinecone index.
namespace (optional[str]): Namespace to use in the Pinecone index.
insert_kwargs (optional[dict]): Additional kwargs to pass during insertion.
api_key: string
The API key for authenticating with Pinecone
CloudPostgresVectorStore { database, embed_dim, host, 10 more }
Cloud Postgres Vector Store. The fields shown below are the HNSW settings for PGVector.
distance_method?: "l2" | "ip" | "cosine" | 3 more
The distance method to use.
ef_construction?: number
The number of edges to use during the construction phase.
ef_search?: number
The number of edges to use during the search phase.
m?: number
The number of bi-directional links created for each new element.
vector_type?: "vector" | "half_vec" | "bit" | "sparse_vec"
The type of vector to use.
CloudQdrantVectorStore { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
collection_name (str): Name of the Qdrant collection.
url (str): URL of the Qdrant instance.
api_key (str): API key for authenticating with Qdrant.
max_retries (int): Maximum number of retries in case of a failure. Defaults to 3.
client_kwargs (dict): Additional kwargs to pass to the Qdrant client.
CloudAzureAISearchVectorStore { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
CloudMongoDBAtlasVectorSearch { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
mongodb_uri (str): URI for connecting to MongoDB Atlas.
db_name (str): Name of the MongoDB database.
collection_name (str): Name of the MongoDB collection.
vector_index_name (str): Name of the MongoDB Atlas vector index.
fulltext_index_name (str): Name of the MongoDB Atlas full-text index.
CloudMilvusVectorStore { uri, token, class_name, 3 more }
Cloud Milvus Vector Store.
CloudAstraDBVectorStore { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
token (str): The Astra DB Application Token to use.
api_endpoint (str): The Astra DB JSON API endpoint for your database.
collection_name (str): Collection name to use. If not existing, it will be created.
embedding_dimension (int): Length of the embedding vectors in use.
keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
token: string
The Astra DB Application Token to use
api_endpoint: string
The Astra DB JSON API endpoint for your database
collection_name: string
Collection name to use. If not existing, it will be created
embedding_dimension: number
Length of the embedding vectors in use
keyspace?: string | null
The keyspace to use. If not provided, 'default_keyspace' is used.
name: string
The name of the data sink.
sink_type: "PINECONE" | "POSTGRES" | "QDRANT" | 4 more
data_sink_id?: string | null
Data sink ID. When provided instead of data_sink, the data sink will be looked up by ID.
embedding_config?: AzureOpenAIEmbeddingConfig { component, type } | CohereEmbeddingConfig { component, type } | GeminiEmbeddingConfig { component, type } | 4 more | null
AzureOpenAIEmbeddingConfig { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string
The base URL for Azure deployment.
api_key?: string | null
The OpenAI API key.
api_version?: string
The version for Azure OpenAI API.
azure_deployment?: string | null
The Azure deployment to use.
azure_endpoint?: string | null
The Azure endpoint to use.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "AZURE_EMBEDDING"
Type of the embedding model.
CohereEmbeddingConfig { component, type }
Configuration for the Cohere embedding model.
api_key: string | null
The Cohere API key.
embed_batch_size?: number
The batch size for embedding calls.
embedding_type?: string
Embedding type. If not provided, the float embedding_type is used when needed.
input_type?: string | null
Model Input type. If not provided, search_document and search_query are used when needed.
model_name?: string
The modelId of the Cohere model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
truncate?: string
Truncation type: START, END, or NONE.
type?: "COHERE_EMBEDDING"
Type of the embedding model.
GeminiEmbeddingConfig { component, type }
Configuration for the Gemini embedding model.
api_base?: string | null
API base to access the model. Defaults to None.
api_key?: string | null
API key to access the model. Defaults to None.
embed_batch_size?: number
The batch size for embedding calls.
model_name?: string
The modelId of the Gemini model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
task_type?: string | null
The task for embedding model.
title?: string | null
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport?: string | null
Transport to access the model. Defaults to None.
type?: "GEMINI_EMBEDDING"
Type of the embedding model.
HuggingFaceInferenceAPIEmbeddingConfig { component, type }
Configuration for the HuggingFace Inference API embedding model.
token?: string | boolean | null
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies?: Record<string, string> | null
Additional cookies to send to the server.
embed_batch_size?: number
The batch size for embedding calls.
headers?: Record<string, string> | null
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name?: string | null
Hugging Face model name. If None, the task will be used.
num_workers?: number | null
The number of workers to use for async embedding calls.
pooling?: "cls" | "mean" | "last" | null
Enum of possible pooling choices with pooling behaviors.
query_instruction?: string | null
Instruction to prepend during query embedding.
task?: string | null
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction?: string | null
Instruction to prepend during text embedding.
timeout?: number | null
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type?: "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
OpenAIEmbeddingConfig { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the OpenAI API.
api_base?: string | null
The base URL for OpenAI API.
api_key?: string | null
The OpenAI API key.
api_version?: string | null
The version for OpenAI API.
default_headers?: Record<string, string> | null
The default headers for API requests.
dimensions?: number | null
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
Maximum number of retries.
model_name?: string
The name of the OpenAI embedding model.
num_workers?: number | null
The number of workers to use for async embedding calls.
reuse_client?: boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout?: number
Timeout for each request.
type?: "OPENAI_EMBEDDING"
Type of the embedding model.
VertexAIEmbeddingConfig { component, type }
Configuration for the VertexAI embedding model.
client_email: string | null
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string | null
The private key for the VertexAI credentials.
private_key_id: string | null
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string | null
The token URI for the VertexAI credentials.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the Vertex.
embed_batch_size?: number
The batch size for embedding calls.
embed_mode?: "default" | "classification" | "clustering" | 2 more
The embedding mode to use.
model_name?: string
The modelId of the VertexAI model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
type?: "VERTEXAI_EMBEDDING"
Type of the embedding model.
BedrockEmbeddingConfig { component, type }
Configuration for the Bedrock embedding model.
component?: BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
additional_kwargs?: Record<string, unknown>
Additional kwargs for the bedrock client.
aws_access_key_id?: string | null
AWS Access Key ID to use
aws_secret_access_key?: string | null
AWS Secret Access Key to use
aws_session_token?: string | null
AWS Session Token to use
embed_batch_size?: number
The batch size for embedding calls.
max_retries?: number
The maximum number of API retries.
model_name?: string
The modelId of the Bedrock model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
profile_name?: string | null
The name of the AWS profile to use. If not given, the default profile is used.
region_name?: string | null
AWS region name to use. Uses the region configured in the AWS CLI if not passed.
timeout?: number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type?: "BEDROCK_EMBEDDING"
Type of the embedding model.
embedding_model_config_id?: string | null
Embedding model config ID. When provided instead of embedding_config, the embedding model config will be looked up by ID.
llama_parse_parameters?: LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
images_to_save?: Array<"screenshot" | "embedded" | "layout"> | null
The image types to save during parsing (screenshots, embedded images, and/or layout images).
priority?: "low" | "medium" | "high" | "critical" | null
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
webhook_configurations?: Array<WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url } > | null
The outbound webhook configurations
webhook_events?: Array<"extract.pending" | "extract.success" | "extract.error" | 13 more> | null
List of event names to subscribe to
webhook_headers?: Record<string, string> | null
Custom HTTP headers to include with webhook requests.
webhook_output_format?: string | null
The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json
webhook_url?: string | null
The URL to send webhook notifications to.
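As a hedged illustration, the sketch below fills in one WebhookConfiguration entry with the fields listed above; the URL and header values are placeholders, and the event names are drawn from the documented enum.

```typescript
// Sketch of one outbound webhook configuration; values are placeholders.
const webhookConfiguration = {
  webhook_url: "https://example.com/llamacloud/webhook",
  webhook_events: ["extract.success", "extract.error"],       // subset of the documented event names
  webhook_headers: { "X-Webhook-Secret": "<shared-secret>" }, // custom headers sent with each request
  webhook_output_format: "json",                              // defaults to "string" when omitted
};
```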
managed_pipeline_id?: string | null
The ID of the ManagedPipeline this playground pipeline is linked to.
metadata_config?: PipelineMetadataConfig { excluded_embed_metadata_keys, excluded_llm_metadata_keys } | null
Metadata configuration for the pipeline.
excluded_embed_metadata_keys?: Array<string>
List of metadata keys to exclude from embeddings
excluded_llm_metadata_keys?: Array<string>
List of metadata keys to exclude from LLM during retrieval
Type of pipeline. Either PLAYGROUND or MANAGED.
preset_retrieval_parameters?: PresetRetrievalParams { alpha, class_name, dense_similarity_cutoff, 11 more }
Preset retrieval parameters for the pipeline.
alpha?: number | null
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
dense_similarity_cutoff?: number | null
Minimum similarity score with respect to the query for retrieval.
dense_similarity_top_k?: number | null
Number of nodes for dense retrieval.
enable_reranking?: boolean | null
Enable reranking for retrieval
files_top_k?: number | null
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
rerank_top_n?: number | null
Number of reranked nodes for returning.
The retrieval mode for the query.
Deprecated: retrieve_image_nodes?: boolean
Whether to retrieve image nodes.
retrieve_page_figure_nodes?: boolean
Whether to retrieve page figure nodes.
retrieve_page_screenshot_nodes?: boolean
Whether to retrieve page screenshot nodes.
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
MetadataFilters { filters, condition }
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
search_filters_inference_schema?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.
sparse_similarity_top_k?: number | null
Number of nodes for sparse retrieval.
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
model_type?: "splade" | "bm25" | "auto"
The sparse model type to use. 'bm25' uses Qdrant's FastEmbed BM25 model (default for new pipelines), 'splade' uses HuggingFace Splade model, 'auto' selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).
status?: string | null
Status of the pipeline deployment.
transform_config?: AutoTransformConfig { chunk_overlap, chunk_size, mode } | AdvancedModeTransformConfig { chunking_config, mode, segmentation_config } | null
Configuration for the transformation.
AutoTransformConfig { chunk_overlap, chunk_size, mode }
chunk_overlap?: number
Chunk overlap for the transformation.
chunk_size?: number
Chunk size for the transformation.
AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
chunking_config?: NoneChunkingConfig { mode } | CharacterChunkingConfig { chunk_overlap, chunk_size, mode } | TokenChunkingConfig { chunk_overlap, chunk_size, mode, separator } | 2 more
Configuration for the chunking.
NoneChunkingConfig { mode }
CharacterChunkingConfig { chunk_overlap, chunk_size, mode }
TokenChunkingConfig { chunk_overlap, chunk_size, mode, separator }
SentenceChunkingConfig { chunk_overlap, chunk_size, mode, 2 more }
SemanticChunkingConfig { breakpoint_percentile_threshold, buffer_size, mode }
segmentation_config?: NoneSegmentationConfig { mode } | PageSegmentationConfig { mode, page_separator } | ElementSegmentationConfig { mode }
Configuration for the segmentation.
NoneSegmentationConfig { mode }
PageSegmentationConfig { mode, page_separator }
ElementSegmentationConfig { mode }
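To make the two transform_config shapes concrete, here is a minimal sketch. Only the field names come from the schema above; the literal `mode` strings are assumed discriminator values and may differ from what the API expects.

```typescript
// Sketch of the two documented transform_config variants.
// NOTE: the `mode` literals are assumed discriminators, not confirmed values.
const autoTransform = {
  mode: "auto",       // assumed discriminator for AutoTransformConfig
  chunk_size: 1024,
  chunk_overlap: 200,
};

const advancedTransform = {
  mode: "advanced",   // assumed discriminator for AdvancedModeTransformConfig
  chunking_config: {
    mode: "token",    // assumed discriminator for TokenChunkingConfig
    chunk_size: 512,
    chunk_overlap: 64,
    separator: " ",
  },
  segmentation_config: {
    mode: "page",     // assumed discriminator for PageSegmentationConfig
    page_separator: "\n---\n",
  },
};
```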
PipelineMetadataConfig { excluded_embed_metadata_keys, excluded_llm_metadata_keys }
excluded_embed_metadata_keys?: Array<string>
List of metadata keys to exclude from embeddings
excluded_llm_metadata_keys?: Array<string>
List of metadata keys to exclude from LLM during retrieval
PipelineType = "PLAYGROUND" | "MANAGED"
Enum for representing the type of a pipeline
PresetRetrievalParams { alpha, class_name, dense_similarity_cutoff, 11 more }
Schema for the search params for a retrieval execution that can be preset for a pipeline.
alpha?: number | null
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
dense_similarity_cutoff?: number | null
Minimum similarity score with respect to the query for retrieval.
dense_similarity_top_k?: number | null
Number of nodes for dense retrieval.
enable_reranking?: boolean | null
Enable reranking for retrieval
files_top_k?: number | null
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
rerank_top_n?: number | null
Number of reranked nodes for returning.
The retrieval mode for the query.
Deprecated: retrieve_image_nodes?: boolean
Whether to retrieve image nodes.
retrieve_page_figure_nodes?: boolean
Whether to retrieve page figure nodes.
retrieve_page_screenshot_nodes?: boolean
Whether to retrieve page screenshot nodes.
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
MetadataFilters { filters, condition }
Metadata filters for vector stores.
MetadataFilter { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number | string | Array<string> | 2 more | null
operator?: "==" | ">" | "<" | 11 more
Vector store filter operator.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
condition?: "and" | "or" | "not" | null
Vector store filter conditions to combine different filters.
search_filters_inference_schema?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.
sparse_similarity_top_k?: number | null
Number of nodes for sparse retrieval.
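The sketch below assembles a PresetRetrievalParams object for hybrid retrieval with reranking and a metadata filter, using only fields documented above (search_filters is the field referenced by search_filters_inference_schema); the filter keys and values are placeholders.

```typescript
// Sketch of preset retrieval parameters for hybrid search with reranking and metadata filters.
const presetRetrievalParams = {
  alpha: 0.5,                    // 0 = purely sparse, 1 = purely dense
  dense_similarity_top_k: 20,
  sparse_similarity_top_k: 20,
  dense_similarity_cutoff: 0.3,  // drop weak dense matches before reranking
  enable_reranking: true,
  rerank_top_n: 5,
  search_filters: {
    condition: "and",
    filters: [
      { key: "department", value: "finance", operator: "==" }, // placeholder metadata key/value
      { key: "year", value: 2023, operator: ">" },
    ],
  },
};
```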
RetrievalMode = "chunks" | "files_via_metadata" | "files_via_content" | "auto_routed"
SparseModelConfig { class_name, model_type }
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
model_type?: "splade" | "bm25" | "auto"
The sparse model type to use. 'bm25' uses Qdrant's FastEmbed BM25 model (default for new pipelines), 'splade' uses HuggingFace Splade model, 'auto' selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).
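A minimal sketch of a SparseModelConfig, assuming BM25 is the desired sparse model:

```typescript
// Choose the sparse model used for the sparse half of hybrid retrieval.
const sparseModelConfig = {
  model_type: "bm25" as const, // "splade", "bm25", or "auto" per the schema above
};
```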
VertexAIEmbeddingConfig { component, type }
Configuration for the VertexAI embedding model.
client_email: string | null
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string | null
The private key for the VertexAI credentials.
private_key_id: string | null
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string | null
The token URI for the VertexAI credentials.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the Vertex.
embed_batch_size?: number
The batch size for embedding calls.
embed_mode?: "default" | "classification" | "clustering" | 2 more
The embedding mode to use.
model_name?: string
The modelId of the VertexAI model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
type?: "VERTEXAI_EMBEDDING"
Type of the embedding model.
VertexTextEmbedding { client_email, location, private_key, 9 more }
client_email: string | null
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string | null
The private key for the VertexAI credentials.
private_key_id: string | null
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string | null
The token URI for the VertexAI credentials.
additional_kwargs?: Record<string, unknown>
Additional kwargs for the Vertex.
embed_batch_size?: number
The batch size for embedding calls.
embed_mode?: "default" | "classification" | "clustering" | 2 more
The embedding mode to use.
model_name?: string
The modelId of the VertexAI model to use.
num_workers?: number | null
The number of workers to use for async embedding calls.
Pipelines > Sync
Sync Pipeline
Cancel Pipeline Sync
Pipelines > Data Sources
List Pipeline Data Sources
Add Data Sources To Pipeline
Update Pipeline Data Source
Get Pipeline Data Source Status
Sync Pipeline Data Source
Models
PipelineDataSource { id, component, data_source_id, 13 more }
Schema for a data source in a pipeline.
id: string
Unique identifier
component: Record<string, unknown> | CloudS3DataSource { bucket, aws_access_id, aws_access_secret, 5 more } | CloudAzStorageBlobDataSource { account_url, container_name, account_key, 8 more } | 8 more
Component that implements the data source
CloudS3DataSource { bucket, aws_access_id, aws_access_secret, 5 more }
bucket: string
The name of the S3 bucket to read from.
aws_access_id?: string | null
The AWS access ID to use for authentication.
aws_access_secret?: string | null
The AWS access secret to use for authentication.
prefix?: string | null
The prefix of the S3 objects to read from.
regex_pattern?: string | null
The regex pattern to filter S3 objects. Must be a valid regex pattern.
s3_endpoint_url?: string | null
The S3 endpoint URL to use for authentication.
CloudAzStorageBlobDataSource { account_url, container_name, account_key, 8 more }
account_url: string
The Azure Storage Blob account URL to use for authentication.
container_name: string
The name of the Azure Storage Blob container to read from.
account_key?: string | null
The Azure Storage Blob account key to use for authentication.
account_name?: string | null
The Azure Storage Blob account name to use for authentication.
blob?: string | null
The blob name to read from.
client_id?: string | null
The Azure AD client ID to use for authentication.
client_secret?: string | null
The Azure AD client secret to use for authentication.
prefix?: string | null
The prefix of the Azure Storage Blob objects to read from.
tenant_id?: string | null
The Azure AD tenant ID to use for authentication.
CloudOneDriveDataSource { client_id, client_secret, tenant_id, 6 more }
client_id: string
The client ID to use for authentication.
client_secret: string
The client secret to use for authentication.
tenant_id: string
The tenant ID to use for authentication.
user_principal_name: string
The user principal name to use for authentication.
folder_id?: string | null
The ID of the OneDrive folder to read from.
folder_path?: string | null
The path of the OneDrive folder to read from.
required_exts?: Array<string> | null
The list of required file extensions.
CloudSharepointDataSource { client_id, client_secret, tenant_id, 11 more }
client_id: string
The client ID to use for authentication.
client_secret: string
The client secret to use for authentication.
tenant_id: string
The tenant ID to use for authentication.
drive_name?: string | null
The name of the Sharepoint drive to read from.
exclude_path_patterns?: Array<string> | null
List of regex patterns for file paths to exclude. Files whose paths (including filename) match any pattern will be excluded. Example: ['/temp/', '/backup/', '.git/', '.tmp$', '^~']
folder_id?: string | null
The ID of the Sharepoint folder to read from.
folder_path?: string | null
The path of the Sharepoint folder to read from.
get_permissions?: boolean | null
Whether to get permissions for the sharepoint site.
include_path_patterns?: Array<string> | null
List of regex patterns for file paths to include. Full paths (including filename) must match at least one pattern to be included. Example: ['/reports/', '/docs/.*\.pdf$', '^Report.*\.pdf$']
required_exts?: Array<string> | null
The list of required file extensions.
site_id?: string | null
The ID of the SharePoint site to download from.
site_name?: string | null
The name of the SharePoint site to download from.
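As an illustration of the include/exclude patterns described above, here is a hedged sketch of a CloudSharepointDataSource component; the credentials, site, folder, and extension values are placeholders, and the patterns are example regexes rather than recommended defaults.

```typescript
// Sketch of a SharePoint data source component; all values are placeholders.
const sharepointSource = {
  client_id: "<azure-ad-app-client-id>",
  client_secret: "<azure-ad-app-client-secret>",
  tenant_id: "<azure-ad-tenant-id>",
  site_name: "engineering",
  folder_path: "/Shared Documents/reports",
  include_path_patterns: ["/reports/", ".*\\.pdf$"], // full path must match at least one
  exclude_path_patterns: ["/backup/", "\\.tmp$"],    // any match excludes the file
  required_exts: [".pdf", ".docx"],                  // illustrative extension list
  get_permissions: true,
};
```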
CloudSlackDataSource { slack_token, channel_ids, channel_patterns, 6 more }
slack_token: string
Slack Bot Token.
channel_ids?: string | null
Slack Channel.
channel_patterns?: string | null
Slack Channel name pattern.
earliest_date?: string | null
Earliest date.
earliest_date_timestamp?: number | null
Earliest date timestamp.
latest_date?: string | null
Latest date.
latest_date_timestamp?: number | null
Latest date timestamp.
CloudNotionPageDataSource { integration_token, class_name, database_ids, 2 more }
integration_token: string
The integration token to use for authentication.
database_ids?: string | null
The Notion database ID to read content from.
page_ids?: string | null
The page IDs of the Notion pages to read from.
CloudConfluenceDataSource { authentication_mechanism, server_url, api_token, 10 more }
authentication_mechanism: string
Type of Authentication for connecting to Confluence APIs.
server_url: string
The server URL of the Confluence instance.
api_token?: string | null
The API token to use for authentication.
cql?: string | null
The CQL query to use for fetching pages.
Configuration for handling failures during processing. Key-value object controlling failure handling behaviors.
Example: { "skip_list_failures": true }
Currently supports:
- skip_list_failures: Skip failed batches/lists and continue processing
skip_list_failures?: boolean
Whether to skip failed batches/lists and continue processing
index_restricted_pages?: boolean
Whether to index restricted pages.
keep_markdown_format?: boolean
Whether to keep the markdown format.
label?: string | null
The label to use for fetching pages.
page_ids?: string | null
The page IDs of the Confluence to read from.
space_key?: string | null
The space key to read from.
user_name?: string | null
The username to use for authentication.
CloudJiraDataSource { authentication_mechanism, query, api_token, 5 more }
Cloud Jira Data Source integrating JiraReader.
authentication_mechanism: string
Type of Authentication for connecting to Jira APIs.
query: string
JQL (Jira Query Language) query to search.
api_token?: string | null
The API/access token used for Basic, PAT, and OAuth2 authentication.
cloud_id?: string | null
The cloud ID, used in case of OAuth2.
email?: string | null
The email address to use for authentication.
server_url?: string | null
The server URL for Jira Cloud.
CloudJiraDataSourceV2 { authentication_mechanism, query, server_url, 10 more }
Cloud Jira Data Source integrating JiraReaderV2.
authentication_mechanism: string
Type of Authentication for connecting to Jira APIs.
query: string
JQL (Jira Query Language) query to search.
server_url: string
The server URL for Jira Cloud.
api_token?: string | null
The API Access Token used for Basic, PAT and OAuth2 authentication.
api_version?: "2" | "3"
Jira REST API version to use (2 or 3). 3 supports Atlassian Document Format (ADF).
cloud_id?: string | null
The cloud ID, used in case of OAuth2.
email?: string | null
The email address to use for authentication.
expand?: string | null
Fields to expand in the response.
fields?: Array<string> | null
List of fields to retrieve from Jira. If None, retrieves all fields.
get_permissions?: boolean
Whether to fetch project role permissions and issue-level security
requests_per_minute?: number | null
Rate limit for Jira API requests per minute.
CloudBoxDataSource { authentication_mechanism, class_name, client_id, 6 more }
authentication_mechanism: "developer_token" | "ccg"
The type of authentication to use (Developer Token or CCG)
client_id?: string | null
Box API key used for identifying the application the user is authenticating with
client_secret?: string | null
Box API secret used for making auth requests.
developer_token?: string | null
Developer token for authentication if authentication_mechanism is 'developer_token'.
enterprise_id?: string | null
Box Enterprise ID, if provided authenticates as service.
folder_id?: string | null
The ID of the Box folder to read from.
user_id?: string | null
Box User ID, if provided authenticates as user.
data_source_id: string
The ID of the data source.
last_synced_at: string
The last time the data source was automatically synced.
name: string
The name of the data source.
pipeline_id: string
The ID of the pipeline.
source_type: "S3" | "AZURE_STORAGE_BLOB" | "GOOGLE_DRIVE" | 8 more
created_at?: string | null
Creation datetime
custom_metadata?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Custom metadata that will be present on all data loaded from the data source
status?: "NOT_STARTED" | "IN_PROGRESS" | "SUCCESS" | 2 more | null
The status of the data source in the pipeline.
status_updated_at?: string | null
The last time the status was updated.
sync_interval?: number | null
The interval at which the data source should be synced.
sync_schedule_set_by?: string | null
The id of the user who set the sync schedule.
updated_at?: string | null
Update datetime
Version metadata for the data source
reader_version?: "1.0" | "2.0" | "2.1" | null
The version of the reader to use for this data source.
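Putting a few of the fields above together, here is a hedged sketch of an S3-backed data source component plus pipeline-level settings from the PipelineDataSource schema; the bucket, credential, and metadata values are placeholders, and since the sync_interval unit is not stated above, the value shown is only illustrative.

```typescript
// Sketch of a CloudS3DataSource component; values are placeholders.
const s3Component = {
  bucket: "my-company-docs",
  prefix: "contracts/2024/",
  regex_pattern: ".*\\.pdf$", // only index matching objects
  aws_access_id: process.env.AWS_ACCESS_KEY_ID ?? null,
  aws_access_secret: process.env.AWS_SECRET_ACCESS_KEY ?? null,
};

// PipelineDataSource-level settings commonly configured alongside the component.
const dataSourceSettings = {
  sync_interval: 86400, // sync cadence; the unit is not stated above, value is illustrative
  custom_metadata: { source_system: "s3", team: "legal" }, // attached to all loaded data
};
```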
Pipelines > Images
List File Page Screenshots
Get File Page Screenshot
Get File Page Figure
List File Pages Figures
Pipelines > Files
Get Pipeline File Status Counts
Get Pipeline File Status
Add Files To Pipeline Api
Update Pipeline File
Delete Pipeline File
List Pipeline Files2
Models
PipelineFile { id, pipeline_id, config_hash, 16 more }
Schema for a file that is associated with a pipeline.
id: string
Unique identifier
pipeline_id: string
The ID of the pipeline that the file is associated with
config_hash?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Hashes for the configuration of the pipeline.
created_at?: string | null
Creation datetime
custom_metadata?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Custom metadata for the file
data_source_id?: string | null
The ID of the data source that the file belongs to
external_file_id?: string | null
The ID of the file in the external system
file_id?: string | null
The ID of the file
file_size?: number | null
Size of the file in bytes
file_type?: string | null
File type (e.g. pdf, docx, etc.)
indexed_page_count?: number | null
The number of pages that have been indexed for this file
last_modified_at?: string | null
The last modified time of the file
name?: string | null
Name of the file
permission_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Permission information for the file
project_id?: string | null
The ID of the project that the file belongs to
resource_info?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null
Resource information for the file
status?: "NOT_STARTED" | "IN_PROGRESS" | "SUCCESS" | 2 more | null
Status of the pipeline file
status_updated_at?: string | null
The last time the status was updated
updated_at?: string | null
Update datetime
Pipelines > Metadata
Import Pipeline Metadata
Delete Pipeline Files Metadata
Pipelines > Documents
Create Batch Pipeline Documents
Paginated List Pipeline Documents
Get Pipeline Document
Delete Pipeline Document
Get Pipeline Document Status
Sync Pipeline Document
List Pipeline Document Chunks
Upsert Batch Pipeline Documents
Models
CloudDocument { id, metadata, text, 4 more }
Cloud document stored in S3.
page_positions?: Array<number> | null
Indices in CloudDocument.text where a new page begins; e.g. the second page starts at the index specified by page_positions[1].
CloudDocumentCreate { metadata, text, id, 3 more }
Create a new cloud document.
page_positions?: Array<number> | null
Indices in CloudDocument.text where a new page begins; e.g. the second page starts at the index specified by page_positions[1].
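A small sketch of a CloudDocumentCreate payload showing how page_positions marks page starts inside text, per the description above; the id and metadata values are placeholders.

```typescript
// page_positions[i] is the index in `text` where page i+1 begins.
const pageOneText = "Quarterly report\n\nRevenue grew 12% year over year. ";
const pageTwoText = "Appendix A: methodology and data sources.";

const documentCreate = {
  id: "doc-2024-q1-report",                            // placeholder document ID
  text: pageOneText + pageTwoText,
  metadata: { source: "finance", quarter: "2024-Q1" }, // placeholder metadata
  page_positions: [0, pageOneText.length],             // page 1 starts at 0, page 2 at the next index
};
```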
TextNode { class_name, embedding, end_char_idx, 11 more }
Provided for backward compatibility.
Note: we keep the field with the typo "seperator" to maintain backward compatibility for serialized objects.
embedding?: Array<number> | null
Embedding of the node.
end_char_idx?: number | null
End char index of the node.
excluded_embed_metadata_keys?: Array<string>
Metadata keys that are excluded from text for the embed model.
excluded_llm_metadata_keys?: Array<string>
Metadata keys that are excluded from text for the LLM.
extra_info?: Record<string, unknown>
A flat dictionary of metadata fields
id_?: string
Unique ID of the node.
metadata_seperator?: string
Separator between metadata fields when converting to string.
metadata_template?: string
Template for how metadata is formatted, with {key} and {value} placeholders.
mimetype?: string
MIME type of the node content.
relationships?: Record<string, RelatedNodeInfo { node_id, class_name, hash, 2 more } | Array<UnionMember1>>
A mapping of relationships to other node information.
RelatedNodeInfo { node_id, class_name, hash, 2 more }
node_type?: "1" | "2" | "3" | 2 more | (string & {}) | null
"1" | "2" | "3" | 2 more
Array<UnionMember1>
node_type?: "1" | "2" | "3" | 2 more | (string & {}) | null
"1" | "2" | "3" | 2 more
start_char_idx?: number | null
Start char index of the node.
text?: string
Text content of the node.
text_template?: string
Template for how text is formatted, with {content} and {metadata_str} placeholders.
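To show how the template fields above fit together, here is a hedged sketch of rendering a TextNode for an LLM, assuming {key}/{value} placeholders per metadata field joined by metadata_seperator, and {metadata_str}/{content} placeholders in text_template; this mirrors the documented fields but is not the service's exact rendering code.

```typescript
// Sketch: combine a TextNode's templates into the string seen by the LLM.
const node = {
  text: "Revenue grew 12% year over year.",
  extra_info: { file_name: "q1.pdf", page_label: "3" }, // flat metadata fields
  metadata_seperator: "\n",                             // field kept with its documented spelling
  metadata_template: "{key}: {value}",
  text_template: "{metadata_str}\n\n{content}",
  excluded_llm_metadata_keys: ["page_label"],           // hidden from the LLM view
};

function renderForLlm(n: typeof node): string {
  const metadataStr = Object.entries(n.extra_info)
    .filter(([key]) => !n.excluded_llm_metadata_keys.includes(key))
    .map(([key, value]) =>
      n.metadata_template.replace("{key}", key).replace("{value}", String(value)),
    )
    .join(n.metadata_seperator);
  return n.text_template.replace("{metadata_str}", metadataStr).replace("{content}", n.text);
}

console.log(renderForLlm(node));
// file_name: q1.pdf
//
// Revenue grew 12% year over year.
```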