Pipelines
Search Pipelines
Create Pipeline
Get Pipeline
Update Existing Pipeline
Delete Pipeline
Get Pipeline Status
Upsert Pipeline
Run Search
Models
AdvancedModeTransformConfig = object { chunking_config, mode, segmentation_config }
chunking_config: optional object { mode } or object { chunk_overlap, chunk_size, mode } or object { chunk_overlap, chunk_size, mode, separator } or 2 more
Configuration for the chunking.
NoneChunkingConfig = object { mode }
CharacterChunkingConfig = object { chunk_overlap, chunk_size, mode }
TokenChunkingConfig = object { chunk_overlap, chunk_size, mode, separator }
SentenceChunkingConfig = object { chunk_overlap, chunk_size, mode, 2 more }
SemanticChunkingConfig = object { breakpoint_percentile_threshold, buffer_size, mode }
segmentation_config: optional object { mode } or object { mode, page_separator } or object { mode }
Configuration for the segmentation.
NoneSegmentationConfig = object { mode }
PageSegmentationConfig = object { mode, page_separator }
ElementSegmentationConfig = object { mode }
AutoTransformConfig = object { chunk_overlap, chunk_size, mode }
chunk_overlap: optional number
Chunk overlap for the transformation.
chunk_size: optional number
Chunk size for the transformation.
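The two transform-config shapes above can be sketched as plain dicts. The literal mode strings ("auto", "advanced", "sentence", "page") are illustrative assumptions; check the accepted enum values against the API.

```python
# Sketch of the two transform-config shapes described above.
# Mode strings are illustrative assumptions, not confirmed enum values.

auto_config = {
    "mode": "auto",          # AutoTransformConfig
    "chunk_size": 1024,      # chunk size for the transformation
    "chunk_overlap": 200,    # chunk overlap for the transformation
}

advanced_config = {
    "mode": "advanced",      # AdvancedModeTransformConfig
    "chunking_config": {     # one of the *ChunkingConfig variants
        "mode": "sentence",
        "chunk_size": 1024,
        "chunk_overlap": 200,
    },
    "segmentation_config": {  # one of the *SegmentationConfig variants
        "mode": "page",
        "page_separator": "\n---\n",
    },
}
```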
AzureOpenAIEmbedding = object { additional_kwargs, api_base, api_key, 12 more }
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for Azure deployment.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for Azure OpenAI API.
azure_deployment: optional string
The Azure deployment to use.
azure_endpoint: optional string
The Azure endpoint to use.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
AzureOpenAIEmbeddingConfig = object { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for Azure deployment.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for Azure OpenAI API.
azure_deployment: optional string
The Azure deployment to use.
azure_endpoint: optional string
The Azure endpoint to use.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "AZURE_EMBEDDING"
Type of the embedding model.
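A minimal AzureOpenAIEmbeddingConfig payload might look like the sketch below, assuming credentials come from environment variables. The deployment name and API version are placeholders, not recommendations.

```python
# Hedged sketch of an AzureOpenAIEmbeddingConfig payload, using only
# fields from the schema above. Deployment and version are placeholders.
import os

azure_embedding_config = {
    "type": "AZURE_EMBEDDING",
    "component": {
        "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
        "azure_deployment": "text-embedding-3-small",  # your deployment name
        "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
        "api_version": "2024-02-01",   # illustrative API version
        "embed_batch_size": 100,
        "max_retries": 3,
    },
}
```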
BedrockEmbedding = object { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
additional_kwargs: optional map[unknown]
Additional kwargs for the bedrock client.
aws_access_key_id: optional string
AWS Access Key ID to use.
aws_secret_access_key: optional string
AWS Secret Access Key to use.
aws_session_token: optional string
AWS Session Token to use.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
The maximum number of API retries.
model_name: optional string
The modelId of the Bedrock model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
profile_name: optional string
The name of the AWS profile to use. If not given, the default profile is used.
region_name: optional string
AWS region name to use. If not passed, uses the region configured in the AWS CLI.
timeout: optional number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
BedrockEmbeddingConfig = object { component, type }
component: optional BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
Configuration for the Bedrock embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the bedrock client.
aws_access_key_id: optional string
AWS Access Key ID to use.
aws_secret_access_key: optional string
AWS Secret Access Key to use.
aws_session_token: optional string
AWS Session Token to use.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
The maximum number of API retries.
model_name: optional string
The modelId of the Bedrock model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
profile_name: optional string
The name of the AWS profile to use. If not given, the default profile is used.
region_name: optional string
AWS region name to use. If not passed, uses the region configured in the AWS CLI.
timeout: optional number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type: optional "BEDROCK_EMBEDDING"
Type of the embedding model.
CohereEmbedding = object { api_key, class_name, embed_batch_size, 5 more }
api_key: string
The Cohere API key.
embed_batch_size: optional number
The batch size for embedding calls.
embedding_type: optional string
Embedding type. If not provided, "float" is used when needed.
input_type: optional string
Model input type. If not provided, "search_document" and "search_query" are used when needed.
model_name: optional string
The modelId of the Cohere model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
truncate: optional string
Truncation type: "START", "END", or "NONE".
CohereEmbeddingConfig = object { component, type }
Configuration for the Cohere embedding model.
api_key: string
The Cohere API key.
embed_batch_size: optional number
The batch size for embedding calls.
embedding_type: optional string
Embedding type. If not provided, "float" is used when needed.
input_type: optional string
Model input type. If not provided, "search_document" and "search_query" are used when needed.
model_name: optional string
The modelId of the Cohere model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
truncate: optional string
Truncation type: "START", "END", or "NONE".
type: optional "COHERE_EMBEDDING"
Type of the embedding model.
DataSinkCreate = object { component, name, sink_type }
Schema for creating a data sink.
component: map[unknown] or CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } or CloudPostgresVectorStore { database, embed_dim, host, 10 more } or 5 more
Component that implements the data sink
CloudPineconeVectorStore = object { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
    api_key (str): API key for authenticating with Pinecone
    index_name (str): name of the Pinecone index
    namespace (Optional[str]): namespace to use in the Pinecone index
    insert_kwargs (Optional[dict]): additional kwargs to pass during insertion
api_key: string
The API key for authenticating with Pinecone
CloudPostgresVectorStore = object { database, embed_dim, host, 10 more }
hnsw_settings: optional PgVectorHnswSettings { distance_method, ef_construction, ef_search, 2 more }
HNSW settings for PGVector.
distance_method: optional "l2" or "ip" or "cosine" or 3 more
The distance method to use.
ef_construction: optional number
The number of edges to use during the construction phase.
ef_search: optional number
The number of edges to use during the search phase.
m: optional number
The number of bi-directional links created for each new element.
vector_type: optional "vector" or "half_vec" or "bit" or "sparse_vec"
The type of vector to use.
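The PgVectorHnswSettings fields above can be sketched as a dict. The numeric values are typical pgvector defaults, shown for illustration rather than as recommendations.

```python
# Illustrative PgVectorHnswSettings for a Postgres data sink, using the
# fields described above. Values are typical defaults, not recommendations.
hnsw_settings = {
    "distance_method": "cosine",
    "m": 16,                 # bi-directional links per new element
    "ef_construction": 64,   # edges used during the construction phase
    "ef_search": 40,         # edges used during the search phase
    "vector_type": "vector",
}
```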
CloudQdrantVectorStore = object { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
    collection_name (str): name of the Qdrant collection
    url (str): URL of the Qdrant instance
    api_key (str): API key for authenticating with Qdrant
    max_retries (int): maximum number of retries in case of a failure. Defaults to 3
    client_kwargs (dict): additional kwargs to pass to the Qdrant client
CloudAzureAISearchVectorStore = object { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
CloudMongoDBAtlasVectorSearch = object { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
    mongodb_uri (str): URI for connecting to MongoDB Atlas
    db_name (str): name of the MongoDB database
    collection_name (str): name of the MongoDB collection
    vector_index_name (str): name of the MongoDB Atlas vector index
    fulltext_index_name (str): name of the MongoDB Atlas full-text index
CloudMilvusVectorStore = object { uri, token, class_name, 3 more }
Cloud Milvus Vector Store.
CloudAstraDBVectorStore = object { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
    token (str): The Astra DB Application Token to use.
    api_endpoint (str): The Astra DB JSON API endpoint for your database.
    collection_name (str): Collection name to use. If not existing, it will be created.
    embedding_dimension (int): Length of the embedding vectors in use.
    keyspace (Optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
token: string
The Astra DB Application Token to use
api_endpoint: string
The Astra DB JSON API endpoint for your database
collection_name: string
Collection name to use. If not existing, it will be created
embedding_dimension: number
Length of the embedding vectors in use
keyspace: optional string
The keyspace to use. If not provided, 'default_keyspace' is used.
name: string
The name of the data sink.
sink_type: "PINECONE" or "POSTGRES" or "QDRANT" or 4 more
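A DataSinkCreate body for a Pinecone-backed sink, following the schema above, might look like this sketch; the index and namespace values are placeholders.

```python
# Hedged sketch of a DataSinkCreate payload for a Pinecone sink.
# Key, index, and namespace values are placeholders.
pinecone_sink = {
    "name": "my-pinecone-sink",
    "sink_type": "PINECONE",
    "component": {                      # CloudPineconeVectorStore
        "api_key": "<PINECONE_API_KEY>",
        "index_name": "my-index",
        "namespace": "default",         # optional namespace within the index
    },
}
```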
GeminiEmbedding = object { api_base, api_key, class_name, 6 more }
api_base: optional string
API base to access the model. Defaults to None.
api_key: optional string
API key to access the model. Defaults to None.
embed_batch_size: optional number
The batch size for embedding calls.
model_name: optional string
The modelId of the Gemini model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
task_type: optional string
The task for embedding model.
title: optional string
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport: optional string
Transport to access the model. Defaults to None.
GeminiEmbeddingConfig = object { component, type }
Configuration for the Gemini embedding model.
api_base: optional string
API base to access the model. Defaults to None.
api_key: optional string
API key to access the model. Defaults to None.
embed_batch_size: optional number
The batch size for embedding calls.
model_name: optional string
The modelId of the Gemini model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
task_type: optional string
The task for embedding model.
title: optional string
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport: optional string
Transport to access the model. Defaults to None.
type: optional "GEMINI_EMBEDDING"
Type of the embedding model.
HuggingFaceInferenceAPIEmbedding = object { token, class_name, cookies, 9 more }
token: optional string or boolean
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies: optional map[string]
Additional cookies to send to the server.
embed_batch_size: optional number
The batch size for embedding calls.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name: optional string
Hugging Face model name. If None, the task will be used.
num_workers: optional number
The number of workers to use for async embedding calls.
pooling: optional "cls" or "mean" or "last"
The pooling strategy to apply to the model output.
query_instruction: optional string
Instruction to prepend during query embedding.
task: optional string
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction: optional string
Instruction to prepend during text embedding.
timeout: optional number
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
HuggingFaceInferenceAPIEmbeddingConfig = object { component, type }
Configuration for the HuggingFace Inference API embedding model.
token: optional string or boolean
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies: optional map[string]
Additional cookies to send to the server.
embed_batch_size: optional number
The batch size for embedding calls.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name: optional string
Hugging Face model name. If None, the task will be used.
num_workers: optional number
The number of workers to use for async embedding calls.
pooling: optional "cls" or "mean" or "last"
The pooling strategy to apply to the model output.
query_instruction: optional string
Instruction to prepend during query embedding.
task: optional string
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction: optional string
Instruction to prepend during text embedding.
timeout: optional number
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type: optional "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
LlamaParseParameters = object { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
images_to_save: optional array of "screenshot" or "embedded" or "layout"
The image types to save during parsing.
priority: optional "low" or "medium" or "high" or "critical"
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }
The outbound webhook configurations
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 13 more
List of event names to subscribe to
webhook_headers: optional map[string]
Custom HTTP headers to include with webhook requests.
webhook_output_format: optional string
The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json
webhook_url: optional string
The URL to send webhook notifications to.
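A webhook_configurations entry for LlamaParseParameters can be sketched as below. The event names follow the "extract.*" pattern shown above; the URL and header values are placeholders.

```python
# Hedged sketch of a single WebhookConfiguration entry, using the
# fields above. URL and header values are placeholders.
webhook_config = {
    "webhook_url": "https://example.com/hooks/llamacloud",
    "webhook_events": ["extract.success", "extract.error"],
    "webhook_headers": {"X-Auth-Token": "<token>"},
    "webhook_output_format": "json",  # "string" (default) or "json"
}
```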
LlmParameters = object { class_name, model_name, system_prompt, 3 more }
model_name: optional "GPT_4O" or "GPT_4O_MINI" or "GPT_4_1" or 11 more
The name of the model to use for LLM completions.
system_prompt: optional string
The system prompt to use for the completion.
temperature: optional number
The temperature value for the model.
use_chain_of_thought_reasoning: optional boolean
Whether to use chain of thought reasoning.
use_citation: optional boolean
Whether to show citations in the response.
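An illustrative LlmParameters payload, using one of the model names from the enum above; the prompt and temperature are examples only.

```python
# Hedged sketch of an LlmParameters payload using the fields above.
llm_params = {
    "model_name": "GPT_4O_MINI",
    "system_prompt": "Answer using only the retrieved context.",
    "temperature": 0.1,
    "use_chain_of_thought_reasoning": False,
    "use_citation": True,   # include citations in the response
}
```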
ManagedIngestionStatusResponse = object { status, deployment_date, effective_at, 2 more }
status: "NOT_STARTED" or "IN_PROGRESS" or "SUCCESS" or 3 more
Status of the ingestion.
deployment_date: optional string
Date of the deployment.
effective_at: optional string
Datetime at which the status became effective.
error: optional array of object { job_id, message, step }
List of errors that occurred during ingestion.
job_id: string
ID of the job that failed.
message: string
Message describing the error.
step: "MANAGED_INGESTION" or "DATA_SOURCE" or "FILE_UPDATER" or 4 more
The ingestion step that failed.
job_id: optional string
ID of the latest job.
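A small helper can interpret a ManagedIngestionStatusResponse dict using the status and error fields described above. Treating only "SUCCESS" as terminal success is an assumption based on the enum values shown.

```python
# Hedged sketch: summarize a ManagedIngestionStatusResponse dict,
# using the status/error shape described above.
def summarize_ingestion(resp: dict) -> str:
    status = resp["status"]
    if status == "SUCCESS":
        return "ingestion complete"
    errors = resp.get("error") or []
    if errors:
        first = errors[0]
        # Each error carries job_id, message, and the failing step.
        return (
            f"failed at step {first['step']} "
            f"(job {first['job_id']}): {first['message']}"
        )
    return f"status: {status}"
```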
MessageRole = "system" or "developer" or "user" or 5 more
Message role.
MetadataFilters = object { filters, condition }
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses strict types, since int, float, and str are mutually compatible and were previously all converted to strings.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
MetadataFilters = object { filters, condition }
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses strict types, since int, float, and str are mutually compatible and were previously all converted to strings.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
OpenAIEmbedding = object { additional_kwargs, api_base, api_key, 10 more }
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for OpenAI API.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for OpenAI API.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
OpenAIEmbeddingConfig = object { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for OpenAI API.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for OpenAI API.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "OPENAI_EMBEDDING"
Type of the embedding model.
PageFigureNodeWithScore = object { node, score, class_name }
Page figure metadata with score
node: object { confidence, figure_name, figure_size, 4 more }
confidence: number
The confidence of the figure
figure_name: string
The name of the figure
figure_size: number
The size of the figure in bytes
file_id: string
The ID of the file that the figure was taken from
page_index: number
The index of the page for which the figure is taken (0-indexed)
is_likely_noise: optional boolean
Whether the figure is likely to be noise
metadata: optional map[unknown]
Metadata for the figure
score: number
The score of the figure node
PageScreenshotNodeWithScore = object { node, score, class_name }
Page screenshot metadata with score
node: object { file_id, image_size, page_index, metadata }
file_id: string
The ID of the file that the page screenshot was taken from
image_size: number
The size of the image in bytes
page_index: number
The index of the page for which the screenshot is taken (0-indexed)
metadata: optional map[unknown]
Metadata for the screenshot
score: number
The score of the screenshot node
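The figure and screenshot results above share a node/score shape, so ranking them is straightforward. A sketch, assuming plain dicts with the documented keys:

```python
# Hedged sketch: rank PageFigureNodeWithScore / PageScreenshotNodeWithScore
# entries by their score field, assuming plain dicts as documented above.
def top_nodes(nodes_with_score: list[dict], k: int = 3) -> list[dict]:
    """Return the k highest-scoring nodes, best first."""
    return sorted(nodes_with_score, key=lambda n: n["score"], reverse=True)[:k]

nodes = [
    {"node": {"page_index": 0, "file_id": "f1"}, "score": 0.42},
    {"node": {"page_index": 3, "file_id": "f1"}, "score": 0.91},
    {"node": {"page_index": 1, "file_id": "f2"}, "score": 0.67},
]
best = top_nodes(nodes, k=2)
```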
Pipeline = object { id, embedding_config, name, 15 more }
Schema for a pipeline.
id: string
Unique identifier
embedding_config: object { component, type } or AzureOpenAIEmbeddingConfig { component, type } or CohereEmbeddingConfig { component, type } or 5 more
ManagedOpenAIEmbedding = object { component, type }
component: optional object { class_name, embed_batch_size, model_name, num_workers }
Configuration for the Managed OpenAI embedding model.
embed_batch_size: optional number
The batch size for embedding calls.
model_name: optional "openai-text-embedding-3-small"
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
type: optional "MANAGED_OPENAI_EMBEDDING"
Type of the embedding model.
AzureOpenAIEmbeddingConfig = object { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for Azure deployment.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for Azure OpenAI API.
azure_deployment: optional string
The Azure deployment to use.
azure_endpoint: optional string
The Azure endpoint to use.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "AZURE_EMBEDDING"
Type of the embedding model.
CohereEmbeddingConfig = object { component, type }
Configuration for the Cohere embedding model.
api_key: string
The Cohere API key.
embed_batch_size: optional number
The batch size for embedding calls.
embedding_type: optional string
Embedding type. If not provided, "float" is used when needed.
input_type: optional string
Model input type. If not provided, "search_document" and "search_query" are used when needed.
model_name: optional string
The modelId of the Cohere model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
truncate: optional string
Truncation type: "START", "END", or "NONE".
type: optional "COHERE_EMBEDDING"
Type of the embedding model.
GeminiEmbeddingConfig = object { component, type }
Configuration for the Gemini embedding model.
api_base: optional string
API base to access the model. Defaults to None.
api_key: optional string
API key to access the model. Defaults to None.
embed_batch_size: optional number
The batch size for embedding calls.
model_name: optional string
The modelId of the Gemini model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
task_type: optional string
The task for embedding model.
title: optional string
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport: optional string
Transport to access the model. Defaults to None.
type: optional "GEMINI_EMBEDDING"
Type of the embedding model.
HuggingFaceInferenceAPIEmbeddingConfig = object { component, type }
Configuration for the HuggingFace Inference API embedding model.
token: optional string or boolean
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies: optional map[string]
Additional cookies to send to the server.
embed_batch_size: optional number
The batch size for embedding calls.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name: optional string
Hugging Face model name. If None, the task will be used.
num_workers: optional number
The number of workers to use for async embedding calls.
pooling: optional "cls" or "mean" or "last"
The pooling strategy to apply to the model output.
query_instruction: optional string
Instruction to prepend during query embedding.
task: optional string
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction: optional string
Instruction to prepend during text embedding.
timeout: optional number
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type: optional "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
OpenAIEmbeddingConfig = object { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for OpenAI API.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for OpenAI API.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "OPENAI_EMBEDDING"
Type of the embedding model.
VertexAIEmbeddingConfig = object { component, type }
Configuration for the VertexAI embedding model.
client_email: string
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string
The private key for the VertexAI credentials.
private_key_id: string
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string
The token URI for the VertexAI credentials.
additional_kwargs: optional map[unknown]
Additional kwargs for the Vertex.
embed_batch_size: optional number
The batch size for embedding calls.
embed_mode: optional "default" or "classification" or "clustering" or 2 more
The embedding mode to use.
model_name: optional string
The modelId of the VertexAI model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
type: optional "VERTEXAI_EMBEDDING"
Type of the embedding model.
BedrockEmbeddingConfig = object { component, type }
component: optional BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
Configuration for the Bedrock embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the bedrock client.
aws_access_key_id: optional string
AWS Access Key ID to use.
aws_secret_access_key: optional string
AWS Secret Access Key to use.
aws_session_token: optional string
AWS Session Token to use.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
The maximum number of API retries.
model_name: optional string
The modelId of the Bedrock model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
profile_name: optional string
The name of the AWS profile to use. If not given, the default profile is used.
region_name: optional string
AWS region name to use. If not passed, uses the region configured in the AWS CLI.
timeout: optional number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type: optional "BEDROCK_EMBEDDING"
Type of the embedding model.
config_hash: optional object { embedding_config_hash, parsing_config_hash, transform_config_hash }
Hashes for the configuration of a pipeline.
embedding_config_hash: optional string
Hash of the embedding config.
parsing_config_hash: optional string
Hash of the llama parse parameters.
transform_config_hash: optional string
Hash of the transform config.
created_at: optional string
Creation datetime
Schema for a data sink.
id: string
Unique identifier
component: map[unknown] or CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } or CloudPostgresVectorStore { database, embed_dim, host, 10 more } or 5 more
Component that implements the data sink
CloudPineconeVectorStore = object { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
    api_key (str): API key for authenticating with Pinecone
    index_name (str): name of the Pinecone index
    namespace (Optional[str]): namespace to use in the Pinecone index
    insert_kwargs (Optional[dict]): additional kwargs to pass during insertion
api_key: string
The API key for authenticating with Pinecone
CloudPostgresVectorStore = object { database, embed_dim, host, 10 more }
hnsw_settings: optional PgVectorHnswSettings { distance_method, ef_construction, ef_search, 2 more }
HNSW settings for PGVector.
distance_method: optional "l2" or "ip" or "cosine" or 3 more
The distance method to use.
ef_construction: optional number
The number of edges to use during the construction phase.
ef_search: optional number
The number of edges to use during the search phase.
m: optional number
The number of bi-directional links created for each new element.
vector_type: optional "vector" or "half_vec" or "bit" or "sparse_vec"
The type of vector to use.
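The HNSW settings above can be combined into a single object. The numbers here are conventional HNSW starting points, not tuned recommendations from this reference.

```python
# Example PgVectorHnswSettings values; all fields are optional and
# the numbers are illustrative defaults only.
hnsw_settings = {
    "distance_method": "cosine",   # e.g. "l2", "ip", or "cosine"
    "ef_construction": 128,        # edges considered while building the graph
    "ef_search": 64,               # edges considered at query time
    "m": 16,                       # bi-directional links per new element
    "vector_type": "vector",       # or "half_vec", "bit", "sparse_vec"
}
```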
CloudQdrantVectorStore = object { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
  collection_name (str): name of the Qdrant collection
  url (str): url of the Qdrant instance
  api_key (str): API key for authenticating with Qdrant
  max_retries (int): maximum number of retries in case of a failure. Defaults to 3
  client_kwargs (dict): additional kwargs to pass to the Qdrant client
CloudAzureAISearchVectorStore = object { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
CloudMongoDBAtlasVectorSearch = object { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
  mongodb_uri (str): URI for connecting to MongoDB Atlas
  db_name (str): name of the MongoDB database
  collection_name (str): name of the MongoDB collection
  vector_index_name (str): name of the MongoDB Atlas vector index
  fulltext_index_name (str): name of the MongoDB Atlas full-text index
CloudMilvusVectorStore = object { uri, token, class_name, 3 more }
Cloud Milvus Vector Store.
CloudAstraDBVectorStore = object { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
  token (str): The Astra DB Application Token to use.
  api_endpoint (str): The Astra DB JSON API endpoint for your database.
  collection_name (str): Collection name to use. If not existing, it will be created.
  embedding_dimension (int): Length of the embedding vectors in use.
  keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
token: string
The Astra DB Application Token to use
api_endpoint: string
The Astra DB JSON API endpoint for your database
collection_name: string
Collection name to use. If not existing, it will be created
embedding_dimension: number
Length of the embedding vectors in use
keyspace: optional string
The keyspace to use. If not provided, 'default_keyspace' is used.
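A sketch of the AstraDB component described above; the token and endpoint are placeholders, and embedding_dimension must match the embedding model in use.

```python
# Sketch of a CloudAstraDBVectorStore component; endpoint and token are
# placeholders, collection_name is created if it does not already exist.
astra_component = {
    "token": "<ASTRA_DB_APPLICATION_TOKEN>",
    "api_endpoint": "https://<db-id>-<region>.apps.astra.datastax.com",
    "collection_name": "pipeline_chunks",  # illustrative name
    "embedding_dimension": 1536,  # must match the embedding model's output
    "keyspace": "default_keyspace",  # optional; this is the default
}
```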
name: string
The name of the data sink.
sink_type: "PINECONE" or "POSTGRES" or "QDRANT" or 4 more
created_at: optional string
Creation datetime
updated_at: optional string
Update datetime
embedding_model_config: optional object { id, embedding_config, name, 3 more }
Schema for an embedding model config.
id: string
Unique identifier
embedding_config: AzureOpenAIEmbeddingConfig { component, type } or CohereEmbeddingConfig { component, type } or GeminiEmbeddingConfig { component, type } or 4 more
The embedding configuration for the embedding model config.
AzureOpenAIEmbeddingConfig = object { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for Azure deployment.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for Azure OpenAI API.
azure_deployment: optional string
The Azure deployment to use.
azure_endpoint: optional string
The Azure endpoint to use.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "AZURE_EMBEDDING"
Type of the embedding model.
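The Azure OpenAI fields above wrap into an AzureOpenAIEmbeddingConfig with a component object. This sketch uses placeholder endpoint, deployment, key, and version values; replace each with your own.

```python
# Illustrative AzureOpenAIEmbeddingConfig payload; endpoint, deployment,
# key, and API version are placeholders.
azure_embedding_config = {
    "type": "AZURE_EMBEDDING",
    "component": {
        "azure_endpoint": "https://<resource>.openai.azure.com/",
        "azure_deployment": "<deployment-name>",
        "api_key": "<AZURE_OPENAI_API_KEY>",
        "api_version": "2024-02-01",  # example version string
        "dimensions": 512,  # only honored by v3 embedding models
    },
}
```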
CohereEmbeddingConfig = object { component, type }
Configuration for the Cohere embedding model.
api_key: string
The Cohere API key.
embed_batch_size: optional number
The batch size for embedding calls.
embedding_type: optional string
Embedding type. If not provided, the float embedding type is used when needed.
input_type: optional string
Model input type. If not provided, search_document and search_query are used when needed.
model_name: optional string
The modelId of the Cohere model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
truncate: optional string
Truncation type: START, END, or NONE.
type: optional "COHERE_EMBEDDING"
Type of the embedding model.
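A CohereEmbeddingConfig sketch following the fields above; the model name is an example Cohere embedding modelId, and the key is a placeholder.

```python
# Sketch of a CohereEmbeddingConfig payload; values are illustrative.
cohere_embedding_config = {
    "type": "COHERE_EMBEDDING",
    "component": {
        "api_key": "<COHERE_API_KEY>",
        "model_name": "embed-english-v3.0",  # example modelId
        "input_type": "search_document",     # or "search_query"
        "truncate": "END",                   # START, END, or NONE
    },
}
```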
GeminiEmbeddingConfig = object { component, type }
Configuration for the Gemini embedding model.
api_base: optional string
API base to access the model. Defaults to None.
api_key: optional string
API key to access the model. Defaults to None.
embed_batch_size: optional number
The batch size for embedding calls.
model_name: optional string
The modelId of the Gemini model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
task_type: optional string
The task for embedding model.
title: optional string
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport: optional string
Transport to access the model. Defaults to None.
type: optional "GEMINI_EMBEDDING"
Type of the embedding model.
HuggingFaceInferenceAPIEmbeddingConfig = object { component, type }
Configuration for the HuggingFace Inference API embedding model.
token: optional string or boolean
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies: optional map[string]
Additional cookies to send to the server.
embed_batch_size: optional number
The batch size for embedding calls.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name: optional string
Hugging Face model name. If None, the task will be used.
num_workers: optional number
The number of workers to use for async embedding calls.
pooling: optional "cls" or "mean" or "last"
Enum of possible pooling choices with pooling behaviors.
query_instruction: optional string
Instruction to prepend during query embedding.
task: optional string
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction: optional string
Instruction to prepend during text embedding.
timeout: optional number
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type: optional "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
OpenAIEmbeddingConfig = object { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for OpenAI API.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for OpenAI API.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "OPENAI_EMBEDDING"
Type of the embedding model.
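A minimal OpenAIEmbeddingConfig sketch under the same assumptions as the Azure example: fields map directly to JSON keys, and the key is a placeholder.

```python
# Minimal OpenAIEmbeddingConfig sketch; model name and batch size
# are illustrative.
openai_embedding_config = {
    "type": "OPENAI_EMBEDDING",
    "component": {
        "api_key": "<OPENAI_API_KEY>",
        "model_name": "text-embedding-3-small",
        "embed_batch_size": 100,
        "reuse_client": False,  # can improve stability for large async volumes
    },
}
```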
VertexAIEmbeddingConfig = object { component, type }
Configuration for the VertexAI embedding model.
client_email: string
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string
The private key for the VertexAI credentials.
private_key_id: string
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string
The token URI for the VertexAI credentials.
additional_kwargs: optional map[unknown]
Additional kwargs for the Vertex.
embed_batch_size: optional number
The batch size for embedding calls.
embed_mode: optional "default" or "classification" or "clustering" or 2 more
The embedding mode to use.
model_name: optional string
The modelId of the VertexAI model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
type: optional "VERTEXAI_EMBEDDING"
Type of the embedding model.
BedrockEmbeddingConfig = object { component, type }
component: optional BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
Configuration for the Bedrock embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the Bedrock client.
aws_access_key_id: optional string
AWS Access Key ID to use
aws_secret_access_key: optional string
AWS Secret Access Key to use
aws_session_token: optional string
AWS Session Token to use
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
The maximum number of API retries.
model_name: optional string
The modelId of the Bedrock model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
profile_name: optional string
The name of the AWS profile to use. If not provided, the default profile is used.
region_name: optional string
The AWS region name to use. If not provided, the region configured in the AWS CLI is used.
timeout: optional number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type: optional "BEDROCK_EMBEDDING"
Type of the embedding model.
name: string
The name of the embedding model config.
created_at: optional string
Creation datetime
updated_at: optional string
Update datetime
embedding_model_config_id: optional string
The ID of the EmbeddingModelConfig this pipeline is using.
llama_parse_parameters: optional LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
images_to_save: optional array of "screenshot" or "embedded" or "layout"
The image types to save during parsing.
priority: optional "low" or "medium" or "high" or "critical"
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
Enum for representing the different available page error handling modes.
webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }
The outbound webhook configurations
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 13 more
List of event names to subscribe to
webhook_headers: optional map[string]
Custom HTTP headers to include with webhook requests.
webhook_output_format: optional string
The output format to use for the webhook. Defaults to string if none is supplied. Currently supported values: string, json.
webhook_url: optional string
The URL to send webhook notifications to.
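A WebhookConfiguration entry can be sketched as follows; the URL and header values are placeholders, and the event names are drawn from the list above.

```python
# Hypothetical WebhookConfiguration entry; URL and header values
# are placeholders.
webhook_configuration = {
    "webhook_url": "https://example.com/hooks/llamacloud",
    "webhook_events": ["extract.success", "extract.error"],
    "webhook_headers": {"X-Webhook-Secret": "<shared-secret>"},
    "webhook_output_format": "json",  # "string" (the default) or "json"
}
```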
managed_pipeline_id: optional string
The ID of the ManagedPipeline this playground pipeline is linked to.
metadata_config: optional PipelineMetadataConfig { excluded_embed_metadata_keys, excluded_llm_metadata_keys }
Metadata configuration for the pipeline.
excluded_embed_metadata_keys: optional array of string
List of metadata keys to exclude from embeddings
excluded_llm_metadata_keys: optional array of string
List of metadata keys to exclude from LLM during retrieval
Type of pipeline. Either PLAYGROUND or MANAGED.
preset_retrieval_parameters: optional PresetRetrievalParams { alpha, class_name, dense_similarity_cutoff, 11 more }
Preset retrieval parameters for the pipeline.
alpha: optional number
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
dense_similarity_cutoff: optional number
Minimum similarity score with respect to the query for retrieval.
dense_similarity_top_k: optional number
Number of nodes for dense retrieval.
enable_reranking: optional boolean
Enable reranking for retrieval
files_top_k: optional number
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
rerank_top_n: optional number
Number of reranked nodes for returning.
The retrieval mode for the query.
retrieve_image_nodes: optional boolean (Deprecated)
Whether to retrieve image nodes.
retrieve_page_figure_nodes: optional boolean
Whether to retrieve page figure nodes.
retrieve_page_screenshot_nodes: optional boolean
Whether to retrieve page screenshot nodes.
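The scalar retrieval fields above can be combined for hybrid retrieval with reranking. The numbers here are illustrative, not tuned recommendations.

```python
# Example PresetRetrievalParams for hybrid retrieval with reranking;
# values are illustrative only.
preset_retrieval_parameters = {
    "dense_similarity_top_k": 8,
    "sparse_similarity_top_k": 8,
    "alpha": 0.5,            # 0 = pure sparse, 1 = pure dense
    "enable_reranking": True,
    "rerank_top_n": 4,       # nodes returned after reranking
}
```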
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses strict types, as int, float, and str are compatible types and were previously all converted to strings.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
MetadataFilters = object { filters, condition }
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses strict types, as int, float, and str are compatible types and were previously all converted to strings.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
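A MetadataFilters payload combining two MetadataFilter entries with "and" can be sketched like this; the keys and values are illustrative.

```python
# Sketch of a MetadataFilters payload; nested filters are combined
# with the "and" condition.
search_filters = {
    "condition": "and",
    "filters": [
        {"key": "author", "value": "jane", "operator": "=="},
        {"key": "year", "value": 2020, "operator": ">"},
    ],
}
```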
search_filters_inference_schema: optional map[map[unknown] or array of unknown or string or 2 more]
JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.
sparse_similarity_top_k: optional number
Number of nodes for sparse retrieval.
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
model_type: optional "splade" or "bm25" or "auto"
The sparse model type to use. 'bm25' uses Qdrant's FastEmbed BM25 model (default for new pipelines), 'splade' uses HuggingFace Splade model, 'auto' selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).
status: optional "CREATED" or "DELETING"
Status of the pipeline.
transform_config: optional AutoTransformConfig { chunk_overlap, chunk_size, mode } or AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
Configuration for the transformation.
AutoTransformConfig = object { chunk_overlap, chunk_size, mode }
chunk_overlap: optional number
Chunk overlap for the transformation.
chunk_size: optional number
Chunk size for the transformation.
AdvancedModeTransformConfig = object { chunking_config, mode, segmentation_config }
chunking_config: optional object { mode } or object { chunk_overlap, chunk_size, mode } or object { chunk_overlap, chunk_size, mode, separator } or 2 more
Configuration for the chunking.
NoneChunkingConfig = object { mode }
CharacterChunkingConfig = object { chunk_overlap, chunk_size, mode }
TokenChunkingConfig = object { chunk_overlap, chunk_size, mode, separator }
SentenceChunkingConfig = object { chunk_overlap, chunk_size, mode, 2 more }
SemanticChunkingConfig = object { breakpoint_percentile_threshold, buffer_size, mode }
segmentation_config: optional object { mode } or object { mode, page_separator } or object { mode }
Configuration for the segmentation.
NoneSegmentationConfig = object { mode }
PageSegmentationConfig = object { mode, page_separator }
ElementSegmentationConfig = object { mode }
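An AdvancedModeTransformConfig combining a chunking and a segmentation variant might look like the sketch below. The exact "mode" literals for each variant are assumptions here; check the schema's enum values before use.

```python
# Illustrative AdvancedModeTransformConfig; the mode literals are
# assumptions, not confirmed by this reference.
transform_config = {
    "mode": "advanced",              # assumed literal for advanced mode
    "chunking_config": {
        "mode": "sentence",          # assumed literal for SentenceChunkingConfig
        "chunk_size": 1024,
        "chunk_overlap": 200,
    },
    "segmentation_config": {
        "mode": "page",              # assumed literal for PageSegmentationConfig
        "page_separator": "\n---\n",
    },
}
```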
updated_at: optional string
Update datetime
PipelineCreate = object { name, data_sink, data_sink_id, 10 more }
Schema for creating a pipeline.
Schema for creating a data sink.
component: map[unknown] or CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } or CloudPostgresVectorStore { database, embed_dim, host, 10 more } or 5 more
Component that implements the data sink
CloudPineconeVectorStore = object { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
  api_key (str): API key for authenticating with Pinecone
  index_name (str): name of the Pinecone index
  namespace (optional[str]): namespace to use in the Pinecone index
  insert_kwargs (optional[dict]): additional kwargs to pass during insertion
api_key: string
The API key for authenticating with Pinecone
CloudPostgresVectorStore = object { database, embed_dim, host, 10 more }
hnsw_settings: optional PgVectorHnswSettings { distance_method, ef_construction, ef_search, 2 more }
HNSW settings for PGVector.
distance_method: optional "l2" or "ip" or "cosine" or 3 more
The distance method to use.
ef_construction: optional number
The number of edges to use during the construction phase.
ef_search: optional number
The number of edges to use during the search phase.
m: optional number
The number of bi-directional links created for each new element.
vector_type: optional "vector" or "half_vec" or "bit" or "sparse_vec"
The type of vector to use.
CloudQdrantVectorStore = object { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
  collection_name (str): name of the Qdrant collection
  url (str): url of the Qdrant instance
  api_key (str): API key for authenticating with Qdrant
  max_retries (int): maximum number of retries in case of a failure. Defaults to 3
  client_kwargs (dict): additional kwargs to pass to the Qdrant client
CloudAzureAISearchVectorStore = object { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
CloudMongoDBAtlasVectorSearch = object { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
  mongodb_uri (str): URI for connecting to MongoDB Atlas
  db_name (str): name of the MongoDB database
  collection_name (str): name of the MongoDB collection
  vector_index_name (str): name of the MongoDB Atlas vector index
  fulltext_index_name (str): name of the MongoDB Atlas full-text index
CloudMilvusVectorStore = object { uri, token, class_name, 3 more }
Cloud Milvus Vector Store.
CloudAstraDBVectorStore = object { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
  token (str): The Astra DB Application Token to use.
  api_endpoint (str): The Astra DB JSON API endpoint for your database.
  collection_name (str): Collection name to use. If not existing, it will be created.
  embedding_dimension (int): Length of the embedding vectors in use.
  keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
token: string
The Astra DB Application Token to use
api_endpoint: string
The Astra DB JSON API endpoint for your database
collection_name: string
Collection name to use. If not existing, it will be created
embedding_dimension: number
Length of the embedding vectors in use
keyspace: optional string
The keyspace to use. If not provided, 'default_keyspace' is used.
name: string
The name of the data sink.
sink_type: "PINECONE" or "POSTGRES" or "QDRANT" or 4 more
data_sink_id: optional string
Data sink ID. When provided instead of data_sink, the data sink will be looked up by ID.
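A minimal PipelineCreate body that references an existing data sink and embedding model config by ID can be sketched as follows; the IDs are placeholders, and the transform_config values are illustrative.

```python
# Minimal PipelineCreate sketch; IDs are placeholders to be replaced
# with real resource IDs.
pipeline_create = {
    "name": "my-pipeline",
    "data_sink_id": "<existing-data-sink-id>",
    "embedding_model_config_id": "<existing-embedding-model-config-id>",
    "transform_config": {"chunk_size": 1024, "chunk_overlap": 200},
}
```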
embedding_config: optional AzureOpenAIEmbeddingConfig { component, type } or CohereEmbeddingConfig { component, type } or GeminiEmbeddingConfig { component, type } or 4 more
AzureOpenAIEmbeddingConfig = object { component, type }
Configuration for the Azure OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for Azure deployment.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for Azure OpenAI API.
azure_deployment: optional string
The Azure deployment to use.
azure_endpoint: optional string
The Azure endpoint to use.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "AZURE_EMBEDDING"
Type of the embedding model.
CohereEmbeddingConfig = object { component, type }
Configuration for the Cohere embedding model.
api_key: string
The Cohere API key.
embed_batch_size: optional number
The batch size for embedding calls.
embedding_type: optional string
Embedding type. If not provided, the float embedding type is used when needed.
input_type: optional string
Model input type. If not provided, search_document and search_query are used when needed.
model_name: optional string
The modelId of the Cohere model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
truncate: optional string
Truncation type: START, END, or NONE.
type: optional "COHERE_EMBEDDING"
Type of the embedding model.
GeminiEmbeddingConfig = object { component, type }
Configuration for the Gemini embedding model.
api_base: optional string
API base to access the model. Defaults to None.
api_key: optional string
API key to access the model. Defaults to None.
embed_batch_size: optional number
The batch size for embedding calls.
model_name: optional string
The modelId of the Gemini model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
task_type: optional string
The task for embedding model.
title: optional string
Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.
transport: optional string
Transport to access the model. Defaults to None.
type: optional "GEMINI_EMBEDDING"
Type of the embedding model.
HuggingFaceInferenceAPIEmbeddingConfig = object { component, type }
Configuration for the HuggingFace Inference API embedding model.
token: optional string or boolean
Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.
cookies: optional map[string]
Additional cookies to send to the server.
embed_batch_size: optional number
The batch size for embedding calls.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
model_name: optional string
Hugging Face model name. If None, the task will be used.
num_workers: optional number
The number of workers to use for async embedding calls.
pooling: optional "cls" or "mean" or "last"
Enum of possible pooling choices with pooling behaviors.
query_instruction: optional string
Instruction to prepend during query embedding.
task: optional string
Optional task to pick Hugging Face's recommended model, used when model_name is left as default of None.
text_instruction: optional string
Instruction to prepend during text embedding.
timeout: optional number
The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.
type: optional "HUGGINGFACE_API_EMBEDDING"
Type of the embedding model.
OpenAIEmbeddingConfig = object { component, type }
Configuration for the OpenAI embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the OpenAI API.
api_base: optional string
The base URL for OpenAI API.
api_key: optional string
The OpenAI API key.
api_version: optional string
The version for OpenAI API.
default_headers: optional map[string]
The default headers for API requests.
dimensions: optional number
The number of dimensions on the output embedding vectors. Works only with v3 embedding models.
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
Maximum number of retries.
model_name: optional string
The name of the OpenAI embedding model.
num_workers: optional number
The number of workers to use for async embedding calls.
reuse_client: optional boolean
Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.
timeout: optional number
Timeout for each request.
type: optional "OPENAI_EMBEDDING"
Type of the embedding model.
VertexAIEmbeddingConfig = object { component, type }
Configuration for the VertexAI embedding model.
client_email: string
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string
The private key for the VertexAI credentials.
private_key_id: string
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string
The token URI for the VertexAI credentials.
additional_kwargs: optional map[unknown]
Additional kwargs for the Vertex.
embed_batch_size: optional number
The batch size for embedding calls.
embed_mode: optional "default" or "classification" or "clustering" or 2 more
The embedding mode to use.
model_name: optional string
The modelId of the VertexAI model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
type: optional "VERTEXAI_EMBEDDING"
Type of the embedding model.
BedrockEmbeddingConfig = object { component, type }
component: optional BedrockEmbedding { additional_kwargs, aws_access_key_id, aws_secret_access_key, 9 more }
Configuration for the Bedrock embedding model.
additional_kwargs: optional map[unknown]
Additional kwargs for the Bedrock client.
aws_access_key_id: optional string
AWS Access Key ID to use
aws_secret_access_key: optional string
AWS Secret Access Key to use
aws_session_token: optional string
AWS Session Token to use
embed_batch_size: optional number
The batch size for embedding calls.
max_retries: optional number
The maximum number of API retries.
model_name: optional string
The modelId of the Bedrock model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
profile_name: optional string
The name of the AWS profile to use. If not provided, the default profile is used.
region_name: optional string
The AWS region name to use. If not provided, the region configured in the AWS CLI is used.
timeout: optional number
The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.
type: optional "BEDROCK_EMBEDDING"
Type of the embedding model.
embedding_model_config_id: optional string
Embedding model config ID. When provided instead of embedding_config, the embedding model config will be looked up by ID.
llama_parse_parameters: optional LlamaParseParameters { adaptive_long_table, aggressive_table_extraction, annotate_links, 115 more }
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
images_to_save: optional array of "screenshot" or "embedded" or "layout"
The image types to save during parsing.
priority: optional "low" or "medium" or "high" or "critical"
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
Enum for representing the different available page error handling modes.
webhook_configurations: optional array of WebhookConfiguration { webhook_events, webhook_headers, webhook_output_format, webhook_url }
The outbound webhook configurations
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 13 more
List of event names to subscribe to
webhook_headers: optional map[string]
Custom HTTP headers to include with webhook requests.
webhook_output_format: optional string
The output format to use for the webhook. Defaults to string if none is supplied. Currently supported values: string, json.
webhook_url: optional string
The URL to send webhook notifications to.
managed_pipeline_id: optional string
The ID of the ManagedPipeline this playground pipeline is linked to.
metadata_config: optional PipelineMetadataConfig { excluded_embed_metadata_keys, excluded_llm_metadata_keys }
Metadata configuration for the pipeline.
excluded_embed_metadata_keys: optional array of string
List of metadata keys to exclude from embeddings
excluded_llm_metadata_keys: optional array of string
List of metadata keys to exclude from LLM during retrieval
Type of pipeline. Either PLAYGROUND or MANAGED.
preset_retrieval_parameters: optional PresetRetrievalParams { alpha, class_name, dense_similarity_cutoff, 11 more }
Preset retrieval parameters for the pipeline.
alpha: optional number
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
dense_similarity_cutoff: optional number
Minimum similarity score with respect to the query for retrieval.
dense_similarity_top_k: optional number
Number of nodes for dense retrieval.
enable_reranking: optional boolean
Enable reranking for retrieval
files_top_k: optional number
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
rerank_top_n: optional number
Number of reranked nodes for returning.
The retrieval mode for the query.
retrieve_image_nodes: optional boolean (Deprecated)
Whether to retrieve image nodes.
retrieve_page_figure_nodes: optional boolean
Whether to retrieve page figure nodes.
retrieve_page_screenshot_nodes: optional boolean
Whether to retrieve page screenshot nodes.
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses strict types, as int, float, and str are compatible types and were previously all converted to strings.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
MetadataFilters = object { filters, condition }
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, since int, float, and str are mutually compatible and were previously all converted to string.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
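As an illustration of how MetadataFilter entries combine under MetadataFilters, here is a sketch of a filter keeping 2024 documents from either of two departments, written as a plain dict mirroring the schema. The key names follow the fields above; the values, and the assumption that a MetadataFilters object may nest inside another's filters list, are illustrative:

```python
# Outer "and" combines an exact-match year filter with a nested "or"
# group of two department filters (nesting assumed to be accepted).
search_filters = {
    "condition": "and",
    "filters": [
        {"key": "year", "value": 2024, "operator": "=="},
        {
            "condition": "or",
            "filters": [
                {"key": "department", "value": "sales", "operator": "=="},
                {"key": "department", "value": "legal", "operator": "=="},
            ],
        },
    ],
}
```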
search_filters_inference_schema: optional map[map[unknown] or array of unknown or string or 2 more]
JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.
sparse_similarity_top_k: optional number
Number of nodes for sparse retrieval.
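Taken together, these parameters can be supplied as a preset on the pipeline. A minimal sketch of a hybrid-retrieval preset as a plain Python dict; the field names follow the schema above, the values are illustrative, and the `retrieval_mode` key name is an assumption:

```python
# Hypothetical hybrid-retrieval preset: alpha=0.5 weights dense and
# sparse scores equally (0 = pure sparse, 1 = pure dense, per the schema).
preset_retrieval_parameters = {
    "alpha": 0.5,
    "dense_similarity_top_k": 10,    # nodes from dense retrieval
    "sparse_similarity_top_k": 10,   # nodes from sparse retrieval
    "dense_similarity_cutoff": 0.2,  # drop dense hits scoring below this
    "enable_reranking": True,
    "rerank_top_n": 5,               # keep 5 nodes after reranking
    "retrieval_mode": "chunks",      # key name assumed; enum per RetrievalMode
}
```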
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
model_type: optional "splade" or "bm25" or "auto"
The sparse model type to use. 'bm25' uses Qdrant's FastEmbed BM25 model (default for new pipelines), 'splade' uses HuggingFace Splade model, 'auto' selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).
status: optional string
Status of the pipeline deployment.
transform_config: optional AutoTransformConfig { chunk_overlap, chunk_size, mode } or AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
Configuration for the transformation.
AutoTransformConfig = object { chunk_overlap, chunk_size, mode }
chunk_overlap: optional number
Chunk overlap for the transformation.
chunk_size: optional number
Chunk size for the transformation.
AdvancedModeTransformConfig = object { chunking_config, mode, segmentation_config }
chunking_config: optional object { mode } or object { chunk_overlap, chunk_size, mode } or object { chunk_overlap, chunk_size, mode, separator } or 2 more
Configuration for the chunking.
NoneChunkingConfig = object { mode }
CharacterChunkingConfig = object { chunk_overlap, chunk_size, mode }
TokenChunkingConfig = object { chunk_overlap, chunk_size, mode, separator }
SentenceChunkingConfig = object { chunk_overlap, chunk_size, mode, 2 more }
SemanticChunkingConfig = object { breakpoint_percentile_threshold, buffer_size, mode }
segmentation_config: optional object { mode } or object { mode, page_separator } or object { mode }
Configuration for the segmentation.
NoneSegmentationConfig = object { mode }
PageSegmentationConfig = object { mode, page_separator }
ElementSegmentationConfig = object { mode }
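A sketch of an advanced-mode transform configuration combining sentence chunking with page segmentation, as a plain dict. The mode strings and the page separator value are assumptions for illustration:

```python
# Hypothetical AdvancedModeTransformConfig: sentence chunking plus
# page segmentation. Overlap is the span shared by adjacent chunks.
transform_config = {
    "mode": "advanced",
    "chunking_config": {
        "mode": "sentence",    # SentenceChunkingConfig (mode string assumed)
        "chunk_size": 1024,
        "chunk_overlap": 200,
    },
    "segmentation_config": {
        "mode": "page",        # PageSegmentationConfig (mode string assumed)
        "page_separator": "\n---\n",
    },
}
```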
PipelineMetadataConfig = object { excluded_embed_metadata_keys, excluded_llm_metadata_keys }
excluded_embed_metadata_keys: optional array of string
List of metadata keys to exclude from embeddings
excluded_llm_metadata_keys: optional array of string
List of metadata keys to exclude from LLM during retrieval
PipelineType = "PLAYGROUND" or "MANAGED"
Enum for representing the type of a pipeline
PresetRetrievalParams = object { alpha, class_name, dense_similarity_cutoff, 11 more }
Schema for the search params for a retrieval execution that can be preset for a pipeline.
alpha: optional number
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
dense_similarity_cutoff: optional number
Minimum similarity score with respect to the query for retrieval.
dense_similarity_top_k: optional number
Number of nodes for dense retrieval.
enable_reranking: optional boolean
Enable reranking for retrieval
files_top_k: optional number
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
rerank_top_n: optional number
Number of reranked nodes for returning.
The retrieval mode for the query.
retrieve_image_nodes: optional boolean (Deprecated)
Whether to retrieve image nodes.
retrieve_page_figure_nodes: optional boolean
Whether to retrieve page figure nodes.
retrieve_page_screenshot_nodes: optional boolean
Whether to retrieve page screenshot nodes.
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, since int, float, and str are mutually compatible and were previously all converted to string.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
MetadataFilters = object { filters, condition }
Metadata filters for vector stores.
MetadataFilter = object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, since int, float, and str are mutually compatible and were previously all converted to string.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
value: number or string or array of string or 2 more
operator: optional "==" or ">" or "<" or 11 more
Vector store filter operator.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
condition: optional "and" or "or" or "not"
Vector store filter conditions to combine different filters.
search_filters_inference_schema: optional map[map[unknown] or array of unknown or string or 2 more]
JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.
sparse_similarity_top_k: optional number
Number of nodes for sparse retrieval.
RetrievalMode = "chunks" or "files_via_metadata" or "files_via_content" or "auto_routed"
SparseModelConfig = object { class_name, model_type }
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
model_type: optional "splade" or "bm25" or "auto"
The sparse model type to use. 'bm25' uses Qdrant's FastEmbed BM25 model (default for new pipelines), 'splade' uses HuggingFace Splade model, 'auto' selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).
VertexAIEmbeddingConfig = object { component, type }
Configuration for the VertexAI embedding model.
client_email: string
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string
The private key for the VertexAI credentials.
private_key_id: string
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string
The token URI for the VertexAI credentials.
additional_kwargs: optional map[unknown]
Additional kwargs for the Vertex.
embed_batch_size: optional number
The batch size for embedding calls.
embed_mode: optional "default" or "classification" or "clustering" or 2 more
The embedding mode to use.
model_name: optional string
The modelId of the VertexAI model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
type: optional "VERTEXAI_EMBEDDING"
Type of the embedding model.
VertexTextEmbedding = object { client_email, location, private_key, 9 more }
client_email: string
The client email for the VertexAI credentials.
location: string
The default location to use when making API calls.
private_key: string
The private key for the VertexAI credentials.
private_key_id: string
The private key ID for the VertexAI credentials.
project: string
The default GCP project to use when making Vertex API calls.
token_uri: string
The token URI for the VertexAI credentials.
additional_kwargs: optional map[unknown]
Additional kwargs for the Vertex.
embed_batch_size: optional number
The batch size for embedding calls.
embed_mode: optional "default" or "classification" or "clustering" or 2 more
The embedding mode to use.
model_name: optional string
The modelId of the VertexAI model to use.
num_workers: optional number
The number of workers to use for async embedding calls.
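The VertexTextEmbedding credential fields map one-to-one onto a GCP service-account key file, so a config can be assembled from one. A sketch; the key-file contents, region, and model name below are placeholders:

```python
# Hypothetical service-account JSON as downloaded from GCP; in practice
# this would come from json.load() on the downloaded key file.
service_account = {
    "client_email": "embedder@my-project.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "private_key_id": "abc123",
    "project_id": "my-project",
    "token_uri": "https://oauth2.googleapis.com/token",
}

# Map the key file onto the VertexTextEmbedding fields above.
vertex_embedding = {
    "client_email": service_account["client_email"],
    "private_key": service_account["private_key"],
    "private_key_id": service_account["private_key_id"],
    "project": service_account["project_id"],
    "token_uri": service_account["token_uri"],
    "location": "us-central1",           # default region for Vertex calls
    "model_name": "text-embedding-004",  # illustrative model id
    "embed_batch_size": 10,
}
```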
Pipelines: Sync
Sync Pipeline
Cancel Pipeline Sync
Pipelines: Data Sources
List Pipeline Data Sources
Add Data Sources To Pipeline
Update Pipeline Data Source
Get Pipeline Data Source Status
Sync Pipeline Data Source
Models
PipelineDataSource = object { id, component, data_source_id, 13 more }
Schema for a data source in a pipeline.
id: string
Unique identifier
component: map[unknown] or CloudS3DataSource { bucket, aws_access_id, aws_access_secret, 5 more } or CloudAzStorageBlobDataSource { account_url, container_name, account_key, 8 more } or 8 more
Component that implements the data source
CloudS3DataSource = object { bucket, aws_access_id, aws_access_secret, 5 more }
bucket: string
The name of the S3 bucket to read from.
aws_access_id: optional string
The AWS access ID to use for authentication.
aws_access_secret: optional string
The AWS access secret to use for authentication.
prefix: optional string
The prefix of the S3 objects to read from.
regex_pattern: optional string
The regex pattern to filter S3 objects. Must be a valid regex pattern.
s3_endpoint_url: optional string
The S3 endpoint URL to use for authentication.
CloudAzStorageBlobDataSource = object { account_url, container_name, account_key, 8 more }
account_url: string
The Azure Storage Blob account URL to use for authentication.
container_name: string
The name of the Azure Storage Blob container to read from.
account_key: optional string
The Azure Storage Blob account key to use for authentication.
account_name: optional string
The Azure Storage Blob account name to use for authentication.
blob: optional string
The blob name to read from.
client_id: optional string
The Azure AD client ID to use for authentication.
client_secret: optional string
The Azure AD client secret to use for authentication.
prefix: optional string
The prefix of the Azure Storage Blob objects to read from.
tenant_id: optional string
The Azure AD tenant ID to use for authentication.
CloudOneDriveDataSource = object { client_id, client_secret, tenant_id, 6 more }
client_id: string
The client ID to use for authentication.
client_secret: string
The client secret to use for authentication.
tenant_id: string
The tenant ID to use for authentication.
user_principal_name: string
The user principal name to use for authentication.
folder_id: optional string
The ID of the OneDrive folder to read from.
folder_path: optional string
The path of the OneDrive folder to read from.
required_exts: optional array of string
The list of required file extensions.
CloudSharepointDataSource = object { client_id, client_secret, tenant_id, 11 more }
client_id: string
The client ID to use for authentication.
client_secret: string
The client secret to use for authentication.
tenant_id: string
The tenant ID to use for authentication.
drive_name: optional string
The name of the Sharepoint drive to read from.
exclude_path_patterns: optional array of string
List of regex patterns for file paths to exclude. Files whose paths (including filename) match any pattern will be excluded. Example: ['/temp/', '/backup/', '\.git/', '\.tmp$', '^~']
folder_id: optional string
The ID of the Sharepoint folder to read from.
folder_path: optional string
The path of the Sharepoint folder to read from.
get_permissions: optional boolean
Whether to get permissions for the sharepoint site.
include_path_patterns: optional array of string
List of regex patterns for file paths to include. Full paths (including filename) must match at least one pattern to be included. Example: ['/reports/', '/docs/.*\.pdf$', '^Report.*\.pdf$']
required_exts: optional array of string
The list of required file extensions.
site_id: optional string
The ID of the SharePoint site to download from.
site_name: optional string
The name of the SharePoint site to download from.
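The include/exclude patterns are ordinary regexes matched against the full path including the filename. A sketch of the matching semantics as described above; this is an illustration, not the service's actual implementation, and unanchored substring matching (`re.search`) is an assumption:

```python
import re

include_path_patterns = [r"/reports/", r"/docs/.*\.pdf$"]
exclude_path_patterns = [r"/temp/", r"\.tmp$"]

def is_selected(path: str) -> bool:
    """Path must match at least one include pattern and no exclude pattern."""
    included = any(re.search(p, path) for p in include_path_patterns)
    excluded = any(re.search(p, path) for p in exclude_path_patterns)
    return included and not excluded

# is_selected("site/reports/q1.docx")   -> True  (under /reports/)
# is_selected("site/docs/manual.pdf")   -> True  (matches /docs/.*\.pdf$)
# is_selected("site/temp/reports/x.pdf") -> False (excluded by /temp/)
```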
CloudSlackDataSource = object { slack_token, channel_ids, channel_patterns, 6 more }
slack_token: string
Slack Bot Token.
channel_ids: optional string
Slack Channel.
channel_patterns: optional string
Slack Channel name pattern.
earliest_date: optional string
Earliest date.
earliest_date_timestamp: optional number
Earliest date timestamp.
latest_date: optional string
Latest date.
latest_date_timestamp: optional number
Latest date timestamp.
CloudNotionPageDataSource = object { integration_token, class_name, database_ids, 2 more }
integration_token: string
The integration token to use for authentication.
database_ids: optional string
The Notion database ID to read content from.
page_ids: optional string
The page IDs of the Notion pages to read from.
CloudConfluenceDataSource = object { authentication_mechanism, server_url, api_token, 10 more }
authentication_mechanism: string
Type of Authentication for connecting to Confluence APIs.
server_url: string
The server URL of the Confluence instance.
api_token: optional string
The API token to use for authentication.
cql: optional string
The CQL query to use for fetching pages.
Configuration for handling failures during processing. Key-value object controlling failure handling behaviors.
Example: { "skip_list_failures": true }
Currently supports:
- skip_list_failures: Skip failed batches/lists and continue processing
skip_list_failures: optional boolean
Whether to skip failed batches/lists and continue processing
index_restricted_pages: optional boolean
Whether to index restricted pages.
keep_markdown_format: optional boolean
Whether to keep the markdown format.
label: optional string
The label to use for fetching pages.
page_ids: optional string
The page IDs of the Confluence to read from.
space_key: optional string
The space key to read from.
user_name: optional string
The username to use for authentication.
CloudJiraDataSource = object { authentication_mechanism, query, api_token, 5 more }
Cloud Jira Data Source integrating JiraReader.
authentication_mechanism: string
Type of Authentication for connecting to Jira APIs.
query: string
JQL (Jira Query Language) query to search.
api_token: optional string
The API/Access Token used for Basic, PAT and OAuth2 authentication.
cloud_id: optional string
The cloud ID, used in case of OAuth2.
email: optional string
The email address to use for authentication.
server_url: optional string
The server URL for Jira Cloud.
CloudJiraDataSourceV2 = object { authentication_mechanism, query, server_url, 10 more }
Cloud Jira Data Source integrating JiraReaderV2.
authentication_mechanism: string
Type of Authentication for connecting to Jira APIs.
query: string
JQL (Jira Query Language) query to search.
server_url: string
The server URL for Jira Cloud.
api_token: optional string
The API Access Token used for Basic, PAT and OAuth2 authentication.
api_version: optional "2" or "3"
Jira REST API version to use (2 or 3). 3 supports Atlassian Document Format (ADF).
cloud_id: optional string
The cloud ID, used in case of OAuth2.
email: optional string
The email address to use for authentication.
expand: optional string
Fields to expand in the response.
fields: optional array of string
List of fields to retrieve from Jira. If None, retrieves all fields.
get_permissions: optional boolean
Whether to fetch project role permissions and issue-level security
requests_per_minute: optional number
Rate limit for Jira API requests per minute.
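A sketch of a JiraReaderV2 source using basic auth and a scoped JQL query, as a plain dict. The server URL, email, token, query, and the "basic" value for authentication_mechanism are all illustrative assumptions:

```python
# Hypothetical CloudJiraDataSourceV2 component.
jira_source = {
    "authentication_mechanism": "basic",  # value assumed; see your auth setup
    "server_url": "https://acme.atlassian.net",
    "email": "bot@acme.com",
    "api_token": "ATATT-example",
    "query": "project = ENG AND updated >= -7d",  # issues touched this week
    "api_version": "3",                   # ADF-capable REST API version
    "fields": ["summary", "status", "assignee"],
    "requests_per_minute": 60,
}
```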
CloudBoxDataSource = object { authentication_mechanism, class_name, client_id, 6 more }
authentication_mechanism: "developer_token" or "ccg"
The type of authentication to use (Developer Token or CCG)
client_id: optional string
Box API key used for identifying the application the user is authenticating with
client_secret: optional string
Box API secret used for making auth requests.
developer_token: optional string
Developer token for authentication if authentication_mechanism is 'developer_token'.
enterprise_id: optional string
Box Enterprise ID, if provided authenticates as service.
folder_id: optional string
The ID of the Box folder to read from.
user_id: optional string
Box User ID, if provided authenticates as user.
data_source_id: string
The ID of the data source.
last_synced_at: string
The last time the data source was automatically synced.
name: string
The name of the data source.
pipeline_id: string
The ID of the pipeline.
source_type: "S3" or "AZURE_STORAGE_BLOB" or "GOOGLE_DRIVE" or 8 more
created_at: optional string
Creation datetime
custom_metadata: optional map[map[unknown] or array of unknown or string or 2 more]
Custom metadata that will be present on all data loaded from the data source
status: optional "NOT_STARTED" or "IN_PROGRESS" or "SUCCESS" or 2 more
The status of the data source in the pipeline.
status_updated_at: optional string
The last time the status was updated.
sync_interval: optional number
The interval at which the data source should be synced.
sync_schedule_set_by: optional string
The id of the user who set the sync schedule.
updated_at: optional string
Update datetime
Version metadata for the data source
reader_version: optional "1.0" or "2.0" or "2.1"
The version of the reader to use for this data source.
Pipelines: Images
List File Page Screenshots
Get File Page Screenshot
Get File Page Figure
List File Pages Figures
Pipelines: Files
Get Pipeline File Status Counts
Get Pipeline File Status
Add Files To Pipeline Api
Update Pipeline File
Delete Pipeline File
Models
PipelineFile = object { id, pipeline_id, config_hash, 16 more }
Schema for a file that is associated with a pipeline.
id: string
Unique identifier
pipeline_id: string
The ID of the pipeline that the file is associated with
config_hash: optional map[map[unknown] or array of unknown or string or 2 more]
Hashes for the configuration of the pipeline.
created_at: optional string
Creation datetime
custom_metadata: optional map[map[unknown] or array of unknown or string or 2 more]
Custom metadata for the file
data_source_id: optional string
The ID of the data source that the file belongs to
external_file_id: optional string
The ID of the file in the external system
file_id: optional string
The ID of the file
file_size: optional number
Size of the file in bytes
file_type: optional string
File type (e.g. pdf, docx, etc.)
indexed_page_count: optional number
The number of pages that have been indexed for this file
last_modified_at: optional string
The last modified time of the file
name: optional string
Name of the file
permission_info: optional map[map[unknown] or array of unknown or string or 2 more]
Permission information for the file
project_id: optional string
The ID of the project that the file belongs to
resource_info: optional map[map[unknown] or array of unknown or string or 2 more]
Resource information for the file
status: optional "NOT_STARTED" or "IN_PROGRESS" or "SUCCESS" or 2 more
Status of the pipeline file
status_updated_at: optional string
The last time the status was updated
updated_at: optional string
Update datetime
Pipelines: Metadata
Import Pipeline Metadata
Delete Pipeline Files Metadata
Pipelines: Documents
Create Batch Pipeline Documents
Paginated List Pipeline Documents
Get Pipeline Document
Delete Pipeline Document
Get Pipeline Document Status
Sync Pipeline Document
List Pipeline Document Chunks
Upsert Batch Pipeline Documents
Models
CloudDocument = object { id, metadata, text, 4 more }
Cloud document stored in S3.
page_positions: optional array of number
Indices in CloudDocument.text where a new page begins, e.g. the second page starts at the index given by page_positions[1].
CloudDocumentCreate = object { metadata, text, id, 3 more }
Create a new cloud document.
page_positions: optional array of number
Indices in CloudDocument.text where a new page begins, e.g. the second page starts at the index given by page_positions[1].
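The page_positions convention lets the document text be sliced back into pages. A short sketch; the text and positions below are made up:

```python
text = "Page one body.Page two body.Page three."
page_positions = [0, 14, 28]  # start index of each page; [1] starts page 2

def pages(text: str, page_positions: list[int]) -> list[str]:
    """Slice text into per-page strings using the page start indices."""
    bounds = page_positions + [len(text)]
    return [text[bounds[i]:bounds[i + 1]] for i in range(len(page_positions))]

# pages(text, page_positions)
# -> ["Page one body.", "Page two body.", "Page three."]
```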
TextNode = object { class_name, embedding, end_char_idx, 11 more }
Provided for backward compatibility.
Note: we keep the field with the typo "seperator" to maintain backward compatibility for serialized objects.
embedding: optional array of number
Embedding of the node.
end_char_idx: optional number
End char index of the node.
excluded_embed_metadata_keys: optional array of string
Metadata keys that are excluded from text for the embed model.
excluded_llm_metadata_keys: optional array of string
Metadata keys that are excluded from text for the LLM.
extra_info: optional map[unknown]
A flat dictionary of metadata fields
id_: optional string
Unique ID of the node.
metadata_seperator: optional string
Separator between metadata fields when converting to string.
metadata_template: optional string
Template for how metadata is formatted, with {key} and {value} placeholders.
mimetype: optional string
MIME type of the node content.
relationships: optional map[object { node_id, class_name, hash, 2 more } or array of object { node_id, class_name, hash, 2 more } ]
A mapping of relationships to other node information.
RelatedNodeInfo = object { node_id, class_name, hash, 2 more }
node_type: optional "1" or "2" or "3" or 2 more or string
ObjectType = "1" or "2" or "3" or 2 more
UnionMember1 = array of object { node_id, class_name, hash, 2 more }
node_type: optional "1" or "2" or "3" or 2 more or string
ObjectType = "1" or "2" or "3" or 2 more
start_char_idx: optional number
Start char index of the node.
text: optional string
Text content of the node.
text_template: optional string
Template for how text is formatted, with {content} and {metadata_str} placeholders.
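The template fields describe how a node is rendered to a string. A sketch of the placeholder semantics they imply; this rendering function is an illustration, not the library's code, and the metadata values are made up:

```python
metadata = {"file_name": "report.pdf", "page": "3"}
metadata_template = "{key}: {value}"
metadata_separator = "\n"  # the schema field is spelled metadata_seperator
text_template = "{metadata_str}\n\n{content}"

# Format each metadata pair, join with the separator, then drop the
# result and the node text into the text template.
metadata_str = metadata_separator.join(
    metadata_template.format(key=k, value=v) for k, v in metadata.items()
)
rendered = text_template.format(
    metadata_str=metadata_str, content="Quarterly results body text"
)
# rendered begins with "file_name: report.pdf\npage: 3"
```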