Pipelines
Models
advanced_mode_transform_config: object { chunking_config, mode, segmentation_config }
chunking_config: optional object { mode } or object { chunk_overlap, chunk_size, mode } or object { chunk_overlap, chunk_size, mode, separator } or 2 more
azure_openai_embedding: object { additional_kwargs, api_base, api_key, 12 more }
azure_openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 12 more }
Configuration for the Azure OpenAI embedding model.
bedrock_embedding_config: object { component, type }
cohere_embedding_config: object { component, type }
component: optional object { api_key, class_name, embed_batch_size, 5 more }
data_sink_create: object { component, name, sink_type }
Schema for creating a data sink.
component: map[unknown] or CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } or CloudPostgresVectorStore { database, embed_dim, host, 10 more } or 5 more
Component that implements the data sink
cloud_pinecone_vector_store: object { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
    api_key (str): API key for authenticating with Pinecone
    index_name (str): name of the Pinecone index
    namespace (optional[str]): namespace to use in the Pinecone index
    insert_kwargs (optional[dict]): additional kwargs to pass during insertion
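As a sketch, a `data_sink_create` request body for a Pinecone-backed sink might look like the following. Field names are taken from the schema above; the `sink_type` value, API key, and index name are illustrative placeholders, not confirmed values.

```python
# Illustrative data_sink_create payload for a Pinecone-backed data sink.
# Field names follow the schema above; all values are placeholders.
pinecone_sink = {
    "name": "my-pinecone-sink",           # display name for the data sink
    "sink_type": "PINECONE",              # assumed sink-type identifier
    "component": {
        "class_name": "CloudPineconeVectorStore",
        "api_key": "<PINECONE_API_KEY>",  # API key for authenticating with Pinecone
        "index_name": "my-index",         # name of the Pinecone index
        "namespace": "prod",              # optional namespace within the index
        "insert_kwargs": None,            # optional extra kwargs passed at insertion
    },
}
```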
cloud_postgres_vector_store: object { database, embed_dim, host, 10 more }
cloud_qdrant_vector_store: object { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
    collection_name (str): name of the Qdrant collection
    url (str): url of the Qdrant instance
    api_key (str): API key for authenticating with Qdrant
    max_retries (int): maximum number of retries in case of a failure. Defaults to 3
    client_kwargs (dict): additional kwargs to pass to the Qdrant client
cloud_azure_ai_search_vector_store: object { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
cloud_mongodb_atlas_vector_search: object { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
    mongodb_uri (str): URI for connecting to MongoDB Atlas
    db_name (str): name of the MongoDB database
    collection_name (str): name of the MongoDB collection
    vector_index_name (str): name of the MongoDB Atlas vector index
    fulltext_index_name (str): name of the MongoDB Atlas full-text index
cloud_astra_db_vector_store: object { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
    token (str): The Astra DB Application Token to use.
    api_endpoint (str): The Astra DB JSON API endpoint for your database.
    collection_name (str): Collection name to use. If not existing, it will be created.
    embedding_dimension (int): Length of the embedding vectors in use.
    keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
gemini_embedding: object { api_base, api_key, class_name, 7 more }
output_dimensionality: optional number
Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.
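As a sketch, a `gemini_embedding_config` fragment with a reduced output dimension might look like the following. The `type` discriminator and model name are assumptions for illustration; `output_dimensionality` is only honored by models/text-embedding-004 and newer, per the note above.

```python
# Illustrative gemini_embedding_config fragment. output_dimensionality
# reduces the embedding size on models that support it.
gemini_config = {
    "type": "GEMINI_EMBEDDING",  # assumed type discriminator
    "component": {
        "model_name": "models/text-embedding-004",  # supports reduced dimensions
        "api_key": "<GEMINI_API_KEY>",              # placeholder credential
        "output_dimensionality": 256,               # optional reduced output dimension
    },
}
```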
gemini_embedding_config: object { component, type }
component: optional object { api_base, api_key, class_name, 7 more }
Configuration for the Gemini embedding model.
output_dimensionality: optional number
Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.
hugging_face_inference_api_embedding: object { token, class_name, cookies, 9 more }
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
hugging_face_inference_api_embedding_config: object { component, type }
component: optional object { token, class_name, cookies, 9 more }
Configuration for the HuggingFace Inference API embedding model.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
llama_parse_parameters: object { adaptive_long_table, aggressive_table_extraction, annotate_links, 116 more }
parse_mode: optional "parse_page_without_llm" or "parse_page_with_llm" or "parse_page_with_lvm" or 5 more
webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_output_format, webhook_url }
Outbound webhook endpoints to notify on job status changes
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more
Events to subscribe to (e.g. 'parse.success', 'extract.error'). If null, all events are delivered.
webhook_headers: optional map[string]
Custom HTTP headers sent with each webhook request (e.g. auth tokens)
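A single `webhook_configurations` entry might be shaped like the sketch below. The URL and token are placeholders; the event names follow the examples given in the schema, and a `webhook_events` of `None` would subscribe to all events.

```python
# Illustrative webhook_configurations entry.
webhook_config = {
    "webhook_url": "https://example.com/hooks/llamacloud",   # placeholder endpoint
    "webhook_events": ["parse.success", "extract.error"],    # None means all events
    "webhook_headers": {"Authorization": "Bearer <TOKEN>"},  # custom headers, e.g. auth
    "webhook_output_format": None,                           # use the service default
}
```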
metadata_filters: object { filters, condition }
Metadata filters for vector stores.
MetadataFilter: object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
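As a sketch, a `metadata_filters` payload with two `MetadataFilter` entries combined under an AND condition might look like this. The condition and operator literals (`"and"`, `"=="`, `">="`) are assumptions for illustration; note the year is given as a string, consistent with the note above that values were historically converted to strings.

```python
# Illustrative metadata_filters payload: two filters ANDed together.
metadata_filters = {
    "condition": "and",  # assumed condition literal
    "filters": [
        {"key": "author", "value": "alice", "operator": "=="},
        {"key": "year", "value": "2024", "operator": ">="},  # string value, per the Strict-types note
    ],
}
```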
openai_embedding: object { additional_kwargs, api_base, api_key, 10 more }
openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 10 more }
Configuration for the OpenAI embedding model.
pipeline: object { id, embedding_config, name, 15 more }
Schema for a pipeline.
embedding_config: object { component, type } or AzureOpenAIEmbeddingConfig { component, type } or CohereEmbeddingConfig { component, type } or 5 more
MANAGED_OPENAI_EMBEDDING: object { component, type }
azure_openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 12 more }
Configuration for the Azure OpenAI embedding model.
cohere_embedding_config: object { component, type }
component: optional object { api_key, class_name, embed_batch_size, 5 more }
gemini_embedding_config: object { component, type }
component: optional object { api_base, api_key, class_name, 7 more }
Configuration for the Gemini embedding model.
output_dimensionality: optional number
Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.
hugging_face_inference_api_embedding_config: object { component, type }
component: optional object { token, class_name, cookies, 9 more }
Configuration for the HuggingFace Inference API embedding model.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 10 more }
Configuration for the OpenAI embedding model.
vertex_ai_embedding_config: object { component, type }
bedrock_embedding_config: object { component, type }
data_sink: optional object { id, component, name, 4 more }
Schema for a data sink.
component: map[unknown] or CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } or CloudPostgresVectorStore { database, embed_dim, host, 10 more } or 5 more
Component that implements the data sink
cloud_pinecone_vector_store: object { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
    api_key (str): API key for authenticating with Pinecone
    index_name (str): name of the Pinecone index
    namespace (optional[str]): namespace to use in the Pinecone index
    insert_kwargs (optional[dict]): additional kwargs to pass during insertion
cloud_postgres_vector_store: object { database, embed_dim, host, 10 more }
cloud_qdrant_vector_store: object { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
    collection_name (str): name of the Qdrant collection
    url (str): url of the Qdrant instance
    api_key (str): API key for authenticating with Qdrant
    max_retries (int): maximum number of retries in case of a failure. Defaults to 3
    client_kwargs (dict): additional kwargs to pass to the Qdrant client
cloud_azure_ai_search_vector_store: object { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
cloud_mongodb_atlas_vector_search: object { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
    mongodb_uri (str): URI for connecting to MongoDB Atlas
    db_name (str): name of the MongoDB database
    collection_name (str): name of the MongoDB collection
    vector_index_name (str): name of the MongoDB Atlas vector index
    fulltext_index_name (str): name of the MongoDB Atlas full-text index
cloud_astra_db_vector_store: object { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
    token (str): The Astra DB Application Token to use.
    api_endpoint (str): The Astra DB JSON API endpoint for your database.
    collection_name (str): Collection name to use. If not existing, it will be created.
    embedding_dimension (int): Length of the embedding vectors in use.
    keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
embedding_model_config: optional object { id, embedding_config, name, 3 more }
Schema for an embedding model config.
embedding_config: AzureOpenAIEmbeddingConfig { component, type } or CohereEmbeddingConfig { component, type } or GeminiEmbeddingConfig { component, type } or 4 more
The embedding configuration for the embedding model config.
azure_openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 12 more }
Configuration for the Azure OpenAI embedding model.
cohere_embedding_config: object { component, type }
component: optional object { api_key, class_name, embed_batch_size, 5 more }
gemini_embedding_config: object { component, type }
component: optional object { api_base, api_key, class_name, 7 more }
Configuration for the Gemini embedding model.
output_dimensionality: optional number
Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.
hugging_face_inference_api_embedding_config: object { component, type }
component: optional object { token, class_name, cookies, 9 more }
Configuration for the HuggingFace Inference API embedding model.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 10 more }
Configuration for the OpenAI embedding model.
vertex_ai_embedding_config: object { component, type }
bedrock_embedding_config: object { component, type }
embedding_model_config_id: optional string
The ID of the EmbeddingModelConfig this pipeline is using.
llama_parse_parameters: optional object { adaptive_long_table, aggressive_table_extraction, annotate_links, 116 more }
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
parse_mode: optional "parse_page_without_llm" or "parse_page_with_llm" or "parse_page_with_lvm" or 5 more
webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_output_format, webhook_url }
Outbound webhook endpoints to notify on job status changes
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more
Events to subscribe to (e.g. 'parse.success', 'extract.error'). If null, all events are delivered.
webhook_headers: optional map[string]
Custom HTTP headers sent with each webhook request (e.g. auth tokens)
managed_pipeline_id: optional string
The ID of the ManagedPipeline this playground pipeline is linked to.
preset_retrieval_parameters: optional object { alpha, class_name, dense_similarity_cutoff, 11 more }
Preset retrieval parameters for the pipeline.
alpha: optional number
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
files_top_k: optional number
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
search_filters: optional object { filters, condition }
Metadata filters for vector stores.
MetadataFilter: object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
sparse_model_config: optional object { class_name, model_type }
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
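A `preset_retrieval_parameters` fragment for hybrid search might be sketched as below. The `model_type` literal is an assumption; `alpha` weights hybrid retrieval as described above, with 0 purely sparse, 1 purely dense, and 0.5 an even blend.

```python
# Illustrative preset_retrieval_parameters fragment for hybrid retrieval.
preset_retrieval = {
    "alpha": 0.5,       # 0.0 = sparse only, 1.0 = dense only
    "files_top_k": 3,   # only used by files_via_metadata / files_via_content modes
    "sparse_model_config": {"model_type": "bm25"},  # assumed literal; Splade or BM25 per the schema
}
```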
transform_config: optional AutoTransformConfig { chunk_overlap, chunk_size, mode } or AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
Configuration for the transformation.
advanced_mode_transform_config: object { chunking_config, mode, segmentation_config }
chunking_config: optional object { mode } or object { chunk_overlap, chunk_size, mode } or object { chunk_overlap, chunk_size, mode, separator } or 2 more
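The two `transform_config` variants might be shaped like the sketch below. The `mode` discriminator values and the nested chunking `mode` are assumptions for illustration; `AutoTransformConfig` exposes flat chunk settings while `AdvancedModeTransformConfig` nests a `chunking_config`.

```python
# Illustrative transform_config payloads for both variants.
auto_config = {
    "mode": "auto",        # assumed mode discriminator for AutoTransformConfig
    "chunk_size": 1024,    # size of each chunk
    "chunk_overlap": 200,  # overlap between consecutive chunks
}
advanced_config = {
    "mode": "advanced",    # assumed mode discriminator for AdvancedModeTransformConfig
    "chunking_config": {"mode": "sentence", "chunk_size": 512, "chunk_overlap": 50},
    "segmentation_config": None,  # leave segmentation at its default
}
```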
pipeline_create: object { name, data_sink, data_sink_id, 10 more }
Schema for creating a pipeline.
data_sink: optional object { component, name, sink_type }
Schema for creating a data sink.
component: map[unknown] or CloudPineconeVectorStore { api_key, index_name, class_name, 3 more } or CloudPostgresVectorStore { database, embed_dim, host, 10 more } or 5 more
Component that implements the data sink
cloud_pinecone_vector_store: object { api_key, index_name, class_name, 3 more }
Cloud Pinecone Vector Store.
This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.
Args:
    api_key (str): API key for authenticating with Pinecone
    index_name (str): name of the Pinecone index
    namespace (optional[str]): namespace to use in the Pinecone index
    insert_kwargs (optional[dict]): additional kwargs to pass during insertion
cloud_postgres_vector_store: object { database, embed_dim, host, 10 more }
cloud_qdrant_vector_store: object { api_key, collection_name, url, 4 more }
Cloud Qdrant Vector Store.
This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.
Args:
    collection_name (str): name of the Qdrant collection
    url (str): url of the Qdrant instance
    api_key (str): API key for authenticating with Qdrant
    max_retries (int): maximum number of retries in case of a failure. Defaults to 3
    client_kwargs (dict): additional kwargs to pass to the Qdrant client
cloud_azure_ai_search_vector_store: object { search_service_api_key, search_service_endpoint, class_name, 8 more }
Cloud Azure AI Search Vector Store.
cloud_mongodb_atlas_vector_search: object { collection_name, db_name, mongodb_uri, 5 more }
Cloud MongoDB Atlas Vector Store.
This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.
Args:
    mongodb_uri (str): URI for connecting to MongoDB Atlas
    db_name (str): name of the MongoDB database
    collection_name (str): name of the MongoDB collection
    vector_index_name (str): name of the MongoDB Atlas vector index
    fulltext_index_name (str): name of the MongoDB Atlas full-text index
cloud_astra_db_vector_store: object { token, api_endpoint, collection_name, 4 more }
Cloud AstraDB Vector Store.
This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.
Args:
    token (str): The Astra DB Application Token to use.
    api_endpoint (str): The Astra DB JSON API endpoint for your database.
    collection_name (str): Collection name to use. If not existing, it will be created.
    embedding_dimension (int): Length of the embedding vectors in use.
    keyspace (optional[str]): The keyspace to use. If not provided, 'default_keyspace' is used.
data_sink_id: optional string
Data sink ID. When provided instead of data_sink, the data sink will be looked up by ID.
embedding_config: optional AzureOpenAIEmbeddingConfig { component, type } or CohereEmbeddingConfig { component, type } or GeminiEmbeddingConfig { component, type } or 4 more
azure_openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 12 more }
Configuration for the Azure OpenAI embedding model.
cohere_embedding_config: object { component, type }
component: optional object { api_key, class_name, embed_batch_size, 5 more }
gemini_embedding_config: object { component, type }
component: optional object { api_base, api_key, class_name, 7 more }
Configuration for the Gemini embedding model.
output_dimensionality: optional number
Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.
hugging_face_inference_api_embedding_config: object { component, type }
component: optional object { token, class_name, cookies, 9 more }
Configuration for the HuggingFace Inference API embedding model.
headers: optional map[string]
Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.
openai_embedding_config: object { component, type }
component: optional object { additional_kwargs, api_base, api_key, 10 more }
Configuration for the OpenAI embedding model.
vertex_ai_embedding_config: object { component, type }
bedrock_embedding_config: object { component, type }
embedding_model_config_id: optional string
Embedding model config ID. When provided instead of embedding_config, the embedding model config will be looked up by ID.
llama_parse_parameters: optional object { adaptive_long_table, aggressive_table_extraction, annotate_links, 116 more }
Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.
parse_mode: optional "parse_page_without_llm" or "parse_page_with_llm" or "parse_page_with_lvm" or 5 more
webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_output_format, webhook_url }
Outbound webhook endpoints to notify on job status changes
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more
Events to subscribe to (e.g. 'parse.success', 'extract.error'). If null, all events are delivered.
webhook_headers: optional map[string]
Custom HTTP headers sent with each webhook request (e.g. auth tokens)
managed_pipeline_id: optional string
The ID of the ManagedPipeline this playground pipeline is linked to.
preset_retrieval_parameters: optional object { alpha, class_name, dense_similarity_cutoff, 11 more }
Preset retrieval parameters for the pipeline.
alpha: optional number
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
files_top_k: optional number
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
search_filters: optional object { filters, condition }
Metadata filters for vector stores.
MetadataFilter: object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
sparse_model_config: optional object { class_name, model_type }
Configuration for sparse embedding models used in hybrid search.
This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.
transform_config: optional AutoTransformConfig { chunk_overlap, chunk_size, mode } or AdvancedModeTransformConfig { chunking_config, mode, segmentation_config }
Configuration for the transformation.
advanced_mode_transform_config: object { chunking_config, mode, segmentation_config }
chunking_config: optional object { mode } or object { chunk_overlap, chunk_size, mode } or object { chunk_overlap, chunk_size, mode, separator } or 2 more
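A minimal `pipeline_create` payload that references an existing data sink and embedding model config by ID (rather than inlining their definitions) might be sketched as follows. The IDs and the transform `mode` literal are illustrative placeholders.

```python
# Illustrative pipeline_create payload using the by-ID lookup fields.
pipeline_create = {
    "name": "docs-pipeline",
    "data_sink_id": "sink_123",              # looked up by ID instead of passing data_sink
    "embedding_model_config_id": "emb_456",  # looked up instead of passing embedding_config
    "transform_config": {"mode": "auto", "chunk_size": 1024, "chunk_overlap": 200},
}
```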
preset_retrieval_params: object { alpha, class_name, dense_similarity_cutoff, 11 more }
Schema for the search params for a retrieval execution that can be preset for a pipeline.
alpha: optional number
Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.
files_top_k: optional number
Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).
search_filters: optional object { filters, condition }
Metadata filters for vector stores.
MetadataFilter: object { key, value, operator }
Comprehensive metadata filter for vector stores to support more operators.
Value uses Strict types, as int, float and str are compatible types and were all converted to string before.
See: https://docs.pydantic.dev/latest/usage/types/#strict-types
Pipelines > Sync
Cancel Pipeline Sync
Pipelines > Data Sources
List Pipeline Data Sources
Add Data Sources To Pipeline
Update Pipeline Data Source
Get Pipeline Data Source Status
Sync Pipeline Data Source
Models
pipeline_data_source: object { id, component, data_source_id, 13 more }
Schema for a data source in a pipeline.
component: map[unknown] or CloudS3DataSource { bucket, aws_access_id, aws_access_secret, 5 more } or CloudAzStorageBlobDataSource { account_url, container_name, account_key, 8 more } or 9 more
Component that implements the data source
cloud_s3_data_source: object { bucket, aws_access_id, aws_access_secret, 5 more }
cloud_google_drive_data_source: object { folder_id, class_name, service_account_key, supports_access_control }
cloud_sharepoint_data_source: object { client_id, client_secret, tenant_id, 11 more }
exclude_path_patterns: optional array of string
List of regex patterns for file paths to exclude. Files whose paths (including filename) match any pattern will be excluded. Example: ['/temp/', '/backup/', '.git/', '.tmp$', '^~']
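The exclusion behavior described above can be sketched directly with Python's `re` module, using the schema's own example patterns; a file is excluded when its full path matches any pattern.

```python
import re

# Example patterns from the exclude_path_patterns description above.
patterns = ["/temp/", "/backup/", ".git/", ".tmp$", "^~"]

def is_excluded(path: str) -> bool:
    """Return True if the path matches any exclusion regex."""
    return any(re.search(p, path) for p in patterns)

assert is_excluded("project/temp/notes.txt")  # matches "/temp/"
assert is_excluded("build/cache.tmp")         # matches ".tmp$"
assert not is_excluded("docs/readme.md")      # matches nothing
```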
cloud_confluence_data_source: object { authentication_mechanism, server_url, api_token, 10 more }
cloud_jira_data_source: object { authentication_mechanism, query, api_token, 5 more }
Cloud Jira Data Source integrating JiraReader.
cloud_jira_data_source_v2: object { authentication_mechanism, query, server_url, 10 more }
cloud_box_data_source: object { authentication_mechanism, class_name, client_id, 6 more }
Pipelines > Images
List File Page Screenshots
Get File Page Screenshot
Get File Page Figure
List File Pages Figures
Pipelines > Files
Get Pipeline File Status Counts
Get Pipeline File Status
Add Files To Pipeline Api
Update Pipeline File
Delete Pipeline File
List Pipeline Files2
Models
pipeline_file: object { id, pipeline_id, config_hash, 16 more }
A file associated with a pipeline.
Pipelines > Metadata
Import Pipeline Metadata
Delete Pipeline Files Metadata
Pipelines > Documents
Create Batch Pipeline Documents
Paginated List Pipeline Documents
Get Pipeline Document
Delete Pipeline Document
Get Pipeline Document Status
Sync Pipeline Document
List Pipeline Document Chunks
Upsert Batch Pipeline Documents
Models
text_node: object { class_name, embedding, end_char_idx, 11 more }
Provided for backward compatibility.
excluded_embed_metadata_keys: optional array of string
Metadata keys that are excluded from text for the embed model.
excluded_llm_metadata_keys: optional array of string
Metadata keys that are excluded from text for the LLM.
metadata_template: optional string
Template for how metadata is formatted, with {key} and {value} placeholders.
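The interaction between `metadata_template` and the exclusion-key lists can be sketched as follows: each metadata entry is rendered through the `{key}`/`{value}` template, and keys listed in `excluded_llm_metadata_keys` are dropped from the text the LLM sees. This is an illustrative reimplementation of the described behavior, not the library's actual rendering code.

```python
# Sketch of text_node metadata rendering with an exclusion list.
metadata_template = "{key}: {value}"          # the {key}/{value} template from the schema
metadata = {"file_name": "report.pdf", "page_label": "3"}
excluded_llm_metadata_keys = ["page_label"]   # hidden from the LLM's view of the node

llm_visible = "\n".join(
    metadata_template.format(key=k, value=v)
    for k, v in metadata.items()
    if k not in excluded_llm_metadata_keys
)
```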