Sync Pipeline Data Source

Deprecated

Pipeline pipelines().dataSources().sync(, )

POST /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/sync

Run incremental ingestion: pull upstream changes from the data source into the data sink.

Parameters

DataSourceSyncParams params

String pipelineId

Optional<String> dataSourceId

Optional<List<String>> pipelineFileIds
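As a rough illustration of the endpoint above, the sketch below builds the POST request with the JDK's `java.net.http` API instead of the SDK. The base URL `https://api.example.com`, the class name `SyncRequestSketch`, and the Bearer authentication header are assumptions made for this example; only the path template comes from this page. The optional `pipelineFileIds` parameter is left out because this page does not show how it is encoded.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch: builds the POST request for the sync endpoint without sending it. */
final class SyncRequestSketch {
    // Hypothetical base URL; substitute the host of your own deployment.
    static final String BASE_URL = "https://api.example.com";

    static HttpRequest build(String pipelineId, String dataSourceId, String apiKey) {
        // Path template taken from the endpoint line of this page.
        URI uri = URI.create(BASE_URL + "/api/v1/pipelines/" + pipelineId
                + "/data-sources/" + dataSourceId + "/sync");
        return HttpRequest.newBuilder(uri)
                // Bearer authentication is an assumption, not confirmed by this page.
                .header("Authorization", "Bearer " + apiKey)
                // No request body is sent in this sketch.
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

The returned request can be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the response body would then carry the Pipeline schema described under Returns.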

Returns

class Pipeline:

Schema for a pipeline.

String id

Unique identifier

format: uuid

EmbeddingConfig embeddingConfig

One of the following:

class AzureOpenAIEmbeddingConfig:

Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.
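The `embedBatchSize` bounds listed above (exclusive minimum 0, maximum 2048) can be checked on the client before a configuration is submitted. The following is a minimal sketch; the class name `EmbedBatchSizeCheck` and its method are hypothetical helpers, not part of the SDK.

```java
/** Sketch: checks a batch size against exclusiveMinimum 0 and maximum 2048. */
final class EmbedBatchSizeCheck {
    static final long EXCLUSIVE_MIN = 0;
    static final long MAX = 2048;

    /** Returns true when the value is strictly above 0 and at most 2048. */
    static boolean isValid(long embedBatchSize) {
        return embedBatchSize > EXCLUSIVE_MIN && embedBatchSize <= MAX;
    }
}
```

The same bounds are listed for `embedBatchSize` in the other embedding configurations on this page, so one check of this kind would cover them all.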

class BedrockEmbeddingConfig:

Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0

Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:

Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model Input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class GeminiEmbeddingConfig:

Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Optional<HuggingFaceInferenceApiEmbedding> component

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:

String

boolean

Optional<String> className

Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:

CLS("cls")

LAST("last")

MEAN("mean")

Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left at its default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

The maximum number of seconds to wait for a response from the server. Loading a new model in Inference API can take up to several minutes. Defaults to None, meaning it will loop until the server is available.

Optional<Type> type

Type of the embedding model.

class ManagedOpenAIEmbedding:

Optional<Component> component

Configuration for the Managed OpenAI embedding model.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<ModelName> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

class OpenAIEmbeddingConfig:

Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:

Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for Vertex.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:

CLASSIFICATION("classification")

CLUSTERING("clustering")

DEFAULT("default")

RETRIEVAL("retrieval")

SIMILARITY("similarity")

Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

String name

String projectId

Optional<ConfigHash> configHash

Hashes for the configuration of a pipeline.

Optional<String> embeddingConfigHash

Hash of the embedding config.

Optional<String> parsingConfigHash

Hash of the LlamaParse parameters.

Optional<String> transformConfigHash

Hash of the transform config.

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<DataSink> dataSink

Schema for a data sink.

String id

Unique identifier

format: uuid

Component component

Component that implements the data sink

One of the following:

class UnionMember0:

class CloudPineconeVectorStore:

Cloud Pinecone Vector Store.

This class is used to store the configuration for a Pinecone vector store, so that it can be created and used in LlamaCloud.

Args:
  api_key (str): API key for authenticating with Pinecone
  index_name (str): name of the Pinecone index
  namespace (optional[str]): namespace to use in the Pinecone index
  insert_kwargs (optional[dict]): additional kwargs to pass during insertion

String apiKey

The API key for authenticating with Pinecone

format: password

String indexName

Optional<String> className

Optional<InsertKwargs> insertKwargs

Optional<String> namespace

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudPostgresVectorStore:

String database

long embedDim

String host

String password

long port

String schemaName

String tableName

String user

Optional<String> className

Optional<PgVectorHnswSettings> hnswSettings

HNSW settings for PGVector.

Optional<DistanceMethod> distanceMethod

The distance method to use.

One of the following:

COSINE("cosine")

HAMMING("hamming")

IP("ip")

JACCARD("jaccard")

L1("l1")

L2("l2")

Optional<Long> efConstruction

The number of edges to use during the construction phase.

minimum: 1

Optional<Long> efSearch

The number of edges to use during the search phase.

minimum: 1

Optional<Long> m

The number of bi-directional links created for each new element.

minimum: 1

Optional<VectorType> vectorType

The type of vector to use.

One of the following:

BIT("bit")

HALF_VEC("half_vec")

SPARSE_VEC("sparse_vec")

VECTOR("vector")

Optional<Boolean> hybridSearch

Optional<Boolean> performSetup

Optional<Boolean> supportsNestedMetadataFilters
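To make the HNSW settings above more concrete, the sketch below renders `distanceMethod`, `m`, and `efConstruction` as a pgvector `CREATE INDEX` statement. Whether LlamaCloud issues a statement of this form is an assumption; the class name `HnswIndexSketch`, the table and column arguments, and the restriction to the plain `vector` type are choices made for this example. The operator class names are pgvector's own.

```java
import java.util.Map;

/** Sketch: renders HNSW settings as a pgvector CREATE INDEX statement. */
final class HnswIndexSketch {
    // pgvector operator classes for the plain `vector` type, keyed by distance method.
    static final Map<String, String> VECTOR_OPS = Map.of(
            "cosine", "vector_cosine_ops",
            "ip", "vector_ip_ops",
            "l1", "vector_l1_ops",
            "l2", "vector_l2_ops");

    static String createIndexSql(String table, String column,
                                 String distanceMethod, long m, long efConstruction) {
        String ops = VECTOR_OPS.get(distanceMethod);
        if (ops == null) {
            // hamming and jaccard apply to bit vectors, which this sketch does not cover.
            throw new IllegalArgumentException("unsupported for vector type: " + distanceMethod);
        }
        if (m < 1 || efConstruction < 1) {
            // Mirrors the "minimum: 1" constraints listed on this page.
            throw new IllegalArgumentException("m and efConstruction must be >= 1");
        }
        // Identifiers are concatenated unescaped; pass trusted names only.
        return "CREATE INDEX ON " + table + " USING hnsw (" + column + " " + ops
                + ") WITH (m = " + m + ", ef_construction = " + efConstruction + ")";
    }
}
```

In pgvector, the search-time counterpart of `efSearch` is set per session with `SET hnsw.ef_search = ...` rather than in the index definition.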

class CloudQdrantVectorStore:

Cloud Qdrant Vector Store.

This class is used to store the configuration for a Qdrant vector store, so that it can be created and used in LlamaCloud.

Args:
  collection_name (str): name of the Qdrant collection
  url (str): url of the Qdrant instance
  api_key (str): API key for authenticating with Qdrant
  max_retries (int): maximum number of retries in case of a failure. Defaults to 3
  client_kwargs (dict): additional kwargs to pass to the Qdrant client

String apiKey

String collectionName

String url

Optional<String> className

Optional<ClientKwargs> clientKwargs

Optional<Long> maxRetries

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

class CloudAzureAiSearchVectorStore:

Cloud Azure AI Search Vector Store.

String searchServiceApiKey

String searchServiceEndpoint

Optional<String> className

Optional<String> clientId

Optional<String> clientSecret

Optional<Long> embeddingDimension

Optional<FilterableMetadataFieldKeys> filterableMetadataFieldKeys

Optional<String> indexName

Optional<String> searchServiceApiVersion

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

Optional<String> tenantId

class CloudMongoDBAtlasVectorSearch:

Cloud MongoDB Atlas Vector Store.

This class is used to store the configuration for a MongoDB Atlas vector store, so that it can be created and used in LlamaCloud.

Args:
  mongodb_uri (str): URI for connecting to MongoDB Atlas
  db_name (str): name of the MongoDB database
  collection_name (str): name of the MongoDB collection
  vector_index_name (str): name of the MongoDB Atlas vector index
  fulltext_index_name (str): name of the MongoDB Atlas full-text index

String collectionName

String dbName

String mongoDBUri

Optional<String> className

Optional<Long> embeddingDimension

Optional<String> fulltextIndexName

Optional<Boolean> supportsNestedMetadataFilters

Optional<String> vectorIndexName

class CloudMilvusVectorStore:

Cloud Milvus Vector Store.

String uri

Optional<String> token

Optional<String> className

Optional<String> collectionName

Optional<Long> embeddingDimension

Optional<Boolean> supportsNestedMetadataFilters

class CloudAstraDbVectorStore:

Cloud AstraDB Vector Store.

This class is used to store the configuration for an AstraDB vector store, so that it can be created and used in LlamaCloud.

Args:
  token (str): The Astra DB Application Token to use.
  api_endpoint (str): The Astra DB JSON API endpoint for your database.
  collection_name (str): Collection name to use. If not existing, it will be created.
  embedding_dimension (int): Length of the embedding vectors in use.
  keyspace (optional[str]): The keyspace to use. If not provided, ‘default_keyspace’ is used.

String token

The Astra DB Application Token to use

format: password

String apiEndpoint

The Astra DB JSON API endpoint for your database

String collectionName

Collection name to use. If not existing, it will be created

long embeddingDimension

Length of the embedding vectors in use

Optional<String> className

Optional<String> keyspace

The keyspace to use. If not provided, ‘default_keyspace’ is used.

Optional<SupportsNestedMetadataFilters> supportsNestedMetadataFilters

String name

The name of the data sink.

String projectId

SinkType sinkType

One of the following:

ASTRA_DB("ASTRA_DB")

AZUREAI_SEARCH("AZUREAI_SEARCH")

MILVUS("MILVUS")

MONGODB_ATLAS("MONGODB_ATLAS")

PINECONE("PINECONE")

POSTGRES("POSTGRES")

QDRANT("QDRANT")

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

Optional<EmbeddingModelConfig> embeddingModelConfig

Schema for an embedding model config.

String id

Unique identifier

format: uuid

EmbeddingConfig embeddingConfig

The embedding configuration for the embedding model config.

One of the following:

class AzureOpenAIEmbeddingConfig:

Optional<AzureOpenAIEmbedding> component

Configuration for the Azure OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for Azure deployment.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for Azure OpenAI API.

Optional<String> azureDeployment

The Azure deployment to use.

Optional<String> azureEndpoint

The Azure endpoint to use.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class BedrockEmbeddingConfig:

Optional<BedrockEmbedding> component

Configuration for the Bedrock embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the bedrock client.

Optional<String> awsAccessKeyId

AWS Access Key ID to use

Optional<String> awsSecretAccessKey

AWS Secret Access Key to use

Optional<String> awsSessionToken

AWS Session Token to use

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

The maximum number of API retries.

exclusiveMinimum: 0

Optional<String> modelName

The modelId of the Bedrock model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> profileName

The name of the AWS profile to use. If not given, the default profile is used.

Optional<String> regionName

AWS region name to use. If not passed, the region configured in the AWS CLI is used.

Optional<Double> timeout

The timeout for the Bedrock API request in seconds. It will be used for both connect and read timeouts.

Optional<Type> type

Type of the embedding model.

class CohereEmbeddingConfig:

Optional<CohereEmbedding> component

Configuration for the Cohere embedding model.

Optional<String> apiKey

The Cohere API key.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> embeddingType

Embedding type. If not provided, the float embedding_type is used when needed.

Optional<String> inputType

Model Input type. If not provided, search_document and search_query are used when needed.

Optional<String> modelName

The modelId of the Cohere model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<String> truncate

Truncation type: START, END, or NONE.

Optional<Type> type

Type of the embedding model.

class GeminiEmbeddingConfig:

Optional<GeminiEmbedding> component

Configuration for the Gemini embedding model.

Optional<String> apiBase

API base to access the model. Defaults to None.

Optional<String> apiKey

API key to access the model. Defaults to None.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<String> modelName

The modelId of the Gemini model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Long> outputDimensionality

Optional reduced dimension for output embeddings. Supported by models/text-embedding-004 and newer (e.g. gemini-embedding-001). Not supported by models/embedding-001.

Optional<String> taskType

The task for the embedding model.

Optional<String> title

Title is only applicable for retrieval_document tasks, and is used to represent a document title. For other tasks, title is invalid.

Optional<String> transport

Transport to access the model. Defaults to None.

Optional<Type> type

Type of the embedding model.

class HuggingFaceInferenceApiEmbeddingConfig:

Optional<HuggingFaceInferenceApiEmbedding> component

Configuration for the HuggingFace Inference API embedding model.

Optional<Token> token

Hugging Face token. Will default to the locally saved token. Pass token=False if you don’t want to send your token to the server.

One of the following:

String

boolean

Optional<String> className

Optional<Cookies> cookies

Additional cookies to send to the server.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Headers> headers

Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary will override the default values.

Optional<String> modelName

Hugging Face model name. If None, the task will be used.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Pooling> pooling

Enum of possible pooling choices with pooling behaviors.

One of the following:

CLS("cls")

LAST("last")

MEAN("mean")

Optional<String> queryInstruction

Instruction to prepend during query embedding.

Optional<String> task

Optional task to pick Hugging Face’s recommended model, used when model_name is left at its default of None.

Optional<String> textInstruction

Instruction to prepend during text embedding.

Optional<Double> timeout

Optional<Type> type

Type of the embedding model.

class OpenAIEmbeddingConfig:

Optional<OpenAIEmbedding> component

Configuration for the OpenAI embedding model.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for the OpenAI API.

Optional<String> apiBase

The base URL for OpenAI API.

Optional<String> apiKey

The OpenAI API key.

Optional<String> apiVersion

The version for OpenAI API.

Optional<String> className

Optional<DefaultHeaders> defaultHeaders

The default headers for API requests.

Optional<Long> dimensions

The number of dimensions on the output embedding vectors. Works only with v3 embedding models.

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<Long> maxRetries

Maximum number of retries.

minimum: 0

Optional<String> modelName

The name of the OpenAI embedding model.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Boolean> reuseClient

Reuse the OpenAI client between requests. When doing anything with large volumes of async API calls, setting this to false can improve stability.

Optional<Double> timeout

Timeout for each request.

minimum: 0

Optional<Type> type

Type of the embedding model.

class VertexAiEmbeddingConfig:

Optional<VertexTextEmbedding> component

Configuration for the VertexAI embedding model.

Optional<String> clientEmail

The client email for the VertexAI credentials.

String location

The default location to use when making API calls.

Optional<String> privateKey

The private key for the VertexAI credentials.

Optional<String> privateKeyId

The private key ID for the VertexAI credentials.

String project

The default GCP project to use when making Vertex API calls.

Optional<String> tokenUri

The token URI for the VertexAI credentials.

Optional<AdditionalKwargs> additionalKwargs

Additional kwargs for Vertex.

Optional<String> className

Optional<Long> embedBatchSize

The batch size for embedding calls.

exclusiveMinimum: 0

maximum: 2048

Optional<EmbedMode> embedMode

The embedding mode to use.

One of the following:

CLASSIFICATION("classification")

CLUSTERING("clustering")

DEFAULT("default")

RETRIEVAL("retrieval")

SIMILARITY("similarity")

Optional<String> modelName

The modelId of the VertexAI model to use.

Optional<Long> numWorkers

The number of workers to use for async embedding calls.

Optional<Type> type

Type of the embedding model.

String name

The name of the embedding model config.

String projectId

Optional<LocalDateTime> createdAt

Creation datetime

format: date-time

Optional<LocalDateTime> updatedAt

Update datetime

format: date-time

Optional<String> embeddingModelConfigId

The ID of the EmbeddingModelConfig this pipeline is using.

format: uuid

Optional<LlamaParseParameters> llamaParseParameters

Settings that can be configured for how to use LlamaParse to parse files within a LlamaCloud pipeline.

Optional<Boolean> adaptiveLongTable

Optional<Boolean> aggressiveTableExtraction

Optional<Boolean> annotateLinks

Optional<Boolean> autoMode

Optional<String> autoModeConfigurationJson

Optional<Boolean> autoModeTriggerOnImageInPage

Optional<String> autoModeTriggerOnRegexpInPage

Optional<Boolean> autoModeTriggerOnTableInPage

Optional<String> autoModeTriggerOnTextInPage

Optional<String> azureOpenAIApiVersion

Optional<String> azureOpenAIDeploymentName

Optional<String> azureOpenAIEndpoint

Optional<String> azureOpenAIKey

Optional<Double> bboxBottom

Optional<Double> bboxLeft

Optional<Double> bboxRight

Optional<Double> bboxTop

Optional<String> boundingBox

Optional<Boolean> compactMarkdownTable

Optional<String> complementalFormattingInstruction

Optional<String> confidenceScoreEffort

Optional<String> contentGuidelineInstruction

Optional<Boolean> continuousMode

Optional<Boolean> disableImageExtraction

Optional<Boolean> disableOcr

Optional<Boolean> disableReconstruction

Optional<Boolean> doNotCache

Optional<Boolean> doNotUnrollColumns

Optional<Boolean> enableCostOptimizer

Optional<Boolean> extractCharts

Optional<Boolean> extractLayout

Optional<Boolean> extractPrintedPageNumber

Optional<Boolean> fastMode

Optional<String> formattingInstruction

Optional<String> gpt4oApiKey

Optional<Boolean> gpt4oMode

Optional<Boolean> guessXlsxSheetName

Optional<Boolean> hideFooters

Optional<Boolean> hideHeaders

Optional<Boolean> highResOcr

Optional<Boolean> htmlMakeAllElementsVisible

Optional<Boolean> htmlRemoveFixedElements

Optional<Boolean> htmlRemoveNavigationElements

Optional<String> httpProxy

Optional<Boolean> ignoreDocumentElementsForLayoutDetection

Optional<List<ImagesToSave>> imagesToSave

One of the following:

EMBEDDED("embedded")

LAYOUT("layout")

SCREENSHOT("screenshot")

Optional<Boolean> inlineImagesInMarkdown

Optional<String> inputS3Path

Optional<String> inputS3Region

Optional<String> inputUrl

Optional<Boolean> internalIsScreenshotJob

Optional<Boolean> invalidateCache

Optional<Boolean> isFormattingInstruction

Optional<Double> jobTimeoutExtraTimePerPageInSeconds

Optional<Double> jobTimeoutInSeconds

Optional<Boolean> keepPageSeparatorWhenMergingTables

Optional<List<ParsingLanguages>> languages

One of the following:

ABQ("abq")

ADY("ady")

AF("af")

ANG("ang")

AR("ar")

AS("as")

AVA("ava")

AZ("az")

BE("be")

BG("bg")

BGC("bgc")

BH("bh")

BHO("bho")

BN("bn")

BS("bs")

CH_SIM("ch_sim")

CH_TRA("ch_tra")

CHE("che")

CS("cs")

CY("cy")

DA("da")

DAR("dar")

DE("de")

EN("en")

ES("es")

ET("et")

FA("fa")

FR("fr")

GA("ga")

GOM("gom")

HI("hi")

HR("hr")

HU("hu")

ID("id")

INH("inh")

IS("is")

IT("it")

JA("ja")

KBD("kbd")

KN("kn")

KO("ko")

KU("ku")

LA("la")

LBE("lbe")

LEZ("lez")

LT("lt")

LV("lv")

MAH("mah")

MAI("mai")

MI("mi")

MN("mn")

MNI("mni")

MR("mr")

MS("ms")

MT("mt")

NE("ne")

NEW("new")

NL("nl")

NO("no")

OC("oc")

PI("pi")

PL("pl")

PT("pt")

RO("ro")

RS_CYRILLIC("rs_cyrillic")

RS_LATIN("rs_latin")

RU("ru")

SA("sa")

SCK("sck")

SK("sk")

SL("sl")

SQ("sq")

SV("sv")

SW("sw")

TA("ta")

TAB("tab")

TE("te")

TH("th")

TJK("tjk")

TL("tl")

TR("tr")

UG("ug")

UK("uk")

UR("ur")

UZ("uz")

VI("vi")

Optional<Boolean> layoutAware

Optional<Boolean> lineLevelBoundingBox

Optional<String> markdownTableMultilineHeaderSeparator

Optional<Long> maxPages

Optional<Long> maxPagesEnforced

Optional<Boolean> mergeTablesAcrossPagesInMarkdown

Optional<String> model

Optional<Boolean> outlinedTableExtraction

Optional<Boolean> outputPdfOfDocument

Optional<String> outputS3PathPrefix

Optional<String> outputS3Region

Optional<Boolean> outputTablesAsHtml

Optional<Double> pageErrorTolerance

Optional<String> pageFooterPrefix

Optional<String> pageFooterSuffix

Optional<String> pageHeaderPrefix

Optional<String> pageHeaderSuffix

Optional<String> pagePrefix

Optional<String> pageSeparator

Optional<String> pageSuffix

Optional<ParsingMode> parseMode

Enum for representing the mode of parsing to be used.

One of the following:

PARSE_DOCUMENT_WITH_AGENT("parse_document_with_agent")

PARSE_DOCUMENT_WITH_LLM("parse_document_with_llm")

PARSE_DOCUMENT_WITH_LVM("parse_document_with_lvm")

PARSE_PAGE_WITH_AGENT("parse_page_with_agent")

PARSE_PAGE_WITH_LAYOUT_AGENT("parse_page_with_layout_agent")

PARSE_PAGE_WITH_LLM("parse_page_with_llm")

PARSE_PAGE_WITH_LVM("parse_page_with_lvm")

PARSE_PAGE_WITHOUT_LLM("parse_page_without_llm")

Optional<String> parsingInstruction

Optional<Boolean> preciseBoundingBox

Optional<Boolean> premiumMode

Optional<Boolean> presentationOutOfBoundsContent

Optional<Boolean> presentationSkipEmbeddedData

Optional<Boolean> preserveLayoutAlignmentAcrossPages

Optional<Boolean> preserveVerySmallText

Optional<String> preset

Optional<Priority> priority

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

One of the following:

CRITICAL("critical")

HIGH("high")

LOW("low")

MEDIUM("medium")

Optional<String> projectId

Optional<Boolean> removeHiddenText

Optional<FailPageMode> replaceFailedPageMode

Enum for representing the different available page error handling modes.

One of the following:

BLANK_PAGE("blank_page")

ERROR_MESSAGE("error_message")

RAW_TEXT("raw_text")

Optional<String> replaceFailedPageWithErrorMessagePrefix

Optional<String> replaceFailedPageWithErrorMessageSuffix

Optional<Boolean> saveImages

Optional<Boolean> skipDiagonalText

Optional<Boolean> specializedChartParsingAgentic

Optional<Boolean> specializedChartParsingEfficient

Optional<Boolean> specializedChartParsingPlus

Optional<Boolean> specializedImageParsing

Optional<Boolean> spreadsheetExtractSubTables

Optional<Boolean> spreadsheetForceFormulaComputation

Optional<Boolean> spreadsheetIncludeHiddenSheets

Optional<Boolean> strictModeBuggyFont

Optional<Boolean> strictModeImageExtraction

Optional<Boolean> strictModeImageOcr

Optional<Boolean> strictModeReconstruction

Optional<Boolean> structuredOutput

Optional<String> structuredOutputJsonSchema

Optional<String> structuredOutputJsonSchemaName

Optional<String> systemPrompt

Optional<String> systemPromptAppend

Optional<Boolean> takeScreenshot

Optional<String> targetPages

Optional<String> tier

Optional<Boolean> useVendorMultimodalModel

Optional<String> userPrompt

Optional<String> vendorMultimodalApiKey

Optional<String> vendorMultimodalModelName

Optional<String> version

Optional<List<WebhookConfiguration>> webhookConfigurations

Outbound webhook endpoints to notify on job status changes

Optional<List<WebhookEvent>> webhookEvents

Events to subscribe to (e.g. ‘parse.success’, ‘extract.error’). If null, all events are delivered.

One of the following:

CLASSIFY_CANCELLED("classify.cancelled")

CLASSIFY_ERROR("classify.error")

CLASSIFY_PARTIAL_SUCCESS("classify.partial_success")

CLASSIFY_PENDING("classify.pending")

CLASSIFY_RUNNING("classify.running")

CLASSIFY_SUCCESS("classify.success")

EXTRACT_CANCELLED("extract.cancelled")

EXTRACT_ERROR("extract.error")

EXTRACT_PARTIAL_SUCCESS("extract.partial_success")

EXTRACT_PENDING("extract.pending")

EXTRACT_SUCCESS("extract.success")

PARSE_CANCELLED("parse.cancelled")

PARSE_ERROR("parse.error")

PARSE_PARTIAL_SUCCESS("parse.partial_success")

PARSE_PENDING("parse.pending")

PARSE_RUNNING("parse.running")

PARSE_SUCCESS("parse.success")

SHEETS_CANCELLED("sheets.cancelled")

SHEETS_ERROR("sheets.error")

SHEETS_PARTIAL_SUCCESS("sheets.partial_success")

SHEETS_PENDING("sheets.pending")

SHEETS_SUCCESS("sheets.success")

SPLIT_CANCELLED("split.cancelled")

SPLIT_ERROR("split.error")

SPLIT_PENDING("split.pending")

SPLIT_PROCESSING("split.processing")

SPLIT_SUCCESS("split.success")

UNMAPPED_EVENT("unmapped_event")

Optional<WebhookHeaders> webhookHeaders

Custom HTTP headers sent with each webhook request (e.g. auth tokens)

Optional<String> webhookOutputFormat

Response format sent to the webhook: ‘string’ (default) or ‘json’

Optional<String> webhookSigningSecret

Shared signing secret used to sign webhook deliveries. When set, each request includes an HMAC-SHA256 signature of the request body in the ‘LC-Signature’ header (value ‘sha256=’). Recompute the HMAC over the raw request body with this secret to verify the delivery is authentic.
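The verification step described above can be sketched in plain Java. This is a minimal sketch, not part of the SDK: the class name WebhookSignatureVerifier and its methods are hypothetical, and it assumes the digest after the ‘sha256=’ prefix is lowercase hex, which the description does not state. Check a real delivery before relying on that encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper; not part of the LlamaCloud SDK.
final class WebhookSignatureVerifier {
    private static final String PREFIX = "sha256=";

    private WebhookSignatureVerifier() {}

    /** Computes "sha256=" + hex(HMAC-SHA256(secret, rawBody)). Hex encoding is an assumption. */
    static String sign(byte[] rawBody, String signingSecret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(signingSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(rawBody);
            StringBuilder hex = new StringBuilder(PREFIX);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable or the key is invalid", e);
        }
    }

    /** Returns true when the received header value matches the HMAC of the raw body. */
    static boolean isValid(byte[] rawBody, String headerValue, String signingSecret) {
        if (headerValue == null || !headerValue.startsWith(PREFIX)) {
            return false;
        }
        byte[] expected = sign(rawBody, signingSecret).getBytes(StandardCharsets.UTF_8);
        byte[] actual = headerValue.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual avoids leaking the mismatch position through timing.
        return MessageDigest.isEqual(expected, actual);
    }
}
```

Pass the request body bytes exactly as received, before any JSON parsing or re-serialization, since the signature covers the raw body.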

Optional<String> webhookUrl

URL to receive webhook POST notifications

Optional<String> webhookUrl

Optional<String> managedPipelineId

The ID of the ManagedPipeline this playground pipeline is linked to.

formatuuid

Optional<PipelineMetadataConfig> metadataConfig

Metadata configuration for the pipeline.

Optional<List<String>> excludedEmbedMetadataKeys

List of metadata keys to exclude from embeddings

Optional<List<String>> excludedLlmMetadataKeys

List of metadata keys to exclude from the LLM during retrieval

Optional<PipelineType> pipelineType

Type of pipeline. Either PLAYGROUND or MANAGED.

One of the following:

MANAGED("MANAGED")

PLAYGROUND("PLAYGROUND")

Optional<PresetRetrievalParams> presetRetrievalParameters

Preset retrieval parameters for the pipeline.

Optional<Double> alpha

Alpha value for hybrid retrieval to determine the weights between dense and sparse retrieval. 0 is sparse retrieval and 1 is dense retrieval.

maximum1

minimum0
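One common way to read the alpha weighting is as a linear blend of the two scores. The sketch below assumes that interpretation; the service's actual fusion formula is not documented here, and the HybridScore class is hypothetical. It only illustrates why 0 yields pure sparse retrieval and 1 yields pure dense retrieval.

```java
// Hypothetical illustration; the real scoring used by the service may differ.
final class HybridScore {
    private HybridScore() {}

    /** Blends a dense and a sparse score: alpha = 1 keeps only dense, alpha = 0 keeps only sparse. */
    static double blend(double alpha, double denseScore, double sparseScore) {
        // Mirrors the documented bounds (minimum 0, maximum 1); also rejects NaN.
        if (!(alpha >= 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        return alpha * denseScore + (1.0 - alpha) * sparseScore;
    }
}
```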

Optional<String> className

Optional<Double> denseSimilarityCutoff

Minimum similarity score, with respect to the query, required for retrieval.

maximum1

minimum0

Optional<Long> denseSimilarityTopK

Number of nodes for dense retrieval.

maximum100

minimum1

Optional<Boolean> enableReranking

Enable reranking for retrieval

Optional<Long> filesTopK

Number of files to retrieve (only for retrieval mode files_via_metadata and files_via_content).

maximum5

minimum1

Optional<Long> rerankTopN

Number of reranked nodes to return.

maximum100

minimum1

Optional<RetrievalMode> retrievalMode

The retrieval mode for the query.

One of the following:

AUTO_ROUTED("auto_routed")

CHUNKS("chunks")

FILES_VIA_CONTENT("files_via_content")

FILES_VIA_METADATA("files_via_metadata")

Deprecated

Optional<Boolean> retrieveImageNodes

Whether to retrieve image nodes.

Optional<Boolean> retrievePageFigureNodes

Whether to retrieve page figure nodes.

Optional<Boolean> retrievePageScreenshotNodes

Whether to retrieve page screenshot nodes.

Metadata filters for vector stores.

One of the following:

Comprehensive metadata filter for vector stores to support more operators.

The value uses strict types, because int, float and str are otherwise compatible types and were previously all converted to string.

See: https://docs.pydantic.dev/latest/usage/types/#strict-types

One of the following:

Vector store filter operator.

One of the following:

MetadataFilters

Vector store filter conditions to combine different filters.

One of the following:

Optional<SearchFiltersInferenceSchema> searchFiltersInferenceSchema

JSON Schema that will be used to infer search_filters. Omit or leave as null to skip inference.

One of the following:

class UnionMember0:

List<JsonValue>

String

double

boolean

Optional<Long> sparseSimilarityTopK

Number of nodes for sparse retrieval.

maximum100

minimum1

Optional<SparseModelConfig> sparseModelConfig

Configuration for sparse embedding models used in hybrid search.

This allows users to choose between Splade and BM25 models for sparse retrieval in managed data sinks.

Optional<String> className

Optional<ModelType> modelType

The sparse model type to use. ‘bm25’ uses Qdrant’s FastEmbed BM25 model (default for new pipelines), ‘splade’ uses HuggingFace Splade model, ‘auto’ selects based on deployment mode (BYOC uses term frequency, Cloud uses Splade).

One of the following:

AUTO("auto")

BM25("bm25")

SPLADE("splade")

Optional<Status> status

Status of the pipeline.

One of the following:

CREATED("CREATED")

DELETING("DELETING")

Optional<TransformConfig> transformConfig

Configuration for the transformation.

One of the following:

class AutoTransformConfig:

Optional<Long> chunkOverlap

Chunk overlap for the transformation.

Optional<Long> chunkSize

Chunk size for the transformation.

exclusiveMinimum0

Optional<Mode> mode

class AdvancedModeTransformConfig:

Optional<ChunkingConfig> chunkingConfig

Configuration for the chunking.

One of the following:

class NoneChunkingConfig:

Optional<Mode> mode

class CharacterChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

class TokenChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> separator

class SentenceChunkingConfig:

Optional<Long> chunkOverlap

Optional<Long> chunkSize

Optional<Mode> mode

Optional<String> paragraphSeparator

Optional<String> separator

class SemanticChunkingConfig:

Optional<Long> breakpointPercentileThreshold

Optional<Long> bufferSize

Optional<Mode> mode
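To make the chunkSize and chunkOverlap fields of the chunking configs above concrete, here is a minimal character-based sketch. It is an illustration under assumptions, not the service's chunker: the CharacterChunker class is hypothetical, sizes are counted in Java chars, and it assumes chunkOverlap must be smaller than chunkSize (the reference only states that chunkSize is greater than 0).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of fixed-size chunking with overlap.
final class CharacterChunker {
    private CharacterChunker() {}

    /** Splits text into windows of chunkSize chars; consecutive windows share chunkOverlap chars. */
    static List<String> chunk(String text, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be greater than 0");
        }
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            // Assumption: an overlap this large would never advance the window.
            throw new IllegalArgumentException("chunkOverlap must be in [0, chunkSize)");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - chunkOverlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

With chunkSize 4 and chunkOverlap 1, each chunk starts 3 characters after the previous one and repeats its last character.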

Optional<SegmentationConfig> segmentationConfig

Configuration for the segmentation.

One of the following:

class NoneSegmentationConfig:

Optional<Mode> mode

class PageSegmentationConfig:

Optional<Mode> mode

Optional<String> pageSeparator

class ElementSegmentationConfig:

Optional<Mode> mode

Optional<LocalDateTime> updatedAt

Update datetime

formatdate-time

Sync Pipeline Data Source

package ai.llamaindex.llamacloud.example;

import ai.llamaindex.llamacloud.client.LlamaCloudClient;
import ai.llamaindex.llamacloud.client.okhttp.LlamaCloudOkHttpClient;
import ai.llamaindex.llamacloud.models.pipelines.Pipeline;
import ai.llamaindex.llamacloud.models.pipelines.datasources.DataSourceSyncParams;

public final class Main {
    private Main() {}

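    public static void main(String[] args) {
        // Build a client configured from the environment.
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // Identify the pipeline and the data source to sync (placeholder UUIDs).
        DataSourceSyncParams params = DataSourceSyncParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .dataSourceId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .build();

        // Trigger the sync; the call returns the Pipeline schema.
        Pipeline pipeline = client.pipelines().dataSources().sync(params);
    }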
}

Returns Examples

{
  "id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
  "embedding_config": {
    "component": {
      "additional_kwargs": {
        "foo": "bar"
      },
      "api_base": "api_base",
      "api_key": "api_key",
      "api_version": "api_version",
      "azure_deployment": "azure_deployment",
      "azure_endpoint": "azure_endpoint",
      "class_name": "class_name",
      "default_headers": {
        "foo": "string"
      },
      "dimensions": 0,
      "embed_batch_size": 1,
      "max_retries": 0,
      "model_name": "model_name",
      "num_workers": 0,
      "reuse_client": true,
      "timeout": 0
    },
    "type": "AZURE_EMBEDDING"
  },
  "name": "name",
  "project_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
  "config_hash": {
    "embedding_config_hash": "embedding_config_hash",
    "parsing_config_hash": "parsing_config_hash",
    "transform_config_hash": "transform_config_hash"
  },
  "created_at": "2019-12-27T18:11:19.117Z",
  "data_sink": {
    "id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "component": {
      "foo": "bar"
    },
    "name": "name",
    "project_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "sink_type": "ASTRA_DB",
    "created_at": "2019-12-27T18:11:19.117Z",
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "embedding_model_config": {
    "id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "embedding_config": {
      "component": {
        "additional_kwargs": {
          "foo": "bar"
        },
        "api_base": "api_base",
        "api_key": "api_key",
        "api_version": "api_version",
        "azure_deployment": "azure_deployment",
        "azure_endpoint": "azure_endpoint",
        "class_name": "class_name",
        "default_headers": {
          "foo": "string"
        },
        "dimensions": 0,
        "embed_batch_size": 1,
        "max_retries": 0,
        "model_name": "model_name",
        "num_workers": 0,
        "reuse_client": true,
        "timeout": 0
      },
      "type": "AZURE_EMBEDDING"
    },
    "name": "name",
    "project_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
    "created_at": "2019-12-27T18:11:19.117Z",
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "embedding_model_config_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
  "llama_parse_parameters": {
    "adaptive_long_table": true,
    "aggressive_table_extraction": true,
    "annotate_links": true,
    "auto_mode": true,
    "auto_mode_configuration_json": "auto_mode_configuration_json",
    "auto_mode_trigger_on_image_in_page": true,
    "auto_mode_trigger_on_regexp_in_page": "auto_mode_trigger_on_regexp_in_page",
    "auto_mode_trigger_on_table_in_page": true,
    "auto_mode_trigger_on_text_in_page": "auto_mode_trigger_on_text_in_page",
    "azure_openai_api_version": "azure_openai_api_version",
    "azure_openai_deployment_name": "azure_openai_deployment_name",
    "azure_openai_endpoint": "azure_openai_endpoint",
    "azure_openai_key": "azure_openai_key",
    "bbox_bottom": 0,
    "bbox_left": 0,
    "bbox_right": 0,
    "bbox_top": 0,
    "bounding_box": "bounding_box",
    "compact_markdown_table": true,
    "complemental_formatting_instruction": "complemental_formatting_instruction",
    "confidence_score_effort": "confidence_score_effort",
    "content_guideline_instruction": "content_guideline_instruction",
    "continuous_mode": true,
    "disable_image_extraction": true,
    "disable_ocr": true,
    "disable_reconstruction": true,
    "do_not_cache": true,
    "do_not_unroll_columns": true,
    "enable_cost_optimizer": true,
    "extract_charts": true,
    "extract_layout": true,
    "extract_printed_page_number": true,
    "fast_mode": true,
    "formatting_instruction": "formatting_instruction",
    "gpt4o_api_key": "gpt4o_api_key",
    "gpt4o_mode": true,
    "guess_xlsx_sheet_name": true,
    "hide_footers": true,
    "hide_headers": true,
    "high_res_ocr": true,
    "html_make_all_elements_visible": true,
    "html_remove_fixed_elements": true,
    "html_remove_navigation_elements": true,
    "http_proxy": "http_proxy",
    "ignore_document_elements_for_layout_detection": true,
    "images_to_save": [
      "embedded"
    ],
    "inline_images_in_markdown": true,
    "input_s3_path": "input_s3_path",
    "input_s3_region": "input_s3_region",
    "input_url": "input_url",
    "internal_is_screenshot_job": true,
    "invalidate_cache": true,
    "is_formatting_instruction": true,
    "job_timeout_extra_time_per_page_in_seconds": 0,
    "job_timeout_in_seconds": 0,
    "keep_page_separator_when_merging_tables": true,
    "languages": [
      "abq"
    ],
    "layout_aware": true,
    "line_level_bounding_box": true,
    "markdown_table_multiline_header_separator": "markdown_table_multiline_header_separator",
    "max_pages": 0,
    "max_pages_enforced": 0,
    "merge_tables_across_pages_in_markdown": true,
    "model": "model",
    "outlined_table_extraction": true,
    "output_pdf_of_document": true,
    "output_s3_path_prefix": "output_s3_path_prefix",
    "output_s3_region": "output_s3_region",
    "output_tables_as_HTML": true,
    "page_error_tolerance": 0,
    "page_footer_prefix": "page_footer_prefix",
    "page_footer_suffix": "page_footer_suffix",
    "page_header_prefix": "page_header_prefix",
    "page_header_suffix": "page_header_suffix",
    "page_prefix": "page_prefix",
    "page_separator": "page_separator",
    "page_suffix": "page_suffix",
    "parse_mode": "parse_document_with_agent",
    "parsing_instruction": "parsing_instruction",
    "precise_bounding_box": true,
    "premium_mode": true,
    "presentation_out_of_bounds_content": true,
    "presentation_skip_embedded_data": true,
    "preserve_layout_alignment_across_pages": true,
    "preserve_very_small_text": true,
    "preset": "preset",
    "priority": "critical",
    "project_id": "project_id",
    "remove_hidden_text": true,
    "replace_failed_page_mode": "blank_page",
    "replace_failed_page_with_error_message_prefix": "replace_failed_page_with_error_message_prefix",
    "replace_failed_page_with_error_message_suffix": "replace_failed_page_with_error_message_suffix",
    "save_images": true,
    "skip_diagonal_text": true,
    "specialized_chart_parsing_agentic": true,
    "specialized_chart_parsing_efficient": true,
    "specialized_chart_parsing_plus": true,
    "specialized_image_parsing": true,
    "spreadsheet_extract_sub_tables": true,
    "spreadsheet_force_formula_computation": true,
    "spreadsheet_include_hidden_sheets": true,
    "strict_mode_buggy_font": true,
    "strict_mode_image_extraction": true,
    "strict_mode_image_ocr": true,
    "strict_mode_reconstruction": true,
    "structured_output": true,
    "structured_output_json_schema": "structured_output_json_schema",
    "structured_output_json_schema_name": "structured_output_json_schema_name",
    "system_prompt": "system_prompt",
    "system_prompt_append": "system_prompt_append",
    "take_screenshot": true,
    "target_pages": "target_pages",
    "tier": "tier",
    "use_vendor_multimodal_model": true,
    "user_prompt": "user_prompt",
    "vendor_multimodal_api_key": "vendor_multimodal_api_key",
    "vendor_multimodal_model_name": "vendor_multimodal_model_name",
    "version": "version",
    "webhook_configurations": [
      {
        "webhook_events": [
          "parse.success",
          "parse.error"
        ],
        "webhook_headers": {
          "Authorization": "Bearer sk-..."
        },
        "webhook_output_format": "json",
        "webhook_signing_secret": "whsec_...",
        "webhook_url": "https://example.com/webhooks/llamacloud"
      }
    ],
    "webhook_url": "webhook_url"
  },
  "managed_pipeline_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
  "metadata_config": {
    "excluded_embed_metadata_keys": [
      "string"
    ],
    "excluded_llm_metadata_keys": [
      "string"
    ]
  },
  "pipeline_type": "MANAGED",
  "preset_retrieval_parameters": {
    "alpha": 0,
    "class_name": "class_name",
    "dense_similarity_cutoff": 0,
    "dense_similarity_top_k": 1,
    "enable_reranking": true,
    "files_top_k": 1,
    "rerank_top_n": 1,
    "retrieval_mode": "auto_routed",
    "retrieve_image_nodes": true,
    "retrieve_page_figure_nodes": true,
    "retrieve_page_screenshot_nodes": true,
    "search_filters": {
      "filters": [
        {
          "key": "key",
          "value": 0,
          "operator": "!="
        }
      ],
      "condition": "and"
    },
    "search_filters_inference_schema": {
      "foo": {
        "foo": "bar"
      }
    },
    "sparse_similarity_top_k": 1
  },
  "sparse_model_config": {
    "class_name": "class_name",
    "model_type": "auto"
  },
  "status": "CREATED",
  "transform_config": {
    "chunk_overlap": 0,
    "chunk_size": 1,
    "mode": "auto"
  },
  "updated_at": "2019-12-27T18:11:19.117Z"
}