# Documents

## Create Batch Pipeline Documents

`List<CloudDocument> pipelines().documents().create(DocumentCreateParams params, RequestOptions requestOptions = RequestOptions.none())`

**post** `/api/v1/pipelines/{pipeline_id}/documents`

Batch create documents for a pipeline.

### Parameters

- `DocumentCreateParams params`
  - `Optional pipelineId`
  - `List<CloudDocumentCreate> body`
    - `Metadata metadata`
    - `String text`
    - `Optional id`
    - `Optional<List<String>> excludedEmbedMetadataKeys`
    - `Optional<List<String>> excludedLlmMetadataKeys`
    - `Optional<List<Long>> pagePositions`

      Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.core.JsonValue;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocument;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocumentCreate;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentCreateParams;
import java.util.List;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        // Build the client from environment configuration.
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // One request can carry several documents; add one body entry per document.
        DocumentCreateParams params = DocumentCreateParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .addBody(CloudDocumentCreate.builder()
                .metadata(CloudDocumentCreate.Metadata.builder()
                    .putAdditionalProperty("foo", JsonValue.from("bar"))
                    .build())
                .text("text")
                .build())
            .build();

        List<CloudDocument> cloudDocuments = client.pipelines().documents().create(params);
    }
}
```

#### Response

```json
[
  {
    "id": "id",
    "metadata": { "foo": "bar" },
    "text": "text",
    "excluded_embed_metadata_keys": [ "string" ],
    "excluded_llm_metadata_keys": [ "string" ],
    "page_positions": [ 0 ],
    "status_metadata": { "foo": "bar" }
  }
]
```

## Paginated List Pipeline Documents

`DocumentListPage pipelines().documents().list(DocumentListParams params = DocumentListParams.none(), RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/paginated`

Return a list of documents for a pipeline.

### Parameters

- `DocumentListParams params`
  - `Optional pipelineId`
  - `Optional fileId`
  - `Optional limit`
  - `Optional onlyApiDataSourceDocuments`
  - `Optional onlyDirectUpload`
  - `Optional skip`
  - `Optional statusRefreshPolicy`
    - `CACHED("cached")`
    - `TTL("ttl")`

### Returns

- `class CloudDocument:`

  Cloud document stored in S3.

  - `String id`
  - `Metadata metadata`
  - `String text`
  - `Optional<List<String>> excludedEmbedMetadataKeys`
  - `Optional<List<String>> excludedLlmMetadataKeys`
  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

  - `Optional statusMetadata`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentListPage;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentListParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // The pipeline ID is passed directly; all other list parameters keep their defaults.
        DocumentListPage page = client.pipelines().documents().list("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e");
    }
}
```

#### Response

```json
{
  "documents": [
    {
      "id": "id",
      "metadata": { "foo": "bar" },
      "text": "text",
      "excluded_embed_metadata_keys": [ "string" ],
      "excluded_llm_metadata_keys": [ "string" ],
      "page_positions": [ 0 ],
      "status_metadata": { "foo": "bar" }
    }
  ],
  "limit": 0,
  "offset": 0,
  "total_count": 0
}
```

## Get Pipeline Document

`CloudDocument pipelines().documents().get(DocumentGetParams params, RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}`

Return a single document for a pipeline.
### Parameters

- `DocumentGetParams params`
  - `String pipelineId`
  - `Optional documentId`

### Returns

- `class CloudDocument:`

  Cloud document stored in S3.

  - `String id`
  - `Metadata metadata`
  - `String text`
  - `Optional<List<String>> excludedEmbedMetadataKeys`
  - `Optional<List<String>> excludedLlmMetadataKeys`
  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

  - `Optional statusMetadata`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocument;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentGetParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // Both the pipeline and the document are identified by ID.
        DocumentGetParams params = DocumentGetParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();

        CloudDocument cloudDocument = client.pipelines().documents().get(params);
    }
}
```

#### Response

```json
{
  "id": "id",
  "metadata": { "foo": "bar" },
  "text": "text",
  "excluded_embed_metadata_keys": [ "string" ],
  "excluded_llm_metadata_keys": [ "string" ],
  "page_positions": [ 0 ],
  "status_metadata": { "foo": "bar" }
}
```

## Delete Pipeline Document

`pipelines().documents().delete(DocumentDeleteParams params, RequestOptions requestOptions = RequestOptions.none())`

**delete** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}`

Delete a document from a pipeline. The deletion runs asynchronously: the vectors are removed first, then the MongoDB record.
### Parameters

- `DocumentDeleteParams params`
  - `String pipelineId`
  - `Optional documentId`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentDeleteParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentDeleteParams params = DocumentDeleteParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();

        // The call has no return value.
        client.pipelines().documents().delete(params);
    }
}
```

## Get Pipeline Document Status

`ManagedIngestionStatusResponse pipelines().documents().getStatus(DocumentGetStatusParams params, RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}/status`

Return the ingestion status of a single document for a pipeline.

### Parameters

- `DocumentGetStatusParams params`
  - `String pipelineId`
  - `Optional documentId`

### Returns

- `class ManagedIngestionStatusResponse:`
  - `Status status`

    Status of the ingestion.

    - `NOT_STARTED("NOT_STARTED")`
    - `IN_PROGRESS("IN_PROGRESS")`
    - `SUCCESS("SUCCESS")`
    - `ERROR("ERROR")`
    - `PARTIAL_SUCCESS("PARTIAL_SUCCESS")`
    - `CANCELLED("CANCELLED")`

  - `Optional deploymentDate`

    Date of the deployment.

  - `Optional effectiveAt`

    When the status is effective.

  - `Optional<List<Error>> error`

    List of errors that occurred during ingestion.

    - `String jobId`

      ID of the job that failed.

    - `String message`

      Message describing the error.

    - `Step step`

      Name of the step that failed.

      - `MANAGED_INGESTION("MANAGED_INGESTION")`
      - `DATA_SOURCE("DATA_SOURCE")`
      - `FILE_UPDATER("FILE_UPDATER")`
      - `PARSE("PARSE")`
      - `TRANSFORM("TRANSFORM")`
      - `INGESTION("INGESTION")`
      - `METADATA_UPDATE("METADATA_UPDATE")`

  - `Optional jobId`

    ID of the latest job.
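Because ingestion moves through the status values listed above, callers often poll until the status stops changing. The following is a minimal sketch of that loop, not part of the SDK: the `Status` enum is a local mirror of the documented values, `pollUntilTerminal` is a hypothetical helper, and treating `SUCCESS`, `ERROR`, `PARTIAL_SUCCESS`, and `CANCELLED` as terminal is an assumption rather than documented behavior.

```java
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public final class StatusPollingSketch {
    private StatusPollingSketch() {}

    /** Local mirror of the documented status values; not the SDK's own enum. */
    public enum Status { NOT_STARTED, IN_PROGRESS, SUCCESS, ERROR, PARTIAL_SUCCESS, CANCELLED }

    // Assumption: these four values mean ingestion will not progress any further.
    private static final Set<Status> TERMINAL =
        EnumSet.of(Status.SUCCESS, Status.ERROR, Status.PARTIAL_SUCCESS, Status.CANCELLED);

    public static boolean isTerminal(Status status) {
        return TERMINAL.contains(status);
    }

    /**
     * Calls fetchStatus until a terminal status is seen or maxAttempts calls have been made,
     * sleeping delayMillis between calls. Returns the last status observed.
     */
    public static Status pollUntilTerminal(Supplier<Status> fetchStatus, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Status last = fetchStatus.get();
        for (int attempt = 1; attempt < maxAttempts && !isTerminal(last); attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return last;
            }
            last = fetchStatus.get();
        }
        return last;
    }

    public static void main(String[] args) {
        // Stand-in for a real status call: replays a fixed sequence of statuses.
        Iterator<Status> fake =
            List.of(Status.NOT_STARTED, Status.IN_PROGRESS, Status.SUCCESS).iterator();
        System.out.println(pollUntilTerminal(fake::next, 10, 0L));
    }
}
```

In real use, the supplier would wrap a `getStatus` call and map the response status onto the local enum; the attempt cap keeps the loop from running indefinitely if a document never leaves `IN_PROGRESS`.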
### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.ManagedIngestionStatusResponse;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentGetStatusParams;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentGetStatusParams params = DocumentGetStatusParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();

        // Fetch the current ingestion status of the document.
        ManagedIngestionStatusResponse managedIngestionStatusResponse =
            client.pipelines().documents().getStatus(params);
    }
}
```

#### Response

```json
{
  "status": "NOT_STARTED",
  "deployment_date": "2019-12-27T18:11:19.117Z",
  "effective_at": "2019-12-27T18:11:19.117Z",
  "error": [
    {
      "job_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e",
      "message": "message",
      "step": "MANAGED_INGESTION"
    }
  ],
  "job_id": "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e"
}
```

## Sync Pipeline Document

`DocumentSyncResponse pipelines().documents().sync(DocumentSyncParams params, RequestOptions requestOptions = RequestOptions.none())`

**post** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}/sync`

Sync a specific document for a pipeline.
### Parameters

- `DocumentSyncParams params`
  - `String pipelineId`
  - `Optional documentId`

### Returns

- `class DocumentSyncResponse:`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentSyncParams;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentSyncResponse;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentSyncParams params = DocumentSyncParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();

        DocumentSyncResponse response = client.pipelines().documents().sync(params);
    }
}
```

#### Response

```json
{}
```

## List Pipeline Document Chunks

`List<TextNode> pipelines().documents().getChunks(DocumentGetChunksParams params, RequestOptions requestOptions = RequestOptions.none())`

**get** `/api/v1/pipelines/{pipeline_id}/documents/{document_id}/chunks`

Return a list of chunks for a pipeline document.
### Parameters

- `DocumentGetChunksParams params`
  - `String pipelineId`
  - `Optional documentId`

### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentGetChunksParams;
import com.llamacloud_prod.api.models.pipelines.documents.TextNode;
import java.util.List;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        DocumentGetChunksParams params = DocumentGetChunksParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .documentId("document_id")
            .build();

        // Each chunk of the document is returned as a TextNode.
        List<TextNode> textNodes = client.pipelines().documents().getChunks(params);
    }
}
```

#### Response

```json
[
  {
    "class_name": "class_name",
    "embedding": [ 0 ],
    "end_char_idx": 0,
    "excluded_embed_metadata_keys": [ "string" ],
    "excluded_llm_metadata_keys": [ "string" ],
    "extra_info": { "foo": "bar" },
    "id_": "id_",
    "metadata_seperator": "metadata_seperator",
    "metadata_template": "metadata_template",
    "mimetype": "mimetype",
    "relationships": {
      "foo": {
        "node_id": "node_id",
        "class_name": "class_name",
        "hash": "hash",
        "metadata": { "foo": "bar" },
        "node_type": "1"
      }
    },
    "start_char_idx": 0,
    "text": "text",
    "text_template": "text_template"
  }
]
```

## Upsert Batch Pipeline Documents

`List<CloudDocument> pipelines().documents().upsert(DocumentUpsertParams params, RequestOptions requestOptions = RequestOptions.none())`

**put** `/api/v1/pipelines/{pipeline_id}/documents`

Batch create or update documents for a pipeline.

### Parameters

- `DocumentUpsertParams params`
  - `Optional pipelineId`
  - `List<CloudDocumentCreate> body`
    - `Metadata metadata`
    - `String text`
    - `Optional id`
    - `Optional<List<String>> excludedEmbedMetadataKeys`
    - `Optional<List<String>> excludedLlmMetadataKeys`
    - `Optional<List<Long>> pagePositions`

      Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.
### Example

```java
package com.llamacloud_prod.api.example;

import com.llamacloud_prod.api.client.LlamaCloudClient;
import com.llamacloud_prod.api.client.okhttp.LlamaCloudOkHttpClient;
import com.llamacloud_prod.api.core.JsonValue;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocument;
import com.llamacloud_prod.api.models.pipelines.documents.CloudDocumentCreate;
import com.llamacloud_prod.api.models.pipelines.documents.DocumentUpsertParams;
import java.util.List;

public final class Main {
    private Main() {}

    public static void main(String[] args) {
        LlamaCloudClient client = LlamaCloudOkHttpClient.fromEnv();

        // One request can carry several documents; add one body entry per document.
        DocumentUpsertParams params = DocumentUpsertParams.builder()
            .pipelineId("182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e")
            .addBody(CloudDocumentCreate.builder()
                .metadata(CloudDocumentCreate.Metadata.builder()
                    .putAdditionalProperty("foo", JsonValue.from("bar"))
                    .build())
                .text("text")
                .build())
            .build();

        List<CloudDocument> cloudDocuments = client.pipelines().documents().upsert(params);
    }
}
```

#### Response

```json
[
  {
    "id": "id",
    "metadata": { "foo": "bar" },
    "text": "text",
    "excluded_embed_metadata_keys": [ "string" ],
    "excluded_llm_metadata_keys": [ "string" ],
    "page_positions": [ 0 ],
    "status_metadata": { "foo": "bar" }
  }
]
```

## Domain Types

### Cloud Document

- `class CloudDocument:`

  Cloud document stored in S3.

  - `String id`
  - `Metadata metadata`
  - `String text`
  - `Optional<List<String>> excludedEmbedMetadataKeys`
  - `Optional<List<String>> excludedLlmMetadataKeys`
  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.

  - `Optional statusMetadata`

### Cloud Document Create

- `class CloudDocumentCreate:`

  Create a new cloud document.

  - `Metadata metadata`
  - `String text`
  - `Optional id`
  - `Optional<List<String>> excludedEmbedMetadataKeys`
  - `Optional<List<String>> excludedLlmMetadataKeys`
  - `Optional<List<Long>> pagePositions`

    Indices in `CloudDocument.text` where a new page begins. For example, the second page starts at the index given by `page_positions[1]`.
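The `pagePositions` description can be made concrete with a small standalone sketch. It assumes the positions are ascending character indices into the document text and that the first entry marks the start of the first page; `splitIntoPages` is a hypothetical helper and not part of the SDK, and it uses `Integer` indices purely for convenience with `String.substring`.

```java
import java.util.ArrayList;
import java.util.List;

public final class PagePositionsSketch {
    private PagePositionsSketch() {}

    /**
     * Splits text into pages, treating each entry of pagePositions as the
     * index where a page begins (assumed ascending and within the text).
     */
    public static List<String> splitIntoPages(String text, List<Integer> pagePositions) {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < pagePositions.size(); i++) {
            int start = pagePositions.get(i);
            // A page ends where the next one begins, or at the end of the text.
            int end = (i + 1 < pagePositions.size()) ? pagePositions.get(i + 1) : text.length();
            pages.add(text.substring(start, end));
        }
        return pages;
    }

    public static void main(String[] args) {
        String text = "First page.Second page.Third page.";
        // pagePositions.get(1) == 11 is where the second page starts.
        List<Integer> pagePositions = List.of(0, 11, 23);
        List<String> pages = splitIntoPages(text, pagePositions);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + (i + 1) + ": " + pages.get(i));
        }
    }
}
```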
### Text Node

- `class TextNode:`

  Provided for backward compatibility.

  - `Optional className`
  - `Optional<List<Double>> embedding`

    Embedding of the node.

  - `Optional endCharIdx`

    End char index of the node.

  - `Optional<List<String>> excludedEmbedMetadataKeys`

    Metadata keys that are excluded from text for the embed model.

  - `Optional<List<String>> excludedLlmMetadataKeys`

    Metadata keys that are excluded from text for the LLM.

  - `Optional extraInfo`

    A flat dictionary of metadata fields.

  - `Optional id`

    Unique ID of the node.

  - `Optional metadataSeperator`

    Separator between metadata fields when converting to string.

  - `Optional metadataTemplate`

    Template for how metadata is formatted, with `{key}` and `{value}` placeholders.

  - `Optional mimetype`

    MIME type of the node content.

  - `Optional relationships`

    A mapping of relationships to other node information.

    - `class RelatedNodeInfo:`
      - `String nodeId`
      - `Optional className`
      - `Optional hash`
      - `Optional metadata`
      - `Optional nodeType`
        - `_1("1")`
        - `_2("2")`
        - `_3("3")`
        - `_4("4")`
        - `_5("5")`
    - `List`
      - `String nodeId`
      - `Optional className`
      - `Optional hash`
      - `Optional metadata`
      - `Optional nodeType`
        - `_1("1")`
        - `_2("2")`
        - `_3("3")`
        - `_4("4")`
        - `_5("5")`

  - `Optional startCharIdx`

    Start char index of the node.

  - `Optional text`

    Text content of the node.

  - `Optional textTemplate`

    Template for how text is formatted, with `{content}` and `{metadata_str}` placeholders.
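To illustrate how the metadata and text templates described for `TextNode` might combine, here is a rough sketch using plain string substitution. The exact rendering rules used by the service are not stated in this reference, so the substitution order, the example template strings, and the helper names `renderMetadata` and `renderText` are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public final class NodeTemplateSketch {
    private NodeTemplateSketch() {}

    /** Formats each metadata entry with metadataTemplate and joins the results with the separator. */
    public static String renderMetadata(Map<String, String> metadata, String metadataTemplate, String separator) {
        return metadata.entrySet().stream()
            .map(e -> metadataTemplate.replace("{key}", e.getKey()).replace("{value}", e.getValue()))
            .collect(Collectors.joining(separator));
    }

    /** Fills the {metadata_str} and {content} placeholders of textTemplate. */
    public static String renderText(String textTemplate, String metadataStr, String content) {
        return textTemplate.replace("{metadata_str}", metadataStr).replace("{content}", content);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the rendered output is stable.
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Report");
        metadata.put("page", "2");

        // Hypothetical template values chosen for this example only.
        String metadataStr = renderMetadata(metadata, "{key}: {value}", "\n");
        System.out.println(renderText("{metadata_str}\n\n{content}", metadataStr, "Body text"));
    }
}
```

Keys listed in `excludedEmbedMetadataKeys` or `excludedLlmMetadataKeys` would presumably be filtered out of the map before rendering for the respective consumer; the sketch leaves that step out.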