# Retrieval

## Retrieve

`beta.retrieval.retrieve(RetrievalRetrieveParams**kwargs) -> RetrievalRetrieveResponse`

**post** `/api/v1/retrieval/retrieve`

Retrieve relevant chunks via hybrid search (vector + full-text), with filtering on built-in or user-defined metadata.

### Parameters

- `index_id: str`

  ID of the index to retrieve against.

- `query: str`

  Natural-language query to retrieve relevant chunks.

- `organization_id: Optional[str]`

- `project_id: Optional[str]`

- `custom_filters: Optional[Dict[str, Optional[CustomFilters]]]`

  Filters on user-defined metadata fields.

  - `class CustomFiltersFilterTypeUnionStrIntBoolFloat: …`

    - `operator: Literal["eq", "ne", "gt", 5 more]`

      - `"eq"`
      - `"ne"`
      - `"gt"`
      - `"lt"`
      - `"gte"`
      - `"lte"`
      - `"in"`
      - `"nin"`

    - `value: Union[str, bool, float, Sequence[Union[str, bool, float]]]`

      - `str`
      - `bool`
      - `float`
      - `Sequence[Union[str, bool, float]]`

        - `str`
        - `bool`
        - `float`

  - `Iterable[CustomFiltersUnionMember1]`

    - `operator: Literal["eq", "ne", "gt", 5 more]`

      - `"eq"`
      - `"ne"`
      - `"gt"`
      - `"lt"`
      - `"gte"`
      - `"lte"`
      - `"in"`
      - `"nin"`

    - `value: Union[float, Iterable[float]]`

      - `float`
      - `Iterable[float]`

- `full_text_pipeline_weight: Optional[float]`

  Weight of the full-text search pipeline (0-1).

- `num_candidates: Optional[int]`

  Number of candidates for approximate nearest neighbor search.

- `rerank: Optional[Rerank]`

  Reranking configuration applied after hybrid search. Enabled by default.

  - `enabled: Optional[bool]`

    Set to false to disable reranking.

  - `top_n: Optional[int]`

    Number of results to return after reranking.

- `score_threshold: Optional[float]`

  Minimum score threshold for returned results.

- `static_filters: Optional[StaticFilters]`

  Filters on built-in document fields (page range, chunk index, etc.).
  - `parsed_directory_file_id: Optional[StaticFiltersParsedDirectoryFileID]`

    - `operator: Literal["eq", "ne", "gt", 5 more]`

      - `"eq"`
      - `"ne"`
      - `"gt"`
      - `"lt"`
      - `"gte"`
      - `"lte"`
      - `"in"`
      - `"nin"`

    - `value: Union[str, Sequence[str]]`

      - `str`
      - `Sequence[str]`

- `top_k: Optional[int]`

  Maximum number of results to return.

- `vector_pipeline_weight: Optional[float]`

  Weight of the vector search pipeline (0-1).

### Returns

- `class RetrievalRetrieveResponse: …`

  Response containing retrieval results.

  - `results: List[Result]`

    Ordered list of retrieved chunks.

    - `content: str`

      Text content of the retrieved chunk.

    - `metadata: Optional[Dict[str, Union[str, int, float, 3 more]]]`

      User-defined metadata associated with the chunk.

      - `str`
      - `int`
      - `float`
      - `bool`
      - `None`
      - `List[str]`

    - `rerank_score: Optional[float]`

      Relevance score from the reranker, if reranking was applied.

    - `score: Optional[float]`

      Hybrid search relevance score.

    - `static_fields: Optional[ResultStaticFields]`

      Built-in fields stored for every exported chunk.

      - `attachments: Optional[List[ResultStaticFieldsAttachment]]`

        Attachments associated with the chunk.

        - `attachment_name: str`

          Attachment-relative path, e.g. 'screenshots/page_7.jpg'.

        - `source_id: str`

          File ID to pass as source_id when fetching the attachment.

        - `type: str`

          Attachment kind, e.g. 'screenshot', 'items'.

      - `chunk_end_char: Optional[int]`

        End character offset of the chunk.

      - `chunk_index: Optional[int]`

        Index of the chunk within the file.

      - `chunk_start_char: Optional[int]`

        Start character offset of the chunk.

      - `chunk_token_count: Optional[int]`

        Token count of the chunk.

      - `page_range_end: Optional[int]`

        Last page number covered by this chunk.

      - `page_range_start: Optional[int]`

        First page number covered by this chunk.

      - `parsed_directory_file_id: Optional[str]`

        ID of the parsed file.
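The filter parameters of Retrieve (`custom_filters`, `static_filters`, `rerank`) take nested dictionaries, which are easy to get wrong by hand. The following is a minimal sketch of how the keyword arguments might be assembled. `make_filter` and `build_retrieve_kwargs` are hypothetical helpers, not part of the SDK, and the metadata field `department` and the file IDs are made-up placeholders.

```python
from typing import Any, Dict

# Operators listed in the Retrieve parameter reference.
ALLOWED_OPERATORS = {"eq", "ne", "gt", "lt", "gte", "lte", "in", "nin"}


def make_filter(operator: str, value: Any) -> Dict[str, Any]:
    """Build one {"operator", "value"} filter dict (hypothetical helper)."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return {"operator": operator, "value": value}


def build_retrieve_kwargs(index_id: str, query: str) -> Dict[str, Any]:
    """Assemble keyword arguments for client.beta.retrieval.retrieve."""
    return {
        "index_id": index_id,
        "query": query,
        # "department" is a made-up user-defined metadata field.
        "custom_filters": {"department": make_filter("eq", "finance")},
        # Restrict results to specific parsed files (placeholder IDs).
        "static_filters": {
            "parsed_directory_file_id": make_filter("in", ["file-1", "file-2"]),
        },
        # Keep reranking on and return the five best results after reranking.
        "rerank": {"enabled": True, "top_n": 5},
    }


kwargs = build_retrieve_kwargs("idx-abc123", "What are the key findings?")
print(kwargs["custom_filters"])
```

With a configured client, the result would be passed as `client.beta.retrieval.retrieve(**kwargs)`. Validating the operator locally only catches typos early; the service remains the authority on which operator and value combinations are accepted.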
### Example

```python
import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
retrieval = client.beta.retrieval.retrieve(
    index_id="idx-abc123",
    query="What are the key findings?",
)
print(retrieval.results)
```

#### Response

```json
{
  "results": [
    {
      "content": "content",
      "metadata": {
        "foo": "string"
      },
      "rerank_score": 0,
      "score": 0,
      "static_fields": {
        "attachments": [
          {
            "attachment_name": "attachment_name",
            "source_id": "source_id",
            "type": "type"
          }
        ],
        "chunk_end_char": 0,
        "chunk_index": 0,
        "chunk_start_char": 0,
        "chunk_token_count": 0,
        "page_range_end": 0,
        "page_range_start": 0,
        "parsed_directory_file_id": "parsed_directory_file_id"
      }
    }
  ]
}
```

## Find Files

`beta.retrieval.find(RetrievalFindParams**kwargs) -> SyncPaginatedCursorPost[RetrievalFindResponse]`

**post** `/api/v1/retrieval/files/find`

Search for files by name.

### Parameters

- `index_id: str`

  ID of the index to search within.

- `organization_id: Optional[str]`

- `project_id: Optional[str]`

- `file_name: Optional[str]`

  Exact file name to match.

- `file_name_contains: Optional[str]`

  Substring match on file name (case-insensitive).

- `page_size: Optional[int]`

  The maximum number of items to return. The service may return fewer than this value. If unspecified, a default page size will be used. The maximum value is typically 1000; values above this will be coerced to the maximum.

- `page_token: Optional[str]`

  A page token, received from a previous list call. Provide this to retrieve the subsequent page.

### Returns

- `class RetrievalFindResponse: …`

  A file returned by find.

  - `file_id: str`

    ID of the file.

  - `file_name: str`

    Display name of the file.
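Because Find Files is paginated through `page_token`, collecting every match means calling the endpoint repeatedly. Below is a minimal sketch of that loop. It assumes each page exposes `items` and `next_page_token` attributes, mirroring the JSON response shape; `iter_all_items` is a hypothetical helper, and the fake pages stand in for the service so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Optional[str]], Any]) -> Iterator[Any]:
    """Yield items from every page, following next_page_token until it is empty."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.items
        # Assumption: the page object mirrors the "next_page_token" JSON field.
        token = getattr(page, "next_page_token", None)
        if not token:
            break


# Offline stand-in for the service: two fake pages keyed by page token.
_FAKE_PAGES = {
    None: SimpleNamespace(
        items=[SimpleNamespace(file_id="f1", file_name="a.pdf")],
        next_page_token="t2",
    ),
    "t2": SimpleNamespace(
        items=[SimpleNamespace(file_id="f2", file_name="b.pdf")],
        next_page_token=None,
    ),
}

files = list(iter_all_items(lambda token: _FAKE_PAGES[token]))
print([f.file_id for f in files])  # → ['f1', 'f2']
```

Against the real API, `fetch_page` would wrap `client.beta.retrieval.find(index_id=..., ...)` and pass `page_token` only once a token has been received. The `SyncPaginatedCursorPost` return type may already provide its own iteration helpers, so it is worth checking the SDK before relying on a hand-rolled loop.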
### Example

```python
import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
page = client.beta.retrieval.find(
    index_id="idx-abc123",
)
# Take the first file on the first page of results.
first_file = page.items[0]
print(first_file.file_id)
```

#### Response

```json
{
  "items": [
    {
      "file_id": "file_id",
      "file_name": "file_name"
    }
  ],
  "next_page_token": "next_page_token",
  "total_size": 0
}
```

## Grep File

`beta.retrieval.grep(RetrievalGrepParams**kwargs) -> SyncPaginatedCursorPost[RetrievalGrepResponse]`

**post** `/api/v1/retrieval/files/grep`

Grep within a file's parsed content using a regex pattern.

### Parameters

- `file_id: str`

  ID of the file to grep.

- `index_id: str`

  ID of the index the file belongs to.

- `pattern: str`

  Regex pattern to search for.

- `organization_id: Optional[str]`

- `project_id: Optional[str]`

- `context_chars: Optional[int]`

  Number of characters of context to include before and after the matched pattern in the `content` field of the response.

- `page_size: Optional[int]`

  The maximum number of items to return. The service may return fewer than this value. If unspecified, a default page size will be used. The maximum value is typically 1000; values above this will be coerced to the maximum.

- `page_token: Optional[str]`

  A page token, received from a previous list call. Provide this to retrieve the subsequent page.

### Returns

- `class RetrievalGrepResponse: …`

  A single grep match within a file.

  - `content: str`

    Matched text content.

  - `end_char: int`

    End character offset of the match.

  - `start_char: int`

    Start character offset of the match.
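To make the relationship between `pattern`, `context_chars`, and the returned offsets concrete, here is a local emulation using Python's `re` module. It is only an illustration under stated assumptions: the service's regex dialect is not specified here, `end_char` is treated as exclusive (Python convention), and `local_grep` is a hypothetical helper that never contacts the API.

```python
import re
from typing import Dict, List, Union


def local_grep(
    text: str, pattern: str, context_chars: int = 0
) -> List[Dict[str, Union[str, int]]]:
    """Emulate grep-style results: match offsets plus context-padded content."""
    matches = []
    for m in re.finditer(pattern, text):
        # Widen the returned snippet by context_chars on each side, clamped to the text.
        start = max(0, m.start() - context_chars)
        end = min(len(text), m.end() + context_chars)
        matches.append(
            {
                "content": text[start:end],
                "start_char": m.start(),
                "end_char": m.end(),  # exclusive end offset (assumed convention)
            }
        )
    return matches


sample = "Q3 revenue rose 12%; profit was flat."
results = local_grep(sample, r"revenue|profit", context_chars=3)
for match in results:
    print(match)
```

In this emulation the offsets always describe the match itself, while `content` grows with `context_chars`, which matches how the parameter and response fields are described above.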
### Example

```python
import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
page = client.beta.retrieval.grep(
    file_id="file_id",
    index_id="idx-abc123",
    pattern="revenue|profit",
)
# Take the first match on the first page of results.
first_match = page.items[0]
print(first_match.content)
```

#### Response

```json
{
  "items": [
    {
      "content": "content",
      "end_char": 0,
      "start_char": 0
    }
  ],
  "next_page_token": "next_page_token",
  "total_size": 0
}
```

## Read File

`beta.retrieval.read(RetrievalReadParams**kwargs) -> RetrievalReadResponse`

**post** `/api/v1/retrieval/files/read`

Read the parsed text content of a specific file.

### Parameters

- `file_id: str`

  ID of the file to read.

- `index_id: str`

  ID of the index the file belongs to.

- `organization_id: Optional[str]`

- `project_id: Optional[str]`

- `max_length: Optional[int]`

  Maximum number of characters to read from the offset.

- `offset: Optional[int]`

  Starting character offset.

### Returns

- `class RetrievalReadResponse: …`

  File read result.

  - `content: str`

    Parsed text content of the file.

### Example

```python
import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
response = client.beta.retrieval.read(
    file_id="file_id",
    index_id="idx-abc123",
)
print(response.content)
```

#### Response

```json
{
  "content": "content"
}
```

## Domain Types

### Retrieval Retrieve Response

- `class RetrievalRetrieveResponse: …`

  Response containing retrieval results.

  - `results: List[Result]`

    Ordered list of retrieved chunks.

    - `content: str`

      Text content of the retrieved chunk.

    - `metadata: Optional[Dict[str, Union[str, int, float, 3 more]]]`

      User-defined metadata associated with the chunk.

      - `str`
      - `int`
      - `float`
      - `bool`
      - `None`
      - `List[str]`

    - `rerank_score: Optional[float]`

      Relevance score from the reranker, if reranking was applied.

    - `score: Optional[float]`

      Hybrid search relevance score.
    - `static_fields: Optional[ResultStaticFields]`

      Built-in fields stored for every exported chunk.

      - `attachments: Optional[List[ResultStaticFieldsAttachment]]`

        Attachments associated with the chunk.

        - `attachment_name: str`

          Attachment-relative path, e.g. 'screenshots/page_7.jpg'.

        - `source_id: str`

          File ID to pass as source_id when fetching the attachment.

        - `type: str`

          Attachment kind, e.g. 'screenshot', 'items'.

      - `chunk_end_char: Optional[int]`

        End character offset of the chunk.

      - `chunk_index: Optional[int]`

        Index of the chunk within the file.

      - `chunk_start_char: Optional[int]`

        Start character offset of the chunk.

      - `chunk_token_count: Optional[int]`

        Token count of the chunk.

      - `page_range_end: Optional[int]`

        Last page number covered by this chunk.

      - `page_range_start: Optional[int]`

        First page number covered by this chunk.

      - `parsed_directory_file_id: Optional[str]`

        ID of the parsed file.

### Retrieval Find Response

- `class RetrievalFindResponse: …`

  A file returned by find.

  - `file_id: str`

    ID of the file.

  - `file_name: str`

    Display name of the file.

### Retrieval Grep Response

- `class RetrievalGrepResponse: …`

  A single grep match within a file.

  - `content: str`

    Matched text content.

  - `end_char: int`

    End character offset of the match.

  - `start_char: int`

    Start character offset of the match.

### Retrieval Read Response

- `class RetrievalReadResponse: …`

  File read result.

  - `content: str`

    Parsed text content of the file.