Get Parse Job

GET /api/v2/parse/{job_id}

Retrieve a parse job, optionally including its content or metadata.

Parameters

job_id: str

expand: Optional[SequenceNotStr[str]]

Fields to include: text, markdown, items, metadata, job_metadata, text_content_metadata, markdown_content_metadata, items_content_metadata, metadata_content_metadata, xlsx_content_metadata, output_pdf_content_metadata, images_content_metadata. Metadata fields include presigned URLs.
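A minimal sketch of checking `expand` values before sending a request. The field names are copied from the description above; the helper name `validate_expand` is hypothetical, and the commented-out call assumes that `client.parsing.get` takes `expand` as a keyword argument, as the parameter list suggests.

```python
# Field names copied from the parameter description above.
EXPAND_FIELDS = frozenset({
    "text", "markdown", "items", "metadata", "job_metadata",
    "text_content_metadata", "markdown_content_metadata",
    "items_content_metadata", "metadata_content_metadata",
    "xlsx_content_metadata", "output_pdf_content_metadata",
    "images_content_metadata",
})


def validate_expand(fields):
    """Return the fields as a list, raising ValueError on unknown names."""
    fields = list(fields)
    unknown = [name for name in fields if name not in EXPAND_FIELDS]
    if unknown:
        raise ValueError(f"unknown expand fields: {unknown}")
    return fields


expand = validate_expand(["markdown", "images_content_metadata"])
# Assumed usage with an SDK client (not executed here):
# parsing = client.parsing.get(job_id="job_id", expand=expand)
```

Catching a misspelled field locally avoids a round trip that would otherwise return a response without the content you expected.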

image_filenames: Optional[str]

Filter to specific image filenames (optional). Example: image_0.png,image_1.jpg
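The example value above is a single comma-separated string. A small sketch of building it from a list follows; the helper name `image_filenames_param` is hypothetical, and rejecting names that contain commas is an assumption, since the source does not describe any escaping.

```python
def image_filenames_param(filenames):
    """Join filenames into the comma-separated form shown in the example."""
    cleaned = [name.strip() for name in filenames if name.strip()]
    for name in cleaned:
        # Assumption: a comma inside a name would be read as a separator.
        if "," in name:
            raise ValueError(f"filename contains a comma: {name!r}")
    return ",".join(cleaned)


param = image_filenames_param(["image_0.png", " image_1.jpg "])
```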

organization_id: Optional[str]

project_id: Optional[str]

ReturnsExpand Collapse

class ParsingGetResponse: …

Parse result response with job status and optional content or metadata.

The job field is always included. Other fields are included based on expand parameters.

job: Job

Parse job status and metadata

id: str

Unique identifier for the parse job

project_id: str

Project this job belongs to

status: Literal["PENDING", "RUNNING", "COMPLETED", "FAILED", "CANCELLED"]

Current status of the job (e.g., pending, running, completed, failed, cancelled)

Accepts one of the following:

"PENDING"

"RUNNING"

"COMPLETED"

"FAILED"

"CANCELLED"

created_at: Optional[datetime]

Creation datetime

format: date-time

error_message: Optional[str]

Error message if job failed

name: Optional[str]

User friendly name

updated_at: Optional[datetime]

Update datetime

format: date-time
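Because a job moves through several statuses, callers usually poll until it settles. The sketch below is one way to do that, under the assumption that `COMPLETED`, `FAILED`, and `CANCELLED` are final; the source lists the statuses but does not say which are terminal. The function name `wait_for_job` and its parameters are hypothetical, and the status source is passed in as a callable so the demo runs without network access.

```python
import time

# Assumption: jobs in these states no longer change.
TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "CANCELLED"})


def wait_for_job(fetch_status, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status.

    fetch_status is any zero-argument callable returning one of the
    documented status strings.
    """
    status = None
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Do not sleep after the final check.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job still {status} after {max_attempts} checks")


# Demo with canned statuses instead of real API calls.
canned = iter(["PENDING", "RUNNING", "COMPLETED"])
final_status = wait_for_job(lambda: next(canned), sleep=lambda seconds: None)
```

With the SDK, `fetch_status` could be something like `lambda: client.parsing.get(job_id=job_id).job.status`, assuming the response exposes `status` as an attribute of `job`.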

images_content_metadata: Optional[ImagesContentMetadata]

Metadata for all extracted images.

images: List[ImagesContentMetadataImage]

List of image metadata with presigned URLs

filename: str

Image filename (e.g., 'image_0.png')

index: int

Index of the image in the extraction order

bbox: Optional[ImagesContentMetadataImageBbox]

Bounding box for an image on its page.

h: int

Height of the bounding box

w: int

Width of the bounding box

x: int

X coordinate of the bounding box

y: int

Y coordinate of the bounding box

category: Optional[Literal["screenshot", "embedded", "layout"]]

Image category: 'screenshot' (full page), 'embedded' (images in document), or 'layout' (cropped from layout detection)

Accepts one of the following:

"screenshot"

"embedded"

"layout"

content_type: Optional[str]

MIME type of the image

presigned_url: Optional[str]

Presigned URL to download the image

size_bytes: Optional[int]

Size of the image file in bytes

total_count: int

Total number of extracted images
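A short sketch of reading this structure, working on the JSON form of the response as plain dicts. The helper names are hypothetical. `bbox_corners` assumes that `x` and `y` mark the origin corner of the box and that `w` and `h` extend from it; the source does not state the coordinate convention.

```python
def downloadable_images(images_content_metadata, category=None):
    """Return (filename, presigned_url) pairs for images that have a URL."""
    pairs = []
    for image in images_content_metadata.get("images", []):
        if category is not None and image.get("category") != category:
            continue
        url = image.get("presigned_url")
        if url:  # presigned_url is optional and may be missing or null
            pairs.append((image["filename"], url))
    return pairs


def bbox_corners(bbox):
    """Convert an {x, y, w, h} box to (x0, y0, x1, y1)."""
    return (bbox["x"], bbox["y"], bbox["x"] + bbox["w"], bbox["y"] + bbox["h"])


# Sample data with made-up values in the documented shape.
sample = {
    "images": [
        {"filename": "image_0.png", "index": 0, "category": "embedded",
         "presigned_url": "https://example.com/image_0.png",
         "bbox": {"x": 10, "y": 20, "w": 100, "h": 50}},
        {"filename": "page_1.jpg", "index": 1, "category": "screenshot",
         "presigned_url": None},
    ],
    "total_count": 2,
}
embedded = downloadable_images(sample, category="embedded")
corners = bbox_corners(sample["images"][0]["bbox"])
```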

items: Optional[Items]

Structured JSON result (if requested)

pages: List[ItemsPage]

List of structured pages or failed page entries

Accepts one of the following:

class ItemsPageStructuredResultPage: …

items: List[ItemsPageStructuredResultPageItem]

List of structured items on the page

Accepts one of the following:

class TextItem: …

md: str

Markdown representation preserving formatting

value: str

Text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["text"]]

Text item type

class HeadingItem: …

level: int

Heading level (1-6)

md: str

Markdown representation preserving formatting

value: str

Heading text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["heading"]]

Heading item type

class ListItem: …

items: List[Item]

List of nested text or list items

Accepts one of the following:

class TextItem: …

md: str

Markdown representation preserving formatting

value: str

Text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["text"]]

Text item type

ListItem

md: str

Markdown representation preserving formatting

ordered: bool

Whether the list is ordered or unordered

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["list"]]

List item type

class CodeItem: …

md: str

Markdown representation preserving formatting

value: str

Code content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

language: Optional[str]

Programming language identifier

type: Optional[Literal["code"]]

Code block item type

class TableItem: …

csv: str

CSV representation of the table

html: str

HTML representation of the table

md: str

Markdown representation preserving formatting

rows: List[List[Union[str, float, null]]]

Table data as array of arrays (string, number, or null)

Accepts one of the following:

str

float

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

merged_from_pages: Optional[List[int]]

List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4])

merged_into_page: Optional[int]

Populated when merged into another table. Page number where the full merged table begins (used on empty tables).

parse_concerns: Optional[List[ParseConcern]]

Quality concerns detected during table extraction, indicating the table may have issues

details: str

Human-readable details about the concern

type: str

Type of parse concern (e.g. header_value_type_mismatch, inconsistent_row_cell_count)

type: Optional[Literal["table"]]

Table item type

class ImageItem: …

caption: str

Image caption

md: str

Markdown representation preserving formatting

url: str

URL to the image

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["image"]]

Image item type

class LinkItem: …

md: str

Markdown representation preserving formatting

text: str

Display text of the link

url: str

URL of the link

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["link"]]

Link item type

class HeaderItem: …

items: List[Item]

List of items within the header

Accepts one of the following:

class TextItem: …

md: str

Markdown representation preserving formatting

value: str

Text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["text"]]

Text item type

class HeadingItem: …

level: int

Heading level (1-6)

md: str

Markdown representation preserving formatting

value: str

Heading text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["heading"]]

Heading item type

class ListItem: …

items: List[Item]

List of nested text or list items

Accepts one of the following:

class TextItem: …

md: str

Markdown representation preserving formatting

value: str

Text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["text"]]

Text item type

ListItem

md: str

Markdown representation preserving formatting

ordered: bool

Whether the list is ordered or unordered

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["list"]]

List item type

class CodeItem: …

md: str

Markdown representation preserving formatting

value: str

Code content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

language: Optional[str]

Programming language identifier

type: Optional[Literal["code"]]

Code block item type

class TableItem: …

csv: str

CSV representation of the table

html: str

HTML representation of the table

md: str

Markdown representation preserving formatting

rows: List[List[Union[str, float, null]]]

Table data as array of arrays (string, number, or null)

Accepts one of the following:

str

float

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

merged_from_pages: Optional[List[int]]

List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4])

merged_into_page: Optional[int]

Populated when merged into another table. Page number where the full merged table begins (used on empty tables).

parse_concerns: Optional[List[ParseConcern]]

Quality concerns detected during table extraction, indicating the table may have issues

details: str

Human-readable details about the concern

type: str

Type of parse concern (e.g. header_value_type_mismatch, inconsistent_row_cell_count)

type: Optional[Literal["table"]]

Table item type

class ImageItem: …

caption: str

Image caption

md: str

Markdown representation preserving formatting

url: str

URL to the image

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["image"]]

Image item type

class LinkItem: …

md: str

Markdown representation preserving formatting

text: str

Display text of the link

url: str

URL of the link

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["link"]]

Link item type

md: str

Markdown representation preserving formatting

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["header"]]

Page header container

class FooterItem: …

items: List[Item]

List of items within the footer

Accepts one of the following:

class TextItem: …

md: str

Markdown representation preserving formatting

value: str

Text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["text"]]

Text item type

class HeadingItem: …

level: int

Heading level (1-6)

md: str

Markdown representation preserving formatting

value: str

Heading text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["heading"]]

Heading item type

class ListItem: …

items: List[Item]

List of nested text or list items

Accepts one of the following:

class TextItem: …

md: str

Markdown representation preserving formatting

value: str

Text content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["text"]]

Text item type

ListItem

md: str

Markdown representation preserving formatting

ordered: bool

Whether the list is ordered or unordered

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["list"]]

List item type

class CodeItem: …

md: str

Markdown representation preserving formatting

value: str

Code content

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

language: Optional[str]

Programming language identifier

type: Optional[Literal["code"]]

Code block item type

class TableItem: …

csv: str

CSV representation of the table

html: str

HTML representation of the table

md: str

Markdown representation preserving formatting

rows: List[List[Union[str, float, null]]]

Table data as array of arrays (string, number, or null)

Accepts one of the following:

str

float

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

merged_from_pages: Optional[List[int]]

List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4])

merged_into_page: Optional[int]

Populated when merged into another table. Page number where the full merged table begins (used on empty tables).

parse_concerns: Optional[List[ParseConcern]]

Quality concerns detected during table extraction, indicating the table may have issues

details: str

Human-readable details about the concern

type: str

Type of parse concern (e.g. header_value_type_mismatch, inconsistent_row_cell_count)

type: Optional[Literal["table"]]

Table item type

class ImageItem: …

caption: str

Image caption

md: str

Markdown representation preserving formatting

url: str

URL to the image

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["image"]]

Image item type

class LinkItem: …

md: str

Markdown representation preserving formatting

text: str

Display text of the link

url: str

URL of the link

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["link"]]

Link item type

md: str

Markdown representation preserving formatting

bbox: Optional[List[BBox]]

List of bounding boxes

h: float

Height of the bounding box

w: float

Width of the bounding box

x: float

X coordinate of the bounding box

y: float

Y coordinate of the bounding box

confidence: Optional[float]

Confidence score

end_index: Optional[int]

End index in the text

label: Optional[str]

Label for the bounding box

start_index: Optional[int]

Start index in the text

type: Optional[Literal["footer"]]

Page footer container

page_height: float

Height of the page in points

page_number: int

Page number of the document

page_width: float

Width of the page in points

success: Literal[true]

Success indicator

class ItemsPageFailedStructuredPage: …

error: str

Error message describing the failure

page_number: int

Page number of the document

success: bool

Failure indicator
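Pages in `items` are either structured results or failure entries, and list, header, and footer items nest further items. The sketch below shows one way to walk that shape on the JSON form of the response; the function names are hypothetical and the sample values are made up.

```python
def iter_items(items):
    """Yield every item depth-first, descending into nested `items` lists."""
    for item in items:
        yield item
        yield from iter_items(item.get("items", []))


def summarize_pages(items_result):
    """Return per-page item-type counts and a list of (page, error) failures."""
    counts, failures = {}, []
    for page in items_result.get("pages", []):
        if not page.get("success"):
            failures.append((page["page_number"], page.get("error")))
            continue
        page_counts = {}
        for item in iter_items(page["items"]):
            kind = item.get("type")
            page_counts[kind] = page_counts.get(kind, 0) + 1
        counts[page["page_number"]] = page_counts
    return counts, failures


sample = {"pages": [
    {"success": True, "page_number": 1,
     "page_width": 612.0, "page_height": 792.0,
     "items": [
         {"type": "heading", "level": 1, "md": "# Title", "value": "Title"},
         {"type": "list", "ordered": False, "md": "- a\n- b", "items": [
             {"type": "text", "md": "a", "value": "a"},
             {"type": "text", "md": "b", "value": "b"},
         ]},
     ]},
    {"success": False, "page_number": 2, "error": "render failed"},
]}
counts, failures = summarize_pages(sample)
```

Checking `success` before touching `items` matters because failed pages carry only `error` and `page_number`.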

job_metadata: Optional[Dict[str, object]]

Job execution metadata (if requested)

markdown: Optional[Markdown]

Markdown result (if requested)

pages: List[MarkdownPage]

List of markdown pages or failed page entries

Accepts one of the following:

class MarkdownPageMarkdownResultPage: …

markdown: str

Markdown content of the page

page_number: int

Page number of the document

success: Literal[true]

Success indicator

footer: Optional[str]

Footer of the page in markdown

header: Optional[str]

Header of the page in markdown

class MarkdownPageFailedMarkdownPage: …

error: str

Error message describing the failure

page_number: int

Page number of the document

success: bool

Failure indicator
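A possible way to assemble a document from `markdown.pages` while keeping track of pages that failed. This works on the JSON form of the response; the function name `join_markdown` and the blank-line separator are assumptions for illustration.

```python
def join_markdown(markdown_result, separator="\n\n"):
    """Concatenate successful pages in page order.

    Returns the joined text and the page numbers that failed.
    """
    pages = sorted(markdown_result.get("pages", []),
                   key=lambda page: page["page_number"])
    parts, failed = [], []
    for page in pages:
        if page.get("success"):
            parts.append(page["markdown"])
        else:
            failed.append(page["page_number"])
    return separator.join(parts), failed


# Sample data with made-up values in the documented shape.
sample = {"pages": [
    {"page_number": 2, "success": False, "error": "timeout"},
    {"page_number": 1, "success": True, "markdown": "# Intro"},
    {"page_number": 3, "success": True, "markdown": "More text"},
]}
document, failed_pages = join_markdown(sample)
```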

markdown_full: Optional[str]

Full raw markdown content (if requested)

metadata: Optional[Metadata]

Result containing metadata (page level and general) for the parsed document.

pages: List[MetadataPage]

List of page metadata entries

page_number: int

Page number of the document

confidence: Optional[float]

Confidence score for the page parsing (0-1)

cost_optimized: Optional[bool]

Whether cost-optimized parsing was used for the page

original_orientation_angle: Optional[int]

Original orientation angle of the page in degrees

printed_page_number: Optional[str]

Printed page number as it appears in the document

slide_section_name: Optional[str]

Section name from presentation slides

speaker_notes: Optional[str]

Speaker notes from presentation slides

triggered_auto_mode: Optional[bool]

Whether auto mode was triggered for the page
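Page metadata can be used to flag pages worth a manual check. The sketch below filters on `confidence`; the function name and the 0.8 threshold are arbitrary assumptions, and pages without a confidence value are left out rather than treated as low.

```python
def low_confidence_pages(metadata, threshold=0.8):
    """Return page numbers whose confidence is below the threshold."""
    flagged = []
    for page in metadata.get("pages", []):
        confidence = page.get("confidence")  # optional, may be absent
        if confidence is not None and confidence < threshold:
            flagged.append(page["page_number"])
    return flagged


# Sample data with made-up values in the documented shape.
sample = {"pages": [
    {"page_number": 1, "confidence": 0.97},
    {"page_number": 2, "confidence": 0.42, "triggered_auto_mode": True},
    {"page_number": 3},
]}
flagged = low_confidence_pages(sample)
```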

raw_parameters: Optional[Dict[str, object]]

result_content_metadata: Optional[Dict[str, ResultContentMetadata]]

Metadata including size, existence, and presigned URLs for result files

size_bytes: int

Size of the result file in S3 (bytes)

exists: Optional[bool]

Whether the result file exists in S3

presigned_url: Optional[str]

Presigned URL to download the result file
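A sketch of picking out the result files that can actually be downloaded. The dictionary keys in the sample (`markdown`, `xlsx`) are hypothetical, since the source does not list the possible keys, and the function name is an assumption.

```python
def available_downloads(result_content_metadata):
    """Map each result name to its presigned URL.

    Entries explicitly marked exists=False, or lacking a URL, are skipped.
    """
    return {
        name: entry["presigned_url"]
        for name, entry in result_content_metadata.items()
        if entry.get("exists") is not False and entry.get("presigned_url")
    }


# Sample data with made-up keys and values.
sample = {
    "markdown": {"size_bytes": 2048, "exists": True,
                 "presigned_url": "https://example.com/result.md"},
    "xlsx": {"size_bytes": 0, "exists": False, "presigned_url": None},
}
downloads = available_downloads(sample)
```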

text: Optional[Text]

Plain text result (if requested)

pages: List[TextPage]

List of text pages

page_number: int

Page number of the document

text: str

Plain text content of the page

text_full: Optional[str]

Full raw text content (if requested)

Get Parse Job

import os
from llama_cloud import LlamaCloud

client = LlamaCloud(
    api_key=os.environ.get("LLAMA_CLOUD_API_KEY"),  # This is the default and can be omitted
)
parsing = client.parsing.get(
    job_id="job_id",
)
print(parsing.job)

Returns Examples

{
  "job": {
    "id": "id",
    "project_id": "project_id",
    "status": "PENDING",
    "created_at": "2019-12-27T18:11:19.117Z",
    "error_message": "error_message",
    "name": "name",
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "images_content_metadata": {
    "images": [
      {
        "filename": "filename",
        "index": 0,
        "bbox": {
          "h": 0,
          "w": 0,
          "x": 0,
          "y": 0
        },
        "category": "screenshot",
        "content_type": "content_type",
        "presigned_url": "presigned_url",
        "size_bytes": 0
      }
    ],
    "total_count": 0
  },
  "items": {
    "pages": [
      {
        "items": [
          {
            "md": "md",
            "value": "value",
            "bbox": [
              {
                "h": 0,
                "w": 0,
                "x": 0,
                "y": 0,
                "confidence": 0,
                "end_index": 0,
                "label": "label",
                "start_index": 0
              }
            ],
            "type": "text"
          }
        ],
        "page_height": 0,
        "page_number": 0,
        "page_width": 0,
        "success": true
      }
    ]
  },
  "job_metadata": {
    "foo": "bar"
  },
  "markdown": {
    "pages": [
      {
        "markdown": "markdown",
        "page_number": 0,
        "success": true,
        "footer": "footer",
        "header": "header"
      }
    ]
  },
  "markdown_full": "markdown_full",
  "metadata": {
    "pages": [
      {
        "page_number": 0,
        "confidence": 0,
        "cost_optimized": true,
        "original_orientation_angle": 0,
        "printed_page_number": "printed_page_number",
        "slide_section_name": "slide_section_name",
        "speaker_notes": "speaker_notes",
        "triggered_auto_mode": true
      }
    ]
  },
  "raw_parameters": {
    "foo": "bar"
  },
  "result_content_metadata": {
    "foo": {
      "size_bytes": 0,
      "exists": true,
      "presigned_url": "presigned_url"
    }
  },
  "text": {
    "pages": [
      {
        "page_number": 0,
        "text": "text"
      }
    ]
  },
  "text_full": "text_full"
}