
Get Parse Job

GET/api/v2/parse/{job_id}

Retrieve parse job with optional content or metadata.

Path Parameters
job_id: string
Query Parameters
expand: optional array of string

Fields to include in the response: text, markdown, items, metadata, text_content_metadata, markdown_content_metadata, items_content_metadata, metadata_content_metadata, xlsx_content_metadata, output_pdf_content_metadata, images_content_metadata. The *_content_metadata fields include presigned URLs for downloading the corresponding files.

image_filenames: optional string

Comma-separated list of image filenames to restrict the returned image metadata to. Example: image_0.png,image_1.jpg

organization_id: optional string
project_id: optional string
Cookie Parameters
session: optional string
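The request can also be built outside curl. The Python sketch below uses only the standard library and prepares the same authenticated GET as the curl example further down. It assumes, without confirmation from this page, that the expand array is sent as repeated expand= keys; build_get_parse_job_request is a hypothetical helper, and it takes image_filenames as a Python list and joins it into the comma-separated string described above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Base URL taken from the curl example on this page.
BASE_URL = "https://api.cloud.llamaindex.ai/api/v2/parse"


def build_get_parse_job_request(job_id, api_key, expand=(), image_filenames=(),
                                project_id=None, organization_id=None):
    """Build, but do not send, a GET /api/v2/parse/{job_id} request.

    Hypothetical helper; the parameter names mirror the query parameters above.
    """
    params = []
    # Assumption: the expand array is sent as repeated keys
    # (expand=text&expand=markdown).
    for field in expand:
        params.append(("expand", field))
    if image_filenames:
        # image_filenames is a single comma-separated string in the API.
        params.append(("image_filenames", ",".join(image_filenames)))
    if project_id:
        params.append(("project_id", project_id))
    if organization_id:
        params.append(("organization_id", organization_id))

    # Percent-encode the job ID so it is safe as a single path segment.
    url = f"{BASE_URL}/{quote(job_id, safe='')}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="GET",
                   headers={"Authorization": f"Bearer {api_key}"})
```

Passing the request to urllib.request.urlopen and decoding the body with json.load should yield the object described under Returns.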
Returns
job: object { id, project_id, status, 4 more }

Parse job status and metadata

id: string

Unique identifier for the parse job

project_id: string

Project this job belongs to

status: "PENDING" or "RUNNING" or "COMPLETED" or 2 more

Current status of the job

Accepts one of the following:
"PENDING"
"RUNNING"
"COMPLETED"
"FAILED"
"CANCELLED"
created_at: optional string

Creation datetime

format: date-time
error_message: optional string

Error message if job failed

name: optional string

User-friendly name

updated_at: optional string

Update datetime

format: date-time
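While a job is PENDING or RUNNING, a client typically calls this endpoint repeatedly until status reaches COMPLETED, FAILED, or CANCELLED. The following is a minimal polling sketch, not an official client: fetch_job is a hypothetical callable you supply (for example, one that sends the request from the curl example and decodes the JSON), and the 2-second interval and 300-second timeout are arbitrary choices.

```python
import time

# Statuses after which a job no longer changes, per the enum above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_parse_job(fetch_job, poll_interval=2.0, timeout=300.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status and return its job object.

    fetch_job is a caller-supplied callable that performs
    GET /api/v2/parse/{job_id} and returns the decoded JSON body.
    Raises RuntimeError if the job FAILED and TimeoutError if it is still
    PENDING or RUNNING when the timeout expires.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job()["job"]
        status = job["status"]
        if status == "FAILED":
            # error_message is optional, so fall back to a generic message.
            raise RuntimeError(job.get("error_message") or "parse job failed")
        if status in TERMINAL_STATUSES:
            return job  # COMPLETED or CANCELLED
        if clock() >= deadline:
            raise TimeoutError(f"job {job['id']} still {status} after {timeout}s")
        sleep(poll_interval)
```

Injecting sleep and clock keeps the loop testable without real delays.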
images_content_metadata: optional object { images, total_count }

Metadata for all extracted images.

images: array of object { filename, index, content_type, 2 more }

List of image metadata with presigned URLs

filename: string

Image filename (e.g., 'image_0.png')

index: number

Index of the image in the extraction order

content_type: optional string

MIME type of the image

presigned_url: optional string

Presigned URL to download the image

size_bytes: optional number

Size of the image file in bytes

total_count: number

Total number of extracted images
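A sketch of turning images_content_metadata into a download list, assuming the job was fetched with images_content_metadata in expand. image_download_plan and its wanted argument are hypothetical; the image_filenames query parameter performs similar filtering on the server side. Presigned URLs normally expire, so download the files soon after fetching the job.

```python
def image_download_plan(images_content_metadata, wanted=None):
    """Return (filename, presigned_url) pairs in extraction order.

    wanted is an optional set of filenames to keep; images without a
    presigned_url (the field is optional) are skipped.
    """
    images = sorted(images_content_metadata.get("images", []),
                    key=lambda img: img["index"])
    plan = []
    for img in images:
        if wanted is not None and img["filename"] not in wanted:
            continue
        url = img.get("presigned_url")
        if url:
            plan.append((img["filename"], url))
    return plan
```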

items: optional object { pages }

Structured JSON result (if requested)

pages: array of object { items, page_height, page_number, 2 more } or object { error, page_number, success }

List of structured pages or failed page entries

Accepts one of the following:
StructuredResultPage = object { items, page_height, page_number, 2 more }
items: array of object { md, value, bbox, type } or object { level, md, value, 2 more } or ListItem { items, md, ordered, 2 more } or 4 more

List of structured items on the page

Accepts one of the following:
Text = object { md, value, bbox, type }
md: string

Markdown representation preserving formatting

value: string

Text content

bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

type: optional "text"

Text item type

Heading = object { level, md, value, 2 more }
level: number

Heading level (1-6)

md: string

Markdown representation preserving formatting

value: string

Heading text content

bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

type: optional "heading"

Heading item type

ListItem = object { items, md, ordered, 2 more }
items: array of object { md, value, bbox, type } or ListItem { items, md, ordered, 2 more }

List of nested text or list items

Accepts one of the following:
TextItem = object { md, value, bbox, type }
md: string

Markdown representation preserving formatting

value: string

Text content

bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

type: optional "text"

Text item type

ListItem { items, md, ordered, 2 more }
md: string

Markdown representation preserving formatting

ordered: boolean

Whether the list is ordered or unordered

bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

type: optional "list"

List item type

Code = object { md, value, bbox, 2 more }
md: string

Markdown representation preserving formatting

value: string

Code content

bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

language: optional string

Programming language identifier

type: optional "code"

Code block item type

Table = object { csv, html, md, 5 more }
csv: string

CSV representation of the table

html: string

HTML representation of the table

md: string

Markdown representation preserving formatting

rows: array of array of string or number

Table data as array of arrays (string, number, or null)

Accepts one of the following:
UnionMember0 = string
UnionMember1 = number
bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

merged_from_pages: optional array of number

List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4])

merged_into_page: optional number

Set when this table was merged into another table: the page number where the full merged table begins. Tables that carry this field are left empty.

type: optional "table"

Table item type

Image = object { caption, md, url, 2 more }
caption: string

Image caption

md: string

Markdown representation preserving formatting

url: string

URL to the image

bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

type: optional "image"

Image item type

Link = object { md, text, url, 2 more }
md: string

Markdown representation preserving formatting

text: string

Display text of the link

url: string

URL of the link

bbox: optional array of BBox { h, w, x, 5 more }

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence: optional number

Confidence score

end_index: optional number

End index in the text

label: optional string

Label for the bounding box

start_index: optional number

Start index in the text

type: optional "link"

Link item type

page_height: number

Height of the page in points

page_number: number

Page number of the document

page_width: number

Width of the page in points

success: optional true

Success indicator

FailedStructuredPage = object { error, page_number, success }
error: string

Error message describing the failure

page_number: number

Page number of the document

success: optional boolean

Failure indicator
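Each page in items is either a StructuredResultPage or a FailedStructuredPage, and because type is optional on every item, code that walks the results cannot rely on it alone. The sketch below shows one way to handle both unions: failed pages are recognised by their required error field, item types fall back to inference from distinctive fields, nested ListItem entries are flattened recursively, and the empty placeholder tables left behind by cross-page merging are skipped. The helper names and the field-based inference are assumptions for illustration, not part of the API.

```python
def item_kind(item):
    """Best-effort item type: use type when present, else infer from fields.

    type is optional on every item, so the key-based fallback below is an
    assumption drawn from the field lists above.
    """
    if "type" in item:
        return item["type"]
    if "ordered" in item:
        return "list"
    if "rows" in item:
        return "table"
    if "caption" in item:
        return "image"
    if "level" in item:
        return "heading"
    if "language" in item:
        return "code"
    if "url" in item and "text" in item:
        return "link"
    return "text"  # Text and Code without language look identical


def flatten_list_item(list_item):
    """Return the value of every nested text item in a ListItem, depth-first."""
    values = []
    for child in list_item.get("items", []):
        if item_kind(child) == "list":
            values.extend(flatten_list_item(child))
        else:
            values.append(child["value"])
    return values


def collect_tables(items_result):
    """Return (page_number, table) pairs for complete tables on successful pages.

    Tables with merged_into_page set are skipped, since per the field
    description their content lives in the merged table on that page.
    """
    found = []
    for page in items_result["pages"]:
        if "error" in page:  # FailedStructuredPage: no items to read
            continue
        for item in page["items"]:
            if item_kind(item) == "table" and item.get("merged_into_page") is None:
                found.append((page["page_number"], item))
    return found
```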

markdown: optional object { pages }

Markdown result (if requested)

pages: array of object { markdown, page_number, success } or object { error, page_number, success }

List of markdown pages or failed page entries

Accepts one of the following:
MarkdownResultPage = object { markdown, page_number, success }
markdown: string

Markdown content of the page

page_number: number

Page number of the document

success: optional true

Success indicator

FailedMarkdownPage = object { error, page_number, success }
error: string

Error message describing the failure

page_number: number

Page number of the document

success: optional boolean

Failure indicator

markdown_full: optional string

Full raw markdown content (if requested)
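When the markdown result is requested, some pages may come back as FailedMarkdownPage entries. This hypothetical join_markdown helper concatenates the successful pages in page order and reports which pages failed; telling the two apart by the required error field is the same approach as for structured pages, and the blank-line separator is an arbitrary choice.

```python
def join_markdown(markdown_result, separator="\n\n"):
    """Join successful pages in page order; return (markdown, failed_page_numbers)."""
    ok, failed = [], []
    for page in markdown_result["pages"]:
        if "error" in page:  # FailedMarkdownPage
            failed.append(page["page_number"])
        else:  # MarkdownResultPage
            ok.append(page)
    ok.sort(key=lambda p: p["page_number"])
    return separator.join(p["markdown"] for p in ok), sorted(failed)
```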

metadata: optional object { pages }

Result containing page-level metadata for the parsed document.

pages: array of object { page_number, confidence, cost_optimized, 5 more }

List of page metadata entries

page_number: number

Page number of the document

confidence: optional number

Confidence score for the page parsing (0-1)

cost_optimized: optional boolean

Whether cost-optimized parsing was used for the page

original_orientation_angle: optional number

Original orientation angle of the page in degrees

printed_page_number: optional string

Printed page number as it appears in the document

slide_section_name: optional string

Section name from presentation slides

speaker_notes: optional string

Speaker notes from presentation slides

triggered_auto_mode: optional boolean

Whether auto mode was triggered for the page
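Page metadata lends itself to simple post-processing. As a sketch, the functions below flag pages whose parsing confidence falls under a threshold and collect speaker notes from presentation slides. The 0.8 default threshold is arbitrary, and both helper names are hypothetical.

```python
def pages_needing_review(metadata_result, min_confidence=0.8):
    """Return page numbers whose confidence is below min_confidence.

    confidence is optional, so pages that omit it are flagged too.
    """
    flagged = []
    for page in metadata_result["pages"]:
        confidence = page.get("confidence")
        if confidence is None or confidence < min_confidence:
            flagged.append(page["page_number"])
    return sorted(flagged)


def speaker_notes_by_page(metadata_result):
    """Map page number to speaker notes for pages (slides) that have them."""
    return {
        page["page_number"]: page["speaker_notes"]
        for page in metadata_result["pages"]
        if page.get("speaker_notes")
    }
```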

result_content_metadata: optional map[object { size_bytes, exists, presigned_url }]

Metadata including size, existence, and presigned URLs for result files

size_bytes: number

Size of the result file in S3 (bytes)

exists: optional boolean

Whether the result file exists in S3

presigned_url: optional string

Presigned URL to download the result file
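The keys of result_content_metadata are not documented on this page (the response example uses the placeholder foo), so this sketch treats them as opaque names. It keeps only entries that report exists and carry a presigned URL, and totals the size of the files that exist. Both helpers are hypothetical.

```python
def available_result_files(result_content_metadata):
    """Map each result name to its presigned URL, skipping unusable entries.

    exists and presigned_url are both optional; an entry is kept only when
    exists is true and a URL is present.
    """
    return {
        name: meta["presigned_url"]
        for name, meta in result_content_metadata.items()
        if meta.get("exists") and meta.get("presigned_url")
    }


def total_result_bytes(result_content_metadata):
    """Sum size_bytes over the entries whose result file exists."""
    return sum(meta["size_bytes"]
               for meta in result_content_metadata.values()
               if meta.get("exists"))
```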

text: optional object { pages }

Plain text result (if requested)

pages: array of object { page_number, text }

List of text pages

page_number: number

Page number of the document

text: string

Plain text content of the page
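With the text result, plain-text search across pages needs no markdown handling. A minimal, hypothetical find_pages helper that returns the numbers of pages containing a term, compared case-insensitively with str.casefold:

```python
def find_pages(text_result, term):
    """Return sorted page numbers whose plain text contains term, ignoring case."""
    needle = term.casefold()
    return sorted(
        page["page_number"]
        for page in text_result["pages"]
        if needle in page["text"].casefold()
    )
```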

Get Parse Job

curl https://api.cloud.llamaindex.ai/api/v2/parse/$JOB_ID \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"
Returns Examples
{
  "job": {
    "id": "id",
    "project_id": "project_id",
    "status": "PENDING",
    "created_at": "2019-12-27T18:11:19.117Z",
    "error_message": "error_message",
    "name": "name",
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "images_content_metadata": {
    "images": [
      {
        "filename": "filename",
        "index": 0,
        "content_type": "content_type",
        "presigned_url": "presigned_url",
        "size_bytes": 0
      }
    ],
    "total_count": 0
  },
  "items": {
    "pages": [
      {
        "items": [
          {
            "md": "md",
            "value": "value",
            "bbox": [
              {
                "h": 0,
                "w": 0,
                "x": 0,
                "y": 0,
                "confidence": 0,
                "end_index": 0,
                "label": "label",
                "start_index": 0
              }
            ],
            "type": "text"
          }
        ],
        "page_height": 0,
        "page_number": 0,
        "page_width": 0,
        "success": true
      }
    ]
  },
  "markdown": {
    "pages": [
      {
        "markdown": "markdown",
        "page_number": 0,
        "success": true
      }
    ]
  },
  "markdown_full": "markdown_full",
  "metadata": {
    "pages": [
      {
        "page_number": 0,
        "confidence": 0,
        "cost_optimized": true,
        "original_orientation_angle": 0,
        "printed_page_number": "printed_page_number",
        "slide_section_name": "slide_section_name",
        "speaker_notes": "speaker_notes",
        "triggered_auto_mode": true
      }
    ]
  },
  "result_content_metadata": {
    "foo": {
      "size_bytes": 0,
      "exists": true,
      "presigned_url": "presigned_url"
    }
  },
  "text": {
    "pages": [
      {
        "page_number": 0,
        "text": "text"
      }
    ]
  }
}