
Get Parse Job

client.parsing.get(jobID: string, query?: ParsingGetParams { expand, image_filenames, organization_id, project_id }, options?: RequestOptions): ParsingGetResponse { job, images_content_metadata, items, 5 more }
GET /api/v2/parse/{job_id}

Retrieve parse job with optional content or metadata.

Parameters
jobID: string
query: ParsingGetParams { expand, image_filenames, organization_id, project_id }
expand?: Array<string>

Fields to include: text, markdown, items, metadata, text_content_metadata, markdown_content_metadata, items_content_metadata, metadata_content_metadata, xlsx_content_metadata, output_pdf_content_metadata, images_content_metadata. Metadata fields include presigned URLs.

image_filenames?: string | null

Restrict the response to specific image filenames, given as a comma-separated string (optional). Example: image_0.png,image_1.jpg

organization_id?: string | null
project_id?: string | null
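The expand parameter takes an array of field names, and judging from the example above, image_filenames is a single comma-separated string. The sketch below assembles a query object under those assumptions. buildParseQuery, isExpandField, and the ParseQuery type are hypothetical helpers declared locally (they are not SDK exports); the field list is copied from the expand description.

```typescript
// Field names accepted by `expand`, copied from the parameter description.
const EXPANDABLE_FIELDS = [
  "text",
  "markdown",
  "items",
  "metadata",
  "text_content_metadata",
  "markdown_content_metadata",
  "items_content_metadata",
  "metadata_content_metadata",
  "xlsx_content_metadata",
  "output_pdf_content_metadata",
  "images_content_metadata",
] as const;

type ExpandField = (typeof EXPANDABLE_FIELDS)[number];

// Hypothetical local shape mirroring part of ParsingGetParams.
interface ParseQuery {
  expand?: ExpandField[];
  image_filenames?: string | null;
}

// Build a query object: de-duplicate expand fields (keeping first-seen order)
// and join image filenames into one comma-separated string (assumed format).
function buildParseQuery(fields: ExpandField[], imageFilenames: string[] = []): ParseQuery {
  const query: ParseQuery = {};
  if (fields.length > 0) {
    query.expand = Array.from(new Set(fields));
  }
  if (imageFilenames.length > 0) {
    query.image_filenames = imageFilenames.map((name) => name.trim()).join(",");
  }
  return query;
}

// Runtime guard for field names that arrive as untyped strings (e.g. from a CLI flag).
function isExpandField(value: string): value is ExpandField {
  return (EXPANDABLE_FIELDS as readonly string[]).includes(value);
}
```

Assuming the SDK accepts this shape as its query argument, a call might look like client.parsing.get(jobID, buildParseQuery(["markdown", "images_content_metadata"], ["image_0.png"])).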
Returns
ParsingGetResponse { job, images_content_metadata, items, 5 more }

Parse result response with job status and optional content or metadata.

The job field is always included. Other fields are included only when requested through the expand parameter.

job: Job { id, project_id, status, 4 more }

Parse job status and metadata

id: string

Unique identifier for the parse job

project_id: string

Project this job belongs to

status: "PENDING" | "RUNNING" | "COMPLETED" | 2 more

Current status of the job (e.g., pending, running, completed, failed, cancelled)

Accepts one of the following:
"PENDING"
"RUNNING"
"COMPLETED"
"FAILED"
"CANCELLED"
created_at?: string | null

Creation datetime

format: date-time
error_message?: string | null

Error message if job failed

name?: string | null

User-friendly name

updated_at?: string | null

Datetime of the last update

format: date-time
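Because a job may still be PENDING or RUNNING when it is fetched, callers typically poll until the status settles. Below is a minimal polling sketch that assumes COMPLETED, FAILED, and CANCELLED are terminal (the names suggest so, but this page does not state it). The fetcher is injected so the loop does not depend on the SDK; in practice it could be () => client.parsing.get(jobID). waitForJob, the default interval, and the attempt limit are hypothetical choices.

```typescript
type JobStatus = "PENDING" | "RUNNING" | "COMPLETED" | "FAILED" | "CANCELLED";

// Local subset of the documented `job` object.
interface Job {
  id: string;
  project_id: string;
  status: JobStatus;
  error_message?: string | null;
}

// Assumption: once a job reaches one of these statuses it will not change.
const TERMINAL_STATUSES: ReadonlySet<JobStatus> = new Set<JobStatus>(["COMPLETED", "FAILED", "CANCELLED"]);

function isTerminal(status: JobStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `fetchJob` until the job is terminal. FAILED becomes a thrown error
// carrying error_message; COMPLETED and CANCELLED are returned to the caller.
async function waitForJob<T extends { job: Job }>(
  fetchJob: () => Promise<T>,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchJob();
    const { status, error_message } = response.job;
    if (status === "FAILED") {
      throw new Error(`Parse job ${response.job.id} failed: ${error_message ?? "no error message"}`);
    }
    if (isTerminal(status)) {
      return response;
    }
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Parse job did not reach a terminal status after ${maxAttempts} attempts`);
}
```

Returning CANCELLED rather than throwing leaves that decision to the caller, which should check response.job.status before reading any expanded fields.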
images_content_metadata?: ImagesContentMetadata | null

Metadata for all extracted images.

images: Array<Image>

List of image metadata with presigned URLs

filename: string

Image filename (e.g., 'image_0.png')

index: number

Index of the image in the extraction order

content_type?: string | null

MIME type of the image

presigned_url?: string | null

Presigned URL to download the image

size_bytes?: number | null

Size of the image file in bytes

total_count: number

Total number of extracted images
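When images_content_metadata is expanded, each image carries an optional presigned_url. A small helper can turn that metadata into a download list ordered by extraction index, skipping images without a URL. imageDownloads and totalImageBytes are hypothetical helpers, and the types are declared locally from the fields documented above. Actual downloading is left out; presigned URLs generally expire, so they are best used soon after the response arrives.

```typescript
// Local mirror of the documented image metadata shapes.
interface ImageMeta {
  filename: string;
  index: number;
  content_type?: string | null;
  presigned_url?: string | null;
  size_bytes?: number | null;
}

interface ImagesContentMetadata {
  images: ImageMeta[];
  total_count: number;
}

interface ImageDownload {
  filename: string;
  url: string;
}

// Downloadable images in extraction order; entries without a URL are skipped.
function imageDownloads(meta: ImagesContentMetadata | null | undefined): ImageDownload[] {
  if (!meta) return [];
  return meta.images
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.index - b.index)
    .filter(
      (img): img is ImageMeta & { presigned_url: string } =>
        typeof img.presigned_url === "string" && img.presigned_url.length > 0,
    )
    .map((img) => ({ filename: img.filename, url: img.presigned_url }));
}

// Sum of known image sizes; a missing size_bytes counts as 0.
function totalImageBytes(meta: ImagesContentMetadata): number {
  return meta.images.reduce((sum, img) => sum + (img.size_bytes ?? 0), 0);
}
```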

items?: Items | null

Structured JSON result (if requested)

pages: Array<StructuredResultPage { items, page_height, page_number, 2 more } | FailedStructuredPage { error, page_number, success } >

List of structured pages or failed page entries

Accepts one of the following:
StructuredResultPage { items, page_height, page_number, 2 more }
items: Array<TextItem { md, value, bbox, type } | HeadingItem { level, md, value, 2 more } | ListItem { items, md, ordered, 2 more } | 4 more>

List of structured items on the page

Accepts one of the following:
TextItem { md, value, bbox, type }
md: string

Markdown representation preserving formatting

value: string

Text content

bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

type?: "text"

Text item type

HeadingItem { level, md, value, 2 more }
level: number

Heading level (1-6)

md: string

Markdown representation preserving formatting

value: string

Heading text content

bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

type?: "heading"

Heading item type

ListItem { items, md, ordered, 2 more }
items: Array<TextItem { md, value, bbox, type } | ListItem { items, md, ordered, 2 more } >

List of nested text or list items

Accepts one of the following:
TextItem { md, value, bbox, type }
md: string

Markdown representation preserving formatting

value: string

Text content

bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

type?: "text"

Text item type

ListItem { items, md, ordered, 2 more } (a nested list with the same shape as the enclosing ListItem, whose remaining fields follow)
md: string

Markdown representation preserving formatting

ordered: boolean

Whether the list is ordered or unordered

bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

type?: "list"

List item type

CodeItem { md, value, bbox, 2 more }
md: string

Markdown representation preserving formatting

value: string

Code content

bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

language?: string | null

Programming language identifier

type?: "code"

Code block item type

TableItem { csv, html, md, 5 more }
csv: string

CSV representation of the table

html: string

HTML representation of the table

md: string

Markdown representation preserving formatting

rows: Array<Array<string | number | null>>

Table data as an array of rows, where each row is an array of cell values (string, number, or null)

Accepts one of the following:
string
number
null
bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

merged_from_pages?: Array<number> | null

List of page numbers with tables that were merged into this table (e.g., [1, 2, 3, 4])

merged_into_page?: number | null

Page number where the full merged table begins. Populated only when this table was merged into another table, in which case it is set on the resulting empty table.

type?: "table"

Table item type

ImageItem { caption, md, url, 2 more }
caption: string

Image caption

md: string

Markdown representation preserving formatting

url: string

URL to the image

bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

type?: "image"

Image item type

LinkItem { md, text, url, 2 more }
md: string

Markdown representation preserving formatting

text: string

Display text of the link

url: string

URL of the link

bbox?: Array<BBox { h, w, x, 5 more } > | null

List of bounding boxes

h: number

Height of the bounding box

w: number

Width of the bounding box

x: number

X coordinate of the bounding box

y: number

Y coordinate of the bounding box

confidence?: number | null

Confidence score

end_index?: number | null

End index in the text

label?: string | null

Label for the bounding box

start_index?: number | null

Start index in the text

type?: "link"

Link item type

page_height: number

Height of the page in points

page_number: number

Page number of the document

page_width: number

Width of the page in points

success?: true

Success indicator

FailedStructuredPage { error, page_number, success }
error: string

Error message describing the failure

page_number: number

Page number of the document

success?: boolean

Failure indicator
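Because items.pages mixes successful and failed pages, and each page's items is a union of item kinds, consumers usually branch on the page's error field and on each item's type. The sketch below builds a heading outline, collects table CSVs while skipping tables that were merged into another page (merged_into_page set), and flattens nested lists recursively. It declares simplified local types covering only the fields it reads; since type is optional on every item, items without a type are simply ignored here. All helper names are hypothetical.

```typescript
// Simplified local types: only the fields this sketch reads.
interface TextItem { type?: "text"; md: string; value: string }
interface HeadingItem { type?: "heading"; level: number; md: string; value: string }
interface ListItem { type?: "list"; md: string; ordered: boolean; items: Array<TextItem | ListItem> }
interface TableItem { type?: "table"; csv: string; md: string; merged_into_page?: number | null }
type Item = TextItem | HeadingItem | ListItem | TableItem | { type?: "code" | "image" | "link"; md: string };

interface StructuredPage { page_number: number; items: Item[]; success?: true }
interface FailedPage { page_number: number; error: string; success?: boolean }
type Page = StructuredPage | FailedPage;

// Heading text from successful pages, indented two spaces per level below 1.
function outline(pages: Page[]): string[] {
  const lines: string[] = [];
  for (const page of pages) {
    if ("error" in page) continue; // failed page: no items to read
    for (const item of page.items) {
      if (item.type === "heading") {
        lines.push(`${"  ".repeat(Math.max(0, item.level - 1))}${item.value}`);
      }
    }
  }
  return lines;
}

// CSV of each table, skipping empty placeholders left behind by a cross-page merge.
function tableCsvs(pages: Page[]): Array<{ page: number; csv: string }> {
  const out: Array<{ page: number; csv: string }> = [];
  for (const page of pages) {
    if ("error" in page) continue;
    for (const item of page.items) {
      if (item.type === "table" && item.merged_into_page == null) {
        out.push({ page: page.page_number, csv: item.csv });
      }
    }
  }
  return out;
}

// Flatten a (possibly nested) list into indented plain-text lines.
function flattenList(list: ListItem, depth = 0): string[] {
  const lines: string[] = [];
  let n = 0; // numbering counts only text entries, not nested lists
  for (const child of list.items) {
    if ("items" in child) {
      lines.push(...flattenList(child, depth + 1));
    } else {
      n += 1;
      const bullet = list.ordered ? `${n}.` : "-";
      lines.push(`${"  ".repeat(depth)}${bullet} ${child.value}`);
    }
  }
  return lines;
}
```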

markdown?: Markdown | null

Markdown result (if requested)

pages: Array<MarkdownResultPage { markdown, page_number, success } | FailedMarkdownPage { error, page_number, success } >

List of markdown pages or failed page entries

Accepts one of the following:
MarkdownResultPage { markdown, page_number, success }
markdown: string

Markdown content of the page

page_number: number

Page number of the document

success?: true

Success indicator

FailedMarkdownPage { error, page_number, success }
error: string

Error message describing the failure

page_number: number

Page number of the document

success?: boolean

Failure indicator
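Since markdown.pages can also mix successful and failed entries, stitching a document together means deciding what to do with failures. One option, sketched below, joins successful pages in page_number order, leaves an HTML comment where a page failed, and returns the failed page numbers so the caller can retry or report them. The placeholder format and joinMarkdown are assumptions for illustration; if only the whole document is needed, markdown_full may already cover that case.

```typescript
interface MarkdownPage { page_number: number; markdown: string; success?: true }
interface FailedMarkdownPage { page_number: number; error: string; success?: boolean }

function joinMarkdown(pages: Array<MarkdownPage | FailedMarkdownPage>): { markdown: string; failedPages: number[] } {
  const ordered = pages.slice().sort((a, b) => a.page_number - b.page_number);
  const parts: string[] = [];
  const failedPages: number[] = [];
  for (const page of ordered) {
    if ("error" in page) {
      failedPages.push(page.page_number);
      // Break up "--" so the error text cannot terminate the HTML comment early.
      parts.push(`<!-- page ${page.page_number} failed: ${page.error.replace(/--/g, "- -")} -->`);
    } else {
      parts.push(page.markdown.trim());
    }
  }
  return { markdown: parts.join("\n\n"), failedPages };
}
```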

markdown_full?: string | null

Full raw markdown content (if requested)

metadata?: Metadata | null

Result containing page-level metadata for the parsed document.

pages: Array<Page>

List of page metadata entries

page_number: number

Page number of the document

confidence?: number | null

Confidence score for the page parsing (0-1)

cost_optimized?: boolean | null

Whether cost-optimized parsing was used for the page

original_orientation_angle?: number | null

Original orientation angle of the page in degrees

printed_page_number?: string | null

Printed page number as it appears in the document

slide_section_name?: string | null

Section name from presentation slides

speaker_notes?: string | null

Speaker notes from presentation slides

triggered_auto_mode?: boolean | null

Whether auto mode was triggered for the page
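The per-page metadata can drive simple quality checks, for example flagging pages whose confidence falls below a threshold, or where auto mode was triggered, for manual review. The 0.8 threshold and the pagesToReview and pageLabel helpers are hypothetical; pages with no confidence value are flagged as well, on the conservative assumption that a missing score should not count as a pass.

```typescript
interface PageMetadata {
  page_number: number;
  confidence?: number | null;
  triggered_auto_mode?: boolean | null;
  printed_page_number?: string | null;
}

interface ReviewFlag { page_number: number; reasons: string[] }

// Hypothetical review rule; the default threshold is an arbitrary example value.
function pagesToReview(pages: PageMetadata[], minConfidence = 0.8): ReviewFlag[] {
  const flags: ReviewFlag[] = [];
  for (const page of pages) {
    const reasons: string[] = [];
    if (page.confidence == null) {
      reasons.push("no confidence score");
    } else if (page.confidence < minConfidence) {
      reasons.push(`confidence ${page.confidence} < ${minConfidence}`);
    }
    if (page.triggered_auto_mode) reasons.push("auto mode triggered");
    if (reasons.length > 0) flags.push({ page_number: page.page_number, reasons });
  }
  return flags;
}

// Prefer the page number printed in the document (e.g. "iv"), else the sequential one.
function pageLabel(page: PageMetadata): string {
  return page.printed_page_number ?? String(page.page_number);
}
```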

result_content_metadata?: Record<string, ResultContentMetadata> | null

Metadata including size, existence, and presigned URLs for result files

size_bytes: number

Size of the result file in S3 (bytes)

exists?: boolean

Whether the result file exists in S3

presigned_url?: string | null

Presigned URL to download the result file
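result_content_metadata is a record keyed by result name; the example response only shows a placeholder key, so the real key names are not documented on this page. The sketch below keeps entries that are not marked missing and have a presigned URL, largest first. downloadableResults is a hypothetical helper, and treating an omitted exists as "not missing" is an assumption.

```typescript
interface ResultContentMetadata {
  size_bytes: number;
  exists?: boolean;
  presigned_url?: string | null;
}

interface ResultFile { key: string; url: string; sizeBytes: number }

// Downloadable results (exists not false, URL present), sorted by size descending.
function downloadableResults(record: Record<string, ResultContentMetadata> | null | undefined): ResultFile[] {
  if (!record) return [];
  return Object.entries(record)
    .filter(([, meta]) => meta.exists !== false && !!meta.presigned_url)
    .map(([key, meta]) => ({ key, url: meta.presigned_url as string, sizeBytes: meta.size_bytes }))
    .sort((a, b) => b.sizeBytes - a.sizeBytes);
}
```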

text?: Text | null

Plain text result (if requested)

pages: Array<Page>

List of text pages

page_number: number

Page number of the document

text: string

Plain text content of the page
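The plain-text result is convenient for lightweight searches. A hypothetical helper that returns, in ascending order, the page numbers whose text contains a term, ignoring case:

```typescript
interface TextPage { page_number: number; text: string }

// Page numbers (ascending) whose text contains `term`, case-insensitively.
function pagesMentioning(pages: TextPage[], term: string): number[] {
  const needle = term.toLowerCase();
  return pages
    .filter((page) => page.text.toLowerCase().includes(needle))
    .map((page) => page.page_number)
    .sort((a, b) => a - b);
}
```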

Get Parse Job

import LlamaCloud from '@llamaindex/llama-cloud';

const client = new LlamaCloud({
  apiKey: process.env['LLAMA_CLOUD_API_KEY'], // This is the default and can be omitted
});

const parsing = await client.parsing.get('job_id');

console.log(parsing.job);
Example response
{
  "job": {
    "id": "id",
    "project_id": "project_id",
    "status": "PENDING",
    "created_at": "2019-12-27T18:11:19.117Z",
    "error_message": "error_message",
    "name": "name",
    "updated_at": "2019-12-27T18:11:19.117Z"
  },
  "images_content_metadata": {
    "images": [
      {
        "filename": "filename",
        "index": 0,
        "content_type": "content_type",
        "presigned_url": "presigned_url",
        "size_bytes": 0
      }
    ],
    "total_count": 0
  },
  "items": {
    "pages": [
      {
        "items": [
          {
            "md": "md",
            "value": "value",
            "bbox": [
              {
                "h": 0,
                "w": 0,
                "x": 0,
                "y": 0,
                "confidence": 0,
                "end_index": 0,
                "label": "label",
                "start_index": 0
              }
            ],
            "type": "text"
          }
        ],
        "page_height": 0,
        "page_number": 0,
        "page_width": 0,
        "success": true
      }
    ]
  },
  "markdown": {
    "pages": [
      {
        "markdown": "markdown",
        "page_number": 0,
        "success": true
      }
    ]
  },
  "markdown_full": "markdown_full",
  "metadata": {
    "pages": [
      {
        "page_number": 0,
        "confidence": 0,
        "cost_optimized": true,
        "original_orientation_angle": 0,
        "printed_page_number": "printed_page_number",
        "slide_section_name": "slide_section_name",
        "speaker_notes": "speaker_notes",
        "triggered_auto_mode": true
      }
    ]
  },
  "result_content_metadata": {
    "foo": {
      "size_bytes": 0,
      "exists": true,
      "presigned_url": "presigned_url"
    }
  },
  "text": {
    "pages": [
      {
        "page_number": 0,
        "text": "text"
      }
    ]
  }
}