Data Sources

List Pipeline Data Sources
client.pipelines.dataSources.getDataSources(pipelineID: string, options?: RequestOptions): DataSourceGetDataSourcesResponse { id, component, data_source_id, 13 more }
GET /api/v1/pipelines/{pipeline_id}/data-sources
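
A minimal sketch of listing the data sources attached to a pipeline. It assumes a `client` instance that has already been constructed and authenticated; the pipeline ID is a placeholder.

// List the data sources attached to a pipeline.
const dataSources = await client.pipelines.dataSources.getDataSources(
  '123e4567-e89b-12d3-a456-426614174000', // pipeline_id (UUID placeholder)
);
console.log(dataSources);
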
Add Data Sources To Pipeline
client.pipelines.dataSources.updateDataSources(pipelineID: string, params: DataSourceUpdateDataSourcesParams { body }, options?: RequestOptions): DataSourceUpdateDataSourcesResponse { id, component, data_source_id, 13 more }
PUT /api/v1/pipelines/{pipeline_id}/data-sources
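
A hedged sketch of attaching a data source to a pipeline. The fields inside each `body` entry (data_source_id, sync_interval) are assumptions based on the PipelineDataSource schema below; check DataSourceUpdateDataSourcesParams for the exact shape.

// Attach or update data sources on a pipeline.
const updated = await client.pipelines.dataSources.updateDataSources(
  '123e4567-e89b-12d3-a456-426614174000', // pipeline_id (placeholder)
  {
    body: [
      {
        data_source_id: '00000000-0000-0000-0000-000000000000', // existing data source (placeholder)
        sync_interval: 43200, // assumed optional; see sync_interval in the model below
      },
    ],
  },
);
console.log(updated);
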
Update Pipeline Data Source
client.pipelines.dataSources.update(dataSourceID: string, params: DataSourceUpdateParams { pipeline_id, sync_interval }, options?: RequestOptions): PipelineDataSource { id, component, data_source_id, 13 more }
PUT /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}
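
A sketch of changing the sync schedule for a single pipeline data source; both IDs are placeholders and the interval value is illustrative.

// Update the sync interval of one data source in a pipeline.
const ds = await client.pipelines.dataSources.update(
  '00000000-0000-0000-0000-000000000000', // data_source_id (placeholder)
  {
    pipeline_id: '123e4567-e89b-12d3-a456-426614174000',
    sync_interval: 86400, // see sync_interval in the PipelineDataSource model
  },
);
console.log(ds.status, ds.last_synced_at);
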
Get Pipeline Data Source Status
client.pipelines.dataSources.getStatus(dataSourceID: string, params: DataSourceGetStatusParams { pipeline_id }, options?: RequestOptions): ManagedIngestionStatusResponse { status, deployment_date, effective_at, 2 more }
GET /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/status
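
A minimal sketch of checking ingestion status for a data source; IDs are placeholders.

// Check ingestion status for a data source within a pipeline.
const statusResponse = await client.pipelines.dataSources.getStatus(
  '00000000-0000-0000-0000-000000000000', // data_source_id (placeholder)
  { pipeline_id: '123e4567-e89b-12d3-a456-426614174000' },
);
console.log(statusResponse.status);
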
Sync Pipeline Data Source
client.pipelines.dataSources.sync(dataSourceID: string, params: DataSourceSyncParams { pipeline_id, pipeline_file_ids }, options?: RequestOptions): Pipeline { id, embedding_config, name, 15 more }
POST /api/v1/pipelines/{pipeline_id}/data-sources/{data_source_id}/sync
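
A hedged sketch of triggering a sync. Whether pipeline_file_ids is required, and what it should contain, is not stated here; check DataSourceSyncParams before relying on the commented-out line.

// Trigger a sync of a data source attached to a pipeline.
const pipeline = await client.pipelines.dataSources.sync(
  '00000000-0000-0000-0000-000000000000', // data_source_id (placeholder)
  {
    pipeline_id: '123e4567-e89b-12d3-a456-426614174000',
    // pipeline_file_ids: ['...'], // possibly used to limit the sync to specific files (assumption)
  },
);
console.log(pipeline.id, pipeline.name);
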
Models
PipelineDataSource { id, component, data_source_id, 13 more }

Schema for a data source in a pipeline.

id: string

Unique identifier

format: uuid
component: Record<string, unknown> | CloudS3DataSource { bucket, aws_access_id, aws_access_secret, 5 more } | CloudAzStorageBlobDataSource { account_url, container_name, account_key, 8 more } | 8 more

Component that implements the data source

Accepts one of the following:
Record<string, unknown>
CloudS3DataSource { bucket, aws_access_id, aws_access_secret, 5 more }
bucket: string

The name of the S3 bucket to read from.

aws_access_id?: string | null

The AWS access ID to use for authentication.

aws_access_secret?: string | null

The AWS access secret to use for authentication.

format: password
class_name?: string
prefix?: string | null

The prefix of the S3 objects to read from.

regex_pattern?: string | null

The regex pattern to filter S3 objects. Must be a valid regex pattern.

s3_endpoint_url?: string | null

The S3 endpoint URL to use for authentication.

supports_access_control?: boolean
CloudAzStorageBlobDataSource { account_url, container_name, account_key, 8 more }
account_url: string

The Azure Storage Blob account URL to use for authentication.

container_name: string

The name of the Azure Storage Blob container to read from.

account_key?: string | null

The Azure Storage Blob account key to use for authentication.

format: password
account_name?: string | null

The Azure Storage Blob account name to use for authentication.

blob?: string | null

The blob name to read from.

class_name?: string
client_id?: string | null

The Azure AD client ID to use for authentication.

client_secret?: string | null

The Azure AD client secret to use for authentication.

format: password
prefix?: string | null

The prefix of the Azure Storage Blob objects to read from.

supports_access_control?: boolean
tenant_id?: string | null

The Azure AD tenant ID to use for authentication.

CloudOneDriveDataSource { client_id, client_secret, tenant_id, 6 more }
client_id: string

The client ID to use for authentication.

client_secret: string

The client secret to use for authentication.

format: password
tenant_id: string

The tenant ID to use for authentication.

user_principal_name: string

The user principal name to use for authentication.

class_name?: string
folder_id?: string | null

The ID of the OneDrive folder to read from.

folder_path?: string | null

The path of the OneDrive folder to read from.

required_exts?: Array<string> | null

The list of required file extensions.

supports_access_control?: true
CloudSharepointDataSource { client_id, client_secret, tenant_id, 11 more }
client_id: string

The client ID to use for authentication.

client_secret: string

The client secret to use for authentication.

format: password
tenant_id: string

The tenant ID to use for authentication.

class_name?: string
drive_name?: string | null

The name of the SharePoint drive to read from.

exclude_path_patterns?: Array<string> | null

List of regex patterns for file paths to exclude. Files whose paths (including filename) match any pattern will be excluded. Example: ['/temp/', '/backup/', '.git/', '.tmp$', '^~']

folder_id?: string | null

The ID of the SharePoint folder to read from.

folder_path?: string | null

The path of the SharePoint folder to read from.

get_permissions?: boolean | null

Whether to get permissions for the SharePoint site.

include_path_patterns?: Array<string> | null

List of regex patterns for file paths to include. Full paths (including filename) must match at least one pattern to be included. Example: ['/reports/', '/docs/..pdf$', '^Report..pdf$']

required_exts?: Array<string> | null

The list of required file extensions.

site_id?: string | null

The ID of the SharePoint site to download from.

site_name?: string | null

The name of the SharePoint site to download from.

supports_access_control?: true
CloudSlackDataSource { slack_token, channel_ids, channel_patterns, 6 more }
slack_token: string

Slack Bot Token.

format: password
channel_ids?: string | null

The Slack channel IDs to read from.

channel_patterns?: string | null

The Slack channel name pattern to match.

class_name?: string
earliest_date?: string | null

The earliest date of messages to include.

earliest_date_timestamp?: number | null

Earliest date timestamp.

latest_date?: string | null

The latest date of messages to include.

latest_date_timestamp?: number | null

Latest date timestamp.

supports_access_control?: boolean
CloudNotionPageDataSource { integration_token, class_name, database_ids, 2 more }
integration_token: string

The integration token to use for authentication.

format: password
class_name?: string
database_ids?: string | null

The Notion database IDs to read content from.

page_ids?: string | null

The Notion page IDs to read from.

supports_access_control?: boolean
CloudConfluenceDataSource { authentication_mechanism, server_url, api_token, 10 more }
authentication_mechanism: string

Type of Authentication for connecting to Confluence APIs.

server_url: string

The server URL of the Confluence instance.

api_token?: string | null

The API token to use for authentication.

format: password
class_name?: string
cql?: string | null

The CQL query to use for fetching pages.

failure_handling?: FailureHandlingConfig { skip_list_failures }

Configuration for handling failures during processing. Key-value object controlling failure handling behaviors.

Example: { "skip_list_failures": true }

Currently supports:

  • skip_list_failures: Skip failed batches/lists and continue processing
skip_list_failures?: boolean

Whether to skip failed batches/lists and continue processing.

index_restricted_pages?: boolean

Whether to index restricted pages.

keep_markdown_format?: boolean

Whether to keep the markdown format.

label?: string | null

The label to use for fetching pages.

page_ids?: string | null

The Confluence page IDs to read from.

space_key?: string | null

The space key to read from.

supports_access_control?: boolean
user_name?: string | null

The username to use for authentication.

CloudJiraDataSource { authentication_mechanism, query, api_token, 5 more }

Cloud Jira Data Source integrating JiraReader.

authentication_mechanism: string

Type of Authentication for connecting to Jira APIs.

query: string

JQL (Jira Query Language) query to search.

api_token?: string | null

The API/access token used for Basic, PAT, and OAuth2 authentication.

format: password
class_name?: string
cloud_id?: string | null

The cloud ID, used in case of OAuth2.

email?: string | null

The email address to use for authentication.

server_url?: string | null

The server URL for Jira Cloud.

supports_access_control?: boolean
CloudJiraDataSourceV2 { authentication_mechanism, query, server_url, 10 more }

Cloud Jira Data Source integrating JiraReaderV2.

authentication_mechanism: string

Type of Authentication for connecting to Jira APIs.

query: string

JQL (Jira Query Language) query to search.

server_url: string

The server URL for Jira Cloud.

api_token?: string | null

The API access token used for Basic, PAT, and OAuth2 authentication.

format: password
api_version?: "2" | "3"

Jira REST API version to use (2 or 3); version 3 supports Atlassian Document Format (ADF).

Accepts one of the following:
"2"
"3"
class_name?: string
cloud_id?: string | null

The cloud ID, used in case of OAuth2.

email?: string | null

The email address to use for authentication.

expand?: string | null

Fields to expand in the response.

fields?: Array<string> | null

List of fields to retrieve from Jira. If None, retrieves all fields.

get_permissions?: boolean

Whether to fetch project role permissions and issue-level security.

requests_per_minute?: number | null

Rate limit for Jira API requests per minute.

supports_access_control?: boolean
CloudBoxDataSource { authentication_mechanism, class_name, client_id, 6 more }
authentication_mechanism: "developer_token" | "ccg"

The type of authentication to use (Developer Token or CCG).

Accepts one of the following:
"developer_token"
"ccg"
class_name?: string
client_id?: string | null

The Box API key used to identify the application the user is authenticating with.

client_secret?: string | null

Box API secret used for making auth requests.

format: password
developer_token?: string | null

Developer token for authentication if authentication_mechanism is 'developer_token'.

format: password
enterprise_id?: string | null

The Box Enterprise ID; if provided, authentication is performed as a service.

folder_id?: string | null

The ID of the Box folder to read from.

supports_access_control?: boolean
user_id?: string | null

The Box user ID; if provided, authentication is performed as that user.

data_source_id: string

The ID of the data source.

format: uuid
last_synced_at: string

The last time the data source was automatically synced.

format: date-time
name: string

The name of the data source.

pipeline_id: string

The ID of the pipeline.

format: uuid
project_id: string
source_type: "S3" | "AZURE_STORAGE_BLOB" | "GOOGLE_DRIVE" | 8 more
Accepts one of the following:
"S3"
"AZURE_STORAGE_BLOB"
"GOOGLE_DRIVE"
"MICROSOFT_ONEDRIVE"
"MICROSOFT_SHAREPOINT"
"SLACK"
"NOTION_PAGE"
"CONFLUENCE"
"JIRA"
"JIRA_V2"
"BOX"
created_at?: string | null

Creation datetime

format: date-time
custom_metadata?: Record<string, Record<string, unknown> | Array<unknown> | string | 2 more | null> | null

Custom metadata that will be present on all data loaded from the data source.

Accepts one of the following:
Record<string, unknown>
Array<unknown>
string
number
boolean
status?: "NOT_STARTED" | "IN_PROGRESS" | "SUCCESS" | 2 more | null

The status of the data source in the pipeline.

Accepts one of the following:
"NOT_STARTED"
"IN_PROGRESS"
"SUCCESS"
"ERROR"
"CANCELLED"
status_updated_at?: string | null

The last time the status was updated.

format: date-time
sync_interval?: number | null

The interval at which the data source should be synced.

sync_schedule_set_by?: string | null

The ID of the user who set the sync schedule.

updated_at?: string | null

Update datetime

format: date-time
version_metadata?: DataSourceReaderVersionMetadata { reader_version } | null

Version metadata for the data source.

reader_version?: "1.0" | "2.0" | "2.1" | null

The version of the reader to use for this data source.

Accepts one of the following:
"1.0"
"2.0"
"2.1"