Extraction

Extract Stateless

Deprecated

extraction.run() -> ExtractJob

POST /api/v1/extraction/run

Extraction > Jobs

List Jobs

extraction.jobs.list() -> JobListResponse

GET /api/v1/extraction/jobs

Run Job

extraction.jobs.create() -> ExtractJob

POST /api/v1/extraction/jobs

Get Job

extraction.jobs.get() -> ExtractJob

GET /api/v1/extraction/jobs/{job_id}

Run Job On File

extraction.jobs.file() -> ExtractJob

POST /api/v1/extraction/jobs/file

Get Job Result

extraction.jobs.get_result(…) -> JobGetResultResponse

GET /api/v1/extraction/jobs/{job_id}/result
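
The two parameterized job routes above share one pattern: a fixed prefix with a `{job_id}` path segment. As a minimal sketch, the helpers below fill that placeholder and percent-encode the id. The function names are hypothetical and not part of the SDK, and no base URL is assumed.

```python
from urllib.parse import quote

# Route templates copied from the endpoint list above.
JOB_PATH = "/api/v1/extraction/jobs/{job_id}"
JOB_RESULT_PATH = "/api/v1/extraction/jobs/{job_id}/result"


def job_path(job_id: str) -> str:
    """Return the path for fetching one job (hypothetical helper)."""
    # safe="" also encodes "/", so the id cannot add extra path segments.
    return JOB_PATH.format(job_id=quote(job_id, safe=""))


def job_result_path(job_id: str) -> str:
    """Return the path for fetching one job's result (hypothetical helper)."""
    return JOB_RESULT_PATH.format(job_id=quote(job_id, safe=""))
```

The SDK methods listed above presumably build these paths internally; the sketch is only useful when calling the REST routes directly.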

Models

class ExtractJob: …

Schema for an extraction job.

id: str

The id of the extraction job

format: uuid

extraction_agent: ExtractAgent

The agent that the job was run on.

id: str

The id of the extraction agent.

format: uuid

config: ExtractConfig

The configuration parameters for the extraction agent.

chunk_mode: Optional[Literal["PAGE", "SECTION"]]

The mode to use for chunking the document.

Accepts one of the following:

"PAGE"

"SECTION"

citation_bbox: Optional[bool] (Deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources: Optional[bool]

Whether to cite sources for the extraction.

confidence_scores: Optional[bool]

Whether to fetch confidence scores for the extraction.

extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:

Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]

Extract model options.

Accepts one of the following:

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"gemini-2.0-flash"

"gemini-2.5-flash"

"gemini-2.5-flash-lite"

"gemini-2.5-pro"

"openai-gpt-4o"

"openai-gpt-4o-mini"

str

extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:

"FAST"

"BALANCED"

"PREMIUM"

"MULTIMODAL"

extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]

The extraction target specified.

Accepts one of the following:

"PER_DOC"

"PER_PAGE"

"PER_TABLE_ROW"

high_resolution_mode: Optional[bool]

Whether to use high resolution mode for the extraction.

invalidate_cache: Optional[bool]

Whether to invalidate the cache for the extraction.

multimodal_fast_mode: Optional[bool]

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context: Optional[int]

Number of pages to pass as context on long document extraction.

minimum: 1

page_range: Optional[str]

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]

Public model names.

Accepts one of the following:

"openai-gpt-4o"

"openai-gpt-4o-mini"

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"openai-gpt-5-nano"

"openai-text-embedding-3-large"

"openai-text-embedding-3-small"

"openai-whisper-1"

"anthropic-sonnet-3.5"

"anthropic-sonnet-3.5-v2"

"anthropic-sonnet-3.7"

"anthropic-sonnet-4.0"

"anthropic-sonnet-4.5"

"anthropic-haiku-3.5"

"anthropic-haiku-4.5"

"gemini-2.5-flash"

"gemini-3.0-pro"

"gemini-2.5-pro"

"gemini-2.0-flash"

"gemini-2.0-flash-lite"

"gemini-2.5-flash-lite"

"gemini-1.5-flash"

"gemini-1.5-pro"

priority: Optional[Literal["low", "medium", "high", "critical"]]

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

system_prompt: Optional[str]

The system prompt to use for the extraction.

use_reasoning: Optional[bool]

Whether to use reasoning for the extraction.

data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]

The schema of the data.

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

name: str

The name of the extraction agent.

project_id: str

The ID of the project that the extraction agent belongs to.

format: uuid

created_at: Optional[datetime]

The creation time of the extraction agent.

format: date-time

custom_configuration: Optional[Literal["default"]]

Custom configuration type for the extraction agent. Currently supports 'default'.

updated_at: Optional[datetime]

The last update time of the extraction agent.

format: date-time

status: Literal["PENDING", "SUCCESS", "ERROR", 2 more]

The status of the extraction job

Accepts one of the following:

"PENDING"

"SUCCESS"

"ERROR"

"PARTIAL_SUCCESS"

"CANCELLED"

error: Optional[str]

The error that occurred during extraction

file: Optional[File] (Deprecated)

Schema for a file.

id: str

Unique identifier

format: uuid

project_id: str

The ID of the project that the file belongs to

format: uuid

created_at: Optional[datetime]

Creation datetime

format: date-time

data_source_id: Optional[str]

The ID of the data source that the file belongs to

format: uuid

expires_at: Optional[datetime]

The expiration date for the file. Files past this date can be deleted.

format: date-time

external_file_id: Optional[str]

The ID of the file in the external system

file_size: Optional[int]

Size of the file in bytes

minimum: 0

file_type: Optional[str]

File type (e.g. pdf, docx, etc.)

maxLength: 3000

minLength: 1

last_modified_at: Optional[datetime]

The last modified time of the file

format: date-time

permission_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

Permission information for the file

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

purpose: Optional[str]

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

Resource information for the file

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

updated_at: Optional[datetime]

Update datetime

format: date-time

file_id: Optional[str]

The id of the file that the data was extracted from

format: uuid
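
Because an ExtractJob carries a `status` field, a client typically polls until the job leaves `PENDING`. The sketch below shows one way to do that. It takes a caller-supplied `fetch_status` callable instead of calling the SDK directly, since the arguments of `extraction.jobs.get` are not shown in this reference. Treating every status other than `PENDING` as terminal is an assumption, and `wait_for_job` is a hypothetical name.

```python
import time
from typing import Callable

# Status values listed for ExtractJob above. Treating all of these as
# terminal is an assumption of this sketch.
TERMINAL_STATUSES = {"SUCCESS", "ERROR", "PARTIAL_SUCCESS", "CANCELLED"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns a terminal status."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = "PENDING"
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Skip the sleep after the final check.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job still {status!r} after {max_attempts} checks")
```

In practice `fetch_status` would wrap a call to the Get Job endpoint and return the job's `status`; the `sleep` parameter exists so the loop can be exercised without real delays.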

class WebhookConfiguration: …

Allows the user to configure webhook options for notifications and callbacks.

webhook_events: Optional[List[Literal["extract.pending", "extract.success", "extract.error", 14 more]]]

List of event names to subscribe to

Accepts one of the following:

"extract.pending"

"extract.success"

"extract.error"

"extract.partial_success"

"extract.cancelled"

"parse.pending"

"parse.running"

"parse.success"

"parse.error"

"parse.partial_success"

"parse.cancelled"

"classify.pending"

"classify.success"

"classify.error"

"classify.partial_success"

"classify.cancelled"

"unmapped_event"

webhook_headers: Optional[Dict[str, str]]

Custom HTTP headers to include with webhook requests.

webhook_output_format: Optional[str]

The output format to use for the webhook. Defaults to string if none supplied. Currently supported values: string, json

webhook_url: Optional[str]

The URL to send webhook notifications to.
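
To illustrate how the four WebhookConfiguration fields fit together, here is a minimal sketch that assembles them into a plain dictionary and rejects unknown values. It uses a dict rather than the SDK class, restricts events to the `extract.*` names as a design choice, and uses a placeholder URL; `build_webhook_config` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# The extract.* event names from the list above (a deliberate subset).
EXTRACT_EVENTS = {
    "extract.pending",
    "extract.success",
    "extract.error",
    "extract.partial_success",
    "extract.cancelled",
}
# Output formats named in the webhook_output_format description.
OUTPUT_FORMATS = {"string", "json"}


def build_webhook_config(
    url: str,
    events: List[str],
    headers: Optional[Dict[str, str]] = None,
    output_format: str = "string",
) -> dict:
    """Assemble a webhook configuration dict (hypothetical helper)."""
    unknown = sorted(set(events) - EXTRACT_EVENTS)
    if unknown:
        raise ValueError(f"unknown webhook events: {unknown}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    return {
        "webhook_url": url,
        "webhook_events": list(events),
        "webhook_headers": dict(headers or {}),
        "webhook_output_format": output_format,
    }


# Placeholder URL and header value, for illustration only.
config = build_webhook_config(
    "https://example.com/hooks/extract",
    ["extract.success", "extract.error"],
    headers={"X-Example-Token": "placeholder"},
    output_format="json",
)
```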

Extraction > Runs

List Extract Runs

extraction.runs.list() -> SyncPaginatedExtractRuns[ExtractRun]

GET /api/v1/extraction/runs

Get Run

extraction.runs.get(…) -> ExtractRun

GET /api/v1/extraction/runs/{run_id}

Delete Extraction Run

extraction.runs.delete(…) -> object

DELETE /api/v1/extraction/runs/{run_id}

Get Run By Job Id

extraction.runs.get_by_job(…) -> ExtractRun

GET /api/v1/extraction/runs/by-job/{job_id}

Models

class ExtractConfig: …

Configuration parameters for the extraction agent.

chunk_mode: Optional[Literal["PAGE", "SECTION"]]

The mode to use for chunking the document.

Accepts one of the following:

"PAGE"

"SECTION"

citation_bbox: Optional[bool] (Deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources: Optional[bool]

Whether to cite sources for the extraction.

confidence_scores: Optional[bool]

Whether to fetch confidence scores for the extraction.

extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:

Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]

Extract model options.

Accepts one of the following:

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"gemini-2.0-flash"

"gemini-2.5-flash"

"gemini-2.5-flash-lite"

"gemini-2.5-pro"

"openai-gpt-4o"

"openai-gpt-4o-mini"

str

extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:

"FAST"

"BALANCED"

"PREMIUM"

"MULTIMODAL"

extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]

The extraction target specified.

Accepts one of the following:

"PER_DOC"

"PER_PAGE"

"PER_TABLE_ROW"

high_resolution_mode: Optional[bool]

Whether to use high resolution mode for the extraction.

invalidate_cache: Optional[bool]

Whether to invalidate the cache for the extraction.

multimodal_fast_mode: Optional[bool]

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context: Optional[int]

Number of pages to pass as context on long document extraction.

minimum: 1

page_range: Optional[str]

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]

Public model names.

Accepts one of the following:

"openai-gpt-4o"

"openai-gpt-4o-mini"

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"openai-gpt-5-nano"

"openai-text-embedding-3-large"

"openai-text-embedding-3-small"

"openai-whisper-1"

"anthropic-sonnet-3.5"

"anthropic-sonnet-3.5-v2"

"anthropic-sonnet-3.7"

"anthropic-sonnet-4.0"

"anthropic-sonnet-4.5"

"anthropic-haiku-3.5"

"anthropic-haiku-4.5"

"gemini-2.5-flash"

"gemini-3.0-pro"

"gemini-2.5-pro"

"gemini-2.0-flash"

"gemini-2.0-flash-lite"

"gemini-2.5-flash-lite"

"gemini-1.5-flash"

"gemini-1.5-pro"

priority: Optional[Literal["low", "medium", "high", "critical"]]

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

system_prompt: Optional[str]

The system prompt to use for the extraction.

use_reasoning: Optional[bool]

Whether to use reasoning for the extraction.
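
The `page_range` field of ExtractConfig packs single pages and inclusive ranges into one string. As a rough sketch of that format, the function below expands such a string into a sorted list of 1-based page numbers. How the service handles whitespace, duplicates, or reversed ranges is not stated in this reference; here whitespace is ignored, duplicates are merged, and reversed ranges are rejected, all as assumptions. `parse_page_range` is a hypothetical helper, not an SDK function.

```python
from typing import List


def parse_page_range(page_range: str) -> List[int]:
    """Expand a string such as '1,3,5-7,9' into sorted 1-based page numbers."""
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty entry in page_range")
        if "-" in part:
            # A range such as '5-7' is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_range("1,3,5-7,9"))  # [1, 3, 5, 6, 7, 9]
print(parse_page_range("1-3,8-10"))   # [1, 2, 3, 8, 9, 10]
```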

class ExtractRun: …

Schema for an extraction run.

id: str

The id of the extraction run

format: uuid

config: ExtractConfig

The config used for extraction

chunk_mode: Optional[Literal["PAGE", "SECTION"]]

The mode to use for chunking the document.

Accepts one of the following:

"PAGE"

"SECTION"

citation_bbox: Optional[bool] (Deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources: Optional[bool]

Whether to cite sources for the extraction.

confidence_scores: Optional[bool]

Whether to fetch confidence scores for the extraction.

extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:

Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]

Extract model options.

Accepts one of the following:

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"gemini-2.0-flash"

"gemini-2.5-flash"

"gemini-2.5-flash-lite"

"gemini-2.5-pro"

"openai-gpt-4o"

"openai-gpt-4o-mini"

str

extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:

"FAST"

"BALANCED"

"PREMIUM"

"MULTIMODAL"

extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]

The extraction target specified.

Accepts one of the following:

"PER_DOC"

"PER_PAGE"

"PER_TABLE_ROW"

high_resolution_mode: Optional[bool]

Whether to use high resolution mode for the extraction.

invalidate_cache: Optional[bool]

Whether to invalidate the cache for the extraction.

multimodal_fast_mode: Optional[bool]

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context: Optional[int]

Number of pages to pass as context on long document extraction.

minimum: 1

page_range: Optional[str]

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]

Public model names.

Accepts one of the following:

"openai-gpt-4o"

"openai-gpt-4o-mini"

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"openai-gpt-5-nano"

"openai-text-embedding-3-large"

"openai-text-embedding-3-small"

"openai-whisper-1"

"anthropic-sonnet-3.5"

"anthropic-sonnet-3.5-v2"

"anthropic-sonnet-3.7"

"anthropic-sonnet-4.0"

"anthropic-sonnet-4.5"

"anthropic-haiku-3.5"

"anthropic-haiku-4.5"

"gemini-2.5-flash"

"gemini-3.0-pro"

"gemini-2.5-pro"

"gemini-2.0-flash"

"gemini-2.0-flash-lite"

"gemini-2.5-flash-lite"

"gemini-1.5-flash"

"gemini-1.5-pro"

priority: Optional[Literal["low", "medium", "high", "critical"]]

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

system_prompt: Optional[str]

The system prompt to use for the extraction.

use_reasoning: Optional[bool]

Whether to use reasoning for the extraction.

data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]

The schema used for extraction

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

extraction_agent_id: str

The id of the extraction agent

format: uuid

from_ui: bool

Whether this extraction run was triggered from the UI

project_id: str

The id of the project that the extraction run belongs to

format: uuid

status: Literal["CREATED", "PENDING", "SUCCESS", "ERROR"]

The status of the extraction run

Accepts one of the following:

"CREATED"

"PENDING"

"SUCCESS"

"ERROR"

created_at: Optional[datetime]

Creation datetime

format: date-time

data: Optional[Union[Dict[str, Union[Dict[str, object], List[object], str, 3 more]], List[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]], null]]

The data extracted from the file

Accepts one of the following:

Dict[str, Union[Dict[str, object], List[object], str, 3 more]]

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

List[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

error: Optional[str]

The error that occurred during extraction

extraction_metadata: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

The metadata extracted from the file

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

file: Optional[File] (Deprecated)

Schema for a file.

id: str

Unique identifier

format: uuid

project_id: str

The ID of the project that the file belongs to

format: uuid

created_at: Optional[datetime]

Creation datetime

format: date-time

data_source_id: Optional[str]

The ID of the data source that the file belongs to

format: uuid

expires_at: Optional[datetime]

The expiration date for the file. Files past this date can be deleted.

format: date-time

external_file_id: Optional[str]

The ID of the file in the external system

file_size: Optional[int]

Size of the file in bytes

minimum: 0

file_type: Optional[str]

File type (e.g. pdf, docx, etc.)

maxLength: 3000

minLength: 1

last_modified_at: Optional[datetime]

The last modified time of the file

format: date-time

permission_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

Permission information for the file

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

purpose: Optional[str]

The intended purpose of the file (e.g., 'user_data', 'parse', 'extract', 'split', 'classify')

resource_info: Optional[Dict[str, Union[Dict[str, object], List[object], str, 3 more]]]

Resource information for the file

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

updated_at: Optional[datetime]

Update datetime

format: date-time

file_id: Optional[str]

The id of the file that the data was extracted from

format: uuid

job_id: Optional[str]

The id of the job that the extraction run belongs to

format: uuid

updated_at: Optional[datetime]

Update datetime

format: date-time
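
The `data` field of ExtractRun may hold a single dictionary, a list of dictionaries, or null, so code that consumes it has to handle all three shapes. A minimal sketch of one way to normalize them into a list follows. Which shape goes with which `extraction_target` is not stated in this reference, so the helper makes no assumption about that; `as_records` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Union

# Mirrors the documented type of ExtractRun.data.
ExtractedData = Optional[Union[Dict[str, Any], List[Dict[str, Any]]]]


def as_records(data: ExtractedData) -> List[Dict[str, Any]]:
    """Return extracted data as a list of dicts, whatever shape it came in."""
    if data is None:
        return []
    if isinstance(data, dict):
        # A single result becomes a one-element list.
        return [data]
    return list(data)
```

Downstream code can then iterate over `as_records(run_data)` without branching on the shape each time.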

Extraction > Extraction Agents

Create Extraction Agent

extraction.extraction_agents.create() -> ExtractAgent

POST /api/v1/extraction/extraction-agents

List Extraction Agents

extraction.extraction_agents.list() -> ExtractionAgentListResponse

GET /api/v1/extraction/extraction-agents

Get Extraction Agent

extraction.extraction_agents.get() -> ExtractAgent

GET /api/v1/extraction/extraction-agents/{extraction_agent_id}

Delete Extraction Agent

extraction.extraction_agents.delete() -> object

DELETE /api/v1/extraction/extraction-agents/{extraction_agent_id}

Update Extraction Agent

extraction.extraction_agents.update(…) -> ExtractAgent

PUT /api/v1/extraction/extraction-agents/{extraction_agent_id}

Models

class ExtractAgent: …

Schema and configuration for an extraction agent.

id: str

The id of the extraction agent.

format: uuid

config: ExtractConfig

The configuration parameters for the extraction agent.

chunk_mode: Optional[Literal["PAGE", "SECTION"]]

The mode to use for chunking the document.

Accepts one of the following:

"PAGE"

"SECTION"

citation_bbox: Optional[bool] (Deprecated)

Whether to fetch citation bounding boxes for the extraction. Only available in PREMIUM mode. Deprecated: this is now synonymous with cite_sources.

cite_sources: Optional[bool]

Whether to cite sources for the extraction.

confidence_scores: Optional[bool]

Whether to fetch confidence scores for the extraction.

extract_model: Optional[Union[Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more], str, null]]

The extract model to use for data extraction. If not provided, uses the default for the extraction mode.

Accepts one of the following:

Literal["openai-gpt-4-1", "openai-gpt-4-1-mini", "openai-gpt-4-1-nano", 8 more]

Extract model options.

Accepts one of the following:

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"gemini-2.0-flash"

"gemini-2.5-flash"

"gemini-2.5-flash-lite"

"gemini-2.5-pro"

"openai-gpt-4o"

"openai-gpt-4o-mini"

str

extraction_mode: Optional[Literal["FAST", "BALANCED", "PREMIUM", "MULTIMODAL"]]

The extraction mode specified (FAST, BALANCED, MULTIMODAL, PREMIUM).

Accepts one of the following:

"FAST"

"BALANCED"

"PREMIUM"

"MULTIMODAL"

extraction_target: Optional[Literal["PER_DOC", "PER_PAGE", "PER_TABLE_ROW"]]

The extraction target specified.

Accepts one of the following:

"PER_DOC"

"PER_PAGE"

"PER_TABLE_ROW"

high_resolution_mode: Optional[bool]

Whether to use high resolution mode for the extraction.

invalidate_cache: Optional[bool]

Whether to invalidate the cache for the extraction.

multimodal_fast_mode: Optional[bool]

DEPRECATED: Whether to use fast mode for multimodal extraction.

num_pages_context: Optional[int]

Number of pages to pass as context on long document extraction.

minimum: 1

page_range: Optional[str]

Comma-separated list of page numbers or ranges to extract from (1-based, e.g., '1,3,5-7,9' or '1-3,8-10').

parse_model: Optional[Literal["openai-gpt-4o", "openai-gpt-4o-mini", "openai-gpt-4-1", 23 more]]

Public model names.

Accepts one of the following:

"openai-gpt-4o"

"openai-gpt-4o-mini"

"openai-gpt-4-1"

"openai-gpt-4-1-mini"

"openai-gpt-4-1-nano"

"openai-gpt-5"

"openai-gpt-5-mini"

"openai-gpt-5-nano"

"openai-text-embedding-3-large"

"openai-text-embedding-3-small"

"openai-whisper-1"

"anthropic-sonnet-3.5"

"anthropic-sonnet-3.5-v2"

"anthropic-sonnet-3.7"

"anthropic-sonnet-4.0"

"anthropic-sonnet-4.5"

"anthropic-haiku-3.5"

"anthropic-haiku-4.5"

"gemini-2.5-flash"

"gemini-3.0-pro"

"gemini-2.5-pro"

"gemini-2.0-flash"

"gemini-2.0-flash-lite"

"gemini-2.5-flash-lite"

"gemini-1.5-flash"

"gemini-1.5-pro"

priority: Optional[Literal["low", "medium", "high", "critical"]]

The priority for the request. This field may be ignored or overwritten depending on the organization tier.

Accepts one of the following:

"low"

"medium"

"high"

"critical"

system_prompt: Optional[str]

The system prompt to use for the extraction.

use_reasoning: Optional[bool]

Whether to use reasoning for the extraction.

data_schema: Dict[str, Union[Dict[str, object], List[object], str, 3 more]]

The schema of the data.

Accepts one of the following:

Dict[str, object]

List[object]

str

float

bool

name: str

The name of the extraction agent.

project_id: str

The ID of the project that the extraction agent belongs to.

format: uuid

created_at: Optional[datetime]

The creation time of the extraction agent.

format: date-time

custom_configuration: Optional[Literal["default"]]

Custom configuration type for the extraction agent. Currently supports 'default'.

updated_at: Optional[datetime]

The last update time of the extraction agent.

format: date-time
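
An ExtractAgent combines a name, a `data_schema`, and an ExtractConfig. The sketch below assembles those three pieces as plain dictionaries and checks the enumerated config values against the lists in this reference. Several things here are assumptions: that `data_schema` is written in JSON Schema style, the agent name and schema fields (which are invented for illustration), and the `check_config` helper itself, which is not part of the SDK.

```python
from typing import Any, Dict

# Allowed values copied from the ExtractConfig field descriptions.
ALLOWED_VALUES = {
    "chunk_mode": {"PAGE", "SECTION"},
    "extraction_mode": {"FAST", "BALANCED", "PREMIUM", "MULTIMODAL"},
    "extraction_target": {"PER_DOC", "PER_PAGE", "PER_TABLE_ROW"},
    "priority": {"low", "medium", "high", "critical"},
}


def check_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Validate enumerated fields and num_pages_context (hypothetical helper)."""
    for field, allowed in ALLOWED_VALUES.items():
        value = config.get(field)
        if value is not None and value not in allowed:
            raise ValueError(f"{field}={value!r} is not one of {sorted(allowed)}")
    num_pages = config.get("num_pages_context")
    if num_pages is not None and num_pages < 1:
        raise ValueError("num_pages_context must be at least 1")
    return config


agent = {
    "name": "invoice-extractor",  # hypothetical agent name
    "data_schema": {  # JSON Schema style is an assumption
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total": {"type": "number"},
        },
    },
    "config": check_config(
        {
            "extraction_mode": "BALANCED",
            "extraction_target": "PER_DOC",
            "cite_sources": True,
        }
    ),
}
```

Checking values locally only catches typos early; the service remains the authority on which combinations it accepts.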

Extraction > Extraction Agents > Schema

Validate Extraction Schema

extraction.extraction_agents.schema.validate_schema() -> SchemaValidateSchemaResponse

POST /api/v1/extraction/extraction-agents/schema/validation

Generate Extraction Schema

extraction.extraction_agents.schema.generate_schema() -> SchemaGenerateSchemaResponse

POST /api/v1/extraction/extraction-agents/schema/generate