Parse Document API
Upload one PDF or other supported document and receive Markdown, plain text, JSON, and page-level metrics.
Endpoint
POST /api/parse?mode=standard
Request
curl -F file=@sample.pdf "http://127.0.0.1:8765/api/parse?mode=standard"
curl -F file=@sample.docx "http://127.0.0.1:8765/api/parse?mode=standard"
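The same upload can be sketched in Python using only the standard library. The helper names `build_multipart` and `parse_document` are illustrative and not part of the API, and the sketch assumes the endpoint returns a JSON body; the host and port are the ones used in the curl examples.

```python
import json
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://127.0.0.1:8765"  # host and port taken from the curl examples


def build_multipart(filename: str, data: bytes, field: str = "file"):
    """Encode one file as multipart/form-data, like `curl -F file=@...`."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_document(path: str, mode: str = "standard", base_url: str = BASE_URL) -> dict:
    """POST one file to /api/parse and return the decoded response.

    Assumption: the response body is JSON.
    """
    file_path = Path(path)
    body, content_type = build_multipart(file_path.name, file_path.read_bytes())
    url = f"{base_url}/api/parse?{urllib.parse.urlencode({'mode': mode})}"
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running locally, `parse_document("sample.pdf")` would send the same request as the first curl command, and `parse_document("sample.pdf", mode="plain")` would select the other mode.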
Modes
- standard: preserves paragraph breaks and page structure.
- plain: normalizes repeated spaces for simpler text output.
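As a rough illustration of what normalizing repeated spaces can look like, the sketch below collapses runs of spaces and tabs within each line while keeping line breaks. This is an assumption about the behavior of plain mode, not a description of the server's implementation, and `normalize_spaces` is a hypothetical helper.

```python
import re


def normalize_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs to one space on each line (illustrative only)."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n"))
    return "\n".join(lines)


sample = "Total   due:\t 42  USD\n\nNext   paragraph"
print(normalize_spaces(sample))
```

Here the uneven gaps inside each line become single spaces, while the blank line between the two paragraphs is left alone.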
Response fields
- markdown: Markdown output with page headings.
- text: full extracted text.
- json: formatted JSON string for downloads.
- pages: page-level text and counts.
- extraction_method: human-readable name of the extraction method, such as PDF text extraction or image text recognition.
- warnings: notes returned when a file has missing text, scanned pages, or image quality issues.
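A minimal sketch of reading these fields from a decoded response. The top-level field names come from the list above; treating `pages` and `warnings` as lists, the keys inside each `pages` entry in the sample payload, and the helper name `summarize` are all assumptions made for illustration.

```python
import json


def summarize(result: dict) -> dict:
    """Collect a few headline values from a parse response (field shapes assumed)."""
    pages = result.get("pages", [])
    return {
        "method": result.get("extraction_method", ""),
        "page_count": len(pages),
        "characters": len(result.get("text", "")),
        "warnings": list(result.get("warnings", [])),
        # `json` is documented as a JSON string, so it is decoded before use.
        "download": json.loads(result["json"]) if result.get("json") else None,
    }


# Hand-written example payload, not real server output.
example = {
    "markdown": "## Page 1\n\nHello",
    "text": "Hello",
    "json": "{\"pages\": 1}",
    "pages": [{"page": 1, "text": "Hello"}],
    "extraction_method": "PDF text extraction",
    "warnings": [],
}
print(summarize(example))
```

Checking `warnings` before using `text` is a reasonable habit, since a scanned or low-quality file may yield little or no extracted text.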
Supported formats
PDF is the primary format. Common Office documents, spreadsheets, presentations, and image scans are also accepted. To query the supported formats:
GET /api/formats
Limits
Files can be up to 30 MB. For the best results, upload searchable PDFs or clear, high-resolution scans.
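A client can check the size limit before uploading. The sketch below assumes 1 MB means 1024 * 1024 bytes, which this page does not specify; `MAX_UPLOAD_BYTES` and `within_upload_limit` are hypothetical names.

```python
import os

MAX_UPLOAD_BYTES = 30 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def within_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True when the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

Calling `within_upload_limit("sample.pdf")` before sending the request avoids transferring a file that would exceed the limit.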