#python #library #pdf #pdf_conversion #pdf_converter #pdf_generation #pdf_library #python3 #sdk #typesetting
borb is a library for reading, creating and manipulating PDF files in Python.
https://github.com/jorisschellekens/borb
#java #docker #pdf #pdf_converter #pdf_editor #pdf_manipulation #pdf_merger #pdf_ocr #pdf_tools #pdf_web_apps #pdfmerger
Stirling-PDF is a powerful tool for managing PDF files locally on your computer or server. It allows you to perform various operations like splitting, merging, converting, and editing PDFs without sending your files to external servers, ensuring your data stays private. You can add images, rotate pages, compress files, and even convert PDFs to other formats like Word or images. The tool supports multiple languages and has features like dark mode, custom download options, and API integration for advanced users. It's easy to set up using Docker and offers customizable settings and security features like login authentication. This makes it a versatile and secure solution for all your PDF needs.
https://github.com/Stirling-Tools/Stirling-PDF
GitHub - Stirling-Tools/Stirling-PDF: #1 PDF Application on GitHub that lets you edit PDFs on any device anywhere
#python #ai4science #document_analysis #extract_data #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_extractor_llm #pdf_extractor_pretrain #pdf_extractor_rag #pdf_parser
MinerU is a tool that converts PDFs into machine-readable formats such as Markdown or JSON. Key features:
- **Clean Text Extraction**: Removes headers, footers, and other unnecessary elements so that the text is semantically coherent and in human-readable order, even for complex layouts.
- **Structure Preservation**: Extracts images, image descriptions, tables, and table titles.
- **Formula and Table Conversion**: Recognizes formulas and tables and converts them to LaTeX or HTML format.
- **OCR Support**: Supports OCR for scanned PDFs.
- **Output Options**: Supports multiple output formats and various visualization results.
- **GPU and CPU Compatibility**: Works in both CPU and GPU environments and is compatible with Windows, Linux, and Mac.
You can try MinerU through an online demo, a quick CPU demo, or on a GPU for faster processing. For detailed usage, refer to the command-line options, API integration, and deployment guides in the repository.
https://github.com/opendatalab/MinerU
GitHub - opendatalab/MinerU: Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
#typescript #bun #conversion #convert #converter #document_conversion #elysia #file_conversion #file_converter #hacktoberfest #pdf_converter #self_hosted #tailwindcss
ConvertX is a self-hosted online file converter that supports over a thousand file formats, including images, videos, documents, e-books, and 3D assets. It converts multiple files at once, offers password protection, and supports multiple user accounts for privacy. It runs under Docker, which makes it simple to set up on your own server. Conversions happen locally, so your files stay private and no data is sent to external servers. Under the hood it uses open-source tools such as FFmpeg and ImageMagick.
https://github.com/C4illin/ConvertX
GitHub - C4illin/ConvertX: 💾 Self-hosted online file converter. Supports 1000+ formats ⚙️
#python #document_analysis #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_parser #vlm_ocr
Dolphin is an AI tool that analyzes complex document images, such as pages with text, tables, formulas, and pictures. It works in two steps: first, it determines the layout and reading order of the page; then, it parses each element using element-specific prompts. This makes it fast and accurate at turning document images into structured data such as JSON or Markdown. Pre-trained models and example code are provided for processing single pages, PDFs, or specific elements, which saves time and effort when extracting information from complicated documents.
https://github.com/bytedance/Dolphin
GitHub - bytedance/Dolphin: The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
#csharp #pdf #pdf_converter #pdf_document_processor #pdf_generation
PDFPatcher is a free and open-source tool that helps you manage PDF files. It allows you to edit PDF metadata, bookmarks, and page layouts. You can also merge, split, and rotate PDF pages. Additionally, it supports converting PDF pages to images and extracting specific pages. The software is free to use and does not have ads or privacy concerns. It encourages users to do a good deed after using it, which is part of its unique "良心授权" (conscience license) agreement. This tool is beneficial for users who need to manipulate PDFs without spending money on expensive software.
https://github.com/wmjordan/PDFPatcher
GitHub - wmjordan/PDFPatcher: PDF补丁丁 (PDFPatcher), a PDF toolbox that can edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more