GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#java #docker #java #pdf #pdf_converter #pdf_editor #pdf_manipulation #pdf_merger #pdf_ocr #pdf_tools #pdf_web_apps #pdfmerger

Stirling-PDF is a powerful tool for managing PDF files locally on your computer or server. It allows you to perform various operations like splitting, merging, converting, and editing PDFs without sending your files to external servers, ensuring your data stays private. You can add images, rotate pages, compress files, and even convert PDFs to other formats like Word or images. The tool supports multiple languages and has features like dark mode, custom download options, and API integration for advanced users. It's easy to set up using Docker and offers customizable settings and security features like login authentication. This makes it a versatile and secure solution for all your PDF needs.

https://github.com/Stirling-Tools/Stirling-PDF
#python #ai4science #document_analysis #extract_data #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_extractor_llm #pdf_extractor_pretrain #pdf_extractor_rag #pdf_parser #python

MinerU is a tool that converts PDFs into machine-readable formats like Markdown or JSON. Here are the key benefits and features:
- **Clean Text Output**: Removes headers, footers, and other unnecessary elements so the text is semantically coherent and in human-readable order, even for complex layouts.
- **Content Extraction**: Extracts images, image descriptions, tables, and table titles.
- **Formula and Table Conversion**: Recognizes formulas and tables and converts them to LaTeX or HTML format.
- **Output Options**: Supports multiple output formats and various visualization results.
- **GPU and CPU Compatibility**: Works on both CPU and GPU environments, compatible with Windows, Linux, and Mac.

You can try MinerU through an online demo, a quick CPU demo, or by using a GPU for faster processing. For detailed usage, refer to the command line options, API integration, and deployment guides provided.

https://github.com/opendatalab/MinerU
#typescript #bun #conversion #convert #converter #document_conversion #elysia #file_conversion #file_converter #hacktoberfest #pdf_converter #self_hosted #tailwindcss #typescript

ConvertX is a self-hosted online file converter that supports over a thousand file formats, including images, videos, documents, e-books, and 3D assets. It lets you convert multiple files at once, offers password protection, and supports multiple user accounts for privacy. You can run it easily using Docker, making it simple to set up on your own server. This means your files stay private since conversions happen locally without sending data to external servers. It uses powerful open-source tools like FFmpeg and ImageMagick, giving you a versatile and secure way to handle all your file conversion needs in one place.

https://github.com/C4illin/ConvertX
#python #document_analysis #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_parser #python #vlm_ocr

Dolphin is an AI tool that analyzes and parses complex document images, like pages with text, tables, formulas, and pictures. It works in two steps: first, it determines the layout and reading order of the page; then, it parses each element using special prompts. This makes it fast and accurate for turning document images into structured data like JSON or Markdown. You can use the pre-trained models and simple scripts to process single pages, PDFs, or specific elements. This saves time and effort when extracting information from complicated documents.
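
To make the two-step idea concrete, here is a minimal Python sketch of an analyze-then-parse pipeline. It is not Dolphin's code or API: the `Element` class, the `PROMPTS` table, and the functions `analyze_layout`, `parse_element`, and `parse_page` are all hypothetical names, both stages are stubs standing in for model calls, and parsing elements concurrently with one prompt per element type is an assumption about what "quickly" and "special prompts" mean.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Element:
    kind: str    # e.g. "text", "table", "formula"
    order: int   # position in the reading order
    region: str  # stand-in for a cropped image region


# Hypothetical prompts, one per element type (not taken from the project).
PROMPTS = {
    "text": "Transcribe this text block.",
    "table": "Convert this table to HTML.",
    "formula": "Convert this formula to LaTeX.",
}


def analyze_layout(page: list[tuple[str, str]]) -> list[Element]:
    """Step 1 (stub): return the page's elements in reading order."""
    return [Element(kind, i, region) for i, (kind, region) in enumerate(page)]


def parse_element(el: Element) -> dict:
    """Step 2 (stub): pair one element with the prompt for its type."""
    prompt = PROMPTS[el.kind]
    # A real system would send (prompt, cropped image) to a model here;
    # this stub just echoes the region text back as the parsed content.
    return {"order": el.order, "type": el.kind,
            "prompt": prompt, "content": el.region.strip()}


def parse_page(page: list[tuple[str, str]]) -> list[dict]:
    """Run step 1 once, then step 2 on every element concurrently."""
    elements = analyze_layout(page)
    with ThreadPoolExecutor() as pool:
        # pool.map yields results in input order, so reading order is kept.
        return list(pool.map(parse_element, elements))


if __name__ == "__main__":
    demo_page = [("text", " Quarterly report "), ("table", "a|b"), ("formula", "E=mc^2")]
    print(json.dumps(parse_page(demo_page), indent=2))
```

Splitting the work this way means the layout pass runs once per page, while each element can be parsed independently of the others.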

https://github.com/bytedance/Dolphin
#csharp #pdf #pdf_converter #pdf_document_processor #pdf_generation

PDFPatcher is a free and open-source tool that helps you manage PDF files. It allows you to edit PDF metadata, bookmarks, and page layouts. You can also merge, split, and rotate PDF pages. Additionally, it supports converting PDF pages to images and extracting specific pages. The software is free to use and does not have ads or privacy concerns. It encourages users to do a good deed after using it, which is part of its unique "良心授权" (conscience license) agreement. This tool is beneficial for users who need to manipulate PDFs without spending money on expensive software.

https://github.com/wmjordan/PDFPatcher