#python #ocr #ocr_python #paddleocr #qml #qt #screenshot #umi_ocr
Umi-OCR is free, open-source, offline OCR (Optical Character Recognition) software. Here are the key points:
- **Free**: The software is completely free to use, with all code openly available.
- **Convenient**: It comes with efficient OCR engines and supports multiple languages.
- **Flexible**: It includes screenshot OCR, batch OCR, PDF recognition, QR code scanning and generation, and formula recognition.
This software is easy to use, supports various file formats, and has features like ignoring regions in images to exclude unwanted text. It also supports multiple languages and themes, making it highly customizable. Overall, Umi-OCR is a powerful tool for anyone needing to extract text from images or documents efficiently.
https://github.com/hiroi-sora/Umi-OCR
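The ignore-region feature can be pictured as a post-filter on OCR results. This is a minimal sketch, not Umi-OCR's actual implementation: the `Rect` and `TextBox` types, the `filter_ignored` helper, and the rule "drop a text block only when it lies entirely inside an ignore region" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Rect") -> bool:
        """Return True if `other` lies entirely inside this rectangle."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and self.right >= other.right
            and self.bottom >= other.bottom
        )


@dataclass(frozen=True)
class TextBox:
    """One recognized text block: its text plus its bounding box."""
    text: str
    box: Rect


def filter_ignored(boxes, ignore_regions):
    """Keep only text blocks that are not fully inside any ignore region."""
    return [
        tb for tb in boxes
        if not any(region.contains(tb.box) for region in ignore_regions)
    ]


# Example: a watermark in the bottom strip of a 1000x800 page is excluded.
page = [
    TextBox("Chapter 1", Rect(100, 50, 400, 90)),
    TextBox("CONFIDENTIAL", Rect(300, 760, 700, 790)),
]
footer = Rect(0, 740, 1000, 800)
kept = filter_ignored(page, [footer])
print([tb.text for tb in kept])  # ['Chapter 1']
```

Filtering whole blocks rather than individual characters keeps the rule simple: a block that only partly overlaps the region is left untouched in this sketch.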
GitHub - hiroi-sora/Umi-OCR: OCR software, free and offline. Supports screenshots and batch image import, PDF document recognition, excluding watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multi-language libraries.
#python #chineseocr #crnn #db #ocr #ocrlite
PaddleOCR is a powerful toolkit for Optical Character Recognition (OCR) that helps developers build and use advanced models. It provides cutting-edge models for tasks such as text recognition, table recognition, and formula recognition. It offers low-code development through simple Python APIs and graphical interfaces, so developers can quickly integrate and customize models for tasks including office automation, financial risk control, healthcare, education, and more. It also supports deployment on a range of hardware, such as NVIDIA GPUs and Kunlun chips.
https://github.com/PaddlePaddle/PaddleOCR
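To give a rough idea of what working with the Python API's output can look like, here is a minimal sketch. It assumes the nested result layout returned by `PaddleOCR.ocr()` in the 2.x releases (one list per image, each entry a `[box, (text, confidence)]` pair); newer releases may return a different structure. The `extract_lines` helper and the sample data are made up for illustration.

```python
def extract_lines(result, min_confidence=0.5):
    """Collect recognized text lines whose confidence meets a threshold.

    `result` is assumed to follow the PaddleOCR 2.x layout:
    [page, ...] where each page is a list of [box, (text, confidence)].
    """
    lines = []
    for page in result:
        if not page:  # a page with no detected text may be None or empty
            continue
        for _box, (text, confidence) in page:
            if confidence >= min_confidence:
                lines.append(text)
    return lines


# With PaddleOCR 2.x installed, `result` would come from something like:
#   from paddleocr import PaddleOCR
#   result = PaddleOCR(lang="en").ocr("page.png")
# Here a hand-written sample stands in for real output.
sample = [
    [
        [[[10, 10], [200, 10], [200, 40], [10, 40]], ("Invoice No. 42", 0.98)],
        [[[10, 60], [120, 60], [120, 90], [10, 90]], ("smudge", 0.31)],
    ],
    None,  # second image: nothing detected
]
print(extract_lines(sample))  # ['Invoice No. 42']
```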
GitHub - PaddlePaddle/PaddleOCR: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
#typescript #clipboard #color_picker #cross_platform #electron #image_editing #image_editor #live_text #ocr #paddleocr #screen_capture #screen_recorder #screenshot #search #search_photos
eSearch is a powerful tool that helps you capture, edit, and search content on your screen. It works on Windows, Linux, and macOS. With eSearch, you can take screenshots, recognize text using OCR (even offline), translate text, and search images. You can also record your screen, add annotations, and use various editing tools like cropping, blurring, and more.
The benefit to you is that eSearch makes it easy to manage and interact with the content on your screen in multiple ways, saving you time and effort. It's especially useful for tasks like capturing and translating text from images or videos, which can be very handy for work or study.
https://github.com/xushengfeng/eSearch
GitHub - xushengfeng/eSearch: Screenshot, offline OCR, search and translate, search by image, pin image to the screen, screen recorder, omnidirectional scrolling screenshot, screen translation.
#python #ocr #pdf
Zerox OCR is a simple tool that converts documents into Markdown using AI. Here’s how it helps: you pass in your file, and Zerox OCR returns the content as Markdown, which you can easily read and use.
This tool saves time and effort by automating the process of extracting text from complex documents, making it easier to work with the content digitally.
https://github.com/getomni-ai/zerox
GitHub - getomni-ai/zerox: OCR & Document Extraction using vision models
#python #chineseocr #crnn #dbnet #easyocr #ocr #onnxocr #onnxruntime #openvino #paddleocr #rapidocr
RapidOCR is a free, open-source tool that quickly recognizes text from images. It is very fast, supports multiple languages like Chinese and English, and works on various platforms including Linux, Windows, and Mac. You can use it offline, which is convenient. The tool is easy to install and use, and it even allows you to customize it for specific needs. This makes it beneficial for users who need quick and accurate text recognition without relying on internet connectivity.
https://github.com/RapidAI/RapidOCR
GitHub - RapidAI/RapidOCR: 📄 Awesome OCR multiple programming languages toolkits based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
#python #ai4science #document_analysis #extract_data #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_extractor_llm #pdf_extractor_pretrain #pdf_extractor_rag #pdf_parser #python
MinerU is a tool that converts PDFs into machine-readable formats like Markdown or JSON. Here are the key benefits and features:
- **Content cleanup**: MinerU removes headers, footers, and other unnecessary elements so that the text is semantically coherent and in human-readable order, even for complex layouts.
- **Element extraction**: It extracts images, image descriptions, tables, and table titles.
- **Formula and table conversion**: It recognizes formulas and tables and converts them to LaTeX or HTML format.
- **Output options**: It supports multiple output formats and various visualization results.
- **GPU and CPU compatibility**: It works in both CPU and GPU environments and is compatible with Windows, Linux, and Mac.
You can try MinerU through an online demo, a quick CPU demo, or by using a GPU for faster processing. For detailed usage, refer to the command line options, API integration, and deployment guides provided.
https://github.com/opendatalab/MinerU
GitHub - opendatalab/MinerU: Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
#javascript #deep_learning #javascript #ocr #tesseract #webassembly
Tesseract.js is a JavaScript library that helps you extract text from images in almost any language. It works in both browsers and on servers using Node.js. You can easily install it using a script tag, webpack, or npm. Here’s how it benefits you: it allows you to convert images into text quickly and accurately, supporting multiple languages and formats. This can be very useful for tasks like scanning documents, recognizing text in videos, and more. The library is also efficient, with smaller file sizes and lower memory usage, making it faster to use.
https://github.com/naptha/tesseract.js
GitHub - naptha/tesseract.js: Pure Javascript OCR for more than 100 Languages 📖🎉🖥
#python #image_processing #ocr #pdf #python #tesseract
OCRmyPDF is a tool that makes scanned PDF files searchable. It adds a text layer to the PDF, so you can search for words or copy and paste text from the document. It supports many languages, fixes misrotated or crooked pages, and optimizes the file size. The tool works on Linux, Windows, and macOS, and it uses multiple CPU cores to speed up processing. This makes scanned documents easier to work with and keeps your files searchable.
https://github.com/ocrmypdf/OCRmyPDF
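The features listed above map onto command-line options. This is a minimal sketch of driving the `ocrmypdf` command from Python, assuming the tool and the needed Tesseract language packs are installed. The `build_ocrmypdf_command` helper and the file names are hypothetical; the flags used (`--language`, `--rotate-pages`, `--deskew`, `--optimize`, `--jobs`) are standard OCRmyPDF options.

```python
import os
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, languages=("eng",), jobs=None):
    """Assemble an ocrmypdf command line as a list of arguments."""
    cmd = [
        "ocrmypdf",
        "--language", "+".join(languages),  # e.g. "eng+deu" for two languages
        "--rotate-pages",                   # fix pages scanned sideways or upside down
        "--deskew",                         # straighten crooked pages
        "--optimize", "1",                  # lossless file-size optimization
    ]
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]        # cap the number of CPU cores used
    return cmd + [src, dst]


cmd = build_ocrmypdf_command(
    "scan.pdf", "scan_searchable.pdf", languages=("eng", "deu"), jobs=4
)
print(" ".join(cmd))

# Run the tool only if it is installed and the (hypothetical) input exists.
if shutil.which("ocrmypdf") and os.path.exists("scan.pdf"):
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log before any file is touched.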
GitHub - ocrmypdf/OCRmyPDF: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
#kotlin #aes_256 #android #background_removal #clean_architecture #crop #djvu #edit_photo #exif #f_droid #filter_image #image_manipulation #jetpack_compose #jxl #kotlin #material_you #ocr_recognition #pdf #psd #qrcode_scanner #watermark
Image Toolbox is a powerful and versatile image editing tool that lets you do many things with your photos. You can crop, apply over 230 different filters, edit EXIF data, remove backgrounds, and even convert images to PDFs. It also allows you to add stickers and text, extract text from images in over 120 languages, and encrypt files with AES-256 encryption. You can resize images using various scaling algorithms, convert between multiple image formats, and create collages. The app also supports GIF, WEBP, APNG, and JXL conversions, document scanning, QR code scanning and creation, and more. It has a simple interface but offers many advanced features, making it useful for both photographers and developers.
https://github.com/T8RIN/ImageToolbox
GitHub - T8RIN/ImageToolbox: 🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options.
#typescript #anki #chatgpt #deepseek #electron #evernote #knowledge_base #local_first #markdown #note_taking #notes_app #notion #obsidian #ocr #ollama #openai #pdf #s3 #self_hosted #webdav
SiYuan is a privacy-first personal knowledge management tool. It allows you to organize your thoughts and notes in a secure way, even offline. You can use features like block-level references, Markdown editing, and mathematical formulas. It also supports AI tools and has apps for Android, iOS, and HarmonyOS. SiYuan is open source and free for most features, making it a great choice for managing your personal knowledge securely.
https://github.com/siyuan-note/siyuan
GitHub - siyuan-note/siyuan: A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
#javascript #linux #macos #ocr #pot #pot_app #recognize #tauri #translate #translation #tts #windows
Pot is a cross-platform translation tool that lets you quickly translate text by selecting it and using a shortcut, typing text to translate, or using OCR to translate text from screenshots. It supports many translation engines like OpenAI, Google, DeepL, and more, plus offline options. You can also add plugins to extend its features and use it on Windows, macOS, and Linux. Pot offers an API for integration with other software and works well even on Wayland systems. This makes translating easier, faster, and more flexible, helping you understand and work with multiple languages efficiently.
https://github.com/pot-app/pot-desktop
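As an illustration of the integration API mentioned above, here is a minimal sketch of calling a locally running Pot instance over HTTP. The port (`60828`) and the `POST /translate` endpoint reflect my understanding of Pot's external-call interface and should be treated as assumptions to check against the project's documentation. The `pot_url` and `translate_with_pot` helpers are hypothetical.

```python
import urllib.error
import urllib.request

POT_PORT = 60828  # assumed default port of Pot's local HTTP interface


def pot_url(path, port=POT_PORT):
    """Build the URL for an endpoint on the local Pot instance."""
    return f"http://127.0.0.1:{port}/{path.lstrip('/')}"


def translate_with_pot(text, port=POT_PORT, timeout=2.0):
    """Ask Pot to translate `text`; return True if the request was accepted.

    Assumes the request body is the plain text to translate. Returns False
    when Pot is not running or the request fails.
    """
    request = urllib.request.Request(
        pot_url("translate", port),
        data=text.encode("utf-8"),
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


print(pot_url("/translate"))  # http://127.0.0.1:60828/translate
```

With Pot running, `translate_with_pot("Hello, world")` would hand the text to the app; other scripts or launchers can use the same pattern to trigger translation.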
GitHub - pot-app/pot-desktop: 🌈 A cross-platform software for text translation and recognition.
#python #document_analysis #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_parser #python #vlm_ocr
Dolphin is an AI tool that can analyze and understand complex document images, like pages with text, tables, formulas, and pictures. It works in two steps: first, it figures out the layout and reading order of the page; then, it parses each element using element-specific prompts. This makes it fast and accurate at turning document images into structured data like JSON or Markdown. You can use pre-trained models and simple code to process single pages, PDFs, or specific elements, which saves time and effort when extracting information from complicated documents.
https://github.com/bytedance/Dolphin
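The two-step flow described above can be sketched as a small pipeline. This is an illustration of the analyze-then-parse pattern only, not Dolphin's code: the `Element` type, the prompt strings, the stubbed `analyze_layout`, and `fake_model` are all hypothetical stand-ins for the real model calls.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A page region found in stage 1 (hypothetical structure)."""
    kind: str   # "text", "table", or "formula"
    order: int  # position in reading order
    crop: str   # placeholder for the cropped image region


# Hypothetical element-specific prompts used in stage 2.
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse this table.",
    "formula": "Transcribe this formula as LaTeX.",
}


def analyze_layout(page_image):
    """Stage 1 (stubbed): detect elements and their reading order."""
    return [
        Element("table", 2, f"{page_image}#table"),
        Element("text", 1, f"{page_image}#paragraph"),
        Element("formula", 3, f"{page_image}#equation"),
    ]


def parse_page(page_image, model):
    """Stage 2: parse each element in reading order with its own prompt."""
    elements = sorted(analyze_layout(page_image), key=lambda e: e.order)
    return [
        {"type": e.kind, "content": model(e.crop, PROMPTS[e.kind])}
        for e in elements
    ]


def fake_model(crop, prompt):
    """Stand-in for a vision-language model call."""
    return f"[{prompt}] {crop}"


parsed = parse_page("page1.png", fake_model)
print([item["type"] for item in parsed])  # ['text', 'table', 'formula']
```

Because each element is parsed independently once the layout is known, the second stage could be batched or run in parallel in a real system.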
GitHub - bytedance/Dolphin: The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.