#python #document_understanding #language_generation #language_understanding #layoutlm #minilm #nlp #pre_trained_model #s2s_ft #small_pre_trained_model #unilm
https://github.com/microsoft/unilm
GitHub - microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
#typescript #document_object_model #dom #dom_api #dom_manipulation #html #javascript
https://github.com/phuoc-ng/html-dom
GitHub - phuocng/html-dom: Common tasks of managing HTML DOM with vanilla JavaScript. Give me 1 ⭐ if it’s useful.
#python #computer_vision #deep_learning #detectron2 #document_image_analysis #document_image_processing #document_layout_analysis #layout_analysis #layout_parser #object_detection #ocr
https://github.com/Layout-Parser/layout-parser
GitHub - Layout-Parser/layout-parser: A Unified Toolkit for Deep Learning Based Document Image Analysis
#python #data_mining #data_science #document_similarity #fasttext #gensim #information_retrieval #machine_learning #natural_language_processing #neural_network #nlp #topic_modeling #word_embeddings #word_similarity #word2vec
https://github.com/RaRe-Technologies/gensim
GitHub - piskvorky/gensim: Topic Modelling for Humans
#python #bert #document_embedding #pre_trained_language_models #semantic_search #sentence_encoder #sentence_transformers #text_search #text_semantic_similarity #top2vec #topic_modeling #topic_modelling #topic_search #topic_vector #word_embeddings
https://github.com/ddangelov/Top2Vec
GitHub - ddangelov/Top2Vec: Top2Vec learns jointly embedded topic, document and word vectors.
#rust #backend_as_a_service #cloud_database #collaborative #database #database_as_a_service #developer_tools #devtools #distributed #distributed_database #document_database #graph_database #hacktoberfest #iot_database #nosql #realtime_database #serverless #sql #surreal #surrealdb #web
https://github.com/surrealdb/surrealdb
GitHub - surrealdb/surrealdb: A scalable, distributed, collaborative, document-graph database, for the realtime web
#html #data_pipelines #deep_learning #document_ai #document_image_analysis #document_image_processing #document_parser #document_parsing #docx #donut #information_retrieval #langchain #machine_learning #ml #natural_language_processing #nlp #ocr #pdf #pdf_to_json #pdf_to_text #preprocessing
https://github.com/Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models.
#go #database #document #ferretdb #golang #mongo #mongo_db #mongodb #mongodb_database #postgres #postgresql
https://github.com/FerretDB/FerretDB
GitHub - FerretDB/FerretDB: A truly Open Source MongoDB alternative
#python #document_ai #document_image_analysis #document_layout_analysis #document_parser #document_understanding #layoutlm #nlp #ocr #publaynet #pubtabnet #pytorch #table_detection #table_recognition #tensorflow
https://github.com/deepdoctection/deepdoctection
GitHub - deepdoctection/deepdoctection: A Repo For Document AI
#cplusplus #artificial_intelligence #computer_vision #document #document_analysis #document_intelligence #document_recognition #document_understanding #documentai #end_to_end_ocr #multimodal #multimodal_deep_learning #ocr #scene_text_detection #scene_text_detection_recognition #scene_text_recognition #text_detection #text_recognition #vision_language #vision_language_model #vision_language_transformer
https://github.com/AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group. ...
#python #beit #beit_3 #bitnet #deepnet #document_ai #foundation_models #kosmos #kosmos_1 #layoutlm #layoutxlm #llm #minilm #mllm #multimodal #nlp #pre_trained_model #textdiffuser #trocr #unilm #xlm_e
The microsoft/unilm repository collects Microsoft's work on large-scale self-supervised pre-training across tasks, languages, and modalities. Models such as Foundation Transformers (Magneto) and Kosmos-2.5 are designed to generalize across language understanding, vision, speech, and multimodal tasks, with reported state-of-the-art results in areas including document AI, speech recognition, and machine translation. Supporting tools such as TorchScale and Aggressive Decoding target stability, efficiency, and speed in model training and deployment.
https://github.com/microsoft/unilm
GitHub - microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
#python #agent #agents #ai_search #chatbot #chatgpt #data_pipelines #deep_learning #document_parser #document_understanding #genai #graph #graphrag #llm #nlp #pdf_to_text #preprocessing #rag #retrieval_augmented_generation #table_structure_recognition #text2sql
RAGFlow is an open-source tool that helps businesses answer questions accurately using large language models and deep document understanding. It extracts information from complex data formats such as Word documents, Excel files, and web pages, and backs its answers with grounded citations. You can try the online demo or self-host it with Docker; setup involves cloning the repository, building the Docker image, and configuring the system settings. Key features include template-based chunking, reduced hallucinations, and compatibility with multiple data sources, which gives users reliable, explainable answers and supports integration with their business systems.
https://github.com/infiniflow/ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs - infiniflow/ragflow