#python #document_understanding #language_generation #language_understanding #layoutlm #minilm #nlp #pre_trained_model #s2s_ft #small_pre_trained_model #unilm
https://github.com/microsoft/unilm
GitHub - microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
#typescript #document_object_model #dom #dom_api #dom_manipulation #html #javascript
https://github.com/phuoc-ng/html-dom
GitHub - phuocng/html-dom: Common tasks of managing HTML DOM with vanilla JavaScript. Give me 1 ⭐ if it’s useful.
#python #computer_vision #deep_learning #detectron2 #document_image_analysis #document_image_processing #document_layout_analysis #layout_analysis #layout_parser #object_detection #ocr
https://github.com/Layout-Parser/layout-parser
GitHub - Layout-Parser/layout-parser: A Unified Toolkit for Deep Learning Based Document Image Analysis
#python #data_mining #data_science #document_similarity #fasttext #gensim #information_retrieval #machine_learning #natural_language_processing #neural_network #nlp #topic_modeling #word_embeddings #word_similarity #word2vec
https://github.com/RaRe-Technologies/gensim
GitHub - piskvorky/gensim: Topic Modelling for Humans
#python #bert #document_embedding #pre_trained_language_models #semantic_search #sentence_encoder #sentence_transformers #text_search #text_semantic_similarity #top2vec #topic_modeling #topic_modelling #topic_search #topic_vector #word_embeddings
https://github.com/ddangelov/Top2Vec
GitHub - ddangelov/Top2Vec: Top2Vec learns jointly embedded topic, document and word vectors.
#rust #backend_as_a_service #cloud_database #collaborative #database #database_as_a_service #developer_tools #devtools #distributed #distributed_database #document_database #graph_database #hacktoberfest #iot_database #nosql #realtime_database #serverless #sql #surreal #surrealdb #web
https://github.com/surrealdb/surrealdb
GitHub - surrealdb/surrealdb: A scalable, distributed, collaborative, document-graph database, for the realtime web
#html #data_pipelines #deep_learning #document_ai #document_image_analysis #document_image_processing #document_parser #document_parsing #docx #donut #information_retrieval #langchain #machine_learning #ml #natural_language_processing #nlp #ocr #pdf #pdf_to_json #pdf_to_text #preprocessing
https://github.com/Unstructured-IO/unstructured
GitHub - Unstructured-IO/unstructured: Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models.
#go #database #document #ferretdb #golang #mongo #mongo_db #mongodb #mongodb_database #postgres #postgresql
https://github.com/FerretDB/FerretDB
GitHub - FerretDB/FerretDB: A truly Open Source MongoDB alternative