#python #cross_modal_retrieval #image_captioning #pretraining #tden #video_captioning #vision_and_language #visual_question_answering
https://github.com/YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsens...
#python #deep_learning #deep_learning_library #image_captioning #multimodal_datasets #multimodal_deep_learning #salesforce #vision_and_language #vision_framework #vision_language_pretraining #vision_language_transformer #visual_question_answering
https://github.com/salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence