Tutorials on machine learning, artificial intelligence, and data science with math explanations and reusable code (in Python and R)
Updated Jun 6, 2024 · Jupyter Notebook
A lightweight and extensible toolbox for image classification
A simple, private project implementing the ideas behind the paper "VT-ADL: A Vision Transformer Network for Image Anomaly Detection and Localisation" by Mishra et al.
Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). It has many highlighted features, such as automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform operation (Windows, Linux; x86, x64, ARM), multimodal models for text and images, and more.
Omni Geoguessr AI: A Vision Transformer AI integrated with Geoguessr for automated geographic location prediction and gameplay using streetview panoramas.
A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP
A series of foundational computer vision projects that anyone diving into the field must tackle.
Video Foundation Models & Data for Multimodal Understanding
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Code for Video Deepfake Detector from "MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection", paper available on IEEE Transactions on Information Forensics and Security.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Scripts and trained models from our paper: M. Ntrougkas, N. Gkalelis, V. Mezaris, "T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers", IEEE Access, 2024. DOI:10.1109/ACCESS.2024.3405788.
[ICLR 2024 Oral] Less is More: Fewer Interpretable Region via Submodular Subset Selection
pix2tex: Using a ViT to convert images of equations into LaTeX code.
My reimplementations of some transformer-based models (LLMs and LVMs).
EfficientViT is a new family of vision models for efficient high-resolution vision tasks.
EVA Series: Visual Representation Fantasies from BAAI
OpenMMLab Detection Toolbox and Benchmark
A curated list of foundation models for vision and language tasks
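Most of the repositories above build on the same core operation: a Vision Transformer splits an image into fixed-size patches and linearly projects each flattened patch into a token embedding. Below is a minimal NumPy sketch of that patch-embedding step; the function name, shapes, and random projection weights are illustrative, not taken from any of the listed libraries.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into non-overlapping flattened patches."""
    H, W, C = image.shape
    p = patch_size
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (N, p*p*C)
    patches = image.reshape(H // p, p, W // p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * C)
    return patches

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))      # a standard ViT input size
patches = patchify(img, 16)                   # 14 * 14 = 196 patches of dim 768
W_proj = rng.standard_normal((16 * 16 * 3, 768)) * 0.02
tokens = patches @ W_proj                     # (196, 768) token embeddings
print(tokens.shape)
```

In a full model, a learned class token and positional embeddings would be added before the tokens enter the transformer encoder.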