Research & Education
Awesome-Deblurring
Awesome-Deblurring is a comprehensive, curated list of resources dedicated to image and video deblurring. Hosted on GitHub, this open-source repository serves as a central hub for researchers and developers seeking to explore or implement deblurring techniques. It meticulously categorizes resources into various sections, including single-image blind motion deblurring (both non-DL and DL approaches), non-blind deblurring, depth-aware motion deblurring, defocus deblurring, and benchmark datasets. Each entry typically includes the publication year, paper title, and links to associated code or project pages, making it an invaluable tool for navigating the vast landscape of deblurring research and practical applications.
awesome-deep-rl
awesome-deep-rl is a comprehensive, curated list of resources for Deep Reinforcement Learning. This open-source repository serves as a central hub for researchers and practitioners to discover libraries, benchmark results, environments, competitions, and educational materials like books and tutorials. It covers a wide array of topics, from foundational algorithms and historical timelines to advanced frameworks and simulation platforms, making it an invaluable reference for anyone involved in the field of Deep Reinforcement Learning. The resource is continuously updated, reflecting the dynamic nature of AI research.
Awesome-BEV-Perception-Multi-Cameras
Awesome-BEV-Perception-Multi-Cameras is a valuable resource for researchers and engineers focused on multi-camera 3D object detection and segmentation within the Bird's-Eye-View (BEV) paradigm. This curated list compiles significant academic papers, including influential works like DETR3D, BEVDet, BEVFormer, BEVDepth, and UniAD. It categorizes papers by key themes such as Longterm BEV, BEV + Stereo, End to End BEV Perception, BEV + Distillation, Robust BEV, Fast BEV, HD Map Construction, Multi-sensor fusion, Survey, Occupancy Network, and Pre-training. Each entry typically includes a link to the paper and its corresponding GitHub repository, making it easy for users to access the research and associated codebases. This tool is essential for staying updated with the latest advancements in vision-centric autonomous driving perception.
CVPR-2019-Paper-Statistics
CVPR-2019-Paper-Statistics is an open-source project offering detailed statistics and visualizations for papers accepted at the CVPR 2019 conference. Inspired by ICLR2019-OpenReviewData, this tool analyzes the acceptance rate trends from 2015 to 2019, highlighting the significant increase in paper submissions and the corresponding decrease in acceptance rates. It also provides insights into the most frequent keywords in accepted papers, such as 'Image', 'detection', '3d', 'object', 'video', 'segmentation', 'adversarial', 'recognition', and 'visual'. The project includes Jupyter Notebook code for analysis and visualization, supporting both CSV and website data formats, and requires Python 3.5 with libraries like selenium, wordcloud, and matplotlib.
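The keyword analysis described above can be sketched as a simple word count over paper titles. This is a minimal illustration, not the project's notebook code: the sample titles, the stopword list, and the function name `keyword_counts` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample titles; the real project reads accepted-paper data
# from a CSV file or from the conference website.
titles = [
    "Deep Image Segmentation with Adversarial Training",
    "3D Object Detection from Video",
    "Image Recognition via Visual Attention",
]

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"with", "from", "via", "for", "and", "the", "of", "a", "in", "on"}


def keyword_counts(titles):
    """Count lowercase word frequencies across titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z0-9]+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


counts = keyword_counts(titles)
print(counts.most_common(3))
```

A frequency table like `counts` is also the kind of input a word-cloud library typically accepts, which is presumably how the keyword visualizations are produced.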
deep-reinforcement-learning-papers
deep-reinforcement-learning-papers is a comprehensive, open-source GitHub repository dedicated to cataloging papers and resources related to deep reinforcement learning. The collection is organized into categories such as Deep Value Function, Deep Policy, Deep Actor-Critic, Deep Model, and Application to Non-RL Tasks, making it easier for users to navigate specific areas of interest. It also includes sections for talks, slides, and other miscellaneous resources. The project is actively maintained with a stated goal to continuously add more papers and improve classification methods, welcoming contributions from the community. This resource is ideal for anyone looking to explore the foundational and cutting-edge research in deep reinforcement learning.
Deep-Reinforcement-Stock-Trading
Deep-Reinforcement-Stock-Trading is a lightweight, open-source framework for applying deep reinforcement learning algorithms to stock trading and portfolio management. The project offers a modular environment for researchers and developers to explore AI strategies in finance. It includes features for training and evaluating DDPG and DQN agents, with built-in metrics and visualizations. The framework currently supports trading a single stock with the basic actions buy, hold, and sell, with plans to integrate more sophisticated algorithms, more complex state representations, and higher-quality data sources for backtesting. It is suited to those looking to experiment with AI in financial markets.
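To make the buy/hold/sell setup concrete, here is a minimal sketch of a single-stock environment. It is not the framework's own code: the class name `SingleStockEnv`, the action encoding, the one-share trade size, and the reward (change in portfolio value) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

BUY, HOLD, SELL = 0, 1, 2  # hypothetical action encoding


@dataclass
class SingleStockEnv:
    """Toy environment: trade one share at a time of a single stock."""
    prices: list
    cash: float = 1000.0
    shares: int = 0
    t: int = 0

    def portfolio_value(self):
        # Cash plus the market value of held shares at the current price.
        return self.cash + self.shares * self.prices[self.t]

    def step(self, action):
        """Apply an action at the current price, advance one step,
        and return (reward, done)."""
        price = self.prices[self.t]
        before = self.portfolio_value()
        if action == BUY and self.cash >= price:
            self.cash -= price
            self.shares += 1
        elif action == SELL and self.shares > 0:
            self.cash += price
            self.shares -= 1
        # HOLD, or an infeasible BUY/SELL, leaves the position unchanged.
        self.t += 1
        reward = self.portfolio_value() - before
        done = self.t == len(self.prices) - 1
        return reward, done


env = SingleStockEnv(prices=[10.0, 12.0, 11.0])
first_reward, first_done = env.step(BUY)
second_reward, second_done = env.step(HOLD)
```

A DQN or DDPG agent would choose the action at each step; here the actions are hard-coded only to show the interface.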
Video-XL
Video-XL is an open-source project offering a family of efficient vision-language models (VLMs) designed for understanding extremely long videos, including hour-long content. The project includes models like Video-XL2 and Video-XL-Pro, which have achieved state-of-the-art results on various long video understanding benchmarks. Video-XL-Pro, for instance, can process up to 10,000 frames on a single 80 GB GPU with only 3 billion parameters. The project provides models, training code, and evaluation code, making it a valuable resource for researchers and developers working with extensive video data. It builds upon existing codebases like LongVA and LMMs-Eval for its development and evaluation processes.
Embodied_AI_Paper_List
Embodied_AI_Paper_List is an open-source repository maintained by HCPLab at SYSU and Pengcheng Laboratory, offering a comprehensive collection of papers and resources focused on Embodied AI. This resource is designed to serve as a foundational reference for researchers and practitioners, bridging the gap between cyberspace and the physical world through intelligent systems. The repository covers key areas such as embodied perception, interaction, agent development, and sim-to-real adaptation, including state-of-the-art methods, essential paradigms, and comprehensive datasets. It also explores the role of Multi-modal Large Models (MLMs) and World Models (WMs) in facilitating interactions for embodied agents, highlighting their significance in both digital and physical environments. The list is regularly updated with the latest advancements and includes a survey paper accepted by IEEE/ASME Transactions on Mechatronics.
fpn.pytorch
fpn.pytorch offers a pure PyTorch implementation of the Feature Pyramid Network (FPN) for object detection, building on an existing Faster R-CNN implementation. The project converts all NumPy implementations to PyTorch, giving a consistent codebase. A key feature is its support for training with batch sizes greater than one, achieved by revising all relevant layers, including the dataloader, RPN, and ROI pooling. It also uses a multi-GPU wrapper (nn.DataParallel) to scale across one or more GPUs. The implementation integrates three pooling methods (ROI pooling, ROI align, and ROI crop), all adapted for multi-image batch training. It has been benchmarked on datasets such as PASCAL VOC and COCO.
hate-speech-and-offensive-language
The hate-speech-and-offensive-language repository is an open-source project associated with the paper "Automated Hate Speech Detection and the Problem of Offensive Language" from ICWSM 2017. It offers a dataset, lexicons, and Python 2.7 code for researchers and developers interested in analyzing and detecting hate speech and offensive language in online content, particularly from Twitter. The repository also includes a classifier script and instructions for running it on new data. While the project is no longer actively maintained, it serves as a foundational resource for understanding and addressing the complexities of offensive language detection, with a focus on the nuances of racial bias in such datasets.
Image-Adaptive-YOLO
Image-Adaptive-YOLO is an open-source implementation of an object detection model engineered to perform robustly in adverse weather conditions. Based on the research paper "Image-Adaptive YOLO for Object Detection in Adverse Weather Conditions" (AAAI 2022), this tool incorporates image-adaptive filtering techniques to improve detection accuracy in fog, low light, and other challenging visual conditions. The project provides installation instructions, dataset preparation steps (covering PASCAL VOC, RTTS, ExDark, and custom foggy/dark datasets), and both training and evaluation scripts. It is built on Python and TensorFlow, making it accessible for researchers and developers working on computer vision tasks in difficult conditions.
openarm
OpenArm is a fully open-source 7-DOF humanoid arm engineered for physical AI research and deployment, particularly in contact-rich environments. Its design emphasizes high backdrivability and compliance, making it suitable for safe human-robot interaction while still providing practical payload capabilities for real-world applications. The arm features human-scale proportions and is available as a complete bimanual system for $6,500 USD, offering a flexible platform for teleoperation, imitation learning, simulation, and real-world data collection. OpenArm is under continuous development and is actively seeking contributors, research partners, and company collaborators to advance practical humanoid systems.
robomimic
robomimic is a comprehensive, modular framework designed for robot learning from demonstration. It offers a wide array of demonstration datasets specifically collected for robot manipulation domains, alongside robust offline learning algorithms to effectively learn from these datasets. The primary goal of robomimic is to enhance the accessibility and reproducibility of robot learning research, enabling researchers and practitioners to benchmark tasks and algorithms consistently. This framework facilitates the development of the next generation of robot learning algorithms, supporting features like Diffusion Policy, multi-dataset training, language-conditioned policies, and integration with robosuite and DeepMind MuJoCo bindings. It also supports various observation modalities, pre-trained image representations, and logging with wandb.
SimpleVLA-RL
SimpleVLA-RL is an open-source reinforcement learning (RL) framework designed to efficiently scale the training of Vision-Language-Action (VLA) models. It provides an end-to-end RL pipeline built on veRL, incorporating VLA-specific optimizations such as multi-environment parallel rendering for accelerated trajectory sampling. The framework leverages state-of-the-art infrastructure for efficient distributed training, hybrid communication patterns, and optimized memory management. SimpleVLA-RL supports various VLA models like OpenVLA and OpenVLA-OFT, and benchmarks including LIBERO and RoboTwin 1.0/2.0. It emphasizes minimal reward engineering with binary outcome rewards and includes exploration strategies like dynamic sampling and adaptive clipping. The modular architecture allows for easy integration of new VLA models, benchmarks, and RL algorithms, making it a powerful tool for researchers and developers in the field.
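One common reading of "dynamic sampling" with binary outcome rewards is to discard groups of rollouts whose rewards are all identical, since such groups carry no relative learning signal. The sketch below illustrates that idea only; it is an assumption about the general technique, not SimpleVLA-RL's actual code, and the function and task names are hypothetical.

```python
def keep_informative_groups(groups):
    """Keep only rollout groups whose binary rewards are mixed.

    `groups` maps a task id to a list of 0/1 outcome rewards, one per
    sampled trajectory. Groups where every trajectory failed (all 0) or
    every trajectory succeeded (all 1) are dropped.
    """
    return {
        task: rewards
        for task, rewards in groups.items()
        if 0 < sum(rewards) < len(rewards)
    }


# Hypothetical rollout outcomes for three tasks, four trajectories each.
groups = {
    "task_a": [1, 1, 1, 1],  # all succeed: dropped
    "task_b": [0, 1, 0, 1],  # mixed: kept
    "task_c": [0, 0, 0, 0],  # all fail: dropped
}
kept = keep_informative_groups(groups)
```

In a full pipeline, sampling would typically continue until enough mixed groups are collected to fill a training batch.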
semantic-segmentation-editor
Semantic Segmentation Editor is an open-source, web-based labeling tool designed for creating AI training datasets from both 2D bitmap images and 3D point clouds. Developed by Hitachi Automotive And Industry Lab, it is particularly useful for autonomous driving research. The tool supports image formats like JPG and PNG, and point cloud formats including ASCII, Binary, and Binary compressed. For bitmap images, it offers polygon drawing, a "magic" tool that follows contrast boundaries, polygon manipulation, cutting and expanding, and creation of contiguous polygons. For point clouds, it provides rotation, zooming, and point selection. The editor is built using Meteor, React, Paper.js, and three.js, and can be run via Docker Compose or from source.
SelfExSR
SelfExSR is a research code implementation for single image super-resolution, based on the paper "Single Image Super-Resolution from Transformed Self-Exemplars" (CVPR 2015). This algorithm stands out by achieving state-of-the-art performance in image super-resolution without requiring any external training dataset, complex feature extraction, or complicated learning algorithms. It operates by learning from transformed self-exemplars within the image itself. The repository provides the MATLAB source code, testing images for various datasets (Set5, Set14, Urban 100, BSD 100, Sun-Hays 80), and precomputed results for comparison with other state-of-the-art methods. While designed as educational code and not optimized for speed, users can adjust iteration numbers for a trade-off between speed and visual quality.
sphereface
SphereFace offers a comprehensive open-source implementation of the SphereFace algorithm, a deep hypersphere embedding method for face recognition. This tool provides a full pipeline covering face detection, alignment, and recognition, making it valuable for researchers and developers in computer vision. It includes detailed instructions for installation and usage, demonstrating how to train models on datasets like CASIA-WebFace and evaluate performance on LFW. The repository also features various network architectures, including SphereFace-20, and highlights its state-of-the-art verification performance in challenges like MegaFace. Additionally, it provides insights into the underlying mathematical concepts and practical considerations for training, such as gradient normalization and convergence difficulties, along with links to third-party re-implementations and related angular margin learning resources.
SSL4MIS
SSL4MIS (Semi Supervised Learning for Medical Image Segmentation) is a comprehensive resource for researchers and developers focusing on medical image analysis. It offers a curated collection of literature reviews and practical code implementations for semi-supervised learning techniques. The repository includes re-implementations of various semi-supervised methods such as Mean Teacher, Entropy Minimization, and FixMatch, adapted for medical image segmentation. Additionally, it supports a range of 2D and 3D backbone networks like UNet, nnUNet, and Swin-UNet. This project aims to establish a benchmark for semi-supervised medical image segmentation, fostering easier evaluation and fair comparison within the medical image computing community. It also covers active learning and source-free domain adaptation for medical image analysis.
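As an illustration of one listed method, Mean Teacher keeps a "teacher" model whose weights are an exponential moving average (EMA) of the "student" weights. The sketch below shows only that update rule on plain Python dictionaries; the function name, the parameter names, and the `alpha` value are assumptions, and the repository's own implementation operates on real network parameters and may differ in detail.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


# Hypothetical scalar "weights" standing in for network parameters.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 1.0}

# After one update the teacher moves 10% of the way toward the student.
teacher = ema_update(teacher, student, alpha=0.9)
```

During training, the student is additionally penalized when its predictions on unlabeled images disagree with the teacher's, which is what lets the unlabeled data contribute.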
synthetic-computer-vision
synthetic-computer-vision is a GitHub repository dedicated to tracking and organizing resources related to the use of synthetic images in computer vision research. It serves as a valuable hub for researchers, offering a curated list of synthetic datasets such as SunCG, Minos, and Synthia, alongside various tools like AirSim, CARLA, and UnrealCV. The repository also includes a collection of relevant academic publications, categorized by year, with links to papers, code, and project pages. Users are encouraged to contribute by adding missing works or updating existing information through pull requests, making it a collaborative and up-to-date resource for the computer vision community.
Video Depth Anything
Video Depth Anything is an AI tool that takes an input video and generates a corresponding depth video. It visualizes the depth estimated for each frame, making it suitable for applications such as 3D video reconstruction, visual effects creation, and AI research. Users can upload their video files and customize settings like resolution and frames per second (FPS). The tool is hosted on Hugging Face Spaces. Its primary function is to provide a frame-by-frame depth map as a building block for more advanced video analysis and manipulation.
TheWell
TheWell is a data visualization tool hosted on Hugging Face Spaces, designed for exploring and visualizing physics simulation datasets. Users can select a dataset, a specific field within that dataset, and a file to view the corresponding data. A key feature is the ability to adjust time steps, which is particularly useful for analyzing dynamic fields within the simulations. This tool is ideal for researchers, students, and data scientists working with physics simulation data, offering an intuitive interface for data exploration and analysis directly within the Hugging Face ecosystem. It simplifies the process of interacting with complex scientific datasets.
Zero Shot Text Classification
Zero Shot Text Classification is an AI tool hosted on Hugging Face Spaces by datasciencedojo, designed for classifying text into predefined categories without requiring specific training data for those categories. Users can easily input a piece of text and provide a list of candidate labels or categories. The tool then processes the input and returns a score for each category, indicating how well the text fits into that particular classification. This makes it a highly flexible and efficient solution for quick text categorization tasks, eliminating the need for extensive dataset preparation and model training.
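Zero-shot classifiers are often built on a natural language inference (NLI) model: each candidate label is turned into a hypothesis, the model produces an entailment logit per label, and a softmax over those logits gives the per-label scores. The sketch below shows only that last scoring step with made-up logits. Whether this particular Space works this way is an assumption, and the function name and numbers are hypothetical.

```python
import math


def label_scores(entailment_logits):
    """Softmax over per-label entailment logits.

    Returns (label, score) pairs sorted from best to worst fit.
    """
    peak = max(entailment_logits.values())  # subtract max for stability
    exps = {label: math.exp(v - peak) for label, v in entailment_logits.items()}
    total = sum(exps.values())
    scores = {label: e / total for label, e in exps.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical logits an NLI model might assign to hypotheses such as
# "This text is about {label}." for one input text.
logits = {"sports": 2.0, "politics": 0.5, "cooking": -1.0}
ranked = label_scores(logits)
```

Because the scores are normalized across the candidate labels, they sum to one and are only meaningful relative to the labels supplied.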
Budgerigar Gender Determination
Budgerigar Gender Determination is an AI tool hosted on Hugging Face designed to automatically identify the gender of budgerigars. Users can upload photos or videos of their birds, and the application will analyze the cere color to determine gender. The tool then draws labeled boxes around each detected bird, indicating its gender. It offers adjustable confidence and detection settings, allowing users to fine-tune the analysis. This free tool provides a quick and easy method for budgerigar owners, bird enthusiasts, and researchers to determine the gender of their birds without manual inspection.
Awesome-Self-Supervised-Papers
Awesome-Self-Supervised-Papers is a comprehensive, open-source repository on GitHub dedicated to collecting and organizing research papers in the fields of self-supervised learning and representation learning. It serves as a valuable resource for researchers and practitioners, offering a curated list of academic publications. The repository is regularly updated with new papers, including those focusing on self-supervised learning with distillation and dense prediction. It categorizes papers by areas such as Computer Vision (CV) pretraining, contrastive learning, image transformation, self-supervised learning with knowledge distillation, and various other methods, providing details like conference/journal, ImageNet accuracy, and other performance metrics where applicable. Contributions to the paper bank are welcomed.