Shypd.ai

Research & Education

Browsing page 153 of AI tools for Academic Research in Research & Education. Sorted by confidence score — our independent quality rating.

DPIR

54%

DPIR (Deep Plug-and-Play Image Restoration) is an open-source project implemented in PyTorch, focusing on advanced image restoration techniques. It leverages a deep denoiser prior within a model-based framework to address various inverse problems in image processing. The tool excels in tasks such as deblurring, super-resolution, denoising, and demosaicing, offering performance that often surpasses state-of-the-art model-based methods and competes with learning-based approaches. DPIR is particularly notable for its DRUNet denoiser, which demonstrates robust performance even on extremely high, unseen noise levels, making it a powerful solution for challenging image restoration scenarios.
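The idea of using a denoiser as a prior inside a model-based solver can be illustrated with a toy half-quadratic splitting loop on a 1D signal. This is only a sketch under stated assumptions: the observation model is plain additive noise (no blur or downsampling), a three-tap box filter stands in for the learned DRUNet denoiser, and the names `pnp_hqs`, `box_denoiser`, and `mu_schedule` are hypothetical and do not come from the DPIR code.

```python
def box_denoiser(signal):
    """Stand-in denoiser (assumption): 3-tap moving average with clamped edges."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def pnp_hqs(y, denoiser, mu_schedule):
    """Alternate a closed-form data step with a denoising (prior) step.

    y           : observed noisy signal, assumed to be x + noise
    denoiser    : any function mapping a signal to a cleaner signal
    mu_schedule : penalty weights, one per iteration (hypothetical values)
    """
    z = list(y)  # initialize the auxiliary variable with the observation
    for mu in mu_schedule:
        # Data step: minimizes ||y - x||^2 + mu * ||x - z||^2 element-wise.
        x = [(yi + mu * zi) / (1.0 + mu) for yi, zi in zip(y, z)]
        # Prior step: the denoiser plays the role of the regularizer.
        z = denoiser(x)
    return z


if __name__ == "__main__":
    noisy = [0.0, 1.0] * 4
    print(pnp_hqs(noisy, box_denoiser, mu_schedule=[0.5, 1.0, 2.0]))
```

Swapping `box_denoiser` for a stronger denoiser changes the prior without touching the data step, which is the "plug-and-play" part. For deblurring or super-resolution the data step would need a different closed form, which this sketch does not cover.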

mvpose

54%

mvpose is an open-source project providing code for fast and robust multi-person 3D pose estimation from multiple views. Developed by zju3dv, it is based on research published in CVPR 2019 and T-PAMI 2021. The repository covers setting up a Python environment, compiling the required backend libraries, and preparing models and datasets. It supports datasets like Shelf and CampusSeq1, with detailed instructions for generating camera parameters. Users can run demos and evaluate performance on these datasets, with the option to speed up evaluation by saving predicted 2D poses and heatmaps. The project builds on components from Light-Head R-CNN, Cascaded Pyramid Network, and CamStyle, making it a useful resource for computer vision research.

NTIRE2017

54%

NTIRE2017 is an open-source project offering a Torch implementation of "Enhanced Deep Residual Networks for Single Image Super-Resolution." Developed by Team SNU_CVLab, it was recognized with the Best Paper Award at the CVPR 2017 Workshop (2nd NTIRE). The repository includes detailed model architectures (EDSR, MDSR), NTIRE2017 Super-resolution Challenge results, and demo and training code. Users can access trained models, information on datasets like DIV2K and Flickr2K, and super-resolution examples. The code is based on Facebook's Torch implementation of ResNet and also provides a PyTorch version for some models. It's designed for researchers and developers working on image restoration and enhancement, particularly in the field of single image super-resolution.

f2-nerf

54%

f2-nerf is an open-source project designed for fast neural radiance field (NeRF) training, specifically optimized for scenarios involving free camera trajectories. Built primarily on LibTorch, this tool provides a robust framework for efficient 3D scene reconstruction and novel view synthesis. Users can train F2-NeRF on custom data, including images processed with COLMAP or hloc, and generate camera poses. It also includes scripts for rendering test images and creating render paths by interpolating input camera poses. The project leverages several powerful libraries such as tiny-cuda-nn for fast MLP training, happly for PLY I/O, and eigen for linear algebra, making it a comprehensive solution for advanced NeRF applications.
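Creating a render path by interpolating input camera poses can be sketched as follows. This is a minimal illustration, not the project's script: it assumes each pose is a position plus a unit quaternion in (w, x, y, z) order, interpolates positions linearly and rotations with spherical linear interpolation, and uses hypothetical names (`slerp`, `interpolate_path`).

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation is stable.
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def interpolate_path(poses, steps_per_segment):
    """Densify a list of (position, quaternion) key poses into a render path."""
    path = []
    for (p0, q0), (p1, q1) in zip(poses, poses[1:]):
        for i in range(steps_per_segment):
            t = i / steps_per_segment
            position = tuple(a + t * (b - a) for a, b in zip(p0, p1))
            path.append((position, slerp(q0, q1, t)))
    path.append(poses[-1])  # include the final key pose exactly once
    return path


if __name__ == "__main__":
    quarter_turn = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    keys = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)), ((2.0, 0.0, 0.0), quarter_turn)]
    for pose in interpolate_path(keys, steps_per_segment=4):
        print(pose)
```

Linear interpolation of positions gives a piecewise-straight path; a spline through the key positions would give smoother camera motion at the cost of more code.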

FSGS

54%

FSGS, short for "Real-Time Few-Shot View Synthesis using Gaussian Splatting," is a method presented at ECCV 2024. It generates new views of a scene from a small number of input images, using Gaussian Splatting for real-time rendering. The repository documents environment setup with Conda and CUDA 11.7. Users prepare data by reconstructing sparse-view inputs with SfM and dense stereo matching in COLMAP, with support for datasets like LLFF and MipNeRF-360. FSGS offers clear instructions for training models with varying view counts, rendering images, and evaluating model performance, making it a valuable resource for researchers and developers in computer vision and graphics.

pointnerf

54%

pointnerf is an open-source implementation of Point-NeRF, a method for modeling radiance fields using neural 3D point clouds with associated neural features. This tool enables efficient rendering by aggregating neural point features near scene surfaces through a ray marching-based pipeline. A key differentiator is its ability to be initialized via direct inference of a pre-trained deep network to produce a neural point cloud, which can then be finetuned for visual quality surpassing NeRF with significantly faster training times. pointnerf also integrates with other 3D reconstruction methods and manages errors and outliers through a novel pruning and growing mechanism, making it suitable for various research applications in computer vision and graphics.

simple-HRNet

54%

simple-HRNet is an unofficial yet fully compatible implementation of the Deep High-Resolution Representation Learning for Human Pose Estimation paper, built with PyTorch. This tool simplifies the process of human pose estimation, offering compatibility with official pre-trained weights and delivering results consistent with the original implementation. It supports both Windows and Linux environments and includes features like multi-GPU inference, options for retrieving YOLO bounding boxes and HRNet heatmaps, and multi-person support with YOLOv3, YOLOv3-tiny, or YOLOv5. The repository also provides a live demo, scripts for training and testing on datasets like COCO, and support for TensorRT, making it a versatile solution for developers and researchers in computer vision.

instant-ngp

54%

instant-ngp is an open-source implementation of four neural graphics primitives: neural radiance fields (NeRF), signed distance functions (SDFs), neural images, and neural volumes. It allows users to train and render MLPs with multiresolution hash input encoding using the tiny-cuda-nn framework. The tool features an interactive GUI with comprehensive controls, including a VR mode, snapshot saving/loading, a camera path editor for videos, NeRF->Mesh and SDF->Mesh conversion, and camera pose/lens optimization. It supports various NeRF-compatible datasets and provides options for both Windows and Linux users, with Python bindings available for automated experiments.
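The multiresolution hash input encoding can be sketched at the indexing level: each level has its own grid resolution, and the integer coordinates of a voxel corner are hashed into a fixed-size feature table. The sketch below is an assumption-laden illustration in plain Python, not the project's CUDA code. The XOR-of-primes hash and the geometric growth of resolution follow the published description of the method; the function names and the parameter values in the example are hypothetical, and the trilinear interpolation of the looked-up features is omitted.

```python
import math

# Per-dimension multipliers for the spatial hash (first one is 1).
PRIMES = (1, 2654435761, 805459861)


def level_resolution(level, n_levels, n_min, n_max):
    """Grid resolution at a level, growing geometrically from n_min toward n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return int(math.floor(n_min * growth ** level))


def hash_index(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in a table of table_size entries."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


def corner_indices(point, resolution, table_size):
    """Table slots of the 8 voxel corners around a point in [0, 1]^3."""
    base = [int(math.floor(c * resolution)) for c in point]
    return [
        hash_index(base[0] + dx, base[1] + dy, base[2] + dz, table_size)
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]


if __name__ == "__main__":
    table_size = 2 ** 14
    for level in range(4):
        res = level_resolution(level, n_levels=4, n_min=16, n_max=128)
        print(level, res, corner_indices((0.3, 0.6, 0.9), res, table_size))
```

Hash collisions are expected at fine levels, where the grid has more corners than the table has slots; the method relies on training to resolve them rather than on explicit collision handling.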

SplaTAM

54%

SplaTAM is a cutting-edge system designed for Splatting, Tracking, and Mapping 3D Gaussians, enabling dense RGB-D SLAM. This tool, presented at CVPR 2024, is particularly useful for robotics and computer vision applications requiring real-time environmental understanding. Users can capture their own environments using an iPhone or LiDAR-equipped Apple device with the NeRFCapture app, and then process the data either online or offline. SplaTAM supports interactive rendering of reconstructions and allows for the export of splats to .ply files for visualization in external viewers like SuperSplat and PolyCam. It also facilitates 3D Gaussian Splatting on reconstructions and datasets with ground truth poses, making it a versatile tool for researchers and developers in the field.

PointRCNN

54%

PointRCNN is an open-source 3D object detector that directly generates accurate 3D box proposals from raw point cloud data in a bottom-up manner. It then refines these proposals using a bin-based 3D box regression loss. This tool was the first two-stage 3D object detector to use only raw point cloud as input, achieving state-of-the-art performance on the KITTI dataset at the time of its submission. PointRCNN supports features like multiple GPUs for training, GPU version rotated NMS, and faster PointNet++ inference and training. It is implemented in Python with PyTorch 1.0 and TensorboardX, making it suitable for researchers and developers in autonomous systems and computer vision.
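A bin-based regression target can be sketched as follows: instead of regressing an offset directly, the offset is split into a discrete bin index (a classification target) and a small residual inside that bin (a regression target). This is a minimal sketch under assumptions, not the repository's implementation: the function names, the search range, the bin size, and the normalization of the residual by the bin size are all illustrative choices.

```python
import math


def encode_bin_target(offset, search_range, bin_size):
    """Split an offset in [-search_range, search_range) into (bin index, residual).

    The residual is measured from the bin center and divided by bin_size
    (an assumed normalization), so it lies in [-0.5, 0.5) for in-range offsets.
    """
    shifted = offset + search_range
    n_bins = int(2 * search_range / bin_size)
    bin_idx = int(math.floor(shifted / bin_size))
    bin_idx = min(max(bin_idx, 0), n_bins - 1)  # clamp out-of-range offsets
    center = bin_idx * bin_size + bin_size / 2.0
    residual = (shifted - center) / bin_size
    return bin_idx, residual


def decode_bin_target(bin_idx, residual, search_range, bin_size):
    """Invert encode_bin_target for in-range offsets."""
    center = bin_idx * bin_size + bin_size / 2.0
    return center + residual * bin_size - search_range


if __name__ == "__main__":
    bin_idx, residual = encode_bin_target(0.7, search_range=3.0, bin_size=0.5)
    print(bin_idx, residual)
    print(decode_bin_target(bin_idx, residual, search_range=3.0, bin_size=0.5))
```

In a loss built on this encoding, the bin index would typically be trained with a classification loss and the residual with a regression loss on the selected bin only.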

animatable_nerf

54%

animatable_nerf is an open-source research tool that provides the implementation for "Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos," a paper accepted to TPAMI 2024 and ICCV 2021. This tool allows researchers to generate realistic avatars from video footage by leveraging animatable neural fields. It supports several configurations: vanilla Animatable NeRF, versions with neural blend weight fields replaced by displacement fields, and versions where the canonical NeRF model is replaced with a neural surface field (SDF output). The repository includes evaluation frameworks for reconstruction quality comparison and provides access to datasets like Mobile-Stage and SyntheticHuman++ for further research and development in neural rendering and 3D human body modeling.

SoundMind

54%

SoundMind is an innovative project that provides a rule-based reinforcement learning (RL) algorithm specifically designed to endow audio language models (ALMs) with deep bimodal reasoning abilities. It is built upon the Audio Logical Reasoning (ALR) dataset, which comprises 6,446 text-audio annotated samples tailored for complex reasoning tasks. This resource enables the training of ALMs to perform sophisticated logical reasoning across both audio and textual modalities. The repository offers the official implementation, dataset download links, environment setup instructions, and details for RL-training and evaluation, making it a valuable tool for researchers and developers in the field of audio-language processing.

visual_anagrams

54%

visual_anagrams is an open-source tool specifically designed for generating multi-view optical illusions. It leverages advanced diffusion models to create these unique visual effects. The tool offers readily available code, making it accessible for hands-on experimentation. It also includes Colab notebooks, catering to both free and Pro tier users, to facilitate the creation of visual anagrams and exploration of factorized diffusion techniques. This makes it a valuable resource for those interested in the intersection of AI and visual art.

Conference-Accepted-Paper-List

54%

Conference-Accepted-Paper-List is a GitHub repository designed to centralize information on accepted papers from a variety of conferences within the fields of artificial intelligence, machine learning, and robotics. The repository provides convenient, quick links directly to conference submission notifications and the lists of accepted papers. While serving as a useful starting point for researchers and academics, it also recommends leveraging more comprehensive academic search engines like dblp and Aminer for in-depth research and broader searches.

MVSGaussian

54%

MVSGaussian is an open-source project designed for efficient 3D reconstruction using Gaussian Splatting from multi-view stereo (MVS) data. This tool can reconstruct unseen scenes from sparse views in a single forward pass, providing high-quality initialization for rapid training and real-time rendering. It leverages MVS to encode geometry-aware Gaussian representations and decodes them into Gaussian parameters. MVSGaussian also features a hybrid Gaussian rendering approach for novel view synthesis and a multi-view geometric consistent aggregation strategy to effectively initialize per-scene optimization. Compared to NeRF-based methods, MVSGaussian achieves superior view synthesis quality with reduced training computational costs and real-time rendering speeds, making it valuable for computer vision research and 3D modeling applications.

mmf

54%

mmf is a modular framework developed by Facebook AI Research (FAIR) for conducting vision and language multimodal research. It offers reference implementations of state-of-the-art vision and language models, making it a valuable resource for researchers. The framework is built on PyTorch, supports distributed training, and is designed to be un-opinionated, scalable, and fast. mmf can be used to bootstrap new vision and language multimodal research projects and serves as a starter codebase for challenges involving vision and language datasets, such as The Hateful Memes, TextVQA, TextCaps, and VQA challenges. It was formerly known as Pythia.

BioMedIA

54%

BioMedIA is an AI tool hosted on Hugging Face Spaces, designed to facilitate the exploration of AI applications within the biomedical field. While the live website indicates a build error, its intended purpose is to serve as a platform for understanding how AI can be applied in biomedical research and educational contexts. The tool is available for free, making it accessible for a wide range of users interested in the intersection of AI and biomedicine. It is suitable for researchers, students, and healthcare professionals who wish to delve into the capabilities and potential of AI in this specialized domain.

SC-GS

54%

SC-GS provides code for Sparse-Controlled Gaussian Splatting, designed for editable dynamic scenes. This open-source tool allows users to effortlessly edit and customize their digital assets through interactive features. It represents motion using sparse control points, which drive 3D Gaussians for high-fidelity rendering. The approach supports both dynamic view synthesis and motion editing, making it versatile for various applications. Recent updates include support for editing static Gaussians from .ply files, improved handling of real-world static objects, and video rendering with interpolation of editing results. It offers two ARAP deformation strategies for motion editing: iterative deformation and deformation from Laplacian initialization, giving users flexibility in achieving desired effects.

USRNet

54%

USRNet is a deep unfolding network for image super-resolution, implementing a model described in a CVPR 2020 paper. This PyTorch-based tool provides code and models for training and testing image super-resolution algorithms. It leverages both learning-based and model-based methods, offering the flexibility of model-based approaches to super-resolve blurry and noisy images across different scale factors, blur kernels, and noise levels using a single unified model. Key features include a data module for clearer HR estimation, a prior module for cleaner HR estimation, and a hyper-parameter module to control outputs. It supports various degradation models, including bicubic degradation and deblurring, and demonstrates strong generalizability to different kernel sizes.

whatlanguage

54%

whatlanguage is a Ruby library designed for efficient text language detection. It leverages bloom filters to achieve high speed and memory efficiency, making it suitable for processing larger text blocks like blog posts or comments. The library supports a wide array of languages including Dutch, English, Farsi, French, German, Italian, Pinyin, Swedish, Portuguese, Russian, Arabic, Finnish, Greek, Hebrew, Hungarian, Korean, Norwegian, Polish, and Spanish. While effective for longer texts, it is noted to perform poorly on very short or Twitter-esque content. The project, initially built in 2007, has received minor updates to ensure compatibility with modern Ruby implementations, though the core algorithms remain largely unchanged.
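Language detection with bloom filters can be sketched as one filter per language, each loaded with that language's word list, with the text assigned to the language whose filter matches the most words. The sketch below is in Python rather than Ruby and is not the library's API: the class, the function names, the filter size, and the tiny word lists are all hypothetical and chosen only to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, word):
        # Derive n_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )


def build_filters(wordlists):
    """Build one filter per language from {language: iterable of words}."""
    filters = {}
    for language, words in wordlists.items():
        bloom = BloomFilter()
        for word in words:
            bloom.add(word)
        filters[language] = bloom
    return filters


def detect(text, filters):
    """Return the language whose filter matches the most words in text."""
    scores = {language: 0 for language in filters}
    for word in text.lower().split():
        for language, bloom in filters.items():
            if word in bloom:
                scores[language] += 1
    return max(scores, key=scores.get)


if __name__ == "__main__":
    filters = build_filters({
        "english": ["the", "and", "is", "of", "this"],
        "french": ["le", "la", "et", "est", "de", "ceci"],
    })
    print(detect("this is the end of the road", filters))
```

The sketch also shows why very short inputs are hard for this approach: with only a few words, a single shared word or one false positive can decide the result.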

WildGS-SLAM

54%

WildGS-SLAM is an open-source research tool for monocular Gaussian Splatting SLAM in dynamic environments. Presented at CVPR 2025 (the Conference on Computer Vision and Pattern Recognition), it tracks camera trajectories and reconstructs 3D Gaussian maps of the static parts of a scene from monocular video, even when the footage is captured in the wild with dynamic distractors. Dynamic components are removed so that the output is a clean static reconstruction. It supports various datasets including Wild-SLAM Mocap, Wild-SLAM iPhone, Bonn Dynamic, and TUM RGB-D, and also allows users to integrate their own custom datasets. WildGS-SLAM provides functionality for camera pose evaluation and novel view synthesis, making it a valuable resource for researchers in the field.

openbr

54%

OpenBR (Open Source Biometrics) is a comprehensive toolkit designed for developers and researchers working in the field of biometrics, particularly face recognition. Hosted on GitHub, it offers an open-source solution for building and experimenting with biometric systems. The platform provides the necessary tools and functionalities to implement various biometric algorithms, making it a valuable resource for academic research, prototyping, and custom application development. Users can clone the repository, check out specific release tags, and build the software following detailed instructions for their operating system. This open-source nature fosters community contributions and allows for transparent development in biometric identification.

nerfmm

54%

nerfmm is an open-source implementation of Neural Radiance Fields (NeRF) designed to reconstruct 3D scenes and render novel views even when camera parameters are unknown. This tool jointly estimates camera poses, focal lengths, and the NeRF model, offering a robust solution for 3D reconstruction. It supports various datasets, including the LLFF dataset and a custom Blender Forward Facing (BLEFF) dataset, which is specifically designed for evaluating camera parameter estimation accuracy and image rendering quality under varying pose perturbations. nerfmm provides scripts for training from scratch, refining pre-trained models, and evaluating image rendering quality, novel view synthesis, and 3D pose visualization. It is particularly useful for researchers and developers in computer vision working on advanced 3D reconstruction and neural rendering techniques.

voc-dpm

54%

voc-dpm is an open-source object detection system, specifically voc-release5, developed by Ross Girshick. It implements object detection based on mixtures of deformable part models (DPMs) and supports both binary latent SVM and weak-label structural SVM (WL-SSVM) for learning. The system includes pretrained models for PASCAL and INRIA Person datasets, along with features like context rescoring and the star-cascade detection algorithm. Implemented primarily in MATLAB with MEX C++ helper functions for efficiency, it requires MATLAB, GCC, and at least 4GB of memory. The GitHub repository serves as a code release, with the author recommending checking their website for the latest, more thoroughly tested tarball.