AI Agents & Automation
composio
Composio gives AI agents and large language models (LLMs) access to over 100 integrations through function calling. It provides software development kits (SDKs) for both Python and JavaScript, so developers can extend what their AI agents can do. Its core focus is seamless integration and continuous skill evolution for AI agents, letting them interact with a wide range of external services and applications.
nerfacc
nerfacc is a PyTorch-based acceleration toolbox specifically designed for Neural Radiance Fields (NeRFs), optimizing both training and inference processes. It emphasizes efficient volumetric sampling using computationally cheap estimators to discover surfaces, making it universal and plug-and-play for most NeRF models. Users can integrate nerfacc with minimal code modifications by defining `sigma_fn` for density computation and `rgb_sigma_fn` for color and density, enabling significant speedups. The toolbox supports various NeRF papers and offers a pure Python interface with flexible APIs. Installation is straightforward via PyPI or source, with pre-built wheels available for major PyTorch and CUDA combinations.
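To illustrate the two callbacks mentioned above, here is a minimal sketch in plain Python. It does not call nerfacc: the argument names (`t_starts`, `t_ends`, `ray_indices`), the toy scene, and the `composite` helper are all assumptions made for illustration. The idea it shows is that a density-only function is cheap enough to use when searching for surfaces, while the color-and-density function is only needed for the final compositing step.

```python
import math

# Hypothetical toy scene: every ray sees a thin orange shell near depth 2.0.
SHELL_DEPTH = 2.0
SHELL_COLOR = (1.0, 0.5, 0.0)


def sigma_fn(t_starts, t_ends, ray_indices):
    """Return one density per sample (a cheap query with no color)."""
    mids = [(s + e) / 2.0 for s, e in zip(t_starts, t_ends)]
    return [50.0 * math.exp(-((m - SHELL_DEPTH) ** 2) / 0.02) for m in mids]


def rgb_sigma_fn(t_starts, t_ends, ray_indices):
    """Return one color and one density per sample."""
    sigmas = sigma_fn(t_starts, t_ends, ray_indices)
    rgbs = [SHELL_COLOR for _ in sigmas]
    return rgbs, sigmas


def composite(t_starts, t_ends, ray_indices, n_rays):
    """Alpha-composite samples per ray; samples must be sorted near to far."""
    rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices)
    colors = [[0.0, 0.0, 0.0] for _ in range(n_rays)]
    transmittance = [1.0] * n_rays
    for s, e, r, rgb, sigma in zip(t_starts, t_ends, ray_indices, rgbs, sigmas):
        alpha = 1.0 - math.exp(-sigma * (e - s))  # opacity of this segment
        weight = transmittance[r] * alpha         # contribution to the pixel
        for c in range(3):
            colors[r][c] += weight * rgb[c]
        transmittance[r] *= 1.0 - alpha           # light left for later samples
    opacities = [1.0 - t for t in transmittance]
    return colors, opacities


# Two rays, ten samples each, covering depths 1.0 to 3.0.
edges = [1.0 + 0.2 * i for i in range(11)]
t_starts = edges[:-1] * 2
t_ends = edges[1:] * 2
ray_indices = [0] * 10 + [1] * 10
colors, opacities = composite(t_starts, t_ends, ray_indices, n_rays=2)
```

In a real model the two functions would query a radiance field network rather than a closed-form shell.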
FCOS
FCOS (Fully Convolutional One-Stage Object Detection) is an open-source project that provides an implementation of the FCOS algorithm for object detection. This tool is designed to completely avoid the complex computations and hyper-parameters associated with anchor boxes, offering a simpler and more efficient approach. It achieves better performance than Faster R-CNN, with significantly faster training and inference times. FCOS supports various backbones including ResNet, ResNeXt, and MobileNet, and offers models with state-of-the-art performance, reaching up to 49.0% AP on COCO test-dev. The project includes detailed instructions for installation, testing, and training, making it suitable for researchers and developers working on computer vision applications.
FAST-LIVO2
FAST-LIVO2 is an efficient and accurate open-source LiDAR-inertial-visual fusion localization and mapping system. It is designed for real-time 3D reconstruction and onboard robotic localization, particularly in severely degraded environments. The system integrates data from LiDAR, inertial measurement units, and visual sensors to provide robust odometry. Key features include its direct fusion approach, support for resource-constrained platforms, and an associated dataset for evaluation. The project also provides resources for building a hard-synchronized handheld device, including CAD files and source code, making it a comprehensive solution for developers working on autonomous navigation and robotics.
pointnerf
pointnerf is an open-source implementation of Point-NeRF, a method for modeling radiance fields using neural 3D point clouds with associated neural features. This tool enables efficient rendering by aggregating neural point features near scene surfaces through a ray marching-based pipeline. A key differentiator is its ability to be initialized via direct inference of a pre-trained deep network to produce a neural point cloud, which can then be finetuned for visual quality surpassing NeRF with significantly faster training times. pointnerf also integrates with other 3D reconstruction methods and manages errors and outliers through a novel pruning and growing mechanism, making it suitable for various research applications in computer vision and graphics.
simple-HRNet
simple-HRNet is an unofficial yet fully compatible implementation of the Deep High-Resolution Representation Learning for Human Pose Estimation paper, built with PyTorch. This tool simplifies the process of human pose estimation, offering compatibility with official pre-trained weights and delivering results consistent with the original implementation. It supports both Windows and Linux environments and includes features like multi-GPU inference, options for retrieving YOLO bounding boxes and HRNet heatmaps, and multi-person support with YOLOv3, YOLOv3-tiny, or YOLOv5. The repository also provides a live demo, scripts for training and testing on datasets like COCO, and support for TensorRT, making it a versatile solution for developers and researchers in computer vision.
SplaTAM
SplaTAM (Splat, Track & Map 3D Gaussians) is a system for dense RGB-D SLAM, presented at CVPR 2024. It is particularly useful for robotics and computer vision applications requiring real-time environmental understanding. Users can capture their own environments using an iPhone or LiDAR-equipped Apple device with the NeRFCapture app, and then process the data either online or offline. SplaTAM supports interactive rendering of reconstructions and allows for the export of splats to .ply files for visualization in external viewers like SuperSplat and PolyCam. It also facilitates 3D Gaussian Splatting on reconstructions and datasets with ground truth poses, making it a versatile tool for researchers and developers in the field.
YOLOv3
YOLOv3 is an open-source Keras implementation of the YOLOv3 object detection algorithm, designed for identifying objects within images and videos. This tool requires specific dependencies including OpenCV 3.4, Python 3.6, TensorFlow-gpu 1.5.0, and Keras 2.1.3. Users can quickly get started by downloading official YOLOv3 weights and converting them to a Keras H5 file using the provided `yad2k.py` script. The tool demonstrates improved classification capabilities over its predecessor, YOLOv2. While it currently supports object detection, future development plans include training the model for broader applications. It is a valuable resource for developers and data scientists working on computer vision tasks.
SoundMind
SoundMind is an innovative project that provides a rule-based reinforcement learning (RL) algorithm specifically designed to endow audio language models (ALMs) with deep bimodal reasoning abilities. It is built upon the Audio Logical Reasoning (ALR) dataset, which comprises 6,446 text-audio annotated samples tailored for complex reasoning tasks. This resource enables the training of ALMs to perform sophisticated logical reasoning across both audio and textual modalities. The repository offers the official implementation, dataset download links, environment setup instructions, and details for RL-training and evaluation, making it a valuable tool for researchers and developers in the field of audio-language processing.
voicetree
Voicetree is an open-source spatial Integrated Development Environment (IDE) specifically built for orchestrating multiple AI agents. It features an interactive graph-view interface, enabling users to work directly within a visual representation of their AI agent ecosystem. Within this environment, nodes can serve various purposes, including representing markdown notes or acting as terminal-based AI agents such as Claude Code and Gemini. A key capability of Voicetree is that agents can spawn sub-agents and access nearby nodes to gather context, facilitating complex AI workflows and interactions.
DiffusionHub
DiffusionHub is a cloud-based platform designed for generating AI-powered images and videos through stable diffusion. It boasts a fast server launch time of just 10 seconds and provides users with 300GB of storage. The platform supports well-known web user interfaces such as Automatic1111, ComfyUI, and Kohya, making it accessible for a wide range of users, regardless of their technical expertise. It aims to offer a reliable and efficient environment for AI content creation.
MassGen
MassGen is an open-source, terminal-based multi-agent scaling system. It is designed to autonomously orchestrate advanced AI models and agents, enabling them to work together effectively. The system facilitates collaboration and reasoning among these AI entities to tackle complex problems and generate high-quality outcomes. By coordinating AI workflows, MassGen aims to enhance problem-solving capabilities through a scalable and integrated approach.
mmf
mmf is a modular framework developed by Facebook AI Research (FAIR) for conducting vision and language multimodal research. It offers reference implementations of state-of-the-art vision and language models, making it a valuable resource for researchers. The framework is built on PyTorch, supports distributed training, and is designed to be un-opinionated, scalable, and fast. mmf can be used to bootstrap new vision and language multimodal research projects and serves as a starter codebase for challenges involving vision and language datasets, such as The Hateful Memes, TextVQA, TextCaps, and VQA challenges. It was formerly known as Pythia.
motpy
motpy is a Python library designed for multi-object tracking using the tracking-by-detection paradigm. It offers a straightforward yet robust baseline for developers to implement object tracking without needing to build the entire algorithmic stack from scratch. Key features include IOU and optional feature similarity matching, Kalman filters for modeling object trackers, and configurable system orders for object position and size. The library is optimized for performance, achieving real-time tracking even on resource-constrained devices like the Raspberry Pi. It supports various use cases, from synthetic 2D tracking to detecting and tracking objects in videos and webcam face tracking, making it a versatile tool for computer vision applications.
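The IOU matching step in tracking-by-detection can be sketched as follows. This is not motpy's API: the function names, the `(x_min, y_min, x_max, y_max)` box layout, the `min_iou` threshold, and the greedy assignment are assumptions chosen to keep the example short. A production tracker may use an optimal assignment instead of a greedy one.

```python
def area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def match(tracks, detections, min_iou=0.1):
    """Greedily pair tracks with detections, highest IOU first.

    Returns a sorted list of (track_index, detection_index) pairs.
    Unmatched tracks and detections are simply left out.
    """
    candidates = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, pairs = set(), set(), []
    for score, ti, di in candidates:
        if score < min_iou:
            break  # remaining candidates overlap too little to count
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        pairs.append((ti, di))
    return sorted(pairs)


tracks = [(0, 0, 2, 2), (10, 10, 12, 12)]
detections = [(10, 10, 12, 13), (0, 0, 2, 2), (50, 50, 51, 51)]
pairs = match(tracks, detections)
```

Here the third detection overlaps no track, so it stays unmatched and would typically start a new track.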
stack-chan
stack-chan is an open-source, JavaScript-driven robot built on the M5Stack. This super-kawaii robot can display a range of cute faces and expressions, including happy, angry, and sad. Users can customize the robot's face and expressions and add various M5Units for extra functionality. The project provides all necessary components, including firmware source code, stereolithography (STL) files for the case, and schematics with board layout data. It supports driving serial (TTL) and PWM servos and encourages users to develop their own applications. The project is distributed under the Apache License 2.0 and is open to developers and hobbyists alike.
safe-control-gym
safe-control-gym offers physics-based CartPole and Quadrotor Gym environments built using PyBullet, featuring symbolic a priori dynamics powered by CasADi. This framework is designed for learning-based control, as well as model-free and model-based reinforcement learning (RL). It includes symbolic safety constraints and implements input, parameter, and dynamics disturbances to rigorously test the robustness and generalizability of various control approaches. The tool provides a unified benchmark suite for safe learning-based control and RL in robotics, supporting a range of implemented controllers like PID, LQR, iLQR, MPC, SAC, and PPO, alongside safety filters such as MPSC and CBF. It also offers performance comparisons against other popular Gym environments.
openbr
OpenBR (Open Source Biometrics) is a comprehensive toolkit designed for developers and researchers working in the field of biometrics, particularly face recognition. Hosted on GitHub, it offers an open-source solution for building and experimenting with biometric systems. The platform provides the necessary tools and functionalities to implement various biometric algorithms, making it a valuable resource for academic research, prototyping, and custom application development. Users can clone the repository, check out specific release tags, and build the software following detailed instructions for their operating system. This open-source nature fosters community contributions and allows for transparent development in biometric identification.
rektor-db
Rektor-db is presented as a conceptual vector database project, currently in its earliest stages of development. The project is explicitly described as "pre-revenue, pre-code, and pre-vision," indicating that it lacks a functional product, a defined business model, and a clear strategic direction. The primary objective stated is to attract investors to fund its future development. As of now, there are no features, pricing, or use cases available, as the project is purely a concept seeking financial backing to move forward. It is hosted on GitHub, suggesting an intention for open-source development once funding is secured.
voc-dpm
voc-dpm is an open-source object detection system, specifically voc-release5, developed by Ross Girshick. It implements object detection based on mixtures of deformable part models (DPMs) and supports both binary latent SVM and weak-label structural SVM (WL-SSVM) for learning. The system includes pretrained models for PASCAL and INRIA Person datasets, along with features like context rescoring and the star-cascade detection algorithm. Implemented primarily in MATLAB with MEX C++ helper functions for efficiency, it requires MATLAB, GCC, and at least 4GB of memory. The GitHub repository serves as a code release, with the author recommending checking their website for the latest, more thoroughly tested tarball.
Yolov7-tracker
Yolov7-tracker is a comprehensive toolbox designed for multi-object tracking, implementing the tracking-by-detection paradigm. It supports a wide range of YOLO detectors, from YOLOX to YOLO v12 by Ultralytics, and integrates numerous advanced trackers including SORT, DeepSORT, ByteTrack, BoT-SORT, OC-SORT, StrongSORT, and more. The tool is built with a unified code style and modular design, decoupling the detector, tracker, ReID model, and Kalman filter, which simplifies experimentation and integration into custom projects. It also supports TensorRT for optimized inference and offers ReID models for both pedestrian and vehicle re-identification. The toolbox is compatible with datasets like MOT17 and VisDrone2019, providing detailed instructions for data preparation and training.
psmoveapi
Psmoveapi is a versatile, cross-platform library designed for 6DoF (six degrees of freedom) tracking of the PlayStation Move Motion Controller. It integrates advanced sensor fusion and computer vision techniques to provide precise positional and rotational tracking. The library also extends its functionality to include ambient display control through the PS Move's LED orb, enhancing user feedback and immersion. Developers can utilize psmoveapi to gain direct PC access to the PS Move controller, facilitating communication via both Bluetooth and USB connections. This makes it an ideal tool for creating custom applications, games, or research projects that leverage the unique input capabilities of the PS Move controller on various computing platforms.
sdfstudio
sdfstudio is a unified, open-source framework designed for neural implicit surface reconstruction, leveraging the foundation of the Nerfstudio project. It provides a modular architecture that allows for the implementation and exploration of different surface reconstruction methods, such as UniSurf, VolSDF, and NeuS. The framework supports various scene representations and datasets, making it a versatile tool for advanced 3D modeling, research, and development in the field of neural implicit surfaces. Its open-source nature encourages community contributions and provides a flexible platform for experimenting with cutting-edge 3D reconstruction techniques.
VMamba
VMamba is an open-source visual state space model that transplants the Mamba state-space language model into a vision backbone, offering linear time complexity for computer vision tasks. At its core, VMamba utilizes Visual State-Space (VSS) blocks with a 2D Selective Scan (SS2D) module, which efficiently gathers contextual information from 2D vision data by traversing along four scanning routes. This design helps bridge the gap between 1D selective scan and non-sequential 2D data. The tool provides a family of VMamba architectures, accelerated through architectural and implementation enhancements. It demonstrates promising performance across diverse visual perception tasks such as ImageNet-1K classification, COCO object detection, and ADE20K semantic segmentation, and shows favorable input-scaling efficiency compared to existing benchmark models. VMamba is designed for researchers and developers in the AI and computer vision fields.
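The idea of traversing a 2D feature map along four scanning routes can be sketched in a few lines of plain Python. The specific routes used here (row-major, column-major, and the reverse of each) and the function name `four_way_scan` are assumptions for illustration; the description above only states that there are four routes. Each route turns the 2D grid into a 1D sequence that a sequential scan could then process.

```python
def four_way_scan(grid):
    """Flatten a 2D grid (list of equal-length rows) into four 1D sequences.

    Assumed routes: row-major, column-major, and both of those reversed.
    """
    rows, cols = len(grid), len(grid[0])
    row_major = [grid[r][c] for r in range(rows) for c in range(cols)]
    col_major = [grid[r][c] for c in range(cols) for r in range(rows)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


# A 2x3 grid of patch labels standing in for image features.
patches = [["a", "b", "c"],
           ["d", "e", "f"]]
routes = four_way_scan(patches)
```

Because every patch appears in all four sequences at different positions, each one can receive context from patches in every direction around it.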
PETR
PETR (Position Embedding Transformation for Multi-View 3D Object Detection) and its successor PETRv2 offer a unified framework for 3D perception from multi-camera images. PETR encodes 3D coordinate position information into image features, creating 3D position-aware features that enable end-to-end object detection. PETRv2 extends this by incorporating temporal modeling to utilize previous frames' information for improved 3D object detection and introduces a feature-guided position encoder for better data adaptability. It also supports high-quality BEV (Bird's Eye View) segmentation through dedicated segmentation queries. This framework achieves state-of-the-art performance in both 3D object detection and BEV segmentation, making it a robust baseline for future research in autonomous driving and robotics.