AI Agents & Automation
YoloSharp
YoloSharp offers a high-performance, real-time object detection solution built on YOLO11 and powered by ONNX-Runtime. It supports a comprehensive range of YOLO vision tasks, including detection, oriented bounding box (OBB), pose estimation, segmentation, and classification. The tool leverages various .NET features to maximize performance and optimize memory usage by reusing memory blocks and reducing garbage collection pressure. YoloSharp provides NuGet packages for both CPU-based and GPU-based inference, along with a core library for lightweight production. It also includes plotting options to visualize model results directly on target images, making it a robust solution for developers working with real-time object detection.
MCP Showcase
MCP Showcase provides a platform for auto-generating live, interactive MCP playgrounds for your MCP server, enabling developers and decision-makers to explore, chat with, and integrate APIs quickly. It aims to accelerate developer onboarding by offering real-time feedback and interactive documentation, making it easier to understand MCP APIs than with static documents. The tool also helps bridge the buyer-developer gap by allowing non-technical stakeholders to "see it work," thereby shrinking the sales funnel. Product teams can gain real-time insights into how prospects use the playground, facilitating faster feature refinement and quality improvements. Key features include a launch-ready MCP sandbox with mocked data, SSE and streamable HTTP support, and automatic MCP introspection. It also offers interactive documentation and an MCP chat connected to the tools, along with sample chat history for better understanding.
AstaBench Leaderboard
AstaBench Leaderboard offers a comprehensive platform for viewing and comparing benchmark leaderboards across diverse AI categories. Users can explore performance metrics for models in areas such as literature understanding, code execution, data analysis, and discovery. The tool is hosted on Hugging Face Spaces by AllenAI, providing a centralized location to track and evaluate the advancements in AI model capabilities. It serves as a valuable resource for researchers and developers to assess the effectiveness of different AI systems without requiring any input, simply by browsing the available leaderboards.
comfyui-deploy-gradio
comfyui-deploy-gradio offers a user-friendly Gradio interface designed to streamline interactions with ComfyDeploy. This application empowers users to dynamically generate UI components based on predefined deployment input definitions, simplifying the process of creating and managing interfaces. Through this intuitive platform, users can efficiently submit various jobs to ComfyDeploy, making it an accessible tool for those looking to leverage ComfyDeploy's capabilities without deep technical expertise in UI development. It acts as a bridge, translating complex deployment inputs into interactive and functional user interfaces.
pytorch-maml-rl
pytorch-maml-rl is an open-source implementation of Model-Agnostic Meta-Learning (MAML) specifically tailored for reinforcement learning problems, built using the PyTorch framework. This repository offers a comprehensive toolkit for researchers and developers to explore and apply meta-learning techniques to various RL scenarios. It includes support for diverse environments such as multi-armed bandits, tabular Markov Decision Processes (MDPs), continuous control tasks using MuJoCo, and 2D navigation. The project provides scripts for both training and testing meta-learned policies, making it a valuable resource for experimenting with fast adaptation of deep networks in RL.
DeepResearch Bench
DeepResearch Bench is a comprehensive platform designed for evaluating deep research agents, offering a dynamic leaderboard to track and compare their performance. Users can easily search for specific AI models or filter them by various categories to analyze their scores and effectiveness. A key feature is the ability to conduct side-by-side comparisons of two chosen models, allowing for detailed analysis of their results. This tool is particularly valuable for AI researchers and data scientists who need to assess and understand the capabilities of different deep research agents in a structured and comparative manner, aiding in model selection and performance optimization.
Gemini Live API - p5js
Gemini Live API - p5js is a web-based tool hosted on Hugging Face that enables users to engage in creative coding for visual art. Users can input JavaScript code to define the appearance and behavior of their art, and the application dynamically generates the visual output. This platform serves as a console for utilizing the Multimodal Live API over a websocket, offering modules for streaming audio playback and recording user media. It provides a hands-on environment for developers and artists to experiment with real-time visual programming and interactive media creation.
Gemini Live API Console
The Gemini Live API Console is a web-based tool designed for interacting with the Multimodal Live API. It enables users to generate detailed responses by combining both text and image inputs. This console is particularly useful for developers and researchers who need to test and experiment with multimodal AI capabilities, providing a direct interface to the Gemini API. The application is hosted on Hugging Face Spaces and is available for free under the Apache-2.0 license, making it an accessible resource for exploring advanced AI functionalities. It's a practical solution for those looking to integrate or understand multimodal AI interactions without extensive setup.
Playbook
Playbook offers a secure, production-ready layer built on top of ComfyUI, specifically designed for AI-native studios. It enables these studios to standardize, scale, and protect their generative media pipelines, ensuring consistency and efficiency. The platform allows users to access ComfyUI from any browser, facilitating work from anywhere on any device. Key features include LoRA training and data management, multimodal controls, and tools tailored for media pipelines, helping studios ship mission-critical media projects in days rather than months. Playbook aims to extend creative agency by providing robust control and creativity within generative media workflows.
StreamPETR
StreamPETR is the official implementation of a research paper accepted at ICCV 2023 that explores object-centric temporal modeling for efficient multi-view 3D object detection. This open-source tool provides a framework for researchers and developers working in computer vision and autonomous driving. Key features include support for the StreamPETR, PETR, and Focal-PETR codebases, flash attention, deformable attention (RepDETR3D), and checkpoints. It also offers sliding window training, efficient training in streaming video, TensorRT inference, and 3D object tracking. The repository provides detailed documentation for environment setup, data preparation, and training/inference procedures, along with model zoo results on the nuScenes validation and test sets.
Intrascope
Intrascope offers a secure and collaborative AI workspace designed for teams, centralizing the management of AI models, API keys, and project manifests. It allows multiple users to interact with advanced AI models like OpenAI, DeepSeek, Gemini, Anthropic, and xAI within a shared environment. Each team member has their own login and chat history, while working within a unified team context. The platform features structured projects, contextual prompts called manifests, user-level control, project-based histories, and token usage monitoring. Administrators can invite and manage users, create manifests, monitor token usage, and control API providers, ensuring full visibility and cost control over team AI usage.
UniAD
UniAD is a unified autonomous driving algorithm framework developed by OpenDriveLab, distinguished by its planning-oriented philosophy. Unlike traditional modular designs, UniAD hierarchically integrates perception, prediction, and planning tasks into a single framework. This approach has enabled UniAD to achieve state-of-the-art performance across all these tasks, particularly in motion prediction, occupancy prediction, and planning, with impressive metrics like 0.71m minADE for motion and 0.31% avg.Col for planning. The framework is open-source, available on GitHub, and has received the CVPR 2023 Best Paper Award. It supports integration with datasets like nuPlan and NAVSIM, and offers tools for CARLA and closed-loop evaluation. UniAD is designed for researchers and developers in the autonomous driving domain, providing a robust platform for advancing self-driving technology.
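For readers unfamiliar with the minADE figure quoted above, the metric is commonly defined as the smallest average displacement error among several candidate trajectories. The sketch below shows that common definition only; `min_ade` is a hypothetical helper, and UniAD's own evaluation code may differ in details such as the number of candidates or the prediction horizon.

```python
import math


def min_ade(candidates, ground_truth):
    """Minimum average displacement error over candidate trajectories.

    candidates:   list of trajectories, each a list of (x, y) points in metres
    ground_truth: one trajectory with the same number of points
    """
    def ade(trajectory):
        # Mean Euclidean distance between predicted and true positions.
        total = sum(math.dist(p, g) for p, g in zip(trajectory, ground_truth))
        return total / len(ground_truth)

    # Score every candidate and keep the best (lowest) one.
    return min(ade(t) for t in candidates)


truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predictions = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # off by 1 m at every step
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.3)],  # off by 0.3 m at the last step only
]
score = min_ade(predictions, truth)
```

Here the second candidate is closer to the ground truth on average, so it determines the score.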
UDTL
UDTL is an open-source repository providing the implementation details for the paper "Applications of Unsupervised Deep Transfer Learning to Intelligent Fault Diagnosis: A Survey and Comparative Study." It serves as a comprehensive library for researchers and academics interested in applying unsupervised deep transfer learning (UDTL) to intelligent fault diagnosis. The project offers baseline accuracies and a unified framework, allowing users to load their own datasets and models for new studies. It includes various loss functions for mapping-based DTL, data augmentation methods, PyTorch datasets for time and frequency domains, and models used in the project. The repository also provides utilities for the training procedure, making it a valuable resource for replicating and extending research in this field.
TradingGym
TradingGym is an open-source toolkit designed for training and backtesting reinforcement learning algorithms and simple rule-based trading strategies. Inspired by OpenAI Gym, it offers a flexible framework for creating trading environments. It supports both tick data and OHLC data formats. The toolkit includes functionality for setting up training environments, performing backtesting, and visualizing transaction details. Future plans include real-time trading environments with Interactive Brokers API integration. Users can define custom agents and test their performance against historical data, making it a useful resource for quantitative finance research and development.
semantic-segmentation
semantic-segmentation is an open-source PyTorch library designed for state-of-the-art semantic segmentation models. It provides a flexible and customizable framework for computer vision researchers and developers. The library supports a wide array of datasets, making it suitable for various applications requiring precise pixel-level classification. Its focus on ease of use and customizability allows users to adapt models to specific needs, ensuring high accuracy for diverse computer vision projects. This tool is ideal for those looking to implement or experiment with advanced semantic segmentation techniques.
caffe-yolo
caffe-yolo offers a Caffe implementation of the YOLO (You Only Look Once) real-time object detection system. This tool specifically supports YOLO v1 and includes batch normalization layers. The Caffe models are not trained within Caffe but are converted from Darknet's original .weights files, which allows existing pre-trained models to be reused. The conversion process involves creating .prototxt files from Darknet's .cfg files, initializing the Caffe network, reading weights from Darknet, and then replacing the initialized weights with the pre-trained ones. It provides scripts for creating .prototxt and .caffemodel files, and a main script for performing object detection on images. This makes it a useful resource for developers and researchers working with object detection in a Caffe environment.
maml
maml is an open-source code repository for Model-Agnostic Meta-Learning (MAML), a technique designed for the fast adaptation of deep networks. Developed by cbfinn, this repository provides the code accompanying the paper "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" (Finn et al., ICML 2017). It includes implementations of the few-shot supervised learning experiments, covering sinusoid regression, Omniglot classification, and MiniImagenet classification. The project runs on Python 2.* or 3.* with TensorFlow v1.0+. Data preparation instructions are provided for Omniglot and MiniImagenet, and detailed usage instructions are available within the `main.py` file.
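The idea of "fast adaptation" can be illustrated with a toy sketch of the MAML inner and outer loops. This is not code from the repository: `maml_scalar` is a hypothetical function, the "network" is a single scalar weight, and each task's loss is assumed to be `(w - a)^2` so that the gradients can be written by hand instead of using TensorFlow.

```python
def maml_scalar(task_targets, w=0.0, inner_lr=0.1, outer_lr=0.05, steps=500):
    """Toy MAML on scalar tasks with loss L_a(w) = (w - a)^2.

    Returns a meta-learned initial weight from which one inner gradient
    step adapts well to every task in task_targets.
    """
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_targets:
            # Inner loop: one gradient step on this task's loss.
            w_adapted = w - inner_lr * 2.0 * (w - a)
            # Outer gradient: differentiate the post-adaptation loss
            # (w_adapted - a)^2 with respect to the initial weight w.
            # The chain rule through the inner step gives the factor
            # d(w_adapted)/dw = 1 - 2 * inner_lr.
            meta_grad += 2.0 * (w_adapted - a) * (1.0 - 2.0 * inner_lr)
        # Outer loop: update the shared initialization.
        w -= outer_lr * meta_grad / len(task_targets)
    return w


meta_init = maml_scalar([-1.0, 2.0, 5.0])
```

For this symmetric toy problem the meta-learned initialization settles at the mean of the task targets. Dropping the `(1 - 2 * inner_lr)` factor would give the first-order approximation of the outer gradient.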
DeepRL-Agents
DeepRL-Agents is an open-source repository offering a collection of Deep Reinforcement Learning algorithms, all implemented using TensorFlow. This resource is aimed at readers who want to understand and apply RL techniques, from foundational Q-learning and policy gradient methods to more advanced concepts like Double-Dueling-DQN, Deep Recurrent Q-Networks, and Asynchronous Advantage Actor-Critic (A3C). The repository includes IPython notebooks for each algorithm, often accompanied by a tutorial series published on Medium, making it both an educational and a practical tool for learning about reinforcement learning.
DeepRL-TensorFlow2
DeepRL-TensorFlow2 is a GitHub repository offering straightforward implementations of a wide array of Deep Reinforcement Learning (DRL) algorithms, all built with TensorFlow2. The project prioritizes code clarity, making it an excellent resource for students and researchers delving into DRL. Each algorithm is contained within a single Python script, simplifying the learning process by eliminating the need to navigate multiple files. The repository is actively maintained and continuously updated with new DRL algorithms. It currently includes implementations for DQN, DRQN, DoubleDQN, DuelingDQN, A2C, A3C, PPO, and DDPG, with TRPO, TD3, and SAC noted as planned additions. The project also provides code snippets illustrating the core ideas behind each algorithm, such as using target networks and replay buffers in DQN, or advantage functions in A2C.
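The replay buffer and target network ideas mentioned for DQN can be sketched in plain Python. This is an illustrative sketch rather than code from the repository: `ReplayBuffer` and `td_target` are hypothetical names, and Q-values are passed as plain lists instead of TensorFlow tensors.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=0):
        # A bounded deque silently discards the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Bootstrapped target for one transition.

    next_q_values are assumed to come from the slowly updated target
    network, not from the network currently being trained.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1, False)
batch = buffer.sample(2)
target = td_target(1.0, [0.5, 2.0], done=False, gamma=0.9)
```

Keeping the target network's weights fixed between periodic copies means the regression target does not shift on every gradient step.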
Mouse Hackathon
Mouse Hackathon is a dynamic platform designed for creative innovation using AI, specifically structured around 1-minute challenges. It serves as a Hugging Face Space by VIDraft, offering a collaborative environment for AI enthusiasts and innovators. The platform allows users to participate in the MOUSE-I Hackathon, providing clear information on dates, prize amounts, and participation steps. It also features language switching between English and Korean, alongside a news view, to keep participants informed and engaged. This tool is ideal for those looking to quickly experiment with AI concepts and engage in rapid prototyping within a competitive yet supportive hackathon setting.
Deep-reinforcement-learning-with-pytorch
Deep-reinforcement-learning-with-pytorch is an open-source GitHub repository that offers PyTorch implementations of classic and state-of-the-art deep reinforcement learning algorithms. The project includes implementations of popular methods such as DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, and TD3. Its primary goal is to provide clear and accessible code, making it easier for individuals to learn and experiment with deep reinforcement learning algorithms. The repository is actively maintained, with plans to add more advanced algorithms and update existing code. It also provides installation instructions and examples for testing the implementations.
mmskeleton
MMSkeleton is an open-source toolbox developed by OpenMMLab, specifically designed for skeleton-based human understanding. It offers an extensible framework that organizes code and projects systematically, allowing adaptation to various tasks and scaling to complex deep models. Key functionalities include 2D and 3D pose estimation, skeleton-based action recognition (such as ST-GCN), and action synthesis. The toolbox also supports building custom skeleton-based datasets and creating custom applications. It is part of the OpenMMLab project, builds on the ST-GCN research project, and is released under the Apache 2.0 license.
evolution-strategies-starter
evolution-strategies-starter offers a distributed implementation of the Evolution Strategies algorithm described in the paper "Evolution Strategies as a Scalable Alternative to Reinforcement Learning." This open-source project uses a master-worker architecture in which the master broadcasts parameters to workers and the workers return results. The code is designed to run on AWS EC2, is resilient to worker termination, and is therefore suitable for spot instances. It requires a MuJoCo license for the humanoid experiments and uses Packer to build AMIs. The project is provided as-is, with no further updates expected, and serves as a starting codebase for researchers and developers in reinforcement learning.
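The master-worker loop can be illustrated with a single-process sketch in which the "workers" are simply iterations of a list comprehension. This is not the repository's code: `evolution_strategies` is a hypothetical function, the objective is a toy quadratic, and a plain mean baseline is assumed in place of whatever return shaping the distributed implementation applies.

```python
import random


def evolution_strategies(objective, theta, iterations=200, population=50,
                         sigma=0.1, learning_rate=0.05, seed=0):
    """Maximize objective(theta) using Gaussian parameter perturbations."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        # Master: draw one Gaussian perturbation per worker.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        # Workers: evaluate the perturbed parameters, return one scalar each.
        returns = [objective([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        # Master: subtract a mean baseline to reduce variance.
        baseline = sum(returns) / population
        # Master: combine perturbations, weighted by centered returns,
        # into a gradient estimate and take an ascent step.
        for i in range(dim):
            grad_i = sum((r - baseline) * eps[i]
                         for r, eps in zip(returns, noises))
            theta[i] += learning_rate * grad_i / (population * sigma)
    return theta


def negative_squared_distance(params, target=(3.0, -2.0)):
    # Toy objective whose maximum is at the target point.
    return -sum((p - t) ** 2 for p, t in zip(params, target))


solution = evolution_strategies(negative_squared_distance, [0.0, 0.0])
```

Because each worker only needs to send back a scalar return (and can regenerate its noise from a shared seed), communication stays cheap, which is what makes the approach practical on many unreliable machines.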
Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning. The series includes both open-source versions with model weights and inference code, as well as closed-source offerings like Skywork-R1V4-Lite. These models deliver exceptional performance across vision understanding, code execution, and deep research tasks, featuring agentic capabilities. Key features include code execution for complex tasks, deep research integration with web search, multi-turn reasoning with tool usage, and streaming support for real-time responses. The models have demonstrated state-of-the-art performance on various multimodal benchmarks, particularly excelling in perception and deep research capabilities.