AI Agents & Automation
Browsing page 177 of AI Frameworks & Infra in AI Agents & Automation. Sorted by confidence score — our independent quality rating.
apollo
Apollo is an open-source autonomous driving platform designed to accelerate the development, testing, and deployment of autonomous vehicles. It provides a high-performance and flexible architecture, supporting a wide range of autonomous driving applications. The platform has evolved through numerous versions, each introducing new modules and features, from basic GPS waypoint following to complex urban road navigation with advanced perception and planning algorithms. Apollo emphasizes collaboration and innovation in the autonomous vehicle technology field, offering extensive documentation and quick-start guides for developers. It supports various hardware configurations and software environments, including different Ubuntu versions, NVIDIA GPUs, and Docker-CE, making it a comprehensive solution for autonomous driving development.
Zero GPU Spaces Leaderboard
Zero GPU Spaces Leaderboard is an application hosted on Hugging Face that provides a user-friendly interface for exploring Zero-GPU spaces. Users can search and browse through various AI spaces that operate without dedicated GPUs, making it easier to discover efficient and accessible AI models. The platform also allows users to view detailed information about each space and identify trending creators within the Zero-GPU ecosystem. This tool is particularly useful for those interested in understanding the landscape of GPU-free AI applications and finding innovative projects.
YOLO-World + EfficientSAM
YOLO-World + EfficientSAM is an AI tool available on Hugging Face that facilitates advanced object detection and image segmentation. Users can upload photos or videos and specify objects they wish to identify using comma-separated names. The tool then processes the media to highlight these objects with precise bounding boxes and masks, offering an optional confidence score display. This combination of YOLO-World for detection and EfficientSAM for segmentation provides a robust solution for visual analysis tasks. It is particularly suitable for AI research and prototyping, allowing developers and researchers to experiment with and build upon state-of-the-art computer vision models.
Zyphra-ZR1 WebGPU
Zyphra-ZR1 WebGPU is a compact AI reasoning model engineered to operate entirely within a web browser, leveraging WebGPU technology. This innovative approach enables users to perform complex reasoning tasks and interact with 3D models without the need for external servers or cloud infrastructure. Users can upload their own 3D models or utilize preloaded ones, exploring them in a detailed and immersive environment directly from their browser. This local execution capability makes it particularly useful for applications requiring offline functionality, enhanced privacy, or experimental AI development where server-side processing is not desired or feasible. The tool is hosted on Hugging Face Spaces, indicating its community-driven and accessible nature.
Kotaemon
Kotaemon is an AI chatbot designed to facilitate general conversation and AI interaction. Users can leverage this tool to test the capabilities of chatbots and explore various AI language models. It serves as a valuable resource for individuals and developers looking to prototype chatbot applications, offering a platform to experiment with and understand AI-driven conversational agents. The tool is available for free, making it accessible for a wide range of users interested in AI.
Manticore 13B Chat
Manticore 13B Chat is an AI chatbot tool specifically created for the development and testing of AI chatbots and language models. It provides a platform for users to explore and experiment with conversational AI technologies. The tool is particularly useful for individuals and organizations engaged in researching the nuances of AI-driven conversations. Manticore 13B Chat is offered at no cost, making it accessible for a wide range of users interested in AI chatbot development.
grayskull
Grayskull is a minimalist, dependency-free computer vision library written in C, specifically engineered for microcontrollers and other resource-constrained devices like drones and robotics. It focuses on grayscale image processing, providing a suite of modern and practical algorithms that fit within a few kilobytes of code. Key features include image operations such as copy, crop, resize (bilinear), and downsample, along with filtering capabilities like blur, Sobel edges, and various thresholding methods (global, Otsu, adaptive). The library also supports morphology operations (erosion, dilation), geometry functions like connected components and perspective warp, and advanced features like FAST/ORB keypoints for object tracking and LBP cascades for face and vehicle detection. Its single-header design, integer-based operations, and pure C99 implementation ensure no dynamic memory allocation or C++ dependencies, making it ideal for embedded vision projects.
Gemini vs GPT vs Claude
Gemini vs GPT vs Claude is a dedicated AI comparison tool designed for evaluating the performance of leading large language models. Users can input custom prompts and observe the responses generated by Gemini Pro, GPT-4, and Claude 3. This side-by-side comparison facilitates a detailed analysis of each model's strengths, weaknesses, and unique characteristics, helping users understand their respective capabilities and limitations for various tasks.
ML-GCN
ML-GCN is a PyTorch implementation of Multi-Label Image Recognition with Graph Convolutional Networks, as presented in a CVPR 2019 paper. This open-source project provides researchers and developers with the code and pre-trained models necessary to apply GCNs to multi-label image recognition tasks. The implementation highlights improvements achieved by replacing Global Average Pooling (GAP) with Global Max Pooling (GMP) for feature aggregation, demonstrating enhanced performance on datasets like COCO, NUS-WIDE, and VOC2007. It includes detailed instructions for setting up requirements, downloading models, and running demos for VOC 2007 and COCO 2014 datasets, making it a valuable resource for academic research and practical application in computer vision.
temporal-shift-module
The Temporal Shift Module (TSM) is an open-source PyTorch implementation designed for efficient video understanding. It allows for temporal modeling in video analysis tasks, such as action recognition, by shifting part of the channels along the temporal dimension. TSM is a plug-and-play module that adds zero parameters and zero FLOPs, making it highly efficient. The project provides pre-trained models on datasets like Kinetics-400 and Something-Something, along with code for data preparation, testing, and training. It also features a live demo for online hand gesture recognition on NVIDIA Jetson Nano, showcasing its real-time capabilities.
AimRT
AimRT is a high-performance runtime framework specifically designed for modern robotics applications. Built with modern C++, it emphasizes being lightweight and easy to deploy, making it suitable for various robotic systems. The framework focuses on critical aspects such as efficient resource management, enabling developers to optimize the use of computational resources in their robotic projects. It also supports asynchronous programming, which is crucial for handling multiple tasks concurrently and ensuring responsive robotic behaviors. Furthermore, AimRT provides robust deployment configuration capabilities, simplifying the process of getting robotic applications up and running in diverse environments. This makes AimRT an essential tool for developers looking to build and deploy sophisticated, resource-efficient, and reliable robotic solutions.
AS-One
AS-One is a comprehensive, open-source Python wrapper designed for computer vision tasks, providing an easy and modular interface for object detection, segmentation, tracking, and pose estimation. It supports a wide range of YOLO models, including YOLOv9, v8, v7, v6, v5, R, and X, enabling users to implement these advanced models in under 10 lines of code. The library integrates various tracking algorithms like ByteTrack, DeepSORT, and NorFair, and supports models in ONNX, PyTorch, and CoreML formats. AS-One also includes capabilities for text detection and recognition using models like CRAFT and EasyOCR, and pose estimation with YOLOv8 and YOLOv7-w6. It is ideal for developers and researchers looking for a unified and efficient solution for their computer vision projects.
llm-twin-course
llm-twin-course is a free educational resource designed to guide users through the process of building a production-ready Large Language Model (LLM) and Retrieval Augmented Generation (RAG) system. The course emphasizes LLMOps best practices, offering practical, hands-on lessons and accompanying source code. It covers the entire development lifecycle, from initial data gathering to the final stages of productionizing LLMs, with a specific focus on creating an AI replica.
nerves
Nerves offers a comprehensive set of tools and libraries for developing and deploying embedded software using Elixir. It leverages the robust Erlang virtual machine and the Linux kernel to create small, self-contained software images for microprocessor-based systems. While not a full Linux distribution, Nerves integrates the Erlang runtime early in the boot process, allowing Elixir to manage the system. It supports a wide range of hardware, including various Raspberry Pi models and BeagleBone boards, and provides access to the Elixir ecosystem, including Phoenix, LiveView, Elixir Nx, and Livebook. Nerves also includes a C/C++ cross-toolchain for consistent builds across host platforms and offers modules for hardware access, networking, and SSH capabilities.
ObjectDetection-OneStageDet
ObjectDetection-OneStageDet is an open-source object detection framework developed by Tencent, designed to provide a unified platform for single-stage generic object detectors. Currently, it supports YOLOv2 and YOLOv3 implementations, with future plans to integrate YOLO and SSD into a single framework. The tool emphasizes performance and speed, offering good mAP scores and fast inference times, especially with various efficient backbones like TinyYOLO, MobileNet, and ShuffleNet. It provides comprehensive instructions for installation, data preparation, training, evaluation, and benchmarking, making it suitable for developers and researchers working on object detection tasks.
onyx
Onyx is an open-source AI platform designed for easy deployment and self-hosting. It provides a comprehensive chat user interface that can be used with any Large Language Model (LLM). A key advantage is its ability to operate effectively in air-gapped environments, ensuring data security and compliance. The platform is equipped with advanced functionalities including AI Agents, integrated Web Search capabilities, and Retrieval Augmented Generation (RAG). Furthermore, Onyx offers connectors to more than 40 different knowledge sources, enhancing its ability to access and utilize diverse information.
Rofunc
Rofunc is an open-source Python package designed for robot learning from demonstration and robot manipulation. It provides a comprehensive framework for developing and deploying advanced robot learning algorithms. The tool is hosted on GitHub, making it accessible for researchers and developers in the robotics field. Rofunc facilitates the entire workflow, from initial algorithm development to practical deployment, supporting various aspects of robot control and interaction. Its open-source nature encourages community contributions and collaborative development, making it a valuable resource for advancing robotics research and applications.
Stereo-RCNN
Stereo-RCNN is an open-source implementation for accurate 3D object detection and estimation, primarily developed for autonomous driving applications. This tool leverages stereo images to perform simultaneous object detection and association, enhancing the precision of 3D box estimations. It also incorporates a dense alignment module for refining 3D box predictions. The project supports Pytorch 1.0.0 and Python 3.6, with a light-weight version available for scenarios with limited GPU memory. Researchers and developers can utilize Stereo-RCNN for tasks requiring robust 3D perception from image-only data, offering a valuable resource for advancing autonomous systems.
vedadet
vedadet is a single-stage object detection toolbox built on PyTorch, offering a modular design that re-engineers MMDetection for enhanced flexibility and deployment. It decomposes the detector into four key parts: data pipeline, model, postprocessing, and criterion, making it straightforward to convert PyTorch models into TensorRT engines. This design facilitates efficient deployment on NVIDIA devices such as Tesla V100, Jetson Nano, and Jetson AGX Xavier. The toolbox supports several popular single-stage detectors, including RetinaNet and FCOS, right out of the box. Its friendly integration with TensorRT allows for easy model conversion and deployment through both Python and C++ front-ends, making it a powerful tool for developers working on object detection tasks.
TempestV0.1 GPU Demo
TempestV0.1 GPU Demo is a demonstration of AI capabilities, specifically designed to showcase the TempestV0.1 model. Hosted on Hugging Face Spaces, this tool leverages GPU processing to provide a platform for users to explore and test the model's functionalities. While currently paused, it aims to offer insights into advanced AI applications. Users interested in utilizing this Space are encouraged to contact the author through the community tab to request its restart, indicating its potential for academic research and educational purposes.
claude-to-chatgpt
claude-to-chatgpt is an open-source utility designed to bridge the gap between Anthropic's Claude API and OpenAI's Chat API. It allows developers and applications built for the OpenAI ecosystem to seamlessly integrate and utilize Claude models without significant code changes. The tool handles the conversion of API requests and responses, supporting streaming for real-time interactions. It is versatile in deployment, offering options via Cloudflare Workers for serverless execution or Docker for containerized environments, making it accessible for various technical setups.
Grendel-GS
Grendel-GS is a distributed training system designed to significantly scale up 3D Gaussian Splatting training. It allows users to leverage multiple GPUs for substantially faster training times and supports a greater number of Gaussians in GPU memory, facilitating the reconstruction of larger-area, higher-resolution 3D scenes with improved PSNR. The system retains the original 3DGS algorithm, making it a direct and safe replacement for existing implementations. Grendel-GS is particularly beneficial for training large-scale 4K scenes with millions of Gaussians, offering performance improvements such as training Mip360 datasets over 3.5 times faster and completing Tanks&Temple scenes in under a minute. While it focuses on training functionality, it integrates with existing 3DGS workflows.
context-portal
context-portal is an open-source server designed to manage project context using a Model Context Protocol (MCP). It constructs a project-specific knowledge graph, which serves to enhance the capabilities of AI assistants. The tool facilitates Retrieval Augmented Generation (RAG), allowing for more context-aware development directly within Integrated Development Environments (IDEs). Essentially, context-portal functions as a memory bank specifically tailored for AI development tools, providing relevant information to improve their performance and understanding.
continuous-eval
continuous-eval is an open-source package designed for the data-driven evaluation of applications powered by Large Language Models (LLMs). It provides a modular approach to evaluation, allowing users to apply tailored metrics to each specific module within their LLM pipeline. The tool includes a comprehensive library of metrics to facilitate thorough assessment. It supports the evaluation of diverse LLM use cases, including Retrieval-Augmented Generation (RAG), code generation, and the utilization of agent tools.