AI Agents & Automation
AI Frameworks & Infra tools in AI Agents & Automation, sorted by confidence score (our independent quality rating).
Online-3D-BPP-DRL
Online-3D-BPP-DRL is an open-source project that provides the implementation of the paper "Online 3D Bin Packing with Constrained Deep Reinforcement Learning." This tool is designed for researchers and developers interested in optimizing 3D bin packing problems using AI. It allows users to train new models on randomly generated sequences or test existing models with various data sets. The repository includes code for user-study applications, multi-bin algorithms, and MCTS for comparison, offering a comprehensive environment for experimentation and development in this domain. Users can adjust network architectures and parameters to suit their specific needs, making it a flexible platform for advanced AI research in logistics and optimization.
Online-3D-BPP-PCT
Online-3D-BPP-PCT is an open-source tool that implements a method for efficient online 3D bin packing. It applies deep reinforcement learning (DRL) to a hierarchical packing configuration tree to make solutions to the online 3D Bin Packing Problem (BPP) more practical. This representation lets the DRL model handle practical constraints and perform well even in continuous solution spaces. Key features include arbitrary container and item sizes, support for continuous online 3D-BPP, algorithms for approximating stability, and improved performance under complex constraints. It also provides a broader set of heuristic baselines for comparison and more stable training.
pytorch-pose
pytorch-pose is an open-source PyTorch toolkit designed for 2D single human pose estimation. It offers a comprehensive pipeline for training, inference, and evaluation, making it a valuable resource for researchers and developers in computer vision. The toolkit includes a robust dataloader with various data augmentation options, compatible with popular human pose databases such as MPII, LSP, and FLIC. Key features include multi-thread data loading, multi-GPU training support, a logger for tracking progress, and visualization of training and testing results. It is compatible with PyTorch 0.4.1/1.0 and provides detailed instructions for installation, data preparation, and usage, including testing with pre-trained models and evaluating PCKh@0.5 scores.
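The PCKh@0.5 score mentioned above can be illustrated with a minimal sketch. This is not pytorch-pose's evaluation code: the function name `pckh` and the sample coordinates are hypothetical, the head segment length is assumed to be given, and details such as skipping invisible joints are left out.

```python
import math

def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of keypoints predicted within alpha * head_size of the ground truth.

    pred, gt: lists of (x, y) keypoints for one person (hypothetical layout).
    head_size: head segment length in pixels for that person (assumed given).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of keypoints")
    threshold = alpha * head_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)

# Made-up keypoints: errors of 1, 6, 3 and 9 pixels against a 5-pixel threshold.
gt = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0), (40.0, 40.0)]
pred = [(11.0, 10.0), (20.0, 26.0), (30.0, 33.0), (49.0, 40.0)]
score = pckh(pred, gt, head_size=10.0)
```

Here two of the four keypoints fall within the threshold, so `score` is 0.5.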
PyGCL
PyGCL is a PyTorch-based open-source library specifically designed for Graph Contrastive Learning (GCL). It provides a comprehensive framework for researchers and developers to implement and experiment with various GCL algorithms. The library features modularized GCL components, including graph augmentation techniques like Edge Adding, Feature Masking, and Node Dropping, as well as different contrasting architectures and modes (single-branch, dual-branch, bootstrapped, within-embedding). PyGCL also implements a variety of contrastive objectives such as InfoNCE, JSD, and Barlow Twins, alongside negative sampling strategies. It supports standardized evaluation with evaluators like Logistic Regression and SVM, and offers utilities for managing experiments, making it a valuable tool for advancing graph representation learning.
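As a rough illustration of the InfoNCE objective named above, the following pure-Python sketch treats `view_a[i]` and `view_b[i]` as a positive pair and every other row of `view_b` as a negative. It does not use PyGCL's API; the function names, the temperature `tau`, and the toy embeddings are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(view_a, view_b, tau=0.5):
    """Mean InfoNCE loss over anchors in view_a.

    view_b[i] is the positive for view_a[i]; view_b[j], j != i, are negatives.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / tau for other in view_b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings: matching views give a lower loss than mismatched ones.
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(aligned, aligned)
loss_swapped = info_nce(aligned, swapped)
```

The loss is small when each anchor is most similar to its own positive and grows when the positives are mismatched.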
nitrain
Nitrain (formerly torchsample) is a framework-agnostic Python library designed for medical image analysis, enabling efficient training of AI models. It provides robust functionalities for sampling and augmenting medical images, supporting various frameworks like PyTorch, TensorFlow, and Keras. The library simplifies model training by offering reasonable defaults and a high level of abstraction. Users can visualize results within a medical imaging context, making it a comprehensive tool for medical imaging AI development. Full examples for segmentation, classification, and registration tasks are available, and it integrates with the ANTsPy package for advanced medical image processing.
SEAM
SEAM (Self-supervised Equivariant Attention Mechanism) is an open-source implementation designed for weakly supervised semantic segmentation. This tool addresses the challenge of generating accurate object masks from image-level supervision, a common limitation of class activation map (CAM) methods. SEAM introduces a self-supervised approach that enforces consistency regularization on the CAMs predicted for differently transformed versions of an image, narrowing the gap between full and weak supervision. Additionally, it incorporates a pixel correlation module (PCM) that refines each pixel's prediction using context appearance information and its similar neighbors. Experiments on the PASCAL VOC 2012 dataset show SEAM outperforming state-of-the-art methods that use the same level of supervision, making it a valuable resource for AI researchers and computer vision engineers.
Trading-Gym
Trading-Gym is an open-source project designed for the development and testing of reinforcement learning algorithms within the context of financial trading. It offers a flexible environment, currently featuring a SpreadTrading environment, which allows users to trade spreads based on bid and ask price time series for multiple products. A key feature is its generic data feeding mechanism, enabling users to create custom DataGenerators to input diverse price data. The environment's state includes prices, entry price, and position (long, short, or flat). Trading-Gym's API is inspired by OpenAI Gym, aiming for full compatibility to integrate as an additional OpenAI environment, making it accessible for researchers and developers familiar with the OpenAI Gym framework.
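To make the state description above concrete, here is a minimal sketch of a Gym-style environment whose state holds the current prices, the entry price, and the position. It is not Trading-Gym's SpreadTrading implementation: the class `ToySpreadEnv`, the action encoding, the single-product quote list, and the reward rule are all assumptions, and the four-value `step` return follows the classic OpenAI Gym convention.

```python
class ToySpreadEnv:
    """Hypothetical single-product environment with a Gym-style reset/step interface."""

    HOLD, BUY, SELL = 0, 1, 2

    def __init__(self, quotes):
        self.quotes = list(quotes)  # sequence of (bid, ask) tuples
        self.reset()

    def _state(self):
        bid, ask = self.quotes[self.t]
        return (bid, ask, self.entry_price, self.position)

    def reset(self):
        self.t = 0
        self.position = 0       # +1 long, -1 short, 0 flat
        self.entry_price = 0.0
        return self._state()

    def step(self, action):
        bid, ask = self.quotes[self.t]
        reward = 0.0
        if action == self.BUY:
            if self.position == 0:        # open a long at the ask
                self.position, self.entry_price = 1, ask
            elif self.position == -1:     # close a short at the ask
                reward = self.entry_price - ask
                self.position, self.entry_price = 0, 0.0
        elif action == self.SELL:
            if self.position == 0:        # open a short at the bid
                self.position, self.entry_price = -1, bid
            elif self.position == 1:      # close a long at the bid
                reward = bid - self.entry_price
                self.position, self.entry_price = 0, 0.0
        self.t += 1
        done = self.t >= len(self.quotes) - 1
        return self._state(), reward, done, {}

# Made-up quotes: buy at 101, then sell at 104 for a realized reward of 3.
env = ToySpreadEnv([(99.0, 101.0), (104.0, 106.0), (110.0, 112.0)])
state = env.reset()
state_1, reward_1, done_1, _ = env.step(ToySpreadEnv.BUY)
state_2, reward_2, done_2, _ = env.step(ToySpreadEnv.SELL)
```

A custom data generator would replace the fixed `quotes` list in a fuller version; the sketch keeps it as a plain list for brevity.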
YOLOv11-RGBT
YOLOv11-RGBT is a single-stage multispectral object detection framework that extends YOLO models (from YOLOv3 to YOLOv13) and RT-DETR to handle RGBT (Red, Green, Blue, Thermal) data. The project simplifies the configuration of visible and infrared datasets for multimodal object detection, providing three distinct configuration methods. It supports multispectral object detection, keypoint detection, and instance segmentation. Beyond multispectral data, the framework can be adapted to other pixel-aligned image pairs, such as depth maps and SAR images. Key features include support for TIFF images, 16-bit multispectral datasets with arbitrary channel counts, and image formats such as Gray, BGR, RGBT, and Multispectral with flexible channel configurations.
SEED-Bench Leaderboard
SEED-Bench Leaderboard is a platform designed for evaluating and comparing the performance of various AI models. Users can submit their model evaluation results in JSON format, providing details such as the model name, type, size, and the evaluation method used. The platform then analyzes and displays the model's performance on a public leaderboard. This tool serves as a centralized hub for researchers and developers to track advancements and benchmark their models against others in the AI field. While the current live website indicates a build error, the intended functionality is to facilitate transparent and comparable evaluation of AI models.
Awesome-Vision-Mamba-Models
Awesome-Vision-Mamba-Models is an open-source GitHub repository dedicated to the rapidly evolving field of visual Mamba models. It functions as a comprehensive resource, offering a survey of existing models and exploring new outlooks and advancements in the domain. The repository is actively maintained and updated with the latest research papers and developments, making it an invaluable hub for researchers, academics, and practitioners working with or interested in visual Mamba. Its structure allows for easy navigation through various models and related information, fostering knowledge sharing and collaboration within the AI community.
Awesome-VLA4AD
Awesome-VLA4AD is a comprehensive and continuously updated repository dedicated to Vision–Language–Action models for Autonomous Driving (VLA4AD). It serves as the companion resource to a survey paper, offering a curated collection of research papers, datasets, and tools in the field. The repository categorizes VLA4AD advancements into stages, from explanatory perception modules to end-to-end reasoning and control architectures. It details various models, their key features, and links to their respective papers and codebases. Additionally, it lists relevant datasets and benchmarks, making it an invaluable resource for researchers, academics, and engineers working on autonomous driving systems.
openai-cookbook
OpenAI-cookbook is an open-source repository offering a collection of examples and guides designed to help developers effectively use the OpenAI API. It provides practical code samples, primarily in Python, along with clear instructions for accomplishing common tasks and integrating OpenAI's powerful AI models into various applications. The cookbook serves as a valuable resource for understanding API functionalities, exploring different use cases, and accelerating development with OpenAI's technologies. Users need an OpenAI account and API key to run the examples, which can be set via an environment variable or an .env file.
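The two ways of supplying the key can be sketched with the standard library alone. The helper `load_api_key` below is hypothetical and is not taken from the cookbook; it assumes the conventional `OPENAI_API_KEY` variable name and a simple `KEY=value` layout for the .env file.

```python
import os
from pathlib import Path

def load_api_key(name="OPENAI_API_KEY", dotenv_path=".env", environ=None):
    """Return the key from the environment, falling back to a .env file."""
    environ = os.environ if environ is None else environ
    if environ.get(name):
        return environ[name]
    path = Path(dotenv_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == name:
                return value.strip().strip('"').strip("'")
    raise RuntimeError(f"{name} is not set in the environment or in {dotenv_path}")
```

Calling `load_api_key()` with no arguments checks the real process environment first, so a key exported in the shell takes precedence over the file.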
TheBloke Quantized Models
TheBloke Quantized Models is a Hugging Face Space designed to help users find and explore quantized AI models. Quantization is a technique that reduces the size and computational cost of AI models, making them more efficient for deployment and use on various hardware. This tool provides a search interface where users can look for models based on the author or the model's specific name. The platform presents a table of available models, detailing their types and other relevant information. While the current status indicates a build error, the intent of the space is to serve as a repository and discovery tool for these optimized AI models, primarily hosted on Hugging Face.
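The idea behind quantization can be shown with a toy example. The sketch below applies symmetric 8-bit quantization to a short list of weights; it illustrates the general technique only, and the function names and sample values are assumptions. Production formats for large models are considerably more involved.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]

# Made-up weights: each is stored as one small integer plus a shared scale.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy traded for the smaller storage size.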
OpenCV-Face-Recognition
OpenCV-Face-Recognition is an open-source project designed for real-time face recognition using OpenCV and Python. It serves as a foundational resource for developers and data scientists looking to implement face detection and recognition systems. The project includes comprehensive tutorials, making it accessible for those who want to build end-to-end face recognition applications. It leverages the power of OpenCV for image processing and Python for scripting, providing a robust framework for various computer vision tasks related to facial analysis. This tool is particularly useful for learning and developing custom solutions in areas such as security, attendance systems, or interactive applications requiring real-time facial identification.
PaddleDetection
PaddleDetection is an end-to-end object detection development toolkit built on PaddlePaddle, offering a rich set of model components and benchmarks. It focuses on industrial applications by providing specialized models and tools, along with practical application examples. This toolkit helps developers streamline the entire process from data preparation and model selection to training and deployment. It supports various tasks including 2D/3D object detection, instance segmentation, face detection, keypoint detection, multi-object tracking, and semi-supervised learning. PaddleDetection also features low-code full-process development capabilities and a modular design for easy model construction.
pgmpy
pgmpy is an open-source Python library designed for causal and probabilistic reasoning through graphical models. It offers comprehensive implementations of data structures for various models including DAGs, PDAGs, MAGs, PAGs, Bayesian Networks, Dynamic Bayesian Networks, and Structural Equation Models. The toolkit includes algorithms for key tasks such as causal discovery, causal identification, causal and probabilistic inference, model validation, parameter estimation, and simulations. Its modular and extensible API ensures compatibility with scikit-learn, allowing direct use, integration into sklearn pipelines, or building higher-level tools. pgmpy supports both discrete and linear Gaussian data, as well as mixture data with arbitrary relationships.
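Probabilistic inference in the smallest possible Bayesian network, one parent and one observed child, reduces to Bayes' rule. The sketch below is plain Python rather than pgmpy's API, and the variable names and probabilities are illustrative assumptions.

```python
def posterior(prior, likelihood, evidence):
    """Posterior over a parent variable given one observed child value."""
    joint = {state: prior[state] * likelihood[state][evidence] for state in prior}
    total = sum(joint.values())
    return {state: p / total for state, p in joint.items()}

# Illustrative numbers only: P(condition) and P(test result | condition).
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {
    "disease": {"positive": 0.9, "negative": 0.1},
    "healthy": {"positive": 0.05, "negative": 0.95},
}
result = posterior(prior, likelihood, "positive")
```

With these numbers the posterior probability of disease after a positive test is 0.009 / 0.0585, roughly 0.154.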
Vista
Vista is an open-source project from OpenDriveLab, presented at NeurIPS 2024, offering a generalizable world model specifically designed for autonomous driving. This tool allows for the prediction of high-fidelity futures across a wide range of driving scenarios, extending these predictions to continuous and long horizons. A key feature is its ability to execute multi-modal actions, including steering angles, speeds, commands, trajectories, and goal points. Furthermore, Vista can provide rewards for different actions without requiring access to ground truth actions, making it a valuable resource for researchers and developers in the autonomous driving field. The implementation is based on generative-models from Stability AI, and the project includes installation, training, and sampling scripts, along with model weights available on Hugging Face and Google Drive.
Grounding Dino Inference
Grounding Dino Inference is an AI tool hosted on Hugging Face Spaces for text-prompted object detection. Users upload an image and then provide text descriptions of the objects they wish to identify. The application uses the Grounding DINO model to locate and highlight the specified objects within the uploaded image. This tool is particularly useful for researchers and developers working in computer vision, offering a straightforward interface for running inference. It provides a practical demonstration of the Grounding DINO model's ability to identify diverse objects from natural language input.
ZeroEval Leaderboard
ZeroEval Leaderboard is an AI tool developed by AllenAI, available as a Hugging Face Space, designed for evaluating and comparing the performance of various AI models. The Space embeds ZeroEval, so its evaluation results can be viewed, or embedded in other websites, without any user input. It serves as a centralized platform for researchers and developers to assess and benchmark AI model capabilities, fostering transparency and progress in the AI community. The tool is freely accessible and operates as a web application.
openai-api-proxy
openai-api-proxy offers a straightforward solution for developers needing to proxy OpenAI API requests. It can be easily deployed using a single Docker command or integrated with Tencent Cloud Functions, making it versatile for various hosting environments. A key feature is its support for Server-Sent Events (SSE) streaming output, which allows for real-time data transfer. Additionally, the proxy includes built-in text moderation capabilities, configurable for different levels of strictness, ensuring content compliance. It supports both GET and POST methods and provides environment variables for customization, such as port, proxy access key, and request timeout. This tool is ideal for developers looking to manage and secure their OpenAI API access with added functionalities like moderation and streaming.
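On the client side, SSE streaming output arrives as text lines in which each event carries one or more `data:` fields and ends with a blank line. The parser below is a minimal sketch of that framing, not code from openai-api-proxy; the function name and the sample stream are assumptions, and only the `data` field is handled.

```python
def iter_sse_data(lines):
    """Yield the data payload of each Server-Sent Event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)

# Made-up stream: one simple event, one keep-alive, one multi-line event, one end marker.
stream = [
    "data: hello\n", "\n",
    ": ping\n",
    "data: multi\n", "data: line\n", "\n",
    "data: [DONE]\n", "\n",
]
events = list(iter_sse_data(stream))
```

A caller reading a streamed completion would typically stop once it sees the `[DONE]` payload.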
Find3D
Find3D is an open-world 3D part segmentation model designed to identify and segment specific components within 3D objects. Users can upload their own .pcd files or select from provided samples to analyze point cloud data. The tool allows for precise part queries, enabling the segmentation of complex 3D objects into their constituent parts. This capability is particularly useful for applications requiring detailed structural analysis, object recognition, and component isolation within 3D environments. Developed as a Hugging Face Space, Find3D offers an accessible platform for researchers, developers, and enthusiasts working with 3D data and AI applications.
geckoview
GeckoView is an open-source project by Mozilla, offering a robust set of components for embedding the Gecko browser engine into Android applications. This allows developers to seamlessly integrate web content rendering capabilities directly within their native Android apps, providing a consistent and powerful browsing experience. The project emphasizes customizability, enabling developers to tailor the web view to their specific application needs. It is a foundational technology for applications like Firefox for Android, providing a secure and performant way to display web content. The GitHub repository serves as the documentation hub, guiding contributors and users on how to get started and utilize its features.
OccNet-Course
OccNet-Course offers the first comprehensive course in China on Occupancy Network algorithms, covering everything from BEV (Bird's Eye View) to Occupancy Network principles and engineering practices, including edge-side deployment. This open-source course is designed for autonomous driving enthusiasts and professionals, providing in-depth knowledge on surrounding semantic occupancy perception. It includes detailed documentation, PowerPoint presentations, and source code, making it a valuable resource for both theoretical understanding and practical application. The curriculum covers various aspects such as BEV perception, different Occupancy Network approaches (pure vision, point cloud, multi-modal fusion), important datasets, benchmarks, and deployment strategies for NVIDIA and Horizon J5 chips. The course also features practical coding exercises and a final project to solidify learning.
LongVU
LongVU is an AI tool hosted on Hugging Face Spaces that enables users to interact with visual content by uploading videos or images and posing questions or comments. The application then processes the visual input and generates detailed text responses, providing insights and information derived from the content. This functionality makes LongVU a valuable resource for researchers and developers focused on video analysis, image understanding, and general visual content interpretation. It leverages advanced AI models to bridge the gap between visual data and textual explanations, facilitating deeper engagement with multimedia.