AI Agents & Automation
k8m
k8m is a lightweight, cross-platform Mini Kubernetes AI Dashboard designed to streamline cluster management. Built on AMIS and using kom as its Kubernetes API client, it integrates AI models such as Qwen2.5-Coder-7B and DeepSeek-R1-Distill-Qwen-7B for intelligent analysis, YAML translation, and AI-assisted log diagnostics. It supports multi-cluster management with heartbeat detection, automated reconnection, and granular permission control for users and groups. Key features include a plugin-based architecture, MCP integration for large-model tool calls, and MCP permission integration for added security. It also offers Pod file and running management, API access, cluster inspection, Kubernetes Event forwarding, CRD management, and a Helm market. The tool is fully open-source, supports multiple architectures and databases, and can be deployed as a single executable, which keeps installation simple for Kubernetes operations.
LLM-Pruner
LLM-Pruner is a cutting-edge tool designed for the structural pruning of large language models (LLMs), as presented at NeurIPS 2023. It enables users to compress LLMs to any desired size while retaining their original multi-task solving abilities. The tool emphasizes task-agnostic compression, requiring minimal training corpus (e.g., 50k Alpaca samples for post-training) and offering efficient compression times, with pruning taking approximately 3 minutes and post-training around 3 hours. LLM-Pruner supports a wide range of popular LLMs, including Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, and TinyLlama. It features an automatic structural pruning process, aiming for minimal human effort, and provides detailed instructions for discovery, estimation, and recovery stages of pruning, along with evaluation using lm-evaluation-harness.
OMG
OMG is an advanced open-source framework designed for occlusion-friendly personalized multi-concept generation within diffusion models, as presented at ECCV 2024. It allows users to generate complex images featuring multiple characters and styles, integrating seamlessly with LoRAs from Civitai.com and InstantID for single-image ID personalization. The tool also supports ControlNet for layout control and various style LoRAs. OMG is built on Python 3.10.6 with PyTorch 2.0.1 and torchvision 0.15.2, requiring specific model downloads for its functionality, including Stable Diffusion XL and various ControlNet and LoRA checkpoints. It offers flexible usage through command-line inference scripts for both LoRA and InstantID workflows.
Stellon Labs
Stellon Labs is an AI research lab dedicated to developing powerful, tiny AI models specifically optimized for edge applications. Their focus is on creating 'frontier AI' solutions that can operate efficiently on minimal hardware, making advanced artificial intelligence accessible for devices with limited computational resources. The lab aims to push the boundaries of AI performance in constrained environments, enabling new possibilities for on-device intelligence without requiring extensive infrastructure. Their work is geared towards practical applications where low-power and small-footprint AI is crucial.
pytorch-maddpg
pytorch-maddpg offers a PyTorch implementation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a key approach in multi-agent reinforcement learning. This open-source project is hosted on GitHub and is designed for researchers and developers working on complex multi-agent systems. The implementation includes a modified Waterworld environment, where agents (evaders, pursuers, poisons) interact under specific physical rules, allowing for experimentation with cooperative behaviors. In this environment, agents bounce off walls and must cooperate exactly to earn rewards, making it a valuable tool for studying multi-agent coordination and policy learning.
VoAPI
VoAPI is a next-generation, high-performance, and highly scalable intelligent AI large-model API aggregation and distribution system. It offers a comprehensive suite of features for managing AI model APIs, including user and multi-currency management, API data forwarding, and a flexible rules engine supporting ES5/ES6 JavaScript syntax for custom rules. The system supports multiple balance mechanisms, daily check-ins, and multi-user levels. Advanced features include real-time RPM/TPM support for users, channels, and individual keys, remote model and vendor data synchronization, and channel grouping with fixed or timed multipliers. VoAPI also provides robust error handling with key disabling and automatic recovery, circuit breaker timeouts, IP/UA rule restrictions, and global/independent proxy configurations. It includes a redemption code system, custom menus, third-party logins, security filtering, API line display and speed testing, and node status monitoring. The Pro version adds online payment support, custom multi-currency, automated exchange rate conversion, online self-service invoicing, real-name authentication, marketing notifications, a powerful ticketing system, and dynamic routing with instant hot reloading.
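The per-user, per-channel, and per-key RPM (requests per minute) limits mentioned above can be illustrated with a sliding-window counter. This is a minimal sketch of the general technique, not VoAPI's implementation; the class name `RpmLimiter` and its parameters are hypothetical, and the clock is passed in explicitly so the behavior is easy to follow.

```python
from collections import deque


class RpmLimiter:
    """Hypothetical sliding-window limiter: at most max_requests per window."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of accepted requests, oldest first

    def allow(self, now):
        """Return True and record the request if it fits in the current window."""
        # Discard accepted requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


# One limiter per key (or user, or channel); here a single key limited to 2 RPM.
limiter = RpmLimiter(max_requests=2)
decisions = [limiter.allow(t) for t in (0.0, 1.0, 2.0, 60.0)]
```

The third request is rejected because two requests already sit inside the 60-second window; the fourth is accepted because the first has expired by then. A TPM limit could follow the same pattern by summing token counts instead of counting requests.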
XVERSE-13B
XVERSE-13B is a multilingual large language model developed by XVERSE Technology Inc. It features a decoder-only Transformer network structure with an 8K context length, which is extended to 256K in the XVERSE-13B-256K version for handling extensive input content like literature summaries and report analysis. The model was trained on 3.2 trillion tokens across over 40 languages, with a focus on Chinese and English performance. It utilizes a 100,534-token BPE-based tokenizer that supports multiple languages without requiring additional vocabulary expansion. The project also highlights an efficient training framework with high peak computing power utilization. Quantized models (GGUF, GPTQ) are available for inference on macOS, Linux, and Windows systems.
VidClaw
VidClaw is an open-source, self-hosted command center designed specifically for managing OpenClaw AI agents. It offers a comprehensive dashboard that allows users to visually queue tasks using a Kanban board, track real-time token usage and cost estimates, and switch between different AI models. A key feature is the 'Soul Editor,' which enables users to tweak their agent's persona, identity, and operating instructions with version history. Additionally, VidClaw includes a Skills Manager to browse, enable, or create custom skills, and a Content Browser for workspace files. Built for developers and users who prefer to own their infrastructure, VidClaw ensures privacy and security by binding only to localhost, with no cloud dependencies or external tracking.
mader
mader is an open-source trajectory planner specifically designed for use in multi-agent and dynamic environments. It has been accepted for publication in the IEEE Transactions on Robotics (T-RO), highlighting its academic rigor and practical applicability. The tool facilitates trajectory planning for robotic systems, including single-agent and multi-agent simulations, with features like obstacle avoidance and dynamic environment handling. Users can set up and run simulations using ROS, with options for both Docker and non-Docker installations. It supports backend optimizers like Gurobi and NLOPT, providing flexibility for different computational needs. The project is hosted on GitHub by MIT-ACL, making it accessible for researchers and developers in the robotics community.
SwanLab
SwanLab is an open-source tool with a modern design for tracking and visualizing AI training, built for AI model training teams. It provides comprehensive features for experiment analysis, metric observation, and collaboration. Researchers can track key metrics, record hyperparameters, and visualize training processes through an intuitive UI, helping to identify issues and accelerate model iteration. SwanLab supports a wide range of data types including scalar metrics, images, audio, text, video, 3D point clouds, and biochemical molecules, along with various chart types like line, media, bar, and custom ECharts. It offers both cloud and self-hosted deployment options and integrates with over 50 mainstream frameworks, including PyTorch, Transformers, and Keras. Key functionalities include experiment comparison, multi-person collaboration, hardware monitoring, and an open API for extended capabilities.
eli5
eli5 is a Python package designed to help debug and inspect machine learning classifiers, providing explanations for their predictions. It supports a wide range of machine learning frameworks, including scikit-learn, Keras (for Grad-CAM visualizations), xgboost, LightGBM, CatBoost, and lightning. The library can explain weights and predictions of linear classifiers, print decision trees, show feature importances, and debug scikit-learn pipelines. Additionally, eli5 implements algorithms for inspecting black-box models, such as TextExplainer for LIME-based explanations and permutation importance for feature importances. Explanations can be formatted for console display, HTML embedding, pandas DataFrames, or JSON for custom rendering.
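The permutation importance idea mentioned above can be sketched without any library: shuffle one feature column at a time and measure how much the model's score drops. The following is a stdlib-only illustration of the technique, not eli5's implementation or API; the toy model, the data, and the helper names are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the dataset with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical black-box model: uses feature 0 only, ignores feature 1."""
    return 1 if row[0] > 0 else 0


X = [[1, 5], [-1, 5], [2, -3], [-2, -3], [3, 0], [-3, 0]]
y = [1, 0, 1, 0, 1, 0]
importances = permutation_importance(toy_model, X, y)
```

Because the toy model never reads feature 1, shuffling that column cannot change any prediction and its importance is exactly zero, while shuffling feature 0 lowers accuracy.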
micronet
Micronet is an open-source library designed for AI model compression and efficient deployment on various hardware platforms. It provides a comprehensive suite of techniques including quantization-aware training (QAT) and post-training quantization (PTQ) for both high-bit and low-bit scenarios, as well as pruning methods like normal, regular, and group convolutional channel pruning. The library also supports batch-normalization fusion for quantization, enhancing model efficiency. For deployment, Micronet integrates with TensorRT, enabling optimized inference in fp32, fp16, and int8 formats with features like op-adapt and dynamic shape support. This makes it an invaluable tool for developers looking to reduce model size and accelerate inference speed.
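As a rough illustration of what post-training quantization to int8 involves, the sketch below applies symmetric per-tensor quantization to a list of weights. It is a minimal example of the general technique under simplifying assumptions (a single scale, no calibration data) and does not reflect Micronet's API; the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the integer range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # A single scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), which is the error that quantization-aware training tries to make the model tolerate.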
phishguard-scaffold
PhishGuard Scaffold offers a unified framework for detecting and controlling phishing attacks on social media platforms. Leveraging LLaMA-based modeling, it provides deep semantic understanding and adversarial robustness to identify phishing attempts. The tool also incorporates advanced graph-based intervention strategies to control the propagation of detected phishing content. Key features include LLaMA-2-7B integration, multi-layer semantic projection, LoRA/PEFT support for efficient fine-tuning, and robust preprocessing. It also focuses on adversarial robustness through semantic perturbations and KL divergence training. For propagation control, it builds automated social networks, simulates diffusion with an Independent Cascade Model, and offers targeted intervention using greedy optimization and influence-aware selection. The framework is designed for production readiness with mixed precision training, gradient checkpointing, and comprehensive MLOps integration via MLflow and Ray Tune.
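The propagation-control idea described above (simulate diffusion with an Independent Cascade Model, then pick interventions greedily) can be sketched as follows. This is a simplified stdlib-only illustration, not PhishGuard's code: the graph, the uniform activation probability, and the function names are all hypothetical, and "intervention" is modeled simply as blocking a node.

```python
import random


def independent_cascade(graph, seeds, prob, rng, blocked=frozenset()):
    """One Independent Cascade run: each newly active node gets a single
    chance to activate each inactive neighbor with probability prob."""
    active = set(seeds) - set(blocked)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor in active or neighbor in blocked:
                    continue
                if rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, prob, blocked, runs=200, seed=0):
    """Monte Carlo estimate of the mean number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, prob, rng, blocked))
                for _ in range(runs))
    return total / runs


def greedy_block(graph, seeds, prob, budget):
    """Greedily block the node that most reduces the estimated spread."""
    blocked = set()
    candidates = set(graph) - set(seeds)
    for _ in range(budget):
        best = min(sorted(candidates - blocked),
                   key=lambda n: expected_spread(graph, seeds, prob, blocked | {n}))
        blocked.add(best)
    return blocked


# Hypothetical network: "a" posts phishing content, "b" relays it to c, d, e.
graph = {"a": ["b"], "b": ["c", "d", "e"], "c": [], "d": [], "e": []}
chosen = greedy_block(graph, seeds=["a"], prob=0.5, budget=1)
```

In this toy network every path from the source passes through "b", so blocking it confines the content to the source node and the greedy step selects it.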
serverless-ml-course
The serverless-ml-course is an open-source educational resource designed to simplify the development and operation of AI-enabled prediction services. It teaches how to build batch and real-time prediction services using Python, focusing on serverless infrastructure. The course covers essential MLOps fundamentals such as versioning, testing, data validation, and operations, enabling users to deploy features and models, train models, and run inference pipelines. A key differentiator is its emphasis on building a prediction service around a model without needing extensive operations experience, making it accessible for those who can program in Python but are not cloud computing experts. It also guides users on building serverless UIs for their prediction services.
pyannote-audio
pyannote-audio is an open-source Python toolkit designed for speaker diarization, a process that identifies 'who spoke when' in an audio recording. Built on the PyTorch machine learning framework, it offers robust capabilities for speech activity detection, speaker change detection, and speaker embedding. The toolkit includes pretrained models and pipelines, allowing users to quickly implement and experiment with audio analysis tasks. Furthermore, it supports fine-tuning of these models, enabling users to optimize performance on their specific custom datasets. This makes pyannote-audio a versatile tool for researchers and developers working with audio data.
6thlabs
6thlabs is an AI & Tech Solutions Partner specializing in custom AI solutions and tech development for SaaS, startups, and software companies. They offer services like MVP development, rapid prototyping, scalable solutions, and user feedback integration to launch ideas quickly and cost-effectively. Their automation services streamline operations and boost productivity with intelligent automation solutions, including automating repetitive tasks and optimizing decision-making with AI. Additionally, 6thlabs provides software integrations for seamless business operations, covering CRM, custom API development, ERP system integration, and real-time data sync. They focus on empowering businesses through tailored web and mobile solutions and innovative AI technologies, driving efficiency and growth.
DataCrunch
Verda, formerly DataCrunch, is a European ISO-certified cloud provider specializing in AI infrastructure. It offers instant access to powerful production-grade GPUs through self-service instances and multi-node clusters, including bare-metal options with NVIDIA B200, H200, and H100 GPUs. Verda also provides serverless inference for containerized models, allowing auto-scaling and pay-per-usage, and managed endpoints for popular AI models. The platform is designed to remove infrastructure barriers for AI teams, focusing on optimizing performance, reliability, and costs, with all infrastructure powered by 100% renewable energy and hosted in GDPR-regulated European countries.
robustmq
RobustMQ is a unified messaging engine built with Rust, designed as a communication infrastructure for the AI era. It operates as a single binary, one broker, and one storage layer, eliminating external dependencies and allowing deployment from edge devices to cloud clusters. It natively supports MQTT, Kafka, NATS, AMQP, and its own mq9 protocol on a shared storage layer, meaning a message written once can be consumed by any protocol. The mq9 protocol is specifically designed for AI Agent asynchronous communication, offering features like agent mailboxes with persistent store-first delivery, priority levels, and public mailbox discovery. RobustMQ emphasizes minimal operations, multi-tenancy, and ultra-low-latency dispatch, making it suitable for diverse messaging needs from IoT to streaming data pipelines.
spacy-models
spacy-models offers a collection of pre-trained models specifically designed for use with the spaCy Natural Language Processing (NLP) library. These models are essential for data scientists and machine learning engineers who are building applications that require advanced text processing capabilities. The models support a wide range of NLP tasks, including efficient text analysis, named entity recognition, and dependency parsing. By leveraging these pre-trained models, users can significantly accelerate their NLP development workflows, reducing the need for extensive custom training. The integration with spaCy ensures high performance and ease of use for various linguistic tasks.
sherpa
sherpa is an open-source speech-to-text inference framework built with PyTorch, designed for deploying pre-trained models to transcribe speech. It specializes in end-to-end models, particularly transducer- and CTC-based architectures, offering high-performance speech recognition capabilities. Developers can integrate sherpa into their projects using either C++ or Python APIs, making it versatile for various development environments. The framework is ideal for those looking to implement custom speech-to-text solutions, leverage advanced AI models for audio processing, or contribute to the open-source AI community. Its focus on inference means it's optimized for efficient deployment of trained models.
TensorFlow-VAE-GAN-DRAW
TensorFlow-VAE-GAN-DRAW is an open-source collection of generative methods implemented using TensorFlow. This repository offers implementations of Deep Convolutional Generative Adversarial Networks (DCGAN), Variational Autoencoders (VAE), and DRAW: A Recurrent Neural Network For Image Generation. It allows users to experiment with and run these different generative models, providing a foundation for research and development in image generation. The project highlights that DCGANs produce decent results after 10 epochs with default parameters and outlines future enhancements like more complex data integration and replacing the current attention mechanism with a Spatial Transformer Layer.
tensorflow_template_application
tensorflow_template_application offers a versatile and generic template for deep learning projects built with TensorFlow. It is designed to streamline the development process by providing a structured foundation. The tool supports multiple data formats, including CSV, LIBSVM, and TFRecords, ensuring flexibility in data handling. Key features extend to prediction servers, leveraging TensorFlow Serving and a Python HTTP server, as well as prediction clients available in various programming languages. This comprehensive setup makes it suitable for developers looking to quickly deploy and manage deep learning models.
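Of the supported input formats, LIBSVM is the least self-explanatory: each line holds a label followed by sparse `index:value` pairs. The sketch below parses one such line in plain Python. It is an illustration of the format only, not code from the project; the function names are hypothetical, and indices are assumed to be 1-based, as is conventional for LIBSVM files.

```python
def parse_libsvm_line(line):
    """Parse '<label> <index>:<value> ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features, num_features):
    """Expand sparse 1-based features into a dense list, filling gaps with 0.0."""
    return [features.get(i, 0.0) for i in range(1, num_features + 1)]


label, features = parse_libsvm_line("1 1:0.5 3:2.0")
dense = to_dense(features, num_features=3)
```

Features that do not appear on a line are implicitly zero, which is why the dense form of the example has `0.0` in the second position.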
SOFTEYE
TDK AIsight is a core technology platform developed by TDK that focuses on building the fundamental technologies for generative AI glasses. It enables context-aware vision, memory, and low-power on-device intelligence for next-generation smart glasses. The platform integrates core hardware components and a modular subsystem architecture for performance, flexibility, and scalability. It employs a multi-modal feedback architecture, distributing system output across visual, audio, haptic, and display subsystems. The intelligence behind AIsight, eyeGI™ and eyeGenI™, delivers low-power, real-time perception and contextual understanding, supporting a wide range of context-aware experiences for work, learning, travel, and shopping.
TTS
TTS is a comprehensive open-source library developed by Mozilla for advanced Text-to-Speech generation. It leverages the latest research to provide a balance of ease-of-training, speed, and quality, making it suitable for various applications. The library includes pretrained models and tools for measuring dataset quality, supporting over 20 languages. It features high-performance deep learning models for Text2Spec tasks like Tacotron and Glow-TTS, as well as various vocoder models such as MelGAN and WaveRNN. TTS supports multi-speaker TTS, efficient multi-GPU training, and the ability to convert PyTorch models to TensorFlow 2.0 and TFLite for inference. It also provides a demo server for model testing and notebooks for extensive benchmarking.