AI Agents & Automation

Sorted by confidence score, our independent quality rating.

DROO

60%

DROO is an open-source project providing a Deep Reinforcement Learning algorithm for online computation offloading in Wireless Powered Mobile-Edge Computing (WPMEC) networks. The tool takes time-varying wireless channel gains as input and generates binary offloading decisions. It includes DNN structures for WPMEC, along with training and testing functionalities. Implementations are available for TensorFlow 1.x, TensorFlow 2, and PyTorch, so developers can work in the framework they prefer. The repository also provides datasets and demo files to evaluate performance under various conditions, such as alternating weights and random device turn-on/off scenarios.
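
The input-to-output shape described above (channel gains in, one binary decision per device out) can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than DROO's code: `relaxed_scores` is a hypothetical stand-in for the DNN, the inputs are made-up normalized gains, and the plain 0.5 threshold is a simplification of whatever quantization the project actually uses.

```python
import math


def relaxed_scores(normalized_gains, weight=1.0, bias=0.0):
    """Hypothetical stand-in for the DNN: one score in (0, 1) per device."""
    return [1.0 / (1.0 + math.exp(-(weight * g + bias))) for g in normalized_gains]


def binary_decisions(scores, threshold=0.5):
    """Quantize relaxed scores: 1 = offload the task, 0 = compute locally."""
    return [1 if s > threshold else 0 for s in scores]


# Made-up normalized channel gains for three devices in one time frame.
gains = [-1.2, 0.3, 2.0]
decisions = binary_decisions(relaxed_scores(gains))
print(decisions)  # [0, 1, 1]
```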

Function Inception: Update/Remove Functions During Conversations

60%

Function Inception is a feature of the AutoGen framework that lets AI agents modify their own set of callable functions during an ongoing conversation. Agents can update existing functions or remove them as needed, which lets them adapt their toolset as the conversation develops. The feature supports multi-agent AI applications that operate autonomously or collaborate with human users.

Multilingual Accessible Mistral 7B

60%

Multilingual Accessible Mistral 7B is an AI chatbot designed to facilitate multilingual communication. This tool is particularly useful for individuals engaged in language learning, offering a platform to practice and interact in various languages. Beyond language acquisition, it also serves as a valuable resource for content generation, allowing users to create text in multiple languages. The tool is accessible for free, making it an ideal choice for educational purposes and for those interested in exploring the capabilities of AI models without financial commitment. Its focus on accessibility and multilingual support positions it as a versatile tool for a diverse user base.

giga-world-0

60%

GigaWorld-0 is an open-source, unified world model framework designed as a data engine to empower embodied AI, specifically for Vision-Language-Action (VLA) learning. It integrates two synergistic components: GigaWorld-0-Video, which generates diverse, texture-rich, and temporally coherent embodied sequences with fine-grained control over appearance, camera viewpoint, and action semantics; and GigaWorld-0-3D, which combines 3D generative modeling, 3D Gaussian Splatting reconstruction, physically differentiable system identification, and executable motion planning to ensure geometric consistency and physical realism. The framework is built upon GigaTrain, GigaDatasets, and GigaModels, offering a comprehensive solution for researchers and developers in embodied AI.

Minion AI

60%

Minion AI is a web agent that acts as a personal AI assistant, automating tasks on the user's behalf. Specific features are not detailed, but its core purpose is to work as an autonomous agent that performs actions and manages information for the user, with the aim of streamlining workflows and simplifying complex operations.

GITM

60%

GITM (Ghost in the Minecraft) is an AI agent framework for complex, long-horizon tasks in open-world environments, demonstrated in Minecraft. It integrates Large Language Models (LLMs) with text-based knowledge and memory to enable generally capable agents. Unlike previous RL-based agents that struggle with mapping complex goals to low-level operations, GITM takes a hierarchical approach, breaking goals down into sub-goals, then structured actions, and finally keyboard/mouse operations. The framework comprises an LLM Decomposer, LLM Planner, and LLM Interface, which together handle goal decomposition, action planning, and environmental interaction. GITM reports 100% completion of the Minecraft Overworld technology tree, outperforming previous methods, along with a high success rate on challenging tasks like "ObtainDiamond". Training requires only a single CPU node for two days, where other leading agents need many GPU days.

InternNav

60%

InternNav is an all-in-one open-source toolbox built on PyTorch, Habitat, and Isaac Sim, designed for embodied navigation. It provides modular support for the entire navigation system, including vision-language navigation with discrete action space (VLN-CE), visual navigation (VN) with various goal types, and full VLN systems with continuous trajectory outputs. The platform is compatible with mainstream simulation platforms, catering to diverse training and evaluation needs. It offers comprehensive datasets, models, and benchmarks, including the advanced InternData-N1 dataset and the dual-system navigation foundation model, InternVLA-N1, which demonstrates leading performance and zero-shot generalization capabilities in real-world scenarios. InternNav also supports distributed evaluation and provides resources for real-world deployment.

Intrusion-Detection-System-Using-Machine-Learning

60%

This repository offers open-source code for developing Intrusion Detection Systems (IDS) using a range of machine learning algorithms. It's designed for general IDS and anomaly detection applications, particularly in the context of the Internet of Vehicles (IoV). The project includes implementations of tree-based algorithms like Decision Tree, Random Forest, XGBoost, LightGBM, and CatBoost, as well as unsupervised learning with k-means, and ensemble methods such as stacking and the proposed LCCDE. It also incorporates hyperparameter optimization techniques like Bayesian optimization. The code is accompanied by published research papers detailing three specific IDS models: a tree-based IDS, MTH-IDS (a multi-tiered hybrid IDS), and LCCDE (a decision-based ensemble framework). Datasets like CICIDS2017 and CAN-intrusion are used for experimentation, making it a valuable resource for cybersecurity researchers and developers.

HALOs

60%

HALOs (Human-Aware Loss Functions) is a Python library designed to facilitate the alignment of Large Language Models (LLMs) with human preferences. It provides extensible implementations of popular alignment methods such as DPO, KTO, PPO, and ORPO. The library emphasizes modularity, separating dataloading, training, and sampling, and extensibility, allowing users to quickly implement custom dataloaders or new alignment losses. HALOs is built for simplicity, making it easy to hack on, and has been tested with LLMs ranging from 1B to 30B parameters. It supports LoRA training, reference logit caching to reduce memory, and integrates with tools like Hydra for configuration and Accelerate for job launching with FSDP. The repository also includes scripts for evaluation with AlpacaEval and LMEval.

hamiltonian-nn

60%

Hamiltonian-nn offers the code for the paper "Hamiltonian Neural Networks," which introduces a novel approach to modeling physical systems using neural networks. Unlike traditional neural networks, Hamiltonian Neural Networks (HNNs) are designed to learn and adhere to exact conservation laws, such as energy conservation, in an unsupervised fashion. The tool provides practical examples for various tasks, including modeling ideal mass-spring systems, pendulums (both ideal and real), two-body and three-body problems, and pixel observations of a pendulum. HNNs demonstrate faster training and better generalization compared to regular neural networks, with the added benefit of being perfectly reversible in time. This makes it particularly useful for researchers and developers working on physics-informed machine learning.
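
The conservation property mentioned above follows from Hamilton's equations, and a minimal sketch can make that concrete. It is not the repository's code: a hand-written mass-spring energy function stands in for the learned network, and finite differences stand in for automatic differentiation. All function names are hypothetical.

```python
def hamiltonian(q, p, k=1.0, m=1.0):
    """Total energy of an ideal mass-spring system (stand-in for a learned H)."""
    return 0.5 * k * q * q + 0.5 * p * p / m


def hamiltonian_grads(q, p, eps=1e-5):
    """Central finite-difference estimates of dH/dq and dH/dp."""
    dH_dq = (hamiltonian(q + eps, p) - hamiltonian(q - eps, p)) / (2 * eps)
    dH_dp = (hamiltonian(q, p + eps) - hamiltonian(q, p - eps)) / (2 * eps)
    return dH_dq, dH_dp


def symplectic_field(q, p):
    """Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq."""
    dH_dq, dH_dp = hamiltonian_grads(q, p)
    return dH_dp, -dH_dq


def energy_rate(q, p):
    """dH/dt along the field: the two terms cancel, so energy is conserved."""
    dH_dq, dH_dp = hamiltonian_grads(q, p)
    dq_dt, dp_dt = symplectic_field(q, p)
    return dH_dq * dq_dt + dH_dp * dp_dt
```

Because the predicted dynamics are derived from the gradient of a single scalar function instead of being regressed directly, the cancellation in `energy_rate` holds for any differentiable `hamiltonian`, not only for this example.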

neuronika

60%

Neuronika is a machine learning framework built entirely in Rust, emphasizing ease of use, rapid prototyping, and performance. At its core, Neuronika utilizes reverse-mode automatic differentiation, enabling the creation of dynamically changing neural networks with minimal effort and overhead through a lean, imperative, and define-by-run API. The framework leverages the power of the Rust language to offer an intuitive and efficient interface without the need for Foreign Function Interfaces (FFI). It supports GPU-accelerated primitives via CUDA, serialization with Serde, and transparent BLAS support for optimized matrix multiplication. Neuronika is currently in active development, with breaking changes expected as it evolves.
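
For readers unfamiliar with define-by-run reverse-mode automatic differentiation, the idea can be sketched in a few lines. This is a generic scalar illustration in Python, not Neuronika's Rust API; the `Var` class and its methods are hypothetical names chosen for the example.

```python
class Var:
    """A scalar that records how it was computed (define-by-run graph)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Propagate gradients from this node back to every input."""
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # parents are appended before children

        visit(self)
        self.grad = 1.0
        # Walk children before parents so each grad is complete when used.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + x
z.backward()
print(z.value, x.grad, y.grad)  # 8.0 4.0 2.0
```

The graph is built as ordinary code runs, which is what allows the network structure to change from one forward pass to the next.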

natbot

60%

natbot is an open-source project designed to automate browser interactions using GPT-3. It allows users to control a web browser through AI commands, turning natural language instructions into browser actions. The tool is hosted on GitHub and is open to community contributions. The project roadmap includes better prompt engineering, prompt chaining, enhanced DOM serialization, and the ability for the agent to manage multiple tabs. natbot is aimed at developers who want to experiment with AI-driven browser automation and contribute to its evolution.

ithaca

60%

Ithaca is a pioneering deep neural network developed by Google DeepMind for the restoration, geographical, and chronological attribution of ancient Greek inscriptions. This open-source tool significantly enhances the historian's workflow by providing a collaborative, decision-support, and interpretable architecture. It achieves 62% accuracy in restoring damaged texts, and when used by historians, their performance leaps from 25% to 72%. Ithaca can also attribute inscriptions to their original location with 71% accuracy and date them with a distance of less than 30 years from ground-truth ranges, contributing to critical debates in Ancient History. The project includes an interactive online notebook and an offline library for advanced users.

Log10

60%

Log10's Everest platform is an agentic AI solution specifically designed for life sciences services, including Pharma/Biotech, MedTech, CROs, and consultancies. It enables teams to transform their expertise into scalable, compliant workflows for document generation. Everest can produce a wide range of documents, from regulatory submissions like 510(k)s and IND/CTA Briefing Packages to clinical reports such as Clinical Trial Protocols and Investigator Brochures, and strategic documents like Market Landscape Summaries. The platform emphasizes speed, accuracy, and compliance, aiming to accelerate documentation processes without increasing team size. It also offers white-labeling solutions for CROs and consultancies to deliver AI-powered documents under their own brand.

NATSpeech

60%

NATSpeech is a comprehensive open-source framework for Non-Autoregressive Text-to-Speech (NAR-TTS) research and development. It offers official PyTorch implementations of advanced models like PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022), facilitating high-quality and portable speech generation. The framework includes robust features such as data processing for NAR-TTS using Montreal Forced Aligner, a scalable training and inference system, and an efficient random-access dataset implementation. It's designed for technical users who want to explore and build upon state-of-the-art speech synthesis technologies, providing the necessary tools and code for experimentation and deployment.

Cruit

60%

Cruit is an AI-powered career agent designed to streamline the job search and career development process. Through a simple chat interface, it helps users build their professional brand, land a job, and grow their career. Key features include an AI-optimized resume builder, LinkedIn profile optimization, interview preparation with video feedback, and a job tracker. Cruit aims to replace multiple disparate career tools with one integrated, conversational platform that remembers a user's entire career history and provides ethical, context-aware guidance without fabricating experience.

Openai Whisper Small

60%

Openai Whisper Small is a speech-to-text transcription tool available as a Hugging Face Space. It allows users to upload an audio file and receive a written transcription of the spoken words. This tool is a compact version of the well-known OpenAI Whisper model, designed for efficient audio analysis and language translation tasks. While the live website currently shows a runtime error, its intended functionality is to provide a straightforward way to convert audio to text, making it useful for various applications requiring written records of spoken content.

Microsoft AutoGen

60%

Microsoft AutoGen is a programming framework designed for creating multi-agent AI applications. It allows developers to build AI systems that can operate independently or in conjunction with human users. The framework supports automated workflows and facilitates collaboration between multiple AI agents. AutoGen features a layered and extensible design, offering a Core API for message passing and event-driven agents, an AgentChat API for rapid prototyping of multi-agent patterns, and an Extensions API for integrating LLM clients and capabilities like code execution. While AutoGen is now in maintenance mode, existing users can continue to leverage its architecture. For new projects, Microsoft recommends its successor, Microsoft Agent Framework, which offers enterprise-grade support. AutoGen also provides developer tools like AutoGen Studio for no-code GUI development and AutoGen Bench for evaluating agent performance.

mcp-agent

60%

mcp-agent is an open-source framework designed for building effective AI agents using the Model Context Protocol (MCP). It provides a simple, composable approach, implementing patterns described in Anthropic's "Building Effective Agents" guide. The framework offers full MCP support, managing server connections and agent lifecycles automatically. It enables developers to connect LLMs to MCP servers using patterns like map-reduce, orchestrator, and router. A key differentiator is its durable execution capabilities, scaling to production workloads with Temporal as the backend, allowing agents to pause, resume, and recover without API changes. mcp-agent is Pythonic, using decorators and context managers for easy integration, and supports deployment to cloud environments.

mlx-lm

60%

mlx-lm is a Python package designed for generating text and fine-tuning large language models (LLMs) specifically on Apple silicon using the MLX framework. It offers seamless integration with the Hugging Face Hub, allowing users to easily access and utilize a vast array of LLMs with simple commands. Key features include support for quantizing models, uploading them to the Hugging Face Hub, and performing both low-rank and full model fine-tuning, even with quantized models. The package also provides distributed inference and fine-tuning capabilities with `mx.distributed`, and tools for efficient handling of long prompts and generations through a rotating fixed-size key-value cache and prompt caching.
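
The idea behind a rotating fixed-size key-value cache can be shown with a small sketch. This illustrates the general technique only and is not mlx-lm's implementation: the `RotatingCache` class is a hypothetical name, and strings stand in for the per-layer key and value tensors a real cache would hold.

```python
from collections import deque


class RotatingCache:
    """Keeps only the most recent `max_size` (key, value) entries."""

    def __init__(self, max_size):
        # A deque with maxlen drops its oldest item when a new one arrives.
        self.entries = deque(maxlen=max_size)

    def append(self, key, value):
        self.entries.append((key, value))

    def keys(self):
        return [k for k, _ in self.entries]


cache = RotatingCache(max_size=3)
for token_id in range(5):
    cache.append(f"k{token_id}", f"v{token_id}")
print(cache.keys())  # ['k2', 'k3', 'k4']
```

Memory use stays bounded no matter how long the generation runs, at the cost of the model no longer attending to the evicted positions.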

MAAC

60%

MAAC (Multi-Actor-Attention-Critic) is an open-source implementation of the Actor-Attention-Critic model, specifically designed for multi-agent reinforcement learning. Released as code for an ICML 2019 paper, it offers a foundational framework for researchers and engineers to delve into attention mechanisms within multi-agent environments. The tool requires Python 3.6.1, OpenAI baselines, PyTorch, and OpenAI Gym, providing a robust setup for replicating and extending the original research. Users can run various experiments, including the "Cooperative Treasure Collection" and "Rover-Tower" environments, by configuring options via `main.py`. This makes MAAC an invaluable resource for academic and experimental work in advanced AI.

neurojs

60%

neurojs is an open-source JavaScript framework designed for deep learning and reinforcement learning applications within the browser environment. While it mainly focuses on reinforcement learning, it is versatile enough for various neural network-based tasks. The library includes practical examples and demos, such as a 2D self-driving car visualization, to showcase its capabilities. It supports advanced features like uniform and prioritized replay buffers, advantage learning, and models such as deep Q-networks and actor-critic (via deep deterministic policy gradients). neurojs also allows for binary import and export of network configurations, including weights, and is built for high performance. However, neurojs is no longer actively maintained, and its authors recommend using more general frameworks such as TensorFlow.js instead.
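
The difference between uniform and prioritized replay can be sketched briefly. The example is in Python rather than JavaScript and does not mirror the neurojs API; `ReplayBuffer` and its method names are hypothetical, and sampling is done with replacement for simplicity.

```python
import random


class ReplayBuffer:
    """Fixed-capacity store of transitions with optional priorities."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []       # stored transitions
        self.priorities = []  # one non-negative priority per transition

    def add(self, transition, priority=1.0):
        if len(self.items) >= self.capacity:
            # Buffer is full: evict the oldest transition.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample_uniform(self, n, rng=random):
        """Every stored transition is equally likely."""
        return rng.choices(self.items, k=n)

    def sample_prioritized(self, n, rng=random):
        """Transitions are drawn in proportion to their priority."""
        return rng.choices(self.items, weights=self.priorities, k=n)
```

In practice the priority is usually tied to how surprising a transition was (for example, its temporal-difference error), so the agent revisits informative experiences more often.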

KnowledgeGPT

60%

KnowledgeGPT is an AI-powered platform designed for knowledge retrieval and interactive learning. Users can ask questions on any topic and receive beautifully crafted, interactive pages tailored to their curiosity, rather than just a list of links. The platform offers customizable experiences, including interactive courses for language learning, calculators for financial planning, data explorers for product comparisons, visual timelines for historical events, interactive quizzes for general knowledge, step-by-step guides for recipes, and travel guides for destination planning. It aims to transform how users discover and interact with information, making learning and data exploration more engaging and personalized.

opencv

60%

OpenCV (Open Source Computer Vision Library) is a powerful and widely adopted library designed for computer vision and machine learning tasks. It offers a comprehensive suite of tools for image and video analysis, including functionalities for object detection, facial recognition, image manipulation, and 3D reconstruction. The library supports various programming languages like C++, Python, and Java, making it accessible to a broad range of developers and researchers. Its open-source nature fosters a vibrant community, contributing to continuous development and a rich ecosystem of resources, tutorials, and applications. OpenCV is a fundamental tool for anyone working on projects involving visual data interpretation and processing.
