AI Agents & Automation
Browsing page 185 of AI Agents & Automation. Sorted by confidence score — our independent quality rating.
lagent
Lagent is a lightweight framework designed for building sophisticated LLM-based agents, inspired by the design philosophy of PyTorch. It simplifies the process of creating multi-agent applications by allowing users to focus on defining layers and message passing between them in a Pythonic way. Key features include agent-to-agent communication via AgentMessage, memory management for conversational context, and custom message aggregation. The framework also supports flexible response formatting and consistent tool calling through ActionExecutor. Lagent offers dual interfaces (synchronous and asynchronous) for debugging and large-scale inference, making it suitable for various development needs.
KG_RAG
KG_RAG is an open-source framework designed to enhance Large Language Models (LLMs) through Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG). It integrates the explicit knowledge from a Knowledge Graph (KG) with the implicit knowledge of an LLM, making it particularly effective for knowledge-intensive tasks. The framework extracts "prompt-aware context" from KGs, such as the massive biomedical SPOKE KG, to provide LLMs with minimal yet sufficient context for user prompts. This approach significantly improves the accuracy and relevance of LLM responses, as demonstrated in use cases comparing GPT with and without KG-RAG. The tool is currently optimized for disease-related prompts and includes a benchmark dataset, BiomixQA, for validation.
talk-to-chatgpt
talk-to-chatgpt was a Google Chrome and Microsoft Edge extension designed to enable voice interaction with ChatGPT. Users could speak to the AI using speech recognition and receive spoken responses through text-to-speech, making conversations more natural. It supported ElevenLabs API integration for custom voices and offered settings for voice, language, and speech rate. While initially a fun proof of concept, it also aimed to assist elderly and disabled individuals in interacting with ChatGPT. The project has since been discontinued due to OpenAI's changes and the release of official desktop applications, which render the extension obsolete. Users are encouraged to fork the project for further development.
julius
Julius is a high-performance, small-footprint open-source large vocabulary continuous speech recognition (LVCSR) decoder software. It is designed for speech-related researchers and developers, capable of real-time decoding on diverse platforms from micro-computers to cloud servers. The engine utilizes a 2-pass tree-trellis search algorithm, incorporating advanced decoding techniques like tree-organized lexicons, N-gram factoring, and enveloped beam search. Julius is modular, supporting various HMM structures and offering multi-instance recognition. It adopts standard formats for models, ensuring compatibility with other speech and language modeling toolkits like HTK and SRILM. Recent versions also support Deep Neural Network (DNN) based real-time decoding, making it a versatile tool for speech recognition research and application development.
transfer-learning-conv-ai
transfer-learning-conv-ai is an open-source repository from Hugging Face, offering a clean and commented codebase for building state-of-the-art conversational AI. It leverages transfer learning from OpenAI GPT and GPT-2 Transformer language models to create dialog agents. The repository includes comprehensive training and testing scripts, allowing users to reproduce results from the NeurIPS 2018 ConvAI2 competition, where Hugging Face's participation was state-of-the-art on automatic metrics. It supports single and multi-GPU training, with options for distributed and FP16 training, making it possible to train a model in about an hour on an 8 V100 cloud instance. A pre-trained and fine-tuned model is also available for immediate interaction, simplifying the setup process for developers and researchers.
TavernAI
TavernAI serves as an atmospheric frontend for chat and storywriting, compatible with a wide range of AI language models including KoboldAI, NovelAI, Pygmalion, OpenAI (ChatGPT, GPT-4), Claude, and Ollama. Users can create characters, engage in group chats with multiple characters simultaneously, and utilize a story mode with world info. The tool offers configurable generation settings, interface themes (including one resembling CharacterAI), and customizable backgrounds. It also features message editing, deletion, and movement, along with GPT and Claude picture recognition capabilities. TavernAI is available for Windows, Linux, MacOS, and can be run online via Google Colab, making it accessible on phones and tablets.
kani
kani (カニ) is a lightweight and highly hackable microframework designed for chat-based language models with tool usage and function calling capabilities. Unlike more opinionated frameworks, kani offers extensive customizability over the control flow, making it an ideal choice for NLP researchers, hobbyists, and developers who require precise control. It supports a wide range of models out-of-the-box, including OpenAI, Anthropic, Google AI, Hugging Face transformers, llama.cpp, and vLLM, with a model-agnostic framework for easy integration of others. Key features include automatic chat memory management, robust function calling with model feedback and retry mechanisms, and support for multimodal inputs. The framework is built with an asynchronous design, allowing for scalable parallel chat sessions.
CustomAI Studio
CustomAI Studio specializes in designing, building, and deploying custom AI systems tailored for real businesses. Their proprietary AgenticOS framework is used to create Agentic AI systems that deliver tangible P&L impact. The process involves a data-first methodology to identify where AI can create leverage, mapping information flow to pinpoint workflows suitable for LLM or agent execution. An AI Solutions Architect embeds with the client's team to develop a Custom AI Blueprint, including a workflow map, ROI model, and implementation roadmap. Development follows a spec-driven approach, ensuring production-grade AI systems are built rapidly with automated testing. CustomAI Studio employs progressive deployment, rolling out high-leverage modules one at a time to ensure measurable ROI at each stage, integrating systems into existing tools without disruption.
llama-cpp-agent
The llama-cpp-agent framework is a powerful tool designed for easy interaction with Large Language Models (LLMs). It provides a comprehensive interface for chatting with LLMs, executing single and parallel function calls, and generating structured output. A key differentiator is its ability to work with models not specifically fine-tuned for JSON output and function calls, thanks to guided sampling. The framework also supports Retrieval Augmented Generation (RAG) with Colbert reranking and offers various agent chains, including Conversational, Sequential, and Mapping Chains, for processing text with tools. It is compatible with multiple providers like llama.cpp server, llama-cpp-python, TGI, and vllm servers, and supports Python functions, Pydantic tools, llama-index tools, and OpenAI tool schemas.
llamafarm
LlamaFarm is an open-source AI platform designed for deploying AI models, agents, databases, RAG, and pipelines locally or remotely. It emphasizes complete privacy, ensuring data never leaves the user's device, and eliminates API costs by utilizing open-source models. The platform is offline-capable once models are downloaded and is hardware-optimized for GPU/NPU acceleration on Apple Silicon, NVIDIA, and AMD. Users can build RAG applications, train custom classifiers, detect anomalies, and perform document processing. It offers a desktop app for instant setup, a CLI for development, and a Designer web interface for project management, RAG configuration, and prompt engineering.
BetterBrain
BetterBrain specializes in providing mid-market companies with production-ready AI solutions quickly, leveraging proprietary accelerators and a stack-agnostic approach. The platform offers full-stack delivery, covering everything from initial discovery and strategy mapping to development, deployment, adoption, and continuous optimization. Key offerings include BetterSearch for enterprise knowledge retrieval, BetterDocs for document intelligence, BetterAgent for custom AI agents, BetterVoice for voice agent automation, BetterChat for conversational AI, and BetterInsight for predictive analytics. BetterBrain aims to help companies transition from being AI-ready to AI-first, addressing common challenges like slow implementation times and pilot purgatory.
llm-applications
llm-applications offers a comprehensive guide and resources for building Retrieval Augmented Generation (RAG) based Large Language Model (LLM) applications ready for production. This open-source project, hosted on GitHub, details how to develop such applications from scratch, scale their core components like loading, chunking, embedding, and serving, and evaluate different configurations for optimal performance. It also covers implementing LLM hybrid routing to integrate both open-source and closed LLMs, and serving applications in a scalable and highly available manner. The guide includes practical setup instructions for API keys (OpenAI, Anyscale Endpoints), environment configuration, and data management, making it a valuable resource for developers looking to productionize their AI solutions.
LLM4Rec-Awesome-Papers
LLM4Rec-Awesome-Papers is a meticulously curated collection of academic papers and resources focused on the intersection of large language models (LLMs) and recommender systems. This GitHub repository serves as an invaluable resource for researchers, data scientists, and developers working in the field of AI and recommendation technology. It categorizes papers based on whether they involve 'No Tuning' or 'Supervised Fine-Tuning' of LLMs, offering insights into various approaches to integrating LLMs into recommendation systems. The list is continuously updated, ensuring users have access to the latest advancements and research findings. It also includes related surveys and common datasets, making it a comprehensive hub for staying current with the rapidly evolving landscape of LLM-enhanced recommendation systems.
VieNeu-TTS
VieNeu-TTS is an advanced Vietnamese Text-to-Speech (TTS) model featuring instant voice cloning and bilingual English-Vietnamese support. It's optimized for on-device, real-time CPU inference, delivering high-quality 24kHz audio. The tool includes a Turbo mode for extremely fast inference on CPUs and low-end devices, alongside a Standard mode for maximum audio quality and high-fidelity voice cloning. VieNeu-TTS also incorporates AI identification through audio watermarking for responsible content creation and is production-ready for offline use. It provides a Python SDK and can be deployed as a high-performance API server.
llm_distillation_playbook
The llm_distillation_playbook offers a comprehensive guide to best practices for distilling large language models (LLMs) into smaller, more efficient counterparts suitable for production applications. It targets engineers and ML practitioners with deep learning fundamentals and LLM familiarity. The playbook covers key concepts like teacher and student models, and provides practical advice across 12 best practices, including understanding smaller model limitations, building robust logging infrastructure, defining clear evaluation criteria, and maximizing teacher model quality. It also emphasizes the importance of diverse datasets, starting simple, and continuous monitoring in production environments. The document draws from experiences at Google and Predibase, aiming to systemize recommendations for effective LLM refinement.
TTS-Voice-Wizard
TTS-Voice-Wizard is a comprehensive tool designed to enhance the VRChat experience and beyond, offering robust Speech-to-Text and Text-to-Speech capabilities. Users can convert spoken words into text and back to speech using various methods, with over 100 different voices and customization options. A key feature is the ability to send transcribed speech as OSC messages to VRChat, displaying it on avatars or in the chatbox. The tool also supports real-time translation into over 50 languages, displays current Spotify or Windows Media song information, and shows tracker/controller battery life. Advanced features include voice commands for VRChat avatar parameters and customizable interactive counters.
magpie
Magpie is an open-source project designed for alignment data synthesis, enabling the generation of high-quality synthetic data for training large language models (LLMs). Unlike traditional methods that depend on prompt engineering or seed questions, Magpie leverages the auto-regressive nature of aligned LLMs to generate both user queries and corresponding LLM responses from scratch. This innovative approach allows for scalable data creation by prompting LLMs with their pre-query templates. The tool supports various LLM families, including Llama, Qwen, Phi, and Gemma, and offers scripts for batched SFT data generation, multi-turn conversation extension, and comprehensive dataset filtering and tagging. Magpie aims to democratize AI by making high-quality alignment data accessible and transparent.
Gooru
Gooru Learning offers MyGooru AI for Personalized Pathways (MAP), an AI-driven personalization infrastructure designed to deliver assured outcomes across various industries including learning, finance, health, and enterprise. MAP goes beyond generative AI by using formal reasoning to build beliefs about each user and generate mathematically certain pathways, ensuring engagement and completion. It actively senses user mindsets, motivation, confidence, and intent, continuously updating probabilistic beliefs across knowledge, mindsets, interests, abilities, and community. This approach helps lower customer acquisition costs through personalized discovery and smarter conversion funnels, while increasing lifetime value via adaptive engagement and outcome completion. Gooru also provides tools for instructors, institution leaders, and curriculum developers.
unity-mcp
Unity MCP acts as a bridge, allowing AI assistants like Claude and Cursor to interact directly with your Unity Editor via a local Model Context Protocol (MCP) Client. This tool empowers Large Language Models (LLMs) with the capabilities to manage assets, control scenes, edit scripts, and automate various tasks within Unity. It offers a wide range of tools for managing profiler, physics, editor functions, builds, cameras, graphics, and more. Unity MCP also supports multiple Unity Editor instances and provides robust API verification through reflection and official documentation access. It is proudly sponsored and maintained by Coplay, offering a powerful way to integrate AI into game development workflows.
local-rag
local-rag is an open-source tool designed for Retrieval Augmented Generation (RAG) using open-source Large Language Models (LLMs). It allows users to ingest various file types, including local files, GitHub repositories, and websites, all without relying on third-party services or sending sensitive data outside their network. A key differentiator is its support for offline embeddings and LLMs, eliminating the need for external APIs like OpenAI. The tool features streaming responses, conversational memory, and chat export capabilities, making it suitable for secure, local RAG implementations.
transformers.js
transformers.js is an open-source JavaScript library designed to bring state-of-the-art machine learning capabilities directly to the web browser. It allows developers to run Hugging Face's Transformer models without requiring a server, offering functional equivalence to the Python library. The tool supports a wide range of tasks including text classification, image segmentation, automatic speech recognition, and zero-shot object detection. It leverages ONNX Runtime for browser execution and allows for easy conversion of PyTorch, TensorFlow, or JAX models to ONNX using 🤗 Optimum. Developers can install it via NPM or use it directly in vanilla JS via CDN, with options to run models on CPU (WASM) or GPU (WebGPU) and adjust quantization for performance.
LongtermChatExternalSources
LongtermChatExternalSources is an open-source project that enables the creation of GPT-3 chatbots featuring long-term memory and the ability to integrate external data sources. This tool is designed for developers and AI enthusiasts who want to build more sophisticated conversational AI agents. It requires Python 3 and an OpenAI API key for setup, making it accessible for those with basic programming knowledge. By incorporating long-term memory, the chatbot can maintain context across extended conversations, leading to more coherent and natural interactions. The integration of external sources further enhances its knowledge base, allowing it to draw information from beyond its initial training data.
wandb
wandb (Weights & Biases) is a comprehensive AI developer platform designed to streamline the machine learning lifecycle. It allows users to train and fine-tune models, and manage them effectively from initial experimentation through to production deployment. The platform provides robust tools for tracking and visualizing all components of a machine learning pipeline, including datasets and models. For those building LLM applications, wandb offers Weave, a dedicated suite for tracking, debugging, evaluating, and monitoring GenAI projects. It integrates seamlessly with popular ML frameworks and libraries, simplifying experiment tracking and data versioning. Users can deploy wandb in a multi-tenant cloud, dedicated cloud, or self-managed on-premises infrastructure.
VMove
VMove is a private driver shared mobility platform dedicated to promoting green transportation. Leveraging AI and machine learning, the platform optimizes routes to enhance efficiency and reduce environmental impact. While the current website content is minimal, the tool's core offering appears to be B2B services, specifically focusing on employee transportation. This suggests a solution for businesses looking to streamline their internal logistics and provide sustainable commuting options for their staff. The platform aims to improve operational efficiency through intelligent route planning.