AI Agents & Automation
kokoro-tts
kokoro-tts is an open-source command-line interface (CLI) text-to-speech tool built on the Kokoro model, designed to convert text into natural-sounding speech. It offers extensive language and voice support, including the ability to blend multiple voices with customizable weights for unique audio outputs. The tool can process various input formats such as TXT, EPUB books, and PDF documents, automatically extracting chapters for organized output. Users can stream audio directly, adjust speech speed, and save output in WAV or MP3 formats. It also supports GPU acceleration for faster processing and provides detailed debug output for troubleshooting, making it a versatile solution for generating audio content from diverse text sources.
kornia
Kornia is a differentiable computer vision library built on PyTorch, designed for spatial AI applications. It offers a comprehensive suite of differentiable image processing and geometric vision algorithms, allowing users to leverage powerful batch transformations, auto-differentiation, and GPU acceleration. Key features include a wide range of image processing operators like filters, transformations, and enhancements, as well as advanced augmentation pipelines for training AI models. Kornia also provides access to pre-trained AI models for tasks such as face detection, feature matching, segmentation, and classification. The library is expanding its focus towards end-to-end vision models, with a particular emphasis on integrating state-of-the-art Vision Language Models (VLM) and Vision Language Agents (VLA). It supports multi-framework usage, including TensorFlow, JAX, and NumPy, making it a versatile tool for developers and researchers in the AI and computer vision fields.
Interactive-LLM-Powered-NPCs
Interactive LLM Powered NPCs is an open-source project designed to revolutionize how players interact with non-player characters in video games. It enables engaging conversations with NPCs using microphone input, converting speech to text for processing by a Large Language Model (LLM). The system utilizes facial recognition to identify characters, vector stores for limitless NPC memory, and pre-conversation files to shape dialogue styles. NPCs can even perceive player facial expressions via webcam, adjusting responses accordingly. This project targets popular open-world titles like Cyberpunk 2077 and Assassin's Creed, integrating seamlessly without modifying game source code by replacing facial pixels with generated animations. It aims to bring immersive dialogue adventures to existing games, filling a long-standing void in player interaction.
Klu
Klu is a meeting automation platform for modern teams. It automates meeting workflows and integrates with existing tools so that meeting-related tasks and information sit in one place. Its stated aim is to let users take meeting notes with no effort. Beyond that, the core offering appears to be handling the tedious parts of meetings so that teams can concentrate on discussion and decisions.
lmnr
Laminar is an open-source observability platform for AI agents, covering tracing, evaluations, and AI monitoring. Its OpenTelemetry-native tracing SDK needs only a single line of code to automatically trace popular AI frameworks and SDKs such as Vercel AI SDK, LangChain, OpenAI, Anthropic, and Gemini. The platform also includes an unopinionated, extensible SDK and CLI for running evaluations locally or in CI/CD pipelines, with a UI for visualizing and comparing results. For AI monitoring, users define events with natural language descriptions to track issues, logical errors, and custom agent behavior. All data is accessible via SQL, which supports querying traces, metrics, and events, bulk dataset creation, and custom dashboards. Laminar is written in Rust for performance and features a custom real-time engine for trace viewing and fast full-text search over span data.
logfire
Logfire is an AI observability platform designed for production LLM and agent systems, built by the team behind Pydantic Validation. It offers a simple and powerful dashboard that provides Python-centric insights, including rich display of Python objects, event-loop telemetry, and profiling of Python code and database queries. Users can query their data using standard SQL, leveraging existing BI tools. Logfire is an opinionated wrapper around OpenTelemetry, supporting all OpenTelemetry signals (traces, metrics, and logs) and enabling integration with existing tooling and infrastructure. It also features deep Pydantic integration to understand data flow through models and provides built-in validation analytics. The platform's SDKs are open source, while the server application and UI are closed source, with an enterprise license available for self-hosting.
AI Angels
AI Angels offers a platform for users to chat with over 70 AI angel girlfriends, providing romantic, supportive, and 24/7 NSFW AI companion experiences. Key features include persistent memory across conversations, uncensored chat, unlimited messaging, and real-time voice chat. Users can customize their AI girlfriend's personality, interests, appearance, and style. The platform also supports AI girlfriend image generation on demand and roleplay scenarios, aiming for realistic companions with emotional support capabilities. AI Angels differentiates itself with free unlimited messages and no content filters, unlike some alternatives.
model_analyzer
Triton Model Analyzer is a command-line interface (CLI) tool designed to help users better understand the compute and memory requirements of models running on the Triton Inference Server. It assists in finding optimal configurations for various model types, including single, multiple, ensemble, and BLS models, on a given piece of hardware. The tool offers several search modes, such as Optuna Search for hyperparameter optimization, Quick Search for sparse exploration of batch size and instance group parameters, and Automatic/Manual Brute Search for exhaustive parameter sweeps. Model Analyzer also supports profiling Large Language Models (LLMs) and generates detailed and summary reports to highlight trade-offs between different model configurations. Users can apply QoS constraints to filter results based on specific latency or other performance requirements.
natasha
Natasha is a powerful open-source Python library designed to solve basic NLP tasks specifically for the Russian language. It offers a comprehensive suite of functionalities including tokenization, sentence segmentation, word embedding, morphology tagging, lemmatization, phrase normalization, syntax parsing, NER tagging, and fact extraction. The library emphasizes production readiness, focusing on optimized model size, RAM usage, and performance, with models running efficiently on CPU using NumPy for inference. Natasha integrates several specialized libraries like Razdel for segmentation, Navec for compact Russian embeddings, Slovnet for deep-learning morphology, syntax, and NER, and Yargy for rule-based fact extraction. While its API may evolve, it provides a convenient unified interface for various Russian NLP tasks, with models primarily optimized for news articles.
Baby Name Maker
MTAD (Mass Technology And Development) provides a comprehensive suite of services designed to help businesses thrive in the digital age. Their offerings include cutting-edge Future Technology Research, where experts delve into AI, machine learning, blockchain, and IoT to provide invaluable insights and help businesses make informed decisions. They also specialize in Website Development, creating stunning, user-friendly, and results-driven custom web solutions. For mobile presence, MTAD offers App Development for both iOS and Android platforms, crafting innovative and feature-rich applications. Additionally, their Digital Marketing services focus on maximizing online presence and driving growth through data-driven strategies like SEO and targeted social media campaigns, ensuring a high return on investment for their clients.
OpenFace
OpenFace is a state-of-the-art, open-source toolkit designed for comprehensive facial behavior analysis. It enables real-time facial landmark detection, accurate head pose estimation, robust facial action unit recognition, and precise eye-gaze estimation. Developed by Tadas Baltrušaitis in collaboration with CMU MultiComp Lab, OpenFace is intended for computer vision and machine learning researchers, as well as the affective computing community. The tool stands out for its ability to run efficiently from a simple webcam without requiring specialized hardware, making advanced facial analysis accessible. It provides source code for both running and training models, ensuring flexibility and extensibility for research and application development.
PaddleViT
PaddleViT, or PPViT, is an open-source collection of state-of-the-art Visual Transformer and MLP Models specifically designed for PaddlePaddle 2.0+. It goes beyond traditional convolutional neural networks by offering a wide array of vision models based on Visual Transformers, Visual Attentions, and MLPs. The tool integrates popular layers, utilities, optimizers, schedulers, data augmentations, and training/validation scripts to facilitate the reproduction of cutting-edge ViT and MLP models. PaddleViT supports multiple vision tasks including image classification, object detection, semantic segmentation, and GANs, with each model architecture defined in a standalone Python module for easy modification and research. It also provides pretrained weights for fine-tuning on custom datasets and includes tools for customized datasets, data preprocessing, performance metrics, and DDP for high-performance training.
parameter_efficient_instruction_tuning
parameter_efficient_instruction_tuning is an open-source repository dedicated to the systematic comparison of various parameter-efficient fine-tuning (PEFT) methods for instruction tuning tasks. The project utilizes the SuperNI dataset as its primary benchmark for training and evaluation. Implementations of PEFT methods are adapted from well-known libraries such as adapter-transformers and peft. The repository includes bash scripts for running experiments, optimized for the hfai HPC platform, supporting features like experiment configuration, checkpoint management, and training state validation. It also addresses platform-specific considerations like PyTorch and CUDA compatibility, making it a valuable resource for researchers and developers working on efficient large language model fine-tuning.
Point-BERT
Point-BERT is a PyTorch implementation of a pre-training paradigm for 3D point cloud Transformers, presented at CVPR 2022. Inspired by BERT, it uses a Masked Point Modeling (MPM) task: each point cloud is divided into local patches, and a discrete Variational Autoencoder (dVAE) tokenizes these patches. The pre-training objective is to recover the original point tokens at masked locations, supervised by the dVAE's output. The pre-trained Transformers can then be applied to classification on ModelNet40 and ScanObjectNN, few-shot learning, and part segmentation on ShapeNetPart. It is aimed at researchers and engineers working with 3D point cloud analysis.
pipelines
Kubeflow Pipelines is a core component of the Kubeflow platform, designed to simplify and scale machine learning (ML) workflows on Kubernetes. It provides end-to-end orchestration capabilities, making it easier to build, deploy, and manage complex ML pipelines. The service focuses on enabling easy experimentation, allowing users to quickly iterate on ideas and manage various trials. Furthermore, it promotes re-use of components and pipelines, accelerating the development of ML solutions without constant rebuilding. Kubeflow Pipelines leverages Argo Workflows for orchestrating Kubernetes resources and offers a Python SDK for defining pipelines, along with comprehensive API documentation.
rome
ROME (Rank-One Model Editing) is an open-source tool designed for researchers and developers to precisely locate and modify factual associations within large language models, specifically GPT-2 XL and GPT-J. This GPU-only implementation allows for targeted editing of model knowledge without extensive retraining. It provides functionalities for causal tracing to understand model behavior and a straightforward API for specifying rewrite requests. The repository includes evaluation suites for benchmarking editing methods against CounterFact, making it a valuable resource for advancing research in model interpretability and editability. Users can also integrate new editing methods for comparative analysis.
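To make the idea of a rank-one edit concrete, here is a minimal sketch in plain Python. It is not the repository's API: the function names `matvec` and `rank_one_edit` and the toy matrix are hypothetical, and it assumes an identity key covariance, whereas the published ROME method weights the update using key statistics estimated from data. The sketch only shows that a single outer-product update can make a weight matrix map a chosen key vector to a chosen value vector.

```python
def matvec(W, k):
    """Multiply matrix W (list of rows) by vector k."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def rank_one_edit(W, k, v_star):
    """Return W + (v_star - W k) k^T / (k . k).

    Simplified illustration (assumption: identity key covariance).
    After the update, the edited matrix maps k exactly to v_star.
    """
    norm_sq = sum(x * x for x in k)
    if norm_sq == 0:
        raise ValueError("key vector must be non-zero")
    # How far the current output for k is from the desired output.
    residual = [t - c for t, c in zip(v_star, matvec(W, k))]
    # Add the outer product residual * k^T, scaled by 1 / (k . k).
    return [
        [w + r * x / norm_sq for w, x in zip(row, k)]
        for row, r in zip(W, residual)
    ]


# Toy example: a 3x2 weight matrix, one key, one desired value.
W = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
k = [1.0, 2.0]
v_star = [0.0, 5.0, -1.0]
W_new = rank_one_edit(W, k, v_star)

# A direction orthogonal to k, used to check that the edit is targeted.
k_orth = [2.0, -1.0]
```

In this simplified form, inputs orthogonal to the key are left unchanged, which is what makes the edit targeted rather than a general retraining step.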
SEAL
SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction) is a novel framework designed for link prediction. It systematically transforms the link prediction task into a subgraph classification problem. For each target link, SEAL extracts its h-hop enclosing subgraph and constructs a node information matrix, which can include structural node labels, latent embeddings, and explicit attributes. This data is then fed into a graph neural network (GNN) to classify the existence of the link, allowing the model to learn from both graph structure features and latent/explicit node features simultaneously. The framework is implemented in both MATLAB and Python, with a PyTorch Geometric version available for testing on OGB, Planetoid, and custom datasets. Notably, SEAL can achieve strong performance even without node embeddings or attributes, leveraging purely graph structures, and can function as an inductive link prediction model.
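The subgraph extraction step described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not code from the SEAL repository: the helper names (`build_adjacency`, `k_hop_nodes`, `enclosing_subgraph`) and the toy graph are hypothetical, and node labeling and the GNN classifier are left out. The sketch takes the union of the nodes within h hops of either endpoint, keeps the edges among them, and drops the target link itself so that a classifier could not simply read the answer off the input.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_hop_nodes(adj, source, h):
    """Return every node within h hops of source, using breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == h:
            continue  # do not expand beyond h hops
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def enclosing_subgraph(adj, x, y, h):
    """Nodes and edges of the h-hop enclosing subgraph around link (x, y)."""
    nodes = k_hop_nodes(adj, x, h) | k_hop_nodes(adj, y, h)
    edges = set()
    for u in nodes:
        for v in adj.get(u, ()):
            # Keep edges inside the node set, except the target link itself.
            if v in nodes and {u, v} != {x, y}:
                edges.add((min(u, v), max(u, v)))
    return nodes, edges


# Toy graph: a path 0-1-2-3-4-5, with (2, 3) as the target link.
adj = build_adjacency([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
nodes, edges = enclosing_subgraph(adj, 2, 3, h=1)
print(sorted(nodes), sorted(edges))
```

Raising `h` widens the neighborhood and gives the classifier more structural context, at the cost of larger subgraphs per candidate link.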
Causal Foundry
Causal Foundry offers Kenkai, an adaptive AI platform designed for real-time personalization, optimization, and scalable decision-making. Built on ClickHouse, Kenkai streams and queries high-resolution data instantly, enabling enterprise-scale interventions. It leverages reinforcement learning and contextual bandits to continuously optimize engagement strategies through experimentation and adaptation. The platform also includes embedded metrics and analytics, allowing users to define governed metrics once and explore them everywhere, integrating live dashboards directly into existing systems without black boxes. Causal Foundry aims to democratize reinforcement learning for organizations worldwide, adapting to individual preferences, environments, and behaviors.
SalesGPT
SalesGPT is an open-source AI Sales Agent designed to automate sales outreach with context-aware capabilities. It can understand various stages of a sales conversation, from introduction to closing, and act accordingly. The tool integrates with pre-defined product knowledge bases to significantly reduce AI hallucinations and can connect to any data system via Mindware. Key features include automated email communication, Calendly meeting scheduling, and the ability to generate Stripe payment links for closing sales. SalesGPT supports various LLMs through LiteLLM and is optimized for low-latency voice conversations, boasting sub-1-second response times. It also offers enterprise-grade security and human-in-the-loop supervision.
seldon-server
Seldon-server is an open-source machine learning platform designed to help data science teams deploy models into production within a Kubernetes cluster. While this specific project is archived and superseded by Seldon Core, it laid the groundwork for serving a wide range of ML models, including those built with TensorFlow, Keras, Vowpal Wabbit, XGBoost, and Gensim. It features an API with Predict and Recommend endpoints for supervised machine learning models and high-performance recommendation engines, respectively. Other capabilities include dynamic algorithm configuration for A/B and Multivariate tests, a Command Line Interface (CLI), secure OAuth 2.0 REST and gRPC APIs, and a Grafana dashboard for real-time analytics. Seldon-server supports deployment on-premise or in the cloud (e.g., GCP, AWS, Azure).
Self-Driving Delivery Agent
Self-Driving Delivery Agent, also known as DriVLMe, is an open-source project providing the official implementation of the IROS 2024 paper: "Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experience." This tool is designed for researchers and developers working on autonomous driving systems, particularly those interested in integrating large language models (LLMs) with real-world driving experiences. It offers a framework for setting up a conda environment, preparing LLaVA weights, and training/finetuning models on datasets like bddx and SDN. The project includes scripts for pretraining, finetuning, and evaluating autonomous driving agents, making it a valuable resource for advancing the field of AI-driven autonomous vehicles.
self-host-n8n-on-gcr
self-host-n8n-on-gcr provides a comprehensive guide for deploying n8n, a powerful workflow automation platform, on Google Cloud Run. This setup allows users to leverage n8n's capabilities without incurring monthly subscription fees, while also ensuring complete control over their data. The guide details a serverless deployment approach with per-use pricing, effectively eliminating the complexities and costs associated with traditional server maintenance. It covers essential steps including Google Cloud project setup, n8n preparation for Cloud Run, container repository configuration, Cloud SQL PostgreSQL instance creation for database persistence, and secure handling of sensitive data using Secret Manager. The guide also outlines the deployment process to Cloud Run, offering both official image and custom Docker image options, making it suitable for users seeking cost-effective and scalable automation solutions.
Focoos AI
Focoos AI reshapes computer vision by offering ultra-efficient models designed to reduce costs, automate hardware integration, and ensure peak performance across various devices. The platform allows ML Engineers to train, deploy, and iterate models faster than ever, supporting both cloud and edge environments. Its models are engineered for speed, delivering up to 10x faster inference and being 4x lighter in compute and memory compared to mainstream alternatives. Focoos AI provides pre-trained, production-ready models that can be instantly deployed and easily fine-tuned. It features an all-in-one platform for managing, comparing, monitoring, and deploying models, alongside an open-source library for community collaboration and local use. The tool emphasizes security, control, and sustainability, making it suitable for applications in manufacturing, smart cities, and autonomous systems.
strix
Strix is an open-source AI security tool designed to identify and remediate application vulnerabilities. It employs autonomous AI agents that mimic real hackers, dynamically running code to find and validate vulnerabilities with proof-of-concepts. Built for developers and security teams, Strix offers fast, accurate security testing without the overhead of manual penetration testing or the false positives common with static analysis tools. Key capabilities include a full hacker toolkit, collaborative agent teams, real validation with PoCs, a developer-first CLI with actionable reports, and auto-fix and reporting features to accelerate remediation. It integrates seamlessly with GitHub Actions and CI/CD pipelines, allowing for automatic vulnerability scanning on every pull request.