AI Agents & Automation
Browsing page 452 of AI Agents & Automation. Sorted by confidence score — our independent quality rating.
Holo1.5 Localization
Holo1.5 Localization is an AI tool developed by Hcompany, available as a Hugging Face Space, designed for high-resolution UI grounding. Users can upload an image of a user interface and then specify a particular UI element they wish to locate. The application will then process the image and predict the coordinates of the specified element, marking its position on the image. This functionality is part of the broader Holo1.5 initiative, which focuses on open foundation models for computer use agents. The tool is free to use and is licensed under Apache 2.0, making it accessible for various applications in UI analysis and automation.
GamingAgent
GamingAgent is a comprehensive platform designed for the development and evaluation of LLM/VLM-based agents within interactive gaming environments. It facilitates the testing of state-of-the-art models across a diverse suite of video games, both in vanilla single-model VLM settings and with a customized GamingAgent workflow that enhances model gaming performance. The tool supports a wide range of models including those from OpenAI, Anthropic, Gemini, xAI, Deepseek, and Qwen. It also offers an easy solution for deploying computer use agents (CUAs) for gaming directly on PCs and laptops. Researchers and developers can utilize GamingAgent to benchmark model performance, analyze game performance, and generate replay videos, making it an invaluable resource for AI research in gaming.
CX Company (part of CM.com)
CM.com provides a comprehensive platform designed to streamline customer interactions across various touchpoints. It integrates messaging, marketing, customer service, payments, and event management into a single system, allowing all teams to access and work from a unified customer profile. This approach eliminates the complexities of managing multiple integrations. Key offerings include HALO AI Agents for automated customer interactions, a Customer Data Platform for unified profiles, and various communication channels like SMS, WhatsApp, and Email. The platform also supports online and in-person payments, as well as ticketing and cashless solutions for events, ensuring a cohesive customer journey from start to finish.
golearn
golearn is a comprehensive machine learning library designed for the Go programming language, emphasizing both simplicity and customizability. It offers a 'batteries included' approach, providing a wide range of functionalities for machine learning tasks. Users can load data as Instances, perform matrix-like operations, and pass them to various estimators. The library implements the scikit-learn interface of Fit/Predict, allowing for easy swapping of estimators during trial and error. Additionally, golearn includes helper functions for data management, such as cross-validation and train-test splitting. It supports various algorithms including KNN, linear models, neural networks, and decision trees, making it suitable for diverse machine learning applications.
prismatic-vlms
prismatic-vlms offers a flexible and efficient codebase for training visually-conditioned language models (VLMs). It natively supports diverse visual backbones like CLIP, SigLIP, and DINOv2, with an easy mechanism for adding new ones via TIMM. The tool also integrates with arbitrary instances of AutoModelForCausalLM from Transformers, including both base and instruct-tuned language models. Designed for easy scaling, prismatic-vlms leverages PyTorch FSDP and Flash-Attention to efficiently train models ranging from 1B to 34B parameters on configurable dataset mixtures. It also includes an evaluation codebase for rigorously testing VLMs across 12 vision-and-language benchmarks and provides full instructions and configurations for reproducing results.
pytorch-attention
pytorch-attention offers a robust PyTorch implementation of various cutting-edge deep learning models, including a wide array of attention mechanisms, vision transformers, MLP-like models, and convolutional neural networks. This open-source codebase is designed for researchers and engineers to easily experiment with and integrate advanced architectures into their projects. It features implementations of models like Squeeze-and-Excitation Attention, ViT, ResNet, and MLP-Mixer, complete with code examples for quick setup and testing. The repository is modular and extensible, making it a valuable resource for anyone working on computer vision and deep learning tasks, providing a foundation for both academic research and practical application development.
KOGO
KOGO Workspace is an AI Agents & Automation tool designed to run entire departments using agentic AI apps. It emphasizes 100% privacy, giving users full control over their data. The platform allows businesses to build, manage, and deploy AI agents at scale, supported by enterprise-grade security. KOGO aims to streamline operations by enabling AI to understand human intent and execute tasks, offering pre-built actions and agent templates suitable for various industries. This makes it a comprehensive solution for organizations looking to integrate advanced AI capabilities into their business processes while maintaining data sovereignty.
optimate
OptiMate is an open-source collection of libraries developed by Nebuly AI, aimed at optimizing AI model performance. While it is now in a legacy phase and no longer actively maintained, the source code remains available for reference. Key components include Speedster, which helps reduce inference costs by leveraging state-of-the-art optimization techniques for AI models on various hardware, and Nos, designed to lower infrastructure costs through real-time dynamic partitioning and elastic quotas for Kubernetes GPU clusters. Additionally, ChatLLaMA is included for fine-tuning optimization and RLHF alignment to reduce hardware and data costs. The project is ideal for developers and data scientists looking to explore or implement AI model optimization techniques.
pinns-torch
PINNs-Torch is a PyTorch-based implementation of Physics-Informed Neural Networks (PINNs), designed to accelerate scientific computing tasks. A key differentiator is its integration of CUDA Graphs and JIT Compilers (TorchScript), which can boost performance by up to nine times compared to earlier TensorFlow v1 implementations. The package is open-source and provides a robust framework for researchers and developers to build and experiment with PINNs. It includes examples for various problems, such as the Navier-Stokes PDE, and offers flexible installation options for both users and contributors. The tool is ideal for those looking to leverage the power of PyTorch for physics-informed machine learning, with a focus on speed and usability.
LongNet
LongNet is an open-source implementation of the plug-in and play attention mechanism described in the paper "LongNet: Scaling Transformers to 1,000,000,000 Tokens." This Transformer variant is designed to significantly extend the sequence length that models can handle, reaching up to 1 billion tokens, while maintaining strong performance on shorter sequences. Its core innovation is dilated attention, which expands the attentive field exponentially as the distance between tokens grows. LongNet offers linear computational complexity and a logarithmic dependency between tokens, making it suitable for distributed training of extremely long sequences. Its dilated attention can be seamlessly integrated into existing Transformer-based optimization methods, providing a drop-in replacement for standard attention.
polyaxon
Polyaxon is an open-source MLOps platform designed to manage and orchestrate the entire machine learning lifecycle. It focuses on solving reproducibility, automation, and scalability challenges for deep learning applications. The platform supports major deep learning frameworks like TensorFlow, MXNet, Caffe, and PyTorch, and can be deployed in any data center, cloud provider, or hosted by Polyaxon. Key features include experiment tracking, distributed job management, hyperparameter tuning with algorithms like Grid Search and Bayesian Optimization, parallel executions, and DAGs for managing complex machine learning pipelines. Polyaxon provides a dashboard for monitoring projects and experiments, making it faster and more efficient to develop and deploy ML models.
MatchSum
MatchSum offers an implementation of the ACL 2020 paper "Extractive Summarization as Text Matching." This tool is designed for researchers and developers working on natural language processing tasks, specifically extractive summarization. It supports both BERT and RoBERTa encoders and provides pre-trained models for the CNN/DailyMail dataset, as well as other datasets like WikiHow, PubMed, XSum, MultiNews, and Reddit. Users can process their own data by converting it to a specific JSONL format and generating candidate summaries. The code requires Python 3.7, PyTorch 1.4.0, fastNLP 0.5.0, pyrouge 0.1.3, rouge 1.0.0, and transformers 2.5.1, and is optimized for Linux environments with GPU support.
MOSS-TTSD
MOSS-TTSD is an advanced open-source spoken dialogue generation model designed for expressive multi-speaker synthesis, moving beyond traditional text-to-speech to "script-to-conversation." It supports 1 to 5 speakers with flexible control over turn-taking, overlapping speech, and distinct persona maintenance. A key differentiator is its extreme long-context modeling, supporting up to 60 minutes of coherent audio in a single session with consistent identity. The tool offers state-of-the-art zero-shot voice cloning from short audio references and robust cross-lingual performance across 20 major languages, including Chinese, English, Japanese, and European languages. It is fine-tuned for diverse scenarios like AI podcasts, dynamic commentary, audiobooks, dubbing, and crosstalk.
Mini-Agent
Mini-Agent is a minimal yet professional open-source demo project designed to illustrate best practices for building AI agents using the MiniMax M2.5 model. It leverages an Anthropic-compatible API, enabling interleaved thinking to enhance M2's powerful reasoning capabilities for long and complex tasks. Key features include a full agent execution loop with basic file system and shell operations, persistent memory via an active Session Note Tool, and intelligent context management that automatically summarizes conversation history for infinitely long tasks. The project also integrates 15 professional Claude Skills for documents, design, testing, and development, and natively supports MCP for tools like knowledge graph access and web search. With comprehensive logging and a clean CLI, Mini-Agent serves as an excellent starting point for advanced agent development.
SnakeAI
SnakeAI is an open-source project designed to train a neural network to play the classic game Snake through the application of a genetic algorithm. Each snake within the simulation is equipped with a neural network, initially featuring an input layer of 24 neurons, two hidden layers of 16 neurons, and an output layer of 4 neurons. A key feature is the ability to customize the number of hidden layers and neurons, offering flexibility for experimentation. The snake's 'vision' system provides 24 inputs by detecting the distance to food, its own body, and walls in 8 directions. The evolutionary process involves natural selection across generations of 2000 snakes, with fitness scores determining reproduction. Snakes are rewarded more for higher scores than simply staying alive, with a move limit to prevent endless looping. Crossover and mutation mechanisms are used to evolve the neural networks, and models can be saved and loaded for further testing and analysis.
Powered By Intel Leaderboard
Powered By Intel Leaderboard is a platform designed for the evaluation and comparison of open-source language models (LLMs) specifically on Intel hardware. Users can submit their LLMs to be ranked on the leaderboard, providing details such as the model name, the Intel hardware used for evaluation, and the precision settings. This tool offers a valuable resource for developers and researchers to gain visibility for their models and receive feedback on performance. It serves as a centralized hub for understanding how different LLMs perform within the Intel ecosystem, fostering innovation and development in the AI community.
Zulu OpenClaw App
Zulu OpenClaw App is an autonomous AI agent platform built on open-source foundations, designed to make powerful AI accessible and production-ready without complex setup or security risks. It allows users to connect with services like Gmail, Google Sheets, Calendar, and Analytics to automate tasks such as clearing inboxes, managing schedules, tracking leads, and surfacing website insights. The platform supports full agent capabilities including coding, browsing, file management, and persistent memory, going beyond simple chat. It's available through WhatsApp, Telegram, a native app, or API, and offers transparent billing with token-based usage, ensuring no surprise costs. OpenZulu bridges the gap between powerful open-source AI and a managed, secure, and easy-to-use platform for businesses and individuals.
PepHop AI
PepHop AI provides an AI-powered platform for creating and interacting with customizable AI characters. Users can choose from a variety of existing AI personalities or design their own, tailoring their appearance, backstory, and conversational style. The platform supports both SFW (Safe for Work) and NSFW (Not Safe for Work) content, offering flexibility for different user preferences. It leverages advanced language processing to facilitate natural and engaging conversations, and importantly, it remembers past dialogues to maintain context and continuity across interactions. This allows for more personalized and evolving relationships with the AI characters.
Olimi AI
Olimi AI specializes in providing voice AI agents for businesses, particularly focusing on the MENA region with native accuracy in Arabic. The platform supports over 20 languages and dialects, including English, French, Spanish, and Italian, allowing for broad international deployment. These voice agents are designed to handle various tasks, such as qualifying leads, following up on payments, and managing routine conversations, making them suitable for businesses with high call volumes. Olimi AI aims to deliver natural, human-sounding interactions in real-time, enhancing customer experience and operational efficiency across multiple industries.
rf-detr
RF-DETR is a real-time transformer architecture for object detection and instance segmentation, developed by Roboflow. Built on a DINOv2 vision transformer backbone, it achieves state-of-the-art accuracy and latency trade-offs on Microsoft COCO and RF100-VL datasets. The tool supports both detection and instance segmentation through a consistent API and is designed for fine-tuning. It offers various model sizes, from Nano to 2XLarge, with some larger models requiring the `rfdetr_plus` extension. RF-DETR can be installed via pip or from source, and models can be run using the `rfdetr` package or the Inference library. Training capabilities are available in Google Colab or directly on the Roboflow platform.
HalloSophia
HalloSophia is a leading platform designed to empower digital services with scalable eService and eConsulting, significantly enhanced by AI-driven support automation. It enables businesses, particularly tech companies and consulting agencies, to offer world-class, productized expert video consultations to their clients. The platform focuses on elevating service offerings through innovative technology, providing a robust solution for managing and automating customer interactions and expert advice. HalloSophia helps streamline operations, improve customer satisfaction, and drive efficiency in service delivery by leveraging artificial intelligence to support and automate various aspects of customer engagement and consultation.
Real-Time Threat Detection Agent
Real-Time Threat Detection Agent is an open-source collection of AI agents designed to automate various cybersecurity tasks using Large Language Models (LLMs). Built on the AutoGen framework, this tool offers a modular design, allowing for customization and combination of individual agents and tasks to fit specific security needs. It aims to automate repetitive and complex tasks, freeing up security teams for strategic analysis. The project provides a comprehensive set of pre-defined workflows, agents, and tasks, enabling users to quickly implement cyber security automation. It includes features like detecting EDR running on Windows systems based on live data and supports scenarios for demonstrating exfiltration or payload downloading. Users are cautioned to run LLM-generated code in virtual or test environments due to security risks.
Flash Insights
Flash Insights, despite its name, currently hosts content for 'Dominoqq > Qiu Qiu | Situs Pkv Games QQ | Domino99 | AhliQQ', an online gambling platform. The website details various Pkv Games card games such as poker online, sakong, capsa susun, and dominoqq (qiu qiu). It emphasizes ease of transactions through local banks and e-wallets, and claims to be a trusted site ensuring fair play. The content also mentions support for multiple devices and 24-hour availability. This suggests a significant discrepancy between the tool's intended purpose and its current web presence.
Altered State Machine
Altered State Machine (ASM) is a protocol designed for Non-Fungible Intelligence, enabling the ownership of AI within NFTs. It provides an open protocol for AI agents that users can own, evolve, and personalize. The platform, branded as THINK, emphasizes 'Compiled Intelligence' where AI becomes code, reducing its reliance on models over time. This approach aims for on-device intelligence by default. Users can explore ThinkOS, a system for building and interacting with these AI agents, and engage with the community. The project also features a dashboard, AppCoin, and points system, suggesting a broader ecosystem for AI agent development and interaction.