AI Agents & Automation
MiniGPT4-video
MiniGPT4-video offers official code for the Goldfish model, designed for understanding arbitrarily long videos, and MiniGPT4-video itself, tailored for short video understanding. This tool advances multimodal Large Language Models (LLMs) by integrating visual and textual tokens for comprehensive video analysis. Goldfish addresses challenges in long video processing through an efficient retrieval mechanism that identifies relevant video clips, making it suitable for applications like movies or TV series. MiniGPT4-video generates detailed descriptions for video clips, facilitating the retrieval process for Goldfish. The project also introduces the TVQA-long benchmark for evaluating long video comprehension and demonstrates significant performance improvements over existing state-of-the-art methods in both long and short video understanding.
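The retrieval step described above can be illustrated with a minimal sketch. It assumes each clip already has a text description and ranks clips by word-overlap cosine similarity to the question. The bag-of-words scoring and the names `bag_of_words`, `cosine`, and `retrieve_clips` are hypothetical stand-ins for illustration, not the Goldfish implementation, which would rely on learned embeddings.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a simple stand-in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_clips(question: str, clip_descriptions: list[str], k: int = 2) -> list[int]:
    """Return the indices of the k clip descriptions most similar to the question."""
    query = bag_of_words(question)
    scored = [(cosine(query, bag_of_words(d)), i) for i, d in enumerate(clip_descriptions)]
    # Highest score first; ties broken by original clip order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Hypothetical clip descriptions, as a short-video model might generate them.
descriptions = [
    "two friends argue in a coffee shop",
    "a car chase through the city at night",
    "the detective finds a letter in the kitchen",
]
top = retrieve_clips("where does the detective find the letter", descriptions, k=1)
print(top)
```

Only the retrieved clips would then be passed to the answering model, which is what keeps the cost independent of total video length.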
ml-cvnets
ml-cvnets is a comprehensive computer vision toolkit developed by Apple, designed for researchers and engineers to efficiently train a wide array of computer vision models. It supports both standard and novel mobile- and non-mobile architectures for tasks such as object classification, object detection, semantic segmentation, and foundation models like CLIP. The library is built on Python 3.10+ and PyTorch, offering features like automatic data augmentation (RangeAugment, AutoAugment, RandAugment) and enhanced distillation support. It includes a model zoo with various CNNs (MobileNet, EfficientNet, ResNet) and Transformers (Vision Transformer, MobileViT, SwinTransformer), making it a versatile platform for advanced computer vision research and development.
nerve
Nerve is a powerful Agent Development Kit (ADK) designed for technical users to build, run, evaluate, and orchestrate LLM-based agents. It simplifies agent creation through a declarative YAML format, allowing definition of system prompts, tasks, tools, and variables in a single file. The kit supports various tools, including shell commands, Python functions, and remote tools, all fully typed and annotated for extensibility. A key differentiator is its native Model Context Protocol (MCP) support, enabling the definition of MCP servers in YAML and acting as both client and server for agent teams and deep orchestration. Nerve also includes an evaluation mode for benchmarking agents with reproducible tests and an LLM-agnostic architecture built on LiteLLM, supporting numerous models like OpenAI, Anthropic, and Ollama.
ncnn
ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. Designed from the ground up with mobile deployment in mind, it has no third-party dependencies, is cross-platform, and is reported to run faster on mobile CPUs than other known open-source frameworks. Developers can use ncnn to port deep learning algorithms to mobile devices and build intelligent applications on top of them. It supports a wide array of convolutional neural networks, including classical, practical, and light-weight architectures, as well as models for detection, segmentation, and pose estimation. ncnn also features ARM NEON assembly-level optimization, sophisticated memory management, multi-core parallel computing, and GPU acceleration via the Vulkan API.
Office-Word-MCP-Server
Office-Word-MCP-Server implements the Model Context Protocol (MCP) to allow AI assistants to interact with Microsoft Word documents. This server acts as a bridge, offering functionalities for document creation, content addition, formatting, and analysis. Key features include creating new documents, extracting text, adding headings, paragraphs, tables, and images, and applying rich text formatting. It also supports advanced manipulations like deleting paragraphs, inserting content relative to existing text, and managing document protection. The server is designed with a modular architecture for extensibility and can be integrated with AI assistants like Claude for Desktop.
AI Singapore
AI Singapore is a national program launched in May 2017, dedicated to fostering advanced AI capabilities within Singapore. It serves as a nexus for Singapore-based research institutions, AI startups, and established companies, facilitating collaborative efforts in use-inspired research, knowledge creation, tool development, and talent cultivation. The initiative focuses on key areas such as AI Research, Governance, Technology, Innovation, and Products, aiming to generate significant social and economic impact. It also offers various talent development programs, including the AI Apprenticeship Programme (AIAP) and LearnAI, to equip professionals and students with essential AI skills.
openai-assistant-swarm
OpenAI Assistant Swarm is a Node.js library designed to enhance the OpenAI Node SDK by enabling automatic delegation of tasks to a swarm of specialized AI assistants. This tool simplifies the management of multiple custom agents created within OpenAI, allowing developers to orchestrate complex workflows through a single, unified interface. It handles the mental overhead of assigning tasks, enabling parallel processing of requests by different assistants based on their defined specializations. Key features include delegating prompts to sub-assistants, retrieving all available assistants with pagination handled, and fetching specific assistants by ID. The library also provides event listeners for monitoring the completion of parent and child assistant responses, offering flexibility in how developers integrate and manage AI agent interactions.
CLEDAR
CLEDAR offers an ontology-driven AI platform designed to transform fragmented enterprise data into actionable insights. Led by former CERN domain leaders, the platform unifies disparate data sources into a single, governed semantic context, laying the foundation for enterprise AI adoption. It features secure, modular infrastructure, a unified data foundation, and adaptive AI agents that automate workflows and execute end-to-end tasks autonomously. CLEDAR aims to boost productivity by cutting decision cycles from weeks to hours and optimize costs by reducing OPEX by up to 10%, helping companies scale AI from pilots to enterprise-wide impact.
CLICKMARK AI
CLICKMARK AI is an AI consultancy firm specializing in designing, building, and operating AI agents, automation, AI SEO, and custom software for businesses across Southeast Asia, the UAE, and the US. They offer a comprehensive AI OS service, acting as a dedicated AI team to build and maintain a business's AI foundation, including strategy, data structuring, custom software, and ongoing support. For those seeking targeted solutions, they provide workshops and training to empower teams with AI tools, and one-time project work for specific builds like AI agents or SEO sprints. Their approach focuses on delivering measurable ROI and practical AI implementation.
pezzo
Pezzo is an open-source, developer-first LLMOps platform that provides comprehensive tools for managing and optimizing AI operations. It streamlines prompt design, offering version management and instant delivery capabilities. The platform facilitates collaboration among developers and includes robust features for troubleshooting and observability, allowing users to monitor their AI operations effectively. Pezzo aims to significantly reduce costs and latency associated with AI deployments, making it an ideal solution for developers looking to enhance their LLM workflows. It supports various clients including Node.js, Python, and LangChain, and integrates with open-source technologies like PostgreSQL, ClickHouse, Redis, and Supertokens.
Backed App
Backed App is an AI-powered application designed to help users alleviate back pain and improve their posture. It delivers science-backed exercises and personalized routines tailored to individual pain points, fitness levels, and time availability. The app provides clear video demonstrations and expert tips for safe and confident movement, aiming to strengthen the core and correct posture in just 15 minutes a day. Beyond exercises, it fosters habit building with posture reminders, movement prompts, and motivational messages, and lets users track their progress over time with detailed analytics. Developed by back health experts, Backed App acts as a daily companion for consistent back care.
PandoraAI
PandoraAI is an open-source web chat client built using Nuxt 3, a Vue 3 framework, designed to provide a seamless and convenient conversational AI experience. It is powered by node-chatgpt-api, enabling users to chat with various AI systems including gpt-3.5-turbo, text-davinci-003, ChatGPT, and Bing. A key feature is the ability to create and manage multiple custom presets for each client, allowing for personalized interactions. All user data, including presets, is stored locally, eliminating the need for an account and supporting easy import/export to other devices. PandoraAI can also be used with other API server implementations as long as the endpoints are compatible, offering flexibility for developers and advanced users.
PromptVisor
PromptVisor is an advanced AI prompting tool designed to supercharge your experience with artificial intelligence. It offers access to leading AI models from Google, OpenAI, and Anthropic, enabling users to explore, experiment, and learn about AI and prompting techniques. The platform features dynamic prompting capabilities to enhance interaction and output quality. PromptVisor provides flexible pricing options, including pay-per-prompt or subscription models, and even offers free usage through referrals, making it accessible for various user needs.
Conversation Design Institute (CDI)
Conversation Design Institute (CDI) is the world's leading training and certification institute for Conversational AI, offering comprehensive programs for individuals and businesses. CDI provides courses and certifications in areas like AI Ethics, AI Trainer, CDI Method Foundation, and Conversation Designer, equipping professionals with the skills to build human-centric and goal-oriented AI Assistants. Beyond individual training, CDI offers business solutions including assessment, consulting, team training, and workshops to help organizations deploy AI assistants at scale. Their CDI Standards Framework provides a systematic approach to developing conversational AI capabilities, ensuring alignment across mindset, skillset, culture, and systems. CDI also offers resources like free courses, webinars, and case studies, demonstrating their expertise with clients like HP, Vodafone, and Vandebron.
playground
Playground is an open-source platform dedicated to AI research in multi-agent learning, primarily through the game Pommerman, a clone of Bomberman. Researchers and AI enthusiasts can submit agents they have trained to compete in regular competitions across three variants: Free For All (FFA), Team (2v2 with partial observability), and Team Radio (2v2 with limited communication). The platform aims to provide approachable benchmarks for multi-agent learning, foster contributions to multi-agent and communication research, and offer a competitive environment for AI development. It supports training agents with popular libraries like TensorForce and provides an example training script. Submissions are handled via Docker containers, ensuring agent safety and fair play.
pytorch-pruning
pytorch-pruning is an open-source PyTorch implementation of the paper "Pruning Convolutional Neural Networks for Resource Efficient Inference." This tool is designed to optimize deep learning models by reducing their size and improving inference speed. It achieves this by systematically removing filters from convolutional layers. The project demonstrates its effectiveness by pruning a VGG16-based classifier on a small dog/cat dataset, resulting in a 3x reduction in CPU runtime and a 4x reduction in model size. Filters are currently pruned sequentially; the project notes that a single-pass pruning mechanism would be more efficient. It also aims to support additional architectures, such as VGG with batch normalization.
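Removing a filter from a convolutional layer also means removing the matching input channel from the layer that follows. The sketch below shows that bookkeeping on plain Python lists. It ranks filters by L1 magnitude, which is a simpler stand-in chosen for illustration; the paper ranks filters with a Taylor-expansion-based criterion. The function names and data layout are hypothetical and not taken from the repository.

```python
def l1_norm(filter_weights):
    """Sum of absolute weights; used here as a simple importance score."""
    return sum(abs(w) for w in filter_weights)


def prune_lowest_filter(layer, next_layer):
    """Remove the least important filter and its downstream input channel.

    layer:      list of filters, each a flat list of weights.
    next_layer: list of filters, each a list with one weight list per input
                channel (one input channel per filter in `layer`).
    Returns (pruned_index, pruned_layer, pruned_next_layer).
    """
    idx = min(range(len(layer)), key=lambda i: l1_norm(layer[i]))
    pruned_layer = [f for i, f in enumerate(layer) if i != idx]
    # Each downstream filter loses the input channel fed by the removed filter.
    pruned_next = [
        [channel for i, channel in enumerate(f) if i != idx] for f in next_layer
    ]
    return idx, pruned_layer, pruned_next


# Toy example: three filters feeding a layer with two filters.
layer = [[0.5, -0.4], [0.01, -0.02], [0.3, 0.3]]
next_layer = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]]

idx, pruned_layer, pruned_next = prune_lowest_filter(layer, next_layer)
print(idx, len(pruned_layer), [len(f) for f in pruned_next])
```

Calling the function repeatedly, with fine-tuning between calls, corresponds to the sequential pruning described above.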
PyABSA
PyABSA is a modular and reproducible open-source framework designed for Aspect-based Sentiment Analysis (ABSA), bridging the gap from research to production. It offers a unified API for training, evaluation, and inference across multiple ABSA subtasks, including Aspect Polarity Classification (APC), Aspect Term Extraction & Polarity Classification (ATEPC), Aspect Sentiment Triplet Extraction (ASTE), and Aspect Category Opinion Sentiment Quadruple Extraction (ASQP/ACOS). The framework comes with a Model Zoo of available checkpoints that auto-download, visualization tools for evaluation metrics, and helpers for dataset annotation. Additionally, PyABSA supports text augmentation for classification and adversarial defense, along with automatic device selection for CPU/GPU. It is aimed at researchers and developers working on sentiment analysis and natural language processing tasks.
ppl.nn
PPLNN, short for "Primitive Library for Neural Network," is a high-performance deep-learning inference engine designed for efficient AI inferencing. It supports running various ONNX models and offers enhanced compatibility with OpenMMLab. Key features include a new LLM Engine with Flash Attention, Group-query Attention, and Dynamic Batching, alongside Tensor Parallelism and Graph Optimization. It also supports INT8 groupwise KV Cache and INT8 per token per channel Quantization for improved performance and accuracy. The library provides comprehensive documentation for building from source, integrating APIs, and developing new engines and operations across X86, CUDA, RISCV, and ARM platforms. It is an open-source project, welcoming contributions and providing resources for developers.
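Groupwise INT8 quantization, mentioned above for the KV cache, can be sketched in a few lines: values are split into fixed-size groups and each group gets its own scale, so one large value only costs precision within its own group. This is a generic symmetric-quantization sketch under that assumption, not PPLNN code; the function names and the group size are hypothetical.

```python
def quantize_groupwise(values, group_size):
    """Symmetric INT8 quantization with one scale per group of values."""
    quantized, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group to 127; avoid dividing by zero.
        scale = max_abs / 127 if max_abs else 1.0
        scales.append(scale)
        quantized.extend(max(-127, min(127, round(v / scale))) for v in group)
    return quantized, scales


def dequantize_groupwise(quantized, scales, group_size):
    """Reconstruct approximate floats using each value's group scale."""
    return [q * scales[i // group_size] for i, q in enumerate(quantized)]


values = [0.1, -0.2, 0.05, 0.4, 10.0, -5.0, 2.5, 0.0]
group_size = 4
quantized, scales = quantize_groupwise(values, group_size)
restored = dequantize_groupwise(quantized, scales, group_size)
print(quantized)
print([round(r, 3) for r in restored])
```

With a single scale for all eight values, the small entries in the first group would be rounded far more coarsely, because the scale would be set by the 10.0 in the second group.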
Archipelago
Archipelago offers an AI agent designed to streamline broker workflows by providing accurate and validated property and casualty data. It addresses the complexities of traditional spreadsheet property schedules, offering solutions for data ingestion, remediation, and recommendations. The platform features an AI agent that runs in the background to resolve issues proactively, and a Hub with power tools to remediate issues, explain impact, and track progress. Archipelago also provides an enterprise-grade platform for value collection, collaboration, and marketing, ensuring scalability, security, and white-glove support for servicing teams, brokers, producers, and analytics teams. It is trusted by leading risk professionals, managing over 1.6 million properties and 2,500 accounts.
resin
Resin is a reboot of an older search engine project with a cleaner architecture. It functions as a vector space search engine, a vector database, and a key/value store, designed for efficient string processing, vector operations, and custom storage primitives. According to the project, the tool can produce large language models from strings and large 'anything' models from byte arrays. Key features include fast key/value storage with page/column readers and writers, practical text analysis utilities for various data types, and command-line tools for building and validating lexicons. Its design is clean, dependency-light, and easy to extend, making it suitable for developers working with search and machine learning applications.
reasoning-from-scratch
reasoning-from-scratch is the official code repository for the book *Build a Reasoning Model (From Scratch)*, offering a hands-on approach to understanding and implementing reasoning large language models (LLMs) in PyTorch. Users start with a pre-trained base LLM and progressively add reasoning capabilities, mirroring approaches used in large-scale models like DeepSeek R1 and GPT-5 Thinking. The repository includes code for generating text, evaluating reasoning models, improving reasoning with inference-time scaling and self-refinement, and training models with reinforcement learning. It also covers distilling reasoning models for efficiency and provides bonus materials on topics like GPU optimization, advanced evaluation methods, and building chat interfaces. The code is designed to run on consumer hardware, with GPU utilization if available, making it accessible for a wide audience.
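One common form of inference-time scaling is self-consistency: sample several answers to the same prompt and keep the most frequent one. The sketch below assumes that technique for illustration; it is not code from the repository. `sample_answer` is a hypothetical callable standing in for a sampled model generation.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def self_consistency(sample_answer, prompt, num_samples=5):
    """Sample several answers for one prompt and return the majority answer."""
    answers = [sample_answer(prompt) for _ in range(num_samples)]
    return majority_vote(answers)


# Hypothetical stub in place of a real model: replays canned final answers.
canned = iter(["42", "41", "42", "42", "40"])
result = self_consistency(lambda prompt: next(canned), "What is 6 * 7?", num_samples=5)
print(result)
```

Spending more samples trades extra compute at inference time for a more reliable final answer, without changing the model's weights.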
Callidus Legal AI
StrongSuit, previously known as Callidus Legal AI, is a comprehensive legal AI platform designed to enhance and speed up essential legal tasks for lawyers. It provides advanced AI legal research capabilities, including immediate answers to legal questions, analysis of complex fact patterns, and the ability to draft extensive memos and briefs. The platform also excels in contract redlining, allowing users to redline contracts significantly faster, summarize differences, compare against market standards, and generate AI-powered redline suggestions. Furthermore, StrongSuit assists with discovery and timelines, enabling the creation of timelines and statements of facts from relevant files, conducting document reviews, and improving writing. It aims to reduce hallucinations in legal research and offers a unified solution for various legal software needs.
rag-time
RAG Time is a comprehensive 5-week learning journey designed to help users master Retrieval-Augmented Generation (RAG). Developed by Microsoft experts, this resource provides step-by-step guides, live coding samples, and expert insights to enable the creation of smarter AI applications. The program covers fundamental RAG concepts, building ultimate retrieval systems, optimizing vector indexes for scale, handling multimodal data, and exploring hero use cases, including Agentic RAG. It features exclusive video content, practical demonstrations, and sample code to facilitate hands-on learning, making complex concepts accessible through engaging visuals.
eNOugh
eNOugh is developing eNO, the world's first mini AI bodyguard, designed to autonomously detect and respond to real-world threats using real-time AI intelligence. This wearable device, referred to as the eNO badge, aims to provide personal safety without relying on human reaction during dangerous situations. It leverages multimodal AI to identify potential threats and trigger protective actions independently. The tool is intended for individuals seeking enhanced personal security and aims to offer a proactive, AI-driven solution to real-world dangers, ensuring immediate response when human intervention might be too slow or impossible.