AI Agents & Automation
SoundHound
SoundHound AI is a leading conversational AI platform that enables businesses to create and deploy voice AI agents across diverse industries. Its proprietary end-to-end conversational AI stack powers solutions for restaurants, automotive, retail, financial services, healthcare, and smart devices. Key offerings include Dynamic Drive-Thru for increased throughput, Smart Answering for 100% phone call handling, and Custom Voice AI Solutions for bespoke experiences. The platform also features Amelia for enterprise AI agents, Autonomics for ITSM automation, and SoundHound Chat AI for brand-specific intelligence. With over 400 patents, SoundHound AI focuses on delivering real impact by automating billions of conversations annually, aiming to cut operating costs, boost revenue, and enhance customer loyalty.
UNEE
UNEE is an interactive AI companion plush toy designed to provide 24/7 conversational engagement. More than just a toy, it acts as an AI friend that engages across senses with adaptive feedback and emotional understanding. Powered by Mission AI’s proprietary emotional model, UNEE uses a multi-layer memory model to gradually shape a long-term personality, anticipating user feelings and offering appropriate companionship. Its emotional resonance engine crafts replies that feel like true reflections, while proactive dialogue uses location and real-time updates for relevant conversations. UNEE features voice interaction, adaptive eye feedback, touch response, and a sleep mode, bringing companionship that feels truly alive.
Book Bot
Book Bot is an innovative AI tool that converts books into interactive learning experiences. Users can upload various file types, including EPUB, PDF, DOC, DOCX, and TXT, to create a "BookBot." This AI-powered bot then allows readers to ask questions and receive customized responses, fostering a dynamic and adaptive learning environment. Authors and educators can utilize Book Bot to publish their works in an interactive format, with the option to embed these BookBots directly into their own websites. The platform emphasizes data privacy, ensuring that uploaded book source material is never exposed or used to train AI models, offering a secure way to engage with content.
inGen Dynamics Inc.
InGen Dynamics Inc. is at the forefront of robotics and AI, developing a comprehensive suite of AI-powered robotics platforms and automation systems. Their mission is to enhance human lives by providing intelligent solutions across diverse sectors such as healthcare, education, security, and eldercare. The company's core innovation is the Origami AI Platform, a hardware-agnostic, edge-native, multimodal intelligence layer that powers all its products. This platform enables knowledge transfer across different robots, making each deployment smarter. InGen Dynamics offers products like Sentinel Prime for enterprise security, Aido for service robotics, and Fari for eldercare, demonstrating a phased approach to product development from tabletop devices to advanced humanoid robotics.
See what your AI agent did and verify it locally
Agent Auditor is an open-source tool designed to decode, display, and verify signed interaction records and evidence bundles generated by AI agents, middleware, and automated systems. It operates entirely locally, either in a web browser or as a command-line interface (CLI) tool, requiring no outbound verification or artifact fetches. This ensures privacy and allows for offline operation. Key functionalities include Ed25519 signature verification, decoding signed agent receipts, inspecting dispute bundles, checking policy binding status, and reconstructing timelines of agent actions. It's ideal for debugging trust boundaries, supporting audits, resolving disputes, and technical reviews, providing a clear, verifiable account of an agent's activities without sending any data externally.
Parloa
Parloa offers an AI Agent Management Platform designed to transform customer experiences in contact centers. By deploying personal AI agents, the platform automates customer service using generative AI, enabling businesses to handle millions of conversations at unparalleled speed and precision. It orchestrates the full AI agent lifecycle, from design and testing to scaling and optimization, ensuring consistent performance as businesses grow. Parloa aims to close the gap between companies and customers by creating personalized, preemptive, and seamless interactions, fostering lasting loyalty. The platform is engineered for reliability and built for scale, supporting high-volume, high-stakes environments across various industries like financial services, utilities, e-commerce, healthcare, media, and IT.
Askruit
Askruit is an AI-powered video interview platform specifically designed for HR professionals and recruiters. Its primary function is to streamline the candidate screening process by automating video interviews. This tool helps organizations efficiently assess candidates, reducing the time and effort traditionally associated with initial screening stages. It caters to both startups and larger organizations aiming to enhance and optimize their recruitment workflows.
Utter, a local-first dictation app for Mac and iPhone
Utter is a local-first dictation application designed for Mac and iPhone users, enabling them to transform spoken words into clear, polished text across any app. It boasts a speed up to four times faster than traditional typing, facilitating a seamless flow from thought to text. The app prioritizes privacy with on-device processing, ensuring audio is not stored or used for training, and no account is required. Key features include dictation in any app via a global shortcut, a searchable voice history, AI modes for tone and format adjustment, speaker-separated transcripts for meetings, and support for over 50 languages. It also offers offline functionality and file transcription for audio and video.
Voice Changer - Fun AI Effects
Voice Changer - Fun AI Effects is a mobile application designed to transform voices with a wide array of AI-powered effects. Users can record their voice directly within the app or import existing audio files to apply over 50 different voice effects, including robot, alien, chipmunk, monster, and many more. The app also features a voice editor with controls for pitch, tone, speed, echo, reverb, and background noise reduction. Additionally, it offers background sound effects like rain, forest, and ocean waves to enhance recordings, along with a built-in soundboard for instant prank sounds. This tool is ideal for creating funny audio, prank calls, gaming voice effects, and unique content for social sharing.
Roark
Roark is a comprehensive QA and observability platform specifically designed for Voice AI agents, ensuring reliability and performance. It enables teams to proactively catch issues before customers encounter them by offering robust monitoring and evaluation capabilities for live voice interactions. Users can track over 40 built-in metrics, analyze multi-speaker conversations, and run best-in-class evaluators on demand or automatically. The platform also facilitates pre-deployment testing through end-to-end simulations, allowing users to stress-test agents across real-world scenarios and automatically generate test cases from failed live calls. With one-click native integrations for popular voice platforms like VAPI, Retell, LiveKit Cloud, and Pipecat, Roark offers quick setup and real-time insights.
Text to Speech - AI Voices
Text to Speech - AI Voices transforms various text formats, including books, documents, images, and PDFs, into lifelike human speech using advanced AI technology. This app makes communication smoother by providing accurate text-to-audio conversions, acting as a voice narrator for dictation, and enabling the creation of voiceovers. Key features include a selection of natural, human-like voices, support for multiple content types, and voice customization options for speed and volume. It also offers multilingual support, allowing users to filter voices by country, gender, and age. Users can save voice files for offline playback and even create or clone their own voice with the "My Voice" feature. The tool syncs saved content across devices, making it ideal for busy individuals, students, and those with reading disabilities.
AiryChat
AiryChat makes AI accessible and easy to use, offering a suite of AI assistants designed to augment employees and streamline business operations. The platform provides specialized AI assistants like Bob (General Assistant), Dwight (Art Assistant), Jess (Marketing Assistant), Linus (Software Developer), and Nissa (User Interface Designer). Built on cutting-edge OpenAI, Meta, and Google services, AiryChat supports features such as PDF, CSV, and DOCX processing, long-term memory for conversations, unlimited context length, web search indexing, and built-in image generation. It also includes a voice mode for hands-free interaction and cost-saving prompt prefetch.
Quo
Quo, formerly known as OpenPhone, provides an AI-powered business phone system designed for startups and small businesses. It consolidates calls, texts, and contacts into a single shared workspace, enhancing team alignment and customer response times. A key feature is Sona, Quo’s AI voice agent, which offers 24/7 call answering, captures details, transfers calls, and ensures no customer interaction is missed. The system supports various devices including mobile (iOS and Android) and desktop, using an internet connection for calls and messages. Quo also offers features like voicemail transcription, phone menus (IVR), local and vanity numbers, virtual SMS numbers, and analytics for tracking performance. It aims to provide a reliable and efficient communication solution that scales with business needs.
uis-rnn
uis-rnn is a Python library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, primarily used for fully supervised speaker diarization. This algorithm excels at segmenting and clustering sequential data by learning from examples. The library provides core APIs for model construction, training, and prediction, allowing users to fit models with observation sequences and ground truth cluster IDs. It supports both list-based and concatenated sequence inputs, with careful handling of cluster ID uniqueness. The tool is particularly useful for tasks like identifying who spoke when in audio recordings, leveraging d-vector embeddings as observations. It also offers guidelines for training on large datasets by calling the fit() function multiple times with appropriately sized inputs.
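The cluster ID uniqueness requirement for concatenated inputs can be illustrated with a short sketch. The helper `concatenate_with_unique_ids` below is hypothetical and not part of the uis-rnn API. It assumes each utterance carries its own local speaker labels, and it prefixes every label with the utterance index so that "A" in one recording is not confused with "A" in another. Plain Python lists stand in for the d-vector arrays.

```python
from typing import List, Sequence, Tuple

Embedding = Sequence[float]


def concatenate_with_unique_ids(
    sequences: List[List[Embedding]],
    cluster_ids: List[List[str]],
) -> Tuple[List[Embedding], List[str]]:
    """Flatten per-utterance data into one sequence with globally unique IDs.

    Hypothetical helper for illustration; not a uis-rnn function.
    """
    if len(sequences) != len(cluster_ids):
        raise ValueError("need one label list per sequence")

    flat_observations: List[Embedding] = []
    flat_ids: List[str] = []
    for utt_index, (seq, ids) in enumerate(zip(sequences, cluster_ids)):
        if len(seq) != len(ids):
            raise ValueError(f"utterance {utt_index}: length mismatch")
        flat_observations.extend(seq)
        # Prefix the local label with the utterance index to avoid collisions.
        flat_ids.extend(f"{utt_index}_{label}" for label in ids)
    return flat_observations, flat_ids


# Two toy utterances; both use the local labels "A" and "B".
sequences = [
    [[0.1, 0.2], [0.1, 0.3], [0.9, 0.8]],
    [[0.5, 0.5], [0.4, 0.6]],
]
cluster_ids = [["A", "A", "B"], ["A", "B"]]

observations, unique_ids = concatenate_with_unique_ids(sequences, cluster_ids)
print(unique_ids)  # ['0_A', '0_A', '0_B', '1_A', '1_B']
```

The flattened observations and labels could then be converted to arrays and passed to the library's fit() function; the exact input types it expects should be checked against the project's documentation.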
Callin.io
Callin.io offers Neuron 1.0, an ultra-low latency AI voice platform designed for enterprise performance, ensuring natural conversations with sub-176 millisecond response times. It provides cohuman agents capable of autonomous workflow execution, handling tasks like web navigation, CRM updates, and email workflows without custom development. The platform is white-label ready, allowing businesses to deploy industry-specific AI voice solutions under their own brand, with custom domains and pricing models. Callin.io focuses on vertical solutions for property management, IT support, smart cities, and real estate, offering features like intelligent issue triage, knowledge base integration, and automated lead qualification. It boasts enterprise trust with 99.9% uptime, GDPR & CCPA compliance, and end-to-end encrypted voice data, supporting multi-carrier options.
PowPow.ai
PowPow.ai is an AI translation assistant for real-time voice translation. The platform connects individuals and AI agents, enabling multilingual communication across language barriers. The website provides few details, but the core offering appears to be instant, AI-powered interpretation for situations that require immediate cross-language understanding.
VoiceType
VoiceType is an AI-powered speech-to-text application designed to significantly boost writing speed and efficiency. It allows users to dictate content at up to 360 words per minute, which is 9x faster than average typing speeds. The tool boasts a 99.7% accuracy score and works seamlessly across all applications, from email clients like Gmail to productivity tools like Notion and Linear. Beyond simple transcription, VoiceType intelligently auto-formats and improves writing, adapting its tone based on the application being used. It supports over 35 languages, includes a 'Whisper Mode' for soft speaking, and is context-aware, adjusting transcriptions to the user's environment. All data is encrypted, ensuring privacy and security.
Smart Calendars AI — Plan fast & easy v2.5
Smart Calendars AI is an intelligent calendar application designed to streamline event and reminder creation. Users can effortlessly convert voice commands, photos, screenshots, PDFs, and emails into actionable calendar entries. The tool offers seamless integration with popular calendar services like Apple Calendar, Google Calendar, and Outlook, ensuring all events are synchronized. It provides smart scheduling features, allowing users to find available slots, manage conflicts, and set recurring events with natural language queries. Additionally, it supports custom calendar feeds, availability sharing, and public feed subscriptions, making it a versatile solution for personal and professional time management.
NLPearl
NLPearl provides an AI-driven platform for automating phone calls and voice interactions, enabling businesses to create human-like AI call centers. Users can build these AI agents using natural language prompts, eliminating the need for coding or complex setups. The platform focuses on enhancing customer engagement and optimizing operational efficiency through realistic voice interactions. It supports both inbound and outbound communications, aiming to boost sales, reduce operational costs, and explore new market opportunities. NLPearl emphasizes ease of use, allowing anyone to describe their needs and deploy an AI call center quickly.
nerd-dictation
nerd-dictation is a simple, hackable, and offline speech-to-text utility designed for Desktop Linux. It leverages the VOSK-API for accurate transcription without requiring an internet connection. The tool is a single-file Python script with minimal dependencies, making it easy to set up and use. Key features include optional conversion of numbers to digits, a timeout function for automatic speech ending, and configurable output types (simulating keystrokes or printing to standard output). Users can customize text manipulation through Python scripts and bind begin/end/cancel commands to shortcut keys for efficient workflow. It also supports suspend/resume functionality to manage resource usage, especially with larger language models.
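As a rough illustration of the Python-based text manipulation mentioned above, the sketch below shows what a user script might look like. It assumes the hook is a function named `nerd_dictation_process` that receives the recognized text and returns the modified text; check the project's README for the exact convention and file location. The replacement table is hypothetical.

```python
import re

# Hypothetical mapping from spoken phrases to punctuation (an assumption,
# not something shipped with nerd-dictation).
SPOKEN_PUNCTUATION = {
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
}

# Match longer phrases first so multi-word entries take priority.
_PHRASES = sorted(SPOKEN_PUNCTUATION, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(p) for p in _PHRASES) + r")\b")


def nerd_dictation_process(text: str) -> str:
    """Replace spoken punctuation names and tidy the spacing around them."""
    text = _PATTERN.sub(lambda m: SPOKEN_PUNCTUATION[m.group(1)], text)
    # Remove the space left in front of the inserted punctuation mark.
    return re.sub(r"\s+([.,?])", r"\1", text)


print(nerd_dictation_process("hello comma world full stop"))  # hello, world.
```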
aspeak
aspeak is a versatile command-line interface (CLI) and Python library for text-to-speech conversion, leveraging the Azure TTS API. It allows users to generate speech from text or SSML input, offering extensive control over voice, locale, pitch, rate, and style. The tool supports both RESTful and WebSocket API modes for Azure TTS and provides options for authentication via subscription keys, environment variables, or configuration profiles. Users can save synthesized speech to various audio formats like WAV, MP3, OGG, and WebM, with adjustable quality levels. aspeak is ideal for developers and content creators who need a robust and customizable solution for integrating high-quality text-to-speech capabilities into their applications or workflows.
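To make the SSML input option more concrete, here is a minimal sketch that builds an SSML document with voice, rate, and pitch settings using only the Python standard library. The function `build_ssml` and its defaults are hypothetical and are not part of aspeak; the voice name `en-US-JennyNeural` is used as an example Azure voice. The resulting string is the kind of input that could be handed to a tool accepting SSML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    voice: str = "en-US-JennyNeural",  # example voice name (assumption)
    locale: str = "en-US",
    rate: str = "+10%",
    pitch: str = "+5%",
) -> str:
    """Return an SSML string wrapping `text` in voice and prosody elements.

    Hypothetical helper for illustration; not an aspeak function.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, <, > so the XML stays well-formed
        "</prosody></voice></speak>"
    )


ssml = build_ssml("Fish & chips, please.")
ET.fromstring(ssml)  # raises ParseError if the document is not well-formed
print(ssml)
```

Escaping the text and quoting the attributes keeps the document valid even when the input contains characters such as `&` or `<`.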
Ultimate RVC
Ultimate RVC is a free AI voice cloning tool hosted on Hugging Face, designed for transforming voices in audio recordings. Users can upload an existing recording of the voice they wish to modify and a short sample of the desired target speaker's voice. The web application then processes these files to generate a new audio output where the original speech adopts the characteristics of the target voice. This tool is particularly useful for individuals looking to experiment with AI-generated vocals, offering a straightforward way to achieve voice conversion without complex setups. Its accessibility on Hugging Face Spaces makes it easy for content creators, musicians, and voice actors to utilize its capabilities.
Vui
Vui is a conversational speech model hosted on Hugging Face Spaces, allowing users to convert any typed text into spoken audio. The application provides a straightforward interface where users can input text or select a sample, fine-tune optional settings, and then generate an audio file of the spoken words. This tool is ideal for quickly creating audio versions of written content, making it suitable for various applications from content creation to accessibility. While the core functionality is free, Hugging Face offers various paid plans for enhanced features, storage, and compute resources for those needing more advanced capabilities.
Mitra – Call People with AI
Mitra is a personal AI phone assistant designed to handle both outgoing and incoming calls on your behalf. It allows users to make calls to multiple contacts simultaneously, manage incoming calls effortlessly, and bypass automated business systems. The tool integrates seamlessly across devices, enabling call management from anywhere. Powered by advanced conversational AI, Mitra supports multilingual communication and state-of-the-art voice cloning, adapting to user preferences and learning from interactions over time. Key features include real-time call management, instant internet search during calls for relevant information, and the ability to customize conversation strategies and tones. Mitra prioritizes security and compliance with robust encryption and industry-leading standards, making phone communication stress-free and efficient.