AI Agents & Automation
Mocha.jl
Mocha.jl is a deep learning framework for the Julia programming language, inspired by the C++ framework Caffe. Although now deprecated, it was designed for efficient training of deep and shallow convolutional neural networks, with optional unsupervised pre-training via stacked auto-encoders. The framework has a modular architecture with isolated components for layers, activation functions, solvers, and more, allowing for easy extension. Written in Julia, it offers a high-level interface for intuitive deep neural network experimentation. Mocha.jl provides multiple backends: a portable pure Julia backend, a faster native extension backend, and a GPU backend built on NVIDIA® cuDNN and CUDA kernels. It also supports HDF5 for data and model storage, which keeps it compatible with other computational tools, and can import Caffe model snapshots.
mlops-stacks
mlops-stacks offers a customizable, open-source solution for initiating new machine learning projects on Databricks, adhering to production best practices. It streamlines the development process by providing a pre-configured environment that includes ML project structure, ML resources as code, and CI/CD workflows (GitHub Actions or Azure DevOps). Data scientists can quickly iterate on ML code, while MLOps engineers can efficiently set up continuous integration and continuous deployment pipelines and manage ML resources. The tool supports automated model training and batch inference jobs across dev, staging, and production Databricks workspaces, facilitating an easy transition to production-grade ML solutions. It also integrates with Databricks asset bundles and offers options for Unity Catalog and Feature Store.
Ollamac
Ollamac is a free and open-source native Mac application that integrates with Ollama, letting users run and interact with Ollama models directly on devices running macOS 14.0 Sonoma or later. The application is available only from its official GitHub repository, which ensures authenticity and direct access to updates. Key features include compatibility with all Ollama models, customizable host settings, and syntax highlighting. Ollamac prioritizes simplicity and ease of use, providing a native interface for local AI model interaction without requiring internet access once models are pulled. This makes it a good fit for developers, data scientists, and students who want to experiment with large language models offline.
PyTorch-BYOL
PyTorch-BYOL offers a robust PyTorch implementation of the Bootstrap Your Own Latent (BYOL) self-supervised learning approach. This tool is designed for researchers and developers to experiment with and apply BYOL algorithms for representation learning. It includes configurable parameters for network architecture (ResNet-18 or ResNet-50), projection and prediction heads, data transformations, and trainer settings such as batch size, momentum update, and epochs. The repository provides clear installation instructions and configuration options, making it accessible for those looking to delve into self-supervised learning without starting from scratch. It also details feature evaluation methods, including linear separability using logistic regression and KNN on datasets like STL10.
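The momentum update mentioned above can be sketched as follows. In BYOL, the target network's weights track an exponential moving average of the online network's weights. This is a minimal illustration on flat lists of floats, not code from the PyTorch-BYOL repository; the function name `ema_update` and the parameter name `tau` are assumptions made for this example.

```python
def ema_update(target, online, tau=0.99):
    """Return new target weights: tau * target + (1 - tau) * online.

    `target` and `online` are flat lists of floats standing in for
    network parameters (an assumption made for this sketch).
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


online = [1.0, 2.0, 3.0]   # hypothetical online-network weights
target = [0.0, 0.0, 0.0]   # hypothetical target-network weights

# With tau = 0.5 the target closes half of the remaining gap on every step.
for _ in range(3):
    target = ema_update(target, online, tau=0.5)

print(target)
```

A value of `tau` close to 1 makes the target change slowly, which is the usual intent of a momentum setting.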
rnn
rnn is a specialized library for building Recurrent Neural Networks within Torch7's nn framework. It offers functionality to construct different types of RNN architectures, including LSTMs (Long Short-Term Memory), GRUs (Gated Recurrent Units), and BRNNs (Bidirectional Recurrent Neural Networks). This tool is particularly useful for developers and researchers working on deep learning projects that require sequential data processing. While the original repository is deprecated, its principles and functionality laid a foundation for subsequent RNN implementations in Torch.
Resemblyzer
Resemblyzer is a Python package designed for advanced voice analysis and comparison, leveraging deep learning techniques. It functions by deriving a high-level representation of a voice through a sophisticated voice encoder model. The tool generates a summary vector consisting of 256 values, which effectively encapsulates the unique characteristics of a spoken voice. This capability makes it suitable for applications requiring detailed voice identification, verification, or similarity analysis, providing a robust framework for understanding vocal nuances in various contexts.
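As a rough illustration of how 256-value summary vectors can be compared, the sketch below scores pairs of vectors with cosine similarity. The vectors here are random stand-ins rather than real encoder output, and the choice of cosine similarity is an assumption for this example, not a statement about Resemblyzer's own API.

```python
import math
import random

EMBEDDING_SIZE = 256  # length of the summary vector described above


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)

# Random stand-ins for voice embeddings (hypothetical data).
voice_a = [rng.gauss(0, 1) for _ in range(EMBEDDING_SIZE)]
# A slightly perturbed copy, imitating a second utterance by the same speaker.
voice_a_again = [x + rng.gauss(0, 0.1) for x in voice_a]
# An unrelated vector, imitating a different speaker.
voice_b = [rng.gauss(0, 1) for _ in range(EMBEDDING_SIZE)]

same = cosine_similarity(voice_a, voice_a_again)
diff = cosine_similarity(voice_a, voice_b)
print(f"same speaker: {same:.3f}, different speaker: {diff:.3f}")
```

A verification system would typically compare such a score against a threshold chosen on held-out data.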
sematic
Sematic is an open-source platform designed for ML engineers and data scientists to develop and manage machine learning pipelines. It enables users to write complex end-to-end pipelines using simple Python code, which can then be executed locally on a laptop, in a cloud VM, or on a Kubernetes cluster to leverage cloud resources. The platform emphasizes easy onboarding with no deployment or infrastructure needed to get started, offering local-to-cloud parity. Key features include end-to-end traceability of pipeline artifacts, reproducibility of results, dynamic graphs, lineage tracking, and runtime type-checking. Sematic also provides a modern web dashboard for monitoring, tracking, and visualizing pipelines and artifacts, along with integrations for Apache Spark, Ray, Snowflake, Plotly, Matplotlib, and Pandas.
SuperGluePretrainedNetwork
SuperGluePretrainedNetwork is a research project from Magic Leap, presented at CVPR 2020, focusing on learning feature matching using Graph Neural Networks. The core of the project is the SuperGlue network, which integrates a Graph Neural Network with an Optimal Matching layer. This architecture is specifically designed to perform matching tasks on two distinct sets of sparse image features. The repository offers both the PyTorch code implementation and pretrained weights, making it accessible for researchers and developers interested in computer vision and feature matching applications. It serves as a valuable resource for those looking to implement or build upon advanced feature matching techniques.
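The idea behind an optimal matching layer can be sketched with a few Sinkhorn iterations, which alternately normalize the rows and columns of a score matrix until it approximates a soft assignment. This is a simplified illustration in plain Python under stated assumptions: it uses a small square matrix of made-up scores and omits details of the published method such as unmatched-feature handling and log-space computation. It is not the repository's implementation.

```python
import math


def sinkhorn(scores, iterations=50):
    """Turn a square score matrix into an approximately doubly stochastic one."""
    p = [[math.exp(s) for s in row] for row in scores]
    for _ in range(iterations):
        # Normalize each row to sum to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Normalize each column to sum to 1.
        col_sums = [sum(col) for col in zip(*p)]
        p = [[v / c for v, c in zip(row, col_sums)] for row in p]
    return p


# Hypothetical similarity scores between 3 features in image A (rows)
# and 3 features in image B (columns).
scores = [
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.0, 0.4, 1.8],
]

assignment = sinkhorn(scores)
# Pick the most likely partner in image B for each feature in image A.
matches = [max(range(len(row)), key=row.__getitem__) for row in assignment]
print(matches)
```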
stellargraph
StellarGraph is a comprehensive Python library designed for machine learning on various types of graphs and networks. It provides a rich collection of state-of-the-art algorithms, including GraphSAGE, GCN, GAT, Node2Vec, and Metapath2Vec, enabling users to perform tasks such as representation learning for nodes and edges, classification of nodes or entire graphs, and link prediction. The library supports diverse graph structures, from homogeneous to heterogeneous and knowledge graphs, and integrates seamlessly with TensorFlow 2, Keras, Pandas, and NumPy. This makes it user-friendly, modular, and extensible, allowing for smooth interoperability with existing machine learning workflows and easy augmentation of its core algorithms.
sumo-rl
sumo-rl is an open-source tool designed to simplify the creation and management of Reinforcement Learning (RL) environments for Traffic Signal Control using SUMO. It offers a straightforward interface, ensuring compatibility with widely used RL libraries and frameworks such as Gymnasium, PettingZoo, stable-baselines3, and RLlib. The tool supports both single-agent and multi-agent RL scenarios, allowing for flexible experimentation. Users can easily customize observation spaces and reward functions to suit their specific research or application needs. sumo-rl is particularly useful for developers and researchers focused on advancing AI agents for traffic management and optimization, providing a robust platform for simulating and evaluating different control strategies.
T-MAC
T-MAC is an open-source tool for efficient low-bit Large Language Model (LLM) inference on CPU/NPU architectures. It uses a lookup table approach to accelerate LLM execution, making it suitable for deployment on resource-constrained devices. The tool supports models like BitNet and offers faster inference than traditional dequantization-based methods. T-MAC aims to optimize model performance in environments where computational resources are limited, making LLM inference more practical on such hardware.
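The lookup table idea can be illustrated with a toy dot product. Instead of dequantizing weights and multiplying, partial sums for every possible weight pattern in a small group are precomputed from the activations once, and each weight group then becomes a table index. The sketch assumes 1-bit weights encoding +1/-1 and a group size of 4; both are choices made for this example, and none of this is T-MAC's actual kernel code.

```python
GROUP = 4  # number of weights packed into one table index (an assumption)


def build_tables(activations):
    """Precompute, per activation group, the partial sum for each sign pattern.

    Assumes len(activations) is a multiple of GROUP.
    """
    tables = []
    for start in range(0, len(activations), GROUP):
        chunk = activations[start:start + GROUP]
        table = []
        for pattern in range(1 << GROUP):
            total = 0.0
            for i, a in enumerate(chunk):
                sign = 1.0 if (pattern >> i) & 1 else -1.0
                total += sign * a
            table.append(total)
        tables.append(table)
    return tables


def pack_row(weight_bits):
    """Pack a row of 0/1 weight bits into one integer index per group."""
    indices = []
    for start in range(0, len(weight_bits), GROUP):
        idx = 0
        for i, bit in enumerate(weight_bits[start:start + GROUP]):
            idx |= bit << i
        indices.append(idx)
    return indices


def lut_dot(tables, indices):
    """Dot product computed with table lookups and additions only."""
    return sum(table[idx] for table, idx in zip(tables, indices))


activations = [0.5, -1.0, 2.0, 0.25, 1.5, -0.5, 0.0, 3.0]
weight_bits = [1, 0, 1, 1, 0, 0, 1, 1]  # bit 1 means +1, bit 0 means -1

tables = build_tables(activations)
result = lut_dot(tables, pack_row(weight_bits))

# Reference: the same dot product computed by direct multiplication.
reference = sum((1.0 if b else -1.0) * a for a, b in zip(activations, weight_bits))
print(result, reference)
```

The tables depend only on the activations, so in a matrix-vector product they are built once and reused for every weight row, which is where the savings come from.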
susi_shell
susi_shell provides a collection of command-line tools for interacting with AI services directly from the terminal. This lets developers and technical users integrate AI capabilities into their workflows without leaving the command line. The specific AI services are not detailed, but some functionality within susi_shell requires a connection to the OpenAI API, which suggests uses such as natural language processing, code generation, or other generative AI tasks. It caters to those who prefer a text-based interface for efficiency and automation.
SpeedTorch
SpeedTorch is a Python library designed to speed up data transfer between CPU and GPU in PyTorch, particularly for deep learning applications. It reports faster transfers for pinned CPU to GPU tensors and for GPU to CPU tensors, in some cases up to 410x faster for GPU to CPU transfers. The library is especially useful for training large numbers of embeddings, which can be hosted in CPU RAM when idle to spare GPU RAM. It also enables optimizers without sparse-gradient support, such as Adamax, to be used for sparse training. SpeedTorch relies on CuPy tensors and custom memory allocators for its performance gains, which makes it a useful tool for developers working with memory-intensive PyTorch models.
text_renderer
text_renderer is an open-source tool designed to generate synthetic text line images, primarily for training deep learning Optical Character Recognition (OCR) models like CRNN. It features a modular design, allowing users to easily add different components such as Corpus, Effect, and Layout. A key capability is its integration with Albumentations, providing a wide range of image augmentation effects to enhance dataset diversity. The tool supports rendering multiple corpora on a single image with varying effects, generating vertical text, and creating LMDB datasets compatible with PaddleOCR. It also includes a web-based font viewer and corpus sampler for character balance.
torchscale
torchscale is a PyTorch library specifically engineered to facilitate the scaling of Transformer models, which are fundamental to modern large language models. It emphasizes key aspects such as modeling generality and capability, ensuring that the models can be applied across a wide range of tasks and perform robustly. The library also prioritizes training stability and efficiency, crucial for developing and managing large-scale foundation models. By providing tools and frameworks within the PyTorch ecosystem, torchscale aims to empower researchers and developers to build, train, and deploy increasingly complex and powerful AI models more effectively.
Uni-ControlNet
Uni-ControlNet is an advanced AI tool designed to offer comprehensive control over text-to-image diffusion models. It provides an all-in-one method for controllable image synthesis, allowing users to precisely guide the generation process. The tool unifies various control aspects, simplifying the creation of specific image outputs. Based on research presented at NeurIPS 2023, Uni-ControlNet aims to enhance the flexibility and accuracy of AI-driven image generation, making it a valuable resource for researchers and developers working with diffusion models.
UER-py
UER-py (Universal Encoder Representations) is an open-source framework designed for pre-training on general-domain corpora and fine-tuning on downstream NLP tasks using PyTorch. It emphasizes model modularity, allowing users to combine various embedding, encoder, decoder, and target modules to construct custom pre-training models. The toolkit supports CPU, single GPU, and distributed training modes, making it versatile for different computational environments. UER-py also provides a comprehensive model zoo with pre-trained models of diverse properties, facilitating their direct use in various applications. It has been tested for reproducibility against original implementations of models like BERT, GPT-2, ELMo, and T5, and offers solutions for numerous NLP competitions.
voicebox
voicebox is an open-source voice synthesis studio that leverages Qwen3-TTS to provide a private and customizable environment for voice generation. This tool enables users to clone existing voices, generate new speech, and develop various voice-powered applications directly on their local machines. By running locally, voicebox ensures privacy and offers extensive customization options, making it suitable for developers and content creators who require fine-grained control over their audio output. Its open-source nature fosters community contributions and allows for continuous improvement and adaptation to specific user needs, providing a flexible solution for advanced voice synthesis tasks.
ModelOp
ModelOp is a leading AI lifecycle management and governance platform designed for enterprises. It provides a centralized AI system of record, enabling visibility into all internal and third-party AI solutions. The platform automates AI deployment with enforceable policies, accelerating time-to-production for ML, GenAI, Agentic AI, and vendor AI. ModelOp helps organizations control costs, ensure audit-readiness, and deliver executive insights by integrating with existing systems to orchestrate governance. It supports various industries and roles, offering solutions for AI governance, risk management, and compliance with standards like NIST AI RMF and EU AI Act.
ClipBERT
ClipBERT is an official PyTorch code implementation for an efficient framework designed for end-to-end learning across image-text and video-text tasks. Recognized with a CVPR 2021 Best Student Paper Honorable Mention, ClipBERT processes raw videos/images and text inputs to generate task predictions. It leverages 2D CNNs and transformers, incorporating a sparse sampling strategy to enable efficient multimodal learning. The framework supports end-to-end pretraining and finetuning for tasks such as image-text pretraining on COCO and VG captions, text-to-video retrieval on MSRVTT, DiDeMo, and ActivityNet Captions, video-QA on TGIF-QA and MSRVTT-QA, and image-QA on VQA 2.0. Its modular design allows for easy integration of additional image-text or video-text tasks.
Tapway
Tapway is a no-code computer vision AI platform designed to automate visual inspection and enable real-time actions across various industries. It harnesses Vision AI to convert video feeds and images into actionable intelligence, driving business growth. The platform allows users to capture real-time visual data using existing cameras, automatically detect patterns, anomalies, and compliance issues, and instantly trigger automated alerts or actions. Tapway offers products like SamurAI for end-to-end Vision AI, VehicleTrack for automatic car plate recognition and vehicle profiling, and PeopleTrack for analyzing customer traffic and behavior. Its applications span plate number recognition, optical character recognition, fruit counting and classification, footfall tracking, PPE compliance detection, and quality inspection.
tflite_gles_app
tflite_gles_app offers GPU-accelerated deep learning inference applications, leveraging TensorFlow Lite GPU Delegate and TensorRT for enhanced performance. This open-source project is designed for platforms such as Raspberry Pi, NVIDIA Jetson, and Linux PCs. It includes a variety of applications covering tasks like lightweight and high-accuracy face detection (Blazeface, DBFace), age and gender estimation, image classification, object detection, 3D facial surface geometry estimation (Facemesh), hair segmentation, 3D handpose estimation, iris detection, 3D object detection, various pose estimations (Blazepose, Posenet), 3D human pose estimation, depth estimation, semantic segmentation, face segmentation, selfie-to-anime transformation, artistic style transfer, and text detection. The repository provides detailed instructions for building and running applications on different target environments, supporting both live camera and recorded video file inputs.
speech
Speech is an open-source Python package for research and development in end-to-end models for automatic speech recognition (ASR). It provides implementations of several ASR architectures, including sequence-to-sequence models with attention mechanisms, Connectionist Temporal Classification (CTC), and the RNN Sequence Transducer. Built on PyTorch, it lets researchers and developers experiment with and build speech-to-text systems. The software is tested on Python 3.6 and is not backward compatible with Python 2.7. It includes examples of model configurations and datasets, making it easier to get started with training and evaluating ASR models.
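To give a sense of how CTC output is turned into a label sequence, the sketch below applies greedy decoding: take the most likely label for each frame, merge consecutive repeats, then drop the blank symbol. It is a generic illustration, not code from the speech package; the blank index of 0 and the function name `ctc_greedy_decode` are assumptions.

```python
from itertools import groupby

BLANK = 0  # index reserved for the CTC blank symbol (an assumption)


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse repeated labels, then remove blanks.

    `frame_labels` holds the most likely label index for each audio frame.
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Hypothetical per-frame argmax output. The blank between the two runs of 3
# is what lets the decoder emit the label 3 twice in a row.
frames = [0, 3, 3, 0, 3, 1, 1, 0, 0, 2]
decoded = ctc_greedy_decode(frames)
print(decoded)  # [3, 3, 1, 2]
```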
NAX Group
NAX Group offers an enterprise AI software platform designed to streamline the development and deployment of custom AI applications. The platform focuses on leveraging automation to build, deploy, and run these applications efficiently. This approach aims to significantly reduce operational costs, accelerate the time it takes for businesses to realize value from their AI investments, and ultimately create a competitive advantage. By providing a comprehensive solution for managing the AI lifecycle, NAX Group enables organizations to integrate advanced AI capabilities into their operations without extensive manual intervention, fostering innovation and efficiency across various business functions.