Extruct
extruct is a Data & Analytics tool that extracts embedded metadata from HTML markup. It supports formats such as Microdata, JSON-LD, Open Graph, RDFa, and Dublin Core.
About
extruct is an open-source Python library for extracting embedded metadata from HTML markup. It supports W3C's HTML Microdata, embedded JSON-LD, Microformats (via mf2py), Facebook's Open Graph, RDFa (via rdflib, experimental), and Dublin Core Metadata (DC-HTML-2003). It can run an all-in-one extraction on an HTML string or a parsed HTML tree, or extract only the syntaxes you select. It can also return results in a uniform output format that is easier to process, and it can return references to the HTML nodes behind Microdata items. These features make it useful for developers and data professionals working on web scraping and structured data retrieval.
Pricing & Plans
Open Source
Free