html-content-analysis

HTML Content Analysis Skill

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "html-content-analysis" with this command: npx skills add transilienceai/communitytools/transilienceai-communitytools-html-content-analysis

Purpose

Parse HTML documents to extract technology signals from meta tags, generator comments, script URLs, CSS frameworks, and structural patterns.

Operations

  1. extract_meta_generator

Find <meta name="generator"> tags.

Command (conceptual):

curl -s "{url}" | grep -oP "<meta[^>]*name=[\"']generator[\"'][^>]*content=[\"'][^\"']+[\"']"

Generator Patterns:

{
  "WordPress": { "pattern": "WordPress[\\s]?([\\d.]+)?", "tech": "WordPress", "extract_version": true, "confidence": 95 },
  "Drupal": { "pattern": "Drupal[\\s]?([\\d.]+)?", "tech": "Drupal", "extract_version": true, "confidence": 95 },
  "Joomla": { "pattern": "Joomla!?[\\s]?([\\d.]+)?", "tech": "Joomla", "extract_version": true, "confidence": 95 },
  "Shopify": { "pattern": "Shopify", "tech": "Shopify", "confidence": 95 },
  "Wix.com": { "pattern": "Wix\\.com", "tech": "Wix", "confidence": 95 },
  "Squarespace": { "pattern": "Squarespace", "tech": "Squarespace", "confidence": 95 },
  "Ghost": { "pattern": "Ghost[\\s]?([\\d.]+)?", "tech": "Ghost", "extract_version": true, "confidence": 95 },
  "Hugo": { "pattern": "Hugo[\\s]?([\\d.]+)?", "tech": "Hugo", "extract_version": true, "confidence": 95 },
  "Jekyll": { "pattern": "Jekyll[\\s]?([\\d.]+)?", "tech": "Jekyll", "extract_version": true, "confidence": 95 },
  "Gatsby": { "pattern": "Gatsby[\\s]?([\\d.]+)?", "tech": "Gatsby", "extract_version": true, "implies": ["React"], "confidence": 95 }
}
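
As a rough illustration of how the generator patterns might be applied, the following Python sketch collects every <meta name="generator"> value and matches it against a subset of the table. It uses only the standard library; the names GeneratorMetaParser and GENERATOR_PATTERNS are hypothetical and do not come from the upstream skill, which does not prescribe an implementation.

```python
import re
from html.parser import HTMLParser

# Illustrative subset of the generator patterns listed above.
GENERATOR_PATTERNS = {
    "WordPress": r"WordPress\s?([\d.]+)?",
    "Drupal": r"Drupal\s?([\d.]+)?",
    "Joomla": r"Joomla!?\s?([\d.]+)?",
    "Hugo": r"Hugo\s?([\d.]+)?",
    "Shopify": r"Shopify",
}


class GeneratorMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None for valueless attributes.
        if (attr.get("name") or "").lower() == "generator" and attr.get("content"):
            self.generators.append(attr["content"])


def extract_meta_generator(html):
    parser = GeneratorMetaParser()
    parser.feed(html)
    results = []
    for value in parser.generators:
        for tech, pattern in GENERATOR_PATTERNS.items():
            match = re.search(pattern, value)
            if match:
                # Patterns without a capture group (e.g. Shopify) carry no version.
                version = match.group(1) if match.re.groups else None
                results.append(
                    {"generator": value, "tech": tech, "version": version, "confidence": 95}
                )
                break
    return results
```

Using an HTML parser rather than a regex over the raw page means attribute order and quoting style do not matter, which the conceptual grep command cannot guarantee.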

  2. scan_html_comments

Look for technology hints in HTML comments.

Comment Patterns:

{
  "Powered by": { "pattern": "<!--[^>]*[Pp]owered by ([^>-]+)-->", "extract_group": 1, "confidence": 80 },
  "Generated by": { "pattern": "<!--[^>]*[Gg]enerated by ([^>-]+)-->", "extract_group": 1, "confidence": 80 },
  "Built with": { "pattern": "<!--[^>]*[Bb]uilt with ([^>-]+)-->", "extract_group": 1, "confidence": 75 },
  "WordPress": { "pattern": "<!--[^>]*wp-content[^>]*-->", "tech": "WordPress", "confidence": 85 },
  "Drupal": { "pattern": "<!--[^>]*drupal[^>]*-->", "tech": "Drupal", "confidence": 85 },
  "Magento": { "pattern": "<!--[^>]*Magento[^>]*-->", "tech": "Magento", "confidence": 90 }
}
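
A minimal sketch of the comment scan, assuming each pattern is meant to tolerate arbitrary text between the comment opener and the keyword (that is, [^>]* rather than a single character). The tuple layout and the function body are illustrative choices, not the upstream implementation.

```python
import re

# (rule label, regex, fixed tech name or None when the name is captured, confidence)
COMMENT_PATTERNS = [
    ("Powered by", r"<!--[^>]*[Pp]owered by ([^>-]+)-->", None, 80),
    ("Generated by", r"<!--[^>]*[Gg]enerated by ([^>-]+)-->", None, 80),
    ("Built with", r"<!--[^>]*[Bb]uilt with ([^>-]+)-->", None, 75),
    ("WordPress", r"<!--[^>]*wp-content[^>]*-->", "WordPress", 85),
    ("Magento", r"<!--[^>]*Magento[^>]*-->", "Magento", 90),
]


def scan_html_comments(html):
    findings = []
    for label, pattern, tech, confidence in COMMENT_PATTERNS:
        for match in re.finditer(pattern, html):
            # Captured names keep surrounding whitespace, so strip it.
            name = tech if tech else match.group(1).strip()
            findings.append({"rule": label, "tech": name, "confidence": confidence})
    return findings
```

Note that the capture group ([^>-]+) stops at the first hyphen, so a product name that itself contains a hyphen will not match these rules as written.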

  3. analyze_script_urls

Identify framework-specific paths in script tags.

Script URL Patterns:

{
  "/wp-content/": {"tech": "WordPress", "confidence": 95},
  "/wp-includes/": {"tech": "WordPress", "confidence": 95},
  "/sites/default/files/": {"tech": "Drupal", "confidence": 90},
  "/misc/drupal.js": {"tech": "Drupal", "confidence": 95},
  "/_next/": {"tech": "Next.js", "confidence": 95, "implies": ["React"]},
  "/_nuxt/": {"tech": "Nuxt.js", "confidence": 95, "implies": ["Vue.js"]},
  "/static/js/main.": {"tech": "Create React App", "confidence": 85},
  "/assets/application-": {"tech": "Ruby on Rails", "confidence": 80},
  "/bundles/": {"tech": "ASP.NET", "confidence": 75},
  "/Scripts/": {"tech": "ASP.NET", "confidence": 75},
  "jquery": {"tech": "jQuery", "confidence": 90},
  "bootstrap": {"tech": "Bootstrap", "confidence": 90},
  "angular": {"tech": "Angular", "confidence": 85},
  "react": {"tech": "React", "confidence": 85},
  "vue": {"tech": "Vue.js", "confidence": 85}
}
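
One way the script URL table could be applied is plain substring matching against each <script src>. The sketch below assumes case-sensitive matching and assumes an implied technology inherits the confidence of the rule that implied it; neither detail is specified by the source. ScriptSrcParser is a hypothetical helper name.

```python
from html.parser import HTMLParser

# A few entries from the table above; the full table follows the same shape.
SCRIPT_URL_PATTERNS = {
    "/wp-content/": {"tech": "WordPress", "confidence": 95},
    "/_next/": {"tech": "Next.js", "confidence": 95, "implies": ["React"]},
    "/_nuxt/": {"tech": "Nuxt.js", "confidence": 95, "implies": ["Vue.js"]},
    "jquery": {"tech": "jQuery", "confidence": 90},
}


class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def analyze_script_urls(html):
    parser = ScriptSrcParser()
    parser.feed(html)
    detected = {}  # tech name -> finding; the first match for a tech wins
    for src in parser.sources:
        for needle, rule in SCRIPT_URL_PATTERNS.items():
            if needle not in src:
                continue
            tech = rule["tech"]
            detected.setdefault(
                tech, {"tech": tech, "source": src, "confidence": rule["confidence"]}
            )
            for implied in rule.get("implies", []):
                # Assumption: an implied tech inherits the parent rule's confidence.
                detected.setdefault(
                    implied,
                    {"tech": implied, "source": "implied by " + tech,
                     "confidence": rule["confidence"]},
                )
    return list(detected.values())
```

Short needles such as "react" or "vue" can match unrelated file names, so a stricter implementation would probably match on path segments or file-name boundaries instead of raw substrings.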

  4. detect_css_frameworks

Find CSS framework classes in HTML.

CSS Framework Patterns:

{
  "Bootstrap": { "classes": ["btn btn-", "container", "navbar", "row", "col-", "card", "modal"], "min_matches": 3, "confidence": 85 },
  "Tailwind CSS": { "classes": ["bg-", "text-", "flex", "p-", "m-", "w-", "h-", "rounded-"], "min_matches": 5, "confidence": 85 },
  "Foundation": { "classes": ["button", "callout", "top-bar", "grid-x", "cell"], "min_matches": 3, "confidence": 85 },
  "Bulma": { "classes": ["button is-", "columns", "column", "hero", "box"], "min_matches": 3, "confidence": 85 },
  "Material UI": { "classes": ["MuiButton", "MuiGrid", "MuiPaper", "MuiTypography"], "min_matches": 2, "confidence": 90 },
  "Chakra UI": { "classes": ["chakra-", "css-"], "min_matches": 2, "confidence": 80 },
  "Ant Design": { "classes": ["ant-btn", "ant-card", "ant-table", "ant-form"], "min_matches": 2, "confidence": 90 }
}
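
The source does not say how a "classes" entry is compared against the page, so the sketch below makes an explicit assumption: an entry containing a space is matched as a substring of the class attribute, an entry ending in "-" is matched as a class-name prefix, and anything else must equal a whole class name. Only Bootstrap and Tailwind CSS are included, and ClassCollector and pattern_matches are hypothetical names.

```python
from html.parser import HTMLParser

CSS_FRAMEWORK_PATTERNS = {
    "Bootstrap": {
        "classes": ["btn btn-", "container", "navbar", "row", "col-", "card", "modal"],
        "min_matches": 3,
        "confidence": 85,
    },
    "Tailwind CSS": {
        "classes": ["bg-", "text-", "flex", "p-", "m-", "w-", "h-", "rounded-"],
        "min_matches": 5,
        "confidence": 85,
    },
}


class ClassCollector(HTMLParser):
    """Collects the raw class attribute of every element."""

    def __init__(self):
        super().__init__()
        self.class_values = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("class")
        if value:
            self.class_values.append(value)


def pattern_matches(pattern, class_value):
    # Assumed matching rule; see the lead-in above.
    if " " in pattern:
        return pattern in class_value
    tokens = class_value.split()
    if pattern.endswith("-"):
        return any(token.startswith(pattern) for token in tokens)
    return pattern in tokens


def detect_css_frameworks(html):
    collector = ClassCollector()
    collector.feed(html)
    results = []
    for name, rule in CSS_FRAMEWORK_PATTERNS.items():
        matched = [
            pattern
            for pattern in rule["classes"]
            if any(pattern_matches(pattern, value) for value in collector.class_values)
        ]
        if len(matched) >= rule["min_matches"]:
            results.append(
                {"name": name, "classes_matched": matched, "confidence": rule["confidence"]}
            )
    return results
```

This version reports the matched table entries in classes_matched rather than the concrete class names found on the page. The min_matches threshold is what keeps generic names such as "row" or "flex" from triggering a detection on their own.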

  5. extract_structured_data

Find JSON-LD and structured data.

Process:

  • Find <script type="application/ld+json"> tags

  • Parse JSON content

  • Extract @type and properties

  • Note e-commerce, organization, or product data

Structured Data Signals:

{
  "Product": {"indicates": "E-commerce site"},
  "Organization": {"indicates": "Business website"},
  "LocalBusiness": {"indicates": "Local business"},
  "Article": {"indicates": "Blog/News site"},
  "WebApplication": {"indicates": "Web app"}
}
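
The process above can be sketched with the standard library as follows. JsonLdParser is a hypothetical helper, the sketch only inspects top-level objects and arrays (it does not descend into @graph), and blocks that fail to parse are skipped rather than treated as fatal.

```python
import json
from html.parser import HTMLParser

STRUCTURED_DATA_SIGNALS = {
    "Product": "E-commerce site",
    "Organization": "Business website",
    "LocalBusiness": "Local business",
    "Article": "Blog/News site",
    "WebApplication": "Web app",
}


class JsonLdParser(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_structured_data(html):
    parser = JsonLdParser()
    parser.feed(html)
    results = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip invalid JSON-LD and keep going
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            for type_name in [types] if isinstance(types, str) else types:
                if isinstance(type_name, str):
                    results.append(
                        {"type": type_name, "signals": STRUCTURED_DATA_SIGNALS.get(type_name)}
                    )
    return results
```

Types that are not in the signal table are still reported, with "signals" set to None, so that unexpected schema.org types remain visible to whoever reviews the output.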

Output

{
  "skill": "html_content_analysis",
  "domain": "string",
  "results": {
    "pages_analyzed": "number",
    "meta_generators": [
      { "url": "string", "generator": "WordPress 6.4.2", "tech": "WordPress", "version": "6.4.2", "confidence": 95 }
    ],
    "html_comments": [
      { "url": "string", "comment": "Powered by Django", "tech": "Django", "confidence": 80 }
    ],
    "script_analysis": {
      "urls_analyzed": "number",
      "frameworks_detected": [
        { "tech": "Next.js", "source": "/_next/ path pattern", "confidence": 95 }
      ],
      "cdns_used": ["cdnjs.cloudflare.com", "unpkg.com"]
    },
    "css_frameworks": [
      { "name": "Bootstrap", "classes_matched": ["btn", "container", "navbar", "row"], "confidence": 85 },
      { "name": "Tailwind CSS", "classes_matched": ["bg-white", "text-gray-500", "flex"], "confidence": 85 }
    ],
    "structured_data": [
      { "type": "Organization", "url": "string", "signals": "Business website" }
    ],
    "technologies_summary": [
      { "name": "string", "category": "CMS|Framework|CSS|Library", "confidence": "number", "sources": ["array"] }
    ]
  },
  "evidence": [
    { "type": "meta_generator", "url": "string", "value": "string", "timestamp": "ISO-8601" },
    { "type": "script_url", "url": "string", "path": "string" }
  ]
}

Rate Limiting

  • Page fetches: 10/minute per domain

  • HTML parsing: No limit (local processing)
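
A per-domain limit of 10 fetches per minute could be enforced with a sliding-window counter, sketched below. DomainRateLimiter is a hypothetical class, and the sliding window is one possible reading of "10/minute"; the source does not say whether a fixed or sliding window is intended.

```python
import time
from collections import defaultdict, deque


class DomainRateLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds per domain."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._history = defaultdict(deque)  # domain -> timestamps of recent fetches

    def wait_time(self, domain):
        """Return the seconds to wait before fetching from domain (0.0 if allowed now)."""
        now = self.clock()
        history = self._history[domain]
        # Drop timestamps that have left the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) < self.max_requests:
            return 0.0
        return self.window - (now - history[0])

    def record(self, domain):
        """Record that a fetch to domain was just issued."""
        self._history[domain].append(self.clock())
```

A caller would check wait_time(domain), sleep for the returned duration if it is non-zero, perform the fetch, and then call record(domain). Local HTML parsing never touches the limiter.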

Error Handling

  • Malformed HTML: Use lenient parser

  • Large pages: Limit to first 5MB

  • Encoding issues: Detect and handle charset

  • Continue on parse errors
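
The size cap and charset handling might look like the sketch below, which truncates the body to 5 MB, takes the charset from the Content-Type header or a <meta> tag near the top of the document, and falls back to UTF-8. The function names and the 2048-byte sniffing window are assumptions made for illustration.

```python
import re

MAX_BYTES = 5 * 1024 * 1024  # "first 5MB" from the list above


def detect_charset(body, content_type=""):
    """Pick a charset from the HTTP header, then from a <meta> tag, else utf-8."""
    match = re.search(r"charset=[\"']?([\w-]+)", content_type, re.IGNORECASE)
    if match:
        return match.group(1)
    # Assumption: only the first 2048 bytes are inspected for a <meta> charset.
    match = re.search(rb"<meta[^>]*charset=[\"']?([\w-]+)", body[:2048], re.IGNORECASE)
    if match:
        return match.group(1).decode("ascii")
    return "utf-8"


def decode_html(body, content_type=""):
    """Truncate an oversized body and decode it without raising on bad bytes."""
    truncated = body[:MAX_BYTES]
    charset = detect_charset(truncated, content_type)
    try:
        return truncated.decode(charset, errors="replace")
    except LookupError:
        # Unknown or misspelled charset label: fall back to UTF-8.
        return truncated.decode("utf-8", errors="replace")
```

Decoding with errors="replace" means a stray invalid byte becomes a replacement character instead of aborting the page. Python's html.parser is itself tolerant of malformed markup, so it can serve as the lenient parser for the decoded text.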

Security Considerations

  • Only fetch public pages

  • Do not execute scripts

  • Sanitize extracted content

  • Log all fetches for audit

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
