web-scraper

v2.0.1

Extracts structured data from a given URL. Returns clean HTML, text, links, and metadata. Supports JavaScript-rendered pages via a headless browser.

Install

bloom install web-scraper@2.0.1

Stars: 2
Downloads: 140
Version: 2.0.1
Published: Mar 4, 2026

README

Overview

web-scraper fetches and parses web pages, returning structured data you can use downstream in your agent pipeline.

Usage

{
  "url": "https://example.com/products",
  "selector": ".product-card",
  "extract": ["title", "price", "href"],
  "javascript": true
}

Parameters

Parameter   Type      Required  Description
url         string    yes       URL to scrape
selector    string    no        CSS selector to target elements
extract     string[]  no        Fields to extract from matched elements
javascript  boolean   no        Enable headless browser for JS pages
timeout     number    no        Timeout in ms (default: 10000)
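
As a rough illustration of how the parameters fit together, the sketch below builds an input object and fills in the documented defaults (timeout 10000 ms from the table, javascript false from the Agent Schema). The names `ScraperInput` and `buildInput` are hypothetical and not part of web-scraper; how the resulting object is passed to the agent is not covered here.

```typescript
// Input shape taken from the Parameters table.
// The interface name is hypothetical, not something web-scraper exports.
interface ScraperInput {
  url: string;
  selector?: string;
  extract?: string[];
  javascript?: boolean;
  timeout?: number;
}

// Reject inputs without the one required field, then apply the
// documented defaults for javascript and timeout.
function buildInput(partial: Partial<ScraperInput>): ScraperInput {
  if (typeof partial.url !== "string" || partial.url.length === 0) {
    throw new Error("url is required");
  }
  const input: ScraperInput = {
    url: partial.url,
    javascript: partial.javascript ?? false,
    timeout: partial.timeout ?? 10000,
  };
  // Copy optional fields only when the caller supplied them.
  if (partial.selector !== undefined) input.selector = partial.selector;
  if (partial.extract !== undefined) input.extract = partial.extract;
  return input;
}

// Same request as in Usage, but relying on the defaults.
const input = buildInput({
  url: "https://example.com/products",
  selector: ".product-card",
  extract: ["title", "price", "href"],
});
```

Applying the defaults on the caller side is optional; it mainly makes the effective timeout and rendering mode explicit in logs.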

Output

Returns an object with four fields: elements (an array of matched elements with their extracted fields), metadata (page title, description, and canonical URL), rawHtml, and plainText.
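
A minimal sketch of consuming the result downstream, assuming each element is a flat object of string fields and that metadata uses the keys shown. The Agent Schema does not specify either shape, so `ScraperOutput`, `completeElements`, the metadata key names, and the sample data are all hypothetical.

```typescript
// Output shape based on the Agent Schema "outputs".
// Element and metadata field types are assumptions; the schema leaves them open.
interface ScraperOutput {
  elements: Array<Record<string, string>>;
  metadata: { title?: string; description?: string; canonicalUrl?: string };
  rawHtml?: string;
  plainText?: string;
}

// Keep only elements that carry every requested field, so later pipeline
// steps do not have to handle partially extracted records.
function completeElements(
  output: ScraperOutput,
  fields: string[],
): Array<Record<string, string>> {
  return output.elements.filter((el) =>
    fields.every((f) => typeof el[f] === "string" && el[f] !== ""),
  );
}

// Hypothetical sample result for the request shown in Usage.
const sample: ScraperOutput = {
  elements: [
    { title: "Widget", price: "$9.99", href: "/products/widget" },
    { title: "Gadget", href: "/products/gadget" }, // price was not extracted
  ],
  metadata: { title: "Products" },
};

// Only the "Widget" element has all three fields.
const usable = completeElements(sample, ["title", "price", "href"]);
```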

License

MIT

Agent Schema

{
  "name": "web-scraper",
  "inputs": {
    "type": "object",
    "required": [
      "url"
    ],
    "properties": {
      "url": {
        "type": "string",
        "format": "uri"
      },
      "extract": {
        "type": "array",
        "items": {
          "type": "string"
        }
      },
      "timeout": {
        "type": "number",
        "default": 10000
      },
      "selector": {
        "type": "string"
      },
      "javascript": {
        "type": "boolean",
        "default": false
      }
    }
  },
  "outputs": {
    "type": "object",
    "properties": {
      "rawHtml": {
        "type": "string"
      },
      "elements": {
        "type": "array"
      },
      "metadata": {
        "type": "object"
      },
      "plainText": {
        "type": "string"
      }
    }
  },
  "runtime": "node",
  "version": "2.0.1",
  "description": "Extracts structured data from a given URL",
  "capabilities": [
    "scraping",
    "web",
    "extraction"
  ]
}
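
The inputs section of the schema can be used to check a request before it is sent. The sketch below is a deliberately simplified check, not a full JSON Schema validator: it covers only the required list and the top-level type of each property, and it ignores "format": "uri" and array item types. `inputSchema` and `checkInput` are hypothetical names.

```typescript
type PropType = "string" | "number" | "boolean" | "array";

// Trimmed copy of the "inputs" schema: only the parts this check uses.
const inputSchema: {
  required: string[];
  properties: Record<string, { type: PropType }>;
} = {
  required: ["url"],
  properties: {
    url: { type: "string" },
    extract: { type: "array" },
    timeout: { type: "number" },
    selector: { type: "string" },
    javascript: { type: "boolean" },
  },
};

// Returns a list of problems; an empty list means the input passed
// this simplified check.
function checkInput(input: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of inputSchema.required) {
    if (!(key in input)) problems.push(`missing required field: ${key}`);
  }
  for (const key of Object.keys(input)) {
    const spec = inputSchema.properties[key];
    if (spec === undefined) continue; // the schema does not forbid extra fields
    const value = input[key];
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (actual !== spec.type) {
      problems.push(`${key}: expected ${spec.type}, got ${actual}`);
    }
  }
  return problems;
}

// A string timeout and a missing url produce two problems.
const problems = checkInput({ timeout: "10" });
```

For production use, a complete JSON Schema validator would be the safer choice, since it also enforces format and items constraints.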