web-scraper — Bloom

README

Overview

web-scraper fetches and parses web pages, returning structured data you can use downstream in your agent pipeline.

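Usage

Example input that renders the page in a headless browser and extracts the title, price, and href of every .product-card element:
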
{
  "url": "https://example.com/products",
  "selector": ".product-card",
  "extract": ["title", "price", "href"],
  "javascript": true
}

Parameters

Parameter	Type	Required	Description
url	string	yes	Absolute URL of the page to scrape
selector	string	no	CSS selector for the elements to match
extract	string[]	no	Fields to extract from each matched element
javascript	boolean	no	Render the page in a headless browser so JavaScript-generated content is included (default: false)
timeout	number	no	Timeout in milliseconds (default: 10000)
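The parameters above map naturally onto a typed input object. The sketch below is a hypothetical TypeScript view of them (WebScraperInput and withDefaults are illustrative names, not part of web-scraper's published API): only url is required, and the defaults for javascript and timeout are taken from the Agent Schema. It applies those defaults to the Usage example without overwriting values the caller set.

```typescript
// HYPOTHETICAL input type mirroring the Agent Schema: only `url` is required.
interface WebScraperInput {
  url: string;
  selector?: string;
  extract?: string[];
  javascript?: boolean;
  timeout?: number;
}

// Input after defaults are applied: `javascript` and `timeout` always present.
type ResolvedInput = WebScraperInput & { javascript: boolean; timeout: number };

// Fill in the schema defaults (javascript: false, timeout: 10000 ms),
// keeping any values the caller supplied.
function withDefaults(input: WebScraperInput): ResolvedInput {
  return {
    ...input,
    javascript: input.javascript ?? false,
    timeout: input.timeout ?? 10000,
  };
}

// The Usage example, with the omitted timeout filled in.
const request = withDefaults({
  url: "https://example.com/products",
  selector: ".product-card",
  extract: ["title", "price", "href"],
  javascript: true,
});

console.log(request.javascript, request.timeout); // true 10000
```

Using ?? rather than || matters here: an explicit javascript: false or timeout: 0 from the caller is preserved instead of being replaced by the default.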

Output

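Returns an object containing elements (an array of the matched elements with their extracted fields) and metadata (page title, description, and canonical URL), along with the page's raw HTML (rawHtml) and its plain-text content (plainText).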
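A minimal sketch of consuming the result downstream. The Agent Schema only says that elements is an array and metadata is an object, so the per-element shape (field name to string) and the metadata key canonicalUrl are assumptions, as are the names WebScraperOutput and pluckField; the sample data is made up for illustration.

```typescript
// HYPOTHETICAL types: the schema does not type the items of `elements` or the
// keys of `metadata`, so both shapes below are assumptions.
type ScrapedElement = { [field: string]: string | undefined };

interface WebScraperOutput {
  elements?: ScrapedElement[];
  metadata?: { title?: string; description?: string; canonicalUrl?: string };
  rawHtml?: string;
  plainText?: string;
}

// Collect one extracted field (e.g. "href") from every matched element,
// skipping elements where that field is missing.
function pluckField(output: WebScraperOutput, field: string): string[] {
  return (output.elements ?? [])
    .map((el) => el[field])
    .filter((value): value is string => value !== undefined);
}

// Made-up sample result, for illustration only.
const sample: WebScraperOutput = {
  elements: [
    { title: "Mug", price: "12.00", href: "/p/mug" },
    { title: "Lamp", price: "40.00" },
  ],
  metadata: { title: "Products" },
};

console.log(pluckField(sample, "href")); // [ '/p/mug' ]
```

Every output property is optional in the type because the schema's outputs section lists no required fields.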

License

MIT

Agent Schema

{
  "name": "web-scraper",
  "inputs": {
    "type": "object",
    "required": ["url"],
    "properties": {
      "url": { "type": "string", "format": "uri" },
      "extract": { "type": "array", "items": { "type": "string" } },
      "timeout": { "type": "number", "default": 10000 },
      "selector": { "type": "string" },
      "javascript": { "type": "boolean", "default": false }
    }
  },
  "outputs": {
    "type": "object",
    "properties": {
      "rawHtml": { "type": "string" },
      "elements": { "type": "array" },
      "metadata": { "type": "object" },
      "plainText": { "type": "string" }
    }
  },
  "runtime": "node",
  "version": "2.0.1",
  "description": "Extracts structured data from a given URL",
  "capabilities": ["scraping", "web", "extraction"]
}
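Before handing an input to the tool, an agent may want to check it against the inputs section of the schema. The sketch below is a hand-written check of those constraints (validateInput is a hypothetical name), not a full JSON Schema validator; it treats "format": "uri" as a hard requirement by testing whether the WHATWG URL constructor accepts the string, which is a stricter reading than many JSON Schema validators apply by default.

```typescript
// HYPOTHETICAL helper: checks a candidate input against the Agent Schema's
// `inputs` constraints and returns a list of problems (empty means valid).
function validateInput(input: unknown): string[] {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return ["input must be an object"];
  }
  const obj = input as Record<string, unknown>;
  const errors: string[] = [];

  // "required": ["url"] with "format": "uri"
  if (typeof obj.url !== "string") {
    errors.push("url is required and must be a string");
  } else {
    try {
      new URL(obj.url); // throws on relative or malformed URLs
    } catch {
      errors.push("url must be an absolute URI");
    }
  }

  // Optional properties: only type-checked when present.
  if (obj.selector !== undefined && typeof obj.selector !== "string") {
    errors.push("selector must be a string");
  }
  if (
    obj.extract !== undefined &&
    !(Array.isArray(obj.extract) && obj.extract.every((f) => typeof f === "string"))
  ) {
    errors.push("extract must be an array of strings");
  }
  if (obj.javascript !== undefined && typeof obj.javascript !== "boolean") {
    errors.push("javascript must be a boolean");
  }
  if (
    obj.timeout !== undefined &&
    (typeof obj.timeout !== "number" || !Number.isFinite(obj.timeout))
  ) {
    errors.push("timeout must be a number");
  }
  return errors;
}

console.log(validateInput({ url: "https://example.com/products", javascript: true })); // []
console.log(validateInput({ selector: ".product-card", timeout: "5s" }));
```

The second call reports both the missing url and the string timeout; collecting every problem rather than stopping at the first gives an agent a complete picture to correct in one pass.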