web-scraper
v2.0.1
Extracts structured data from any URL. Returns clean HTML, text, links, and metadata. Supports JavaScript-rendered pages via headless browser.
Install
bloom install web-scraper@2.0.1
Stars: 2
Downloads: 140
Version: 2.0.1
Published: Mar 4, 2026
Author: Guillermo Rauch (@rauchg)
README
Overview
web-scraper fetches and parses web pages, returning structured data you can use downstream in your agent pipeline.
Usage
{
  "url": "https://example.com/products",
  "selector": ".product-card",
  "extract": ["title", "price", "href"],
  "javascript": true
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | yes | URL to scrape |
| selector | string | no | CSS selector to target elements |
| extract | string[] | no | Fields to extract from matched elements |
| javascript | boolean | no | Render the page in a headless browser before extraction, for JavaScript-rendered pages (default: false) |
| timeout | number | no | Timeout in ms (default: 10000) |
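The defaults in the table can be applied before a request is sent. The following is a minimal sketch, not part of web-scraper itself: the `ScrapeInput` type and the `resolveInput` helper are hypothetical names, and rejecting non-positive timeouts is an assumption rather than documented behavior.

```typescript
// Hypothetical type mirroring the parameter table; web-scraper may not export anything like it.
interface ScrapeInput {
  url: string;
  selector?: string;
  extract?: string[];
  javascript?: boolean;
  timeout?: number;
}

type ResolvedScrapeInput = ScrapeInput & { javascript: boolean; timeout: number };

// Fill in the documented defaults and reject obviously invalid values.
function resolveInput(input: ScrapeInput): ResolvedScrapeInput {
  // new URL() throws a TypeError when the string is not an absolute URL.
  new URL(input.url);

  const timeout = input.timeout ?? 10000; // documented default, in ms
  // Assumption: a timeout must be a positive, finite number of milliseconds.
  if (!Number.isFinite(timeout) || timeout <= 0) {
    throw new RangeError("timeout must be a positive number of milliseconds");
  }

  return {
    ...input,
    javascript: input.javascript ?? false, // default listed in the agent schema
    timeout,
  };
}

const resolved = resolveInput({
  url: "https://example.com/products",
  selector: ".product-card",
});
console.log(resolved.timeout, resolved.javascript); // 10000 false
```

Resolving defaults on the caller's side keeps the effective timeout visible in logs, even when the request itself omits it.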
Output
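Returns an object with an `elements` array of matched elements and their extracted fields, plus page `metadata` (title, description, canonical URL). Per the agent schema, the object also carries the page's `rawHtml` and `plainText`.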
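Downstream steps usually want the matched elements as flat rows. The sketch below assumes the output object carries the properties listed in the agent schema (`elements`, `metadata`, `rawHtml`, `plainText`). The element shape (one record per match, keyed by the requested `extract` fields), the `canonicalUrl` key name, and the `toRows` helper are all assumptions for illustration, and the sample data is hand-written rather than real scraper output.

```typescript
// Assumed output shape; only the four top-level property names come from the agent schema.
interface ScrapeOutput {
  elements: Array<Record<string, string | null>>;
  metadata: { title?: string; description?: string; canonicalUrl?: string };
  rawHtml?: string;
  plainText?: string;
}

// Turn matched elements into rows with one column per requested field.
// Fields that are missing or null become empty strings.
function toRows(output: ScrapeOutput, fields: string[]): string[][] {
  return output.elements.map((el) => fields.map((f) => el[f] ?? ""));
}

// Hand-written sample for illustration only.
const sample: ScrapeOutput = {
  elements: [
    { title: "Widget", price: "$9.99", href: "/products/widget" },
    { title: "Gadget", price: null, href: "/products/gadget" },
  ],
  metadata: { title: "Products" },
};

const rows = toRows(sample, ["title", "price", "href"]);
console.log(rows);
```

Passing the same field list that was sent in `extract` keeps the column order stable regardless of how the elements are keyed.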
License
MIT
Agent Schema
{
  "name": "web-scraper",
  "inputs": {
    "type": "object",
    "required": [
      "url"
    ],
    "properties": {
      "url": {
        "type": "string",
        "format": "uri"
      },
      "extract": {
        "type": "array",
        "items": {
          "type": "string"
        }
      },
      "timeout": {
        "type": "number",
        "default": 10000
      },
      "selector": {
        "type": "string"
      },
      "javascript": {
        "type": "boolean",
        "default": false
      }
    }
  },
  "outputs": {
    "type": "object",
    "properties": {
      "rawHtml": {
        "type": "string"
      },
      "elements": {
        "type": "array"
      },
      "metadata": {
        "type": "object"
      },
      "plainText": {
        "type": "string"
      }
    }
  },
  "runtime": "node",
  "version": "2.0.1",
  "description": "Extracts structured data from a given URL",
  "capabilities": [
    "scraping",
    "web",
    "extraction"
  ]
}
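The `inputs` section of the schema can drive a lightweight pre-flight check. The sketch below is not a full JSON Schema validator: it only checks the `required` list and each property's top-level `type`, and ignores `format`, `items`, and `default`. The `jsonType` and `checkInput` helpers are hypothetical names introduced for illustration.

```typescript
// Subset of the agent schema's `inputs` section: required fields and top-level types only.
const inputSchema = {
  required: ["url"],
  properties: {
    url: { type: "string" },
    extract: { type: "array" },
    timeout: { type: "number" },
    selector: { type: "string" },
    javascript: { type: "boolean" },
  } as Record<string, { type: string }>,
};

// Map a runtime value to the JSON Schema type names used above.
function jsonType(value: unknown): string {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value; // "string", "number", "boolean", "object", ...
}

// Return a list of human-readable problems; an empty list means these checks passed.
function checkInput(input: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of inputSchema.required) {
    if (!(key in input)) problems.push(`missing required field: ${key}`);
  }
  for (const [key, value] of Object.entries(input)) {
    const spec = inputSchema.properties[key];
    if (!spec) continue; // the schema does not forbid extra properties
    if (jsonType(value) !== spec.type) {
      problems.push(`${key}: expected ${spec.type}, got ${jsonType(value)}`);
    }
  }
  return problems;
}

// A string timeout is reported as a type mismatch.
console.log(checkInput({ url: "https://example.com", timeout: "10s" }));
```

Collecting problems into a list, rather than throwing on the first one, lets an agent report every issue with a request in a single pass.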