Model Context Protocol

Using the MCP server for Hyperbrowser integration.

Overview

The MCP server provides a standardized interface through which AI models can access Hyperbrowser's web automation capabilities. The implementation exposes tools for web scraping, structured data extraction, and web crawling.

The server's source code is available at https://github.com/hyperbrowserai/mcp.

Installation

Prerequisites

  • Node.js (v14 or later)

  • npm or yarn package manager

Setup

  1. Clone the repository:

git clone git@github.com:hyperbrowserai/mcp.git hyperbrowser-mcp
cd hyperbrowser-mcp

  2. Install dependencies:

npm install

  3. Build the server:

npm run build

Configuration

Client Setup

Configure your MCP client to connect to the Hyperbrowser MCP server:

{
  "mcpServers": {
    "hyperbrowser": {
      "command": "node",
      "args": ["/path/to/hyperbrowser-mcp/build/server.js"],
      "env": {
        "HB_API_KEY": "your-api-key"
      }
    }
  }
}
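
Clients can also launch and connect to the server programmatically. Below is a minimal TypeScript sketch, assuming the official @modelcontextprotocol/sdk client package; the client name, version, and paths are placeholders:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the built server as a child process and talk to it over stdio,
// passing the API key through the child's environment.
const transport = new StdioClientTransport({
  command: "node",
  args: ["/path/to/hyperbrowser-mcp/build/server.js"],
  env: { HB_API_KEY: "your-api-key" },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Sanity check: list the tools the server advertises.
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));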

Alternative Setup Using Shell Script

For clients that don't support the env field (like Cursor):

{
  "mcpServers": {
    "hyperbrowser": {
      "command": "bash",
      "args": ["/path/to/hyperbrowser-mcp/run_server.sh"]
    }
  }
}

Edit run_server.sh to include your API key:

#!/bin/bash
export HB_API_KEY="your-api-key"
node /path/to/hyperbrowser-mcp/build/server.js

Tools

Scrape Webpage

Retrieves content from a specified URL in various formats.

Method: scrape_webpage

Parameters:

  • url: string - The URL to scrape

  • outputFormat: string[] - Desired output formats (markdown, html, links, screenshot)

  • apiKey: string (optional) - API key for authentication

  • sessionOptions: object (optional) - Browser session configuration

Example:

{
  "url": "https://example.com",
  "outputFormat": ["markdown", "screenshot"],
  "sessionOptions": {
    "useStealth": true,
    "acceptCookies": true
  }
}
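
From a programmatic client, the same request can be issued with callTool. A hedged sketch, reusing the connected client from the setup example above (result handling is illustrative):

// Invoke the scrape_webpage tool with the arguments shown above.
const result = await client.callTool({
  name: "scrape_webpage",
  arguments: {
    url: "https://example.com",
    outputFormat: ["markdown", "screenshot"],
    sessionOptions: { useStealth: true, acceptCookies: true },
  },
});

// MCP tool results arrive as a list of content items (text, image, ...).
for (const item of result.content) {
  if (item.type === "text") console.log(item.text);
}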

Extract Structured Data

Extracts data from webpages according to a specified schema.

Method: extract_structured_data

Parameters:

  • urls: string[] - List of URLs to extract data from (supports wildcards)

  • prompt: string - Instructions for extraction

  • schema: object (optional) - JSON schema for the extracted data

  • apiKey: string (optional) - API key for authentication

  • sessionOptions: object (optional) - Browser session configuration

Example:

{
  "urls": ["https://example.com/products/*"],
  "prompt": "Extract product name, price, and description",
  "schema": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "price": { "type": "number" },
      "description": { "type": "string" }
    }
  },
  "sessionOptions": {
    "useStealth": true
  }
}
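
The calling pattern is the same as for scraping. A sketch, assuming the extracted object comes back as JSON-encoded text content (the exact result shape may vary):

const result = await client.callTool({
  name: "extract_structured_data",
  arguments: {
    urls: ["https://example.com/products/*"],
    prompt: "Extract product name, price, and description",
    schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
        description: { type: "string" },
      },
    },
    sessionOptions: { useStealth: true },
  },
});

// Assumption: the extracted data arrives as JSON text to be parsed.
for (const item of result.content) {
  if (item.type === "text") {
    const product = JSON.parse(item.text);
    console.log(product.name, product.price);
  }
}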

Crawl Webpages

Navigates through multiple pages on a website, optionally following links.

Method: crawl_webpages

Parameters:

  • url: string - Starting URL for crawling

  • outputFormat: string[] - Desired output formats

  • followLinks: boolean - Whether to follow page links

  • maxPages: number (default: 10) - Maximum pages to crawl

  • ignoreSitemap: boolean (optional) - Skip using the site's sitemap

  • apiKey: string (optional) - API key for authentication

  • sessionOptions: object (optional) - Browser session configuration

Example:

{
  "url": "https://example.com",
  "outputFormat": ["markdown", "links"],
  "followLinks": true,
  "maxPages": 5,
  "sessionOptions": {
    "acceptCookies": true
  }
}
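
And the equivalent programmatic call; a sketch, assuming each crawled page is delivered as its own content item (the packaging may differ):

const result = await client.callTool({
  name: "crawl_webpages",
  arguments: {
    url: "https://example.com",
    outputFormat: ["markdown", "links"],
    followLinks: true,
    maxPages: 5,
    sessionOptions: { acceptCookies: true },
  },
});

console.log(`received ${result.content.length} content items`);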

Session Options

All tools support these common session configuration options:

  • useStealth: boolean - Makes the automated browser harder to detect

  • useProxy: boolean - Routes traffic through proxy servers

  • solveCaptchas: boolean - Automatically solves CAPTCHA challenges

  • acceptCookies: boolean - Automatically handles cookie consent popups
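
In a TypeScript client, these options can be collected into a single reusable object. The interface below is illustrative only (it mirrors the list above and is not exported by the server):

// Illustrative shape for the shared session options.
interface SessionOptions {
  useStealth?: boolean;    // make the automated browser harder to detect
  useProxy?: boolean;      // route traffic through proxy servers
  solveCaptchas?: boolean; // automatically solve CAPTCHA challenges
  acceptCookies?: boolean; // auto-dismiss cookie consent popups
}

// One hardened configuration reused across every tool call.
const sessionOptions: SessionOptions = {
  useStealth: true,
  useProxy: true,
  solveCaptchas: true,
  acceptCookies: true,
};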
