This guide walks you through using Hyperbrowser's scraping API to extract website content as markdown, a format well suited for feeding data to LLM applications.

Overview

Hyperbrowser provides a simple API for scraping websites. The scraping process happens in two steps:

  1. Submit a URL to start a scrape job
  2. Poll the job status to get the results
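
Both endpoints return JSON. As a rough sketch (inferred from the examples later in this guide; field names beyond jobId, status, and error are assumptions):

// POST /api/scrape returns the ID of the new job:
//   { "jobId": "<job id>" }
// GET /api/scrape/{jobId} returns the job's current state:
//   { "status": "completed" | "failed" | <in-progress status>, "error": "<message on failure>" }
// On completion, the job also carries the scraped content as markdown.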

Starting a Scrape Job

To start scraping a website, make a POST request to the /api/scrape endpoint with the target URL:

curl -X POST "https://app.hyperbrowser.ai/api/scrape" \
  -H "x-api-key: $HYPERBROWSER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

Replace $HYPERBROWSER_API_KEY with your actual API key.
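
If you're not using curl, the same request can be made from any HTTP client. Here's a minimal sketch in Node.js using the built-in fetch; the jobId field in the response is inferred from the SDK example later in this guide.

const startScrape = async (url) => {
  const res = await fetch("https://app.hyperbrowser.ai/api/scrape", {
    method: "POST",
    headers: {
      "x-api-key": process.env.HYPERBROWSER_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url }),
  });
  const data = await res.json();
  return data.jobId; // job ID used to poll for results
};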

Polling the Job Status

After starting a scrape job, poll its status by making a GET request to the /api/scrape/{jobId} endpoint, replacing {jobId} with the job ID returned when you started the job.

curl -X GET "https://app.hyperbrowser.ai/api/scrape/{jobId}" \
  -H "x-api-key: $HYPERBROWSER_API_KEY"

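As with starting a job, you can poll without the SDK. A minimal sketch in Node.js, assuming the job object exposes the status and error fields shown in the SDK example below:

const pollScrapeJob = async (jobId) => {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  while (true) {
    const res = await fetch(`https://app.hyperbrowser.ai/api/scrape/${jobId}`, {
      headers: { "x-api-key": process.env.HYPERBROWSER_API_KEY },
    });
    const job = await res.json();
    if (job.status === "completed") return job; // results are ready
    if (job.status === "failed") throw new Error(job.error);
    await sleep(5_000); // wait 5 seconds before polling again
  }
};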

Example

Here's a complete Node.js example that uses the Hyperbrowser SDK to start a scrape job and poll it until it finishes:

import Hyperbrowser from "@hyperbrowser/sdk";

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

// Simple sleep helper used for the polling delay below
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  // Step 1: submit the URL to start a scrape job
  const startJob = await client.startScrapeJob({ url: "https://example.com" });
  console.log(startJob);

  // Step 2: poll the job status until it completes or fails
  let completed = false;
  while (!completed) {
    const job = await client.getScrapeJob(startJob.jobId);

    if (job.status === "completed") {
      completed = true;
      console.log("Scrape job completed: ", job);
    } else if (job.status === "failed") {
      console.error("Scrape job failed: ", job.error);
      completed = true;
    } else {
      console.log("Scrape job is still running: ", job.status);
      await sleep(5_000); // wait 5 seconds before the next poll
    }
  }
})();
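
To run this example, install the SDK (npm install @hyperbrowser/sdk) and make sure HYPERBROWSER_API_KEY is set in your environment.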