Hyperbrowser exposes endpoints for starting an extract job and for getting its status and results. By default, extraction is asynchronous: you first start the job and then check its status until it has completed. Our SDKs also provide a single helper function that handles this whole flow and returns the data once the job has completed.
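The start-then-poll flow described above can be sketched as a small loop. This is a minimal illustration, not the SDKs' actual implementation. `start_and_poll`, `start_job` and `get_job` are hypothetical names: the two callables stand in for requests to the Start Extract Job and Get Extract Job endpoints, and the 2-second poll interval and 300-second timeout are arbitrary defaults chosen for the example.

```python
import time
from typing import Any, Callable, Dict


def start_and_poll(
    start_job: Callable[[Dict[str, Any]], str],
    get_job: Callable[[str], Dict[str, Any]],
    params: Dict[str, Any],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> Dict[str, Any]:
    """Start an extract job, then poll its status until it completes or fails.

    start_job and get_job are hypothetical callables; in practice they would
    wrap POST /extract (returning the jobId) and GET /extract/{jobId}.
    """
    job_id = start_job(params)
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in ("completed", "failed"):
            return job  # terminal state: stop polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extract job {job_id} did not finish within {timeout}s")
        time.sleep(poll_interval)  # still pending or running: wait, then check again


# Usage with in-memory fakes standing in for the HTTP calls.
_statuses = iter(["pending", "running", "completed"])


def fake_start(params: Dict[str, Any]) -> str:
    return "job-123"


def fake_get(job_id: str) -> Dict[str, Any]:
    status = next(_statuses)
    data = {"productName": "Hyperbrowser"} if status == "completed" else None
    return {"jobId": job_id, "status": status, "data": data}


result = start_and_poll(fake_start, fake_get, {"urls": ["https://hyperbrowser.ai"]}, poll_interval=0.01)
print(result["status"])  # completed
```

Returning the whole job on both `completed` and `failed` leaves it to the caller to decide how to treat a failure; the timeout keeps a stuck job from blocking forever.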
Installation
Node:

```bash
npm install @hyperbrowser/sdk
```

or

```bash
yarn add @hyperbrowser/sdk
```

Python:

```bash
pip install hyperbrowser
```
Usage
```javascript
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";
import { z } from "zod";

// Load HYPERBROWSER_API_KEY from a .env file
config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  // Structure the extracted data should follow
  const schema = z.object({
    productName: z.string(),
    productOverview: z.string(),
    keyFeatures: z.array(z.string()),
    pricing: z.array(
      z.object({
        plan: z.string(),
        price: z.string(),
        features: z.array(z.string()),
      })
    ),
  });

  // Handles both starting and waiting for the extract job response
  const result = await client.extract.startAndWait({
    urls: ["https://hyperbrowser.ai"],
    prompt:
      "Extract the product name, an overview of the product, its key features, and a list of its pricing plans from the page.",
    schema: schema,
  });

  console.log("result", JSON.stringify(result, null, 2));
};

main();
```
```python
import os
import json
from typing import List

from dotenv import load_dotenv
from hyperbrowser import Hyperbrowser
from hyperbrowser.models.extract import StartExtractJobParams
from pydantic import BaseModel

# Load environment variables from .env file
load_dotenv()

# Initialize Hyperbrowser client
client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


class PricingSchema(BaseModel):
    plan: str
    price: str
    features: List[str]


class ExtractSchema(BaseModel):
    product_name: str
    product_overview: str
    key_features: List[str]
    pricing: List[PricingSchema]


def main():
    # Handles both starting and waiting for the extract job response
    result = client.extract.start_and_wait(
        params=StartExtractJobParams(
            urls=["https://hyperbrowser.ai"],
            prompt="Extract the product name, an overview of the product, its key features, and a list of its pricing plans from the page.",
            schema=ExtractSchema,
        )
    )
    print("result:", json.dumps(result.data, indent=2))


main()
```
Start Extract Job
```bash
curl -X POST https://app.hyperbrowser.ai/api/extract \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: <YOUR_API_KEY>' \
  -d '{
    "urls": ["https://hyperbrowser.ai"],
    "prompt": "Extract the product name, an overview of the product, its key features, and a list of its pricing plans from the page.",
    "schema": {
      "type": "object",
      "properties": {
        "productName": { "type": "string" },
        "productOverview": { "type": "string" },
        "keyFeatures": {
          "type": "array",
          "items": { "type": "string" }
        },
        "pricing": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "plan": { "type": "string" },
              "price": { "type": "string" },
              "features": {
                "type": "array",
                "items": { "type": "string" }
              }
            },
            "required": ["plan", "price", "features"]
          }
        }
      },
      "required": ["productName", "productOverview", "keyFeatures", "pricing"]
    }
  }'
```
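If you are calling the REST API without an SDK, the same request can be built with Python's standard library. This sketch mirrors the curl example's URL and headers. The status URL (`/api/extract/{jobId}` under the same base) is inferred from the Get Extract Job path and is an assumption, as are the helper names `build_start_request` and `build_status_request`. Building a `Request` object does not send anything; passing it to `urllib.request.urlopen` would.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://app.hyperbrowser.ai/api"


def build_start_request(api_key: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    return urllib.request.Request(
        f"{API_BASE}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def build_status_request(api_key: str, job_id: str) -> urllib.request.Request:
    """Build the GET request for one job; the /extract/{jobId} URL is assumed."""
    return urllib.request.Request(
        f"{API_BASE}/extract/{job_id}",
        headers={"x-api-key": api_key},
        method="GET",
    )


req = build_start_request(
    "<YOUR_API_KEY>",
    {"urls": ["https://hyperbrowser.ai"], "prompt": "Extract the product name."},
)
print(req.get_method(), req.full_url)  # POST https://app.hyperbrowser.ai/api/extract
# Sending it would look like: json.load(urllib.request.urlopen(req))["jobId"]
```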
You can configure the extract request with the following parameters:
urls - Required. A list of URLs to extract data from. To allow crawling from any URL in the list, add /* to the end of it (for example, https://hyperbrowser.ai/*). Hyperbrowser will then crawl other pages on the site with the same origin and find relevant ones to use as context for the extraction.
schema - A strict JSON schema describing the structure the returned data should have. Gives the best results.
prompt - A prompt describing the data you want and how it should be structured. Useful if you don't have a specific schema in mind.
maxLinks - The maximum number of links to look for when crawling from any given URL.
waitFor - A delay, in milliseconds, to wait after the page loads before scraping it for extraction data. This gives dynamic content time to fully render. It is also useful for waiting to detect CAPTCHAs on the page if solveCaptchas is set to true in sessionOptions.
You must provide either a schema or a prompt in your request, and if both are provided the schema takes precedence.
With the Node SDK, you can pass either a Zod schema (for convenience) or a plain JSON schema. With the Python SDK, you can pass either a Pydantic model or a plain JSON schema.
Make sure the root level of the schema has type: "object".
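The rules above (a non-empty urls list, at least one of schema or prompt, an object-typed schema root, and the /* suffix for crawling) can be checked on the client before a request is sent. This is a sketch that builds a plain JSON request body using the REST field names from this page; `build_extract_params` and its `crawl` flag are hypothetical conveniences, not part of either SDK.

```python
from typing import Any, Dict, List, Optional


def build_extract_params(
    urls: List[str],
    schema: Optional[Dict[str, Any]] = None,
    prompt: Optional[str] = None,
    max_links: Optional[int] = None,
    wait_for: Optional[int] = None,
    crawl: bool = False,  # hypothetical flag: append /* to every URL
) -> Dict[str, Any]:
    """Assemble an extract request body and check the documented constraints."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if schema is None and prompt is None:
        raise ValueError("provide a schema, a prompt, or both")
    if schema is not None and schema.get("type") != "object":
        raise ValueError('the root of the schema must have type "object"')

    # A trailing /* asks Hyperbrowser to crawl same-origin pages from that URL.
    body: Dict[str, Any] = {
        "urls": [u.rstrip("/") + "/*" if crawl and not u.endswith("/*") else u for u in urls]
    }
    # Only include optional fields that were actually given.
    if schema is not None:
        body["schema"] = schema  # takes precedence over prompt if both are sent
    if prompt is not None:
        body["prompt"] = prompt
    if max_links is not None:
        body["maxLinks"] = max_links
    if wait_for is not None:
        body["waitFor"] = wait_for  # milliseconds
    return body


params = build_extract_params(
    ["https://hyperbrowser.ai"],
    prompt="Extract the product name and its pricing plans.",
    max_links=5,
    crawl=True,
)
print(params["urls"])  # ['https://hyperbrowser.ai/*']
```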
Response
The Start Extract Job endpoint (POST /extract) returns a jobId in its response, which you can use to get information about the job in subsequent requests.
The Get Extract Job endpoint (GET /extract/{jobId}) returns data like the following:
```json
{
  "jobId": "962372c4-a140-400b-8c26-4ffe21d9fb9c",
  "status": "completed",
  "data": {
    "pricing": [
      {
        "plan": "Free",
        "price": "$0",
        "features": [
          "3,000 Credits Included",
          "5 Concurrent Browsers",
          "7 Days Data Retention",
          "Basic Stealth Mode"
        ]
      },
      {
        "plan": "Startup",
        "price": "$30 / Month",
        "features": [
          "18,000 Credits Included",
          "25 Concurrent Browsers",
          "30 Day Data Retention",
          "Auto Captcha Solving",
          "Basic Stealth Mode"
        ]
      },
      {
        "plan": "Scale",
        "price": "$100 / Month",
        "features": [
          "60,000 Credits Included",
          "100 Concurrent Browsers",
          "30 Day Data Retention",
          "Auto Captcha Solving",
          "Advanced Stealth Mode"
        ]
      },
      {
        "plan": "Enterprise",
        "price": "Custom",
        "features": [
          "Volume discounts available",
          "Premium Support",
          "HIPAA/SOC 2",
          "250+ Concurrent Browsers",
          "180+ Day Data Retention",
          "Auto Captcha Solving",
          "Advanced Stealth Mode"
        ]
      }
    ],
    "keyFeatures": [
      "Run headless browsers to automate tasks like web scraping, testing, and form filling.",
      "Use browsers to scrape and structure web data at scale for analysis and insights.",
      "Integrate with AI agents to enable browsing, data collection, and interaction with web apps.",
      "Automatically solve captchas to streamline automation workflows.",
      "Operate browsers in stealth mode to bypass bot detection and stay undetected.",
      "Manage browser sessions with logging, debugging, and secure resource isolation."
    ],
    "productName": "Hyperbrowser",
    "productOverview": "Hyperbrowser is a platform for running and scaling headless browsers in secure, isolated containers. Built for web automation and AI-driven use cases."
  }
}
```
The status of an extract job is one of pending, running, completed, or failed. The response can also include an optional error field containing an error message if an error was encountered.
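A Get Extract Job response can be interpreted along the following lines. This sketch assumes the response is the JSON object shown above, already parsed into a dict; `job_result` is a hypothetical helper, and raising an exception for failed jobs is one possible design choice rather than documented SDK behavior.

```python
from typing import Any, Dict, Optional

STATUSES = {"pending", "running", "completed", "failed"}


def job_result(job: Dict[str, Any]) -> Optional[Any]:
    """Return the extracted data if the job completed, or None if it is still in progress.

    Raises RuntimeError for a failed job, using the optional error field when present.
    """
    status = job.get("status")
    if status not in STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    if status in ("pending", "running"):
        return None  # not finished yet: poll again later
    if status == "failed":
        raise RuntimeError(job.get("error") or "extract job failed")
    return job.get("data")


done = {
    "jobId": "962372c4-a140-400b-8c26-4ffe21d9fb9c",
    "status": "completed",
    "data": {"productName": "Hyperbrowser"},
}
print(job_result(done)["productName"])  # Hyperbrowser
print(job_result({"jobId": "abc", "status": "running"}))  # None
```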
You can also configure the session used to execute the extract job, just as you would when creating a new session directly. For example, you can use a proxy or have CAPTCHAs solved automatically. For the full list of session configurations, check out the Session API Reference.
```javascript
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";
import { z } from "zod";

// Load HYPERBROWSER_API_KEY from a .env file
config();

const client = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const schema = z.object({
    productName: z.string(),
    productOverview: z.string(),
    keyFeatures: z.array(z.string()),
    pricing: z.array(
      z.object({
        plan: z.string(),
        price: z.string(),
        features: z.array(z.string()),
      })
    ),
  });

  const result = await client.extract.startAndWait({
    urls: ["https://hyperbrowser.ai"],
    prompt:
      "Extract the product name, an overview of the product, its key features, and its pricing plans from the page.",
    schema: schema,
    // include sessionOptions
    sessionOptions: {
      useProxy: true,
      solveCaptchas: true,
    },
  });

  console.log("result", JSON.stringify(result, null, 2));
};

main();
```
```python
import os
import json
from typing import List

from dotenv import load_dotenv
from hyperbrowser import Hyperbrowser
from hyperbrowser.models.extract import StartExtractJobParams
from hyperbrowser.models.session import CreateSessionParams
from pydantic import BaseModel

load_dotenv()

client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


class PricingSchema(BaseModel):
    plan: str
    price: str
    features: List[str]


class ExtractSchema(BaseModel):
    product_name: str
    product_overview: str
    key_features: List[str]
    pricing: List[PricingSchema]


def main():
    result = client.extract.start_and_wait(
        params=StartExtractJobParams(
            urls=["https://hyperbrowser.ai"],
            prompt="Extract the product name, an overview of the product, its key features, and a list of its pricing plans from the page.",
            schema=ExtractSchema,
            # include session_options
            session_options=CreateSessionParams(use_proxy=True, solve_captchas=True),
        )
    )
    print("result:", json.dumps(result.data, indent=2))


main()
```
Hyperbrowser's CAPTCHA solving and proxy features require a paid plan.
Using a proxy and solving CAPTCHAs slows down page scraping in the extract job, so enable them only when necessary.
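When calling the REST endpoint directly rather than through an SDK, the session configuration presumably goes in the request body. This sketch assumes the body uses the same camelCase keys as the Node SDK example (sessionOptions, useProxy, solveCaptchas); that mapping is an assumption, so confirm it against the API Reference. `with_session_options` is a hypothetical helper that only attaches options that are switched on, since both features slow down scraping.

```python
import json
from typing import Any, Dict


def with_session_options(
    body: Dict[str, Any], use_proxy: bool = False, solve_captchas: bool = False
) -> Dict[str, Any]:
    """Return a copy of an extract request body with session options attached.

    The sessionOptions/useProxy/solveCaptchas keys are assumed to match the
    Node SDK field names. Both features require a paid plan.
    """
    options: Dict[str, Any] = {}
    if use_proxy:
        options["useProxy"] = True
    if solve_captchas:
        options["solveCaptchas"] = True
    new_body = dict(body)  # leave the caller's dict untouched
    if options:
        new_body["sessionOptions"] = options
    return new_body


body = with_session_options(
    {"urls": ["https://hyperbrowser.ai"], "prompt": "Extract the product name."},
    use_proxy=True,
    solve_captchas=True,
)
print(json.dumps(body["sessionOptions"]))  # {"useProxy": true, "solveCaptchas": true}
```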
For a full reference on the extract endpoint, check out the API Reference.
Billing
Extract jobs are charged credits based on the total number of output tokens used, and only successful extract jobs are charged. Each output token costs 0.015 credits, which works out to $30 per million output tokens.
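The billing arithmetic can be checked with a small calculator. It uses only the two rates stated above (0.015 credits per output token and $30 per million output tokens), with `Decimal` to avoid floating-point rounding. `extract_cost` and its `succeeded` flag are hypothetical; the flag stands in for whether the job completed, since failed jobs are not charged.

```python
from decimal import Decimal
from typing import Tuple

CREDITS_PER_OUTPUT_TOKEN = Decimal("0.015")
USD_PER_MILLION_OUTPUT_TOKENS = Decimal("30")


def extract_cost(output_tokens: int, succeeded: bool = True) -> Tuple[Decimal, Decimal]:
    """Return (credits, usd) charged for one extract job; failed jobs cost nothing."""
    if not succeeded:
        return Decimal(0), Decimal(0)
    credits = output_tokens * CREDITS_PER_OUTPUT_TOKEN
    usd = output_tokens * USD_PER_MILLION_OUTPUT_TOKENS / Decimal(1_000_000)
    return credits, usd


# One million output tokens: 15,000 credits, or $30.
credits, usd = extract_cost(1_000_000)
print(credits, usd)
```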