Browser Use
Browser Use is an open-source tool designed to make websites accessible for AI agents by enabling them to interact with web pages as a human user would. It provides a framework that allows AI systems to navigate, interpret, and manipulate web content, facilitating tasks such as data extraction, web automation, and testing.
Hyperbrowser's browser-use agent lets you run agent tasks on the web with browser-use through a single call. Hyperbrowser exposes endpoints for starting and stopping a browser-use task and for getting its status and results.
By default, browser-use tasks run asynchronously: you start the task, then check its status until it completes. If you don't want to handle the monitoring yourself, our SDKs provide a function that handles the whole flow and returns the data once the task is completed.
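If you do want to monitor a task yourself, the start-then-poll flow can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `wait_for_task` and its parameters are hypothetical names, and the terminal status strings are an assumption, since this page does not list the possible status values. The `get_status` callable stands in for whatever you use to call the status endpoint.

```python
import time
from typing import Callable

# Assumed terminal states; this page does not list the actual status values.
TERMINAL_STATUSES = {"completed", "failed", "stopped"}


def wait_for_task(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"task still '{status}' after {waited:.0f}s")
        # Wait before asking again so the API is not hammered with requests.
        sleep(poll_interval)
        waited += poll_interval
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; in normal use you would leave it at the default.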
Installation
npm install @hyperbrowser/sdk
or
yarn add @hyperbrowser/sdk
pip install hyperbrowser
or
uv add hyperbrowser
Usage
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const hbClient = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const result = await hbClient.agents.browserUse.startAndWait({
    task: "What is the title of the first post on Hacker News today?",
  });
  console.log(`Output:\n\n${result.data?.finalResult}`);
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
import os
from hyperbrowser import Hyperbrowser
from hyperbrowser.models import StartBrowserUseTaskParams
from dotenv import load_dotenv

load_dotenv()

hb_client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


def main():
    resp = hb_client.agents.browser_use.start_and_wait(
        StartBrowserUseTaskParams(
            task="go to Hacker News and summarize the top 5 posts of the day"
        )
    )
    print(f"Output:\n\n{resp.data.final_result}")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"Error: {e}")
Start browser-use task
curl -X POST https://app.hyperbrowser.ai/api/task/browser-use \
    -H 'Content-Type: application/json' \
    -H 'x-api-key: <YOUR_API_KEY>' \
    -d '{
        "task": "go to Hacker News and summarize the top 5 posts of the day"
    }'
Get browser-use task status
curl https://app.hyperbrowser.ai/api/task/browser-use/{jobId}/status \
-H 'x-api-key: <YOUR_API_KEY>'
Get browser-use task
curl https://app.hyperbrowser.ai/api/task/browser-use/{jobId} \
-H 'x-api-key: <YOUR_API_KEY>'
Stop browser-use task
curl -X PUT https://app.hyperbrowser.ai/api/task/browser-use/{jobId}/stop \
-H 'x-api-key: <YOUR_API_KEY>'
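For reference, the four curl calls above map onto plain HTTP requests that can be built with Python's standard library. The sketch below only constructs the requests and does not send them; `build_request` and its `action` names are hypothetical helpers introduced for illustration, while the URLs, methods, and headers are taken from the curl examples.

```python
import json
import urllib.request

BASE_URL = "https://app.hyperbrowser.ai/api/task/browser-use"


def build_request(action: str, api_key: str, job_id: str = "", task: str = "") -> urllib.request.Request:
    """Build (but do not send) the request for one of the four endpoints above."""
    headers = {"x-api-key": api_key}
    if action == "start":
        # Starting a task is the only call that carries a JSON body.
        headers["Content-Type"] = "application/json"
        body = json.dumps({"task": task}).encode("utf-8")
        return urllib.request.Request(BASE_URL, data=body, headers=headers, method="POST")
    if not job_id:
        raise ValueError("job_id is required for status, get, and stop")
    routes = {
        "status": ("GET", f"{BASE_URL}/{job_id}/status"),
        "get": ("GET", f"{BASE_URL}/{job_id}"),
        "stop": ("PUT", f"{BASE_URL}/{job_id}/stop"),
    }
    if action not in routes:
        raise ValueError(f"unknown action: {action}")
    method, url = routes[action]
    return urllib.request.Request(url, headers=headers, method=method)
```

A request built this way can be sent with `urllib.request.urlopen(req)`. The shape of the JSON response is not shown on this page, so inspect it before relying on any particular field.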
Browser-Use Task parameters
sessionId
- An optional existing browser session ID to connect to instead of creating a new one.
validateOutput
- When enabled, validates the agent's output format to ensure proper structure.
useVision
- When enabled, allows the agent to analyze screenshots of the webpage for better context understanding.
useVisionForPlanner
- When enabled, provides screenshots to the planning component of the agent.
maxActionsPerStep
- The maximum number of actions the agent can perform in a single step before reassessing.
maxInputTokens
- Maximum token limit for inputs sent to the language model, preventing oversized contexts.
plannerLlm
- The language model used specifically for planning future actions; it can differ from the main LLM. By default, Hyperbrowser uses Gemini-2 Flash.
pageExtractionLlm
- The language model used for extracting structured data from webpages. By default, Hyperbrowser uses Gemini-2 Flash.
plannerInterval
- How often the planner runs (measured in agent steps) to reassess the overall strategy.
maxSteps
- The maximum number of steps the agent can take before concluding the task.
maxFailures
- The maximum number of failures allowed before the task is aborted.
initialActions
- List of initial actions to run before the main task.
keepBrowserOpen
- When enabled, keeps the browser session open after task completion.
The agent may not complete the task within the specified maxSteps. If that happens, try increasing the maxSteps parameter.
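To show how these parameters fit into a request, here is a small sketch that assembles a JSON body for the start endpoint. `build_task_body` is a hypothetical helper, not part of the SDK. The parameter names come from the list above (plus sessionOptions, which this page documents separately); the value types in the usage example are assumptions.

```python
import json

# Names copied from the parameter list on this page, plus sessionOptions.
KNOWN_PARAMS = {
    "sessionId", "validateOutput", "useVision", "useVisionForPlanner",
    "maxActionsPerStep", "maxInputTokens", "plannerLlm", "pageExtractionLlm",
    "plannerInterval", "maxSteps", "maxFailures", "initialActions",
    "keepBrowserOpen", "sessionOptions",
}


def build_task_body(task: str, **params) -> str:
    """Return a JSON request body, rejecting misspelled parameter names."""
    unknown = set(params) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    body = {"task": task}
    # Drop None values so unset options fall back to the service defaults.
    body.update({k: v for k, v in params.items() if v is not None})
    return json.dumps(body)


# Example: raise the step budget for a longer task (values are illustrative).
payload = build_task_body(
    "go to Hacker News and summarize the top 5 posts of the day",
    maxSteps=40,
    useVision=True,
)
```

Checking names up front catches typos such as `maxStep` before the request is sent.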
Reuse Browser Session
You can pass an existing sessionId to the Browser Use task so that it runs on that session. If you want to keep the session open after the task finishes, set the keepBrowserOpen param.
In the examples below, keepBrowserOpen is not set to true in the second call to the AI agent, so the browser session closes after that task runs. The session is also stopped explicitly at the end with the stop function to make sure it gets closed.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const hbClient = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const session = await hbClient.sessions.create();
  try {
    const result = await hbClient.agents.browserUse.startAndWait({
      task: "What is the title of the first post on Hacker News today?",
      sessionId: session.id,
      keepBrowserOpen: true,
    });
    console.log(`Output:\n${result.data?.finalResult}`);

    const result2 = await hbClient.agents.browserUse.startAndWait({
      task: "Tell me how many upvotes the first post has.",
      sessionId: session.id,
    });
    console.log(`\nOutput:\n${result2.data?.finalResult}`);
  } catch (err) {
    console.error(`Error: ${err}`);
  } finally {
    await hbClient.sessions.stop(session.id);
  }
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
import os
from hyperbrowser import Hyperbrowser
from hyperbrowser.models import StartBrowserUseTaskParams
from dotenv import load_dotenv

load_dotenv()

hb_client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


def main():
    session = hb_client.sessions.create()
    try:
        resp = hb_client.agents.browser_use.start_and_wait(
            StartBrowserUseTaskParams(
                task="What is the title of the first post on Hacker News today?",
                session_id=session.id,
                keep_browser_open=True,
            )
        )
        print(f"Output:\n{resp.data.final_result}")

        resp2 = hb_client.agents.browser_use.start_and_wait(
            StartBrowserUseTaskParams(
                task="Tell me how many upvotes the first post has.",
                session_id=session.id,
            )
        )
        print(f"\nOutput:\n{resp2.data.final_result}")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        hb_client.sessions.stop(session.id)


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"Error: {e}")
Session Configurations
The sessionOptions only apply when a new session is created, that is, when no sessionId is provided.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const hbClient = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const result = await hbClient.agents.browserUse.startAndWait({
    task: "go to Hacker News and summarize the top 5 posts of the day",
    sessionOptions: {
      acceptCookies: true,
    },
  });
  console.log(`Output:\n\n${result.data?.finalResult}`);
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
import os
from hyperbrowser import Hyperbrowser
from hyperbrowser.models import StartBrowserUseTaskParams, CreateSessionParams
from dotenv import load_dotenv

load_dotenv()

hb_client = Hyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))


def main():
    resp = hb_client.agents.browser_use.start_and_wait(
        StartBrowserUseTaskParams(
            task="go to Hacker News and summarize the top 5 posts of the day",
            session_options=CreateSessionParams(
                accept_cookies=True,
            ),
        )
    )
    print(f"Output:\n\n{resp.data.final_result}")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"Error: {e}")
curl -X POST https://app.hyperbrowser.ai/api/task/browser-use \
    -H 'Content-Type: application/json' \
    -H 'x-api-key: <YOUR_API_KEY>' \
    -d '{
        "task": "go to Hacker News and summarize the top 5 posts of the day",
        "sessionOptions": {
            "acceptCookies": true
        }
    }'
Hyperbrowser's CAPTCHA solving and proxy usage features require a PAID plan.
Using a proxy and solving CAPTCHAs slow down web navigation in the browser-use task, so enable them only if necessary.