Gemini Computer Use
Gemini Computer Use allows Gemini to interact directly with a computer and perform tasks much as a human would. It can move the cursor, click buttons, type text, and navigate the web, automating complex, multi-step workflows.
Hyperbrowser's Gemini Computer Use agent lets you run agent tasks on the web with a single call. Hyperbrowser exposes endpoints for starting and stopping a Gemini Computer Use task and for getting its status and results.
By default, these tasks run asynchronously: you start the task, then check its status until it completes. If you don't want to handle the monitoring yourself, our SDKs provide a function that handles the whole flow and returns the data once the task is completed.
Installation
npm install @hyperbrowser/sdk dotenv
or
yarn add @hyperbrowser/sdk dotenv
Usage
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

// Load HYPERBROWSER_API_KEY from a local .env file
config();

const hbClient = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  // Start the task and wait until it completes
  const result = await hbClient.agents.geminiComputerUse.startAndWait({
    task: "what are the top 5 posts on Hacker News",
  });
  console.log(`Output:\n\n${result.data?.finalResult}`);
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
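If you would rather drive the asynchronous flow yourself instead of calling startAndWait, the start-then-poll loop described above can be sketched roughly as follows. This is a minimal sketch, not the SDK's implementation: the ComputerUseTasks interface, its method names (start, getStatus, get), the jobId field, and the status strings are all assumptions for illustration, so check the Gemini Computer Use API Reference for the real signatures.

```typescript
// Hypothetical task statuses; the real API may use different values.
type TaskStatus = "pending" | "running" | "completed" | "failed" | "stopped";

// Assumed minimal shape of a client that can start a task, report its
// status, and return its result. Not taken from the SDK's type definitions.
interface ComputerUseTasks<Result> {
  start(params: { task: string }): Promise<{ jobId: string }>;
  getStatus(jobId: string): Promise<{ status: TaskStatus }>;
  get(jobId: string): Promise<Result>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Start a task, then check its status at a fixed interval until it
// completes, fails, is stopped, or the attempt budget runs out.
async function runAndPoll<Result>(
  tasks: ComputerUseTasks<Result>,
  task: string,
  intervalMs = 2000,
  maxAttempts = 150,
): Promise<Result> {
  const { jobId } = await tasks.start({ task });
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await tasks.getStatus(jobId);
    if (status === "completed") {
      return tasks.get(jobId);
    }
    if (status === "failed" || status === "stopped") {
      throw new Error(`Task ${jobId} ended with status "${status}"`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Task ${jobId} did not finish after ${maxAttempts} checks`);
}
```

If the SDK's geminiComputerUse namespace exposes methods with these shapes, it could be passed in as the first argument. The attempt cap keeps a stuck task from polling forever, which is the main thing a hand-written loop has to get right.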
Gemini Computer Use tasks can be configured with a number of parameters. Some of them are described briefly here; the full list can be found in our Gemini Computer Use API Reference.
Reuse Browser Session
You can pass an existing sessionId to the Gemini Computer Use task so that it runs on that session. If you want to keep the session open after the task finishes, supply the keepBrowserOpen param.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const hbClient = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  // Create a session that both tasks will share
  const session = await hbClient.sessions.create();

  try {
    // Keep the browser open so the next task can reuse the session
    const result = await hbClient.agents.geminiComputerUse.startAndWait({
      task: "What is the title of the first post on Hacker News today?",
      sessionId: session.id,
      keepBrowserOpen: true,
    });
    console.log(`Output:\n${result.data?.finalResult}`);

    const result2 = await hbClient.agents.geminiComputerUse.startAndWait({
      task: "Tell me how many upvotes the first post has.",
      sessionId: session.id,
    });
    console.log(`\nOutput:\n${result2.data?.finalResult}`);
  } catch (err) {
    console.error(`Error: ${err}`);
  } finally {
    // Always stop the session, even if a task failed
    await hbClient.sessions.stop(session.id);
  }
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
Use Your Own API Keys
You can provide your own API keys to the Gemini Computer Use task so that the tokens it consumes during execution are not charged as credits to your Hyperbrowser account. Only the credits for browser usage will be charged. You will need to provide your Google API key when useCustomApiKeys is enabled.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const hbClient = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const result = await hbClient.agents.geminiComputerUse.startAndWait({
    task: "What is the title of the first post on Hacker News today?",
    useCustomApiKeys: true,
    // Gemini Computer Use needs a Google API key
    apiKeys: {
      google: "<GOOGLE_API_KEY>",
    },
  });
  console.log(`Output:\n\n${result.data?.finalResult}`);
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
Session Configurations
You can also configure the session that will be used to execute the task, just as you would when creating a new session. These options can include using a proxy or solving CAPTCHAs. To see the full list of session configurations, check out the Session API Reference.
import { Hyperbrowser } from "@hyperbrowser/sdk";
import { config } from "dotenv";

config();

const hbClient = new Hyperbrowser({
  apiKey: process.env.HYPERBROWSER_API_KEY,
});

const main = async () => {
  const result = await hbClient.agents.geminiComputerUse.startAndWait({
    task: "what are the top 5 posts on Hacker News",
    // Options for the session created to run this task
    sessionOptions: {
      acceptCookies: true,
    },
  });
  console.log(`Output:\n\n${result.data?.finalResult}`);
};

main().catch((err) => {
  console.error(`Error: ${err.message}`);
});
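The text above mentions proxies and CAPTCHA solving, while the example only sets acceptCookies. The sketch below shows one way to assemble task parameters that combine several session options. The field names useProxy and solveCaptchas are assumptions, as are the local SessionOptionsSketch and TaskParamsSketch types and the withSessionOptions helper; confirm the actual option names in the Session API Reference before relying on them.

```typescript
// Local stand-in types for illustration only. `acceptCookies` appears in the
// example above; `useProxy` and `solveCaptchas` are assumed field names.
interface SessionOptionsSketch {
  acceptCookies?: boolean;
  useProxy?: boolean;
  solveCaptchas?: boolean;
}

interface TaskParamsSketch {
  task: string;
  sessionOptions: SessionOptionsSketch;
}

// Hypothetical helper: merge per-task overrides onto shared defaults so
// every task starts from the same baseline session configuration.
function withSessionOptions(
  task: string,
  overrides: SessionOptionsSketch = {},
): TaskParamsSketch {
  const defaults: SessionOptionsSketch = { acceptCookies: true };
  return { task, sessionOptions: { ...defaults, ...overrides } };
}

const params = withSessionOptions("what are the top 5 posts on Hacker News", {
  useProxy: true,
  solveCaptchas: true,
});

console.log(JSON.stringify(params, null, 2));
```

The resulting object has the same shape as the argument passed to startAndWait in the example above, so it could be handed to that call once the option names are verified.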