HyperAgent SDK
HyperAgent Class
The HyperAgent class provides an interface for running autonomous web agents. It manages browser instances, task execution, and integration with Model Context Protocol (MCP) servers.
Creating a HyperAgent Instance
import { HyperAgent } from "@/agent";
import { ChatOpenAI } from "@langchain/openai";
// Initialize with default settings (requires the OPENAI_API_KEY env var)
const agent = new HyperAgent();

// Or initialize with custom configuration
const agentWithConfig = new HyperAgent({
  llm: new ChatOpenAI({
    modelName: "gpt-4o",
    apiKey: process.env.OPENAI_API_KEY,
  }), // Specify the LLM
  browserProvider: "Hyperbrowser", // "Local" or "Hyperbrowser"; defaults to "Local"
  hyperbrowserConfig: { /* Hyperbrowser-specific config, used with the Hyperbrowser provider */ },
  localConfig: { /* Playwright launch options, used with the Local provider */ },
  debug: true, // Enable debug logging
  customActions: [ /* Array of custom actions */ ],
});
Initializing
Description: Initializes a new instance of HyperAgent. Configures the language model, browser provider (local or Hyperbrowser), debug mode, and custom actions.
Parameters:
params (HyperAgentConfig, optional): Configuration object.
  llm (BaseChatModel, optional): The language model instance to use. Defaults to GPT-4o if the OPENAI_API_KEY environment variable is set.
  browserProvider ("Hyperbrowser" | "Local", optional): Specifies whether to use Hyperbrowser's cloud browsers or a local Playwright instance. Defaults to Local.
  hyperbrowserConfig (object, optional): Configuration for the Hyperbrowser provider. See the types description for details.
  localConfig (object, optional): Configuration for the local browser provider. See the types description for details.
  customActions (AgentActionDefinition[], optional): An array of custom actions to register with the agent. See the custom actions documentation for details.
  debug (boolean, optional): Enables detailed logging if set to true. Defaults to false. Logs are written to the ./debug directory.
Throws:
HyperagentError if no LLM is provided and the OPENAI_API_KEY env var is not set.
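The defaults and the error condition listed above can be modelled in a few lines. This is only an illustrative sketch, not HyperAgent's source: `SketchConfig`, `ResolvedConfig`, and `resolveConfig` are hypothetical names, the environment is passed in explicitly to keep the function pure, and a plain `Error` stands in for the library's `HyperagentError`.

```typescript
// Hypothetical, simplified mirror of the documented constructor options.
type BrowserProvider = "Local" | "Hyperbrowser";

interface SketchConfig {
  llm?: unknown; // stands in for a BaseChatModel instance
  browserProvider?: BrowserProvider;
  debug?: boolean;
}

interface ResolvedConfig {
  llmSource: "provided" | "openai-default";
  browserProvider: BrowserProvider;
  debug: boolean;
}

// Applies the documented defaults: Local browser, debug off, and a
// GPT-4o fallback that is only possible when OPENAI_API_KEY is set.
function resolveConfig(
  config: SketchConfig = {},
  env: Record<string, string | undefined> = {},
): ResolvedConfig {
  if (config.llm === undefined && !env.OPENAI_API_KEY) {
    // The real class is documented to throw HyperagentError here.
    throw new Error("No LLM provided and OPENAI_API_KEY is not set");
  }
  return {
    llmSource: config.llm !== undefined ? "provided" : "openai-default",
    browserProvider: config.browserProvider ?? "Local",
    debug: config.debug ?? false,
  };
}
```

Calling `resolveConfig({}, {})` throws, which corresponds to constructing the agent with neither an `llm` nor the environment variable.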
Browser and Page Management
List all current pages
async getPages(): Promise<HyperPage[]>
Description: Retrieves all currently open pages within the agent's browser context. Each page is enhanced with ai and aiAsync methods for task execution.
Returns:
Promise<HyperPage[]> - An array of HyperPage objects.
Usage:
const pages = await agent.getPages();
if (pages.length > 0) {
  await pages[0].ai("Summarize the content of this page.");
}
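When several tabs are open, it can help to pick one by URL rather than by index. The helper below is a minimal sketch that assumes a HyperPage keeps Playwright's synchronous `url()` method; `PageWithUrl` and `findPageByUrl` are hypothetical names, not part of the SDK.

```typescript
// Minimal structural type: anything with a Playwright-style url() method.
interface PageWithUrl {
  url(): string;
}

// Returns the first open page whose URL contains `fragment`,
// or undefined when no page matches.
function findPageByUrl<P extends PageWithUrl>(
  pages: P[],
  fragment: string,
): P | undefined {
  return pages.find((page) => page.url().includes(fragment));
}
```

With a real agent this would be called as `findPageByUrl(await agent.getPages(), "example.com")`.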
Create a new page
async newPage(): Promise<HyperPage>
Description: Creates and returns a new page (tab) in the agent's browser context. The returned page is enhanced with ai and aiAsync methods.
Returns:
Promise<HyperPage> - A new HyperPage object.
Usage:
const newPage = await agent.newPage();
await newPage.goto("https://example.com");
const summary = await newPage.ai("What is this website about?");
console.log(summary);
Get current page
async getCurrentPage(): Promise<Page>
Description: Gets the agent's currently active page. If no page exists or the current page is closed, it creates a new one. Note: This returns a standard Playwright Page object, not a HyperPage.
Returns:
Promise<Page> - The current or a new Playwright Page.
Usage:
const currentPage = await agent.getCurrentPage();
await currentPage.goto("https://google.com");
Close agent
async closeAgent(): Promise<void>
Description: Closes the agent, including the browser instance, browser context, and any active MCP connections. Cancels any tasks that are still running or paused.
Returns:
Promise<void>
Usage:
await agent.closeAgent();
console.log("Agent closed.");
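Because closing the agent releases the browser and any MCP connections, it is worth making sure the call happens even when the work in between throws. One possible pattern is a small wrapper built on `try`/`finally`; `ClosableAgent` and `withAgent` are hypothetical names used only for this sketch.

```typescript
// Structural type covering only the documented closeAgent() method.
interface ClosableAgent {
  closeAgent(): Promise<void>;
}

// Runs `work` with the agent and always closes the agent afterwards,
// whether `work` resolves or rejects.
async function withAgent<A extends ClosableAgent, R>(
  agent: A,
  work: (agent: A) => Promise<R>,
): Promise<R> {
  try {
    return await work(agent);
  } finally {
    await agent.closeAgent();
  }
}
```

An error thrown inside `work` still propagates to the caller; the wrapper only guarantees the cleanup step.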
Task Execution
Execute a task
async executeTask(task: string, params?: TaskParams, initPage?: Page): Promise<TaskOutput>
Description: Executes a task instruction and waits for it to finish. The agent uses its LLM and configured actions to perform the task on a browser page, and the returned promise resolves with the final output once the task completes (or rejects if it fails).
Parameters:
task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").
params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to control the number of steps. A full description can be found on the types page.
initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.
Returns:
Promise<TaskOutput> - The result of the task, typically a string summary, or structured data if outputSchema was provided.
Throws: Rethrows any error encountered during task execution.
Usage:
import { z } from "zod";

const page = await agent.newPage();
await page.goto("https://example.com");
const result = await agent.executeTask(
  "Extract the main heading from www.example.com",
  { outputSchema: z.object({ heading: z.string() }) },
  page
);
console.log(result); // e.g. { heading: "Example Domain" }
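Since executeTask rethrows any error it encounters, callers decide how to handle failures. One option is a simple retry wrapper, sketched below. `TaskRunner` and `executeWithRetry` are hypothetical names; the SDK is not documented to provide retries itself, and whether retrying is appropriate depends on the task.

```typescript
// Structural type covering the documented executeTask(task) call shape.
interface TaskRunner<T> {
  executeTask(task: string): Promise<T>;
}

// Calls executeTask up to `maxAttempts` times and returns the first
// successful result; rethrows the last error if every attempt fails.
async function executeWithRetry<T>(
  runner: TaskRunner<T>,
  task: string,
  maxAttempts = 3,
): Promise<T> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await runner.executeTask(task);
    } catch (error) {
      lastError = error; // executeTask rethrows, so failures surface here
    }
  }
  throw lastError;
}
```

Each retry re-runs the whole task from the agent's current browser state, so this fits read-only tasks better than tasks with side effects such as submitting a form.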
Execute a task asynchronously
async executeTaskAsync(task: string, params?: TaskParams, initPage?: Page): Promise<Task>
Description: Executes a given task instruction asynchronously. It immediately returns a Task control object, allowing management (pause, resume, cancel) of the background task.
Parameters:
task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").
params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to control the number of steps. A full description can be found on the types page.
initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.
Returns:
Promise<Task> - A Task control object with methods:
  getStatus(): TaskStatus
  pause(): TaskStatus
  resume(): TaskStatus
  cancel(): TaskStatus
Usage:
const taskControl = await agent.executeTaskAsync("Go to news.google.com, and search for international news.");
console.log("Task started with status:", taskControl.getStatus());
// ... later
taskControl.pause();
// ... even later
taskControl.cancel();
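A background task is usually watched by polling getStatus() and cancelled if it runs too long. The sketch below shows one way to do that. `TaskControl` and `waitOrCancel` are hypothetical names, and because the concrete TaskStatus values are not listed here, the caller supplies the `isFinished` predicate instead of the helper guessing them.

```typescript
// Structural type covering the documented getStatus() and cancel() methods.
interface TaskControl<S> {
  getStatus(): S;
  cancel(): S;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the task until `isFinished` accepts its status or the timeout
// elapses; on timeout the task is cancelled and the status returned by
// cancel() is reported.
async function waitOrCancel<S>(
  task: TaskControl<S>,
  isFinished: (status: S) => boolean,
  timeoutMs: number,
  pollMs = 250,
): Promise<S> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = task.getStatus();
    if (isFinished(status)) return status;
    await sleep(pollMs);
  }
  // One last check so a task that finished during the final sleep is not cancelled.
  const last = task.getStatus();
  return isFinished(last) ? last : task.cancel();
}
```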
Model Context Protocol (MCP) Integration
MCP allows extending the agent's capabilities by connecting to external servers that provide additional tools/actions.
Initialize an MCP client
async initializeMCPClient(config: MCPConfig): Promise<void>
Description: Initializes the MCP client and attempts to connect to all servers specified in the configuration. Actions provided by successfully connected servers are registered with the agent.
Parameters:
config (MCPConfig): Configuration object containing an array of servers (each with connection details such as URL and ID).
Returns:
Promise<void>
Usage:
await agent.initializeMCPClient({
  servers: [
    { id: "server1", url: "ws://localhost:8080" },
    // ... other servers
  ],
});
Connect to a single MCP server
async connectToMCPServer(serverConfig: MCPServerConfig): Promise<string | null>
Description: Connects to a single MCP server at runtime. Registers actions provided by the server if the connection is successful.
Parameters:
serverConfig (MCPServerConfig): Configuration for the specific server to connect to.
Returns:
Promise<string | null> - The server ID if the connection was successful, otherwise null.
Usage:
const serverId = await agent.connectToMCPServer({ id: "runtimeServer", url: "ws://localhost:8081" });
if (serverId) {
  console.log(`Connected to ${serverId}`);
}
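Because connectToMCPServer signals failure by returning null, connecting to several servers at runtime means checking each result. A minimal sketch of that bookkeeping follows; `McpConnector`, `ConnectReport`, and `connectAll` are hypothetical names, and the server config type is left generic so nothing about MCPServerConfig's fields is assumed.

```typescript
// Structural type covering the documented connectToMCPServer() method.
interface McpConnector<C> {
  connectToMCPServer(serverConfig: C): Promise<string | null>;
}

interface ConnectReport<C> {
  connected: string[]; // IDs returned by successful connections
  failed: C[]; // configs whose connection attempt returned null
}

// Connects to each server in order and records which attempts succeeded.
async function connectAll<C>(
  agent: McpConnector<C>,
  configs: C[],
): Promise<ConnectReport<C>> {
  const report: ConnectReport<C> = { connected: [], failed: [] };
  for (const config of configs) {
    const serverId = await agent.connectToMCPServer(config);
    if (serverId === null) {
      report.failed.push(config);
    } else {
      report.connected.push(serverId);
    }
  }
  return report;
}
```

The connections are attempted sequentially here for simplicity.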
Disconnect from an MCP server
disconnectFromMCPServer(serverId: string): boolean
Description: Disconnects from a specific MCP server identified by its ID. Note: This does not automatically unregister the actions provided by that server.
Parameters:
serverId (string): The ID of the server to disconnect from.
Returns:
boolean - true if disconnection was successful or the server wasn't connected, false if an error occurred.
Usage:
const success = agent.disconnectFromMCPServer("server1");
console.log("Disconnected:", success);
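To drop every MCP connection at once, the documented getMCPServerIds and disconnectFromMCPServer methods can be combined. The sketch below assumes only those two signatures; `McpRegistry` and `disconnectAll` are hypothetical names.

```typescript
// Structural type covering the two documented methods used below.
interface McpRegistry {
  getMCPServerIds(): string[];
  disconnectFromMCPServer(serverId: string): boolean;
}

// Disconnects from every connected server and returns the IDs for which
// disconnectFromMCPServer reported an error (returned false).
function disconnectAll(agent: McpRegistry): string[] {
  const failures: string[] = [];
  // Copy the ID list first in case disconnecting mutates the underlying collection.
  const serverIds = agent.getMCPServerIds().slice();
  for (const serverId of serverIds) {
    if (!agent.disconnectFromMCPServer(serverId)) {
      failures.push(serverId);
    }
  }
  return failures;
}
```

As noted in the description above, actions registered by those servers are not unregistered by disconnecting.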
Check if an MCP server is connected
isMCPServerConnected(serverId: string): boolean
Description: Checks if the agent is currently connected to a specific MCP server.
Parameters:
serverId (string): The ID of the server to check.
Returns:
boolean - true if connected, false otherwise.
Usage:
if (agent.isMCPServerConnected("server1")) {
  console.log("Server1 is connected.");
}
List MCP server IDs
getMCPServerIds(): string[]
Description: Retrieves the IDs of all currently connected MCP servers.
Returns:
string[] - An array of connected server IDs.
Usage:
const connectedServers = agent.getMCPServerIds();
console.log("Connected servers:", connectedServers);
Get MCP server info
getMCPServerInfo(): Array<{ id: string; toolCount: number; toolNames: string[]; }> | null
Description: Gets information about all connected MCP servers, including their IDs and the tools (actions) they provide.
Returns:
Array<{ id: string; toolCount: number; toolNames: string[]; }> | null - An array of server info objects, or null if the MCP client isn't initialized.
Usage:
const serverInfo = agent.getMCPServerInfo();
if (serverInfo) {
  serverInfo.forEach(info => {
    console.log(`Server: ${info.id}, Tools: ${info.toolNames.join(', ')}`);
  });
}
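The return value has three distinct cases: null (client not initialized), an empty array (no servers connected), and a populated array. A small summarizer makes the distinction explicit. `McpServerInfo` restates the documented element shape, while `summarizeMcpServers` and its message strings are hypothetical.

```typescript
// Element shape as documented for getMCPServerInfo().
interface McpServerInfo {
  id: string;
  toolCount: number;
  toolNames: string[];
}

// Produces a one-line summary, distinguishing "not initialized" (null)
// from "initialized but no servers" (empty array).
function summarizeMcpServers(info: McpServerInfo[] | null): string {
  if (info === null) return "MCP client not initialized";
  if (info.length === 0) return "No MCP servers connected";
  const totalTools = info.reduce((sum, server) => sum + server.toolCount, 0);
  return `${info.length} server(s), ${totalTools} tool(s)`;
}
```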
Utility Methods
Pretty print contents of an action output
pprintAction(action: ActionType): string
Description: Generates a human-readable string representation of an agent action, if a pretty-print function is defined for that action type. Useful for logging or debugging.
Parameters:
action (ActionType): The action object (containing type and params).
Returns:
string - A formatted string representing the action, or an empty string if no specific pretty-print function exists for the action type.
Usage:
// Assuming 'lastAction' is an action object from a task step
console.log(agent.pprintAction(lastAction));
// Example output: "Clicked element with selector '#submit-button'"
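Since pprintAction returns an empty string for action types without a pretty-printer, log output can silently lose entries. One way to avoid that is a fallback to the raw type and params, sketched below. `ActionLike`, `ActionPrinter`, and `describeAction` are hypothetical names, and the fallback format is an arbitrary choice.

```typescript
// Minimal action shape: the documented `type` and `params` fields.
interface ActionLike {
  type: string;
  params: unknown;
}

// Structural type covering the documented pprintAction() method.
interface ActionPrinter<A> {
  pprintAction(action: A): string;
}

// Uses the agent's pretty-printer when it produces output, otherwise
// falls back to the action type followed by its JSON-encoded params.
function describeAction<A extends ActionLike>(
  agent: ActionPrinter<A>,
  action: A,
): string {
  const pretty = agent.pprintAction(action);
  // An empty string means no pretty-printer exists for this action type.
  return pretty !== "" ? pretty : `${action.type} ${JSON.stringify(action.params)}`;
}
```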