HyperAgent SDK

HyperAgent Class

The HyperAgent class provides an interface for running autonomous web agents. It manages browser instances, task execution, and integration with Model Context Protocol (MCP) servers.

Creating a HyperAgent Instance

import { HyperAgent } from "@/agent";
import { ChatOpenAI } from "@langchain/openai";

// Initialize with default settings (requires OPENAI_API_KEY env var)
const agent = new HyperAgent();

// Or, initialize with custom configuration
const agentWithConfig = new HyperAgent({
    llm: new ChatOpenAI({ 
        modelName: "gpt-4o",
        apiKey: process.env.OPENAI_API_KEY,
    }), // Specify the LLM
    browserProvider: "Hyperbrowser", // "Local" (default) or "Hyperbrowser"
    hyperbrowserConfig: { /* Hyperbrowser-specific config, used with the Hyperbrowser provider */ },
    localConfig: { /* Playwright launch options, used with the Local provider */ },
    debug: true, // Enable debug logging
    customActions: [ /* Array of custom actions */ ],
});

Initializing

Description: Initializes a new instance of the HyperAgent. Configures the language model, browser provider (local or Hyperbrowser), debug mode, and custom actions.
Parameters:
- params (HyperAgentConfig, optional): Configuration object.
  - llm (BaseChatModel, optional): The language model instance to use. Defaults to GPT-4o when the OPENAI_API_KEY environment variable is set.
  - browserProvider ("Hyperbrowser" | "Local", optional): Specifies whether to use Hyperbrowser's cloud browsers or a local Playwright instance. Defaults to Local.
  - hyperbrowserConfig (object, optional): Configuration for the Hyperbrowser provider. See the types description for details.
  - localConfig (object, optional): Configuration for the local browser provider. See the types description for details.
  - customActions (AgentActionDefinition[], optional): An array of custom actions to register with the agent. See the custom actions documentation for details.
  - debug (boolean, optional): Enables detailed logging if set to true. Defaults to false. Logs are written to the ./debug directory.
Throws: HyperagentError if no LLM is provided and OPENAI_API_KEY env var is not set.
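
Since the constructor throws when neither an llm nor OPENAI_API_KEY is available, it can be convenient to check the environment before building the agent. The following is a minimal sketch, not part of the SDK: canUseDefaultLLM and pickBrowserProvider are hypothetical helpers, and the HYPERBROWSER_API_KEY variable name is an assumption that does not appear in the text above.

```typescript
// Hypothetical pre-flight helpers; none of these names come from the HyperAgent SDK.
type Env = Record<string, string | undefined>;
type BrowserProvider = "Hyperbrowser" | "Local";

// True when OPENAI_API_KEY is set to a non-blank value.
function canUseDefaultLLM(env: Env): boolean {
    const key = env.OPENAI_API_KEY;
    return typeof key === "string" && key.trim().length > 0;
}

// Assumption: a HYPERBROWSER_API_KEY variable signals that cloud browsers are wanted.
// Falls back to "Local", which matches the documented default.
function pickBrowserProvider(env: Env): BrowserProvider {
    return env.HYPERBROWSER_API_KEY ? "Hyperbrowser" : "Local";
}
```

In an application, one might call canUseDefaultLLM(process.env) first and pass an explicit llm when it returns false, then hand pickBrowserProvider(process.env) to the browserProvider option.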

Browser and Page Management

List all current pages

async getPages(): Promise<HyperPage[]>

Description: Retrieves all currently open pages within the agent's browser context. Each page is enhanced with ai and aiAsync methods for task execution.
Returns: Promise<HyperPage[]> - An array of HyperPage objects.

Usage:

const pages = await agent.getPages();
if (pages.length > 0) {
    await pages[0].ai("Summarize the content of this page.");
}

Create a new page

async newPage(): Promise<HyperPage>

Description: Creates and returns a new page (tab) in the agent's browser context. The returned page is enhanced with ai and aiAsync methods.
Returns: Promise<HyperPage> - A new HyperPage object.

Usage:

const newPage = await agent.newPage();
await newPage.goto("https://example.com");
const summary = await newPage.ai("What is this website about?");
console.log(summary);

Get current page

async getCurrentPage(): Promise<Page>

Description: Gets the agent's currently active page. If no page exists or the current page is closed, it creates a new one. Note: This returns a standard Playwright Page object, not a HyperPage.
Returns: Promise<Page> - The current or a new Playwright Page.

Usage:

const currentPage = await agent.getCurrentPage();
await currentPage.goto("https://google.com");

Close agent

async closeAgent(): Promise<void>

Description: Closes the agent, including the browser instance, browser context, and any active MCP connections. Cancels any tasks that are still running or paused.
Returns: Promise<void>

Usage:

await agent.closeAgent();
console.log("Agent closed.");
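
Because closeAgent releases the browser and MCP connections, it is usually worth guaranteeing that it runs even when the work in between throws. One way to sketch that is a small try/finally wrapper. withAgent and the Closable type are hypothetical names introduced here; the only SDK method relied on is closeAgent.

```typescript
// Minimal structural type: only the method this sketch relies on.
interface Closable {
    closeAgent(): Promise<void>;
}

// Runs `work` and always closes the agent afterwards, even if `work` throws.
// The error from `work`, if any, still propagates to the caller.
async function withAgent<A extends Closable, T>(
    agent: A,
    work: (agent: A) => Promise<T>,
): Promise<T> {
    try {
        return await work(agent);
    } finally {
        await agent.closeAgent();
    }
}
```

With the SDK this could look like withAgent(new HyperAgent(), (agent) => agent.executeTask("...")), assuming the agent instance satisfies the Closable shape.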

Task Execution

Execute a task

async executeTask(task: string, params?: TaskParams, initPage?: Page): Promise<TaskOutput>

Description: Executes a task instruction and waits for it to finish. The agent uses its LLM and configured actions to perform the task on a browser page. The returned promise resolves with the final output once the task completes, or rejects if it fails.
Parameters:
- task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").
- params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to limit the number of steps. A full description can be found on the types page.
- initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.
Returns: Promise<TaskOutput> - The result of the task, typically a string summary or structured data if outputSchema was provided.
Throws: Rethrows any error encountered during task execution.

Usage:

import { z } from "zod";

const page = await agent.newPage();
await page.goto("https://example.com");
const result = await agent.executeTask("Extract the main heading from www.example.com", { outputSchema: z.object({ heading: z.string() }) }, page);
console.log(result); // e.g. { heading: "Example Domain" }
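
Since executeTask rethrows any error from the run, callers that prefer not to wrap every call in try/catch can fold success and failure into one return value. The sketch below shows one possible shape. tryTask, TaskRunner, and TaskResult are hypothetical names, and the output is typed as unknown because the TaskOutput type is not described here.

```typescript
// Minimal structural type: only the call this sketch relies on.
interface TaskRunner {
    executeTask(task: string): Promise<unknown>;
}

// Either the task output or the error that executeTask rethrew.
type TaskResult =
    | { ok: true; output: unknown }
    | { ok: false; error: Error };

// Runs a task and converts a rejected promise into a value the caller can inspect.
async function tryTask(agent: TaskRunner, task: string): Promise<TaskResult> {
    try {
        return { ok: true, output: await agent.executeTask(task) };
    } catch (err) {
        // Normalize non-Error throwables so callers always get an Error.
        const error = err instanceof Error ? err : new Error(String(err));
        return { ok: false, error };
    }
}
```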

Execute a task asynchronously

async executeTaskAsync(task: string, params?: TaskParams, initPage?: Page): Promise<Task>

Description: Executes a given task instruction asynchronously. It immediately returns a Task control object, allowing management (pause, resume, cancel) of the background task.
Parameters:
- task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").
- params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to limit the number of steps. A full description can be found on the types page.
- initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.
Returns: Promise<Task> - A Task control object with methods:
- getStatus(): TaskStatus
- pause(): TaskStatus
- resume(): TaskStatus
- cancel(): TaskStatus

Usage:

const taskControl = await agent.executeTaskAsync("Go to news.google.com, and search for international news.");
console.log("Task started with status:", taskControl.getStatus());
// ... later
taskControl.pause();
// ... even later
taskControl.cancel();

Model Context Protocol (MCP) Integration

MCP allows extending the agent's capabilities by connecting to external servers that provide additional tools/actions.

Initialize an MCP client

async initializeMCPClient(config: MCPConfig): Promise<void>

Description: Initializes the MCP client and attempts to connect to all servers specified in the configuration. Actions provided by successfully connected servers are registered with the agent.
Parameters:
- config (MCPConfig): Configuration object containing an array of servers (each with connection details like URL and ID).
Returns: Promise<void>

Usage:

await agent.initializeMCPClient({
    servers: [
        { id: "server1", url: "ws://localhost:8080" },
        // ... other servers
    ]
});

Connect to a single MCP server

async connectToMCPServer(serverConfig: MCPServerConfig): Promise<string | null>

Description: Connects to a single MCP server at runtime. Registers actions provided by the server if the connection is successful.
Parameters:
- serverConfig (MCPServerConfig): Configuration for the specific server to connect to.
Returns: Promise<string | null> - The server ID if connection was successful, otherwise null.

Usage:

const serverId = await agent.connectToMCPServer({ id: "runtimeServer", url: "ws://localhost:8081" });
if (serverId) {
    console.log(`Connected to ${serverId}`);
}
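
Because connectToMCPServer resolves to null rather than throwing when a connection fails, connecting to several servers at runtime means checking each result. A minimal sketch of that bookkeeping follows. connectAll and MCPConnector are hypothetical names, and the config type is left generic since the MCPServerConfig fields are not spelled out here.

```typescript
// Minimal structural type: only the method this sketch relies on.
interface MCPConnector<C> {
    connectToMCPServer(serverConfig: C): Promise<string | null>;
}

// Connects to each server in order and separates successes from failures.
// A null return value is treated as a failed connection, as described above.
async function connectAll<C>(
    agent: MCPConnector<C>,
    configs: C[],
): Promise<{ connected: string[]; failed: C[] }> {
    const connected: string[] = [];
    const failed: C[] = [];
    for (const config of configs) {
        const serverId = await agent.connectToMCPServer(config);
        if (serverId === null) {
            failed.push(config);
        } else {
            connected.push(serverId);
        }
    }
    return { connected, failed };
}
```

Connecting sequentially keeps the order of registered actions predictable; running the calls in parallel is possible too, but whether the agent tolerates concurrent connections is not stated here.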

Disconnect from an MCP server

disconnectFromMCPServer(serverId: string): boolean

Description: Disconnects from a specific MCP server identified by its ID. Note: This does not automatically unregister the actions provided by that server.
Parameters:
- serverId (string): The ID of the server to disconnect from.
Returns: boolean - true if disconnection was successful or the server wasn't connected, false if an error occurred.

Usage:

const success = agent.disconnectFromMCPServer("server1");
console.log("Disconnected:", success);

Check if an MCP server is connected

isMCPServerConnected(serverId: string): boolean

Description: Checks if the agent is currently connected to a specific MCP server.
Parameters:
- serverId (string): The ID of the server to check.
Returns: boolean - true if connected, false otherwise.

Usage:

if (agent.isMCPServerConnected("server1")) {
    console.log("Server1 is connected.");
}

List MCP server IDs

getMCPServerIds(): string[]

Description: Retrieves the IDs of all currently connected MCP servers.
Returns: string[] - An array of connected server IDs.

Usage:

const connectedServers = agent.getMCPServerIds();
console.log("Connected servers:", connectedServers);
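
getMCPServerIds pairs naturally with disconnectFromMCPServer when every server should be dropped at once. The sketch below is one way to do that, reporting the IDs whose disconnect returned false. disconnectAll and MCPRegistry are hypothetical names; only the two SDK methods described above are used.

```typescript
// Minimal structural type: the two methods this sketch relies on.
interface MCPRegistry {
    getMCPServerIds(): string[];
    disconnectFromMCPServer(serverId: string): boolean;
}

// Disconnects every connected server and returns the IDs whose disconnect reported false.
function disconnectAll(agent: MCPRegistry): string[] {
    // Copy the list first in case the agent mutates its own collection while disconnecting.
    const ids = agent.getMCPServerIds().slice();
    return ids.filter((id) => !agent.disconnectFromMCPServer(id));
}
```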

Get MCP server info

getMCPServerInfo(): Array<{ id: string; toolCount: number; toolNames: string[]; }> | null

Description: Gets information about all connected MCP servers, including their IDs and the tools (actions) they provide.
Returns: Array<{ id: string; toolCount: number; toolNames: string[]; }> | null - An array of server info objects, or null if the MCP client isn't initialized.

Usage:

const serverInfo = agent.getMCPServerInfo();
if (serverInfo) {
    serverInfo.forEach(info => {
        console.log(`Server: ${info.id}, Tools: ${info.toolNames.join(', ')}`);
    });
}
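
The null return value is easy to overlook when logging. As an illustration, the helper below turns the result of getMCPServerInfo into a printable summary and handles the uninitialized and empty cases explicitly. formatServerInfo and the ServerInfo alias are hypothetical names; the object shape is taken from the signature above.

```typescript
// Shape of one entry, copied from the getMCPServerInfo signature.
interface ServerInfo {
    id: string;
    toolCount: number;
    toolNames: string[];
}

// Builds a one-line-per-server summary, with explicit messages for null and empty results.
function formatServerInfo(info: ServerInfo[] | null): string {
    if (info === null) return "MCP client not initialized";
    if (info.length === 0) return "No MCP servers connected";
    return info
        .map((s) => `${s.id}: ${s.toolCount} tool(s) [${s.toolNames.join(", ")}]`)
        .join("\n");
}
```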

Utility Methods

Pretty-print an action

pprintAction(action: ActionType): string

Description: Generates a human-readable string representation of an agent action, if a pretty-print function is defined for that action type. Useful for logging or debugging.
Parameters:
- action (ActionType): The action object (containing type and params).
Returns: string - A formatted string representing the action, or an empty string if no specific pretty-print function exists for the action type.

Usage:

// Assuming 'lastAction' is an action object from a task step
console.log(agent.pprintAction(lastAction));
// Example output: "Clicked element with selector '#submit-button'"
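
Since pprintAction returns an empty string for action types without a pretty-print function, log output can silently lose those actions. One possible workaround, sketched below, falls back to the raw type and params. describeAction, ActionLike, and ActionPrinter are hypothetical names, and the action shape (type and params) follows the parameter description above.

```typescript
// Assumed action shape, based on the "containing type and params" description.
interface ActionLike {
    type: string;
    params: unknown;
}

// Minimal structural type: only the method this sketch relies on.
interface ActionPrinter<A> {
    pprintAction(action: A): string;
}

// Uses the agent's pretty-printer when it produces text, otherwise falls back to type plus JSON params.
function describeAction<A extends ActionLike>(agent: ActionPrinter<A>, action: A): string {
    const pretty = agent.pprintAction(action);
    return pretty !== "" ? pretty : `${action.type} ${JSON.stringify(action.params)}`;
}
```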

