HyperAgent SDK

HyperAgent Class

The HyperAgent class provides an interface for running autonomous web agents. It manages browser instances, task execution, and integration with Model Context Protocol (MCP) servers.

Creating a HyperAgent Instance

import { HyperAgent } from "@/agent";
import { ChatOpenAI } from "@langchain/openai";

// Initialize with default settings (requires OPENAI_API_KEY env var)
const agent = new HyperAgent();

// Or, initialize with custom configuration
const agentWithConfig = new HyperAgent({
    llm: new ChatOpenAI({ 
        modelName: "gpt-4o",
        apiKey: process.env.OPENAI_API_KEY,
    }), // Specify the LLM
    browserProvider: "Hyperbrowser", // "Local" (default) or "Hyperbrowser"
    hyperbrowserConfig: { /* Hyperbrowser-specific config, used when browserProvider is "Hyperbrowser" */ },
    localConfig: { /* Playwright launch options, used when browserProvider is "Local" */ },
    debug: true, // Enable debug logging
    customActions: [ /* Array of custom actions */ ],
});

Initializing

  • Description: Initializes a new instance of the HyperAgent. Configures the language model, browser provider (local or Hyperbrowser), debug mode, and custom actions.

  • Parameters:

    • params (HyperAgentConfig, optional): Configuration object.

      • llm (BaseChatModel, optional): The language model instance to use. Defaults to GPT-4o when the OPENAI_API_KEY environment variable is set.

      • browserProvider ("Hyperbrowser" | "Local", optional): Specifies whether to use Hyperbrowser's cloud browsers or a local Playwright instance. Defaults to Local.

      • hyperbrowserConfig (object, optional): Configuration for the Hyperbrowser provider. See more details in the types description.

      • localConfig (object, optional): Configuration for the local browser provider. See more details in the types description.

      • customActions (AgentActionDefinition[], optional): An array of custom actions to register with the agent. See the custom actions documentation for details.

      • debug (boolean, optional): Enables detailed logging if set to true. Defaults to false. Logs are written to the ./debug directory.

  • Throws: HyperagentError if no LLM is provided and OPENAI_API_KEY env var is not set.

Browser and Page Management

List all current pages

async getPages(): Promise<HyperPage[]>

  • Description: Retrieves all currently open pages within the agent's browser context. Each page is enhanced with ai and aiAsync methods for task execution.

  • Returns: Promise<HyperPage[]> - An array of HyperPage objects.

  • Usage:

    const pages = await agent.getPages();
    if (pages.length > 0) {
        await pages[0].ai("Summarize the content of this page.");
    }

Create a new page

async newPage(): Promise<HyperPage>

  • Description: Creates and returns a new page (tab) in the agent's browser context. The returned page is enhanced with ai and aiAsync methods.

  • Returns: Promise<HyperPage> - A new HyperPage object.

  • Usage:

    const newPage = await agent.newPage();
    await newPage.goto("https://example.com");
    const summary = await newPage.ai("What is this website about?");
    console.log(summary);

Get current page

async getCurrentPage(): Promise<Page>

  • Description: Gets the agent's currently active page. If no page exists or the current page is closed, it creates a new one. Note: This returns a standard Playwright Page object, not a HyperPage.

  • Returns: Promise<Page> - The current or a new Playwright Page.

  • Usage:

    const currentPage = await agent.getCurrentPage();
    await currentPage.goto("https://google.com");

Close agent

async closeAgent(): Promise<void>

  • Description: Closes the agent, including the browser instance, browser context, and any active MCP connections. Cancels any tasks that are still running or paused.

  • Returns: Promise<void>

  • Usage:

    await agent.closeAgent();
    console.log("Agent closed.");
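
Because closeAgent() releases the browser and any MCP connections, it is usually worth calling it even when a task throws. The following is a minimal sketch of that pattern, not part of the SDK: withAgent and the ClosableAgent interface are hypothetical names, and the only thing assumed about the agent is the closeAgent() signature documented above.

```typescript
// Structural stand-in for the agent: only the documented closeAgent() is required.
interface ClosableAgent {
  closeAgent(): Promise<void>;
}

// Hypothetical helper (not an SDK function): runs `work`, then always closes the agent.
async function withAgent<A extends ClosableAgent, T>(
  agent: A,
  work: (agent: A) => Promise<T>,
): Promise<T> {
  try {
    return await work(agent);
  } finally {
    // Executed on success and on failure, so the browser is not left running.
    await agent.closeAgent();
  }
}
```

A call such as withAgent(new HyperAgent(), (agent) => agent.executeTask("...")) would then close the agent whether the task resolves or rejects.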

Task Execution

Execute a task

async executeTask(task: string, params?: TaskParams, initPage?: Page): Promise<TaskOutput>

  • Description: Executes a task instruction and waits for it to finish. The agent uses its LLM and configured actions to perform the task on a browser page. The returned promise resolves with the final output once the task completes, or rejects if the task fails.

  • Parameters:

    • task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").

    • params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to limit the number of steps. A full description can be found on the types page.

    • initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.

  • Returns: Promise<TaskOutput> - The result of the task, typically a string summary or structured data if outputSchema was provided.

  • Throws: Rethrows any error encountered during task execution.

  • Usage:

    import { z } from "zod"; // required for outputSchema
    const page = await agent.newPage();
    await page.goto("https://example.com");
    const schema = z.object({ heading: z.string() });
    const result = await agent.executeTask("Extract the main heading from www.example.com", { outputSchema: schema }, page);
    console.log(result); // e.g. { heading: "Example Domain" }
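
Since executeTask rethrows any error raised during execution, callers that run several tasks in a row may prefer to capture failures as values instead of wrapping every call in try/catch. The sketch below shows one way to do that. TaskResult, TaskRunner, and tryTask are hypothetical names introduced for illustration. The interface only mirrors the first parameter of the documented executeTask signature.

```typescript
// Result type for illustration: either the task output or the error that was thrown.
type TaskResult<O> = { ok: true; output: O } | { ok: false; error: unknown };

// Structural stand-in for the agent: anything with a compatible executeTask method.
interface TaskRunner<O> {
  executeTask(task: string): Promise<O>;
}

// Hypothetical helper (not an SDK function): converts a rejection into a value.
async function tryTask<O>(agent: TaskRunner<O>, task: string): Promise<TaskResult<O>> {
  try {
    return { ok: true, output: await agent.executeTask(task) };
  } catch (error) {
    // The error is passed through untouched so the caller can inspect or log it.
    return { ok: false, error };
  }
}
```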

Execute a task asynchronously

async executeTaskAsync(task: string, params?: TaskParams, initPage?: Page): Promise<Task>

  • Description: Executes a given task instruction asynchronously. It immediately returns a Task control object, allowing management (pause, resume, cancel) of the background task.

  • Parameters:

    • task (string): The natural language instruction for the task (e.g., "Find the contact email on this page").

    • params (TaskParams, optional): Additional parameters for the task, such as outputSchema to specify the desired output format using a Zod schema, or maxSteps to limit the number of steps. A full description can be found on the types page.

    • initPage (Page, optional): A specific Playwright Page to start the task on. Defaults to the agent's currentPage.

  • Returns: Promise<Task> - A Task control object with methods:

    • getStatus(): TaskStatus

    • pause(): TaskStatus

    • resume(): TaskStatus

    • cancel(): TaskStatus

  • Usage:

    const taskControl = await agent.executeTaskAsync("Go to news.google.com, and search for international news.");
    console.log("Task started with status:", taskControl.getStatus());
    // ... later
    taskControl.pause();
    // ... even later
    taskControl.cancel();
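
The Task control object exposes getStatus(), so a caller can wait for a background task by polling it. The sketch below is one possible way to do this and is not part of the SDK: waitForStatus and TaskControlLike are hypothetical names. Because the concrete TaskStatus values are not listed in this section, the caller supplies the predicate that decides when the task is done.

```typescript
// Structural stand-in for the Task control object: only getStatus() is needed.
interface TaskControlLike<S> {
  getStatus(): S;
}

// Hypothetical helper: polls getStatus() until `isDone` accepts it or the timeout elapses.
async function waitForStatus<S>(
  task: TaskControlLike<S>,
  isDone: (status: S) => boolean,
  intervalMs = 500,
  timeoutMs = 60000,
): Promise<S> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = task.getStatus();
    if (isDone(status)) return status;
    if (Date.now() >= deadline) {
      throw new Error(`Task status is still "${String(status)}" after ${timeoutMs} ms`);
    }
    // Sleep between polls so the loop does not spin.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The predicate would be written against whatever TaskStatus values the types page defines. The helper only reads the status and never cancels the task when it times out.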

Model Context Protocol (MCP) Integration

MCP allows extending the agent's capabilities by connecting to external servers that provide additional tools/actions.

Initialize an MCP client

async initializeMCPClient(config: MCPConfig): Promise<void>

  • Description: Initializes the MCP client and attempts to connect to all servers specified in the configuration. Actions provided by successfully connected servers are registered with the agent.

  • Parameters:

    • config (MCPConfig): Configuration object containing an array of servers (each with connection details like URL and ID).

  • Returns: Promise<void>

  • Usage:

    await agent.initializeMCPClient({
        servers: [
            { id: "server1", url: "ws://localhost:8080" },
            // ... other servers
        ]
    });

Connect to a single MCP server

async connectToMCPServer(serverConfig: MCPServerConfig): Promise<string | null>

  • Description: Connects to a single MCP server at runtime. Registers actions provided by the server if the connection is successful.

  • Parameters:

    • serverConfig (MCPServerConfig): Configuration for the specific server to connect to.

  • Returns: Promise<string | null> - The server ID if connection was successful, otherwise null.

  • Usage:

    const serverId = await agent.connectToMCPServer({ id: "runtimeServer", url: "ws://localhost:8081" });
    if (serverId) {
        console.log(`Connected to ${serverId}`);
    }
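
When several servers need to be attached at runtime, the null return value makes it straightforward to separate successes from failures. The following sketch assumes only the documented connectToMCPServer signature. connectAll and McpConnector are hypothetical names, and the config type is left generic because the full MCPServerConfig shape is described on the types page rather than here.

```typescript
// Structural stand-in for the agent: C is whatever server config type the agent accepts.
interface McpConnector<C> {
  connectToMCPServer(serverConfig: C): Promise<string | null>;
}

// Hypothetical helper: connects to each server in order and reports which ones failed.
async function connectAll<C>(
  agent: McpConnector<C>,
  configs: C[],
): Promise<{ connected: string[]; failed: C[] }> {
  const connected: string[] = [];
  const failed: C[] = [];
  for (const config of configs) {
    const serverId = await agent.connectToMCPServer(config);
    if (serverId !== null) {
      connected.push(serverId); // documented: the server ID on success
    } else {
      failed.push(config); // documented: null when the connection did not succeed
    }
  }
  return { connected, failed };
}
```

Connecting sequentially keeps the order of results predictable. Running the calls in parallel is an alternative if startup time matters.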

Disconnect from an MCP server

disconnectFromMCPServer(serverId: string): boolean

  • Description: Disconnects from a specific MCP server identified by its ID. Note: This does not automatically unregister the actions provided by that server.

  • Parameters:

    • serverId (string): The ID of the server to disconnect from.

  • Returns: boolean - true if disconnection was successful or the server wasn't connected, false if an error occurred.

  • Usage:

    const success = agent.disconnectFromMCPServer("server1");
    console.log("Disconnected:", success);

Check if an MCP server is connected

isMCPServerConnected(serverId: string): boolean

  • Description: Checks if the agent is currently connected to a specific MCP server.

  • Parameters:

    • serverId (string): The ID of the server to check.

  • Returns: boolean - true if connected, false otherwise.

  • Usage:

    if (agent.isMCPServerConnected("server1")) {
        console.log("Server1 is connected.");
    }

List MCP server IDs

getMCPServerIds(): string[]

  • Description: Retrieves the IDs of all currently connected MCP servers.

  • Returns: string[] - An array of connected server IDs.

  • Usage:

    const connectedServers = agent.getMCPServerIds();
    console.log("Connected servers:", connectedServers);
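
getMCPServerIds() and disconnectFromMCPServer() can be combined to detach every server in one pass. The sketch below relies only on the two signatures documented above. disconnectAll and McpDisconnector are hypothetical names, not SDK exports.

```typescript
// Structural stand-in for the agent, limited to the two documented methods used here.
interface McpDisconnector {
  getMCPServerIds(): string[];
  disconnectFromMCPServer(serverId: string): boolean;
}

// Hypothetical helper: disconnects from every connected server and
// returns the IDs for which disconnectFromMCPServer reported false.
function disconnectAll(agent: McpDisconnector): string[] {
  const failed: string[] = [];
  // Copy the ID list first, in case the agent updates its own list while we iterate.
  const ids = agent.getMCPServerIds().slice();
  for (const serverId of ids) {
    if (!agent.disconnectFromMCPServer(serverId)) {
      failed.push(serverId);
    }
  }
  return failed;
}
```

As noted for disconnectFromMCPServer, the actions registered by those servers remain registered after this runs.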

Get MCP server info

getMCPServerInfo(): Array<{ id: string; toolCount: number; toolNames: string[]; }> | null

  • Description: Gets information about all connected MCP servers, including their IDs and the tools (actions) they provide.

  • Returns: Array<{ id: string; toolCount: number; toolNames: string[]; }> | null - An array of server info objects, or null if the MCP client isn't initialized.

  • Usage:

    const serverInfo = agent.getMCPServerInfo();
    if (serverInfo) {
        serverInfo.forEach(info => {
            console.log(`Server: ${info.id}, Tools: ${info.toolNames.join(', ')}`);
        });
    }
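
The structure returned by getMCPServerInfo() also makes it easy to check whether two servers expose a tool under the same name. The sketch below works purely on the documented return type. findSharedToolNames and the McpServerInfo alias are hypothetical names, and nothing here describes how the agent itself treats such overlaps.

```typescript
// Alias for the element type documented for getMCPServerInfo().
type McpServerInfo = { id: string; toolCount: number; toolNames: string[] };

// Hypothetical helper: maps each tool name offered by more than one server
// to the IDs of the servers that offer it. Accepts null (client not initialized).
function findSharedToolNames(info: McpServerInfo[] | null): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const server of info ?? []) {
    for (const name of server.toolNames) {
      const ids = owners.get(name) ?? [];
      ids.push(server.id);
      owners.set(name, ids);
    }
  }
  // Keep only the names that appear on at least two servers.
  const shared = new Map<string, string[]>();
  owners.forEach((ids, name) => {
    if (ids.length > 1) shared.set(name, ids);
  });
  return shared;
}
```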

Utility Methods

Pretty-print an action

pprintAction(action: ActionType): string

  • Description: Generates a human-readable string representation of an agent action, if a pretty-print function is defined for that action type. Useful for logging or debugging.

  • Parameters:

    • action (ActionType): The action object (containing type and params).

  • Returns: string - A formatted string representing the action, or an empty string if no specific pretty-print function exists for the action type.

  • Usage:

    // Assuming 'lastAction' is an action object from a task step
    console.log(agent.pprintAction(lastAction));
    // Example output: "Clicked element with selector '#submit-button'"
