🦙LlamaIndex

Using Hyperbrowser's Web Reader Integration

Installation and Setup

To get started with LlamaIndex and Hyperbrowser, you can install the necessary packages using pip:

pip install llama-index-core llama-index-readers-web hyperbrowser

Then configure your credentials by setting the following environment variable:

HYPERBROWSER_API_KEY=<your-api-key>

You can get an API key from the Hyperbrowser dashboard. Once you have it, add it to your .env file as HYPERBROWSER_API_KEY, or pass it via the api_key argument in the HyperbrowserWebReader constructor.

Usage

Once you have your API key and have installed the packages, you can load webpages into LlamaIndex using HyperbrowserWebReader.

from llama_index.readers.web import HyperbrowserWebReader

reader = HyperbrowserWebReader(api_key="your_api_key_here")

To load data, specify the operation the loader should perform. The default operation is scrape. For scrape, you can provide a single URL or a list of URLs to be scraped. For crawl, you can provide only a single URL; the crawl operation visits the provided page and its subpages and returns a document for each page. HyperbrowserWebReader supports both eager and lazy loading, in sync and async modes.

documents = reader.load_data(
    urls=["https://example.com"],
    operation="scrape",
)

Optional params for the loader can also be provided in the params argument. For more information on the supported params, you can see the params for scraping or crawling.

documents = reader.load_data(
    urls=["https://example.com"],
    operation="scrape",
    params={"scrape_options": {"include_tags": ["h1", "h2", "p"]}},
)
