⛓️LangChain

Using Hyperbrowser's Document Loader Integration

Hyperbrowser provides a Document Loader integration with LangChain via the langchain-hyperbrowser package. It loads the metadata and contents (as formatted Markdown or HTML) of any site as a LangChain Document.

Installation and Setup

To get started, install the langchain-hyperbrowser package using pip:

pip install langchain-hyperbrowser

Then configure credentials by setting the following environment variable:

HYPERBROWSER_API_KEY=<your-api-key>

You can get an API key from the dashboard. Once you have it, add it to your .env file as HYPERBROWSER_API_KEY, or pass it via the api_key argument in the constructor.
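
The two ways of supplying the key can be combined in one small lookup. The following is a minimal sketch, not part of langchain-hyperbrowser: resolve_api_key is a hypothetical helper that prefers an explicitly passed key and otherwise falls back to the HYPERBROWSER_API_KEY environment variable. It assumes the variable is already present in the process environment; a .env file typically has to be loaded first, for example with python-dotenv.

```python
import os


def resolve_api_key(explicit_key=None):
    """Hypothetical helper: prefer an explicit key, else HYPERBROWSER_API_KEY."""
    key = explicit_key or os.environ.get("HYPERBROWSER_API_KEY")
    if not key:
        # Fail early with a clear message instead of at request time.
        raise ValueError(
            "No API key found: set HYPERBROWSER_API_KEY or pass api_key explicitly."
        )
    return key
```

The returned value could then be passed to the constructor as api_key=resolve_api_key().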

Document Loader

The HyperbrowserLoader class in langchain-hyperbrowser loads content from a single page or from multiple pages, and can also crawl an entire site. The content can be loaded as Markdown or HTML.

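from langchain_hyperbrowser import HyperbrowserLoader

# No api_key argument is passed, so HYPERBROWSER_API_KEY must be set in the environment.
loader = HyperbrowserLoader(urls="https://example.com")
docs = loader.load()  # one LangChain Document per loaded page

# Print the first loaded document (content and metadata).
print(docs[0])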
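
Each loaded item is a LangChain Document, which exposes page_content and metadata. As a rough sketch of working with the result, the snippet below summarizes a list of documents. To stay runnable without an API key it uses a stand-in Document dataclass that mirrors those two attributes; the summarize helper and the "source" metadata key are assumptions for illustration, not something this page confirms about the loader's output.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in mirroring the page_content and metadata attributes of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(docs):
    """Hypothetical helper: return (source, character count) for each document."""
    return [
        # "source" is an assumed metadata key; fall back when it is missing.
        (doc.metadata.get("source", "<unknown>"), len(doc.page_content))
        for doc in docs
    ]


sample = [Document("# Example Domain\n\nSome text.", {"source": "https://example.com"})]
print(summarize(sample))
```

With real loader output, summarize(docs) could be called on the list returned by loader.load(), provided the metadata keys are checked first.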

Advanced Usage

The operation argument selects what the loader does; the default is scrape. For scrape, you can provide a single URL or a list of URLs. For crawl, you can provide only a single URL. The crawl operation crawls the provided page and its subpages and returns a document for each page.

loader = HyperbrowserLoader(
  urls="https://hyperbrowser.ai", api_key="YOUR_API_KEY", operation="crawl"
)
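
The URL rules described above (one URL or a list for scrape, exactly one URL for crawl) can be checked before a loader is constructed. This is a minimal sketch under stated assumptions: normalize_urls is a hypothetical helper, not part of langchain-hyperbrowser, and it only knows the two operations named on this page.

```python
def normalize_urls(urls, operation="scrape"):
    """Hypothetical pre-check: return a list of URLs, enforcing the one-URL rule for crawl."""
    # Assumption: only the two operations described on this page are allowed.
    if operation not in ("scrape", "crawl"):
        raise ValueError(f"Unsupported operation: {operation!r}")
    # Accept either a single URL string or an iterable of URL strings.
    url_list = [urls] if isinstance(urls, str) else list(urls)
    if not url_list:
        raise ValueError("At least one URL is required.")
    if operation == "crawl" and len(url_list) != 1:
        raise ValueError("crawl accepts exactly one URL.")
    return url_list
```

Running this check first surfaces a mismatched urls/operation pair locally, before any request is sent.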

Optional parameters for the loader can be passed in the params argument. For the supported parameters, see the params reference for scraping or crawling.

loader = HyperbrowserLoader(
  urls="https://example.com",
  api_key="YOUR_API_KEY",
  operation="scrape",
  params={"scrape_options": {"include_tags": ["h1", "h2", "p"]}}
)

