Web Scraping
Web scraping with Hyperbrowser
Web scraping is the process of extracting data from websites. Hyperbrowser provides simple API endpoints for scraping and crawling websites. The scraping process happens in two steps:
- Submit a URL to start a scrape job
- Poll the job status to get the results
When to Use What
Use the Scrape Endpoint for:
- Quick extraction of data from a single URL.
- Testing and prototyping your data extraction logic.
- Gleaning metadata from a handful of pages.
Use the Crawl Endpoint for:
- Larger-scale data gathering from multiple pages.
- Automating site-wide audits, content indexing, or SEO analysis.
- Building datasets by crawling entire sections of a website.
How to Use the Endpoints
Hyperbrowser provides SDKs for Node.js and Python so you can get started in minutes. Let's set up the Node.js SDK in our project.
If you haven’t already, you can sign up for a free account at app.hyperbrowser.ai and get your API key.
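The snippets below assume the API key is made available to your code through an environment variable; the variable name used here is an assumption, so match it to however you load configuration:

```bash
# Variable name is an assumption; adjust it to match how your code reads the key
export HYPERBROWSER_API_KEY=your_api_key_here
```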
1. Install the SDK
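Assuming the Node.js SDK is published under the package name @hyperbrowser/sdk (an assumption worth verifying against the package registry), install it with npm:

```bash
npm install @hyperbrowser/sdk
```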
or
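with Yarn (same assumed package name):

```bash
yarn add @hyperbrowser/sdk
```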
2. Starting a Scrape Job
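Here is a minimal sketch of the start-then-poll flow described above. The client and method names (`scrape.start`, `scrape.get`) and the shape of the returned job are assumptions; check the API reference for the exact SDK surface.

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";

// Method and field names below are assumptions based on the start-then-poll
// flow described above; verify them against the API reference.
const client = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

async function scrapePage(url: string) {
  // Step 1: submit the URL to start a scrape job
  const job = await client.scrape.start({ url });

  // Step 2: poll the job status until it completes or fails
  while (true) {
    const result = await client.scrape.get(job.jobId);
    if (result.status === "completed") return result;
    if (result.status === "failed") throw new Error(result.error);
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2s between polls
  }
}

scrapePage("https://example.com").then((result) => console.log(result.data));
```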
3. Starting a Crawl Job
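A similar sketch for a crawl job follows. The `maxPages` option and the method names are assumptions; consult the API reference for the options the crawl endpoint actually supports.

```typescript
import { Hyperbrowser } from "@hyperbrowser/sdk";

const client = new Hyperbrowser({ apiKey: process.env.HYPERBROWSER_API_KEY });

async function crawlSite(url: string) {
  // Submit the crawl job; maxPages is an assumed option for capping the crawl size
  const job = await client.crawl.start({ url, maxPages: 20 });

  // Poll until the crawl finishes, just like the scrape job above
  while (true) {
    const result = await client.crawl.get(job.jobId);
    if (result.status === "completed") return result;
    if (result.status === "failed") throw new Error(result.error);
    await new Promise((resolve) => setTimeout(resolve, 5000)); // crawls take longer; poll every 5s
  }
}

crawlSite("https://example.com/blog").then((result) => console.log(result.data?.length, "pages crawled"));
```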
Error Handling and Status Checks
Both the scrape and crawl endpoints are asynchronous. Jobs may take time to complete, depending on network conditions, the size of the site, and other factors. Because of this, it’s crucial to:
- Check the Status: Always check the `status` field before assuming the job is done.
- Handle Errors Gracefully: If `error` is present, log it or take the necessary corrective action.
- Retry Strategies: Use a polling mechanism or webhook callbacks (if available) to know when your jobs have finished without constantly pinging the service.
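As a sketch, the status checks and error handling above can be wrapped in a small polling helper. The `status` and `error` field names follow the description above, and the interval and attempt limits are arbitrary choices:

```typescript
// Generic polling helper; field names (`status`, `error`) follow the description above,
// and the interval/attempt limits are arbitrary values for this sketch.
async function pollJob<T extends { status: string; error?: string }>(
  fetchJob: () => Promise<T>,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob();
    if (job.status === "completed") return job; // done: hand the result back
    if (job.status === "failed") {
      // Handle errors gracefully instead of assuming success
      throw new Error(`Job failed: ${job.error ?? "unknown error"}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for the job to complete");
}
```

For example, the scrape job from earlier could be awaited with `pollJob(() => client.scrape.get(job.jobId))`, keeping the retry logic in one place.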
You can find the full documentation for all of our endpoints in our API reference.