Commit graph

1 commit

Author SHA1 Message Date
meirk-brd
274fb1c025
add Bright Data toolkit (#542)
# Add Bright Data Web Scraping and Data Extraction Toolkit

## Overview
This PR introduces a comprehensive Bright Data toolkit that provides web
scraping, search, and structured data extraction capabilities through
the Bright Data API.

## Features Added

### Core Tools
1. **`scrape_as_markdown`** - Scrapes any webpage and returns clean
Markdown content
2. **`get_screenshot`** - Captures screenshots of webpages and saves
them locally
3. **`search_engine`** - Advanced search functionality across Google,
Bing, and Yandex with customizable parameters
4. **`web_data_feed`** - Extracts structured data from major platforms
(LinkedIn, Amazon, Instagram, Facebook, X, YouTube, Zillow, Booking.com,
etc.)

### Supporting Infrastructure
- **`BrightDataClient`** 
- Error handling
- URL encoding utilities and request optimization

## Technical Details

### Search Engine Capabilities
- Multi-engine support (Google, Bing, Yandex)
- Advanced parameters: language, country, search type (images, shopping,
news)
- Device targeting (mobile, iOS, Android, iPad)
- Pagination and result count control
- Location-based searches

### Structured Data Sources
Supports 13+ data sources including:
- **E-commerce**: Amazon products and reviews
- **Professional**: LinkedIn profiles and companies, ZoomInfo
- **Social Media**: Instagram, Facebook, X (Twitter) content
- **Real Estate**: Zillow property listings
- **Travel**: Booking.com hotel listings
- **Video**: YouTube videos and metadata


## Testing & Validation
- [x] Deployed and tested on personal account
- [x] Tested via ngrok as well 
- [x] Verified all tool functions work as expected
- [x] Validated against multiple data sources and search engines
- [x] Confirmed error handling and edge cases


## Security & Best Practices
- Requires proper API key and zone configuration via secrets

## Dependencies
- `requests` - HTTP client
- `arcade_tdk` - Arcade toolkit framework
- Standard library modules: `json`, `time`, `typing`, `urllib.parse`

## Notes
- All tools require `BRIGHTDATA_API_KEY` secret
- Search and scraping tools also require `BRIGHTDATA_ZONE` secret
- Follows Arcade AI toolkit patterns and conventions
- Comprehensive docstrings with examples provided

This toolkit significantly expands Arcade AI's web data capabilities,
enabling users to scrape, search, and extract structured data from
across the web through a single, unified interface.

---------
Authored-by: meirk-brd
2025-10-15 16:47:45 -04:00