# Add Bright Data Web Scraping and Data Extraction Toolkit

## Overview

This PR introduces a comprehensive Bright Data toolkit that provides web scraping, search, and structured data extraction capabilities through the Bright Data API.

## Features Added

### Core Tools

1. **`scrape_as_markdown`** - Scrapes any webpage and returns clean Markdown content
2. **`get_screenshot`** - Captures screenshots of webpages and saves them locally
3. **`search_engine`** - Advanced search functionality across Google, Bing, and Yandex with customizable parameters
4. **`web_data_feed`** - Extracts structured data from major platforms (LinkedIn, Amazon, Instagram, Facebook, X, YouTube, Zillow, Booking.com, etc.)

### Supporting Infrastructure

- **`BrightDataClient`** - Error handling
- URL encoding utilities and request optimization

## Technical Details

### Search Engine Capabilities

- Multi-engine support (Google, Bing, Yandex)
- Advanced parameters: language, country, search type (images, shopping, news)
- Device targeting (mobile, iOS, Android, iPad)
- Pagination and result count control
- Location-based searches

### Structured Data Sources

Supports 13+ data sources including:

- **E-commerce**: Amazon products and reviews
- **Professional**: LinkedIn profiles and companies, ZoomInfo
- **Social Media**: Instagram, Facebook, X (Twitter) content
- **Real Estate**: Zillow property listings
- **Travel**: Booking.com hotel listings
- **Video**: YouTube videos and metadata

## Testing & Validation

- [x] Deployed and tested on personal account
- [x] Tested via ngrok as well
- [x] Verified all tool functions work as expected
- [x] Validated against multiple data sources and search engines
- [x] Confirmed error handling and edge cases

## Security & Best Practices

- Requires proper API key and zone configuration via secrets

## Dependencies

- `requests` - HTTP client
- `arcade_tdk` - Arcade toolkit framework
- Standard library modules: `json`, `time`, `typing`, `urllib.parse`

## Notes

- All tools require `BRIGHTDATA_API_KEY` secret
- Search and scraping tools also require `BRIGHTDATA_ZONE` secret
- Follows Arcade AI toolkit patterns and conventions
- Comprehensive docstrings with examples provided

This toolkit significantly expands Arcade AI's web data capabilities, enabling users to scrape, search, and extract structured data from across the web through a single, unified interface.

---------

Authored-by: meirk-brd
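
For reviewers setting up a test environment, the secret requirements listed in the Notes can be sketched as a small pre-flight check. This is an illustrative sketch only, not code from this PR: the `TOOL_SECRETS` table and the `missing_secrets` helper are hypothetical names, and the table only restates the Notes. `get_screenshot` is left out because the description does not say which group it belongs to, and `web_data_feed` is assumed to need only the API key.

```python
from typing import Mapping

# Hypothetical table restating the Notes: every tool needs the API key,
# and the search and scraping tools also need the zone.
# Assumption: web_data_feed is neither a search nor a scraping tool.
TOOL_SECRETS: dict[str, tuple[str, ...]] = {
    "scrape_as_markdown": ("BRIGHTDATA_API_KEY", "BRIGHTDATA_ZONE"),
    "search_engine": ("BRIGHTDATA_API_KEY", "BRIGHTDATA_ZONE"),
    "web_data_feed": ("BRIGHTDATA_API_KEY",),
}


def missing_secrets(tool_name: str, secrets: Mapping[str, str]) -> list[str]:
    """Return the required secret names that are absent or empty for a tool."""
    required = TOOL_SECRETS[tool_name]
    return [name for name in required if not secrets.get(name)]


if __name__ == "__main__":
    import os

    # Report which secrets are missing from the current environment.
    for tool in TOOL_SECRETS:
        print(tool, missing_secrets(tool, os.environ))
```

Running the script with neither variable set lists both secret names for the search and scraping tools and only `BRIGHTDATA_API_KEY` for `web_data_feed`.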