# Add Bright Data Web Scraping and Data Extraction Toolkit

## Overview

This PR introduces a comprehensive Bright Data toolkit that provides web scraping, search, and structured data extraction capabilities through the Bright Data API.

## Features Added

### Core Tools

1. **`scrape_as_markdown`** - Scrapes any webpage and returns clean Markdown content
2. **`get_screenshot`** - Captures screenshots of webpages and saves them locally
3. **`search_engine`** - Advanced search functionality across Google, Bing, and Yandex with customizable parameters
4. **`web_data_feed`** - Extracts structured data from major platforms (LinkedIn, Amazon, Instagram, Facebook, X, YouTube, Zillow, Booking.com, etc.)

### Supporting Infrastructure

- **`BrightDataClient`** - Error handling
- URL encoding utilities and request optimization

## Technical Details

### Search Engine Capabilities

- Multi-engine support (Google, Bing, Yandex)
- Advanced parameters: language, country, search type (images, shopping, news)
- Device targeting (mobile, iOS, Android, iPad)
- Pagination and result count control
- Location-based searches

### Structured Data Sources

Supports 13+ data sources including:

- **E-commerce**: Amazon products and reviews
- **Professional**: LinkedIn profiles and companies, ZoomInfo
- **Social Media**: Instagram, Facebook, X (Twitter) content
- **Real Estate**: Zillow property listings
- **Travel**: Booking.com hotel listings
- **Video**: YouTube videos and metadata

## Testing & Validation

- [x] Deployed and tested on personal account
- [x] Tested via ngrok as well
- [x] Verified all tool functions work as expected
- [x] Validated against multiple data sources and search engines
- [x] Confirmed error handling and edge cases

## Security & Best Practices

- Requires proper API key and zone configuration via secrets

## Dependencies

- `requests` - HTTP client
- `arcade_tdk` - Arcade toolkit framework
- Standard library modules: `json`, `time`, `typing`, `urllib.parse`

## Notes

- All tools require `BRIGHTDATA_API_KEY` secret
- Search and scraping tools also require `BRIGHTDATA_ZONE` secret
- Follows Arcade AI toolkit patterns and conventions
- Comprehensive docstrings with examples provided

This toolkit significantly expands Arcade AI's web data capabilities, enabling users to scrape, search, and extract structured data from across the web through a single, unified interface.

---------

Authored-by: meirk-brd
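
As an illustration of the URL encoding that the search tooling relies on, the following is a minimal sketch of how a query and a few of the listed parameters (language, country, pagination) might be turned into a search URL with `urllib.parse`. The function name `build_search_url`, the `_ENGINES` table, and the choice of query-string keys (`hl`, `gl`, `start` for Google, `text` for Yandex) are assumptions made for this example; they are not taken from the toolkit's code, which may build its requests differently.

```python
from urllib.parse import urlencode

# Hypothetical engine table for illustration only: base URL and the name of
# the query-string key that carries the search terms.
_ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def build_search_url(query, engine="google", language=None, country=None, start=None):
    """Return a percent-encoded search URL (illustrative helper, not the toolkit API)."""
    if engine not in _ENGINES:
        raise ValueError(f"Unsupported engine: {engine!r}")

    base, query_key = _ENGINES[engine]
    params = {query_key: query}

    # Only Google's extra parameters are sketched here; the other engines
    # receive just the encoded query in this example.
    if engine == "google":
        if language:
            params["hl"] = language
        if country:
            params["gl"] = country
        if start is not None:
            params["start"] = str(start)

    # urlencode escapes reserved characters and encodes spaces as "+".
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("web scraping & data", language="en", country="us", start=10))
```

Using `urlencode` rather than manual string concatenation keeps characters such as `&` or non-ASCII letters in the query from corrupting the resulting URL.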
55 lines
2 KiB
Makefile
.PHONY: help
help:
	@echo "🛠️ github Commands:\n"
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'

.PHONY: install
install: ## Install the uv environment and install all packages with dependencies
	@echo "🚀 Creating virtual environment and installing all packages using uv"
	@uv sync --active --all-extras --no-sources
	@if [ -f .pre-commit-config.yaml ]; then uv run --no-sources pre-commit install; fi
	@echo "✅ All packages and dependencies installed via uv"

.PHONY: install-local
install-local: ## Install the uv environment and install all packages with dependencies with local Arcade sources
	@echo "🚀 Creating virtual environment and installing all packages using uv"
	@uv sync --active --all-extras
	@if [ -f .pre-commit-config.yaml ]; then uv run pre-commit install; fi
	@echo "✅ All packages and dependencies installed via uv"

.PHONY: build
build: clean-build ## Build wheel file using uv
	@echo "🚀 Creating wheel file"
	uv build

.PHONY: clean-build
clean-build: ## Clean build artifacts
	@echo "🗑️ Cleaning dist directory"
	rm -rf dist

.PHONY: test
test: ## Test the code with pytest
	@echo "🚀 Testing code: Running pytest"
	@uv run --no-sources pytest -W ignore -v --cov --cov-config=pyproject.toml --cov-report=xml

.PHONY: coverage
coverage: ## Generate coverage report
	@echo "coverage report"
	@uv run --no-sources coverage report
	@echo "Generating coverage report"
	@uv run --no-sources coverage html

.PHONY: bump-version
bump-version: ## Bump the version in the pyproject.toml file by a patch version
	@echo "🚀 Bumping version in pyproject.toml"
	uv version --no-sources --bump patch

.PHONY: check
check: ## Run code quality tools.
	@if [ -f .pre-commit-config.yaml ]; then\
		echo "🚀 Linting code: Running pre-commit";\
		uv run --no-sources pre-commit run -a;\
	fi
	@echo "🚀 Static type checking: Running mypy"
	@uv run --no-sources mypy --config-file=pyproject.toml