# Add Bright Data Web Scraping and Data Extraction Toolkit

## Overview

This PR introduces a comprehensive Bright Data toolkit that provides web scraping, search, and structured data extraction capabilities through the Bright Data API.

## Features Added

### Core Tools

1. **`scrape_as_markdown`** - Scrapes any webpage and returns clean Markdown content
2. **`get_screenshot`** - Captures screenshots of webpages and saves them locally
3. **`search_engine`** - Advanced search functionality across Google, Bing, and Yandex with customizable parameters
4. **`web_data_feed`** - Extracts structured data from major platforms (LinkedIn, Amazon, Instagram, Facebook, X, YouTube, Zillow, Booking.com, etc.)

### Supporting Infrastructure

- **`BrightDataClient`**
- Error handling
- URL encoding utilities and request optimization

## Technical Details

### Search Engine Capabilities

- Multi-engine support (Google, Bing, Yandex)
- Advanced parameters: language, country, search type (images, shopping, news)
- Device targeting (mobile, iOS, Android, iPad)
- Pagination and result count control
- Location-based searches

### Structured Data Sources

Supports 13+ data sources including:

- **E-commerce**: Amazon products and reviews
- **Professional**: LinkedIn profiles and companies, ZoomInfo
- **Social Media**: Instagram, Facebook, X (Twitter) content
- **Real Estate**: Zillow property listings
- **Travel**: Booking.com hotel listings
- **Video**: YouTube videos and metadata

## Testing & Validation

- [x] Deployed and tested on personal account
- [x] Tested via ngrok as well
- [x] Verified all tool functions work as expected
- [x] Validated against multiple data sources and search engines
- [x] Confirmed error handling and edge cases

## Security & Best Practices

- Requires proper API key and zone configuration via secrets

## Dependencies

- `requests` - HTTP client
- `arcade_tdk` - Arcade toolkit framework
- Standard library modules: `json`, `time`, `typing`, `urllib.parse`

## Notes

- All tools require `BRIGHTDATA_API_KEY` secret
- Search and scraping tools also require `BRIGHTDATA_ZONE` secret
- Follows Arcade AI toolkit patterns and conventions
- Comprehensive docstrings with examples provided

This toolkit significantly expands Arcade AI's web data capabilities, enabling users to scrape, search, and extract structured data from across the web through a single, unified interface.

---------

Authored-by: meirk-brd
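The Bright Data toolkit description above mentions multi-engine search and URL encoding utilities without showing them. As a rough illustration only, the sketch below builds an encoded search URL with the standard library. The names `SEARCH_ENGINES` and `build_search_url` are hypothetical and are not taken from the toolkit; the `hl` parameter in the usage line is Google's interface-language query parameter.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical mapping: engine name -> (search endpoint, query parameter name).
SEARCH_ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def build_search_url(
    engine: str,
    query: str,
    extra_params: Optional[Dict[str, str]] = None,
) -> str:
    """Return a URL-encoded search URL for one of the supported engines."""
    try:
        base_url, query_param = SEARCH_ENGINES[engine.lower()]
    except KeyError:
        raise ValueError(f"Unsupported search engine: {engine!r}") from None

    # The query goes first; engine-specific parameters are appended in order.
    params = {query_param: query}
    params.update(extra_params or {})

    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return f"{base_url}?{urlencode(params)}"


print(build_search_url("google", "web scraping & data", {"hl": "en"}))
```

Keeping the engine table separate from the encoding step means a new engine needs only one new entry, and all escaping stays in `urlencode` rather than in hand-written string formatting.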
Arcade MCP Server Framework
To learn more about Arcade.dev, check out our documentation.
To learn more about the Arcade MCP Server Framework, check out our Arcade MCP documentation.
Psst, hey you, give us a star if you like it!
Quick Start: Create a New Server
The fastest way to get started is with the arcade new command, which creates a complete MCP server project:
```bash
# Install the CLI
uv pip install arcade-mcp

# Create a new server project
arcade new my_server

# Navigate to the project
cd my_server
```
This generates a complete project with:
- `server.py` - Main server file with `MCPApp` and example tools
- `pyproject.toml` - Dependencies and project configuration
- `.env.example` - Example `.env` file containing a secret required by one of the generated tools in `server.py`
The generated server.py includes proper command-line argument handling:
```python
#!/usr/bin/env python3
import sys
from typing import Annotated

from arcade_mcp_server import MCPApp

app = MCPApp(name="my_server", version="1.0.0")


@app.tool
def greet(name: Annotated[str, "Name to greet"]) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # The first command-line argument selects the transport; default to HTTP.
    transport = sys.argv[1] if len(sys.argv) > 1 else "http"
    app.run(transport=transport, host="127.0.0.1", port=8000)
```
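The `greet` tool above attaches a description to its parameter through `Annotated`. The following standard-library sketch shows one way such metadata can be read back at runtime. It is an illustration of the `typing` mechanism only, not a description of how `arcade_mcp_server` processes tools internally, and `describe_parameters` is a hypothetical helper name.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


def describe_parameters(func):
    """Map each Annotated parameter of func to its first string annotation."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    descriptions = {}
    for name, hint in hints.items():
        if name == "return":
            continue
        if get_origin(hint) is Annotated:
            # get_args yields (base_type, metadata_1, metadata_2, ...).
            _base_type, *metadata = get_args(hint)
            text = next((m for m in metadata if isinstance(m, str)), None)
            if text is not None:
                descriptions[name] = text
    return descriptions


def greet(name: Annotated[str, "Name to greet"]) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"


print(describe_parameters(greet))  # {'name': 'Name to greet'}
```

Because the description lives in the type hint, the function body and its call signature stay unchanged whether or not anything inspects the metadata.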
This approach gives you:
- Complete Project Setup - Everything you need in one command
- Best Practices - Proper dependency management with `pyproject.toml`
- Example Code - Learn from working examples of common patterns
- Production Ready - Structured for growth and deployment
Running Your Server
Run your server directly with Python:
```bash
# Run with HTTP transport (default)
uv run server.py

# Run with stdio transport (for Claude Desktop)
uv run server.py stdio

# Or use python directly
python server.py http
python server.py stdio
```
Your server will start and listen for connections. With HTTP transport, you can access the API docs at http://127.0.0.1:8000/docs.
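The generated `server.py` passes whatever argument it receives straight through as the transport. If you want an unknown value to fail with a clear message before the server starts, a small check along these lines could be added. This is a sketch under the assumption that `http` and `stdio`, the two transports shown above, are the only ones you intend to accept; `SUPPORTED_TRANSPORTS` and `parse_transport` are hypothetical names, not part of the framework.

```python
# Assumption: only the two transports shown in this README are accepted.
SUPPORTED_TRANSPORTS = ("http", "stdio")


def parse_transport(argv, default="http"):
    """Pick the transport from an argv-style list, validating the value."""
    if len(argv) <= 1:
        # No argument given: fall back to the default transport.
        return default
    transport = argv[1].lower()
    if transport not in SUPPORTED_TRANSPORTS:
        raise ValueError(
            f"Unknown transport {argv[1]!r}; expected one of {SUPPORTED_TRANSPORTS}"
        )
    return transport


# In server.py this would replace the inline expression, for example:
#     transport = parse_transport(sys.argv)
print(parse_transport(["server.py"]))           # http
print(parse_transport(["server.py", "stdio"]))  # stdio
```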
Configure MCP Clients
Once your server is running, connect it to your favorite AI assistant:
```bash
# Configure Claude Desktop (configures for stdio)
arcade configure claude --from-local

# Configure Cursor (configures for http streamable)
arcade configure cursor --from-local

# Configure VS Code (configures for http streamable)
arcade configure vscode --from-local
```
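The exact configuration that `arcade configure` writes is not shown here. For orientation, the sketch below builds the general shape of a stdio server entry as Claude Desktop reads it from the `mcpServers` section of its configuration file. The server name, the `uv` command, and the argument list are assumptions carried over from the examples above, and the real command output may differ (for instance, it may use absolute paths). `stdio_server_entry` is a hypothetical helper.

```python
import json


def stdio_server_entry(name, command, args):
    """Build an `mcpServers` config fragment for a stdio-launched server."""
    return {"mcpServers": {name: {"command": command, "args": list(args)}}}


# Assumed values: the client launches the server with `uv run server.py stdio`.
config = stdio_server_entry("my_server", "uv", ["run", "server.py", "stdio"])
print(json.dumps(config, indent=2))
```

With stdio, the client starts the server process itself, which is why the entry holds a command line rather than a URL.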
Client Libraries
- ArcadeAI/arcade-py: The Python client for interacting with Arcade.
- ArcadeAI/arcade-js: The JavaScript client for interacting with Arcade.
- ArcadeAI/arcade-go: The Go client for interacting with Arcade.
Support and Community
- Discord: Join our Discord community for real-time support and discussions.
- GitHub: Contribute or report issues on the Arcade GitHub repository.
- Documentation: Find in-depth guides and API references at Arcade Documentation.