# PR Description

This PR adds 6 new tools inside the new `arcade_web` toolkit. None of these tools require auth. They do, however, require the `FIRECRAWL_API_KEY` API key to be set.

The new tools implement the [Firecrawl](https://www.firecrawl.dev/) APIs `/scrape (POST)`, `/crawl (POST)`, `/crawl/{id} (GET)`, `/crawl/{id} (DELETE)`, and `/map (POST)`.

The six tools are:

* `Web.ScrapeUrl`
  - In the future I would like this tool to support the `actions` (clicking, scrolling, screenshotting, etc.) and `extract` (specify what you want to scrape) parameters. Firecrawl supports both of these parameters.
* `Web.CrawlWebsite`
  - If `async_crawl` is true, the tool returns only the id of the crawl job, and you can retrieve the crawl data later with the `Web.GetCrawlData` tool. If `async_crawl` is false, the entire contents of the crawl are returned.
* `Web.GetCrawlStatus`
  - Works for in-progress or recently finished crawl jobs (Firecrawl's limitation).
* `Web.GetCrawlData`
  - Works for in-progress or recently finished crawl jobs (Firecrawl's limitation).
* `Web.CancelCrawl`
  - Cancels an in-progress async crawl job.
* `Web.MapWebsite`
  - This endpoint is in alpha, but it can return all of the links of an entire website. Optionally, you can describe in natural language which links you want to map by using the `search` parameter, for example "only map webpages that are about AI".
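To make the `async_crawl` behaviour concrete, here is a minimal sketch of the crawl, status, data, and cancel flow described above. It is not the toolkit's implementation and does not call Firecrawl: the in-memory job store, the function names, the status strings (`"scraping"`, `"completed"`, `"cancelled"`), and the shape of the returned dictionaries are all assumptions made for illustration.

```python
import itertools
from typing import Any

# Hypothetical in-memory stand-in for the crawl service; NOT the Firecrawl API.
_jobs: dict[str, dict[str, Any]] = {}
_ids = itertools.count(1)


def crawl_website(url: str, async_crawl: bool = False) -> dict[str, Any]:
    """Start a crawl; return only the job id when async_crawl is true."""
    job_id = f"job-{next(_ids)}"
    # Assumed job record: a status string plus a list of scraped pages.
    _jobs[job_id] = {"status": "scraping", "data": [{"url": url, "markdown": "# stub"}]}
    if async_crawl:
        # Async mode: the caller fetches the data later using the id.
        return {"id": job_id}
    # Sync mode: the stand-in job finishes immediately and returns everything.
    _jobs[job_id]["status"] = "completed"
    return {"id": job_id, "status": "completed", "data": _jobs[job_id]["data"]}


def get_crawl_status(job_id: str) -> str:
    """Return the current status of a known job."""
    return _jobs[job_id]["status"]


def get_crawl_data(job_id: str) -> list[dict[str, Any]]:
    """Return whatever pages the job has collected so far."""
    return _jobs[job_id]["data"]


def cancel_crawl(job_id: str) -> str:
    """Cancel a job only if it is still in progress; return the resulting status."""
    if _jobs[job_id]["status"] == "scraping":
        _jobs[job_id]["status"] = "cancelled"
    return _jobs[job_id]["status"]


# Async: only the id comes back, and the job can still be inspected or cancelled.
started = crawl_website("https://example.com", async_crawl=True)
status_before = get_crawl_status(started["id"])
status_after = cancel_crawl(started["id"])

# Sync: the full crawl contents come back in one call.
full = crawl_website("https://example.com", async_crawl=False)
```

The point of the sketch is the control flow: an async caller holds nothing but an id and has to come back for status or data, while a sync caller gets the whole result in one response.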
```python
from enum import Enum


# Models and enums for firecrawl web tools
class Formats(str, Enum):
    MARKDOWN = "markdown"
    HTML = "html"
    RAW_HTML = "rawHtml"
    LINKS = "links"
    SCREENSHOT = "screenshot"
    SCREENSHOT_AT_FULL_PAGE = "screenshot@fullPage"
```
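Because `Formats` subclasses `str`, its members can be parsed from raw strings and serialized straight to JSON. The sketch below repeats the enum so that it runs on its own. `build_scrape_payload` and the `{"url": ..., "formats": [...]}` payload shape are hypothetical, introduced only to show how the enum values might be passed along; they are not taken from this PR.

```python
import json
from enum import Enum


class Formats(str, Enum):
    MARKDOWN = "markdown"
    HTML = "html"
    RAW_HTML = "rawHtml"
    LINKS = "links"
    SCREENSHOT = "screenshot"
    SCREENSHOT_AT_FULL_PAGE = "screenshot@fullPage"


def build_scrape_payload(url: str, formats: list[Formats]) -> dict:
    """Hypothetical helper: turn enum members into their wire-format strings."""
    return {"url": url, "formats": [f.value for f in formats]}


payload = build_scrape_payload(
    "https://example.com", [Formats.MARKDOWN, Formats.RAW_HTML]
)

# The values are plain strings, so the payload serializes without a custom encoder.
body = json.dumps(payload)

# Looking a member up by value validates input; an unknown string raises ValueError.
parsed = Formats("screenshot@fullPage")
```

The member names follow Python's upper-case convention while the values keep the camelCase strings unchanged, which is why the two differ (`RAW_HTML` versus `"rawHtml"`).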