Basic scraping with Firecrawl
To scrape a single page and get clean markdown content, you can use the /scrape endpoint.
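A minimal sketch (fc-YOUR_API_KEY and the target URL are placeholders):

```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"]
  }'
```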
Scraping PDFs
Firecrawl supports PDFs. Use the parsers option (e.g., parsers: ["pdf"]) when you want to ensure PDF parsing.
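For example (a sketch; swap in a real PDF URL and your key):

```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/whitepaper.pdf",
    "parsers": ["pdf"]
  }'
```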
Scrape options
When using the /scrape endpoint, you can customize scraping with the options below.
Formats (formats)
- Type: array
- Strings: ["markdown", "links", "html", "rawHtml", "summary", "images"]
- Object formats:
  - JSON: { type: "json", prompt, schema }
  - Screenshot: { type: "screenshot", fullPage?, quality?, viewport? }
  - Change tracking: { type: "changeTracking", modes?, prompt?, schema?, tag? } (requires markdown)
- Default: ["markdown"]
Full page content vs main content (onlyMainContent)
- Type: boolean
- Description: By default the scraper returns only the main content. Set to false to return the full page content.
- Default: true
Include tags (includeTags)
- Type: array
- Description: HTML tags/classes/ids to include in the scrape.
Exclude tags (excludeTags)
- Type: array
- Description: HTML tags/classes/ids to exclude from the scrape.
Wait for page readiness (waitFor)
- Type: integer
- Description: Milliseconds of extra wait time before scraping (use sparingly). This waiting time is in addition to Firecrawl’s smart wait feature.
- Default: 0
Freshness and cache (maxAge)
- Type: integer (milliseconds)
- Description: If a cached version of the page is newer than maxAge, Firecrawl returns it instantly; otherwise it scrapes fresh and updates the cache. Set 0 to always fetch fresh.
- Default: 172800000 (2 days)
Request timeout (timeout)
- Type: integer
- Description: Max duration in milliseconds before aborting.
- Default: 30000 (30 seconds)
PDF parsing (parsers)
- Type: array
- Description: Control parsing behavior. To parse PDFs, set parsers: ["pdf"].
- Cost: PDF parsing costs 1 credit per PDF page. To skip PDF parsing and receive the file as base64 (1 credit flat), set parsers: [].
- Limit pages: To limit PDF parsing to a specific number of pages, use parsers: [{"type": "pdf", "maxPages": 10}].
Actions (actions)
When using the /scrape endpoint, Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
- Type: array
- Description: Sequence of browser steps to run before scraping.
- Supported actions:
  - wait - Wait for page to load: { type: "wait", milliseconds: number } or { type: "wait", selector: string }
  - click - Click an element: { type: "click", selector: string, all?: boolean }
  - write - Type text into a field: { type: "write", text: string } (element must be focused first with click)
  - press - Press a keyboard key: { type: "press", key: string }
  - scroll - Scroll the page: { type: "scroll", direction: "up" | "down", selector?: string }
  - screenshot - Capture screenshot: { type: "screenshot", fullPage?: boolean, quality?: number, viewport?: { width: number, height: number } }
  - scrape - Scrape sub-element: { type: "scrape" }
  - executeJavascript - Run JS code: { type: "executeJavascript", script: string }
  - pdf - Generate PDF: { type: "pdf", format?: string, landscape?: boolean, scale?: number }
Action Execution Notes
- Write action: You must first focus the element using a click action before using write. The text is typed character by character to simulate keyboard input.
- Scroll selector: If you want to scroll a specific element instead of the whole page, provide the selector parameter to scroll.
- Wait with selector: You can wait for a specific element to be visible using wait with a selector parameter, or wait for a fixed duration using milliseconds.
- Actions are sequential: Actions are executed in order, and Firecrawl waits for page interactions to complete before moving to the next action.
Advanced Action Examples
Taking a screenshot:
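A sketch of a scrape request that waits briefly, then captures a full-page screenshot via actions (key and URL are placeholders):

```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "actions": [
      { "type": "wait", "milliseconds": 2000 },
      { "type": "screenshot", "fullPage": true, "quality": 80 }
    ]
  }'
```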
Example Usage
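A sketch of a single request combining several options (key and URL are placeholders); the bullets below describe what it does:

```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "rawHtml", "html", "links", { "type": "screenshot" }],
    "includeTags": ["h1", "p", "a", ".main-content"],
    "excludeTags": ["#ad", "#footer"],
    "onlyMainContent": false,
    "waitFor": 1000,
    "timeout": 15000,
    "parsers": ["pdf"]
  }'
```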
- Return the full page content as markdown.
- Include the markdown, raw HTML, HTML, links, and a screenshot in the response.
- Include only the HTML tags <h1>, <p>, <a>, and elements with the class .main-content, while excluding any elements with the IDs #ad and #footer.
- Wait for 1000 milliseconds (1 second) before scraping to allow the page to load.
- Set the maximum duration of the scrape request to 15000 milliseconds (15 seconds).
- Parse PDFs explicitly via parsers: ["pdf"].
JSON extraction via formats
Use the JSON format object in formats to extract structured data in one pass:
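A sketch (the prompt and schema are illustrative):

```bash
curl -X POST https://api.firecrawl.dev/v2/scrape \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": [{
      "type": "json",
      "prompt": "Extract the product name and price.",
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "string" }
        },
        "required": ["name", "price"]
      }
    }]
  }'
```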
Extract endpoint
Use the dedicated extract job API when you want asynchronous extraction with status polling.
Crawling multiple pages
To crawl multiple pages, use the /v2/crawl endpoint.
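A sketch (key and URL are placeholders); the response includes a job id you can poll, as shown below:

```bash
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "limit": 100
  }'
```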
Check Crawl Job
Used to check the status of a crawl job and get its result.
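A sketch, where JOB_ID stands for the id returned when the crawl was started:

```bash
curl -X GET https://api.firecrawl.dev/v2/crawl/JOB_ID \
  -H "Authorization: Bearer fc-YOUR_API_KEY"
```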
Pagination/Next URL
If the content is larger than 10MB or if the crawl job is still running, the response may include a next parameter, a URL to the next page of results.
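When next is present, fetch it with the same authorization header until it no longer appears (a sketch; NEXT_URL stands for the returned value):

```bash
# NEXT_URL is the "next" value from the previous response
curl -X GET "$NEXT_URL" \
  -H "Authorization: Bearer fc-YOUR_API_KEY"
```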
Crawl prompt and params preview
You can provide a natural-language prompt to let Firecrawl derive crawl settings. Preview them first:
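A sketch, assuming a params-preview route under /v2/crawl (the prompt text is illustrative):

```bash
curl -X POST https://api.firecrawl.dev/v2/crawl/params-preview \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "Crawl only the blog and docs sections"
  }'
```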
Crawler options
When using the /v2/crawl endpoint, you can customize the crawling behavior with:
includePaths
- Type: array
- Description: Regex patterns to include.
- Example: ["^/blog/.*$", "^/docs/.*$"]
excludePaths
- Type: array
- Description: Regex patterns to exclude.
- Example: ["^/admin/.*$", "^/private/.*$"]
maxDiscoveryDepth
- Type: integer
- Description: Max discovery depth for finding new URLs.
limit
- Type: integer
- Description: Max number of pages to crawl.
- Default: 10000
crawlEntireDomain
- Type: boolean
- Description: Explore across siblings/parents to cover the entire domain.
- Default: false
allowExternalLinks
- Type: boolean
- Description: Follow links to external domains.
- Default: false
allowSubdomains
- Type: boolean
- Description: Follow subdomains of the main domain.
- Default: false
delay
- Type: number
- Description: Delay in seconds between scrapes.
- Default: undefined
scrapeOptions
- Type: object
- Description: Options for the scraper (see Formats above).
- Example: { "formats": ["markdown", "links", {"type": "screenshot", "fullPage": true}], "includeTags": ["h1", "p", "a", ".main-content"], "excludeTags": ["#ad", "#footer"], "onlyMainContent": false, "waitFor": 1000, "timeout": 15000 }
- Defaults: formats: ["markdown"]; caching enabled by default (maxAge ~ 2 days)
Example Usage
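A sketch of a crawl combining the options above (key, URL, and patterns are placeholders):

```bash
curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "includePaths": ["^/blog/.*$"],
    "excludePaths": ["^/admin/.*$"],
    "limit": 100,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }'
```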
Mapping website links
The /v2/map endpoint identifies URLs related to a given website.
Usage
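A sketch (key and URL are placeholders; search and limit are optional and described below):

```bash
curl -X POST https://api.firecrawl.dev/v2/map \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "search": "docs",
    "limit": 100
  }'
```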
Map Options
search
- Type: string
- Description: Filter links containing text.
limit
- Type: integer
- Description: Maximum number of links to return.
- Default: 100
sitemap
- Type: "only" | "include" | "skip"
- Description: Control sitemap usage during mapping.
- Default: "include"
includeSubdomains
- Type: boolean
- Description: Include subdomains of the website.
- Default: true
Whitelisting Firecrawl
Allowing Firecrawl to scrape your website
If you want Firecrawl to scrape your own website and need to whitelist the crawler:
- User Agent: Firecrawl identifies itself with the user agent FirecrawlAgent. Allow this user agent string in your firewall or security rules.
- IP Addresses: Firecrawl does not use a fixed set of IP addresses for outbound scraping requests.
Allowing your application to call the Firecrawl API
If your firewall blocks outbound requests from your application to external services, whitelist 35.245.250.27 to allow calls to Firecrawl’s API.
