# Crawl

> Crawl a whole website and get content of all pages on it.

Crawling is a long-running task. To get the content of a crawl, first create a crawl job, then check the job's results once it has finished.

## Request

<CodeGroup>
  ```js Node theme={null}
  import { Supadata } from '@supadata/js';

  // Initialize the client
  const supadata = new Supadata({
    apiKey: 'YOUR_API_KEY',
  });

  // Start a crawl job
  const crawl = await supadata.web.crawl({
    url: 'https://supadata.ai',
    limit: 10,
  });

  console.log(`Started crawl job: ${crawl.jobId}`);
  ```

  ```python Python theme={null}
  from supadata import Supadata

  # Initialize the client
  supadata = Supadata(api_key="YOUR_API_KEY")

  # Start a crawl job
  crawl_job = supadata.web.crawl(
      url="https://supadata.ai",
      limit=100  # Optional: limit the number of pages to crawl
  )
  print(f"Started crawl job: {crawl_job.job_id}")
  ```

  ```bash cURL theme={null}
  curl -X POST 'https://api.supadata.ai/v1/web/crawl' \
  -H 'x-api-key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://supadata.ai", "limit": 100}'
  ```
</CodeGroup>

<Info>
  The crawler follows only child links. For example, if you crawl
  `https://supadata.ai/blog`, the crawler follows links like
  `https://supadata.ai/blog/article-1`, but not `https://supadata.ai/about`. To
  crawl the whole website, provide the top-level URL (i.e. `https://supadata.ai`)
  as the URL to crawl.
</Info>
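
The child-link rule can be illustrated with a small check. This is only a sketch of the rule as described above, assuming "child" means same scheme and host with a path at or below the start URL's path; the crawler's actual matching logic is not documented here, and `is_child_link` is a hypothetical helper, not part of the SDK.

```python
from urllib.parse import urlsplit


def is_child_link(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url is at or below start_url's path on the same host."""
    start = urlsplit(start_url)
    candidate = urlsplit(candidate_url)

    # Assumption: links on a different scheme or host are never child links.
    if (start.scheme, start.netloc) != (candidate.scheme, candidate.netloc):
        return False

    # Compare whole path segments so "/blog" matches "/blog/article-1"
    # but not "/blogroll".
    base = start.path.rstrip("/") + "/"
    path = candidate.path.rstrip("/") + "/"
    return path.startswith(base)


print(is_child_link("https://supadata.ai/blog", "https://supadata.ai/blog/article-1"))
print(is_child_link("https://supadata.ai/blog", "https://supadata.ai/about"))
print(is_child_link("https://supadata.ai", "https://supadata.ai/about"))
```

With the start URL set to the site root, every path on that host passes the check, which matches the advice to provide the top-level URL when you want the whole website.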

## Parameters

| Parameter | Type   | Required | Description                                        |
| --------- | ------ | -------- | -------------------------------------------------- |
| url       | string | Yes      | URL to start crawling from                         |
| limit     | number | No       | Maximum number of pages to crawl. Defaults to 100. |
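
For callers not using an SDK, the parameters map onto the JSON body of the POST request shown in the cURL example. The following is a minimal sketch using only the Python standard library; `build_crawl_request` is a hypothetical helper, and it only constructs the request without sending it.

```python
import json
import urllib.request

API_URL = "https://api.supadata.ai/v1/web/crawl"


def build_crawl_request(api_key, url, limit=None):
    """Build (but do not send) the POST request that starts a crawl job."""
    body = {"url": url}
    # "limit" is optional; when omitted, the API default applies.
    if limit is not None:
        body["limit"] = limit

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_crawl_request("YOUR_API_KEY", "https://supadata.ai", limit=10)

# To send it:
#   with urllib.request.urlopen(request) as response:
#       job = json.load(response)
```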

## Response

```json theme={null}
{
  "jobId": "string" // The ID of the crawl job
}
```

## Results

After starting a crawl job, you can check its status. Once the job is completed, the response contains the results of the crawl.
Results for large crawls may be paginated. In that case, the response contains a `next` field that you can use to get the next page of results.

### Crawl Job

<CodeGroup>
  ```js Node theme={null}
  import { Supadata } from "@supadata/js";

  const supadata = new Supadata({ apiKey: "YOUR_API_KEY" });

  // Get crawl job results
  // jobId is the ID returned when the crawl job was started
  // This automatically handles pagination and returns all pages
  const crawlResult = await supadata.web.getCrawlResults(jobId);

  if (crawlResult.status === "completed") {
    console.log("Crawl job completed successfully!");
    console.log(`Total pages crawled: ${crawlResult.pages.length}`);
    
    // Process each page
    crawlResult.pages.forEach((page, index) => {
      console.log(`Page ${index + 1}: ${page.name}`);
      console.log(`URL: ${page.url}`);
      console.log(`Description: ${page.description}`);
      console.log(`Content preview: ${page.content.substring(0, 100)}...`);
      console.log("---");
    });
  } else if (crawlResult.status === "failed") {
    console.error("Crawl job failed:", crawlResult.error);
  } else {
    console.log("Job status:", crawlResult.status);
  }
  ```

  ```python Python theme={null}
  from supadata import Supadata

  supadata = Supadata(api_key="YOUR_API_KEY")

  # Get crawl results
  # job_id is the ID returned when the crawl job was started
  # This automatically handles pagination and returns all pages
  crawl_result = supadata.web.get_crawl_results(job_id=job_id)

  if crawl_result.status == "completed":
      print("Crawl job completed successfully!")
      print(f"Total pages crawled: {len(crawl_result.pages)}")
      
      # Process each page
      for i, page in enumerate(crawl_result.pages):
          print(f"Page {i + 1}: {page.name}")
          print(f"URL: {page.url}")
          print(f"Description: {page.description}")
          print(f"Content preview: {page.content[:100]}...")
          print("---")
  elif crawl_result.status == "failed":
      print(f"Crawl job failed: {crawl_result.error}")
  else:
      print(f"Job status: {crawl_result.status}")
  ```

  ```bash cURL theme={null}
  curl -X GET 'https://api.supadata.ai/v1/web/crawl/123e4567-e89b-12d3-a456-426614174000' \
    -H 'x-api-key: YOUR_API_KEY'
  ```
</CodeGroup>
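
Because a crawl takes time, a job checked right after it starts will usually still be in progress. One way to wait for it is a simple polling loop. The sketch below assumes `scraping` is the only in-progress status (per the status values listed under Crawl Results); `wait_for_crawl`, its `interval` and `timeout` defaults, and the `fetch_job` callable are all hypothetical and not part of the SDK.

```python
import time
from typing import Callable


def wait_for_crawl(
    fetch_job: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Call fetch_job() until the job leaves the 'scraping' state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        # Assumption: any status other than 'scraping' is final
        # ('completed', 'failed' or 'cancelled').
        if job.get("status") != "scraping":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("Crawl job did not finish before the timeout")
        time.sleep(interval)
```

Here `fetch_job` is any zero-argument function that returns the job as a dictionary, for example one that issues the GET request from the cURL example and parses the JSON body.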

### Crawl Results

```json theme={null}
{
  "status": "string", // The status of the crawl job: 'scraping', 'completed', 'failed' or 'cancelled'
  "pages": [
    // If job is completed, contains list of pages that were crawled
    {
      "url": "string", // The URL that was scraped
      "content": "string", // The markdown content extracted from the URL
      "name": "string", // The title of the webpage
      "description": "string" // A description of the webpage
    }
  ],
  "next": "string" // Present when a large crawl is paginated. Call this endpoint to get the next page of results
}
```
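
When calling the HTTP endpoint directly rather than through an SDK, paginated results have to be followed by hand. The sketch below shows one way to do that, treating the value of `next` as an opaque reference that is passed back to the caller's own fetch function; `collect_pages` and `fetch` are hypothetical names, and the exact format of `next` is not assumed.

```python
from typing import Callable, Optional


def collect_pages(fetch: Callable[[Optional[str]], dict]) -> list:
    """Gather the pages from every result page by following the 'next' field."""
    pages = []
    next_ref = None  # None means "fetch the first page of results"
    while True:
        result = fetch(next_ref)
        pages.extend(result.get("pages", []))
        next_ref = result.get("next")
        # A missing or empty 'next' means there are no more result pages.
        if not next_ref:
            return pages
```

`fetch(None)` should return the first response for the job, and `fetch(next_ref)` should return the response for the given `next` value.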

## Error Codes

The API returns HTTP status codes and error codes. See the [errors page](/errors) for details.

<Info>
  Respect robots.txt and website terms of service when scraping web content.
</Info>

## Pricing

* 1 crawl request = 1 credit
* 1 crawled page = 1 credit
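
As a rough illustration of the two rules above, the cost of a single crawl can be estimated as follows. This sketch assumes the two charges are additive (one credit for the request plus one per page actually crawled); `estimate_crawl_credits` is a hypothetical helper, not part of the API.

```python
def estimate_crawl_credits(pages_crawled: int) -> int:
    """Estimate credits for one crawl: 1 for the request plus 1 per crawled page."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled must not be negative")
    # Assumption: the request credit and the per-page credits add up.
    return 1 + pages_crawled


print(estimate_crawl_credits(100))  # 101 under the additive assumption
```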