Crawling is a long-running task. To get the content of a crawl, you first create a crawl job and then check the job's results.
Request
The crawler follows only child links. For example, if you crawl https://supadata.ai/blog, the crawler will follow links like https://supadata.ai/blog/article-1, but not https://supadata.ai/about. To crawl the whole website, provide the top-level URL (i.e. https://supadata.ai) as the URL to crawl.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL of the website to crawl |
| limit | number | No | Maximum number of pages to crawl. Defaults to 100. |
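A minimal sketch of starting a crawl job over HTTP. The endpoint path (/v1/web/crawl) and the x-api-key header are assumptions; confirm both against the API reference.

```python
import requests

API_KEY = "YOUR_API_KEY"
# Assumed endpoint path; check the API reference for the exact URL.
CRAWL_URL = "https://api.supadata.ai/v1/web/crawl"

# Start a crawl job for the whole site, capped at 100 pages.
resp = requests.post(
    CRAWL_URL,
    headers={"x-api-key": API_KEY},
    json={"url": "https://supadata.ai", "limit": 100},
)
resp.raise_for_status()
job = resp.json()
print(job)  # expected to contain a job identifier, assumed here to be jobId
```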
Response
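Starting a crawl returns a job object. The sketches below assume it includes a jobId field identifying the job for status checks and result retrieval.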
Results
After starting a crawl job, you can check its status. Once the job is completed, you can get the results of the crawl. Results may be paginated for large crawls; in that case, the response contains a next field which you can use to get the next page of results.
Crawl Job
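Continuing the sketch above, here is a hedged example of polling the job until it finishes. The status endpoint path and the status values (completed, failed) are assumptions; adjust them to the documented ones.

```python
import time
import requests

# Assumed status endpoint; jobId comes from the crawl-start response above.
status_url = f"https://api.supadata.ai/v1/web/crawl/{job['jobId']}"

while True:
    resp = requests.get(status_url, headers={"x-api-key": API_KEY})
    resp.raise_for_status()
    state = resp.json()
    # Assumed terminal status values; check the API reference.
    if state.get("status") in ("completed", "failed"):
        break
    time.sleep(5)  # crawling is long-running; poll at a modest interval
```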
Crawl Results
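Still continuing the sketch, results can be collected page by page by following the next field until it is absent. The next field comes from the description above; the pages field holding crawled content is an assumption.

```python
import requests

pages = []
url = status_url  # a completed job's response is assumed to carry the first page
while url:
    resp = requests.get(url, headers={"x-api-key": API_KEY})
    resp.raise_for_status()
    data = resp.json()
    pages.extend(data.get("pages", []))  # assumed field holding crawled pages
    url = data.get("next")  # absent on the last page of results
print(f"Crawled {len(pages)} pages")
```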
Error Codes
The API returns HTTP status codes and error codes. See this page for more details.
Respect robots.txt and website terms of service when scraping web content.
Pricing
- 1 crawl request = 1 credit
- 1 crawled page = 1 credit
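For example, a crawl request that ends up crawling 40 pages costs 41 credits: 1 for the request plus 1 per crawled page.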