Rocket Validator is a distributed web crawler that is able to traverse your large sites, finding thousands of web pages and checking each one with the W3C HTML validator and the Axe Core accessibility validator, on your demand.
With great power comes great responsibility, so, like all web crawlers, Rocket Validator always works within the limits you define:
- Maximum number of web pages to include in the report. You can decide how many web pages you want to validate: typically you’ll start with a small number, say 100 web pages, but this can be anything from 1 to 5,000.
- What to check on each web page. You decide if you want to check HTML using the W3C Validator, accessibility using Axe Core, or both on each web page found by the crawler.
- Validation speed. Sets a limit on the number of requests per second we can make on your behalf. This goes from a minimum of 1, to a maximum of 15, depending on your subscription level.
These three factors combined are going to determine how your web site is analyzed by Rocket Validator, and also how long a report is going to take to complete. But first, let’s understand what happens when you submit a site validation report.
Let’s say you’re running a site report, checking both HTML and accessibility on each web page found, and you set a rate limit of 1 request/second.
In this case, when each web page is validated, it’s going to receive up to 3 requests:
- One visit from the crawler to get more links from it (until max pages has been reached).
- One visit from the HTML validator.
- One visit from the accessibility validator.
Setting the report to run at 1 request per second means that each web page will need 3 seconds for its validations to start. That’s 20 web pages per minute, or 1,200 web pages per hour. So for 5,000 pages it will need more than 4 hours for all validations and crawling to start. See below for comparison tables with more scenarios.
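The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the estimate, not Rocket Validator code: the function names are hypothetical, and it treats the "up to 3 requests" per page as exactly 3, so it gives an upper bound.

```python
def requests_per_page(html: bool = True, a11y: bool = True, crawl: bool = True) -> int:
    """Count the requests each page receives: one per enabled check, plus one crawler visit."""
    return int(html) + int(a11y) + int(crawl)


def seconds_to_start_all(pages: int, rate_limit: float, per_page: int) -> float:
    """Seconds until every request has been issued at `rate_limit` requests/second."""
    if rate_limit <= 0:
        raise ValueError("rate_limit must be positive")
    return pages * per_page / rate_limit


def format_duration(seconds: float) -> str:
    """Render seconds as e.g. '4 h 10 min', rounding down to whole minutes."""
    hours, rem = divmod(int(seconds), 3600)
    minutes = rem // 60
    return f"{hours} h {minutes} min" if hours else f"{minutes} min"


# HTML + accessibility + deep crawling = 3 requests per page, at 1 request/second.
per_page = requests_per_page(html=True, a11y=True, crawl=True)
for pages in (100, 1_000, 5_000):
    print(pages, format_duration(seconds_to_start_all(pages, 1, per_page)))
```

For 5,000 pages this gives 15,000 seconds, which is the "more than 4 hours" mentioned above.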
Using XML or text sitemaps to skip crawling
Thanks to our Deep Crawling feature, Rocket Validator automatically finds the internal web pages on a site from a starting URL, which can be any URL on your site (typically the front page). This makes things easier for you, but it also means that we need to make an extra request to each web page in order to discover the links it contains.
A good way to speed up site validation significantly is to use XML or text sitemaps. If you can provide a list of the exact web pages that you want to validate, Rocket Validator can skip the whole crawling process, because it no longer needs to visit pages just to discover others.
Skipping crawling means only 2 requests per page are needed (HTML and accessibility validation), so that’s 2 seconds per page instead of 3, which is 30 pages per minute, 1,800 web pages per hour. So for 5,000 pages it would need less than 3 hours for all validations to start.
Even if you don’t have an exhaustive list of all your web pages, a sitemap always helps the crawler by giving it a list of URLs to include. The rest, if needed, can be found via deep crawling.
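If your site doesn’t publish a sitemap yet, generating one from a list of URLs is straightforward. Here is a minimal sketch using only the Python standard library. It assumes you already have the list of URLs you want validated; the `example.com` URLs and the function names are hypothetical. The XML layout follows the sitemaps.org protocol (a `urlset` of `url`/`loc` entries), and the text variant is simply one URL per line.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_xml_sitemap(urls: list[str]) -> str:
    """Return a minimal XML sitemap (<urlset> of <url><loc>) for the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url  # ElementTree escapes &, < and > when serializing
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_text_sitemap(urls: list[str]) -> str:
    """Return a text sitemap: one absolute URL per line."""
    return "\n".join(urls) + "\n"


# Hypothetical URLs, for illustration only.
pages = ["https://example.com/", "https://example.com/about", "https://example.com/contact"]
print(build_xml_sitemap(pages))
print(build_text_sitemap(pages), end="")
```

Either output can be saved to a file and served from your site, so that the report can start from the sitemap URL instead of crawling.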
Let’s do some math
With these ideas in mind, let’s do some calculations to see how the different factors affect the report running time.
HTML + Accessibility + Deep Crawling, rate limited to 1 request/second
In this first scenario we let the deep crawler discover the internal web pages by visiting each one in search of links, and perform both HTML and accessibility checks on each web page found, at the slowest rate of 1 req/sec. This needs 3 requests per page, so at that validation speed it means 3 seconds per page.
Web pages | Total requests | Time to start all validations |
---|---|---|
100 | 300 | 5 minutes |
250 | 750 | 12 minutes |
1,000 | 3,000 | 50 minutes |
3,000 | 9,000 | 2 hours, 30 minutes |
5,000 | 15,000 | 4 hours, 10 minutes |
HTML + Accessibility + Deep Crawling, rate limited to 3 requests/second
In this second scenario we increase the rate limit to 3 requests/second. We keep using deep crawling and validating both HTML and accessibility on each web page, so it still takes 3 requests per page, but as we’re running at 3x the speed, it takes 1/3 of the previous time: 1 second per page.
Web pages | Total requests | Time to start all validations |
---|---|---|
100 | 300 | 1 minute 40 seconds |
250 | 750 | 4 minutes |
1,000 | 3,000 | 16 minutes |
3,000 | 9,000 | 50 minutes |
5,000 | 15,000 | 1 hour, 23 minutes |
HTML + Accessibility, rate limited to 3 requests/second, skipping crawling
In this third scenario we keep the rate limit of 3 reqs/sec, and we validate both HTML and accessibility on each web page, but we’ve disabled deep crawling and provided an XML sitemap with all the URLs, so it only takes 2 requests per page. That means we’re making 2/3 of the requests from the previous scenario, which cuts the time needed to start all validations by a third.
Web pages | Total requests | Time to start all validations |
---|---|---|
100 | 200 | 1 minute |
250 | 500 | 2 minutes |
1,000 | 2,000 | 11 minutes |
3,000 | 6,000 | 33 minutes |
5,000 | 10,000 | 55 minutes |
Effects of rate limit on similar reports
In this final scenario we play with the effect of setting a higher rate limit on the same kind of report (no deep crawling, HTML and A11Y checks, 2 reqs/page).
Web pages | Rate limit | Time to start all validations |
---|---|---|
1,000 | 1 req/sec | 33 minutes |
1,000 | 3 reqs/sec | 11 minutes |
1,000 | 5 reqs/sec | 6 minutes |
1,000 | 10 reqs/sec | 3 minutes |
1,000 | 15 reqs/sec | 2 minutes |
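The same relationship can be turned around: given a number of pages and a target time, which rate limit would you need? The sketch below is a rough illustration under the same simplifications as the tables (it only accounts for when requests start, not how long they take). The function name is hypothetical, and the 1 to 15 range is the one quoted in this article; the maximum actually available depends on your subscription level.

```python
import math
from typing import Optional

MIN_RATE, MAX_RATE = 1, 15  # requests/second range quoted in this article


def min_rate_for_deadline(pages: int, per_page: int, deadline_seconds: float) -> Optional[int]:
    """Smallest whole rate limit that issues all requests within the deadline.

    Returns None when even MAX_RATE would not be enough.
    """
    if deadline_seconds <= 0:
        raise ValueError("deadline_seconds must be positive")
    needed = math.ceil(pages * per_page / deadline_seconds)
    rate = max(MIN_RATE, needed)
    return rate if rate <= MAX_RATE else None


# 1,000 pages from a sitemap (2 requests per page), all validations started within 5 minutes.
print(min_rate_for_deadline(1_000, 2, 5 * 60))   # 7
# 5,000 pages with deep crawling (3 requests per page) within 10 minutes is out of range.
print(min_rate_for_deadline(5_000, 3, 10 * 60))  # None
```

When the result is `None`, the options are to allow more time, validate fewer pages, or reduce the requests per page, for example by providing a sitemap.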
Some final thoughts
The calculations above give only a rough estimate of how the settings on a validation report affect its running time. In practice, there are more factors to take into account.
These calculations take into account when the web page validations will start, but not how long they take to complete. This depends on other factors:
- How fast your server can respond to our requests.
- Complexity of the pages being validated. Simple web pages will validate faster, while web pages with complex DOMs, large size, or many issues, will take longer to process.
- Network errors or DoS protection may affect validation. In these cases our validation crawlers will retry later.
You’ll find the sweet spot depending on your use case. Our recommended workflow when working with a new site is:
- First, run a large report covering both HTML and Accessibility checks, at 3 requests/second. This will give you a bird’s eye view of the main issues on the site.
- After that, work in small increments on sections of the site. Maybe focus first on fixing the HTML issues, and after that move on to accessibility issues.
- For maintenance, set a monthly or weekly validation schedule to have your site validated periodically so you’ll be warned if new issues are found.
In most cases, a validation speed of 3 to 5 requests/second is good enough to get results fast without overloading your servers.
We hope that helps in optimizing your validation workflow!
Feel free to contact us if you want to share your validation tips!