How does Rocket Validator work?

by Jaime Iniesta

At Rocket Validator, we try to give you the most comprehensive HTML validation reports for your sites, while keeping our tool as simple as possible.

To validate a site, you just need to enter its main URL and click the "Submit" button. Results start arriving within seconds, and the full report is complete in just a few minutes.

But what happens on our servers after you click the "Submit" button? Let's review our internal processes:

First, we normalize the site URL and resolve its redirections to get the final URL and status. For example, you might have typed http://example.com, but the final URL after following redirections might be https://www.example.com/ - we keep this final address as the canonical one and use it in the rest of the process.
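As a rough illustration, this step could be sketched in Python with the requests library (this is not our actual implementation; the function name and defaults are hypothetical):

```python
import requests

def resolve_final_url(url, timeout=10):
    """Follow redirections and return the final URL plus its HTTP status.

    Hypothetical sketch: normalizes a user-typed address by letting the
    HTTP client follow redirects and keeping the last response's URL.
    """
    # Default to http:// if the user typed a bare domain like "example.com"
    if not url.startswith(("http://", "https://")):
        url = "http://" + url

    response = requests.get(url, allow_redirects=True, timeout=timeout)
    return response.url, response.status_code

# resolve_final_url("http://example.com") might return
# ("https://www.example.com/", 200)
```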

Once your site's final URL is discovered, the scraping process starts. Our web crawler visits your main URL, reads the links found in it, and adds every internal link discovered to the web page processing queue.

Again, for each internal link we normalize the URL and resolve its redirections to get the final URL. If it's still within the main site, it's added to your sitemap.
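The link discovery and filtering described above could look roughly like this hedged Python sketch, using requests and BeautifulSoup (the names and details are illustrative, not Rocket Validator's actual code):

```python
from urllib.parse import urljoin, urldefrag, urlparse

import requests
from bs4 import BeautifulSoup

def discover_internal_links(page_url, site_root):
    """Return the set of normalized internal links found on a page.

    Sketch only: fetches the page, resolves relative links against it,
    strips fragments, and keeps only URLs on the same host as site_root.
    """
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    site_host = urlparse(site_root).netloc
    links = set()
    for anchor in soup.find_all("a", href=True):
        absolute = urljoin(page_url, anchor["href"])
        absolute, _fragment = urldefrag(absolute)  # drop #fragments
        if urlparse(absolute).netloc == site_host:
            links.add(absolute)
    return links
```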

As web pages are discovered and added to your sitemap, we launch parallel background processes to validate the HTML of each one and store the issues found.
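Conceptually, the fan-out looks something like the following sketch, where a local thread pool stands in for our actual background job system and validate_page is an assumed function, not real code:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def validate_sitemap(page_urls, validate_page, max_workers=8):
    """Run validations in parallel and collect the issues found per page.

    Sketch: in production this would be a distributed job queue rather
    than a local thread pool, but the idea is the same -- each page is
    validated independently as soon as it's discovered.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(validate_page, url): url for url in page_urls}
        for future in as_completed(futures):
            url = futures[future]
            results[url] = future.result()
    return results
```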

Your web pages are validated for HTML markup conformance with the W3C standards. To do this, we run the official validation software, released as open source by the W3C, on our own servers hosted on the great cloud service Digital Ocean. This allows us to scale by adding servers as needed, and to update the software when a new version is available.
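The W3C's open source checker can be run as a local web service and queried over HTTP. A hedged sketch of how a page's markup could be sent to such a self-hosted instance (the endpoint URL is a placeholder, not Rocket Validator's):

```python
import requests

def check_markup(html, checker_url="http://localhost:8888/"):
    """Send an HTML document to a self-hosted W3C checker instance.

    Sketch: the checker's HTTP API accepts the document body and, with
    out=json, returns a list of messages (errors and warnings) including
    the line numbers where they occur.
    """
    response = requests.post(
        checker_url,
        params={"out": "json"},
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("messages", [])
```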

We store the validation results for each page: the number of HTML errors and warnings, as well as each specific issue found and the line where it appears in the source of your web page.
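For each page, the stored record might look roughly like this (the field names are illustrative, not our actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class PageValidation:
    """Illustrative shape of a stored validation result for one page."""
    url: str
    error_count: int = 0
    warning_count: int = 0
    # Each issue keeps its message, its type (error or warning) and the
    # line in the page source where it was found.
    issues: list = field(default_factory=list)
```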

Each web page found is also visited by our web crawler to search for more internal links under the sitemap's main URL. These are added to the web page processing queue, where they're normalized, resolved, validated and, recursively, crawled for further internal links. This process repeats until we can't find more web pages on your site or we reach the defined limit.
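Put together, the crawl behaves like a breadth-first traversal with a page limit, roughly as in this sketch (which reuses the hypothetical link-discovery helper from above):

```python
from collections import deque

def crawl_site(start_url, discover_internal_links, max_pages=500):
    """Breadth-first crawl of a site, up to max_pages pages.

    Sketch: each discovered page is queued exactly once; the loop stops
    when the queue is empty or the page limit is reached.
    """
    seen = {start_url}
    queue = deque([start_url])
    sitemap = []

    while queue and len(sitemap) < max_pages:
        url = queue.popleft()
        sitemap.append(url)
        for link in discover_internal_links(url, start_url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sitemap
```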

Another important part of our tool is exception handling and retries. Temporary problems can appear at several points: timeouts, network connectivity issues, server overload... To deal with these, we have a retry mechanism that retries each validation several times in the case of a temporary failure. If a validation keeps failing after that, the exception is stored so we can investigate its cause and improve our tool.
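A simplified retry wrapper, assuming a small fixed number of attempts and an exponential backoff (again, a sketch, not our production code):

```python
import time

def with_retries(task, attempts=3, base_delay=2.0):
    """Run a task, retrying on temporary failures with exponential backoff.

    Sketch: if every attempt fails, the last exception is re-raised so it
    can be recorded and investigated later.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # store / report the exception upstream
            time.sleep(base_delay * (2 ** (attempt - 1)))
```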

That's the complexity hidden behind a single button click!

Ready to check your sites?
Start your trial today.