Tool + How to do automated Google Lighthouse bulk tests


When doing SEO you want to provide a website that is fast, accessible, built with best practices and modern web technology, and that of course gets the SEO basics right. A useful tool to test all of this is, as you know, Google Lighthouse.

The Chrome extension is nice, but at some point you want more, and you want automated tests.
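If you just want to move past the extension, Lighthouse can already be run headlessly from the command line, which is the usual starting point for automation (a minimal invocation; the URL and output path are placeholders):

```shell
# Run Lighthouse headlessly from the command line (requires Node and Chrome).
# Writes report.json and report.html next to each other.
npx lighthouse https://example.com \
  --chrome-flags="--headless" \
  --output=json --output=html \
  --output-path=./report
```

Wrapping a command like this in a loop over a URL list is essentially what the first script of our setup does.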

What would you like to check in thousands of Lighthouse reports? Let me know!

So we built a little tool and process to check Lighthouse scores for some other publishers (because we work in publishing) and for more of our own URLs, more often.

The second thing we wanted to solve is comparability of reports: you probably want to know why your site is slower now compared to an earlier test.

The Tool

This is a downgraded and modified version of the Propulsion Academy project by Valerie Glutz and me (Tobias Willmann).

It’s not a production-ready tool but a test setup, and a downgraded one at that, so there may be bugs.

The interesting aspect of this project is not the tool but the data, so it’s likely that the tool won’t be improved (at least not until we have collected more data).

The selected sites are more or less random. We run tests for some more sites, plus section pages and article pages.

The main functionalities of the frontend are the graphs and the option to select two dots in a graph to see a comparison of the two HTML reports.


The whole setup consists of four JavaScript components:

  1. A script that runs the Google Lighthouse test and pushes the HTML and JSON files to two AWS S3 buckets
  2. An AWS Lambda that picks up each new JSON file and processes it. We had different versions of this, extracting information from the JSON and pushing the extracted data to DynamoDB
  3. A Lambda that reads the information from DynamoDB and offers an API Gateway endpoint to make it accessible to a React application
  4. A React application to visualize the data and interact with the reports
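To make step 2 more concrete, here is a minimal sketch of how such a Lambda could extract the category scores from a Lighthouse JSON report. The field names (`categories`, `score`, `requestedUrl`, `fetchTime`) follow the standard Lighthouse report format; the function name and the shape of the resulting DynamoDB item are hypothetical, not our exact production code:

```javascript
// Sketch: extract the five category scores from a Lighthouse JSON report (LHR).
// The LHR field names are standard; the output item shape is an assumption.
function extractScores(lhr) {
  const categories = ['performance', 'accessibility', 'best-practices', 'seo', 'pwa'];
  const item = {
    url: lhr.requestedUrl,
    fetchTime: lhr.fetchTime,
  };
  for (const cat of categories) {
    const entry = lhr.categories[cat];
    // Lighthouse stores scores as 0..1; scale to the familiar 0..100.
    item[cat] = entry && entry.score != null ? Math.round(entry.score * 100) : null;
  }
  return item;
}

// Example with a minimal fake report:
const fake = {
  requestedUrl: 'https://example.com/',
  fetchTime: '2020-01-01T00:00:00.000Z',
  categories: {
    performance: { score: 0.42 },
    accessibility: { score: 0.9 },
    'best-practices': { score: 0.93 },
    seo: { score: 1 },
    pwa: { score: 0.31 },
  },
};
console.log(extractScores(fake).performance); // 42
```

An item like this per report is small enough that DynamoDB works well as a store, while the full HTML and JSON reports stay in S3.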

First Insights

  1. The HTML reports are really nice as an entry point compared to the JSON if you are looking for insights as a human. Once you learn from the HTML files what you want to check in bulk, go with the JSON.
  2. It’s not so easy to tell instantly what you want to extract from the JSON. Some scores are super stable and thus not so interesting.
  3. If you check the five scores for publishers, the SEO, Best Practices, PWA and Accessibility scores are pretty stable and only change when real changes have been implemented on the site. Performance, however, fluctuates a lot for some sites. We used a variation score here and detected that for homepages the Performance variance is usually related to ads or tracking. Variance on article pages or sections is much smaller (probably related to big ads which are only shown on home pages). Looking at the data, the learning here is that some ads really slow a site down to “unusable” (at least for new users on a slow connection).
  4. In general we haven’t found publishers with top scores (90+) in all five categories. Some are really bad across all five scores. We will collect some more data and later provide a full report here … but it looks like the news industry isn’t performing well here.
  5. Having multiple reports over time and for different URLs helps to find the big problems which affect many users. Sometimes a single big GIF, for example, causes bad scores, but with multiple reports you can detect such outliers fast and easily.
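The “variation score” mentioned above can be as simple as a standard deviation over the Performance scores of repeated runs. Here is a minimal sketch (the function name is hypothetical, and the exact formula we used may differ):

```javascript
// Sketch: a simple variation score over repeated Performance scores.
// This version is the sample standard deviation; our exact formula may differ.
function variationScore(scores) {
  if (scores.length < 2) return 0;
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const variance =
    scores.reduce((sum, s) => sum + (s - mean) ** 2, 0) / (scores.length - 1);
  return Math.sqrt(variance);
}

// A homepage whose Performance score shakes a lot vs. a stable article page
// (made-up example scores):
console.log(variationScore([35, 62, 48, 71, 40])); // large
console.log(variationScore([88, 90, 89, 91, 90])); // small
```

A high value on a homepage combined with a low value on the article pages of the same site is exactly the ads/tracking pattern described above.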

Other Ideas?

As mentioned, we currently plan to collect more data and then look for insights.

Ideas welcome…

Other options to do this

There are other setups to do similar things, e.g. Simo Ahava’s, which now offers Lighthouse reports too:

If you’re just interested in PageSpeed Insights:

More stuff

That’s a pretty good guide from Jamie Alberico about Lighthouse Performance Metrics in general:

© 2020 Tobias Willmann