tobiaswillmann.de

Google Search Console API tests

2017-06-27

Google Search Console is great, but without the API you get just 1000 entries from the Search Analytics export. The API offers you 5000 rows per query.

In most cases you can get much more data using some “hacks”. This post is about what I learned about getting more data out of the Google Search Console API.

Please let me know if you can recommend any improvements.

How do you use GSC API?

You can start here if you want to learn more about the GSC API: https://developers.google.com/webmaster-tools/search-console-api-original/
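
If you start from Google’s Python sample (linked below in “The code”), authentication and the service object look roughly like this. This is a sketch based on the official sample; it assumes a client_secrets.json with your OAuth credentials next to the script:

import argparse
import sys

from googleapiclient import sample_tools

# Read the property URL from the command line, then run the OAuth flow and
# build the Search Console ("webmasters") service object.
argparser = argparse.ArgumentParser(add_help=False)
argparser.add_argument('property_uri', type=str,
                       help='Property URL as listed in GSC, e.g. https://www.example.com/')

service, flags = sample_tools.init(
    sys.argv, 'webmasters', 'v3', __doc__, __file__, parents=[argparser],
    scope='https://www.googleapis.com/auth/webmasters.readonly')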


Limits of Search Analytics

Check: https://developers.google.com/webmaster-tools/search-console-api-original/v3/limits

The Search Analytics resource enforces the following limits:

Per-site limit (calls querying the same site):

  • 5 QPS
  • 200 QPM

Per-user limit (calls made by the same user):

  • 5 QPS
  • 200 QPM

Per-project limit (calls made using the same Developer Console key):

  • 100,000,000 QPD

I added a 1 second sleep after each query, which caps the script at 60 QPM and keeps it well below the 200 QPM limit.

import time
time.sleep(1)  # wait one second between queries

Use filters for every letter in the alphabet

Loop over an array containing the alphabet:

alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
for i in range(len(alphabet)):
    # ... build and send one request per letter, see below ...

Inside the loop, the request body pulls the current letter out of the array:

requestQuery = {
    'startDate': daysago.strftime('%Y-%m-%d'),
    'endDate': daysago.strftime('%Y-%m-%d'),
    'dimensions': ['page', 'query'],
    'dimensionFilterGroups': [
        {
            'groupType': 'and',
            'filters': [
                {
                    'dimension': 'query',
                    'operator': 'contains',
                    'expression': alphabet[i]
                }
            ]
        }
    ],
    'rowLimit': 5000
}
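
For completeness, sending this request body looks roughly like this (assuming the service object and the flags.property_uri argument from the sample setup above):

# Query the Search Analytics endpoint of the property, then pause a second
response = service.searchanalytics().query(
    siteUrl=flags.property_uri, body=requestQuery).execute()
time.sleep(1)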

If you want to go crazy with this, use a two-letter array like [“aa”, “ab”, “ac”, …]. That’s useful if you run very large sites.
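
A quick sketch of how such a two-letter array could be generated (the name alphabet2 is mine):

import itertools
import string

# All 676 two-letter combinations "aa" ... "zz" as filter expressions
alphabet2 = ["".join(pair) for pair in
             itertools.product(string.ascii_lowercase, repeat=2)]

Keep in mind that 676 filter expressions mean at least 676 queries per property per day; with a one second sleep between queries that alone takes about 11 minutes.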

In my case the [“a”, “b”, “c”, …] array increased the number of rows from 5000 to 21564 unique rows (tested with just the root folder’s GSC property https://www.blick.ch).


Pagination

Pagination gives you the second, third, etc. page of a query’s results. You start the query with

'rowLimit': 5000,  
'startRow': 0

If the response contains 5000 rows, there is probably a next page, so you query again with

'rowLimit': 5000,  
'startRow': 5000

I just call the main function again with a new startRow and the same letter:

# Recurse with a new startRow as long as full pages come back
if len(jsonObj.get("rows", [])) == 5000:
    startRow = startRow + 5000
    main(sys.argv, startRow, letter)
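
If you prefer to avoid the recursion, the same logic can be written as a loop. A sketch, reusing the requestQuery and service objects from above:

# Keep fetching pages until one comes back with fewer than 5000 rows
startRow = 0
while True:
    requestQuery['startRow'] = startRow
    response = service.searchanalytics().query(
        siteUrl=flags.property_uri, body=requestQuery).execute()
    rows = response.get('rows', [])
    # ... store the rows here ...
    if len(rows) < 5000:
        break
    startRow += 5000
    time.sleep(1)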

Multiple GSC properties

We run multiple GSC properties for all our main folders and subdomains.


For this test I used four folder properties. These are probably the most important ones:

https://www.blick.ch/
https://www.blick.ch/news/
https://www.blick.ch/sport/
https://www.blick.ch/people-tv/
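
Each property gets its own 5000-row window per query, so one script could also loop over all of them (a sketch; I actually start the script once per property via cron, see the end of the post):

properties = [
    'https://www.blick.ch/',
    'https://www.blick.ch/news/',
    'https://www.blick.ch/sport/',
    'https://www.blick.ch/people-tv/',
]
for site in properties:
    response = service.searchanalytics().query(
        siteUrl=site, body=requestQuery).execute()
    time.sleep(1)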

For these four properties the API returned 932701 rows / 82499 unique rows.

Keep in mind that after setting up a new property it takes approximately two days until the property has Search Analytics data.


Duplicates

If you query the GSC API with e.g. the alphabet filter you will receive duplicates. In our case, the query for all keywords containing “b” returns our brand “blick”, and the query for “l” returns “blick” again.

Duplicate cleanup is therefore important. For example, if I query “root folder + every letter in the alphabet” I receive 111787 rows, but just 21564 unique rows.

I use a hash of each row as its _id and upsert into MongoDB.

json_string_to_hash is a dict containing all the “column” values of the row.

# Hash all column values to get a stable id for deduplication
data_md5 = hashlib.md5(json.dumps(json_string_to_hash, sort_keys=True).encode("utf-8")).hexdigest()
json_string = {"_id": data_md5, "date": daysago.strftime('%Y-%m-%d'),
               "impressions": r["impressions"], "ctr": r["ctr"], "clicks": r["clicks"],
               "position": r["position"], "url": r["keys"][0], "keyword": r["keys"][1]}
# Upsert: insert the row or replace an existing one with the same hash
coll.replace_one({"_id": data_md5}, json_string, upsert=True)

Test results:

Data of one day at https://www.blick.ch

  • Export: 1000 rows
  • API + root folder: 5000 rows (often less than 5000?)
  • API + root folder + alphabet filter: 21564 unique rows
  • API + root folder + alphabet filter + pagination: 24693 unique rows
  • API + multiple folders / GSC properties + alphabet filter + pagination: 82499 unique rows (this is just four GSC properties)

Queries per day

The developer console at https://console.developers.google.com can show you how many requests were fired each day.


The code

I started with this sample https://github.com/google/google-api-python-client/tree/master/samples/searchconsole and extended it with:

  • basic MongoDB database handling with pymongo
  • date handling
  • alphabet filter
  • pagination
  • a simple hash generator for duplicates
  • custom queries of GSC API

To start the Python script I use cron jobs that run every night:

40 1 * * * python /filer/wio/gsc/gsc-savetodb2.py https://www.blick.ch
50 1 * * * python /filer/wio/gsc/gsc-savetodb2.py https://www.blick.ch/news
0 2 * * * python /filer/wio/gsc/gsc-savetodb2.py https://www.blick.ch/sport

I’m still improving the code… but feel free to send me a message to get the latest version.

© 2020 Tobias Willmann