Google App Engine with server-side Google Analytics

why Google App Engine?

I was recently lucky enough to attend Google Next 17 and got to learn about Google Cloud Platform from some ridiculously smart Google engineers. This included some amazing people like Kelsey Hightower, Jessie Frazelle, Alex Mohr, Niels Provos, and Terrence Ryan. These engineers are phenomenal in their areas of expertise and it was awesome to hear their talks. That list isn’t even close to complete; there’s more in the full playlist: https://www.youtube.com/playlist?list=PLIivdWyY5sqI8RuUibiH8sMb1ExIw0lAR

In addition to the awesomeness of the conference and the Google engineers, Google introduced quite a few pieces of software and technology. Some of it was an expansion of existing services, one of which was Google App Engine. I learned a few really cool things about Google App Engine (and the entire platform) that made me want to check it out…

So yeah, some good points to motivate me to try out GCP. I DEFINITELY wanted to try it out. And the very first thing to try? How about a full port of my blog to Google App Engine?

a quick look at Google App Engine Standard vs Flexible

Google App Engine Standard is a “bring your own code” service. Bring the code, Google takes care of the rest.

An app consists of the app.yaml file, which describes the application, and the code itself, such as main.py or main.go. Supported languages include:

  • Java 7
  • Python 2.7
  • PHP 5.5
  • Go 1.6

Neato. One of the super cool features is that GAE standard can “scale to 0”, meaning that when no one is using the application, you’re not paying for it.

And then there’s Google App Engine Flexible. It supports more languages, more versions of languages, and the best feature IMHO, the ability to “bring your own Dockerfile”. Slick.

This demo will be focused exclusively on making GAE standard work.

the new blog site requirements

Pretty straightforward:

  • Serve a static blog.
  • A simple way to push new content.
  • Implement some sort of server-side analytics.

So why collect server-side analytics? One thing that made me curious was comparing the analytics between server-side and client-side and seeing if I could capture some data on just how prevalent Google Analytics ad blocking was. I wanted to see if there was a significant difference. There are problems with this approach, such as cached hits, but it added an interesting challenge with room for improvement. Onward.
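As a rough sketch of the comparison described above, the share of visits missing from client-side analytics could be estimated from two pageview counts taken over the same period. The function name and the example numbers are hypothetical, not figures from this blog, and the estimate ignores caveats such as cached hits.

```python
def estimated_blocked_share(server_side_hits, client_side_hits):
    """Estimate the fraction of server-side pageviews that never
    appeared in client-side analytics.

    Both arguments are pageview counts for the same period. The result
    is only a rough indicator: cached pages never reach the server, so
    the server-side count can itself be too low.
    """
    if server_side_hits <= 0:
        raise ValueError("server_side_hits must be positive")
    # Clamp at zero in case client-side reports more hits than the server saw.
    missing = max(server_side_hits - client_side_hits, 0)
    return float(missing) / server_side_hits


# Hypothetical numbers, for illustration only.
share = estimated_blocked_share(server_side_hits=200, client_side_hits=150)
```

With these made-up counts, a quarter of the server-side pageviews would be missing from the client-side report.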

(Note: the following assumes that Hugo has already rendered the Markdown posts into the ./public folder, which in my case is a symlink to the Hugo directory.)

using GAE with Golang

First, when a static site has no dynamically generated data (such as directories) and the static files and paths are known ahead of time, GAE makes hosting it trivial. But this only works when the directory depth is known, unless you have some healthy regex. Here’s a great example:

# https://github.com/yosukesuzuki/hugo-gae/blob/master/app.yaml
- url: /(.*)/$
  static_files: public/\1/index.html
  upload: .*\.html$

That’s pretty cool right there and useful for a tool like Hugo.
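To make the mapping concrete, here is a small sketch that mimics the handler above with Python’s re module. It only illustrates how the capture group ends up in the file path; App Engine’s own matcher may differ in details, and the function name is hypothetical.

```python
import re

# Mirrors the handler above:
#   url: /(.*)/$   ->   static_files: public/\1/index.html
URL_PATTERN = re.compile(r"/(.*)/$")
STATIC_TEMPLATE = r"public/\1/index.html"


def resolve_static_file(request_path):
    """Return the file the handler would serve, or None if the URL
    does not match the pattern (for example, no trailing slash)."""
    match = URL_PATTERN.fullmatch(request_path)
    if match is None:
        return None
    # Substitute the captured directory path into the template.
    return match.expand(STATIC_TEMPLATE)


print(resolve_static_file("/post/hello/"))
print(resolve_static_file("/post/hello"))
```

Because the group captures everything between the first and last slash, the same rule covers posts at any directory depth.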

However, Golang has a super simple way to deal with this, without the need for regex: http.FileServer serves a directory’s index.html automatically when the directory is requested. Dead simple code:

#app.yaml
runtime: go
api_version: go1

handlers:
 - url: /.*
   script: _go_app

// main.go
package main

import (
	"net/http"
)

func init() {
	http.Handle("/", http.FileServer(http.Dir("public")))
}

The final step was to add some server-side Google Analytics. If I hadn’t found a trivial Python example, I would have dug around more for a way to do it in Go…

using GAE with Python and server-side Google Analytics

So it’s fairly straightforward to host static sites. But in this case, I also wanted to run some dynamic tasks for each page hit. Specifically, send a Google Analytics pageview hit. This is where the GAE Python attempt comes into play. Using Flask with Hugo, this GAE code serves the static files and fires off a server-side Google Analytics pageview. This way, I can track analytics without the need for client-side JavaScript.

Relevant code can be found here.

  • The first important setting is the cache max age:
app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 0

Setting this to 0 tells Flask to send static files with a cache max age of zero, which is an attempt to make the browser fetch the full page again any time a request is made. This also means pages can be updated and refreshes will be up-to-date. I still need to test and verify that this is a legit way to solve this problem.

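import requests

# GA_TRACKING_ID and SITE_HOSTNAME are module-level settings defined
# elsewhere in the app.

# [START track_page]
# https://developers.google.com/analytics/devguides/collection/protocol/v1/devguide#page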
def track_page(page, title=''):
    data = {
        'v': '1',  # API Version.
        'tid': GA_TRACKING_ID,  # Tracking ID / Property ID.
        # Anonymous Client Identifier. Ideally, this should be a UUID that
        # is associated with particular user, device, or browser instance.
        'cid': '555',
        't': 'pageview',  # Pageview hit type.
        'dh': SITE_HOSTNAME,  # Document hostname
        'dp': page,  # Page.
        'dt': title,  # Title.
    }
    response = requests.post(
        'http://www.google-analytics.com/collect', data=data)
    # If the request fails, this will raise a RequestException. Depending
    # on your application's needs, this may be a non-error and can be caught
    # by the caller.
    response.raise_for_status()
# [END track_page]

The above fires off a Google Analytics pageview from the server, so the hit gets recorded without any client-side JavaScript. Could probably make this SSL…
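Here is a minimal sketch of what an SSL variant might look like, assuming the same Measurement Protocol fields as track_page above. The helper name build_pageview and the client_id parameter are hypothetical. It also swaps the hard-coded '555' for a UUID; note that generating a fresh UUID on every hit would make each hit look like a new visitor, so in practice the ID would need to be persisted per visitor somehow.

```python
import uuid

# Same collect endpoint as above, but over HTTPS.
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview(tracking_id, hostname, page, title="", client_id=None):
    """Build the form fields for a Measurement Protocol pageview hit.

    client_id should identify a visitor, device, or browser instance.
    If it is omitted, a random UUID is generated (an assumption made
    for this sketch, not what the original code does).
    """
    return {
        "v": "1",                                # API version
        "tid": tracking_id,                      # Tracking ID / property ID
        "cid": client_id or str(uuid.uuid4()),   # Anonymous client identifier
        "t": "pageview",                         # Pageview hit type
        "dh": hostname,                          # Document hostname
        "dp": page,                              # Page
        "dt": title,                             # Title
    }
```

The result could then be sent the same way as before, for example with requests.post(GA_COLLECT_URL, data=build_pageview(GA_TRACKING_ID, SITE_HOSTNAME, page)).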

  • Finally, when a page is requested, this code runs:
@app.route('/')
def static_index():
    track_page(page='/index.html')
    return app.send_static_file('index.html')

# routing for all other paths
@app.route('/<path:path>')
def static_proxy(path):
    track_page(page=path)
    return app.send_static_file(path + '/index.html')

The above serves the index.html page, which has been generated by Hugo.

The first route is the main landing page; the second route handles all other posts. I still need to test whether these can be condensed.
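One way the two routes might be condensed, sketched here without Flask so it can be tried on its own: a single helper that maps any request path, including the empty root path, to the index.html to serve. The helper name static_target is hypothetical. In Flask, one view function can be registered for both '/' (with defaults={'path': ''}) and '/<path:path>', and could then call a helper like this; I have not tested that against this app.

```python
import posixpath


def static_target(path=""):
    """Map a request path to the Hugo-rendered index.html to serve.

    An empty path is the landing page. Leading and trailing slashes are
    dropped so 'post/hello' and 'post/hello/' resolve to the same file.
    """
    cleaned = path.strip("/")
    return posixpath.join(cleaned, "index.html")


print(static_target(""))
print(static_target("post/hello/"))
```

Stripping the slashes first also avoids building a path like 'post/hello//index.html' when the request ends with a slash.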

More information and examples can be found on GitHub.

summary

So there it is. The new blog site, hosted on Google App Engine, with a little twist on a static website deployment. The process for posting is simply to write up a new post using Hugo, test and evaluate it, then deploy it using the gcloud CLI:

gcloud app deploy --quiet

I hope to continue to experiment with Google App Engine. I really like Google’s cloud platform at first glance. There’s a lot of potential, and the free tier is really nice for just experimenting and playing around. Hope you enjoyed this.

-b

links