The past

Back in 2013, when Go 1.2 was on the cusp of release, Dave Cheney benchmarked gccgo against the standard Go compiler, gc. The results were rather disappointing. Code produced by gccgo was much slower than code produced by gc for all but the most CPU-bound workloads.

It’s been five years, and much has changed. It’s high time for these benchmarks to be updated for the Go 1.11 era. Gccgo has learned some new tricks, like escape analysis, but gc has seen a continual stream of improvements, from both the Go team and the community. (Gccgo sees activity from just a handful of folks.)

The present, in summary

To enable comparison with Dave’s results, I ran the same “go1” benchmark suite¹ under Go 1.11. The results are below.
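For readers who want to reproduce this, the go1 suite ships inside the Go source tree, and the go tool’s `-compiler` flag selects which compiler builds the test binary. The commands below are a sketch, not the exact invocation I used; paths and the benchstat comparison step are illustrative.

```shell
# The go1 suite lives in the Go source tree under test/bench/go1.
cd "$GOROOT/test/bench/go1"

# Run the full suite under each compiler, saving the raw results.
go test -bench . -compiler gc    > gc.txt
go test -bench . -compiler gccgo > gccgo.txt

# benchstat (golang.org/x/perf/cmd/benchstat) summarizes the deltas.
benchstat gc.txt gccgo.txt
```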

The graphed quantity is the percent change in each benchmark's time per operation between when the benchmark is compiled with gc 1.11.1 and when it is compiled with gccgo 9.0.0 20190112 (experimental). A positive percentage change (red) indicates that compiling with gccgo yielded slower code, while a negative percentage change (green) indicates that gccgo yielded a speedup.

There’s a lot more red on the board this time around. Gccgo used to eke out a win on a half dozen of the CPU-intensive benchmarks, but in the last five years gc has closed the gap. The only remaining benchmark where gccgo has the upper hand is Fannkuch11, and it’s a very small margin at that.

What’s happened is that, while both compilers are improving, gc is improving faster than gccgo, and so gccgo looks worse in comparison. For proof, we can compare gccgo against itself. Here’s today’s gccgo compared to gccgo 4.9:

Analogous to figure 1, except that the comparison is across benchmarks compiled with gccgo 4.9.0 and gccgo 9.0.0 20190114 (experimental).

This figure paints a much rosier picture. Gccgo is, indeed, improving. Today’s version of gccgo results in double-digit improvements over 2013’s gccgo for most go1 benchmarks. That’s actually quite the achievement, given how resource-constrained gccgo development appears to be.

The present, in detail

But what’s up with those six benchmarks that have gotten slower? It’s not immediately clear whether gccgo is to blame. Since we’re comparing benchmark results across Go versions, we’re not just measuring changes to the compiler; we’re also measuring changes to the runtime and standard library. The performance degradation in HTTPClientServer, for example, could just as easily be the result of a change to the net/http package as a change to the gccgo compiler internals.

In fact, it’s nearly impossible to isolate just the compiler improvements, as each Go compiler is tightly coupled to its contemporaneous runtime and standard library. But we can at least get a fuller picture by comparing the evolution of gccgo performance to the evolution of gc performance. I want to take a look at two representative examples.

Each graph shows how performance on one benchmark has evolved over the last five years. The blue trend line graphs gccgo performance, while the orange trend line graphs gc performance. A positive slope indicates that the benchmark has gotten slower over time with that compiler, while a negative slope indicates that the benchmark has gotten faster over time with that compiler. An intersection of trend lines indicates that the relative performance of the compilers has reversed.

It turns out that the first benchmark, HTTPClientServer, has gotten slower with gc, too. As unfortunate as this is, it’s reassuring evidence that there is nothing particularly wrong with gccgo. I suspect that a bug, or a series of bugs, was discovered in the runtime or net/http whose solution(s) forced a performance regression.² Performance regressions in the standard Go toolchain do not go unnoticed for long, so it is likely that this regression was intentional.

The results for the BinaryTree17 benchmark, on the other hand, are downright strange. Gc managed a nearly 25% speedup on this workload, while gccgo yielded a 50% slowdown over the same time frame. There is clearly something in gccgo to investigate here, especially considering that the benchmark makes use of no standard library features (proof). Since the benchmark does depend heavily on the garbage collector and memory allocator, I suspect that something’s gone amiss in gccgo’s runtime.

I’ve begun gathering a list of the known performance bottlenecks as a starting point for investigation of these performance problems. If you know of additional bottlenecks, or know of code that behaves particularly pathologically with gccgo, please chime in!

The upshot

In 2019, performance is still a sore spot for gccgo. Gc yields faster code than gccgo on nearly every workload. Unless you’re compiling for an esoteric platform that gc doesn’t support, or you need faster interop between Go and C than cgo provides, there is little reason (yet!) to choose gccgo over the standard Go toolchain.

Raw benchmarking data for all the figures in this post, as well as reproduction instructions, are available as a GitHub Gist.

  1. I’m not entirely clear on what the Go team uses the go1 benchmark suite for, but the commit that introduced the suite (6e88755) claims the “intent is to have mostly end-to-end benchmarks timing real world operations,” which is exactly what we’re after. 

  2. For a good example of how bug fixes can force performance regressions, take a look at golang/go#18964. Note that this particular issue could only be responsible for about 1% of the total regression observed in the HTTPClientServer benchmark.