GPU Benchmarking Methods Investigated: Fact vs. Fiction
Benchmarks. Every website worth their marbles uses them to varying degrees of accuracy. Meanwhile, every reader wants to recreate them in some way, shape or form in order to do exactly what their favorite publications are doing: evaluate the performance of their hardware choices and quantify the value of their purchase. Benchmarks can also help diagnose a problem, but more often than not websites like Hardware Canucks use these tools to determine how well a given product performs against the competition. As with all things, the number of programs we can obtain results with is nearly limitless, but it is the job of publications to choose the right set of tools, ones which will accurately convey results to the masses. Unfortunately, as we will show you in this article, choosing the right programs and test sequences is extremely difficult and most of the current methods are inaccurate.
The reason we have chosen to focus on GPU benchmarking is that this really is the wild west of the online review industry. A fortune in traffic can be had if GPU reviews are published regularly, but with those potential traffic increases comes the temptation to cut corners in order to complete the time-consuming benchmarking portion as quickly as possible. Naturally, some time-saving methods will still produce accurate results while others won't.
In a general canvassing of over two dozen English-speaking tech websites we found a wide swath of benchmarks being used: from timedemos to stand-alone programs to in-game benchmarks to manual walkthroughs. What we also saw at times was a general lack of information beyond a game's title regarding the actual type of benchmark used. For the most part, many websites seemed to be using in-game benchmarking tools (mostly "rolling" demos) instead of actual gameplay and coming up with some interesting results. This, along with comments in several forums, got us wondering: is there a "right" way to benchmark a particular game? In addition, do these in-game or stand-alone benchmarking programs (like the recently released AvP DX11 test) represent in-game performance? If not, do they even provide an accurate enough analysis for a writer to formulate a conclusion about a given product? Well, we're about to find out.
In this article we are going to take nine of the games most commonly used by websites for GPU reviews and give you a rundown of their performance both in-game and otherwise. In most cases we will be highlighting the usefulness of either stand-alone or in-game benchmarks simply because they are easily accessible to reviewers and the general public alike. There will also be some discussion about how timedemos, sample lengths and patches can affect results.
We will be using a GTX 470 and an HD 5850 for these tests in order to determine whether different benchmarking methods affect the relative positioning of each product. Every game was played through from start to finish (yes, this article has been a long time in the making) and we have determined a worst-case sequence as well as a more "typical" scene on which we will base our real-world numbers. For comparison purposes we will also be testing the additional benchmarking features each of these games comes with.
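For readers who want to see how figures like these are typically derived, here is a minimal sketch of turning a frame-time log into the average and minimum framerates usually quoted in reviews. It assumes a FRAPS-style "frametimes" CSV (frame number plus a cumulative timestamp in milliseconds); the file name and column layout are assumptions for illustration only, not a description of our actual toolchain.

```python
# Minimal sketch (not our actual toolchain): derive average and minimum FPS
# from a FRAPS-style frame-time log. Assumed format: a header row, then one
# line per frame with "frame number, cumulative time in milliseconds".
import csv

def summarize_run(path="frametimes.csv"):
    timestamps = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                      # skip the assumed header row
        for frame, ms in reader:
            timestamps.append(float(ms))

    # Per-frame render times are the differences between successive timestamps.
    frame_times = [b - a for a, b in zip(timestamps, timestamps[1:])]

    total_seconds = (timestamps[-1] - timestamps[0]) / 1000.0
    avg_fps = len(frame_times) / total_seconds
    min_fps = 1000.0 / max(frame_times)   # slowest single frame -> minimum FPS

    return avg_fps, min_fps

if __name__ == "__main__":
    avg_fps, min_fps = summarize_run()
    print(f"Average: {avg_fps:.1f} FPS, Minimum: {min_fps:.1f} FPS")
```

The key point the sketch illustrates is that average framerate alone can hide the stutters a player actually feels, which is exactly why worst-case sequences matter as much as "typical" ones.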
Before we go on, it is important to preface this article with one statement: we aren’t looking to point fingers in any way, shape or form. Our aim is to give readers enough information so they can determine which results are accurate and which are not.
Our thanks to Tom's Hardware, AnandTech and PCGamesHardware for helping out with validating the results and methodologies for this article.
