TechSense : Nintendo Switch GPU

8 Jan 2017

This is going to be a post about the design choices Nintendo have made and why, including why the rumors of extra hardware in the dock to improve the GPU don’t make sense. I wanted to write it because there seems to be a lot of confusion about how these things work, so I’m going to try to explain it.

Starting off with the basics, if you want to make a digital device more powerful you’ve got two options:

  1. Clock it faster.
  2. Have more of them.

So let’s look at both options.

Clock it faster

Pretty much all forms of logic draw very little current when they’re doing nothing, but this rockets up whenever they change state. So if you want a circuit to update twice as often, doing double the amount of work in the same amount of time, it’s going to use roughly twice the current.

Now, while we think of logic in terms of well-defined 0s and 1s, in reality each is a range of voltages. Below V_Low is a 0 and above V_High is a 1; anything in between is ‘undefined’, which means ‘we have no idea what it’s going to do, so stay out of this region’.

The problem is, as you push the logic faster it generates more noise, which increases the chance you’ll start to stray into this undefined region. You can get around this by increasing the voltage, which makes the ranges wider, giving you more wiggle room.

Power (measured in Watts) is the voltage across a circuit multiplied by the current it draws. So as we increase the clock rate the current goes up, and we also have to increase the voltage to keep everything happy.

This means that power tends to go up with roughly the square of the clock speed, so a chip running at twice the speed will probably need around 4 times the power.
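
To make the numbers concrete, here’s a tiny back-of-the-envelope sketch in C++ of the ‘power goes up with the square of the clock’ rule of thumb. The 3W baseline is purely illustrative, not a measured figure.

    #include <cstdio>

    // Rough rule of thumb from above: current scales with clock, voltage has to
    // rise roughly in step, so power goes up with roughly the square of the clock.
    double estimated_power(double base_watts, double clock_multiplier) {
        return base_watts * clock_multiplier * clock_multiplier;
    }

    int main() {
        const double base_watts = 3.0;  // hypothetical baseline, not a measured figure
        for (double mult : {1.0, 1.5, 2.0}) {
            std::printf("%.1fx clock -> ~%.2f W\n", mult, estimated_power(base_watts, mult));
        }
        return 0;
    }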

Now, if you’re plugged in, supplying this power isn’t a problem, but you still need to get the heat generated out of your device or it’s going to get so hot it fails. This generally means bigger heat-sinks, which make the device larger and heavier, and/or fans.

So what about having more of them?

More logic?

Another approach would be to make the chip twice the size, or use two of them in parallel. This is going to cost twice as much to produce, but will only use twice the power, so half as much as the clock-doubling option.

The downside is the programmers now have to get their code running on both chips at the same time, sharing the work between the two devices without them stepping on each other’s toes.

Now with CPUs, which are programmed per core, this is all up to the developer, as there’s no reliable way of doing it automatically. There are vectorizing compilers and parallel libraries which try to make this easier, but they still require you to make your code more complicated, and some problems just can’t be split up like this.
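
To give a flavour of that burden, here’s about the simplest possible (hypothetical) example of splitting work across two CPU cores by hand: even for something as trivial as summing an array, the developer has to decide how to split the data, launch the thread and combine the results.

    #include <cstdio>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> data(1000000, 1);
        long long first_half = 0, second_half = 0;

        // One core sums the first half of the array...
        std::thread worker([&] {
            first_half = std::accumulate(data.begin(), data.begin() + data.size() / 2, 0LL);
        });
        // ...while the main thread sums the second half at the same time.
        second_half = std::accumulate(data.begin() + data.size() / 2, data.end(), 0LL);
        worker.join();

        std::printf("total = %lld\n", first_half + second_half);
        return 0;
    }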

On the other hand, GPUs are much simpler to deal with as they just need to do the same thing for every pixel on the screen. If you’ve got 4 GPU cores you could give each one a quarter of the screen, or maybe you break it into smaller tiles and queue them up, handing a new tile to each core as soon as it finishes its last one (as some tiles are simpler to draw than others).
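
Here’s a small sketch of that ‘queue of tiles’ idea, using plain CPU threads to stand in for GPU cores (purely illustrative, not how any real driver works): a shared counter hands the next tile to whichever worker asks first, so cores that get easy tiles don’t sit idle.

    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        const int tiles_x = 8, tiles_y = 8;   // 64 tiles covering the screen
        std::atomic<int> next_tile{0};

        auto worker = [&](int core_id) {
            int tile;
            while ((tile = next_tile.fetch_add(1)) < tiles_x * tiles_y) {
                int x = tile % tiles_x, y = tile / tiles_x;
                // A real renderer would shade the pixels inside this tile here.
                std::printf("core %d drew tile (%d,%d)\n", core_id, x, y);
            }
        };

        std::vector<std::thread> cores;
        for (int i = 0; i < 4; ++i) cores.emplace_back(worker, i);
        for (auto& core : cores) core.join();
        return 0;
    }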

In other words, the work we give GPUs is easy to split up into as many pieces as you want, without the developers having to do anything weird, so making a game scale with different GPU performance is much easier than making it scale as the CPU power changes.

How GPUs are used

Just in case you’ve never done any 3D graphics, here’s roughly how you draw stuff.

Drawing a model on the screen involves a few steps (assuming we’re not doing anything fancy like deferred rendering):

  1. Upload the textures
  2. Upload the vertex and index data (this describes the points that make up the polygons and which ones are connected to each other)
  3. Set the textures we’re using
  4. Set a transformation matrix
  5. Ask the GPU to draw something using the vertex and index buffers

The first two steps don’t need to be done every frame, as stuff tends to get re-used so it just stays on the GPU until we get rid of it.

The transformation matrix exists so we can reuse a model without having to change the vertex data every time. It’s a set of translation (movement), rotation and scaling that gets applied to every vertex in the next drawing command. While this sounds like a lot of extra work, GPUs are designed to do it, so it doesn’t slow them down. The benefit is you can draw the same model at any location, rotation and scale without changing the vertex/index buffers.
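
To make those five steps a bit more concrete, here’s roughly what they look like in plain OpenGL. It’s only a sketch, not a complete program: it assumes a GL context, a compiled shader and the raw image/vertex data already exist, it skips the vertex attribute setup, and the names (make_texture, make_mesh, draw_model) are mine rather than from any particular engine.

    #include <GL/glew.h>   // or whichever GL loader you prefer

    // Step 1: upload a texture once; the handle stays valid until we delete it.
    GLuint make_texture(int width, int height, const unsigned char* rgba_pixels) {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, rgba_pixels);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        return tex;
    }

    // Step 2: upload the vertex and index buffers once.
    struct Mesh { GLuint vbo, ibo; GLsizei index_count; };

    Mesh make_mesh(const float* verts, GLsizeiptr vert_bytes,
                   const unsigned short* indices, GLsizei index_count) {
        Mesh m;
        m.index_count = index_count;
        glGenBuffers(1, &m.vbo);
        glBindBuffer(GL_ARRAY_BUFFER, m.vbo);
        glBufferData(GL_ARRAY_BUFFER, vert_bytes, verts, GL_STATIC_DRAW);
        glGenBuffers(1, &m.ibo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m.ibo);
        glBufferData(GL_ELEMENT_ARRAY_BUFFER, index_count * sizeof(unsigned short),
                     indices, GL_STATIC_DRAW);
        return m;
    }

    // Steps 3 to 5: done for every model, every frame. Vertex attribute setup is
    // omitted for brevity; note that nothing here mentions the screen resolution.
    void draw_model(const Mesh& m, GLuint texture, GLint transform_uniform,
                    const float* model_matrix /* 4x4, column major */) {
        glBindTexture(GL_TEXTURE_2D, texture);                             // step 3
        glUniformMatrix4fv(transform_uniform, 1, GL_FALSE, model_matrix);  // step 4
        glBindBuffer(GL_ARRAY_BUFFER, m.vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m.ibo);
        glDrawElements(GL_TRIANGLES, m.index_count, GL_UNSIGNED_SHORT, 0); // step 5
    }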

Now, notice that the last three steps, which are done for every model you want to draw, don’t care about the object’s resolution, its textures or the screen size. Once it’s on the GPU it’s just ‘draw this, with this transform’. Your only limit is how many times you can tell the GPU to draw per second, as there’s a bit of setup that needs to be done each time. This is what all the fuss about reducing ‘draw call overhead’ with DirectX 12, Vulkan and the bindless tricks you can do in OpenGL is about. These don’t let you draw more complex models, but they let you draw more of them, giving a game world more detail.

All this means that the game engine doesn’t care if the GPU is drawing 50 models with 1K polygons each at 720p (50K polys, 0.9M pixels) or 50 models with 10K polygons each at 4K (500K polys, ~8.3M pixels); either way it’s just 50 draw calls.
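
Carrying on from the sketch above, the per-frame loop for that scene is nothing more than 50 calls to draw_model, whichever resolution the GPU happens to be rendering at (again, Object is just a hypothetical struct for illustration).

    #include <vector>

    // Hypothetical per-object data, reusing Mesh and draw_model from the sketch above.
    struct Object {
        Mesh   mesh;
        GLuint texture;
        float  model_matrix[16];   // position/rotation/scale of this instance
    };

    // 50 objects means 50 draw calls; the CPU-side cost is the same at any resolution.
    void render_scene(const std::vector<Object>& objects, GLint transform_uniform) {
        for (const Object& obj : objects) {
            draw_model(obj.mesh, obj.texture, transform_uniform, obj.model_matrix);
        }
    }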

The Nintendo Switch

Now that you understand all of this you can probably see why Nintendo made the choices they did with the Switch.

Adjusting the CPU speed depending on the dock state would cause a lot of issues for the programmers, so they keep it fixed no matter what the power source. This limits the game world’s complexity, but it’s far better to do that than risk weird bugs occurring when running at different speeds. Just because the game runs without any bugs at 60FPS doesn’t mean it’ll be fine if it’s only running at 30FPS, so making this constant should keep QA simpler. Physics systems especially can start acting really weird if their update rate drops too low.
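
As a toy illustration of that last point (the numbers are made up and nothing to do with any real engine), here’s a naive Euler-integrated spring stepped at 60FPS and at 30FPS. The ‘game logic’ is identical in both runs, but the results drift apart as the timestep grows, which is exactly the kind of thing that turns into weird bugs.

    #include <cstdio>

    // Naive Euler integration of a simple spring. The coarser the timestep,
    // the further the simulation drifts away from the true behaviour.
    double simulate(double dt, double seconds) {
        double pos = 1.0, vel = 0.0;
        const double stiffness = 120.0;   // arbitrary spring constant
        for (double t = 0.0; t < seconds; t += dt) {
            vel += -stiffness * pos * dt;
            pos += vel * dt;
        }
        return pos;
    }

    int main() {
        std::printf("60 FPS: %f\n", simulate(1.0 / 60.0, 5.0));
        std::printf("30 FPS: %f\n", simulate(1.0 / 30.0, 5.0));  // drifts much further
        return 0;
    }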

On the other hand, if they can scale the GPU’s performance in line with the screen resolution it’s driving, they can make things a lot simpler for developers. 720p is 0.9M pixels, while 1080p is 2.07M. So, if you can make the GPU at least 2.25 times as powerful when docked it should draw at the same frame-rate, even though it’s targeting a larger screen.
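
That 2.25 figure is just the ratio of the two pixel counts:

    #include <cstdio>

    int main() {
        const double handheld = 1280.0 * 720.0;    //   921,600 pixels (720p)
        const double docked   = 1920.0 * 1080.0;   // 2,073,600 pixels (1080p)
        std::printf("docked / handheld = %.2fx\n", docked / handheld);  // 2.25x
        return 0;
    }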

You would have to be careful about using too much memory bandwidth, but the simplest way of doing this would be to target 1080p first, which will have the higher requirement, and then check it looks good at the lower resolution.

As for power, if it’s a 3W GPU when mobile that’s just under 12W when docked, which wouldn’t be a problem in a device of this size with a small fan.

Wrapping up

Hopefully this should explain why the rumors of extra GPU power in the dock didn’t make any sense. Having it would just make everything more complicated and expensive without any need.

One of the major selling points of the Switch is going to be that you can play the same games on whichever screen you want, whether it’s out of the house or just in another room because the television is being used by someone else. Doing anything that makes this harder for developers to support risks making one of the headline features a gimmick that doesn’t work all the time.

Whether we’re going to see many AAA games ported over from the PS4/XO I’m not sure. Assuming they’ve got the bandwidth from the cartridges, the 4GB of RAM wouldn’t be a problem, but I think the CPU’s performance isn’t going to compare very well, which will be more of an issue than the GPU (for the reasons outlined earlier).

That said, I don’t think Nintendo really need Call of Duty/Battlefield on the Switch for it to be a success. With Unity and Unreal Engine both claiming support, the indie titles that use them shouldn’t have too much of a problem getting stuff across, and with those supplementing Nintendo’s own releases there shouldn’t be a shortage of games to play.

Now we’ve just got to wait till the 13th and hope the rumored price is about right. If so, I can see myself pre-ordering one, especially if the rumored Splatoon bundle turns out to be true.
