Optimising Unity games for Mobile

Optimise for CPU and GPU


The CPU is often limited by the number of batches that need to be rendered. “Batching” is where the engine combines the rendering of multiple objects into a single chunk of memory to reduce the CPU overhead caused by switching resources.

To draw an object on the screen, the engine has to issue a draw call to the graphics API (e.g. OpenGL or Direct3D). Draw calls are often expensive, with the graphics API doing significant work for every draw call, causing performance overhead on the CPU side. This is mostly caused by the state changes done between the draw calls (e.g. switching to a different material), which causes expensive validation and translation steps in the graphics driver.

Basically, draw calls are the commands that tell the GPU to render a certain set of vertices as triangles with a certain state (shaders, blend state and so on). Note that draw calls aren’t necessarily expensive in themselves. In older versions of Direct3D, many calls required a context switch, which was expensive, but this isn’t true in newer versions. The main reason to make fewer draw calls is that graphics hardware can transform and render triangles much faster than you can submit them. If you submit only a few triangles with each call, you will be completely bound by the CPU while the GPU sits mostly idle, because the CPU can’t feed it fast enough. A single draw call with two triangles is cheap, but if every call carries that little data, you run out of CPU time long before you have submitted as much geometry as the GPU could have rendered.
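To make that argument concrete, here is a back-of-the-envelope model in Python. The per-call cost and the CPU budget are hypothetical numbers chosen only to show the scaling, not measurements from any device: once CPU time caps the number of draw calls per frame, the geometry reaching the GPU grows only with the triangles packed into each call.

```python
# Back-of-the-envelope model of CPU-bound draw call submission.
# All costs are hypothetical, chosen only to illustrate the scaling.
SUBMIT_BUDGET_US = 8_000   # assumed CPU time per frame spent submitting draw calls
US_PER_DRAW_CALL = 50      # assumed average CPU cost of one draw call


def max_draw_calls(budget_us: int, us_per_call: int) -> int:
    """How many draw calls fit into the CPU time budget."""
    return budget_us // us_per_call


def triangles_per_frame(calls: int, triangles_per_call: int) -> int:
    """How much geometry the CPU can hand to the GPU in one frame."""
    return calls * triangles_per_call


calls = max_draw_calls(SUBMIT_BUDGET_US, US_PER_DRAW_CALL)  # 160 calls
for tris in (2, 100, 1000):
    print(f"{tris:>4} triangles/call -> {triangles_per_frame(calls, tris):>6} triangles/frame")
```

With the call count fixed by CPU time, the only way to push more geometry is to put more triangles into each call, which is what batching does.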

Draw calls do have some real costs. Each one requires setting up a bunch of state (which set of vertices to use, which shader to use and so on), and state changes cost time both on the hardware side (updating a bunch of registers) and on the driver side (validating and translating the calls that set state).

Unity uses static batching and dynamic batching to address this.

  • Static Batching: combine static (i.e. non-moving) objects into big meshes and render them more efficiently.

Internally, static batching works by transforming the static objects into world space and building a big vertex + index buffer for them. Then for visible objects in the same batch, a series of “cheap” draw calls are done, with almost no state changes in between. So technically it does not save “3D API draw calls”, but it saves on state changes done between them (which is the expensive part).
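The process described above can be sketched in a few lines of Python. This is an illustration of the idea, not Unity's implementation (which is internal to the engine); it assumes translation-only transforms, and the names `StaticObject`, `build_static_batch` and `draw_visible` are hypothetical. Vertices are moved into world space once, appended to one shared vertex buffer, and each object's indices are rebased so that a visible object becomes just an offset and count into the same index buffer.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class StaticObject:
    name: str
    vertices: list[Vec3]   # local-space positions
    indices: list[int]     # triangle list indexing into `vertices`
    position: Vec3         # world position (translation only, to keep the sketch short)


@dataclass
class DrawRange:
    name: str
    first_index: int       # where this object's triangles start in the shared index buffer
    index_count: int


def build_static_batch(objects: list[StaticObject]):
    """Bake objects into one world-space vertex buffer and one index buffer."""
    vertex_buffer: list[Vec3] = []
    index_buffer: list[int] = []
    ranges: list[DrawRange] = []
    for obj in objects:
        base_vertex = len(vertex_buffer)
        px, py, pz = obj.position
        # Transform to world space once, at build time, instead of every frame.
        vertex_buffer.extend((x + px, y + py, z + pz) for x, y, z in obj.vertices)
        ranges.append(DrawRange(obj.name, len(index_buffer), len(obj.indices)))
        # Rebase indices so they point into the shared vertex buffer.
        index_buffer.extend(i + base_vertex for i in obj.indices)
    return vertex_buffer, index_buffer, ranges


def draw_visible(ranges: list[DrawRange], visible: set[str]) -> list[tuple[int, int]]:
    """One cheap draw per visible object; the buffers stay bound, so no state changes."""
    return [(r.first_index, r.index_count) for r in ranges if r.name in visible]


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
vb, ib, ranges = build_static_batch([
    StaticObject("rock", triangle, [0, 1, 2], (0.0, 0.0, 0.0)),
    StaticObject("wall", triangle, [0, 1, 2], (5.0, 0.0, 0.0)),
])
print(draw_visible(ranges, {"wall"}))  # [(3, 3)]
```

Because every object keeps its own index range, culled objects are simply skipped, which is why static batching still allows per-object culling.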

  • Dynamic Batching: for small enough meshes, transform their vertices on the CPU, group many similar ones together, and draw in one go.
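Dynamic batching can be sketched the same way, assuming translation-only transforms again. The vertex threshold below is a placeholder for "small enough" (Unity documents its own limit, which this sketch does not try to reproduce), and `DynamicObject` and `dynamic_batches` are hypothetical names. Small meshes are grouped by material and their vertices are transformed on the CPU every frame into one merged list per material; larger meshes are left to be drawn on their own.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_BATCH_VERTICES = 300  # assumed "small enough" threshold, for illustration only


@dataclass
class DynamicObject:
    material: str
    vertices: list[tuple[float, float, float]]  # local-space positions
    position: tuple[float, float, float]        # current world position (translation only)


def dynamic_batches(objects: list[DynamicObject]):
    """Split objects into merged per-material batches and objects drawn on their own."""
    groups: dict[str, list[DynamicObject]] = defaultdict(list)
    unbatched: list[DynamicObject] = []
    for obj in objects:
        if len(obj.vertices) <= MAX_BATCH_VERTICES:
            groups[obj.material].append(obj)
        else:
            unbatched.append(obj)  # too big: CPU transform would cost more than it saves

    batches = []
    for material, members in groups.items():
        merged = []
        for obj in members:
            px, py, pz = obj.position
            # Done on the CPU every frame, because these objects can move.
            merged.extend((x + px, y + py, z + pz) for x, y, z in obj.vertices)
        batches.append((material, merged))  # one draw call per material
    return batches, unbatched


quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
scene = [DynamicObject("crate", quad, (0.0, 0.0, 0.0)),
         DynamicObject("crate", quad, (10.0, 0.0, 0.0)),
         DynamicObject("barrel", quad, (0.0, 0.0, 0.0))]
batches, unbatched = dynamic_batches(scene)
print([(material, len(verts)) for material, verts in batches], len(unbatched))
```

The per-frame CPU transform is the overhead the text mentions, which is why only small meshes are worth batching this way.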

Built-in batching has several benefits compared to manually merging objects together (most notably, the objects can still be culled individually). But it also has some downsides (static batching incurs memory and storage overhead, and dynamic batching incurs some CPU overhead). Only objects sharing the same material can be batched together, so to achieve good batching you need to share as many materials among different objects as possible.

If you have two materials that are identical except for their textures, you can combine those textures into a single big texture, a process often called texture atlasing. Once the textures are in the same atlas, you can use a single material instead.
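Once textures share an atlas, each mesh's UVs have to point at its own region of the atlas. A minimal sketch of that remapping, assuming the packing tool reports each texture's rectangle in pixels with y measured from the bottom edge (matching a bottom-left UV origin); `AtlasRect` and `remap_uv` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtlasRect:
    """Where one source texture landed inside the atlas, in pixels.
    Assumes y is measured from the bottom edge, matching a bottom-left UV origin."""
    x: int
    y: int
    width: int
    height: int


def remap_uv(uv: tuple[float, float], rect: AtlasRect,
             atlas_width: int, atlas_height: int) -> tuple[float, float]:
    """Map a UV in the original texture (0..1) to the matching UV in the atlas."""
    u, v = uv
    return ((rect.x + u * rect.width) / atlas_width,
            (rect.y + v * rect.height) / atlas_height)


# A 512x512 texture packed into the bottom-right quarter of a 1024x1024 atlas.
rect = AtlasRect(x=512, y=0, width=512, height=512)
print(remap_uv((0.5, 0.5), rect, 1024, 1024))  # (0.75, 0.25)
```

A texture that relied on tiling (UVs outside 0..1) cannot simply be remapped this way, because the repeat would sample neighbouring textures in the atlas.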

Currently, only Mesh Renderers are batched. This means that skinned meshes, cloth, trail renderers and other types of rendering components are not batched.

Semitransparent shaders most often require objects to be rendered in back-to-front order for transparency to work. Unity first sorts objects in this order and then tries to batch them, but because the order must be strictly preserved, this often means less batching can be achieved than with opaque objects.
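A simplified model shows why strict ordering hurts. It assumes only consecutive draws with the same material can share a batch, and treats opaque objects as freely reorderable (a simplification of what the engine actually does); the function names are hypothetical. Two materials interleaved by depth produce a batch break at every object.

```python
from itertools import groupby


def batches_in_draw_order(materials):
    """Only consecutive draws with the same material can share a batch."""
    return sum(1 for _ in groupby(materials))


def transparent_batch_count(objects):
    """objects: (material, distance_to_camera); drawn strictly back to front."""
    ordered = sorted(objects, key=lambda obj: obj[1], reverse=True)
    return batches_in_draw_order(material for material, _ in ordered)


def opaque_batch_count(objects):
    """Opaque objects may be reordered, so (ideally) one batch per material."""
    return len({material for material, _ in objects})


scene = [("glass", 10.0), ("smoke", 9.0), ("glass", 8.0), ("smoke", 7.0)]
print(transparent_batch_count(scene), opaque_batch_count(scene))  # 4 2
```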

Manually combining objects that are close to each other can be a very good alternative to draw call batching. For example, it often makes sense to combine a static cupboard with lots of drawers into a single mesh, either in a 3D modeling application or using Mesh.CombineMeshes.



The GPU is often limited by fill rate or memory bandwidth. If running the game at a lower display resolution makes it faster, then you’re most likely limited by fill rate on the GPU. Fill rate refers to the number of pixels that a video card can render or write to memory every second. It is measured in megapixels or gigapixels per second, and its theoretical peak is obtained by multiplying the clock frequency of the graphics processing unit (GPU) by the number of raster operation units (ROPs).
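The arithmetic above, and why a lower resolution helps, can be written out directly. The GPU figures (500 MHz, 4 ROPs) and the overdraw factor of 3 are hypothetical, and the peak is only a ceiling: blending, shader cost and memory bandwidth keep real throughput well below it.

```python
def theoretical_fill_rate(clock_hz: int, rops: int) -> int:
    """Peak pixels per second: GPU clock frequency multiplied by the number of ROPs."""
    return clock_hz * rops


def required_fill_rate(width: int, height: int, fps: int, overdraw: int) -> int:
    """Pixels per second the game needs: each screen pixel, every frame, times average overdraw."""
    return width * height * fps * overdraw


# Hypothetical GPU: 500 MHz with 4 ROPs.
peak = theoretical_fill_rate(500_000_000, 4)
full_res = required_fill_rate(1920, 1080, 60, 3)
half_res = required_fill_rate(960, 540, 60, 3)
print(peak, full_res, half_res)  # 2000000000 373248000 93312000
```

Halving both dimensions of the resolution cuts the pixel work to a quarter, so a game that speeds up noticeably at lower resolution is spending its time on per-pixel work.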


Textures – Texture Size, Compression, Atlases and MipMaps

Optimal Texture Type – PNG is the lesser of many evils. Unlike JPEG, which uses lossy compression, PNG compresses images losslessly. It doesn’t handle alpha as well as TGA does, but its compression and alpha support are good enough to make it a better choice than the other file types.

Texture Compression – Use ETC texture compression where you can; note, however, that ETC (ETC1) doesn’t support alpha channels. For textures that need alpha, go with uncompressed.
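To see what the alpha trade-off costs, here is a small calculator for the memory used by the top mip level of a texture. The format labels are plain strings for illustration (they mirror common format names, not a specific API); ETC1 packs each 4x4 block of pixels into 8 bytes, i.e. 4 bits per pixel, with no alpha.

```python
# Bytes per pixel for some uncompressed formats (32-bit, 24-bit and 16-bit layouts).
BYTES_PER_PIXEL = {"RGBA32": 4, "RGB24": 3, "RGBA4444": 2, "RGB565": 2}


def texture_bytes(width: int, height: int, fmt: str) -> int:
    """Memory used by the top mip level of a texture in the given format."""
    if fmt == "ETC1":
        # ETC1 stores 4x4 pixel blocks in 8 bytes each (4 bits per pixel), RGB only.
        blocks = ((width + 3) // 4) * ((height + 3) // 4)
        return blocks * 8
    return width * height * BYTES_PER_PIXEL[fmt]


for fmt in ("RGBA32", "RGBA4444", "ETC1"):
    print(fmt, texture_bytes(1024, 1024, fmt) // 1024, "KiB")
```

A 1024x1024 texture that needs alpha costs eight times as much memory in 32-bit RGBA as its ETC1 counterpart; a 16-bit format such as RGBA4444 halves that at the price of fewer bits per channel.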

You should always have mipmaps enabled if you’re using 3D. Without them you get awful artifacts when the camera moves, and with them rendering runs faster, since distant objects sample from smaller versions of the texture, which reduces texture memory bandwidth. Other than looking slightly blurry compared to not having mipmaps, there shouldn’t be any downsides, and the slight blurriness is more than compensated for by the lack of flickering texture artifacts. You can use trilinear filtering so that the transition between mipmap levels is smooth. If you see serious degradation with mipmaps, that’s not normal.
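The sketch below lists the levels of a mip chain and picks a level from how many texels fall on one screen pixel along each axis. The level choice is a simplification (GPUs derive it from screen-space UV derivatives), and `mip_chain` and `mip_level_for` are hypothetical names; the point is that a distant object reads from a much smaller texture.

```python
import math


def mip_chain(width: int, height: int) -> list[tuple[int, int]]:
    """Sizes of every mip level, halving each dimension (never below 1) down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels


def mip_level_for(texels_per_pixel: float, level_count: int) -> int:
    """Simplified level pick: log2 of how many texels land on one screen pixel."""
    if texels_per_pixel <= 1.0:
        return 0  # magnified or 1:1, so use the full-size texture
    return min(level_count - 1, int(math.log2(texels_per_pixel)))


levels = mip_chain(1024, 1024)
# A distant object where 8 texels map onto each screen pixel along each axis.
print(len(levels), levels[mip_level_for(8.0, len(levels))])  # 11 (128, 128)
```

Trilinear filtering blends between the two nearest levels instead of snapping to one, which is what hides the transitions.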

To create texture atlases, use the standalone version of the Texture Packer tool.




Models – Triangle Count and UV Map

  • Don’t use any more triangles than necessary
  • Try to keep the number of UV mapping seams and hard edges (doubled-up vertices) as low as possible

You should use only a single skinned mesh renderer for each character. Unity optimizes animation using visibility culling and bounding volume updates and these optimizations are only activated if you use one animation component and one skinned mesh renderer in conjunction. The rendering time for a model could roughly double as a result of using two skinned meshes in place of a single mesh and there is seldom any practical advantage in using multiple meshes.

When animating, use as few bones as possible

A bone hierarchy in a typical desktop game uses somewhere between fifteen and sixty bones. The fewer bones you use, the better the performance will be. You can achieve very good quality on desktop platforms and fairly good quality on mobile platforms with about thirty bones. Ideally, keep the number below thirty for mobile devices and don’t go too far above thirty for desktop games.

Use as few materials as possible

You should also keep the number of materials on each mesh as low as possible. The only reason why you might want to have more than one material on a character is that you need to use different shaders for different parts (e.g. a special shader for the eyes). However, two or three materials per character should be sufficient in almost all cases.


Culling and LOD (Level of Detail)

Occlusion Culling is a feature that disables rendering of objects when they are not currently seen by the camera because they are obscured (occluded) by other objects. This does not happen automatically in 3D computer graphics since most of the time objects farthest away from the camera are drawn first and closer objects are drawn over the top of them (this is called “overdraw”). Occlusion Culling is different from Frustum Culling. Frustum Culling only disables the renderers for objects that are outside the camera’s viewing area but does not disable anything hidden from view by overdraw. Note that when you use Occlusion Culling you will still benefit from Frustum Culling.
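Frustum culling itself is a cheap geometric test. A common form checks an object's bounding sphere against the frustum planes; the sketch below assumes planes stored as n·p + d = 0 with normals pointing inward, and uses only a near and a far plane to stay short (`Plane` and `sphere_in_frustum` are hypothetical names).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plane:
    """The plane n.p + d = 0, with the normal n pointing into the frustum."""
    nx: float
    ny: float
    nz: float
    d: float

    def signed_distance(self, p: tuple[float, float, float]) -> float:
        return self.nx * p[0] + self.ny * p[1] + self.nz * p[2] + self.d


def sphere_in_frustum(center, radius, planes) -> bool:
    """Keep the object unless its bounding sphere lies entirely outside some plane."""
    return all(plane.signed_distance(center) >= -radius for plane in planes)


# Only the near (z = 1) and far (z = 10) planes, to keep the example small.
planes = [Plane(0.0, 0.0, 1.0, -1.0), Plane(0.0, 0.0, -1.0, 10.0)]
print(sphere_in_frustum((0.0, 0.0, 5.0), 1.0, planes))   # True: inside
print(sphere_in_frustum((0.0, 0.0, -3.0), 1.0, planes))  # False: behind the near plane
```

A sphere hidden behind a wall but inside the planes still passes this test, which is exactly the case occlusion culling handles.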

The occlusion culling process will go through the scene using a virtual camera to build a hierarchy of potentially visible sets of objects. This data is used at runtime by each camera to identify what is visible and what is not. Equipped with this information, Unity will ensure only visible objects get sent to be rendered. This reduces the number of draw calls and increases the performance of the game.
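At runtime the baked data boils down to a lookup: find where the camera is, fetch the set of objects potentially visible from there, and frustum-cull that set. The sketch replaces Unity's hierarchy with a flat grid of cells on the XZ plane, and the cell size, the baked dictionary and the fall-back to all objects outside the baked area are assumptions for illustration; `cell_of` and `objects_to_render` are hypothetical names.

```python
# Hypothetical baked data: a flat grid of cells (a simplification of the hierarchy
# Unity builds), each mapped to the objects potentially visible from that cell.
CELL_SIZE = 10.0


def cell_of(position: tuple[float, float, float]) -> tuple[int, int]:
    """Which grid cell (on the XZ plane) a position falls into."""
    x, _, z = position
    return (int(x // CELL_SIZE), int(z // CELL_SIZE))


def objects_to_render(camera_position, pvs, all_objects, in_frustum):
    """Start from the potentially visible set for the camera's cell, then apply
    frustum culling on top (occlusion culling does not replace it)."""
    # Assumption: outside the baked area there is no occlusion data, so consider everything.
    candidates = pvs.get(cell_of(camera_position), all_objects)
    return {obj for obj in candidates if in_frustum(obj)}


all_objects = {"house", "tree", "tower"}
pvs = {
    (0, 0): {"house", "tree"},  # from this cell the tower is hidden behind the house
    (1, 0): {"tree", "tower"},
}
print(sorted(objects_to_render((5.0, 1.7, 5.0), pvs, all_objects, lambda obj: True)))
```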


Fog and Lighting Effects

Instead of global fog, the solution we came up with was to use simple mesh faces with a transparent texture (fog planes). Once the player comes too close to a fog plane, it fades out, and its vertices are also pulled away, because even a fully transparent alpha surface still consumes a lot of render time.
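One possible way to express that behaviour is sketched below. The fade distances and maximum alpha are hypothetical defaults, and collapsing the quad onto its centre is our reading of "pulling the vertices away": a degenerate plane covers no pixels, so it costs no fill rate at all.

```python
def fog_plane_alpha(distance: float, fade_start: float = 8.0,
                    fade_end: float = 3.0, max_alpha: float = 0.6) -> float:
    """Full fog alpha beyond fade_start, fading linearly to 0 at fade_end."""
    if distance >= fade_start:
        return max_alpha
    if distance <= fade_end:
        return 0.0
    return max_alpha * (distance - fade_end) / (fade_start - fade_end)


def fog_plane_vertices(vertices, alpha):
    """Once the plane is invisible, collapse it to its center so it covers no pixels:
    even a fully transparent surface still costs fill rate."""
    if alpha > 0.0:
        return vertices
    n = len(vertices)
    center = tuple(sum(v[i] for v in vertices) / n for i in range(3))
    return [center] * n


quad = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0), (0.0, 2.0, 0.0)]
for distance in (20.0, 5.5, 1.0):
    alpha = fog_plane_alpha(distance)
    print(distance, round(alpha, 3), fog_plane_vertices(quad, alpha)[0])
```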


Debug Performance – Rendering Statistics, and Frame Debugger

Rendering Statistics

The Game View has a Stats button in the top right corner. When the button is pressed, an overlay window displays realtime rendering statistics, which are useful for optimizing performance. The exact statistics displayed vary according to the build target.

Frame Debugger

The Frame Debugger lets you freeze playback for a running game on a particular frame and view the individual draw calls that are used to render that frame. As well as listing the draw calls, the debugger also lets you step through them one by one, so you can see in great detail how the scene is constructed from its graphical elements.


Extra Tips

  • Set the Static property on non-moving objects to allow internal optimizations like static batching.
  • Do not use dynamic lights when it is not necessary – choose to bake lighting instead.
  • Use compressed texture formats when possible, otherwise prefer 16bit textures over 32bit.
  • Use pixel shaders or texture combiners to mix several textures instead of a multi-pass approach.
  • Cg: Use half precision variables when possible.
  • Do not use Pixel Lights when it is not necessary – choose to have only a single (preferably directional) pixel light affecting your geometry.
  • Alpha blending is expensive on mobile; use it sparingly.
  • Use occlusion culling.
  • Use texture atlases and pay attention to texture memory.
  • Limit particle emission count, use fast mobile shaders.
  • Use lightmapping, baked shadows, and blob shadows.
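To get a feel for what "half precision" gives up, the snippet below rounds values to IEEE 754 16-bit floats using Python's `struct` "e" format. A half has a 10-bit mantissa and a largest finite value of 65504, which is why it suits colours and unit-length directions better than large world-space positions. Keep in mind that some GPUs (most desktop ones) evaluate half at full precision anyway, so the saving mainly shows up on mobile hardware.

```python
import struct


def to_half(x: float) -> float:
    """Round x to IEEE 754 half precision (16-bit), the format a `half` uses
    on GPUs that actually compute at reduced precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


for value in (1.0, 0.1, 4097.0, 65504.0):
    print(value, "->", to_half(value))
```

Small values keep roughly three significant decimal digits, while 4097 already collapses to 4096 because halves in that range are spaced 4 apart.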