Memory Management

Detecting Memory Issues Using Unity’s Profiler

Unity’s profiler is primarily geared toward analyzing the performance and resource demands of the various types of assets in your game. Yet the profiler is equally useful for digging into the memory-related behavior of your C# code – even that of external .NET/Mono assemblies that don’t reference UnityEngine.dll.

Unity memory profiler window

This lets you see whether you have any memory leaks stemming from your C# code. Even if you don’t use any scripts, the ‘Used’ size of the heap grows and contracts continuously. As soon as you do use scripts, you need a way to see where allocations occur, and the CPU profiler provides that information.

Unity CPU profiler window

 

C# Memory Management

Automatic Memory Management. This feature is built deeply into the C# language and is an integral part of its philosophy: memory management is regarded as such an error-prone chore that it is best taken out of the programmer’s hands and entrusted to the common language runtime (CLR).

Your ability to manage memory, or more precisely how memory is allocated, in Unity / .NET is limited. You get to choose whether your custom data structures are class (always allocated on the heap) or struct (allocated on the stack unless they are contained within a class), and that’s it. If you want more magical powers, you must use C#’s unsafe keyword. But unsafe code is just unverifiable code, meaning that it won’t run in the Unity Web Player and probably on some other target platforms. For this and other reasons, don’t use unsafe. Because of the above-mentioned limits of the stack, and because C# arrays are just syntactic sugar for System.Array (which is a class), you cannot and should not avoid automatic heap allocation. What you should avoid are unnecessary heap allocations.

Your powers are equally limited when it comes to deallocation. Actually, the only process that can deallocate heap objects is the GC, and its workings are shielded from you. What you can influence is when the last reference to any of your objects on the heap goes out of scope, because the GC cannot touch them before that. This limited power turns out to have huge practical relevance, because periodic garbage collection (which you cannot suppress) tends to be very fast when there is nothing to deallocate.

Each use of foreach creates an enumerator object – an instance of a type implementing the System.Collections.IEnumerator interface – behind the scenes. But does it create this object on the stack or on the heap? That turns out to be an excellent question, because both are actually possible. Most importantly, almost all of the collection types in the System.Collections.Generic namespace (List<T>, Dictionary<K, V>, LinkedList<T>, etc.) are smart enough to return a struct from their implementation of GetEnumerator(). This includes the version of the collections that ships with Mono 2.6.5 (as used by Unity).

So should you avoid foreach loops?

  • Don’t use them in C# code that you allow Unity to compile for you.
  • Do use them to iterate over the standard generic collections (List<T> etc.) in C# code that you compile yourself with a recent compiler. Visual Studio as well as the free .NET Framework SDK are fine, and I assume (but haven’t verified) that the one that comes with the latest versions of Mono and MonoDevelop is fine as well.
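The struct claim from above is easy to verify on a desktop .NET or Mono runtime (a small standalone check, not Unity code):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class EnumeratorCheck
{
    static void Main()
    {
        // List<T>.GetEnumerator() returns the nested struct List<T>.Enumerator,
        // so iterating a List<T> with foreach allocates nothing on the heap.
        Console.WriteLine(typeof(List<int>.Enumerator).IsValueType);              // True

        // The non-generic ArrayList hands back a class instead, so every foreach
        // over it allocates an enumerator object on the heap.
        Console.WriteLine(new ArrayList().GetEnumerator().GetType().IsValueType); // False
    }
}
```

The same check returns True for the enumerators of Dictionary<K, V> and LinkedList<T> as well.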

Should you avoid closures and LINQ?

You probably know that C# offers anonymous methods and lambda expressions (which are almost but not quite identical to each other). You can create them with the delegate keyword and the => operator, respectively. They are often a handy tool, and they are hard to avoid if you want to use certain library functions (such as List<T>.Sort()) or LINQ.

Do anonymous methods and lambdas cause memory leaks? The answer is: it depends. The C# compiler actually has two very different ways of handling them. To understand the difference, consider the following small chunk of code:

int result = 0;
    
void Update()
{
    for (int i = 0; i < 100; i++)
    {
        System.Func<int, int> myFunc = (p) => p * p;
        result += myFunc(i);
    }
}
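What the compiler does with this chunk depends on whether the lambda captures anything. myFunc above reads only its own parameter, so the compiler can turn it into a static method and cache the delegate; the loop then allocates nothing after the first pass. Had the lambda captured a local variable, the compiler would instead have to allocate a closure object on the heap each time the enclosing scope runs. The difference can be observed directly (a sketch; delegate caching is a compiler implementation detail, but both Roslyn and Mono behave this way):

```csharp
using System;

class ClosureDemo
{
    // Captures nothing: the compiler emits a static method and caches the delegate.
    public static Func<int, int> Make() { return p => p * p; }

    // Captures 'n': the compiler must allocate a fresh closure object per call.
    public static Func<int, int> MakeCapturing(int n) { return p => p * n; }

    static void Main()
    {
        Console.WriteLine(ReferenceEquals(Make(), Make()));                     // True (cached)
        Console.WriteLine(ReferenceEquals(MakeCapturing(2), MakeCapturing(2))); // False (new closure each time)
    }
}
```

So the rule of thumb is: lambdas that touch only their own parameters are leak-free, while lambdas that capture locals allocate every time they are created.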

Coroutines

If you launch a coroutine via StartCoroutine(), you implicitly allocate both an instance of Unity’s Coroutine class (21 bytes on my system) and an Enumerator (16 bytes). Importantly, no allocation occurs when the coroutine yields or resumes, so all you have to do to avoid a memory leak is to limit calls to StartCoroutine() while the game is running.

Strings

No overview of memory issues in C# and Unity would be complete without mentioning strings. From a memory standpoint, strings are strange because they are both heap-allocated and immutable. When you concatenate two strings (be they variables or string constants) as in:

void Update()
{
    string string1 = "Two";
    string string2 = "One" + string1 + "Three";
}

the runtime has to allocate at least one new string object that contains the result. In String.Concat() this is done efficiently via an external method called FastAllocateString(), but there is no way of getting around the heap allocation (40 bytes on my system in the example above). If you need to modify or concatenate strings at runtime, use System.Text.StringBuilder.
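For instance, a score label that is rebuilt every frame can reuse one pre-sized builder, avoiding the chain of intermediate strings that repeated concatenation would create (a sketch; BuildLabel is a made-up helper):

```csharp
using System;
using System.Text;

class StringBuilderDemo
{
    // Allocated once and reused; setting Length = 0 clears the builder
    // without releasing its internal buffer.
    static readonly StringBuilder s_builder = new StringBuilder(64);

    public static string BuildLabel(int score)
    {
        s_builder.Length = 0;
        s_builder.Append("Score: ");
        s_builder.Append(score);
        return s_builder.ToString();   // produces the result string
    }

    static void Main()
    {
        Console.WriteLine(BuildLabel(42));   // Score: 42
    }
}
```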

Boxing

Sometimes, data has to be moved between the stack and the heap. For example, when you format a string as in:

string result = string.Format("{0} = {1}", 5, 5.0f);

… you are calling a method with the following signature:

public static string Format(
  string format,
  params Object[] args
)

In other words, the integer “5” and the floating-point number “5.0f” have to be cast to System.Object when Format() is called. But Object is a reference type whereas the other two are value types. C# therefore has to allocate memory on the heap, copy the values to the heap, and hand Format() references to the newly created int and float objects. This process is called boxing, and its counterpart unboxing.

This behavior may not be a problem with String.Format() because you expect it to allocate heap memory anyway (for the new string). But boxing can also show up at less expected locations. A notorious example occurs when you want to implement the equality operator ‘==’ for your home-made value types (for example, a struct that represents a complex number). If you only override Equals(object), every comparison boxes its operands; implementing IEquatable<T> alongside it avoids this hidden boxing.
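To make the complex-number example concrete: the struct below implements IEquatable<T> and a typed == operator so that comparisons never go through Equals(object) and therefore never box (a sketch; Complex is a made-up type for illustration):

```csharp
using System;

// A hypothetical complex-number struct. Relying on Equals(object) alone would
// box both operands on every comparison; the typed members give a box-free path.
struct Complex : IEquatable<Complex>
{
    public readonly float Re, Im;
    public Complex(float re, float im) { Re = re; Im = im; }

    // Strongly typed comparison: no boxing involved.
    public bool Equals(Complex other) { return Re == other.Re && Im == other.Im; }

    public override bool Equals(object obj) { return obj is Complex && Equals((Complex)obj); }
    public override int GetHashCode() { return Re.GetHashCode() ^ Im.GetHashCode(); }

    public static bool operator ==(Complex a, Complex b) { return a.Equals(b); }
    public static bool operator !=(Complex a, Complex b) { return !a.Equals(b); }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(new Complex(1, 2) == new Complex(1, 2));   // True
    }
}
```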

Now we also want to avoid unnecessary deallocations, so that while our game is running, the garbage collector (GC) doesn’t create those ugly drops in frames-per-second. Object pooling is ideal for this purpose. 

Object Pooling

The idea behind object pooling is extremely simple. Instead of creating new objects with the new operator and allowing them to become garbage later, we store used objects in a pool and reuse them as soon as they’re needed again. The single most important feature of the pool – really the essence of the object-pooling design pattern – is to allow us to acquire a ‘new’ object while concealing whether it’s really new or recycled. This pattern can be realized in a few lines of code:

using System.Collections.Generic;

public class ObjectPool<T> where T : class, new()
{
    private Stack<T> m_objectStack = new Stack<T>();

    public T New()
    {
        return (m_objectStack.Count == 0) ? new T() : m_objectStack.Pop();
    }

    public void Store(T t)
    {
        m_objectStack.Push(t);
    }
}

Simple, yes, but a perfectly good realization of the core pattern. (If you’re confused by the “where T...” part, it is explained below.) To use this class, you have to replace allocations that make use of the new operator, such as here…

void Update()
{
    MyClass m = new MyClass();
}

… with paired calls to New() and Store():

ObjectPool<MyClass> poolOfMyClass = new ObjectPool<MyClass>();

void Update()
{
    MyClass m = poolOfMyClass.New();

    // do stuff...

    poolOfMyClass.Store(m);
}

This is annoying because you’ll need to remember to call Store(), and do so at the right place. Unfortunately, there is no general way to simplify this usage pattern further because neither the ObjectPool class nor the C# compiler can know when your object has gone out of scope. Well, actually, there is one way – it is called automatic memory management via garbage collection, and its shortcomings are the reason you’re reading these lines in the first place! That said, in some fortunate situations, you can use a pattern explained under “A pool with collective reset” at the end of this article. There, all your calls to Store() are replaced by a single call to a ResetAll() method.

Functionality Requirements

  • Many types of objects need to be ‘reset’ in some way before they can be reused. At a minimum, all member variables may be set to their default state. This can be handled transparently by the pool, rather than by the user. When and how to reset is a matter of design that relates to the following two distinctions.
    • Resetting can be eager (i.e., executed at the time of storage) or lazy (executed right before the object is reused).
    • Resetting can be managed by the pool (i.e., transparently to the class that is being pooled) or by the class (transparently to the person who is declaring the pool object).
  • In the example above, the object pool ‘poolOfMyClass‘ had to be declared explicitly with class-level scope. Obviously, a new such pool would have to be declared for each new type of resource (My2ndClass etc.). Alternatively, it is possible to have the ObjectPool class create and manage all these pools transparently to the user.
  • Several object-pooling libraries you find out there aspire to manage very heterogeneous kinds of scarce resources (memory, database connections, game objects, external assets etc.). This tends to boost the complexity of the object pooling code, as the logic behind handling such diverse resources varies a great deal.
  • Some types of resources (e.g., database connections) are so scarce that the pool needs to enforce an upper limit and offer a safe way of failing to allocate a new/recycled object.
  • If objects in the pool are used in large numbers at relatively ‘rare’ moments, we may want the pool to have the ability to shrink (either automatically or on-demand).
  • Finally, the pool can be shared by several threads, in which case it would have to be thread-safe.

Which of these are worth implementing? Your answer may differ from mine, but allow me to explain my own preferences.

  • Yes, the ability to ‘reset’ is a must-have. But, as you will see below, there is no point in choosing between having the reset logic handled by the pool or by the managed class. You are likely to need both, and the code below will show you one version for each case.
  • Unity imposes limitations on your multi-threading – basically, you can have worker threads in addition to the main game thread, but only the latter is allowed to make calls into the Unity API. In my experience, this means that we can get away with separate object pools for all our threads, and can thus delete ‘support for multi-threading’ from our list of requirements.
  • Personally, I don’t mind too much having to declare a new pool for each type of object I want to pool. The alternative means using the singleton pattern: you let your ObjectPool class create new pools as needed and store them in a dictionary of pools, which is itself stored in a static variable. To get this to work safely, you’d have to make your ObjectPool class thread-safe. I would avoid such multi-threaded pooling solutions, as they are easy to get wrong.
  • In line with the scope of this three-part blog, I’m only interested in pools that deal with one type of scarce resource: memory. Pools for other kinds of resources are important, too, but they’re just not within the scope of this post. This really narrows down the remaining requirements.
    • The pools presented here do not impose a maximum size. If your game uses too much memory, you are in trouble anyway, and it’s not the object pool’s business to fix this problem.
    • By the same token, we can assume that no other process is currently waiting for you to release your memory as soon as possible. This means that resetting can be lazy, and that the pool doesn’t have to offer the ability to shrink.

A basic pool with initialization and reset

Our revised ObjectPool<T> class looks as follows:

using System;
using System.Collections.Generic;

public class ObjectPool<T> where T : class, new()
{
    private Stack<T> m_objectStack;

    private Action<T> m_resetAction;
    private Action<T> m_onetimeInitAction;

    public ObjectPool(int initialBufferSize, Action<T>
        ResetAction = null, Action<T> OnetimeInitAction = null)
    {
        m_objectStack = new Stack<T>(initialBufferSize);
        m_resetAction = ResetAction;
        m_onetimeInitAction = OnetimeInitAction;
    }

    public T New()
    {
        if (m_objectStack.Count > 0)
        {
            T t = m_objectStack.Pop();

            if (m_resetAction != null)
                m_resetAction(t);

            return t;
        }
        else
        {
            T t = new T();

            if (m_onetimeInitAction != null)
                m_onetimeInitAction(t);

            return t;
        }
    }

    public void Store(T obj)
    {
        m_objectStack.Push(obj);
    }
}

This implementation is very simple and straightforward. The parameter ‘T‘ has two constraints that are specified by way of “where T : class, new()“. Firstly, ‘T‘ has to be a class (after all, only reference types need to be object-pooled), and secondly, it must have a parameterless constructor.

The constructor takes your best guess of the maximum number of objects in the pool as its first parameter. The other two parameters are optional closures – if given, the first is used to reset a recycled object, while the second initializes a newly created one. ObjectPool<T> has only two methods besides its constructor, New() and Store(). Because the pool uses a lazy approach, all the work happens in New(), where new and recycled objects are either initialized or reset via those closures. Here is how the pool could be used in a class that derives from MonoBehaviour.

using System.Collections.Generic;
using UnityEngine;

class SomeClass : MonoBehaviour
{
    private ObjectPool<List<Vector3>> m_poolOfListOfVector3 =
        new ObjectPool<List<Vector3>>(32,
        (list) => {
            list.Clear();
        },
        (list) => {
            list.Capacity = 1024;
        });

    void Update()
    {
        List<Vector3> listVector3 = m_poolOfListOfVector3.New();

        // do stuff

        m_poolOfListOfVector3.Store(listVector3);
    }
}

A pool that lets the managed type reset itself

The basic version of the object pool does what it is supposed to do, but it has one conceptual blemish. It violates the principle of encapsulation insofar as it separates the code for initializing / resetting an object from the definition of the object’s type. This leads to tight coupling, and should be avoided if possible. In the SomeClass example above, there is no real alternative because we cannot go and change the definition of List<T>. However, when you use object pooling for your own types, you may want to have them implement the following simple interface IResetable instead. The corresponding class ObjectPoolWithReset<T> can hence be used without specifying either of the two closures as parameters.

using System;
using System.Collections.Generic;

public interface IResetable
{
    void Reset();
}

public class ObjectPoolWithReset<T> where T : class, IResetable, new()
{
    private Stack<T> m_objectStack;

    private Action<T> m_resetAction;
    private Action<T> m_onetimeInitAction;

    public ObjectPoolWithReset(int initialBufferSize, Action<T>
        ResetAction = null, Action<T> OnetimeInitAction = null)
    {
        m_objectStack = new Stack<T>(initialBufferSize);
        m_resetAction = ResetAction;
        m_onetimeInitAction = OnetimeInitAction;
    }

    public T New()
    {
        if (m_objectStack.Count > 0)
        {
            T t = m_objectStack.Pop();

            t.Reset();

            if (m_resetAction != null)
                m_resetAction(t);

            return t;
        }
        else
        {
            T t = new T();

            if (m_onetimeInitAction != null)
                m_onetimeInitAction(t);

            return t;
        }
    }

    public void Store(T obj)
    {
        m_objectStack.Push(obj);
    }
}

A Pool with Collective Reset

Some types of data structures in your game may never persist over a sequence of frames, but get retired at or before the end of each frame. In this case, when we have a well-defined point in time by the end of which all pooled objects can be stored back in the pool, we can rewrite the pool to be both easier to use and significantly more efficient. Let’s look at the code first.

using System;
using System.Collections.Generic;

public class ObjectPoolWithCollectiveReset<T> where T : class, new()
{
    private List<T> m_objectList;
    private int m_nextAvailableIndex = 0;

    private Action<T> m_resetAction;
    private Action<T> m_onetimeInitAction;

    public ObjectPoolWithCollectiveReset(int initialBufferSize, Action<T>
        ResetAction = null, Action<T> OnetimeInitAction = null)
    {
        m_objectList = new List<T>(initialBufferSize);
        m_resetAction = ResetAction;
        m_onetimeInitAction = OnetimeInitAction;
    }

    public T New()
    {
        if (m_nextAvailableIndex < m_objectList.Count)
        {
            // an allocated object is already available; just reset it
            T t = m_objectList[m_nextAvailableIndex];
            m_nextAvailableIndex++;

            if (m_resetAction != null)
                m_resetAction(t);

            return t;
        }
        else
        {
            // no allocated object is available
            T t = new T();
            m_objectList.Add(t);
            m_nextAvailableIndex++;

            if (m_onetimeInitAction != null)
                m_onetimeInitAction(t);

            return t;
        }
    }

    public void ResetAll()
    {
        m_nextAvailableIndex = 0;
    }
}

The changes to the original ObjectPool<T> class are substantial this time. Regarding the signature of the class, the Store() method is replaced by ResetAll(), which only needs to be called once when all allocated objects should go back into the pool. Inside the class, the Stack<T> has been replaced by a List<T> which keeps references to all allocated objects even while they’re being used. We also keep track of the index of the most recently created-or-released object in the list. In that way, New() knows whether to create a new object or reset an existing one.

Delegates, Action Delegates, Func Delegates, and Lambda Expressions

Delegates

A delegate is a type that safely encapsulates a method, similar to a function pointer in C and C++. Unlike C function pointers, delegates are object-oriented, type safe, and secure. The type of a delegate is defined by the name of the delegate. A delegate can be thought of as a type-safe reference to one or more methods: when a multicast delegate is invoked, every method that has been added to it is called in turn.

Delegate Multicast Example (+=):

using UnityEngine;
using System.Collections;

public class MulticastScript : MonoBehaviour 
{
    delegate void MultiDelegate();
    MultiDelegate myMultiDelegate;
    

    void Start () 
    {
        myMultiDelegate += PowerUp;
        myMultiDelegate += TurnRed;
        
        if(myMultiDelegate != null)
        {
            myMultiDelegate();
        }
    }
    
    void PowerUp()
    {
        print ("Orb is powering up!");
    }
    
    void TurnRed()
    {
        renderer.material.color = Color.red;
    }
}

Action Delegates

You can use the Action<T> delegate to pass a method as a parameter without explicitly declaring a custom delegate type. The compiler is smart enough to figure out the proper types.

However, there is a limitation: the encapsulated method must not return a value. (In C#, the method must return void.)

Action Delegate Example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ModernLanguageConstructs
{
    class Program
    {
        static void Main(string[] args)
        {
            // Part 1 - First action that takes an int and converts it to hex
            Action<int> displayHex = delegate(int intValue)
            {
                Console.WriteLine(intValue.ToString("X"));
            };

            // Part 2 - Second action that takes a hex string and 
            // converts it to an int
            Action<string> displayInteger = delegate(string hexValue)
            {
                Console.WriteLine(int.Parse(hexValue,
                    System.Globalization.NumberStyles.HexNumber));
            };
            
            // Part 3 - exercise Action methods
            displayHex(16);
            displayInteger("10");
        }
    }
}

Func Delegates

This differs from Action<> in that it supports parameters AND return values. You can use this delegate to represent a method that can be passed as a parameter without explicitly declaring a custom delegate. The encapsulated method must correspond to the method signature defined by this delegate: it must have one parameter that is passed to it by value, and it must return a value.

Func Delegate Example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ModernLanguageConstructs
{
    class Program
    {
        static void Main(string[] args)
        {
            // Part 1 - First Func<> that takes an int and returns a string
            Func<int, string> displayHex = delegate(int intValue)
            {
                return (intValue.ToString("X"));
            };

            // Part 2 - Second Func<> that takes a hex string and 
            // returns an int
            Func<string, int> displayInteger = delegate(string hexValue)
            {
                return (int.Parse(hexValue,
                    System.Globalization.NumberStyles.HexNumber));
            };

            // Part 3 - exercise Func<> delegates
            Console.WriteLine(displayHex(16));
            Console.WriteLine(displayInteger("10"));
        }
    }
}
Lambda Expressions

A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.

All lambda expressions use the lambda operator =>, which is read as “goes to”. The left side of the lambda operator specifies the input parameters (if any) and the right side holds the expression or statement block.

The lambda expression x => x * x is read “x goes to x times x.”

Lambda Expression Example:

private IEnumerator waitThenCallback(float time, Action callback)
{
   yield return new WaitForSeconds(time);
   callback();
}

void Start()
{
  splashScreen.show();

  StartCoroutine(waitThenCallback(5, () => 
         { Debug.Log("Five seconds have passed!"); }));
  StartCoroutine(waitThenCallback(10, () => 
         { Debug.Log("Ten seconds have passed!"); }));
  StartCoroutine(waitThenCallback(20, () => 
  {
    Debug.Log("Twenty seconds have passed!"); 
    splashScreen.hide();
  }));
}

Another Lambda Expression Example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ModernLanguageConstructs
{
    class Program
    {
        static void Main(string[] args)
        {
            // Part 1 - An action and a lambda
            Action<int> displayHex = intValue =>
            {
                Console.WriteLine(intValue.ToString("X"));
            };

            Action<string> displayInteger = hexValue =>
            {
                Console.WriteLine(int.Parse(hexValue,
                    System.Globalization.NumberStyles.HexNumber));
            };

            // Part 2 - Use the lambda expressions
            displayHex(16);
            displayInteger("10");

        }
    }
}

Communicate with Comments

/// Comments

On the line above a public method or field declaration, type “///” and the editor will auto-fill a comment block. This also enables hover help text for the member.

Summary Comment Example:

/// <summary>
/// Initializes a new instance of the <see cref="EnemyLoader"/> class.
/// </summary>
/// <param name='enemyName'>
/// Enemy name, e.g. evilfrog, bustyfox
/// </param>
/// <param name='successCallback'>
/// Success callback.
/// </param>
/// <param name='failureCallback'>
/// Failure callback.
/// </param>
public EnemyLoader( string enemyName, LoadSuccess successCallback, LoadFailure failureCallback)

Code Comment Example:

Bad:

// increment enemies
enemies++;

Good:

// we know there is now another enemy
enemies++;

Enums and Bitwise operators

Enums and Unity Editor

How to make life easier by making enum flags into mask fields in the Unity inspector.

EnumFlagAttribute.cs [Must be placed in a folder that is not Editor]

using UnityEngine;

public class EnumFlagAttribute : PropertyAttribute
{
    public string enumName;

    public EnumFlagAttribute() {}

    public EnumFlagAttribute(string name)
    {
        enumName = name;
    }
}

EnumFlagsAttributeDrawer.cs [Must be placed in the Editor folder]

using System;
using System.Reflection;
using UnityEditor;
using UnityEngine;

[CustomPropertyDrawer(typeof(EnumFlagAttribute))]
public class EnumFlagDrawer : PropertyDrawer
{
    public override void OnGUI(Rect position, SerializedProperty property, GUIContent label)
    {
        EnumFlagAttribute flagSettings = (EnumFlagAttribute)attribute;
        Enum targetEnum = GetBaseProperty<Enum>(property);

        string propName = flagSettings.enumName;
        if (string.IsNullOrEmpty(propName))
            propName = property.name;

        EditorGUI.BeginProperty(position, label, property);
        Enum enumNew = EditorGUI.EnumMaskField(position, propName, targetEnum);
        property.intValue = (int)Convert.ChangeType(enumNew, targetEnum.GetType());
        EditorGUI.EndProperty();
    }

    static T GetBaseProperty<T>(SerializedProperty prop)
    {
        // Separate the steps it takes to get to this property
        string[] separatedPaths = prop.propertyPath.Split('.');

        // Go down to the root of this serialized property
        object reflectionTarget = prop.serializedObject.targetObject as object;

        // Walk down the path to get the target object
        foreach (var path in separatedPaths)
        {
            FieldInfo fieldInfo = reflectionTarget.GetType().GetField(path);
            reflectionTarget = fieldInfo.GetValue(reflectionTarget);
        }
        return (T)reflectionTarget;
    }
}

It is important to note that the enum definition should use power-of-two values. Bitwise shifts achieve this: they shift the bits of a number right (>>) or left (<<). If you have a decimal number, let’s say “1”, and you shift it one position to the left, you’ll have “10”. Another shift and you’ll get “100”. Just as shifting one position in base ten is equivalent to multiplying (or dividing) by ten, shifting one position in base two is equivalent to multiplying (or dividing) by two. This is why bitwise shifts can be used to easily create powers of two.

Test.cs [Attach this to a gameObject to see the property drawer in action]

using UnityEngine;
using System.Collections;
using System;

public enum MyEnum
{                       //BINARY  DEC
  None    = 0,          //000000  0
  First   = 1 << 0,     //000001  1
  Second  = 1 << 1,     //000010  2
  Third   = 1 << 2,     //000100  4
  Fourth  = 1 << 3,     //001000  8
  Fifth   = 1 << 4,
  Sixth   = 1 << 5,
  Seventh = 1 << 6,
  Eighth  = 1 << 7,
}

public class Test : MonoBehaviour 
{
  [EnumFlagAttribute]
  public MyEnum m_enum;

  public MyEnum m_lastEnum;
  public void Update()
  {
    if (m_lastEnum != m_enum) 
    {
      Debug.Log (m_enum);
      m_lastEnum = m_enum;
      if ((m_enum & MyEnum.Fourth) != 0) 
      {
        Debug.Log ("HAS FOURTH");
      }

    }
  }
}
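The core bit manipulation in Test.cs can also be seen in isolation: because each member occupies its own bit, any combination fits in a single int, flags are combined with |, and membership is tested with & (a standalone sketch; Ability is a made-up enum for illustration):

```csharp
using System;

[Flags]
enum Ability
{
    None   = 0,
    Jump   = 1 << 0,   // 1
    Dash   = 1 << 1,   // 2
    Glide  = 1 << 2,   // 4
    Shield = 1 << 3    // 8
}

class Program
{
    static void Main()
    {
        // Combine flags with |; the result packs both bits into one value.
        Ability unlocked = Ability.Jump | Ability.Glide;     // binary 101

        Console.WriteLine((int)unlocked);                    // 5

        // Test individual flags with &.
        Console.WriteLine((unlocked & Ability.Glide) != 0);  // True
        Console.WriteLine((unlocked & Ability.Dash) != 0);   // False
    }
}
```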

Anonymous Functions – Closure Callbacks

Here is an example of anonymous functions in Unity; below is a very helpful technique for loading assets and then setting a delegate to grab each loaded asset. Take note of the ‘closure’ delegate, which is passed into the asset bundle manager’s load function as the second parameter. Because the delegate is defined inside the loop, it forms a closure over local state that would otherwise no longer exist when the callback fires. Note that the loop body first copies i into a local variable: a for-loop variable is shared by all closures created in the loop, so capturing i directly would leave every callback seeing its final value. When each bundle is loaded, the callback triggers and the asset is placed at the correct index in the array.


for (int i = 0; i < things.Length; i++)
{
    int index = i;  // copy the loop variable so each closure captures its own value
    AssetBundleManager.Load(thingNames[index], delegate (AssetBundle bundle)
    {
        characters[index] = bundle.Load(thingNames[index] + "LOD0");
    });
}

Optimising Unity games for Mobile

Optimise for CPU and GPU

CPU

CPU performance is often limited by the number of batches that need to be rendered. “Batching” is where the engine attempts to combine the rendering of multiple objects into a chunk of memory in order to reduce the CPU overhead caused by resource switching.

To draw an object on the screen, the engine has to issue a draw call to the graphics API (e.g. OpenGL or Direct3D). Draw calls are often expensive, with the graphics API doing significant work for every draw call, causing performance overhead on the CPU side. This is mostly caused by the state changes done between the draw calls (e.g. switching to a different material), which causes expensive validation and translation steps in the graphics driver.

Basically, draw calls are the commands that tell the GPU to render a certain set of vertices as triangles with a certain state (shaders, blend state and so on). It should be noted that draw calls aren’t necessarily expensive. In older versions of Direct3D, many calls required a context switch, which was expensive, but this isn’t true in newer versions. The main reason to make fewer draw calls is that graphics hardware can transform and render triangles much faster than you can submit them. If you submit few triangles with each call, you will be completely bound by the CPU and the GPU will be mostly idle – the CPU won’t be able to feed the GPU fast enough. Making a single draw call with two triangles is cheap, but if you submit too little data with each call, you won’t have enough CPU time to submit as much geometry to the GPU as you could have.

There are real costs to making draw calls: each one requires setting up a bunch of state (which set of vertices to use, what shader to use and so on), and state changes have a cost both on the hardware side (updating a bunch of registers) and on the driver side (validating and translating your calls that set state).

Unity uses static batching and dynamic batching to address this.

  • Static Batching: combine static (i.e. not moving) objects into big meshes, and render them in a faster way.

Internally, static batching works by transforming the static objects into world space and building a big vertex + index buffer for them. Then for visible objects in the same batch, a series of “cheap” draw calls are done, with almost no state changes in between. So technically it does not save “3D API draw calls”, but it saves on state changes done between them (which is the expensive part).

  • Dynamic Batching: for small enough meshes, transform their vertices on the CPU, group many similar ones together, and draw in one go.

Built-in batching has several benefits compared to manually merging objects together (most notably, the objects can still be culled individually), but it has some downsides too (static batching incurs memory and storage overhead, and dynamic batching incurs some CPU overhead). Only objects sharing the same material can be batched together. Therefore, if you want to achieve good batching, you need to share as many materials among different objects as possible.
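For batching, the key point is that renderers must reference the same material *instance*, not just identical-looking copies. A minimal sketch (the `sharedAtlasMaterial` field is a hypothetical name you would assign in the editor):

```csharp
using UnityEngine;

// Assigns one shared material to all child renderers so Unity can batch them.
public class ShareMaterial : MonoBehaviour
{
    public Material sharedAtlasMaterial; // hypothetical shared material asset

    void Start()
    {
        foreach (Renderer r in GetComponentsInChildren<Renderer>())
        {
            // Use sharedMaterial, not material: accessing .material creates
            // a per-renderer copy of the material and silently breaks batching.
            r.sharedMaterial = sharedAtlasMaterial;
        }
    }
}
```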

If you have two identical materials which differ only in textures, you can combine those textures into a single big texture – a process often called texture atlasing. Once textures are in the same atlas, you can use a single material instead.
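As a sketch of the atlasing idea, Unity’s `Texture2D.PackTextures` can build an atlas at runtime; in practice you would usually atlas offline with a tool like TexturePacker, but the API shows what happens:

```csharp
using UnityEngine;

// Builds a texture atlas from several source textures at runtime.
public class AtlasBuilder : MonoBehaviour
{
    public Texture2D[] sourceTextures; // must have Read/Write enabled in import settings

    void Start()
    {
        Texture2D atlas = new Texture2D(2048, 2048);
        // Packs the sources into the atlas with 2px padding; returns one
        // normalized UV rect per source texture inside the atlas.
        Rect[] uvRects = atlas.PackTextures(sourceTextures, 2, 2048);

        // Meshes that sampled the original textures now need their UVs
        // remapped into the matching entry of uvRects.
    }
}
```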

Currently, only Mesh Renderers are batched. This means that skinned meshes, cloth, trail renderers and other types of rendering components are not batched.

Semitransparent shaders most often require objects to be rendered in back-to-front order for transparency to work. Unity first orders objects in this order, and then tries to batch them – but because the order must be strictly satisfied, this often means less batching can be achieved than with opaque objects.

Manually combining objects that are close to each other might be a very good alternative to draw call batching. For example, a static cupboard with lots of drawers often makes sense to just combine into a single mesh, either in a 3D modeling application or using Mesh.CombineMeshes.
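A minimal sketch of manual combining via `Mesh.CombineMeshes`, merging all child meshes (assumed to share one material) into a single mesh:

```csharp
using UnityEngine;

// Merges all child meshes into one mesh on this GameObject,
// reducing draw calls for static clusters like the cupboard example.
public class MeshCombiner : MonoBehaviour
{
    void Start()
    {
        MeshFilter[] filters = GetComponentsInChildren<MeshFilter>();
        CombineInstance[] combine = new CombineInstance[filters.Length];

        for (int i = 0; i < filters.Length; i++)
        {
            combine[i].mesh = filters[i].sharedMesh;
            // Bake each child's transform so vertices land in the right place.
            combine[i].transform = filters[i].transform.localToWorldMatrix;
            filters[i].gameObject.SetActive(false);
        }

        Mesh combined = new Mesh();
        combined.CombineMeshes(combine); // merges sub-meshes sharing one material
        gameObject.AddComponent<MeshFilter>().sharedMesh = combined;
        // A MeshRenderer with the shared material must also be added or assigned.
    }
}
```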

 

GPU

The GPU is often limited by fillrate or memory bandwidth. If running the game at a lower display resolution makes it faster, then you’re most likely limited by fillrate on the GPU. Fillrate refers to the number of pixels a video card can render or write to memory every second. It is measured in megapixels or gigapixels per second, obtained by multiplying the clock frequency of the graphics processing unit (GPU) by the number of raster operations (ROPs).
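A quick worked example of that formula, using illustrative figures rather than a real device:

```csharp
// Theoretical fillrate = GPU clock frequency × number of ROPs.
long clockHz = 500_000_000;     // a hypothetical 500 MHz GPU
int rops = 4;                   // hypothetical number of raster operation units
long fillrate = clockHz * rops; // 2,000,000,000 pixels/s = 2 gigapixels/s
```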

 

Textures – Texture Size, Compression, Atlases and MipMaps

Optimal Texture Type – PNG is the lesser of many evils. It offers lossless image compression compared to lossy JPEG compression, and while it doesn’t handle alpha as well as TGA does, it does compression and alpha mapping well enough to make it better than the other file types.

Texture Compression – ETC texture compression, however, doesn’t support alpha channels. If you need alpha, go with an uncompressed format.

You should always have mipmaps checked if you’re using 3D, because otherwise you get awful artifacts when the camera moves, plus it runs faster since the GPU doesn’t have to calculate so many texels for distant objects. Other than looking slightly blurry compared to not having mipmaps, there shouldn’t be any downsides, and the slight blurriness is more than compensated for by the lack of flickering texture artifacts. You can use trilinear filtering so that the transition between mipmap levels is smooth. If you see serious degradation with mipmaps, that’s not normal.
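These import settings can be enforced project-wide with an editor import hook. A sketch, assuming you want this for every texture (a real project would filter by path or texture type); the script must live in an `Editor` folder:

```csharp
using UnityEngine;
using UnityEditor;

// Forces mipmaps and trilinear filtering on every imported texture.
public class MipmapImportSettings : AssetPostprocessor
{
    void OnPreprocessTexture()
    {
        TextureImporter importer = (TextureImporter)assetImporter;
        importer.mipmapEnabled = true;              // avoid distant-texture shimmer
        importer.filterMode = FilterMode.Trilinear; // smooth mip level transitions
    }
}
```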

To create texture atlases, use the standalone version of the TexturePacker tool:

http://www.codeandweb.com/texturepacker/download

https://www.assetstore.unity3d.com/en/#!/content/8905

 

Models – Triangle Count and UV Map

  • Don’t use any more triangles than necessary
  • Try to keep the number of UV mapping seams and hard edges (doubled-up vertices) as low as possible

You should use only a single skinned mesh renderer for each character. Unity optimizes animation using visibility culling and bounding volume updates and these optimizations are only activated if you use one animation component and one skinned mesh renderer in conjunction. The rendering time for a model could roughly double as a result of using two skinned meshes in place of a single mesh and there is seldom any practical advantage in using multiple meshes.

When animating, use as few bones as possible

A bone hierarchy in a typical desktop game uses somewhere between fifteen and sixty bones. The fewer bones you use, the better the performance will be. You can achieve very good quality on desktop platforms and fairly good quality on mobile platforms with about thirty bones. Ideally, keep the number below thirty for mobile devices and don’t go too far above thirty for desktop games.

Use as few materials as possible

You should also keep the number of materials on each mesh as low as possible. The only reason why you might want to have more than one material on a character is that you need to use different shaders for different parts (eg, a special shader for the eyes). However, two or three materials per character should be sufficient in almost all cases.

 

Culling and LOD (Level of Detail)

Occlusion Culling is a feature that disables rendering of objects when they are not currently seen by the camera because they are obscured (occluded) by other objects. This does not happen automatically in 3D computer graphics since most of the time objects farthest away from the camera are drawn first and closer objects are drawn over the top of them (this is called “overdraw”). Occlusion Culling is different from Frustum Culling. Frustum Culling only disables the renderers for objects that are outside the camera’s viewing area but does not disable anything hidden from view by overdraw. Note that when you use Occlusion Culling you will still benefit from Frustum Culling.

The occlusion culling process will go through the scene using a virtual camera to build a hierarchy of potentially visible sets of objects. This data is used at runtime by each camera to identify what is visible and what is not. Equipped with this information, Unity will ensure only visible objects get sent to be rendered. This reduces the number of draw calls and increases the performance of the game.

 

Fog and Lighting Effects

The solution we came up with is the use of simple mesh faces with a transparent texture (“fog planes”) instead of global fog. Once a player comes too close to a fog plane, it fades out; moreover, the vertices of the fog plane are pulled away (because even a fully transparent alpha surface still consumes a lot of render time).
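A minimal sketch of the fade part of that technique; the `player`, `fadeStart` and `fadeEnd` names are illustrative:

```csharp
using UnityEngine;

// Fades a fog plane out as the player approaches it.
public class FogPlaneFade : MonoBehaviour
{
    public Transform player;
    public float fadeStart = 20f; // fully opaque beyond this distance
    public float fadeEnd = 5f;    // fully transparent at this distance

    private Renderer rend;

    void Start()
    {
        rend = GetComponent<Renderer>();
    }

    void Update()
    {
        float d = Vector3.Distance(player.position, transform.position);
        float alpha = Mathf.InverseLerp(fadeEnd, fadeStart, d);

        Color c = rend.material.color;
        c.a = alpha;
        rend.material.color = c;

        // Disable the renderer entirely when fully transparent, since even
        // a fully transparent alpha surface still costs render time.
        rend.enabled = alpha > 0.001f;
    }
}
```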

 

Debug Performance – Rendering Statistics, and Frame Debugger

Rendering Statistics

The Game View has a Stats button in the top right corner. When the button is pressed, an overlay window is displayed which shows realtime rendering statistics, which are useful for optimizing performance. The exact statistics displayed vary according to the build target.

Frame Debugger

The Frame Debugger lets you freeze playback for a running game on a particular frame and view the individual draw calls that are used to render that frame. As well as listing the draw calls, the debugger also lets you step through them one-by-one so you can see in great detail how the scene is constructed from its graphical elements.

 

Extra Tips

  • Set the Static property on non-moving objects to allow internal optimizations like static batching.
  • Do not use dynamic lights when it is not necessary – choose to bake lighting instead.
  • Use compressed texture formats when possible, otherwise prefer 16bit textures over 32bit.
  • Use pixel shaders or texture combiners to mix several textures instead of a multi-pass approach.
  • CG: Use half precision variables when possible.
  • Do not use Pixel Lights when it is not necessary – choose to have only a single (preferably directional) pixel light affecting your geometry.
  • Alpha blending is ruthless on mobile.
  • Use occlusion culling.
  • Use texture atlases and pay attention to texture memory.
  • Limit particle emission count, use fast mobile shaders.
  • Use lightmapping, baked shadows, and blob shadows.


Shader Writing for Unity

Before we begin creating our own shaders we need to understand some basics.



What are Shaders?

Shaders in Unity are small scripts that contain the mathematical calculations and algorithms for computing the colour of each pixel rendered, based on the lighting input and the material configuration.
A shader is simply code, a set of instructions that will be executed on the GPU. It is a program for one of the stages of the graphics rendering pipeline. All shaders can be divided into two groups: vertex and fragment (pixel) shaders. In a nutshell, shaders are special programs which describe how different materials are rendered.

What is a Material?
Materials are wrappers which contain a shader and the values for its properties. Hence, different materials can share the same shader, feeding it with different data.
Another way of describing Materials is that they are definitions of how a surface should be rendered, including references to textures used, tiling information, colour tints and more. The available options for a material depend on which shader the material is using.
In general materials are not much more than containers for shaders and textures that can be applied to 3D models. Most of the customization of materials depends on which shader is chosen for it, although all shaders have some common functionality. Basically a material determines object appearance and includes a reference to a shader that is used to render geometry or particles.
In summary, a shader’s job is to take in 3D geometry, convert it to pixels and render it on the 2D screen. A shader can define a number of properties that you will use to affect what is displayed when your model is rendered – the stored settings of those properties are a material.

What is the Graphics Pipeline?
The Graphics Pipeline or Rendering Pipeline refers to the sequence of steps used to create a 2D raster representation of a 3D scene.

Input Data
Data is sent into the pipeline at the Input Assembler and processed all the way through the stages until it is displayed as pixels on your monitor. The data is typically a 3D model (vertex positions, normal directions, tangents, texture coordinates and colors).
Even sprites, particles, and textures in your game world are usually rendered using vertices just like a 3D model.

What came before?
“The fixed pipeline” – before DirectX 8 and the OpenGL ARB assembly language, there was only a fixed way to transform pixels and vertices. It was impossible for the developer to change how pixels and verts were transformed and processed after passing them to the GPU.

Stages of the Graphics Pipeline
Vertex Shader Stage
This stage is executed per vertex and is mostly used to transform the vertex, do per-vertex calculations, or prepare data for use later down the pipeline.
Hull Shader Stage (Only used for tessellation)
Takes the vertices as input control points and converts them into the control points that make up a patch (a fraction of a surface)
Domain shader stage (Only used for tessellation)
This stage calculates the vertex position of a point in the patch created by the Hull Shader.
Geometry Shader Stage
A geometry shader is an optional program that takes primitives (a point, line, triangle etc.) as input and can modify, remove or add geometry.
Pixel Shader Stage
The pixel shader (also known as the fragment shader in the OpenGL world) is executed once per pixel, giving colour to that pixel. It gets its input from the earlier stages in the pipeline
and is mostly used for calculating surface properties, lighting, and post-process effects.

Optimize!
Each of the stages above is usually executed thousands of times per frame and can be a bottleneck in the graphics pipeline. A simple cube built from triangles submits around 36 vertices (12 triangles × 3). This means the vertex shader stage will be executed 36 times every frame, and if you aim for 60 fps, it will be executed 2160 times per second – for a single cube. Optimize as much as you can.

Unity’s Rendering Pipeline
So with shaders we can define how our object will appear in the game world and how it will react to lighting. How lights react on objects depends on the passes of the shader and on which rendering path is used. The rendering path can be changed through Unity’s Player Settings, or overridden in the camera’s ‘Rendering Path’ setting in the inspector. In Unity there are 3 rendering paths: Vertex Lit, Forward Rendering and Deferred Rendering. If the graphics card can’t handle the currently selected render path, Unity will fall back to another one. So for example, if deferred rendering isn’t supported by the graphics card, Unity will automatically use Forward Rendering; if forward rendering is not supported, it will change to Vertex Lit. Since all shaders are influenced by the rendering path that is set, I will briefly describe what each rendering path does.
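The per-camera override mentioned above can also be done from script via `Camera.renderingPath`; a minimal sketch:

```csharp
using UnityEngine;

// Overrides the rendering path on this camera at runtime; normally you
// would set it in Player Settings or the camera inspector instead.
public class ForceForward : MonoBehaviour
{
    void Start()
    {
        Camera cam = GetComponent<Camera>();
        cam.renderingPath = RenderingPath.Forward;
        // Other options include RenderingPath.VertexLit and
        // RenderingPath.UsePlayerSettings to defer to project settings.
    }
}
```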

Vertex Lit
Vertex Lit is the simplest lighting mode available. It has no support for real-time shadows. It is commonly used on old computers with limited hardware. Internally it will calculate lighting from all lights at the object vertices in one pass. Since lighting is done on a per-vertex level, per-pixel effects are not supported.

Forward Rendering
Forward rendering renders each object in one or more passes, depending on the lights that affect the object. Lights are treated differently depending on their settings and intensity. When forward rendering is used, the number of pixel lights set in the quality settings determines how many lights affecting the object will be rendered with full per-pixel lighting. Additionally, up to 4 point lights are calculated per-vertex, and all other lights are computed as Spherical Harmonics, which is an approximation. Whether a light is per-pixel depends on several things: lights with their render mode set to Not Important are always per-vertex or spherical harmonics, while lights with render mode set to Important are always calculated per-pixel, as are the brightest lights. Forward rendering is the default rendering path in Unity.

Deferred Rendering
In deferred rendering there is no limit on the number of lights that affect an object, and all lights are calculated on a per-pixel basis. This means that all lights interact with normal maps, etc. Lights can also have cookies and shadows. Since all lights are calculated per-pixel, it works great on big polygons. Deferred rendering is only available in Unity Pro.

Creating a Shader in Unity
1.) Firstly we need a 3D model with a material on it which will use our new shader. (Add Sphere)
2.) Create Shader – surface shader
3.) Create a material, set the shader this material uses to our new shader, then set the 3D model’s Mesh Renderer material to this new material.
4.) This is what our ShaderLab shader is structured like at start:
Shader "Category/ShaderName" {
     Properties{}
     SubShader {
          Pass {
             CGPROGRAM
             // your shaders here
             ENDCG
          }
     }
    SubShader {
    }
    SubShader {
    }
    SubShader {
    }
    FallBack "FallbackShaderName"
}
The category is used to place the shader in the shader dropdown, and the name is used to identify it. Next, each shader can have many properties. These can be numbers and floats, colour data or textures. ShaderLab has a way of defining these so they look good and user friendly in the Unity inspector.
Now we need to define at least one sub shader so our object can be displayed. We can have more than one sub shader; Unity will pick the first sub shader that runs on the graphics card. Each sub shader defines a list of rendering passes, and each pass causes the geometry to be rendered once. Generally speaking, you want to use the minimum number of passes possible, since with every added pass performance goes down because the object is rendered again. A pass can be defined in 3 ways: a regular pass, a use pass or a grab pass.
The ‘UsePass’ command is used when we want to use another pass from another shader. This can help by reducing code duplication.
The ‘GrabPass’ is a special pass. It grabs the content of the screen where the object is to be drawn into a texture. This texture can then be used for more advanced image based processing effects. A regular pass sets various states for the graphics hardware. For example we could turn on/off vertex lighting, set blending mode, or set fog parameters.
Inside each sub shader there needs to be at least one pass, as a shader can be executed in multiple passes. Try to keep the number of passes to a minimum for performance reasons; a pass renders the geometry once and then moves on to the next pass. Most shaders will only need one pass.
Your shader implementation goes inside the pass, surrounded by CGPROGRAM and ENDCG (or GLSLPROGRAM and ENDGLSL if you want to use GLSL). Unity will cross-compile Cg to optimized GLSL or HLSL depending on the platform.
Then we have the fallback: if none of the sub shaders will work, we can fall back to another simple shader like the diffuse shader.
Here we have an example of a shader that takes in ambient light.
1.) Category and name, can be whatever you want.
Shader "UnityShaderExample/SimpleAmbientLight"
2.) Properties, first the name of the property, then a display name that will show up in the Unity Editor, a prop type and default value
  Properties {
        _AmbientLightColor ("Ambient Light Color", Color) = (1,1,1,1)
        _AmbientLighIntensity ("Ambient Light Intensity", Range(0.0, 1.0)) = 1.0
    }
3.) Sub Shaders
    SubShader 
    {
4.) Passes per Sub Shader
        Pass 
        {
            CGPROGRAM
5.) Define Shader Compilation Target
#pragma target 2.0
6.) Define the name of the function that will be used as the vertex shader
#pragma vertex vertexShader 
7.)  Define name of function to be used as fragment shader
#pragma fragment fragmentShader
8.) Define our variables that the property is pointing at, these must be same as property name above
            fixed4 _AmbientLightColor;
            float _AmbientLighIntensity;
9.) Vertex Shader
            float4 vertexShader(float4 v:POSITION) : SV_POSITION
            {
                return mul(UNITY_MATRIX_MVP, v);
            }
10.) Pixel Shader
            fixed4 fragmentShader() : SV_Target
            {
                return _AmbientLightColor * _AmbientLighIntensity;
            }
            ENDCG
        }
    }
}
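Assembled into a single file, the numbered fragments above form this complete shader (the `_AmbientLighIntensity` spelling is kept as in the fragments, since the property and variable names must match):

```shaderlab
Shader "UnityShaderExample/SimpleAmbientLight"
{
    Properties
    {
        _AmbientLightColor ("Ambient Light Color", Color) = (1,1,1,1)
        _AmbientLighIntensity ("Ambient Light Intensity", Range(0.0, 1.0)) = 1.0
    }
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma target 2.0
            #pragma vertex vertexShader
            #pragma fragment fragmentShader

            fixed4 _AmbientLightColor;
            float _AmbientLighIntensity;

            // Transform the vertex from local space to screen space.
            float4 vertexShader(float4 v : POSITION) : SV_POSITION
            {
                return mul(UNITY_MATRIX_MVP, v);
            }

            // Output a flat ambient colour for every pixel.
            fixed4 fragmentShader() : SV_Target
            {
                return _AmbientLightColor * _AmbientLighIntensity;
            }
            ENDCG
        }
    }
}
```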

What is this Shader doing?
The Vertex Shader
The Vertex Shader does one thing only: a matrix calculation. The function takes one input, the vertex position, and it has one output, the transformed position of the vertex (SV_POSITION) in screen space – the position of the vertex on the screen, stored in the return value of this function. This value is obtained by multiplying the vertex position (currently in local space) with the Model, View and Projection matrices, easily obtained through Unity’s built-in state variable UNITY_MATRIX_MVP.
This is done to position the vertices at the correct place on your monitor, based on where the camera is (view) and the projection.
SV_POSITION is a semantic and is used to pass data between different shader stages in the programmable pipeline; it is interpreted by the rasterizer stage. Think of it as one of many registers on the GPU you can store values in. This semantic can store a vector value (XYZW), and since the value is stored in SV_POSITION, the GPU knows that the intended use for this data is positioning.

The Pixel Shader
This is where all the coloring happens and our algorithm is implemented. The algorithm doesn’t need any input, as we won’t do any advanced lighting calculations yet (we will learn that in the next tutorial). The output is the RGBA value of our pixel color, stored in SV_Target (a render target, our final output).

Unity Shaders Reference Material
This table of mathematical functions from the Nvidia Developer Zone is a great help.