Ok, no, no it’s not. I mean, just look at all of that new code you’ll need to run. But, this is where the fun part of development comes in.

Could we, at Unity, make it just as fast as before? Spoiler alert: we could, thanks to three important things our engineers have done to win back performance while maintaining safety.

First, observe that AnimationEventBlittable is a struct, not a class (it was a class in the first draft of this code). It is allocated on the stack (not via the GC), which makes it low-cost. Code generation from Mono, IL2CPP, and CoreCLR for the FromAnimationEvent method is great, and the team has yet to find any measurable overhead for the method itself.

Of course, the FromAnimationEvent method does call out to the GCHandle.Alloc method (four times!), and that method is not cheap. All of the .NET virtual machine implementations need to do non-trivial work to allocate GCHandles and update internal data structures to track them. While profiling these changes, we realized something important – the GCHandles don’t live long. Each handle is only needed while the native code is executing, so you can easily reuse them. This implementation pools a small number of GCHandles, and reuses them for each call to FromAnimationEvent. This means the cost to allocate these GCHandles goes to nearly zero for realistic use cases, where FromAnimationEvent will be called many times.
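To make the pooling idea concrete, here is a minimal sketch of it in C++. It is a model, not Unity's implementation: `Handle`, `HandlePool`, and `SimulateCalls` are hypothetical names, and a plain heap allocation stands in for the bookkeeping a real runtime does when it creates a GCHandle. The point is only that handles released after one call can be retargeted for the next, so the number of real allocations stops growing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a GC handle: a slot that refers to one object.
struct Handle {
    void* target = nullptr;
};

// Hypothetical pool: hands out handles and reuses released ones.
class HandlePool {
public:
    // Returns a handle referring to `target`, reusing a free handle if any.
    Handle* Acquire(void* target) {
        Handle* h = nullptr;
        if (!free_.empty()) {
            h = free_.back();
            free_.pop_back();
        } else {
            // The "expensive" path: a brand-new handle has to be created.
            storage_.push_back(std::make_unique<Handle>());
            h = storage_.back().get();
            ++allocations_;
        }
        h->target = target;  // retarget the handle for this call
        return h;
    }

    // Gives the handle back once the native call has finished with it.
    void Release(Handle* h) {
        h->target = nullptr;
        free_.push_back(h);
    }

    std::size_t allocations() const { return allocations_; }

private:
    std::vector<std::unique_ptr<Handle>> storage_;  // owns every handle
    std::vector<Handle*> free_;                     // handles ready for reuse
    std::size_t allocations_ = 0;
};

// Models `calls` marshalling calls that each need four short-lived handles,
// and returns how many handles were actually created.
std::size_t SimulateCalls(HandlePool& pool, int calls) {
    int a = 0, b = 0, c = 0, d = 0;  // stand-ins for four managed objects
    for (int i = 0; i < calls; ++i) {
        Handle* hs[4] = {pool.Acquire(&a), pool.Acquire(&b),
                         pool.Acquire(&c), pool.Acquire(&d)};
        // ... the native code would run here, using the handles ...
        for (Handle* h : hs) pool.Release(h);
    }
    return pool.allocations();
}
```

In this model, the first call pays for four handles and every later call reuses them, so the per-call allocation cost trends toward zero as the call count grows. A real pool would also need to decide how to behave under concurrent or re-entrant calls, which this sketch leaves out.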

There is also a hidden cost that has yet to be shown in the native code. Recall the earlier discussion around my nonchalant mention of C++ code needing to “unwrap those GCHandles.” Well, it turns out that CoreCLR makes this process really fast. To obtain the target of a GCHandle (i.e., unwrap it), CoreCLR charges you the same cost as a simple pointer dereference – that’s it!

However, our benchmarks were significantly slower with Mono and IL2CPP for the same code… so what gives? We found that GCHandle unwrapping was actually rather expensive for Mono and IL2CPP. Thankfully, unwrapping seldom happens on critical paths, but with the changes described here, it is now a factor to keep in mind. As such, we’ve implemented in Mono and IL2CPP the same algorithm that CoreCLR uses.
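Why can unwrapping cost anywhere from "one dereference" to "rather expensive"? It depends on what a handle actually is. The C++ sketch below contrasts two designs under loud assumptions: every type and function name is hypothetical, the indexed-table scheme is included only as an example of a costlier design (it is not a claim about what Mono or IL2CPP previously did), and the slot-pointer scheme is a simplified model of the "handle is the address of a slot" idea, not runtime source code.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

struct Object {
    int payload;
};

// Scheme A (hypothetical, for contrast): the handle is an index into a
// shared table, so every unwrap takes a lock and does a bounds-checked lookup.
class IndexedHandleTable {
public:
    std::uint32_t Alloc(Object* obj) {
        std::lock_guard<std::mutex> guard(mutex_);
        slots_.push_back(obj);
        return static_cast<std::uint32_t>(slots_.size() - 1);
    }

    Object* Unwrap(std::uint32_t handle) {
        std::lock_guard<std::mutex> guard(mutex_);
        return handle < slots_.size() ? slots_[handle] : nullptr;
    }

private:
    std::mutex mutex_;
    std::vector<Object*> slots_;
};

// Scheme B: the handle is the address of the slot itself.
using SlotHandle = Object**;

class SlotHandleTable {
public:
    SlotHandle Alloc(Object* obj) {
        // std::deque keeps element addresses stable across push_back,
        // so a handle stays valid as more slots are added.
        slots_.push_back(obj);
        return &slots_.back();
    }

private:
    std::deque<Object*> slots_;
};

// Unwrapping a slot handle is a single pointer dereference.
inline Object* Unwrap(SlotHandle handle) {
    return *handle;
}
```

One nice property of the slot design is that a collector that relocates an object only has to rewrite the slot; every holder of the handle sees the new address on its next dereference, with no lookup in between.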

With all of the changes that have been laid out, our internal benchmarks for AnimationEvent – including both the AddEvent method and other public API methods – show no difference between the previous code and the new code. Sweet!

Source: Unity Technologies Blog