Built into the Microsoft C++ compiler and runtime, CastGuard is a pivotal security enhancement designed to significantly reduce the number of exploitable Type Confusion vulnerabilities in applications. Joe Bialek gave a talk about CastGuard at BHUSA2022 (slides) that explains the overall goals of the feature, how it was developed, and how it works at a high level. This article offers a journey into my discovery of CastGuard: a technical evaluation of its mechanics, some illustrative examples, and the relevant compiler flags.
While looking into new control flow guard feature support in the Windows PE load config directory a while back, I stumbled across a newly added field called CastGuardOsDeterminedFailureMode, added in Windows 21H2. I had never heard of CastGuard before so, naturally, I wondered what it did.
To give a brief overview, CastGuard is intended to solve Type Confusion problems such as the following:
In this application, SayMeow will print “Woof!”, in a classic example of type confusion through an illegal downcast. The compiler is unable to infer that the Dog type being passed to SayMeow is a problem, because the function takes an Animal type, so no contract is broken there. The cast within SayMeow is also valid from the compiler’s perspective, because a Cat is an Animal, so it is entirely valid to downcast if you, the developer who wrote the code, know that the object being passed is in fact a Cat or a descendant type thereof. This is why this bug class is so pernicious: it’s easy to violate the type contract, especially in complex codebases.
Ordinarily this can be solved with dynamic_cast and RTTI, which tags each object with type information, but this has its own problems (see the talk linked above for full details) and it’s non-trivial to replace static_cast with dynamic_cast across a large codebase, especially in the case where your code has to coexist with third-party or user code (e.g. in the case of runtime libraries) where you can’t even enforce that RTTI is enabled. Furthermore, RTTI causes significant codegen bloat and performance penalties: a static cast is free (you’re interpreting the memory natively as if it were the type being cast to) whereas a dynamic cast with RTTI requires a handful of stores, loads, jumps, and calls on every cast.
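To make the contrast concrete, here is a minimal sketch of the dynamic_cast approach. The hierarchy mirrors the Animal, Cat, and Dog types from the example, but the member functions and the SayMeowChecked helper are names I have assumed for illustration, and the sketch needs RTTI to be enabled.

```cpp
#include <cassert>
#include <string>

// Hypothetical hierarchy mirroring the example; member names are assumed.
struct Animal {
    virtual ~Animal() = default;
};

struct Cat : Animal {
    virtual std::string Meow() const { return "Meow!"; }
};

struct Dog : Animal {
    virtual std::string Bark() const { return "Woof!"; }
};

// Checked variant of SayMeow. dynamic_cast consults RTTI and yields nullptr
// when the object is not a Cat or a type derived from Cat.
std::string SayMeowChecked(const Animal* animal) {
    assert(animal != nullptr);
    const Cat* cat = dynamic_cast<const Cat*>(animal);
    if (cat == nullptr) {
        return "not a cat";
    }
    return cat->Meow();
}
```

Passing a Dog to this function produces the fallback string instead of a confused virtual call, at the cost of an RTTI lookup on every cast.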
CastGuard acts as an additional layer of protection against type confusion, or, more specifically, against cases where type confusion is the first-order memory vulnerability; it is not designed to protect against cases where an additional memory corruption issue is leveraged first. Its goal is to offer this protection with minimal codegen bloat and performance overhead, without modifying the (near-universally relied upon) ABI for C++ objects.
CastGuard leverages the fact that vftables (aka vtables) uniquely identify types. As long as the types on the left- and right-hand side of the cast have at least one vftable, and both types were declared within the binary being compiled, the object types can be consistently and uniquely determined by their vftable address (with one caveat: comdat folding for identical vftables must be disabled in the linker). This allows the vftable pointer to be used as a unique type identifier on each object, avoiding the need for RTTI bloat and expensive runtime checks. Since an object’s vftable pointer is almost certainly being accessed around the same time as any cast involving that object, the memory being accessed is probably already in cache (or is otherwise about to benefit from being cached) so the performance impact of accessing that data is negligible.
Initially, Microsoft explored the idea of creating bitmaps that describe which types’ vftables are compatible with each other, so that each type that was observed to be down-cast to had a bitvector that described which of the other vftables were valid for casting. However, this turns out to be inefficient in a bunch of ways, and they came up with a much more elegant solution.
The type vftables are enumerated during link time code generation (LTCG). A type inheritance hierarchy is produced, and that hierarchy is flattened into a top-down depth-first list of vftables. These are stored contiguously in memory.
To use the above code as an example, if we assume that each vftable is 8 bytes in size, the CastGuard section would end up looking like this:
Offset | Name |
0x00 | __CastGuardVftableStart |
0x08 | Organism::$vftable@ |
0x10 | Animal::$vftable@ |
0x18 | Dog::$vftable@ |
0x20 | Cat::$vftable@ |
0x28 | __CastGuardVftableEnd |
Notice that parent types are always before child types in the table. Siblings can be in any order, but a sibling’s descendants would come immediately after it. For example, if we added a WolfHound class that inherited from Dog, its vftable would appear between Dog::$vftable@ and Cat::$vftable@ in the above table.
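The ordering rule can be sketched as a pre-order depth-first walk. This is only a model under my own assumptions: the Hierarchy and Slot types and the BuildLayout function are hypothetical, every vftable is given one 8-byte slot, and the first slot sits at offset 0x08 as in the table above. It is not how the linker is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a class hierarchy: each type lists its direct children.
using Hierarchy = std::map<std::string, std::vector<std::string>>;

struct Slot {
    std::size_t offset;  // position of the type's vftable in the modelled section
    std::size_t range;   // distance from this vftable to its last descendant's vftable
};

// Assumption for illustration: every vftable occupies one 8-byte slot.
constexpr std::size_t kSlotSize = 8;

// Pre-order depth-first walk: a parent is placed first, then each child
// subtree in turn, so all descendants of a type form one contiguous run.
// Returns the offset of the next free slot.
std::size_t Place(const Hierarchy& tree, const std::string& type,
                  std::size_t offset, std::map<std::string, Slot>& layout) {
    const std::size_t self = offset;
    std::size_t next = offset + kSlotSize;
    const auto it = tree.find(type);
    if (it != tree.end()) {
        for (const std::string& child : it->second) {
            next = Place(tree, child, next, layout);
        }
    }
    // The last descendant occupies the slot just before `next`.
    layout[type] = Slot{self, next - kSlotSize - self};
    return next;
}

std::map<std::string, Slot> BuildLayout(const Hierarchy& tree, const std::string& root) {
    assert(!root.empty());
    std::map<std::string, Slot> layout;
    Place(tree, root, kSlotSize, layout);  // first slot follows the start marker
    return layout;
}
```

With WolfHound added under Dog, this model places it at 0x20 and pushes Cat to 0x28, which matches the placement described above.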
At any given static_cast<T> site the compiler knows how many other types inherit from T. Given that child types appear sequentially after the parent type in the CastGuard section, the compiler knows that there are a certain number of child type vftables appearing immediately afterward.
For example, Animal has two child types, Cat and Dog, and both of these types are allowed to be cast to Animal. So, if you do static_cast<Animal>(foo), CastGuard checks to see if foo’s vftable pointer lands within two vftable slots downward of Animal::$vftable@, which in this case would be any offset between 0x10 and 0x20 inclusive, i.e. the vftables of Animal, Dog, and Cat. These are all valid. If you try to cast an Organism object to the Animal type, CastGuard’s check detects this as being invalid because the Organism object’s vftable pointer points to offset 0x08, which is outside the valid range.
Looking back again at the example code, the cast being done is static_cast<Cat> on a Dog object. The Cat type has no descendants, so the range size of valid vftables is zero. The Cat type’s vftable, Cat::$vftable@, is at offset 0x20, whereas the Dog object’s vftable pointer points to offset 0x18, so it fails the CastGuard range check. Casting a Cat object to the Cat type works, on the other hand, because a Cat object’s vftable pointer points to 0x20, which is within a 0 byte range of Cat::$vftable@.
This check is optimised even further by computing the valid range size at compile time, instead of storing the count of descendent types and multiplying that by the CastGuard vftable alignment size on every check. At each static cast site, the compiler simply subtracts the left-hand side type’s vftable address from the right-hand side object’s vftable pointer, and checks to see if it is less than or equal to the valid range. This not only reduces the computational complexity of each check, but it also means that the alignment of vftables within the CastGuard section can be arbitrarily decided by the linker on a per-build basis, based on the maximum vftable size being stored, without needing to include any additional metadata or codegen. In fact, the vftables don’t even need to be padded to all have the same alignment, as long as the compiler computes the valid range based on the sum of the vftable sizes of the child types.
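As a rough model of that subtract-and-compare, the sketch below uses the offsets from the example table in place of real addresses. The function and constant names are my own, and the real check is emitted by the compiler rather than written like this.

```cpp
#include <cassert>
#include <cstdint>

// Offsets taken from the example table; real addresses are assigned at link time.
constexpr std::uintptr_t kOrganismVftable = 0x08;
constexpr std::uintptr_t kAnimalVftable = 0x10;
constexpr std::uintptr_t kDogVftable = 0x18;
constexpr std::uintptr_t kCatVftable = 0x20;

// The valid range for a cast target is the distance from its own vftable to
// the vftable of its last descendant, which is known once the layout is fixed.
inline std::uintptr_t ValidRange(std::uintptr_t targetVftable,
                                 std::uintptr_t lastDescendantVftable) {
    assert(lastDescendantVftable >= targetVftable);
    return lastDescendantVftable - targetVftable;
}

// Model of the check at a static_cast site. The subtraction is unsigned, so a
// vftable located before the target wraps to a very large value and fails.
constexpr bool RangeCheckPasses(std::uintptr_t objectVftable,
                                std::uintptr_t targetVftable,
                                std::uintptr_t validRange) {
    return objectVftable - targetVftable <= validRange;
}

static_assert(RangeCheckPasses(kCatVftable, kAnimalVftable, 0x10), "Cat to Animal is valid");
static_assert(!RangeCheckPasses(kOrganismVftable, kAnimalVftable, 0x10), "Organism to Animal is rejected");
static_assert(!RangeCheckPasses(kDogVftable, kCatVftable, 0x00), "Dog to Cat is rejected");
```

A single unsigned comparison covers both the "too low" and "too high" cases, which is why no separate lower-bound test is needed.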
I mentioned earlier that CastGuard only protects casts for types within the same binary. The CastGuard range check described above will always fail if a type from another binary is cast to a type from the current binary, because the vftable pointers will be out of range. This is obviously unacceptable, since it would break almost every program that uses types from a DLL, so CastGuard includes an extra compatibility check. This is where the __CastGuardVftableStart and __CastGuardVftableEnd symbols come in. If the vftable for an object being cast lands outside of the CastGuard section range, the check fails open and allows the cast because it is outside the scope of protection offered by the CastGuard feature.
This approach is much faster than dynamic casting with RTTI and adds very little extra bloat in the compiled binary (caveat: see the talk for details on where they had to optimise this a bit further for things like CRTP). As such, CastGuard is suitable to be enabled everywhere, including in performance-critical paths where dynamic casting would be far too expensive.
Pretty cool, right? I thought so too.
Let’s now go back to the original reason for me discovering CastGuard in the first place: the CastGuardOsDeterminedFailureMode field that was added to the PE load config structure in 21H2. It’s pretty clear that this field has something to do with CastGuard (the name rather gives it away) but it isn’t clear what the field actually does.
My first approach to figure this out was to enumerate every single PE file on my computer (and a Windows 11 Pro VM), parse it, and look for nonzero values in the CastGuardOsDeterminedFailureMode field. I found a bunch! This field is documented as containing a virtual address (VA). I wrote some code to parse out the CastGuardOsDeterminedFailureMode field from the load config, attempt to resolve the VA to an offset, then read the data at that offset.
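The VA resolution step can be sketched roughly as follows. This is not the scanner I used, just the usual section-walk translation; the Section struct is a simplified, hypothetical stand-in for the relevant PE section header fields, and the values in the demo function are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-in for the PE section header fields that matter here.
struct Section {
    std::uint32_t virtualAddress;    // RVA at which the section is mapped
    std::uint32_t sizeOfRawData;     // bytes of the section stored in the file
    std::uint32_t pointerToRawData;  // file offset of those bytes
};

// Translates a VA to a file offset, or returns nullopt when the VA does not
// land inside the file-backed part of any section.
std::optional<std::uint64_t> VaToFileOffset(std::uint64_t va, std::uint64_t imageBase,
                                            const std::vector<Section>& sections) {
    if (va < imageBase) {
        return std::nullopt;
    }
    const std::uint64_t rva = va - imageBase;
    for (const Section& section : sections) {
        if (rva >= section.virtualAddress &&
            rva - section.virtualAddress < section.sizeOfRawData) {
            return rva - section.virtualAddress + section.pointerToRawData;
        }
    }
    return std::nullopt;
}

// Usage with made-up section values.
inline void DemoVaTranslation() {
    const std::uint64_t imageBase = 0x140000000ULL;
    const std::vector<Section> sections{{0x1000, 0x400, 0x800}, {0x2000, 0x200, 0x600}};
    const auto inside = VaToFileOffset(imageBase + 0x2010, imageBase, sections);
    assert(inside.has_value() && *inside == 0x610);
    // A VA that shares the image base prefix but falls outside every section.
    assert(!VaToFileOffset(imageBase + 0x9000, imageBase, sections).has_value());
}
```

A VA that begins with the same leading nibbles as the image base can still fail this translation if its RVA does not fall inside any section.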
I found three overall classes of PE file through this scan method:
- PE files where the CastGuardOsDeterminedFailureMode field is zero.
- PE files where the CastGuardOsDeterminedFailureMode field contains a valid VA which points to eight zero bytes in the .rdata section.
- PE files where the CastGuardOsDeterminedFailureMode field contains what looks like a valid VA, but is in fact an invalid VA.
The third type of result is a bit confusing. The VA looks valid at first glance – it starts with the same few nibbles as other valid VAs – but it doesn’t point within any of the sections. At first I thought my VA translation code was broken, but I confirmed that the VAs were indeed invalid when translated by other tools such as CFF Explorer and PE-Bear. We’ll come back to this later.
I loaded a few of the binaries with valid VAs into Ghidra and applied debugging symbols. I found that these binaries contained a symbol named __castguard_check_failure_os_handled_fptr in the .rdata section, and that the CastGuardOsDeterminedFailureMode VA pointed to the address of this symbol. I additionally found that the binaries included a fast-fail code called FAST_FAIL_CAST_GUARD (65) which is used when the process fast-fails due to a CastGuard range check failure. However, I couldn’t find the __CastGuardVftableStart or __CastGuardVftableEnd symbols for the CastGuard vftable region that had been mentioned in Joe’s talk.
Searching for these symbol names online led me to pieces of vcruntime source code included in SDKs as part of Visual Studio. The relevant source file is guard_support.c and it can be found in the following path:
[VisualStudio]/VC/Tools/MSVC/[version]/crt/src/vcruntime/guard_support.c
It appears that the CastGuard feature was added somewhere around version 14.28.29333, and minor changes have been made in later versions.
Comments in this file explain how the table alignment works. As of 14.34.31933, the start of the CastGuard section is aligned to a size of 16*sizeof(void*), i.e. 128-byte aligned on 64-bit platforms and 64-byte aligned on 32-bit platforms.
There are three parts to the table, and they are allocated as .rdata subsections: .rdata$CastGuardVftablesA, .rdata$CastGuardVftablesB, and .rdata$CastGuardVftablesC.
Parts A and C store the __CastGuardVftablesStart and __CastGuardVftablesEnd symbols. Both of these are defined as a CastGuardVftables struct type that contains a padding field of the alignment size. This means that the first vftable in the CastGuard section is placed at __CastGuardVftablesStart + sizeof(struct CastGuardVftables).
Part B is generated automatically by the compiler. It contains the vftables, and these are automatically aligned to whatever size makes sense during compilation. If no vftables are generated, part B is essentially missing, and you end up with __CastGuardVftablesEnd placed 64/128 bytes after __CastGuardVftablesStart.
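The arithmetic behind those numbers can be written out as a small sketch. PaddingModel is a hypothetical stand-in for the padding struct, not the definition from guard_support.c, and the pointer size is passed in explicitly so that both the 32-bit and 64-bit cases can be checked on any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Section alignment as described in the source comments: 16 pointers' worth.
constexpr std::size_t CastGuardAlignment(std::size_t pointerSize) {
    return 16 * pointerSize;
}

// Hypothetical stand-in for the padding struct used by the start and end markers.
template <std::size_t PointerSize>
struct alignas(16 * PointerSize) PaddingModel {
    unsigned char padding[16 * PointerSize];
};

static_assert(CastGuardAlignment(8) == 128, "64-bit: 128-byte alignment");
static_assert(CastGuardAlignment(4) == 64, "32-bit: 64-byte alignment");
static_assert(sizeof(PaddingModel<8>) == 128, "marker occupies one alignment unit");
static_assert(sizeof(PaddingModel<4>) == 64, "marker occupies one alignment unit");

// The first vftable sits one padding struct after the start marker. With an
// empty part B, this is also where the end marker lands.
inline std::uintptr_t FirstVftableAddress(std::uintptr_t start, std::size_t pointerSize) {
    assert(start % CastGuardAlignment(pointerSize) == 0);
    return start + CastGuardAlignment(pointerSize);
}
```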
The guard_support.c code does not contain the CastGuard checks themselves; these are emitted as part of the compiler itself rather than being represented in a public source file. However, guard_support.c does contain the failure routines and the AppCompat check routine.
When a CastGuard check at a static_cast site fails, it calls into one of four failure routines:
- __castguard_check_failure_nop: does nothing.
- __castguard_check_failure_debugbreak: raises a breakpoint by calling __debugbreak()
- __castguard_check_failure_fastfail: fast-fails using __fastfail(FAST_FAIL_CAST_GUARD)
- __castguard_check_failure_os_handled: calls an OS handler function
Rather than calling the AppCompat check routine at every static_cast site, the check is instead deferred until a CastGuard check fails. Each of the check failure routines above, with the exception of nop, first calls into the AppCompat check routine to see if the failure should be ignored.
The AppCompat check routine is implemented in __castguard_compat_check, and it looks like this:
This routine is responsible for checking whether the right-hand side (object being cast) vftable pointer is pointing somewhere between the first vftable in the CastGuard section and __CastGuardVftablesEnd. If it is, the AppCompat check returns true (i.e. this is a valid case that CastGuard should protect against), otherwise it returns false.
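The behaviour described above, together with the deferred fail-open logic, can be modelled like this. It is a sketch rather than the vcruntime source: CastGuardRegion, CompatCheck, and ResolveCast are names I have invented, and treating the end marker as an inclusive bound is an assumption on my part.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one module's CastGuard vftable region.
struct CastGuardRegion {
    std::uintptr_t firstVftable;  // address of the first vftable in the section
    std::uintptr_t end;           // address of the end marker
};

// Returns true when the object's vftable lives inside this module's CastGuard
// section, meaning a failed range check should be honoured. The unsigned
// subtraction makes addresses below the region wrap and fall out of range.
inline bool CompatCheck(const CastGuardRegion& region, std::uintptr_t objectVftable) {
    assert(region.firstVftable <= region.end);
    return objectVftable - region.firstVftable <= region.end - region.firstVftable;
}

enum class Outcome { Allowed, Blocked };

// The compat check only runs after the range check has already failed.
inline Outcome ResolveCast(bool rangeCheckPassed, const CastGuardRegion& region,
                           std::uintptr_t objectVftable) {
    if (rangeCheckPassed) {
        return Outcome::Allowed;
    }
    // Objects whose vftable belongs to another binary fail open.
    return CompatCheck(region, objectVftable) ? Outcome::Blocked : Outcome::Allowed;
}
```

Deferring the compat check to the failure path keeps the common case down to the single range comparison at the cast site.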
In the case of __castguard_check_failure_os_handled, the handler code looks like this:
If the AppCompat routine says that the failed check should be honoured, it calls an OS handler wrapper. The wrapper function looks like this:
The __castguard_check_failure_os_handled_fptr function pointer being referred to here is the symbol that CastGuardOsDeterminedFailureMode points to in the load config table, which is the exact one I was trying to figure out the purpose of!
That function pointer is defined as:
The declspec is important here: it places __castguard_check_failure_os_handled_fptr in the same section as CFG/XFG pointers, which means (as the code comment above points out) that the OS handler function pointer is protected in the same way as the CFG/XFG pointers. Control flow from the CastGuard check site to the check failure function to the AppCompat check function can be protected by control flow guard, but flow from the failure routine to the OS handled function pointer cannot because its value is (presumably always) unknown at compile time. This is why the wrapper function above is required, with guard(nocf) applied: it disables CFG for the flow from the check failure function to the OS handler function, since CFG would likely disallow the indirect call, but since the pointer itself is protected it doesn’t actually matter.
This indicates that CastGuardOsDeterminedFailureMode is intended to be used to specify the location of the __castguard_check_failure_os_handled_fptr symbol, which in turn points to an OS handler function that is called when a check failure occurs.
None of this is documented but, given that Joe’s BHUSA2022 talk included an anecdote about Microsoft starting the CastGuard feature off in a report-only mode, I can only presume that CastGuardOsDeterminedFailureMode was designed to provide the binaries with this reporting feature.
At this point we still have three open questions, though. First, how does the compiler pick between the four different failure handlers? Second, how are the CastGuard checks themselves implemented? And third, why do a lot of the binaries have invalid VAs in CastGuardOsDeterminedFailureMode?
To answer the first question, we have to take a look at c2.dll in the MSVC compiler, which is where CastGuard is implemented under the hood. This DLL contains a class called CastGuard which, unsurprisingly, is responsible for most of the heavy lifting. One of the functions in this class, called InsertCastGuardCompatCheck, refers to a field of some unknown object in thread-local storage and picks which of the four check functions to insert a call to based on that value:
Value | Call |
1 | __castguard_check_failure_fastfail |
2 | __castguard_check_failure_debugbreak |
3 | __castguard_check_failure_os_handled |
4 | __castguard_check_failure_nop |
From prior reverse engineering expeditions into the MSVC compiler, I remembered that config flags passed to the compiler are typically stored in a big structure in TLS. From there I was able to find the hidden compiler flags that enable CastGuard and control its behaviour.
Hidden flags can be passed to each stage of the compiler using a special /d command line argument. The format of the argument is /dN … where N specifies which DLL the hidden flag should be passed to (1 for the front-end compiler, c1.dll, or 2 for the code generator, c2.dll). The flag is then appended to the argument.
The known hidden compiler flags for CastGuard are:
Flag | Description |
/d2CastGuard- | Disables CastGuard. |
/d2CastGuard | Enables CastGuard. |
/d2CastGuardFailureMode:fastfail | Sets the failure mode to fast-fail. |
/d2CastGuardFailureMode:nop | Sets the failure mode to nop. |
/d2CastGuardFailureMode:os_handled | Sets the failure mode to OS handled. |
/d2CastGuardFailureMode:debugbreak | Sets the failure mode to debug break. |
/d2CastGuardOption:dump_layout_info | Dumps the CastGuard layout info in the build output. |
/d2CastGuardOption:force_type_system | Forces type system analysis, even if the binary is too big for fast analysis. This is intended to be used with the linker, rather than the compiler, so warning C5066 is raised if you pass it. |
/d2CastGuardTestFlags:# | Sets various test flags for the CastGuard implementation, as a bitwise numeric value. Hex numbers are valid. |
So now we know how the different failure modes are set: at build time, with a compiler flag.
If we rebuild the example code with some extra compiler flags, we can try CastGuard out:
/d2CastGuard /d2CastGuardFailureMode:debugbreak /d2CastGuardOption:dump_layout_info
The compiler then prints layout information for CastGuard:
When executed, the static cast in SayMeow has a CastGuard check applied and raises a debug break in __castguard_check_failure_debugbreak.
We can also learn a little more about CastGuard from the warnings and errors that are known to be associated with it, by looking at the string tables in the compiler binaries:
- C5064: “CastGuard has been disabled because the binary is too big for fast type system analysis and compiler throughput will be degraded. To override this behavior and force enable the type system so CastGuard can be used, specify the flag /d2:-CastGuardOption:force_type_system to the linker.”
- C5065: “The CastGuard subsystem could not be enabled.”
- C5066: “CastGuardOption:force_type_system should not be passed to the compiler, it should only be passed to the linker via /d2:-CastGuardOption:force_type_system. Passing this flag to the compiler directly will force the type system for all binaries this ltcg module is linked in to.”
- C5067: “CastGuard is not compatible with d2notypeopt”
- C5068: “CastGuard is not compatible with incremental linking”
- C5069: “CastGuard cannot initialize the type system. An object is being used that was built with a compiler that did not include the necessary vftable type information (I_VFTABLETIS) which prevents the type system from loading. Object: %s”
- C5070: “CastGuard cannot initialize the type system. An object is being used that was built with a compiler that did not include the necessary type information (I_TIS) which prevents the type system from loading. Object: %s”
- C5071: “CastGuard cannot initialize the type system. An error occurred while trying to read the type information from the debug il. Object: %s”
Digging even further into the implementation, it appears that Microsoft added a new C++ attribute called nocastguard, which can be used to exclude a type from CastGuard checks. Based on my experimentation, this attribute is applied to types (applying the attribute to an argument or variable causes a compiler crash!) and disables checks when a static cast is performed to that type.
Changing our example code to the following causes the CastGuard check to be eliminated, and the type confusion bug returns:
If nocastguard is applied to the Dog or Animal type instead, the CastGuard check returns and the type confusion bug is prevented. This indicates that, at least in this unreleased implementation, the attribute is specifically used to prevent CastGuard checks on casts to the target type.
This newly CastGuard-enabled development environment makes it easy to experiment, disassemble the binary, and see what the code looks like. In the simplest version of our example program, the result is actually quite amusing: the program does nothing except initialise a Dog object and immediately unconditionally call the failure routine in main. This is because the CastGuard check is injected into the IL during the optimisation phase. You can see this in practice: turning off optimisations causes the CastGuard pass to be skipped entirely. Since the check is part of the IL, it is subject to optimisation passes. The optimiser sees that the check essentially boils down to if (Cat::$vftable@ != Dog::$vftable@) { fail; }, whose expression is always true, which results in the branch being taken and the entire rest of the code being eliminated. Since SayMeow is only called once, it gets inlined, and the entire program ends up as a call to the CastGuard failure routine. This implies that it could technically be possible for a future release to identify such a scenario at build time and raise an error or warning.
To study things a little better, let’s expand the program in a way that introduces uncertainty and tricks the compiler into not optimising the routines. (Note: we can’t turn off optimisations to avoid all the inlining and elimination because that also turns off CastGuard.)
This results in a program with an entirely normal looking main function, with no references to CastGuard routines. SayMeow looks like the following:
This is pretty much expected: *animal dereferences the passed pointer to get to the vftable for the object, and, since the Cat type has no descendant types, the range check just turns into a straight equality check.
To make things more interesting, let’s add a WolfHound type that inherits from Dog, and a function called SayWoof that works just like SayMeow but with a cast to Dog instead of Cat. We’ll also update main so that it can create an Animal, Cat, Dog, or WolfHound.
Upon building this new program, the compiler dumps the CastGuard layout:
We can see that the WolfHound vftable is placed immediately after the Dog vftable, and that the Dog type is compatible with the Dog and WolfHound types. We can also see that the size of the range check is 0x10, which makes sense because WolfHound’s vftable comes 0x10 bytes after Dog’s vftable.
The CastGuard check in SayWoof now ends up looking something like this:
Let’s enumerate the possible flows here:
- If the type being passed is Dog, then *animal is equal to Dog::$vftable@, which makes *animal - Dog::$vftable@ equal zero, so the check passes.
- If the type being passed is WolfHound, then *animal is equal to WolfHound::$vftable@, which is positioned 0x10 bytes after Dog::$vftable@. As such, *animal - Dog::$vftable@ will equal 0x10, and the check passes.
- If the type being passed is Cat, then *animal is equal to Cat::$vftable@, which makes *animal - Dog::$vftable@ equal 0x20, and the check fails.
- If the type being passed is Animal, then *animal is equal to Animal::$vftable@. Since Animal::$vftable@ is positioned before Dog::$vftable@ in the table, the result of the unsigned subtraction will wrap, causing the result to be greater than 0x10, and the check fails.
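The four flows above can be checked with a small model. The addresses are hypothetical and only their spacing matters (0x10 between consecutive vftables, as in the layout described above); the function names are mine and do not come from the compiler output.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vftable addresses; only the spacing between them matters.
constexpr std::uintptr_t kAnimalVftable = 0x1000;
constexpr std::uintptr_t kDogVftable = 0x1010;
constexpr std::uintptr_t kWolfHoundVftable = 0x1020;
constexpr std::uintptr_t kCatVftable = 0x1030;

// Valid range for casts to Dog: Dog itself plus WolfHound.
constexpr std::uintptr_t kDogRange = 0x10;

// Models *animal - Dog::$vftable@ as an unsigned subtraction.
constexpr std::uintptr_t DistanceFromDog(std::uintptr_t objectVftable) {
    return objectVftable - kDogVftable;
}

constexpr bool SayWoofCheckPasses(std::uintptr_t objectVftable) {
    return DistanceFromDog(objectVftable) <= kDogRange;
}

// Walks the four flows; each assert documents the expected outcome.
inline void CheckFlows() {
    assert(DistanceFromDog(kDogVftable) == 0x00);
    assert(SayWoofCheckPasses(kDogVftable));
    assert(DistanceFromDog(kWolfHoundVftable) == 0x10);
    assert(SayWoofCheckPasses(kWolfHoundVftable));
    assert(DistanceFromDog(kCatVftable) == 0x20);
    assert(!SayWoofCheckPasses(kCatVftable));
    // Animal sits before Dog, so the subtraction wraps to a huge value.
    assert(DistanceFromDog(kAnimalVftable) == UINTPTR_MAX - 0xF);
    assert(!SayWoofCheckPasses(kAnimalVftable));
}
```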
This shows CastGuard in action quite nicely!
For completeness, let’s go back and wrap up a small loose end relating to the hidden compiler flags: test flags. The /d2CastGuardTestFlags option takes a hexadecimal number value representing a set of bitwise flags. The test flags value is written to a symbol called CastGuardTestFlags inside c2.dll, and this value is used in roughly ten different locations in the code as of version 14.34.31933.
In the process of reverse engineering this code, I discovered that four separate check approaches are implemented: RangeCheck (0x01, the default), ROLCheck (0x02), ConstantBitmapCheck (0x03), and BitmapCheck (0x04). These presumably follow the sequence of approaches and optimisations that were mentioned in the talk.
Here’s what I was able to figure out about these flags:
Flag Value | Notes |
0x01 | Switches the check type to ROLCheck (0x02), as long as neither 0x02 nor 0x40 is also set. |
0x02 | Switches the check type to ConstantBitmapCheck (0x03), as long as 0x40 is not also set. |
0x04 | Appears to enable an alternative strategy for selecting the most appropriate vftable for a type with multiple inheritance. |
0x08 | Forces CastGuard::IsCastGuardCheckNeeded to default to true instead of false when no condition explicitly prevents a check, which appears to force the generation of CastGuard checks even if a codegen pass was not performed. |
0x10 | Forces generation of metadata for all types in the inheritance tree. Types that are never part of a cast check, either as a cast target or valid source type, do not normally end up as part of the CastGuard section. For example, Organism is ignored by CastGuard in our example programs because it never ends up being relevant at a static cast site. When this flag is enabled, all types in the inheritance tree are treated as relevant, and their vftables are placed into the CastGuard section. A type which is never part of a static cast, and whose parent and child types (if there are any) are never part of a static cast, is still kept separate and doesn’t end up in the CastGuard section. |
0x20 | Exact behaviour is unclear, but it seems to force the CastGuard subsystem to be enabled in a situation where error C5065 would be otherwise raised, and forces the TypeSystem::Builder::ProcessILRecord function to continue working even if an internal boolean named OneModuleEnablesCastGuard is false. |
0x40 | Switches the check type to BitmapCheck (0x04) and, if /d2CastGuardOption:dump_layout_info is also set, prints the bitmap in the build output. |
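The precedence between the three check-switching bits, as I read it from the table above, can be sketched as follows. The enum values follow the check type numbers given earlier, but SelectCheckType is my own reconstruction of the priority order and not code lifted from c2.dll.

```cpp
#include <cstdint>

// Check type identifiers as listed earlier in the article.
enum class CheckType : std::uint32_t {
    RangeCheck = 0x01,
    ROLCheck = 0x02,
    ConstantBitmapCheck = 0x03,
    BitmapCheck = 0x04,
};

// Inferred priority: 0x40 wins over 0x02, which wins over 0x01.
// Bits that do not affect the check type (0x04 to 0x20) are ignored here.
constexpr CheckType SelectCheckType(std::uint32_t testFlags) {
    if ((testFlags & 0x40) != 0) {
        return CheckType::BitmapCheck;
    }
    if ((testFlags & 0x02) != 0) {
        return CheckType::ConstantBitmapCheck;
    }
    if ((testFlags & 0x01) != 0) {
        return CheckType::ROLCheck;
    }
    return CheckType::RangeCheck;
}

static_assert(SelectCheckType(0x00) == CheckType::RangeCheck, "default is the range check");
static_assert(SelectCheckType(0x01) == CheckType::ROLCheck, "0x01 alone selects ROLCheck");
static_assert(SelectCheckType(0x03) == CheckType::ConstantBitmapCheck, "0x02 overrides 0x01");
static_assert(SelectCheckType(0x41) == CheckType::BitmapCheck, "0x40 overrides both");
```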
The three alternative check patterns function exactly as was explained in the BHUSA2022 talk, so I won’t go into them any further.
Unless I missed anything, we appear to be down to just one final question: why am I seeing invalid VAs in CastGuardOsDeterminedFailureMode on a bunch of Windows executables?
At first I thought that there might be some kind of masking going on, with certain bits guaranteed to be zero in the VA due to alignment requirements, with those bit positions being reused to set or indicate the failure mode or check type. This doesn’t make much sense, though, and I can find no supporting evidence. It appears that this is a bug from an earlier implementation of CastGuard, when Microsoft were trialling rolling out notify-only protection on certain components. I couldn’t concretely confirm this theory, but I did manage to have a quick chat with someone who worked on the feature, and they were as surprised to see the invalid VAs as I was.
It takes time to get these compiler-level bug class mitigations implemented correctly. The analysis in this article was originally performed in February 2023, but CastGuard remains unofficial and undocumented as of October 2023. Given the unfathomable quantity of existing code that interacts with COM interfaces, all of which might be affected by this feature, and the politically fractious intersection between C++ language standards and implementation-specific language features, it isn’t particularly surprising that it’s taking Microsoft a while to roll this mitigation out.