CVE-2022-21972 is a Windows VPN Use after Free (UaF) vulnerability that was discovered by reverse engineering the raspptp.sys kernel driver. The vulnerability is a race condition and can be reliably triggered by sending crafted input to a vulnerable server. It can be used to corrupt memory and could be used to gain kernel Remote Code Execution (RCE) or Local Privilege Escalation (LPE) on a target system.
Affected Versions
The vulnerability affects most versions of Windows Server and Windows Desktop since Windows Server 2008 and Windows 7 respectively. To see a full list of affected Windows versions check the official disclosure post on MSRC:
https://msrc.microsoft.com/update-guide/vulnerability/CVE-2022-21972
The vulnerable code is present on both server and desktop distributions; however, due to configuration differences, only the server deployment is exploitable.
Overview
This vulnerability is based heavily on how socket object life cycles are managed by the raspptp.sys driver. In order to understand the vulnerability we must first understand some of the basics of how the kernel driver interacts with sockets to implement network functionality.
Sockets In The Windows Kernel – Winsock Kernel (WSK)
WSK is the name of the Windows socket API that can be used by drivers to create and use sockets directly from the kernel. Head over to https://docs.microsoft.com/en-us/windows-hardware/drivers/network/winsock-kernel-overview to see an overview of the system.
The way in which the WSK API is usually used is through a set of event driven call back functions. Effectively, once a socket is set up, an application can provide a dispatch table containing a set of function pointers to be called for socket related events. In order for an application to be able to maintain its own state through these callbacks, a context structure is also provided by the driver to be given to each callback so that state can be tracked for the connection throughout its life-cycle.
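To make the callback-plus-context pattern concrete, here is a minimal Python model of it. This is only a sketch of the idea and not the WSK API itself: ToySocket, ConnectionContext and the event names are hypothetical stand-ins for a WSK socket, the driver's context structure and the dispatch table entries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConnectionContext:
    """Per-connection state owned by the driver (hypothetical layout)."""
    peer: str
    received: List[bytes] = field(default_factory=list)
    connected: bool = True


def on_receive(ctx: ConnectionContext, data: bytes) -> None:
    # The callback finds its state through the context it is handed.
    ctx.received.append(data)


def on_disconnect(ctx: ConnectionContext, data: bytes) -> None:
    ctx.connected = False


class ToySocket:
    """Hypothetical socket: a dispatch table plus an opaque context."""

    def __init__(self, dispatch: Dict[str, Callable], context: ConnectionContext):
        self.dispatch = dispatch
        self.context = context

    def raise_event(self, name: str, data: bytes = b"") -> None:
        # The socket layer looks up the handler and passes the context back.
        self.dispatch[name](self.context, data)


demo_ctx = ConnectionContext(peer="192.0.2.10")
demo_sock = ToySocket({"receive": on_receive, "disconnect": on_disconnect}, demo_ctx)
demo_sock.raise_event("receive", b"control data")
demo_sock.raise_event("disconnect")
```

The point to take away is that the socket layer never interprets the context; it simply hands the same object to every callback, so the driver alone is responsible for keeping that object alive.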
raspptp.sys and WSK
Now that we understand the basics of how sockets are interacted with in the kernel, let’s look at how the raspptp.sys driver uses WSK to implement the PPTP protocol.
The PPTP protocol specifies two socket connections: a TCP socket used for managing a VPN connection and a GRE (Generic Routing Encapsulation) socket used for sending and receiving the VPN network data. The TCP socket is the only one we care about for triggering this issue, so let’s break down the life cycle of how raspptp.sys handles these connections with WSK:
- A new listening socket is created by the WskOpenSocket function in raspptp.sys. This function is passed a WSK_CLIENT_LISTEN_DISPATCH dispatch table with the WskConnAcceptEvent function specified as the WskAcceptEvent handler. This is the callback that handles a socket accept event, aka a new incoming connection.
- When a new client connects to the server the WskConnAcceptEvent function is called. This function allocates a new context structure for the new client socket and registers a WSK_CLIENT_CONNECTION_DISPATCH dispatch table with all event callback functions specified. These are WskConnReceiveEvent, WskConnDisconnectEvent and WskConnSendBacklogEvent for receive, disconnect and send events respectively.
- Once the accept event is fully resolved, WskAcceptCompletion is called and a callback is triggered (CtlConnectQueryCallback) which completes initialisation of the PPTP Control connection and creates a context structure specifically for tracking the state of the client’s PPTP control connection. This is the main object which we care about for this vulnerability.
The PPTP Control connection context structure is allocated by the CtlAlloc function. Some abbreviated pseudo code for this function is:
The important parts of this structure to note are the CtlCtxReferenceCount and CtlWaitTimeoutNdisTimerHandle structure members. This new context structure is stored on the socket context for the new client socket and can then be referenced for all of the events relating to the socket it binds to.
The only parts of the socket context structure that we then care about are the following fields:
- PptpCtlCtx – The PPTP specific context structure for the control connection.
- CtlReceiveCallback – The PPTP control connection receive callback.
- CtlDisconnectCallback – The PPTP control connection disconnect callback.
- CtlConnectQueryCallback – The PPTP control connection query callback (used to get client information once a new connection is complete).
raspptp.sys Object Life Cycles
The final bit of background information we need to understand before we delve into the vulnerability is the way that raspptp keeps these context structures alive for a given socket. In the case of the PptpCtlCtx structure, both the client socket and the PptpCtlCtx structure have a reference count.
This reference count is intended to be incremented every time a reference to either object is created. These are initially set to 1 and when decremented to 0 the objects are freed by calling a free callback stored within each structure. This obviously only works if the code remembers to increment and decrement the reference counts properly and correctly lock access across multiple threads when handling the respective structures.
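As a rough illustration of this scheme, the following Python sketch models an object that starts with a reference count of 1 and invokes a stored free callback when the count reaches 0. It is not the driver's code: the class and method names are hypothetical, and a Python lock stands in for whatever locking primitive the driver actually uses.

```python
import threading


class RefCounted:
    """Toy reference-counted object with a free callback (hypothetical names)."""

    def __init__(self, free_fn):
        self._lock = threading.Lock()
        self.ref_count = 1      # the creator holds the first reference
        self._free_fn = free_fn
        self.freed = False

    def reference(self):
        # Every new user of the object must take a reference first.
        with self._lock:
            if self.ref_count == 0:
                raise RuntimeError("object has already been freed")
            self.ref_count += 1

    def dereference(self):
        # Dropping the last reference triggers the free callback.
        with self._lock:
            self.ref_count -= 1
            last = self.ref_count == 0
        if last:
            self.freed = True
            self._free_fn(self)
```

If any code path uses the object without calling reference() first, nothing stops another thread's dereference() from dropping the count to 0 underneath it, which is the failure mode this post is about.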
Within raspptp.sys, the code that performs the reference increment and decrement functionality usually looks like this:
As you may have guessed at this point, the vulnerability we’re looking at is indeed due to incorrect handling of these reference counts and their respective locks, so now that we have covered the background stuff let’s jump into the juicy details!
The Vulnerability
The first part of our use after free vulnerability is in the code that handles receiving PPTP control data for a client connection. When new data is received by raspptp.sys the WSK layer will dispatch a call to the appropriate event callback. raspptp.sys registers a generic callback for all sockets called ReceiveData. This function parses the incoming data structures from WSK and forwards the incoming data to the client socket context’s own receive data callback. For a PPTP control connection, this callback is the CtlReceiveCallback function.
The section of the ReceiveData function that calls this callback has the following pseudo code. This snippet includes all the locking and reference increments that are used to protect the code against multi threaded access issues…
The CtlReceiveCallback function has the following pseudo code:
The CtlpEngine function is the state machine responsible for parsing the incoming PPTP control data. Now there is one very important piece of code that is missing from these two sections and that is any form of reference count increment or locking for the PptpCtlCtx object!
Neither of the callback handlers actually increments the reference count for the PptpCtlCtx or attempts to lock access to signify that it is in use; this is potentially a vulnerability because if at any point the reference count were to be decremented then the object would be freed! However, if this is so bad, why isn’t every PPTP server just crashing all the time? The answer to this question is that the CtlpEngine function actually uses the reference count correctly.
This is where things get confusing. Assuming that the raspptp.sys driver was completely single threaded, this implementation would be 100% safe as no part of the receive pipeline for the control connection decrements the object reference count without first performing an increment to account for it. In reality however, raspptp.sys is not a single threaded driver. Looking back at the initialization of the PptpCtlCtx object, there is one part of particular interest.
Here we can see the allocation of an Ndis timer object. The actual implementation of these timers isn’t important, but what is important is that these timers dispatch their callbacks on a different thread from the one on which WSK dispatches the ReceiveData callback. Another interesting point is that both use the PptpCtlCtx structure as their context structure.
So what does this timer callback do and when does it happen? The code that sets the timer is as follows:
We can see that a 30 second timer trigger is set and when this 30 seconds is up, the CtlpWaitTimeout callback is called. This 30 second timer can be canceled but this is only done when a client performs a PPTP control handshake with the server, so assuming we never send a valid handshake, the callback will be dispatched after 30 seconds. But what does this do?
The CtlpWaitTimeout function is used to handle the timer callback and it has the following pseudo code:
As we can see the function mainly serves to call the eerily named CtlpDeathTimeout function, which has the following pseudo code:
This is where things get even more interesting. The CtlCleanup function is the function responsible for starting the process of tearing down the PPTP control connection. This is done in two steps. First, the state of the Control connection is set to CtlStateUnknown which means that the CtlpEngine function will be prevented from processing any further control connection data (kind of). The second step is to push a task to run the similarly named CtlpCleanup function onto a background worker thread which belongs to the raspptp.sys driver.
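The timeout path described above (a timer that fires unless a handshake cancels it, then a two-step teardown that defers the real work to a worker thread) can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than a rendering of the driver: threading.Timer stands in for the Ndis timer, queue.Queue for the driver's worker queue, and the class, method and state names other than the 30 second default are hypothetical.

```python
import queue
import threading


class ControlConnection:
    """Toy control connection with a wait timeout (hypothetical names)."""

    def __init__(self, worker_queue, timeout_seconds=30.0):
        self.state = "wait_handshake"
        self.cleaned_up = threading.Event()
        self._worker_queue = worker_queue
        # The timer callback runs on its own thread, not the receive thread.
        self._timer = threading.Timer(timeout_seconds, self._on_wait_timeout)
        self._timer.daemon = True
        self._timer.start()

    def on_handshake(self):
        # Only a handshake cancels the pending timeout.
        self._timer.cancel()
        self.state = "established"

    def _on_wait_timeout(self):
        # Step 1: mark the connection so the parser stops doing real work.
        self.state = "unknown"
        # Step 2: defer the actual teardown to the background worker.
        self._worker_queue.put(self._cleanup)

    def _cleanup(self):
        self.cleaned_up.set()


def start_worker(worker_queue):
    """Start a daemon thread that runs queued tasks one at a time."""
    def run():
        while True:
            task = worker_queue.get()
            task()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

Note that three different threads are involved in this model: the one delivering received data, the timer thread, and the worker that finally runs the cleanup.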
The end of the CtlpCleanup function contains the following code that will be very useful for us being able to trigger a use after free as it will always run on a different thread to the CtlpEngine function.
It decrements the reference count on the PptpCtlCtx object and even better is that no part of this timeout pipeline increments the reference count in a way that would prevent the free function from being called!
So, theoretically, all we need to do is find some way of getting the CtlpCleanup and CtlpEngine functions to run at the same time on separate threads and we will be able to cause a Use after Free!
However, before we celebrate too early, we should take a look at the function that actually frees the PptpCtlCtx structure because it is yet another callback. The fpCtlCtxFreeFn property is a callback function pointer to the CtlFree function. This function does a decent amount of tear down as well but the bits we care about are the following lines:
There is an added complication in this code that is going to make things a little more difficult. The call to WskCloseSocketContextAndFreeSocket actually closes the client socket before freeing the PptpCtlCtx structure. This means that at the point the PptpCtlCtx structure is freed, we will no longer be able to send new data to the socket and trigger any more calls into CtlpEngine. However, this doesn’t mean that we can’t trigger the vulnerability, since if data is already being processed by CtlpEngine when the socket is closed we simply need to hope the thread stays in the function long enough for the free to occur in CtlFree and boom – we have a UAF.
Now that we have a good old fashioned kernel race condition, let’s take a look at how we can try to trigger it!
The Race Condition
Like any good race condition, this one contains a lot of moving parts and added complication which make triggering it a non trivial task, but it’s still possible! Let’s take a look at what we need to happen.
- 30 second timeout is triggered and eventually runs CtlCleanup, pushing a CtlpCleanup task onto a background worker thread queue.
- Background worker thread wakes up and starts processing the CtlpCleanup task from its task queue.
- CtlpEngine starts or is currently processing data on a WSK dispatch thread when the CtlpCleanup function frees the underlying PptpCtlCtx structure from the worker thread!
- Bad things happen…
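The interleaving in the list above can be reproduced in a toy Python model. The sketch below is a simplification and makes several assumptions: the class only borrows the PptpCtlCtx name, a boolean flag stands in for freed memory, and threading.Event objects force the bad ordering so that the outcome is deterministic, whereas the real race depends entirely on timing.

```python
import threading


class PptpCtlCtx:
    """Toy stand-in for the control connection context (hypothetical fields)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.ref_count = 1
        self.freed = False

    def dereference(self):
        with self.lock:
            self.ref_count -= 1
            if self.ref_count == 0:
                self.freed = True   # models the free callback releasing memory


def run_race(take_reference):
    """Return True if the engine thread touched the context after it was freed."""
    ctx = PptpCtlCtx()
    engine_inside = threading.Event()
    cleanup_done = threading.Event()
    result = {}

    def engine():
        if take_reference:          # what a safe receive path would do
            with ctx.lock:
                ctx.ref_count += 1
        engine_inside.set()         # parsing has started
        cleanup_done.wait()         # still parsing while cleanup runs
        result["used_after_free"] = ctx.freed
        if take_reference:
            ctx.dereference()

    def cleanup():
        engine_inside.wait()
        ctx.dereference()           # drops the initial reference
        cleanup_done.set()

    threads = [threading.Thread(target=engine), threading.Thread(target=cleanup)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result["used_after_free"]
```

In this model run_race(False) reports a use after free, while run_race(True) does not, because the extra reference keeps the count above 0 until the engine thread has finished.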
Triggering the Race Condition
The main questions to consider for this race condition are: how much data can we send to the server so that it spends as much time as possible in the CtlpEngine parsing loop, and can we do this without cancelling the timeout?
Thankfully, as previously mentioned, the only way to cancel the timeout is to perform a PPTP control connection handshake, which technically means we can get the CtlpEngine function to process any other part of the control connection, as long as we don’t start the handshake. However, the state machine within CtlpEngine needs the handshake to take place to enable any other part of the control connection!
There is one part of the CtlpEngine state machine that can still be partially validly hit (without triggering an error) before the handshake has taken place. This is the EchoRequest control message type. Now we can’t actually enter the proper handling of the message type before the handshake has taken place but what we can do is use it to iterate through all the sent data in the parsing loop without triggering a parsing error. This effectively forms a way of us spinning inside the CtlpEngine function without cancelling the timeout which is exactly what we want. Even better is that this remains true when the CtlStateUnknown state is set by the CtlCleanup function.
Unfortunately the maximum amount of data we can process in one WSK receive data event callback trigger is limited to the maximum data that can be received in one TCP packet. In theory this is 65,535 bytes but due to the size limitation of Ethernet frames to 1,500 bytes we can only send ~1,450 bytes (1,500 minus the headers of the other network layer frames) of PPTP control messages in a single request. This works out at around 90 EchoRequest messages per callback event trigger. For a modern CPU this is not a lot to churn through before hopping out of the CtlpEngine function.
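The figure of around 90 messages follows from the size of the message itself. The Python sketch below packs a single Echo-Request using the field layout from RFC 2637 (a 16 byte message with control message type 5 and the magic cookie 0x1A2B3C4D) and divides the approximate payload budget by its length. The 1,450 byte budget is the estimate from the text above, and the function and constant names are hypothetical.

```python
import struct

PPTP_MAGIC_COOKIE = 0x1A2B3C4D   # fixed value defined by RFC 2637
PPTP_CONTROL_MESSAGE = 1         # PPTP message type for control messages
ECHO_REQUEST = 5                 # control message type for Echo-Request
USABLE_PAYLOAD = 1450            # approximate per-packet budget from the text


def build_echo_request(identifier):
    """Pack one Echo-Request: length, message type, cookie, control type, reserved, id."""
    return struct.pack(
        ">HHIHHI",
        16,                      # total message length in bytes
        PPTP_CONTROL_MESSAGE,
        PPTP_MAGIC_COOKIE,
        ECHO_REQUEST,
        0,                       # Reserved0
        identifier,
    )


frame = build_echo_request(1)
frames_per_packet = USABLE_PAYLOAD // len(frame)
```

With 16 byte messages, the integer division gives 90 whole messages per packet, which matches the estimate above.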
Another thing to consider is how do we know if the race condition was successful or a failure? Thankfully in this regard the server socket being closed on timeout works in our favour as this will cause a socket exception on the client if we attempt to send any more data once the server closes the socket. Once the socket is closed we know that the race is finished but we don’t necessarily know if we did or didn’t win the race.
With these considerations in place, how do we trigger the vulnerability? It actually becomes a simple proof of concept. Effectively we just continually send EchoRequest PPTP control frames in 90 frame bursts to a server until the timeout event occurs and then we hope that we’ve won the race.
We won’t be releasing the PoC code until people have had a chance to patch things up but when the PoC is successful we may see something like this on our target server:
Because the PptpCtlCtx structure is de-initialised there are a lot of pointers and properties that contain invalid values that, if used at different parts of the Receive Event handling code, will cause crashes in non fun ways like null pointer dereferences. This is actually what happened in the Blue Screen of Death above, but the CtlpEngine function did still process a freed PptpCtlCtx structure.
Can we use this vulnerability for anything more than a simple BSOD?
Exploitation
Due to the state of mitigation in the Windows kernel against memory corruption exploits and the difficult nature of this race condition, achieving useful exploitation of the vulnerability is not going to be easy, especially if seeking to obtain Remote Code Execution (RCE). However, this does not mean it is not possible to do so.
Exploitability – The Freed Memory
In order to assess the exploitability of the vulnerability, we need to look at what our freed memory contains and where it sits in the Windows kernel heap. In windbg we can use the !pool command to get some information on the allocated chunk that will be freed in our UaF issue.
ffff828b17e50d20 size: 2a0 previous size: 0 (Allocated) *PTPT
We can see here that the size of the freed memory block is 0x2a0 or 672 bytes. This is important as it puts us in the allocation size range for the variable size kernel heap segment. This heap segment is fairly nice for use after free exploitation as the variable size heap also maintains a free list of chunks that have been freed and their sizes. When a new chunk is allocated this free list is searched and if a chunk of an exact or greater size match is found it will be used for the new allocation. Since this is the kernel, any other part of the kernel that allocates non paged pool memory allocations of this or a similar size could end up using this freed slot as well.
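The reuse behaviour described above can be pictured with a toy free list. The Python sketch below is deliberately simplistic and is not how the Windows pool allocator is implemented: it only models the "search the free list for an exact or larger chunk" idea from the paragraph above, and the class name is hypothetical. The address is the one from the !pool output and is used purely as an example value.

```python
CHUNK_SIZE = 0x2A0               # size reported by !pool, i.e. 672 bytes


class ToyFreeList:
    """Toy free list: hands back the smallest freed chunk that is big enough."""

    def __init__(self):
        self.free_chunks = []    # (size, address) pairs, kept sorted by size

    def free(self, address, size):
        self.free_chunks.append((size, address))
        self.free_chunks.sort()

    def allocate(self, size):
        for index, (chunk_size, address) in enumerate(self.free_chunks):
            if chunk_size >= size:
                del self.free_chunks[index]
                return address   # the freed slot is reused as-is
        return None              # nothing suitable; a real heap would grow


heap = ToyFreeList()
heap.free(0xFFFF828B17E50D20, CHUNK_SIZE)
reused = heap.allocate(CHUNK_SIZE)
```

In this model the next allocation of the same size gets the freed address back, which is why an attacker-controlled allocation of 0x2a0 bytes is the most attractive way to refill the slot.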
So, what do we need in order to start exploiting this issue? Ideally we want to find some object in the kernel whose contents we can control and which is allocated at 0x2a0 bytes in size. This would allow us to create a fake PptpCtlCtx object, which we can then use to control the CtlpEngine state machine code. Finding an exact size match allocation isn’t the only way we could groom the heap for a potential exploit but it would certainly be the most reliable method.
If we can take control of a PptpCtlCtx object what can we do? One of the most powerful parts of this vulnerability from an exploit development perspective is the set of callback functions located inside the PptpCtlCtx structure. Usually a mitigation called Control Flow Guard (CFG) or eXtended Flow Guard (XFG) would prevent us from being able to corrupt and use these callback pointers with an arbitrary executable kernel address. However CFG and XFG are not enabled for the raspptp.sys driver (as of writing this blog) meaning we can point execution to any instruction located in the kernel. This gives us plenty of things to abuse for exploitation purposes. A caveat to this is that we are limited in the number of these gadgets we can use in one trigger of the vulnerability, meaning we would likely need to trigger the vulnerability multiple times with different gadgets to achieve a full exploit, or at least that’s the case on a modern Windows kernel.
Exploitability – Threads
Allocating an object to fill our freed slot and take control of kernel execution through a fake PptpCtlCtx object sounds great, but one additional restriction is that CtlpEngine only uses the freed object for a short period of CPU time. We can’t use the thread that is running CtlpEngine to allocate objects to fill the empty slot: any allocation it made would only happen after it had returned from CtlpEngine, at which point the vulnerability would no longer be exploitable.
What this means is that we would need the fake object allocations to be happening in a separate thread in the hope that we can get one of our fake objects allocated and populated with our fake object contents while the vulnerable kernel thread is still in CtlpEngine, allowing us to then start doing bad things with the state machine. All of this sounds like a lot to try and get done in relatively small CPU windows, but it is possible that it could be achieved. The issue with any exploit attempting to do this is going to be reliability, since there is a fairly high chance a failed exploit would crash the target machine and retrying the exploit would be a slow and easily detectable process.
Exploitability – Local Privilege Escalation vs Remote Code Execution
The ability to exploit this issue for LPE is much more likely to be successful over the affected Windows kernel versions than exploiting it for RCE. This is largely due to the fact that an RCE exploit will need to be able to first leak information about the kernel using either this vulnerability or another one before any of the potential callback corruption uses would be viable. There are also far fewer parts of the kernel accessible remotely, meaning finding a way of spraying a fake PptpCtlCtx object into the kernel heap remotely is going to be significantly harder to achieve.
Another reason that LPE is a much more viable exploit route is that the localhost socket or 127.0.0.1 allows for far more data than the ethernet frame capped 1,500 bytes we get remotely, to be processed by each WSK Receive event callback. This significantly increases most of the variables for achieving successful exploitation!
Conclusion
Wormable Kernel Remote Code Execution vulnerabilities are the holy grail of severity in modern operating systems. With great power however comes great responsibility. While this vulnerability could be catastrophic in its impact, the skill to pull off a successful and undetected exploit is not to be underestimated. Memory corruption continues to become a harder and harder art form to master; however, there are definitely those out there with the ability and determination to achieve the full potential of this vulnerability. For these reasons CVE-2022-21972 is a vulnerability that represents a very real threat to internet connected Microsoft based VPN infrastructure. We recommend that this vulnerability is patched with priority in all environments.
Timeline
- Vulnerability Reported To Microsoft – 29 Oct 2021
- Vulnerability Acknowledged – 29 Oct 2021
- Vulnerability Confirmed – 11 November 2021
- Patch Release Date Confirmed – 12 November 2021
- Patch Release – 10 May 2022