The Weak Bug – Exploiting a Heap Overflow in VMware

Share this…

Introduction

In march 2017, I took part in the pwn2own contest with team Chaitin Security Research Lab. The target I was focused on was VMware Workstation Pro and we managed to get a working exploit before the contest. Unfortunately, a version of VMware was released on March 14th, the day before the contest, with a patch for the vulnerability our exploit was taking advantage of. This blog post is a narrative of our journey from finding the vulnerability to exploiting it. I would like thank @kelwin whose assistance was indispensable during the development of the exploit. I would also like to thank the ZDI folks for their recent blog post which motivated us to get off our asses and make this writeup :P.
The post is divided into three parts. First we will briefly describe the VMware RPCI gateway, next we will describe the vulnerability and finally we’ll have a look at how we were able to use this single exploit to defeat ASLR and get code execution.

The VMware RPCI

Unsurprisingly, VMware exposes a number of ways for the guest and host to communicate with each other. One of these ways is through an interface called the Backdoor. The guest is able to send commands through this interface in user mode because of an interesting design. This same interface is used (partly) by VMware Tools in order to communicate with the host. Let’s have a look at some sample code (taken from lib/backdoor/backdoorGcc64.c in open-vm-tools):

void  
Backdoor_InOut(Backdoor_proto *myBp) // IN/OUT  
{
   uint64 dummy;

   __asm__ __volatile__(
#ifdef __APPLE__
        /*
         * Save %rbx on the stack because the Mac OS GCC doesn't want us to
         * clobber it - it erroneously thinks %rbx is the PIC register.
         * (Radar bug 7304232)
         */
        "pushq %%rbx"           "\n\t"
#endif
        "pushq %%rax"           "\n\t"
        "movq 40(%%rax), %%rdi" "\n\t"
        "movq 32(%%rax), %%rsi" "\n\t"
        "movq 24(%%rax), %%rdx" "\n\t"
        "movq 16(%%rax), %%rcx" "\n\t"
        "movq  8(%%rax), %%rbx" "\n\t"
        "movq   (%%rax), %%rax" "\n\t"
        "inl %%dx, %%eax"       "\n\t"  /* NB: There is no inq instruction */
        "xchgq %%rax, (%%rsp)"  "\n\t"
        "movq %%rdi, 40(%%rax)" "\n\t"
        "movq %%rsi, 32(%%rax)" "\n\t"
        "movq %%rdx, 24(%%rax)" "\n\t"
        "movq %%rcx, 16(%%rax)" "\n\t"
        "movq %%rbx,  8(%%rax)" "\n\t"
        "popq          (%%rax)" "\n\t"
#ifdef __APPLE__
        "popq %%rbx"            "\n\t"
#endif
      : "=a" (dummy)
      : "0" (myBp)
      /*
       * vmware can modify the whole VM state without the compiler knowing
       * it. So far it does not modify EFLAGS. --hpreg
       */
      :
#ifndef __APPLE__
      /* %rbx is unchanged at the end of the function on Mac OS. */
      "rbx",
#endif
      "rcx", "rdx", "rsi", "rdi", "memory"
   );
}

Looking at this code, one thing that seems odd is the inl instruction. Under normal circumstances (default I/O privilege level on Linux for instance), a user mode program should not be able to issue I/O instructions. Therefore this instruction should simply just cause the user mode program to fault and crash. This instruction actually generates a privilege error and on the host the hypervisor catches this fault. This ability to communicate with the host from a user land in the guest makes the Backdoor an interesting attack surface since it satisfies the pwn2own requirement: “An attempt in this category must be launched from within the guest operating system from a non-admin account and execute arbitrary code on the host operating system.” .The guest puts the value 0x564D5868 in $eax and the I/O port numbers 0x5658 or 0x5659 are stored in $dx for low bandwidth and high bandwidth data transfers respectively. Other registers are used for passing parameters. For instance the lower half of $ecx is used to store the backdoor command number. In the case of RPCI, the command number is set to BDOOR_CMD_MESSAGE = 30. The file lib/include/backdoor_def.h contains a list of some supported backdoor commands. The host catches the fault, reads the command number and dispatches the corresponding handler. There are a lot of other details I am omitting here so if you are interested in this interface you should read the source code.

RPCI

The Remote Procedure Call Interface is built on top of the aforementioned backdoor and basically allows a guest to issue requests to the host to perform certain operations. For instance, operations like Drag n Drop / Copy Paste as well as number of other random things such as sending or retrieving info on the guest use this interface. The format of RPCI requests is pretty simple: <cmd> <params>. For example the RPCI request "info-get guestinfo.ip" can be used in order to request the IP address assigned to the guest. For each RPCI command, an endpoint is registered and handled in vmware-vmx.

Please note that some RPCI commands can also use the VMCI sockets but that is beyond the scope of this article.

The Vulnerability

After some time reversing the different RPCI handlers, I decided to focus on the DnD and Copy&Paste endpoints. They seemed to be the most complex command handlers and therefore I was hoping it would be the best place to hunt for vulnerabilities. Although I got a chance to understand a lot of the inner workings of DnD/CP, it became apparent however that a lot of the functionality in these handlers is not reachable without user interaction. The core functionality of DnD/CP basically maintains some state machine which has some unsatisfiable states when there is no user interaction (e.g mouse drag from host to guest).
At a loss, I decided to have a look at the vulnerabilities that were reported during Pwnfest 2016 and mentioned in this VMware advisory, my idb had a lot of “symbols” at this point so it was easy to use bindiff to find the patches. The code below shows one of the vulnerable functions before it was patched (which turns out has source code available in services/plugins/dndcp/dnddndCPMsgV4.c; the vulnerability is still in master branch of the git repo of open-vm-tools btw):

static Bool  
DnDCPMsgV4IsPacketValid(const uint8 *packet,  
                        size_t packetSize)
{
   DnDCPMsgHdrV4 *msgHdr = NULL;
   ASSERT(packet);

   if (packetSize < DND_CP_MSG_HEADERSIZE_V4) {
      return FALSE;
   }

   msgHdr = (DnDCPMsgHdrV4 *)packet;

   /* Payload size is not valid. */
   if (msgHdr->payloadSize > DND_CP_PACKET_MAX_PAYLOAD_SIZE_V4) {
      return FALSE;
   }

   /* Binary size is not valid. */
   if (msgHdr->binarySize > DND_CP_MSG_MAX_BINARY_SIZE_V4) {
      return FALSE;
   }

   /* Payload size is more than binary size. */
   if (msgHdr->payloadOffset + msgHdr->payloadSize > msgHdr->binarySize) { // [1]
      return FALSE;
   }

   return TRUE;
}

Bool  
DnDCPMsgV4_UnserializeMultiple(DnDCPMsgV4 *msg,  
                               const uint8 *packet,
                               size_t packetSize)
{
   DnDCPMsgHdrV4 *msgHdr = NULL;
   ASSERT(msg);
   ASSERT(packet);

   if (!DnDCPMsgV4IsPacketValid(packet, packetSize)) {
      return FALSE;
   }

   msgHdr = (DnDCPMsgHdrV4 *)packet;

   /*
    * For each session, there is at most 1 big message. If the received
    * sessionId is different with buffered one, the received packet is for
    * another another new message. Destroy old buffered message.
    */
   if (msg->binary &&
       msg->hdr.sessionId != msgHdr->sessionId) {
      DnDCPMsgV4_Destroy(msg);
   }

   /* Offset should be 0 for new message. */
   if (NULL == msg->binary && msgHdr->payloadOffset != 0) {
      return FALSE;
   }

   /* For existing buffered message, the payload offset should match. */
   if (msg->binary &&
       msg->hdr.sessionId == msgHdr->sessionId &&
       msg->hdr.payloadOffset != msgHdr->payloadOffset) {
      return FALSE;
   }

   if (NULL == msg->binary) {
      memcpy(msg, msgHdr, DND_CP_MSG_HEADERSIZE_V4);
      msg->binary = Util_SafeMalloc(msg->hdr.binarySize);
   }

   /* msg->hdr.payloadOffset is used as received binary size. */
   memcpy(msg->binary + msg->hdr.payloadOffset,
          packet + DND_CP_MSG_HEADERSIZE_V4,
          msgHdr->payloadSize); // [2]
   msg->hdr.payloadOffset += msgHdr->payloadSize;
   return TRUE;
}

This function is called in Version 4 of DnD/CP from the host’s side when the guest sends fragment DnD/CP command packets. The host invokes this function in order to reassemble the chunks of the DnD/CP message sent by the guest.
The first packet received should have payloadOffset == 0 and binarySize specifying the size of a buffer dynamically allocated on the heap. At [1], there is a check to make sure that the payloadOffset and payloadSize do not go out of bounds by comparing it to the binarySize of the packet header. At [2] , the data is copied to the allocated buffer. However, the check at [1] is flawed because it only works for the first received packet. For subsequent packets, the check is invalid since the code expects the binarySize field of the packet header to match that of the first packet in the fragment stream. You might also have noticed that at [1] there is an integer overflow, but this is actually not exploitable since payloadOffset needs to be set to either 0 or should be equal to expected payloadOffset of the buffered message.
Therefore, the vulnerability can be triggered for example by sending the following sequence of fragments:

packet 1{  
 ...
 binarySize = 0x100
 payloadOffset = 0
 payloadSize = 0x50
 sessionId = 0x41414141
 ...
 #...0x50 bytes...#
}

packet 2{  
 ...
 binarySize = 0x1000
 payloadOffset = 0x50
 payloadSize = 0x100
 sessionId = 0x41414141
 ...
 #...0x100 bytes...#
}

Armed with this knowledge, I decided to have a look at Version 3 of DnD/CP to see if anything had been missed in there. Lo and behold, the exact same vulnerability was present in Version 3 of the code:
(this vulnerability was discovered by reversing, but we later noticed that the code for v3 was also present in the git repo of open-vm-tools.)

Bool  
DnD_TransportBufAppendPacket(DnDTransportBuffer *buf,          // IN/OUT  
                             DnDTransportPacketHeader *packet, // IN
                             size_t packetSize)                // IN
{
   ASSERT(buf);
   ASSERT(packetSize == (packet->payloadSize + DND_TRANSPORT_PACKET_HEADER_SIZE) &&
          packetSize <= DND_MAX_TRANSPORT_PACKET_SIZE &&
          (packet->payloadSize + packet->offset) <= packet->totalSize &&
          packet->totalSize <= DNDMSG_MAX_ARGSZ);

   if (packetSize != (packet->payloadSize + DND_TRANSPORT_PACKET_HEADER_SIZE) ||
       packetSize > DND_MAX_TRANSPORT_PACKET_SIZE ||
       (packet->payloadSize + packet->offset) > packet->totalSize || //[1]
       packet->totalSize > DNDMSG_MAX_ARGSZ) {
      goto error;
   }

   /*
    * If seqNum does not match, it means either this is the first packet, or there
    * is a timeout in another side. Reset the buffer in all cases.
    */
   if (buf->seqNum != packet->seqNum) {
      DnD_TransportBufReset(buf);
   }

   if (!buf->buffer) {
      ASSERT(!packet->offset);
      if (packet->offset) {
         goto error;
      }
      buf->buffer = Util_SafeMalloc(packet->totalSize);
      buf->totalSize = packet->totalSize;
      buf->seqNum = packet->seqNum;
      buf->offset = 0;
   }

   if (buf->offset != packet->offset) {
      goto error;
   }

   memcpy(buf->buffer + buf->offset,
          packet->payload,
          packet->payloadSize);
   buf->offset += packet->payloadSize;
   return TRUE;

error:  
   DnD_TransportBufReset(buf);
   return FALSE;
}

This function is called for fragment reassembly of DnD/CP protocol version 3. Here we can see the same situation as before at [1]; trusting that totalSize from the subsequent fragments would match totalSize of the first fragment. Thus this vulnerability can be triggered in a similar fashion to the previous one:

packet 1{  
 ...
 totalSize = 0x100
 payloadOffset = 0
 payloadSize = 0x50
 seqNum = 0x41414141
 ...
 #...0x50 bytes...#
}

packet 2{  
 ...
 totalSize = 0x1000
 payloadOffset = 0x50
 payloadSize = 0x100
 seqNum = 0x41414141
 ...
 #...0x100 bytes...#
}

This brings us to the title of this blog post: “The Weak Bug”. In the context of a contest like pwn2own, I think the bug is weak because not only was it inspired by a previously reported one, it was pretty much exactly the same one. Therefore it really was no surprise when it was patched before the contest (okay, maybe we didn’t expect it to get patched one day before the contest :P). The corresponding VMware advisory can be found here. The latest version of VMware Workstation Pro affected by this bug is version 12.5.3.
We can now have a look at how to abuse the vulnerability and come up with a guest to host escape!

Exploitation

We want to gain code execution through this vulnerability so we need to either find a function pointer to overwrite on the heap or to corrupt the vtable of a C++ object.
First though, let’s have a look at how to set the DnD/CP protocol to version 3. This can be done by sending the following sequence of RPCI commands:

tools.capability.dnd_version 3  
tools.capability.copypaste_version 3  
vmx.capability.dnd_version  
vmx.capability.copypaste_version  

The first two lines respectively set the versions of DnD and Copy/Paste. The latter two lines query the versions. They are required because querying the versions is what actually causes the version to be switched. The RPCI command handler for the vmx.capability.dnd_version checks if the version of the DnD/CP protocol has been modified and if so, it will create a corresponding C++ object for the specified version. For version 3, two C++ objects of size 0xA8 are created; one for DnD commands and one for Copy/Paste commands.

The vulnerability gives us control over the allocation size as well as the overflow size but it also allows us to write out of bounds multiple times. Ideally we can just allocate an object of size 0xA8 and make it land before the C++ object then overwrite the vtable pointer with a pointer to controlled data to get code execution.
It is not as simple as that however, since there are a few things we need to address first. Mainly we need to find a way to defeat ASLR which in our case implies also dealing with the Windows Low Fragmented Heap.

Defeating ASLR

We need to find an object we can overflow into and somehow influence it to get us in info leak; like an object we can read back from the guest with a length field or a data pointer we can easily corrupt. We were unable to find such an object so we decided to reverse the other RPCI command handlers a bit more and see what we could come up with. Of particular interest were commands that had counter parts, in other words, you can use one command to set some data and then use another related command to retrieve the data back. The winner was the info-set and info-get command pair:

info-set guestinfo.KEY VALUE  
info-get guestinfo.KEY  

VALUE is a string and its string length controls the allocation size of a buffer on the heap. Moreover we can allocate as many strings as we want in this way. But how can we use these strings to leak data ? Simply by overwriting past the null byte and “lining” up the string with the adjacent chunk. If we can allocate a string (or strings) between the overflowing chunk and a DnD or CP object, then we can leak the vtable address of the object and hence the base address of vmware-vmx. Since we can allocate many strings, we can increase our chances of obtaining this heap layout despite the randomization of the LFH. However there is still an aspect of the allocations we do not control and that is whether a DnD or CP object is allocated afterour overflowing heap chunk. From our tests, we were able to get a probability of success between 60% and 80% by playing with different parameters of our exploit such as allocating and free’ing different amounts of strings.

In summary, we have the following (Ov is the overflowing chunk, S is a string and T is the target object):

Simple OverflowThe plan is basically to allocate a number of strings filled with A‘s for example then we overflow the adjacent chunk with some B‘s, read back the value of all the allocated strings, the one that contains B‘s is the one we have corrupted. At this point we have a string we can use to read the leak with, so we can keep overflowing with a granularity matching the size of the objects in the bucket (0xA8) and reading back the string every time to check if there is some leaked data in the string. We can know that we have reached the target object because we know the offsets (from the vmware-vmx base) of the vtables of the DnD and CopyPaste objects. Therefore after each overflow, we can look at the last bits of the retrieved data to see if they match that of the vtable offsets.

Getting Code Execution

Now that we have obtained the info leak and know what type of C++ object we are about to overflow we can proceed with the rest of the exploitation. There are two cases we need to handle, CopyPaste and DnD. Please note that this is probably just one line of exploitation out of many others.

The CopyPaste case

In the case of the CopyPaste object, we can just overwrite the vtable and make it point to some data we control. We need a pointer to controlled data which will be interpreted as the vtable address of the object. The way we decided to do this is by using another RPCI command: unity.window.contents.start. This command is used for the Unity mode to draw some images on the host and allows us to have some values that we control at a know offset from the base address of vmware-vmx. To of the arguments taken by the command are width and height of the image, each of them a 32-bit word. By combining the two, we can have a 64-bit value at a known address. We line it up with the vtable entry of the CopyPaste object that we can trigger by just sending a CopyPaste command. In summary we do the following:

  • Send a unity.window.contents.start to write a 64-bit address of a stack pivot gadget at a know address with the heightand width parameters.
  • Overwrite the vtable address with a pointer to the 64-bit address (adjusted with the offset of the vtable entry that will be called).
  • Trigger the use of the vtable by sending a CopyPaste command.
  • ROP.
The DnD case

In the case of the DnD object, we can’t just overwrite the vtable because right after the overflow the vtable is accessed to call another method so we need to do it another way. This is because we only know the address of 1 qword that we control through the unity image’s width and height, so we can’t forge a vtable of the size we want.
Let’s have a look at the structure of the DnD and CP objects which can be summarized as follows (again, some similar structures can be found in open-vm-tools but they have slightly different formats in vmware-vmx):

DnD_CopyPaste_RpcV3{  
    void * vtable;
    ...
    uint64_t ifacetype;
    RpcUtil{
        void * vtable;
        RpcBase * mRpc;
        DnDTransportBuffer{
            uint64_t seqNum;
            uint8_t * buffer;
            uint64_t totalSize;
            uint64_t offset;
            ...
        }
        ...
    }
}

RpcBase{  
    void * vtable;
    ...
}

A lot of fields have been omitted since they are irrelevant for the purpose of this blog post.
There is a pointer to an RpcBase object which is also a C++ object. Therefore if we can overwrite the mRpc field with a pointer-to-pointer to data we control, we can have a vtable of our liking for the RpcBase object. For this pointer we can also use the unity.window.contents.start command. Another parameter the command takes on top of width and height is imgsize, which controls the size of the image buffer. This buffer is allocated and its address can also be found at a static offset from the vmware-vmx base. We can populate the contents of the buffer by using the unity.window.contents.chunk command. In summary we do the following:

  • Send a unity.window.contents.start command to allocate a buffer where we will store a fake vtable.
  • Send a unity.window.contents.chunk command to populate the fake vtable with some stack pivot gadget.
  • Overwrite the mRpc field of the DnD object with an address pointing to the address of the allocated buffer.
  • Trigger the use of the vtable of the mRpc field by sending a DnD command.
  • ROP.

P.S: There is a RWX page in vmware-vmx (at least in version 12.5.3).

Notes on Reliability

As mentioned earlier, the exploit is not 100% reliable due to the Windows LFH. Some things can be attempted in order to increase the reliability. Here is a short list:

  • Monitor allocations of size 0xA8 to see if we can take advantage of the determinism of the LFH after a number of malloc’s() and free’s() as described here and here.
  • Find some other C++ objects to overwrite, preferably some that we can spray.
  • Find some other objects on the heap with function pointers, preferably some that we can spray.
  • Find a seperate info leak bug that we can use as an oracle.
  • Be more creative.

Useless Video

Here is a video of the exploit in “action”.

(Yes, it’s VMware inside VMware.)

Conclusion

“No pwn no fun” and make sure that if you want to take part in some contest like pwn2own you either have multiple bugs or you find some inspired vulnerabilities.

Source:https://acez.re/the-weak-bug-exploiting-a-heap-overflow-in-vmware/