Return Flow Guard
[DannyWei, lywang, FlowerCode] of Tencent Xuanwu Lab
This is preliminary documentation of the RFG implementation. We will update it as we make new findings and corrections.
We analyzed the Return Flow Guard introduced in Windows 10 Redstone 2 build 14942, released on October 7, 2016.
1 PROTECTION METHODS
Microsoft introduced Control Flow Guard in Windows 8.1 to protect against malicious modification of indirect call function pointers. CFG checks the target function pointer before each indirect call. However, CFG cannot detect modification of the return address on the stack, and therefore cannot stop Return Oriented Programming.
The newly added RFG effectively stops these kinds of attacks by saving the return address to fs:[rsp] at the entry of each function and comparing it with the return address on the stack before returning.
Enabling RFG requires both compiler and operating system support. During compilation, the compiler instruments the file by reserving space for the RFG instructions in the form of nop instructions.
When the target executable runs on a supported operating system, the reserved space is dynamically replaced with RFG instructions that check function return addresses. On other systems, the nop instructions do not interfere with the normal execution flow of the program.
The difference between RFG and GS (Buffer Security Check) is that the GS stack cookie can be obtained through an information leak or by brute forcing, whereas the RFG return address is written to the Thread Control Stack, out of reach of attackers. This significantly increases the difficulty of an attack.
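The save-and-compare idea described in this section can be modelled in a few lines of C. This is only a toy sketch: the toy_thread type, the slot-indexed arrays, and both function names are hypothetical and stand in for the real stack, the Thread Control Stack, and the instructions the kernel patches in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 64

/* Hypothetical toy model: one shadow slot per stack slot, same index. */
typedef struct {
    uint64_t stack[TOY_SLOTS];   /* ordinary stack, attacker-writable      */
    uint64_t shadow[TOY_SLOTS];  /* control stack, assumed out of reach    */
} toy_thread;

/* Function entry: copy the return address at slot `sp` to the shadow copy. */
void toy_rfg_prologue(toy_thread *t, size_t sp)
{
    assert(t != NULL && sp < TOY_SLOTS);
    t->shadow[sp] = t->stack[sp];
}

/* Before returning: 1 if the return address is unchanged, 0 on mismatch. */
int toy_rfg_epilogue_ok(const toy_thread *t, size_t sp)
{
    assert(t != NULL && sp < TOY_SLOTS);
    return t->stack[sp] == t->shadow[sp];
}
```

In this model an overwrite of stack[sp] between the two calls makes toy_rfg_epilogue_ok return 0, which corresponds to the point where the real epilogue would transfer control to the failure routine.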
2 CONTROL SWITCHES
2.1 MMENABLERFG GLOBAL VARIABLE
This variable is controlled by a registry value located at:
1 | \Registry\Machine\SYSTEM\CurrentControlSet\Control\Session Manager\kernel |
2.1.1 Initialization
1 | KiSystemStartup -> KiInitializeKernel -> InitBootProcessor -> CmGetSystemControlValues |
2.2 IMAGE FILE CONTROL FLAG
Control flags are stored in the IMAGE_LOAD_CONFIG_DIRECTORY64 structure in the PE file.
Flags in the GuardFlags field indicate RFG support status.
1 | #define IMAGE_GUARD_RF_INSTRUMENTED 0x00020000 // Module contains return flow instrumentation and metadata |
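As a minimal sketch of how a tool might test this flag, the helpers below read a little-endian 32-bit value from a raw buffer and check the IMAGE_GUARD_RF_INSTRUMENTED bit. The function names are hypothetical, and the caller is assumed to already know where the GuardFlags field sits in its buffer; no offset is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_GUARD_RF_INSTRUMENTED 0x00020000u

/* Read a little-endian 32-bit value at `offset` (caller supplies the offset). */
uint32_t read_le32(const uint8_t *buf, size_t size, size_t offset)
{
    assert(buf != NULL && offset <= size && size - offset >= 4);
    return  (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}

/* 1 if the GuardFlags value marks the module as RFG instrumented, else 0. */
int is_rfg_instrumented(uint32_t guard_flags)
{
    return (guard_flags & IMAGE_GUARD_RF_INSTRUMENTED) != 0;
}
```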
2.3 PROCESS CONTROL FLAG
2.3.1 Querying
The RFG status of a process can be queried through the Win32 API GetProcessMitigationPolicy.
1 | typedef enum _PROCESS_MITIGATION_POLICY { |
2.3.2 Structure Definition
1 | typedef struct _PROCESS_MITIGATION_RETURN_FLOW_GUARD_POLICY { |
3 NEW MEMBERS IN PORTABLE EXECUTABLE FORMAT
3.1 IMAGE_LOAD_CONFIG_DIRECTORY64
RFG-instrumented portable executables add several new fields to this structure, 24 bytes in total.
1 | ULONGLONG GuardRFFailureRoutine; |
Two pointers (16 bytes):
The virtual address of the _guard_ss_verify_failure function
The virtual address of the _guard_ss_verify_failure_fptr function pointer, which points to the _guard_ss_verify_failure_default function by default
Information about the address table (6 bytes):
DynamicValueRelocTableOffset, which records the offset of the dynamic relocation table relative to the relocation table
DynamicValueRelocTableSection, which records the section index of the dynamic value relocation table
The remaining bytes are reserved.
3.2 IMAGE_DYNAMIC_RELOCATION_TABLE
RFG-instrumented portable executables carry a new dynamic relocation table, placed after the normal relocation table.
1 | typedef struct _IMAGE_DYNAMIC_RELOCATION_TABLE { |
The absolute address of an entry can be calculated from ImageBase + VirtualAddress + TypeOffset.
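The address arithmetic above can be written as a small helper. This is a sketch only: the function and parameter names are hypothetical, and TypeOffset is added exactly as the formula states, without assuming anything about how its bits are subdivided.

```c
#include <assert.h>
#include <stdint.h>

/* ImageBase + VirtualAddress + TypeOffset, as given in the formula above. */
uint64_t dynamic_reloc_target(uint64_t image_base,
                              uint32_t virtual_address,
                              uint16_t type_offset)
{
    /* Guard against wrap-around on malformed input. */
    assert(image_base <= UINT64_MAX - virtual_address - type_offset);
    return image_base + virtual_address + type_offset;
}
```

For illustration, an image base of 0x140000000, a VirtualAddress of 0x1000 and a TypeOffset of 0x76C give 0x14000176C.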
4 INSTRUCTION INSERTION
4.1 COMPILE TIME
4.1.1 Inserted Prologue Bytes (9 Bytes)
1 | MiRfgNopPrologueBytes |
4.1.2 Inserted Epilogue Bytes (Example, 15 Bytes)
1 | retn |
To reduce overhead, the compiler also inserts a _guard_ss_common_verify_stub function. Instead of inserting nop bytes at the end of every function, the compiler simply ends most functions with a jmp to this stub. The stub contains the nop bytes that the kernel replaces with epilogue bytes at runtime, followed by a retn instruction.
1 | __guard_ss_common_verify_stub proc near |
4.2 RUNTIME
MiPerformRfgFixups performs the instruction replacement when a new executable section is being created, using the function information stored in IMAGE_DYNAMIC_RELOCATION_TABLE.
4.2.1 Replaced Prologue Bytes (9 Bytes)
The kernel uses MiRfgInstrumentedPrologueBytes to replace the compiler-inserted prologue bytes.
1 | MiRfgInstrumentedPrologueBytes |
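The replacement step can be sketched in user-mode C as a compare-then-overwrite of nine bytes. Everything here is an assumption made for illustration: patch_prologue and both array names are hypothetical, the nop array is a generic 9-byte nop encoding rather than the compiler's actual bytes, and the replacement array is simply the x64 encoding of mov rax, [rsp] followed by mov fs:[rsp], rax, which is assumed (not confirmed) to match the kernel's array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROLOGUE_LEN 9

/* Placeholder: a generic 9-byte nop, NOT necessarily the compiler's bytes. */
static const uint8_t placeholder_nop[PROLOGUE_LEN] =
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00};

/* Assumed replacement: mov rax, [rsp] ; mov fs:[rsp], rax (4 + 5 bytes). */
static const uint8_t assumed_instrumented[PROLOGUE_LEN] =
    {0x48, 0x8B, 0x04, 0x24, 0x64, 0x48, 0x89, 0x04, 0x24};

/* Patch one prologue at `func_offset`; returns 1 if patched, 0 if skipped. */
int patch_prologue(uint8_t *code, size_t code_size, size_t func_offset)
{
    assert(code != NULL);
    if (func_offset > code_size || code_size - func_offset < PROLOGUE_LEN)
        return 0;                               /* not enough room         */
    if (memcmp(code + func_offset, placeholder_nop, PROLOGUE_LEN) != 0)
        return 0;                               /* not the expected padding */
    memcpy(code + func_offset, assumed_instrumented, PROLOGUE_LEN);
    return 1;
}
```

Checking the existing bytes before writing keeps the sketch idempotent: a second call on the same offset finds no nop padding and leaves the code untouched.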
4.2.2 Replaced Epilogue Bytes (15 Bytes)
The kernel uses MiRfgInstrumentedEpilogueBytes, together with the recorded address of the _guard_ss_verify_failure function (see section 3.1), to replace the compiler-inserted epilogue bytes.
1 | MiRfgInstrumentedEpilogueBytes |
5 THREAD CONTROL STACK
To implement RFG, Microsoft introduced the Thread Control Stack and reused the fs segment register on the x64 architecture. When an RFG-enabled process executes the mov fs:[rsp], rax instruction, the fs segment register points to the current thread’s ControlStackLimit on the control stack, and rax is written at offset rsp from that base.
All user mode threads in a process use different memory blocks within the same Thread Control Stack. We can enumerate the virtual address descriptor tree of the process to obtain the _MMVAD structure that describes the process’s Thread Control Stack.
1 | typedef struct _MMVAD { |
We can walk the VAD tree starting from _EPROCESS.VadRoot. If the _MMVAD.Core.VadFlags.RfgControlStack flag is set, that _MMVAD describes the virtual address range of the thread control stack (StartingVpn, EndingVpn, StartingVpnHigh, EndingVpnHigh in _MMVAD.Core). The flags are defined as follows:
1 | typedef struct _MMVAD_FLAGS { |
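To make the VPN fields concrete, here is a rough sketch of turning them into a byte range and testing an address against it. The toy_vad struct is hypothetical and much simpler than the real _MMVAD; the sketch assumes 4 KB pages and assumes the High fields carry the bits above the low 32 bits of the page number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12  /* assumption: 4 KB pages */

/* Hypothetical, simplified stand-in for the fields named in the text. */
typedef struct {
    uint32_t starting_vpn;
    uint32_t ending_vpn;
    uint8_t  starting_vpn_high;
    uint8_t  ending_vpn_high;
    int      rfg_control_stack;   /* models VadFlags.RfgControlStack */
} toy_vad;

/* Combine the low and high VPN parts and convert pages to bytes. */
uint64_t vpn_to_address(uint32_t vpn_low, uint8_t vpn_high)
{
    return (((uint64_t)vpn_high << 32) | vpn_low) << TOY_PAGE_SHIFT;
}

/* 1 if `addr` lies inside a VAD flagged as the control stack, else 0. */
int address_in_control_stack(const toy_vad *vad, uint64_t addr)
{
    assert(vad != NULL);
    if (!vad->rfg_control_stack)
        return 0;
    uint64_t start = vpn_to_address(vad->starting_vpn, vad->starting_vpn_high);
    /* The ending VPN is inclusive, so the range runs to the page's last byte. */
    uint64_t end = vpn_to_address(vad->ending_vpn, vad->ending_vpn_high)
                 + ((1ULL << TOY_PAGE_SHIFT) - 1);
    return addr >= start && addr <= end;
}
```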
When an RFG-protected thread is created, nt!MmSwapThreadControlStack sets the thread’s ETHREAD.UserFsBase. It calls MiLocateVadEvent to find the MMVAD from which UserFsBase is derived.
It then uses the following formula to calculate ETHREAD.UserFsBase:
1 | ControlStackBase = MMVAD.Core.EventList.RfgProtectedStack.ControlStackBase |
Each thread has its own shadow stack range in the Thread Control Stack. If the current thread uses the range ControlStackBase ~ ControlStackLimit, then ControlStackLimit = KTHREAD.StackLimit + ControlStackLimitDelta, so the value actually stored in UserFsBase is the offset of ControlStackLimit from StackLimit. Even when multiple threads access the shadow stack simultaneously, each access lands at that thread's ETHREAD.UserFsBase + rsp.
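The two relations in the paragraph above can be expressed directly. This is a sketch with hypothetical function names; it treats UserFsBase as a plain 64-bit offset, as the text describes, and adds a check that rsp is 8-byte aligned, which holds for a return-address slot on x64.

```c
#include <assert.h>
#include <stdint.h>

/* ControlStackLimit = StackLimit + ControlStackLimitDelta */
uint64_t control_stack_limit(uint64_t stack_limit, uint64_t limit_delta)
{
    return stack_limit + limit_delta;
}

/* Address touched by an fs:[rsp] access: UserFsBase + rsp */
uint64_t shadow_slot_address(uint64_t user_fs_base, uint64_t rsp)
{
    assert((rsp & 7) == 0);   /* return-address slots are 8-byte aligned */
    return user_fs_base + rsp;
}
```

Because every thread has a distinct stack, distinct rsp values map to distinct shadow slots even when the threads share one Thread Control Stack region.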
6 RFG IN ACTION
We wrote a simple YARA signature to identify RFG-instrumented PE files.
1 | rule rfg { |
Usage:
1 | yara64.exe -r -f rfg.yara %SystemRoot% |
The output shows that most system executables are already RFG instrumented in this version of Windows.
Here we use IDA Pro and WinDbg to examine an RFG-instrumented calc.exe.
1 | .text:000000014000176C wWinMain |
The entry point before runtime replacement
1 | 0:000> u calc!wWinMain |
The entry point after runtime replacement
1 | .text:00000001400025BC __guard_ss_common_verify_stub |
The common verify stub function before runtime replacement
1 | 0:000> u calc!_guard_ss_common_verify_stub |
The common verify stub function after runtime replacement
7 REFERENCES
Exploring Control Flow Guard in Windows 10, Jack Tang, Trend Micro Threat Solution Team
http://sjc1-te-ftp.trendmicro.com/assets/wp/exploring-control-flow-guard-in-windows10.pdf