[DannyWei, lywang, FlowerCode] of Tencent Xuanwu Lab

This is preliminary documentation of the RFG implementation. We will update it as we make new findings and corrections.

We analyzed Return Flow Guard (RFG) as introduced in Windows 10 Redstone 2 build 14942, released on October 7, 2016.

1 PROTECTION METHODS

Microsoft introduced Control Flow Guard (CFG) in Windows 8.1 to protect against malicious modification of the function pointers used by indirect calls. CFG validates the target function pointer before each indirect call. However, CFG cannot detect modification of return addresses on the stack, and therefore cannot stop Return Oriented Programming (ROP).
The newly added RFG effectively stops this kind of attack by saving the return address to fs:[rsp] at the entry of each function and comparing it with the return address on the stack before returning.
Enabling RFG requires both compiler and operating system support. During compilation, the compiler instruments the file by reserving space for the RFG instructions in the form of nop instructions.
When the target executable runs on a supported operating system, the reserved space is dynamically replaced with RFG instructions that check function return addresses. On an unsupported operating system, the nop instructions do not interfere with the normal execution flow of the program.
RFG differs from GS (Buffer Security Check) in that a GS stack cookie can be obtained through an information leak or by brute force, whereas the RFG copy of the return address is written to the Thread Control Stack, out of reach of attackers. This significantly increases the difficulty of an attack.
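
The save-and-compare idea described above can be modelled in a few lines of C. This is only a toy sketch under our own assumptions: the two arrays stand in for the thread stack and the Thread Control Stack, the slot index plays the role of rsp, and every name in it (toy_thread, rfg_prologue, rfg_epilogue_ok) is hypothetical rather than taken from Windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 16

/* Toy model: `stack` stands in for the thread stack and `shadow` for the
 * Thread Control Stack. Both are indexed by the same slot number, which
 * plays the role of rsp. All names here are hypothetical. */
typedef struct {
    uint64_t stack[SLOTS];
    uint64_t shadow[SLOTS];
} toy_thread;

/* Models the instrumented prologue: copy the return address found at
 * [rsp] to the same offset in the shadow region (fs:[rsp]). */
static void rfg_prologue(toy_thread *t, size_t rsp)
{
    assert(rsp < SLOTS);
    t->shadow[rsp] = t->stack[rsp];
}

/* Models the instrumented epilogue: the return is allowed only if the
 * two copies of the return address still agree. */
static bool rfg_epilogue_ok(const toy_thread *t, size_t rsp)
{
    assert(rsp < SLOTS);
    return t->shadow[rsp] == t->stack[rsp];
}
```

In this model, overwriting stack[rsp] after the prologue has run makes rfg_epilogue_ok return false, which corresponds to the jump to the failure routine in the real epilogue.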

2 CONTROL SWITCHES

2.1 MMENABLERFG GLOBAL VARIABLE

This variable is controlled by a registry value located at:

\Registry\Machine\SYSTEM\CurrentControlSet\Control\Session Manager\kernel
EnableRfg : REG_DWORD

2.1.1 Initialization

KiSystemStartup -> KiInitializeKernel -> InitBootProcessor -> CmGetSystemControlValues

2.2 IMAGE FILE CONTROL FLAG

Control flags are stored in the IMAGE_LOAD_CONFIG_DIRECTORY64 structure in the PE file.
Flags in the GuardFlags field indicate the RFG support status.

#define IMAGE_GUARD_RF_INSTRUMENTED 0x00020000 // Module contains return flow instrumentation and metadata
#define IMAGE_GUARD_RF_ENABLE       0x00040000 // Module requests that the OS enable return flow protection
#define IMAGE_GUARD_RF_STRICT       0x00080000 // Module requests that the OS enable return flow protection in strict mode
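
As an illustration, the three flags can be tested independently once the GuardFlags value has been read from the load configuration directory. The sketch below assumes the caller has already obtained that value. The rfg_image_flags type and the decode_rfg_guard_flags function are hypothetical helpers of ours, not part of any Windows header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IMAGE_GUARD_RF_INSTRUMENTED 0x00020000u
#define IMAGE_GUARD_RF_ENABLE       0x00040000u
#define IMAGE_GUARD_RF_STRICT       0x00080000u

/* The three RFG flags occupy three adjacent, distinct bits. */
static_assert((IMAGE_GUARD_RF_INSTRUMENTED | IMAGE_GUARD_RF_ENABLE |
               IMAGE_GUARD_RF_STRICT) == 0x000E0000u,
              "RFG flags must not overlap");

/* Hypothetical helper type, not part of any Windows header. */
typedef struct {
    bool instrumented;
    bool enable;
    bool strict;
} rfg_image_flags;

/* Splits the RFG-related bits out of a GuardFlags value. */
static rfg_image_flags decode_rfg_guard_flags(uint32_t guard_flags)
{
    rfg_image_flags f;
    f.instrumented = (guard_flags & IMAGE_GUARD_RF_INSTRUMENTED) != 0;
    f.enable       = (guard_flags & IMAGE_GUARD_RF_ENABLE) != 0;
    f.strict       = (guard_flags & IMAGE_GUARD_RF_STRICT) != 0;
    return f;
}
```

For example, a GuardFlags value of 0x00060000 decodes to an instrumented module that requests RFG without strict mode.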

2.3 PROCESS CONTROL FLAG

2.3.1 Querying

The RFG status of a process can be queried through the Win32 API GetProcessMitigationPolicy.

typedef enum _PROCESS_MITIGATION_POLICY {
// ...
ProcessReturnFlowGuardPolicy = 11
// ...
} PROCESS_MITIGATION_POLICY, *PPROCESS_MITIGATION_POLICY;

2.3.2 Structure Definition

typedef struct _PROCESS_MITIGATION_RETURN_FLOW_GUARD_POLICY {
union {
DWORD Flags;
struct {
DWORD EnableReturnFlowGuard : 1;
DWORD StrictMode : 1;
DWORD ReservedFlags : 30;
} DUMMYSTRUCTNAME;
} DUMMYUNIONNAME;
} PROCESS_MITIGATION_RETURN_FLOW_GUARD_POLICY, *PPROCESS_MITIGATION_RETURN_FLOW_GUARD_POLICY;

3 NEW MEMBERS IN PORTABLE EXECUTABLE FORMAT

3.1 IMAGE_LOAD_CONFIG_DIRECTORY64

The IMAGE_LOAD_CONFIG_DIRECTORY64 structure of RFG-instrumented portable executables gains several new fields, 24 bytes in total.

ULONGLONG GuardRFFailureRoutine;
ULONGLONG GuardRFFailureRoutineFunctionPointer;
DWORD DynamicValueRelocTableOffset;
WORD DynamicValueRelocTableSection;

Two pointers (16 bytes):
GuardRFFailureRoutine, the virtual address of the _guard_ss_verify_failure function
GuardRFFailureRoutineFunctionPointer, the virtual address of the _guard_ss_verify_failure_fptr function pointer, which points to the _guard_ss_verify_failure_default function by default

Information about the address table (6 bytes):
DynamicValueRelocTableOffset, which records the offset of the dynamic relocation table relative to the relocation table
DynamicValueRelocTableSection, which records the section index of the dynamic value relocation table

The remaining bytes are reserved.

3.2 IMAGE_DYNAMIC_RELOCATION_TABLE

RFG instrumented portable executables have a new dynamic relocation table after the normal relocation table.

typedef struct _IMAGE_DYNAMIC_RELOCATION_TABLE {
DWORD Version;
DWORD Size;
// IMAGE_DYNAMIC_RELOCATION DynamicRelocations[0];
} IMAGE_DYNAMIC_RELOCATION_TABLE, *PIMAGE_DYNAMIC_RELOCATION_TABLE;

typedef struct _IMAGE_DYNAMIC_RELOCATION {
PVOID Symbol;
DWORD BaseRelocSize;
// IMAGE_BASE_RELOCATION BaseRelocations[0];
} IMAGE_DYNAMIC_RELOCATION, *PIMAGE_DYNAMIC_RELOCATION;

typedef struct _IMAGE_BASE_RELOCATION {
DWORD VirtualAddress;
DWORD SizeOfBlock;
// WORD TypeOffset[1];
} IMAGE_BASE_RELOCATION;

The Symbol field in IMAGE_DYNAMIC_RELOCATION indicates whether the stored entries are for function prologues or function epilogues. Its values are defined as follows:

#define IMAGE_DYNAMIC_RELOCATION_GUARD_RF_PROLOGUE 0x00000001
#define IMAGE_DYNAMIC_RELOCATION_GUARD_RF_EPILOGUE 0x00000002

The absolute address of an entry can be calculated from ImageBase + VirtualAddress + TypeOffset.
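
The address calculation above can be sketched as follows. We assume here that the blocks follow the conventions of ordinary PE base relocations: SizeOfBlock includes the 8-byte block header, and only the low 12 bits of each TypeOffset entry hold the offset within the page, with the high 4 bits carrying a type code. We did not verify these conventions for the dynamic table, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sizeof(VirtualAddress) + sizeof(SizeOfBlock) in IMAGE_BASE_RELOCATION */
#define RELOC_HEADER_SIZE 8u

/* Number of TypeOffset entries in a block, assuming SizeOfBlock counts
 * the 8-byte header as it does for ordinary base relocations. */
static size_t rfg_block_entry_count(uint32_t size_of_block)
{
    assert(size_of_block >= RELOC_HEADER_SIZE);
    return (size_of_block - RELOC_HEADER_SIZE) / sizeof(uint16_t);
}

/* ImageBase + VirtualAddress + TypeOffset, using only the low 12 bits of
 * the entry as the offset (assumption: the high 4 bits are a type code). */
static uint64_t rfg_entry_address(uint64_t image_base,
                                  uint32_t virtual_address,
                                  uint16_t type_offset)
{
    uint64_t offset = type_offset & 0x0FFFu;
    return image_base + virtual_address + offset;
}
```

With the calc.exe listing in section 6, an image base of 0x140000000, a block VirtualAddress of 0x1000 and an offset of 0x76C would yield 0x14000176C, the address of wWinMain. The split into 0x1000 and 0x76C is our own illustration, not a value read from the file.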

4 INSTRUCTION INSERTION

4.1 COMPILE TIME

4.1.1 Inserted Prologue Bytes (9 Bytes)

MiRfgNopPrologueBytes
xchg ax, ax
nop dword ptr [rax+00000000h]

4.1.2 Inserted Epilogue Bytes (Example, 15 Bytes)

retn
db 0Eh dup(90h)
retn

To reduce overhead, the compiler also inserts a _guard_ss_common_verify_stub function. Instead of inserting nop bytes at the end of every function, the compiler simply ends most functions with a jmp to this stub. The stub contains nop bytes that the kernel replaces with the epilogue bytes at runtime, followed by a retn instruction.

__guard_ss_common_verify_stub proc near
retn
db 0Eh dup(90h)
retn
__guard_ss_common_verify_stub endp

4.2 RUNTIME

MiPerformRfgFixups performs the instruction replacement when a new executable section is created, using the function information stored in IMAGE_DYNAMIC_RELOCATION_TABLE.

4.2.1 Replaced Prologue Bytes (9 Bytes)

The kernel uses MiRfgInstrumentedPrologueBytes to replace compiler inserted prologue bytes.

MiRfgInstrumentedPrologueBytes
mov rax, [rsp]
mov fs:[rsp], rax
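
The prologue replacement can be simulated on an ordinary byte buffer. The opcode bytes below are taken from the YARA signature and the WinDbg listing in section 6. Everything else is an assumption for illustration: patch_prologue is a hypothetical name, and we do not know whether the kernel verifies the nop bytes before overwriting them as this sketch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RFG_PROLOGUE_SIZE 9

/* xchg ax, ax ; nop dword ptr [rax+00000000h] */
static const uint8_t nop_prologue[RFG_PROLOGUE_SIZE] = {
    0x66, 0x90, 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00
};

/* mov rax, [rsp] ; mov fs:[rsp], rax */
static const uint8_t rfg_prologue_bytes[RFG_PROLOGUE_SIZE] = {
    0x48, 0x8B, 0x04, 0x24, 0x64, 0x48, 0x89, 0x04, 0x24
};

/* Replaces the 9 reserved nop bytes at `offset` with the instrumented
 * prologue. Returns false if the range is out of bounds or does not
 * contain the expected nop bytes (this check is our assumption). */
static bool patch_prologue(uint8_t *code, size_t len, size_t offset)
{
    assert(code != NULL);
    if (offset > len || len - offset < RFG_PROLOGUE_SIZE)
        return false;
    if (memcmp(code + offset, nop_prologue, RFG_PROLOGUE_SIZE) != 0)
        return false;
    memcpy(code + offset, rfg_prologue_bytes, RFG_PROLOGUE_SIZE);
    return true;
}
```

Because both sequences are exactly 9 bytes long, the replacement does not move any following instruction, which is why the compiler can reserve the space ahead of time.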

4.2.2 Replaced Epilogue Bytes (15 Bytes)

The kernel uses MiRfgInstrumentedEpilogueBytes and the recorded address of the _guard_ss_verify_failure function to replace the compiler-inserted epilogue bytes.

MiRfgInstrumentedEpilogueBytes
mov r11, fs:[rsp]
cmp r11, [rsp]
jnz
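
The jnz operand is left open in the template because it depends on where the epilogue and the failure routine are located. A minimal sketch of how the 15 bytes could be assembled is shown below, assuming the near form of jnz (0F 85 rel32), whose displacement is relative to the end of the instruction. The function name build_rfg_epilogue is hypothetical, and we have not confirmed that the kernel computes the displacement this way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RFG_EPILOGUE_SIZE 15

/* Writes the 15 instrumented epilogue bytes for an epilogue located at
 * epilogue_va that must jump to failure_va when the check fails. */
static void build_rfg_epilogue(uint8_t out[RFG_EPILOGUE_SIZE],
                               uint64_t epilogue_va, uint64_t failure_va)
{
    static const uint8_t head[11] = {
        0x64, 0x4C, 0x8B, 0x1C, 0x24,   /* mov r11, fs:[rsp] */
        0x4C, 0x3B, 0x1C, 0x24,         /* cmp r11, [rsp]    */
        0x0F, 0x85                      /* jnz rel32         */
    };
    /* rel32 is measured from the byte after the jnz instruction, which
     * is also the end of the 15-byte epilogue. */
    int64_t rel = (int64_t)(failure_va - (epilogue_va + RFG_EPILOGUE_SIZE));
    assert(rel >= INT32_MIN && rel <= INT32_MAX);

    memcpy(out, head, sizeof head);
    uint32_t rel32 = (uint32_t)(int32_t)rel;
    for (int i = 0; i < 4; i++)          /* little-endian displacement */
        out[sizeof head + i] = (uint8_t)(rel32 >> (8 * i));
}
```

Feeding in the addresses from the WinDbg listing in section 6 (stub at 00007ff7`91ca25bc, failure routine at 00007ff7`91ca26c0) gives a displacement of 0xF5, which matches the bytes 0f85f5000000 shown there.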

5 THREAD CONTROL STACK

To implement RFG, Microsoft introduced the Thread Control Stack and repurposed the fs segment register on the x64 architecture. When an RFG-enabled process executes the mov fs:[rsp], rax instruction, the fs segment register points to the current thread’s ControlStackLimit on the control stack, and rax is written at offset rsp from that base.
All user mode threads in a process use different memory blocks within the same Thread Control Stack. We can enumerate the virtual address descriptor (VAD) tree of the process to obtain the _MMVAD structure that describes the process’s Thread Control Stack.

typedef struct _MMVAD {
/* 0x0000 */ struct _MMVAD_SHORT Core;
union {
union {
/* 0x0040 */ unsigned long LongFlags2;
/* 0x0040 */ struct _MMVAD_FLAGS2 VadFlags2;
}; /* size: 0x0004 */
} /* size: 0x0004 */ u2;
/* 0x0044 */ long Padding_;
/* 0x0048 */ struct _SUBSECTION* Subsection;
/* 0x0050 */ struct _MMPTE* FirstPrototypePte;
/* 0x0058 */ struct _MMPTE* LastContiguousPte;
/* 0x0060 */ struct _LIST_ENTRY ViewLinks;
/* 0x0070 */ struct _EPROCESS* VadsProcess;
union {
union {
/* 0x0078 */ struct _MI_VAD_SEQUENTIAL_INFO SequentialVa;
/* 0x0078 */ struct _MMEXTEND_INFO* ExtendedInfo;
}; /* size: 0x0008 */
} /* size: 0x0008 */ u4;
/* 0x0080 */ struct _FILE_OBJECT* FileObject;
} MMVAD, *PMMVAD; /* size: 0x0088 */

typedef struct _MMVAD_SHORT {
union {
/* 0x0000 */ struct _RTL_BALANCED_NODE VadNode;
/* 0x0000 */ struct _MMVAD_SHORT* NextVad;
}; /* size: 0x0018 */
/* 0x0018 */ unsigned long StartingVpn;
/* 0x001c */ unsigned long EndingVpn;
/* 0x0020 */ unsigned char StartingVpnHigh;
/* 0x0021 */ unsigned char EndingVpnHigh;
/* 0x0022 */ unsigned char CommitChargeHigh;
/* 0x0023 */ unsigned char SpareNT64VadUChar;
/* 0x0024 */ long ReferenceCount;
/* 0x0028 */ struct _EX_PUSH_LOCK PushLock;
union {
union {
/* 0x0030 */ unsigned long LongFlags;
/* 0x0030 */ struct _MMVAD_FLAGS VadFlags;
}; /* size: 0x0004 */
} /* size: 0x0004 */ u;
union {
union {
/* 0x0034 */ unsigned long LongFlags1;
/* 0x0034 */ struct _MMVAD_FLAGS1 VadFlags1;
}; /* size: 0x0004 */
} /* size: 0x0004 */ u1;
/* 0x0038 */ struct _MI_VAD_EVENT_BLOCK* EventList;
} MMVAD_SHORT, *PMMVAD_SHORT; /* size: 0x0040 */

typedef struct _RTL_BALANCED_NODE {
union {
/* 0x0000 */ struct _RTL_BALANCED_NODE* Children[2];
struct {
/* 0x0000 */ struct _RTL_BALANCED_NODE* Left;
/* 0x0008 */ struct _RTL_BALANCED_NODE* Right;
}; /* size: 0x0010 */
}; /* size: 0x0010 */
union {
/* 0x0010 */ unsigned char Red : 1; /* bit position: 0 */
/* 0x0010 */ unsigned char Balance : 2; /* bit position: 0 */
/* 0x0010 */ unsigned __int64 ParentValue;
}; /* size: 0x0008 */
} RTL_BALANCED_NODE, *PRTL_BALANCED_NODE; /* size: 0x0018 */

typedef struct _RTL_AVL_TREE {
/* 0x0000 */ struct _RTL_BALANCED_NODE* Root;
} RTL_AVL_TREE, *PRTL_AVL_TREE; /* size: 0x0008 */

typedef struct _EPROCESS {
/* ... */
struct _RTL_AVL_TREE VadRoot;
/* ... */
} EPROCESS, *PEPROCESS;

We can use _EPROCESS.VadRoot to walk the VAD tree. If the _MMVAD.Core.VadFlags.RfgControlStack flag is set, the current _MMVAD describes the virtual address range of the thread control stack (given by StartingVpn, EndingVpn, StartingVpnHigh, and EndingVpnHigh in _MMVAD.Core). The flags and related structures are defined as follows:

typedef struct _MMVAD_FLAGS {
struct /* bitfield */ {
/* 0x0000 */ unsigned long VadType : 3; /* bit position: 0 */
/* 0x0000 */ unsigned long Protection : 5; /* bit position: 3 */
/* 0x0000 */ unsigned long PreferredNode : 6; /* bit position: 8 */
/* 0x0000 */ unsigned long NoChange : 1; /* bit position: 14 */
/* 0x0000 */ unsigned long PrivateMemory : 1; /* bit position: 15 */
/* 0x0000 */ unsigned long PrivateFixup : 1; /* bit position: 16 */
/* 0x0000 */ unsigned long ManySubsections : 1; /* bit position: 17 */
/* 0x0000 */ unsigned long Enclave : 1; /* bit position: 18 */
/* 0x0000 */ unsigned long DeleteInProgress : 1; /* bit position: 19 */
/* 0x0000 */ unsigned long PageSize64K : 1; /* bit position: 20 */
/* 0x0000 */ unsigned long RfgControlStack : 1; /* bit position: 21 */
/* 0x0000 */ unsigned long Spare : 10; /* bit position: 22 */
}; /* bitfield */
} MMVAD_FLAGS, *PMMVAD_FLAGS; /* size: 0x0004 */

typedef struct _MI_VAD_EVENT_BLOCK {
/* 0x0000 */ struct _MI_VAD_EVENT_BLOCK* Next;
union {
/* 0x0008 */ struct _KGATE Gate;
/* 0x0008 */ struct _MMADDRESS_LIST SecureInfo;
/* 0x0008 */ struct _RTL_BITMAP_EX BitMap;
/* 0x0008 */ struct _MMINPAGE_SUPPORT* InPageSupport;
/* 0x0008 */ struct _MI_LARGEPAGE_IMAGE_INFO LargePage;
/* 0x0008 */ struct _ETHREAD* CreatingThread;
/* 0x0008 */ struct _MI_SUB64K_FREE_RANGES PebTebRfg;
/* 0x0008 */ struct _MI_RFG_PROTECTED_STACK RfgProtectedStack;
}; /* size: 0x0038 */
/* 0x0040 */ unsigned long WaitReason;
/* 0x0044 */ long __PADDING__[1];
} MI_VAD_EVENT_BLOCK, *PMI_VAD_EVENT_BLOCK; /* size: 0x0048 */

typedef struct _MI_RFG_PROTECTED_STACK {
/* 0x0000 */ void* ControlStackBase;
/* 0x0008 */ struct _MMVAD_SHORT* ControlStackVad;
} MI_RFG_PROTECTED_STACK, *PMI_RFG_PROTECTED_STACK; /* size: 0x0010 */
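
Given the fields above, the address range covered by a VAD and the RfgControlStack test can be sketched as follows. The sketch works on plain integer copies of the fields rather than on live kernel structures. It assumes 4 KB pages, an inclusive EndingVpn, and a bitfield layout that starts at the least significant bit of LongFlags. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* Bit position of RfgControlStack in the _MMVAD_FLAGS listing. */
#define MMVAD_FLAGS_RFG_CONTROL_STACK_BIT 21

/* Combines the low 32 bits and the high byte of a virtual page number
 * and converts the result to a virtual address. */
static uint64_t vpn_to_va(uint32_t vpn_low, uint8_t vpn_high)
{
    uint64_t vpn = ((uint64_t)vpn_high << 32) | vpn_low;
    return vpn << PAGE_SHIFT;
}

/* Computes the first and last byte described by a VAD, assuming that
 * EndingVpn names the last page of the range (inclusive). */
static void vad_range(uint32_t starting_vpn, uint8_t starting_vpn_high,
                      uint32_t ending_vpn, uint8_t ending_vpn_high,
                      uint64_t *first, uint64_t *last)
{
    assert(first != NULL && last != NULL);
    *first = vpn_to_va(starting_vpn, starting_vpn_high);
    *last  = vpn_to_va(ending_vpn, ending_vpn_high)
             + ((1ULL << PAGE_SHIFT) - 1);
}

/* Tests the RfgControlStack bit in a copy of _MMVAD_SHORT.u.LongFlags. */
static bool is_rfg_control_stack(uint32_t long_flags)
{
    return ((long_flags >> MMVAD_FLAGS_RFG_CONTROL_STACK_BIT) & 1u) != 0;
}
```

A VAD walker would call is_rfg_control_stack on each node reached from _EPROCESS.VadRoot and report the range of the nodes for which it returns true.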

When an RFG-protected thread is created, nt!MmSwapThreadControlStack sets the thread’s ETHREAD.UserFsBase. It calls MiLocateVadEvent to find the MMVAD from which UserFsBase is derived.
ETHREAD.UserFsBase is calculated with the following formula:

ControlStackBase = MMVAD.Core.EventList.RfgProtectedStack.ControlStackBase
ControlStackLimitDelta = ControlStackBase - (MMVAD.Core.StartingVpnHigh * 0x100000000 + MMVAD.Core.StartingVpn ) * 0x1000
ETHREAD.UserFsBase = ControlStackLimitDelta

Each thread has its own shadow stack range in the Thread Control Stack. If the current thread uses the range ControlStackBase ~ ControlStackLimit, then ControlStackLimit = KTHREAD.StackLimit + ControlStackLimitDelta. The value stored in UserFsBase is therefore the offset of ControlStackLimit from StackLimit. When multiple threads access the shadow stack simultaneously, the address each thread actually accesses is ETHREAD.UserFsBase + rsp.
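
The formula and the resulting shadow slot address can be written out as plain arithmetic. The sketch below only restates the calculation given above on integer inputs. The function names are hypothetical, and the assertion that ControlStackBase lies at or above the start of the VAD is our own assumption.

```c
#include <assert.h>
#include <stdint.h>

/* ControlStackLimitDelta = ControlStackBase
 *   - (StartingVpnHigh * 0x100000000 + StartingVpn) * 0x1000 */
static uint64_t control_stack_limit_delta(uint64_t control_stack_base,
                                          uint32_t starting_vpn,
                                          uint8_t starting_vpn_high)
{
    uint64_t vad_start =
        ((uint64_t)starting_vpn_high * 0x100000000ULL + starting_vpn)
        * 0x1000ULL;
    /* Assumption: the control stack base is inside the VAD. */
    assert(control_stack_base >= vad_start);
    return control_stack_base - vad_start;
}

/* Address touched by fs:[rsp] once UserFsBase has been loaded. */
static uint64_t shadow_slot_address(uint64_t user_fs_base, uint64_t rsp)
{
    return user_fs_base + rsp;
}
```

With made-up inputs, a ControlStackBase of 0x20003000 in a VAD that starts at page 0x20000 gives a delta of 0x3000, and a thread whose rsp is 0x7FF0 would then use the shadow slot at 0xAFF0.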

6 RFG IN ACTION

We wrote a simple YARA signature to identify RFG-instrumented PE files.

rule rfg {
strings:
$pe = { 4d 5a }
$a = { 66 90 0F 1F 80 00 00 00 00 }
$b = { C3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 C3 }
$c = { E9 ?? ?? ?? ?? 90 90 90 90 90 90 90 90 90 90 E9 }

condition:
$pe at 0 and $a and ($b or $c)
}

Usage:

yara64.exe -r -f rfg.yara %SystemRoot%
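
For readers without YARA at hand, the same condition can be approximated by a small byte scanner in C. This is a rough equivalent written for illustration, not the tool we used: it expects the whole file to be loaded in memory already, uses a naive search, and the names find_pattern and looks_rfg_instrumented are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A pattern element is a byte value 0..255, or ANY for a wildcard
 * byte (written ?? in the YARA rule). */
#define ANY (-1)
#define N(p) (sizeof(p) / sizeof((p)[0]))

/* Naive search for a pattern with wildcards anywhere in buf. */
static bool find_pattern(const uint8_t *buf, size_t len,
                         const int *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > len)
        return false;
    for (size_t i = 0; i + pat_len <= len; i++) {
        size_t j = 0;
        while (j < pat_len && (pat[j] == ANY || buf[i + j] == pat[j]))
            j++;
        if (j == pat_len)
            return true;
    }
    return false;
}

/* Mirrors the rule condition: $pe at 0 and $a and ($b or $c). */
static bool looks_rfg_instrumented(const uint8_t *buf, size_t len)
{
    static const int a[] = { 0x66, 0x90, 0x0F, 0x1F, 0x80, 0, 0, 0, 0 };
    static const int b[] = { 0xC3,
        0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,
        0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0xC3 };
    static const int c[] = { 0xE9, ANY, ANY, ANY, ANY,
        0x90, 0x90, 0x90, 0x90, 0x90,
        0x90, 0x90, 0x90, 0x90, 0x90, 0xE9 };

    assert(buf != NULL);
    if (len < 2 || buf[0] != 0x4D || buf[1] != 0x5A)   /* "MZ" at 0 */
        return false;
    return find_pattern(buf, len, a, N(a)) &&
           (find_pattern(buf, len, b, N(b)) ||
            find_pattern(buf, len, c, N(c)));
}
```

Like the rule itself, this is a heuristic: it only looks for the reserved byte sequences and does not parse the load configuration directory.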

The output shows that most system executables in this version of Windows are already RFG-instrumented.
Below we use IDA Pro and WinDbg to examine an RFG-instrumented calc.exe.

.text:000000014000176C wWinMain
.text:000000014000176C xchg ax, ax
.text:000000014000176E nop dword ptr [rax+00000000h]

The entry point before runtime replacement

0:000> u calc!wWinMain
calc!wWinMain:
00007ff7`91ca176c 488b0424 mov rax,qword ptr [rsp]
00007ff7`91ca1770 6448890424 mov qword ptr fs:[rsp],rax

The entry point after runtime replacement

.text:00000001400025BC __guard_ss_common_verify_stub
.text:00000001400025BC retn
.text:00000001400025BD db 0Eh dup(90h)
.text:00000001400025CB retn

The common verify stub function before runtime replacement

0:000> u calc!_guard_ss_common_verify_stub
calc!_guard_ss_common_verify_stub:
00007ff7`91ca25bc 644c8b1c24 mov r11,qword ptr fs:[rsp]
00007ff7`91ca25c1 4c3b1c24 cmp r11,qword ptr [rsp]
00007ff7`91ca25c5 0f85f5000000 jne calc!_guard_ss_verify_failure (00007ff7`91ca26c0)
00007ff7`91ca25cb c3 ret

The common verify stub function after runtime replacement
