Hooking can be used by legitimate software for reverse engineering, for example, to examine the user mode function calls that a malicious program is making.
It can also be used by a malicious program to hide certain aspects of itself. For example, malware might try to install a hook into the Windows API functions that list files, such as FindFirstFile and FindNextFile. When one of these functions finds a particular file used by the malware, the hook code can change how the function behaves so that it skips over that file and moves on to the next one, thus hiding the file from Windows Explorer.
How Does A Function Hook Work?
A function hook usually has three parts:
- A piece of redirection code, which overwrites part of the target function and redirects any calls to it into a callback function in our code
- The callback is the second part and informs us that the target function was called. This is the main part of the hook; the part that allows us to change the behaviour of the target function or log the information passed into the target function.
- The final part is commonly called a trampoline, since it bounces us back into the target function as if nothing ever happened. The trampoline is created by the hooking code and holds a copy of the part of the hooked function that we overwrote initially. The trampoline also contains some code to redirect execution back into the hooked function, just after the code we overwrote.
Function Hooking
Unfortunately, hooking is quite low level, so we’re going to have to get our hands dirty and do some work at the assembly level.
Hooking a function involves several steps:
- Place our hooking code inside the target process: One way of doing this is to inject our code as a DLL. More information can be found in my previous blog posts on DLL injection
- Find the address of the function that we want to hook: If it’s a Windows API function or a function exported by a DLL that is loaded in the target process, then we can use GetProcAddress
- Construct a trampoline: copy the instructions at the start of the target function into it, then overwrite them with a jump to the hook callback
In 32-bit applications, we can use the relative near jump instruction “JMP”, which moves the instruction pointer (EIP) backwards or forwards in memory by up to 2GB.
If, for example, our target callback is 70000 bytes in front of the function (0x11170 in hex) the instruction would be:
[cpp]
JMP 0x1116B
[/cpp]
The actual byte level representation of this in memory would be as follows:
[cpp]
0xE9 0x6B 0x11 0x01 0x00
[/cpp]
The jump instruction in this format requires five bytes. Note that we have also taken into account the five bytes used by the JMP instruction itself, making the jump offset five bytes less than 70000, which is 0x1116B in hex. The offset encoded in the JMP instruction is always relative to the end of the JMP instruction.
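The offset arithmetic above can be sketched as a small helper. This is only an illustration working on a plain byte buffer: the function name `encode_jmp_rel32` is hypothetical, and it assumes 32-bit addresses passed in as numbers rather than real code pointers.

```cpp
#include <stdint.h>

// Encodes a 5-byte relative near JMP (opcode 0xE9) into `out`.
// jmp_address is where the JMP will be written and destination is where it
// should land. Both are plain 32-bit numbers, so no memory is patched here.
void encode_jmp_rel32(uint8_t out[5], uint32_t jmp_address, uint32_t destination)
{
    // The offset is measured from the end of the JMP instruction.
    // Unsigned wrap-around gives the right two's complement value for
    // backward jumps as well.
    uint32_t offset = destination - (jmp_address + 5);

    out[0] = 0xE9;
    out[1] = (uint8_t)(offset & 0xFF);          // operand is little-endian
    out[2] = (uint8_t)((offset >> 8) & 0xFF);
    out[3] = (uint8_t)((offset >> 16) & 0xFF);
    out[4] = (uint8_t)((offset >> 24) & 0xFF);
}
```

Calling it with a destination 70000 bytes past the JMP address produces the same five bytes as the example: E9 6B 11 01 00.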
So to install our hook, we need to overwrite five bytes of the start of the target function with a JMP. Except it’s not quite that simple…
Intel instructions are variable length, so we also need to make sure that we first copy any whole instructions that may be overwritten in the target function. This involves disassembling the instructions of the target function until we have disassembled enough code to contain our 5 bytes.
For example, here is the disassembly of the prologue of the Windows API function RegSetValueExW:
[cpp]
_RegSetValueExW@24:
push 28h 6A 28
push 75835CC0h 68 C0 5C 83 75
...
...
[/cpp]
The code bytes are on the right hand side. It can be seen that if we overwrote 5 bytes, we would overwrite into the middle of the second push instruction. Disassembling this shows us that we need to copy 7 bytes into the trampoline to capture whole instructions, and then pad out the JMP instruction we insert with two NOP instructions, which are 1 byte each. After hooking, RegSetValueExW would look like this:
[cpp]
_RegSetValueExW@24:
jmp 1234567h E9 67 45 23 01
nop 90
nop 90
...
...
[/cpp]
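As a rough sketch of the bookkeeping involved, the helpers below work out how many whole-instruction bytes need to be copied and build the patch with NOP padding. The instruction lengths are assumed to come from a disassembler, and the names `whole_instruction_bytes` and `build_patch` are hypothetical.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Given the lengths of the instructions at the start of a function (as
// reported by a disassembler), returns how many bytes must be copied so that
// at least `needed` bytes are covered without splitting an instruction.
// Returns 0 if the listed instructions do not cover `needed` bytes.
size_t whole_instruction_bytes(const size_t* lengths, size_t count, size_t needed)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += lengths[i];
        if (total >= needed) {
            return total;
        }
    }
    return 0;
}

// Fills `patch` with a JMP rel32 followed by NOP padding.
// patch_size must be at least 5 (the size of the JMP).
void build_patch(uint8_t* patch, size_t patch_size, uint32_t rel32)
{
    memset(patch, 0x90, patch_size);            // NOP padding everywhere first
    patch[0] = 0xE9;                            // then the JMP over the front
    patch[1] = (uint8_t)(rel32 & 0xFF);
    patch[2] = (uint8_t)((rel32 >> 8) & 0xFF);
    patch[3] = (uint8_t)((rel32 >> 16) & 0xFF);
    patch[4] = (uint8_t)((rel32 >> 24) & 0xFF);
}
```

For the RegSetValueExW prologue the instruction lengths are 2 and 5, so 7 bytes are copied and the patch is the 5-byte JMP followed by two NOPs.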
To do this we could write our own disassembler, or use something like BeaEngine or Udis86, both of which are open source disassembler implementations.
Now that we’ve hooked the function and copied the original code into our trampoline, we need to set the trampoline to jump back into the target function, just after the bytes we overwrote (the inserted JMP instruction and any NOP padding).
[cpp]
Trampoline:
push 28h 6A 28
push 75835CC0h 68 C0 5C 83 75
jmp <address of instruction after JMP>
...
[/cpp]
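A minimal sketch of building such a trampoline in a byte buffer might look like the following. It assumes a 32-bit process, that the copied bytes contain no position-dependent instructions, and that the caller knows the address the buffer will live at; `build_trampoline` is a hypothetical name.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Builds a trampoline into `trampoline`: the bytes copied from the target
// function followed by a JMP rel32 back to the first untouched instruction.
// trampoline_address is the address the buffer will have in the process.
// The buffer must hold copied_size + 5 bytes. Returns the bytes written.
size_t build_trampoline(uint8_t* trampoline, uint32_t trampoline_address,
                        const uint8_t* copied, size_t copied_size,
                        uint32_t target_address)
{
    memcpy(trampoline, copied, copied_size);

    uint32_t jmp_address = trampoline_address + (uint32_t)copied_size;
    uint32_t resume_address = target_address + (uint32_t)copied_size;
    uint32_t rel32 = resume_address - (jmp_address + 5); // relative to JMP end

    uint8_t* jmp = trampoline + copied_size;
    jmp[0] = 0xE9;
    jmp[1] = (uint8_t)(rel32 & 0xFF);
    jmp[2] = (uint8_t)((rel32 >> 8) & 0xFF);
    jmp[3] = (uint8_t)((rel32 >> 16) & 0xFF);
    jmp[4] = (uint8_t)((rel32 >> 24) & 0xFF);

    return copied_size + 5;
}
```

With the seven bytes copied from RegSetValueExW, the trampoline is 12 bytes long and its final JMP lands on the instruction at offset 7 in the original function.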
If we don’t want to use a JMP for this and calculate a relative offset, we can potentially push a value onto the stack and use RET to return directly to that address.
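For illustration, the PUSH and RET alternative in a 32-bit process comes to six bytes: a `push imm32` (opcode 0x68) followed by `ret` (opcode 0xC3). The helper name `encode_push_ret32` is hypothetical.

```cpp
#include <stdint.h>

// Encodes "push destination; ret" into `out` (6 bytes). Because the address
// is absolute, the stub works wherever it is placed in a 32-bit process.
void encode_push_ret32(uint8_t out[6], uint32_t destination)
{
    out[0] = 0x68;                                   // push imm32
    out[1] = (uint8_t)(destination & 0xFF);
    out[2] = (uint8_t)((destination >> 8) & 0xFF);
    out[3] = (uint8_t)((destination >> 16) & 0xFF);
    out[4] = (uint8_t)((destination >> 24) & 0xFF);
    out[5] = 0xC3;                                   // ret
}
```

The trade-off is one extra byte compared with the relative JMP, in exchange for not having to calculate an offset.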
Hot Patching
Some components of Windows, for example Kernel32.dll, have been built by Microsoft with hot patching enabled. This means that some space has been left for inserting a JMP instruction, so that the code can be patched at runtime to jump to an updated version of the function. In some cases it may be possible to use this hot patch area to contain our hook.
For example, here is the disassembly of GetProcAddress in Kernel32.dll and some of the surrounding code:
[cpp]
nop 90
nop 90
nop 90
nop 90
nop 90
_GetProcAddressStub@8:
mov edi,edi 8B FF
push ebp 55
mov ebp,esp 8B EC
pop ebp 5D
...
[/cpp]
Notice that the first instruction is MOV EDI, EDI which is essentially a NOP. Also notice that the five code bytes before the function are all NOP instructions.
We can hook this by first changing the MOV EDI, EDI instruction to a short relative JMP, which takes only two bytes, and jumping backwards to the start of the NOP instructions, which is 5 bytes before the function. Since the offset is relative to the end of the two-byte JMP, the jump distance is 7 bytes backwards, so the operand is -7, which is 0xF9 as a signed byte:
[cpp]
nop 90
nop 90
nop 90
nop 90
nop 90
_GetProcAddressStub@8:
jmp f9h EB F9
push ebp 55
mov ebp,esp 8B EC
pop ebp 5D
...
[/cpp]
Now we can just replace the five NOP instructions with a relative near JMP as before.
It would be quite simple to check for this pattern when hooking functions and take advantage of this hot patch area if it exists.
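A sketch of such a check, and of the two writes that install the hook, is shown below. It works on a byte buffer and assumes exactly the pattern shown for GetProcAddress (five 0x90 bytes followed by 8B FF); the function names are hypothetical. The short JMP operand is -7 (0xF9), because the offset is counted from the end of the two-byte instruction back to the start of the five-byte pad.

```cpp
#include <stdint.h>

// Returns 1 if the five bytes before `function` are NOPs and the function
// starts with MOV EDI, EDI (8B FF). The five preceding bytes must be readable.
int has_hot_patch_area(const uint8_t* function)
{
    for (int i = 1; i <= 5; ++i) {
        if (function[-i] != 0x90) {
            return 0;
        }
    }
    return function[0] == 0x8B && function[1] == 0xFF;
}

// Writes the near JMP into the pad, then the short JMP over MOV EDI, EDI.
// function_address is the address `function` has in the target process.
void install_hot_patch_hook(uint8_t* function, uint32_t function_address,
                            uint32_t callback_address)
{
    uint8_t* pad = function - 5;
    uint32_t pad_address = function_address - 5;
    uint32_t rel32 = callback_address - (pad_address + 5);

    pad[0] = 0xE9;
    pad[1] = (uint8_t)(rel32 & 0xFF);
    pad[2] = (uint8_t)((rel32 >> 8) & 0xFF);
    pad[3] = (uint8_t)((rel32 >> 16) & 0xFF);
    pad[4] = (uint8_t)((rel32 >> 24) & 0xFF);

    // Written last, so the function never jumps into a half-written pad.
    function[0] = 0xEB;   // short JMP
    function[1] = 0xF9;   // -7: back to the start of the pad
}
```

One nice property of this layout is that no trampoline copy is needed: the original function can be resumed at the instruction straight after the two-byte short JMP.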
64-bit Function Hooking
On x64 platforms, the same hooking technique can be used. However, it only works if the callback function and trampoline are within 2GB of the hooked function, as the jump instruction can only take a 32-bit signed jump offset operand.
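Whether a given destination is in range can be checked before attempting to write the jump. The sketch below is one way of doing it without relying on signed overflow; `reachable_with_rel32` is a hypothetical name.

```cpp
#include <stdint.h>

// Returns 1 if a 5-byte JMP rel32 written at jmp_address can reach
// destination, i.e. the displacement measured from the end of the JMP
// fits in a signed 32-bit value.
int reachable_with_rel32(uint64_t jmp_address, uint64_t destination)
{
    uint64_t next = jmp_address + 5;   // offsets are relative to the JMP end

    if (destination >= next) {
        return (destination - next) <= 0x7FFFFFFFull;   // forwards: up to 2GB - 1
    }
    return (next - destination) <= 0x80000000ull;       // backwards: up to 2GB
}
```

If this returns 0 for the callback, a redirection stub closer to the target function is needed.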
Using the VirtualQuery API call, free memory blocks can be located within 2GB of the target function. A redirection stub can then be placed there, which uses a longer format jump to reach the user function.
[cpp]
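// TWO_GB, PAGE_SIZE and trampolineSize are assumed to be defined elsewhere.
UINT_PTR currentAddress = (UINT_PTR)pTargetFunctionAddress + TWO_GB;
MEMORY_BASIC_INFORMATION memInfo = { 0 };
PVOID pTrampoline = NULL;
// Walk down from target + 2GB towards the target, one page at a time,
// until a free region can be allocated.
do
{
    // The third argument is the size of the MEMORY_BASIC_INFORMATION buffer.
    if (VirtualQuery((LPVOID)currentAddress, &memInfo, sizeof(memInfo)))
    {
        if (memInfo.State == MEM_FREE)
        {
            pTrampoline = VirtualAlloc((LPVOID)currentAddress,
                                       trampolineSize,
                                       MEM_RESERVE |
                                       MEM_COMMIT,
                                       PAGE_EXECUTE_READWRITE);
        }
    }
    currentAddress -= PAGE_SIZE;
} while (pTrampoline == NULL &&
         (UINT_PTR)pTargetFunctionAddress < currentAddress);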
[/cpp]
For the trampoline and redirection stub, we would need a non-relative jump so that the user function which calls the trampoline can reside anywhere in the addressable range. One possible option for constructing a longer jump is to push a 64-bit address onto the stack and use a RET instruction to return to the target address.
There is no instruction that pushes a 64-bit immediate value directly, but we can push a 64-bit address by making use of a register. We first save the register on the stack and then put our return address into it. Then we exchange the register with the stack location, which restores the previous register state and leaves the address on the stack ready for the RET:
[cpp]
push rax 50
movabs rax,0xAAAAAAAAAAAAAAAA 48 b8 AA AA AA AA AA AA AA AA
xchg QWORD PTR [rsp],rax 48 87 04 24
ret C3
[/cpp]
Another possibility, detailed by Nikolay Igotti, pushes the lower part of the address onto the stack as a 32-bit immediate value. This type of push instruction actually reserves 64 bits on the stack in order to preserve alignment. The remaining 32 bits of the address can then be moved into the additional stack space reserved by the push instruction. Finally, RET is executed, which pops the value from the stack and jumps to that address:
[cpp]
push 0xAAAAAAAA 68 AA AA AA AA
mov DWORD PTR [rsp+0x4],0xBBBBBBBB c7 44 24 04 BB BB BB BB
ret C3
[/cpp]
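The advantage of Nikolay’s method is that it uses only 14 bytes. It can be reduced further if we know that the top half of the target address is zero, since the second instruction is then not needed. Note that the 32-bit immediate is sign-extended by the push, so this shortcut only holds for addresses below 2GB rather than the full 4GB range.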
This longer jump could be used to jump to the user function directly. However, it is easier to first try to allocate some memory for a redirector stub within 2GB of the target function, as the target function may contain a relative jump instruction in the bytes that are copied to the trampoline. This would then need to be “fixed up” to jump to the correct location, which could be problematic.
Int 3 Hooking Method
Another method which could be used for hooking is to place an “int 3” breakpoint instruction into the code instead of a JMP instruction. This will cause an exception to be raised when the int 3 instruction is executed.
If we install a vectored exception handler using the Windows API call AddVectoredExceptionHandler, we can trap the exception before any frame-based (structured) exception handler sees it, and move execution to a user callback function:
[cpp]
PVOID WINAPI AddVectoredExceptionHandler(
_In_ ULONG FirstHandler,
_In_ PVECTORED_EXCEPTION_HANDLER VectoredHandler
);
[/cpp]
This method has the benefit of requiring only a single byte to be replaced in the target, as int 3 is a single byte instruction. For example, here is RegSetValueExW from the previous examples before it is hooked:
[cpp]
_RegSetValueExW@24:
push 28h 6A 28
push 75835CC0h 68 C0 5C 83 75
...
...
[/cpp]
After the hook is installed, the first byte is replaced by int 3 and the remaining byte of the original push instruction is padded with a NOP:
[cpp]
_RegSetValueExW@24:
int 3 CC
nop 90
push 75835CC0h 68 C0 5C 83 75
...
[/cpp]
The downside to this method is that the target program, or a debugger, could add its own vectored exception handler, which may be called before ours and may not pass the exception on to our handler. The result could be a program crash.
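The single-byte patch shown for RegSetValueExW can be sketched on a byte buffer as follows. The names `install_int3` and `remove_int3` are hypothetical, and the length of the first instruction is assumed to come from a disassembler. The exception handler itself is not shown here.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Replaces the first instruction at `target` with int 3 (0xCC) and pads the
// rest of that instruction with NOPs. The original bytes are stored in
// `saved` so the hook can be removed later.
void install_int3(uint8_t* target, size_t first_instruction_size, uint8_t* saved)
{
    memcpy(saved, target, first_instruction_size);
    target[0] = 0xCC;
    for (size_t i = 1; i < first_instruction_size; ++i) {
        target[i] = 0x90;
    }
}

// Restores the bytes saved by install_int3.
void remove_int3(uint8_t* target, size_t first_instruction_size, const uint8_t* saved)
{
    memcpy(target, saved, first_instruction_size);
}
```

Keeping the saved bytes around matters here, because the handler or callback eventually has to execute the displaced `push 28h` in some form before continuing into the rest of the function.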
Other Considerations
- As briefly mentioned earlier, if the function prologue code contains a relative JMP instruction that we will overwrite, simply copying this to the trampoline function will not work since the copied JMP will then be jumping relative to the trampoline and not the original function. In this case it may be necessary to re-calculate the jump offset in the copied function prologue if possible
- Some functions may consist of fewer than five bytes of executable code. For example, the following code, while not particularly useful, is still a valid function:
[cpp]
nop 90
ret c3
[/cpp]
Int3 hooking could be used to hook this, or if hot patching is enabled that area could be used. It's also possible to use IAT hooking, which will be discussed in an additional blog post.
- A target function may have already been hooked: we could check for this by examining the code that resides on disk and comparing it to the function in memory. If a difference is found, the original hook could first be removed by copying the code from disk, and then our new hook can be added. However, removing someone else’s hook may cause errors if they attempt to remove it at a later date, after we have already overwritten it. Also bear in mind that if you’re dealing with a malicious program, it may be returning incorrect data from disk in order to hide its hook
- Threads may be running in the target process that are calling functions as you’re patching them, so it would be wise to suspend all other threads in a process before implanting the hook and resume them afterwards
- The VirtualAlloc, VirtualFree, VirtualProtect and VirtualQuery API calls, as well as the C runtime memcpy function, will most likely be used internally by any hooking engine. If these are to be hooked by the user, then it may be necessary to make a copy of these functions in memory, or have special handling code for them. The memcpy function can usually be replaced by a compiler intrinsic such as __movsb.
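The jump offset re-calculation mentioned in the first point above can be sketched as follows for a 5-byte relative JMP (or CALL) in a 32-bit process. The name `relocate_rel32` is hypothetical, and a real hooking engine would also need to handle short jumps and conditional branches, which are not covered here.

```cpp
#include <stdint.h>

// Returns the rel32 operand a 5-byte relative JMP/CALL needs after being
// moved from old_address to new_address, so that it still reaches the
// destination it pointed at originally.
uint32_t relocate_rel32(uint32_t old_rel32, uint32_t old_address, uint32_t new_address)
{
    // Absolute destination of the original instruction.
    uint32_t destination = old_address + 5 + old_rel32;

    // Offset from the end of the instruction at its new location.
    return destination - (new_address + 5);
}
```

In a 64-bit process the same calculation can produce a displacement that no longer fits in 32 bits, which is why keeping the trampoline within 2GB of the target function is the easier route.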