Stack Frames, Calling a Function, and the Function Prologue
Previously Read:
The Stack
Key Terminology:
.text section
Stack Frame
Function Prologue
Function Epilogue
Offset
Base Pointer
Return Address
What Is A Stack Frame
To take part in vulnerability research, it's crucial to understand how a ‘stack frame’ is laid out on the stack when a function is called, as well as some of the more important sections of a ‘compiled’ file. This helps in understanding which values are possible targets for exploitation, and the implications of the various exploits and techniques we may try to implement later on.
When a function is called, various arguments, values, and local variables are PUSHed onto the stack. This combination of data that is temporarily stored on the stack while a function runs is called the ‘stack frame’. This tutorial takes a beginner-level dive into understanding stack frames.
Note that there are multiple ‘calling conventions’ in common use, and they differ in how stack frames are handled. This tutorial focuses on STDCALL to keep things easy. After reading this, it is probably a good idea to pull up a Stack Overflow article on the differences between calling conventions (https://stackoverflow.com/questions/949862/what-are-the-different-calling-conventions-in-c-c-and-what-do-each-mean).
Important Sections Of A Compiled File
When human written/readable code is 'compiled', the compiler takes that code and turns it into binary code (which is why compiled files are often called 'binaries'). Among many other things, it strips the code of any human readable variable names and shuffles things around to make it more efficient for a computer to process. This is awesome for computers, but can be confusing for humans. Luckily, there are standardized ways that compilers do this. These standards dictate that a compiled file is divided into different 'sections', and for a vulnerability researcher, there are 3 sections that are particularly important.
.text
This section contains all of the program's actual 'code' that will be executed, in the form of assembly instructions. The .text section contains instructions like: MOV, PUSH, POP, ADD, SUB, LEA, etc. All of the program's own functions are contained somewhere in the .text section. This does not include library functions, which are imported from libraries or other files.
.data
External (global) variables whose values are already known at compile time are stored here. For example, something like:
int number = 5;
would be stored in the .data section. The compiler was able to determine the value of 'number' while it was compiling, so it performed any computation that was needed and placed the resulting value in the .data section. Variables of other types that have a known value are treated the same way.
.bss
External (global) variables that don't have a known value at compile time are stored here. For example, something like:
int number;
would be placed in the .bss section. The compiler knows that it needs to set a few bytes aside for an int (the size of an int depends on the type of system), but it doesn't yet know its value. Most likely, as the program runs, it will do some math that produces the int's value and then store it there. A common phrase that people use to remember this section is 'better save space'.
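As a small sketch of the two data sections, the snippet below declares one global of each kind. The variable and function names are made up for illustration, and exact section placement is ultimately up to the compiler and linker, so treat the comments as the typical case rather than a guarantee.

```c
/* Value known at compile time: typically placed in .data. */
int initialized_number = 5;

/* No value given: typically placed in .bss. C guarantees that such a
   variable starts out as zero. */
int uninitialized_number;

/* Hypothetical helper: produces the value at run time and stores it in
   the space that was reserved in .bss. */
int compute_and_store(void)
{
    uninitialized_number = initialized_number * 2;
    return uninitialized_number;
}
```

Before compute_and_store() is called, uninitialized_number reads as 0. Afterwards it holds 10.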
Calling A Function With CALL
While a program is running, it keeps a bookmark of which instruction is to be executed next. The 'Instruction Pointer' (EIP on 32-bit x86) holds this bookmark, and it almost exclusively points at instructions within the .text section. Most of the time it is simply incremented repeatedly, walking through instructions that sit right next to each other in the .text section. However, the CALL instruction lets the computer hop from one set of instructions within the .text section to a completely different set of instructions in the .text section, just as calling a function does in a higher level language.
'Calling a function' relies on the CALL instruction, which does 2 main things:
It pushes the value stored in the instruction pointer onto the stack. This value is the memory address of the instruction the program planned on executing next. The CALL instruction does this because we will eventually want to come back and continue executing that code, but for now we are hopping elsewhere in the .text section to run a different function. Storing this address on the stack is what lets the program return later. This address is called the 'return address'.
It changes the instruction pointer to point to the new set of instructions we want to execute.
In summary, the CALL instruction pushes the address of the next instruction onto the stack, and then jumps to a new area of the .text section to run a completely different function. It is very important to remember that the CALL instruction pushes the return address onto the stack before jumping to and executing the new function. This means that the first thing that always happens when a function is called, and the first part of a stack frame that is laid out on the stack, is a return address.
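The two steps of CALL can be modeled in a few lines of C. This is only a toy model under stated assumptions, not how a real CPU is implemented: the struct, the function names, and the use of byte offsets in place of real memory addresses are all invented for illustration. The model treats eip as the address of the CALL instruction itself, so the return address is eip plus the length of that instruction.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy model of a 32-bit machine: two registers and a small stack. */
struct toy_cpu {
    uint32_t eip;                 /* address of the instruction being executed */
    uint32_t esp;                 /* stack pointer, as a byte offset into stack[] */
    uint32_t stack[STACK_WORDS];  /* stack memory, 4 bytes per slot */
};

/* PUSH: move the stack pointer down by 4 bytes, then store the value there. */
void toy_push(struct toy_cpu *cpu, uint32_t value)
{
    cpu->esp -= 4;
    cpu->stack[cpu->esp / 4] = value;
}

/* CALL: push the address of the instruction after the CALL, then jump. */
void toy_call(struct toy_cpu *cpu, uint32_t call_length, uint32_t target)
{
    uint32_t return_address = cpu->eip + call_length;

    toy_push(cpu, return_address);  /* step 1: save where to come back to */
    cpu->eip = target;              /* step 2: jump to the new function */
}
```

Starting with eip at 0x1000 and an empty stack, toy_call(&cpu, 5, 0x2000) leaves 0x1005 on top of the toy stack and eip at 0x2000. The length of 5 matches a near CALL with a 32-bit relative operand.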
Function Prologue
As was introduced before, as a function runs, local variables and 'other values' are PUSHed onto and POPed off of the stack. These 'other values' include the return address that was just discussed, and (as we are about to discuss) the base pointer. Now that our previous CALL instruction has stored the return address on the stack so that we can eventually return to the code we were executing, we turn our attention to the actual function to be executed.
Every function starts off in a standardized way (at least when it is compiled without optimizations) with a collection of instructions called the 'function prologue'. To be clear, this means that if you were to read through the .text section of a program, you would spot multiple functions that all have this function prologue as a sort of preamble that runs before the code the author actually wrote. The function prologue is responsible for setting up the stack so that the function that is about to be executed can use it appropriately. The prologue runs immediately after a CALL instruction, as its instructions are the first instructions in any function. It contains 3 instructions:
Push the 'base pointer' onto the stack - PUSH EBP - Before anything more happens with the stack, the base pointer is pushed onto the stack. At this time, EBP contains the base pointer for the previous function. It needs to be temporarily stored on the stack because the value in the EBP register is about to be updated for the new function we're preparing to execute. After this happens, the stack frame has (in this order):
the return address into the previous function
the base pointer of the previous function.
Because a CALL instruction and the function prologue that follows it happen for every function that is about to be executed, every function that is called by a program starts its stack frame with these two values.
Move the current stack pointer into EBP - MOV EBP, ESP - Every time a function is called, the current top of the stack becomes the new base, and the stack then grows from that new base. In practice, this works because the stack pointer always points to the top of the stack: it is decremented each time an item is PUSHed and incremented each time an item is POPed. This means that if we want to put more data on the stack without overwriting what is already there, we start at the top of the stack, exactly where the stack pointer points. So, the new base pointer is set equal to the top of the stack, and it serves as the fixed reference point for offsets as the stack continues to grow to provide temporary memory for the function that is about to be executed.
Grow the stack further to make room for local variables - SUB ESP, N (where N is the number of bytes needed) - At this point in time:
return address has been PUSHed onto the stack
the previous base pointer has been PUSHed onto the stack
the new base pointer has been set to the top of the stack to serve as our new base for offsets.
The only thing that’s left to do is to move the stack pointer further to make room between the base pointer and the top of the stack, where the program can store local variables and other information within this stack frame. As was described in ‘The Stack’ tutorial, because the stack grows downwards towards 0x00000000, growing the stack actually means subtracting from the stack pointer in order to pull it further downwards. The prologue subtracts enough from the stack pointer to make room for all of the local variables the upcoming function will need to store during its execution.
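At this time the stack looks like the following (higher addresses at the top, the stack grows downwards):
    return address                (PUSHed by CALL)
    previous base pointer         <- EBP points here
    space for local variables
    ...                           <- ESP points at the lowest address of this space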
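The three prologue steps can also be walked through in a toy C model. As with any model of this kind, the struct and function names are invented for illustration, and esp and ebp are byte offsets into a small array instead of real addresses.

```c
#include <stdint.h>

#define STACK_BYTES 64

/* Toy registers and stack; esp and ebp are byte offsets into stack[]. */
struct toy_frame_state {
    uint32_t esp;
    uint32_t ebp;
    uint32_t stack[STACK_BYTES / 4];
};

/* Runs the three prologue steps, reserving local_bytes for local variables. */
void toy_prologue(struct toy_frame_state *s, uint32_t local_bytes)
{
    /* PUSH EBP: save the previous function's base pointer. */
    s->esp -= 4;
    s->stack[s->esp / 4] = s->ebp;

    /* MOV EBP, ESP: the current top of the stack becomes the new base. */
    s->ebp = s->esp;

    /* SUB ESP, local_bytes: make room for locals below the new base. */
    s->esp -= local_bytes;
}
```

If CALL has just left a return address at offset 40 while the caller's base pointer is 60, then toy_prologue(&s, 16) stores 60 at offset 36, sets ebp to 36, and moves esp down to 20. The return address sits 4 bytes above the new base pointer.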
The Order Of Local Variables Within The Stack Frame
Determining the order of local variables just by reading the pre-compiled code can be tricky, and realistically if you ever need to know this in a real world example, you should just examine the program in question in a debugger (like GDB). This is sometimes important if there is a specific variable value that needs to be changed, or if the ordering determines what might or might not be altered in the course of an exploit.
If there are no security measures and no optimizations in place, the local variables will typically be laid out in the same order in which they appear in the human written code. For example:
int i = 5;
char buffer[8] = { 0x00 };
int j = 10;
Given the above local variables, growing downwards the stack would look as follows:
4 byte return address
4 byte base pointer address
4 byte int 5
8 byte array buffer
4 byte int 10
This is the most common layout that you should expect to see in ‘Capture The Flag’ challenges and practice problems.
However, with more realistic programs the order in which the variables are allocated space is often reversed, or in some cases jumbled in order to optimize the amount of space used. Different security and optimization compiler options can (and will) change the order.
No matter what order they are in, it’s important to keep in mind that the contents of a variable are laid out from lower addresses to higher addresses, away from the top of the stack. That means that if an array is declared as a local variable, the 0th element will be the closest to the top of the stack, and as you go up in array elements, you get closer to the saved base pointer and the return address. This is what makes overflow exploits possible: writing past the end of a local array moves toward those values.
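The direction in which array elements are laid out can be checked directly in C. The sketch below uses a hypothetical helper name and only measures the distance between elements. It says nothing about where the compiler places the array relative to other locals.

```c
#include <stddef.h>

/* Returns how many bytes separate buffer[0] from buffer[index].
   index is expected to be between 0 and 7. */
ptrdiff_t distance_from_first_element(size_t index)
{
    char buffer[8] = { 0x00 };

    /* Element addresses always increase with the index, so the result is
       never negative: higher elements sit at higher addresses. */
    return &buffer[index] - &buffer[0];
}
```

distance_from_first_element(7) returns 7, meaning the last element sits 7 bytes above the first. On a stack that grows downwards, the higher addresses are where the saved base pointer and return address sit.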
Function Arguments
Up until this point we’ve just ignored the fact that most functions take arguments. While there are different standards for how compilers handle function arguments, there are generally two common ways this happens.
Most commonly, arguments for a function are PUSHed onto the stack from right to left (the last argument in the author’s code is pushed first) before the CALL instruction is even executed. This means that, while local variables are usually at negative offsets from the base pointer, function arguments are at positive offsets. For example, a local int might be at EBP-0x8 and a local array might start at EBP-0x64, whereas the function’s first argument is found at EBP+0x8, just above the saved base pointer at EBP+0x0 and the return address at EBP+0x4.
Another way that some ‘calling conventions’ (such as FASTCALL) pass arguments is by storing them in registers, which saves the time it would take to write them to the stack and read them back.
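For the stack-passed case, the offsets can be pictured as a C struct whose fields are listed from low to high address. This is a hypothetical picture of one 32-bit frame with 16 bytes of locals and two arguments. The struct, its field names, and the EBP_OFFSET macro are invented for illustration, and a real frame is not declared anywhere in a program like this.

```c
#include <stddef.h>
#include <stdint.h>

/* One 32-bit stack frame, fields listed from low to high address. */
struct frame_picture {
    uint8_t  local_space[16];   /* EBP-0x10 and up: local variables */
    uint32_t saved_ebp;         /* EBP+0x0: previous function's base pointer */
    uint32_t return_address;    /* EBP+0x4: PUSHed by CALL */
    uint32_t first_argument;    /* EBP+0x8: leftmost argument, PUSHed last */
    uint32_t second_argument;   /* EBP+0xC: PUSHed before the first argument */
};

/* Offset of a field relative to the slot EBP points at (saved_ebp). */
#define EBP_OFFSET(field) \
    ((long)offsetof(struct frame_picture, field) - \
     (long)offsetof(struct frame_picture, saved_ebp))
```

Here EBP_OFFSET(local_space) evaluates to -16, while EBP_OFFSET(first_argument) evaluates to 8: locals at negative offsets, arguments at positive ones.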
Return Value
The commonly accepted way to return a value from a function is to store it in the general purpose EAX register.
Function Epilogue
Depending on which ‘calling convention’ is being used, either the called function (the ‘callee’) or the calling function (the ‘caller’) is responsible for removing the function’s arguments from the stack. Under STDCALL, the convention this tutorial focuses on, the callee does this. In either case, the function prologue and epilogue both happen within the called function.
As you might have guessed, the function epilogue does the exact opposite of the function prologue. It sets the stack back to how it was before the function prologue ran, no matter how the executed function left it. This once again takes three instructions (compilers often emit the single LEAVE instruction in place of the first two):
Move the current EBP into the stack pointer - MOV ESP, EBP - This releases the space used for local variables and leaves the stack pointer pointing at the saved base pointer, exactly where it was right after the prologue's PUSH EBP.
POP the previous base pointer off the stack back into EBP - POP EBP - This restores the previous function’s base pointer.
Return to executing the previous function - RET - This is more or less the exact opposite of a CALL instruction. It POPs the return address off the stack and resumes program execution at that address.
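The epilogue can be traced in the same kind of toy C model. The struct and function names are invented for illustration, registers hold byte offsets into a small array instead of real addresses, and the model ignores any argument cleanup.

```c
#include <stdint.h>

/* Toy registers and stack; esp and ebp are byte offsets into stack[]. */
struct toy_cpu_state {
    uint32_t eip;
    uint32_t esp;
    uint32_t ebp;
    uint32_t stack[16];
};

/* POP: read the value at the stack pointer, then move it up by 4 bytes. */
uint32_t toy_pop(struct toy_cpu_state *s)
{
    uint32_t value = s->stack[s->esp / 4];

    s->esp += 4;
    return value;
}

/* Runs the three epilogue steps in order. */
void toy_epilogue(struct toy_cpu_state *s)
{
    s->esp = s->ebp;      /* MOV ESP, EBP: discard the local variables */
    s->ebp = toy_pop(s);  /* POP EBP: restore the previous base pointer */
    s->eip = toy_pop(s);  /* RET: pop the return address and jump to it */
}
```

Take a frame with ebp at 36, esp at 20, a saved base pointer of 60 at offset 36, and a return address of 0x1005 at offset 40. After toy_epilogue(&s), ebp is 60 again, eip is 0x1005, and esp is 44, which is where the stack pointer was before the CALL.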
Summary: Stack Frames, Calling a Function, and the Function Prologue
When a function is called, the return address is PUSHed onto the stack by the CALL instruction. Then the function prologue is executed, which PUSHes the previous function’s base pointer onto the stack, sets the new base pointer to the top of the stack, and subtracts from the stack pointer in order to make room for local variables. This combination of the return address, the saved base pointer, and the local variables is called the ‘stack frame’. Further, if a function has arguments, they are, depending on the calling convention, pushed onto the stack from right to left before the function is even called and are also considered part of the stack frame.
Practice Exercise:
Compile this C code using GCC and inspect the ‘stack_frame_example’ binary in GDB. Find the return address and the base pointer for whacky_function(). You should research how to use GDB on your own, but to get you started, some useful GDB commands for this will be:
Commands                                  Examples
disass [FUNCTION NAME]                    disass main
start                                     start
info registers                            i r
x/[INSERT NUMBER]xb $[INSERT REGISTER]    x/4xb $ebp
si                                        si
c                                         c
break [LOCATION]                          b whacky_function
Compiler options:
gcc stack_frame_example.c -o stack_frame_example -m32 -fno-stack-protector -O0
Code:
#include <stdio.h>

void whacky_function(void)
{
    int a = 5;
    int b = 2;
    int c = 0;

    c = a + b;
    printf("This whacky function prints! %d\n", c);
}

int main()
{
    /* Unused locals; they only take up space in main()'s stack frame. */
    int y = 20;
    int z = 30;

    whacky_function();
    return 0;
}