The Stack
Previously Read:
N/A
Key Terminology:
Last In First out (LIFO)
The Stack
Registers
Instruction Pointer
Base Pointer
Stack Pointer
What Is The Stack
"The stack is the memory set aside as scratch space" that a program needs to run (https://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap). Simply put, the stack is a giant chunk of memory that a program can temporarily store things in whenever it needs. More precisely, the stack is a ‘Last In First Out’ (LIFO) data structure.
The classic analogy of a ‘Last In First Out’ (LIFO) data structure is a stack of plates. Individual items, in this analogy plates, are stacked on top of each other. Every time a new plate is added, it is placed on the top of the stack of plates. Then, whenever a plate is needed, you can't just grab one from the middle because they would all come crashing down, instead you must grab the plate at the top. If you want a plate that is further down, you must remove all the plates above it until you work your way down to the plate you want.
Translating this into x86 NASM, you cannot 'PUSH' things onto the stack and 'POP' them off the stack willy nilly. Just like the stack of plates, when something is pushed onto the stack, it is placed on top of everything else. Likewise, a POP instruction will always pop the top most item off the stack. This means that the last item that was placed on the stack, will still be at the top of the stack and easiest to access, and will be the first item popped off the stack whenever something wants to POP something from the stack.
Programs will commonly PUSH a value onto the stack they want to preserve before running off with some calculations, and then later return to POP it back into a register and use it once again.
Something that the plate analogy does not communicate well is that much like an icicle, **the stack grows downwards from 0xffffffff to 0x00000000**, so the 'top' of the stack is really referring to the tip of the stack as it grows downwards. For example, if we had an icicle hanging from a roof 20 meters in the air, as it ‘grows’, it would grow downwards to the ground (0 meters).
Another important note, do not confuse the program that is being executed with the stack, it is its own separate memory structure managed by the computer. In fact, the stack and the program are usually very far away from each other (the sections of a program are covered in a different tutorial). An analog example would be the difference between a student's textbook and their scratch paper used for solving problems, one is a holistically compiled item, the other is a temporary space managed by the student. Yes the student needs both of these things to do their homework properly, but they are two distinctly different things.
How Is The Stack Used
So you now have this weird analogy in your head of a stack of plates, cool, we don’t work in a kitchen so why is this useful? The last section explained *what the stack is*, but that means little to the unfamiliar. In order to understand how it's used, we can consider another analogy.
Think about how a student works on a school project. They may start by reading an article in an internet browser, then half way through the article realize that in order to understand this article, they need to read another article. So they write down a note about what they were just doing in their notebook, and then open up a new tab and start reading the new article. Then they realize they need to find a different article for their research. Once again they write down a note about what they were doing, and open up a new tab. Well, sooner or later the student has 40 tabs open, and a ton of different notes about doing things that turned into other things. This is more or less how a program calling functions, which often call yet more functions, work. Also, these tabs are a LIFO structure. If the student got frustrated and started deleting random tabs and erasing their notes, they would get very lost very quickly.
Every time the student finishes using a tab, they close it, and return to the previous tab they had open. They look down at their notes about what they were doing in that tab, and continue. Eventually, they keep finishing work and closing tabs, returning to the tab that was used before that one. Then, finally, after a long time and many tabs, the student returns to their original tab and their original note. Now the project is complete! This is exactly how the stack is used by programs, every time a function is called (a new tab in our analogy) new information and variables are stored onto the stack, until that function calls yet another function and this happens again.
Important Registers
The stack is neat and all, but it sounds like one giant confusing stack of different values. How is it used by a program, and how is it kept organized? The answer to all of these questions is: *registers*. CPU's use registers to quickly store data physically close to the processor, however these are expensive to make, and therefore there are only a few registers on any given CPU. Specifically, there are 3 registers that are of vital importance because of the values that they hold.
Instruction Pointer
This value is stored in the EIP/RIP register
When a computer is executing a program, it needs to keep track of where it is. Much like pointing your finger at words on a book as you read, there is a register that the computer uses to keep track of what assembly instruction is to be executed next. It is constantly updating a bookmark of precisely where program execution is. To be clear, the instruction pointer almost never points at the stack itself, and instead points somewhere (very far away) within the program itself. However this value is often stored on the stack, and is the target of many beginner exploits. For these reasons and more, it is both a very important value to know and a very important value for the computer when it is executing a program.
Base Pointer
This value is stored in the EBP/RBP register
As things are pushed onto and popped off of the stack, it can become hard (and more importantly inefficient) to keep track of. With the 'physical plate stack' analogy, it is very easy to tell all of the plates apart, as they are physically different items, this is obviously not the case with code. Every time the program starts executing a new function, it needs to store yet more information and variables on the stack. To keep track of all this, a new 'base' is set every time a new function is called. This is identical to how 'downs' in American football. Where every time a new 'down' is scored (achieved by moving the ball forward 10 yards), a new base is set at where the ball was moved to, and the team must once again move it 10 more yards from that new location. This is convenient on the stack because it allows us to talk in 'offsets' instead of absolute value addresses. For example, a fan watching an American football game wouldn't exclaim "they moved it from the 42 line all the way up to the 16 line, wow!", instead they would just say "they moved it 26 yards", meaning they moved it 26 yards from the base pointer. This is very common on the stack, instead of saying a local variable exists at 0x565580a0, we say it exists at EBP-0x16 (note '0x' denotes hex notation).
Stack Pointer
This value is stored in ESP/RSP register
So we now know that we are constantly keeping track of what instructions we're executing, and we are periodically updating our 'base' on the stack to make finding things on the stack easier, but we still don't have a quick way to find the top of the stack. Every time something is pushed onto or popped off of the stack, it is from the top, and because of this the stack grows and shrinks as it's needed throughout program execution. To facilitate pushing things onto and popping things off of the stack, we need something that always points to the top of the stack. This is why we have the 'stack pointer', which always stays pointing at the top of the stack.
A final note on these registers. You should know that there are physical registers within the CPU, and then there are the values they hold. Yes they are distinct separate things, but they are usually referred to as one item. For example, "if someone asks you, where does ESP point", they clearly mean the Stack Pointer that is stored with the ESP register.
Summary: The Stack
The Stack is a large chunk of space that programs use for temporary storage as they run, it is organized as a First In Last Out (LIFO) data structure. Things are PUSHed onto the stack and POPed off of the stack. The ‘top’ of the stack, as it grows downwards towards 0x00000000, is tracked in the ESP register. For further efficiency, a base pointer is always stored in the EBP register in order to allow us to locate items on the stack using offsets as opposed to absolute values.
Practice Exercise:
The purpose of this practice exercise is just to introduce GDB, and the practice exercise format. Unlike many resources that are available to beginners to VR, this is not a Capture The Flag (CTF) challenge. You should not be avoiding looking up answers or obvious help. Look up all the resources you need, and just generally get comfortable with the tools involved in what this practice problem asks. The only goals here are to make sure you understand all of the above text, and to provide a context example where you can implement and inspect the concepts discussed.
Compile this C code yourself using GCC and inspect the ‘stack_example’ binay yourself in GDB. Practice setting a breakpoint, and inspecting the stack, and registers, at that break point. You should research how to use GDB, to get you started. some useful gdb commands for this will be:
Commands Examples
disass [FUNCTION NAME] disass main
start start
info registers i r
x/[INSERT NUMBER]xb $[INSERT REGISTER] x/4xb $ebp
si si
c c
bp bp
Compiler options:
gcc stack_example.c -o stack_example -m32 -fno-stack-protector -O0
Code:
#include <stdio.h>
int main()
{
int y = 20;
int z = 30;
int answer = 0;
answer = y + z;
puts(answer);
return 0;
}