Lab 4: GDB

Fall 2024

due Thursday, October 31 at 6 p.m.

This lab will introduce you to memory layout and GDB concepts that are critical for performing buffer-overflow attacks in project 4. You will not be actively attacking code in this lab, but instead dissecting an innocent C file to understand how it appears in memory when compiled and during execution.

Setup

Memory layout and buffer-overflow exploitation depend on details of the target system. For this lab and the corresponding project, you must create your solutions inside the Project 4 VM, as it has been configured to standardize the stack layout and disable certain security features that would complicate your work.

Follow the setup instructions on the Project 4 VM page. You only need to do this once; if you follow the guide for this lab, you can use the same VM and workflow for the project.
Check out your starter code from the GitHub template inside the VM. You must do this in a folder in the native Linux filesystem. It won’t work correctly if you use a shared folder located in the host OS.

You do not need to worry about running ./build.sh or ./test.sh for the lab.

Resources

GDB

You will use the GDB debugger, which you should recall from EECS 280, for dynamic analysis within the VM. Useful commands that you may not know are disassemble, info reg, x, and stepi. See the GDB help for details, and don’t be afraid to experiment. The GDB reference sheet may also be useful.

x86_64 assembly

There are many good references for Intel’s assembly language, but note that our project targets use the 64-bit x86_64 ISA (sometimes abbreviated to x64), not the older 32-bit x86 ISA. The stack is organized differently in x86_64 and x86. If you are reading any online documentation, ensure that it is based on the x86_64 architecture, not x86.

Also note that there are two different syntaxes for this assembly language, known as Intel syntax and AT&T syntax. They’re just two ways of expressing the same code. In this class we always use Intel syntax, but keep in mind that online resources might use AT&T syntax. You can tell which is which because AT&T syntax prefixes register names with percent signs (%) and Intel syntax doesn’t.

Big versus Little Endian

The final task in this lab will involve endianness. Refer to this guide if you are unfamiliar with endianness or need a refresher. An image search will also turn up helpful diagrams of this concept.

Tasks

You will write all answers in submit.toml. The Even Better TOML VS Code extension is recommended here. Make sure that it’s installed in the VM (not just locally).

Part 1 - Examine assembly code

Open a terminal within VS Code and start GDB on the lab4 compiled binary ($ gdb lab4). Using GDB’s disas (shorthand for disassemble) command, answer the following questions regarding the assembly code:

Task 1 Where in memory is the line of assembly code that makes a call to foo?

Task 2 Where in memory does the function foo begin?

Task 3 This code uses the printf function from libc. Where in memory does this function begin?

Takeaway: Be mindful of the difference between the address of the line that calls a function versus the address of the function itself.
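
The distinction can be made concrete with a little arithmetic. On x86_64, a direct near call is commonly encoded as the opcode 0xe8 followed by a 4-byte little-endian signed offset, measured from the address of the instruction after the call. The Python sketch below decodes such an instruction. Python is not part of this lab, decode_call is a hypothetical helper name, and the address and bytes are made up for illustration rather than taken from the lab4 binary.

```python
import struct

def decode_call(call_addr: int, insn: bytes) -> int:
    """Return the target address of a 5-byte `call rel32` instruction."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32 instruction")
    # The offset is a signed 32-bit little-endian integer.
    (rel32,) = struct.unpack("<i", insn[1:])
    # It is relative to the address of the instruction after the call.
    return call_addr + len(insn) + rel32

# Hypothetical values, not from the lab4 binary.
call_site = 0x401136                    # where the call instruction lives
insn = bytes.fromhex("e8c5ffffff")      # the call instruction's bytes
target = decode_call(call_site, insn)   # where the called function begins

print(hex(call_site), "->", hex(target))
```

The two printed addresses differ: the first is the kind of answer Task 1 asks for (the location of the call instruction), and the second is the kind of answer Tasks 2 and 3 ask for (the start of the called function).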

Part 2 - Peer into stack during execution

Project 4 will require extensive use of GDB’s x command to look at stack contents at various stages of execution. First, you will need to set a breakpoint at your desired location, then run the binary. Assuming you have opened GDB with the compiled lab4 binary ($ gdb lab4), use the following commands:

(gdb) break [address/function]
(gdb) run

The break parameter can be either an explicit address (such as one of your submissions in the previous task) in the form *0x################ or the name of a function.

Refer to the GDB reference sheet as well as the lab slides for various ways to print the stack using x.

Task 4 Set a breakpoint at foo. Run the code. At the start of the foo function, what is the hexadecimal value of the byte stored at $rbp + 8?

Continue to the end of execution:

(gdb) c

Remove your previously set breakpoint:

(gdb) delete

Task 5 Disassemble foo. Look for the line that calls printf and put a breakpoint at that instruction. The breakpoint should be in foo, not in printf. Run the code, then examine memory using the x command at $rsp + n where n is replaced with an integer of your choice (start with 0 if you’re unsure). Locate where the word “THIS” is stored in memory. Change the value of n so that the memory dump begins exactly at the capital “T”. What value of n makes this happen?

Takeaway: The x command is extremely versatile. Make use of its features when constructing and debugging an attack.
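
The search for n in Task 5 amounts to finding a byte offset from $rsp. As a rough illustration only, the Python sketch below treats a made-up 32-byte dump as the memory starting at $rsp and locates a marker string in it. The dump contents, the marker, and the value of rsp are all hypothetical and do not come from the lab4 binary.

```python
# Hypothetical bytes starting at $rsp (not taken from the lab4 binary).
rsp = 0x7FFFFFFFE000
dump = bytes(11) + b"HELLO WORLD\x00" + bytes(9)

# Offset of the first byte of the marker, counted from $rsp.
n = dump.find(b"WORLD")

# Starting a dump at $rsp + n means starting at this address.
start_addr = rsp + n
view = dump[n:n + 5]

print(n, hex(start_addr), view)
```

In GDB you would find the offset by reading the dump yourself rather than by searching programmatically, but the arithmetic is the same: the address of the first character minus the value of $rsp.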

Part 3 - Endianness

Keeping track of big versus little endian can get confusing when examining the stack. The final portion of this lab will familiarize you with how different GDB x commands display their output with respect to endianness. Note that x86_64 uses little-endian byte ordering.

Continue to the end of previous execution and remove breakpoints again.

The function foo multiplies the variable num by 0x16b28f9. Disassemble foo and see if you can find the instruction that performs this multiplication. The next instruction moves the result of this calculation back into memory. Set a breakpoint right after it is moved into memory. Now run the code, then examine the stack in the following two ways:

(gdb) x/1gx $rsp
(gdb) x/8bx $rsp

In both cases, we are viewing the same bytes, but you’ll notice that they appear in a different order. Printing the bytes in giant word format groups every 8 bytes together, interprets them as a little-endian integer, and displays that integer the way we’re used to reading hexadecimal numbers, with the most significant digit first (as in big-endian). Printing the individual bytes displays them in the order they are actually laid out in memory (little-endian in our case for x86_64).
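
To see the two views side by side outside of GDB, here is a minimal Python sketch. Python is not part of this lab, and the value below is arbitrary (it is not num). The sketch packs a 64-bit integer in the byte order x86_64 uses and then prints it in styles resembling the two x commands above.

```python
import struct

# Arbitrary example value (hypothetical; not the lab's answer).
value = 0x1122334455667788

# "<Q" = little-endian unsigned 64-bit: the byte order x86_64 uses in memory.
raw = struct.pack("<Q", value)

# Like x/8bx: one byte at a time, lowest address first.
byte_view = " ".join(f"0x{b:02x}" for b in raw)

# Like x/1gx: the same 8 bytes read back as one little-endian integer.
giant_view = f"0x{struct.unpack('<Q', raw)[0]:016x}"

print(byte_view)   # 0x88 0x77 0x66 0x55 0x44 0x33 0x22 0x11
print(giant_view)  # 0x1122334455667788
```

The least significant byte (0x88) sits at the lowest address, so it comes first in the byte view and last in the giant word view.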

Examine the disassembly of foo to figure out where in memory it keeps the value of num. Try reading the memory there instead of at $rsp.

Task 6 What is the value of num?

Task 7 What are the values of the raw bytes that represent num (in order)?

Takeaway: While giant word format is more concise, when in doubt, print individual bytes, as that format shows what the stack space actually looks like.

Submission

Submit the following file to the Autograder by the deadline:

submit.toml