AFL is neat because it's conceptually simple, but practically revolutionized fuzzing, especially for non-specialists. The basic algorithm is
import random

class CrashException(Exception):
    ...

def run_test_case(input):
    # return a set() of instructions that the target executes when run on `input`
    # or raise a CrashException if the target crashes
    ...

def mutate(input):
    # mess with the input - flip some bits, delete chunks, set things to 0xffffffff... randomly
    # return the mutated input
    ...

def fuzz(initial_test_cases):
    test_cases = list(initial_test_cases)
    coverage_seen = set()
    # collect coverage from the initial inputs
    for case in test_cases:
        coverage_seen |= run_test_case(case)
    while True:
        fuzzed = mutate(random.choice(test_cases))
        try:
            new_coverage = run_test_case(fuzzed) - coverage_seen
            if new_coverage:
                # ooh, this input did something we've never seen before!
                # save it, so it can be used as a starting point
                # for even more mutation
                test_cases.append(fuzzed)
                coverage_seen |= new_coverage
        except CrashException:
            # we successfully crashed the target!
            # save `fuzzed` off to disk or something and log a happy message
            print("crash found:", fuzzed)
There's more to it in practice, of course - for example, run_test_case doesn't return a set of instructions; it returns a bitmap / pseudo-Bloom filter of basic blocks hit. (A basic block, to a first approximation, is "a sequence of instructions that doesn't have any unusual control flow" - so if you run the first instruction in a basic block, you'll run all the rest.) And there's a fair bit of complexity involved for performance reasons.
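To make the bitmap idea concrete, here's a minimal sketch in the same pseudo-Python: hash each executed block into one of a fixed number of buckets and accept collisions, which is what makes it Bloom-filter-ish. MAP_SIZE matches AFL's 64 KiB map, but block_addresses_hit is a made-up helper name for illustration, not anything AFL exposes:

MAP_SIZE = 1 << 16  # AFL keeps a 64 KiB shared-memory coverage map

def run_test_case(input):
    # hash the address of every basic block the target executes into one of
    # MAP_SIZE buckets; two blocks can land in the same slot, so this is lossy
    # by design (the "pseudo-Bloom" part)
    return {hash(addr) % MAP_SIZE for addr in block_addresses_hit(input)}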
But the core algorithm is simple enough that you can actually implement it in pure Python, for Python programs, in < 100LOC - and that implementation will find real bugs in a lot of Python libraries.
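For the "pure Python, for Python programs" version, the coverage half really is tiny. A rough sketch using sys.settrace - the line-level granularity and treating any uncaught exception as a crash are my own choices here, not necessarily what any existing implementation does:

import sys

def run_test_case(target, input):
    # record every (filename, line number) pair executed while calling `target`
    coverage = set()

    def tracer(frame, event, arg):
        if event == "line":
            coverage.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        target(input)  # an uncaught exception here is our "crash"
    finally:
        sys.settrace(None)
    return coverage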
Nitpick, but what made AFL much more successful than other fuzzers is that it's not collecting coverage of basic blocks, but a (quantized and bloom-ish filtered) count of basic block transitions from block X to block Y.
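Re-expressed in the same pseudo-Python, roughly following the scheme AFL's technical docs describe (random per-block IDs, an edge index of (prev >> 1) ^ cur, and hit counts squashed into coarse buckets); the function and constant names below are mine, not AFL's:

from collections import Counter

MAP_SIZE = 1 << 16
BUCKETS = [1, 2, 3, 4, 8, 16, 32, 128]  # lower bounds of the count buckets

def edge_coverage(block_ids_hit):
    # block_ids_hit: the sequence of (random, compile-time) block IDs executed
    counts = Counter()
    prev = 0
    for cur in block_ids_hit:
        counts[((prev >> 1) ^ cur) % MAP_SIZE] += 1  # one slot per X->Y edge
        prev = cur

    def bucket(n):
        # "ran this edge 7 times" vs "8 times" still looks different,
        # but 100 vs 101 times does not
        return max(b for b in BUCKETS if n >= b)

    return {(edge, bucket(n)) for edge, n in counts.items()}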
The funniest part is that this ugly hack kept working across platforms for many years, whereas when somebody else implemented a "proper" integration with the clang / llvm API, their solution proved to be extremely fragile. The API wasn't stable between compiler versions, and because it wasn't really used much, it had all kinds of bugs, including being outright unusable at times.
Also, most distros packaged clang in a way that made it impossible to compile the plugin, because of missing or mismatched headers, missing companion tools, etc. So you had to download and rebuild the whole compiler, which took hours (and that's if you didn't get stuck in dependency hell).
So yeah, this was very much a lesson in "worse is better".
Amazing tool, quite extensible (I adapted it to work entirely in memory + tmpfs without touching disk, for a specific corner case, very easily) and readable. It's also fun to scale to multiple cores and multiple machines.
One other benefit for me personally is that finding this little chunk of code really demystified a lot of what was happening. It’s been a few years but I still recall the aha tingles from the whole thing.
I suck at assembly, but this proved to be a nice spot to hack away at it (with copious googling) and further improve my understanding of not only the instrumentation but also the effects that mutations had on program execution.
Neat! I hear it's becoming more common to use non-random mutations. Are people using something other than AFL for those, or is there a way to bolt that on to AFL?
Last I reviewed, the mutations only fell through to "random havoc" after other stages were tried. So for each input, there are deterministic bit flips, arithmetic on numbers, "interesting" values being set, dictionary substitutions, and only _then_ does it start the random havoc stage. So it's not all random, and you can seed it with whatever values you want via a dictionary or otherwise.
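As a sketch of what that staged pipeline looks like in the pseudo-Python from upthread - the stage contents and names below are approximations of what the parent describes, not AFL's exact passes:

import random

INTERESTING_BYTES = [0, 1, 0x7f, 0x80, 0xff]  # a few of the "interesting" values

def deterministic_stage(data, dictionary=()):
    # predictable, exhaustive passes over the input, one small change at a time
    for i in range(len(data) * 8):              # single-bit flips
        out = bytearray(data)
        out[i // 8] ^= 1 << (i % 8)
        yield bytes(out)
    for i in range(len(data)):                  # small arithmetic on bytes
        for delta in (-1, 1, -16, 16):
            out = bytearray(data)
            out[i] = (out[i] + delta) & 0xff
            yield bytes(out)
    for i in range(len(data)):                  # set "interesting" values
        for value in INTERESTING_BYTES:
            out = bytearray(data)
            out[i] = value
            yield bytes(out)
    for token in dictionary:                    # splice in dictionary tokens
        for i in range(max(len(data) - len(token) + 1, 0)):
            out = bytearray(data)
            out[i:i + len(token)] = token
            yield bytes(out)

def havoc_stage(data, rounds=1000):
    # only after all of the above: stacked random changes (assumes non-empty data)
    for _ in range(rounds):
        out = bytearray(data)
        for _ in range(random.randint(1, 8)):
            out[random.randrange(len(out))] = random.randrange(256)
        yield bytes(out)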
One of my design goals for AFL was to make it very simple to use - because there's plenty of fuzzers that work OK when you dial in 50 knobs just right, but fail spectacularly otherwise - basically ensuring that nobody but the author can really use the tool to its full capacity.
While AFL++ is cool, it sort of ditches that philosophy, giving you a lot of options to tweak, but not necessarily a whole lot of hope that you're going to tweak them the right way. So, that's one gotcha to keep in mind.
This shouldn't be understated. I went from having never run a fuzzer to having a 16-node run executing for thousands of machine hours on a somewhat unfamiliar C++ codebase with minimal effort.
Both the tool and the documentation made it easy for me to jump in, identify bugs, write new test cases, implement a fix, and verify the fix passed without issue. I've mentioned it on HN before, but AFL taught me how incredibly difficult it is, even for experts (think most senior engineers at a FAANG) in the field, to write C++ without security vulnerabilities. I was even able to find and fix bugs which were previously reported but no one was able to reproduce reliably.
By the way, a "lop" is a lop-eared rabbit, one where the ears droop instead of being upright. The American fuzzy lop is an rather obese domesticated breed, for what that's worth.
Michal Zalewski is one of those people who, by one method or another, seems to be able to do 100x what most people manage to do with their life. I still go back and read his introduction to CNC machining / building robots every year or two. I still get something new out of it every time.
AFL might be security-oriented, but I think it should be part of the testing strategy for any software that reads input.
Right now at work I'm modifying an old C++ codebase, and knowing that AFL can't crash it is just one of the main things letting me sleep at night. (When I first set it up, it almost immediately found two minor bugs that were obvious in retrospect).
Coincidentally I've actually spent the last couple of weekends putting together a little toolkit to allow fuzzing of cpython bytecode using AFL/AFL++: https://github.com/risicle/cpytraceafl