AFL is neat because it's conceptually simple, but practically revolutionized fuzzing, especially for non-specialists. The basic algorithm is
import random

class CrashException(Exception):
    ...

def run_test_case(input):
    # return a set() of instructions that the target executes when run on `input`
    # or raise a CrashException if the target crashes
    ...

def mutate(input):
    # mess with the input - flip some bits, delete chunks, set things to 0xffffffff... randomly
    # return the mutated input
    ...

def fuzz(initial_test_cases):
    test_cases = list(initial_test_cases)
    coverage_seen = set()
    # collect coverage from the initial inputs
    for case in test_cases:
        coverage_seen |= run_test_case(case)
    while True:
        fuzzed = mutate(random.choice(test_cases))
        try:
            new_coverage = run_test_case(fuzzed) - coverage_seen
            if new_coverage:
                # ooh, this input did something we've never seen before!
                # save it, so it can be used as a starting point
                # for even more mutation
                test_cases.append(fuzzed)
                coverage_seen |= new_coverage
        except CrashException:
            # we successfully crashed the target!
            # save `fuzzed` off to disk or something and log a happy message
            print("crash found:", fuzzed)
There's more to it in practice, of course - for example, run_test_case doesn't return a set of instructions; it returns a bitmap / pseudo-Bloom filter of basic blocks hit. (A basic block, to a first approximation, is "a sequence of instructions that doesn't have any unusual control flow" - so if you run the first instruction in a basic block, you'll run all the rest.) And there's a fair bit of complexity involved for performance reasons.
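To make the bitmap idea concrete, here's a minimal sketch in the same pseudo-Python: hash each executed block into one of a fixed number of buckets and accept collisions, which is what makes it Bloom-filter-ish. MAP_SIZE matches AFL's 64 KiB map, but block_addresses_hit is a made-up helper name for illustration, not anything AFL exposes:

MAP_SIZE = 1 << 16  # AFL keeps a 64 KiB shared-memory coverage map

def run_test_case(input):
    # hash the address of every basic block the target executes into one of
    # MAP_SIZE buckets; two blocks can land in the same slot, so this is lossy
    # by design (the "pseudo-Bloom" part)
    return {hash(addr) % MAP_SIZE for addr in block_addresses_hit(input)}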
But the core algorithm is simple enough that you can actually implement it in pure Python, for Python programs, in < 100LOC - and that implementation will find real bugs in a lot of Python libraries.
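For the "pure Python, for Python programs" version, the coverage half really is tiny. A rough sketch using sys.settrace - the line-level granularity and treating any uncaught exception as a crash are my own choices here, not necessarily what any existing implementation does:

import sys

def run_test_case(target, input):
    # record every (filename, line number) pair executed while calling `target`
    coverage = set()

    def tracer(frame, event, arg):
        if event == "line":
            coverage.add((frame.f_code.co_filename, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        target(input)  # an uncaught exception here is our "crash"
    finally:
        sys.settrace(None)
    return coverage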
Nitpick, but what made AFL much more successful than other fuzzers is that it's not collecting coverage of basic blocks, but a (quantized and bloom-ish filtered) count of basic block transitions from block X to block Y.
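Re-expressed in the same pseudo-Python, roughly following the scheme AFL's technical docs describe (random per-block IDs, an edge index of (prev >> 1) ^ cur, and hit counts squashed into coarse buckets); the function and constant names below are mine, not AFL's:

from collections import Counter

MAP_SIZE = 1 << 16
BUCKETS = [1, 2, 3, 4, 8, 16, 32, 128]  # lower bounds of the count buckets

def edge_coverage(block_ids_hit):
    # block_ids_hit: the sequence of (random, compile-time) block IDs executed
    counts = Counter()
    prev = 0
    for cur in block_ids_hit:
        counts[((prev >> 1) ^ cur) % MAP_SIZE] += 1  # one slot per X->Y edge
        prev = cur

    def bucket(n):
        # "ran this edge 7 times" vs "8 times" still looks different,
        # but 100 vs 101 times does not
        return max(b for b in BUCKETS if n >= b)

    return {(edge, bucket(n)) for edge, n in counts.items()}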
The funniest part is that this ugly hack kept working across platforms for many years, whereas when somebody else implemented a "proper" integration with the clang / llvm API, their solution proved to be extremely fragile. The API wasn't stable between compiler versions, and because it wasn't really used much, it had all kinds of bugs, including being outright unusable at times.
Also, most distros packaged clang in a way that made it impossible to compile the plugin, because of missing or mismatched headers, missing companion tools, etc. So you had to download and rebuild the whole compiler, which took hours (and that's if you didn't get stuck in dependency hell).
So yeah, this was very much a lesson in "worse is better".
Amazing tool, quite extensible (I adapted it to work entirely in memory + tmpfs without touching disk, for a specific corner case, very easily) and readable. It's also fun to scale to multiple cores and multiple machines.
One other benefit for me personally is that finding this little chunk of code really demystified a lot of what was happening. It’s been a few years but I still recall the aha tingles from the whole thing.
I suck at assembly, but this proved to be a nice spot to hack away at it (with copious googling) and further improve my understanding of not only the instrumentation but also the effects that mutations had on program execution.
Neat! I hear it's becoming more common to use non-random mutations. Are people using something other than AFL for those, or is there a way to bolt that on to AFL?
Last I reviewed, the mutations only fell through to "random havoc" after other stages were tried. So for each input, there are deterministic bit flips, arithmetic on numbers, "interesting" values being set, dictionary substitutions, and only _then_ does it start the random havoc stage. So it's not all random, and you can seed it with whatever values you want via a dictionary or otherwise.
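As a sketch of what that staged pipeline looks like in the pseudo-Python from upthread - the stage contents and names below are approximations of what the parent describes, not AFL's exact passes:

import random

INTERESTING_BYTES = [0, 1, 0x7f, 0x80, 0xff]  # a few of the "interesting" values

def deterministic_stage(data, dictionary=()):
    # predictable, exhaustive passes over the input, one small change at a time
    for i in range(len(data) * 8):              # single-bit flips
        out = bytearray(data)
        out[i // 8] ^= 1 << (i % 8)
        yield bytes(out)
    for i in range(len(data)):                  # small arithmetic on bytes
        for delta in (-1, 1, -16, 16):
            out = bytearray(data)
            out[i] = (out[i] + delta) & 0xff
            yield bytes(out)
    for i in range(len(data)):                  # set "interesting" values
        for value in INTERESTING_BYTES:
            out = bytearray(data)
            out[i] = value
            yield bytes(out)
    for token in dictionary:                    # splice in dictionary tokens
        for i in range(max(len(data) - len(token) + 1, 0)):
            out = bytearray(data)
            out[i:i + len(token)] = token
            yield bytes(out)

def havoc_stage(data, rounds=1000):
    # only after all of the above: stacked random changes (assumes non-empty data)
    for _ in range(rounds):
        out = bytearray(data)
        for _ in range(random.randint(1, 8)):
            out[random.randrange(len(out))] = random.randrange(256)
        yield bytes(out)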
One of my design goals for AFL was to make it very simple to use - because there's plenty of fuzzers that work OK when you dial in 50 knobs just right, but fail spectacularly otherwise - basically ensuring that nobody but the author can really use the tool to its full capacity.
While AFL++ is cool, it sort of ditches that philosophy, giving you a lot of options to tweak, but not necessarily a whole lot of hope that you're going to tweak them the right way. So, that's one gotcha to keep in mind.
This shouldn't be understated. I went from having never run a fuzzer to having a 16-node run executing for thousands of machine hours on a somewhat unfamiliar C++ codebase with minimal effort.
Both the tool and the documentation made it easy for me to jump in, identify bugs, write new test cases, implement a fix, and verify the fix passed without issue. I've mentioned it on HN before, but AFL taught me how incredibly difficult it is, even for experts (think most senior engineers at a FAANG) in the field, to write C++ without security vulnerabilities. I was even able to find and fix bugs which were previously reported but no one was able to reproduce reliably.
By the way, a "lop" is a lop-eared rabbit, one where the ears droop instead of being upright. The American fuzzy lop is an rather obese domesticated breed, for what that's worth.
Michal Zalewski is one of those people who, by one method or another, seems to be able to do 100x what most people manage to do with their life. I still go back and read his introduction to CNC machining / building robots every year or two. I still get something new out of it every time.
AFL might be security-oriented, but I think it should be part of the testing strategy for any software that reads input.
Right now at work I'm modifying an old C++ codebase, and knowing that AFL can't crash it is just one of the main things letting me sleep at night. (When I first set it up, it almost immediately found two minor bugs that were obvious in retrospect).
Coincidentally I've actually spent the last couple of weekends putting together a little toolkit to allow fuzzing of cpython bytecode using AFL/AFL++: https://github.com/risicle/cpytraceafl