Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.

– Wikipedia

In this post, I explain how to use libFuzzer with a Swift package in four sections:

  1. “Coverage-guided fuzzers” introduces two popular fuzzers and briefly explains how they work
  2. “libFuzzer basics” is a step-by-step tutorial to using libFuzzer in a Swift package
  3. “Fuzzing in practice” offers advice on how to integrate libFuzzer into your workflow and how to use it effectively
  4. “Caveats” discusses the limitations and downsides of fuzz-testing

Coverage-guided fuzzers

A few months ago I decided to use a fuzzer to test a parser. I had heard of American Fuzzy Lop (AFL), so I decided to try it.

AFL is a coverage-guided, evolutionary fuzzer. It works by feeding seemingly random files to your program. It analyzes the code coverage caused by each input file and tries to reach as many areas of the code as possible by mutating the files using a genetic algorithm.
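
To make the idea a little more concrete, here is a heavily simplified sketch of a coverage-guided fuzzing loop (not AFL’s actual implementation; the runTarget closure is a placeholder for executing the instrumented program and collecting the code edges it covered):

// A minimal sketch of a coverage-guided fuzzing loop. Real fuzzers use many
// mutation strategies driven by a genetic algorithm; here a single byte is
// flipped at random.
func fuzz(seeds: [[UInt8]], iterations: Int, runTarget: ([UInt8]) -> Set<Int>) -> [[UInt8]] {
    var corpus = seeds
    var seenEdges: Set<Int> = []
    for _ in 0 ..< iterations {
        // Pick an input from the corpus and mutate it.
        var input = corpus.randomElement() ?? [0]
        if input.isEmpty { input = [0] }
        input[Int.random(in: 0 ..< input.count)] ^= UInt8.random(in: 1 ... 255)
        // Keep the mutated input only if it covers code we have not seen before.
        let edges = runTarget(input)
        if !edges.isSubset(of: seenEdges) {
            seenEdges.formUnion(edges)
            corpus.append(input)
        }
    }
    return corpus
}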

What’s important to understand is that AFL does not know about the purpose of your program nor the kind of inputs it expects. From AFL’s perspective, the files it generates are meaningless binary blobs that just happen to uncover new behavior from your program. The hope is that after a few hundred thousand iterations, it will have found inputs that can trigger obscure corner cases and help you find bugs. Mike Ash experimented with AFL back in 2015; you can read his blog post if you wish to learn more about how it works and how to use it.

In order to analyze the control flow of the program, AFL has to inject special instrumentation into the binary, which it typically does at compile-time. Unfortunately, at the time I tried it, it was not capable of doing so for Swift, which prevented me from using it.

But fortunately, it turns out that the development snapshot of Swift ships with a built-in fuzzer: LLVM’s libFuzzer. LibFuzzer is very similar to AFL, but instead of using files as input, it is linked to your program and directly calls a function inside it (the “fuzzing entry point”). In practice, that makes it faster and less taxing on your hardware.

I was very quickly able to compile my library to use with libFuzzer, and it started finding bugs in seconds. I now think of it as a magical Find Bugs button.


libFuzzer basics

In this section, I’ll give an overview of how to use libFuzzer with a Swift package. I will write every step in detail and provide a sample program so you can follow along easily.

Create a new folder

mkdir FuzzTest
cd FuzzTest

Swift version

I am using the Swift trunk development snapshot of April 4. I’ve had trouble in the past using libFuzzer with other snapshots, so make sure you download this version if you run into any issues. You can use swiftenv to manage multiple versions of Swift.

$ swiftenv local DEVELOPMENT-SNAPSHOT-2018-04-04-a
$ swift --version

Apple Swift version 4.2-dev (LLVM 0d52728a8a, Clang b9ad4239d1, Swift a1eb1fe24c)
Target: x86_64-apple-darwin17.5.0

or use the full path to the binary:

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-04-04-a.xctoolchain/usr/bin/swift --version

Apple Swift version 4.2-dev (LLVM 0d52728a8a, Clang b9ad4239d1, Swift a1eb1fe24c)
Target: x86_64-apple-darwin17.5.0

Creating the Swift package

$ swift package init --type executable

This creates an executable target called FuzzTest and a file Sources/FuzzTest/main.swift, which we are going to modify.

Creating the function that libFuzzer will test

Swift’s integration with libFuzzer is documented in the swift repository here. The first step is to create an entry point function in main.swift:

@_cdecl("LLVMFuzzerTestOneInput") public func fuzzMe(data: UnsafePointer<CChar>, size: CInt) -> CInt {
    // Test our code using provided data.
}

This function will be called with each input created by libFuzzer. Its _cdecl name needs to be LLVMFuzzerTestOneInput, but its Swift name can be anything.

Because I lack imagination, I am going to test the same program as Mike Ash did in his 2015 blog post, which crashes if the input starts with deadbeef.

@_cdecl("LLVMFuzzerTestOneInput") public func fuzzMe(data: UnsafePointer<CChar>, size: CInt) -> CInt {

    let buffer = UnsafeBufferPointer(start: data, count: Int(size))
    guard buffer.count >= 8 else {
        return 1
    }
    if
        buffer[0] == 0x64, // 'd'
        buffer[1] == 0x65, // 'e'
        buffer[2] == 0x61, // 'a'
        buffer[3] == 0x64, // 'd'
        buffer[4] == 0x62, // 'b'
        buffer[5] == 0x65, // 'e'
        buffer[6] == 0x65, // 'e'
        buffer[7] == 0x66  // 'f'
    {
        fatalError()
    }
    return 0
}

It is written using multiple conditions instead of a single elementsEqual function call to take advantage of the way fuzzers work. Fuzzers struggle to find long magic strings, but they are very good at exploring nested branches.
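
For contrast, the fuzzer-unfriendly version would look something like the sketch below (using the same buffer as above): the whole check is a single branch, so a partial match such as “dead” produces no new coverage to guide the search.

// A single comparison gives the fuzzer only one branch to flip, and therefore
// no intermediate feedback while it searches for the magic string.
let magic: [CChar] = "deadbeef".utf8.map { CChar(bitPattern: $0) }
if buffer.prefix(8).elementsEqual(magic) {
    fatalError()
}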

Compiling

We can now compile this program using swift build and passing two extra arguments to swiftc:

  • -sanitize=fuzzer to link libFuzzer and add the necessary instrumentation to the binary
  • -parse-as-library to avoid inserting the default main entry point and use the fuzzer entry point instead

swift build -c release -Xswiftc -sanitize=fuzzer -Xswiftc -parse-as-library

Compile Swift Module 'FuzzTest' (1 sources)
Linking ./.build/x86_64-apple-macosx10.10/release/FuzzTest

Launching the fuzzer

The binary created by swift build is the fuzzing tool itself. We can now launch it, without any options, to test our program.

./.build/x86_64-apple-macosx10.10/release/FuzzTest

After a few seconds at most, it finds a string starting with deadbeef that triggers a crash.

Fatal error: : file /Users/loic/Projects/FuzzTest/Sources/FuzzTest/main.swift, line 18
==20328== ERROR: libFuzzer: deadly signal
NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 1 ChangeBinInt-; base unit: ff6537e1c2cece46456ecb9f2cdbe4269af33e8d
0x64,0x65,0x61,0x64,0x62,0x65,0x65,0x66,0x5e,0xa0,
deadbeef^\xa0
artifact_prefix='./'; Test unit written to ./crash-6364b0ad2035c3407a97b1b0267cb601145d3b1d
Base64: ZGVhZGJlZWZeoA==

It also created a file crash-... that contains the problematic input. This is kind of amazing, considering that AFL reportedly took a couple of hours to find the same crash in Mike Ash’s blog post. This difference in speed makes it possible to integrate fuzzing into our daily testing routine.


Fuzzing in practice

So it works very well for this toy program that was specifically written to be fuzzer-friendly, but is it really useful for testing more complex libraries? And how can I integrate libFuzzer into my workflow? To answer these questions, I am going to create a new target in my Swift package: a library that parses and evaluates mathematical expressions written in prefix notation, such as:

* + 1 2 / 7 2
fibonacci 7
minimum 5 13 652 -2 maximum 2 1 7 9

Fuzzing corpora

It is recommended to maintain folders of interesting test cases (corpora) and seed libFuzzer with them in order to make the fuzzing process more efficient.

I like to work with at least two folders: InitialFuzzingCorpus and GeneratedFuzzingCorpus. The “initial” corpus is tracked by git, while the “generated” corpus is only used as a temporary folder for libFuzzer to store the inputs it deems interesting.

So to test my calculator library, I first create some files inside InitialFuzzingCorpus (each line is in a different file):

fibonacci 2
+ * - / 3 2 98 23 76
minimum - 1 2 + + 1 2 fibonacci 6 maximum 2 2 3 8
...

And I launch the fuzzer, specifying the names of the two corpora. It will automatically store new interesting inputs in GeneratedFuzzingCorpus:

$ …/FuzzTest GeneratedFuzzingCorpus InitialFuzzingCorpus

Every once in a while, I merge the generated fuzzing corpus into the initial one:

$ …/FuzzTest -merge=1 InitialFuzzingCorpus GeneratedFuzzingCorpus
$ rm GeneratedFuzzingCorpus/*
$ git add InitialFuzzingCorpus
$ git commit -m "Update fuzzing corpus"

In addition to making the fuzzer more efficient, it is very useful to have a corpus full of weird inputs. You can run more resource-intensive tests on them, use them as sample data in applications that use the library, or simply stare and wonder at them.

Dictionaries

I wrote earlier that fuzzers were really bad at finding long magic values. How, then, can I test expressions with words like minimum and fibonacci in them?

There are two solutions to this problem. The first is to include these words in your initial corpus and hope the fuzzer understands that they are keywords. The second, better solution is to create a dictionary. A dictionary is a file that lists keywords used by your program.

Let’s create one in a file called dic.txt:

kw1="minimum"
kw2="maximum"
kw3="fibonacci"

and pass it to the fuzzer with the option -dict=dic.txt.

Testing the library

My fuzzing entry point is a simple function that parses the input and evaluates the result of the expression. In short:

guard let e = Expression(string: inputData), let result = e.evaluate() else {
    return 1
}
return 0
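
For completeness, here is a sketch of how that snippet might be wired into the fuzzing entry point. The Expression API used here (init?(string:) and evaluate() returning an optional) is an assumption based on the snippet above, not necessarily the library’s real interface:

@_cdecl("LLVMFuzzerTestOneInput")
public func fuzzMe(data: UnsafePointer<CChar>, size: CInt) -> CInt {
    // Decode the raw bytes into a Swift String, repairing any invalid UTF-8.
    let bytes = UnsafeRawBufferPointer(start: data, count: Int(size))
    let inputData = String(decoding: bytes, as: UTF8.self)
    // Parse and evaluate; a failure just means the input was uninteresting.
    guard let e = Expression(string: inputData), e.evaluate() != nil else {
        return 1
    }
    return 0
}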

Let’s launch the fuzzing tool and see if it finds anything!

$ …/FuzzTest -dict=dic.txt GeneratedFuzzingCorpus InitialFuzzingCorpus

Wait a few seconds… and… it crashed! The problematic input is:

33333333333333333333333333300133332

Indeed, I store the integers in an Int, which cannot hold integers of arbitrary length. I fix the parser so that it refuses to read literals that are too long. I try again… and… ugh, it takes a long time.

Is there anything we can do to make it faster?

Diagnosing and improving the fuzzer

To get an idea of how efficient the fuzzer is, we can look at a few different things:

  • How much time does it take to parse one input?
  • What is the size of the generated inputs?
  • What do the generated inputs look like? Are they really “interesting”?
  • How does the code coverage evolve over time?

To answer these questions, let’s read what the fuzzer prints to the terminal:

…/FuzzTest -dict=dic.txt GeneratedFuzzingCorpus InitialFuzzingCorpus

…
…
Loading corpus dir: GeneratedFuzzingCorpus
Loading corpus dir: InitialFuzzingCorpus
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
…
…
#4421	NEW    cov: 339 ft: 1493 corp: 218/13381b exec/s: 2210 rss: 27Mb L: 266/4096 MS: 2 CrossOver-ManualDict- DE: "minimum"-

The line starting with INFO is interesting: we now know that libFuzzer will not generate inputs larger than 4096 bytes. We can limit that maximum size ourselves using the option -max_len.

The line starting with #4421 contains an event code and some statistics:

  • cov: 339 ft: 1493: this gives information about code coverage. cov is the number of code blocks or edges covered by the current session; ft is a more fine-grained measure of code coverage. We want to see these numbers increase over time.
  • exec/s: 2210: the number of fuzzer iterations per second. Keep in mind that it is often only after millions of iterations that the fuzzer uncovers interesting test cases, so this number should be as high as possible.

Maximum input size

Let’s try to limit the size of the generated inputs to 128 bytes by using -max_len=128 and see how it affects the execution speed:

…/FuzzTest -max_len=128 -dict=dic.txt GeneratedFuzzingCorpus InitialFuzzingCorpus

…
…
#22023	REDUCE cov: 402 ft: 1823 corp: 289/8471b exec/s: 22023 rss: 27Mb L: 95/128 MS: 1 EraseBytes-

We can now perform about 20,000 iterations per second, about ten times as many as before. That’s a big improvement!

Parallel fuzzing

We have greatly increased the iteration speed, but we can increase it even further by running multiple fuzzing jobs in parallel and making them operate on the same corpora. The main process will stop when all the jobs have finished. To run N jobs, use the option -jobs=N. If you have a computer with P cores, the jobs will be distributed among up to P / 2 worker processes by default. You can override the number of worker processes to Q with the option -workers=Q.
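
For example, the following command (with the binary path abbreviated as before) runs eight jobs on our two corpora, letting libFuzzer pick the default number of worker processes and redirecting the output of each job to its own log file:

$ …/FuzzTest -jobs=8 GeneratedFuzzingCorpus InitialFuzzingCorpus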

ASCII only

Reading the files in GeneratedFuzzingCorpus, I notice that many of them contain weird Unicode characters, and some are not even valid UTF-8. While it’s a good idea to feed our library complex and invalid strings, we also know that our syntax only uses ASCII characters and that we handle Unicode correctly by using Swift’s String.

So after fuzz-testing our library with arbitrary strings, it might be a good idea to instruct libFuzzer to only generate ASCII strings, as it will increase the signal-to-noise ratio of the generated inputs. We can do so with the option -only_ascii=1.

Value profile

Using the option -use_value_profile=1, we can opt into more fine-grained code coverage, which might allow the fuzzer to discover more interesting inputs. However, it will also slow down the iteration speed and increase the size of the generated corpus.

Sanitizers

It is possible to use libFuzzer together with AddressSanitizer (ASAN). It will slow down the iteration speed but might catch more bugs. It is especially important to use ASAN if you are manipulating memory manually. You can enable it by compiling with -sanitize=fuzzer,address instead of -sanitize=fuzzer.
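
Concretely, the build command from earlier becomes:

swift build -c release -Xswiftc -sanitize=fuzzer,address -Xswiftc -parse-as-library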

Testing the library again

Let’s put all those options to use! I am going to fuzz-test my calculator with:

  • a dictionary of keywords
  • a maximum input length of 128
  • only ASCII inputs
  • value profile enabled
  • 4 jobs executed by 4 worker processes

And I will ask these jobs to write their crashing inputs inside the folder artifacts with -artifact_prefix=artifacts/.

…/FuzzTest \
-dict=dic.txt \
-max_len=128 \
-only_ascii=1 \
-use_value_profile=1 \
-jobs=4 -workers=4 \
-artifact_prefix=artifacts/ \
GeneratedFuzzingCorpus InitialFuzzingCorpus

After a few seconds, it finds some crashes; here’s one of them:

/ minimum -1

Here it crashes because I try to create a Range from 0 ..< -1. I fix it and move on to the next crash:

/ 6 minimum 2 * maximum 5 minimum 2 0 -421 minimum 2 -8 87 maximum 2 05 minimum 2 0 -421908 74 87491855814 8749192700000 7758897

Hmm… I wonder what’s wrong with this one?

Minimizing crashes

Imagine the fuzzer found a crash and the input is a large, convoluted mess. You try to debug it but it’s just too complex. What do you do?

You ask libFuzzer to minimize it for you!

It will try to reduce the size of the input as much as possible while still causing a crash, and then store the minimized input in a file.

To minimize a file, use the option -minimize_crash=1. Provide the folder in which you want the minimized inputs to be written with -artifact_prefix=<folder>/ and set a time limit with -max_total_time=<time in seconds>. Finally, pass the file that you want to minimize as an argument.

…/FuzzTest \
-minimize_crash=1 \
-artifact_prefix="minimized_crash/" \
-max_total_time=30 \
crash-to-minimize.txt

After max_total_time, the fuzzer will be disappointed with itself, saying it “failed to minimize beyond” a certain file.

INFO: Done MinimizeCrashInputInternalStep, no crashes found
CRASH_MIN: failed to minimize beyond minimized_crash/minimized-from-e783d1b67b38bb0b350ee96662cd475406c01f72 (23 bytes), exiting

But it usually does a pretty good job at reducing the input. Let’s look at minimized_crash/minimized-from-e783d1b67b38bb0b350ee96662cd475406c01f72:

* 77579750 577587555555

Now it’s clear that I forgot to implement overflow checking for multiplication!
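
One way to fix this in Swift (a sketch, not necessarily how the library actually handles it) is to use the standard library’s overflow-reporting arithmetic and treat overflow as an evaluation failure:

// Multiply two operands, returning nil instead of trapping on overflow.
// For the crashing input above, multiply(77579750, 577587555555) == nil.
func multiply(_ lhs: Int, _ rhs: Int) -> Int? {
    let (value, didOverflow) = lhs.multipliedReportingOverflow(by: rhs)
    return didOverflow ? nil : value
}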

Conclusion

In this section, I first proposed one particular way of managing a corpus of test cases to use with libFuzzer. While this method works for me, it might not be appropriate for your project, and you are free to manage multiple corpora in a way that fits your use case. I then showed a few ways to tweak the behavior of libFuzzer to make it more efficient. And finally, I showed how to minimize a complex test case to make it easier to debug your program.

The content of this section was mostly taken from the libFuzzer website, which I encourage you to read if you want to dive deeper into the topic.


Caveats

Coverage-guided, evolutionary fuzz-testing is a very powerful tool that is unfortunately difficult to use for many kinds of libraries. When considering whether to fuzz a library, here are some questions you should ask:

  • Is the behavior of the library deterministic and reproducible for a given input? Randomness is simply not allowed.

  • Can relatively small inputs (~1KB) trigger most possible code paths in the library? The answer should be yes. The smaller the inputs, the more efficient fuzzing will be.

  • Can the library process inputs quickly (<10ms per input)? The answer should be yes: the faster each input is processed, the more iterations the fuzzer can perform.

  • Does the library expect highly structured data (e.g. JSON fitting a specific schema)? The answer should be no. The binary mutations used by libFuzzer rarely produce clean, structured data.

    It is possible to work around this issue in a couple of ways. The first is to use an intermediate compact binary representation for the inputs. The second, better one is to override the binary mutations used by libFuzzer with custom ones. These techniques are meant to dramatically increase the proportion of well-formed inputs. I intend to write about this topic soon(ish).

Even if your problem lends itself perfectly to fuzz-testing, keep in mind that it can be a slow process, requiring hours and hours of computation. It is best to keep a dedicated machine for fuzzing purposes and let it run continuously.


The end

I hope that this post has inspired you to experiment with libFuzzer. Traditionally, fuzz-testing has been used for parsers and applications with high security risks. But I believe that it can be used more broadly and benefit many developers. There are many hurdles to overcome before this becomes a reality. First, we need to come up with clear guidelines on how to set up and maintain fuzz-tests. But more importantly, we need to show that it is a practical solution for many different kinds of libraries. This will be the topic of my next post 🤓