A quick tour of LLVM’s Sanitizer coverage

Posted on October 4, 2017

After reading about the new coverage features in Hypothesis, I’ve become interested in how guided fuzzing (as implemented by American Fuzzy Lop or LLVM’s libFuzzer) works internally with Rust and LLVM. The first step is to understand how coverage works.

Clang’s Sanitizer Coverage documentation explains the functionality very well, so I won’t repeat too much of it here.

First of all, I started off by looking at the Rust Fuzz project’s set of targets. The run-fuzzer.sh driver script tells cargo to pass several extra flags to the compiler. The flag -C passes=sancov instructs the compiler to also run the sancov compiler pass, which annotates the generated code with calls into the coverage runtime, and -C llvm-args=-sanitizer-coverage-level=3 instructs LLVM to record edge coverage, so that we can tell which paths through the code were executed (e.g. differentiating between the branches of an if/else expression). The additional -Z sanitizer=address tells the compiler to link in the sanitizer support runtime, which includes the routines to record and save coverage.
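To make “edge coverage” a little more concrete, here is a hand-written approximation of what the instrumentation ends up recording. This is only a sketch: the `Coverage` type, its `hit` method and the small integer IDs are invented for illustration, whereas the real pass inserts its calls automatically and works with addresses.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the coverage runtime: it just remembers
/// which instrumentation points have been reached.
#[derive(Default)]
struct Coverage {
    seen: BTreeSet<u32>,
}

impl Coverage {
    fn hit(&mut self, id: u32) {
        self.seen.insert(id);
    }
}

/// A function instrumented by hand, with one coverage point per branch.
fn classify(n: i32, cov: &mut Coverage) -> &'static str {
    cov.hit(0); // function entry
    if n < 0 {
        cov.hit(1); // edge into the "then" branch
        "negative"
    } else {
        cov.hit(2); // edge into the "else" branch
        "non-negative"
    }
}

fn main() {
    let mut cov = Coverage::default();
    classify(5, &mut cov);
    println!("after classify(5):  {:?}", cov.seen);
    classify(-5, &mut cov);
    println!("after classify(-5): {:?}", cov.seen);
}
```

After the first call only points 0 and 2 have been seen; the second call adds point 1. That difference is exactly the kind of signal a guided fuzzer can use to decide that an input reached new code.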

We’ll start with a trivial program in main.rs:

<pre>
#[inline(never)]
fn show(a: String) {
    println!("{}", a);
}

fn main() {
    use std::env::args;
    for a in args() {
        show(a)
    }
}
</pre>

If we compile this with RUSTFLAGS='-C passes=sancov -C llvm-args=-sanitizer-coverage-level=3 -Z sanitizer=address' cargo run and then look at the resulting disassembled code, using objdump -CS target/debug/covtest [1], we see that the compiler has inserted chunks of code such as:

<pre>
10465:       48 8d 05 24 86 34 00    lea    0x348624(%rip),%rax        # 358a90 <completed.7561+0x10>
1046c:       48 05 d4 03 00 00       add    $0x3d4,%rax
10472:       48 89 c7                mov    %rax,%rdi
10475:       e8 56 68 0e 00          callq  f6cd0 <__sanitizer_cov>
</pre>

Granted, I’m not great at reading assembly, but this appears to look up the current program counter [2], massage it a little to create a guard address, and pass that as the first argument to the __sanitizer_cov function.

__sanitizer_cov then looks up the caller’s current program counter and passes it into CoverageData::Add, which uses the guard to check whether that point has already been recorded. If not, it records the program counter for later storage.
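The “record each point only once” logic can be sketched in a few lines of Rust. To be clear, this is a simplified model and not the runtime’s actual code: the struct layout, the use of an `AtomicBool` as the guard, and the addresses in `main` are all assumptions made for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical model of the coverage store; not the runtime's real layout.
struct CoverageData {
    pcs: Mutex<Vec<usize>>,
}

impl CoverageData {
    fn new() -> Self {
        CoverageData { pcs: Mutex::new(Vec::new()) }
    }

    /// Record `pc` the first time `guard` is seen.
    /// Returns true if this call recorded a new point.
    fn add(&self, pc: usize, guard: &AtomicBool) -> bool {
        // `swap` returns the previous value, so only the first caller
        // for a given guard observes `false` and goes on to record.
        if guard.swap(true, Ordering::Relaxed) {
            return false;
        }
        self.pcs.lock().unwrap().push(pc);
        true
    }

    /// Snapshot of the program counters recorded so far.
    fn recorded(&self) -> Vec<usize> {
        self.pcs.lock().unwrap().clone()
    }
}

fn main() {
    let data = CoverageData::new();
    // One guard per instrumentation point; the addresses are made up.
    let guard_a = AtomicBool::new(false);
    let guard_b = AtomicBool::new(false);

    for _ in 0..3 {
        data.add(0x10465, &guard_a); // hit three times, stored once
    }
    data.add(0x104a0, &guard_b);

    println!("{:x?}", data.recorded());
}
```

The point of the guard is that the common case, a location that has already been seen, costs a single cheap check rather than a lookup in the recorded data.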

This all gets set up by the global constructors, the same mechanism used to call constructors for static objects in C++. The compiler synthesizes a function named sancov.module_ctor that calls __sanitizer_cov_module_init, which allocates space and sets up the coverage data structures. The sanitizer runtime also ensures that, if needed, __sanitizer_cov_dump is called when the process exits, so that the coverage information gets saved to disk and can be analyzed later.
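The overall lifecycle (set up at startup, record while running, dump at exit) can be modelled as follows. This is a rough sketch under my own assumptions: `CoverageRuntime` and its methods are hypothetical names, the init and dump steps are called by hand from `main` rather than by global constructors or an exit hook, and the one-address-per-line text output is invented rather than the format the runtime really writes.

```rust
use std::io::{self, Write};

/// Hypothetical model of the runtime's state.
struct CoverageRuntime {
    guards: Vec<bool>, // one flag per instrumentation point
    pcs: Vec<usize>,   // program counters recorded so far
}

impl CoverageRuntime {
    /// Plays the role of the module-init call: allocate space up front.
    fn module_init(num_guards: usize) -> Self {
        CoverageRuntime {
            guards: vec![false; num_guards],
            pcs: Vec::with_capacity(num_guards),
        }
    }

    /// Record `pc` the first time the guard at index `guard` is hit.
    fn record(&mut self, guard: usize, pc: usize) {
        if !self.guards[guard] {
            self.guards[guard] = true;
            self.pcs.push(pc);
        }
    }

    /// Plays the role of the dump-at-exit call: write out what was seen.
    fn dump<W: Write>(&self, mut out: W) -> io::Result<()> {
        for pc in &self.pcs {
            writeln!(out, "{:#x}", pc)?;
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    // "Constructor" phase, before the program proper starts.
    let mut rt = CoverageRuntime::module_init(2);

    // The program runs and hits instrumentation points (made-up addresses).
    rt.record(0, 0x10465);
    rt.record(0, 0x10465);
    rt.record(1, 0x104a0);

    // "Exit" phase: persist the results (stdout stands in for a file).
    rt.dump(io::stdout())
}
```

In the real thing the first and last steps happen without the program’s cooperation, which is why an instrumented binary needs no source changes.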

So code coverage is one of those things that can seem somewhat magical, mostly because modern compilers can seem awfully complex (and in fairness, they do an awful lot), but the nuts and bolts of it aren’t that complicated in themselves.

LLVM also has the very cool feature that you can provide your own implementation of the coverage interface, allowing customized, very detailed tracing of your program if you want to do fancier things like analyzing its exact control flow. But that’s an exercise for another day.


  1. This assumes the GNU binutils suite, commonly used on Linux. Other systems will likely have similar tools.
  2. I.e. the instruction that was running at the time.