Paper Reading Questions

For each paper, you should submit two text files via the Gradescope submission web site, as described below. The submission is due before lecture. (The Gradescope entry code for 6.5660 is 3JZ23X.)

Lecture 11

Each time EXE adds a branch constraint it queries STP to check that there exists at least one solution for the current path's constraints. What would go wrong if EXE did not use STP, and instead tried all branches? What would go wrong if EXE randomly selected a branch to check?

Questions and answers for lecture 11 from past years

The ideas are pretty widely used, so I imagine the authors are pretty
happy with that.  Microsoft uses symbolic execution in their internal
tools to find bugs in Windows, and reportedly it's really effective.
There are open-source tools based on symbolic execution, like
https://s2e.systems/, https://angr.io/, and https://klee.github.io/,
among others.

Nickolai.

On Mon, Mar 14, 2022 at 5:17 PM Anonymous wrote:
>
> It sounds like it took a ton of work to devise and built EXE and STP. Are they sad that it is not actually being used anywhere in industry? :(
Symbolic execution is reasonably widely used, e.g., https://angr.io/
and https://s2e.systems/ on the open-source side and Microsoft uses
some form of symbolic execution internally for finding bugs in Windows
code.

Nickolai.

On Mon, Mar 14, 2022 at 12:08 PM Anonymous wrote:
>
> Has EXE been widely adopted or included in other dynamic/static code checkers?
> If not, does this mean modern open source software might not have gone through EXE checking and may have exploitable bugs that can be found with EXE?
> Or perhaps leaked proprietary source code might be exploitable by running EXE to find bugs?
EXE asks the SAT solver if it's possible to come up with a set of
assignments to symbolic values that would satisfy all of the
constraints.  If the SAT solver says yes, then it provides such an
assignment (any assignment that satisfies the constraints), which is
what EXE uses.

Nickolai.

On Mon, Mar 14, 2022 at 3:50 PM Anonymous wrote:
>
> How does EXE decide which concrete value to use? In the example on page 3, it selects i=2 to be a test case and i != 2 to be another test case, because i=2 is currently legal  but would cause an error. but how does it handle if multiple values of i would have also caused this same error? How does it decide what value to actually split on.
Focusing on specific parts of the kernel is probably a good plan
regardless of the exact approach used.  The paper refers to some
concurrent work by more-or-less the same authors in using EXE to find
bugs in Linux file system code, for instance.

Nickolai.

On Mon, Mar 14, 2022 at 10:11 AM Anonymous wrote:
>
> If EXE can never solve something in the same complexity level as the Linux Kernel, how would one approach using symbolic approaches to fix bugs in the kernel? Would one simply go through subsets of the kernel and test separately?
On Sat, Mar 12, 2022 at 1:19 PM Anonymous wrote:
> 1. If a bug is found for a given symbolic input value that falls in a range, is there a test case generated for every single possible input in that range? Or just the endpoint/s?

EXE generates one test case for a given execution path.

> 2. Have there been any reported cases of symbolic execution being used for malicious intent? I can imagine a hacker using EXE+STP on, say, open source Android code precisely to find exploitable bugs...

Sure, one offshoot of this line of research has been in using symbolic
execution to synthesize attack code -- see
https://www.ndss-symposium.org/wp-content/uploads/2017/09/Avg.pdf for
example.

> 3. Purely out of curiosity I wonder if EXE+STP was ever run on itself.

Not that I know of.

Nickolai.

The heuristics are used to decide the order in which EXE explores
different paths.  If you run EXE long enough to explore all paths, it
will be complete (find bad inputs), and the heuristics don't matter.
The reason heuristics matter is that, for sufficiently large programs,
it's not feasible to run EXE long enough to check all paths.

Nickolai.

On Mon, Mar 14, 2022 at 12:05 AM Anonymous wrote:
>
> I didn't fully comprehend the explanation in the search heuristics section. Is EXE still guaranteed to find bad inputs (if they exist) when using the search heuristics? Or does it just happen with a very high probability?
On Sat, Mar 12, 2022 at 11:31 AM Anonymous wrote:
> Could there be some compiler-dependent bugs that EXE would not catch?

Possibly, yes.  For example, bugs due to undefined behavior are one
class of such "compiler-dependent" bugs.  E.g., see
https://people.csail.mit.edu/nickolai/papers/wang-undef-2012-08-21.pdf

Nickolai.

On Mon, Mar 14, 2022 at 3:35 PM Anonymous wrote:
> What is eager translation?

I believe in the context of this paper they mean their algorithm
immediately converts the constraints, as opposed to converting them at
some later point.

> What are the conditions in which a solver (like STP) times out?

I doubt it's feasible to fully characterize these conditions, but one
intuitive example would be asking STP if it's possible for SHA1(x)=y
for some constant value y and symbolic x.  Solving this would require
STP to invert the SHA1 hash function (or brute-force all possible
inputs to it), which is believed to be quite difficult.

> Is EXE guaranteed to terminate or do they just run until a "satisfactory" number of test cases are found?

It's all a matter of time.  In some very theoretical sense, EXE will
probably terminate if run for an unbounded amount of time, on a
program that terminates on all inputs.  But that's not very useful:
that amount of time might be way too large to be meaningful.

> In section 4.4, the paper states "BFS converges to full coverage more than twice as fast as DFS: 7,956 test cases versus 18,667", and I found this confusing. I thought the number of test cases for full coverage would be the same for each method? Or is it how many they tried?

I think here they are referring to coverage of lines of code in the
program being tested, rather than all possible execution paths.

Nickolai.

I don't think I have any hard references to papers on how symbolic
execution is integrated into software development over time (e.g.,
CI), but one paper
(https://www.doc.ic.ac.uk/~cristic/papers/symex-icse-impact-11.pdf)
mentions Microsoft runs it continuously on a large number of machines.
I imagine you would want to cache some inputs so that, when a new
version of the software needs to be tested, you can quickly use
existing inputs to hopefully get most of the coverage cheaply, and
then use symbolic execution techniques to try reaching the new parts
of the code that weren't reached by previous inputs.

Nickolai.

On Sun, Mar 13, 2022 at 6:31 PM Anonymous wrote:
>
> Question Lecture 11
>
> How are approaches like EXE be intergrated into workflows? Would they be run often like unit tests? Is the runtime prohibitive that the generated tests be used but EXE would be run less frequently to regenerate cases?
On Sun, Mar 13, 2022 at 3:08 PM Anonymous wrote:
> What advantages does EXE provide over something like Angr (no compilation required)? Is it mostly just performance or could the compiled instrumentation by EXE prove much more accurate

The specific reason we're reading EXE is because it was one of the
first papers to introduce symbolic execution, and explains how it
works (since they're introducing this technique).  Lots of tools use
symbolic execution nowadays, including angr; their papers focus on the
differences of how they applied symbolic execution in a specific
context, so they're not as good of an introduction to symbolic
execution in general.

Nickolai.

On Mon, Mar 14, 2022 at 4:42 PM Anonymous wrote:
> How would EXE track constraints related to for/while loop?

Each iteration of the loop involves a conditional (that determines
whether to break out of the loop).  That conditional would be
translated into a constraint, and if both branches are possible, EXE
would add one more potential execution path.  EXE's heuristics for
deciding what paths to explore would then be in charge of deciding
whether to keep unfolding more iterations of the loop, or to take one
of the paths that exited the loop and continue executing that instead.

Nickolai.

These ideas (symbolic execution) are reasonably widely used to find
bugs in real systems; e.g., Microsoft built and uses a symbolic
execution tool to look for security bugs in its own source code.
Fuzzing is another big idea that works well for finding bugs in
practice.

Nickolai.

On Sun, Mar 13, 2022 at 7:14 PM Anonymous wrote:
>
> What other interesting exploits has EXE been responsible for finding? Being that this is an older system, what is the current state of automated bug-checking systems
Yes, every serious C program uses bug-finding tools these days.

Nickolai.

On Mon, Mar 14, 2022 at 3:10 PM Anonymous wrote:
>
>
> This paper is from 2006, are bug checking tools (or later advances) like this used by C developers in practice?
Yes; e.g., Microsoft uses symbolic execution tools internally to find
bugs in Windows code, and apparently it's quite effective.

Nickolai.

On Mon, Mar 14, 2022 at 3:46 PM Anonymous wrote:
>
> How widely are symbolic execution tools used in practice? Are they actually used to test industry code?
On Sat, Mar 12, 2022 at 11:37 PM Anonymous wrote:
> I understand that the STP queries at each branch is meant as a measure against branching into infeasible code regions, but would EXE still be able to function correctly if the code had path explosion (many branches that are valid both ways)? I know that this is a huge issue with some symbolic execution frameworks like angr (https://www.youtube.com/watch?v=VJoNraxFliM&t=1045s) - are there hypothetical ways one could go about to mitigate this issue?

Some symbolic execution systems implement state merging optimizations
as a way to deal with some path-explosion patterns.  But it only works
for cases when state combining "makes sense" (i.e., leads to compact
symbolic states), and isn't always going to help.

Nickolai.

On Sat, Mar 12, 2022 at 11:18 PM Anonymous wrote:
> Why do analysis tools like EXE tend to always ignore floats? Floats appear in a wide variety of programs and are not impossible to conceptualize using axioms.

The bit-level behavior of floats is pretty complicated, and
floating-point arithmetic rarely shows up in programs that matter for
security.

Nickolai.

There are symbolic execution tools for binary code -- e.g.,
https://angr.io/ and https://s2e.systems/

Nickolai.

On Sun, Mar 13, 2022 at 10:46 PM Anonymous wrote:
>
> I liked the idea of bug finding using symbloic execution, but it seems to require some source code to reason about input spaces and branches. Would it be possible to do symbolic checking on binary executable code though (potentially by passing symbolic input and reasoning about how the program state/output changes)? e.g. if App Store wants to investigate apps for bugs without the source code)? Or is it necessary to have access to some source code?
One intuitive advantage for depth-first search is that it lets EXE
finish at least some executions, instead of having lots of in-flight
executions, with none of them reaching a final state.

Nickolai.

On Sat, Mar 12, 2022 at 11:12 PM Anonymous wrote:
>
> Why does the program use Depth First Search to explore branches when EXE forks execution? From my understanding, as well as other data given in the paper, using Breadth First search will avoid the program from going to far down the same rabbit hole before finding a solution.
On Mon, Mar 14, 2022 at 12:47 PM Anonymous wrote:
> Even with the optimizations mentioned, is there an upper bound on program size (lines of code) that EXE can reasonably handle? What size programs may result in an annoying (to developers) amount of runtime?

What matters is the size and complexity of constraints, and the number
of paths.  Straight-line programs with relatively simple constraints
would be easy to analyze with EXE, even if they had lots of code.  But
a short function that has complex constraints could cause problems.

> The paper mentions using MiniSAT in STP, which is a conflict-driven SAT Solver. Why might the authors have chosen this particular kind of SAT Solver? Would look-ahead SAT Solvers also work well in this context?

As long as the SAT solver performs well for program analysis type
queries, it doesn't matter for the symbolic execution system.

Nickolai.

On Mon, Mar 14, 2022 at 12:48 PM Anonymous wrote:
> How significant of a limitation is it that EXE/STP has no support for floating point numbers?

Depends on the application you want to test.  Many security-critical
applications (like the ones the authors tried out) don't use floating
point, so for those applications it's not a big deal.

Nickolai.

Yes, there are now a number of tools that work on binaries, such as
https://angr.io/ and https://s2e.systems/

Nickolai.

On Mon, Mar 14, 2022 at 5:18 PM Anonymous wrote:
>
> The paper states that EXE works by first compiling C source code with their exe-cc compiler, and then compiling the resulting code with gcc.
>
> Were there any efforts to build similar bug-finding programs that operate on compiled binaries? It seems like this would have made it more language agnostic, and by using the binary, they likely could have benefitted performance-wise from control flow optimizations in most compilers.
In theory EXE has a model of execution, which should be a correct
model of how the program executes.  So that suggests if they have an
input in their model that leads to some execution, it should cause the
program to run that way in reality too.

In practice, the paper reports on their experience that shows that,
yes, this works well in practice.

Symbolic execution is used in some program analysis tools; I believe
Microsoft's device driver analysis tools use a form of symbolic
execution.

Nickolai.

On Sat, Feb 26, 2022 at 4:42 PM Anonymous wrote:
>
> How do they know EXE can reconstruct an input that produces a code
> path? Is there some way of proving this in theory or practice? How
> common is symbolic execution used by software testers in the real
> world? Is there any class of test cases that can't be generated by
> EXE?
Symbolic execution is reasonably widely used, e.g., https://angr.io/
and https://s2e.systems/ on the open-source side and Microsoft uses
some form of symbolic execution internally for finding bugs in Windows
code.

Nickolai.

On Mon, Mar 14, 2022 at 12:19 PM Anonymous wrote:
>
> This is a fairly old paper. How good are these kinds of checkers nowadays?
They compile the libraries with EXE's toolchain as well, in order to check them.

For some queries, STP can't solve them in any reasonable amount of
time, and times out.  EXE moves on to checking other executions.

Nickolai.

On Sun, Mar 13, 2022 at 3:44 PM Anonymous wrote:
>
> How does EXE establish constraints on imported libraries? Does it have to re-evaluate the source code for each imported library?
>
> Aren't some constraints invariantly hard to solve? How does STP approach this and work backward to input bits? (ie, anything involving comparing hashed values)
The authors are academic researchers; at the time, they had been
working on bug-finding as a way of building more reliable systems
software (like OS kernels).  They weren't building anything specific
that necessitated symbolic execution to find bugs in that artifact.

Nickolai.

On Mon, Mar 14, 2022 at 3:14 PM Anonymous wrote:
>
> The paper talks about EXE being an effective bug-finding tool that's different from other tools since it can more deeply check real code and automatically generates test cases. What was the inspiration behind creating EXE? Was there a specific problem or project at hand that needed a deeper way to check for bugs in the code?
They're somewhat complicated and non-algebraic (e.g., floating point
addition is non-associative in general because adding two small values
works "as you might expect" but adding a small value to a large value
might be a no-op since the small value might be too small to represent
at the large value's exponent).  Also, they're rarely used in contexts
that might cause security bugs that this paper is looking for.

Nickolai.

On Sun, Apr 2, 2017 at 8:03 PM, Anonymous wrote:
> Why doesn't STP handle floating-point constraints?

Floating point is somewhat complicated and non-algebraic (e.g.,
floating point addition is non-associative in general because adding
two small values works "as you might expect" but adding a small value
to a large value might be a no-op since the small value might be too
small to represent at the large value's exponent).  Also, floating
point is rarely used in contexts that might cause security bugs that
this paper is looking for.

Nickolai.

On Sun, Apr 2, 2017 at 8:40 PM, Anonymous wrote:
> Rotem Hemo
> Apr 2
> LEC 12: Question
>
> What so complexed about floating points that the authors of EXE weren't able to support it?

This process keeps track of queries that have been issued before
(using hashes to detect when queries are the same), and if it detects
a repeat query, it just returns the saved results.

Nickolai.

On Sun, Apr 2, 2017 at 9:57 PM, Anonymous wrote:
> I don’t see why creating a server process to coordinate EXE process using hash optimizes EXE.

EXE takes C source code.  As it explores branches (in practice, for
any non-trivial program, it would be unlikely to explore all of them),
and finds some branch that triggers an error, it asks STP to generate
an example input that satisfies the constraints of that branch, and
reports that example input to the user.

Nickolai.

On Sun, Apr 2, 2017 at 9:18 PM, Anonymous wrote:
> Question about EXE/STP
> I’m a little confused how the checks happen/the work flow - does it take in a binary to explore all the branches, or does it take in the source code? And is generating the test code after it’s explored all the branches?

EXE can catch any bug that the developer (or EXE user) can flag.  If
you can write an assert() statement that captures your bug condition,
EXE will flag any executions that fail that assertion.  You'll
actually write such things in lab 3 (looking for more app-level bugs
using symbolic execution).

The limits of EXE are more in terms of constraint complexity than
program size.  Even a small program could generate complex constraints
that are hard for EXE/STP to solve (e.g., a program that dereferences
a null pointer if the SHA1 hash of its input is a certain value).

Nickolai.

On Sat, Mar 25, 2017 at 10:54 PM, Anonymous wrote:
> Lec 12 Student Question
>
> Exe seems to catch bugs which involve errors in what is computationally allowed.For example it can catch buffer overflow bugs, overflowing reads, divide-by-zero errors. However, am I correct that it is unable to, and was not designed to, catch bugs where execution doesn't violate a memory rule or computational rule (e.g. divide by zero) but causes unwanted execution? (I'm thinking, for example, a logical error in zoobar that allows a person to steal zoobars by putting the transfer amount as negative.

On Thu, Mar 16, 2017 at 3:36 PM, Anonymous wrote:
> - What happens if there is a bug in a loop somewhere that results in an infinite loop? Would  one of the EXE processes get stuck in this loop? If so, would it just halt after some pre-set timeout? Would it be able to report the infinite loop?

I suspect they have heuristics that kill a forked process after some time.

> - Are there variants of EXE that work for other programming languages?

Lab 3 is an EXE-like system for Python.  There's also a system that
does symbolic execution for an entire virtual machine, so it's
language-agnostic (check out "S2E").

> - Is EXE commonly used in industry? I would imagine that large companies like Google have hundreds of thousands of lines of code written in C or C++, and EXE seems like it would be a useful tool in validating code before pushing production code.

EXE itself is not (it was just a research prototype).  The techniques
are, though; many bugfinding companies use symbolic execution, and I
know Microsoft uses similar tools internally.

Nickolai.

The limits of EXE are more in terms of constraint complexity than
program size.  Even a small program could generate complex constraints
that are hard for EXE/STP to solve (e.g., a program that dereferences
a null pointer if the SHA1 hash of its input is a certain value).

Nickolai.

On Sat, Mar 25, 2017 at 10:56 PM, Anonymous wrote:
> Is there a max programme size that EXE can run until completion?

Symbolic execution systems (though not EXE in particular) deal with
external I/O by modeling stubs that, effectively, have some kind of an
assert() inside there to tell the symbolic execution system which
calls are "bad" or not.  So that might be a fine way to look for
errors that allow opening arbitrary files.

Nickolai.

On Thu, Mar 30, 2017 at 4:41 PM, Anonymous wrote:
>
> Is there something like EXE that combines user input? In other words, it develops a mapping of paths, and visually provides it to a user to now better reason about what edge cases may come up, to test for, especially for things like system file open calls that EXE simply can't reason about?

I imagine an effective approach would be to pin down all the sources
of non-determinism (e.g., getting random numbers, thread scheduling,
etc), and make them deterministic.  This is doable (and there's lots
of systems that do it) but mostly orthogonal to EXE.

Nickolai.

On Mon, Mar 27, 2017 at 1:36 PM, Anonymous wrote:
> 6.858 Paper Question for Lecture #12
> EXE: Automatically Generating Inputs of Death
> 03/25/17
> Ismael 'Izzy' Gomez
> -----
>
> What would need to change about EXE to make it an effective bug-finding tool for non-deterministic code too?

There's two general ways symbolic execution systems like EXE deal with
state.  The first is to run on a symbolic state (e.g., any database
contents).  This would likely require the programmer to write some
assertions about what are valid database states; otherwise the program
will crash due to some nonsense coming back as a result of a SQL
query.  The second is to include the database in the "symbolic world"
and run several requests, keeping track of the symbolic values of any
affected database rows after each run.

Nickolai.

On Sun, Mar 26, 2017 at 10:59 PM, Anonymous wrote:
> How does EXE reason about programs that contain some kind of state from run to run? E.g. in the lab we only were able to have our concolic execution analyzer complete one run request per test run, what if we needed a few specially designed requests in succession?

I'm not entirely sure what attack scenario you have in mind.

The intended use case of EXE is app developers (or security auditors)
looking for potential bugs in software that's well-meaning but buggy.

If the app developer doesn't want a security auditor to run EXE to
discover bugs, I suppose they could write code that builds constraints
that EXE/STP has trouble solving.

It's not an isolation/sandboxing system, though one could probably
combine it with a sandboxing/isolation mechanism to make it safe to
run EXE over untrusted code.

Nickolai.

On Fri, Mar 24, 2017 at 5:42 PM, Anonymous wrote:
> 6.858 Lecture 12 - Symbolic Execution
> 4/3/17
>
> In Section 4.1 of the paper, the constraint cache optimization for EXE is
> described. Could the cache be a potential point of attack against EXE?
> Since the server process stores the cache in a file, if someone were to
> gain control of this file, he/she could presumably craft some "solved"
> constraints, insert them into the cache, and cause EXE to miss out on
> bugs in the code using these phony solved constraints.

On Tue, Mar 28, 2017 at 11:00 AM, Anonymous wrote:
> However, the way EXE does it seems to be different. I don't really understand it, but there's something about a table mapping concrete bytes to STP bitvectors, and STP has these read and write primitives, and so on. Why do they do it this way? Is it basically the same approach? Is it more efficient?

It gives the SMT solver (STP in their case) more semantic
understanding of what's going on in the program, so it can apply
various optimizations.  At some level, the basic plan is roughly what
you described, but EXE/STP makes it actually work in practice with a
variety of optimizations.  These days, SMT solvers are pretty
sophisticated engines and I don't fully understand all their
optimizations..

Nickolai.

EXE is not actually complete: in practice, for any non-trivial
program, it is infeasible to try all possible paths, for two reasons:
first, there's exponentially many of them (as a function of how many
if statements the program has in its execution), and second, the
SAT/SMT solver might not be able to solve the constraint on some path
(e.g., a program that crashes if the input has a specific SHA1 hash
value).

Nickolai.

On Sun, Apr 2, 2017 at 8:58 PM, Anonymous wrote:
> How does EXE efficiently keep track of all of the branches and constraints? Wouldn't a complex program have an exponentially large number of branches to keep track of?

The format is not relevant.  But yes, for every possible choice of
"if" statement branches, etc, they generate a separate fork of the
process, and the process gets its own directory.  So, after running
EXE for a while, you'll get a bunch of directories each corresponding
to a possible input that goes down a different control flow path, and
EXE will check if that input triggers something interesting.

Nickolai.

On Fri, Mar 31, 2017 at 4:36 PM, Anonymous wrote:
> In the more detailed explanation of how EXE works mechanically (page 4), I didn't
> really understand their explanation of how EXE, "for each path, creates two files:
> one to hold the concrete bytes it generates, the other to hold the values for
> each decision". Does "each path" here refer to any branch that happens during
> the code's execution? So each possible path of execution, even if it differs
> by a single "If ___, print 'hello'" gets its own two files? I don't really
> understand how the format of these files works either, if that's relevant.

Symbolic values come from two sources: either the developer explicitly
makes a value symbolic using make_symbolic(), as in Figure 1, or the
program assigns the result of some expression involving symbolic
values to some memory location.  A symbolic value is not a sequence of
bits but rather a logical expression describing what values may appear
in those bits.  (E.g., this 4-byte integer value can be anything
greater than 5.)

Nickolai.

On Sun, Apr 2, 2017 at 7:26 PM, Anonymous wrote:
> The paper never seems to explain what a 'symbolic value' is. How are symbolic values assigned?

For their Figure 1 example, make_symbolic(&i) stores a symbolic value
"i" in the memory location of variable i (or, more precisely, since
it's a 4-byte value, it stores "i[0:8]" in the first byte, "i[8:16]"
in the second byte, etc).

Variable "a" is concrete, so let's say it has address 0x1000.

On line 8, variable "p" gets a+i*4.  "a" is concrete but "i" is
symbolic, so the resulting expression for "p" is 0x1000+4*i.  EXE
stores this symbolic expression in the memory location of "p" (or,
again, to be more precise, it stores the individual 8-bit byte pieces
of that 4-byte pointer).

Nickolai.

On Sun, Apr 2, 2017 at 8:03 PM, Anonymous wrote:
> So, what exactly happens when a token in a line of code is assigned a
> symbolic value? What is stored in memory and what of it is what the term
> actually refers to?
>
>
> On 04/02/2017 07:44 PM, Nickolai Zeldovich wrote:
>>
>> Symbolic values come from two sources: either the developer explicitly
>> makes a value symbolic using make_symbolic(), as in Figure 1, or the
>> program assigns the result of some expression involving symbolic
>> values to some memory location.  A symbolic value is not a sequence of
>> bits but rather a logical expression describing what values may appear
>> in those bits.  (E.g., this 4-byte integer value can be anything
>> greater than 5.)
>>
>> Nickolai.
>>
>> On Sun, Apr 2, 2017 at 7:26 PM, Anonymous wrote:
>>>
>>> The paper never seems to explain what a 'symbolic value' is. How are
>>> symbolic values assigned?

Nelson and Oppen's paper talks about two things, from what I remember:
one is how to turn a program into a constraint (which EXE does as
well, using forking instead of building up one large expression), and
the second is how to try to solve this complex constraint.  The first
is not super interesting; EXE's plan of forking at every if statement
works well in practice, and then each fork just has to maintain the
constraint about its control flow path.  The second is what the EXE
authors are complaining about (they think STP is a much better SAT/SMT
solver).  Though, in practice, none of these solvers are "complete":
if some query is running for a long time (e.g., more than a second),
EXE just kills it and ignores that branch.

Nickolai.

On Sun, Apr 2, 2017 at 4:37 PM, Anonymous wrote:
> The paper mentions Nelson and Oppen’s cooperating decision procedures framework- I skimmed their paper, but am still confused as to how exactly it works. It seems like they simply collapse execution into a functional Lisp-like representation, but how does this guarantee that all branches are taken? Isn’t some kind of fuzzer or input solver still necessary?

On Sun, Apr 2, 2017 at 2:23 AM, Anonymous wrote:
> 1. I understand how the program try to use symbolic expression in general, but I'm not quite sure how this expression get checked on each branch.
>
> For example if the constriants on branch are (int) a>2 and the other branch is a<=2. Won't the program end up computing all possible value of int>2 in the first branch? If so, does the purpose this program is generating automated, exhausive tests?

EXE doesn't actually compute a concrete value after an if statement;
it asks STP if some value exists.  If the program later hits an error,
then EXE asks STP for a concrete assignment of values to every
symbolic variable (like "a"), so that it can tell the user an example
of an input that triggers an error.

> 2. What is the unconstrained symbolic array mentioned in 4.2 last sentence?

I think they mean A and B are both arrays that contain arbitrary
symbolic values (i.e., values that have no constraints placed on them
from earlier execution).

Nickolai.

Symbolic execution is reasonably widely used for finding bugs.
There's open-source systems like S2E (http://s2e.systems/), symbolic
execution is used by various bug-finding companies (like GrammaTech),
and it's used in internal tools at Microsoft (e.g., the SAGE system).

Nickolai.

On Sun, Apr 2, 2017 at 9:36 PM, Anonymous wrote:
> Is EXE used regularly in industry or has it not really been adopted?

In practice, for any non-trivial program, it is infeasible to try all
possible paths, for two reasons: first, there's exponentially many of
them (as a function of how many if statements the program has in its
execution), and second, the SAT/SMT solver might not be able to solve
the constraint on some path.

Nickolai.

On Sun, Apr 2, 2017 at 5:01 PM, Anonymous wrote:
> The paper devotes a section to discussing alternative search heuristics for which branches to follow first.  Isn't the advantage of EXE's method to actually try all of the branches though, not just achieve high block coverage?

S2E (http://s2e.systems/) does symbolic execution of an entire virtual machine.

Nickolai.

On Sun, Apr 2, 2017 at 8:26 PM, Anonymous wrote:
> Doesn't modelling the environment / libraries become an issue. Are there any clever ways people have circled around this problem ?

On Sun, Apr 2, 2017 at 7:22 PM, Anonymous wrote:
> How much branching is too much?

Undefined. :-)  The more a program branches, the more cases EXE has to
explore.  In practice, this means for any non-trivial program, EXE
likely can't explore all possible branches, and has to use heuristics
to decide which branches are worth exploring.

> Why 8-bit sized bitvectors to represent each symbolic data block?

EXE models each byte of memory separately, and each byte is modeled as
an 8-bit vector.

Nickolai.

EXE is not actually complete: in practice, for any non-trivial
program, it is infeasible to try all possible paths, for two reasons:
first, there's exponentially many of them (as a function of how many
if statements the program has in its execution), and second, the
SAT/SMT solver might not be able to solve the constraint on some path
(e.g., a program that crashes if the input has a specific SHA1 hash
value).

Nickolai.

On Sun, Apr 2, 2017 at 9:52 PM, Anonymous wrote:
> Very cool paper. What are the downsides/limitations to EXE? From what
> I understand EXE can be used as a silver bullet for eliminating a
> large number of bugs. Are there bugs that it cannot detect?

Symbolic execution is reasonably widely used for finding bugs.
There's open-source systems like S2E (http://s2e.systems/), symbolic
execution is used by various bug-finding companies (like GrammaTech),
and it's used in internal tools at Microsoft (e.g., the SAGE system).

Nickolai.

On Sun, Apr 2, 2017 at 9:36 PM, Anonymous wrote:
> How often is EXE used in the real world?

On Sun, Apr 2, 2017 at 12:35 PM, Anonymous wrote:
> EXE does not seem to be capable of finding bugs that are "legal" code, but just not what the creator intended. Is that correct?

To catch such bugs, the developer would have to add some assert()
statements, so that EXE knows when it's hit an error.  (Lab 3 involves
something like this too.)

> Does EXE work on non-deterministic code? For instance, something with multi threading? If not, can it be modified to do so, and if so how and why does it work?

I suspect the main thing to modify would be to add a deterministic
threading system, something like
https://people.cs.umass.edu/~emery/pubs/dthreads-sosp11.pdf

Nickolai.

EXE has its own set of "bug conditions", such as out-of-bound writes,
divide by zero, etc.  EXE has a memory model kind-of like what Baggy
and fat-pointer systems use; it knows the bounds of arrays, and does
not treat the entire memory as a single flat array, which helps them
detect what "out-of bounds" is.  Programmers can define their own
app-level bug conditions using assert() statements.

Nickolai.

On Sun, Apr 2, 2017 at 6:54 PM, Anonymous wrote:
> How do you define a bug in general in EXE? specifically regarding buffer overflows do we use fat pointers or baggy bounds? What does EXE misses other than floating points operations? How does STP performs inequalities if it has no notion of type in the first place?

Symbolic execution is reasonably widely used for finding bugs.
There's open-source systems like S2E (http://s2e.systems/), symbolic
execution is used by various bug-finding companies (like GrammaTech),
and it's used in internal tools at Microsoft (e.g., the SAGE system).
S2E is probably the place to look if you want to know more about
state-of-the-art symbolic execution.

Nickolai.

On Sun, Apr 2, 2017 at 9:11 PM, Anonymous wrote:
> Samantha Briasco-Stewart; erosolar@mit.edu
> 6.858 Paper Question 12; Due 2017.04.02 @ 10:00PM
>
>  - EXE seems like a really useful system - how widely is it used in industry or
>    academia today? What caveats are there to the system that might not be
>    discussed in the paper? Also, this was published in 2006 - have there been
>    improvements made more recently/other systems based on this created? What
>    can I read to learn more?
>

The paper doesn't really say; one possibility is that it will simply
not process any floating-point operations symbolically, and run them
as if all operands were concrete.

Nickolai.

On Sun, Apr 2, 2017 at 8:53 PM, Anonymous wrote:
> Max Lancaster
> Lecture 12 Question
>
> How will EXE react to a line of code that uses a floating point?

As-is, probably EXE would not, but it's not hard to make it work:
"just" capture all the sources of non-determinism (like reading
/dev/urandom for randomness), and feed in the same value every time
(or treat it as a symbolic value, and then figure out for which random
values the program might crash).

Nickolai.

On Sun, Apr 2, 2017 at 7:02 PM, Anonymous wrote:
> Would this system be compatible with nondeterministic programs?

Symbolic execution is reasonably widely used for finding bugs.
There's open-source systems like S2E (http://s2e.systems/), symbolic
execution is used by various bug-finding companies (like GrammaTech),
and it's used in internal tools at Microsoft (e.g., the SAGE system).

Nickolai.

On Sun, Apr 2, 2017 at 8:49 PM, Anonymous wrote:
> Has this been used in industry, or did it fail to gain traction (possibly due to migration away from C)?

On Thu, Mar 30, 2017 at 10:20 PM, Anonymous wrote:
> Why is it difficult for STP to support floating point numbers?

They're somewhat complicated and non-algebraic (e.g., floating point
addition is non-commutative in general because adding two small values
works "as you might expect" but adding a small value to a large value
might be a no-op since the small value might be too small to represent
at the large value's exponent).  Also, they're rarely used in contexts
that might cause security bugs that this paper is looking for.

> Another question, does the user of EXE have to explicitly mark all expressions they define as symbolic (if they area indeed symbolic) or does EXE also infer the symbolism/concrete status of some variables when initialized?

The EXE user just has to flag the initial user input; any subsequent
copies of that input by the program (e.g., int a = b) is handled by
EXE without further user input.

Nickolai.

Yes: in practice, for any non-trivial program, it is infeasible to try all
possible paths, for two reasons: first, there's exponentially many of
them (as a function of how many if statements the program has in its
execution), and second, the SAT/SMT solver might not be able to solve
the constraint on some path.

Nickolai.

On Sun, Apr 2, 2017 at 6:05 PM, Anonymous wrote:
> For more complicated programs (like pcre, bpf, etc.) with very deep branches,
> does EXE just "tend towards" full coverage but not necessarily attain it? For
> example, if you have a for-loop somewhere that runs a 100 times and executes an
> if-else conditional, that's technically 2^100 different branch combinations, so
> there's no way that it can cover them all -- or does this never really happen
> in practice because most of the branches end up being unreachable (as
> determined by the STP constraints).

As-is, probably EXE would not, but it's not hard to make it work:
"just" capture all the sources of non-determinism (like reading
/dev/urandom for randomness), and feed in the same value every time
(or treat it as a symbolic value, and then figure out for which random
values the program might crash).

Nickolai.

On Fri, Mar 31, 2017 at 10:07 PM, Anonymous wrote:
> Does EXE work with non-deterministic code? For example, if I have some code that checks a variable x is less than a randomly generated number, would EXE work with that?

If STP can't solve constraints, EXE can't explore that part of the
program.  Bugs in any part of the system that's not being checked by
EXE also wouldn't show up.  If an adversary guesses a user's password,
EXE has little to say about it.  Etc.  :)

Nickolai.

On Sun, Apr 2, 2017 at 2:18 PM, Anonymous wrote:
> What kinds of attacks are not detectable by EXE?

Yes: in practice, for any non-trivial program, it is infeasible to try all
possible paths, for two reasons: first, there's exponentially many of
them (as a function of how many if statements the program has in its
execution), and second, the SAT/SMT solver might not be able to solve
the constraint on some path.

Nickolai.

On Sun, Apr 2, 2017 at 5:11 PM, Anonymous wrote:
> Lecture 12
>
> Is it possible for EXE to miss an input that causes a bug?

On Sun, Apr 2, 2017 at 7:36 PM, Anonymous wrote:
> If EXE is meant to find bugs that can crash real code, then how does one ensure its practical correctness (not theoretical)? I.e its practical effectiveness is only as good as the correctness of its own code. So how in practice do people in the field ensure practical correctness of such a program? (would they have to resort to the methods described in the intro that they are trying to outdo?)

You mean figuring out if EXE itself is correct?  A fair question but
one that I haven't seen much work on.

Formal methods people do worry about the correctness of their tools,
and build systems where, say, proofs about program correctness are
checked using a small "proof-checking kernel".

Bug-finding people have two possible classes of problems: missing bugs
and reporting false bugs.  EXE deals with false reports by trying a
proposed input on the real program, and if it doesn't crash, EXE
doesn't report it.  EXE in general isn't complete, so it's not a
disaster if there's some bug in EXE that causes EXE to miss some path
or forget to report it.

Nickolai.

On Sun, Apr 2, 2017 at 2:34 PM, Anonymous wrote:
> “This ability lets it show that external forces can exploit a bug, improving on static analysis, which often cannot distinguish minor errors from showstoppers.” This line did not really make sense to me, how does EXE distinguish “minor errors” from “showstoppers”?

The authors are talking about what reports a developer might get from
running a static analysis tool on their program.  A static analysis
tool might say "you didn't check if this pointer is NULL in this
function!".  Whether that's a "minor error" or "showstopper" depends
on whether the program might actually crash as a result of this
missing check, due to a carefully crafted input, or if there's no
input that can cause this crash.  EXE only reports inputs that cause a
crash.

Nickolai.

EXE is not actually complete: in practice, for any non-trivial
program, it is infeasible to try all possible paths, for two reasons:
first, there's exponentially many of them (as a function of how many
if statements the program has in its execution), and second, the
SAT/SMT solver might not be able to solve the constraint on some path
(e.g., a program that crashes if the input has a specific SHA1 hash
value).

Nickolai.

On Sun, Apr 2, 2017 at 8:25 PM, Anonymous wrote:
> What sort of bugs are able to bypass EXE checks, as in it allows the STP to either detect there is no branch or that the branch is safe?

Yes, symbolic execution is reasonably popular in bug-finding; there's
a bunch of companies that build symbolic execution tools for finding
bugs; Microsoft uses symbolic execution internally for finding bugs in
their software; etc.

Nickolai.

On Tue, Mar 28, 2017 at 10:17 PM, Anonymous wrote:
> The paper mentions that EXE has been used to construct disks that could crash the Linux kernel. Have EXE or other similar systems been used to evaluate other OSes for similar vulnerabilities, and have EXE or similar systems become very popular in the 10 years that have elapsed since this paper was written?

It will never finish.  Too many paths, and the constraints are too hard.

This is the reason why EXE has various heuristics for which paths are
worth exploring: for any non-trivial program, it will never explore
all paths, so it must try to be somewhat clever about which paths to
pick, so as to aim to find more bugs.

Nickolai.

On Sun, Apr 2, 2017 at 7:51 PM, Anonymous wrote:
> How long would it take EXE to verify a code base with similar complexity to that of the Linux kernel, assuming no bugs?

Symbolic execution is reasonably widely used for finding bugs.
There's open-source systems like S2E (http://s2e.systems/), symbolic
execution is used by various bug-finding companies (like GrammaTech),
and it's used in internal tools at Microsoft (e.g., the SAGE system).

Nickolai.

On Sun, Apr 2, 2017 at 9:35 PM, Anonymous wrote:
> Are there any improvements on EXE since and how widely used is EXE for
> developement?

Probably not much harder/easier than what EXE's developers did (so,
moderately hard).  Lab 3 does one variant of this.

Nickolai.

On Sun, Apr 2, 2017 at 2:21 PM, Anonymous wrote:
> How difficult would it be to build something like EXE directly into a language and would that offer any benefits over the current approach?

Probably the main thing to add would be a deterministic threading
system, something like
https://people.cs.umass.edu/~emery/pubs/dthreads-sosp11.pdf

Nickolai.

On Sun, Apr 2, 2017 at 2:15 AM, Anonymous wrote:
> How does EXE handle multi-threaded programs? (It doesn't seem like it does?) If it doesn't, what sort of changes to EXE would have to be made to add multi-thread support?

Not a big deal; floating point numbers are almost never used in
security-critical code, which is why they don't bother implementing
them (it's also kind-of tricky, since floating point number are weird:
addition is non-associative, etc).

Nickolai.

On Sun, Apr 2, 2017 at 8:08 PM, Anonymous wrote:
> Paper: EXE
>
> My Question: EXE says that it doesn't handle floating point
> numbers. How big of a problem is this? How many programs use floating
> point?

In their case, "A" is not the variable referencing the array, but
rather a symbolic expression specifying the contents of the array at a
particular time.  So, if there were other writes, it would be read(A,
i) and read(write(A, x, y), j).

What do you mean about blind writes?  Maybe I missed something in the
paper, but I didn't see them mention blind writes..

Nickolai.

On Sat, Apr 1, 2017 at 3:03 PM, Anonymous wrote:
> Question about section 3.3 of the paper
>
> It’s said in the paper for the second transformation that “ if two indexes i and j are the same then read(A,i) and read(A,j) should return the same value”. Why is this so? Array can always be modified right? For example in the case of read-write-read. The second read would return the write. Does this have anything to do with the first transformation (read-over-write)?
>
> Also how are blind writes eliminated?

I haven't seen floating-point symbolic execution systems; I think in
addition to the fact that floating point is weird (e.g.,
non-commutative addition), it's rarely used in contexts where security
matters.

The scalability constraint, in some sense, is the complexity of
constraints generated by EXE when it executes the program;
NP-completeness is not directly the reason why it's hard (plenty of
poly-time problems are expensive to solve and don't scale well, and
many SAT instances generated by symbolic execution are efficiently
solved despite the general-case solution being NP).

Nickolai.

On Sun, Apr 2, 2017 at 4:39 AM, Anonymous wrote:
> Have there been any attempts to implement EXE with floating point support by using approximations to deal with the tractability of the solutions? It appears that the main scalability constraint the system faces is the fact that SAT is NP-complete.

EXE is not actually complete: in practice, for any non-trivial
program, it is infeasible to try all possible paths, for two reasons:
first, there's exponentially many of them (as a function of how many
if statements the program has in its execution), and second, the
SAT/SMT solver might not be able to solve the constraint on some path
(e.g., a program that crashes if the input has a specific SHA1 hash
value).

Nickolai.

On Sun, Apr 2, 2017 at 7:14 PM, Anonymous wrote:
> Was there any sort of pen testing on EXE to figure out vulnerabilities within it such that it doesn’t generate input of deaths for vulnerable code?

EXE as described in the paper doesn't deal with randomness; solving it
would presumably require enumerating all possible sources of
randomness (e.g., getting a random number from the OS by reading
/dev/urandom), and turning that randomness into a symbolic expression.

Nickolai.

On Sun, Apr 2, 2017 at 6:43 PM, Anonymous wrote:
> How does EXE handle creating an input file with a program that will crash due to a randomly generated dynamic variable?

They're somewhat complicated and non-algebraic (e.g., floating point
addition is non-commutative in general because adding two small values
works "as you might expect" but adding a small value to a large value
might be a no-op since the small value might be too small to represent
at the large value's exponent).  Also, they're rarely used in contexts
that might cause security bugs that this paper is looking for.

Nickolai.

On Sat, Apr 1, 2017 at 11:00 PM, Anonymous wrote:
> Why doesn't STP support floating-point numbers? Doesn't this restrict the use of EXE in checking programs?

There's certainly some work on symbolic execution for different
languages.  There's a bunch of low-level symbolic execution tools that
work at even lower levels than EXE (e.g., S2E runs an entire virtual
machine in symbolic execution mode), so that they're language
agnostic.  There's some symbolic execution for Java and C#, I believe,
the latter at Microsoft.  There's some work also on adding symbolic
execution to the Python interpreter (as opposed to implementing it
inside of Python, as in lab3); that has some downsides (complexity)
and upsides (completeness).

Nickolai.


On Sat, Apr 1, 2017 at 4:31 PM, Anonymous wrote:
> I've started on lab3 and I see we are building something similar to EXE for Python programs. Are symbolic execution programs often used when developing applications in Python and other high-level languages? I'd never heard of them before but they do seem really useful so I'm curious about the prevalence.

On Sun, Apr 2, 2017 at 12:47 PM, Anonymous wrote:
> Why don't the creators of EXE use BFS rather than DFS with some heuristics? In terms of coverage, it appears as though BFS gets to a higher percentage faster than DFS. Does it just come down to the heuristics the DFS search uses to prune and that being better than DFS in the long run?

One intuition might be that, in a complex program, BFS would never get
very far from the input -- you would immediately create tons of
different branches, and BFS would keep exploring a little bit of each
one, but would never get to an actual bug that's a number of steps
down one branch.

There's a bunch of research on what heuristics to use for symbolic
execution.  Nothing definitive, though; different heuristics work
better/worse for different programs.

Nickolai.

The EXE developers worked closely with SAT solver developers (the STP
SAT solver) to make the SAT solver fast enough for their needs.  The
authors are these two developers together: Dawson, Cristian, and Pavel
were building EXE and working on finding bugs, and Vijay and David
were building their SAT solver, STP, at the time.  This paper is, in
part, why SAT solvers are now good enough to tackle these kinds of
tasks.

Nickolai.

On Fri, Mar 31, 2017 at 6:44 PM, Anonymous wrote:
> Why weren't general SAT solvers considered for use in programs like EXE before STP? Were the
> existing implementations of SAT solvers too slow, or was there not a good way to convert specialized
> decision procedures into general satisfiability constraints?

Floating point is somewhat complicated and non-algebraic (e.g.,
floating point addition is non-associative in general because adding
two small values works "as you might expect" but adding a small value
to a large value might be a no-op since the small value might be too
small to represent at the large value's exponent).  Representing the
FPU as a boolean circuit is possible but really inefficient, compared
to how most SAT solvers deal with integer operations.  Also, floating
point is rarely used in contexts that might cause security bugs that
this paper is looking for.

Nickolai.

On Sun, Apr 2, 2017 at 9:22 PM, Anonymous wrote:
> 6.858 Lecture Question 12
> Marcus Boorstin
>
> Why doesn't STP support floating-point arithmetic?  Even accounting for traditionally problematic areas like rounding errors, the processor still has a set of rules for each floating-point operation built into its silicon that operates deterministically on given areas of memory, so it seems that that same set of rules (which as far as I know is documented) could be written into STP in software.

If the function pointer is a concrete value, it runs as it would in a
regular execution environment.

For symbolic function pointers, EXE treats it the same way as a double
dereference (asks STP for some value that matches the current
constraints, and runs with that, instead of considering every possible
value allowed by the symbolic constraints).

Nickolai.

On Sun, Apr 2, 2017 at 3:43 PM, Anonymous wrote:
> How does EXE handle function pointers?

Any complex program likely has too many control flow paths to explore
in a reasonable amount of time.  Or, the constraints might be too hard
to solve: e.g., a program that dereferences a null pointer if the SHA1
of its input is a certain value.

Nickolai.

On Sat, Apr 1, 2017 at 2:15 PM, Anonymous wrote:
> Are there programs for which EXE could not exhaustively determine the safety of all paths?

This paper mentions they ran an earlier version of EXE on file system
code from the Linux kernel and found a number of bugs in it.  So,
yeah, kernel code isn't much different from any other code that EXE
could run.

EXE is not actually complete: in practice, for any non-trivial
program, it is infeasible to try all possible paths, for two reasons:
first, there's exponentially many of them (as a function of how many
if statements the program has in its execution), and second, the
SAT/SMT solver might not be able to solve the constraint on some path
(e.g., a program that crashes if the input has a specific SHA1 hash
value).

Nickolai.

On Sun, Apr 2, 2017 at 9:07 PM, Anonymous wrote:
> I found the idea of instrumenting system code quite interesting. Is it possible to
> instrument core kernel subsystems? As in, the subsystems that are generally not exported
> to userspace in an microkernel design.
>
> From the paper it appears that the only major limitations are the lack of pointer and floating
> point support. Is it, therefore, safe to assume that EXE can find bugs in any program the does
> not use floating point and double dereferences?

EXE is not actually complete: in practice, for any non-trivial
program, it is infeasible to try all possible paths, for two reasons:
first, there's exponentially many of them (as a function of how many
if statements the program has in its execution), and second, the
SAT/SMT solver might not be able to solve the constraint on some path
(e.g., a program that crashes if the input has a specific SHA1 hash
value).

Nickolai.

On Sun, Apr 2, 2017 at 9:18 PM, Anonymous wrote:
> What are the types of vulnerabilities that EXE cannot find?

Symbolic execution is reasonably widely used for finding bugs.
There's open-source systems like S2E (http://s2e.systems/), symbolic
execution is used by various bug-finding companies (like GrammaTech),
and it's used in internal tools at Microsoft (e.g., the SAGE system).
It's usually used in focused ways, though (e.g., taking a Windows
driver and testing it on symbolic inputs); it's not practical to run
all of Windows through symbolic execution with completely symbolic
input.

Nickolai.

On Sun, Apr 2, 2017 at 9:32 PM, Anonymous wrote:
> EXE seems like a very useful tool. How much use does it (or similar systems) see in the industry? Is it common to run all code through it? Or perhaps critical routines?

In practice, symbolic execution cannot explore all paths in any
reasonably interesting program, but despite that, it seems reasonably
effective at searching program executions for bugs.  In terms of
scaling, there's a cool system called S2E (http://s2e.systems/) that
does symbolic execution for an entire qemu virtual machine.

Nickolai.

On Sun, Apr 2, 2017 at 9:53 PM, Anonymous wrote:
> Zane Markel
> Reading Question for Lecture 12
>
> How well does EXE scale? Beyond the performance metrics listed in the paper, I'm trying to get a sense of what programs can be realistically tested with symbolic execution. I get that most command line utilities could be checked. What about bigger user applications like word processors (especially ones that take macros) or browsers? Perhaps closer to the line edge of feasibility, what about utilities with large, complex input like image transformers and gzip?

If EXE has accumulated a bunch of constraints, but they consist of
several independent groups, it considers each group's query
independently, and if it already issued a query for that group's
constraint, it doesn't bother issuing it again (even if some other
group's constraint did change).

Nickolai.

On Sun, Apr 2, 2017 at 12:19 AM, Anonymous wrote:
> I don't understand the constraint independence optimization very well. How does it let us avoid rerunning repeated queries?

NP-completeness has little to do with the practical running time of any
particular SAT solver implementation.  None of them are trying to
contradict NP-completeness; they just aim to efficiently implement
heuristics that work well for constrains that arise in practice (and there
has been a huge amount of progress in this direction).

Nickolai.

On Apr 2, 2017 8:24 PM, Anonymous wrote:

> Shouldnt the SAT-solver still cause STP to run in a similar amount of time
> as the Nelson-Oppen approach, because 3-SAT is still unsolvable in
> polynomial time, so the problems being solved by this approach are similar
> to the downsides of the Nelson-Oppen approach?
>
Symbolic execution is reasonably widely used for finding bugs; the
ideas are not really tied to C.  There's open-source systems like S2E
(http://s2e.systems/), symbolic execution is used by various
bug-finding companies (like GrammaTech), and it's used in internal
tools at Microsoft (e.g., the SAGE system).

Nickolai.

On Sun, Apr 2, 2017 at 9:44 PM, Anonymous wrote:
> Do other programming languages have similar testing tools?

Lab 3 will get you to build an EXE-like thing for Python.  The most
expensive parts of symbolic execution are queries to the SMT solver,
and the choice of static vs runtime typing doesn't affect that very
much.

Nickolai.

On Sun, Apr 2, 2017 at 8:58 PM, Anonymous wrote:
> Could EXE be ported to work on dynamically typed or interpreted languages, or
> does the kind of checks that EXE perform is most efficient for statically
> typed or compiled languages?

I don't believe so.  There's a lot of work on race detectors, such as
Eraser (http://cseweb.ucsd.edu/~savage/papers/Tocs97.pdf) and many
more tools since then.

Nickolai.

On Sun, Apr 2, 2017 at 9:39 PM, Anonymous wrote:
> Does EXE have any mechanism for detecting (possible) race conditions?

Yup, gzip, image processing libraries, etc seem like a good potential
use cases for symbolic execution.  For any non-trivial program, EXE is
unlikely to explore all possible paths, because there's just too many
of them, and EXE/STP might encounter complex constraints that take a
long time to solve.  But it doesn't have to explore all paths to find
some paths that lead to bugs.

Nickolai.

On Sun, Apr 2, 2017 at 9:24 PM, Anonymous wrote:
> The benchmarks given in the paper are given relative to different optimizations in EXE. How computationally expensive is this sort of system? In class it was mentioned that compression / decompression software often has errors in it. How feasible would it be to run EXE or a similar system on the entire utility?

http://klee.github.io is from the same authors.

http://s2e.systems is a different open source symbolic execution system.

Nickolai​.

On Apr 2, 2017 8:23 PM, Anonymous wrote:

> Is the code they produced for the paper available anywhere? I.e. the EXE
> system.
>
Yup, EXE is not actually complete: in practice, for any non-trivial
program, it is infeasible to try all possible paths, for two reasons:
first, there's exponentially many of them (as a function of how many
if statements the program has in its execution), and second, the
SAT/SMT solver might not be able to solve the constraint on some path
(e.g., a program that crashes if the input has a specific SHA1 hash
value).

So, in your example, EXE's heuristics might de-prioritize the
exploration of that loop, and probably not find the bug in question.

Nickolai.

On Sun, Apr 2, 2017 at 9:23 PM, Anonymous wrote:
> Suppose EXE encounters a loop which runs n times, where n is an input. Can EXE catch a bug which would be triggered only at the last pass of the loop only when n satisfies some condition (e.g. n%1024 == 0)? I feel this is a tricky case since EXE would probably give low priority to running the same loop n times, and such bugs might represent weird corner cases for buffer overflows and the like.

I think there isn't actually much to it: they effectively unroll the
loop, so that the loop turns many "if" statements that might or might
not break out of the loop.  And then they handle these "if" statements
just like any other "if" statement.

Nickolai.

On Sun, Apr 2, 2017 at 9:56 PM, Anonymous wrote:
> The paper only briefly mentions how they handle loops. I would be interested to know more specific details.

EXE takes a program like this:

  int i;
  make_symbolic(&i);
  if (i > 5)
    return;
  if (i < 2)
    return;
  /* somehow trigger an error here, perhaps using assert() */

and generates queries for STP that look like this (for the branch
where both if statements didn't return):

  (NOT (i > 5)) AND (NOT (i < 2))

now STP says either "no, that's not possible", in which case EXE
determines there's no way to get to that error, or it says "yes,
that's possible, and here's an example assignment (e.g., i=3)", in
which case EXE tells the user "if you plug in i=3, the program will
hit an error".

Nickolai.

On Sun, Apr 2, 2017 at 9:34 PM, Anonymous wrote:
> I don't really understand how STP works - it generates test cases, right? But I don't really understand what it's solving. Why is it like a SAT solver?

That's roughly what the paper describes.  Could you be more specific?

Nickolai.

On Sun, Apr 2, 2017 at 9:54 PM, Anonymous wrote:
> How does the EXE generate all possible values that could cause a bug in input?

One hardness stems from the fact that floating-point operations are
"weird": e.g., addition can be non-commutative/non-associative when
dealing with operands of very different magnitude.

It's also not super relevant because floating point is rarely used in
security-critical code.

Nickolai.

On Sun, Apr 2, 2017 at 2:05 PM, Anonymous wrote:
> What is so hard about floating-point operations such that EXE does not support floating-point?

There's three general approaches: stub out the external interface with
a simple replacement (e.g., return "timed out" for every RPC, or
include a more sophisticated model that tries to play along), allow
external calls if they're stateless, or include the RPC server in the
"symbolic world" and execute everything together.

Nickolai.

On Mon, Mar 27, 2017 at 2:46 AM, Anonymous wrote:
> How does EXE handle the fact that exploring branches may have external side effects (RPC calls, etc)?