An information-flow security policy constrains a computer system's end-to-end use of information, even as it is transformed in computation. For instance, an information-flow confidentiality policy would not just restrict which secret data could be revealed directly, but also restrict any output that might allow inferences about the secret. A key challenge in information-flow security is how to allow programs that reveal some information in their correct operation, while still catching bugs that cause too much information to flow. Expressing a policy quantitatively, in terms of a specific number of bits of information, is often an effective program-independent way of distinguishing which scenarios should be allowed from which should be disallowed. In this research, we explored techniques to make quantitative measurements of information flow practical for realistic systems.
For instance, suppose you've taken a picture of yourself doing something embarrassing:
You'd like to post the picture on the web, but in a way that wouldn't reveal your identity. So you open the picture in your favorite image-editing program and try various transformations:
Unfortunately, it can be difficult to tell just by looking at the output of a program how much information it contains. None of the above images looks immediately recognizable, but perhaps an adversary could use a more sophisticated technique to recover the original image. A numeric information-flow bound, though, represents a mathematical limit on what information any adversary could recover. For instance, we used our information-flow measurement tool on each of the transformations above to give an upper bound on how much information about the face area of the original image is left in the transformed image. The tool's results confirm that for two of the four transformations, less than 1% of the original information remains, while for the other two, it estimates that all of the information remains:
In fact, the second and fourth transformations are discrete analogs of transformations that would be completely information-preserving in their continuous versions (though there is probably still some loss of information via rounding that our tool misses). For instance, an easy way to invert the second transformation's swirl is simply to swirl again with the opposite angle, which recovers a much more recognizable image:
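As a concrete illustration of "swirl again with the opposite angle", here is a minimal sketch of such an inverse pass. It assumes the forward swirl rotated each pixel about the image center by an angle that falls off linearly to zero at some radius; real image editors may use a different falloff, and the `Image` type and `unswirl` function are hypothetical, not part of any particular editor or of Flowcheck.

```cpp
#include <cmath>
#include <vector>

// Hypothetical grayscale image, stored row-major.
struct Image {
    int width, height;
    std::vector<unsigned char> pixels;
    unsigned char at(int x, int y) const { return pixels[y * width + x]; }
};

// Undo a swirl of the given strength (radians at the center, assumed to fall
// off linearly to zero at `radius`) by sampling each pixel with the opposite
// rotation.  Coordinates are rounded to the nearest pixel, so the round trip
// is not exact: a little information is genuinely lost to rounding.
Image unswirl(const Image& in, double strength, double radius) {
    Image out{in.width, in.height,
              std::vector<unsigned char>(in.pixels.size(), 0)};
    const double cx = in.width / 2.0, cy = in.height / 2.0;
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            double dx = x - cx, dy = y - cy;
            double r = std::sqrt(dx * dx + dy * dy);
            // Opposite sign of the forward swirl's rotation at this radius.
            double a = (r < radius) ? -strength * (1.0 - r / radius) : 0.0;
            int sx = static_cast<int>(std::lround(cx + dx * std::cos(a) - dy * std::sin(a)));
            int sy = static_cast<int>(std::lround(cy + dx * std::sin(a) + dy * std::cos(a)));
            if (sx >= 0 && sx < in.width && sy >= 0 && sy < in.height)
                out.pixels[y * in.width + x] = in.at(sx, sy);
        }
    }
    return out;
}
```

Because of the rounding, the recovered image only approximates the original, which matches the small residual loss mentioned above.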
In our research, we've developed a family of new techniques for measuring how much information about a program's secret inputs is revealed by its public outputs on a particular execution, in order to check a quantitative policy on realistic systems. Our approach builds on dynamic tainting, tracking at runtime which bits might contain secret information, and also uses static control-flow regions to soundly account for implicit flows via branches and pointer operations. We introduce a new graph model that bounds information flow by the maximum flow between inputs and outputs in a flow network representation of an execution. The flow bounds obtained with maximum flow are much more precise than those based on tainting alone (which is equivalent to graph reachability). The bounds are a conservative estimate of channel capacity: the amount of information that could be transmitted by an adversary making an arbitrary choice of secret inputs. The image transformation example above is one of several case studies on six real C, C++, and Objective-C programs, three of which have more than 250,000 lines of code, in which we used the tool to check the confidentiality of a different kind of information appropriate to each program. Its results either verified that the information was appropriately kept secret on the examined executions, or revealed unacceptable leaks, in one case due to a previously unknown bug.
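As a rough illustration of why a maximum-flow bound can be far tighter than tainting (reachability) alone, here is a small self-contained sketch. It is not Flowcheck's implementation: the real tool builds its networks automatically from the operations observed during an execution and computes maximum flow with the Boost Graph Library, whereas this example hand-builds a four-node network for a hypothetical run in which a 32-bit secret is compared against a guess and only the one-bit result is output.

```cpp
#include <algorithm>
#include <climits>
#include <cstdio>
#include <queue>
#include <vector>

// Minimal Edmonds-Karp maximum flow on an adjacency-matrix graph, with edge
// capacities measured in bits.  This is only a sketch of the idea, not the
// Flowcheck tool's Boost-based max-flow computation.
struct FlowNetwork {
    int n;
    std::vector<std::vector<long>> cap;
    explicit FlowNetwork(int n) : n(n), cap(n, std::vector<long>(n, 0)) {}
    void add_edge(int u, int v, long bits) { cap[u][v] += bits; }

    long max_flow(int s, int t) {
        std::vector<std::vector<long>> flow(n, std::vector<long>(n, 0));
        long total = 0;
        for (;;) {
            // Breadth-first search for an augmenting path in the residual graph.
            std::vector<int> parent(n, -1);
            parent[s] = s;
            std::queue<int> q;
            q.push(s);
            while (!q.empty() && parent[t] == -1) {
                int u = q.front();
                q.pop();
                for (int v = 0; v < n; ++v)
                    if (parent[v] == -1 && cap[u][v] - flow[u][v] > 0) {
                        parent[v] = u;
                        q.push(v);
                    }
            }
            if (parent[t] == -1) break;  // no augmenting path: flow is maximal
            long bottleneck = LONG_MAX;
            for (int v = t; v != s; v = parent[v])
                bottleneck = std::min(bottleneck, cap[parent[v]][v] - flow[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) {
                flow[parent[v]][v] += bottleneck;
                flow[v][parent[v]] -= bottleneck;
            }
            total += bottleneck;
        }
        return total;
    }
};

int main() {
    // A toy execution: a 32-bit secret is compared against a guess, and only
    // the 1-bit comparison result reaches the output.
    FlowNetwork g(4);
    const int source = 0, secret = 1, result = 2, sink = 3;
    g.add_edge(source, secret, 32);  // 32 secret input bits enter here
    g.add_edge(secret, result, 32);  // all of them feed the comparison
    g.add_edge(result, sink, 1);     // but only a 1-bit result is revealed
    std::printf("upper bound on leaked information: %ld bits\n",
                g.max_flow(source, sink));
    return 0;
}
```

Reachability from the secret to the output would mark the entire output as tainted, but the cut at the comparison's one-bit result caps the maximum flow, and hence the bound on leaked information, at a single bit.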
For more details about our techniques, see our paper from PLDI 2008, "Quantitative Information Flow as Network Flow Capacity".
Our implementation of our quantitative information-flow techniques is a dynamic analysis tool called Flowcheck. Flowcheck is a security testing tool: you run your program under its supervision, with certain inputs marked as secret, and it produces an upper-bound estimate of the amount of information about those secret inputs revealed by the program's outputs. Flowcheck is based on the Valgrind framework for dynamic analysis tools.
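To give a flavor of the kind of program behavior the tool has to account for, here is a tiny hypothetical test program; the PIN and its delivery via an environment variable are made up for illustration, and the README's tutorial describes how inputs are actually marked as secret when running under Flowcheck. No secret byte ever reaches the output directly, yet the branch constitutes an implicit flow of at most one bit.

```cpp
#include <cstdio>
#include <cstdlib>

// A hypothetical program to test for information leaks.  The secret never
// appears in the output, but the branch on it is an implicit flow: the two
// possible outputs reveal at most one bit about the PIN.
int main() {
    const char* env = std::getenv("PIN");    // stand-in for a secret input
    int secret_pin = env ? std::atoi(env) : 0;

    if (secret_pin % 2 == 0)                 // branch on the secret
        std::puts("even");
    else
        std::puts("odd");
    return 0;
}
```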
New in 2016 is Flowcheck version 1.20. It has essentially the same features as the previous version 1.00, but it has been updated to use a more recent version of Valgrind, so it is easier to compile and run on modern Linux systems.
Version 1.20 of Flowcheck is available for download here.
The distribution contains a modified version of Valgrind, source for a maximum-flow computation tool using the Boost Graph Library, and some glue scripts; it's intended for Linux/x86-32 systems. Some basic documentation of the tool, including a tutorial-style example of its typical usage, is included in a README file.