LangSec Workshop

at IEEE Security & Privacy, May 24, 2018

Papers & Slides

Keynote

Mike Walker, "Persuasive Language for Language Security: Making the case for software safety."

Cyber Grand Challenge was a DARPA program that proved the feasibility of real-time conflict between hacking machines. It was also a live demo, a moment, and a symbol. If LangSec is, as Buckminster Fuller instructed, not to struggle against a problematic model but instead to build a new one, then symbols, demonstrations, and ways to communicate the power of a new paradigm for software construction must be employed. A collection of narratives and lessons on the subtle task of persuading power, authored by a hacker who spent four years in Washington attempting to change the world.

[Full text]

Papers

Keith Irwin, "Redesigning Secure Protocols to Compel Grammatical Compliance Checking."

A frequent goal of LangSec research is to demonstrate the benefits of having protocol definitions whose messages can have their grammar checked, to ensure that they are well-formed prior to any other processing. It is generally taken as a given that, although we can design protocols that enable this checking, we cannot compel implementers of these protocols to actually carry out these checks. In this paper we demonstrate that it is possible to modify the protocols through the use of encryption such that the implementer is essentially required to do the checking if they wish their implementation to interoperate with other implementations without errors. Crucially, this will be the case even when they are only sending and receiving well-formed messages, thus transforming the silent vulnerability of unchecked messages into an obvious error. In particular, we demonstrate how to do this for checking whether or not messages or portions thereof belong to a specified regular language and whether or not they belong to a specified context-free language.
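
The spirit of the construction can be sketched in a few lines (a hypothetical illustration, not the paper's actual construction; the DFA, the derive_key function, and the shared_secret parameter are all made up for this example): if key material is derived from the run of the automaton that recognizes the message grammar, a receiver that skips the regular-language check simply cannot derive the key needed to use the rest of the message.

    import hashlib

    # Hypothetical sketch, not the paper's construction: every state visited by
    # a DFA that recognizes the message grammar is folded into the key
    # derivation, so a receiver that skips the check cannot reproduce the key.

    # Toy regular language: one or more digits followed by a newline.
    DFA = {
        ("start", "digit"): "digits",
        ("digits", "digit"): "digits",
        ("digits", "newline"): "accept",
    }

    def classify(ch):
        if ch.isdigit():
            return "digit"
        return "newline" if ch == "\n" else "other"

    def derive_key(message, shared_secret):
        """Walk the DFA over `message` (a str), folding every visited state
        into the key derivation; `shared_secret` is bytes. Raises ValueError
        if the message is not in the language."""
        state = "start"
        h = hashlib.sha256(shared_secret)
        for ch in message:
            state = DFA.get((state, classify(ch)))
            if state is None:
                raise ValueError("message not in the specified regular language")
            h.update(state.encode())
        if state != "accept":
            raise ValueError("message not in the specified regular language")
        return h.digest()

Only a receiver that actually walks the automaton over the received bytes obtains the same digest as the sender, so skipping the check surfaces as an immediate interoperability failure rather than a silent vulnerability.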

[slides] [paper]

Jayakrishna Menon, Christophe Hauser, Yan Shoshitaishvili, and Stephen Schwab, "A binary analysis approach to retrofit security in input parsing routines."

Despite numerous attempts to mitigate memory corruption vulnerabilities in low-level code over the years, these vulnerabilities remain the most common vector of software exploitation today. Their common cause is the presence of errors in string manipulation, which are often found in input parsers, where the format of input data is verified and eventually converted into an internal program representation. This process, if done manually in an ad-hoc manner, is error-prone and easily leads to unsafe and potentially exploitable behavior. While principled approaches to input validation exist, such as those based on parser generators (e.g., the legacy Lex and the newer Ragel), these require a formalization of the input grammar, which is not always a straightforward process and tends to dissuade programmers. As a result, a large portion of the input parsing routines found in commodity software is still implemented in an ad-hoc way, causing numerous security issues. We propose to address this problem from a post-development perspective, by targeting software presenting security risks in opaque, closed-source environments where software components have already been deployed and integrated, and where re-implementation is not an option (e.g., as part of an embedded device's proprietary firmware).
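
The contrast between ad-hoc and grammar-driven parsing that the abstract draws can be pictured on a toy "NAME:LENGTH:PAYLOAD" record (the record format, field names, and functions here are hypothetical, purely for illustration):

    import re

    def parse_adhoc(data: bytes):
        # Ad-hoc slicing: trusts the length field and quietly accepts
        # malformed input instead of rejecting it.
        name, length, rest = data.split(b":", 2)
        return name, rest[: int(length)]

    # Grammar-first: the accepted format is written down explicitly.
    RECORD = re.compile(rb"\A([A-Za-z]{1,16}):([0-9]{1,4}):(.*)\Z", re.S)

    def parse_checked(data: bytes):
        m = RECORD.match(data)
        if m is None:
            raise ValueError("malformed record")
        name, length, payload = m.group(1), int(m.group(2)), m.group(3)
        if length != len(payload):
            raise ValueError("length field disagrees with payload")
        return name, payload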

[slides] [paper]

Rodrigo Branco, Kekai Hu, Henrique Kawakami, and Ke Sun, "A Mathematical Model of Exploitation and Mitigation Techniques Using Set Theory."

One of the most challenging problems in computer security is the formalization of vulnerabilities, exploits, mitigations, and their relationships. Despite various existing research efforts and theories, a mathematical model that can be used to quantitatively represent and analyze exploit complexity and mitigation effectiveness is still missing.

In this work, we introduce a novel way of modeling exploits and mitigation techniques with mathematical concepts from set theory and big-O notation. The proposed model establishes formulaic relationships between exploit primitives and exploit objectives, and enables the quantitative evaluation of vulnerabilities and security features in a system. We demonstrate the application of this model with two real-world mitigation techniques. It serves as the first step toward a comprehensive mathematical understanding and modeling of exploitation and mitigation, which will greatly benefit and facilitate the practice of systems security assessment.
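
The flavor of such set-based reasoning can be shown in a few lines of purely illustrative code (a toy, not the paper's formalism; every primitive, capability, and objective named below is hypothetical): primitives grant capabilities modeled as sets, an objective is reachable when the attacker's combined capabilities cover its requirements, and a mitigation removes capabilities from that union.

    # Illustrative toy only; names and relations below are hypothetical.
    PRIMITIVES = {
        "stack_overflow": {"write_linear", "control_return_address"},
        "format_string":  {"read_arbitrary", "write_arbitrary"},
        "info_leak":      {"read_arbitrary"},
    }

    OBJECTIVES = {
        "code_execution": {"control_return_address", "read_arbitrary"},
    }

    def reachable(primitives, objective, mitigations=frozenset()):
        capabilities = set().union(*(PRIMITIVES[p] for p in primitives))
        capabilities -= set(mitigations)
        return OBJECTIVES[objective] <= capabilities

    # A lone overflow is insufficient; adding a leak completes the objective;
    # a mitigation that removes return-address control blocks it again.
    assert not reachable({"stack_overflow"}, "code_execution")
    assert reachable({"stack_overflow", "info_leak"}, "code_execution")
    assert not reachable({"stack_overflow", "info_leak"}, "code_execution",
                         mitigations={"control_return_address"})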

[slides] [paper]

Erik Poll, "LangSec revisited: input security flaws of the second kind."

We consider a simple classification of input flaws in two categories: (1) flaws in processing input, with buffer overflows in parsers as the classic example and (2) flaws in forwarding input to some other system, a.k.a. injection flaws, with SQL injection and XSS as classic examples. The LangSec paradigm identifies common root causes for both categories of flaws, but much of the LangSec literature and efforts focus on the first category of flaws, especially on techniques to eliminate parser bugs. In this paper, we classify the existing approaches to tackling the second category of flaws, to identify their anti-patterns and firmly place these in the LangSec perspective.
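
The second category is easy to picture with the textbook SQL injection anti-pattern and its fix (a standard illustration, not an example taken from the paper): in the flawed version, untrusted input is forwarded as part of the SQL grammar; in the fixed version, it is bound strictly as data.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    def lookup_injectable(name):
        # Anti-pattern: the input becomes part of the forwarded language,
        # so "' OR '1'='1" changes the meaning of the query.
        query = "SELECT role FROM users WHERE name = '%s'" % name
        return conn.execute(query).fetchall()

    def lookup_parameterized(name):
        # Fix: the input is passed as data and never parsed as SQL.
        return conn.execute(
            "SELECT role FROM users WHERE name = ?", (name,)).fetchall()

    print(lookup_injectable("' OR '1'='1"))     # returns every row
    print(lookup_parameterized("' OR '1'='1"))  # returns no rows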

[slides] [paper]

Invited Talks

(Session chair: Julien Vanegue)

Julien Vanegue, "The Automated Exploitation Grand Challenge: A Five-Year Retrospective."

Five years ago, before the DARPA Cyber Grand Challenge had been announced, the author presented a set of criteria that captured what could be considered an unconditional success in automating software security exploitation for modern vulnerabilities. Dubbed the Automated Exploitation Grand Challenge (http://openwall.info/wiki/media/people/jvanegue/files/aegcvanegue.pdf), the list of eleven problems covering five distinct areas of automation included challenges on Exploit Specification, Input Generation, Exploit Composition, Environment Oracles, and State Space Management. Some of the major problems included Heap Likelihood Inference (including in the presence of non-determinism), the discovery of side-channel attacks (in particular timing side channels), and concurrency attacks.

Substantial progress has been made on these problems in the last five years. Still, creating an automated exploit generation framework for real targets in the presence of modern exploit mitigations remains a huge challenge. This talk revisits the problems posed five years ago and surveys the new tools and techniques that can now be leveraged, as well as new classes of security problems uncovered in the last five years that will impact the future of exploit automation.

[slides]

Thomas Dullien, "State machines, intended, emergent, and unintended computation."

Systems security, to a certain extent, is concerned with restricting what a general-purpose computer is allowed to do---namely, violate security requirements.

Computer science provides rich methods to reason about computations performed on various idealized and concrete machines that have been designed to facilitate either (or both) programming and reasoning about programs. In general, CS focuses on 'how to achieve a goal when things work properly'.

In the last 30 years, it has become evident that failures of systems security are usually the result of 'what happens when things do not work properly'. In most real-world situations, the effect of 'something going wrong' is the emergence of a new computational device with emergent semantics---the 'weird machine'. Attackers then leverage its computational power to violate security requirements.

Academic systems security has, for the most part, outwardly ignored the existence of the weird machine phenomenon, while inwardly focusing on making some variants of it less programmable (through various randomizing mitigations). Applied attackers have, for the most part, used the abstraction of the weird machine to guide their thinking, but do not care much about reducing its power.

This talk will explain how the 'weird machine' arises, and then argue that mitigating attacks means either restricting the computational power of the weird machines that can arise, or ensuring that they are practically (and, ideally, provably) unprogrammable.

[slides]

Gang Tan, "Bidirectional and executable specifications of machine code decoding and encoding."

Binary analysis, which analyzes machine code, requires a decoder for converting bits into the abstract syntax of machine instructions. Binary rewriting requires an encoder for converting instructions to bits. We describe a domain-specific language that allows the specification of machine code decoding and encoding. It features a bidirectional grammar (bigrammar) language that enables the specification of both decoding and encoding in a single grammar. With dependent types, a bigrammar enables the extraction of an executable decoder and encoder as well as a correctness proof showing their consistency. The bigrammar DSL is embedded in Coq with machine-checked proofs. We have used the bigrammar DSL to specify the decoding and encoding of subsets of the x86-32 and MIPS instruction sets. We have also extracted an executable decoder and encoder from the x86 bigrammar with competitive performance.
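
The consistency property that a bigrammar guarantees can be pictured with a toy Python rendering (the real DSL is embedded in Coq; the 8-bit two-register instruction format below is hypothetical): a single table drives both directions, and decode after encode is the identity.

    # Toy illustration of decode/encode consistency; not the Coq bigrammar DSL.
    OPCODES = {"ADD": 0b00, "SUB": 0b01}
    MNEMONICS = {v: k for k, v in OPCODES.items()}

    def encode(mnemonic, rd, rs):
        # opcode in bits 7-6, destination register in bits 5-3, source in 2-0
        return bytes([(OPCODES[mnemonic] << 6) | (rd << 3) | rs])

    def decode(raw):
        b = raw[0]
        return MNEMONICS[(b >> 6) & 0b11], (b >> 3) & 0b111, b & 0b111

    # Round-trip check over the whole (tiny) instruction space.
    for mnemonic in OPCODES:
        for rd in range(8):
            for rs in range(8):
                assert decode(encode(mnemonic, rd, rs)) == (mnemonic, rd, rs)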

This work is in collaboration with Greg Morrisett at Cornell University.

[slides]

Ben Delaware, "Using verified parsers and serializers to build format-aware fuzzers."

[slides]

Industry Case Studies and Work in Progress

Mark Tullsen, "Challenges and Possibilities for Secure ASN.1 Encoders and Decoders."

ASN.1 (Abstract Syntax Notation One) is a ubiquitous data description language that defines data and its encoding in a language-neutral and cross-platform framework. In many respects, ASN.1 does things right: (1) it is not limited to a single language or specific data representation, (2) it provides expressive data description features, (3) it allows for composing data descriptions, (4) it provides adopters with a choice of encoding schemes, and (5) it provides mechanisms for extensibility to allow protocols and formats to evolve gracefully.

However, ASN.1's complex and evolving language features, together with its complex and evolving encoding schemes, hinder adoption, increase the complexity of tools and compilers, and necessitate large and complex encoder/decoder implementations.

This complexity creates a large attack surface for adversaries. Ideally, we would like to take advantage of ASN.1's power while also ensuring that the encoding/decoding code is free from errors and vulnerabilities. Based on our experience at Galois providing "High Assurance ASN.1", we survey and assess a number of approaches to increasing the assurance of such code: roll-your-own solutions (sub-setting or replacing ASN.1); library-based solutions; using existing commercial compilers; using existing open-source compilers; development of a fully-compliant, open-source ASN.1 compiler (potentially verified, potentially certifying); extensive testing of encoders/decoders; and, last but not least, formal verification of encoders/decoders.
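
The kind of defensive check a trustworthy decoder must make at every step can be sketched for DER definite-length TLV headers (an illustration only, not Galois's implementation; it handles single-byte tags only and the read_tlv function is invented for this example):

    # Illustrative sketch of defensive DER TLV header parsing.
    def read_tlv(buf, offset=0):
        """Return (tag, value, next_offset) for one DER TLV at `offset`."""
        if offset + 2 > len(buf):
            raise ValueError("truncated TLV header")
        tag = buf[offset]
        first = buf[offset + 1]
        pos = offset + 2
        if first < 0x80:                 # short-form length
            length = first
        else:                            # long form: next (first & 0x7F) bytes
            n = first & 0x7F
            if n == 0 or n > 4:
                raise ValueError("indefinite or oversized length rejected")
            if pos + n > len(buf):
                raise ValueError("truncated length field")
            if buf[pos] == 0:
                raise ValueError("non-minimal length encoding rejected")
            length = int.from_bytes(buf[pos:pos + n], "big")
            if length < 0x80:
                raise ValueError("non-minimal length encoding rejected")
            pos += n
        if pos + length > len(buf):
            raise ValueError("length field exceeds available data")
        return tag, buf[pos:pos + length], pos + length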

[slides]

Joey Dodds, "Comprehensive Language and Protocol Verification at Galois."

In this talk, we present our approach to language verification. To be correct with respect to a language, we must consider both the parser and the protocol language. The parser must correctly represent textual or wire data to the rest of the system. A typical goal is a round-trip property: an arbitrary input object is transformed into a bitstream that decodes back into the same object. We prove this property interactively in the Lean theorem prover for arbitrary instantiations of an internal data type used in the Guardtime timestamping service.

We will then discuss protocol languages. Beyond parsing protocol messages, we must be sure that the parsed messages accurately reflect the abstract state machine that defines the protocol. We use our verification of the TLS handshake in Amazon’s s2n library as an example of this proof style. In this proof, we show that existing C code is equivalent to a high-level specification of the TLS handshake state machine. From there, we can also prove higher-level properties of protocols (does the state machine itself make sense?). We discuss such properties that we have proved about both s2n and Guardtime.
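
A toy version of the property being checked might look like the following (illustrative only; the real work proves existing C code equivalent to such a specification, and the states and messages below are a simplified, hypothetical TLS-like subset): the handshake is an explicit transition table, and every parsed message is validated against it before the state advances.

    TRANSITIONS = {
        ("START",         "ClientHello"): "NEGOTIATED",
        ("NEGOTIATED",    "ServerHello"): "WAIT_FINISHED",
        ("WAIT_FINISHED", "Finished"):    "CONNECTED",
    }

    class HandshakeError(Exception):
        pass

    class Handshake:
        def __init__(self):
            self.state = "START"

        def handle(self, message_type):
            nxt = TRANSITIONS.get((self.state, message_type))
            if nxt is None:
                # A well-formed message arriving in the wrong state is
                # rejected just like a malformed one.
                raise HandshakeError(f"{message_type} not allowed in {self.state}")
            self.state = nxt

    hs = Handshake()
    for msg in ("ClientHello", "ServerHello", "Finished"):
        hs.handle(msg)
    assert hs.state == "CONNECTED"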

[slides]

Adam Crain and Prashant Anantharaman, "LangSec for Critical Infrastructure: SSP21, a secure lightweight SCADA protocol."

Designed according to LangSec principles, SSP21 is an open source development effort to create a modern secure encapsulation layer specifically for industrial control system protocols. It is intended to fill a technology gap where existing technologies like TLS are not applicable, namely for serial communication channels and endpoints with limited bandwidth and/or processing capabilities.

All SSP21 protocol messages and certificate formats are modeled in a machine-readable format, and a code generator is used to create parsers and unparsers. In this presentation we will discuss the SSP21 message format, the certificate format, the (un)parser generator, how LangSec principles were applied, how SSP21 was resilient to fuzzing, and how it integrates with existing protocols and the TCP/IP protocol suite. SSP21 is moving towards standardization in 2018. It includes a reference implementation licensed under BSD-3 that is intended to accelerate adoption and interoperability.
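
The code-generation step can be pictured with a toy sketch (the field names and layout below are hypothetical, not SSP21's actual message definitions): a single declarative field list drives both the parser and the unparser, so the two cannot drift apart.

    import struct

    # Hypothetical message description: (field name, struct format) pairs.
    MESSAGE_SPEC = [
        ("function_code", "B"),
        ("sequence",      "H"),
        ("payload_crc",   "I"),
    ]

    FMT = ">" + "".join(fmt for _, fmt in MESSAGE_SPEC)  # big-endian, fixed size

    def parse(data: bytes) -> dict:
        if len(data) != struct.calcsize(FMT):
            raise ValueError("wrong message length")
        values = struct.unpack(FMT, data)
        return dict(zip((name for name, _ in MESSAGE_SPEC), values))

    def unparse(fields: dict) -> bytes:
        return struct.pack(FMT, *(fields[name] for name, _ in MESSAGE_SPEC))

    msg = {"function_code": 1, "sequence": 42, "payload_crc": 0xDEADBEEF}
    assert parse(unparse(msg)) == msg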

SSP21's design shows how the pitfalls of the DNP3 protocol and the newer IEC 61850 protocol can be avoided. For DNP3, Sistrunk and Crain demonstrated in 2014 a variety of input-handling issues across multiple implementations and vendors, related to the protocol's complex syntax and occasional ambiguity in its specification. As the industry transitions from DNP3 to IEC 61850, concerns remain regarding the complexity of 61850's messages, since it relies on ASN.1 notation and BER serialization. The gains from moving to a machine-readable message structure are mostly lost due to the complexity of the grammar and to anti-patterns in the serialization format, such as nested length fields. SSP21 shows how non-trivial structural benefits can be reaped without extra parsing complexity.

[slides]

Joe Rozner, "Building a Re-targetable Grammar Based Fuzzer."

(work-in-progress report)

[slides]

LangSec Roundtable

2018: Have we reached peak LangSec yet?

The most famous vulnerabilities of 2017 were Meltdown and Spectre. They seem to mark a departure from the LangSec input-driven anti-patterns dominating the vulnerability landscape---or do they?