Denial of service attacks are (possibly coordinated) efforts to disrupt a networked service by overloading it with artificial requests, leaving it unable to fulfill some or all legitimate requests. The simplest form of denial of service (DoS) is a flood of typical requests to a service that is greater than its capacity to serve them. Depending on the target, this may require a large investment in resources on the attacker side as services are often tuned to handle typical requests at scale. Attacks are generally much more effective when special tricks or exploits are used to amplify the impact. Wikipedia currently lists at least 23 such techniques for attack amplification.
A classic example of a DoS attack is the one misusing Network Time Protocol servers. It involved sending very tiny so-called monlist requests to public time servers. Since the NTP protocol runs over UDP, a low-level network protocol which does not set up a connection verifying the return address of the sender, it was possible to spoof the return address of the requests to that of the victim. Unpatched time servers would then send details of the last 600 hosts that requested the time to the spoofed return address, in which they would unknowingly participate in a distributed DoS attack, amplifying the magnitude of the attack by a factor of 556.9.
In this blog post, we will investigate a different class of attacks, and more specifically how data parsers can be exploited to create DoS attacks and how one can defend against them. The idea is to send specifically crafted requests that are very costly to process and may deplete a service's capacity much quicker than typical requests. Such attacks target the processing resources of a service rather than its limited network bandwidth. Particularly successful attacks may be able to take down services with a single attacking machine on a household connection.
how it works
In order to process a request, a service must first decode or parse it according to some specification. Wikipedia,
for example, uses a markup language known as Wikitext for its entries. Whenever an entry is updated, they must parse
the new wikitext and render it to HTML which can be displayed in the browser. If you are unfamiliar with markup languages,
here's how internal links to other pages are encoded using double brackets:
London has [[public transport]]. That
markup is then rendered to
London has <a href="https://en.wikipedia.org/wiki/Public_transport">public transport</a>,
which can be displayed in browsers. Wikitext is just one example; another popular markup language is markdown, which
is used everywhere from GitHub and online forums to this blog.
The way such markup is parsed can be a surface for attack when it scales poorly on certain inputs. Consider for example markup languages that use so-called fish brackets for links. Those may be scanned as below. Candidate spans are marked blue, and confirmed link spans are marked green.
This strategy works well enough for most texts, but it can exhibit some peculiar behavior when links are opened and never closed. When it cannot find a closing fish bracket, it has to backtrack to the token following the opening bracket and continue parsing from there. For the subsequent opening brackets, it must again scan until the end, only to fail and having to backtrack once more:
Note that second example is half the length of the first one, but it requires almost three times the number of probes. The problem is that we scan until the end of the text many times over while making only very little progress. Indeed, let be the length of the text, then token probes are required in total. By recognizing the arithmetic sum, this can be written as
Since the most significant term is quadratic, we say that such parsers have quadratic time complexity. This means that the time spent parsing longer and longer inputs can quickly grow out of control. One could think of this as an attack amplifier; even with bounded input size, the resources required by an attacker to bring down a service can be greatly reduced by exploiting quadratic parsing behavior. When the service accepts input of arbitrary size, the amplification factor is even unbounded!
patching quadratic behavior
The link scanning vulnerability can be fairly trivially be patched by setting a flag whenever we've reached the end of the text scanning for a closing fish bracket. When that flag is set, all subsequent opening fish brackets can be skipped immediately since we know that we won't find a matching closing bracket.
More generally, superlinear complexity emerges whenever the amount of progress is not proportional to the amount of work invested. This usually happens when spans have separate opening and closing delimiters and opening delimiters are allowed inside of the span. When you are in control of the specification, you can disallow these things on the spec level to prevent these behaviors. One way is to use the same token to mark the start and end of a span. Many markup languages accept asterisk delimiters for emphasis. This is generally fine:
When implementing a spec not completely in your own control, you'll have to try find a clever implementation that adheres to the spec with linear time complexity. This may not always be possible.
Even when a span's opening and closing delimiters are identical, it may still be possible to create malicious inputs that
cause superlinear behavior. For example, CommonMark allows code spans that are delimited by same-length runs of backticks.
```f(x)``` is rendered to
<code>f(x)</code> for instance. Starting code spans inside other open code spans
with the same delimiter will be interpreted as a closing delimiter and proportional progress will be made. However, since
these delimiters may be arbitrarily long, we can stack delimiters of increasing length which will never match
and will hence cause repeated backtracking:
The analysis of the scaling behavior is a bit more involved here. One could start with considering the number of delimiters . The scaling function is then identical to the quotient of the number of probes as a function of and the total length as a function of . The number probes ends up being cubic in , whereas the total length is quadratic (another arithmetic sum). Consequently, the number of probes scales with the power of the total text length. While not nearly as bad as quadratic scaling, it is still considered a vulnerability as it is superlinear.
Patching this is more difficult as well. The pulldown-cmark parser builds an index of all delimiters while scanning for a matching delimiter. If the scan ends without a match, that index can be used for subsequent searches to find matching delimiters without scanning the text again.
testing for vulnerabilities
Testing for known adversarial examples is relatively straight-forward by benchmarking the time it takes to parse such strings of increasing lengths. If the relative increase in parse time is significantly greater than the factor by which the input has grown, a regression is likely and the test should fail. Parser projects can include such tests in their continuous integration process to automatically detect performance regressions similar to how correctness is commonly tested.
It is often also possible to search for new vulnerabilities by testing repetitions of random short inputs containing tokens that are meaningful in the language at hand. However, such searches are unlikely to be conclusive. Since the number of possible token strings explodes exponentially with length, this method can only exhaustively test very short inputs with simple structure. It may be possible that some vulnerabilities require either a very specific token sequence that is unlikely to be sampled or a structure that is more complex than simple repetitions.
bounded lookahead grammars
The most rigorous defense against parser denial of service can be established by ensuring that the language belongs to a certain safe family. Many programming languages and file formats can be defined using formal context-free grammars, often presented in Backus–Naur form. Parser generators can automatically create parsers from such definitions. When these grammars belong to families with bounded lookahead like LALR(k) or LL(k), one never needs to look ahead more than tokens to classify a given token. The required number of probes is then trivially bounded by and hence parsers for such grammars require no more than linear time in the size of their input. Examples of such grammars are common SQL, XML and HTML. Do note that services accepting such languages may still be vulnerable to superlinear behavior, just not due to parsing.
Markdown notoriously lacks a formal grammar, largely because of its complex emphasis parsing rules, of which there are no fewer than 17. It is a specification that only in rare cases accounts for performance issues, and many guards must be embedded in its implementations to mitigate DoS attacks. As a result, when a new attack is found for the CommonMark spec, many parsers happen to be vulnerable.
As for cyber security in general, the adversarial nature of finding and patching vulnerabilities can be a lot of fun. We encourage you to only use knowledge of vulnerabilities to defend yourself and others from denial of service attacks, and to disclose them responsibly.
The code for the parser visualization is available on GitHub.