Published on Wed 30 October 2024 by @sigabrt9
TL;DR: A case study in using AFL++, afl-cov and basic custom harnesses to find a bug in libsoup for a public bug bounty program.
Introduction
Bug bounty programs that cover open-source software allow hunters to set up a fully custom environment to experiment with, which sometimes leads to the discovery of interesting bugs with little time investment.
One of the first things I like to do on these programs is to write basic harnesses and run the existing ones. This helps me learn how to configure the project, get familiar with the code, and sometimes, with a bit of luck, find some low-hanging fruit.
This blog post covers the public GNOME bug bounty program hosted on YesWeHack, and more specifically the libsoup library.
libsoup is an HTTP client/server library for GNOME that integrates with GLib. It was an easy choice within the scope, as it is easy to identify which inputs could come from a user.
For those who want to reproduce or follow along, the commit used is 3c54033634ae537b52582900a7ba432c52ae8174.
The setup
Getting open-source software to compile the way you want with your fuzzing tools of choice can be tedious. With AFL++, working inside a Docker container is recommended, as trial-and-error testing may otherwise break packages on your own installation.
The AFL++ repository offers an initial Dockerfile that installs basically everything we may want and is a good starting point to compile an instrumented libsoup. While there are probably better ways to compile libsoup with AFL, a quick-and-dirty Dockerfile such as the following gives a usable base:
FROM aflplusplus/aflplusplus
RUN apt update && apt install -y libnghttp2-dev libsqlite3-dev libpsl-dev glib-networking tmux
WORKDIR /fuzzing/
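## Get libsoup (pinned to the commit used in this post) and compile it with afl-clang-fast and ASan
RUN git clone https://gitlab.gnome.org/GNOME/libsoup.git libsoup && git -C libsoup checkout 3c54033634ae537b52582900a7ba432c52ae8174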
ENV CC=/AFLplusplus/afl-clang-fast
ENV CXX=/AFLplusplus/afl-clang-fast++
ENV CFLAGS='-fsanitize=address'
ENV CXXFLAGS='-fsanitize=address'
WORKDIR libsoup
RUN meson setup _build --prefix /fuzzing/build_libsoup && meson install -C _build
WORKDIR /fuzzing/
The libsoup maintainers already fuzzed some API functions with four different harnesses that can be found in the fuzzing folder. It can be interesting to start by using the provided harnesses to see which functions are reached and if it is possible to achieve broader coverage. In this case however, I did not use those harnesses because I wanted to code my own.
In general, functions with "parse" or "decode" in their name are interesting targets. Here, both soup_headers_parse_request and soup_headers_parse_response seem to be perfect targets: as their names suggest, they parse data coming from the network, which potentially means untrusted data. Moreover, both functions take a const char *str and an int len as parameters, which makes it easy to write libFuzzer-style harnesses that pass the fuzzer's buffer straight through. Such harnesses are also fully compatible with AFL++'s persistent mode, which avoids (among other things) the overhead of restarting the executable for every input.
Here’s the first dummy harness:
#include <libsoup/soup.h>

int
LLVMFuzzerTestOneInput (const unsigned char *data, size_t size)
{
    SoupMessageHeaders *req_headers;
    guint ret;

    req_headers = soup_message_headers_new (SOUP_MESSAGE_HEADERS_REQUEST);
    /* Parse the fuzzer input as a raw HTTP request (request line + headers) */
    ret = soup_headers_parse_request ((const char *) data, size, req_headers, NULL, NULL, NULL);
    soup_message_headers_unref (req_headers);
    return 0;
}
Adding a small dictionary of known HTTP headers can also help. Within the libsoup source code, the file soup-headers-names.c contains all the supported headers. For the corpus, the file header-parsing-test.c contains interesting cases to give the fuzzer. More generally, the test files within a project are a good place to look for harness ideas and corpus samples.
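The header list from soup-headers-names.c could be turned into an AFL++ dictionary with a small helper. The sketch below is a hypothetical example, not part of libsoup or AFL++: the header_N keyword names are arbitrary, and the caller is assumed to supply the header names (for instance copied from that file). It follows the AFL++ dictionary syntax of one keyword="value" entry per line, escaping quotes, backslashes and non-printable bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one AFL++ dictionary line such as: header_0="Content-Type"
 * Quotes and backslashes are backslash-escaped and non-printable bytes are
 * written as \xNN. Returns the length written (without the NUL terminator),
 * or -1 if out is too small. The "header_N" keyword names are arbitrary. */
static int format_dict_entry(char *out, size_t out_size, size_t index,
                             const char *token)
{
    assert(out != NULL && token != NULL);

    int n = snprintf(out, out_size, "header_%zu=\"", index);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    size_t pos = (size_t)n;

    for (const unsigned char *p = (const unsigned char *)token; *p != '\0'; p++) {
        char esc[5]; /* longest escape is "\xNN" plus NUL */
        int len;

        if (*p == '"' || *p == '\\')
            len = snprintf(esc, sizeof esc, "\\%c", *p);
        else if (*p < 0x20 || *p > 0x7e)
            len = snprintf(esc, sizeof esc, "\\x%02x", *p);
        else
            len = snprintf(esc, sizeof esc, "%c", *p);

        /* keep room for the closing quote and the NUL terminator */
        if (pos + (size_t)len + 2 > out_size)
            return -1;
        memcpy(out + pos, esc, (size_t)len);
        pos += (size_t)len;
    }

    if (pos + 2 > out_size) /* empty token in a tiny buffer */
        return -1;
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}

/* Write one dictionary line per header name to the given stream. */
static void write_header_dict(FILE *stream, const char *const *names, size_t count)
{
    char line[256];

    for (size_t i = 0; i < count; i++) {
        if (format_dict_entry(line, sizeof line, i, names[i]) >= 0)
            fprintf(stream, "%s\n", line);
    }
}
```

For example, calling write_header_dict with the names "Content-Type" and "Host" prints header_0="Content-Type" and header_1="Host"; redirected to a file, the output can be passed to afl-fuzz with its -x option.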
I did not find interesting bugs with this harness, but it is still worth reviewing the coverage it reached. The stats on the AFL++ status screen are useful for comparing harnesses and fuzzers, but not very helpful here, as they cannot confirm that a specific function was hit or that all the code within a function was tested. afl-cov (and line coverage tools in general) shows exactly which portions of the code the fuzzer reached, which is far more meaningful in this case.
The goal is then simple: add more and more harnesses to cover as much code as possible.
For example, starting from the same harness, we can add a call to soup_message_headers_get_content_type:
#include <libsoup/soup.h>

int
LLVMFuzzerTestOneInput (const unsigned char *data, size_t size)
{
    SoupMessageHeaders *req_headers;
    guint ret;

    req_headers = soup_message_headers_new (SOUP_MESSAGE_HEADERS_REQUEST);
    ret = soup_headers_parse_request ((const char *) data, size, req_headers, NULL, NULL, NULL);
    /* Only query the parsed headers when the request was accepted */
    if (ret == SOUP_STATUS_OK) {
        soup_message_headers_get_content_type (req_headers, NULL);
    }
    soup_message_headers_unref (req_headers);
    return 0;
}
Adding the following lines to the Dockerfile clones libsoup one more time and compiles it with gcc and the --coverage flag:
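## Get libsoup again (same commit) and compile it with gcc coverage instrumentation
RUN git clone https://gitlab.gnome.org/GNOME/libsoup.git libsoup-cov && git -C libsoup-cov checkout 3c54033634ae537b52582900a7ba432c52ae8174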
ENV CC=gcc
ENV CXX=g++
ENV CFLAGS=--coverage
ENV CXXFLAGS=--coverage
WORKDIR libsoup-cov
RUN meson setup --prefix /fuzzing/build_libsoup_gcov _build && meson install -C _build
Since gcc does not provide a libFuzzer-style driver, the harness needs its own main() that reads an input file and passes it to the harness function, for example:
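#include <stdio.h>
#include <stdlib.h>
#include <libsoup/soup.h>

/* Same logic as the fuzzing harness, renamed since no libFuzzer driver is linked */
int
libfuzzer_harness (const unsigned char *data, size_t size)
{
    SoupMessageHeaders *req_headers;
    guint ret;

    req_headers = soup_message_headers_new (SOUP_MESSAGE_HEADERS_REQUEST);
    ret = soup_headers_parse_request ((const char *) data, size, req_headers, NULL, NULL, NULL);
    if (ret == SOUP_STATUS_OK) {
        soup_message_headers_get_content_type (req_headers, NULL);
    }
    soup_message_headers_unref (req_headers);
    return 0;
}

/* Minimal driver: read one input file (e.g. an AFL++ queue entry) and replay it */
int
main (int argc, char *argv[])
{
    FILE *file;
    long size;
    unsigned char *data;

    if (argc < 2) {
        fprintf (stderr, "usage: %s <input file>\n", argv[0]);
        return -1;
    }
    file = fopen (argv[1], "rb");
    if (file == NULL)
        return -1;

    fseek (file, 0, SEEK_END);
    size = ftell (file);
    fseek (file, 0, SEEK_SET);
    if (size < 0) {
        fclose (file);
        return -1;
    }

    /* malloc(0) may return NULL, so allocate at least one byte */
    data = malloc (size > 0 ? (size_t) size : 1);
    if (data == NULL) {
        fclose (file);
        return -1;
    }
    if (fread (data, 1, (size_t) size, file) != (size_t) size) {
        free (data);
        fclose (file);
        return -1;
    }

    libfuzzer_harness (data, (size_t) size);

    free (data);
    fclose (file);
    return 0;
}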
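afl-cov then replays the fuzzer's queue through this binary (the command is given with its --coverage-cmd option, where AFL_FILE stands for the input path) and should generate an HTML report summarizing the coverage the fuzzer reached. It is also possible to use lcov with clang and call the libFuzzer harness directly, but I really like afl-cov's --live feature, which processes the AFL++ output directory while the fuzzer is still running.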
For example, the following screenshot shows the coverage result for the harness calling soup_headers_parse_request: lines highlighted in red were never executed by the fuzzer, and lines in blue were. Note, however, that afl-cov only replays the inputs saved in the queue, i.e., those that produced new coverage. The execution counts on the left (which are supposed to show how many times each line ran) therefore underestimate the real numbers, since the many inputs that did not produce new coverage were never replayed.
A crash
Reviewing the coverage showed that the function parse_content_foo (its actual name in the source code) was hit, but not all of its lines were executed by the fuzzer:
parse_content_foo seems to always be called with a NULL params argument, and returns early. More coverage can be reached simply by passing the address of a GHashTable pointer instead of NULL to soup_message_headers_get_content_type. The coverage for soup_message_headers_get_content_type itself also made it obvious that the fuzzer was missing something:
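#include <libsoup/soup.h>

int
LLVMFuzzerTestOneInput (const unsigned char *data, size_t size)
{
    SoupMessageHeaders *req_headers;
    guint ret;
    /* Start as NULL: params is left unchanged when there is no usable Content-Type */
    GHashTable *params = NULL;

    req_headers = soup_message_headers_new (SOUP_MESSAGE_HEADERS_REQUEST);
    ret = soup_headers_parse_request ((const char *) data, size, req_headers, NULL, NULL, NULL);
    if (ret == SOUP_STATUS_OK) {
        /* Passing &params instead of NULL makes libsoup parse the Content-Type parameters */
        soup_message_headers_get_content_type (req_headers, &params);
    }
    if (params != NULL)
        g_hash_table_destroy (params);
    soup_message_headers_unref (req_headers);
    return 0;
}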
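The effect of that NULL argument can be reproduced without libsoup. The sketch below uses a hypothetical, simplified function, parse_media_type (not part of libsoup and not the real parse_content_foo), whose parameter-parsing loop only runs when the caller supplies an out-pointer, mirroring the early return described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a "type; param=value" header parser.
 * It returns the length of the media type (the text before the first ';').
 * The parameter list is only walked when param_count is non-NULL, so a
 * harness that always passes NULL never executes that loop. */
static size_t parse_media_type(const char *value, size_t *param_count)
{
    assert(value != NULL);

    const char *semi = strchr(value, ';');
    size_t type_len = semi ? (size_t)(semi - value) : strlen(value);

    if (param_count == NULL)
        return type_len; /* early return: parameter parsing is skipped */

    *param_count = 0;
    for (const char *p = semi; p != NULL; p = strchr(p + 1, ';'))
        (*param_count)++; /* count each ';'-separated parameter */

    return type_len;
}
```

Calling it with NULL always takes the early return, so no amount of fuzzing reaches the loop, while passing the address of a counter exercises it. This is the same reason the harness above passes &params.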
After a few seconds, an interesting crash arises:
The crash can be replayed directly with the harness compiled with afl-clang-fast:
An out-of-bounds heap write is a powerful primitive that can lead to remote code execution in many contexts. However, the bug bounty program's scope is the library itself, so finding a real-life target and exploiting this vulnerability on it is usually not required to get the bug fixed, and may not be worth the extra time (depending on your objectives).
When reporting, it can be helpful to provide a Dockerfile so that triagers can easily set up the same environment. One can easily be derived from the previous Dockerfile.
I did not go too deep into the root cause analysis; I only showed, using the harness code modified for afl-cov (built without the --coverage flag), that this bug could overwrite the size field of the next heap chunk. Beyond that, a complete exploit would depend heavily on the code of the binary using libsoup.
The last step is to confirm that this vulnerability can be triggered from the network. To do so, we can take the simple-httpd.c example and apply the following diff, which adds a call to soup_message_headers_get_content_type (or any other function that can reach the decode_rfc5987 function). soup_message_headers_get_content_type seems to be a perfect fit in this case, as getting the content type of a request (or a response) is a fairly common use case. Once again, I did not go too deep into figuring out which API calls could trigger the bug.
200a201,202
> GHashTable *params = NULL;
> const char *content_type;
220c222,223
<
---
> content_type = soup_message_headers_get_content_type(soup_server_message_get_request_headers(msg),&params);
> g_print ("%s\n", content_type);
The proof of concept for the crash and the patched simple-httpd.c server can be found here.
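As its name suggests, decode_rfc5987 presumably handles RFC 5987 extended parameter values such as title*=UTF-8'en'%e2%82%ac%20rates, where a charset and an optional language precede a percent-encoded value. The sketch below is a hypothetical, independent decoder: ext_value_decode is not a libsoup function, and this is not an analysis of the bug's root cause. It only accepts the UTF-8 charset and does not reject characters outside RFC 5987's attr-char set; its purpose is to show that every %XX escape and every output byte needs its own bounds check, which is the kind of code where out-of-bounds writes tend to appear.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit (including NUL). */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical sketch (not libsoup's decode_rfc5987): decode an ext-value
 * of the form charset'language'value into raw bytes. Only "UTF-8" (any case)
 * is accepted as charset; the language tag is skipped without validation.
 * Every write into out is checked against out_size; on success out is
 * NUL-terminated, *out_len holds the decoded length, and 0 is returned.
 * Returns -1 on malformed input or when out is too small. */
static int ext_value_decode(const char *in, char *out, size_t out_size,
                            size_t *out_len)
{
    assert(in != NULL && out != NULL && out_len != NULL);
    if (out_size == 0)
        return -1;

    const char *q1 = strchr(in, '\'');
    if (q1 == NULL)
        return -1;
    const char *q2 = strchr(q1 + 1, '\'');
    if (q2 == NULL)
        return -1;

    /* case-insensitive match of the charset against "utf-8" */
    static const char utf8[] = "utf-8";
    if ((size_t)(q1 - in) != sizeof utf8 - 1)
        return -1;
    for (size_t i = 0; i < sizeof utf8 - 1; i++) {
        if (tolower((unsigned char)in[i]) != utf8[i])
            return -1;
    }

    size_t pos = 0;
    for (const char *p = q2 + 1; *p != '\0'; p++) {
        unsigned char byte;

        if (*p == '%') {
            /* p[2] is only read if p[1] was a hex digit (so not the NUL) */
            int hi = hex_value((unsigned char)p[1]);
            int lo = hi < 0 ? -1 : hex_value((unsigned char)p[2]);
            if (lo < 0)
                return -1; /* truncated or invalid %XX escape */
            byte = (unsigned char)(hi * 16 + lo);
            p += 2;
        } else {
            byte = (unsigned char)*p;
        }

        if (pos + 1 >= out_size) /* keep room for the NUL terminator */
            return -1;
        out[pos++] = (char)byte;
    }

    out[pos] = '\0';
    *out_len = pos;
    return 0;
}
```

For example, decoding UTF-8''%e2%82%ac%20rates yields the 9 bytes of "€ rates", while a truncated escape such as UTF-8''%e, or an output buffer one byte too short, makes the function return -1 instead of writing past the end.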
Conclusion
While fuzzing compiled languages such as C and C++ is already common in open-source software development, a quick and simple setup with basic harnesses can still yield results, and in this case a nice and easy bounty.
In the future, I would like to fuzz code written in languages such as Ruby, Java and Python, with tools like Ruzzy, Jazzer and Atheris (among others), to find logic bugs in open-source libraries that I can then replay on bug bounty programs.
Timeline
2024-08-15: Report sent to the STF GNOME bug bounty
2024-08-16: Report acknowledged by STF
2024-08-20: Further clarification requested & provided
2024-09-11: First bounty awarded
2024-09-11: Further clarification provided
2024-09-12: Second part of the bounty awarded