How to perform digital forensics of malicious PDF files? Easily checking if a PDF document has malware or backdoors

The PDF format has become one of the most popular ways to share and view documents, as it is compatible with all kinds of devices, including desktop computers, laptops, tablets and smartphones. Because of this ubiquity, threat actors began using these documents to deliver malware and launch other attack variants.

This time, specialists from the International Institute of Cyber Security (IICS) will show you how to apply digital forensics to analyze PDF documents and determine if they are compromised with any variant of malicious content.

Before going further, it is worth recalling that a PDF-based attack chain usually begins with malicious documents sent by email. When such a document is opened on the affected system, in most cases JavaScript code runs in the background; this code can exploit vulnerabilities in tools such as Adobe Acrobat Reader or store executable files for later attack stages.

PDF documents, whether legitimate or malicious, have four main elements, digital forensics experts note:

  • Header: Contains the PDF version of the document and other general data
  • Body: Holds the objects of the document, including the streams that are used to store data
  • Cross-reference table: Lists the location of each object in the file
  • Trailer: Points to the cross-reference table
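
To make these four elements concrete, here is a minimal Python sketch that locates them in the raw bytes of a file. The function name locate_pdf_structure and the tiny in-memory document are hypothetical and exist only for illustration; the sketch assumes a classic cross-reference table and is not a full PDF parser.

```python
import re

def locate_pdf_structure(data: bytes) -> dict:
    """Report the PDF version, object count, and xref/trailer offsets."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    # The number after "startxref" is the byte offset of the cross-reference table.
    startxref = re.findall(rb"startxref\s+(\d+)", data)
    return {
        "version": header.group(1).decode() if header else None,
        "object_count": len(re.findall(rb"\d+\s+\d+\s+obj\b", data)),
        "xref_offset": int(startxref[-1]) if startxref else None,
        "trailer_offset": data.rfind(b"trailer"),
    }

# Hypothetical one-object document built in memory for illustration.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
sample = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(len(body)).encode() + b"\n%%EOF\n"
)
info = locate_pdf_structure(sample)
print(info)
```

Files that use cross-reference streams instead of a classic table have no "trailer" keyword, in which case trailer_offset would be -1 in this sketch.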

With these basics of PDF-based attacks in place, we can review the different ways to analyze these elements.

PDF scanning using PDFiD

PDFiD is a component of the Didier Stevens Suite that scans PDF documents against a list of strings to detect JavaScript elements, embedded files, and actions triggered when a file is opened, and to count how often specific keywords appear in a document.

In this example, PDFiD detected several objects, streams, JavaScript code, and OpenAction elements in the Report.pdf file. According to digital forensics experts, the presence of these elements suggests that the analyzed file contains JavaScript or Flash scripts. The /EmbeddedFile element indicates the presence of other file formats within the PDF, while the /OpenAction, /AA, and /AcroForm elements can initiate automatic actions when the file is opened.
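
The keyword-counting idea behind this kind of scan can be sketched in a few lines of Python. This is not PDFiD itself, only a simplified illustration: the keyword list is a small subset chosen for the example, count_keywords and decode_name_escapes are hypothetical names, and the sample bytes are invented.

```python
import re

# Subset of names worth flagging; chosen for this example only.
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/AcroForm", b"/EmbeddedFile"]

def decode_name_escapes(data: bytes) -> bytes:
    # PDF names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)

def count_keywords(data: bytes) -> dict:
    """Count each keyword, ignoring longer names that merely start with it."""
    data = decode_name_escapes(data)
    counts = {}
    for kw in KEYWORDS:
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts

# Invented sample: one plain and one obfuscated /JavaScript name.
sample = (
    b"1 0 obj\n<< /OpenAction 2 0 R /AcroForm 3 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    b"4 0 obj\n<< /S /J#61vaScript /JS 5 0 R >>\nendobj\n"
)
counts = count_keywords(sample)
print(counts)
```

Decoding the #xx escapes across the whole file is a simplification that can produce false positives inside binary streams; a non-zero count is a reason to look closer, not proof of malicious content.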

View the contents of PDF objects

We already know that there is JavaScript code inside the scanned PDF file. This will be the starting point of the investigation; to find the indirect JavaScript object, run a PDF parser such as pdf-parser.py, another component of the Didier Stevens Suite.
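
As a rough illustration of what looking for an indirect JavaScript object means, the following Python sketch splits a file into objects and reports which ones mention /JS or /JavaScript, together with the objects they reference. It is an assumption-laden stand-in, not the tool used in this analysis: find_javascript_objects is a hypothetical name, the sample is invented, and the regular expressions would fail on a stream that happens to contain the text "endobj".

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")
JS_RE = re.compile(rb"/(JS|JavaScript)(?![A-Za-z0-9])")

def find_javascript_objects(data: bytes) -> dict:
    """Map each object that mentions JavaScript to the objects it references."""
    hits = {}
    for match in OBJ_RE.finditer(data):
        number, body = int(match.group(1)), match.group(2)
        if JS_RE.search(body):
            hits[number] = [int(r.group(1)) for r in REF_RE.finditer(body)]
    return hits

# Invented sample: object 2 holds the action, object 3 holds the script.
sample = (
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS 3 0 R >>\nendobj\n"
    b"3 0 obj\n<< /Length 13 >>\nstream\napp.alert(1);\nendstream\nendobj\n"
)
hits = find_javascript_objects(sample)
print(hits)
```

In this invented sample the script is not stored in the object that declares it; the declaring object only references another object, which is why the referenced object numbers are reported as well.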

Based on the result of these scans, the hidden JavaScript code will execute the malware every time the file is opened, so the next step is to extract the malicious payload.

Extracting embedded files using Peepdf

Peepdf is a Python tool that contains all the components needed to validate and analyze PDF files, digital forensics experts mention. To take full advantage of its capabilities, enter the peepdf -i file_name.pdf command. The -i option enables the interactive mode of the script:

To find more features, enter the --help command:

The scan result indicates that there is a file embedded in object 14. A closer inspection of this object shows that it points to object 15; in turn, object 15 points to object 16. Finally, there are indications of a malicious file in object 17.

According to the content of the PDF, the document contains only one stream, which also points to object 17. Therefore, object 17 is a stream containing an embedded file.
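
The walk from object 14 to object 17 can be sketched in Python as a simple reference-following loop. The dictionaries in the sample are invented to mirror the chain described above, follow_references is a hypothetical helper, and taking the first unvisited reference at each step is a simplification that real documents may not satisfy.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")

def follow_references(data: bytes, start: int) -> list:
    """Follow indirect references from `start` until an object holds a stream."""
    objects = {int(m.group(1)): m.group(2) for m in OBJ_RE.finditer(data)}
    chain = [start]
    while chain[-1] in objects and b"stream" not in objects[chain[-1]]:
        refs = [int(m.group(1)) for m in REF_RE.finditer(objects[chain[-1]])]
        unvisited = [r for r in refs if r in objects and r not in chain]
        if not unvisited:
            break  # dead end: nothing new to follow
        chain.append(unvisited[0])
    return chain

# Invented objects that reproduce the 14 -> 15 -> 16 -> 17 chain.
sample = (
    b"14 0 obj\n<< /EmbeddedFiles 15 0 R >>\nendobj\n"
    b"15 0 obj\n<< /Names [(payload.bin) 16 0 R] >>\nendobj\n"
    b"16 0 obj\n<< /Type /Filespec /F (payload.bin) /EF << /F 17 0 R >> >>\nendobj\n"
    b"17 0 obj\n<< /Length 4 >>\nstream\nMZ\x90\x00\nendstream\nendobj\n"
)
chain = follow_references(sample, 14)
print(chain)
```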

Stream 17 begins with the file signature MZ, which corresponds to the hexadecimal values 4d 5a. According to digital forensics experts, this signature indicates a Windows executable file.
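
The signature check can be reproduced with a short Python sketch. It assumes the stream is either stored as-is or compressed with the common /FlateDecode filter; other filters are not handled. The helper names and the sample payload are hypothetical.

```python
import re
import zlib

def extract_stream(obj_body: bytes) -> bytes:
    """Return the decoded stream data of one object (FlateDecode or raw only)."""
    match = re.search(rb"stream\r?\n(.*?)endstream", obj_body, re.DOTALL)
    if match is None:
        raise ValueError("object has no stream")
    dictionary, raw = obj_body[:match.start()], match.group(1)
    if b"/FlateDecode" in dictionary:
        # decompressobj ignores the end-of-line bytes that trail the zlib data.
        return zlib.decompressobj().decompress(raw)
    return raw.rstrip(b"\r\n")

def looks_like_windows_executable(payload: bytes) -> bool:
    # "MZ" is 0x4d 0x5a, the signature at the start of DOS/Windows executables.
    return payload[:2] == b"MZ"

# Invented payload: an MZ header followed by padding, compressed into a stream.
compressed = zlib.compress(b"MZ\x90\x00" + b"\x00" * 60)
obj_17 = (
    b"<< /Filter /FlateDecode /Length " + str(len(compressed)).encode() + b" >>\n"
    b"stream\n" + compressed + b"\nendstream"
)
data = extract_stream(obj_17)
print(data[:2], data[:2].hex(), looks_like_windows_executable(data))
```

A matching signature only shows that the data claims to be an executable; it says nothing yet about what the program does.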

Next, we will save the stream as an executable named virus.exe.
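
Outside of an interactive tool, saving the extracted bytes could look like the Python sketch below. The save_payload helper is hypothetical, and recording a SHA-256 hash is an addition of this sketch (handy for referring to the sample later), not a step taken from the analysis above. The example writes into a temporary directory; a real sample should only be written inside an isolated analysis machine.

```python
import hashlib
import tempfile
from pathlib import Path

def save_payload(payload: bytes, out_dir: str, name: str = "virus.exe"):
    """Write the extracted bytes to disk and return the path and SHA-256 hash."""
    target = Path(out_dir) / name
    target.write_bytes(payload)
    return target, hashlib.sha256(payload).hexdigest()

# Invented four-byte payload standing in for the decoded stream.
with tempfile.TemporaryDirectory() as tmp:
    path, digest = save_payload(b"MZ\x90\x00", tmp)
    print(path.name, digest)
```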

Behavioral analysis

Run the file in a virtualized environment using a 32-bit Windows 7 system.

As the Process Explorer window shows, virus.exe created two suspicious processes (zedeogm.exe and cmd.exe) that were terminated after starting.

According to Process Monitor, the zedeogm.exe file was saved while these processes were running. It then changed the rules set in Windows Firewall. The next step was to run the WinMail.exe file; after that, the program launched cmd.exe to run the tmpd849fc4d.bat file and stop the process.


Applying digital forensics techniques to PDF documents can be essential to avoid interacting with malicious content. Together with other preventive measures, this practice can close one of today's main threat vectors.

Other recommended measures to prevent this threat include:

  • Verify the sender of any suspicious email
  • Ignore links or attachments in unsolicited emails
  • Always keep your antivirus tools up to date
  • Check for typos, which are very common in malicious emails

As usual, we remind you that this material was prepared for informational purposes only and should not be taken as a call to action. IICS is not responsible for any misuse of the information contained herein.

To learn more about information security risks, malware variants, digital forensics, vulnerabilities and information technologies, feel free to access the International Institute of Cyber Security (IICS) websites.