MS Office Files Static Analysis
MS Office documents are frequently used in phishing campaigns and are considered the principal weapon for major bot infection waves, including recent examples such as Emotet and Trickbot. The helpline receives many samples throughout the year, and analyzing at least their static components is necessary to give clients a quick answer about whether a file shows malicious signs. The handler should not wait until clear evidence is available to consider the file malicious. As soon as malicious signs are seen while following the instructions below, the beneficiary should be informed so that they know not to use the file.
Delivery of the suspicious document
- Request that the client send the suspicious document together with a description or details on how they received it (via IM, email, flash drive, etc.).
  - Email: ask the client to send the .eml file. You can find instructions on how to obtain it here.
NOTE: To make sure the file is successfully delivered and not flagged, we should recommend that the client compress the file in a .zip with a passcode.
Prepare the testing machine
NOTE: [Here you should include a reference to the virtual machine or lab/test machine used to safely perform this test.]
Open in a safe virtual machine or isolated computer built for that purpose.
NOTE: most of the oletools that we will use can be found on the Didier Stevens Suite.
Another option is to use the distro REMnux, created by Lenny Zeltser. It is very useful and includes many of the free document analysis tools mentioned in this article.
Acquire the file
If the file was sent by email, extract the attachments from the .eml file using munpack and save them in tmp by running the following command in the terminal:
cd /tmp
munpack /tmp/NameEMLfile.eml
ls /tmp
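If munpack is not available on the test machine, the same extraction can be approximated with Python's standard `email` package. This is a minimal sketch, not part of the original procedure: the function name `extract_attachments` and the fallback filename are illustrative assumptions.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def extract_attachments(eml_bytes: bytes, out_dir: str) -> list[str]:
    """Save every attachment of a raw .eml message into out_dir; return the paths."""
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, part in enumerate(msg.iter_attachments()):
        # Keep only the base name so a crafted filename cannot escape out_dir.
        name = Path(part.get_filename() or "").name
        if name in ("", ".", ".."):
            name = f"attachment_{index}.bin"  # hypothetical fallback name
        target = out / name
        # decode=True undoes base64 / quoted-printable transfer encoding.
        target.write_bytes(part.get_payload(decode=True) or b"")
        saved.append(str(target))
    return saved
```

For example, `extract_attachments(open("/tmp/NameEMLfile.eml", "rb").read(), "/tmp")` writes the attachments to /tmp without opening any of them.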
Check the results and determine the type of document (Word, Excel, etc.), filename extension (doc, docx, docm, etc.), characteristics (is it encrypted?, etc.), the file properties (using native Linux file command), and the file metadata (using Exiftool).
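The filename extension can be misleading, so it helps to compare it with the container format indicated by the first bytes of the file, which is roughly what the `file` command reports. The sketch below is an illustration only; the function name `sniff_container` and its return labels are assumptions.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"                       # OOXML (.docx/.docm/.xlsx/...) is a ZIP archive
RTF_MAGIC = b"{\\rtf"                           # Rich Text Format


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes, independent of its extension."""
    if header.startswith(OLE2_MAGIC):
        # Note: a password-protected OOXML file is also wrapped in an OLE2
        # container, so "ole2" with a .docx extension can mean "encrypted".
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"
```

A typical call reads only the first bytes of the sample: `sniff_container(open("/tmp/NAMEofTHEFile.doc", "rb").read(8))`.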
If the document is encrypted:
a. Check if the password was included in the original message or email. Password-protected files usually cannot be scanned by antivirus sandboxes, but in most cases the password will be sent along with the phishing email/message.
b. Determine the type of encryption used on the document with this command:
python oledump.py -p plugin_office_crypto.py /tmp/NAMEofTHEFile.doc
c. Decrypt the file with msoffice-crypt:
msoffice-crypt -d -p PASSWORD /tmp/NAMEofTHEFile.doc /tmp/NAMEofTHEFile.doc.dec
Analyze the decrypted file by following the steps in the Analysis section below.
Perform the static malware analysis
There are two common methods of payload delivery using MS Office documents:
- Payload download is frequently used because of its flexibility. Malware authors can change the payload on their server while it is being distributed.
- Embedded payload is a more complex task, as cybercriminals have to ensure their malicious code goes undetected by antivirus and is able to achieve its goal.
In the newer file formats, macros can only be saved and enabled in .docm, .xlsm, and .pptm files, not in .docx, .xlsx, or .pptx. The macros are contained in a binary file named vbaProject.bin, with two supporting files: vbaProject.bin.rels and vbaData.xml. This gives us another indicator for malicious MS Office documents, as malware authors will often use the older file formats like .doc, .xls, .ppt.
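Because the newer formats are ZIP archives, the presence of vbaProject.bin can be checked without opening the document in Office. The following is a minimal sketch using only the standard library; the function name `find_vba_parts` is an assumption, and it only lists archive members without parsing the macro code.

```python
import io
import zipfile


def find_vba_parts(ooxml_bytes: bytes) -> list[str]:
    """Return archive members that look like a VBA project (vbaProject.bin)."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        return [
            name
            for name in archive.namelist()
            # Matches word/vbaProject.bin, xl/vbaProject.bin, ppt/vbaProject.bin,
            # but not the supporting vbaProject.bin.rels file.
            if name.lower().endswith("vbaproject.bin")
        ]
```

An empty list means no VBA project part was found in the archive; a non-empty list in a file named .docx, .xlsx, or .pptx is a mismatch worth reporting.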
Check the file’s content (if the file is encrypted, you must decrypt it before analysis) and double check for any malicious element using the yara rules as outlined below. If you find a malicious element, analyze it.
Double check if the file has VBA macros and/or is obfuscated. With Olevba, it is possible to scan the file, detect VBA Macros within the file and show the macro source code with VBA strings deobfuscated:
olevba.py NAMEofTHEFile.doc --reveal
Be aware of macro sheets flagged as “Very Hidden” or “Hidden”. This means that the sheet is not readily accessible via the Microsoft Excel User Interface (UI) but upon opening the file, it displays a message asking users to click the ‘Enable editing’ button, then the ‘Enable content’ button. Users who click these unwittingly enable the macro. This is an obfuscation technique that uses the formulas that are set to run upon opening the document. More information about this technique here.
Search for interesting sequences (IOCs) within the file such as URLs, IP addresses, executable filenames, etc. and determine if the information found is malicious. You can use the CIRCL URL Abuse testing form to get an idea of whether a URL is malicious (more information and techniques in Article #140).
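A first pass over the raw bytes of the file can be sketched as below. The regular expressions, the extension list, and the function name `extract_iocs` are illustrative assumptions; the sketch only sees plain ASCII strings, so obfuscated or UTF-16 encoded indicators will be missed and the olevba and oledump steps remain necessary.

```python
import re

URL_RE = re.compile(rb"https?://[A-Za-z0-9./?=_%:&#~+-]+")
IPV4_RE = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")
# Hypothetical, non-exhaustive list of interesting executable/script extensions.
FILE_RE = re.compile(rb"[A-Za-z0-9_-]+\.(?:exe|dll|scr|bat|ps1|vbs)\b", re.IGNORECASE)


def extract_iocs(data: bytes) -> dict[str, list[str]]:
    """Collect unique ASCII URLs, IPv4 addresses and executable filenames."""

    def unique(pattern: re.Pattern) -> list[str]:
        return sorted({match.decode("ascii") for match in pattern.findall(data)})

    # Drop dotted quads whose octets are out of range (e.g. version strings).
    ips = [ip for ip in unique(IPV4_RE) if all(int(octet) <= 255 for octet in ip.split("."))]
    return {"urls": unique(URL_RE), "ipv4": ips, "files": unique(FILE_RE)}
```

The resulting URLs and IP addresses are candidates only; each one still needs to be checked against a reputation service before it is reported as malicious.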
Get the streams in the file that contain VBA code and their size:
python oledump.py /tmp/NAMEofTHEFile.doc
This command will display a numbered list of streams. In the left column, you’ll see each stream’s identifying number, which you’ll use to select a stream in the next steps. The larger number directly to the left of the stream name is the stream’s size.
Some streams may have letter indicators to the right of their identifying number. The letter M shows that the associated stream contains macro code. We’ll talk more about some other indicators in the next step.
Usually the streams with the biggest size are what contain the important code, so identify them and extract their code by replacing ‘StreamCode’ below with the stream’s identifying number:
python oledump.py -s StreamCode -v /tmp/NAMEofTHEFile.doc
NOTE: when analysing the streams, take into account any letter indicators you see next to their number:
- M: this stream contains actual macro code. The -v flag in the above command will decompress the macro code.
- m: this stream contains only attribute declarations when decompressed, with no actual code.
- E: this stream produces an error when you attempt to decompress it. It might contain corrupted code.
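When a document has many streams, picking the macro streams by size can be scripted from the saved listing. The sketch below assumes the listing lines look like `  3: M    1550 'Macros/VBA/Module1'` (number, optional indicator, size, quoted name); that layout, the `Stream` type, and the function names are assumptions and may need adjusting to the oledump version in use.

```python
import re
from typing import NamedTuple


class Stream(NamedTuple):
    number: str     # identifying number used with -s
    indicator: str  # "M", "m", "E" or "" when no indicator is shown
    size: int
    name: str


# Assumed line layout: "<number>: [indicator] <size> '<stream name>'"
LINE_RE = re.compile(r"^\s*(\S+):\s+([A-Za-z!]?)\s*(\d+)\s+'(.*)'\s*$")


def parse_listing(text: str) -> list[Stream]:
    """Parse a saved stream listing; lines that do not match are ignored."""
    streams = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            number, indicator, size, name = match.groups()
            streams.append(Stream(number, indicator, int(size), name))
    return streams


def macro_streams_by_size(text: str) -> list[Stream]:
    """Return streams flagged M (actual macro code), largest first."""
    macros = [s for s in parse_listing(text) if s.indicator == "M"]
    return sorted(macros, key=lambda s: s.size, reverse=True)
```

The `number` of the first entry returned is the value to substitute for StreamCode in the extraction command.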
Use the plugin plugin_http_heuristics.py to try to extract URLs from any malicious, obfuscated VBA macros:
python oledump.py -p plugin_http_heuristics.py /tmp/NAMEofTHEFile.doc
If the malicious content in MS Office document is not yet identified, proceed with running yara rules. Run various rules against the document to identify the suspicious object (refer to example shown in table below):
ls -l | grep yara    # list all the yara rules available
yara -w -s [yara rule name] /tmp/NAMEofTHEFile.doc
or
oledump.py -y [yara rule name] /tmp/NAMEofTHEFile.doc
| Suspicious object | Yara rule |
| --- | --- |
| VBA macro | vba.yara |
| Executable file embedded in OLE objects | contains_pe_file.yara |
| Detect a VBE file inside a byte sequence | contains_vbe_file.yara |
| Find shellcode embedded in documents | maldoc.yara |
NOTE: a detailed step by step case can be found here.
At this stage, the handler should be able to determine whether the file includes malicious indicators, and should get back to the client with this initial result. The case should be closed if the file shows no indication of compromise.
If the file is malicious, the client should be informed and advised not to execute the file. If the file was already executed, the device should be disconnected from the network, and another case should be created to clean any artifacts that the execution may have created. Live or cold forensic techniques could be used here to identify the artifacts (see Articles #367 & #368). In most cases a factory reset is the best option.
In the meantime, more analysis can be done on the code in order to deobfuscate it if necessary, and if time and skills allow, as deobfuscation and code analysis are not always straightforward.
- Remember to share your findings by creating an event in MISP (Article #355).
Malicious sample files and analysis
- Ticket #28016: encrypted MS file pretending to be a job applicant’s resume.
- Contagio Malware dump
Further Investigation Techniques
If you have not found evidence of malicious content using the above steps, you may decide it is necessary to try some of the following techniques to determine whether the file is malicious.
Executable file found in the suspicious file
Embedded Flash program (SWF objects)
Embedding a Flash program inside an Office document provides attackers yet another way to run malicious code on the victim’s system. In this case, the code within the Flash object runs as soon as the victim opens the document without any warnings and without relying on exploits. This code is still subject to security restrictions imposed by Flash Player, so to perform escalated actions the code would need to exploit a vulnerability in Flash Player.
Search for embedded Flash objects in Office documents using the tool hachoir-subfile:
Extract the embedded Flash object by using a hex editor or xxxswf.py tool (already installed on REMnux):
xxxswf.py -xd /tmp/NAMEofTHEFile.doc
Manually analyze the extracted Flash file. Examine its strings with strings, and locate embedded URLs by using either of the following commands (the -a option makes grep treat the binary file as text):
grep -aE '(http|https)://[^/"]+' /tmp/ExtractedFlash.swf
cat /tmp/ExtractedFlash.swf | grep -aEo "(http|https)://[a-zA-Z0-9./?=_%:-]*" | sort -u
NOTE: the grep and cat commands are not guaranteed to catch all embedded URLs, so the strings method should always be used to make sure nothing is missed.
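One reason a plain-text search can miss URLs is that SWF files with the signature CWS are zlib-compressed after the first 8 bytes, so their strings are not visible until the body is inflated. The sketch below handles the uncompressed (FWS) and zlib (CWS) variants only; the function names are assumptions, and LZMA-compressed (ZWS) files are deliberately rejected rather than guessed at.

```python
import re
import zlib

URL_RE = re.compile(rb"https?://[A-Za-z0-9./?=_%:-]+")


def swf_body(swf: bytes) -> bytes:
    """Return the SWF body after the 8-byte header, inflating it if needed."""
    signature = swf[:3]
    if signature == b"FWS":  # uncompressed
        return swf[8:]
    if signature == b"CWS":  # zlib-compressed from byte 8 onward
        return zlib.decompress(swf[8:])
    raise ValueError(f"unsupported or non-SWF signature: {signature!r}")


def urls_in_swf(swf: bytes) -> list[str]:
    """List unique ASCII URLs found in the (decompressed) SWF body."""
    return sorted({m.decode("ascii") for m in URL_RE.findall(swf_body(swf))})
```

Running `urls_in_swf(open("/tmp/ExtractedFlash.swf", "rb").read())` complements the strings output; URLs assembled at runtime by ActionScript will still not appear.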
Check if the file has ScriptBridge ActiveX control by using
If the result returns
Embedded payload of a Microsoft Office exploit
Another way to execute malicious code as part of an Office document involves exploiting vulnerabilities in a Microsoft Office application. The exploit is designed to trick the targeted application into executing the attacker’s payload, which is usually concealed within the Office document as shellcode.
More information in this link.
Double check results using a different tool
Double check results by using the OfficeMalScanner tool.
Scan for VB-Macro Scripts on the MS file.
OfficeMalScanner.exe NAMEofTHEFile.doc info
Look for malicious signatures and PE header.
OfficeMalScanner.exe NAMEofTHEFile.doc scan
The scan argument will provide a Malicious Index as a measurement of how malicious the file is. Anything above 10 is considered dangerous. If the number is between 10 and 20, a code signature has been found inside. If it is above 20, a whole executable is probably embedded within.
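To illustrate what looking for a PE header means in practice, the sketch below searches a file for an MZ header whose e_lfanew field points to a PE signature. It is a simplified illustration and not OfficeMalScanner's actual algorithm; the function name `find_embedded_pe` is an assumption, and it only finds executables stored in the clear.

```python
import struct


def find_embedded_pe(data: bytes) -> list[int]:
    """Return offsets where an MZ header points to a valid PE signature."""
    offsets = []
    pos = data.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew (offset of the PE signature,
        # relative to the MZ header) is a little-endian uint32 at 0x3C.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_at = pos + e_lfanew
            if data[pe_at:pe_at + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return offsets
```

An empty result does not rule out an embedded executable, since payloads are often XOR-encoded or otherwise obfuscated inside the document.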
If there’s any encrypted content, the brute argument can be used to try different decoders.
OfficeMalScanner.exe NAMEofTHEFile.doc scan brute
To locate the hidden payload, view the complete code (assembly language). If it is hard to understand the flow of the code, we can proceed to the next steps.
OfficeMalScanner.exe NAMEofTHEFile.doc scan debug
Alternatively, you may use DisView.exe, which comes with OfficeMalScanner.exe, to check the complete malicious assembly code.
DisView.exe NAMEofTHEFile.doc [offset of the code as given by results of OfficeMalScanner.exe]
Extract the binary file from the malicious file.
Malhost-Setup.exe test.xls malicious_binary [offset of the code as given by results of OfficeMalScanner.exe]
We can now use VirusTotal or a similar service to check the extracted “malicious_binary” file.