How to analyze MS Office Files to find signs of malicious behavior.


MS Office Files Static Analysis

Problem

MS Office documents are frequently used in phishing campaigns and have been a principal delivery vector for major bot infection waves, including recent examples such as Emotet and Trickbot. The helpline receives many samples throughout the year, and analyzing at least their static components is necessary to give our clients a quick answer about whether the file shows malicious signs. The handler should not wait for clear evidence before considering the file malicious: as soon as malicious signs are found by following the instructions below, the client should be informed so that they know not to use the file.

Solution

Delivery of the suspicious document

Ask the client to send the suspicious document together with details on how they received it (via IM, email, flash drive, etc.).
- Email: ask the client to send the .eml file. You can find instructions on how to obtain it here.

NOTE: To make sure the file is delivered successfully and is not flagged or blocked in transit, recommend that the client compress it into a password-protected .zip archive.

Prepare the testing machine

NOTE: [Here you should include a reference to the virtual machine or lab/test machine used to safely perform this test.]

Open the file only in a safe virtual machine or an isolated computer built for that purpose.
Check that your test machine has the following tools:
- munpack
- ExifTool
- Didier Stevens Suite
- olevba
- msoffice-crypt
- OfficeMalScanner
- hachoir-subfile
- xxxswf.py

NOTE: oledump.py and the plugins used below are part of the Didier Stevens Suite, while olevba is part of the oletools package by Philippe Lagadec.

Another option is to use REMnux, the Linux distribution for malware analysis created by Lenny Zeltser, which includes many of the free document analysis tools mentioned in this article.

Acquire the file

If the file was sent by email, use munpack to extract the attachments from the .eml file. munpack writes the attachments to the current directory, so change to /tmp first:
```
 cd /tmp
 munpack /tmp/NameEMLfile.eml
 ls /tmp
```
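
If munpack is not available, the same extraction can be sketched with Python's standard email package. This is a minimal illustration under stated assumptions, not a replacement for munpack: extract_attachments is a hypothetical helper name, it only looks at the message's immediate sub-parts (the email package's iter_attachments), and it keeps only the base name of each attachment so that a crafted filename cannot write outside the output directory.

```python
import email
from email import policy
from pathlib import Path


def extract_attachments(eml_bytes: bytes, out_dir: Path) -> list:
    """Save each attachment of a raw .eml message into out_dir and return the paths."""
    msg = email.message_from_bytes(eml_bytes, policy=policy.default)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, part in enumerate(msg.iter_attachments()):
        # Keep only the base name so a name like "../../x" cannot escape out_dir.
        name = Path(part.get_filename() or "").name
        if name in ("", ".", ".."):
            name = f"attachment_{index}.bin"
        # decode=True undoes the base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        target = out_dir / name
        target.write_bytes(payload)
        written.append(target)
    return written
```

For example: extract_attachments(Path("/tmp/NameEMLfile.eml").read_bytes(), Path("/tmp/attachments")).
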
Check the results and determine the type of document (Word, Excel, etc.), its filename extension (doc, docx, docm, etc.), its characteristics (for example, whether it is encrypted), its file properties (using the native Linux file command), and its metadata (using ExifTool).
Get the SHA-256 hashes (sha256sum) of the documents obtained and check them against known threat sharing platforms such as VirusTotal or CiviCERT’s Cuckoo Sandbox.
If the document is encrypted:
a. Check whether the password was included in the original message or email. Password-protected files usually cannot be scanned or sandboxed by antivirus software, but in most cases the password is sent along with the phishing email or message.

b. Determine the type of encryption, for documentation purposes, with this command:
```
 python oledump.py -p plugin_office_crypto.py /tmp/NAMEofTHEFile.doc
```
c. Decrypt the file with msoffice-crypt:
```
 msoffice-crypt -d -p PASSWORD /tmp/NAMEofTHEFile.doc /tmp/NAMEofTHEFile.doc.dec
```
d. Get the SHA-256 hashes (sha256sum) of the decrypted file and check them against known threat sharing platforms such as VirusTotal or CiviCERT’s Cuckoo Sandbox.
Analyze the decrypted file by following the steps in the static malware analysis section below.
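
The first triage checks above (container type, a hint of encryption, and the SHA-256 hash) can be sketched in Python. The magic numbers are standard: OLE2 compound files (.doc, .xls, .ppt) start with D0 CF 11 E0 A1 B1 1A E1, Office Open XML files (.docx, .docm, etc.) are ZIP archives starting with PK\x03\x04, and RTF files start with {\rtf. The encryption check is only a heuristic assumed by this sketch: an encrypted OOXML document is stored as an OLE2 file containing an EncryptedPackage stream, whose UTF-16LE name we simply search for; legacy .doc/.xls encryption is not detected this way. The triage helper name is hypothetical.

```python
import hashlib

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # .doc/.xls/.ppt and encrypted OOXML
ZIP_MAGIC = b"PK\x03\x04"                        # .docx/.docm/.xlsx/.xlsm/... (any ZIP)
RTF_MAGIC = b"{\\rtf"                             # RTF, sometimes renamed to .doc
# Heuristic: OLE2 directory entry names are UTF-16LE, and encrypted OOXML
# documents keep their content in a stream named "EncryptedPackage".
ENCRYPTED_MARKER = "EncryptedPackage".encode("utf-16-le")


def triage(data: bytes) -> dict:
    """Return the container type, an encryption hint and the SHA-256 of data."""
    if data.startswith(OLE2_MAGIC):
        kind = "ole2"
    elif data.startswith(ZIP_MAGIC):
        kind = "ooxml-zip"
    elif data.startswith(RTF_MAGIC):
        kind = "rtf"
    else:
        kind = "unknown"
    return {
        "type": kind,
        "likely_encrypted_ooxml": kind == "ole2" and ENCRYPTED_MARKER in data,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

For example: triage(open("/tmp/NAMEofTHEFile.doc", "rb").read()). The file command and ExifTool remain the reference; the sketch only shows what they are looking at.
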

Perform the static malware analysis

There are two common methods of payload delivery using MS Office documents:

- Payload download is frequently used because of its flexibility: malware authors can change the payload on their server while the document is still being distributed.
- Embedded payload is more complex, as cybercriminals have to ensure that their malicious code goes undetected by antivirus software and is still able to achieve its goal.

In the newer Office Open XML formats, macros can only be saved and enabled in macro-enabled files (.docm, .xlsm, .pptm), not in .docx, .xlsx, .pptx. The macros are then stored in a binary file named vbaProject.bin, with two supporting files: vbaProject.bin.rels and vbaData.xml. A macro-enabled extension is therefore another indicator of a potentially malicious MS Office document, which is one reason malware authors often use the older file formats (.doc, .xls, .ppt): these can contain macros without a telltale extension.
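
To check an Office Open XML file for the vbaProject.bin part described above, you can list the ZIP archive members with Python's zipfile module. This is a minimal sketch: find_vba_projects is a hypothetical helper, and it matches on the member name only, so it does not validate the part's content.

```python
import io
import zipfile


def find_vba_projects(ooxml_bytes: bytes) -> list:
    """Return the archive paths of any vbaProject.bin parts in an OOXML file."""
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as archive:
        return [
            name for name in archive.namelist()
            # Case-insensitive match on the part name, wherever it sits (word/, xl/, ppt/).
            if name.lower().endswith("vbaproject.bin")
        ]
```

A vbaProject.bin found inside a file named .docx, .xlsx or .pptx is worth noting, since those extensions are not meant to carry macros.
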

Check the file’s content (if the file is encrypted, you must decrypt it before analysis) and double check for any malicious element using the steps and YARA rules outlined below. If you find a malicious element, analyze it.

VBA macros

Check whether the file contains VBA macros and whether they are obfuscated. Olevba can scan the file, detect VBA macros, and show the macro source code with obfuscated VBA strings decoded (--reveal):
```
 olevba.py NAMEofTHEFile.doc --reveal
```
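
olevba reports keywords in categories such as AutoExec (code that runs automatically, for example when the document is opened) and Suspicious. The sketch below illustrates that idea on VBA source you have already extracted. The keyword lists are a small hypothetical subset chosen for illustration, not olevba's actual lists, so treat it as a learning aid rather than a substitute for olevba.

```python
import re

# Hypothetical, deliberately short keyword lists (olevba's real lists are much longer).
AUTOEXEC = ("AutoOpen", "Auto_Open", "Document_Open", "Workbook_Open")
SUSPICIOUS = ("Shell", "CreateObject", "URLDownloadToFile", "WScript", "Chr", "StrReverse")


def flag_keywords(vba_source: str) -> dict:
    """Return the AutoExec and Suspicious keywords present in the VBA source."""
    found = {"autoexec": [], "suspicious": []}
    for category, words in (("autoexec", AUTOEXEC), ("suspicious", SUSPICIOUS)):
        for word in words:
            # Whole-word, case-insensitive match, since VBA is case-insensitive.
            if re.search(rf"\b{re.escape(word)}\b", vba_source, re.IGNORECASE):
                found[category].append(word)
    return found
```

A macro that combines an AutoExec entry point with download or execution keywords deserves a closer look with olevba --reveal.
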
Be aware of Excel 4.0 (XLM) macro sheets flagged as “Very Hidden” or “Hidden”. Such sheets are not readily accessible via the Microsoft Excel user interface (UI). Upon opening the file, the document typically displays a message asking users to click the ‘Enable editing’ button and then the ‘Enable content’ button; users who do so unwittingly run the macro formulas, which are set to execute when the document opens. More information about this obfuscation technique here.
Search the file for interesting sequences (IOCs) such as URLs, IP addresses, and executable filenames, and determine whether the information found is malicious. You can use the CIRCL URL Abuse testing form to get an idea of whether a URL is malicious (more information and techniques in Article #140).
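
A rough first pass over the raw file for these IOC types can be sketched with regular expressions. The patterns are deliberately simple assumptions: they will produce false positives (for example, version numbers that look like IP addresses) and will miss obfuscated indicators. Because Office files often store text as UTF-16LE, the sketch also searches a copy of the data with NUL bytes removed. extract_iocs is a hypothetical helper.

```python
import re

URL_RE = re.compile(rb"https?://[A-Za-z0-9._~:/?#@!$&*+,;=%-]+")
IPV4_RE = re.compile(rb"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")
EXE_RE = re.compile(rb"\b[\w-]+\.(?:exe|dll|scr|bat|ps1|vbs|js|hta)\b", re.IGNORECASE)


def extract_iocs(data: bytes) -> dict:
    """Return sorted, de-duplicated URL, IPv4 and executable-name candidates."""
    # Second view: dropping NUL bytes turns UTF-16LE ASCII text into plain ASCII.
    views = (data, data.replace(b"\x00", b""))
    patterns = {"urls": URL_RE, "ipv4": IPV4_RE, "executables": EXE_RE}
    result = {}
    for key, pattern in patterns.items():
        hits = {m.decode("latin-1") for view in views for m in pattern.findall(view)}
        result[key] = sorted(hits)
    return result
```

Every candidate still has to be judged manually, for example with the CIRCL URL Abuse form mentioned above.
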
List the streams in the file, including those that contain VBA code, and their sizes:
```
 python oledump.py /tmp/NAMEofTHEFile.doc
```
This command will display a numbered list of streams. In the left column, you’ll see each stream’s identifying number, which you’ll use to select a stream in the next steps. The larger number directly to the left of the stream name is the stream’s size.

Some streams may have letter indicators to the right of their identifying number. The letter M shows that the associated stream contains macro code. We’ll talk more about some other indicators in the next step.
Usually the largest streams contain the important code. Identify them and extract their code by replacing ‘StreamCode’ below with the stream’s identifying number:
```
 python oledump.py -s StreamCode -v /tmp/NAMEofTHEFile.doc
```
NOTE: when analysing the streams, take into account any letter indicators you see next to their number:
- M: this stream contains actual Macro code. The -v flag in the above command will decompress the macro code.
- m: this stream contains only attribute declarations when decompressed; there is no actual code in it.
- E: this stream produces an error when you attempt to decompress it. It might contain corrupted code.
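
If you want to pick the largest macro streams automatically, you can parse the oledump.py listing. This sketch assumes the line layout described above (index, optional indicator letter, size, quoted stream name, for example `  8: M    7559 'Macros/VBA/ThisDocument'`); if your version of oledump.py prints a different layout, adjust the regular expression. macro_streams is a hypothetical helper.

```python
import re

# Assumed layout: "<index>: <optional indicator> <size> '<stream name>'".
LINE_RE = re.compile(r"^\s*([A-Z]?\d+):\s+([A-Za-z!]?)\s*(\d+)\s+'(.*)'\s*$")


def macro_streams(oledump_output: str) -> list:
    """Return (index, size, name) for streams flagged 'M', largest first."""
    streams = []
    for line in oledump_output.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) == "M":
            index, _, size, name = match.groups()
            streams.append((index, int(size), name))
    return sorted(streams, key=lambda s: s[1], reverse=True)
```

The first index returned is the natural candidate for StreamCode in the -s command above.
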
Use the plugin plugin_http_heuristics.py to try to extract URLs from any malicious, obfuscated VBA macros:
```
 python oledump.py -p plugin_http_heuristics.py /tmp/NAMEofTHEFile.doc
```
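
One common obfuscation pattern that such heuristics have to undo is building a URL from Chr() calls and string literals joined with &. The following sketch decodes only that simple pattern. It is not how plugin_http_heuristics.py is implemented, it ignores variables and arithmetic inside Chr(), and decode_concatenation is a hypothetical helper.

```python
import re

# Either Chr/ChrW/ChrB(<decimal>) or a VBA string literal ("" is an escaped quote).
TOKEN_RE = re.compile(r'Chr[WB]?\$?\(\s*(\d+)\s*\)|"((?:[^"]|"")*)"', re.IGNORECASE)


def decode_concatenation(expr: str) -> str:
    """Join the Chr() values and string literals of a VBA concatenation."""
    parts = []
    for code, literal in TOKEN_RE.findall(expr):
        if code:
            parts.append(chr(int(code)))
        else:
            parts.append(literal.replace('""', '"'))
    # Operators (&, +) and anything unrecognised are simply skipped.
    return "".join(parts)
```

For example, decode_concatenation('Chr(104) & Chr(116) & "tp://" & ChrW(101) & "vil.example/a.exe"') yields http://evil.example/a.exe.
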

If the malicious content in the MS Office document has not yet been identified, proceed with running YARA rules. Run various rules against the document to identify the suspicious object (see the examples in the table below). Either run yara directly on the whole file, or run the rule through oledump.py, which applies it to each stream:

 ls -l | grep yara    # list all the YARA rule files available in the current directory
 yara -w -s [yara rule name] /tmp/NAMEofTHEFile.doc
 python oledump.py -y [yara rule name] /tmp/NAMEofTHEFile.doc

For example:

| Suspicious object | YARA rule |
|---|---|
| VBA macro | vba.yara |
| Executable file embedded in OLE objects | contains_pe_file.yara |
| VBE file inside a byte sequence | contains_vbe_file.yara |
| Shellcode embedded in documents | maldoc.yara |

NOTE: a detailed step by step case can be found here.
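
As an illustration of what a rule like contains_pe_file.yara looks for, the sketch below searches for an MZ header whose e_lfanew field (the 32-bit little-endian value at offset 0x3C of the DOS header) points to a PE\0\0 signature. It is a simplified check under stated assumptions: it only finds unencoded executables (XOR-encoded ones are missed), and find_embedded_pe is a hypothetical helper.

```python
import struct


def find_embedded_pe(data: bytes) -> list:
    """Return offsets of 'MZ' headers whose e_lfanew points at a 'PE\0\0' signature."""
    offsets = []
    start = data.find(b"MZ")
    while start != -1:
        lfanew_pos = start + 0x3C
        if lfanew_pos + 4 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, lfanew_pos)
            pe_pos = start + e_lfanew
            # Slicing past the end just yields b"", so bogus offsets are harmless.
            if data[pe_pos:pe_pos + 4] == b"PE\x00\x00":
                offsets.append(start)
        start = data.find(b"MZ", start + 1)
    return offsets
```

Any offset returned is a good starting point for carving the executable and checking its hash.
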

At this stage, the handler should be able to determine whether the file includes malicious indicators, and should get back to the client with this initial result. The case should be closed once the file shows indications of compromise. If the file is malicious, the client should be informed and advised not to execute it. If the file was already executed, the device should be disconnected from the network, and another case should be created to clean up any artifacts that the execution may have created. Live or cold forensic techniques can be used to identify those artifacts (see Articles #367 & #368), although in most cases a factory reset is the best option. In the meantime, further analysis can be done to deobfuscate the code if necessary and if time and skills allow, as deobfuscation and code analysis are not always straightforward.

Reporting

Remember to share your findings in MISP by creating an event (Article #355).

Malicious sample files and analysis

Ticket #28016: encrypted MS Office file pretending to be a job applicant’s resume.
Contagio Malware dump

Further Investigation Techniques

If you have not found evidence of malicious content using the above steps, you may decide it is necessary to try some of the following techniques to determine whether the file is malicious.

Executable file found in the suspicious file

Embedded Flash program (SWF objects)

Embedding a Flash program inside an Office document provides attackers yet another way to run malicious code on the victim’s system. In this case, the code within the Flash object runs as soon as the victim opens the document without any warnings and without relying on exploits. This code is still subject to security restrictions imposed by Flash Player, so to perform escalated actions the code would need to exploit a vulnerability in Flash Player.

Search for embedded Flash objects in Office documents using the tool hachoir-subfile:
```
 hachoir-subfile /tmp/NAMEofTHEFile.doc
```
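Extract the embedded Flash object with a hex editor or the xxxswf.py tool (already installed on REMnux). The -x option extracts each SWF object and saves it as MD5HASH.swf in the working directory, and -d decompresses it: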
```
 xxxswf.py -xd /tmp/NAMEofTHEFile.doc
```
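
Both tools essentially look for SWF file headers. An SWF header starts with the signature FWS (uncompressed), CWS (zlib-compressed) or ZWS (LZMA-compressed), followed by a one-byte version and a 32-bit little-endian file length. The sketch below scans for such headers. The version ceiling of 50 used to discard false matches is an arbitrary assumption, and find_swf_headers is a hypothetical helper.

```python
import struct

SWF_SIGNATURES = (b"FWS", b"CWS", b"ZWS")


def find_swf_headers(data: bytes, max_version: int = 50) -> list:
    """Return (offset, signature, version, declared_length) for plausible SWF headers."""
    hits = []
    for sig in SWF_SIGNATURES:
        pos = data.find(sig)
        while pos != -1:
            if pos + 8 <= len(data):  # the header is 8 bytes long
                version = data[pos + 3]
                (length,) = struct.unpack_from("<I", data, pos + 4)
                # Drop matches with an implausible version byte or a tiny length.
                if 1 <= version <= max_version and length > 8:
                    hits.append((pos, sig.decode(), version, length))
            pos = data.find(sig, pos + 1)
    return sorted(hits)
```

The offset of each hit tells you where to cut the object out with a hex editor if xxxswf.py is not available.
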
Manually analyze the extracted Flash file. Examine its strings with the strings command, and locate embedded URLs with grep and/or cat (-a makes grep treat the binary file as text, and -o prints only the matching part):
```
 strings /tmp/ExtractedFlash.swf
```
```
 grep -aoE '(http|https)://[^/"]+' /tmp/ExtractedFlash.swf
```
```
 cat /tmp/ExtractedFlash.swf | grep -aEo "(http|https)://[a-zA-Z0-9./?=_%:-]*" | sort -u
```
NOTE: the grep and cat commands are not guaranteed to catch all embedded URLs, so the strings method should always be used to make sure nothing is missed.
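
Keep in mind that strings and grep only find URLs in the uncompressed body of an SWF file. If the object was extracted without decompression and starts with CWS, everything after the 8-byte header is a zlib stream. The sketch below inflates such a body before extracting printable strings (4 or more ASCII characters, like the default of strings) and URLs. LZMA-compressed ZWS files are not handled, and swf_strings and swf_urls are hypothetical helpers.

```python
import re
import zlib

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")  # like `strings`: 4+ printable ASCII chars
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def swf_strings(swf: bytes) -> list:
    """Printable strings from an SWF, inflating a zlib (CWS) body first."""
    if swf[:3] == b"CWS":
        # Bytes 0-7 are signature, version and length; the rest is zlib data.
        # decompressobj tolerates truncated or trailing data in carved files.
        body = zlib.decompressobj().decompress(swf[8:])
        swf = b"FWS" + swf[3:8] + body
    return [run.decode("ascii") for run in PRINTABLE_RUN.findall(swf)]


def swf_urls(swf: bytes) -> list:
    """Unique URLs found in the printable strings of an SWF."""
    return sorted({url for s in swf_strings(swf) for url in URL_RE.findall(s)})
```
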

NOTE: Be aware that an SWF file may be stored on an external website and triggered by an infected file with embedded JavaScript, as in this case: a Microsoft Office document with embedded JavaScript that retrieves the malicious SWF object from a remote URL.

Embedded JavaScript

Another way to automatically execute code when the victim opens a Microsoft Office document involves embedding the ScriptBridge ActiveX control in the file. This control allows the attacker to embed and execute JavaScript.

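Check whether the file contains the ScriptBridge ActiveX control by using the strings command:
```
 strings /tmp/NAMEofTHEFile.doc | grep -i ScriptBridge
```
If the result returns CONTROL ScriptBridge, check whether the file has embedded JavaScript. strings applies a single encoding per run, so loop over the encodings (s = 7-bit, S = 8-bit, b/l = 16-bit big/little-endian, B/L = 32-bit big/little-endian):
```
 for enc in s S b l B L; do strings --encoding=$enc /tmp/NAMEofTHEFile.doc; done | grep -i JavaScript
```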
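
As an alternative to running strings with several encodings, a short Python check can look for both markers as ASCII and as UTF-16LE text, the encoding Office files commonly use for names. This is a minimal sketch: find_markers is a hypothetical helper, and the marker list is an assumption you can extend.

```python
def find_markers(data: bytes, markers=("ScriptBridge", "JavaScript")) -> dict:
    """Map each marker to the encodings (ASCII / UTF-16LE) it was found in, case-insensitively."""
    lowered = data.lower()  # bytes.lower() only folds the ASCII letters A-Z
    return {
        marker: [
            enc for enc in ("ascii", "utf-16-le")
            if marker.lower().encode(enc) in lowered
        ]
        for marker in markers
    }
```

An empty list for a marker means it was not found in either encoding.
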

Embedded payload of a Microsoft Office exploit

Another way to execute malicious code as part of an Office document involves exploiting vulnerabilities in a Microsoft Office application. The exploit is designed to trick the targeted application into executing the attacker’s payload, which is usually concealed within the Office document as shellcode.

Check for known vulnerabilities here, and for available Metasploit modules.

Double check results using a different tool

Double check the results by using the OfficeMalScanner tool.

Scan the file for VBA macros with the info argument:

 OfficeMalScanner.exe NAMEofTHEFile.doc info

Look for malicious signatures and embedded PE headers with the scan argument:
```
 OfficeMalScanner.exe  NAMEofTHEFile.doc scan
```
NOTE: The scan argument reports a Malicious Index that measures how malicious the file is likely to be. Anything above 10 is considered dangerous: a value between 10 and 20 means a code signature has been found inside, while a value above 20 means that a whole executable is probably embedded.
If there is any encrypted content, add the brute argument to try different decoders:
```
 OfficeMalScanner.exe NAMEofTHEFile.doc scan brute
```
To locate the hidden payload, add the debug argument to view the complete code (assembly language) found by the scan. If the flow of the code is hard to understand, proceed to the next steps.
```
 OfficeMalScanner.exe  NAMEofTHEFile.doc scan debug
```
Alternatively, you may use DisView.exe, which comes with OfficeMalScanner, to disassemble the malicious code at the offset reported by OfficeMalScanner.exe:
```
 DisView.exe NAMEofTHEFile.doc [offset of the code as given by results of OfficeMalScanner.exe]
```

Extract the code at that offset from the malicious file with MalHost-Setup.exe, which also comes with OfficeMalScanner:

 MalHost-Setup.exe NAMEofTHEFile.doc malicious_binary [offset of the code as given by results of OfficeMalScanner.exe]

You can now use VirusTotal or similar services to check the extracted “malicious_binary” file.
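
If you only need the raw bytes at the reported offset, for example to hash them or to load them into another tool, you can carve them with Python. This sketch does not reproduce what MalHost-Setup.exe does; carve is a hypothetical helper that returns the bytes from the offset (optionally limited to a length) together with their SHA-256 hash.

```python
import hashlib
from typing import Optional, Tuple


def carve(data: bytes, offset: int, length: Optional[int] = None) -> Tuple[bytes, str]:
    """Return the bytes starting at offset (optionally limited) and their SHA-256."""
    if not 0 <= offset < len(data):
        raise ValueError(f"offset {offset:#x} is outside the file ({len(data)} bytes)")
    # Without a length, carve to the end of the file.
    end = len(data) if length is None else min(len(data), offset + length)
    chunk = data[offset:end]
    return chunk, hashlib.sha256(chunk).hexdigest()
```

The returned hash can be searched on VirusTotal without uploading the sample itself.
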


Useful resources
