[SOLVED] CS6262 Project 3- Malware Analysis Solution


Project 3 is on malware analysis. You’ll be learning about and manipulating malware on the Windows and Android platforms. Read the write-up below to get started. It’s very long but also very comprehensive, walking you through the beginning of the project and including helpful posts from previous semesters on Piazza.

 Resources

Write-up

CS6262_P3_Writeup

Android Writeup


Questionnaire

assignment-questionnaire.txt

FAQ

FAQ.md

VM

The project VM can be downloaded from:

https://www.dropbox.com/s/dnk6acztw9ewp83/Project%203.zip?dl=0

The password to the archive is cs6262.

Submission:

There is an autograder on Gradescope to help verify your answers as you work. To use it, submit your assignment-questionnaire.txt file to both the Windows and Android assignments. However, this will not count for your final grade.

To receive full credit, you must submit

1) assignment-questionnaire.txt to Gradescope,

 

2) report.zip to Canvas.

If you did not submit report.zip on time, we will contact you, and a 5-point deduction will be applied to your total score.

Have fun, and start early!

Project 3: Malware Analysis

CS 6262

 

Sections:

  1. Windows Malware Analysis
  2. Linux Malware Analysis
  3. Android Malware Analysis
  4. Tips for assignment-questionnaire.txt
  5. Miscellaneous VM Performance Tips
  6. Submission

Windows Malware Analysis

Scenario:

You got a malware sample from the wild! Your task is to discover what the malware does by analyzing it.

 

How do you discover the malware’s behaviors? There are multiple ways of analyzing it but we’ll be focusing on two ways: Static Analysis and Dynamic Analysis.

 

Static Analysis:

  • Manual Reverse Engineering
  • Programmatic binary analysis

 

Dynamic Analysis:

  • Network behavioral tracing
  • Runtime system behavioral tracing (File/Process/Thread/Registry)
  • Symbolic Execution
  • Fuzzing

 

In our scenario, you are going to analyze the given malware with tools that we provide. These tools help you to analyze the malware with static and dynamic analysis.

Objective:

  1. Find which server controls the malware (the command and control (C2) server)
  2. Discover how the malware communicates with the command and control (C2) server
  3. URL and Payload
  4. Discover what activities are done by the malware (Attack activities)

Requirement:

  1. Make sure that no malware traffic goes out from the virtual machine
  2. The command and control server is dead, so YOU need to reconstruct it
  3. Use tools to reconstruct the server and then reveal hidden behaviors of the malware
  4. Analyze network traffic on the host and figure out the list of available commands for the malware
  5. Analyze network traffic and program trace of the host, and figure out what the malware does
  6. Write down your answers in assignment-questionnaire.txt

 

 

Project Structure:

  • Make sure to install/update to the latest version of VirtualBox

https://www.virtualbox.org/wiki/Downloads

● Download the Virtual Machine (VM)

https://www.dropbox.com/s/dnk6acztw9ewp83/Project%203.zip?dl=0

○ Unarchive the file with 7zip; the password is cs6262

● Network Configurations:

○ tap0:

■ Virtual network interface for Windows XP

  • IP Address: 192.168.133.101

○ br0

■ A network bridge between Windows XP and Ubuntu

  • IP Address: 192.168.133.1

○ enp0s3

■ A network that faces the Internet

  • IP Address: 10.0.2.15 (it varies with your VirtualBox settings)

 

 

 

  • Open VirtualBox

○ Go to File → Import Appliance

○ Select the ova file and import it

○ For detailed information on how to import the VM, see:

 

○ Before starting, it may help to configure the VM settings (allocate more base memory, processors, etc.) according to your device's capabilities for better performance.

● VM user credentials

○ Username: analysis

○ Password: analysis

 

 

NOTE: VM Setup

 

  • For M Series Mac Users:
    • Please install the latest version of UTM (https://mac.getutm.app/) and follow the instructions in the link to import and set up the VM.
  • Warning:
    • Because of the different CPU architecture, running these x86-based Windows programs is very slow on an M Series Mac. You can choose to find an x86 machine yourself or keep using the M Series Mac. If you have real difficulty finding an x86 machine, please contact the TAs on Ed Discussion ASAP; we can provide alternative solutions such as cloud VMs.

 

  • In the Virtual Machine:
    • Files

■ init.py

  • This initializes the project environment

○ Run the script, then type your Georgia Tech username (your Canvas login name) when prompted:

$ ./init.py

■ update.sh

● This script updates the VM if the TAs have released any further updates

● Note:

○ Please run this script when you start the project! (If it says that you’re already updated when you run it, that’s fine)

○ If you have already completed stage 1 before running update.sh, you do NOT need to redo stage 1, but you will need to run update.sh to complete stage 2

■ archive.sh

● This will archive the answer sheet for submission (create a zip file)

Directories:

■ vm

● A directory that stores the Windows XP virtual machine (runs with QEMU)

● We use the given VM for both Cuckoo and a testbed.

■ shared

● A shared directory between the Ubuntu host and Windows guest (XP is running on a VM within your project VM). You can copy/move files to or from this directory.

■ report

● The answer sheet for the project questionnaire

■ setup

● Required files for setting up the machine. You don’t need to modify or use the files in this directory.

Tools:

■ network

○ Configure your network firewall rules (iptables) by editing iptables-rules.

○ You can allow/disallow/redirect the traffic from the malware

○ ‘./reset’ command in this directory will apply the changes

■ cfg-generation (CFG stands for Control-Flow Graph)

○ An analysis tool that helps you find functions of interest that carry out malicious activity

○ You need to edit score.h to generate the control-flow graph

○ Use xdot to open the generated CFG.

■ sym-exec

○ A symbolic executor (based on angr: https://github.com/angr)

■ Helps you to figure out the commands that the malware expects

○ Use the cfg-generation tool to figure out the addresses of the functions of interest

■ c2-command

○ A simplified tool for C2 server reconstruction

○ Write each command on its own line in the *.txt file

○ It randomly chooses one command at a time to send to the malware

Malware:

■ stage1.exe – stage 1 malware

● It will download the stage 2 malware if it receives the correct command

■ stage2.exe – stage 2 malware

● It will download the stage 3 malware if it receives the correct command

■ payload.exe – the Linux malware attack payload

● Analyze the dynamic instruction trace

● Write a script to detect where the C&C communication happens: find the loop entry point and the function sequence in the loop

● Add a constraint to symbolic execution to limit the loop to one iteration

● Find the feasible attacks within a given set of possible attacks.

Tutorials:

● stage1.exe malware

○ Update project 3 before you begin

■ Open the terminal (Ctrl-Alt-T, or choose terminal from the menu)

■ Run ./update.sh

  • It will update any necessary files that are required for this project.

 

 

 

 

○ Initializing the project

■ Open the terminal (Ctrl-Alt-T, or choose terminal from the menu)

■ Run ./init.py

  • Type your Georgia Tech username (the login name used for Canvas)
  • This will download the stage1 malware (stage1.exe) into the ~/shared directory

 

 

Note:

■ These are malware samples hosted under the Georgia Tech Network

  • It is likely that security measures will kick in and encrypt these files
    • This applies to all the malware samples you will download during this project

■ IMPORTANT

  • After each download, make sure to check the type of file
  • In the linux VM, execute

$ file <path-to-exe>

  • If the result of that is an archive of some sort then execute:

unzip <path-to-exe>

  • Password: infected
  • For stage1 and stage2, the file format should be

 

 

 

  • For stage3, the file format should be
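The file-type check described in this note can also be approximated in Python by reading a file's leading "magic" bytes. This is only a rough sketch of what the `file` command does, and the helper names `classify_magic` and `classify_file` are hypothetical. It does not state which result to expect for any particular stage.

```python
def classify_magic(header: bytes) -> str:
    """Guess a container type from the first bytes of a file."""
    if header.startswith(b"PK\x03\x04"):
        return "zip archive"            # needs unzip (password: infected)
    if header.startswith(b"MZ"):
        return "MZ executable"          # DOS/Windows executable header
    if header.startswith(b"\x7fELF"):
        return "ELF executable"         # Linux executable header
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify_magic(handle.read(4))


print(classify_magic(b"PK\x03\x04..."))
```

If the result is a zip archive, the download was wrapped and still has to be extracted before analysis.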

 

● Secure Experiment Environment

○ We need a secure experiment environment to execute the malware

○ Why?

■ An insecure analysis environment could damage your system

■ You may not want:

  • Encrypting your files during a ransomware analysis
  • Infecting machines in your corporate network during a worm analysis
  • Creating tons of infected bot clients in your network during a bot/trojan analysis

○ The solution:

■ Contain malware in a virtual environment

  • Virtual Machine
  • Virtual Network

○ Conservative rules (allow network traffic only if it is secure)

○ We provide a Win XP VM as a testbed!

● Run Win XP VM

○ Run Windows XP Virtual Machine with virt-manager

○ Open a terminal

○ Type “virt-manager” and double click “winxpsp3”

○ Click the icon with the two monitors and click on “basecamp”

 

 

○ Right click on basecamp, and click “Start snapshot.” Click Yes if prompted.

○ Once virt-manager successfully starts the snapshot, click “Show the graphical console.”

■ Click on the Windows Start Menu and Turn off Computer.

■ Then select Restart

 

 

○ DO NOT MODIFY OR DELETE THE GIVEN SNAPSHOTS!

■ The given snapshots are your backups for your analysis.

■ If something bad happens on your testbed, always revert back to the basecamp snapshot.

● Copy from Shared Directory

○ Go to the shared directory by clicking its icon (in Windows XP)

■ Copy stage1.exe into Desktop

■ If you execute it in the shared directory, an error message will pop up. Please copy the file to Desktop.

 

● Run the malware

○ Now we will run the malware

■ Execute stage1.exe (double click the icon)

■ It will say “Executing Stage 1 Malware”. Then, click OK.

  • You should click OK on each dialog to dismiss it

○ Otherwise, malware execution will be blocked

○ If you want to halt the malware that is running…

■ Execute stop_malware in the temp directory.

  • This will stop the currently running malware.
  • Please halt first before you execute another malware file.
● Network Behavioral Analysis

○ To analyze network behaviors, you need

■ Wireshark (https://www.wireshark.org/)

■ Capturing & Recording inbound/outbound network packets

● Observing Network Behavior

○ By capturing and recording network packets through the tools

■ Reveal C&C protocol

■ Attack Source & Destination

○ But the malware will not do anything. Why?

■ The C2 server is dead!

■ Therefore, the malware (C2 client) will never unfold its behaviors.

■ Question?

  • If we know C&C dialog of malware, can we build a fake C2 server in order to unfold the malware behaviors?
  • Answer: Hack Yeah! That is your job for this project!

● Wireshark

○ Let’s check it through network monitoring

■ Everything has already been installed.

■ Open Wireshark, capture the traffic for the network bridge

(Make sure to run with root privileges)

■ IP address = 192.168.133.1

■ Reference: https://www.wireshark.org/docs/

■ Get yourself familiarized with Linux commands and how to employ Wireshark.

■ Other references:

○ From Wireshark, we can see that the malware tries to connect to the host at 128.61.240.66, but it fails

○ Let’s make it redirect to our fake C2 server

■ Go to ~/tools/network

■ Edit iptables_rules to redirect traffic destined for 128.61.240.66 to 192.168.133.1 (fake host)

○ Whenever you edit iptables_rules, always run reset.

■ (type “./reset” from the ~/tools/network directory)

IMPORTANT! If you shut down your project VM, be sure to run reset again the next time you start it up.
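As a rough illustration of the redirect described above, the sketch below builds the argument string for a standard iptables DNAT rule. The helper name `dnat_rule` is hypothetical, port 80 is an assumption (the malware is later shown to speak HTTP), and the rules file on the VM may use a different layout, so treat this as a shape to adapt rather than the exact line to paste.

```python
def dnat_rule(original_ip: str, fake_ip: str, port: int = 80) -> str:
    """Build iptables arguments that rewrite the destination of matching packets."""
    return (
        f"-t nat -A PREROUTING -d {original_ip} -p tcp --dport {port} "
        f"-j DNAT --to-destination {fake_ip}:{port}"
    )


# Dead C2 address and fake host taken from the write-up.
print(dnat_rule("128.61.240.66", "192.168.133.1"))
```

The rule matches TCP packets addressed to the dead C2 server before routing and rewrites their destination to the fake host on the bridge.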

 

 

 

 

 

 

● Reading C2 Traffic

○ Observing C2 traffic

■ In Wireshark, we can see that the malware can now communicate with our fake C2 server

  • But there will not be further execution, because the command is wrong

■ You can see the contents of the traffic by right-clicking on the line, then clicking Follow – TCP Stream

 

● Cuckoo

○ Let’s take a look at cuckoo. Cuckoo is NOT necessarily required to complete this project, but it is a useful tool to help you understand what your malware is doing, and therefore how you might want to modify your score.h file later in the project.

Note! You can’t run the testbed VM and cuckoo simultaneously.

Always turn off the testbed VM, and follow the steps below to execute Cuckoo

 

○ Open two terminals.

○ ‘$workon cuckoo’ (Set virtualenv as cuckoo for both terminal1 and terminal2)

○ Open one terminal in debug mode, with command: ‘$cuckoo -d’

○ Open other cuckoo terminal for the webserver, with command: ‘$cuckoo web’

 

○ Reference: Malware Analysis using Cuckoo Sandbox

○ If you get an error when running cuckoo web because port 8000 is already in

use, run “sudo fuser -k 8000/tcp” and try again.

 

○ Cuckoo uses a snapshot of the given testbed VM.

○ The snapshot is 1501466914

○ DO NOT TOUCH the snapshot!

 

 

 

 

 

 

 

 

 

 

● Upload a file to Cuckoo

○ To open the cuckoo web server, type the following URL into Chromium

■ http://localhost:8000

○ To upload a file, click the red box and choose a file.

 

○ Once you click the Analyze button, it will take some time to run the malware.

 

 

 

 

 

 

 

 

  • Analysis with Cuckoo


 

● Figuring Out the List of Commands

○ The malware does not exhibit its behavior because we did not send the correct command through our fake C2 server

○ We will use

■ File/Registry/Process tracing analysis to guess the malware behavior.

■ control-flow graph (CFG) analysis and symbolic execution to figure out the list of the correct commands

○ The purpose of tracing analysis is to draw a big picture of the malware

■ What kinds of System call/API does the malware use?

■ Does the malware create/read/write a file? How about a registry?

○ The purpose of CFG analysis is to find the exact logic that involves the interpretation of the command and the execution of malicious behavior

○ Then, symbolic execution finds the command that drives the malware into that execution path

 

● Tracing Analysis on Cuckoo

○ On the side bar, there are useful menus for tracing analysis.

■ We are focusing on:

  • Static Analysis

○ API/System Call.

  • Behavioral Analysis

○ Trace behaviors in time sequence.

● Static Analysis on Cuckoo

○ Static Analysis

■ Information about the malware.

■ Win32 PE format information

  • Windows binaries use the PE format
  • Complicated structure

● Sections include

○ .text

○ Strings, etc.

○ .data

○ .idata

○ .reloc

○ More information: Malware researcher’s handbook (demystifying PE file)

○ Interestingly, three DLL (Dynamic Link Library) files are imported.

○ In WININET.dll, we can see that the malware uses the HTTP protocol.

○ In ADVAPI32.dll, we can check whether the malware touches registry files

○ In Kernel32.dll, we can see that the malware waits on signals and also sleeps.

 

● Behavior Analysis on Cuckoo

○ Tracing a behavior(file/process/thread/registry/network) in time sequence.

○ Useful to figure out cause-and-effect in process/file/network.

○ Malware creates a new file and runs the process, then writes it to memory.

 

● Cuckoo analysis result

○ Based on our analysis with Cuckoo, we can determine if…

■ The malware uses the HTTP protocol to communicate

  • Communicate with whom? C&C?
  • Web server access? For checking if the C2 server is active?
  • Commands through the HTTP protocol? Cookies?

■ The malware touches (create/write/read) a file/registry/process

  • This might be a dropper? Or does it download a binary from the C2 server?
  • What is the purpose of creating processes? Modifying the registry?

 

● Control Flow Graph Analysis

○ Based on the information that we collected in the previous step, we are going to perform CFG analysis & symbolic execution analysis

○ CFG:

■ graph representation of computation and control flow in the program

■ Nodes are basic blocks

■ Edges represent possible flow of control from the end of one block to the beginning of the other.
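To make the definition concrete, a CFG can be sketched as a dictionary that maps each basic block to its successor blocks. The block names below are hypothetical and stand for an if/else inside a function. The breadth-first search returns one shortest chain of blocks from a start block to a goal block.

```python
from collections import deque

# Hypothetical CFG: basic block label -> labels of possible successor blocks.
cfg = {
    "entry": ["check"],
    "check": ["then", "else"],   # a conditional branch has two outgoing edges
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}


def find_path(graph, start, goal):
    """Return one shortest list of blocks leading from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(path + [successor])
    return None


print(find_path(cfg, "entry", "exit"))
```

The later question "which input drives execution from one address to another" is this kind of reachability query, with the added problem of finding inputs that satisfy every branch condition along the path.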

 

 

 

 

 

 

○ But, in malware analysis, we are analyzing CFG at the instruction level.

○ We provide a tool for you that helps to find command interpretation logic and malicious logic

■ We list the functions of system calls the malware uses internally

■ If you provide the score (how malicious it is, or how likely the malicious logic is to use such a function) for the functions, then the tool will find where the malicious logic is, based on its score

  • Example: if you set StrCmpNIA to have a score of 10, then the function that calls StrCmpNIA 5 times within itself will have the score 50.
  • A higher score implies that more functions related to the malicious activity are used within the malware.

■ Your job is to write the score value per each function
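The scoring arithmetic in the StrCmpNIA example can be sketched as follows. This only mirrors the example's multiplication; how the provided tool reads score.h and aggregates scores internally is not shown in this write-up, and `function_score` is a hypothetical helper.

```python
from collections import Counter


def function_score(called_apis, api_scores):
    """Sum score-per-call over every API call made inside one function."""
    counts = Counter(called_apis)
    return sum(api_scores.get(api, 0) * times for api, times in counts.items())


api_scores = {"StrCmpNIA": 10}      # score assigned by the analyst
calls = ["StrCmpNIA"] * 5           # the function calls StrCmpNIA five times
print(function_score(calls, api_scores))  # 50
```

APIs without an assigned score contribute nothing here, which is why scoring the functions you suspect are involved in C2 communication makes the relevant code stand out.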

 

 

○ More info: http://www.cs.cornell.edu/courses/cs412/2008sp/lectures/lec24.pdf

○ From our network analysis, we know that the malware uses an Internet connection to 128.61.240.66

○ From our cuckoo-based analysis, we know that the malware uses the HTTP protocol.

○ Moreover, it uses some particular functions to communicate and stay in touch with the command and control server.

○ Modify the score values for these particular functions in order to generate a better CFG – for proper analysis.

○ Find the file to be edited – score.h.

○ Path: /tools/cfg-generation/score.h

○ Build control flow graph

■ By executing ./generate.py stage1, the tool gives you the CFG

  • This finds the function with the higher score

○ Implies that this calls high-score functions during its execution

■ For stage2

  • Use ’stage2’ as argument

Note: your graph and its memory addresses will vary from this example

○ The function entry is at the address of 405190

■ And, there is a function (marked as sub) of score 12

  • At the address 40525a (marked in red)
  • Use the block_address, not the call sub_address

■ This implies that

  • sub_4050c0 calls some internet related functions.
  • We need to find out what this command is

○ Run from 405190 to 40525a

● Finding Command

○ Finding Commands with Symbolic Execution

■ We want to find a command that drives malware from 405190 to 40525a

  • Let’s do symbolic execution to figure that out

○ What is symbolic execution?

■ Rather than executing the program with some input, symbolic execution treats the input data as a symbolic variable, then tries to calculate expressions for the input along the execution.

■ Path explosion

■ Modeling statements and environments

■ Constraint solving

○ Symbolic Execution Engine: Klee, Angr, Mayhem, etc.

  • Loading a binary into the analysis program
  • Translating a binary into an intermediate representation (IR).
  • Translating that IR into a semantic representation
  • Performing the actual analysis with symbolic execution.

 

 

 

 

 

 

○ In this example, ONLY i=2, j=9 conditions will lead the program to print “Correct!”

○ Symbolic execution can solve the expressions needed to reach a target, in this case ”Correct”.

○ Let’s apply it to malware command & control logic. A C&C bot (malware) expects inputs (solutions to the expressions) that trigger behaviors (targets).

 

 

 

○ In this example, ONLY the ‘launch-attack’ and ‘remove’ commands (inputs) trigger attack() and destroy_itself().

○ Symbolic execution is able to find ”launch-attack” as an input to trigger attack(), which is a malicious behavior.

○ Plus, ”remove” will lead to destroy_itself(), which is another behavior.

○ Our job in this project with Symbolic execution is to find inputs, and then feed the inputs to trigger behaviors.
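The launch-attack/remove idea can be sketched with a toy bot. The `bot` function is a hypothetical stand-in for the malware's command interpreter, and `find_triggers` is a brute-force search over a given candidate list, not symbolic execution. A symbolic executor would derive the trigger strings from the comparisons themselves, with no candidate list.

```python
def bot(command: str) -> str:
    """Toy command interpreter: name the behavior a command triggers."""
    if command == "launch-attack":
        return "attack"
    if command == "remove":
        return "destroy_itself"
    return "idle"


def find_triggers(candidates):
    """Map each candidate input to the non-idle behavior it reaches."""
    return {cmd: bot(cmd) for cmd in candidates if bot(cmd) != "idle"}


print(find_triggers(["hello", "remove", "launch-attack"]))
```

Feeding the recovered inputs back to the running sample is what makes the hidden behaviors observable in the network and system traces.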

 

● Finding Commands with Angr

○  We prepared a symbolic executor and a solver for you

■ Your job is to find the starting point of the function which interprets the command, and find the end point where malware actually executes some function that does malicious operations

  • Use a Control-flow Graph (CFG) analysis tool!

■ The symbolic executor is called angr (http://angr.io/index.html)

○ How do you run it?

■ Go to ~/tools/sym-exec

■ Run it like

python ./sym_exec.py [program_path] [start_address] [end_address]

○ Replace the start and end addresses above with the ones from your CFG, for example:

python ./sym_exec.py ~/shared/stage1.exe 4050c0 40518a

○ The command will be printed at the end (if found)

 

 

● Reconstructing C2 server

○ After CFG analysis + symbolic execution, reconstruct the C2 server

○ The tool for reconstructing the C2 server is already on the VM

○ It runs nginx and php script

■ This will look like ~/tools/c2-command/stage*-command.txt

■ Your job is to add your commands to the relevant *.txt file

  • The command that leads the execution from 405190 to 40525a is “$insert” (note: the name of the command you see may vary)
  • Then, type ”$insert” and save the file.
  • Important: be sure to put the ‘$’ character before your commands, even if stage*-command.txt says that it’s optional
  • The order of commands in the file does not matter – they’ll run in a random order

Note: This means that if you want to run only a particular command, you’ll need to remove, or comment out the other commands in your file

 

 

● angr

○ SimState

■ While angr performs symbolic execution, it stores the current state of the program in SimState objects.

■ SimState is a structure that contains the program’s memory, registers, and other information.

■ SimState provides interaction with memory and registers. For example, state.regs offers read and write access by register name, such as state.regs.eip, state.regs.rbx, state.regs.ebx, state.regs.bh

■ Creating an empty 64 bit SimState

 

 

○ Bitvectors

■ Since we are dealing with binary files, we don’t deal with regular integers.

■ In a binary program, everything becomes bits and sequences of bits.

■ A bitvector is a sequence of bits used to perform integer arithmetic for symbolic execution.

■ Creating some 32 bit bitvector values

■ state.solver.BVV(4,32) will create a bitvector of length 32 bits with value 4

■ We can perform arithmetic operations or comparisons using the bitvectors
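The fixed-width behavior of bitvector arithmetic can be emulated in plain Python by masking results to the bit length. This is not the angr API; `bvv`, `bv_add` and `bv_sub` are hypothetical helpers that only show why a 32-bit value wraps around where a regular integer would not.

```python
def bvv(value: int, bits: int) -> int:
    """Truncate an integer to an unsigned value of the given bit length."""
    return value & ((1 << bits) - 1)


def bv_add(a: int, b: int, bits: int) -> int:
    """Add two values and wrap the result to the bit length."""
    return bvv(a + b, bits)


def bv_sub(a: int, b: int, bits: int) -> int:
    """Subtract and wrap the result to the bit length."""
    return bvv(a - b, bits)


print(hex(bv_add(0xFFFFFFFF, 1, 32)))  # 0x0, the carry is lost
print(hex(bv_sub(4, 5, 32)))           # 0xffffffff, wraps below zero
```

The same wrap-around applies to the values a solver works with, which matters when a comparison in the malware depends on overflow.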

 

○ Symbolic Bitvectors

■ state.solver.BVS(’x’, 32) will create a symbolic variable named x with 32 bit length

■ Angr allows us to perform arithmetic operation or comparisons using them.

 

○ Registers

State provides access to the registers through state.regs.register_name where register_name could be rcx, ecx, cx, ch and cl. Same applies to the other registers.
Look at the types of registers — they are bit vectors
Look at the length of registers examined below.

● They are all symbolic bitvectors because they are not initialized yet.

cl, ch, cx and ecx are all parts of rcx.
You can compare the length and the location of cl, ch, cx, ecx and rcx in angr with the actual architecture depicted below.
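The relationship between rcx and its sub-registers can be written out with masks and shifts. The sketch below uses a concrete 64-bit value in plain Python rather than angr; `sub_registers` is a hypothetical helper.

```python
def sub_registers(rcx: int) -> dict:
    """Split a 64-bit rcx value into the architectural sub-registers."""
    return {
        "rcx": rcx & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "ecx": rcx & 0xFFFFFFFF,          # low 32 bits
        "cx": rcx & 0xFFFF,               # low 16 bits
        "ch": (rcx >> 8) & 0xFF,          # bits 8..15
        "cl": rcx & 0xFF,                 # low 8 bits
    }


for name, value in sub_registers(0x1122334455667788).items():
    print(name, hex(value))
```

A constraint placed on cl therefore restricts only the lowest byte of rcx and leaves the other 56 bits free.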

 

 

○ Constraints

■ In a CFG, a line like if ( x > 10 ) creates a branch. Please look at the Symbolic Execution Concepts tutorial.

■ Assuming x is a symbolic variable, taking the True branch adds a constraint such as <Bool x_5_32 > 4> to the successor state

■ For the False branch, the negation of <Bool x_5_32 > 4> is added.

■ Adding a constraint to a SimState

  • The cl register equals 11
  • state.add_constraints(state.regs.cl == 11)
  • state.add_constraints(state.regs.cl == state.solver.BVV(0xb, 8)), since state.solver.BVV(0xb, 8) equals 11
  • You can see their effect is the same for SimState in the example below.
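The effect of adding constraints can be illustrated without angr by enumerating every value of an 8-bit register and keeping those that satisfy all constraints. `solve_8bit` is a hypothetical toy solver; a real solver reasons symbolically instead of enumerating values.

```python
def solve_8bit(constraints):
    """Return every 8-bit value that satisfies all constraint predicates."""
    return [value for value in range(256) if all(c(value) for c in constraints)]


# With no constraints, an uninitialized 8-bit register can hold any value.
print(len(solve_8bit([])))

# cl == 11 and cl == 0xb describe the same single solution.
print(solve_8bit([lambda cl: cl == 11]), solve_8bit([lambda cl: cl == 0xB]))

# A branch on x > 4 splits the values between the two successor states.
true_branch = solve_8bit([lambda x: x > 4])
false_branch = solve_8bit([lambda x: not x > 4])
print(len(true_branch), false_branch)
```

Each added constraint can only shrink the set of feasible values, which is how constraining a loop or an input keeps path explosion under control.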

 

○ Radare2

■ Launch radare2 with $ r2 ~/shared/payload.exe

■ Then type aaa which will analyze all (functions + bbs)

■ afl list all functions

 

afl lists all the functions, which makes the output hard to go through.
afl~name greps the list of functions for the given name
afl~attack will list all the functions whose names contain attack

 

You can use linux commands while inside the r2 console such as grep.
On the right side, you can see all the functions having the attack vector

(afl~send)

Using those API calls, this Linux malware performs DDoS attacks based on the commands it receives from the C&C server.
The example below shows how to find all the attack vectors calling sym.send/sym.sendto

 

Now, we have to iterate over all the attack functions on the right. For example, the example below shows three attack functions, and only one of them is called. Our focus is on the call sym.attack_????? instructions.
Let’s analyze the example below.
axt sym.attack_app_http has only one reference, which is a push instruction. This is not the attack function we are interested in.
axt sym.attack_app_cfnull has no reference at all. This is not the attack function we need to explore.
axt sym.attack_???? is one of the functions listed in the example on the right, and it is referenced by a call sym.attack_????? instruction. That is the function we need to explore further to determine the target address for the symbolic execution.
You need to find 2 attack functions.

 

After finding the attack function, we can determine the target address.

●     First, step into the function using s sym.attack_????.

●     Second, pdf | grep sym.send or pdf | grep sym.sendto to determine the instruction address

●     Third, s address_for_call_sym.send(to) to point to the instruction which is call sym.send or sym.sendto

●     Lastly, print 2 instructions starting with the call sym.send/sym.sendto instruction

The address of the instruction which is the successor of call sym.send(to) is the target address for the symbolic execution.
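The last step, taking the address of the instruction that follows the call, can be sketched over a disassembly that has already been reduced to (address, instruction) pairs. The addresses and instructions in the sample listing are hypothetical, and parsing real radare2 output into this shape is left out.

```python
SEND_TARGETS = {"sym.send", "sym.sendto"}


def target_after_send(listing):
    """Return the address of the instruction right after call sym.send(to)."""
    for index, (_address, instruction) in enumerate(listing[:-1]):
        parts = instruction.split()
        if len(parts) == 2 and parts[0] == "call" and parts[1] in SEND_TARGETS:
            return listing[index + 1][0]
    return None


# Hypothetical excerpt of an attack function.
listing = [
    (0x0804A100, "push eax"),
    (0x0804A101, "call sym.sendto"),
    (0x0804A106, "add esp, 0x20"),
]
print(hex(target_after_send(listing)))
```

Using the successor address means the symbolic executor has to get through the send call itself, so reaching the target implies the attack traffic would be sent.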

 

 

■ For more information :

○ You don’t have to use Radare2.

○ Here are some of the tools you may want to use

objdump
IDA-Pro (Disassembly tool with GUI) (Free version)

https://www.hex-rays.com/products/ida/support/download_freeware.shtml

Cutter (GUI for the radare2)

After stage1.exe

  • If you find all of the commands for stage1.exe malware, the malware will download stage2.exe by updating itself.
  • Now you’ve found the commands by running sym_exec.py
  • Add those commands to stage1-command.txt. Remember to write them as $<command>.
  • Start up the Windows VM again, then copy stage1.exe to the desktop. Then double click on it and continue.
  • Note: if stage1 fails to download stage2, your firewall might be blocking it

○ This is actual malware, so some IDS have signatures that match it.

 

  • For stage2.exe, please follow the same steps in the tutorial

○ Check its network access with Wireshark

○ Redirect network traffic to your fake C2 server if required (if the connection fails)

○ Try to identify malicious functions by editing score.h and using the cfg-generation tool

○ Discover the list of commands using the symbolic execution tool

○ Fill in the commands in ~/tools/c2-command/stage2-command.txt

○ Run it as mentioned before.

 

Linux Malware Analysis

 

  • stage2.exe will download the stage3 malware, which is payload.exe.

○ This is Linux malware.

  • We need to handle the Linux malware differently from the Windows malware, and will use different tools and methods to analyze it

 

Linux Malware Tools

  • First copy the linux malware into a shared folder. The tools which you will use are installed inside the Linux host.
  • ~/tools/sym-exec/linux_sym_exec.py

○ for linux malware symbolic execution

○ python linux_sym_exec.py path_to_linux_mw start target

○ To make it work, you need to modify two linux_sym_exec.py functions

■ targs_len_before and opts_len_before

  • ~/tools/dynamicanalysis/

○ instrace.linux.log : the dynamic instruction trace for the linux malware

○ detect_loop.py : you have to modify this file to find the loop in the given trace

○ Usage: python detect_loop.py

  • Run ‘python linux_sym_exec.py path_to_linux start target’.
  • It won’t be able to find any input because of path explosion. You need to add constraints to make symbolic execution targeted
  • Follow the steps in assignment-questionnaire.txt and find the inputs.
  • Analyze the dynamic instruction trace and locate the C&C communication
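One way to think about the loop detection is to look for the shortest sequence of function names that repeats back to back in the trace. The sketch below assumes the trace has already been reduced to a list of function names. The format of instrace.linux.log is not described here, the names in the sample trace are hypothetical, and this is not the provided detect_loop.py.

```python
def find_repeating_sequence(trace, min_repeats=3):
    """Find the shortest run that repeats consecutively at least min_repeats times.

    Returns (start_index, sequence) for the first match, or None.
    """
    n = len(trace)
    for length in range(1, n // min_repeats + 1):
        for start in range(0, n - length * min_repeats + 1):
            body = trace[start:start + length]
            if all(
                trace[start + k * length:start + (k + 1) * length] == body
                for k in range(1, min_repeats)
            ):
                return start, body
    return None


# Hypothetical function-call trace with a three-call loop body.
trace = ["main", "connect", "recv", "parse", "send",
         "recv", "parse", "send", "recv", "parse", "send", "exit"]
print(find_repeating_sequence(trace))
```

The returned index marks the loop entry point and the sequence gives the functions called in each iteration, which are the two things the questionnaire asks you to locate.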

 

 

Android Malware Analysis

  • Manifest Analysis
    • Identifying suspicious components

● Static Analysis

○ Search for C&C commands and trigger conditions

○ Vet the app for any anti-analysis techniques that need to be removed.

  • Dynamic analysis
    • Leverage the information found via static analysis to trigger the malicious behavior.

 

Manifest Analysis

  • Identify suspicious components
    • Broadcast receivers registering for suspicious actions.

○ Background services

  • Narrow the scope of analysis
    • Malicious apps are repackaged in benign apps with thousands of classes.

Static Analysis

  • Search for C&C commands and trigger conditions

 

 

  • Identifying Anti-analysis techniques

 

Scenario

  • Analyzing Android Malware

○ You have received a malware sample sms.apk.

○ You need to identify communication with the C&C server

○ Identify anti-analysis techniques being used by the app.

○ Identify commands that trigger any malicious behavior.

Project Structure

  • Android emulator

○ An emulator for Android 4.4 is pre-installed

■ Run ‘run-emulator’

  • This will start the Android emulator (this takes a long time, especially the first time you start it)

○ Jadx

■ Disassembles apk files into Java source code.

  • Apktool
    • Disassembles apk file into Smali.

○ Rebuilds apk files.

  • Write-up (~/Android/MaliciousMessenger/writeup.pdf)
    • Detailed guide on how to complete the Android section of the lab.

● Android App

○ ~/Android/MaliciousMessenger/tutorialApps

■ Emu-check.apk

  • A tutorial example (Shown as ‘My application’ in the emulator)

■ CoinPirate.apk

■ Another tutorial example

  • ~/Android/MaliciousMessenger/sms.apk

○ Target app to analyze to answer the questionnaire

  • READ ~/Android/MaliciousMessenger/writeup.pdf

Android Cheatsheet

Tips for assignment-questionnaire.txt

  • Please use the latest version of VirtualBox when you import the VM. Please do not modify anything related to network settings in the VM.

● Domain name

○ On the questionnaire sheet, there are entries for writing domain names. Please follow the following rules on getting answers for those questions.

○ You should write FQDN, which means, if the full domain name is canof.gtisc.gatech.edu then write canof.gtisc.gatech.edu, not just gatech.edu or gtisc.gatech.edu

○ For the others (connections check, DDoS, sending info, etc.), you should get the exact domain name that the malware uses. For example, the IP address 130.207.188.35 belongs to both coe.gatech.edu and web-plesk5.gatech.edu.

○ Because there are multiple mappings, you cannot be sure which domain the malware used by just using nslookup. In this case, please get the domain names from DNS packets in Wireshark instead.

○ All domains should be based on Wireshark DNS packets

■ e.g., get it from a DNS query packet or redirect HTTP traffic into a local VM and examine the Host header.

○ If you look at the log in Wireshark, you will find a DNS query (Standard query) and a DNS response (Standard query response)

○ In the Domain Name System section, there is a Queries section, like below

○ Queries:

■ x.y.z: type A, class IN.

○ Answers:

■ x.y.z: type CNAME, class IN, cname a.b.c

○ You should use x.y.z

● URL

○ For all URLs, you do not have to specify the protocol (http:// or https://, etc.).

○ However, if HTTP traffic is like the following:

■ POST /a/b/c/d?asdf=1234 HTTP/1.1

■ Host: www.zzz.com

○ Then please write this as

www.zzz.com/a/b/c/d?asdf=1234
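The rule above (Host header followed by the request target, with no protocol prefix) can be sketched as a small parser over the raw request text copied from Follow TCP Stream. `url_from_request` is a hypothetical helper and assumes a well-formed request line.

```python
def url_from_request(raw_request: str) -> str:
    """Join the Host header and the request target of a raw HTTP request."""
    lines = raw_request.replace("\r\n", "\n").split("\n")
    _method, target, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        if not line:              # blank line ends the header section
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
            break
    return host + target


request = "POST /a/b/c/d?asdf=1234 HTTP/1.1\r\nHost: www.zzz.com\r\n\r\n"
print(url_from_request(request))  # www.zzz.com/a/b/c/d?asdf=1234
```

The query string stays part of the answer because it belongs to the request target on the first line.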

● Writing commands in *.txt files under c2-command directory

○ There are pre-installed PHP scripts in the VM locally that read the *.txt file for each stage,

■ These scripts send the command to the malware after reading them from the TXT files.

■ One caveat of these scripts is that they are written to send the commands in random order (i.e., if there are commands a, b, c, then the script will randomly choose one command and send it to the malware).

■ So if you want to test ONE command at a time, then please write only that command in the TXT file.

  • Ex. If you just want to run the command $uninstall, then please write only that command in stage1-command.txt.

● linux_sym_exec and detect_loop for Linux malware

○ You can use the free version of IDA Pro, objdump, or radare2 for this task to find the attack functions that are called and their target addresses.

○ Look for angr examples on GitHub that add constraints to the state.

○ For the loop detection, focus on sequences of function calls that repeat.

● Correct command but malware is not working?

○ Note that some stage 2 commands differ per student: they end with a 4-digit hexadecimal number.

■ Ex. a command for stage 2 is formatted like $COMMANDa1b4

■ NOTE: three commands in stage 2 have the 4-digit hexadecimal tail.

■ All commands in stage 3 have the 4-digit hexadecimal tail.

○ However, if the endpoint address of the symbolic execution is not set correctly, you may recover only the front part of the command, like

■ $COMMAND

■ In that case, set an endpoint from which you can recover the entire command.
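A quick sanity check for a recovered command can be sketched in Python. This is a heuristic only: it assumes the tail uses the hexadecimal digits 0-9 and a-f as in the $COMMANDa1b4 example, and the function name is hypothetical. A command name that happens to end in four hex-looking letters would pass the check by accident.

```python
import re

# Four hexadecimal digits at the very end of the string.
HEX_TAIL = re.compile(r"[0-9a-fA-F]{4}$")


def has_hex_tail(command: str) -> bool:
    """Return True if the command ends with a 4-digit hexadecimal tail."""
    return HEX_TAIL.search(command) is not None
```

If a stage 3 command fails this check, the symbolic execution endpoint is probably set too early.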

● Cuckoo

○ In the VM, we provide Cuckoo, a dynamic malware analysis framework.

■ It is very convenient and easy to use.

■ While running Cuckoo, you may see warnings and errors such as “critical time blah blah~” and “YARA signature…. blah blah”. Please ignore them.

■ Because you are executing malware in the QEMU Windows VM, the framework needs to set a time limit.

  • Cuckoo will check whether the malware has terminated.
  • However, the three malware samples you will meet never terminate (intentionally, modified by me for educational purposes).
  • So, please ignore “critical time blah blah~, terminating”.

■ In our case, the malware will never unfold, no matter how long it runs, unless you feed it the right inputs (the malware expects C2 commands).

○ iptables setting

■ If you check /home/analysis/.cuckoo/conf/kvm.conf, you will see how we set up the QEMU Windows host VM.

■ You will find that the IP of the host VM is “192.168.133.101”.

■ If you want to see network behaviors in Cuckoo, forward that IP in /home/analysis/tools/network/iptables-rules.

■ For example, open iptables-rules and add

sudo iptables -t nat -A PREROUTING -p tcp -s 192.168.133.101 -d [DEST-IP] --dport 80 -j DNAT --to 192.168.133.1:80

 

Miscellaneous VM Performance Tips

Part 1 : Windows Malware / Generic VM Issues

  • Try lowering your screen resolution
  • Save often!
  • Avoid using a resource-heavy IDE like IntelliJ, Eclipse, etc. Lightweight alternatives include gedit, vim, emacs, Sublime Text, Visual Studio Code, nano, etc.
  • Most importantly, do / run only 1 task at a time. That means:

○ Run the Windows VM only when:

■ Sending commands to malware

■ Analyzing network traffic via Wireshark

■ Once done with those tasks, turn off the Windows VM.

○ Avoid running the windows VM when:

■ Running cuckoo analysis

■ Generating CFGs

■ Running Symbolic Execution: this is quite resource intensive, so avoid doing other things while it runs. (TIP: If this seems to be taking infinite memory/time, you are most likely trying to reach an unreachable / invalid address! Check your addresses!)

○ If you have a very high resolution on your host machine, try running the VM at a lower resolution (at least 1280×800 is recommended, for legibility). You can do this in 2 ways:

■ VirtualBox Menu – View > Virtual Screen 1 > Resize to a x b

■ Ubuntu Menu – Type “Displays” > Change it there

○ Restart after a task / stage. This is mostly a last resort, but restarting the VM after finishing a task/stage made everything feel really smooth, compared with trying to free memory, etc. Just be sure to run ./reset in ~/tools/network after each VM restart!

 

Part 2: Android

  • Some of the tips above apply here (VM settings, resolution, etc.).
  • Restarting after working on Part 1 helps a lot.
  • If your Android emulator still feels slow, you can add the following flags to the emulator command in ~/bin/run-emulator

-memory 2048 -gpu swiftshader

  • You can experiment with RAM allocation and CPU usage based on your machine – but keep in mind that the project VM has only been tested at 4 GB and with 2 or 3 CPUs.

 

Extra Tips

  • Once you successfully complete stage 1 and the stage2 file has been downloaded on the Windows VM, you can move it to the shared folder for easier handling. Verify the file type as described earlier in the write-up, and handle it in the same manner as stage1.
  • For stage2, do not forget to update the ‘iptables_rules’ file and run ‘./reset’ afterwards.
  • General tip: if your device frequently lags or takes a long time to execute, reboot your device.

○ Allocating too few resources can cause issues. You could try reinstalling the VM image (deleting the previously stored state), and even VirtualBox as a last resort.

  • Do NOT change the base snapshots.
  • Ensure you have set up no firewalls.
  • Some Mac users might be unable to unzip project3.zip to obtain the .ova file. In that case, log in to Dropbox as a user instead of as a guest. Verify the file properties afterwards.
  • For all users: a partial file download will result in errors. Verify the download before execution.
  • Moreover, if you have a problem with your current device (it is too old or cannot allocate enough resources for a smooth experience), please contact us beforehand so we can arrange an alternative. We cannot provide one in the last few days.
  • One alternative to try on your own: Amazon EC2. First convert the OVA file to a format supported by EC2 (VMDK, VHD, or RAW); the supported formats may differ depending on the instance type you choose. Next, upload the converted file to an S3 bucket, create a containers.json file for the import, start the instance, and import the image manually. VirtualBox or other virtualization software would also need to be installed so that you can import the VM and run it.

Submission

Required files

  • Zip the following files and upload report.zip to Canvas
    • Running ~/archive.sh will automatically zip all of the files

■ ~/report/assignment-questionnaire.txt

■ stage1.exe, stage2.exe, payload.exe (linux malware)

■ ~/tools/network/iptables_rules

■ ~/tools/cfg-generation/score.h

  • Running ~/archive.sh will create report.zip automatically.
    • Please check the content of your zip file before submitting it to Canvas.
  • Submit only ‘assignment-questionnaire.txt’ to Gradescope, and the zip to Canvas (under Project3 Assignment).

If you did not submit report.zip on time, a 5-point deduction will be applied to your total score.

Questionnaire

  • To get credit for the project, you have to answer the questionnaire, found on Canvas.
  • Please strictly follow the format or the example answer for each question in assignment-questionnaire.txt. TAs use an autograder for your submission.

● Windows Part

○ Read assignment-questionnaire.txt

○ Carefully read the questions, and answer them in assignment-questionnaire.txt

○ For each stage, there are 4-6 questions regarding the behavior of the malware.

● Android Part

○ READ ~/Android/MaliciousMessenger/writeup.pdf

○ Carefully read the writeup, and answer in assignment-questionnaire.txt

○ Make sure you overwrite ANSWER_HERE

 

Rubric

  • The value for each max score is within its particular section

○ Windows has 110 possible points

○ Android has 100.

○ As each section is worth an equal amount of your overall project grade, we normalized the Windows score by dividing by 1.1 (and rounded up), then averaged it with the Android score to get your final grade. So effectively, each point in the table above is worth half a point of your final project grade (slightly less for Windows).

  • If the Partial Credit column is blank, there is no partial credit for the question. “Ratio” refers to the Levenshtein ratio, a metric of similarity between strings.
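To give a feel for the “Ratio” metric, here is a minimal Python sketch. The write-up does not say which implementation the autograder uses, so this assumes the definition used by the python-Levenshtein package: (len(a) + len(b) - d) / (len(a) + len(b)), where d is the edit distance with substitutions counted as two edits. That is equivalent to 2 * LCS(a, b) / (len(a) + len(b)), which is what the code computes. The function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2 * lcs_length(a, b) / total
```

Under this definition, a single wrong character in a three-character answer already lowers the ratio to two thirds, so short answers need to be exact.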

Android Malware Analysis Lab

June 11, 2017

1       Background

1.1       Android Manifest File

[1] Every application must have an AndroidManifest.xml file in its root directory. The manifest file provides essential information about your app to the Android system, which the system must have before it can run any of the app’s code. Among other things, the manifest file does the following

  • It names the Java package for the application.
  • It describes the components of the application, which include the activities, services, broadcast receivers, and content providers that compose the application. It also names the classes that implement each of the components and publishes their capabilities, such as the Intent messages that they can handle. These declarations inform the Android system of the components and the conditions in which they can be launched.
  • It declares the permissions that the application must have in order to access protected parts of the API and interact with other applications. It also declares the permissions that others are required to have in order to interact with the app’s components.

In Listing 1 an example of an app’s manifest file is shown. From it, we can see that this app declares that it needs the INTERNET and RECEIVE_SMS permissions. Additionally, the app uses three components: ActivityOne, SmsReceiver, and ServiceOfApp. ActivityOne is declared in the activity element. The intent-filter tag specifies the types of intents that an activity, service, or broadcast receiver can respond to. An intent filter declares the capabilities of its parent component: what an activity or service can do and what types of broadcasts a receiver can handle. It opens the component to receiving intents of the advertised type, while filtering out those that are not meaningful for the component. The receiver element declares a broadcast receiver component named SmsReceiver.

From the intent filters, we see that the Android OS will notify SmsReceiver when the device receives a new text message. The final component this app uses is a service component named ServiceOfApp, declared in the service element.

The Android Manifest file provides a high-level abstraction of an app’s behavior. When attempting to manually inspect the internal behaviors of an application statically, the manifest file is a good starting point. It provides key insights on the permissions an application is using, the components it is using, and how the application interacts with the Android OS and the outside world. Additional information about the contents and attributes of the manifest file can be found in the Android documentation [1].

<?xml version="1.0" encoding="utf-8"?>
<manifest ... package="com.myApplicationPackage">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.RECEIVE_SMS"/>
    <application>
        <activity android:name=".ActivityOne">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
                <category android:name="android.intent.category.LAUNCHER"/>
            </intent-filter>
        </activity>
        <receiver android:name="SmsReceiver">
            <intent-filter>
                <action android:name="android.provider.Telephony.SMS_RECEIVED"/>
            </intent-filter>
        </receiver>
        <service android:name=".ServiceOfApp">
        </service>
        <provider android:authorities="de.ub0r.android.smsdroid" android:name=".MessageProvider"/>
    </application>
</manifest>

Listing 1: An example of an app’s Android Manifest File

1.2       Disassembling Android Apps

Android uses the Android application package (APK) format to distribute apps to Android devices. An APK is nothing more than a zip file containing resources and compiled Java code. However, if you simply unzip the APK, the interesting contents are compiled files such as classes.dex and resources.arsc. Since viewing or editing compiled files directly is next to impossible, the APK needs to be decoded or disassembled. If one wishes to analyze an app at the bytecode level, reverse engineering tools such as Apktool [2] are available. Additionally, the app’s Java source code can be partially reconstructed using JADX [3]. You will probably find both tools useful for completing this lab.
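Because an APK is an ordinary zip archive, you can inspect its raw contents before decoding it. The following Python sketch uses only the standard library; the function name is hypothetical. On a real APK you would typically also see a binary AndroidManifest.xml, a res/ directory, and META-INF/ signature files next to classes.dex and resources.arsc.

```python
import zipfile


def list_apk_entries(apk):
    """Return the sorted entry names stored inside an APK (a zip archive).

    `apk` may be a file path or an open binary file object.
    """
    with zipfile.ZipFile(apk) as archive:
        return sorted(archive.namelist())
```

Listing the entries this way does not make the compiled files readable; it only confirms what the archive contains before you run Apktool or JADX.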

1.2.1          Disassembling Apps using Apktool

Apktool is a reverse engineering tool for Android apps. It can decode resources to nearly original form and rebuild them after making some modifications. It also makes working with an app easier because of the project-like file structure and the automation of some repetitive tasks, such as building the apk [2]. The functionality of Apktool is well documented, and we will briefly describe how this tool can be used to decode and build apk files. More information about Apktool can be found in its documentation [2].

1.2.2         Decoding apps using Apktool

In this example, we will use Apktool to decompile a malicious apk that was found in the wild (a7f94d45c7e1de8033db7f064189f89e82ac12c1) [4]. The apk is a repackaged version of the CoinPirates game that includes a malicious payload.

Apktool provides a command line interface. Its most common use case is for decoding and disassembling apk files. If you need to decode an apk file, you use the d (decode) option and pass the apk file as an argument. An example is shown in Listing 2 on line 1.

$ apktool d a7f94d45c7e1de8033db7f064189f89e82ac12c1.apk
I: Using Apktool 2.2.1 on a7f94d45c7e1de8033db7f064189f89e82ac12c1.apk
I: Loading resource table...
I: Decoding AndroidManifest.xml with resources...
I: Loading resource table from file: /home/joey/.local/share/apktool/framework/1.apk
I: Regular manifest package...
I: Decoding file-resources...
I: Decoding values */* XMLs...
I: Baksmaling classes.dex...
I: Copying assets and libs...
I: Copying unknown files...
I: Copying original files...
$ ls
a7f94d45c7e1de8033db7f064189f89e82ac12c1/ a7f94d45c7e1de8033db7f064189f89e82ac12c1.apk

Listing 2: Decoding an apk using Apktool.

If you look in the directory created you should see something similar to Listing 3. For this lab, we will focus mostly on the AndroidManifest.xml file, the res/ directory, and the smali/ directory. The app’s resources, such as its images and layouts can be found in the res/ directory. In the smali/ directory, the original classes found in the classes.dex file can be found. Apktool converts the original classes.dex file into smali using baksmali[5], an assembler/disassembler for the dex format. We will discuss the contents of these files and smali syntax later on.

$ ls
AndroidManifest.xml apktool.yml lib/ original/ res/ smali/

Listing 3: Contents of the directory created.

1.2.3          Building apk files using Apktool

Apktool also can rebuild an apk file from the decoded resources after making some modifications, such as modifying the smali code. To build an app you need to provide the b (build) parameter to Apktool and also provide the decoded directory as an argument like the example in Listing 4.

$ apktool b a7f94d45c7e1de8033db7f064189f89e82ac12c1/
I: Using Apktool 2.2.1
I: Checking whether sources has changed...
I: Checking whether resources has changed...
I: Building apk file...
I: Copying unknown files/dir...

Listing 4: Rebuilding an apk file using Apktool.

If you received no errors, the new apk should be found in the dist subdirectory of the directory provided as input. For example the apk created from running the command in Listing 4 is shown in Listing 5. In your working directory, you will still have a copy of the original apk file. It does not include any modifications you may have made.

$ cd a7f94d45c7e1de8033db7f064189f89e82ac12c1/dist/
$ ls
a7f94d45c7e1de8033db7f064189f89e82ac12c1.apk

Listing 5: The location of the modified apk.

The next step is to sign the apk you just created. If the apk has not been signed, it will fail to install on an emulator or real device. The Android SDK provides a utility program called apksigner that is located in the Android/Sdk/build-tools/SDK version/ directory. We have provided this program on your VM (you can also use jarsigner if you prefer). For this lab, you should just sign the apk with the debug key, which is stored in the debug.keystore file in your $HOME/.android/ directory. An example of signing an apk is shown in Listing 6. You need to provide the location of the keystore after the --ks option and pass the apk file as an argument. You will be prompted for a password. The default password is android.

$ apksigner sign --ks ~/.android/debug.keystore a7f94d45c7e1de8033db7f064189f89e82ac12c1.apk
Keystore password for signer #1:
$

Listing 6: Signing your apk file (password is android).

After you have signed your apk, install it onto the emulator to verify everything went correctly.
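The decode, rebuild, sign, and install steps from Listings 2 to 6 can be collected in one place. The Python sketch below only builds the argument lists and does not run anything; the function name and the `adb install` step are assumptions added for illustration, and it assumes POSIX-style paths as in the project VM. You would edit the smali code between the first and second command.

```python
from pathlib import Path


def rebuild_and_sign_commands(apk, keystore="~/.android/debug.keystore"):
    """Return the command lines for decoding, rebuilding, signing, installing."""
    apk = Path(apk)
    decoded_dir = apk.with_suffix("")          # apktool d creates <name>/
    rebuilt_apk = decoded_dir / "dist" / apk.name  # apktool b writes to dist/
    keystore_path = Path(keystore).expanduser()
    return [
        ["apktool", "d", str(apk)],
        ["apktool", "b", str(decoded_dir)],
        ["apksigner", "sign", "--ks", str(keystore_path), str(rebuilt_apk)],
        ["adb", "install", str(rebuilt_apk)],
    ]
```

Each list can be passed to subprocess.run one at a time. Note that apksigner prompts for the keystore password interactively, as shown in Listing 6.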

1.3        Making modifications using Apktool

Apktool can also be useful for making small modifications to the underlying bytecode. For example, let’s assume a malicious app is using the anti-analysis check shown in Listing 7 to prevent the execution of any malicious behavior if the Build type is eng. Use apktool to disassemble the app located in tutorialApps/emu-check.apk, so that you can modify the code located in the smali directory. After you have done so, open the file emu-check/smali/com/myapplication/MainActivity.smali in a text editor. You will see the code shown in Listing 8. The code shown is smali, a representation of Dalvik bytecode. The Android Developer’s website provides a page that discusses the types of instructions and arguments [6].

In the checkEnvironment method, the app checks the device’s build type to see if it is equal to the string “eng”. In the bytecode, the sget-object instruction stores the value of Build.TYPE in register v0, and the following const-string instruction stores the string constant “eng” in register v1. The invoke-virtual instruction performs the string comparison, and move-result stores the result in register v0. The if-eqz instruction then jumps to the cond_0 branch if the value stored in register v0 is equal to zero. Therefore, if the Build.TYPE is not “eng”, a jump to cond_0 occurs and the malicious behavior is triggered. Since we are on an emulator, our Build.TYPE will be “eng” and the jump will not occur. To force the control flow to go to cond_0, replace the if-eqz statement with “goto :cond_0”. This will force the branch to occur every time the app runs. Build and sign the app. Install it onto the emulator (if you installed the previous version, you will need to uninstall it first) and open the app. If you check logcat, you will see that the Build type is “eng”. However, the app will now log “do something malicious” instead.

protected void checkEnvironment() {
    if (Build.TYPE.equals("eng")) {
        Log.d(TAG, "checkEnvironment: do nothing");
        return;
    } else {
        Log.d(TAG, "checkEnvironment: do something malicious.");
    }
}

Listing 7: Prevents malicious behavior if the build type is eng.

 

# virtual methods
.method protected checkEnvironment()V
    .locals 2

    .prologue
    .line 21
    sget-object v0, Landroid/os/Build;->TYPE:Ljava/lang/String;

    const-string v1, "eng"

    invoke-virtual {v0, v1}, Ljava/lang/String;->equals(Ljava/lang/Object;)Z

    move-result v0

    if-eqz v0, :cond_0

    .line 22
    const-string v0, "MainActivity"

    const-string v1, "checkEnvironment: do nothing"

    invoke-static {v0, v1}, Landroid/util/Log;->d(Ljava/lang/String;Ljava/lang/String;)I

    .line 27
    :goto_0
    return-void

    .line 25
    :cond_0
    const-string v0, "MainActivity"

    const-string v1, "checkEnvironment: do something malicious."

    invoke-static {v0, v1}, Landroid/util/Log;->d(Ljava/lang/String;Ljava/lang/String;)I

    goto :goto_0
.end method

Listing 8: checkEnvironment in smali.
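The patch described in this section is a one-line edit, so doing it in a text editor is fine. As an illustration, the same change can be scripted. The Python sketch below replaces the conditional jump to a given label with an unconditional goto; the function name is hypothetical, and it assumes the conditional is an if-eqz on a single register, as in Listing 8.

```python
import re


def force_branch(smali_text: str, label: str = "cond_0") -> str:
    """Replace `if-eqz vN, :label` with `goto :label` in smali source."""
    pattern = re.compile(
        r"^([ \t]*)if-eqz[ \t]+v\d+,[ \t]*:" + re.escape(label) + r"[ \t]*$",
        re.MULTILINE,
    )
    # Keep the original indentation (group 1) and swap the instruction.
    patched, count = pattern.subn(
        lambda m: m.group(1) + "goto :" + label, smali_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one if-eqz to :{label}, found {count}")
    return patched
```

Refusing to patch when the match count is not exactly one is a deliberate choice: it avoids silently rewriting an unrelated branch in a larger method.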

1.4                 Using JADX to disassemble Dex to Java Source Code

JADX [3] is another tool that can be used to analyze apk files. Unlike Apktool, JADX decompiles the Dalvik bytecode into Java source code. The translation is imperfect and will most likely be incomplete, but it is still useful for doing analysis. JADX provides two interfaces: a command line interface and a GUI. For this lab, we will only discuss the GUI. You can start it by running jadx-gui from the command line. When the program first opens, it will ask the user to choose a file to decompile. It supports apk, dex, jar, class, zip, and aar files; here we only cover apk files. After you choose the apk file, JADX will begin decompiling it. When it is complete, you should see the source code for each class in the Menu pane. If you review the source code, you can see it is not ideal, but it does provide insight into the app’s behavior.

2 Using the disassembled Code To Search For Suspicious Behavior

Now that we have disassembled the apk file, we can begin analyzing the source code to identify suspicious behavior. Defining behavior within Android is challenging. Behavior that may be suspicious or malicious in one application may be expected behavior in another application. It is reasonable for a messaging app to access a user’s contacts, but if a utility app, such as a flashlight app, accesses a user’s contacts, it should raise suspicion. Therefore, the behavior that makes an application potentially malicious is not a particular pattern, but behavior that is inconsistent with the end user’s expectation. The easiest starting point for identifying any questionable behavior is the app’s manifest file. The manifest file provides a high-level abstraction of an app’s behavior.

2.1        Analyzing CoinPirates Manifest File

In JADX, the AndroidManifest.xml is located in the Resources/ directory. The highest level of security for Android is the permission system that protects the usage of sensitive behavior. The manifest file shows us that the CoinPirates app has access to 14 permissions. Malware often abuses the text messaging permissions to communicate with its C&C server and to try to send premium text messages without the user being aware.

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.READ_PHONE_STATE" />
<uses-permission android:name="android.permission.WRITE_SMS" />
<uses-permission android:name="android.permission.RECEIVE_SMS" />
<uses-permission android:name="android.permission.SEND_SMS" />
<uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED" />
<uses-permission android:name="android.permission.CHANGE_NETWORK_STATE" />
<uses-permission android:name="android.permission.WRITE_APN_SETTINGS" />
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.CHANGE_WIFI_STATE" />
<uses-permission android:name="android.permission.WAKE_LOCK" />
<uses-permission android:name="com.android.browser.permission.READ_HISTORY_BOOKMARKS" />
<uses-permission android:name="com.android.browser.permission.WRITE_HISTORY_BOOKMARKS" />

Listing 9: Permissions used by CoinPirates
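When a manifest declares many permissions, a short script can list them and pick out the SMS-related ones. The Python sketch below assumes the manifest has already been decoded to plain XML (for example by Apktool) and that it declares the usual android namespace; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def requested_permissions(manifest_xml: str) -> list:
    """Return the permission names from all uses-permission elements."""
    root = ET.fromstring(manifest_xml)
    return [el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")]


def sms_permissions(permissions: list) -> list:
    """Keep only the SMS-related permissions (names ending in _SMS)."""
    return [p for p in permissions if p and p.endswith("_SMS")]
```

For the permissions in Listing 9, the filter would keep WRITE_SMS, RECEIVE_SMS, and SEND_SMS, which are the ones the text singles out as commonly abused.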

After observing the permissions, the next goal is to vet the application by analyzing how it uses the sensitive APIs that are protected by the suspicious permissions. Since malware writers often repackage their payload within real apps with hundreds of classes, it would be too time-consuming to search through all the source code. Instead, we will focus on the entry points of the application.

3     Identifying Entry Points into Android Applications

[2] Android applications are written using the Java programming language. Unlike conventional Java programs, Android applications do not have a main() function or a single entry point for execution. Instead, they are designed using components. App components make up the essential building blocks of an Android app. Each component is a different point through which the system can enter a developer’s application. There are four different types of components: activities, services, content providers, and broadcast receivers. Each type of component serves a different role, and the set of components used in an Android application defines its overall behavior. The activity component creates user interfaces. For example, a messaging application may have one activity that creates the user interface for allowing a user to input their message and another activity for allowing the user to view their contacts. The service component runs in the background to perform tasks. Unlike activity components, service components do not have a user interface. For example, a service component can be used to play music in the background. The content provider component handles application data. Using content providers, an application can store data in files, SQLite databases, or other persistent storage locations an application can access. The broadcast receiver component responds to system-wide broadcast announcements. For example, the system may broadcast that a picture has been captured, and the broadcast receiver can alert the application of this action. In general, broadcast receivers do minimal work and instead alert other components that an event occurred.

Since the components are required to be declared in the manifest, we can quickly identify any interesting entry points without having to search through the source code. To avoid detection, malware usually does not trigger until it receives commands from its C&C server. The two most common and efficient channels for this communication are the network and SMS. Since SMS can provide communication when the user does not have a Wi-Fi connection, it is usually preferred. Since this app has declared the RECEIVE_SMS permission, we know that it has the ability to receive broadcasts about arriving text messages through a broadcast receiver. If a broadcast receiver wants to receive a text message, it must specify that it can handle this action by adding the action to its intent filter inside the manifest file. The action required is shown in Listing 10.

<action android:name="android.provider.Telephony.SMS_RECEIVED" />

Listing 10: Action required to receive SMS broadcasts

In the CoinPirates manifest, we see that only one receiver has this ability, and the component’s declaration provides us with enough information to identify the package and class name that declares the receiver. Additionally, the component’s declaration raises more suspicion. First, it is manipulating the naming convention and is located in the com.android package. Next, it has a priority of 10000. In Android, broadcasts can be ordered or sent to all apps at the same time. In general, for ordered broadcasts, applications with a higher priority will receive the broadcast first. Additionally, they have the choice of aborting the broadcast or allowing it to be sent to the app with the next highest priority. Therefore, this behavior can be manipulated by malicious apps to hide the notification of received text messages.[3]

<receiver android:name="com.android.SMSReceiver">
    <intent-filter android:priority="10000">
        <action android:name="android.provider.Telephony.SMS_RECEIVED" />
        <action android:name="android.provider.Telephony.SMS_SENT" />
    </intent-filter>
</receiver>

Listing 11: Declaration of the SMS receiver in the CoinPirates manifest
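The same manifest check can be scripted: find every receiver whose intent filter handles SMS_RECEIVED and report the filter priority. The Python sketch below again assumes a decoded, plain-XML manifest with the usual android namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
SMS_RECEIVED = "android.provider.Telephony.SMS_RECEIVED"


def sms_receivers(manifest_xml: str) -> list:
    """Return (receiver name, priority) pairs for SMS_RECEIVED filters."""
    root = ET.fromstring(manifest_xml)
    hits = []
    for receiver in root.iter("receiver"):
        for intent_filter in receiver.iter("intent-filter"):
            actions = [a.get(ANDROID_NS + "name") for a in intent_filter.iter("action")]
            if SMS_RECEIVED in actions:
                # A missing priority attribute is treated as 0 here.
                priority = int(intent_filter.get(ANDROID_NS + "priority", "0"))
                hits.append((receiver.get(ANDROID_NS + "name"), priority))
    return hits
```

A receiver that reports an unusually high priority, such as the 10000 seen above, is a good place to start reading the decompiled code.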

If we use JADX to analyze the source code for the SMSReceiver class, we can identify any suspicious behavior that may occur when a text message is received. The Android OS notifies broadcast receivers by calling the receiver’s onReceive method. Therefore, we should start our analysis from this point in the app. When looking over the source code of the onReceive method, we see that the method immediately queries a database called “mydb.” The source code also shows us that the values received from the database are being compared to the sender’s number and the contents of the SMS body. Based on the results of these comparisons, the app uses the needDel (delete text message) or needUpload variables to control the app’s control flow.

Identifying suspicious entry points that are defined in the manifest file allows us to quickly identify suspicious behavior. For example, after analyzing the SMSReceiver, we see that it is being used by the C&C server to trigger malicious behavior. We also know that the app uses the “mydb” database to interpret the C&C server’s commands. While the SMSReceiver component provides the most insight, the malicious app also uses two other receivers, AlarmReceiver and BootReceiver, to start the MonitorService. We leave the analysis of the MonitorService component to the reader.

3.0.1           Triggering Malicious Behavior Dynamically

Using static analysis, we can identify the necessary events required to trigger malicious behavior in the app. Our next goal will be to leverage the details we extracted from the static analysis to dynamically generate the malicious behavior at run time.

If the events necessary to trigger the malicious behavior depend on external sources, such as a text message being received, we will need to simulate these events. Android provides several tools for injecting events into the emulator, and you can read the full documentation on the Developer’s Website [7]. One tool is the emulator console. Each running emulator instance provides a console that lets you query and control the emulated device environment. For example, you can use the console to manage port redirection, network characteristics, and telephony events while your application is running on the emulator. The emulator console will be useful for injecting events, such as text messages from a specific number, or for changing the device’s location. The official documentation provides several examples.
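As a sketch of event injection, the emulator console’s `sms send <phone number> <message>` command can be issued through `adb emu`, which forwards a console command to the running emulator. The Python helper below only wraps that command line; the function names, sender number, and message text are hypothetical placeholders, and it assumes a single running emulator with adb on the PATH.

```python
import subprocess


def sms_command(sender: str, body: str) -> list:
    """Build the adb command that injects an incoming SMS into the emulator."""
    return ["adb", "emu", "sms", "send", sender, body]


def inject_sms(sender: str, body: str) -> None:
    """Run the command; raises CalledProcessError if adb reports a failure."""
    subprocess.run(sms_command(sender, body), check=True)
```

Calling inject_sms with the number and text the receiver expects makes the emulator deliver a broadcast just as if a real message had arrived, so you can watch the app’s reaction in logcat.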

3.1      Android Resources

A developer can provide an app with resources by placing them in a specific subdirectory of the res/ folder. Once you provide a resource in your application, you can use it by referencing its resource ID. Each resource is grouped into a “type” such as string, layout, or drawable.

When viewing an APK in JADX, you can find the resources an app uses in the Resources directory under the resources.arsc tab. After expanding the resources.arsc file, you can find many basic resources, such as hardcoded strings found in the values directory.

When JADX decompiles the APK back into source code, resources will be referenced by their ID in the R class. You can use this to create a mapping from the resource ID to its original name in the res/resources.arsc/values subdirectory.
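One way to build such a mapping outside of JADX is from Apktool’s output. The Python sketch below assumes the decoded app contains a res/values/public.xml file whose entries look like `<public type="string" name="app_name" id="0x7f050000" />`; the function name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def resource_names(public_xml: str) -> dict:
    """Map numeric resource IDs to 'type/name' strings."""
    root = ET.fromstring(public_xml)
    mapping = {}
    for entry in root.iter("public"):
        # IDs are written in hexadecimal, e.g. 0x7f050000.
        resource_id = int(entry.get("id"), 16)
        mapping[resource_id] = f"{entry.get('type')}/{entry.get('name')}"
    return mapping
```

With the mapping in hand, a constant such as 0x7f050000 seen in decompiled code can be looked up and then resolved to its value in the corresponding values file.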

4       Lab Assignment

4.1     Scenario

  • You have received a malware sample apk. Your task is to discover what the malware does by analyzing it.
  • You need to identify the components that are being used by the app to communicate with its C&C server.
  • You need to identify any anti-analysis techniques being used by the app and remove them if necessary.
  • You need to identify the commands that trigger the malicious behavior.

4.2      Grading

  • Your answer must be in the correct format. If your answer does not follow the expected format, it will not be graded/regraded.

4.3      Project Structure

  1. Tools
  • JADX
    1. Decompiles apk files into Java source code.
  • Apktool
    1. Disassembles apk files into smali.
    2. Rebuilds apk files.

 

4.4     Questionnaire

  • To get your credit for the project, you have to answer the questionnaire in /android/report/assignment-questionnaire.txt
  • For each stage, there are 4-5 questions about the behavior of the malware.

4.5      Stage 1

4.5.1         Question 0: (5 points)

  • Run the command ./start server and verify the server is active.
  • Start the Android emulator.
  • Use the People app that is preinstalled to add a contact to the device. The name of the contact should be your GT ID (e.g. JDoe2).
  • Open the app that is named Messenger (not Messaging). This is the app installed from the sms.apk.
  • The server will ask you if your GT ID is correct. If so, press ’y’ and you will receive the first answer.
  • Answer Format: You need to copy & paste the string between the answer tags. For example, if the output was < answer > 1234 < /answer >, your answer would be 1234. You should follow this answer format for all answers you receive from the server.
  • After receiving your answer for Question 0, you can turn off the Android emulator and server. They will not be needed until later on. The remaining part of Stage 1 will be using JADX to analyze the sms.apk’s source code.
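The answer format above can be handled mechanically. The sketch below pulls the text between answer tags out of captured server output. The exact spacing the server prints inside the tags is an assumption, so the pattern tolerates optional whitespace. The function name extract_answers is our own.

```python
import re

# Tolerates optional spaces inside the tags, e.g. "< answer >" or "<answer>".
ANSWER_RE = re.compile(r"<\s*answer\s*>(.*?)<\s*/\s*answer\s*>", re.DOTALL)


def extract_answers(server_output):
    """Return every string found between answer tags, with whitespace trimmed."""
    return [match.strip() for match in ANSWER_RE.findall(server_output)]


print(extract_answers("< answer > 1234 < /answer >"))
```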

4.5.2         Question 1: (10 points)

  • What is the name of the component that is used for communicating with the C&C server?
  • Related background sections: 1.1, 1.2, 3
  • Answer Format: If the correct component was the receiver described below, the answer would be android.AReceiver.
<receiver android:name="com.android.AReceiver">
    ...
</receiver>

4.5.3         Question 2: (20 points)

Using SMS as a protocol for a C&C server is an important design decision that is different from traditional IP-based approaches known from infected PCs. The main advantages of an SMS-based approach over an IP-based one are that it does not require steady connections, that SMS is ubiquitous, and that SMS can accommodate offline bots easily [8]. sms.apk leverages SMS to receive commands from its C&C server, and you need to identify those commands.

  • When sms.apk receives a text message, it checks to see if the message matches a command. What are the commands?
  • Related background sections: 1.4, 2.1, 3, 3.1
  • Answer Format: A list of commands (sms bodies) sms.apk receives from its C&C server. The list should be separated by end lines (one command per line).

At this point we should have enough information to trigger the malicious behavior. The C&C server can be started by running ./start server from the command line. Start the server and send the necessary text messages. Unfortunately, no malicious behavior will be exhibited. This is because the malicious app has placed anti-analysis techniques into the app to prevent analysis. Our next goal will be to find them and see if we can emulate these triggers or remove them.

4.5.4         Question 3: (20 points)

The Android/BadAccents malware, discussed in [8], contains two specific checks on the incoming SMS number. It checks for ‘84’ and ‘82’ numbers, which indicates that the malware expects SMS from a C&C SMS server either located in China or South Korea. It seems the app we are inspecting does something similar.

  • What country code does sms.apk require the incoming text message to have before the malicious behavior will be triggered?
  • Related background sections: 1.4, 3.1
  • Answer Format: The country code required to trigger the commands (sms.apk only checks one country code).

4.6      Stage 2

From Stage 1, we know the required country code and the necessary commands to trigger the malicious behavior. However, even if we send the correct commands with the correct country code, sms.apk will still not exhibit any malicious behavior. In order to maximize the longevity of malware, malicious developers want to prevent analysis. Since the majority of dynamic analysis frameworks are based on emulation, malicious developers integrate anti-analysis techniques to change an app’s behavior. If an app senses that the underlying environment is an emulator and not a real phone, it will change its behavior to not exhibit any suspicious behavior. In Stage 2 we will try to identify how sms.apk is checking if it is on an emulator. Then we will modify sms.apk to remove this check and trigger the malicious behavior.
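The conditions described so far can be summarized in a small model. This is only a sketch of the kind of gating logic the text describes, not the code in sms.apk: the order of the checks, the way the country code is compared, and every value used below (the country code 999 and the command name) are placeholders.

```python
def should_trigger(sender_number, body, *, required_country_code, commands, on_emulator):
    """Model of the three gates: environment, sender country code, command match."""
    if on_emulator:
        # Anti-analysis gate: stay quiet when the environment looks emulated.
        return False
    if not sender_number.startswith("+" + required_country_code):
        # Only messages from the expected country code are treated as commands.
        return False
    return body in commands


# Placeholder values for illustration only.
PLACEHOLDER_COMMANDS = {"example_cmd"}

print(should_trigger("+9995550100", "example_cmd",
                     required_country_code="999",
                     commands=PLACEHOLDER_COMMANDS,
                     on_emulator=True))
print(should_trigger("+9995550100", "example_cmd",
                     required_country_code="999",
                     commands=PLACEHOLDER_COMMANDS,
                     on_emulator=False))
```

In this model a correct command from a correct number still does nothing while on_emulator is true, which is why the emulator check has to be dealt with before the behavior can be observed.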

4.6.1         Question 1: (15 points)

The most basic form of emulation detection is when a malicious app leverages a static heuristic. Static heuristics are pre-initialized values that provide information about the underlying environment [9]. Apps running on a system can check these static heuristics by calling Android APIs. For many of the values, the emulator will return values that are inconsistent with what would happen if the app were running on a real device. For example, if the TelephonyManager.getDeviceId() API returns all 0’s, the device in question is an emulator. This is because this value cannot exist on a physical device.
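The device ID heuristic from the example above can be modeled in a few lines. This sketch covers only that one heuristic, using plain strings in place of the value an app would obtain from TelephonyManager.getDeviceId(). The function name and the sample IDs are our own, and the heuristic sms.apk actually uses may be a different one.

```python
def device_id_looks_emulated(device_id):
    """Return True when the device ID is non-empty and consists only of zeros."""
    return bool(device_id) and set(device_id) == {"0"}


# Sample values for illustration only.
print(device_id_looks_emulated("000000000000000"))
print(device_id_looks_emulated("356938035643809"))
```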

A list of the static heuristics that sms.apk might use is given in [10]. However, the one just mentioned would be a good starting point.

  • sms.apk is leveraging an Android API to identify if the underlying environment is emulated. The return value of the API provides sms.apk with a static heuristic about the emulated environment. sms.apk compares the returned value to a hard-coded string. What is the value of this string?
  • Related background sections: 1.4, 2.1, 3, 3.1
  • Answer Format: The value of the string that the static heuristic is being compared to. For example, if emulation check = “01234” your answer would be 01234.

4.6.2         Question 2: (30 points)

The final question is a two-step process. The first step will be to modify sms.apk and remove the environment check so that we can run sms.apk on an emulator. The second step will be sending the commands found in Stage 1 to the emulator and having it exhibit malicious behavior. Upon success, the C&C server will generate the final answers.

  • What are the strings the C&C server provides you with when you dynamically trigger the malicious behavior in sms.apk?
  • Answer format: For each command sent to sms.apk, the C&C server will print out a string. The answer to this question will be a list of strings, one for each command. In your report, place each string on a separate line.

4.6.3       Step 1:

  • Prerequisites: Read Section 1.2, which discusses how to leverage apktool to disassemble an app into bytecode. Additionally, you have been provided with a sample apk called emu-check.apk. Section 1.2 will walk you through how to remove the emulation check in this basic app. If you have no previous experience modifying apks, it’s recommended that you start off by removing the emulation check from emu-check.apk before working on sms.apk.
  • To complete step 1, you will need to modify sms.apk, so that it will trigger its malicious behavior while running on an emulator.
  • Note: The modification you are required to make is extremely small. If you find yourself modifying more than a few characters, then you are going in the wrong direction.
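The decode and rebuild steps around the modification can be sketched as argument lists for apktool's d (decode) and b (build) subcommands. The output paths are placeholder names of our choosing. A rebuilt apk generally also has to be signed before the emulator will install it, and the signing tool available depends on the VM, so that step is left out of the sketch.

```python
def apktool_decode_cmd(apk_path, out_dir):
    """Argument list that disassembles apk_path into smali under out_dir."""
    return ["apktool", "d", apk_path, "-o", out_dir]


def apktool_build_cmd(src_dir, out_apk):
    """Argument list that rebuilds the (edited) smali tree into out_apk."""
    return ["apktool", "b", src_dir, "-o", out_apk]


# Placeholder paths for illustration only.
print(" ".join(apktool_decode_cmd("sms.apk", "sms_decoded")))
print(" ".join(apktool_build_cmd("sms_decoded", "sms_patched.apk")))
```

Each list can be passed to subprocess.run, with the smali edit made between the two commands.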

4.6.4       Step 2:

  • The final step of this question should be straightforward if you have the correct answers for the previous steps. First, start up the server using the command “./start server”. Once the server has started, you need to trigger the malicious behavior by sending the commands to sms.apk. If you are successful, the server will provide you with an answer for the respective command (the order does not matter). Copy and paste each answer to your report.

References

  1. App manifest documentation. https://developer.android.com/guide/topics/manifest/manifest-intro.html.
  2. R Winsniewski. Apktool: a tool for reverse engineering android apk files. URL: https://ibotpeaches.github.io/Apktool/ (visited on 07/27/2016), 2012.
  3. Jadx, 2012. https://github.com/skylot/jadx.git.
  4. Yajin Zhou and Xuxian Jiang. Dissecting android malware: Characterization and evolution. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 95–109. IEEE, 2012.
  5. Jesus Freke. Smali/baksmali. URL: http://code.google.com/p/smali.
  6. Dalvik bytecode. https://source.android.com/devices/tech/dalvik/dalvik-bytecode.html.
  7. Controlling the emulator from the commandline. https://developer.android.com/studio/run/emulator-commandline.html#events.
  8. Siegfried Rasthofer, Irfan Asrar, Stephan Huber, and Eric Bodden. How current android malware seeks to evade automated code analysis. In IFIP International Conference on Information Security Theory and Practice, pages 187–202. Springer, 2015.
  9. Thanasis Petsas, Giannis Voyatzis, Elias Athanasopoulos, Michalis Polychronakis, and Sotiris Ioannidis. Rage against the virtual machine: hindering dynamic analysis of android malware. In Proceedings of the Seventh European Workshop on System Security, page 5. ACM, 2014.
  10. Timothy Vidas and Nicolas Christin. Evading android runtime analysis via sandbox detection. In Proceedings of the 9th ACM symposium on Information, computer and communications security, pages 447–458. ACM, 2014.

[1] Portions of this section are reproduced from work created and shared by the Android Open Source Project and used according to terms described in the Creative Commons 2.5 Attribution License.

[2] Portions of this section are reproduced from work created and shared by the Android Open Source Project and used according to terms described in the Creative Commons 2.5 Attribution License.

[3] As of Android 4.4 this has been slightly adjusted. The default SMS app will always receive the broadcast first, regardless of priority.