Malware spoofing: distinguishing between malware and benign (safe) files based on representative digital images.

Djaferbenchadi/Malware_analysis_binary

Malware Analysis (Binary Classification) using the Kernel Constrained Subspace Method (KCSM)

Overview

This project focuses on malware analysis, specifically targeting malware spoofing and binary classification challenges. We implement the Kernel Constrained Subspace Method (KCSM) and a variant that replaces the kernel computation with Random Fourier Features (RFF_CSM) for efficient and effective malware detection.

The full paper is available at https://ieeexplore.ieee.org/abstract/document/10215631.

  • Enhanced malware detection using KCSM, capable of distinguishing between malware and benign files.
  • Improved computational efficiency with the integration of RFF, reducing the complexity of kernel calculations.
  • Suitable for large-scale and real-time malware detection systems.
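To illustrate how Random Fourier Features can stand in for kernel calculations, the following is a minimal sketch and not the repository's implementation. It assumes a Gaussian kernel with bandwidth `sigma`; the function names `rff_map` and `gaussian_kernel` and all parameter values are hypothetical. The inner product of the mapped vectors approximates the kernel value, so later steps can work on ordinary vectors.

```python
import numpy as np

def rff_map(X, n_features=2000, sigma=1.0, seed=0):
    """Map each row of X to random Fourier features (assumed Gaussian kernel)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies are sampled from the Fourier transform of the Gaussian kernel.
    W = rng.normal(0.0, 1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def gaussian_kernel(X, Y, sigma=1.0):
    """Exact Gaussian kernel matrix, used here only for comparison."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))

Z = rff_map(X, n_features=5000, sigma=2.0)
K_exact = gaussian_kernel(X, X, sigma=2.0)
K_approx = Z @ Z.T  # plain inner products replace kernel evaluations
max_err = np.abs(K_exact - K_approx).max()
print(f"largest absolute approximation error: {max_err:.4f}")
```

Increasing `n_features` tightens the approximation at the cost of longer feature vectors.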

Summary

This paper proposes a novel approach based on subspace representation for malware detection, the task of distinguishing between safe and malware (malicious) file classes. Our solution is to use a byte-level visualization (image pattern) of the target software and to represent each of the two classes by a low-dimensional subspace in a high-dimensional vector space. We use the kernel constrained subspace method (KCSM) as a classifier, which has shown excellent results in various pattern recognition tasks. However, its computational cost may be high due to the use of the kernel trick, which makes it difficult to achieve real-time detection. To address this issue, we introduce Random Fourier Features (RFF), which can be handled directly as ordinary vectors, bypassing the kernel trick. This approach reduces execution time by around 99% while retaining a high recognition rate. We conduct extensive experiments on several public malware datasets and demonstrate superior results against several baselines and previous approaches.
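The idea of representing each class by a low-dimensional subspace can be sketched as follows. This is a simplified illustration using the plain subspace method (classification by projection energy), not the constrained or kernel variant used in the paper. The data are synthetic, and the names `fit_subspace`, `similarity`, and `classify` are hypothetical.

```python
import numpy as np

def fit_subspace(X, dim):
    """Return an orthonormal basis (d x dim) for the dominant span of the rows of X."""
    # Left singular vectors of X.T are the principal directions of the uncentered data.
    U, _, _ = np.linalg.svd(X.T, full_matrices=False)
    return U[:, :dim]

def similarity(x, basis):
    """Squared cosine of the angle between vector x and the subspace."""
    proj = basis.T @ x
    return float(proj @ proj) / float(x @ x)

def classify(x, bases):
    """Pick the class whose subspace captures the most energy of x."""
    return max(bases, key=lambda label: similarity(x, bases[label]))

# Synthetic stand-in for feature vectors: each class lies near a different 2-D subspace.
rng = np.random.default_rng(0)
d = 10

def sample(axes, n):
    X = 0.05 * rng.normal(size=(n, d))              # small noise in every direction
    X[:, axes] += rng.normal(size=(n, len(axes)))   # strong signal along the class axes
    return X

train = {"safe": sample([0, 1], 50), "malware": sample([2, 3], 50)}
bases = {label: fit_subspace(X, dim=2) for label, X in train.items()}

test_X = np.vstack([sample([0, 1], 20), sample([2, 3], 20)])
test_y = ["safe"] * 20 + ["malware"] * 20
preds = [classify(x, bases) for x in test_X]
accuracy = float(np.mean([p == y for p, y in zip(preds, test_y)]))
print(f"toy accuracy: {accuracy:.2f}")
```

In this sketch the subspace dimension is fixed by hand; in practice it would be a tuned hyperparameter.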

The analysis is conducted on three malware datasets: BIG2015, Malimg, and Dumpware. In addition, we collect a safe class comprising 2500 clean (benign) files from three operating systems: Windows 10 Pro, Windows 11 Home, and Windows 11 Pro.

All datasets are preprocessed for compatibility with the KCSM and RFF_CSM framework.
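As one illustration of the byte-level visualization mentioned in the Summary, a file's raw bytes can be laid out row by row as a grayscale image. The sketch below is an assumption about what such preprocessing might look like, not the pipeline used for these datasets: the fixed width of 256 pixels, the zero padding of the last row, and the function name `bytes_to_image` are all hypothetical choices.

```python
import numpy as np

def bytes_to_image(data: bytes, width: int = 256) -> np.ndarray:
    """Lay out raw bytes as a 2-D uint8 array (one byte per pixel)."""
    arr = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(arr) // width)                  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(arr)] = arr                        # zero-pad the final row
    return padded.reshape(height, width)

# Stand-in for the contents of a binary file (770 bytes).
payload = bytes(range(256)) * 3 + b"\x01\x02"
image = bytes_to_image(payload, width=256)
print(image.shape)
```

To process a real file, the payload could be read with `open(path, "rb").read()`; flattening the resulting image gives a feature vector.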

  • BIG2015 dataset: 2015 Microsoft Malware Classification Challenge.
  • Malimg dataset: NA ('mat' files are shared here).
  • Dumpware dataset: Dumpware dataset.
  • Safe dataset: available on request from the author.

Results

| Methods | Datasets | Accuracy (%) | Computation time (sec) |
|---|---|---|---|
| BAT algorithm and CNN (Cui et al.) | Malimg | 94.5 | NA |
| Inception Net (Khan et al.) | BIG2015 + 3000 safe | 74.5 | NA |
| ResNet-152 (Khan et al.) | BIG2015 + 3000 safe | 88.36 | NA |
| CSM (OURS) | BIG2015 + 2500 safe | 83.25 | 2.68 |
| CSM (OURS) | Malimg + 2500 safe | 92.87 | 2.38 |
| CSM (OURS) | Dumpware + 2500 safe | 99.06 | 2.58 |
| KCSM (OURS) | BIG2015 + 2500 safe | 92.89 | 4189.79 |
| KCSM (OURS) | Malimg + 2500 safe | 95.13 | 6228.29 |
| KCSM (OURS) | Dumpware + 2500 safe | 99.12 | 1034.78 |
| RFF_CSM (OURS) | BIG2015 + 2500 safe | 93.50 | 1.59 |
| RFF_CSM (OURS) | Malimg + 2500 safe | 97.15 | 1.01 |
| RFF_CSM (OURS) | Dumpware + 2500 safe | 99.26 | 0.78 |
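
The reduction in execution time can be checked directly from the computation times reported above for KCSM and RFF_CSM. The short calculation below uses only those reported values; the variable names are illustrative.

```python
# Computation times in seconds, copied from the results above.
kcsm_time = {"BIG2015": 4189.79, "Malimg": 6228.29, "Dumpware": 1034.78}
rff_csm_time = {"BIG2015": 1.59, "Malimg": 1.01, "Dumpware": 0.78}

# Relative reduction in percent for each dataset.
reduction = {
    name: 100.0 * (1.0 - rff_csm_time[name] / kcsm_time[name])
    for name in kcsm_time
}

for name, value in reduction.items():
    print(f"{name}: {value:.2f}% less time than KCSM")
```

On all three datasets the reduction exceeds 99.9%, consistent with the "around 99%" figure quoted in the Summary.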

Citing

To cite the paper, please use the following BibTeX entry:

@article{djafer2023malware,
  title={Malware detection using Kernel Constrained Subspace Method},
  author={Djafer-Yahia-Messaoud, Benchadi and Bojan, Batalo and Kazuhiro, Fukui},
  journal={IEICE Proceedings Series},
  volume={78},
  number={P2-22},
  year={2023},
  publisher={The Institute of Electronics, Information and Communication Engineers}
}

Contact

If you have any enquiries or questions, you can open a GitHub issue or contact me personally at [email protected].
