Research Article Open Access

Features Reweighting and Similarity Coefficient Based Method for Email Spam Filtering

Ahmed Osman Ali Elsiddig1, Ammar Ahmed E. Elhadi2 and Ali Ahmed3
  • 1 Karary University, Sudan
  • 2 Mashreq University Khartoum North, Sudan
  • 3 University of Science and Technology - Khartoum, Sudan
American Journal of Applied Sciences
Volume 14 No. 10, 2017, 983-993

DOI: https://doi.org/10.3844/ajassp.2017.983.993

Submitted On: 28 February 2017 Published On: 13 October 2017

How to Cite: Ali Elsiddig, A. O., Elhadi, A. A. E. & Ahmed, A. (2017). Features Reweighting and Similarity Coefficient Based Method for Email Spam Filtering. American Journal of Applied Sciences, 14(10), 983-993. https://doi.org/10.3844/ajassp.2017.983.993

Abstract

Spam floods the Internet with many copies of the same message in an attempt to force it on people who would not otherwise choose to receive it. Determining whether an incoming email is spam has therefore become an important problem. One of the main characteristics of the spam filtering problem is the high dimensionality of its feature space, so a dimensionality reduction stage is needed. This study addresses this aspect of spam detection by examining the effect of reweighting features. The work began by applying similarity coefficients to the dataset, then reweighting the features and applying the same similarity coefficients to the reweighted dataset. Finally, the results before and after reweighting were compared with each other and with a feature selection method. The objectives of this study are to examine the similarity coefficients (Cosine and Dice) and to study the effect of important features on the other features through the reweighting process. The most important results are that the reweighting process did not improve the success rate of either method (Cosine or Dice), whereas the feature selection method improved detection with Cosine.
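The abstract does not give the formulas used, so the following is only a minimal sketch of how the two coefficients and a reweighting step are commonly computed on numeric feature vectors. It assumes the standard real-valued forms of the Cosine and Dice coefficients; the function names, the example vectors and the weights are hypothetical and are not taken from the study.

```python
import math


def cosine_similarity(a, b):
    """Cosine coefficient: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def dice_similarity(a, b):
    """Dice coefficient (real-valued form): 2 * dot(a, b) / (|a|^2 + |b|^2)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b)
    if denom == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return 2 * dot / denom


def reweight(vector, weights):
    """Scale each feature by its weight (hypothetical reweighting step)."""
    return [x * w for x, w in zip(vector, weights)]


# Hypothetical feature vectors for an incoming email and a known spam email.
incoming = [1.0, 0.0, 2.0]
known_spam = [1.0, 1.0, 2.0]
weights = [2.0, 1.0, 0.5]  # assumed importance weights, for illustration only

before = cosine_similarity(incoming, known_spam)
after = cosine_similarity(reweight(incoming, weights),
                          reweight(known_spam, weights))
```

Comparing `before` and `after` mirrors, in miniature, the before-and-after comparison the abstract describes; how the weights are derived is left open here because the abstract does not state it.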

Keywords

  • Spam
  • Spam Filtering
  • Feature Selection
  • Similarity Coefficient