Cohen et al. propose ErDOS, a system for the early detection of outgoing spammers that uses machine learning techniques and relies mainly on the social interactions among email accounts, that is, on inter-account communication patterns. They use a set of features that characterize users and their behavior, including ratios of incoming and outgoing messages, intra- and inter-service provider communications, and per-account features based on these measures.
The model was built with WEKA’s implementation of the rotation forest classification technique. In the experimental evaluation, the authors train and evaluate over different time intervals and compare the results with existing models in terms of true positives, percentage of suspicious accounts, and the early detection measure.
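To make the kind of features summarized above more concrete, here is a minimal sketch of per-account ratios computed from a message log. The record layout (sender, recipient pairs), the helper names `provider` and `account_features`, and the two ratio definitions are illustrative assumptions of mine, not the authors' actual feature set.

```python
from collections import defaultdict


def provider(address):
    """Return the domain part of an address, used here as the service provider."""
    return address.rsplit("@", 1)[-1].lower()


def account_features(messages):
    """Compute illustrative per-account ratios from (sender, recipient) pairs.

    Hypothetical features, chosen only to mirror the feature types the
    review mentions (incoming/outgoing and intra-/inter-provider ratios).
    """
    sent = defaultdict(int)
    received = defaultdict(int)
    sent_intra = defaultdict(int)

    for sender, recipient in messages:
        sent[sender] += 1
        received[recipient] += 1
        if provider(sender) == provider(recipient):
            sent_intra[sender] += 1

    features = {}
    for account in set(sent) | set(received):
        out_n, in_n = sent[account], received[account]
        features[account] = {
            # Share of the account's traffic that is outgoing.
            "outgoing_ratio": out_n / (out_n + in_n),
            # Share of outgoing mail that stays within the same provider.
            "intra_provider_ratio": sent_intra[account] / out_n if out_n else 0.0,
        }
    return features
```

A feature table of this shape, one row per account, is the sort of input a classifier such as a rotation forest would be trained on.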
The paper is well written and quite easy to read. However, I found some of the assumptions questionable, and I am concerned that they affected the accuracy of the model. For instance, the authors label any account as “spammy” if it sends even a single message tagged as spam by the content-based filter. Such filters are prone to errors: we have all had at least a few legitimate messages end up in the spam folder. Raising this threshold above a single message would probably help. Another likely limitation is that messages coming from blacklisted Internet protocol (IP) addresses are filtered out before the analysis, which means that a significant number of spammy accounts that might have been very useful in this analysis are left out.
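The labeling concern can be illustrated with a short sketch. The function name `label_accounts` and the parameter `min_flagged` are hypothetical. Setting `min_flagged=1` corresponds to the single-message rule described above, while larger values correspond to the stricter labeling suggested here, not to anything the authors evaluated.

```python
from collections import Counter


def label_accounts(flagged_senders, all_accounts, min_flagged=1):
    """Label each account as spammy (True) or not (False).

    flagged_senders: one entry per message the content filter tagged as spam,
                     giving the sending account.
    min_flagged:     hypothetical minimum number of flagged messages before
                     an account is labeled spammy.
    """
    counts = Counter(flagged_senders)
    return {account: counts[account] >= min_flagged for account in all_accounts}


flagged = ["alice", "bob", "bob", "bob"]
accounts = ["alice", "bob", "carol"]

strict = label_accounts(flagged, accounts, min_flagged=1)
lenient = label_accounts(flagged, accounts, min_flagged=3)
```

In this toy input, the account with one flagged message is labeled spammy under the single-message rule but not when three flagged messages are required, which is how a single filter error could mislabel a legitimate account.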