February 4, 2004
We provide an automated graph theoretic method for identifying individual users' trusted networks of friends in cyberspace. We routinely use our social networks to judge the trustworthiness of outsiders, i.e., to decide where to buy our next car, or to find a good mechanic for it. In this work, we show that an email user may similarly use his email network, constructed solely from sender and recipient information available in the email headers, to distinguish between unsolicited commercial emails, commonly called "spam", and emails associated with his circles of friends. We exploit the properties of social networks to construct an automated anti-spam tool which processes an individual user's personal email network to simultaneously identify the user's core trusted networks of friends, as well as subnetworks generated by spams. In our empirical studies of individual mail boxes, our algorithm classified approximately 53% of all emails as spam or non-spam, with 100% accuracy. Some of the emails are left unclassified by this network analysis tool. However, one can exploit two of the following useful features. First, it requires no user intervention or supervised training; second, it results in no false negatives i.e., spam being misclassified as non-spam, or vice versa. We demonstrate that these two features suggest that our algorithm may be used as a platform for a comprehensive solution to the spam problem when used in concert with more sophisticated, but more cumbersome, content-based filters.
Similar papers 1
April 4, 2005
Email is an increasingly important and ubiquitous means of communication, both facilitating contact between private individuals and enabling rises in the productivity of organizations. However the relentless rise of automatic unauthorized emails, a.k.a. spam is eroding away much of the attractiveness of email communication. Most of the attention dedicated to date to spam detection has focused on the content of the emails or on the addresses or domains associated with spam sen...
August 19, 2010
E-mail is probably the most popular application on the Internet, with everyday business and personal communications dependent on it. Spam or unsolicited e-mail has been estimated to cost businesses significant amounts of money. However, our understanding of the network-level behavior of legitimate e-mail traffic and how it differs from spam traffic is limited. In this study, we have passively captured SMTP packets from a 10 Gbit/s Internet backbone link to construct a social ...
April 4, 2005
Almost all of us have multiple cyberspace identities, and these {\em cyber}alter egos are networked together to form a vast cyberspace social network. This network is distinct from the world-wide-web (WWW), which is being queried and mined to the tune of billions of dollars everyday, and until recently, has gone largely unexplored. Empirically, the cyberspace social networks have been found to possess many of the same complex features that characterize its real counterparts, ...
January 19, 2006
Email graphs have been used to illustrate general properties of social networks of communication and collaboration. However, increasingly, the majority of email traffic reflects opportunistic, rather than symbiotic social relations. Here we use e-mail data drawn from a large university to construct directed graphs of email exchange that quantify the differences between social and antisocial behaviors in networks of communication. We show that while structural characteristics ...
April 5, 2005
We propose a new detection algorithm that uses structural relationships between senders and recipients of email as the basis for the identification of spam messages. Users and receivers are represented as vectors in their reciprocal spaces. A measure of similarity between vectors is constructed and used to group users into clusters. Knowledge of their classification as past senders/receivers of spam or legitimate mail, comming from an auxiliary detection algorithm, is then us...
February 1, 2016
Email classification and prioritization expert systems have the potential to automatically group emails and users as communities based on their communication patterns, which is one of the most tedious tasks. The exchange of emails among users along with the time and content information determine the pattern of communication. The intelligent systems extract these patterns from an email corpus of single or all users and are limited to statistical analysis. However, the email in...
August 27, 2009
Spam mitigation can be broadly classified into two main approaches: a) centralized security infrastructures that rely on a limited number of trusted monitors to detect and report malicious traffic; and b) highly distributed systems that leverage the experiences of multiple nodes within distinct trust domains. The first approach offers limited threat coverage and slow response times, and it is often proprietary. The second approach is not widely adopted, partly due to the lack...
June 6, 2013
Email service providers have employed many email classification and prioritization systems over the last decade to improve their services. In order to assist email services, we propose a personalized email community detection method to discover the groupings of email users based on their structural and semantic intimacy. We extract the personalized social graph from a set of emails by uniquely leveraging each node with communication behavior. Subsequently, collaborative simil...
November 2, 2010
Email Retrieval task has recently taken much attention to help the user retrieve the email(s) related to the submitted query. Up to our knowledge, existing email retrieval ranking approaches sort the retrieved emails based on some heuristic rules, which are either search clues or some predefined user criteria rooted in email fields. Unfortunately, the user usually does not know the effective rule that acquires best ranking related to his query. This paper presents a new email...
November 1, 2010
Email Retrieval task has recently taken much attention to help the user retrieve the email(s) related to the submitted query. Up to our knowledge, existing email retrieval ranking approaches sort the retrieved emails based on some heuristic rules, which are either search clues or some predefined user criteria rooted in email fields. Unfortunately, the user usually does not know the effective rule that acquires best ranking related to his query. This paper presents a new email...