|
Cozahost Newsletter Archive |
|||
| Hi! | ||||||||||
This newsletter is a bit longer than normal: With the
recent addition of our spam blocking feature, we found that many of you
don't know it works. Even more surprising is
that many of you wanted to know how it works! You
are welcome to forward this newsletter to anyone you think may benefit. |
|
|||||||||
| ..:: Introduction ::.. | ||||||||||
|
With this ever-growing nuisance in mind, we published an in-depth practical guide a while back on how to avoid spam, and then we upped the ante by introducing a free service to our clients where we intercept upwards of 90% of spam before it reaches their inboxes.

Recently, virus writers and spam purveyors united in an unholy alliance where they use viruses and worms to spread pornography, "enlargement pills", and generally waste your time - making the spam problem bigger and more dangerous than ever before. In this article we explain why it is so difficult to rid the internet of these leeches - and we show you how ISPs can (and do) fight back successfully.

The article makes the newsletter longer than usual, but we hope you will enjoy and benefit from the information. Our regular format continues next time. |
||||||||||
| ..:: What is spam? ::.. | ||||||||||
|
On the other hand, you may already (or will soon) be receiving tens or even hundreds of emails with offers to enlarge a certain part of your anatomy, discounted Viagra, pornography and worse. This is the work of spammers.

The "proper" name for these email advertisements is UCE - an abbreviation for Unsolicited Commercial Email. "Spam" is actually a trademarked name for a canned meat product. :-) But I digress: the point is that the word "spam" is commonly used to refer to "advertisement" emails from people or companies you have never heard of: in other words, it is unsolicited.

Virtually no internet email user will escape this problem without taking the necessary precautions. |
||||||||||
| ..:: Spam is a HUGE problem - and growing ::.. | ||||||||||
To give you an idea of the magnitude of the problem: In 2003 approximately 1.5 trillion spam messages were sent. In the first quarter of 2004, that number jumped to 1.6 trillion...in 3 months! The economic damage caused by spam (lost
productivity and network congestion) was estimated to be between US$58 billion
and US$78 billion in the first quarter of 2004 alone. |
||||||||||
| ..:: How do the spammers do it? ::.. | ||||||||||
|
Their answer: send 10 000 000 email messages to get 10 sales. Tomorrow, send another 10 000 000 emails and get another 10 sales...and so on, until some of the largest and most notorious spammers make more than US$ 1 million per year. (Remember that sending a million emails costs just about nothing.) Their profit margins are practically unlimited, because a "sale" to them means getting money from the customer: in most cases they never deliver the product itself.

Since there is a lot of money to be made by preying on the naiveté of internet users, these fraudsters can afford to spend time and money to hire programmers and technicians to make their operations difficult to trace.

All reputable Internet Service Providers (ISPs) will terminate a user's account immediately if the user is identified as a spammer, so the bad guys will typically get an internet access account from an ISP (using false information) and then push as much spam through that ISP as possible before their operation is detected and the account is closed. By the time the account is terminated, the spammer has already set up one or more new accounts (using false information) with the same or several other ISPs, so the spammer just moves on to the next account. In many cases this process of burn and run is automated by special software.

Of course spammers falsify the sender email address and other details to make it more difficult for ISPs and law enforcement to find them, but the method most relied on to avoid identification is to abuse other (innocent) email servers in order to relay their junk mail. They find unsecured mail servers (open relays) by constantly scanning large blocks of internet network addresses, looking for mail servers that have not been properly secured. If you think that finding these open relays must be a lengthy and complicated process for spammers, you would be wrong.
A typical user connecting to the internet with an ADSL or leased line will find that spammers locate their machine and probe it for security holes within minutes of connecting.

Scanning for open relays and sending spam requires significant computing power and a thick network pipe, but the spammers found a simple way around that: they use other people's computers and network connections. This dirty trick involves creating computer viruses that, once they have penetrated a PC, will help the spammer to find open relays, or even send spam on behalf of the spammer. (It is estimated that there are more than 14 million computers on the internet that have been compromised in this way - and the number is climbing.) The spammer controls his network of "Zombies" (PCs with the virus) as if it were one huge computer with virtually infinite computing power and oodles of bandwidth - all geared to pump junk into your inbox.

The final insult: imagine a spammer using your own virus-infected PC to send you spam! :( |
||||||||||
| ..:: A perfect storm ::.. | ||||||||||
The way email is transported on the internet is more than 30 years old. It dates back to the days when the internet was mainly used by universities to share information, there were a few thousand machines on the network at most, and it was unimaginable that a person could have his very own personal computer.

Since then the internet has changed drastically: There are hundreds of millions of computers on the internet. Just about anybody can afford to buy a personal computer and connect it to the internet. The internet is growing so fast that the network numbering scheme will have to be changed within a few years because we are on the point of running out of the 4 BILLION possible addresses.

Yet, amazingly, we are still using good old SMTP (Simple Mail Transfer Protocol) from the old, innocent days! SMTP assumes that everyone on the network is basically trustworthy and it mandates that any server on the network must be able to send email to any other server (or person) on the network - just like the physical postal system works. The number one strength of SMTP (universal connectivity) is also its greatest weakness, because it allows spammers to send email anonymously and virtually untraceably. The very system that carries email on the internet is the same system that is indirectly responsible for the huge amount of spam that threatens to destroy it.

At this point you may be wondering why the protocol is not simply replaced by a newer, more secure protocol. The answer to this question is depressingly simple: Installed base. There is an old information technology joke that goes: Why could God create the earth and everything in 6 days? Answer: Because he did not have an installed base. No disrespect is intended to any religion - the analogy tries to explain that it is much easier to do a huge amount of work when you have the advantage of starting with a clean slate.
Besides, many IT people believe they are directly related to the Almighty anyway ;-)

Tens of millions of mail servers on the Internet use SMTP to send email. To change the protocol without disrupting the majority of email flowing around the internet is a virtually impossible task. Having said this, we are fast reaching a point where spam and viruses are so destructive to the very fabric of the Internet that changing SMTP (or replacing it with a new protocol) may be less painful than allowing this mess to continue - even if it means changing the software running on several million email servers...

We (all internet users) are between the devil and the deep blue sea - and the tide is coming in. |
||||||||||
| ..:: How ISPs try to block spam ::.. | ||||||||||
|
ISPs are therefore highly motivated to get rid of the problem, but they face these problems:
Quite a problem, I'm sure you will agree. To make matters worse (for the ISP): If the ISP blocks some emails and not others, how can its clients be sure they received all the (legitimate) email that was sent to them? Have you ever had to resolve a dispute where one party insists they sent the email and the other party insists they never received it? How can you trust an email system if you know it is blocking email based on tricks in some black box? So what can the ISP do?

Keyword scanning

A popular (but not very sophisticated) way for ISPs to deal with spam is to look for keywords in a message. For instance, if the email contains the word "viagra", then it is probably spam...or is it? It is conceivable that one of their clients may want (or need) to discuss the merits of the medication with a friend - after all, Viagra is a legitimate and respected drug! Assuming that email is spam simply because it contains a reference to a trademark (owned by a company that has nothing to do with spam at all) is not acceptable. ISPs can no longer use this method to reliably block spam.

Sender domain or address blacklisting

Many ISPs will intercept email when it comes from an email address that belongs to a known spammer. This technique is known as email blacklisting, and it will block all messages that originate from a specific email address (or domain). Often these blocks are effective, but only with nuisance spammers. The professional spammers change (forge) the sender email address with every outgoing message, or at least with every spam run. Address blacklisting is effective in catching less than 0.5% of spam.

Bayesian filters

A more advanced way of content filtering is to look at all the words in a message - instead of just looking for a few specific words. For instance, if the words "viagra", "order" and "free" appear in the same message then it is more likely that the message is spam.
On the other hand, it might still be two friends discussing the drug, so a Bayesian filter looks at all the words in the email and gives each of them a positive (spam) or negative (not spam) rating. When the total rating exceeds a certain level, the email is classified as spam. For instance, when the message contains the first name of the person it is sent to and contains "neutral" words like "father" or "sick", then the spam rating decreases and the message may not be classified as spam. The idea is that one can calculate the probability that a message is spam by assigning a score to each of the words in the email, and then calculating a total probability for the whole message.

The filter can "learn" what is spam and what is not by example. Every time you designate a message as spam, the filter will take all the words in the message and assign a higher spam probability to them. Normal emails (not designated by you as spam) are also recorded and will reduce the spam probability of all the words they contain. Over time, the filter builds up a dictionary of spam and non-spam words, based on the normal email traffic an individual receives. (The filter is slightly more clever than this simple example, as it uses sophisticated math and statistical theories to analyze probabilities.) Using this technology, the filter will not consider the word "viagra" as probable spam at all if you are a doctor who prescribes the medicine.

These filters are used very effectively to combat spam and are used in a number of software packages. The first problem is that the filter takes a while to "train" and it is only effective when used on a personal basis...in other words, every mail user needs to have his own filter customized to the email he receives and what he deems to be spam. The second problem is that spammers also know how Bayesian filters work, so they will fill the message with random words from a dictionary to confuse the filter and reduce the spam rating the message receives.
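The word-scoring idea described above can be sketched in a few lines of Python. This is only a toy illustration, not the code used by any real filter: the class name `TinyBayesFilter`, the training messages, and the "add one" smoothing are all assumptions made for the example. Each word adds a positive or negative amount to a running score, and the total is converted into a spam probability.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy word-based spam scorer (illustrative only, names are hypothetical)."""

    def __init__(self):
        # How many spam / normal ("ham") messages contained each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # How many messages of each kind we have learned from.
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lower-cased, whitespace-separated words, counted once per message.
        return set(text.lower().split())

    def train(self, text, label):
        """Learn from one example; label is "spam" or "ham"."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Start from how common spam is overall (with add-one smoothing).
        score = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            # Smoothed chance of seeing this word in a spam / normal message.
            p_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            # Positive contribution = "spammy" word, negative = "normal" word.
            score += math.log(p_spam / p_ham)
        # Convert the total score into a probability between 0 and 1.
        return 1 / (1 + math.exp(-score))


# Made-up training data for the example.
spam_filter = TinyBayesFilter()
spam_filter.train("order viagra now free", "spam")
spam_filter.train("free pills order today", "spam")
spam_filter.train("father is sick and asked about viagra", "ham")
spam_filter.train("lunch with father tomorrow", "ham")

print(spam_filter.spam_probability("order free viagra"))          # roughly 0.9
print(spam_filter.spam_probability("father asked about viagra"))  # roughly 0.08
```

Because "viagra" appears in both the spam and the normal training messages here, it ends up neutral: it is the surrounding words ("order", "free" versus "father", "asked") that tip the score one way or the other.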
In one case I even received spam with two jokes tacked onto the end in an effort by the spammer to avoid the filters. Cute. :(

The biggest drawback of Bayesian filters is that there is a small chance that they will misclassify a legitimate message as spam. As we said before, even one mistake in a thousand is too many, because it might just be your aunt asking you about Viagra for your uncle. If you don't reply, she will assume that you are ignoring her and you are out of her will. ISPs run a big risk if they use Bayesian filters that are not properly trained per individual user.

Real time DNS black lists (DNS RBL)

A real time blacklist (RBL) is a centrally maintained database of server addresses that have been positively identified as the source of spam. It works like this: The blacklist maintainer investigates spam complaints and once a server has been positively implicated, its address is added to the blacklist and the administrator of the server is notified that his server is now blacklisted. (Mail servers cannot hide their internet addresses, so spammers cannot falsify the information.)

In addition to this manual process, the RBL provider publishes hundreds of thousands of email addresses (honey pots) where spammers can easily find them. Once a spammer sends an email to one of these email addresses, the sending server is immediately classified as a spam server. Sort of a high-tech real time trap for spammers. This technique works very well because the spammer has no way to know that joesoap@somedomain.com is actually not a real person but a spam trap.

ISPs now use this RBL to check each and every email coming in to their servers. If the sending server is a known spammer, the email is flagged as spam and deleted. The RBL is constantly updated as the spammers move their accounts to new ISPs or when they use a new Zombie PC to send spam, because they inevitably send spam to a honey pot address, causing that source of spam to be identified and blacklisted.
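The article does not say how a mail server actually asks the blacklist about a sender. As a minimal sketch, assuming the conventional DNS blacklist scheme: the server reverses the four numbers of the sender's IP address, appends the name of the blacklist, and does an ordinary DNS lookup. If the name resolves, the address is listed; if not, it is not. The function names and the zone `dnsbl.example.org` below are hypothetical placeholders, not the blacklist Cozahost uses.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to ask a blacklist about an IPv4 address.

    Example: 192.0.2.99 checked against dnsbl.example.org becomes
    99.2.0.192.dnsbl.example.org
    """
    # IPv4Address() raises ValueError if the text is not a valid IPv4 address.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist has an entry for this address.

    Note: this simple version treats every lookup failure (including a
    network problem) as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Hypothetical usage: 192.0.2.99 is a documentation-only address and
# dnsbl.example.org is a placeholder zone, so only the name is built here.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Because the check is just a DNS lookup, it is fast enough to run on every incoming connection, which is what makes the "real time" part practical.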
Cozahost uses an RBL that blocks more than a billion spam messages per month for more than 200 million internet users. Less than 5% of spam reaches our users' inboxes.

What to do with the spam once it is identified

As in the earlier scenario with your aunt's will, it is still possible that an innocent party might be identified as a spammer - because your aunt may be completely unaware of the fact that her ISP is in cahoots with spammers and listed in a DNS RBL! All responsible ISPs must therefore make sure that their clients know when, how and why any email was intercepted.

Of course a simpler method would be to tell the sender directly that his email has been rejected, but the problem is that all spam email uses false sender addresses, so the reject message will either go to the wrong person or will be completely undeliverable.

Cozahost therefore sends a detailed daily report to our clients listing all spam emails that were intercepted. In addition to this daily report, our clients can log in online and view a real-time report of spam as it is intercepted. |
||||||||||
| ..:: In closing ::.. | ||||||||||
|
ISP-based spam filtering cannot be 100% effective, because spam is a moving target. The most efficient way to deal with the problem is for ISPs to use server-based filtering or blacklists, and for end users to use efficient and effective filter software to eliminate the 5% - 10% of spam that survives server-based (ISP) filtering.

Internet users have a responsibility too - to make sure that they do not present themselves or their contacts as targets for spammers, but most of all to take appropriate measures to ensure that their PCs are secure enough not to be turned into spamming Zombies. After all: the internet is a global community
in which we all must live and work - we have to rely on each other to be good
net citizens and responsible neighbors. |
||||||||||
| ..:: Subscribers | ||||||||||
You can read previous issues or subscribe to this newsletter here... |
||||||||||
| ..::Goodbye! :-) | ||||||||||
|
||||||||||
(c) Cozahost 2004, All rights reserved.