
Cozahost Newsletter Archive


Cozahost newsletter: 06 May 2004
Hi!

This newsletter is a bit longer than normal:

With the recent addition of our spam blocking feature, we found that many of you don't know it works. Even more surprising is that many of you wanted to know how it works! :-)

You are welcome to forward this newsletter to anyone you think may benefit.

..:: In This Issue ::..

Introduction
What is spam
Spam is a HUGE problem and growing...
How do the spammers do it?
A perfect storm
How ISPs (can) block spam
In closing
Subscribe to this newsletter

..:: Introduction ::..


Spam is still public enemy number one to most internet users - sapping billions of dollars in lost productivity and lost bandwidth from the world economy.

With this ever-growing nuisance in mind, we published an in-depth practical guide a while back on how to avoid spam, and then we upped the ante by introducing a free service to our clients where we intercept upwards of 90% of spam before it reaches their inboxes.

Recently, virus writers and spam purveyors united in an unholy alliance where they use viruses and worms to spread pornography, "enlargement pills", and generally waste your time - making the spam problem bigger and more dangerous than ever before.

In this article we explain why it is so difficult to rid the internet of these leeches - and we show you how ISPs can (and do) fight back successfully.

The article makes the newsletter longer than usual, but we hope you will enjoy and benefit from the information. Our regular format continues next time.

..:: What is spam? ::..


If you don't know what spam is, then you are one lucky internet user!

On the other hand, you may already (or will soon) be receiving tens or even hundreds of emails with offers to enlarge a certain part of your anatomy, Viagra at a discount, pornography and worse.

This is the work of spammers.

The "proper" name for these email advertisements is UCE - an abbreviation for Unsolicited Commercial Email.

"Spam" is actually a trademark name for a canned meat product. :-) But I digress: the point is that the word "Spam" is commonly used to refer to "advertisement" emails from people or companies you have never heard of: in other words, it is unsolicited.

Virtually no internet email user will escape this problem without taking the necessary precautions.

..:: Spam is a HUGE problem - and growing ::..

To give you an idea of the magnitude of the problem: In 2003 approximately 1.5 trillion spam messages were sent. In the first quarter of 2004, that number jumped to 1.6 trillion...in 3 months!

The economic damage caused by spam (lost productivity and network congestion) was estimated to be between US$58 billion and US$78 billion in the first quarter of 2004 alone.

..:: How do the spammers do it? ::..


Spammers are in it for the money. They know that less than 0.001% of the emails they send will result in a lead, and perhaps less than 0.01% of the leads will eventually result in a sale.

Their answer: send 10 000 000 email messages to get 10 sales. Tomorrow, send another 10 000 000 emails and get another 10 sales...and so on, until some of the most notorious and largest spammers make more than US$ 1 million per year. (Remember that sending a million emails costs just about nothing.)

Their profit margins are typically infinite, because a "sale" to them means getting money from the customer: in most cases they never deliver the product itself.

Since there is a lot of money to be made by preying on the naiveté of internet users, these fraudsters can afford to spend time and money to hire programmers and technicians to make their operations difficult to trace.

All reputable Internet Service Providers (ISPs) will terminate a user's account immediately if the user is identified as a spammer, so the bad guys will typically get an internet access account from an ISP (using false information) and then push as much spam through that ISP as possible before their operation is detected and the account is closed. By the time the account is terminated, the spammer has already set up one or more new accounts (again using false information) with the same or several other ISPs, so the spammer just moves on to the next account. In many cases this process of burn and run is automated by special software.

Of course spammers falsify the sender email address and other details to make it more difficult for ISPs and law enforcement to find them, but the method most relied on to avoid identification is to abuse other (innocent) email servers in order to relay their junk mail. They find unsecured mail servers (open relays) by constantly scanning large blocks of internet network addresses, looking for mail servers that have not been properly secured.

If you think that finding these open relays must be a lengthy and complicated process for spammers, you would be wrong.

A typical user connecting to the internet with an ADSL or leased line will find that spammers locate their machines and probe for security holes within minutes after they connect.

Scanning for open relays and sending spam requires significant computing power and a thick network pipe, but the spammers found a simple way around that: they use other people's computers and network connections. This dirty trick involves creating computer viruses that, once they have penetrated a PC, help the spammer to find open relays, or even send spam on behalf of the spammer. (It is estimated that there are more than 14 million computers on the internet that have been compromised in this way - and the number is climbing.)

The spammer controls his network of "Zombies" (PCs with the virus) as if it were one huge computer with virtually infinite computing power and oodles of bandwidth - all geared to pump junk into your inbox.

The final insult: imagine a spammer using your own virus-infected PC to send you spam! :(

..:: A perfect storm ::..

The way email is transported on the internet is more than 30 years old. It dates back to the days when the internet was mainly used by universities to share information, there were a few thousand machines on the network at most, and it was unimaginable that a person could have his very own personal computer.

Since then the internet has changed drastically: there are hundreds of millions of computers on the internet. Just about anybody can afford to buy a personal computer and connect it to the internet.

The internet is growing so fast that the network numbering scheme will have to be changed within a few years because we are on the point of running out of the 4 BILLION possible addresses.

Yet, amazingly, we are still using good old SMTP (Simple Mail Transfer Protocol) from the old, innocent days!

SMTP assumes that everyone on the network is basically trustworthy and it mandates that any server on the network must be able to send email to any other server (or person) on the network - just as the physical postal system does.

The number one strength of SMTP (universal connectivity) is also its greatest weakness, because it allows spammers to send email anonymously and virtually untraceably. The very system that carries email on the internet is the same system that is indirectly responsible for the huge amount of spam that threatens to destroy it.

At this point you may be wondering why the protocol is not simply replaced by a newer, more secure protocol? The answer to this question is depressingly simple: Installed base. There is an old information technology joke that goes:

Why could God create the earth and everything in 6 days?

Answer: Because he did not have an installed base.

No disrespect is intended to any religion - the analogy tries to explain that it is much easier to do a huge amount of work when you have the advantage of starting with a clean slate. Besides, many IT people believe they are directly related to the Almighty anyway ;-)

Millions upon tens of millions of mail servers on the Internet use SMTP to send email. To change the protocol without disrupting the majority of email flowing around the internet is a virtually impossible task.

Having said this, we are fast reaching a point where spam and viruses are so destructive to the very fabric of the Internet that changing SMTP (or replacing it with a new protocol) may be less painful than allowing this mess to continue - even if it means changing the software running on several million email servers...

We (all internet users) are between the devil and the deep blue sea - and the tide is coming in.

..:: How ISPs try to block spam ::..


All internet users suffer under the burden of spam, but ISPs have a direct practical and financial incentive to deal with the problem: a) they have to protect their very expensive bandwidth and b) their clients insist that their ISP do something to protect them from spam.

ISPs are therefore highly motivated to get rid of the problem, but they face these obstacles:

  • Spam cannot be identified by sender email address because it is forged
  • Spam cannot be identified by email subject because it changes constantly
  • The bodies of spam emails contain random words and misspelled names to prevent easy detection. (For instance, Viagra is spelled V.iagra, vi.agra or viagr@, etc.)
  • Spam is sent from millions of virus-infected PCs so it is difficult to find and track spam servers
  • Even if only one in one thousand emails is blocked incorrectly, it is an unacceptable error rate - so the ISP must be 100% sure that the email is in fact spam before intercepting it.

Quite a problem, I'm sure you will agree.

To make matters worse (for the ISP): If the ISP blocks some emails and not others, how can its clients be sure they received all the (legitimate) email that was sent to them? Have you ever had to resolve a dispute where one party insists they sent the email and the other party insists they never received it? How can you trust an email system if you know it is blocking email based on tricks in some black box?

So what can the ISP do?

Keyword scanning

A popular (but not very sophisticated) way for ISPs to deal with spam is to look for keywords in a message.

For instance, if the email contains the word "viagra", then it is probably spam...or is it?

It is conceivable that one of their clients may want (or need) to discuss the merits of the medication with a friend - after all, Viagra is a legitimate and respected drug! Assuming that email is spam simply because it contains a reference to a trademark (owned by a company that has nothing to do with spam at all) is not acceptable.

ISPs can no longer use this method to reliably block spam.
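
To make both weaknesses concrete, here is a minimal sketch of a keyword scanner in Python. It is our own illustration, not the code any particular ISP runs: the keyword list, the function name and the sample messages are all made up for the example.

```python
import re

# Hypothetical keyword list - a real scanner would carry many more entries.
SPAM_KEYWORDS = {"viagra"}


def contains_keyword(message: str) -> bool:
    """Return True if any run of letters in the message is a listed keyword."""
    words = re.findall(r"[a-z]+", message.lower())
    return any(word in SPAM_KEYWORDS for word in words)


# Three made-up messages: plain spam, obfuscated spam, and a real question.
plain_spam = "Cheap Viagra, order now"
obfuscated_spam = "Cheap V.iagra and viagr@, order now"
legitimate = "My doctor suggested Viagra, what do you think?"

plain_spam_flagged = contains_keyword(plain_spam)
obfuscated_spam_flagged = contains_keyword(obfuscated_spam)
legitimate_flagged = contains_keyword(legitimate)
```

The scanner flags the plain spam, but it also flags the legitimate question, and it misses the obfuscated spam completely because "V.iagra" and "viagr@" no longer match the keyword.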

Sender domain or address blacklisting

Many ISPs will intercept email when it comes from an email address that belongs to a known spammer.

This technique is known as email blacklisting, and it will block all messages that originate from a specific email address (or domain).

Often these blocks are effective, but only with nuisance spammers. The professional spammers change (forge) the sender email address with every outgoing message, or at least with every spam run.

Address blacklisting is effective in catching less than 0.5% of spam.
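
A sender blacklist boils down to a simple lookup, which is also why it is so easy to sidestep. The sketch below is only an illustration in Python: the addresses, domains and function name are hypothetical, and a real mail server would load its lists from a database rather than hard-code them.

```python
# Hypothetical blacklist entries, using reserved example domains.
BLACKLISTED_ADDRESSES = {"offers@spammer.example"}
BLACKLISTED_DOMAINS = {"spammer.example"}


def is_blacklisted(sender: str) -> bool:
    """Return True if the sender address or its domain is on the blacklist."""
    address = sender.strip().lower()
    # Everything after the last "@" is treated as the domain.
    domain = address.rpartition("@")[2]
    return address in BLACKLISTED_ADDRESSES or domain in BLACKLISTED_DOMAINS


known_sender_blocked = is_blacklisted("Offers@Spammer.example")
# A forged sender address that has never been seen before slips through.
forged_sender_blocked = is_blacklisted("x7f3q@random-domain.example")
```

The known address is blocked, but the forged one is not: the spammer only has to invent a new sender address for the next message.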

Bayesian filters

A more advanced way of content filtering is to look at all the words in a message - instead of just looking for a few specific words.

For instance, if the words "viagra", "order" and "free" appear in the same message then it is more likely that the message is spam. On the other hand, it might still be two friends discussing the drug, so a Bayesian filter looks at all the words in the email and gives each of them a positive (spam) or negative (not spam) rating.

When the total rating exceeds a certain level, then the email is classified as spam.

For instance, when the message contains the first name of the person it is sent to and contains "neutral" words like "father" or "sick", then the spam rating decreases and the message may not be classified as spam. The idea is that one can calculate the probability that a message is spam by assigning a score to each of the words in the email, and then calculating a total probability for the whole message.

The filter can "learn" what is spam and what is not by example. Every time you designate a message as spam, the filter will take all the words in the message and assign a higher spam probability to them. Normal emails (not designated by you as spam) are also recorded and will reduce the spam probability of all the words in the message. Over time, the filter builds up a dictionary of spam and non-spam words based on the normal email traffic an individual receives. (The filter is slightly more clever than this simple example as it uses sophisticated math and statistical theories to analyze probabilities.)
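
For the curious, the learning and scoring steps can be sketched in a few lines of Python. This is a deliberately tiny illustration of the idea and not the algorithm of any specific product: the class name, the training messages and the simple "add one" smoothing are our own assumptions, and it only scores words that are present in the message.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """A toy word-based spam scorer that learns from labelled examples."""

    def __init__(self):
        self.spam_words = Counter()  # number of spam messages containing a word
        self.ham_words = Counter()   # number of normal messages containing a word
        self.spam_messages = 0
        self.ham_messages = 0

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z']+", text.lower()))

    def train(self, text, is_spam):
        if is_spam:
            self.spam_words.update(self.tokens(text))
            self.spam_messages += 1
        else:
            self.ham_words.update(self.tokens(text))
            self.ham_messages += 1

    def spam_probability(self, text):
        # Start from the overall spam/normal ratio, then add each word's evidence.
        log_odds = math.log((self.spam_messages + 1) / (self.ham_messages + 1))
        for word in self.tokens(text):
            # "Add one" smoothing so unseen words never give a zero probability.
            p_in_spam = (self.spam_words[word] + 1) / (self.spam_messages + 2)
            p_in_ham = (self.ham_words[word] + 1) / (self.ham_messages + 2)
            log_odds += math.log(p_in_spam / p_in_ham)
        return 1 / (1 + math.exp(-log_odds))


# Made-up training data: two messages marked as spam, two normal ones.
spam_filter = TinyBayesFilter()
spam_filter.train("order free viagra now", is_spam=True)
spam_filter.train("free viagra order today", is_spam=True)
spam_filter.train("dad is sick, can you call him", is_spam=False)
spam_filter.train("the doctor prescribed viagra for father", is_spam=False)

advert_score = spam_filter.spam_probability("order free viagra")
family_score = spam_filter.spam_probability("is father sick? he asked about viagra")
```

With this training data the advert scores well above 0.9, while the family message stays far below 0.5 even though it mentions "viagra", because "father" and "sick" have only been seen in normal mail.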

Using this technology, the filter will not consider the word "viagra" as probable spam at all if you are a doctor who prescribes the medicine.

These filters are used very effectively to combat spam and are built into a number of software packages.

The first problem is that the filter takes a while to "train" and it is only effective when used on a personal basis...in other words every mail user needs to have his own filter customized to the email he receives and what he deems to be spam.

The second problem is that spammers also know how Bayesian filters work so they will fill the message with random words from a dictionary to confuse the filter and reduce the spam rating the message receives. In one case I even received spam with two jokes tacked onto the end in an effort by the spammer to avoid the filters. Cute. :(

The biggest drawback of Bayesian filters is that there is always a small chance that they will misclassify a legitimate message as spam.

As we said before, even one mistake in a thousand is too many, because it might just be your aunt asking you about Viagra for your uncle. If you don't reply, she will assume that you are ignoring her and you are out of her will.

ISPs run a big risk if they use Bayesian filters that are not properly trained per individual user.

Real time DNS black lists (DNS RBL)

A real time blacklist (RBL) is a centrally maintained database of server addresses that have been positively identified as the source of spam.

It works like this: The blacklist maintainer investigates spam complaints and once a server has been positively implicated, its address is added to the blacklist and the administrator of the server is notified that his server is now blacklisted. (Mail servers cannot hide their internet addresses, so spammers cannot falsify the information.)

In addition to this manual process, the RBL provider publishes hundreds of thousands of email addresses (honey pots) where spammers can easily find them. Once a spammer sends an email to one of these email addresses, that server is immediately classified as a spam server. Sort of a high-tech real time trap for spammers. This technique works very well because the spammer has no way to know that joesoap@somedomain.com is actually not a real person but a spam trap.

ISPs now use this RBL to check each and every email coming in to their servers. If the sending server is a known spammer, the email is flagged as spam and deleted.
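
In practice the check is an ordinary DNS lookup. The Python sketch below follows the common convention for DNS blacklists: reverse the four numbers of the sending server's IP address, append the blacklist's zone name, and look the result up. If the name resolves, the server is listed. The zone "rbl.example.org" and the function names are placeholders of ours, not the service Cozahost uses.

```python
import socket


def rbl_query_name(ip_address: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the RBL zone."""
    octets = ip_address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip_address!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip_address: str, zone: str) -> bool:
    """Return True if the blacklist zone has an entry for this address."""
    try:
        socket.gethostbyname(rbl_query_name(ip_address, zone))
        return True
    except socket.gaierror:
        # No such name (or the lookup failed): treat the server as not listed.
        return False


# 192.0.2.10 is a documentation address; the zone is a hypothetical placeholder.
example_name = rbl_query_name("192.0.2.10", "rbl.example.org")
```

Note that this sketch treats a failed lookup the same as "not listed", so a DNS outage lets mail through rather than blocking it. That is the cautious choice for a system where wrongly blocking a legitimate email is the worse mistake.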

The RBL is constantly updated as the spammers move their accounts to new ISPs or when they use a new Zombie PC to send spam, because they inevitably send spam to a honey pot address, causing that source of spam to be identified and blacklisted.

Cozahost uses an RBL that blocks more than a billion spam messages per month for more than 200 million internet users. Less than 5% of spam reaches our users' inboxes.

What to do with the spam once it is identified

As in the earlier scenario with your aunt's will, it is still possible that an innocent party might be identified as a spammer - because your aunt may be completely unaware of the fact that her ISP is in cahoots with spammers and listed in a DNS RBL!

All responsible ISPs must therefore make sure that their clients know when, how and why any email was intercepted.

Of course a simpler method would be to tell the sender directly that his email has been rejected, but the problem is that all spam emails use false sender addresses, so the reject message will either go to the wrong person or will be completely undeliverable.

Cozahost therefore sends a detailed daily report to our clients listing all spam emails that were intercepted. In addition to this daily report, our clients can log in online and view a real-time report of spam as it is intercepted.

..:: In closing ::..


To summarize: Spam is a huge and growing problem and ISPs have a real role to play to reduce the amount of junk that reaches their clients' inboxes; but the ISP must behave responsibly (and with respect) when it must interfere with its clients' email.

ISP based spam filtering cannot be 100% effective, because spam is a moving target. The most efficient way to deal with the problem is for ISPs to use server based filtering or blacklists, and for end users to use efficient and effective filter software to eliminate the 5% to 10% of spam that survives server (ISP) based filtering.

Internet users have a responsibility too - to make sure that they do not present themselves or their contacts as targets for spammers, but most of all to take appropriate measures to ensure that their PCs are secure enough not to be turned into spamming Zombies.

After all: the internet is a global community in which we all must live and work - we have to rely on each other to be good net citizens and responsible neighbors.

..:: Quick links ::..


Here are some quick links to Cozahost services and products:

About us - Background information on Cozahost: who we are, why we are here and what we aim to do.
Contact us - Use this link if you need to contact us for help, advice or support.
Register a domain name - Get an instant no obligation quote to register a domain name. (With optional email or dialup access.)
Modem, ISDN or ADSL - Cozahost offers modem, ISDN and ADSL internet access at heavily discounted rates to our clients.
About web hosting - Article on how businesses can use a web site to gain new customers or become more competitive.
Fax to email service - Fax to email service to receive your faxes privately, hassle free and anywhere in the world.

..:: Subscribers ::..

You can read previous issues or subscribe to this newsletter here...

..:: Goodbye! :-) ::..


Wishing you happy, safe and productive computing - till next time. :-)

(c) Cozahost 2004, All rights reserved.

