web hosting and domain names registration in South Africa

Since 1999


 

 

How ISPs (can) block spam

ISPs can (and do) play a large role in limiting the amount of spam received by their clients - but keeping the junk out is a more difficult task than most people know.

Cozahost employs sophisticated software to protect our clients from spam.

This article explains why and how we do it.

(For practical tips and advice on how to avoid spam, please see the article "Seems like you volunteered to receive spam?")

What is spam?

If you don't know what spam is, then you are a very lucky internet user! 

On the other hand, you may already be receiving (or will soon receive) tens or even hundreds of emails offering to enlarge a certain part of your anatomy, Viagra at a discount, pornography and worse.

This is the work of spammers.

The "proper" name for these email advertisements is UCE - an abbreviation for Unsolicited Commercial Email. 

"Spam" is actually a trademark name for a canned meat product. :-) But I digress:  the point is that the word "spam" is commonly used to refer to "advertisement" emails from people or companies you have never heard of: in other words, the email is unsolicited.

Virtually no internet email user will escape this problem without taking the necessary precautions.

Spam is a HUGE problem - and growing

To give you an idea of the magnitude of the problem:  In 2003 approximately 1.5 trillion spam messages were sent.  In the first quarter of 2004, that number jumped to 1.6 trillion...in 3 months!

The economic damage caused by spam (lost productivity and network congestion) was estimated to be between US$58 billion and US$78 billion in the first quarter of 2004 alone.

At this time (January 2008), Cozahost is blocking an average of 45 000 spam emails per day!  If it takes just two seconds to download and delete a spam message, imagine the time that would have been wasted dealing with this junk.
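To put a number on that wasted time, here is a minimal sketch of the arithmetic. It uses only the two figures quoted above; the function name is made up for illustration.

```python
# Figures taken from the paragraph above.
SPAM_PER_DAY = 45_000       # spam emails blocked per day
SECONDS_PER_MESSAGE = 2     # time to download and delete one message


def wasted_hours_per_day(messages: int, seconds_each: float) -> float:
    """Return the total handling time, in hours, for one day's spam."""
    return messages * seconds_each / 3600


hours = wasted_hours_per_day(SPAM_PER_DAY, SECONDS_PER_MESSAGE)
print(f"{hours:.0f} hours of handling time per day")
```

In other words, the blocked spam would have cost our clients, taken together, about 25 hours of deleting every single day.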

How do the spammers do it?

Spammers are in it for the money.  They know that less than 0.001% of the emails they send will result in a lead, and perhaps less than 0.01% of the leads will eventually result in a sale. 

Their answer:  send 10 million email messages to get 10 sales.  Tomorrow, send another 10 million emails and get another 10 sales...and so on, until some of the most notorious and largest spammers make more than US$ 1 million per year.  (Remember that sending a million emails costs just about nothing.)

Their profit margins are typically infinite, because a "sale" to them means getting money from the customer: in most cases they never deliver the product itself. 

Since there is a lot of money to be made by preying on the naiveté of internet users, these fraudsters can afford to spend time and money to hire programmers and technicians to make their operations difficult to trace.

All reputable Internet Service Providers (ISPs) will terminate a user's account immediately if the user is identified as a spammer, so the bad guys will typically get an internet access account from an ISP (using false information) and then push as much spam through that ISP as possible before their operation is detected and the account is closed.  By the time their account is terminated, the spammer has already set up one or more new accounts (using false information) with the same or several other ISPs, so the spammer just moves on to the next account.  In many cases this process of burn and run is automated by special software.

Of course spammers falsify the sender email address and other details to make it more difficult for ISPs and law enforcement to find them, but the method most relied on to avoid identification is to abuse other (innocent) email servers in order to relay their junk mail.  They find unsecured mail servers (open relays) by constantly scanning large blocks of internet network addresses, looking for mail servers that have not been properly secured.

If you think that finding these open relays must be a lengthy and complicated process for spammers, you would be wrong. 

A typical user connecting to the internet with an ADSL or leased line will find that spammers locate their machines and probe for security holes within minutes after they connect.

Scanning for open relays and sending spam requires significant computing power and a thick network pipe, but the spammers found a simple way around that:  they use other people's computers and network connections.  This dirty trick involves creating computer viruses that, once they have penetrated a PC, help the spammer to find open relays, or even send spam on behalf of the spammer.  (It is estimated that there are more than 14 million computers on the internet that have been compromised in this way - and the number is climbing.)

The spammer controls his network of "Zombies" (PCs with the virus) as if it were one huge computer with virtually infinite computing power and oodles of bandwidth - all geared to pump junk into your inbox.

The final insult: imagine a spammer using your own virus-infected PC to send you spam! :(

A perfect storm

The way email is transported on the internet is more than 30 years old.  It dates back to the days when the internet was mainly used by universities to share information, there were a few thousand machines on the network at most, and it was unimaginable that a person could have his very own personal computer.

Since then the internet has changed drastically:  There are hundreds of millions of computers on the internet.  Just about anybody can afford to buy a personal computer and connect it to the internet.

The internet is growing so fast that the network numbering scheme will have to be changed within a few years because we are on the point of running out of the 4 BILLION possible addresses.

Yet, amazingly, we are still using good old SMTP (Simple Mail Transfer Protocol) from the old, innocent days!

SMTP assumes that everyone on the network is trustworthy.  It mandates that any server on the network must be able to send email to any other server (or person) on the network - just like the physical postal system works.

The number one strength of SMTP (universal connectivity) is also its greatest weakness, because it allows spammers to send email anonymously and virtually untraceably.  The very system that carries email on the internet is the same system that is indirectly responsible for the huge amount of spam that threatens to destroy it.
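To see how little stands in a forger's way, consider this small sketch using Python's standard email library. It only builds a message in memory and sends nothing. The addresses and the function name are placeholders invented for the example. The point is that the "From" line is just text chosen by whoever composes the message.

```python
from email.message import EmailMessage


def build_message(claimed_sender: str, recipient: str,
                  subject: str, body: str) -> EmailMessage:
    """Compose a message; nothing checks that claimed_sender is genuine."""
    msg = EmailMessage()
    msg["From"] = claimed_sender   # any text is accepted here
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Placeholder addresses: the "sender" never has to prove who he is.
forged = build_message("someone.famous@example.org", "you@example.com",
                       "Special offer", "Buy now!")
print(forged["From"])
```

The receiving server sees whatever sender the other side claims, which is why the checks described later in this article look at things a spammer cannot fake so easily.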

At this point you may be wondering why the protocol is not simply replaced by a newer, more secure protocol?  The answer to this question is depressingly simple:  Installed base.  There is an old information technology joke that goes:

Why could God create the earth and everything in 6 days?

Answer: Because he did not have an installed base.

No disrespect is intended to any religion - the analogy tries to explain that it is much easier to do a huge amount of work when you have the advantage of starting with a clean slate.  Besides, many IT people believe they are directly related to the Almighty anyway ;-)

Millions upon tens of millions of mail servers on the Internet use SMTP to send email.  To change the protocol without disrupting the majority of email flowing around the internet is a virtually impossible task.

Having said this, we are fast reaching a point where spam and viruses are so destructive to the very fabric of the Internet that changing SMTP (or replacing it with a new protocol) may be less painful than allowing this mess to continue - even if it means changing the software running on several million email servers...

We (all internet users) are between the devil and the deep blue sea - and the tide is coming in.

How ISPs try to block spam

All internet users suffer under the burden of spam, but ISPs have a direct practical and financial incentive to deal with the problem: a) they have to protect their very expensive bandwidth and b) their clients insist that their ISP do something to protect them from spam.

ISPs are therefore highly motivated to get rid of the problem, but they face these obstacles:

  • Spam cannot be identified by sender email address because the address is forged
  • Spam cannot be identified by email subject because the subject changes constantly
  • The bodies of spam emails contain random words and misspelled names to prevent easy detection.  (For instance, Viagra is spelled V.iagra, vi.agra or viagr@, etc.)
  • Spam is sent from millions of virus-infected PCs so it is difficult to find and track spam servers
  • Even if only one in one thousand emails is blocked incorrectly, it is an unacceptable error rate - so the ISP must be 100% sure that the email is in fact spam before intercepting it.

Quite a problem, I'm sure you will agree.

To make matters worse (for the ISP):  If the ISP blocks some emails and not others, how can its clients be sure they received all the (legitimate) email that was sent to them?  Have you ever had to resolve a dispute where one party insists they sent the email and the other party insists they never received it?  How can you trust an email system if you know it is blocking email based on tricks in some black box?

So what can the ISP do?

Keyword scanning

A popular (but not very sophisticated) way for ISPs to deal with spam is to look for keywords in a message. 

For instance, if the email contains the word "viagra", then it is probably spam...or is it? 

It is conceivable that one of their clients may want (or need) to discuss the merits of the medication with a friend - after all, Viagra is a legitimate and respected drug!  Assuming that email is spam simply because it contains a reference to a trademark (owned by a company that has nothing to do with spam at all) is not acceptable.

ISPs can no longer use this method to reliably block spam.
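A few lines of code show both weaknesses of keyword scanning. This is only a sketch: the keyword list and the function name are made up for the example and do not describe any real ISP's filter.

```python
# Illustrative keyword list (an assumption, not a real filter's list).
BLOCKED_KEYWORDS = {"viagra"}


def keyword_is_spam(text: str) -> bool:
    """Flag a message if any word, minus edge punctuation, is a keyword."""
    words = text.lower().split()
    return any(word.strip(".,!?") in BLOCKED_KEYWORDS for word in words)


# Real spam is caught...
print(keyword_is_spam("Buy Viagra now!"))
# ...but so is an innocent question between friends (a false positive)...
print(keyword_is_spam("My doctor prescribed Viagra, is it safe?"))
# ...while the spammer's misspelling slips straight through.
print(keyword_is_spam("Buy V.iagra now!"))
```

The first two messages are flagged and the third is not: the filter punishes the honest writer and rewards the spammer who changes one character.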

Sender domain or address blacklisting

Many ISPs will intercept email when it comes from an email address that belongs to a known spammer.

This technique is known as email blacklisting, and it will block all messages that originate from a specific email address (or domain).

Often these blocks are effective, but only with nuisance spammers.  The professional spammers change (forge) the sender email address with every outgoing message, or at least with every spam run.

Address blacklisting is effective in catching less than 0.5% of spam.
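As a rough sketch, an address or domain blacklist is nothing more than a lookup in a list. The addresses, domains and function name below are placeholders invented for the example.

```python
# Placeholder entries (assumptions for illustration only).
BLACKLISTED_ADDRESSES = {"offers@spam.example"}
BLACKLISTED_DOMAINS = {"junkmail.example"}


def sender_is_blacklisted(sender: str) -> bool:
    """Return True if the sender address or its domain is blacklisted."""
    address = sender.strip().lower()
    domain = address.rpartition("@")[2]   # the text after the last "@"
    return address in BLACKLISTED_ADDRESSES or domain in BLACKLISTED_DOMAINS


print(sender_is_blacklisted("offers@spam.example"))      # known address
print(sender_is_blacklisted("anyone@junkmail.example"))  # known domain
print(sender_is_blacklisted("x7f3q@random.example"))     # freshly forged
```

The first two senders are blocked, but the third is not. A spammer who forges a new sender address for every message never appears on the list, which is why this method catches so little.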

Bayesian filters

A more advanced way of content filtering is to look at all the words in a message - instead of just looking for a few specific words.

For instance, if the words "viagra", "order" and "free" appear in the same message then it is more likely that the message is spam.  On the other hand, it might still be two friends discussing the drug, so a Bayesian filter looks at all the words in the email and gives each of them a positive (spam) and negative (not spam) rating.

When the total rating exceeds a certain level, then the email is classified as spam.

For instance, when the message contains the first name of the person it is sent to and contains "neutral" words like "father" or "sick", then the spam rating decreases and the message may not be classified as spam.  The idea is that one can calculate the probability that a message is spam by assigning a score to each of the words in the email, and then calculating a total probability for the whole message.

The filter can "learn" what is spam and what is not by example.  Every time you designate a message as spam, the filter takes all the words in the message and assigns a higher spam probability to them.  Normal emails (not designated by you as spam) are also recorded and reduce the spam probability of all the words they contain.  Over time, the filter builds up a dictionary of spam and non-spam words, based on the normal email traffic an individual receives.  (The filter is slightly more clever than this simple example, as it uses sophisticated math and statistical theories to analyze probabilities.)

Using this technology, the filter will not consider the word "viagra" as probable spam at all if you are a doctor who prescribes the medicine.
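For the curious, the learning process can be sketched in a few lines. This is a deliberately simplified naive Bayes filter written for this explanation, not the code of any particular product: it only looks at which words are present, it smooths the counts so unseen words do not break the math, and the training messages are invented.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy word-presence filter that learns from labelled examples."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record each distinct word of a message under 'spam' or 'ham'."""
        self.word_counts[label].update(set(text.lower().split()))
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Combine the per-word ratings into one probability."""
        spam_total = self.message_counts["spam"]
        ham_total = self.message_counts["ham"]
        # Start from how common spam is overall (smoothed).
        log_odds = math.log((spam_total + 1) / (ham_total + 1))
        for word in set(text.lower().split()):
            p_spam = (self.word_counts["spam"][word] + 1) / (spam_total + 2)
            p_ham = (self.word_counts["ham"][word] + 1) / (ham_total + 2)
            log_odds += math.log(p_spam / p_ham)   # each word adds its rating
        return 1 / (1 + math.exp(-log_odds))


bayes = NaiveBayesFilter()
bayes.train("order free viagra now", "spam")
bayes.train("free viagra order today", "spam")
bayes.train("father is sick and asked about viagra", "ham")
bayes.train("lunch with father today", "ham")

print(bayes.spam_probability("order free viagra") > 0.5)
print(bayes.spam_probability("father is sick viagra") > 0.5)
```

With this tiny training set the first message scores as probable spam and the second does not, even though both mention the drug: the "neutral" words pull the rating down, exactly as described above.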

These filters are used very effectively to combat spam and are included in a number of software packages.

The first problem is that the filter takes a while to "train" and it is only effective when used on a personal basis...in other words every mail user needs to have his own filter customized to the email he receives and what he deems to be spam.

The second problem is that spammers also know how Bayesian filters work so they will fill the message with random words from a dictionary to confuse the filter and reduce the spam rating the message receives.  In one case I even received spam with two jokes tacked onto the end in an effort by the spammer to avoid the filters.  Cute. :(

The biggest drawback of Bayesian filters is that there is a very small chance that they will misclassify a legitimate message as spam.

As we said before, even one mistake in a thousand is too many, because it might just be your aunt asking you about Viagra for your uncle.  If you don't reply, she will assume that you are ignoring her and you are out of her will.

ISPs run a big risk if they use untrained Bayesian filters ONLY - in other words not in combination with other tests.

Reverse DNS

A reverse DNS check determines whether the sending server's IP number has a friendly name attached to it.

It is an internet standard that email servers should have a reverse lookup - in other words: their IP number must translate to a friendly name.  Since this is easy to set up, a sending server without a reverse lookup comes (at best) from an incompetent ISP but, more likely, from an internet user whose computer is compromised by malware (virus / worm) or, of course, from a spammer.
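A reverse lookup is a single call in most programming languages. The sketch below uses Python's standard `socket.gethostbyaddr`. The function names and the `lookup` parameter are our own additions, so that the check can be tried out without touching the network.

```python
import socket
from typing import Callable, Optional


def reverse_name(ip: str,
                 lookup: Callable = socket.gethostbyaddr) -> Optional[str]:
    """Return the friendly name for an IP number, or None if it has none."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except OSError:   # socket.herror and socket.gaierror are both OSErrors
        return None
    return hostname


def has_reverse_dns(ip: str, lookup: Callable = socket.gethostbyaddr) -> bool:
    """True if the sending server's IP number translates to a name."""
    return reverse_name(ip, lookup) is not None


# Stand-in lookups so the example runs offline (made-up data).
def fake_ok(ip: str):
    return ("mail.example.org", [], [ip])


def fake_missing(ip: str):
    raise socket.herror(1, "Unknown host")


print(has_reverse_dns("192.0.2.10", fake_ok))
print(has_reverse_dns("192.0.2.99", fake_missing))
```

In real use you would leave out the second argument and let the function ask the DNS. Note that a missing name is a warning sign rather than proof, so it is sensible to combine this check with others.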

Real time DNS black lists (DNS RBL)

A real time blacklist (RBL) is a centrally maintained database of server addresses that have been positively identified as the source of spam.

It works like this:  The blacklist maintainer investigates spam complaints and once a server has been positively implicated, its address is added to the black list and the administrator of the server is notified that his server is now blacklisted.  (Mail servers cannot hide their internet addresses, so spammers cannot falsify the information.)

In addition to this manual process, the RBL provider publishes hundreds of thousands of email addresses (honey pots) where spammers can easily find them.  Once a spammer sends an email to one of these email addresses, the sending server is immediately classified as a spam server.  Sort of a high-tech real time trap for spammers.  This technique works very well because the spammer has no way to know that joesoap@somedomain.com is actually not a real person but a spam trap.

ISPs now use this RBL to check each and every email coming in to their servers.  If the sending server is a known spammer, the email is flagged as spam and deleted.

The RBL is constantly updated as the spammers move their accounts to new ISPs or when they use a new Zombie PC to send spam, because they inevitably send spam to a honey pot address, causing that source of spam to be identified and blacklisted. 
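For the technically minded, this is roughly how a mail server asks a DNS based blacklist about a sender. By common convention the four parts of the IP number are reversed, the name of the blacklist is added at the end, and the result is looked up like any other host name: if an answer comes back, the server is listed. The blacklist name `dnsbl.example` below is a placeholder, and the `resolve` parameter is our own addition so the sketch can run without a network.

```python
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name: reversed IP octets followed by the zone."""
    reversed_octets = ".".join(reversed(ip.split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the blacklist answers for this IP (it is a known spam source)."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # No answer. This sketch treats any lookup failure as "not listed".
        return False
    return True


# Made-up blacklist contents so the example runs offline.
FAKE_LISTED = {"99.2.0.192.dnsbl.example"}


def fake_resolve(name: str) -> str:
    if name in FAKE_LISTED:
        return "127.0.0.2"
    raise socket.gaierror("name not found")


print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
print(is_listed("192.0.2.99", "dnsbl.example", fake_resolve))
print(is_listed("192.0.2.10", "dnsbl.example", fake_resolve))
```

Because the check is an ordinary DNS lookup, it is fast enough to run on every incoming connection before any email is accepted.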

Cozahost uses an RBL that blocks more than a billion spam messages per month for more than 200 million internet users.  Less than 5% of spam reaches our users' inboxes.

(You can read more about DNS RBL here...)

Grey listing

Spammers depend on volume: in order for their "business model" to work, they must pump out millions of messages per day - and this is their biggest weakness.

The standard SMTP protocol allows for delivery of messages to be retried.  In the "old days" it was often not possible to deliver email on the first try.  The receiving server might have been busy, offline for a while or the network might have been congested.  The standard approach is therefore for all mail servers to retry delivery if it does not succeed on the first try.

For instance: If the server cannot deliver a message, it will retry 10 minutes later.  If it still fails, it will try an hour later; then 3 hours and so on...until it gives up 12 or 24 hours later.  When delivery eventually fails, the email is returned to the sender.

Grey listing leverages this protocol definition to trap spam:  Our incoming mail servers will refuse an initial connection from an unknown server.  When that server retries delivery (as all standard, legitimate servers will), our servers accept the connection and take delivery of the email.  For the next 30 days, that server will not be subject to grey listing - in other words, we will accept a connection the first time.

The spam server, on the other hand, cannot afford to retry millions of messages - because it means that its sending speed is at least halved.  Besides that, by the time it resends, the server is probably listed in an RBL already.  At a minimum we significantly disrupt the spammers' operations, and at a maximum we effectively block their spam.

The only downside of grey listing is that legitimate servers are delayed too.  Fortunately the delay is only a few minutes for one message once a month...a small price to pay for virtually spam-free email, we think.
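The grey listing logic can be sketched as a small class. It is an illustration only, not our production code: the 30 day period comes from the description above, but the five minute minimum wait before a retry is accepted is an assumed value, and the class and method names are invented.

```python
from typing import Dict

RETRY_DELAY = 5 * 60                 # assumed minimum wait before a retry counts
TRUST_PERIOD = 30 * 24 * 60 * 60     # 30 days, as described above


class Greylist:
    """Refuse the first attempt from an unknown server, accept the retry."""

    def __init__(self) -> None:
        self.first_seen: Dict[str, float] = {}
        self.trusted_until: Dict[str, float] = {}

    def accept(self, server_ip: str, now: float) -> bool:
        """Decide whether to accept a connection at time `now` (seconds)."""
        if self.trusted_until.get(server_ip, 0) > now:
            return True                        # retried properly in the past
        first = self.first_seen.get(server_ip)
        if first is None:
            self.first_seen[server_ip] = now   # unknown server: refuse once
            return False
        if now - first < RETRY_DELAY:
            return False                       # retried too quickly
        self.trusted_until[server_ip] = now + TRUST_PERIOD
        del self.first_seen[server_ip]
        return True


grey = Greylist()
print(grey.accept("192.0.2.10", now=0))      # first try is refused
print(grey.accept("192.0.2.10", now=600))    # retry 10 minutes later is accepted
print(grey.accept("192.0.2.10", now=86400))  # next day: no delay at all
```

A legitimate server walks through these steps without anybody noticing. A spam server that never retries is stopped at the first step.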

(You can read more about grey listing here...)

SPF (Sender Policy Framework)

As we discussed earlier, spammers routinely forge their sender addresses.  The Viagra offer is almost definitely not from BillGates@microsoft.com! :-) 

While there is no way (using SMTP - the standard mail sending protocol) to verify the authenticity of a sender address - there is another way:

Every domain on the internet can (it is optional) publish information in its DNS to specify the servers that are authorized to send email on its behalf.  For instance, Microsoft can publish a list of the IP numbers of its (legitimate) mail servers.

When we receive email alleging to come from Microsoft, we ask the Microsoft domain servers if that server (the one currently talking to us) is allowed to send email from @microsoft.com.  If the Microsoft domain servers say no - then we know that it is a spammer (or crook) trying to forge his sender address...and we refuse to accept the email.
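The check itself is simple once the published record is in hand. The sketch below handles only the plainest kind of SPF record, a list of `ip4:` entries ending in `-all`. Real records can contain other mechanisms that are ignored here, fetching the record from DNS is left out, and the record and IP numbers shown are invented examples, not any real domain's.

```python
import ipaddress


def spf_allows(record: str, sender_ip: str) -> bool:
    """Check a sending IP against the ip4 entries of an SPF record.

    Simplified: only 'ip4:' terms and a closing '-all' / '~all' are handled.
    """
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return True            # this server is authorized
        elif term in ("-all", "~all"):
            return False               # everything else is not authorized
    return False                       # simplification: no match means "no"


# Invented record for a hypothetical domain.
record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.7 -all"

print(spf_allows(record, "192.0.2.15"))    # inside the published range
print(spf_allows(record, "203.0.113.9"))   # a forger's server
```

The first server is on the published list and the second is not, so email from the second server claiming to come from this domain would be refused.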

(You can read more about the spf project here...)

Cozahost spam blocking

In summary: At Cozahost we use a 5-step approach to eliminate 99.9% of spam - with zero false positives:

1. We use Real Time Blacklists to refuse delivery of email

2. We use grey listing to frustrate spam servers

3. We use SPF where possible to refuse delivery from forged senders.  (When a sender's SPF checks are correct, the message is considered to be legitimate)

All of the methods above refuse delivery of the email - in other words: email is never deleted - it is returned to the sender as undeliverable. 

This is a very important point because our clients are 100% sure that we will never delete email - in other words, no email, once accepted into our system, will go missing.  

After the first three screenings are done, we continue to root out spam:

4. We use Bayesian filters to flag suspect messages

5. We use reverse DNS

If a message fails both the Bayesian filters and the reverse DNS check, it is flagged as probable spam and moved to the user's junk mail folder (accessible via the email web interface).  Because we already accepted the message onto our servers, we cannot delete it.
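Put together, the five steps amount to a short decision procedure. The sketch below is our own simplified illustration of the order described above, not the actual server configuration. All names are invented, and letting a message with a correct SPF result go straight to the inbox is one reading of the note in step 3.

```python
from dataclasses import dataclass


@dataclass
class Checks:
    """Results of the individual tests for one incoming message."""
    on_blacklist: bool        # step 1: sending server is listed in the RBL
    greylist_deferred: bool   # step 2: first attempt from an unknown server
    spf: str                  # step 3: "pass", "fail" or "none"
    bayes_suspect: bool       # step 4: Bayesian filter flags the content
    has_reverse_dns: bool     # step 5: sending server has a friendly name


def decide(c: Checks) -> str:
    """Return 'refuse', 'defer', 'junk' or 'inbox'."""
    # Steps 1 to 3 happen before the message is accepted, so nothing is deleted.
    if c.on_blacklist:
        return "refuse"
    if c.greylist_deferred:
        return "defer"        # the sending server is expected to retry
    if c.spf == "fail":
        return "refuse"
    if c.spf == "pass":
        return "inbox"        # assumption: a correct SPF result is trusted
    # Steps 4 and 5 only flag: the message is kept, never deleted.
    if c.bayes_suspect and not c.has_reverse_dns:
        return "junk"
    return "inbox"


print(decide(Checks(True, False, "none", False, True)))
print(decide(Checks(False, False, "none", True, False)))
print(decide(Checks(False, False, "none", True, True)))
```

The three examples give "refuse", "junk" and "inbox": a blacklisted server is turned away, a suspect message from a nameless server lands in the junk folder, and a suspect message that fails only one of the last two tests is still delivered.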

Each and every user on our network can change their individual spam settings - they can even switch it off altogether.

(Note: From Outlook 2002 onwards, the software includes local spam filtering technology.  This spam filter may flag items in your inbox as spam and move them to the junk mail folder.  Your ISP has no control over this - so regularly check your junk mail folder and make sure you set up Outlook correctly. More info from Microsoft on handling spam using Outlook here...)

Conclusion

To summarize: Spam is a huge and growing problem and ISPs have a real role to play in reducing the amount of junk that reaches their clients' inboxes; but the ISP must behave responsibly (and with respect) when it must interfere with its clients' email.

ISP based spam filtering cannot be 100% effective, because spam is a moving target.  The most efficient way to deal with the problem is for ISPs to use server based filtering or blacklists, and end users to use efficient and effective filter software to eliminate the 1% spam that survives server (ISP) based filtering.

Internet users have a responsibility too - to make sure that they do not present themselves or their contacts as targets for spammers, but most of all to take appropriate measures to ensure that their PCs are secure enough not to be turned into spamming Zombies.

After all: the internet is a global community in which we all must live and work - we have to rely on each other to be good net citizens and responsible neighbors.

Do you want more quality information like this?

You will find more of the same in the Cozahost newsletter.

About the author

This article was compiled by Cozahost for our free newsletter.

Please note that all rights are reserved for this article but you may copy and publish this article on your web site provided that you make no changes to the page at all - and that includes all of the hyperlinks and this notice.  We ask that you contact us if you are re-publishing this article on your web site so that we can notify you when we update the article.


(c) Cozahost, 2006. All rights reserved.