How ISPs (can) block spam
ISPs can (and do) play a large role in limiting
the amount of spam received by their clients - but keeping the junk out is a
more difficult task than most people know.
Cozahost employs sophisticated software to
protect our clients from spam.
This article explains why and how we do it.
(For practical tips and advice on how to avoid
spam, please see the article "Seems
like you volunteered to receive spam?")
What is spam?
If you don't know what spam is, then you are a
very lucky internet user!
On the other hand, you may already (or will
soon) be receiving tens or even hundreds of emails with offers to enlarge a
certain part of your anatomy, Viagra at discount, pornography and worse.
This is the work of spammers.
The "proper" name for these email
advertisements is UCE - an abbreviation for Unsolicited Commercial Email.
"Spam" is actually a trademark name for a
canned meat product. :-) But I digress: the point is that the word "Spam" is commonly used to
refer to "advertisement" emails from people or companies you never heard of: in
other words, it is unsolicited.
Virtually no internet email user will escape
this problem - unless you take the
necessary precautions.
Spam is a HUGE problem - and growing
To give you an idea of the magnitude of the
problem: In 2003 approximately 1.5 trillion spam messages were sent.
In the first quarter of 2004, that number jumped to 1.6 trillion...in 3
months!
The economic damage caused by spam (lost
productivity and network congestion) was estimated to be between US$58 billion
and US$78 billion in the first quarter of 2004 alone.
At this time (January 2008), Cozahost is
blocking an average of 45 000 spam emails per day! If it takes just
two seconds to download and delete a spam message, imagine the time that would
have been wasted dealing with this junk.
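As a rough back-of-the-envelope check, the two figures above (45 000 messages per day, two seconds each) can be multiplied out. The function and variable names are only illustrative.

```python
# Figures taken from the paragraph above.
SPAM_PER_DAY = 45_000       # spam emails blocked per day
SECONDS_PER_MESSAGE = 2     # time to download and delete one message


def hours_wasted_per_day(messages: int, seconds_each: float) -> float:
    """Return the total handling time in hours."""
    return messages * seconds_each / 3600


print(hours_wasted_per_day(SPAM_PER_DAY, SECONDS_PER_MESSAGE))  # 25.0
```

That is roughly 25 hours of wasted time every single day, spread across all our clients.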
How do the spammers do it?
Spammers are in it for the money. They
know that less than 0.001% of the emails they send will result in a lead, and
perhaps less than 0.01% of the leads will eventually result in a sale.
Their answer: send 10 million email
messages to get 10 sales. Tomorrow, send another 10 million emails and get
another 10 sales...and so on, until some of the most notorious and largest spammers make
more than US$ 1 million per year. (Remember that sending a million emails
costs just about nothing.)
Their profit margins are typically infinite,
because a "sale" to them means getting money from the customer: in most cases
they never deliver the product itself.
Since there is a lot of money to be made by
preying on the naiveté of internet users, these fraudsters can afford to spend
time and money to hire programmers and technicians to make their operations
difficult to trace.
All reputable Internet Service Providers (ISPs)
will terminate a user's account immediately if they are identified as a spammer,
so the bad guys will typically get an internet access account from an ISP (using
false information) and then push as much spam through that ISP as possible
before their operation is detected and the account is closed. By the time
their account is terminated, the spammer has already set up one or more new accounts (using false
information) with the same or several other ISPs, so the spammer just moves on
to the next account. In many cases this process of burn and run is
automated by special software.
Of course spammers falsify the sender email address
and other details to make it more difficult for ISPs and law
enforcement to find them, but the method most relied on to avoid identification
is to abuse other (innocent) email servers in order to relay their junk mail. They
find unsecured mail servers (open relays) by constantly scanning large blocks of
internet network addresses, looking for mail servers that have not been properly
secured.
If you think that finding these open relays
must be a lengthy and complicated process for spammers, you would be wrong.
A typical user connecting to the internet with
an ADSL or leased line will find that spammers locate their machines and probe
for security holes within minutes after they connect.
Scanning for open relays and sending spam requires significant computing
power and a thick network pipe, but the spammers found a simple way around that:
they use other people's computers and network connections. This dirty trick
involves creating computer viruses that, once they have penetrated a PC, help the
spammer to find open relays, or even send spam on behalf of the spammer.
(It is estimated that there are more than 14 million computers on the internet
that have been compromised in this way - and the number is climbing.)
The spammer controls his network of
"Zombies" (PCs with the virus) as if it were one huge computer with virtually
infinite computing power and oodles of bandwidth - all geared to pump junk into
your inbox.
The final insult: imagine a spammer
using your own virus-infected PC to send you spam! :(
A perfect storm
The way email is transported on the internet is
more than 30 years old. It dates back to the days when the internet was
mainly used by universities to share information, there were a few thousand
machines on the network at most, and it was unimaginable that a person could
have his very own personal computer.
Since then the internet has changed drastically:
There are hundreds of millions of computers on the internet. Just about
anybody can afford to buy a personal computer and connect it to the internet.
The internet is growing so fast that the
network numbering scheme will have to be changed within a few years because we
are on the point of running out of the 4 BILLION possible addresses.
Yet, amazingly, we are still using good old SMTP (Simple
Mail Transfer Protocol) from the old, innocent days!
SMTP assumes that everyone on the
network is trustworthy. It mandates that any server on the network
must be able to send email to any other server (or person) on the network - just like the
physical postal system works.
The number one strength of SMTP (universal
connectivity) is also its
greatest weakness, because it allows spammers to send email anonymously and
virtually untraceably. The very system that carries email on the internet,
is the same system that is indirectly responsible for the huge amount of spam
that threatens to destroy it.
At this point you may be wondering why the
protocol is not simply replaced by a newer, more secure protocol? The
answer to this question is depressingly simple: Installed base.
There is an old information technology joke that goes:
Why could God create the earth and
everything in 6 days?
Answer: Because he did not have an installed
base.
No disrespect is intended to any religion - the
analogy tries to explain that it is much easier to do a huge amount of work when
you have the advantage of starting with a clean slate. Besides, many IT
people believe they are directly related to the Almighty anyway ;-)
Millions upon tens of millions of mail servers on the
Internet use SMTP to send email. To change the protocol without disrupting the majority
of email flowing around the internet is a virtually impossible task.
Having said this, we are fast reaching a point
where spam and viruses are so destructive to the very fabric of the Internet
that changing SMTP (or replacing it with a new one) may be less painful than
allowing this mess to continue - even if it means changing the software running
on several million email servers...
We (all internet users) are between the devil
and the deep blue sea - and the tide is coming in.
How ISPs try to block spam
All internet users suffer under the burden of
spam, but ISPs have a direct practical and financial incentive to deal with the
problem: a) they have to protect their very expensive bandwidth and b) their
clients insist that their ISP do something to protect them from spam.
ISPs are therefore highly motivated to get rid
of the problem, but they face these problems:
- Spam cannot be identified by sender email
address because it is forged
- Spam cannot be identified by email subject
because it changes constantly
- The body of a spam email contains random words
and misspelled names to prevent easy detection. (For instance, Viagra is
spelled V.iagra, vi.agra or viagr@, etc)
- Spam is sent from millions of virus-infected
PCs so it is difficult to find and track spam servers
- Even if one in one thousand emails is
blocked incorrectly, it is an unacceptable error rate - so the ISP must be
100% sure that the email is in fact spam before intercepting it.
Quite a problem I'm sure you will agree.
To make matters worse (for the ISP): If
the ISP blocks some emails and not others, how can his clients be sure they
received all the (legitimate) email that was sent to them? Have you ever
had to resolve a dispute where one party insists they sent the email and the other party insists they
never received it? How can you trust an email system if you know it is
blocking email based on tricks in some black box?
So what can the ISP do?
Keyword scanning
A popular (but not very sophisticated) way for ISPs to deal with spam is to
look for keywords in a message.
For instance, if the email contains the
word "viagra", then it is probably spam...or is it?
It is conceivable that one of their clients may
want (or need) to discuss the merits of the medication with a friend - after all,
Viagra is a legitimate and respected drug! Assuming that email is spam
simply because it contains a reference to a trademark (owned by a company that has nothing to
do with spam at all) is not acceptable.
ISPs can no longer use this method to reliably
block spam.
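A minimal sketch of keyword scanning shows both weaknesses at once. The keyword list and function name are hypothetical; this is not the software any particular ISP runs.

```python
import re

# Hypothetical keyword list, for illustration only.
BLOCKED_KEYWORDS = {"viagra"}


def keyword_filter(body: str) -> bool:
    """Return True if any blocked keyword appears as a whole word."""
    words = re.findall(r"[a-z]+", body.lower())
    return any(word in BLOCKED_KEYWORDS for word in words)


# Caught: the keyword is spelled normally.
print(keyword_filter("Buy Viagra at discount"))
# Missed: a single dot splits the word, so nothing matches.
print(keyword_filter("Buy V.iagra at discount"))
# Wrongly caught: a legitimate message between friends.
print(keyword_filter("My doctor suggested Viagra, what do you think?"))
```

The second message slips through and the third is blocked by mistake, which is exactly why keyword scanning on its own is unreliable.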
Sender domain or address blacklisting
Many ISPs will intercept email when it comes
from an email address that belongs to a known spammer.
This technique is known as email blacklisting,
and it will block all messages that originate from a specific email address (or
domain).
Often these blocks are effective, but only with
nuisance spammers. The professional spammers change (forge) sender email
address with every outgoing message, or at least with every spam run.
Address blacklisting catches less than 0.5% of
spam.
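In its simplest form, this kind of blacklisting is just a lookup of the sender address and its domain. The sketch below assumes two small in-memory lists; the addresses and domains are made up for illustration.

```python
# Hypothetical blacklist entries, for illustration only.
BLACKLISTED_ADDRESSES = {"offers@spammer.example"}
BLACKLISTED_DOMAINS = {"bulkmail.example"}


def sender_is_blacklisted(sender: str) -> bool:
    """Return True if the sender address or its domain is blacklisted."""
    address = sender.strip().lower()
    domain = address.rpartition("@")[2]  # text after the last "@"
    return address in BLACKLISTED_ADDRESSES or domain in BLACKLISTED_DOMAINS


print(sender_is_blacklisted("Offers@Spammer.example"))   # known address
print(sender_is_blacklisted("anyone@bulkmail.example"))  # known domain
print(sender_is_blacklisted("x7f3q@random.example"))     # freshly forged address
```

A spammer who forges a new sender address for every message never appears in either list, so the last check passes.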
Bayesian filters
A more advanced way of content filtering is to
look at all the words in a message - instead of just looking for a few
specific words.
For instance, if the words "viagra", "order" and
"free" appear in the same message then it is more likely that the message is
spam. On the other hand, it might still be two friends discussing the
drug, so a Bayesian filter looks at all the words in the email and gives each
of them a positive (spam) and negative (not spam) rating.
When the total rating exceeds a certain level, then
the email is classified as spam.
For instance, when the message contains the
first name of the person it is sent to and contains "neutral" words like "father" or "sick",
then the spam rating decreases and the message may not be classified as spam.
The idea is that one can calculate the probability that a message is spam by
assigning a score to each of the words in the email, and then calculate a total
probability for the whole message.
The filter can "learn" what is spam and what is
not by example. Every time you designate a message as spam, the filter
will take all the words in the message and assign a higher spam probability to
them. Normal emails (not designated by you as spam) are also recorded and
will reduce the spam probability of all the words they contain. Over time, the
filter builds up a dictionary of spam and non-spam words, based on the normal
email traffic an individual receives. (The filter is
slightly more clever than this simple example as it uses sophisticated math and
statistical theories to analyze probabilities.)
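The learning process described above can be sketched as a small naive Bayes classifier. This is a simplified illustration, not the filter used by any specific product; the class name, the add-one smoothing and the training messages are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split a message into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


class BayesianFilter:
    def __init__(self) -> None:
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text: str, is_spam: bool) -> None:
        """Record the words of one message under the label the user chose."""
        label = "spam" if is_spam else "ham"
        self.words[label].update(tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        """Combine the per-word ratings into one probability for the message."""
        vocabulary = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_messages = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Start from how common this kind of message is (smoothed).
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            total_words = sum(self.words[label].values())
            for word in tokenize(text):
                count = self.words[label][word]
                # Add-one smoothing so unseen words never give log(0).
                score += math.log((count + 1) / (total_words + vocabulary + 1))
            log_score[label] = score
        # Turn the two log scores into a probability, guarding against overflow.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1 / (1 + math.exp(diff))


bayes = BayesianFilter()
bayes.train("order viagra now free offer", is_spam=True)
bayes.train("free viagra order today", is_spam=True)
bayes.train("dad is sick and asked the doctor about viagra", is_spam=False)
bayes.train("lunch with father on friday", is_spam=False)

print(bayes.spam_probability("order free viagra"))  # above 0.5
print(bayes.spam_probability("father is sick"))     # below 0.5
```

With only four training messages the numbers are crude, but the direction is right: words seen mostly in spam push the rating up, and words seen mostly in normal mail pull it down.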
Using this technology, the filter will not
consider the word "viagra" as probable spam at all if you are a doctor that
prescribes the medicine.
These filters are used very effectively to
combat spam and are used in a number of software packages.
The first problem is that the filter takes a
while to "train" and it is only effective when used on a personal basis...in
other words every mail user needs to have his own filter customized to the email
he receives and what he deems to be spam.
The second problem is that spammers also know
how Bayesian filters work so they will fill the message with random words from a
dictionary to confuse the filter and reduce the spam rating the message
receives. In one case I even received spam with two jokes tacked onto the
end in an effort by the spammer to avoid the filters. Cute. :(
The biggest drawback of Bayesian filters is
that there is a very small chance that they will misclassify a legitimate message as
spam.
As we said before even one mistake in a
thousand is too high, because it might just be your aunt asking you about Viagra
for the Uncle. If you don't reply, she will assume that you are ignoring
her and you are out of her will.
ISPs run a big risk if they use untrained Bayesian
filters ONLY - in other words not in combination with other tests.
Reverse DNS check
A reverse DNS check determines whether
the sender server's IP number has a friendly name attached to it.
It is an internet standard that email servers
should have a reverse lookup - in other words: their IP number must translate to
a friendly name. Since this is easy to set up, a sender server without a
reverse lookup comes (at best) from an incompetent ISP, but more likely from an
internet user whose computer is compromised by
malware (virus / worm) or, of course, a spammer.
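A reverse lookup can be sketched with Python's standard socket module. The function names are illustrative, and the lookup is passed in as a parameter (an assumption made here) so the check can be tried without a live DNS server.

```python
import socket
from typing import Callable, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the friendly name for an IP number, or None if it has none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def has_reverse_dns(
    ip: str, lookup: Callable[[str], Optional[str]] = reverse_lookup
) -> bool:
    """Return True if the sending server's IP number translates to a name."""
    return lookup(ip) is not None


# Stand-in lookup table using documentation addresses (not real servers).
fake_dns = {"192.0.2.10": "mail.example.org"}
print(has_reverse_dns("192.0.2.10", fake_dns.get))  # True
print(has_reverse_dns("192.0.2.99", fake_dns.get))  # False
```

In production the default `reverse_lookup` would be used, which asks the real DNS.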
Real time blacklists (RBL)
A real time blacklist (RBL) is a centrally
maintained database of server addresses that have been positively identified as
the source of spam.
It works like this: The blacklist
maintainer investigates spam complaints and once a server has been positively
implicated, its address is added to the blacklist and the administrator of the
server is notified that his server is now blacklisted. (Mail servers
cannot hide their internet addresses, so spammers cannot falsify the
information.)
In addition to this manual process, the RBL
provider publishes hundreds of thousands of email addresses (honey pots) where
spammers can easily find them. Once a spammer sends an email to one of
these email addresses, that server is immediately classified as a spam server.
Sort of a high-tech real time trap for spammers. This technique works very
well because the spammer has no way to know that
joesoap@somedomain.com is actually
not a real person but a spam trap.
ISPs now use this RBL to check each and every
email coming in to their servers. If the sending server is a known
spammer, the email is flagged as spam and deleted.
The RBL is constantly updated as the spammers
move their accounts to new ISPs or when they use a new Zombie PC to send spam,
because they inevitably send spam to a honey pot address, causing that source of
spam to be identified and blacklisted.
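Many real time blacklists are published through DNS: the receiving server reverses the octets of the sending server's IP number, appends the blacklist's zone name, and looks the result up. If a record exists, the server is listed. The sketch below assumes that convention; the zone name `rbl.example.org` is hypothetical, and the resolver is passed in so the logic can be tried offline.

```python
import socket
from typing import Callable

RBL_ZONE = "rbl.example.org"  # hypothetical blacklist zone


def rbl_query_name(ip: str, zone: str = RBL_ZONE) -> str:
    """Build the DNS name to look up, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + "." + zone


def resolves(name: str) -> bool:
    """Return True if the name has an address record in DNS."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def is_blacklisted(
    ip: str, zone: str = RBL_ZONE, resolver: Callable[[str], bool] = resolves
) -> bool:
    """A server is treated as listed when its query name resolves."""
    return resolver(rbl_query_name(ip, zone))


print(rbl_query_name("192.0.2.10"))  # 10.2.0.192.rbl.example.org

# Offline stand-in for the blacklist: one listed server.
listed = {"10.2.0.192.rbl.example.org"}
print(is_blacklisted("192.0.2.10", resolver=listed.__contains__))   # True
print(is_blacklisted("198.51.100.7", resolver=listed.__contains__)) # False
```

Because the check is a single DNS lookup per incoming connection, it is cheap enough to run against every email.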
Cozahost uses an RBL that blocks more than a billion
spam messages per month for more than 200 million internet users. Less
than 5% of spam reaches our users' inboxes.
(You can read more about DNS RBL
here...)
Grey listing
Spammers depend on volume: in order for their
"business model" to work, they must pump out millions of messages per day - and
this is their biggest weakness: they must send large volumes of email in order
to survive.
The standard SMTP protocol allows for delivery
of messages to be retried. In the "old days" it was often not possible to
deliver email on the first try. The receiving server might have been busy,
offline for a while or the network might have been congested. The standard
approach is therefore for all mail servers to retry delivery if it does not
succeed on the first try.
For instance: If the server cannot deliver a
message, it will retry 10 minutes later. If it still fails, it will try an
hour later; then 3 hours and so on...until it gives up 12 or 24 hours later.
When delivery eventually fails, the email is returned to the sender.
Grey listing leverages this protocol definition
to trap spam: Our incoming mail servers will refuse an initial connection
from an unknown server. When that server retries delivery (as all
standard, legitimate servers will), our servers accept the connection and take
delivery of the email. For the next 30 days, that server will not be
subject to grey listing - in other words, we will accept a connection the first
time.
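The rule described above can be sketched in a few lines. This is a simplified illustration keyed on the sending server's IP number only; real grey listing software often also tracks the sender and recipient addresses and enforces a minimum delay before a retry counts. The class and method names are assumptions.

```python
import time
from typing import Callable, Dict

WHITELIST_SECONDS = 30 * 24 * 3600  # the 30 days mentioned above


class GreyList:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock                          # injectable for testing
        self.first_seen: Dict[str, float] = {}      # servers refused once
        self.trusted_until: Dict[str, float] = {}   # servers that retried

    def accept(self, server_ip: str) -> bool:
        """Return True to accept the connection, False to refuse it for now."""
        now = self.clock()
        if self.trusted_until.get(server_ip, 0) > now:
            return True  # known server, still inside its 30 days
        if server_ip in self.first_seen:
            # The server came back: it behaves like a legitimate mail server.
            del self.first_seen[server_ip]
            self.trusted_until[server_ip] = now + WHITELIST_SECONDS
            return True
        # Unknown server: refuse and wait to see whether it retries.
        self.first_seen[server_ip] = now
        return False


current_time = [0.0]
grey = GreyList(clock=lambda: current_time[0])
print(grey.accept("192.0.2.10"))  # False: first contact is refused
current_time[0] = 600             # ten minutes later the server retries
print(grey.accept("192.0.2.10"))  # True: accepted, trusted for 30 days
```

A spam server that never retries simply stays in the refused list and none of its mail is ever accepted.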
The spam server on the other hand, cannot
afford to retry millions of messages - because it means that their sending speed
is at least halved. Besides that, by the time they resend, their servers
are probably listed in an RBL already. At a minimum we
significantly disrupt their operations, and at a maximum we effectively block
their spam.
The only downside of grey listing is that
legitimate servers are delayed too. Fortunately the delay is only a few
minutes for one message once a month...a small price to pay for virtually
spam-free email, we think.
(You can read more about grey listing
here...)
SPF (Sender Policy Framework)
As we discussed earlier, spammers routinely
forge their sender addresses. The Viagra offer is almost definitely not
from BillGates@microsoft.com! :-)
While there is no way (using SMTP - the
standard mail sending protocol) to verify the authenticity of a sender address -
there is another way:
Every domain on the internet can (it is
optional) publish information in their DNS to specify the servers that are
authorized to send email on their behalf. For instance, Microsoft will
publish a list of IP numbers of their (legitimate) mail servers.
When we receive email alleging to come from
Microsoft, we ask the Microsoft domain servers if that server (the one currently
talking to us) is allowed to send email from @microsoft.com. If the
Microsoft domain servers say no - then we know that it is a spammer (or crook)
trying to forge his sender address...and we refuse to accept the email.
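The check can be sketched as follows. SPF information is published as a DNS text record that begins with `v=spf1`; the record below is hypothetical (it uses documentation address ranges, not any real company's servers), and the sketch handles only the `ip4:` entries, whereas real SPF records support several other mechanisms.

```python
import ipaddress

# Hypothetical SPF record for an example domain, for illustration only.
EXAMPLE_SPF = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.25 -all"


def spf_allows(record: str, sender_ip: str) -> bool:
    """Return True if sender_ip is listed in the record's ip4 entries.

    Very simplified: any server not explicitly listed is treated as
    unauthorized, which matches a record ending in "-all".
    """
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    address = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if address in network:
                return True
    return False


print(spf_allows(EXAMPLE_SPF, "192.0.2.77"))   # True: inside the listed range
print(spf_allows(EXAMPLE_SPF, "203.0.113.5"))  # False: not an authorized server
```

In practice the record would be fetched from the DNS of the domain in the sender address before this comparison is made.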
(You can read more about the spf project
here...)
In summary: At Cozahost we use a 5 step
approach to eliminate 99.9% of spam - with zero false positives:
1. We use Real Time Blacklists to refuse delivery of email
2. We use grey listing to frustrate spam servers
3. We use SPF where possible to refuse delivery from forged senders. (When a
sender's SPF checks are correct, the message is considered to be legitimate)
All of the methods above refuse delivery of the
email - in other words: email is never deleted - it is returned to the sender as
undeliverable.
This is a very important point because our clients are 100% sure that
we will never delete email - in other words, no email, once accepted into our
system, will go missing.
After the first three screenings are done, we
continue to root out spam:
4. We use Bayesian filters to flag suspect messages
5. We use reverse DNS
If a message fails both Bayesian filters
and a reverse DNS check, it is flagged as probable spam and moved to the
user's junk mail folder. (Accessible via the email web interface). Because
we already accepted the message onto our servers, we cannot delete it.
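The order of the five steps can be summarized in a short decision sketch. The type and field names are hypothetical, and the individual checks are reduced to true/false inputs; the sketch also does not model the SPF shortcut mentioned in step 3.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    REFUSE = "refuse delivery"   # the sending server is told to try elsewhere
    JUNK = "junk mail folder"    # accepted, but flagged as probable spam
    INBOX = "inbox"


@dataclass
class Checks:
    on_blacklist: bool          # step 1: sending server is on the RBL
    greylisted: bool            # step 2: first contact from an unknown server
    spf_forged: bool            # step 3: SPF says the sender is forged
    bayesian_suspect: bool      # step 4: Bayesian filter flags the message
    reverse_dns_missing: bool   # step 5: no reverse lookup for the server


def classify(c: Checks) -> Verdict:
    # Steps 1 to 3 happen before the email is accepted, so nothing is deleted.
    if c.on_blacklist or c.greylisted or c.spf_forged:
        return Verdict.REFUSE
    # Steps 4 and 5 only flag a message when BOTH tests fail.
    if c.bayesian_suspect and c.reverse_dns_missing:
        return Verdict.JUNK
    return Verdict.INBOX


print(classify(Checks(True, False, False, False, False)))   # Verdict.REFUSE
print(classify(Checks(False, False, False, True, True)))    # Verdict.JUNK
print(classify(Checks(False, False, False, True, False)))   # Verdict.INBOX
```

Note that no branch ever discards a message: it is either refused at the door or delivered to a folder the user can inspect.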
Each and every user on our network can change
their individual spam settings - they can even switch it off altogether.
(Note: As from Outlook 2002, the software
included local spam filtering technology. This spam filter may flag items
in your inbox as spam and move them to the junk mail folder. Your ISP has no
control over this - so regularly check your junk mail folder and make sure you
set up Outlook correctly. More info from Microsoft on handling spam using
Outlook
here...)
Conclusion
To summarize: Spam is a huge and growing
problem and ISPs have a real role to play to reduce the amount of junk that
reaches their clients' inboxes; but the ISP must behave responsibly (and with
respect) when they must interfere with their clients' email.
ISP based spam filtering cannot be 100%
effective, because spam is a moving target. The most efficient way to deal
with the problem is for ISPs to use server based filtering or blacklists, and
end users to use efficient and effective filter software to eliminate the 1% spam that survives server (ISP) based filtering.
Internet users have a responsibility too - to
make sure that they do not present
themselves or their contacts as targets for spammers, but most of all to
take appropriate measures to ensure that their PCs are secure enough not to be
turned into spamming Zombies.
After all: the internet is a global community
in which we all must live and work - we have to rely on each other to be good
net citizens and responsible neighbors.
Do you want more quality information like
this?
You will find more of the same in the
Cozahost newsletter.
About the author
This article was compiled by
Cozahost for our free
newsletter.
Please note that all rights are
reserved for this article but you may copy and publish this article on your web
site provided that you make no changes to the page at all - and that includes
all of the hyperlinks and this notice. We ask that you
contact us if you are
re-publishing this article on your web site so that we can notify you when we
update the article.