Tuesday, November 13, 2007

The War Against Spam

In my earlier post about IT security, I described the Cold War between hackers/crackers/spammers and IT departments. Spam control is one of my most challenging battlefields.

At BIDMC, we receive an average of 886,674 emails every day from the internet. We deliver 57,103 of these, meaning that 829,751 of these are Spam. This translates into 302,859,115 Spam per year or over a third of a BILLION Spam.
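For anyone who wants to check the arithmetic, here is a minimal Python sketch. It takes the two daily counts exactly as quoted and assumes a 365-day year; the variable names are only illustrative. Subtracting the quoted counts gives 829,571 rather than 829,751, so the yearly total it computes is slightly lower than the one stated above.

```python
# Daily averages as quoted in the post.
received_per_day = 886_674
delivered_per_day = 57_103

# Everything not delivered is treated as Spam (the post's own assumption).
spam_per_day = received_per_day - delivered_per_day
spam_share = spam_per_day / received_per_day

# Assumes a 365-day year.
spam_per_year = spam_per_day * 365

print(f"Spam per day:  {spam_per_day:,}")
print(f"Spam share:    {spam_share:.1%}")
print(f"Spam per year: {spam_per_year:,}")
```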

There are many commercial products on the market that can help with this problem. At BIDMC and Harvard Medical School we use Symantec Brightmail Anti-Spam Version 6.0. Here's the challenge - it's not easy to distinguish legitimate clinical email from advertising. In a medical environment our clinicians describe anatomy, medications, and diagnoses that might be the same key words used in emails which advertise herbals to enlarge your body parts. Suppose that our filters are tuned so tightly that all Spam is eliminated but 1% of legitimate email is also blocked. The cost of this solution would be that 208,425 legitimate emails per year would go undelivered. Conversely, suppose our Spam filters are relaxed so that no legitimate email is blocked but 1% of Spam gets through. The cost of this solution is that 3 million Spam make it to our inboxes every year.
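The two scenarios can be put side by side with a rough Python sketch. The function name `yearly_count` is hypothetical, the daily volumes are the ones quoted in this post, and a 365-day year is assumed; integer arithmetic is used so the results are truncated rather than rounded.

```python
def yearly_count(per_day: int, error_percent: int, days: int = 365) -> int:
    """Messages per year affected by a filter error rate given in whole percent."""
    return per_day * days * error_percent // 100

# Daily volumes as quoted in the post.
legit_per_day = 57_103
spam_per_day = 829_751

# Scenario 1: tight filters, 1% of legitimate email is blocked (false positives).
blocked_legit = yearly_count(legit_per_day, 1)

# Scenario 2: relaxed filters, 1% of Spam gets through (false negatives).
leaked_spam = yearly_count(spam_per_day, 1)

print(f"Legitimate emails blocked per year: {blocked_legit:,}")
print(f"Spam delivered per year:            {leaked_spam:,}")
```

The counts alone do not settle the choice: a single blocked clinical email may matter far more than a single Spam in an inbox, which is why the filters are tuned toward delivering legitimate mail.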

The balance between false positives (blocking legitimate email) and false negatives (letting Spam through) is quite challenging and requires continuous updating of our Spam filtering techniques. We blacklist known spamming sites. We whitelist sites which send emails about anatomical parts, but are known clinical partners. We have a Spam Feedback mailbox which provides continuous feedback to Brightmail. We use Exchange and Outlook rules to automatically move Spam into folders. We block all ZIP files from the internet but notify recipients that an email containing a ZIP was received and blocked.
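As a rough illustration of how these layers fit together, here is a Python sketch. It is not how Brightmail or Exchange work internally; the domain names, the `Message` class, the `triage` function, and the order of the checks are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical lists; a real deployment would load these from configuration.
BLACKLIST = {"spam.example"}             # known spamming domains
WHITELIST = {"clinic-partner.example"}   # known clinical partners

@dataclass
class Message:
    sender: str
    subject: str
    attachments: list[str] = field(default_factory=list)

def sender_domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()

def triage(msg: Message) -> str:
    """Decide what to do with a message before any content filtering."""
    domain = sender_domain(msg.sender)
    if domain in BLACKLIST:
        return "block"
    # ZIP files are blocked for every sender, and the recipient is notified.
    if any(name.lower().endswith(".zip") for name in msg.attachments):
        return "block_and_notify"
    if domain in WHITELIST:
        return "deliver"
    # Everything else goes on to the content-based Spam filter.
    return "scan"
```

For example, `triage(Message("dr@clinic-partner.example", "Consult"))` returns `"deliver"`, while the same sender attaching `films.zip` gets `"block_and_notify"`.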

Two types of Spam still get through:

1. Spammers embed graphics of advertisements instead of text. Since computers cannot read graphics, we cannot filter them.

2. Spammers use words that are not unique, e.g. "enhance your being a male", that cannot be filtered without removing legitimate email.

At present, using Brightmail and the other techniques described above, we block 99% of all Spam (one third of a BILLION) and deliver nearly 100% of legitimate email, allowing 3 million Spam per year to land in our mailboxes but ensuring our doctors and staff get the mission critical email they need to deliver good care. We'll continue to enhance our Spam filtering systems, but you can still expect some Spam to get through. As fast as we innovate, spammers innovate, creating a continuous battle against Spam.

The ultimate answer may be that the internet email infrastructure itself needs to be revised to deny all email traffic except that which is specifically whitelisted by email servers and users. Earthlink and other ISPs have used this approach. It's a bit irritating for the sender, who is told that email will not be received until the recipient approves the sender. It's a hassle for the recipient, who has to approve every incoming email sender. The result, however, is that offending senders are blocked forever and no spam passes through the human-mediated approval process.
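A minimal Python sketch of this kind of sender-approval gate follows. It is an illustration only, not Earthlink's implementation; the class name `ApprovalGate` and its methods are hypothetical.

```python
class ApprovalGate:
    """Hold mail from unknown senders until the recipient approves or blocks them."""

    def __init__(self):
        self.approved = set()   # senders the recipient has approved
        self.blocked = set()    # senders the recipient has rejected for good
        self.pending = {}       # sender -> list of held messages

    def receive(self, sender, message):
        sender = sender.lower()
        if sender in self.blocked:
            return "rejected"
        if sender in self.approved:
            return "delivered"
        # Unknown sender: hold the message until the recipient decides.
        self.pending.setdefault(sender, []).append(message)
        return "held"

    def approve(self, sender):
        """Approve a sender and release any messages held for them."""
        sender = sender.lower()
        self.approved.add(sender)
        return self.pending.pop(sender, [])

    def block(self, sender):
        """Block a sender permanently and discard their held messages."""
        sender = sender.lower()
        self.blocked.add(sender)
        self.pending.pop(sender, None)
```

The first message from a new sender returns `"held"`; once the recipient calls `approve`, the held messages are released and later mail from that sender is delivered directly.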

Another alternative is to charge bulk email senders postage for sending their content over the internet, but that's tomorrow's blog entry!

11 comments:

Unknown said...

I've asked this question of your CEO, but I think he's discontinued his Wednesday Student question day. You responded quickly and quite well to my gadget question, so I figured I might hit you with a tougher one.

Let me get you up to speed with me and why I'm asking these questions:

I'm a 25-year-old MBA/MHA student, working full time as a Decision Analyst in a medium-to-large community hospital in Louisiana, 30 minutes north of New Orleans. I spent 2 years doing all revenue cycle at a small inpatient psych hospital while I got my bachelor's in management. Before that I did an internship at IBM for a summer at the American Express Data Center. Before that I spent a year as a student worker in an HR department of a Charity Hospital.

I'll finish my MBA/MHA in May, and I hope to make that jump into management within a year or so after that. One day I hope to work my way up to an administrative level like yourself.


At what point did you make the jump to a management position? Were you immediately comfortable in the position? What were the large challenges you faced as a new manager? Is there anything in particular you wish you had been better prepared for? What tools and classes do you feel best helped you?

You're an MD, which I would bet earns a little more buy-in from nurses and doctors for your decisions and recommendations. I don't have a medical background. Do you think that will be a significant challenge to overcome when climbing the hospital career ladder?

That's just a few of the points I'm really interested in.

My hospital is expanding, and I see many nice management jobs opening within our organization. Some have told me to go ahead and apply, but I'm afraid my age and relative inexperience in actually managing will hinder me. That said, I am familiar with the budgeting process and staffing concerns, since my current position helps to support managers in those decisions.

By the way, I've enjoyed your postings, and it's nice to have the opportunity to communicate with professionals such as yourself.

Unknown said...

I heard about your blog from my employer a few days ago and I've been back every day since. Great job!

I've only got one comment and that's for today's post. Hackers and spam...hackers have nothing to do with spam, I think you're talking about crackers. There is a world of difference between the two. Please reference the following link for details, as the explanation is just a fraction too long to post.

http://catb.org/jargon/html/appendixc.html

Please continue your excellent writing, will check back daily.

John Halamka said...

I'll write a blog entry about my management history, since it's a long story, but I started managing a company at age 18 and had 35 employees by 21. I sold the company since I could not run a company and do residency simultaneously.

My most valuable lessons were from my mistakes early on in my career. At each stage of management in my career - from managing 30 people to 300 people to 1000 people, I've been taken to the edge of my comfort zone. My technique has been simple - listen to your employees since they have the answers you need. Every time I have made a transition, I have asked my new staff to identify the strategic and process issues that we need to solve together. Managing by using formal authority or autocratic decision-making almost never works. In fact, the more responsibility I'm given, the less authority I feel I really have.

I highly recommend beginning a career in management with a small staff and learning together with them. Coursework helps - your MBA classes in project management and leadership should qualify you to begin managing people. 25 is a great age and your energy/enthusiasm will make you a great manager.

Dave said...

John,

Have you considered any reputation-based spam filtering technology? The two leaders are IronPort, now part of Cisco, and IronMail from Secure Computing. I'm evaluating these currently and they seem to be highly effective in discarding messages from invalid senders or known spammers without wasting time and processing power.

BTW, I'm the IS Security Officer at Bloomington where Todd Rowland works. If you have any questions about what I mentioned, feel free to contact me through Todd.

Paul from Arlington said...

Hi John,

Having every organization maintain its own whitelists and blacklists is neither scalable nor accurate, as you point out.

Imagine a system which allowed organizations to cooperatively manage "trusted sender lists"-- these would be lists of domains which are trusted to not send spam.

See my notes for more info.

(And my general spam page. And of course this.)

Mark said...

From Google:

"we use optical character recognition (OCR) developed by the Google Book Search team to protect Gmail users from image spam."

Link

Unknown said...

302 million messages is a lot of spam, but it is not over a third of a billion!