Thursday, August 13, 2009

Behind the scenes of our SMTP Service

The last couple of days have been challenging and educational for the team in charge of our SMTP Relay service. As the volume of email passing through the relay grows, so do the problems of performance and scalability.

The first version of the relay service operated in a single-threaded, linear fashion, meaning that all emails sent to relay.jangosmtp.net were processed and sent one at a time. Whenever an email message arrives at relay.jangosmtp.net, the following steps are executed:

1. The originating IP address and From address are examined to determine which JangoMail user account the email message belongs to.

2. The tracking options for that particular user are loaded.

3. The email message is disassembled, the tracking mechanisms are added, and then the email message is re-assembled.

4. The email message is transmitted to the appropriate JangoMail sending server, chosen either by whether the user account is enrolled in the Sender Score Certified program or by the domain of the recipient email address.

5. The sending email server receives the message, signs it with DomainKeys and DKIM, and then transmits it to the destination email server for the recipient's domain.
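To make the linear design concrete, here is a minimal Python sketch of the steps above running in a single loop. Everything in it is hypothetical: the `Message` type, the lookup tables, the tracking pixel, and the pool names are illustrative stand-ins rather than JangoMail's actual code, and step 5 is left as a comment. The property that matters is that `process_linearly` takes messages strictly in arrival order, so one slow message delays every message queued behind it.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Message:
    source_ip: str
    from_addr: str
    to_addr: str
    body: str

# Hypothetical lookup tables standing in for real account data.
ACCOUNTS = {  # (originating IP, From address) -> account
    ("203.0.113.5", "alice@example.com"): "acct-alice",
    ("198.51.100.7", "carol@example.net"): "acct-carol",
}
TRACKING = {"acct-alice": {"open_tracking": True}}  # account -> tracking options
CERTIFIED = {"acct-alice"}  # accounts enrolled in Sender Score Certified

def add_tracking(msg, options):
    # Step 3, greatly simplified: append a (hypothetical) open-tracking pixel.
    body = msg.body
    if options.get("open_tracking"):
        body += '\n<img src="https://tracking.example/open.gif">'
    return Message(msg.source_ip, msg.from_addr, msg.to_addr, body)

def choose_sending_server(account, msg):
    # Step 4: certified accounts use one pool; others are routed by recipient domain.
    if account in CERTIFIED:
        return "certified-pool"
    return "pool-for-" + msg.to_addr.rsplit("@", 1)[1]

def process_linearly(inbox):
    """Handle one message at a time, strictly in arrival order."""
    routed = []
    while inbox:
        msg = inbox.popleft()
        account = ACCOUNTS[(msg.source_ip, msg.from_addr)]   # step 1
        options = TRACKING.get(account, {})                  # step 2
        tracked = add_tracking(msg, options)                 # step 3
        server = choose_sending_server(account, tracked)     # step 4
        # Step 5 (DomainKeys/DKIM signing and final delivery) happens on `server`.
        routed.append((server, tracked.to_addr))
    return routed
```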

A traditional SMTP service is much simpler: it needs only step 1 and part of step 5. Our process can take anywhere from 1/10th of a second to 3 seconds to handle and deliver a single email message, depending on the size of the email, its encoding, and the tracking options selected.

On Tuesday morning, a large client submitted 50,000 transactional emails to relay.jangosmtp.net. The emails arrived at the server over the course of 5 hours, and our linear process went to work, processing and transmitting each message individually. As a result, email messages from other, smaller customers were delayed, some for several hours. We were alerted to the problem very soon after the backlog began to build. Since all JangoMail staff members use the JangoMail SMTP relay with their desktop email clients, it was easy to tell something was wrong: emails my staff were sending to customers, prospects, and themselves weren't being received. We realized that a single process handling each email one by one just isn't going to cut it when a customer transmits a large number of messages at once.
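Some rough arithmetic shows why a single worker fell behind. 50,000 messages over 5 hours (18,000 seconds) is an average arrival rate of about 2.8 messages per second, so one worker keeps up only if it averages under 0.36 seconds per message. The sketch below assumes steady arrivals and a fixed per-message time, which real traffic won't match; the 1-second and 3-second figures are illustrative points within the 0.1 to 3 second range mentioned above, not measurements from the incident.

```python
def backlog_after(arrivals, window_s, secs_per_msg, workers=1):
    """Messages still queued when the arrival window ends.

    Simplified model: arrivals are spread evenly over the window and every
    message takes exactly `secs_per_msg` seconds on one worker.
    """
    capacity = workers * window_s / secs_per_msg  # messages the workers can finish
    return max(0.0, arrivals - capacity)

WINDOW = 5 * 3600  # 5 hours, in seconds

print(50_000 / WINDOW)                       # arrival rate in messages/second
print(backlog_after(50_000, WINDOW, 1.0))    # 32000.0 queued after 5 hours
print(backlog_after(50_000, WINDOW, 3.0))    # 44000.0 at the slow end
print(backlog_after(50_000, WINDOW, 0.1))    # 0.0: a fast worker keeps up
```

In this model, a small customer's message arriving at the end of the window at 1 second per message would wait behind 32,000 others, roughly 9 hours of work for a single worker.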

Within 8 hours, we re-architected the entire process so that multiple threads can operate on groups of email messages simultaneously. We can now assign a dedicated processing thread to any large customer. Because the threads run concurrently, a large volume of email from one customer no longer holds up a small amount of email from another.
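The post doesn't describe how the new code is structured, so the following is only a minimal Python sketch of the idea, assuming one queue per dedicated customer plus a shared queue for everyone else, each drained by its own thread. `Relay`, `submit`, and `handle` are hypothetical names.

```python
import queue
import threading

class Relay:
    """Hypothetical relay: one queue and worker thread per dedicated account,
    plus one shared queue and worker for all other accounts."""

    def __init__(self, dedicated_accounts, handle):
        self.handle = handle  # callable(account, msg) that does the real work
        self.shared = queue.Queue()
        self.dedicated = {acct: queue.Queue() for acct in dedicated_accounts}
        self.threads = []
        for q in [self.shared, *self.dedicated.values()]:
            t = threading.Thread(target=self._drain, args=(q,), daemon=True)
            t.start()
            self.threads.append(t)

    def submit(self, account, msg):
        # Large customers go to their own queue; everyone else shares one.
        self.dedicated.get(account, self.shared).put((account, msg))

    def _drain(self, q):
        while True:
            item = q.get()
            if item is None:  # shutdown sentinel
                return
            self.handle(*item)

    def shutdown(self):
        # Sentinels are queued after pending messages, so each worker
        # finishes its backlog before exiting.
        for q in [self.shared, *self.dedicated.values()]:
            q.put(None)
        for t in self.threads:
            t.join()
```

For example, `Relay({"big-client"}, handle)` gives `big-client` its own worker: a burst of 1,000 messages from that account lengthens only its own queue, while a single message from `small-client` goes through the shared queue and its own thread.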

After we deployed the new code that night, we thought our problems were solved. We awoke Wednesday morning to discover that the multi-threaded process was crashing because of I/O concurrency issues, and the constant crashing caused another backlog of undelivered email messages. Within 10 minutes, we found the cause: all threads were attempting to write their activity to the same log file, which produced disk I/O errors. Within 60 minutes, we corrected the issue and deployed the fix, and since then (1:15 PM EST Wednesday) there have been no errors and no backlogs.
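The post doesn't say how the logging was fixed. One common pattern, sketched here with Python's standard `logging.handlers.QueueHandler` and `QueueListener`, is to have every worker thread put log records on an in-memory queue and let a single listener thread be the only writer to the log file. The logger name `relay` and the format string are assumptions for illustration; giving each thread its own log file is another option.

```python
import logging
import logging.handlers
import queue
import threading

def start_relay_logging(path):
    """Route log records from every thread through a queue to one file writer."""
    log_queue = queue.Queue()
    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(threadName)s %(message)s"))

    # The listener's background thread is the only code that touches the file.
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()

    # Worker threads log through this logger; its handler only enqueues records.
    logger = logging.getLogger("relay")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False
    return logger, listener
```

Call `listener.stop()` at shutdown so records still waiting in the queue are written before the process exits.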

We appreciate our SMTP service customers' support over the last couple of days, especially those who were adversely affected by the backlog of emails. One of the factors that I think makes JangoMail a different type of company is the awesome skill and dedication of our developers. When an architectural or functional issue is discovered, our programmers work until the problem is solved. We react swiftly, coming up with creative solutions to difficult problems within minutes, because if we don't, the problems get worse: backlogs build, emails stop sending, and the system could grind to a halt. The development problems we face generally can't be solved with a Google search, because much of JangoMail's technology is unique. We've done things that very few companies in our industry have done, including writing our own Email Rendering Tool, writing our own SMTP sending engine from scratch, and now, introducing the world's first SMTP service with tracking.

I hope this post sheds some light on the inner workings of the JangoMail operation, my staff's commitment to delivering an amazing service, and a little bit of our secret sauce on how the magic happens. If anyone has any questions about the SMTP service, email me at ajay AT us dot jangomail dot com.