Friday, November 06, 2009

How we scaled up the JangoSMTP service to accommodate the ING New York City Marathon

Overview of the New York City Marathon and its Email Alerts

It was a sunny mid-September day when my head of sales informed me that the systems integrator for the 2009 New York City Marathon was looking at our SMTP service for the delivery of alert emails on runners during the marathon. Given that our SMTP relay service was relatively new at the time, we saw this as an opportunity to demonstrate the powerful tracking features and performance capabilities of our service.

Having tried other email vendors and their own internal systems in the past, the marathon was looking to avoid past issues, such as large delays in the delivery of the email alerts and email blocking issues with consumer ISPs like Gmail and Hotmail.

The marathon had about 42,000 total runners. Friends and family members of each runner could "subscribe" to a runner, such that whenever the runner reached a checkpoint during the marathon, an email alert would be sent to the subscribers of that runner. Checkpoint crossings would be detected electronically via a Chronotrack D-Tag. For more information on the electronic tracking of runners, see http://www.nycmarathon.org/race_scoring.htm.

My team was told to expect anywhere from 400,000 to 2,000,000 email alerts to be sent out during the race. That represents an average of 10 to 50 email alerts per runner, depending on how many individuals subscribe to a particular runner.

The Speed Problem

The JangoSMTP service has been a single-node service since its launch earlier in 2009. It was serviced by a single fault-tolerant, RAID-based, multiple-CPU, high-memory Windows server located at relay.jangosmtp.net. The SMTP service works by receiving email at relay.jangosmtp.net, and then passing emails along to one of JangoMail's 40 outbound SMTP senders for delivery to the final recipient. The relay.jangosmtp.net server could receive an unlimited number of emails and saturate our upstream Internet provider's bandwidth, but it could only process, add tracking, and transmit to the outbound SMTP senders over the Internet at a rate of 250 emails/minute. I therefore calculated:

250 emails/minute x 60 = 15,000 emails/hour

Based on past marathon results, I estimated that the fastest runners would complete the race in about 2 hours, and the slowest runners would complete the race in about 6 hours. However, the marathon would be initiated in 3 waves of 14,000 runners each, distributed over an hour the morning of the marathon. So 6 hours of running time, plus an added hour for the start of the last wave, meant 7 hours of sending email.

7 hours x 15,000 emails/hour = 105,000 emails over 7 hours

Uh oh. We weren't nearly fast enough. Even at the minimum expected volume, we weren't fast enough by a factor of 4. And at the maximum expected volume, we only had 5% of the needed capacity.
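The shortfall is easier to see with the arithmetic laid out. The following is a minimal Python sketch that uses only the figures quoted above; the function and variable names are illustrative and are not part of the JangoSMTP code.

```python
def capacity(emails_per_minute: int, hours: float) -> int:
    """Total emails a single relay can process over the sending window."""
    return int(emails_per_minute * 60 * hours)


# 6 hours of running time plus 1 hour for the staggered wave starts.
SENDING_WINDOW_HOURS = 7

single_node = capacity(250, SENDING_WINDOW_HOURS)

for expected in (400_000, 2_000_000):
    share_of_need = single_node / expected
    times_short = expected / single_node
    print(f"expected {expected:,}: capacity {single_node:,} "
          f"({share_of_need:.1%} of need, {times_short:.1f}x short)")
```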

The two bottlenecks were 1) the processing of an email message, meaning the dis-assembly and re-assembly to determine what user it belonged to and add tracking mechanisms, and 2) transmitting the email to an SMTP sender. Point #2 warrants more explanation. While there is no bottleneck for relay.jangosmtp.net to receive emails from the outside world, there can be a bottleneck for relay.jangosmtp.net to transmit emails, since relay.jangosmtp.net must transmit each message to separate SMTP senders located on separate networks in separate data centers. This transmission happens over the Internet, and based on where the SMTP sender is located and the routing to it, speed can fluctuate.

Time to Scale Up

Load Balancing

We needed to scale up, and scale up fast. The initial plan was to order four more servers, and have them each serve as an additional SMTP receiver and processor for relay.jangosmtp.net. The JangoMail architecture does not use appliance-based load balancers, so I decided to handle the load balancing via our Domain Name System (DNS).

We created multiple DNS "A" records for relay.jangosmtp.net, each with a Time to Live (TTL) of 60 seconds. Five "A" records were created in total, each with a different IP address. The first of the five was the original IP for relay.jangosmtp.net, and the other four were for the four additional servers we commissioned.

By keeping the TTL at a short 60 seconds, I could expect each of the five servers to receive a roughly equal share of email every minute. And since not all DNS servers on the net respect TTLs, and some do their own caching, I confirmed with our tech contact at the marathon that their systems would NOT cache the IPs for relay.jangosmtp.net beyond the designated 60 second TTLs.
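To illustrate why multiple "A" records spread the load, here is a small Python sketch. It assumes each new connection ends up on one of the published addresses with equal probability, which is a simplification of how DNS servers rotate records and how resolvers cache them. The IP addresses are documentation-range placeholders, not the real relay addresses, and the names are made up for the example.

```python
import random
import socket
from collections import Counter


def resolve_all(hostname: str) -> list[str]:
    """Return every IPv4 address published for hostname (one per A record)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def pick_relay(addresses: list[str], rng: random.Random) -> str:
    """Pick one address at random, approximating DNS round-robin rotation."""
    return rng.choice(addresses)


# Simulate 10,000 connections spread over five placeholder relay IPs.
# In production the list would come from resolve_all("relay.jangosmtp.net").
relays = [f"192.0.2.{n}" for n in range(1, 6)]
rng = random.Random(2009)
load = Counter(pick_relay(relays, rng) for _ in range(10_000))

for ip, count in sorted(load.items()):
    print(ip, count)
```

A short TTL matters because a client that caches one answer for hours would keep sending to the same server, defeating the rotation.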

SQL Query Caching

Every email that arrives at relay.jangosmtp.net is disassembled, tracked, re-assembled, and then passed to an outbound SMTP server for actual sending. In order to determine what email belongs to what user, and what tracking options each user has selected, the originating IP address of each email message is looked up against an IP Address table, and then once the user is determined from the IP address, the UserAccounts table in the core database is queried to determine what tracking/DomainKeys options the user has selected. These two queries combined took anywhere from 0.05 seconds to 0.2 seconds, depending on the load on the database at the time.

We shaved this time down to 0.001 seconds by caching the results of these queries and refreshing every five minutes.
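The caching idea can be sketched as a small time-based cache in Python. This is an illustration under stated assumptions, not our production code: the class name, the `load_settings_for_ip` stand-in for the two database queries, and the returned fields are all hypothetical. Only the five-minute refresh interval comes from the description above.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache lookup results and reload them once they are ttl_seconds old."""

    def __init__(self, loader: Callable[[str], dict],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # fresh entry: no database round trip
        value = self._loader(key)      # missing or stale: run the queries again
        self._entries[key] = (now, value)
        return value


def load_settings_for_ip(ip: str) -> dict:
    """Hypothetical stand-in for the IP Address and UserAccounts queries."""
    return {"user": "marathon", "open_track": True, "click_track": True}


settings = TTLCache(load_settings_for_ip)
print(settings.get("198.51.100.7"))
```

The trade-off is that a change to a user's tracking options can take up to five minutes to reach the relay servers.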

Multi-threading

Our custom SMTP architecture is a multi-threaded model, allowing for the simultaneous processing and delivery of emails across user accounts. We had been informed ahead of time that the marathon would trigger email alerts from two originating IP addresses. We therefore configured two dedicated processing threads on each of the 8 servers. This gave us 16 total processing threads, and also isolated the processing/delivery of the marathon's emails from our other clients' emails during the race.
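The dedicated-thread arrangement can be pictured with the following Python sketch. It assumes one private queue and one worker thread per originating IP address; the IP addresses, message format, and function names are hypothetical, and the real service does considerably more per message.

```python
import queue
import threading


def start_dedicated_worker(process):
    """Start one thread that drains its own private queue."""
    inbox = queue.Queue()

    def run() -> None:
        while True:
            message = inbox.get()
            if message is None:        # sentinel: shut the worker down
                break
            process(message)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return inbox, worker


processed = []
lock = threading.Lock()


def process(message: dict) -> None:
    """Record which thread handled which message."""
    with lock:
        processed.append((threading.current_thread().name, message["id"]))


# Hypothetical originating IPs; each gets its own queue and thread, so a
# backlog from one source cannot delay mail from any other source.
SOURCE_IPS = ("198.51.100.10", "198.51.100.11")
dedicated = {ip: start_dedicated_worker(process) for ip in SOURCE_IPS}


def dispatch(message: dict) -> None:
    inbox, _worker = dedicated[message["source_ip"]]
    inbox.put(message)


for i in range(6):
    dispatch({"id": i, "source_ip": SOURCE_IPS[i % 2]})

for inbox, worker in dedicated.values():
    inbox.put(None)
    worker.join()

print(processed)
```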

Distributed Transactional Senders

The SMTP service is part of JangoMail's transactional email platform, which also includes the SendTransactionalEmail API method. All transactional emails are sent through the email sender that is assigned for a particular user. For fault-tolerance purposes, every user has a list of transactional email senders assigned to it, such that if the first email sender is unavailable or offline, the email is passed to the second, and to the third, and so on, until the email is successfully transmitted. This approach was great for fault-tolerance and redundancy, but not for scalability. If the first server in the list was online and available, then it would receive all the transactional emails for that account.

I therefore decided to add an internally controlled user-level setting option to randomize the list of senders. With this setting enabled, if a user's list of transactional senders included:

Sender1, Sender2, Sender3, Sender4

the relay.jangosmtp.net servers would farm out the emails for delivery to any sender in the user's list at random, ensuring that as many transactional senders as were assigned would receive a roughly equal load of emails to deliver, rather than having the first available sender do all the delivery. This also aided in resolving the bandwidth bottleneck with the SMTP senders mentioned earlier.
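A minimal Python sketch of the idea follows: shuffle the sender list before walking it, so that failover still works but no single sender takes all the traffic. The `transmit` callback and the function name are hypothetical stand-ins, not the actual JangoMail implementation.

```python
import random
from typing import Callable, Optional, Sequence


def send_via_senders(message: str, senders: Sequence[str],
                     transmit: Callable[[str, str], bool],
                     randomize: bool = True,
                     rng: Optional[random.Random] = None) -> str:
    """Try senders until one accepts the message; return the sender used."""
    order = list(senders)
    if randomize:
        (rng or random).shuffle(order)   # spread load instead of favoring #1
    for sender in order:
        if transmit(sender, message):    # False means offline or unavailable
            return sender
    raise RuntimeError("no transactional sender accepted the message")


senders = ["Sender1", "Sender2", "Sender3", "Sender4"]


def always_up(sender: str, message: str) -> bool:
    """Pretend every sender is online."""
    return True


rng = random.Random(1)
used = [send_via_senders("alert", senders, always_up, rng=rng)
        for _ in range(1000)]
print({s: used.count(s) for s in senders})
```

With `randomize=False` the function behaves like the original ordered failover list, where the first available sender receives everything.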

Still not fast enough - need 3 more servers

Given our SQL query optimizations, multi-threading, and additional servers, the timing now looked like:

400 emails/minute/server x 5 servers = 2,000 emails/minute x 60 minutes = 120,000 emails/hour.

120,000 emails/hour x 7 hours = 840,000 emails over 7 hours

However, if the load was over 1 million emails, there would still be a delay. I decided to annex 3 additional servers that are already a part of the JangoMail network but weren't active on port 25, and turn them into 3 additional SMTP receivers. There were now 8 total DNS "A" records for relay.jangosmtp.net.
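As a rough check on the decision to add receivers, the same arithmetic can be extended in Python. This sketch assumes the three annexed servers would sustain the same 400 emails/minute as the others, which is an assumption for illustration; the names are made up for the example.

```python
import math

PER_SERVER_PER_MINUTE = 400   # post-optimization rate quoted above
WINDOW_HOURS = 7              # expected sending window


def total_capacity(servers: int) -> int:
    """Emails the given number of receivers can process over the window."""
    return servers * PER_SERVER_PER_MINUTE * 60 * WINDOW_HOURS


def servers_needed(expected_emails: int) -> int:
    """Smallest number of receivers whose capacity covers the volume."""
    return math.ceil(expected_emails / total_capacity(1))


print(total_capacity(5), total_capacity(8))
print(servers_needed(1_000_000))
```

Under that assumption, eight receivers work out to roughly 1.34 million emails over the seven hours, comfortably above the one-million mark.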

With just 48 hours before the marathon, we were informed that the expected outbound volume, based on the number of subscribers so far, would be about 750,000 emails.

And just in case all else failed...

While we've always believed in the performance and reliability of our code and architecture, we decided that for a project of this caliber, a backup plan was necessary in case our custom SMTP architecture was unable to perform. The JangoSMTP custom architecture is what allows emails that pass through the SMTP relay to be open and click-tracked, and stored in a database, so that SMTP logs can be viewed and reports can be generated based on open and click timing, domains, and geo-tracking reports based on IP addresses. However, the primary issue for the marathon was to ensure the emails were delivered, and delivered on time. Therefore, a backup plan was put into place that could guarantee the emails would be delivered, even if we had to eliminate the tracking and logging based on our own custom architecture. If we found that JangoSMTP could not handle the load, we would replace the instances of the JangoSMTP receiver with Microsoft's built-in SMTP service (part of Internet Information Services), such that emails would be received and delivered to the final recipient, without any processing needed in between.

In order to isolate the marathon from our other clients in case this backup plan had to be put in place, we set up the domain marathon.jangosmtp.net, which mimicked the 8 A records for relay.jangosmtp.net, and we asked the system integrators for the marathon to connect to marathon.jangosmtp.net instead of relay.jangosmtp.net. This would allow us to re-direct just their email on the day of the race if needed.

The Day of the Race - November 1, 2009

I awoke at 7:30 AM EST on that Sunday, after having been out the night prior for Halloween. The emails would begin trickling in at 8:30 AM EST for runner check-ins, and the first wave of the race was set to begin at 9:40 AM EST.

I was watching the race live on TV, while at the same time monitoring the traffic flows across our 8 instances of relay.jangosmtp.net. Emails were being received, processed, and sent quickly, and there was no backlog...until about 12:25 PM EST.

All three waves had been released, and runners from all three waves were triggering a massive volume of email alerts. From 12:25 PM EST to about 12:35 PM EST, there was a 4-5 minute delay with final delivery. Thankfully, the backlog period only lasted 10 minutes. It makes sense that the highest volume of email was passing through approximately 3 hours after the release of the first wave, since the greatest number of runners would still be running around this time.

Additionally, at about 1:00 PM EST, we discovered that comcast.net and att.net/bellsouth.net domains were blocking one of the IPs from which marathon email was sending. While we did have a mechanism by which domain-specific routes could be used, such that we could enable all comcast.net/att.net/bellsouth.net email to go through one specific non-blocked IP address, the complexity of the randomization system we had added to accommodate the bandwidth bottleneck rendered this mechanism non-functional. I called our lead developer, who was on call, asked him to make a change to the sender-determination algorithm, and re-deployed our code across all 8 SMTP server instances. We were now able to route all comcast.net/att.net/bellsouth.net email through a separate non-blocked SMTP sender. In the end, the marathon alert emails had less than a 0.3% blocking rate.
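One way domain-specific routes and randomized sender selection can coexist is to check the recipient's domain first and fall back to a random pick only when no route is defined. The Python sketch below shows that ordering; it is an illustration only, since the actual change deployed that afternoon is not reproduced here, and the sender names and function name are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence


def choose_sender(recipient: str, senders: Sequence[str],
                  domain_routes: Dict[str, str],
                  rng: Optional[random.Random] = None) -> str:
    """Honor a domain-specific route first, else pick a sender at random."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in domain_routes:
        return domain_routes[domain]
    return (rng or random).choice(list(senders))


senders = ["Sender1", "Sender2", "Sender3", "Sender4"]

# Hypothetical routes pinning the blocking domains to one unblocked sender.
routes = {
    "comcast.net": "Sender4",
    "att.net": "Sender4",
    "bellsouth.net": "Sender4",
}

print(choose_sender("runner.fan@comcast.net", senders, routes))
print(choose_sender("runner.fan@example.org", senders, routes))
```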

After 1:00 PM EST, no further email deliverability or backlog issues ensued.

Conclusion

We were thrilled that we were able to pull this off for the 2009 ING New York City Marathon. Even post-marathon, we continue to make performance and feature enhancements to our transactional email platform. If you have an important project for which sending email is critical, please get in touch with us, and we'll work as hard for you as we did for the marathon.

New API Method for Template-Driven Transactional Emails

Overview 

Marketers can now create and edit Transactional Email Templates on the Send Email Page, without the help of their IT team.

To enable this, we launched a new API method: SendTransactionalEmailFromTemplate. Instead of passing the full details of the email message using the regular SendTransactionalEmail method, you can now simply pass the email Campaign ID and let your marketers use the JangoMail interface to dictate the email's creative content.

SendTransactionalEmailFromTemplate works like the SendTransactionalEmail method, but does not require you to enter FromEmail, FromName, Subject, MessagePlain, and MessageHTML. Instead you must enter CampaignID, PersonalizationValues, PersonalizationColDelimiter and PersonalizationFields.


Input Parameters

Username
Your JangoMail account username

Password
Your JangoMail account password

CampaignID
Marketers should create an Email Campaign as they normally would, starting on the Send Email Page (or just use the SendMassEmail API method). Once they create a campaign and send a test, you can find the Campaign ID in the Reporting and Analytics section of the JangoMail interface. Look in the ID column to find the Campaign ID.

ToEmailAddress
The single email address to whom you want to send this email message.

PersonalizationFields
Example:
FirstName,LastName,Company,FromAddress

JangoMail uses the following syntax for personalization and Marketers should continue to use this when creating transactional email templates. Mail-merge personalization tags can be used in the Subject, HTML Message, Plain Text Message, From Display Name, and From Email Address:

%%FieldName%% 
or
%%FieldName**Default Value%%

The second version of the personalization syntax can be used in cases where the value of FieldName is blank, such that a default value is substituted instead of a blank. 
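To make the two tag forms concrete, here is a small Python sketch of how such substitution could work. It is not JangoMail's implementation: the regular expression, the function name, and the choice to treat an unknown field as blank are assumptions made for illustration.

```python
import re

# %%Field%% or %%Field**Default%%; the default part is optional.
TAG = re.compile(r"%%(?P<field>[^%*]+)(?:\*\*(?P<default>[^%]*))?%%")


def personalize(template: str, values: dict) -> str:
    """Replace personalization tags, using defaults for blank values."""

    def replace(match: re.Match) -> str:
        value = values.get(match.group("field"), "")
        if value == "" and match.group("default") is not None:
            return match.group("default")
        return value

    return TAG.sub(replace, template)


template = "Dear %%FirstName**Valued Customer%%, your order is %%OrderNumber%%."
print(personalize(template, {"FirstName": "Bob", "OrderNumber": "123456"}))
print(personalize(template, {"FirstName": "", "OrderNumber": "123456"}))
```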

PersonalizationValues
Example:
Ajay,Goel,Silicomm,ag@silicomm.com 

PersonalizationColDelimiter
Specify a column delimiter as follows:
c for a comma
s for a space
n for a newline, or a hard break
t for a tab
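The delimiter codes can be pictured as a simple lookup, as in the Python sketch below. It assumes that PersonalizationFields is always comma-separated (as in the examples here) and that only PersonalizationValues uses the chosen column delimiter; that reading, along with the function name, is an assumption rather than documented behavior.

```python
# Delimiter codes from the list above mapped to the characters they stand for.
DELIMITERS = {"c": ",", "s": " ", "n": "\n", "t": "\t"}


def pair_fields(fields: str, values: str, col_delimiter: str) -> dict:
    """Pair PersonalizationFields names with PersonalizationValues entries."""
    separator = DELIMITERS[col_delimiter]
    names = fields.split(",")          # assumed: field names are comma-separated
    row = values.split(separator)
    if len(names) != len(row):
        raise ValueError("field count and value count differ")
    return dict(zip(names, row))


print(pair_fields("FirstName,LastName,Company,FromAddress",
                  "Ajay,Goel,Silicomm,ag@silicomm.com", "c"))
```

A tab or newline delimiter is the safer choice when a value, such as a company name, may itself contain a comma.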

Options
A comma separated list of name/value pairs.  The names of the various options and their associated values are explained below.
 
You may specify:
TransactionalGroupID (a numeric ID of a previously created Transactional Group)
SkipUnsubCheck ("True" or "False")

You can also specify the following, but if not specified, values will pull from your original email campaign. User-specified values here will override values in your email campaign:

ReplyTo (any valid email address)
CC (any valid email address)
BCC (any valid email address)
CharacterSet (a character set like "ISO-8859-1" or "US-ASCII")
Encoding (either "7 bit" or "Quoted Printable" or "Base 64")
Priority (either "Low", "Medium", or "High") 
UseSystemMAILFROM ("True" or "False") 
Receipt ("True" or "False")
Wrapping (a numeric value from 0 to 100, with 0 meaning no wrapping takes place)
ClickTrack ("True" or "False") 
OpenTrack ("True" or "False")
NoClickTrackText ("True" or "False") 
Attachment1 (the name of a previously FTPd file, like "Invoice.pdf")
Attachment2 (the name of a previously FTPd file, like "Invoice.pdf")
Attachment3 (the name of a previously FTPd file, like "Invoice.pdf")
Attachment4 (the name of a previously FTPd file, like "Invoice.pdf")
Attachment5 (the name of a previously FTPd file, like "Invoice.pdf")

Example

In the below example, the input parameters are:

Username:jangomail
Password:******
CampaignID:265987456
ToEmailAddress:jangomail@gmail.com
PersonalizationFields:FirstName,OrderNumber
PersonalizationValues:Bob,123456
PersonalizationColDelimiter:c
Options:ReplyTo=jm@gmail.com,OpenTrack=True

This example assumes that the Subject and/or Body contains personalization placeholders for FirstName and OrderNumber.  In fact, for Campaign ID 265987456, the actual message is:

------BEGIN MESSAGE-------
Dear %%FirstName**Valued Customer%%,

Your order has shipped.  Your order number is %%OrderNumber%%.
------END MESSAGE-------

This transactional email will be open tracked.  To ensure that it will also be click tracked, we would add "ClickTrack=True" to the Options parameter.  If we wanted it to be a high priority message instead of the default medium priority, we could add "Priority=High", such that the full Options string would look like:

Options:ReplyTo=jm@gmail.com,OpenTrack=True,ClickTrack=True,Priority=High
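For developers assembling these inputs in code, the Python sketch below builds the Options string and the full parameter set from the example. The helper name is hypothetical, and the sketch stops short of making the HTTP call because the endpoint URL is not given here. Rejecting commas in values is a precaution on our part, since escaping rules are not covered in this post.

```python
def build_options(options: dict) -> str:
    """Serialize name/value pairs into the comma-separated Options string."""
    parts = []
    for name, value in options.items():
        text = str(value)              # True/False become "True"/"False"
        if "," in text:                # no escaping rule is documented here
            raise ValueError(f"{name}: value cannot contain a comma")
        parts.append(f"{name}={text}")
    return ",".join(parts)


params = {
    "Username": "jangomail",
    "Password": "******",
    "CampaignID": "265987456",
    "ToEmailAddress": "jangomail@gmail.com",
    "PersonalizationFields": "FirstName,OrderNumber",
    "PersonalizationValues": "Bob,123456",
    "PersonalizationColDelimiter": "c",
    "Options": build_options({
        "ReplyTo": "jm@gmail.com",
        "OpenTrack": True,
        "ClickTrack": True,
        "Priority": "High",
    }),
}

print(params["Options"])
```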





To test the operation using the HTTP POST protocol, visit the JangoMail API page on this method and click the Invoke button.

Frequently Asked Questions

Q. What's the difference between SendTransactionalEmailFromTemplate and SendTransactionalEmail?
A. When calling SendTransactionalEmail, you are passing in the full details of that email message, including the Subject, the HTML Message, the Plain Text Message, tracking options and much more.  However, if you've already created a template message in the Send Email section of the JangoMail web interface, you can call SendTransactionalEmailFromTemplate to send transactional emails and avoid having to pass in all the message details on every method call.  You simply reference the Campaign ID of the email message that you created in the Send Email section.

Q. I see there's a PersonalizationColDelimiter input parameter.  Where is the PersonalizationRowDelimiter input parameter?
A. Since this is a method to send a single transactional email, only one set of values can be provided for the single recipient of the email message.  There is no need for a row delimiter since the data should only be for one record.

Q. What is the TransactionalGroupID?
A. This is an optional setting that can be specified as part of the Options input parameter.  Transactional Groups are a way to categorize different types of transactional emails, like Order Confirmations, Shipping Notifications, and Thank You messages.  You can create different Transactional Groups using the web interface or by calling AddTransactionalGroup.  Every Group has a numeric ID assigned to it, and so to assign a particular transactional email to a Group, you can set the TransactionalGroupID.

Q. How do I send attachments with my transactional email?
A. You can specify the names of attachments in the Options input parameter.  However, the file names you specify must already be present in your account.  You should FTP your files to client.jangomail.com/Attachments prior to calling this method with attachments.

Tuesday, October 27, 2009

Official Launch of JangoSMTP.com, Our Stand-Alone SMTP Relay Service

We have just launched JangoSMTP.com, a new stand-alone SMTP relay service and the first designed for email marketers.

Why use it?
JangoSMTP is the first SMTP relay service to offer open and click tracking. It also comes with a variety of features that contribute to its extreme deliverability:
  • DomainKeys/DKIM Signing
  • SPF/SenderID Authentication
  • Sender Score Certification
  • Feedback Loops with ISPs
Visit our site for JangoSMTP's Full Features List.

Who is it for?
JangoSMTP can be utilized by users of desktop email clients (Outlook, Thunderbird, etc.),  Gmail users, and web programmers who send transactional emails through their own corporate servers.

Where can I learn more?
Go to http://www.jangosmtp.com/ for more information or Contact Us directly.

Monday, October 26, 2009

New API Method: AddTransactionalGroup

We just released a new API method for the JangoMail API which allows you to add a Transactional Group to your account:

*AddTransactionalGroup
  Adds a new Transactional Group to your account. Returns a string.

We also recently released 3 other transactional email API methods. These methods allow you to retrieve opens, clicks, unsubscribes, bounces, and complaint statistics for a single transactional email in your account.

Visit api.jangomail.com for a full list of our API methods.

Thursday, October 22, 2009

New Design: Features Page

JangoMail has a very comprehensive feature list and we've been working on a way to organize it so that it's easy to navigate. We want to make sure that our current and prospective customers are able to identify all of our capabilities and can easily find all of the information and tutorials we have on each feature.

We are excited to announce that we just relaunched our Email Features page, which now has an improved navigation system. You can expand each feature set and feature for more information. 



*Use the Table of Contents to find the feature category you are interested in.

*To navigate to a certain feature's description, video, tutorial or blog entry, simply click the arrow next to that feature.


  Then click on the link for the information you would like to see.


*To see a visual description of a certain feature, click on the name of that feature.


  A window will pop up with a visual description of the feature.




Browse through our new page yourself at: http://www.jangomail.com/features_overview.asp. Let us know what you think!

Wednesday, October 21, 2009

Two SMTP Service Bug Fixes

Tonight we've deployed the following bug fixes to the SMTP relay service:
  1. Previously, emails sent with very long Subject lines, such that the Subject line folded onto the next line, were improperly processed, and this could result in headers being visible in the email body. This is now fixed.
  2. Previously, emails sent with very long From lines, such that the From line folded onto the next line, were improperly processed, and this could result in the email not being transmitted to the final recipient at all. This is now fixed.
  3. Previously, encoded subject lines, such as:

    =?utf-8?Q?You've_received_=E2=82=A8229.64_($2.00_USD)_in_your_Surveyhead_?= =?utf-8?Q?account?=

    would not have their encoding preserved. This is also now fixed.
Technical Resources:

Those wishing to read about the allowance of folded header lines, as stated in RFC 822 (the Internet text message format specification), can do so here: http://www.faqs.org/rfcs/rfc822.html. Specifically, read section:

3.1.1. LONG HEADER FIELDS
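For readers who handle raw headers themselves, the Python sketch below shows the two operations these fixes touch on: unfolding a header that continues onto the next line, and decoding an encoded subject line with the standard library's `email.header.decode_header`. It is a simplified illustration, not the relay's own parser; the function names and the sample From address are made up for the example.

```python
import re
from email.header import decode_header


def unfold(raw_headers: str) -> str:
    """Unfold headers: remove each line break that precedes a space or tab."""
    return re.sub(r"\r?\n(?=[ \t])", "", raw_headers)


def parse_headers(raw_headers: str) -> dict:
    """Return a name -> value mapping for the header block of a message."""
    headers = {}
    for line in unfold(raw_headers).splitlines():
        if not line:
            break                      # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


def decode_words(value: str) -> str:
    """Decode encoded words such as =?utf-8?Q?...?= into readable text."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            pieces.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            pieces.append(chunk)
    return "".join(pieces)


raw = (
    "From: Example Sender\r\n"
    " <sender@example.com>\r\n"
    "Subject: =?utf-8?Q?You've_received_=E2=82=A8229.64_($2.00_USD)"
    "_in_your_Surveyhead_?=\r\n"
    " =?utf-8?Q?account?=\r\n"
    "\r\n"
    "Body text\r\n"
)

headers = parse_headers(raw)
print(headers["From"])
print(decode_words(headers["Subject"]))
```

A relay that rewrites messages needs to unfold before inspecting a header, and should leave encoded words untouched when re-assembling the message so the recipient's mail client can decode them.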

Monday, October 19, 2009

New Transactional Email API Methods

We have just released 3 new Transactional Email API Methods for the JangoMail Email Marketing API.

These new methods allow you to retrieve opens, clicks, unsubscribes, bounces, and complaint statistics for a single transactional email in your account:

*Reports_Transactional_GetSingleEmailStats_Dataset
  Retrieves statistics for a transactional email. Returns a .NET DataSet.

*Reports_Transactional_GetSingleEmailStats_String
  Retrieves statistics for a transactional email. Returns a string.

*Reports_Transactional_GetSingleEmailStats_XML
  Retrieves statistics for a transactional email. Returns an XML document.

Saturday, September 19, 2009

Technical Discussion of How Social Networking Sites Send Transactional Email

Shortly after JangoMail launched its own transactional email platform, we took on a flood of new clients with a need to send "social networking" type transactional emails, including "You have a new message" notifications and "You have a new friend request" notifications. Then we launched our SMTP relay for email marketers, and the floodgates opened.

We're often asked what the best practices are from a technical standpoint, with respect to issues like Content, From Addressing, DomainKeys and DKIM, and SPF/SenderID. Should the content of a message sent from one member to another be included in the email notification, or should the recipient have to log in to the site to read the message? Should a "friend request" email come "from" the friend or "from" the social networking site? Should the email contain Plain Text, or HTML, or both? To best answer these questions, I've analyzed transactional email from the most notable social networking sites -- Facebook, LinkedIn, Friendster, and MySpace. Note that this is an article about the technical aspects of sending transactional email. If you're looking for an article on transactional email strategy (what to send, how often, to whom...), then you won't find that here, but you can in many other places.

The following are the properties of each social networking site's email that I will analyze:
  1. Presence of a DomainKeys header.
  2. Presence of a DKIM header.
  3. Sender Policy Framework (SPF) compliant email.
  4. The From Address
  5. The From Display Name
  6. The SMTP MAIL-FROM Address used in the SMTP transaction
  7. The Sender header, if present
  8. The Reply-To header, if present
  9. Use of a unique identifier in the SMTP MAIL-FROM address
  10. The content, particularly whether a friend's message is present in the Email Body, or whether the user must log in to read the message.
  11. The use of HTML versus Plain Text MIME parts.
Let's get started with the biggest social network of all, Facebook:

Facebook:

When somebody sends me a message on Facebook, I get an email notification with the Subject line:

"Mary Parker sent you a message on Facebook..."

The Body of the email contains the actual message, which is convenient, because it prevents me from having to log in to Facebook to read the message.

From Address: notification+mummiv~d@facebookmail.com
Reply-To: noreply@facebookmail.com
From Name: Facebook
DKIM-Signature: v=1; a=rsa-sha1; d=facebookmail.com; s=q1-2009b; c=relaxed/relaxed;q=dns/txt; i=@facebookmail.com; t=1253415012;h=From:Subject:Date:To:MIME-Version:Content-Type;bh=fLwGefDfPrLKUK2vwWzaC5k5iNo=;b=DfJugajI0Fsc5FIppBQSfSAAiV+WuK1ugzc00USpIkJku3sUgcupO+TPua0Tcxw7qJDUE0yRohPMWz3jPwPfRA==;


A DKIM-Signature header for facebookmail.com is present, but it is noteworthy that a DomainKey-Signature header is not present.
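The DKIM-Signature value is a semicolon-separated list of tag=value pairs, which makes it easy to inspect. The Python sketch below splits the signature shown above into its tags; the function name is illustrative, the long b= signature data is left out for brevity, and no cryptographic verification is attempted.

```python
def parse_tag_list(signature: str) -> dict:
    """Split a DKIM-Signature value into its tag=value pairs."""
    tags = {}
    for item in signature.split(";"):
        name, sep, value = item.partition("=")   # values may contain '='
        if sep:
            # Remove any whitespace introduced by header folding.
            tags[name.strip()] = "".join(value.split())
    return tags


sig = ("v=1; a=rsa-sha1; d=facebookmail.com; s=q1-2009b; c=relaxed/relaxed;"
       "q=dns/txt; i=@facebookmail.com; t=1253415012;"
       "h=From:Subject:Date:To:MIME-Version:Content-Type;"
       "bh=fLwGefDfPrLKUK2vwWzaC5k5iNo=;")

tags = parse_tag_list(sig)
print(tags["d"], tags["a"], tags["h"].split(":"))

# A verifier fetches the public key from <selector>._domainkey.<domain>.
print(f"{tags['s']}._domainkey.{tags['d']}")
```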

The SMTP MAIL-FROM address is: notification+mummiv~d@facebookmail.com

Facebook Transactional Email SMTP Log:

69.63.178.170 [03A4] 22:50:13 Connected
69.63.178.170 [03A4] 22:50:13 >>> 220 mail.silicomm.com ESMTP IceWarp 9.1.0; Sat, 19 Sep 2009 22:50:13 -0400
69.63.178.170 [03A4] 22:50:13 <<< EHLO mx-out.facebook.com
69.63.178.170 [03A4] 22:50:13 >>> 250-mail.silicomm.com Hello mx-out.facebook.com [69.63.178.170], pleased to meet you.
69.63.178.170 [03A4] 22:50:13 <<< MAIL FROM:<notification+mummiv~d@facebookmail.com>
69.63.178.170 [03A4] 22:50:13 >>> 250 2.1.0 <notification+mummiv~d@facebookmail.com>... Sender ok
69.63.178.170 [03A4] 22:50:13 <<< RCPT TO:<AjayGoel@silicomm.com>
69.63.178.170 [03A4] 22:50:13 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
69.63.178.170 [03A4] 22:50:13 <<< DATA
69.63.178.170 [03A4] 22:50:13 >>> 354 Enter mail, end with "." on a line by itself
69.63.178.170 [03A4] 22:50:13 *** <notification+mummiv~d@facebookmail.com> <ajay@us.jangomail.com> 1 2021 00:00:00 OK BLS56313
69.63.178.170 [03A4] 22:50:13 >>> 250 2.6.0 2021 bytes received in 00:00:00; Message id BLS56313 accepted for delivery


The SPF record for facebookmail.com is:

"v=spf1 mx ip4:204.15.20.0/22 ip4:69.63.176.0/20 -all"

The connecting IP is 69.63.178.170, which is a part of the "ip4:69.63.176.0/20" directive and therefore this IP is SPF-valid for sending email "FROM" facebookmail.com.
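The range check is easy to reproduce with Python's standard `ipaddress` module. This sketch looks only at the ip4: mechanisms of a record; a full SPF evaluation also has to handle mechanisms such as mx and redirect and the qualifier on "all", which are not covered here. The function name is illustrative.

```python
import ipaddress


def ip4_matches(spf_record: str, connecting_ip: str) -> list[str]:
    """Return the ip4: mechanisms in an SPF record that cover connecting_ip."""
    ip = ipaddress.ip_address(connecting_ip)
    matches = []
    for term in spf_record.split():
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                matches.append(term)
    return matches


record = "v=spf1 mx ip4:204.15.20.0/22 ip4:69.63.176.0/20 -all"
print(ip4_matches(record, "69.63.178.170"))
```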

Noteworthy Points:
  1. It's excellent that Facebook is employing both SPF and DKIM.
  2. The email lacks a DomainKey-Signature header, which would provide backwards compatibility with receiving systems that might not be DKIM-compliant yet.
  3. The email is from "Facebook" and not my friend "Mary Parker". This is likely to prevent people from attempting to reply to the email as a way of writing back to "Mary Parker". The only way to reply to the message is to login to Facebook and use their internal messaging system.
  4. Facebook's transactional emails come from the domain facebookmail.com rather than the more recognizable facebook.com. This is perhaps to separate email reputation from their actual high-value business domain facebook.com. That way, should their emails be treated as spam by a network, the domain name facebookmail.com will be punished, not facebook.com.
  5. The X-Mailer header of the email is:

    X-Mailer: ZuckMail [version 1.00]


    Presumably, ZuckMail is an email system written by, or an homage to, Facebook's founder Mark Zuckerberg.

LinkedIn:

LinkedIn sends transactional email differently than Facebook. While Facebook "friend request" email comes from "Facebook" and a no-reply facebookmail.com email address, LinkedIn's "Join my network" emails usually come "From" the person trying to add you.

The relevant headers from a LinkedIn transactional email are below:

DomainKey-Signature: s=prod; d=linkedin.com; c=nofws; q=dns; h=Sender:Date:From:To:Message-ID:Subject:MIME-Version: Content-Type:X-LinkedIn-fbl;b=KQVq9LfSNIZ+6tOuz7xUaUW1wx86f62EndMqyY+bbkmI8jHQDNe8HdtW ZfJsxH50d8nvAnQHryCdQKGfBEY0u2t5sXkrRmf7yCAhhg4Q9aqBo20KZ LSUvTK726opFnO0;
Sender: messages-noreply@bounce.linkedin.com
From: "Mary Parker" <maryparker@hotmail.com>
To: Ajay Goel <AjayGoel@silicomm.com>


Note that the From Name is my friend "Mary Parker" and the From Address is Mary's email address, maryparker@hotmail.com. Because no Reply-To header is present, clicking "Reply" on my email client will auto-address the message to the From Address, maryparker@hotmail.com.

From an authentication perspective, it's interesting that the LinkedIn transactional email contains the DomainKey-Signature header, but not the more modern DKIM-Signature header. JangoMail transactional emails contain both the DomainKey-Signature and the DKIM-Signature headers, for the highest level of compatibility with all receiving email systems.

LinkedIn Transactional Email SMTP Log:

64.74.98.137 [057C] 12:16:54 Connected
64.74.98.137 [057C] 12:16:54 >>> 220 mail.silicomm.com ESMTP IceWarp 9.1.0; Thu, 03 Sep 2009 12:16:54 -0400
64.74.98.137 [057C] 12:16:54 <<< EHLO mail15-a-aa.linkedin.com
64.74.98.137 [057C] 12:16:54 >>> 250-mail.silicomm.com Hello mail15-a-aa.linkedin.com [64.74.98.137], pleased to meet you.
64.74.98.137 [057C] 12:16:54 <<< MAIL FROM:<s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com> SIZE=3706
64.74.98.137 [057C] 12:16:54 >>> 250 2.1.0 <s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com>... Sender ok
64.74.98.137 [057C] 12:16:54 <<< RCPT TO:<AjayGoel@silicomm.com>
64.74.98.137 [057C] 12:16:54 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
64.74.98.137 [057C] 12:16:54 <<< DATA
64.74.98.137 [057C] 12:16:54 >>> 354 Enter mail, end with "." on a line by itself
64.74.98.137 [057C] 12:16:54 *** <s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com> <AjayGoel@silicomm.com> 1 4048 00:00:00 OK KZT46754
64.74.98.137 [057C] 12:16:54 >>> 250 2.6.0 4048 bytes received in 00:00:00; Message id KZT46754 accepted for delivery


Noteworthy Points:

  1. The SMTP MAIL-FROM address is a string of characters followed by @bounce.linkedin.com. LinkedIn is using a sub-domain of its high-value business domain, linkedin.com. JangoMail also offers the option to set up a sub-domain based off your corporate domain name. The string of randomly generated characters is presumably there to track any bounce that occurs as a result of this particular transactional email. If the silicomm.com MTA sends a notification back to s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com, the LinkedIn system will be able to tie the identifier present in that email address to the original message and use that information to immediately notify Mary Parker that the email to me bounced. JangoMail also handles bounces to transactional emails, but with a unique identifier in the X-VConfig header of the message as opposed to within the MAIL-FROM address itself. Both options are legitimate for tracking bounces.
  2. The SPF record for bounce.linkedin.com is:

    "v=spf1 redirect=linkedin.com"


    The redirect directive then tells us to look up the SPF record for linkedin.com, which is:

    "v=spf1 ip4:70.42.142.0/24 ip4:208.111.172.0/24 ip4:64.74.220.0/24 ip4:64.74.221.0/26 ip4:64.71.153.211 ip4:64.74.221.30 ip4:69.28.149.0/24 ip4:208.111.169.128/26 ip4:64.74.98.128/26 ip4:64.74.98.16/29 mx ~all"

    The sending IP 64.74.98.137 is covered by the "ip4:64.74.98.128/26" designation, therefore this email passes the SPF test.
  3. Emails with subject line "Invitation to connect on LinkedIn" always have the user's information as the From Name / From Address.
  4. Emails with subject line "Join my network on LinkedIn" sometimes have the From Name / From Address of the user, but other times From Name is "Mary Parker (LinkedIn Invitations)" and From Address is invitations@linkedin.com.

    I have been unable to determine the reason for the ambiguity with the "Join my network on LinkedIn" emails.
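The SPF check in point 2 can be reproduced with a few lines of Python. This is only a minimal sketch: it tests the `ip4:` mechanisms of the linkedin.com record quoted above and ignores `mx`, `redirect`, and every other mechanism that would need a DNS lookup. The function name `matching_ip4_mechanism` is made up for this illustration.

```python
import ipaddress

# SPF record for linkedin.com, as quoted in the post.
SPF_RECORD = (
    "v=spf1 ip4:70.42.142.0/24 ip4:208.111.172.0/24 ip4:64.74.220.0/24 "
    "ip4:64.74.221.0/26 ip4:64.71.153.211 ip4:64.74.221.30 ip4:69.28.149.0/24 "
    "ip4:208.111.169.128/26 ip4:64.74.98.128/26 ip4:64.74.98.16/29 mx ~all"
)


def matching_ip4_mechanism(spf_record, sender_ip):
    """Return the first ip4: mechanism that covers sender_ip, or None.

    Only ip4: mechanisms are evaluated; anything requiring DNS is skipped.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split():
        if term.startswith("ip4:"):
            # A bare address such as ip4:64.71.153.211 is treated as a /32.
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return term
    return None


# The sending IP from the SMTP log above.
print(matching_ip4_mechanism(SPF_RECORD, "64.74.98.137"))
```

A /26 starting at 64.74.98.128 spans addresses .128 through .191, which is why 64.74.98.137 matches that mechanism.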

Friendster:

The following are the relevant headers from a Friendster "new message" notification:

From: Friendster <message-2219486617@mail.friendster.com>
Subject: New Friendster Message from Vikki - 09/07/09 10:17 PM
To: AjayGoel@silicomm.com


Friendster Transactional Email SMTP Log:

209.11.169.65 [03A4] 01:17:58 Connected
209.11.169.65 [03A4] 01:17:58 >>> 220 mail.silicomm.com ESMTP IceWarp 9.1.0; Tue, 08 Sep 2009 01:17:58 -0400
209.11.169.65 [03A4] 01:17:58 <<< EHLO c350a-5.friendster.com
209.11.169.65 [03A4] 01:17:58 >>> 250-mail.silicomm.com Hello c350a-5.friendster.com [209.11.169.65], pleased to meet you.
209.11.169.65 [03A4] 01:17:58 <<< MAIL FROM:<message-2219486617@mail.friendster.com> SIZE=9837
209.11.169.65 [03A4] 01:17:58 >>> 250 2.1.0 <message-2219486617@mail.friendster.com>... Sender ok
209.11.169.65 [03A4] 01:17:58 <<< RCPT TO:<AjayGoel@silicomm.com>
209.11.169.65 [03A4] 01:17:58 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
209.11.169.65 [03A4] 01:17:58 <<< DATA
209.11.169.65 [03A4] 01:17:58 >>> 354 Enter mail, end with "." on a line by itself
209.11.169.65 [03A4] 01:17:58 *** <message-2219486617@mail.friendster.com> <AjayGoel@silicomm.com> 1 10017 00:00:00 OK POZ63258
209.11.169.65 [03A4] 01:17:58 >>> 250 2.6.0 10017 bytes received in 00:00:00; Message id POZ63258 accepted for delivery


Noteworthy Points:
  1. The complete lack of both DomainKeys and DKIM signatures.
  2. The email is SPF compliant. The SPF record for mail.friendster.com is "v=spf1 mx ip4:209.11.168.0/23 mx:c300a-1.gbxsc.friendster.com -all" and the sending IP is 209.11.169.65, which is covered by the "ip4:209.11.168.0/23" directive.
  3. The email is not "from" my friend but instead is from "Friendster". The From Address is not my friend's either, but a unique Friendster email address, in this case message-2219486617@mail.friendster.com. The From Address is the same as the SMTP MAIL-FROM address shown in the SMTP transaction. Therefore bounces at the MTA level will be sent to message-2219486617@mail.friendster.com, and the unique number that's a part of that From Address presumably allows the Friendster mail system to know that it was this particular transactional email that bounced.
  4. There is no Reply-To address, so if the user does hit Reply in his email client, the reply will go to the From Address, message-2219486617@mail.friendster.com, and sending to this will presumably cause the email to be lost in the ether.
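To illustrate point 3, here is a rough sketch of how a bounce processor could recover the identifier from an envelope sender like the one in the log. The address pattern is inferred from this single example, and the function name `message_id_from_bounce` is hypothetical; Friendster's actual bounce handling is not known to me.

```python
import re

# Pattern inferred from message-2219486617@mail.friendster.com in the log above.
BOUNCE_PATTERN = re.compile(r"^message-(\d+)@mail\.friendster\.com$", re.IGNORECASE)


def message_id_from_bounce(address):
    """Extract the numeric identifier from a bounce recipient address.

    Returns None when the address does not follow the assumed pattern.
    """
    cleaned = address.strip().strip("<>")  # tolerate the <...> form used in SMTP
    match = BOUNCE_PATTERN.match(cleaned)
    return int(match.group(1)) if match else None


print(message_id_from_bounce("<message-2219486617@mail.friendster.com>"))
```

Because the identifier travels in the envelope address itself, the receiving system does not need to parse the body of the bounce to know which notification failed.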

MySpace:

The following are the relevant headers from a MySpace "new message" notification:

DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; s=pmta; d=message.myspace.com;h=MIME-Version:From:To:Reply-To:Date:Subject:Content-Type:Content-Transfer-Encoding; i=noreply@message.myspace.com;bh=T25HyvwBLohBcz5zhA0xB0tJqdc=;b=SeqKqNrcg4jDIyy0lmlo2BJje2frO0hgd16E9M/Y7U42LNLPh+cLlr7q5muDN7ASuPzwaNIvjj9VKq9VzNSvnxNukRsF2ICj+jT7GFRHdO39feZXnyJgcGbu0eHGLF7v83PL8CApEF80w/f3HuZ8xZgzYQcagUvNeD2jUmTqj/4=
DomainKey-Signature: a=rsa-sha1; c=nofws; q=dns; s=pmta; d=message.myspace.com;b=SlJszI5cM6w78pK5nSJLzVTrrFw5YgU2vX8GraxP8yPjhbb6/hZ+oMjr7HUF21r8wyK774IEA/o/u3Y6z+kcpRHTf7D3TWD94TCZW+mQDGxl3t+FSdqLLt4ISVUvGu+LTAIMfpSOzVKNnz3C/4QHpAg1c2a2jRu1IFmrPbqOaK0=;
From: MySpace <noreply@message.myspace.com>
To: AjayGoel@silicomm.com
Reply-To: MySpace <noreply@message.myspace.com>
Subject: MaryParker has sent you a new message!


MySpace Transactional Email SMTP Log:

204.16.33.83 [0664] 18:04:01 Connected
204.16.33.83 [0664] 18:04:01 >>> 220 mx.silicomm.com ESMTP IceWarp 9.1.0; Fri, 10 Jul 2009 18:04:01 -0400
204.16.33.83 [0664] 18:04:01 <<< EHLO vmta20.myspace.com
204.16.33.83 [0664] 18:04:01 >>> 250-mx.silicomm.com Hello vmta20.myspace.com [204.16.33.83], pleased to meet you.
204.16.33.83 [0664] 18:04:01 <<< MAIL FROM:<noreply@message.myspace.com> BODY=8BITMIME
204.16.33.83 [0664] 18:04:01 >>> 250 2.1.0 <noreply@message.myspace.com>... Sender ok
204.16.33.83 [0664] 18:04:01 <<< RCPT TO:<AjayGoel@silicomm.com>
204.16.33.83 [0664] 18:04:01 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
204.16.33.83 [0664] 18:04:02 <<< DATA
204.16.33.83 [0664] 18:04:02 >>> 354 Enter mail, end with "." on a line by itself
204.16.33.83 [0664] 18:04:02 *** <noreply@message.myspace.com> <AjayGoel@silicomm.com> 1 5669 00:00:00 OK RDO05602
204.16.33.83 [0664] 18:04:02 >>> 250 2.6.0 5669 bytes received in 00:00:00; Message id RDO05602 accepted for delivery


MySpace is the only one of the four social networking sites to use both the DomainKey-Signature and the DKIM-Signature headers, so kudos to MySpace for considering the entire email ecosystem.

Noteworthy Points:

  1. Both DKIM and DomainKeys headers present.
  2. SPF passes. The SPF record for message.myspace.com is "v=spf1 ip4:63.208.226.0/24 ip4:204.16.32.0/22 ip4:216.178.32.0/20 ip4:216.178.38.0/20 a ~all" and the sending IP is 204.16.33.83, which is included in the "ip4:204.16.32.0/22" directive.
  3. The email is From "MySpace" rather than "Mary Parker", and the From Address is noreply@message.myspace.com instead of maryparker@hotmail.com. The From Address clearly implies that you should not reply to it, unlike Friendster's address. Clearly MySpace wants you to log in to the site to reply rather than by email.
  4. The SMTP MAIL FROM is noreply@message.myspace.com, and does not contain a unique identifier.
  5. The Reply-To header is present, but its value is the same as the From Address. So regardless, if you hit Reply on your email client, the email is going to noreply@message.myspace.com.
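The DKIM-Signature header shown above is a semicolon-separated list of tag=value pairs. As a small sketch, the snippet below splits such a header into a dictionary and derives the DNS name where a verifier would look up the public key. The `b=` value is deliberately shortened in the sample, the function name `parse_dkim_tags` is invented for this example, and no signature verification is attempted.

```python
def parse_dkim_tags(header_value):
    """Split a DKIM-Signature header value into a {tag: value} dict."""
    tags = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first "=" only; base64 values may end in "=" padding.
        name, _, value = part.partition("=")
        # Remove any folding whitespace left inside the value.
        tags[name.strip()] = "".join(value.split())
    return tags


# Tags copied from the MySpace header above; b= is shortened for readability.
SAMPLE = (
    "v=1; a=rsa-sha1; c=relaxed/relaxed; s=pmta; d=message.myspace.com;"
    "h=MIME-Version:From:To:Reply-To:Date:Subject:Content-Type:Content-Transfer-Encoding; "
    "i=noreply@message.myspace.com;bh=T25HyvwBLohBcz5zhA0xB0tJqdc=;b=SeqKqNrcg4jD"
)

tags = parse_dkim_tags(SAMPLE)
# The public key lives in a TXT record at <selector>._domainkey.<domain>.
key_record = f"{tags['s']}._domainkey.{tags['d']}"
print(tags["a"], key_record)
```

The `s` (selector) and `d` (domain) tags are the two values a receiving server needs before it can fetch the key and check the signature.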

Conclusion:

If I had to rank the social networking sites in order of best to worst in sending authenticated, compliant, and useful transactional email, this would be the ranking:

1. Facebook (Best)

Positive: Emails are DKIM and SPF compliant. Email notifications of "You have a new message" include the message in the body of the email.
Negative: No DomainKeys signature, and the Reply-To address is one that cannot receive replies.

2. MySpace

Positive: Emails are SPF, DKIM, and DomainKeys compliant - the only social network to use all three.
Negative: Notifications of "You have a new message" force you to log in to read your message. The Reply-To address is one that cannot receive replies.

3. LinkedIn

Positive: Emails are SPF and DomainKeys compliant. Sometimes emails are "From" the actual friend rather than LinkedIn. Proper use of the Sender header.
Negative: No DKIM signature, which means LinkedIn is complying with the older DomainKeys standard but not the newer DKIM standard. Notifications of "You have a new message" force you to log in to read your message.

4. Friendster (Worst)

Positive: Emails are SPF compliant.
Negative: No DomainKeys signature. No DKIM signature. Notifications of "You have a new message" force you to log in to read your message. The Reply-To address is one that cannot receive replies.

JangoMail Best Practice Recommendations:

When using JangoMail to send transactional emails for a social network, you can optimize the deliverability and usability of your emails by following these guidelines:

1. Ensure your emails are SPF compliant. In most cases, your emails will already be SPF compliant. Whether you're using your-username@jangomail.com as your From Address or noreply@MySocialNetwork.com as your From Address, SPF validates the SMTP MAIL-FROM Address, not the Display From Address that the recipient sees. The SMTP MAIL-FROM Address, meaning the From Address that shows in the SMTP logs, will always be your-username@jangomail.com, unless one of the following is true:

a. You've set up a custom sub-domain, in which case your sub-domain will be reflected in the SMTP MAIL-FROM.
b. You've explicitly set the SMTP MAIL-FROM by unchecking the "Use JangoMail MAIL-FROM" box on the Send Email page.
c. You've explicitly set the UseSystemMailFrom=False option when calling the SendTransactionalEmail API method.
d. You've unchecked the Use JangoMail MAIL-FROM box under My Options --> SMTP Relay, and you are using the smtp relay to send your emails.

2. Ensure your emails are signed with DKIM and DomainKeys. Once you've set up your account to use DKIM, your emails will automatically include both a DKIM and a DomainKeys signature. Read this guide to learn how to configure DomainKeys/DKIM for your emails.

3. Determine whether the content of a "You have a new message" email should appear in the body of the email or whether you want to force the recipient to log in to read the new message. I personally favor Facebook's method of including the original message in the email notification but forcing the recipient to log in to reply. After all, free social networks need to generate page-views, and they can't do that if they allow back-and-forth messaging functionality all within email.

4. Determine whether the From Address and/or Reply-To Address should be that of your social network's or that of your user's. We've seen that in the big four, LinkedIn is the only site that shows the user's name and email address in the From Line. This allows the recipient to conveniently reply to the sender of the message via email. The disadvantage to the owner of the social network is that the recipient need not visit the web site to reply to the message.

Therefore, if you wish to optimize the user experience, set the user's name and email address in the From line of the email message. If you employ this strategy, consider these points:

a. Your emails will still be SPF compliant so long as the SMTP MAIL-FROM contains a single consistent domain and SPF has been set up properly for that single domain.
b. Return Path, a large email deliverability company, recommends that if you send emails in this manner, be sure to include the "Sender" header as well. In JangoMail, you need only contact Support to have the Sender header turned on for your account.
c. There is a small chance that your emails will not be SenderID compliant, since SenderID may validate the Display From Address in addition to the SMTP MAIL-FROM address. However most SPF records for major domains are just that - SPF records and not SenderID records. In fact, all four major social networking sites use ONLY SPF, and not SenderID, as shown by the v=spf1 designation in all four records.

The best option may be to set the From Address to the social network site's address (noreply@MySocialNetwork.com), and set a Reply-To Address equal to the user's email address. The advantage of this setup is that you need not worry about any SPF/SenderID issues, and if the recipient hits Reply, the email will be directed to the user, not the social network web site. It is still important to have the Sender header inserted, however.
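As a rough sketch of that last setup, the following uses Python's standard `email` package to build a notification whose From line belongs to the social network while Reply-To points at the user. The function name and sample values are illustrative only. The Sender header is not set here because, as noted above, JangoMail adds it on request through Support.

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_notification(user_name, user_email, recipient, body):
    """Build a 'new message' notification with a network From and a user Reply-To."""
    msg = EmailMessage()
    # The From line stays on the social network's own domain.
    msg["From"] = formataddr(("MySocialNetwork", "noreply@MySocialNetwork.com"))
    # Replies go straight to the user who sent the original message.
    msg["Reply-To"] = formataddr((user_name, user_email))
    msg["To"] = recipient
    msg["Subject"] = f"{user_name} has sent you a new message!"
    msg.set_content(body)
    return msg


msg = build_notification(
    "Mary Parker", "maryparker@hotmail.com", "AjayGoel@silicomm.com", "Hello!"
)
print(msg["From"])
print(msg["Reply-To"])
```

Keeping the From domain constant means the visible sender never depends on the user's own mail domain, while the Reply-To still gives the recipient a one-click way to answer the actual person.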

Wednesday, September 02, 2009

Data Center Move Part 3: What went right and what went wrong

This is the third and final installment of my account of our data center move. This is also my favorite part to write, because I get to recount everything that went right and wrong during and after the move. And I love learning from my mistakes.

I had six total people, including myself, coordinating this effort - five at the data centers, and one remote man who could test connectivity from the outside and perform tasks related to the move that didn't require being on-site. Our downtime was scheduled from 12:00 AM to 4:00 AM EST on a Saturday night / Sunday morning.

11:00 PM

The five met at the New Data Center at 11:00 PM. I wanted to make sure the New Data Center had the WAN cable drop ready, that our access card worked, that the combinations to the cabinet doors worked, and plan out our access into and out of the building and determine where to park our cars while unloading equipment. The New Data Center came equipped with remote-access Power Distribution Units (PDUs) in both cabinets, one cabinet with 30 Amps and one with 20 Amps.

11:30 PM

Our inspection of the New Data Center was complete, and we drove the eight blocks to the Old Data Center to prepare for the take-down of the core servers: the web server, the database server, and the API server.

11:40 PM

We arrived at the Old Data Center. The primary check we needed to do prior to shutting down the servers was ensuring that the standby database server had up-to-date copies of files from the primary database server. The primary log-ships to the standby server, so we needed to ensure the last of the log files had been restored to the standby server.

12:00 AM

The equipment was transported in three waves:

1. At 12:00 AM, I took the firewall and the main switch to the New Data Center. I wanted to get the firewall appliance connected to the WAN connection from the New Data Center, and plug my laptop in to verify that connectivity to the Internet was working. The team transporting the three core servers would wait for my call from the New Data Center signaling that connectivity was working.

2. After my call to verify connectivity, a team of two (the primary two) would begin dismantling the three core servers (web server, database server, API server), and transport them to the New Data Center.

3. Then the remaining two (the back two) would begin dismantling and transporting the last set of servers, including corporate mail, incoming mail, standby DB server, corporate web server, the rendering servers, and a few other standby servers and appliances.

12:30 AM

Testing connectivity was a critical first step, because the type of connectivity was different from that of the Old Data Center. The New Data Center required a layer 3 routing device, while the Old Data Center did not. The firewall had to be reconfigured with a static IP, a range of DMZ IPs, and a gateway. All DMZ devices would then use the firewall as the gateway. At the Old Data Center, all devices, including the firewall, were assigned IPs from the same class C, and all of them used the same gateway IP, which was provided to us by the Old Data Center. While the new connectivity setup seemed complicated at first, it worked just as expected.

After the firewall and switch were plugged in and connected, I had the sixth man, who was remote, connect to the firewall via web browser and begin changing the Network Address Objects' IP addresses. Simultaneously, he logged into our third-party DNS system and began making DNS changes for the main domain names, like jangomail.com, www.jangomail.com, mail.jangomail.com, and relay.jangosmtp.net.

12:45 AM

After having verified connectivity, I called the core two, and had them initiate take-down and transport of the three core servers.

1:15 AM

Within thirty minutes, the team arrived, and we began rack-mounting the three core servers. We plugged them in, connected them to the switch, but they did not come back online.

This is where we ran into problem #1: While the IPs for the New Data Center had been assigned to the second NIC on each of the core 3 servers, the second NIC on each of the servers was disabled. Each needed to be re-enabled for connectivity to the servers to be established. I needed to login to each server, and activate the NIC, but I forgot the Keyboard-Video-Mouse (KVM) unit at the Old Data Center, and had no way of configuring anything on these three servers.

1:45 AM

I took a vehicle back to the Old Data Center, retrieved the KVM switch, checked in on the back two that were moving the ancillary servers, and drove back to the New Data Center.

2:15 AM

I mounted the KVM switch, and one by one, plugged it into each of the 3 core now-mounted servers, and activated the NIC with the new IP. One problem - I could only activate the NIC on 2 of the 3 servers. On the third, the API server, the keyboard/mouse ports were USB, and our KVM unit doesn't have USB connectors for keyboard/mouse. I had no way of actually configuring the API server. The New Data Center had mentioned the availability of a KVM crash cart on the floor, so I searched for it. Ten minutes later, I found it, locked behind another customer's cage.

2:45 AM

The web server and database server were operational, 75 minutes prior to the end of our downtime window. However, the API is mission critical to 30% of our customers, so I had to find a way to get it up and running.

Unable to fabricate an immediate solution to this problem, I drove back to the Old Data Center to check on the two men dismantling the ancillary servers.

3:00 AM


I pulled up, and they were just loading up the car. All equipment had now been removed from the Old Data Center's racks. I was going to lead the car back to the New Data Center. The car wouldn't start. The car's battery had died while the flashers were on, parked in front of the building. I pulled up alongside the car, we moved the equipment from his vehicle to mine, had a quick chat with the building doorman about not towing the now-defunct car, piled back into my vehicle, and the three of us drove to the New Data Center.

3:20 AM

We loaded up the ancillary equipment on carts, and rolled the carts into the New Data Center. Once the equipment was in the data center, I had my core two men rack-mount the equipment. I let the back two use my laptop, still connected directly to the firewall, to Google for a store that was still open at the hour that would carry jumper cables. They didn't find one, but left anyway in my vehicle in search of jumper cables.

The three of us continued to rack-mount the remaining ancillary equipment. One by one, we turned each device on, and re-configured the NIC for its new IP address.

After all equipment was powered on and connected, I still needed to get the API up. I decided to use the standby database server as my new primary API server, until such time as I could configure the API server with a USB keyboard/mouse.

Post Move - 5:30 AM

The three of us drove back home to Dayton, Ohio.

Post Move - 6:30 AM

I arrived at my home in Dayton and fired up my laptop to do some final testing from the outside. I noticed a few problems:

My BlackBerry wasn't receiving any email.

My BlackBerry receives all my work email by having our corporate mail server, mail.silicomm.com, forward my email to my TMobile BlackBerry email account. The email wasn't being forwarded.

I first remoted into mail.silicomm.com, did an MX lookup on tmo.blackberry.net, which is the T-Mobile BlackBerry domain, and attempted to connect to port 25 on that IP. The connection was made and immediately dropped. I assumed it was due to the new IP for mail.silicomm.com never having sent email before. As a temporary work-around, I had mail.silicomm.com forward my email to my GMail account, and then I had my GMail account forward email to my TMobile BlackBerry account.

Email I was sending wasn't being received.

I sent a test email from mail.silicomm.com to my GMail account, and it wasn't received.

Our corporate email system, IceWarp, has its own DNS settings, separate from the DNS settings in Windows TCP/IP. I hadn't changed the DNS servers of IceWarp to the New Data Center's DNS servers, and therefore, outbound email wasn't being transmitted since the MX lookups couldn't be performed.

JangoMail notification emails weren't being received.

All JangoMail notifications, such as "Sending Complete" and "Import Complete" notifications are sent via the JangoMail SMTP relay service. You can authenticate into the SMTP relay either by IP address or by From Address. For our own system notifications, we authenticate by IP address. We had forgotten to change the authenticated IP address in our own JangoMail account - the one that handles all these customer notifications. Once that was changed, customers received the email notifications.

One to Three Days After Data Center Move

Clients were reporting that the API was still inaccessible.

The API was inaccessible to at least three clients, because they were connecting to the API by IP address, rather than the domain name api.jangomail.com. This was discovered in the 48 hours following the move.
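A simple audit could have flagged this before the move. The sketch below checks whether a configured endpoint URL uses an IP literal instead of a hostname. The URL paths and the 192.0.2.20 address are placeholders from a documentation range, not real JangoMail endpoints, and `uses_ip_literal` is a name made up for this example.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_ip_literal(url):
    """Return True if the URL's host is an IP address rather than a DNS name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as an IP address, so it is a hostname.
        return False
    return True


# Placeholder endpoints for illustration only.
print(uses_ip_literal("https://192.0.2.20/api.asmx"))
print(uses_ip_literal("https://api.jangomail.com/api.asmx"))
```

An endpoint configured by hostname follows a DNS change automatically, while one configured by IP literal breaks as soon as the address changes.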

Clients were reporting that the JangoMail web-database connectivity feature wasn't working.

Clients were unable to use the web-database connectivity feature of JangoMail because they had restricted access to their web servers/database servers by IP address, and they had our Old Data Center IP range in their firewalls. They did not have the New Data Center IPs in their firewalls.

Five Days After Data Center Move

While examining the headers of a forwarded email in my GMail account, I noticed that an email forwarded by mail.silicomm.com to my GMail account had the originating IP listed as 209.173.128.118, which is not the IP for mail.silicomm.com. The IP for mail.silicomm.com is 209.173.141.195, and the reverse lookup for this IP is, appropriately, mail.silicomm.com.

Upon further examination, I discovered that all servers in the DMZ were connecting to the outside as 209.173.128.118, which is the IP of the firewall. If I pointed a web browser on any device to www.whatismyip.com, the result was the same - 209.173.128.118. I was baffled, since all devices had been assigned public, routable IP addresses. I called SonicWall support, but I was declined support because I didn't have a support contract, and they refused to do a per-incident support ticket. I posted on the SonicWall forums, and was told I had to add a Network Address Translation (NAT) setting to keep the firewall from rewriting the source IP of outbound connections to its own IP. Once I did that, the original BlackBerry forwarding issue was resolved. The BlackBerry email server had been dropping the connection immediately because the source IP of the connection, 209.173.128.118, did not have a reverse DNS entry. It was the firewall's IP, so it had never needed an rDNS entry.
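The kind of check a receiving mail server may apply can be sketched as a forward-confirmed reverse DNS test: look up the PTR name for the connecting IP, then confirm that the name resolves back to the same IP. This is a minimal illustration using the standard `socket` module. The lookup functions are passed in as parameters so the logic can be exercised without network access, and what the BlackBerry servers actually checked is an assumption based on the behavior described above.

```python
import socket


def has_matching_rdns(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Return True if ip has a PTR name that resolves back to the same ip."""
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        # No PTR record (socket.herror and socket.gaierror subclass OSError).
        return False
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses


# Example call against live DNS (results depend on current records):
# has_matching_rdns("209.173.141.195")
```

Running a check like this from outside the network for each mail server's public IP would have exposed the firewall's address as the real source of outbound connections.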

Lessons Learned

There were several areas where planning was weak, including planning what equipment would be moved in what phase, and underestimating the impact of switching to new IP addresses. Given the experience I've outlined, I would have done the following differently:
  1. Activated the NIC with the New Data Center IP while still at the Old Data Center.
  2. Made sure to have a USB keyboard/mouse with me.
  3. A week prior to the move, informed all customers that our IP addresses were changing, including the new IP range.
  4. Informed all API customers that anyone connecting directly to our IP address needed to connect to the domain api.jangomail.com instead.
  5. Been more conscious of the JangoMail accounts that we ourselves use and dependencies on IP addresses.
  6. Been more aware that some applications don't inherit the TCP/IP settings of Windows, and have them set manually.
  7. And lastly, I would have made sure at least one vehicle carried jumper cables.

Friday, August 28, 2009

New Video Tutorial: Configuring Gmail with the JangoMail SMTP Relay

We have a new Video Tutorial on how to configure Gmail with the JangoMail SMTP Relay Service:

Video: How to Configure Gmail with the JangoMail SMTP Relay

The JangoMail SMTP Relay improves your sending capabilities by providing the following features for your person-to-person emails:
  • Open Tracking
  • Click Tracking
  • DomainKeys/DKIM Signing
  • Easy web-based access to SMTP Logs
  • API methods to retrieve Reporting data
For more information, screenshots, and written directions, read our post on configuring your Gmail account with the JangoMail SMTP Relay Service.

If you are not a Gmail user, watch our other Video Tutorials on Configuring the JangoMail SMTP Relay with a Desktop Client or a Web Server.

Monday, August 24, 2009

Data Center Move Part 2: The Planning

There were several components to my planning of the data center move:
  1. Connectivity / IP Addresses - We'd be getting a whole new set of IP addresses, so I had to map the old IPs to the new IPs. And, in order for the domains to map to the new IPs as soon as we came live at the new data center, I set all TTLs for important domains, like jangomail.com, to 60 seconds.
  2. Reverse DNS - Because we operate several email servers, I needed to ensure that rDNS would work properly on all of them post-move. So prior to the move, I contacted the support team at the New Data Center to put in rDNS entries for 3 of our IPs, that would map to mail.silicomm.com, mx.jngo.net, and mail.jangomail.com.
  3. Human labor – I’d need to find enough people to move our equipment over in the desired amount of time.
  4. Rails – we were moving from non-Dell cabinets to Dell cabinets. We needed to make sure the rails purchased for our non-Dell cabinets would work in the New Data Center’s Dell cabinets.
  5. Firewall – All of our server access rules are controlled by our firewall, and we’d need to make sure the firewall would be ready, with the rules around the new IP addresses.
  6. Software/OS configuration changes – We would need to ensure that all web servers, email servers, database servers that had services bound to a designated IP would have those IPs changed.

IP Addresses

Prior to the day of the move, I took inventory of all our equipment and the IPs to which they were assigned. I then made an Excel spreadsheet detailing the device, the old IPs assigned to it, and the new IPs that would be assigned to it.
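The same old-to-new mapping can also drive a quick scan of configuration files. The sketch below is illustrative only: the addresses are placeholders from documentation ranges rather than our real IPs, and `remap_ips` is a hypothetical helper. It rewrites every mapped IPv4 address in a block of text and reports any address it found that has no entry in the mapping.

```python
import re

# Placeholder old -> new assignments; the real ones would come from the spreadsheet.
IP_MAP = {
    "192.0.2.10": "198.51.100.10",
    "192.0.2.11": "198.51.100.11",
}

# Match whole dotted quads only, so 192.0.2.1 is not matched inside 192.0.2.10.
IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d|\.\d)")


def remap_ips(text, ip_map):
    """Return (rewritten_text, unmapped), where unmapped is the set of
    IPv4 addresses found in text that have no entry in ip_map."""
    unmapped = set()

    def swap(match):
        ip = match.group(0)
        if ip in ip_map:
            return ip_map[ip]
        unmapped.add(ip)
        return ip

    return IPV4.sub(swap, text), unmapped


config = "bind 192.0.2.10\ndns 192.0.2.99\n"
new_config, unmapped = remap_ips(config, IP_MAP)
print(new_config)
print(unmapped)
```

The list of unmapped addresses is the useful part: it points at settings that reference an IP nobody put in the inventory.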

Reverse DNS

This was relatively simple. Since we don't own our IP addresses, and since we only control forward DNS in our own DNS system, we would rely upon the New Data Center's tech team to setup rDNS for us on 3 of our IPs, each of which would be assigned to a different email server.

Human Labor

I enlisted a team of 6, including myself: 5 doing the actual move, and one man on the outside to test things and remotely make configuration changes to our firewall and DNS system. The goal was to be offline for no more than 4 hours. The plan was simply this:
  1. Within the first hour, I’d take the firewall from the Old Data Center to the New Data Center to test connectivity to the Internet. A team of 2 would begin dismantling the most important servers, the primary web server, the primary database server, and the API server, but not load them into a vehicle until I had called from the New Data Center to verify connectivity was working. If for some reason, connectivity wasn’t working, then we could easily fall back to the Old Data Center.
  2. Within the next two hours, the primary 2, would meet me in the New Data Center with the 3 most critical servers.
  3. During the last hour, the secondary 2 would arrive at the New Data Center with all other remaining equipment.

Rails

Several weeks prior to the move, we brought rails from the Old Data Center to the New Data Center to ensure they were compatible. They were not, but with a lug, washer, and screw, we made them work. We tested mounting a 1U server two weeks prior to the actual move.

Firewall

We only had a single firewall appliance running in the Old Data Center, and I contemplated purchasing a second appliance of the same model prior to the move. That would enable me to have the second firewall configured with the New Data Center’s IPs, and would save time in having to manually re-configure the Old Data Center’s firewall mid-move. But the price tag of the appliance convinced me otherwise, and I determined it would only take 10 minutes of time to swap out the old IPs on the firewall with the new IPs once I was able to connect my laptop to its LAN port in the New Data Center.

A few days prior to the move, we became aware of a major difference between our connectivity setup at the Old Data Center and the setup at the New Data Center. At the Old Data Center, we were simply given a range of IPs, the gateway to use, and DNS servers. All servers and devices, including the firewall, used this configuration.

The connectivity setup at the New Data Center was slightly more complicated, requiring a Layer 3 routing device. An IP would have to be assigned to the WAN port on the firewall, and that WAN port would use the facility’s IP as its gateway. Then, all the devices connected to our DMZ would use the IP that we assigned to our firewall as their own gateways. I was wary of this new setup, and therefore wanted to test connectivity during the move (see step 1 above), prior to reaching a point of no-return with the move. And because the firewall would now serve as the gateway for all of our servers on the DMZ, the firewall now became an essential piece of equipment to our uptime. At the Old Data Center, the firewall could've been removed from the network flow, and the servers would remain online. But that wasn't the case anymore.

Software Configuration Changes

The software services that run JangoMail consist of web servers, FTP servers, SMTP servers, SQL Servers, and some monitoring systems. We have many instances of IIS running, and prior to the move, I documented which applications would need configuration changes because of the new IPs. Many of our IIS web/FTP sites are bound not to a specific IP but to “All Unassigned”, which benefitted us because it spared us from having to manually change the IP that a particular site was bound to. But some services, like our corporate email server (IceWarp), have to have DNS servers specifically set by IP address. IceWarp doesn’t inherit the DNS settings from the NIC’s configuration. It was a given that I’d be changing the IPs, gateway, and DNS servers associated with all NICs on all servers, but I wanted to minimize the number of additional IP changes that would have to be done on top of the NIC changes.

With my planning finished a mere 3 hours before the move, I was ready to go. Stay tuned for Part III, where I’ll detail what went right and what went wrong during the move, and whether we were able to complete the move within our announced downtime window!

Wednesday, August 19, 2009

Data Center Move Part 1: Why I did it

This past Saturday night, we did what I hope we only have to do every 5 or more years - we switched data centers. We had to physically move all of JangoMail's servers out of one data center and into another, while minimizing the impact to our customers. JangoMail was offline during the entire move, which made it a stressful experience. The worst case scenario would be for JangoMail to go offline and never come back online. A lot would have to go wrong to realize the worst case scenario, but still, taking your life’s work offline for multiple consecutive hours while trying to move and reconfigure equipment as fast as possible in the wee hours of a Sunday morning is a large undertaking. I decided to write this article (1) as an account for myself of what we did right and wrong and (2) to help other sysadmins navigate the messy waters of switching data centers.

Why I moved data centers

Because this article isn’t meant to be an endorsement or complaint against any organization, I won’t mention the names of the old data center and the new data center. I’ll simply refer to them as Old Data Center and New Data Center.

I moved because I was unhappy with Old Data Center. Old Data Center has colocation facilities all over the country and if the cabinet space you need is under a certain size, you can’t buy from them directly – you must go through one of their reseller partners. We rent two full cabinets to house our equipment.

My issues with Old Data Center were:

  1. Every 6 months for the last 2 years, they increased their prices.
  2. I was unimpressed with the onsite staff at Old Data Center. Whenever I needed to see them in person for a simple issue of getting a new access card made for a new employee, I’d show up at the assigned time and ALWAYS have to wait for the right person to be available. They weren’t dressed well, nor were they articulate.
  3. 18 months ago, when we expanded from 1 to 2 cabinets, as part of the setup for the second cabinet, a data center employee physically cut the Ethernet cable connecting our main cabinet #1 to the WAN. We were down for 7 hours, and no explanation was provided by Old Data Center as to why that employee thought cutting that cable was a good idea. I requested a credit, and 12 days later Old Data Center responded that we would get a credit of $46.38. I protested, citing the damage that had been done to our business, and the credit was increased to $324.66. Our monthly fee for the one cabinet was approximately $1,200 and it was my opinion that for such a boneheaded mistake, one full month’s credit should’ve been issued.
  4. Setup fees were, in my opinion, too high. When we needed to increase the Amps for our power circuit, along with paying for the increased Amps, a $500 setup fee was incurred. When we wanted to move from regular cabinet doors to mesh cabinet doors for better heat dispersion, we were quoted $750 per cabinet.
  5. On the few occasions when there were connectivity issues with Old Data Center, we would call the reseller’s support number to report the outage, and the reseller would then contact Old Data Center on our behalf to get an explanation. I didn’t like the multiple layers involved.
  6. After we had been a customer of Old Data Center’s reseller for four years and had never been late on an invoice payment, the reseller suddenly informed us that we owed them a security deposit of $2,315. The reason? Apparently Old Data Center had suddenly started charging the reseller a security deposit. Again, I wasn’t pleased with this multi-vendor relationship.
  7. I didn’t feel like a valued customer.

The reasons I picked New Data Center were:

  1. It was a single, independent company that owned, managed, and supported the datacenter.
  2. The monthly fee was less than that of Old Data Center, and already included mesh doors.
  3. New Data Center also provided Internet-connected PDUs in each of our two cabinets.
  4. When I toured New Data Center, all employees were professionally dressed and courteous, gave me their business cards, explained their positions, and offered help above and beyond what I expected.

Stay tuned for Part II, in which I’ll detail the planning of the data center move, and Part III, where I’ll detail the actual move operation.

Thursday, August 13, 2009

Behind the scenes of our SMTP Service

The last couple of days have been a challenging learning experience for the team in charge of our SMTP Relay service. As the volume of email that passes through the relay grows, so do the problems associated with performance and scalability.

The first version of the relay service operated in a single-threaded, linear fashion: all emails sent to relay.jangosmtp.net were processed and sent one at a time. Whenever an email message arrives at relay.jangosmtp.net, the following steps are executed:

1. The originating IP address and From Address are examined to determine what JangoMail user account the email message belongs to.

2. The tracking options for that particular user are loaded.

3. The email message is disassembled, the tracking mechanisms are added, and then the email message is re-assembled.

4. The email message is transmitted to the appropriate JangoMail sending server, based on whether the user account is enrolled in the Sender Score Certified program or based on the domain of the recipient email address.

5. The sending email server receives the message, signs the message with DomainKeys and DKIM, and then transmits the email message to the final destination email server based on the recipient domain.

A traditional SMTP service is much simpler and needs to incorporate only step 1 and part of step 5. Our process can take anywhere from 1/10th of a second to 3 seconds to process and deliver a single email message. The amount of time depends on the size of the email, the encoding of the email, and the tracking options selected.

Tuesday morning, a larger client submitted 50,000 transactional emails to relay.jangosmtp.net. The emails arrived at the relay.jangosmtp.net server over the course of 5 hours. Our linear process went to work, processing and transmitting each email message individually. As a result, email messages from other, smaller customers were delayed, some for several hours. We were alerted to the problem very soon after the backlog of emails began to build. Since all JangoMail staff members use the JangoMail SMTP relay with their desktop email clients, it was easy for them to tell something was wrong -- emails my staff was sending to customers, prospects, and themselves weren't being received. We realized that having a single process, processing each email one by one, just isn't going to cut it for scenarios where a customer transmits a large number of email messages at once.
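
To see why a single sequential worker falls behind, here is a rough back-of-the-envelope sketch using only the figures quoted in this post (50,000 emails, 5 hours, 0.1 to 3 seconds per email). It assumes the emails arrived evenly across the 5 hours, which the post does not state, and the function names are made up for illustration.

```python
# Figures quoted in the post.
EMAILS = 50_000
ARRIVAL_WINDOW_HOURS = 5
FASTEST_SECONDS_PER_EMAIL = 0.1
SLOWEST_SECONDS_PER_EMAIL = 3.0


def arrival_rate_per_second(emails, window_hours):
    # Assumption: arrivals are spread evenly across the window.
    return emails / (window_hours * 3600)


def backlog_after_window(emails, window_hours, seconds_per_email):
    """Messages still waiting when the last one arrives, for one sequential worker."""
    # A worker that is always busy can finish at most this many in the window.
    capacity = (window_hours * 3600) / seconds_per_email
    return emails - min(emails, capacity)


# Average processing time per email at which one worker just keeps up.
break_even = 1 / arrival_rate_per_second(EMAILS, ARRIVAL_WINDOW_HOURS)

backlog_fast = backlog_after_window(EMAILS, ARRIVAL_WINDOW_HOURS, FASTEST_SECONDS_PER_EMAIL)
backlog_slow = backlog_after_window(EMAILS, ARRIVAL_WINDOW_HOURS, SLOWEST_SECONDS_PER_EMAIL)
```

Under these assumptions, a lone worker keeps up only if it averages under about 0.36 seconds per message. At the 3-second end of the range, roughly 44,000 messages would still be queued when the last one arrives, and every other customer's mail waits behind them.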

Within 8 hours, we re-architected the entire process so that multiple threads could operate on groups of email messages simultaneously. We now have the ability to assign a dedicated processing thread to any large customer. Since all threads operate simultaneously, a large amount of email from one customer will now not hold up a small amount of email from another customer.
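
The idea of giving a large customer its own processing thread can be sketched as follows. This is a minimal Python illustration and not the actual JangoMail code, which is not shown in this post: the `handle` function, the `SHARED` queue name, and the customer names are all hypothetical stand-ins.

```python
import queue
import threading

SHARED = "shared"  # hypothetical name for the queue that smaller customers share


def handle(message):
    # Stand-in for adding tracking and relaying; the real steps are not shown here.
    return message.upper()


def worker(inbox, delivered, lock):
    # Each worker drains only its own queue, so a flood in one queue
    # cannot delay messages sitting in another.
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work for this worker
            return
        customer, message = item
        result = handle(message)
        with lock:
            delivered.append((customer, result))


def run_relay(incoming, large_customers):
    """Process (customer, message) pairs with one thread per large customer."""
    queues = {name: queue.Queue() for name in [SHARED, *large_customers]}
    delivered, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(q, delivered, lock))
        for q in queues.values()
    ]
    for t in threads:
        t.start()
    for customer, message in incoming:
        target = customer if customer in large_customers else SHARED
        queues[target].put((customer, message))
    for q in queues.values():
        q.put(None)
    for t in threads:
        t.join()
    return delivered


delivered = run_relay([("big", "a"), ("small", "b"), ("big", "c")], ["big"])
```

In this sketch the order of results across customers is not guaranteed, because the threads run independently. That independence is the point of the design.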

After we deployed the new code Tuesday night, we thought our problems were solved. We awoke Wednesday morning to discover that the multi-threaded process was crashing due to I/O concurrency issues. The constant crashing of the new code resulted in another backlog of undelivered email messages. Within 10 minutes, we discovered that all threads were attempting to log their activities to the same log file, which was causing disk I/O errors. Within 60 minutes, we corrected the issue and deployed the new code, and since then (1:15 PM EST Wednesday), there have been no errors and no backlogs.
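
This post does not describe how the logging fix was made, so the following is only one common way to let many threads write to a single log file safely. It is a sketch in Python, whose standard `logging` module serializes writes through a lock on each handler; the logger name and message text are invented for the example.

```python
import logging
import os
import tempfile
import threading


def build_logger(path):
    # One logger with one file handler, shared by every thread.
    logger = logging.getLogger("relay-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(threadName)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


def log_activity(logger, count):
    for i in range(count):
        logger.info("processed message %d", i)


def run_demo(thread_count=4, per_thread=50):
    """Have several threads log to the same file and return the lines written."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    logger, handler = build_logger(path)
    threads = [
        threading.Thread(target=log_activity, args=(logger, per_thread))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    logger.removeHandler(handler)
    handler.close()
    with open(path) as f:
        lines = f.read().splitlines()
    os.remove(path)
    return lines


lines = run_demo()
```

Each thread's entries arrive as whole lines because the handler takes its lock around every write. An alternative with the same effect would be to give each thread its own log file.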

We appreciate our SMTP service customers' support over the last couple days, especially those that were adversely affected by the backlog of emails. One of the factors that I think makes JangoMail a different type of company is the awesome skill and dedication of our developers. When an architectural or functional issue is discovered, our programmers work until the problem is solved. We react swiftly, coming up with creative solutions to difficult problems within minutes -- because if we don't, the problems get worse: backlogs build, emails stop sending, the system could grind to a halt. The types of development problems we face are generally not the type that can be solved with a Google search. This is because much of JangoMail's technology is unique. We've done things that very few companies in our industry have done, including writing our own Email Rendering Tool, writing our own SMTP sending engine from scratch, and now, introducing the world's first SMTP service with tracking.

I hope this post sheds some light on the inner workings of the JangoMail operation, my staff's commitment to delivering an amazing service, and a little bit of our secret sauce on how the magic happens. If anyone has any questions about the SMTP service, email me at ajay AT us dot jangomail dot com.

Sunday, August 02, 2009

You Can Now Use JangoMail's SMTP Relay with Gmail

Gmail allows users to use a custom From Address to send emails, but some recipients would see both the From Address and the Gmail Sender Address. Messages would appear to recipients as "From username@gmail.com On Behalf Of username@customdomain.com". Gmail announced last week that users can now send with their custom domain through their company's email servers, and that will eliminate the "On Behalf of" message.

This means that you can now use the JangoMail SMTP Service with Gmail to eliminate the "On Behalf of" message and give you the added benefits of Open Tracking, Click Tracking, SMTP logging, and participation in the Sender Score Certified Program.

Configuring Your JangoMail Account

Before configuring Gmail, set up an Authentication method in your JangoMail account.

1. Go to My Options and Settings > SMTP Relay.
2. Choose Authenticate by From Address, click From Addresses and enter the From Address from which you’ll be sending email. You may enter multiple From Addresses.


Configuring Your Gmail Account

Set up Gmail to use the JangoMail SMTP Relay.

1. Sign into your Gmail account.
2. Go to Settings in the upper right corner.
3. Click on the Accounts tab.
4. In the Send mail as: section, click edit info next to the account that you want to change. You should already have your business email address set up to send through Gmail. If you don't, click the Add Another Email Address option and follow the steps provided.
5. Hit the Next Step button.
6. Choose the Send through [your domain] SMTP servers option.
7. For SMTP Server, erase the default address and type in relay.jangosmtp.net
8. For Port, choose 25.
9. Type in your JangoMail Username and Password.
10. Save Changes

Gmail will now send your emails via the JangoMail SMTP Relay instead of its own servers. Enjoy the Open Tracking, Click Tracking, and all the other benefits of our SMTP service.
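
The same relay settings (host relay.jangosmtp.net, port 25, JangoMail username and password) can in principle be used from a script rather than from Gmail. Below is a minimal Python sketch using the standard `smtplib` module. It assumes the relay accepts SMTP AUTH with those credentials, which is inferred from the Gmail steps above rather than confirmed; the addresses and helper names are placeholders.

```python
import smtplib
from email.message import EmailMessage

RELAY_HOST = "relay.jangosmtp.net"  # from the steps above
RELAY_PORT = 25                     # from the steps above


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg, username, password):
    # Assumption: the relay advertises AUTH and accepts these credentials.
    with smtplib.SMTP(RELAY_HOST, RELAY_PORT, timeout=30) as smtp:
        smtp.login(username, password)
        smtp.send_message(msg)


# Placeholder addresses; the From Address should be one you registered
# under My Options and Settings > SMTP Relay.
message = build_message(
    "username@customdomain.com",
    "friend@example.com",
    "Test through the relay",
    "hello from the relay sketch",
)
# send_via_relay(message, "your-jangomail-username", "your-password")
```

The send call is left commented out so the sketch can be read and run without credentials or network access.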


To view Gmail's announcement of this new feature, visit:
http://gmailblog.blogspot.com/2009/07/send-mail-from-another-address-without.html