Saturday, September 19, 2009

Technical Discussion of How Social Networking Sites Send Transactional Email

Shortly after JangoMail launched its own transactional email platform, we took on a flood of new clients with a need to send "social networking" type transactional emails, including "You have a new message" notifications and "You have a new friend request" notifications. Then we launched our SMTP relay for email marketers, and the floodgates opened.

We're often asked what the best practices are from a technical standpoint, with respect to issues like Content, From Addressing, DomainKeys and DKIM, and SPF/SenderID. Should the content of a message sent from one member to another be included in the email notification, or should the recipient have to log in to the site to read the message? Should a "friend request" email come "from" the friend or "from" the social networking site? Should the email contain Plain Text, or HTML, or both? To best answer these questions, I've analyzed transactional email from the most notable social networking sites -- Facebook, LinkedIn, Friendster, and MySpace. Note that this is an article about the technical aspects of sending transactional email. If you're looking for an article on transactional email strategy (what to send, how often, to whom...), then you won't find that here, but you can in many other places.

The following are the properties of each social networking site's email that I will analyze:
  1. Presence of a DomainKeys header.
  2. Presence of a DKIM header.
  3. Sender Policy Framework (SPF) compliant email.
  4. The From Address
  5. The From Display Name
  6. The SMTP MAIL-FROM Address used in the SMTP transaction
  7. The Sender header, if present
  8. The Reply-To header, if present
  9. Use of a unique identifier in the SMTP MAIL-FROM address
  10. The content, particularly whether a friend's message is present in the Email Body, or whether the user must log in to read the message.
  11. The use of HTML versus Plain Text MIME parts.
Let's get started with the biggest social network of all, Facebook:

Facebook:

When somebody sends me a message on Facebook, I get an email notification with the Subject line:

"Mary Parker sent you a message on Facebook..."

The Body of the email contains the actual message, which is convenient, because it prevents me from having to log in to Facebook to read the message.

From Address: notification+mummiv~d@facebookmail.com
Reply-To: noreply@facebookmail.com
From Name: Facebook
DKIM-Signature: v=1; a=rsa-sha1; d=facebookmail.com; s=q1-2009b; c=relaxed/relaxed;q=dns/txt; i=@facebookmail.com; t=1253415012;h=From:Subject:Date:To:MIME-Version:Content-Type;bh=fLwGefDfPrLKUK2vwWzaC5k5iNo=;b=DfJugajI0Fsc5FIppBQSfSAAiV+WuK1ugzc00USpIkJku3sUgcupO+TPua0Tcxw7qJDUE0yRohPMWz3jPwPfRA==;


A DKIM-Signature header for facebookmail.com is present, but it is noteworthy that a DomainKey-Signature header is not present.
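
To make the signature above easier to read, here is a minimal Python sketch that splits a DKIM-Signature value into its tag=value pairs and derives the DNS name where the public key would be published. The function name parse_dkim_tags is my own for illustration, and the sketch only parses the header; it does not verify the signature.

```python
def parse_dkim_tags(header_value: str) -> dict[str, str]:
    """Split a DKIM-Signature value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first "=" only; base64 values such as b= may end in "=".
        name, _, value = part.partition("=")
        tags[name.strip()] = value.strip()
    return tags


# The Facebook signature exactly as it appears in the email above.
facebook_sig = (
    "v=1; a=rsa-sha1; d=facebookmail.com; s=q1-2009b; c=relaxed/relaxed;"
    "q=dns/txt; i=@facebookmail.com; t=1253415012;"
    "h=From:Subject:Date:To:MIME-Version:Content-Type;"
    "bh=fLwGefDfPrLKUK2vwWzaC5k5iNo=;"
    "b=DfJugajI0Fsc5FIppBQSfSAAiV+WuK1ugzc00USpIkJku3sUgcupO+TPua0Tcxw7qJDUE0yRohPMWz3jPwPfRA==;"
)

tags = parse_dkim_tags(facebook_sig)
signing_domain = tags["d"]            # domain taking responsibility for the message
selector = tags["s"]                  # which of the domain's keys was used
# DKIM public keys are published as TXT records at <selector>._domainkey.<domain>.
key_record_name = f"{selector}._domainkey.{signing_domain}"
signed_headers = tags["h"].split(":")  # headers covered by the signature

print(key_record_name)
print(signed_headers)
```

The d= tag confirms that the signing domain is facebookmail.com, the same domain used in the From Address and the SMTP MAIL-FROM address.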

The SMTP MAIL-FROM address is: notification+mummiv~d@facebookmail.com

Facebook Transactional Email SMTP Log:

69.63.178.170 [03A4] 22:50:13 Connected
69.63.178.170 [03A4] 22:50:13 >>> 220 mail.silicomm.com ESMTP IceWarp 9.1.0; Sat, 19 Sep 2009 22:50:13 -0400
69.63.178.170 [03A4] 22:50:13 <<< EHLO mx-out.facebook.com
69.63.178.170 [03A4] 22:50:13 >>> 250-mail.silicomm.com Hello mx-out.facebook.com [69.63.178.170], pleased to meet you.
69.63.178.170 [03A4] 22:50:13 <<< MAIL FROM:<notification+mummiv~d@facebookmail.com>
69.63.178.170 [03A4] 22:50:13 >>> 250 2.1.0 <notification+mummiv~d@facebookmail.com>... Sender ok
69.63.178.170 [03A4] 22:50:13 <<< RCPT TO:<AjayGoel@silicomm.com>
69.63.178.170 [03A4] 22:50:13 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
69.63.178.170 [03A4] 22:50:13 <<< DATA
69.63.178.170 [03A4] 22:50:13 >>> 354 Enter mail, end with "." on a line by itself
69.63.178.170 [03A4] 22:50:13 *** <notification+mummiv~d@facebookmail.com> <ajay@us.jangomail.com> 1 2021 00:00:00 OK BLS56313
69.63.178.170 [03A4] 22:50:13 >>> 250 2.6.0 2021 bytes received in 00:00:00; Message id BLS56313 accepted for delivery


The SPF record for facebookmail.com is:

"v=spf1 mx ip4:204.15.20.0/22 ip4:69.63.176.0/20 -all"

The connecting IP is 69.63.178.170, which is a part of the "ip4:69.63.176.0/20" directive and therefore this IP is SPF-valid for sending email "FROM" facebookmail.com.
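
The range check above can be reproduced with Python's standard ipaddress module. This is a rough sketch rather than a full SPF evaluator: the helper name ip4_directive_matching is hypothetical, and it looks only at ip4: directives, ignoring mx, a, include, and redirect, all of which would require DNS lookups.

```python
import ipaddress


def ip4_directive_matching(spf_record: str, sender_ip: str):
    """Return the first ip4: directive in spf_record that covers sender_ip, or None.

    Only ip4: directives are examined; everything else in the record is skipped.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split():
        if term.startswith("ip4:"):
            # strict=False tolerates ranges written with host bits set.
            network = ipaddress.ip_network(term[len("ip4:"):], strict=False)
            if ip in network:
                return term
    return None


facebook_spf = "v=spf1 mx ip4:204.15.20.0/22 ip4:69.63.176.0/20 -all"
match = ip4_directive_matching(facebook_spf, "69.63.178.170")
print(match)
```

A /20 starting at 69.63.176.0 spans 69.63.176.0 through 69.63.191.255, which is why 69.63.178.170 falls inside it.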

Noteworthy Points:
  1. It's excellent that Facebook is employing both SPF and DKIM.
  2. The email lacks a DomainKey-Signature header, which would provide backwards compatibility with receiving systems that are not yet DKIM-compliant.
  3. The email is from "Facebook" and not my friend "Mary Parker". This is likely to prevent people from attempting to reply to the email as a way of writing back to "Mary Parker". The only way to reply to the message is to log in to Facebook and use their internal messaging system.
  4. Facebook's transactional emails come from the domain facebookmail.com rather than the more recognizable facebook.com. This is perhaps to separate email reputation from their actual high-value business domain facebook.com. That way, should their emails be treated as spam by a network, the domain name facebookmail.com will be punished, not facebook.com.
  5. The X-Mailer header of the email is:

    X-Mailer: ZuckMail [version 1.00]


    Presumably, ZuckMail is an email system written by, or an homage to, Facebook's founder Mark Zuckerberg.

LinkedIn:

LinkedIn sends transactional email differently than Facebook. While Facebook "friend request" email comes from "Facebook" and a no-reply facebookmail.com email address, LinkedIn's "Join my network" emails usually come "From" the person trying to add you.

The relevant headers from a LinkedIn transactional email are below:

DomainKey-Signature: s=prod; d=linkedin.com; c=nofws; q=dns; h=Sender:Date:From:To:Message-ID:Subject:MIME-Version: Content-Type:X-LinkedIn-fbl;b=KQVq9LfSNIZ+6tOuz7xUaUW1wx86f62EndMqyY+bbkmI8jHQDNe8HdtW ZfJsxH50d8nvAnQHryCdQKGfBEY0u2t5sXkrRmf7yCAhhg4Q9aqBo20KZ LSUvTK726opFnO0;
Sender: messages-noreply@bounce.linkedin.com
From: "Mary Parker" <maryparker@hotmail.com>
To: Ajay Goel <AjayGoel@silicomm.com>


Note that the From Name is my friend "Mary Parker" and the From Address is Mary's email address, maryparker@hotmail.com. Because no Reply-To header is present, clicking "Reply" in my email client will auto-address the message to the From Address, maryparker@hotmail.com.

From an authentication perspective, it's interesting that the LinkedIn transactional email contains the DomainKey-Signature header, but not the more modern DKIM-Signature header. JangoMail transactional emails contain both the DomainKey-Signature and the DKIM-Signature headers, for the highest level of compatibility with all receiving email systems.

LinkedIn Transactional Email SMTP Log:

64.74.98.137 [057C] 12:16:54 Connected
64.74.98.137 [057C] 12:16:54 >>> 220 mail.silicomm.com ESMTP IceWarp 9.1.0; Thu, 03 Sep 2009 12:16:54 -0400
64.74.98.137 [057C] 12:16:54 <<< EHLO mail15-a-aa.linkedin.com
64.74.98.137 [057C] 12:16:54 >>> 250-mail.silicomm.com Hello mail15-a-aa.linkedin.com [64.74.98.137], pleased to meet you.
64.74.98.137 [057C] 12:16:54 <<< MAIL FROM:<s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com> SIZE=3706
64.74.98.137 [057C] 12:16:54 >>> 250 2.1.0 <s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com>... Sender ok
64.74.98.137 [057C] 12:16:54 <<< RCPT TO:<AjayGoel@silicomm.com>
64.74.98.137 [057C] 12:16:54 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
64.74.98.137 [057C] 12:16:54 <<< DATA
64.74.98.137 [057C] 12:16:54 >>> 354 Enter mail, end with "." on a line by itself
64.74.98.137 [057C] 12:16:54 *** <s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com> <AjayGoel@silicomm.com> 1 4048 00:00:00 OK KZT46754
64.74.98.137 [057C] 12:16:54 >>> 250 2.6.0 4048 bytes received in 00:00:00; Message id KZT46754 accepted for delivery


Noteworthy Points:

  1. The SMTP MAIL-FROM address is a string of characters followed by @bounce.linkedin.com. LinkedIn is using a sub-domain of its high-value business domain, linkedin.com. JangoMail also offers the option to set up a sub-domain based off your corporate domain name. The string of seemingly random characters is presumably there to track any bounce that results from this particular transactional email. If the silicomm.com MTA sends a bounce notification back to s-vEJ2DMWLs51nh5KhMMSL430pk-MMh3YJi-KV_N5M-Qy28fihJyfjtc@bounce.linkedin.com, the LinkedIn system can tie the identifier in that email address to the original message and use that information to notify Mary Parker that the email to me bounced. JangoMail also handles bounces to transactional emails, but with a unique identifier in the X-VConfig header of the message as opposed to within the MAIL-FROM address itself. Both options are legitimate for tracking bounces.
  2. The SPF record for bounce.linkedin.com is:

    "v=spf1 redirect=linkedin.com"


    The redirect directive then tells us to look up the SPF record for linkedin.com, which is:

    "v=spf1 ip4:70.42.142.0/24 ip4:208.111.172.0/24 ip4:64.74.220.0/24 ip4:64.74.221.0/26 ip4:64.71.153.211 ip4:64.74.221.30 ip4:69.28.149.0/24 ip4:208.111.169.128/26 ip4:64.74.98.128/26 ip4:64.74.98.16/29 mx ~all"

    The sending IP 64.74.98.137 is covered by the "ip4:64.74.98.128/26" designation, therefore this email passes the SPF test.
  3. Emails with subject line "Invitation to connect on LinkedIn" always have the user's information as the From Name / From Address.
  4. Emails with subject line "Join my network on LinkedIn" sometimes have the From Name / From Address of the user, but other times From Name is "Mary Parker (LinkedIn Invitations)" and From Address is invitations@linkedin.com.

    I have been unable to determine the reason for the ambiguity with the "Join my network on LinkedIn" emails.
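
The two-step SPF lookup described in point 2 can be sketched in Python as follows. The SPF_RECORDS dictionary simply holds the two records quoted in this article and stands in for live DNS TXT lookups, and the function names are hypothetical. The sketch follows redirect= unconditionally, which is adequate here because the bounce.linkedin.com record contains nothing else; a real SPF evaluator consults redirect= only after no other directive has matched.

```python
import ipaddress

# Records as quoted in this article, standing in for live DNS TXT lookups.
SPF_RECORDS = {
    "bounce.linkedin.com": "v=spf1 redirect=linkedin.com",
    "linkedin.com": (
        "v=spf1 ip4:70.42.142.0/24 ip4:208.111.172.0/24 ip4:64.74.220.0/24 "
        "ip4:64.74.221.0/26 ip4:64.71.153.211 ip4:64.74.221.30 ip4:69.28.149.0/24 "
        "ip4:208.111.169.128/26 ip4:64.74.98.128/26 ip4:64.74.98.16/29 mx ~all"
    ),
}


def resolve_redirects(domain, records, max_hops=10):
    """Follow redirect= modifiers until a record without one is reached."""
    for _ in range(max_hops):  # guard against redirect loops
        record = records[domain]
        target = next(
            (t.split("=", 1)[1] for t in record.split() if t.startswith("redirect=")),
            None,
        )
        if target is None:
            return domain, record
        domain = target
    raise RuntimeError("too many SPF redirects")


def covering_ip4(record, sender_ip):
    """Return the first ip4: directive in record that covers sender_ip, or None."""
    ip = ipaddress.ip_address(sender_ip)
    for term in record.split():
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[len("ip4:"):], strict=False):
                return term
    return None


final_domain, final_record = resolve_redirects("bounce.linkedin.com", SPF_RECORDS)
linkedin_match = covering_ip4(final_record, "64.74.98.137")
print(final_domain, linkedin_match)
```

A /26 starting at 64.74.98.128 spans 64.74.98.128 through 64.74.98.191, so the sending IP 64.74.98.137 is covered.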

Friendster:

The following are the relevant headers from a Friendster "new message" notification:

From: Friendster <message-2219486617@mail.friendster.com>
Subject: New Friendster Message from Vikki - 09/07/09 10:17 PM
To: AjayGoel@silicomm.com


Friendster Transactional Email SMTP Log:

209.11.169.65 [03A4] 01:17:58 Connected
209.11.169.65 [03A4] 01:17:58 >>> 220 mail.silicomm.com ESMTP IceWarp 9.1.0; Tue, 08 Sep 2009 01:17:58 -0400
209.11.169.65 [03A4] 01:17:58 <<< EHLO c350a-5.friendster.com
209.11.169.65 [03A4] 01:17:58 >>> 250-mail.silicomm.com Hello c350a-5.friendster.com [209.11.169.65], pleased to meet you.
209.11.169.65 [03A4] 01:17:58 <<< MAIL FROM:<message-2219486617@mail.friendster.com> SIZE=9837
209.11.169.65 [03A4] 01:17:58 >>> 250 2.1.0 <message-2219486617@mail.friendster.com>... Sender ok
209.11.169.65 [03A4] 01:17:58 <<< RCPT TO:<AjayGoel@silicomm.com>
209.11.169.65 [03A4] 01:17:58 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
209.11.169.65 [03A4] 01:17:58 <<< DATA
209.11.169.65 [03A4] 01:17:58 >>> 354 Enter mail, end with "." on a line by itself
209.11.169.65 [03A4] 01:17:58 *** <message-2219486617@mail.friendster.com> <AjayGoel@silicomm.com> 1 10017 00:00:00 OK POZ63258
209.11.169.65 [03A4] 01:17:58 >>> 250 2.6.0 10017 bytes received in 00:00:00; Message id POZ63258 accepted for delivery


Noteworthy Points:
  1. The complete lack of both DomainKeys and DKIM signatures.
  2. The email is SPF compliant. The SPF record for mail.friendster.com is "v=spf1 mx ip4:209.11.168.0/23 mx:c300a-1.gbxsc.friendster.com -all" and the sending IP is 209.11.169.65, which is covered by the "ip4:209.11.168.0/23" directive.
  3. The email is not "from" my friend but instead is from "Friendster". The From Email is also not of my friend, but a Friendster unique email address, in this case message-2219486617@mail.friendster.com. The From Address is the same as the SMTP MAIL-FROM address shown in the SMTP transaction. Therefore bounces at the MTA level will be sent to message-2219486617@mail.friendster.com, and the unique number that's a part of that From Address presumably allows the Friendster mail system to know that it was this particular transactional email that bounced.
  4. There is no Reply-To address, so if the user does hit Reply in his email client, the reply will go to the From Address, message-2219486617@mail.friendster.com, and sending to this will presumably cause the email to be lost in the ether.
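
As an illustration of how a unique identifier in the MAIL-FROM address can be used, here is a minimal Python sketch that recovers the number from the address a bounce was sent to. The pattern is an assumption based on the single Friendster address shown above, and the function name is hypothetical; nothing here is known about how Friendster's own system actually processes bounces.

```python
import re

# Assumed address format, inferred from the one example in this article.
BOUNCE_ADDRESS = re.compile(r"^message-(?P<message_id>\d+)@mail\.friendster\.com$")


def message_id_from_bounce(recipient: str):
    """Return the numeric identifier embedded in a bounce recipient, or None."""
    # SMTP logs wrap addresses in angle brackets, so strip them first.
    match = BOUNCE_ADDRESS.match(recipient.strip().strip("<>"))
    return int(match.group("message_id")) if match else None


bounced_id = message_id_from_bounce("<message-2219486617@mail.friendster.com>")
print(bounced_id)
```

A sender could then look that number up in its own records to find which notification bounced and which member's address needs attention.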

MySpace:

The following are the relevant headers from a MySpace "new message" notification:

DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; s=pmta; d=message.myspace.com;h=MIME-Version:From:To:Reply-To:Date:Subject:Content-Type:Content-Transfer-Encoding; i=noreply@message.myspace.com;bh=T25HyvwBLohBcz5zhA0xB0tJqdc=;b=SeqKqNrcg4jDIyy0lmlo2BJje2frO0hgd16E9M/Y7U42LNLPh+cLlr7q5muDN7ASuPzwaNIvjj9VKq9VzNSvnxNukRsF2ICj+jT7GFRHdO39feZXnyJgcGbu0eHGLF7v83PL8CApEF80w/f3HuZ8xZgzYQcagUvNeD2jUmTqj/4=
DomainKey-Signature: a=rsa-sha1; c=nofws; q=dns; s=pmta; d=message.myspace.com;b=SlJszI5cM6w78pK5nSJLzVTrrFw5YgU2vX8GraxP8yPjhbb6/hZ+oMjr7HUF21r8wyK774IEA/o/u3Y6z+kcpRHTf7D3TWD94TCZW+mQDGxl3t+FSdqLLt4ISVUvGu+LTAIMfpSOzVKNnz3C/4QHpAg1c2a2jRu1IFmrPbqOaK0=;
From: MySpace <noreply@message.myspace.com>
To: AjayGoel@silicomm.com
Reply-To: MySpace <noreply@message.myspace.com>
Subject: MaryParker has sent you a new message!


MySpace Transactional Email SMTP Log:

204.16.33.83 [0664] 18:04:01 Connected
204.16.33.83 [0664] 18:04:01 >>> 220 mx.silicomm.com ESMTP IceWarp 9.1.0; Fri, 10 Jul 2009 18:04:01 -0400
204.16.33.83 [0664] 18:04:01 <<< EHLO vmta20.myspace.com
204.16.33.83 [0664] 18:04:01 >>> 250-mx.silicomm.com Hello vmta20.myspace.com [204.16.33.83], pleased to meet you.
204.16.33.83 [0664] 18:04:01 <<< MAIL FROM:<noreply@message.myspace.com> BODY=8BITMIME
204.16.33.83 [0664] 18:04:01 >>> 250 2.1.0 <noreply@message.myspace.com>... Sender ok
204.16.33.83 [0664] 18:04:01 <<< RCPT TO:<AjayGoel@silicomm.com>
204.16.33.83 [0664] 18:04:01 >>> 250 2.1.5 <AjayGoel@silicomm.com>... Recipient ok
204.16.33.83 [0664] 18:04:02 <<< DATA
204.16.33.83 [0664] 18:04:02 >>> 354 Enter mail, end with "." on a line by itself
204.16.33.83 [0664] 18:04:02 *** <noreply@message.myspace.com> <AjayGoel@silicomm.com> 1 5669 00:00:00 OK RDO05602
204.16.33.83 [0664] 18:04:02 >>> 250 2.6.0 5669 bytes received in 00:00:00; Message id RDO05602 accepted for delivery


MySpace is the only one of the four social networking sites to use both the DomainKey-Signature and the DKIM-Signature headers, so kudos to MySpace for considering the entire email ecosystem.

Noteworthy Points:

  1. Both DKIM and DomainKeys headers present.
  2. SPF passes. The SPF record for message.myspace.com is "v=spf1 ip4:63.208.226.0/24 ip4:204.16.32.0/22 ip4:216.178.32.0/20 ip4:216.178.38.0/20 a ~all" and the sending IP is 204.16.33.83, which is included in the "ip4:204.16.32.0/22" directive.
  3. The email is From "MySpace" rather than "Mary Parker", and the From Address is noreply@message.myspace.com instead of maryparker@hotmail.com. The From Address clearly implies that you should not reply to it, unlike Friendster's address. Clearly MySpace wants you to log in to the site to reply rather than by email.
  4. The SMTP MAIL FROM is noreply@message.myspace.com, and does not contain a unique identifier.
  5. The Reply-To header is present, but its value is the same as the From Address. So regardless, if you hit Reply on your email client, the email is going to noreply@message.myspace.com.
Conclusion:

If I had to rank the social networking sites in order of best to worst in sending authenticated, compliant, and useful transactional email, this would be the ranking:

1. Facebook (Best)

Positive: Emails are DKIM and SPF compliant. Email notifications of "You have a new message" include the message in the body of the email.
Negative: No DomainKeys signature, and the Reply-To address is one that cannot receive replies.

2. MySpace

Positive: Emails are SPF, DKIM, and DomainKeys compliant - the only social network to use all three.
Negative: Notifications of "You have a new message" force you to log in to read your message. The Reply-To address is one that cannot receive replies.

3. LinkedIn

Positive: Emails are SPF and DomainKeys compliant. Sometimes emails are "From" the actual friend rather than LinkedIn. Proper use of the Sender header.
Negative: No DKIM signature, which means LinkedIn is complying with the older DomainKeys standard but not the newer DKIM standard. Notifications of "You have a new message" force you to log in to read your message.

4. Friendster (Worst)

Positive: Emails are SPF compliant.
Negative: No DomainKeys signature. No DKIM signature. Notifications of "You have a new message" force you to log in to read your message. There is no Reply-To header, so replies go to a From Address that presumably cannot receive them.

JangoMail Best Practice Recommendations:

When using JangoMail to send transactional emails for a social network, you can optimize the deliverability and usability of your emails by following these guidelines:

1. Ensure your emails are SPF compliant. In most cases, your emails will already be SPF compliant. Whether you're using your-username@jangomail.com as your From Address or noreply@MySocialNetwork.com as your From Address, SPF validates the SMTP MAIL-FROM Address, not the Display From Address that the recipient sees. The SMTP MAIL-FROM Address, meaning the From Address that shows in the SMTP logs, will always be your-username@jangomail.com, unless one of the following is true:

a. You've set up a custom sub-domain, in which case your sub-domain will be reflected in the SMTP MAIL-FROM.
b. You've explicitly set the SMTP MAIL-FROM by unchecking the "Use JangoMail MAIL-FROM" box on the Send Email page.
c. You've explicitly set the UseSystemMailFrom=False option when calling the SendTransactionalEmail API method.
d. You've unchecked the Use JangoMail MAIL-FROM box under My Options --> SMTP Relay, and you are using the SMTP relay to send your emails.

2. Ensure your emails are signed with DKIM and DomainKeys. Once you've set up your account to use DKIM, your emails will automatically include both a DKIM and a DomainKeys signature. Read this guide to learn how to configure DomainKeys/DKIM for your emails.

3. Determine whether the content of a "You have a new message" email should appear in the Body of the email or whether you want to force the recipient to log in to read the new message. I personally favor Facebook's method of including the original message in the email notification but forcing the recipient to log in to reply. After all, free social networks need to generate page-views, and they can't do that if they allow back-and-forth messaging functionality all within email.

4. Determine whether the From Address and/or Reply-To Address should be that of your social network's or that of your user's. We've seen that in the big four, LinkedIn is the only site that shows the user's name and email address in the From Line. This allows the recipient to conveniently reply to the sender of the message via email. The disadvantage to the owner of the social network is that the recipient need not visit the web site to reply to the message.

Therefore, if you wish to optimize the user experience, set the user's name and email address in the From line of the email message. If you employ this strategy, consider these points:

a. Your emails will still be SPF compliant so long as the SMTP MAIL-FROM contains a single consistent domain and SPF has been set up properly for that single domain.
b. Return Path, a large email deliverability company, recommends that if you send emails in this manner, be sure to include the "Sender" header as well. In JangoMail, you need only contact Support to have the Sender header turned on for your account.
c. There is a small chance that your emails will not be SenderID compliant, since SenderID may validate the Display From Address in addition to the SMTP MAIL-FROM address. However most SPF records for major domains are just that - SPF records and not SenderID records. In fact, all four major social networking sites use ONLY SPF, and not SenderID, as shown by the v=spf1 designation in all four records.

The best option may be to set the From Address to the social network site's address (noreply@MySocialNetwork.com), and set a Reply-To Address equal to the user's email address. The advantage of this setup is that you need not worry about any SPF/SenderID issues, and if the recipient hits Reply, the email will be directed to the user, not the social network web site. It is still important to have the Sender header inserted, however.
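
A message laid out this way can be sketched with Python's standard email package. This is an illustration of the header arrangement only, not JangoMail's API: the function name build_notification and the subject wording are hypothetical, noreply@MySocialNetwork.com is the placeholder address used in this article, and the message carries both a Plain Text and an HTML part.

```python
from email.message import EmailMessage
from email.utils import formataddr

NETWORK_ADDRESS = "noreply@MySocialNetwork.com"  # placeholder address from this article


def build_notification(sender_name, sender_email, recipient, text_body, html_body):
    """Build a notification that is From the network but replies to the member."""
    msg = EmailMessage()
    # The recipient sees the social network in the From line...
    msg["From"] = formataddr(("MySocialNetwork", NETWORK_ADDRESS))
    # ...the Sender header is included as recommended above...
    msg["Sender"] = NETWORK_ADDRESS
    # ...and hitting Reply addresses the member who wrote the message.
    msg["Reply-To"] = formataddr((sender_name, sender_email))
    msg["To"] = recipient
    msg["Subject"] = f"{sender_name} sent you a message"
    # Plain Text part first, then an HTML alternative (multipart/alternative).
    msg.set_content(text_body)
    msg.add_alternative(html_body, subtype="html")
    return msg


notification = build_notification(
    "Mary Parker",
    "maryparker@hotmail.com",
    "AjayGoel@silicomm.com",
    "Mary Parker wrote: Hello!",
    "<p>Mary Parker wrote: Hello!</p>",
)
print(notification["From"], "|", notification["Reply-To"])
```

The SMTP MAIL-FROM address is not part of these headers; it is set separately during the SMTP transaction, which is why the choice of From and Reply-To here does not affect SPF.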

Wednesday, September 02, 2009

Data Center Move Part 3: What went right and what went wrong

This is the third and final installment of my account of our data center move. This is also my favorite part to write, because I get to recount everything that went right and wrong during and after the move. And I love learning from my mistakes.

I had six total people, including myself, coordinating this effort - five at the data centers, and one remote man who could test connectivity from the outside and perform tasks related to the move that didn't require being on-site. Our downtime was scheduled from 12:00 AM to 4:00 AM EST on a Saturday night / Sunday morning.

11:00 PM

The five met at the New Data Center at 11:00 PM. I wanted to make sure the New Data Center had the WAN cable drop ready, that our access card worked, that the combinations to the cabinet doors worked, and plan out our access into and out of the building and determine where to park our cars while unloading equipment. The New Data Center came equipped with remote-access Power Distribution Units (PDUs) in both cabinets, one cabinet with 30 Amps and one with 20 Amps.

11:30 PM

Our inspection of the New Data Center was complete, and we drove the eight blocks to the Old Data Center to prepare for the take-down of the core servers: the web server, the database server, and the API server.

11:40 PM

We arrived at the Old Data Center. The primary check we needed to do prior to shutting down the servers was ensuring that the standby database server had up-to-date copies of files from the primary database server. The primary log-ships to the standby server, so we needed to ensure the last of the log files had been restored to the standby server.

12:00 AM

The equipment was transported in three waves:

1. At 12:00 AM, I took the firewall and the main switch to the New Data Center. I wanted to get the firewall appliance connected to the WAN connection from the New Data Center and plug my laptop in to verify that connectivity to the Internet was working. The team transporting the three core servers would wait for my call from the New Data Center signaling that connectivity was working.

2. After my call to verify connectivity, a team of two (the primary two) would begin dismantling the three core servers (web server, database server, API server), and transport them to the New Data Center.

3. Then the remaining two (the back two) would begin dismantling and transporting the last set of servers, including corporate mail, incoming mail, standby DB server, corporate web server, the rendering servers, and a few other standby servers and appliances.

12:30 AM

Testing connectivity was a critical first step, because the type of connectivity was different from that of the Old Data Center. The New Data Center required a layer 3 routing device, while the Old Data Center did not. The firewall had to be reconfigured with a static IP, a range of DMZ IPs, and a gateway. All DMZ devices would then use the firewall as the gateway. At the Old Data Center, all devices, including the firewall, were assigned IPs from the same class C, and the gateway for all of them was a single IP provided to us by the Old Data Center. While the new connectivity setup seemed complicated at first, it worked just as expected.

After the firewall and switch were plugged in and connected, I had the sixth man, who was remote, connect to the firewall via web browser and begin changing the Network Address Objects' IP addresses. Simultaneously, he logged into our third-party DNS system and began making DNS changes for the main domain names, like jangomail.com, www.jangomail.com, mail.jangomail.com, and relay.jangosmtp.net.

12:45 AM

After having verified connectivity, I called the core two, and had them initiate take-down and transport of the three core servers.

1:15 AM

Within thirty minutes, the team arrived, and we began rack-mounting the three core servers. We plugged them in, connected them to the switch, but they did not come back online.

This is where we ran into problem #1: While the IPs for the New Data Center had been assigned to the second NIC on each of the core 3 servers, the second NIC on each of the servers was disabled. Each needed to be re-enabled for connectivity to the servers to be established. I needed to log in to each server and activate the NIC, but I had forgotten the Keyboard-Video-Mouse (KVM) unit at the Old Data Center, and had no way of configuring anything on these three servers.

1:45 AM

I took a vehicle back to the Old Data Center, retrieved the KVM switch, checked in on the back two that were moving the ancillary servers, and drove back to the New Data Center.

2:15 AM

I mounted the KVM switch and, one by one, plugged it into each of the 3 now-mounted core servers and activated the NIC with the new IP. One problem: I could only activate the NIC on 2 of the 3 servers. On the third, the API server, the keyboard/mouse ports were USB, and our KVM unit doesn't have USB connectors for keyboard/mouse. I had no way of actually configuring the API server. The New Data Center had mentioned the availability of a KVM crash cart on the floor, so I searched for it. Ten minutes later, I found it, locked behind another customer's cage.

2:45 AM

The web server and database server were operational, 75 minutes prior to the end of our downtime window. However, the API is mission critical to 30% of our customers, so I had to find a way to get it up and running.

Unable to fabricate an immediate solution to this problem, I drove back to the Old Data Center to check on the two men dismantling the ancillary servers.

3:00 AM


I pulled up, and they were just loading up the car. All equipment had now been removed from the Old Data Center's racks. I was going to lead the car back to the New Data Center. The car wouldn't start. Its battery had died while the flashers were on as it sat parked in front of the building. I pulled up alongside the car, we moved the equipment from his vehicle to mine, had a quick chat with the building doorman about not towing the now defunct car, piled back into my vehicle, and the three of us drove to the New Data Center.

3:20 AM

We loaded up the ancillary equipment on carts, and rolled the carts into the New Data Center. Once the equipment was in the data center, I had my core two men rack-mount the equipment. I let the back two use my laptop, still connected directly to the firewall, to Google for a store still open at that hour that would carry jumper cables. They didn't find one, but left anyway in my vehicle in search of jumper cables.

The three of us continued to rack-mount the remaining ancillary equipment. One by one, we turned each device on, and re-configured the NIC for its new IP address.

After all equipment was powered on and connected, I still needed to get the API up. I decided to use the standby database server as my new primary API server, until such time as I could configure the API server with a USB keyboard/mouse.

Post Move - 5:30 AM

The three of us drove back home to Dayton, Ohio.

Post Move - 6:30 AM

I arrived at my home in Dayton and fired up my laptop to do some final testing from the outside. I noticed a few problems:

My BlackBerry wasn't receiving any email.

My BlackBerry receives all my work email by having our corporate mail server, mail.silicomm.com, forward my email to my TMobile BlackBerry email account. The email wasn't being forwarded.

I first remoted into mail.silicomm.com, did an MX lookup on tmo.blackberry.net, which is the T-Mobile BlackBerry domain, and attempted to connect to port 25 on that IP. The connection was made and immediately dropped. I assumed it was due to the new IP for mail.silicomm.com never having sent email before. As a temporary work-around, I had mail.silicomm.com forward my email to my GMail account, and then I had my GMail account forward email to my TMobile BlackBerry account.

Email I was sending wasn't being received.

I sent a test email from mail.silicomm.com to my GMail account, and it wasn't received.

Our corporate email system, IceWarp, has its own DNS settings, separate from the DNS settings in Windows TCP/IP. I hadn't changed the DNS servers of IceWarp to the New Data Center's DNS servers, and therefore, outbound email wasn't being transmitted since the MX lookups couldn't be performed.

JangoMail notification emails weren't being received.

All JangoMail notifications, such as "Sending Complete" and "Import Complete" notifications are sent via the JangoMail SMTP relay service. You can authenticate into the SMTP relay either by IP address or by From Address. For our own system notifications, we authenticate by IP address. We had forgotten to change the authenticated IP address in our own JangoMail account - the one that handles all these customer notifications. Once that was changed, customers received the email notifications.

One to Three Days After Data Center Move

Clients were reporting that the API was still inaccessible.

The API was inaccessible to at least three clients, because they were connecting to the API by IP address, rather than the domain name api.jangomail.com. This was discovered in the 48 hours following the move.

Clients were reporting that the JangoMail web-database connectivity feature wasn't working.

Clients were unable to use the web-database connectivity feature of JangoMail because they had restricted access to their web servers/database servers by IP address, and they had our Old Data Center IP range in their firewalls. They did not have the New Data Center IPs in their firewalls.

Five Days After Data Center Move

While examining the headers of a forwarded email in my GMail account, I noticed that an email forwarded by mail.silicomm.com to my GMail account had the originating IP listed as 209.173.128.118, which is not the IP for mail.silicomm.com. The IP for mail.silicomm.com is 209.173.141.195, and the reverse lookup for this IP is, appropriately, mail.silicomm.com.

Upon further examination, I discovered that all servers in the DMZ were connecting to the outside as 209.173.128.118, which is the IP of the firewall. If I pointed a web browser on any device to www.whatismyip.com, the result was the same - 209.173.128.118. I was baffled, since all devices had been assigned public, routable IP addresses. I called SonicWall support, but I was declined support because I didn't have a support contract, and they refused to do a per-incident support ticket. I posted on the SonicWall forums, and was told I had to add a Network Address Translation (NAT) setting to keep the firewall from presenting its own IP on outbound connections. Once I did that, the original BlackBerry forwarding issue was resolved. The BlackBerry email server had been dropping the connection immediately because the source IP of the connection, 209.173.128.118, did not have a reverse DNS entry. Because that is the firewall's IP, it had never needed an rDNS entry.
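
A check like the following, run against each server's outbound IP, might have caught this sooner. It is a minimal Python sketch built on socket.gethostbyaddr from the standard library; the function name reverse_dns_name is hypothetical, and the stub lookup at the bottom stands in for real DNS using only the two IPs mentioned above.

```python
import socket


def reverse_dns_name(ip, lookup=socket.gethostbyaddr):
    """Return the reverse DNS (PTR) hostname for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname


def stub_lookup(ip):
    """Stand-in for DNS that knows only the mail server's PTR record."""
    known = {"209.173.141.195": ("mail.silicomm.com", [], ["209.173.141.195"])}
    if ip not in known:
        raise socket.herror(1, "Unknown host")
    return known[ip]


for outbound_ip in ("209.173.141.195", "209.173.128.118"):
    name = reverse_dns_name(outbound_ip, lookup=stub_lookup)
    print(outbound_ip, "->", name or "no reverse DNS entry")
```

Calling reverse_dns_name(ip) without the lookup argument performs a live query, so the IP to test should be the one the outside world actually sees, not the one configured on the NIC.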

Lessons Learned

There were several areas where planning was weak, including planning what equipment would be moved in what phase, and underestimating the impact of switching to new IP addresses. Given the experience I've outlined, I would have done the following differently:
  1. Activated the NIC with the New Data Center IP while still at the Old Data Center.
  2. Made sure to have a USB keyboard/mouse with me.
  3. A week prior to the move, informed all customers that our IP addresses were changing, including the new IP range.
  4. Informed all API customers that anyone connecting directly to our IP address needed to connect to the domain api.jangomail.com instead.
  5. Been more conscious of the JangoMail accounts that we ourselves use and dependencies on IP addresses.
  6. Been more aware that some applications don't inherit the TCP/IP settings of Windows, and must have them set manually.
  7. And lastly, I would have made sure at least one vehicle carried jumper cables.