Has Research In Motion joined the ever-growing ranks of service providers that put their operations in peril by releasing inadequately tested software? For the second time in a week, BlackBerry users in the Western Hemisphere were without service for several hours on Tuesday. For a service whose stock in trade is rock-solid reliability and security, this is becoming a real business problem.
The official explanation for what went wrong, issued this morning, is neither enlightening nor encouraging:
“A service interruption occurred Tuesday that affected BlackBerry customers in the Americas. Message delivery was delayed or intermittent during the service interruption. Phone service and SMS services on BlackBerry smartphones were unaffected. Root cause is currently under review, but based on preliminary analysis, it currently appears that the issue stemmed from a flaw in two recently released versions of BlackBerry Messenger (versions 5.0.0.55 and 5.0.0.56) that caused an unanticipated database issue within the BlackBerry infrastructure. RIM has taken corrective action to restore service.
“RIM has also provided a new version of BlackBerry Messenger (version 5.0.0.57) and is encouraging anyone who downloaded or upgraded BlackBerry Messenger since December 14th to upgrade to this latest version which resolves the issue. RIM continues to monitor its systems to maintain normal service levels and apologizes for any inconvenience to customers.”
The BlackBerry’s greatest strength is also its greatest weakness. RIM can guarantee security, and, most of the time, reliability, by channeling communications through its network operations centers (NOCs). Other smartphones fetch and send mail by talking directly to mail servers, whether a corporate Exchange server, Hotmail, or a personal ISP account somewhere. BlackBerrys talk only to RIM’s servers, and those servers handle communications with the mail servers. Other data services, including Web browsing, also go through the RIM NOC. The NOC is the bulwark of the BlackBerry system, but it is also a single point of failure.
What is particularly disturbing about the RIM explanation is that the release of a new version of client software was able to bring down the network. RIM doesn’t tell us just how this happened, but it suggests poor programming practice, inadequate testing, or both.
RIM is the only smartphone maker other than Apple to gain market share since the release of the iPhone. RIM’s value proposition, never explicitly stated but always lurking at the back of its marketing, is that while the iPhone may be a lot of fun, the BlackBerry is the serious tool for folks who need to get things done. Two major outages in a week (the system was down for many users for several hours on Dec. 17) badly undercut that message. RIM is lucky that these failures occurred at a time of year when people’s attention to business is at a low ebb and the bad publicity has been relatively minor, but it cannot afford for this sort of thing to go on.