An internal system that generates configurations — essentially, information that tells other systems how to behave — encountered a software bug and generated an incorrect configuration. The incorrect configuration was sent to live services over the next 15 minutes, caused users’ requests for their data to be ignored, and those services, in turn, generated errors.
Posted by Matt Seber on Saturday, January 25, 2014
Even an Internet giant is not immune to technical difficulties, as shown by a major outage on Friday that disrupted a number of Google services, including Documents, Calendar, Google+ and Gmail.
Although the outage was brief, it caused serious trouble for millions of Google users worldwide.
The company has apologized for the inconvenience and said that its engineers have identified the cause and fixed the problem.
In a statement released Friday evening, Google explained that the errors users encountered in Gmail and other services were caused by an internal software bug that led to users' data requests being ignored.
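In a post on the company's blog, Ben Treynor, Google's VP of Engineering, explained that an internal system that generates configurations hit a software bug, and the faulty configuration it produced was pushed to live services over the following 15 minutes.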
The outage reportedly started Friday at 10:55 AM (PST) and lasted 25 to 55 minutes, affecting as many as 10 percent of Google users worldwide.
During that time, Gmail stopped working in web and mobile applications, as well as in third-party clients such as Apple Mail. Some Google Drive users also reported connection issues, and YouTube appeared sluggish.
Google's main search products and the Google homepage were not affected, however.
The failure of multiple Google services was sudden, brief and unexpected. Nevertheless, Google has assured the public that it is implementing new systems to prevent similar problems from happening again.
"Whether the effect was brief or lasted the better part of an hour, please accept our apologies - we strive to make all of Google's services available and fast for you, all the time, and we missed the mark today," Treynor said.
"The issue has been resolved, and we're now focused on correcting the bug that caused the outage, as well as putting more checks and monitors in place to ensure that this kind of problem doesn't happen again."
Even so, the incident has led many people to question the competence of Google's Site Reliability Engineering team.
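Google has not published the details of the "checks and monitors" Treynor mentions. Purely as an illustration of the general idea, the Python sketch below shows one way a generated configuration could be validated before it replaces the one a live service is using. The `Config` fields, the `validate` rules and the `push_if_valid` helper are all hypothetical and are not taken from Google's systems.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Hypothetical fields; the real configuration format is not public.
    version: int
    backends: tuple[str, ...]
    request_timeout_ms: int


def validate(config: Config) -> list[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    problems = []
    if not config.backends:
        problems.append("no backends listed")
    if config.request_timeout_ms <= 0:
        problems.append("request timeout must be positive")
    return problems


def push_if_valid(config: Config, live: dict[str, Config], service: str) -> bool:
    """Replace the live config for a service only when validation passes."""
    if validate(config):
        # Reject the new config and keep serving with the last known-good one.
        return False
    live[service] = config
    return True
```

In this sketch a configuration that fails validation is simply rejected, so the service keeps running on its previous settings instead of receiving the faulty ones.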