Amazon S3 Outage: Our Response

What happened?
On February 28, 2017 at 12:37 PM EST, Veoci users started experiencing problems accessing files stored in Veoci and uploading new files into Veoci. The team triaged the issue and restored nearly all file functionality at approximately 2:30 PM EST.

How did this happen?
One of Amazon’s system administrators accidentally entered an incorrect command which resulted in the complete shutdown of many Amazon servers, thus disabling Amazon S3 services in the US East Region (Virginia) and affecting thousands of customers.  Amazon’s summary can be found here: https://aws.amazon.com/message/41926/

Why did this impact Veoci?
Amazon Simple Storage Service (Amazon S3) is an internet storage service for storing and retrieving any amount of data from anywhere on the web. It is split up among 8 storage locations around the world. Veoci uses Amazon S3 storage in the US East Region in Northern Virginia and the US West Region in Oregon. All Veoci files are stored across several servers in Virginia and are automatically replicated to Oregon. Because Veoci normally reads and writes files through the Virginia location, the outage there interrupted file operations even though replicated copies existed in Oregon.

What Veoci functionality was impacted?
Most Veoci use was unaffected; however, uploading files, downloading files, importing via Excel, exporting to Excel, emailing attachments to Rooms, and submitting entries into public Forms with attachment fields were inhibited.

How did the Veoci team respond?
Veoci’s technical development team agreed to wait a short time after the outage began to see if Amazon would recover quickly. After about 90 minutes without a resolution from Amazon, the Veoci team decided at 2:00 PM EST to modify our configuration so that Veoci’s files would be accessed and stored directly in our Oregon location. This failover was fully in place at 2:30 PM EST. At that time, the only functionality still impacted was access to some previous file versions.
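For readers who want a concrete picture of a configuration-based failover, here is a minimal sketch. It is not Veoci's actual configuration, which this post does not describe. The class and function names are hypothetical, and the region identifiers are the standard AWS names for Northern Virginia and Oregon, assumed here from the locations named above.

```python
from dataclasses import dataclass, replace

# Standard AWS region identifiers for the two locations named in this post.
# Their use in Veoci's configuration is an assumption made for illustration.
VIRGINIA = "us-east-1"
OREGON = "us-west-2"


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical setting: which location the application reads and writes."""
    primary: str = VIRGINIA
    replica: str = OREGON
    active: str = VIRGINIA


def fail_over(config: StorageConfig) -> StorageConfig:
    """Point file access and storage at the replica location."""
    return replace(config, active=config.replica)


def fail_back(config: StorageConfig) -> StorageConfig:
    """Return file access and storage to the primary location."""
    return replace(config, active=config.primary)


normal = StorageConfig()
during_outage = fail_over(normal)          # active location is now Oregon
after_recovery = fail_back(during_outage)  # active location is Virginia again
```

A switch like this is deliberate and manual: someone has to decide to flip it, which is why the failover took effect some time after the outage began.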

Amazon S3 recovered from the outage at 4:54 PM EST. To ensure that Amazon had fully recovered, we kept Veoci’s failover to Oregon in place for approximately two days, during which it supported almost all normal operations. On March 1, 2017 at 11:50 PM EST, our team reconfigured file access and storage back to the Virginia site, restoring all normal operations.

How did the Veoci team communicate this issue?
Veoci’s primary tool for delivering outage notifications, Mailchimp, was also impacted by the Amazon S3 outage. After exploring other communication options, we emailed all Organization Administrators at 2:03 PM EST. An update was also sent to the same audience at 2:52 PM EST.

Next Steps
To further improve our recovery method, we are currently working to configure all Veoci files to be automatically stored and accessed via both the East and West Amazon S3 Buckets. This way, should the Virginia location encounter another issue, files will automatically be sent to Oregon with no failover process needed.
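The idea of storing and accessing files through both locations can be sketched roughly as follows. This is an illustration only, not our implementation: every class name is hypothetical, and the in-memory store stands in for a real bucket so the example runs without an AWS account.

```python
from typing import Dict, List


class StoreUnavailable(Exception):
    """Raised when a storage location cannot be reached."""


class InMemoryStore:
    """Stand-in for one bucket; a real version would wrap an S3 client."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._files: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise StoreUnavailable(self.name)
        self._files[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise StoreUnavailable(self.name)
        return self._files[key]  # raises KeyError if the file is absent


class DualRegionStore:
    """Writes to every reachable location and reads from the first that answers."""

    def __init__(self, east: InMemoryStore, west: InMemoryStore) -> None:
        self.stores = [east, west]

    def put(self, key: str, data: bytes) -> List[str]:
        written = []
        for store in self.stores:
            try:
                store.put(key, data)
                written.append(store.name)
            except StoreUnavailable:
                continue  # skip the unreachable location
        if not written:
            raise StoreUnavailable("no storage location reachable")
        return written

    def get(self, key: str) -> bytes:
        for store in self.stores:
            try:
                return store.get(key)
            except (StoreUnavailable, KeyError):
                continue  # try the next location
        raise LookupError(f"{key} is not available in any location")


east = InMemoryStore("virginia")
west = InMemoryStore("oregon")
files = DualRegionStore(east, west)

files.put("report.xlsx", b"v1")        # stored in both locations
east.available = False                 # simulate an outage in Virginia
data = files.get("report.xlsx")        # served from Oregon
written = files.put("new.pdf", b"v1")  # stored in Oregon only
```

One consequence of this design is that files written during an outage exist in only one location, so a real system would also need a step to copy them to the other location once it recovers.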

Veoci is also implementing a capability for our team to broadcast an Incident Alert to all users directly within Veoci, removing our reliance on Mailchimp for distributing timely communications.

Article written by

Julie Reynolds is the Marketing and Communications Specialist at Veoci. She is a recent University of Connecticut graduate and specializes in writing, communication, public relations, marketing and celebrity gossip.

Born and raised in New Haven, she especially loves the fast-paced lifestyle and cultural diversity of the city. She hopes to continue to develop a career where she is constantly meeting new people and bettering her writing. Go Huskies!
