Lessons learned from the AWS disruption

Sharing thoughts and lessons learned from the AWS (us-east-1) disruption, including impact on ScreenshotOne.

Blog post 3 min read

Written by

Dmytro Krasun

Published on

On October 19 and 20, 2025, an AWS service disruption occurred in the N. Virginia (us-east-1) Region. It felt like almost half of the Internet was affected.

What happened?

The AWS team has already shared a detailed official account of what happened:

One of the key AWS services, Amazon DynamoDB, had a DNS failure caused by a software “race condition”: a bug in the DNS automation system left the main endpoint’s DNS record empty, so clients couldn’t connect.

And because many other AWS services (like EC2, Lambda, etc.) depend on DynamoDB and DNS, the failure cascaded: new EC2 instance launches failed, network load balancers misbehaved, and multiple services suffered elevated errors until manual fixes and throttling were applied.

In response, AWS has disabled the faulty automation and is adding extra safeguards to prevent similar issues in the future.

No impact on the ScreenshotOne API

On that day, we were carefully watching all our services and error logs. Luckily, the ScreenshotOne API was not affected at all:

ScreenshotOne API was not affected

Impact on the ScreenshotOne dashboard authentication

The biggest impact was on the ScreenshotOne dashboard authentication process. Users who relied on magic links sent by email could not sign in or sign up because our email provider, Resend, relied on AWS:

Resend status page

Deploying to production failed

Deploying to production relied on pulling images from the Docker Registry, so deployments were failing.

Docker Registry status page

Luckily, there were no critical issues in the production environment that we needed to fix. But deploying a few features for our customers was postponed.

Lessons learned

It happens

Since ScreenshotOne is hosted on DigitalOcean, Google Cloud and partially on Hetzner, we were not affected by the AWS disruption.

But it doesn’t mean that it won’t happen with our providers. Failures happen, and they are a huge part of software engineering.

We were lucky, and that’s it. The question is how to stay unaffected in the future, when we might not be so lucky.

Self-host your Docker registries and CI/CD infrastructure

ScreenshotOne relies on GitHub Actions for CI/CD. It worked great during the disruption, but it might fail, too.

But what really failed was the Docker Registry since it relied on AWS.

Self-hosting our Docker registries and CI/CD infrastructure would prevent that.
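One way to reduce the dependency on a single registry is to push every image to two registries at build time and let the deploy step try them in order. Below is a minimal sketch of that idea, not our actual deploy script. The registry hosts, the image name, and the function names are hypothetical, and it assumes the same image tag exists in every listed registry.

```python
import subprocess
from typing import Callable, Sequence


def docker_pull(image_ref: str) -> bool:
    """Return True if `docker pull` succeeds for the given image reference."""
    try:
        result = subprocess.run(["docker", "pull", image_ref], capture_output=True)
    except FileNotFoundError:
        # The docker CLI is not installed on this machine.
        return False
    return result.returncode == 0


def pull_from_first_available(
    registries: Sequence[str],
    image: str,
    pull: Callable[[str], bool] = docker_pull,
) -> str:
    """Try each registry host in order and return the reference that was pulled."""
    for registry in registries:
        image_ref = f"{registry}/{image}"
        if pull(image_ref):
            return image_ref
    raise RuntimeError(f"could not pull {image} from any of: {', '.join(registries)}")


# Hypothetical usage: a self-hosted registry first, a hosted one as a fallback.
# pull_from_first_available(
#     ["registry.internal.example.com", "docker.io"],
#     "example-org/api:1.2.3",
# )
```

The `pull` parameter is injectable only so that the ordering logic can be tested without a Docker daemon.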

Backup email providers

ScreenshotOne uses Resend for sending emails, and it was unavailable during the disruption. This could be prevented by setting up a few regions in Resend and by adding a backup email provider.

There is one more argument in favor of a backup email provider. A provider might block your account: for example, if you are hacked, attackers might send a lot of spam through it. In that case, you can quickly switch to the backup provider without any impact on your customers and focus your time and energy on fixing the security issue.
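The backup provider idea could look roughly like the sketch below: try the primary provider and, if it raises, move on to the next one. This is an illustration under assumptions, not the ScreenshotOne implementation. `Provider`, `send_with_fallback`, and `EmailDeliveryError` are hypothetical names, and each `send` callable is assumed to wrap a real provider SDK and raise an exception on failure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class EmailDeliveryError(Exception):
    """Raised when every configured provider fails to send the message."""


@dataclass
class Provider:
    name: str
    # Called as send(to, subject, body); expected to raise on failure.
    send: Callable[[str, str, str], None]


def send_with_fallback(
    providers: Sequence[Provider], to: str, subject: str, body: str
) -> str:
    """Try each provider in order and return the name of the one that succeeded."""
    errors = []
    for provider in providers:
        try:
            provider.send(to, subject, body)
            return provider.name
        except Exception as exc:  # a real setup would catch narrower error types
            errors.append(f"{provider.name}: {exc}")
    raise EmailDeliveryError("; ".join(errors))
```

For a magic link, the caller would pass the providers in order of preference, for example `[primary, backup]`, and log the returned name so that a silent switch to the backup is still visible in monitoring.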

Summary

The AWS us-east-1 disruption showed that even large providers can go down, and critical dependencies can break unexpectedly. ScreenshotOne’s core API stayed unaffected thanks to multi-cloud infrastructure, but third-party services relying on AWS, like email authentication and Docker Registry, became unavailable.

The key lessons are to host essential build and deploy infrastructure yourself, to set up backup providers for critical features like email, and to always expect that outages will happen. Resilience requires preparation.
