Auth0 Architecture – Running in Multiple Cloud Providers and Regions

Auth0 provides authentication, authorization and single sign-on services for apps of any type (mobile, web, native) on any stack.

Authentication is critical for the vast majority of apps. We designed Auth0 from the beginning with multiple levels of redundancy. One of these levels is hosting. Auth0 can run anywhere: our cloud, your cloud, or even your own servers. And when we run Auth0, we run it on multiple cloud providers and in multiple regions simultaneously.

This article is a brief introduction to the infrastructure behind Auth0 and the strategies we use to keep it up and running with high availability.

Core Service Architecture

The core service is relatively simple:

  • Front-end servers: these consist of several x-large VMs, running Ubuntu on Microsoft Azure.

  • Store: MongoDB, running on dedicated memory-optimized x-large VMs.

  • Intra-node service routing: nginx

All components of Auth0 (e.g. Dashboard, transaction server, docs) run on all nodes, and all nodes are identical.

Multi-cloud / High Availability

Last week, Azure suffered a global outage that lasted for hours. During that time our HA plan activated and we switched over to AWS.

  • The service runs primarily on Microsoft Azure (IaaS). Secondary nodes are on stand-by, always ready, on AWS.

  • We use Route53 with a failover routing policy and a TTL of 60 seconds. The Route53 health check probes the primary DC; if the probe fails 3 times in a row (at 10-second intervals), Route53 changes the DNS entry to point to the secondary DC. So the maximum downtime in case of a primary failure is ~2 minutes.

  • We use Puppet to deploy on every “push to master”. Using Puppet keeps our configuration and deployment process cloud independent. The Puppet Master runs on our build server (currently TeamCity).

  • MongoDB is replicated often to the secondary DC, and the secondary DC is configured as read-only.

  • While running on the secondary DC, only runtime logins are allowed and the dashboard is set to “read-only mode”.

  • We replicate all the configuration needed for a login to succeed (application info, secrets, connections, users, etc.). We don’t replicate transactional data (tokens, logs).

  • In case of failover, some logging records might be lost. We are planning to improve that by having a real-time replica across Azure and AWS.

  • We use our own version of Chaos Monkey to test the resiliency of our infrastructure.
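
To make the failover timing concrete, here is a minimal Python sketch of the arithmetic behind the Route53 settings listed above (3 failed probes, 10-second interval, 60-second TTL). It is a simplified model, not our actual tooling: the class and function names are hypothetical, and it ignores real-world effects such as resolvers that cache beyond the TTL or delays in health-check evaluation, which is part of why we quote a more conservative ~2 minutes.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    """Hypothetical container for the values quoted in the list above."""
    failure_threshold: int = 3   # consecutive failed probes before failover
    probe_interval_s: int = 10   # seconds between probes
    dns_ttl_s: int = 60          # TTL of the DNS record, in seconds

    def detection_time_s(self) -> int:
        # Time for the health check to see enough consecutive failures.
        return self.failure_threshold * self.probe_interval_s

    def worst_case_downtime_s(self) -> int:
        # Detection time plus one full TTL for cached DNS answers to expire.
        return self.detection_time_s() + self.dns_ttl_s


def active_datacenter(probe_results, policy):
    """Return 'secondary' once enough consecutive probes have failed.

    `probe_results` is an iterable of booleans (True = probe succeeded).
    A successful probe resets the failure counter.
    """
    consecutive_failures = 0
    for ok in probe_results:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= policy.failure_threshold:
            return "secondary"
    return "primary"


policy = FailoverPolicy()
print(policy.detection_time_s())       # 30
print(policy.worst_case_downtime_s())  # 90
print(active_datacenter([True, False, False, False], policy))  # secondary
```

Under this model, detection takes 30 seconds and clients holding a cached answer recover within another 60, so about 90 seconds in total. Two isolated failed probes followed by a success do not trigger a switch, which avoids failing over on a transient blip.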

