Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are essential to the modern Internet. Encrypting network communications with TLS keeps in-transit data from being exposed to third parties. This is especially important for the web, where TLS secures HTTP traffic (HTTPS) between backend servers and customers’ browsers. TLS is so important that browsers display warnings for insecure pages and search engines lower the rankings of insecure pages. Adoption reflects this: the average percentage of web pages using HTTPS increased from 45% in 2015 to 99% in 2022.
In this blog, we explain how TLS certificates are critical to your infrastructure and offer tips on keeping your certificates up-to-date no matter how large or complex your environment.
Why is TLS adoption so difficult, and what are the risks?
While TLS adoption has gotten easier through initiatives like Let’s Encrypt, it still has its challenges. For one, a TLS certificate is only valid for a set time (called the validity period). Security teams must request new certificates and roll them out to replace the existing ones before those expire. If a certificate expires, customers will see an alarming security warning when trying to access your website or service.
Additionally, organizations often have multiple certificates in rotation for different services. Security teams need to track which certificates are in use, where they’re deployed, and when they’re due for renewal, which creates logistical overhead. As your environment's size and complexity grow, so too does the risk of a certificate expiring and bringing down a critical service. For example, in 2021, Shopify narrowly avoided a global outage due to an outdated root certificate in a third-party library.
To make matters worse, certificate renewal is an infrequent and often overlooked maintenance task. Depending on the validity period, renewals happen only once every few months to once every few years, so engineers are likely to forget about them. Teams often automate renewals, but this set-and-forget approach makes it even more likely that they'll miss upcoming renewal problems. This creates several risks, such as:
- Automated renewal notifications falling through the cracks or getting ignored.
- Security team personnel changing and losing track of certificate rotations and ownership.
- Changes in the validity period that aren't accounted for in automation.
Since certificates are time-sensitive, and different certificates can expire at different times, there needs to be a way to continuously check for expiring certificates across multiple services. But how do we test whether a certificate is expiring soon, and how do we make sure all of our certificates are covered? Fortunately, Gremlin has a solution.
Detecting and testing for expiring certificates
The first question is often the one teams have the most trouble with: how do you know what certificates are in your environment? Tracking certificates can be extremely difficult in large environments, as different services (both internal and external) likely have different certificate renewal processes, dates, owners, and even certificate authorities.
One way to approach this is to build a catalog of all services with associated TLS certificates, then test each certificate in turn. This is the most direct approach, but it also has the most drawbacks:
- Large environments may have hundreds, thousands, or even tens of thousands of services. Cataloging all of them could take significant time and effort.
- Services change constantly. What happens when someone forgets to update the service catalog after adding, retiring, or renaming a service?
- Different certificates will have different renewal dates. How do you keep them all up-to-date? How do you pinpoint certificates that are expiring soon?
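To make the catalog approach concrete, here is a minimal Python sketch of the "pinpoint certificates that are expiring soon" step. It assumes you have already recorded each certificate's expiry date by hand or with a separate tool. The `CatalogEntry` type, the `expiring_within` function, the service and owner names, and the 30-day window are all hypothetical choices made for illustration, not part of any Gremlin product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CatalogEntry:
    """One row of a hand-maintained service catalog (hypothetical shape)."""
    service: str
    owner: str
    expires_at: datetime  # certificate expiry, timezone-aware UTC


def expiring_within(catalog, now, window=timedelta(days=30)):
    """Return entries whose certificates expire within `window` of `now`,
    soonest first. Already-expired certificates are included as well."""
    due = [entry for entry in catalog if entry.expires_at - now <= window]
    return sorted(due, key=lambda entry: entry.expires_at)


# Illustrative data only; these services and dates are made up.
catalog = [
    CatalogEntry("checkout", "payments-team", datetime(2024, 3, 1, tzinfo=timezone.utc)),
    CatalogEntry("auth", "identity-team", datetime(2024, 1, 20, tzinfo=timezone.utc)),
    CatalogEntry("search", "search-team", datetime(2024, 1, 5, tzinfo=timezone.utc)),
]

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
for entry in expiring_within(catalog, now):
    print(f"{entry.service} ({entry.owner}) expires {entry.expires_at:%Y-%m-%d}")
```

The sketch also shows where the drawbacks come from: the result is only as accurate as the catalog, so a service that was never added, or an expiry date that was never updated after a renewal, silently drops out of the report.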
Gremlin Reliability Management (RM) helps solve all of these issues. With Gremlin RM, you define your services in the Gremlin web app, and Gremlin automatically detects any network dependencies that the service communicates with. For each dependency, you can run a Certificate Expiry test, which retrieves the dependency's certificate chain and tests each entry to see when it will expire. The test reports a failure if a certificate expires within the next 30 days.
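For intuition about what an expiry check involves, here is a rough Python sketch using only the standard library. It is not Gremlin's implementation. It inspects only the leaf certificate that the server presents, whereas the Certificate Expiry test described above examines every entry in the chain. The function names are hypothetical, and the 30-day window simply mirrors the threshold mentioned above.

```python
import socket
import ssl
from datetime import datetime, timedelta, timezone

WARNING_WINDOW = timedelta(days=30)  # mirrors the 30-day threshold in the text


def time_remaining(not_after: str, now: datetime) -> timedelta:
    """Time left before the certificate's notAfter date.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan 15 00:00:00 2024 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return expires - now


def expires_soon(not_after: str, now: datetime,
                 window: timedelta = WARNING_WINDOW) -> bool:
    """True if the certificate expires within `window` (or already has)."""
    return time_remaining(not_after, now) <= window


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the leaf certificate's notAfter string.

    The default context verifies the certificate, so a certificate that has
    already expired raises ssl.SSLCertVerificationError during the handshake.
    Callers should treat that exception as a failed check too.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

To check a live endpoint, you could call `expires_soon(fetch_not_after("example.com"), datetime.now(timezone.utc))`, where `example.com` is a placeholder hostname. Keeping the date arithmetic separate from the network call makes the threshold logic easy to test without a live connection.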
Gremlin lets you auto-schedule tests to run weekly. This means that every week, you'll have an up-to-date status on soon-to-expire certificates and can act with ample time to spare.
Conclusion
Staying ahead of expiring certificates is vital for keeping your websites and services accessible and secure. Running a Certificate Expiry test lets you quickly and safely test your certificates in any environment, whether your websites are hosted on AWS, GCP, Azure, or on-premises.
If you'd like to start running Certificate Expiry tests on your services, sign up for a free Gremlin RM trial. If you're interested in learning how to test for expiring TLS certificates using Gremlin Fault Injection (FI), see our tutorial here.