Transport Layer Security (TLS) is an essential component of the modern Internet. It encrypts the connections between users, organizations, and systems, preventing data in transit from being exposed to third parties. Because of this, it's critical for teams to not only implement TLS for their services, but also make sure that it's working properly.
In this tutorial, we'll show you how to use Gremlin Fault Injection (FI) to test your services to detect expiring TLS certificates. We'll use the Time Travel attack to adjust the clock of our system, then retrieve our TLS certificate. Our system will compare the TLS expiration date to its clock, and if the clock is within the certificate's validity period, it will be considered valid.
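This tutorial will show you how to run a Time Travel attack to check whether a TLS certificate is close to expiring, chain several Time Travel attacks into a Scenario, and automate the test with the Gremlin REST API.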
Before starting this tutorial, you'll need:
To install Gremlin, follow the installation guide in our documentation. Once your host(s) appear in the Gremlin web app, continue on to the first step.
Earlier, we mentioned how devices check a certificate's expiration date against the current date and time to verify the certificate's validity. We'll use Time Travel to change the system clock on our host, letting us shift seconds, minutes, hours, days, or even years into the future. We'll move our system forward, send a request to a website, and if we receive an expiration error, we know that our certificate is expiring soon.
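The comparison described above can also be sketched locally with OpenSSL, which helps when reasoning about what a given clock offset should reveal. This is a minimal sketch, not part of the Gremlin workflow: the function name cert_valid_after is hypothetical, and it assumes the certificate has already been saved to a PEM file. It relies on the `-checkend` option of `openssl x509`, which exits with status 0 only if the certificate will still be valid the given number of seconds from now.

```shell
# cert_valid_after PEM_FILE OFFSET_SECONDS
# Hypothetical helper: succeeds (exit status 0) if the certificate in
# PEM_FILE will still be valid OFFSET_SECONDS from the current clock time.
cert_valid_after() {
  # -checkend N exits 0 if the certificate does not expire within N seconds.
  openssl x509 -in "$1" -noout -checkend "$2" > /dev/null
}

# Example usage (not executed here; assumes the host is reachable):
#   openssl s_client -connect gremlin-demo-lab-host:443 \
#     -servername gremlin-demo-lab-host < /dev/null 2> /dev/null \
#     | openssl x509 -outform PEM > cert.pem
#   cert_valid_after cert.pem 2678400 || echo "Expires within 31 days"
```

Unlike Time Travel, this only inspects the certificate itself. It does not exercise how the client, the service, or anything in between behaves when the clock actually moves.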
To demonstrate what a healthy certificate looks like, let’s run a curl request against our web application:
    curl -I https://gremlin-demo-lab-host/

    HTTP/2 200
    cache-control: max-age=604800
    content-length: 543958
    content-type: text/plain
    date: Mon, 18 Jan 2021 23:06:18 GMT
    ...
Now let’s try sending a request to a website with an expired certificate:
    curl -I https://expired.badssl.com/
We'll see that curl reports back an SSL certificate error. This is what we should look for when considering whether the test has passed or failed:
    curl: (60) SSL certificate problem: certificate has expired
    More details here: https://curl.se/docs/sslcerts.html
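The number in parentheses is curl's exit status, so a script can tell a certificate problem apart from other failures such as a DNS or connection error. The sketch below shows one way to do that. The function name classify_curl_exit and its labels are hypothetical, and it treats every non-zero status other than 60 as an unrelated failure.

```shell
# classify_curl_exit EXIT_STATUS
# Hypothetical helper: maps a curl exit status to a short label.
# 60 is the status curl reports for a certificate verification problem,
# matching the "(60)" in the error message above.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    60) echo "certificate-problem" ;;
    *)  echo "other-failure" ;;
  esac
}

# Example usage (not executed here):
#   curl -sI https://gremlin-demo-lab-host/ > /dev/null
#   classify_curl_exit "$?"
```

Checking for status 60 specifically avoids mistaking a network outage during the experiment for an expiring certificate.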
Now that we know what to look for, let’s perform our test. Our hypothesis is that if we set our system clock forward (e.g. by one month) and send a curl request, we’ll see a successful response. But if curl returns a certificate error, then we know the certificate will expire within that period (in this example, within the next month).
We’ll log into the Gremlin web app, create a new attack, and select our test host, which is shown here as "gremlin-demo-lab-host":
Next, we’ll expand the State category and select Time Travel. We’ll keep the length of the experiment set to 60 seconds, block NTP (Network Time Protocol) communication so that our host doesn’t automatically update to the correct time, and set the offset to 2,678,400 seconds (31 days), or roughly one month from now.
Tip: To calculate the offset, use a date/time conversion tool such as the ones provided by timeanddate.com.
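For whole-day offsets, the arithmetic is simple enough to do in the shell: one day is 86,400 seconds. The helper below is a small sketch with a hypothetical name, days_to_seconds, and it assumes fixed 24-hour days rather than calendar months.

```shell
# days_to_seconds DAYS
# Hypothetical helper: converts a whole number of days into an offset in
# seconds, assuming each day is exactly 24 * 60 * 60 = 86400 seconds.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

# Example usage:
#   days_to_seconds 1    # one day
#   days_to_seconds 7    # one week
#   days_to_seconds 31   # the 31-day "one month" offset used in this tutorial
```

Running `days_to_seconds 31` gives 2678400, the offset used in this tutorial.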
Now let’s run the test. While it's running, let’s re-run curl:
    curl -I https://gremlin-demo-lab-host/

    curl: (60) SSL certificate problem: certificate has expired
    More details here: https://curl.se/docs/sslcerts.html
Curl returned an error, meaning that our certificate is going to expire within the next month. We’ll click the Halt button in the top-right corner of the Gremlin web app to halt the experiment, which automatically reverts the system clock to the correct time. We’ll record our observations in Gremlin, then work on replacing our certificate. Using Time Travel allowed us to catch this before it became a problem for our customers, while doing so in a safe and controlled way.
With a Scenario, we can run multiple Time Travel attacks back-to-back and increase the interval each time. This lets us test over multiple time periods during a single experiment.
To create a Scenario, we’ll click on our previously run Time Travel attack to open the Attack Details page. From here, we’ll click Create Scenario. We’ll call our Scenario "SSL/TLS certificate expiration" and enter a description.
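Next, we’ll set the offset of the first attack to 86,400 seconds (one day), so that each stage reaches further into the future than the one before it. Then we’ll click “Add a recent attack”, re-select our previous Time Travel attack, and choose our test host. We’ll change the offset for this second attack to 604,800 seconds (one week). We’ll repeat this step to create a third attack, then change its offset to 2,678,400 seconds (31 days, or roughly one month).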
While the Scenario is running, we’ll run curl in a continuous loop. In the following script, curl makes a request, and if the request is successful, it waits 10 seconds before repeating it. If curl fails, it exits the loop and prints the failure to the console. This script also prints the current system time before each check, so we can see which stage of the Scenario was active when curl failed.
    #!/bin/bash
    while :; do
      echo $(date)
      curl -s https://gremlin-demo-lab-host/ > /dev/null
      if [[ "$?" -ne 0 ]]; then
        break
      fi
      sleep 10
    done
    echo "Failed to connect."
Now let’s run the Scenario and start our script. Once the Scenario hits stage 2, curl returns an error and the loop exits. This tells us our TLS certificate will expire between one day and one week from now.
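The reasoning here generalizes: if the first failure happens at stage N, the certificate expires after the offset of stage N-1 (or after the present moment, for stage 1) and before the offset of stage N. The sketch below encodes that rule. The function name expiry_window is hypothetical, and it assumes the offsets are passed in ascending order, one per Scenario stage.

```shell
# expiry_window FAILED_STAGE OFFSET_SECONDS...
# Hypothetical helper: given the 1-based stage at which curl first failed
# and the ascending list of stage offsets, prints the window in which the
# certificate expires.
expiry_window() {
  local failed="$1"
  shift
  if [ "$failed" -lt 1 ] || [ "$failed" -gt "$#" ]; then
    echo "stage $failed is out of range" >&2
    return 1
  fi
  local lower=0
  local upper=0
  local stage=1
  local offset
  for offset in "$@"; do
    if [ "$stage" -eq "$failed" ]; then
      upper="$offset"
      break
    fi
    # This stage passed, so the certificate outlives this offset.
    lower="$offset"
    stage=$((stage + 1))
  done
  echo "expires between ${lower}s and ${upper}s from now"
}

# Example usage, for a failure at stage 2 of this Scenario:
#   expiry_window 2 86400 604800 2678400
```

For the failure at stage 2 described above, this prints a window of 86400s to 604800s, that is, between one day and one week.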
If you have a Gremlin account, you can also run a pre-configured version of this Scenario. Click "Run Scenario" to open the Recommended Scenario in the Gremlin web app, click "Add targets and run" to select the hosts you want to run the attack on, then run the Scenario.
The Gremlin REST API provides a RESTful interface for performing actions in Gremlin, such as starting attacks and Scenarios. By using the REST API in our test script, we can automatically initiate our Scenario.
First, let’s reopen our executed Time Travel Scenario in the web app. Click Rerun, then in the bottom-right corner of the page, click Gremlin API Examples. This generates a full curl request that we can use to initiate the attack:
    curl -i -X POST 'https://api.gremlin.com/v1/scenarios/<your Scenario ID>/runs?teamId=<your team ID>' \
      -H 'Content-Type: application/json;charset=utf-8' \
      -H 'Authorization: Bearer <your bearer token>' \
      -d '{}'
Next, we’ll copy this command into our curl script and add it just before the loop. We can add a second API command after the end of the loop to halt the experiment if curl fails. This lets us safely roll back after detecting an expired certificate without having to open the Gremlin web app and halt the experiment ourselves. Make sure to replace <your Scenario ID>, <your team ID>, and <your bearer token> with your own values:
    #!/bin/bash

    # Start the Scenario
    RUN=$(curl -X POST 'https://api.gremlin.com/v1/scenarios/<your Scenario ID>/runs?teamId=<your team ID>' \
      -H 'Content-Type: application/json;charset=utf-8' \
      -H 'Authorization: Bearer <your bearer token>' \
      -d '{}')

    # Test your website(s)
    while :; do
      echo $(date)
      curl -s https://gremlin-demo-lab-host/ > /dev/null
      if [[ "$?" -ne 0 ]]; then
        break
      fi
      sleep 10
    done

    # Halt the Scenario
    curl -X POST 'https://api.gremlin.com/v1/scenarios/halt/<your Scenario ID>/runs/'"$RUN"'?teamId=<your team ID>' \
      -H 'Content-Type: application/json;charset=utf-8' \
      -H 'Authorization: Bearer <your bearer token>' \
      -d '{}'
Now we have a fully scripted chaos experiment that we can schedule using a service like cron, add to our CI/CD pipeline, or run as part of our client-side testing suite.
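When the script runs unattended under cron or in a CI/CD pipeline, it helps if the check loop stops on its own and reports the outcome through its exit status, since the loop in our script only ends when curl fails. The sketch below shows one possible shape for that. The function name poll_until_failure and its parameters are hypothetical, and the check command is passed in as arguments so the same loop can wrap any test.

```shell
# poll_until_failure MAX_CHECKS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to MAX_CHECKS times, sleeping
# INTERVAL_SECONDS after each successful check. Returns 1 as soon as
# COMMAND fails, or 0 if every check succeeded.
poll_until_failure() {
  local max_checks="$1"
  local interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_checks" ]; do
    if ! "$@"; then
      echo "Check $attempt failed at $(date)" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 0
}

# Example usage (not executed here): 18 checks, 10 seconds apart, which
# covers three 60-second stages.
#   poll_until_failure 18 10 curl -s -o /dev/null https://gremlin-demo-lab-host/ \
#     || exit 1
```

A non-zero exit status is what most schedulers and pipelines use to flag a failed job, so a certificate that is close to expiring would show up as a failed run.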
Staying ahead of expiring certificates is vital for keeping your websites and services accessible and secure. Using Gremlin FI to run Time Travel tests lets you quickly and safely test your certificates in any environment, whether your websites are hosted on AWS, GCP, Azure, or on-premises.
Gremlin empowers you to proactively root out failure before it causes downtime. See how you can harness chaos to build resilient systems by requesting a demo of Gremlin.