Adding Gremlin to your CI/CD pipeline is a key step in automating your reliability efforts. We previously wrote a tutorial on how to run a Chaos Engineering experiment as part of a Jenkins pipeline. The result ran a chaos experiment every time you deployed your code to a test environment. But this approach has a limitation: you have to either wait for the test to finish and check the results programmatically, or allow the build process to continue regardless of the results.
This tutorial expands on the previous one by using the Gremlin reliability score, which is a more proactive indicator of reliability. A reliability score is calculated by running a series of experiments (called a Test Suite). The main benefits are:
In this tutorial, we'll create a complete Jenkins pipeline that checks a service's reliability score using the Gremlin API. We'll compare the score against a required minimum, and if the service passes, we'll promote it to production. You'll learn how to create API keys in Gremlin and use the Gremlin API. And while this tutorial uses code specific to Jenkins, you can apply the same concepts with any CI/CD tool.
This tutorial will show you how to:
Before starting this tutorial, you’ll need the following:
The first step is to define the Jenkins pipeline. We already wrote a simple Groovy file that you can download from GitHub. Copy and paste the contents of the file to your computer, or use the "Download raw file" button. Alternatively, you can copy the file contents from the code block below:
/*
This Pipeline example demonstrates how to use the Gremlin API to check the Gremlin Score of a service
before promoting it to production. The Gremlin Score is a measure of the reliability of a service.
If the Gremlin Score is less than the value set, the pipeline will fail and the service will not be promoted to production.
 */

pipeline {
    agent any

    stages {
        stage('Check Gremlin Score') {
            steps {
                script {
                    def serviceId = 'Replace with your service ID'
                    def teamId = 'Replace with your team ID'
                    def apiUrl = "https://api.gremlin.com/v1/services/${serviceId}/score?teamId=${teamId}"
                    def apiToken = 'Bearer Replace with your Bearer token or API token'
                    def minScore = 80.0 // Replace with your minimum Gremlin Score

                    // returnStatus yields curl's exit code (0 on success), not the HTTP status code
                    def response = sh(script: "curl -s -X GET '${apiUrl}' -H 'Authorization: ${apiToken}' -H 'accept: application/json'", returnStatus: true)

                    if (response != 0) {
                        error("API call to Gremlin failed with curl exit code: ${response}")
                    } else {
                        def apiResponse = sh(script: "curl -s -X GET '${apiUrl}' -H 'Authorization: ${apiToken}' -H 'accept: application/json'", returnStdout: true).trim()

                        echo "API Response: ${apiResponse}" // Debug logging

                        // Attempt to capture numbers using a permissive regex
                        def scoreMatches = (apiResponse =~ /(\d+(\.\d+)?)/)

                        if (scoreMatches) {
                            def score = null

                            // Use the first match that parses as a number
                            for (match in scoreMatches) {
                                def potentialScore = match[0]
                                try {
                                    score = Float.parseFloat(potentialScore)
                                    break
                                } catch (NumberFormatException e) {
                                    // Continue searching for a valid score
                                }
                            }

                            if (score != null) {
                                echo "Gremlin Score: ${score}" // Debug logging

                                if (score < minScore) {
                                    error("Gremlin Score ${score} is less than defined ${minScore}. Cannot promote to production.")
                                }
                            }
                        } else {
                            echo "No valid score found in API response." // Debug logging
                            error("Unable to extract Gremlin Score from the API response.")
                        }
                    }
                }
            }
        }

        stage('Promote to Production') {
            steps {
                // Add the steps to promote to production here
                // This could involve deployment and other production-related tasks
                // You can replace this comment with the actual steps for your deployment process
                echo "Promoting to production..."
            }
        }
    }

    post {
        failure {
            echo "The pipeline has failed. Not promoting to production."
        }
        success {
            echo "The pipeline has succeeded. Promoting to production."
        }
    }
}
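The pipeline above calls the Gremlin API twice: once to capture curl's exit status and once to capture the response body. As a rough sketch, one alternative is to make a single call and have curl append the HTTP status code on its own line with its `-w '%{http_code}'` option, then split the two apart. The helper name `splitBodyAndStatus` is hypothetical and not part of the original script, and this is only one possible approach.

```groovy
// Hypothetical helper: splits "<body>\n<http status>" (as produced by
// curl -s -w '\n%{http_code}') into its two parts.
def splitBodyAndStatus(String raw) {
    String trimmed = raw.trim()
    int idx = trimmed.lastIndexOf('\n')
    // If there is no newline, the response body was empty and only the status remains
    String body = idx >= 0 ? trimmed.substring(0, idx).trim() : ''
    String code = idx >= 0 ? trimmed.substring(idx + 1).trim() : trimmed
    return [body: body, status: Integer.parseInt(code)]
}

// Possible usage inside the script block (assumes apiUrl and apiToken are defined as above):
// def raw = sh(script: "curl -s -w '\\n%{http_code}' '${apiUrl}' -H 'Authorization: ${apiToken}' -H 'accept: application/json'", returnStdout: true)
// def result = splitBodyAndStatus(raw)
// if (result.status != 200) { error("Gremlin API returned HTTP ${result.status}") }
```

With this approach, an authentication failure or a mistyped service ID surfaces as a non-200 HTTP status instead of passing the exit-code check unnoticed.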
In order to use Gremlin's REST API, we need to add our authentication details to the script. You'll need two things:
Once you have the API key, paste it into the following line in the releasePipeline.groovy file:
def apiToken = 'Bearer Replace with your Bearer token or API token'
Save the file.
You'll need two additional pieces of information from Gremlin: your team ID and the service ID. The team ID is the unique ID for your Gremlin team, and the service ID is the unique ID of the service you want to check the score for.
We'll start with the team ID. To get the team ID, look in the bottom-left corner of the Gremlin web app. You'll see your name, and underneath that, your team name. Click the icon next to the team name to copy your team ID to your clipboard. From there, open your releasePipeline.groovy file and paste it into the following line:
def teamId = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
For the service ID:
def serviceId = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
Save the file.
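Because a forgotten placeholder would only show up later as a failed API call, it can help to check the IDs up front. The sketch below assumes both IDs follow the 8-4-4-4-12 hexadecimal layout suggested by the placeholders above; that format is an assumption rather than something the Gremlin documentation is quoted on here, and the helper name `looksLikeGremlinId` is hypothetical.

```groovy
// Hypothetical helper: returns true if the value matches the
// xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx layout (hex digits only).
// ASSUMPTION: team and service IDs use this UUID-style format.
boolean looksLikeGremlinId(String value) {
    if (value == null) {
        return false
    }
    return value.trim() ==~ /[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/
}

// Possible usage at the top of the script block:
// if (!looksLikeGremlinId(teamId)) { error("teamId does not look like a valid ID: ${teamId}") }
```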
In this step, we'll create a pipeline using our Groovy file. But before we do, there's one last tweak we need to make: we need to set the score threshold.
The score threshold is the minimum reliability score the service must have before it can deploy to production. This is defined in the minScore variable. In the sample file, we set minScore = 80.0, which means the service must have a score of at least 80% to deploy. Anything below this score will stop the pipeline and raise an error. You can change this threshold to any value between 0 and 100 by editing this line:
def minScore = 80.0 // Replace with your minimum Gremlin Score
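To make the boundary behavior concrete, here is a minimal sketch of the comparison in isolation. It assumes the score arrives as plain numeric text and uses BigDecimal rather than the Float parsing in the sample script, so that a score exactly equal to the threshold passes without rounding surprises. The function name `meetsThreshold` is hypothetical.

```groovy
// Hypothetical helper: true when the score is at or above the threshold.
// ASSUMPTION: scoreText is plain numeric text such as "80.0".
boolean meetsThreshold(String scoreText, BigDecimal minScore) {
    if (minScore < 0 || minScore > 100) {
        throw new IllegalArgumentException('minScore must be between 0 and 100, got ' + minScore)
    }
    // Groovy compares BigDecimal values numerically with >=
    return new BigDecimal(scoreText.trim()) >= minScore
}

// A score equal to the threshold passes; anything lower fails the gate.
assert meetsThreshold('80.0', 80.0)
assert !meetsThreshold('79.99', 80.0)
```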
Now we're ready to add this file to our Jenkins Pipeline. To do this:
After you click Save in the previous step, click Build Now to run the pipeline. Jenkins will retrieve the service's score from Gremlin, check whether it is greater than or equal to minScore, and if so, mark the build as successful. Otherwise, it will mark the build as failed.
From here, you can make changes to better integrate the pipeline into your build process. Rather than hard-coding values like your service ID, use environment variables so you can pass a different ID for each service, and store your Gremlin API key in Jenkins credentials.
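As one possible way to do this, the sketch below moves the IDs into build parameters and reads the API key from the Jenkins credentials store using the declarative `credentials()` helper. The parameter names, the environment variable name, and the credential ID `gremlin-auth-header` are all assumptions chosen for illustration, and the sketch assumes the stored secret holds the full Authorization header value.

```groovy
pipeline {
    agent any

    parameters {
        // Hypothetical parameter names; Jenkins exposes them to sh as environment variables
        string(name: 'GREMLIN_SERVICE_ID', defaultValue: '', description: 'Gremlin service ID')
        string(name: 'GREMLIN_TEAM_ID', defaultValue: '', description: 'Gremlin team ID')
    }

    environment {
        // ASSUMPTION: a "Secret text" credential with this ID exists in Jenkins
        GREMLIN_AUTH = credentials('gremlin-auth-header')
    }

    stages {
        stage('Check Gremlin Score') {
            steps {
                // Single-quoted Groovy string: the shell, not Groovy, expands the
                // variables, which keeps the secret out of the interpolated command
                sh '''
                    curl -s -X GET "https://api.gremlin.com/v1/services/${GREMLIN_SERVICE_ID}/score?teamId=${GREMLIN_TEAM_ID}" \
                        -H "Authorization: ${GREMLIN_AUTH}" \
                        -H "accept: application/json"
                '''
            }
        }
    }
}
```

Keeping the secret in the credentials store also lets Jenkins mask it in the build log.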
We've also included a section in the Groovy script where you can enter commands for deploying your service to production. This runs immediately after Jenkins compares the service's reliability score against minScore:
stage('Promote to Production') {
    steps {
        // Add the steps to promote to production here
        // This could involve deployment and other production-related tasks
        // You can replace this comment with the actual steps for your deployment process
        echo "Promoting to production..."
    }
}
Lastly, you can change the "failure" condition to perform other steps, such as notifying the service's owner by sending an email or calling a service like PagerDuty. You can also track the status of your builds by integrating with a monitoring tool like Datadog and alert on failed builds that way.
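For example, a failure notification by email might look roughly like the fragment below. It uses the Jenkins `mail` pipeline step and assumes an SMTP server is already configured in Jenkins; the recipient address is a placeholder.

```groovy
post {
    failure {
        echo "The pipeline has failed. Not promoting to production."
        // ASSUMPTION: Jenkins has a working mail configuration; the address is a placeholder
        mail to: 'service-owner@example.com',
             subject: "Reliability gate failed: ${env.JOB_NAME} #${env.BUILD_NUMBER}",
             body: "The service did not pass the Gremlin reliability gate. See ${env.BUILD_URL} for details."
    }
}
```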
Congratulations on setting up a reliability gate in Jenkins! This will ensure that your service only gets pushed to production if it meets your minimum reliability score.
To ensure your scores stay up to date, make sure to autoschedule reliability tests on your service to run at least once a week. Going longer than one week without re-running a test will cause that test to expire, reducing your score. Remember that you can also use the Run All button to re-run all of the service's tests and regenerate its score.