Many operations teams use Continuous Deployment (CD) pipelines to build and test new software through a repeatable, automated sequence of steps. A pipeline gives the team a consistent way to stand up an environment, perform validations, and optionally tear the environment down to return to a clean slate. Teams often add automated functional, load, integration, and other tests to validate the quality of the product before and after pushing to production.
With Chaos Engineering, we can add reliability testing to our suite of automated tests. Running chaos experiments in our CI/CD pipeline helps verify that code changes are reliable before they reach customers. By using "automated chaos" to test for reliability during the deployment process, we can detect operational issues early and reduce the risk of outages in production.
In this tutorial, we'll create stages in a Jenkins pipeline to inject a controlled amount of failure into a test system using Gremlin. You'll learn how to deploy a Jenkins instance using Docker, create API keys in Gremlin, and use the Gremlin API to start an attack.
Before you begin this tutorial, you’ll need the following:
In this step, you’ll stand up an instance of Jenkins using the official Docker image. If you already have a Jenkins environment, skip to Step 3 - Create your Chaos Deployment Pipeline.
At the command line, enter the following to initialize a Jenkins instance using Docker.
docker run --publish 8080:8080 --publish 50000:50000 --name jenkins jenkins/jenkins:lts-alpine
Navigate to http://localhost:8080 in your browser to confirm Jenkins is working. If this is your first time setting up Jenkins, you will need to enter the initial admin password and choose which plugins to install. For this tutorial, the defaults will work fine. Then, add an admin user and log into the account.
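On first start, Jenkins prints the initial admin password in the container log and also stores it in a file inside the Jenkins home directory. The sketch below is one way to run Jenkins in the background and read that password from the container. It assumes the container is named jenkins, as in the command above; the function names and the DOCKER override variable are hypothetical helpers for illustration, not part of Docker or Jenkins.

```shell
#!/bin/sh
# Sketch only: helpers for the Jenkins container used in this tutorial.
# DOCKER is a hypothetical override (not a Docker feature) that lets you
# substitute another command for dry runs.

# Start Jenkins in the background instead of tying up the terminal.
start_jenkins() {
  "${DOCKER:-docker}" run --detach \
    --publish 8080:8080 --publish 50000:50000 \
    --name jenkins jenkins/jenkins:lts-alpine
}

# Print the admin password that the setup wizard asks for.
show_initial_admin_password() {
  "${DOCKER:-docker}" exec jenkins \
    cat /var/jenkins_home/secrets/initialAdminPassword
}
```

Call start_jenkins once, wait for Jenkins to finish starting, then call show_initial_admin_password and paste the result into the setup wizard.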
In this step, you’ll enter your Gremlin API key and team ID into the Jenkins instance. Your Gremlin API key is tied to your Gremlin user account, and allows Jenkins to authenticate with Gremlin without requiring your username or password. Your team ID is associated with your Gremlin team and allows Jenkins to run attacks, target hosts, and perform other actions within your Gremlin team.
To get your team ID, log into the Gremlin web app. Click on the user icon in the top-right, then click Team Settings. Click the Configuration tab to see your Team ID.
Copy your team ID or keep this window open, as you'll need it in the next step.
Next, we'll create an API key. Click on the user icon in the top-right, then click Account Settings. Click on the API Keys tab, then click New API Key. Enter a name for the key (e.g. "Jenkins") and optionally a description, then click Save. Copy the key from the modal window that appears (you can still access the key after closing the modal window).
Now that we have our team ID and API key, let's enter them into Jenkins. We'll add these to Jenkins as credentials. Open the following in your browser:
http://localhost:8080/credentials/store/system/domain/_/newCredentials
Or open the Jenkins dashboard and navigate to Manage Jenkins > Manage Credentials > (global). Click Add Credentials. Set the Kind to Secret text and the Scope to Global. Paste your Gremlin API key in the Secret field, and enter gremlin-api-key as the ID. Click OK to save.
Repeat this step for your team ID. Select Secret text, paste your ID into the Secret field, then enter gremlin-team-id into the ID field. Click OK to save. Your global credentials list should now contain two entries: gremlin-api-key and gremlin-team-id.
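Before relying on these credentials in a pipeline, it can help to confirm from a terminal that the key and team ID are accepted together. The sketch below assumes both values are exported as GREMLIN_API_KEY and GREMLIN_TEAM_ID, and that the Gremlin API's attacks/active endpoint (which lists active attacks) accepts the same Authorization header and teamId query parameter as the attacks/new call used later in this tutorial. The function name and the CURL override variable are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumes GREMLIN_API_KEY and GREMLIN_TEAM_ID are exported.
# CURL is a hypothetical override for dry runs; it is not a curl feature.

# Print the HTTP status code of an authenticated, read-only API request.
check_gremlin_credentials() {
  "${CURL:-curl}" -s -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Key ${GREMLIN_API_KEY}" \
    "https://api.gremlin.com/v1/attacks/active?teamId=${GREMLIN_TEAM_ID}"
}
```

A 2xx status code suggests that the key and team ID work together. Any other code is worth investigating before you build the pipeline.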
In this step, we'll create a Jenkins pipeline. This pipeline will run a CPU attack, which consumes CPU capacity on our target host for a set amount of time. The target of the attack is the host where we installed Gremlin before starting the tutorial.
In a typical CI/CD pipeline, our pipeline code might contain steps for provisioning a test environment, deploying an application, deploying the Gremlin agent to that environment, then running the attack. For the purposes of this tutorial, we'll skip the first three steps and just show how to run the attack using the Gremlin API.
On the Jenkins home screen, click New Item. Enter a name such as "Chaos Pipeline", select Pipeline, then click OK. Scroll down to the Pipeline section, then enter the following code:
pipeline {
    agent none
    environment {
        ATTACK_ID = ''
        GREMLIN_API_KEY = credentials('gremlin-api-key')
        GREMLIN_TEAM_ID = credentials('gremlin-team-id')
    }
    parameters {
        string(name: 'TARGET_IDENTIFIER', defaultValue: 'gremlin-demo-lab-host', description: 'Host to target')
        string(name: 'CPU_LENGTH', defaultValue: '30', description: 'Duration of CPU attack')
        string(name: 'CPU_CORE', defaultValue: '1', description: 'Number of cores to impact')
        string(name: 'CPU_CAPACITY', defaultValue: '100', description: 'The percentage of total CPU capacity to consume')
    }
    stages {
        stage('Initialize test environment') {
            steps {
                echo "[Add commands to create a test environment.]"
            }
        }
        stage('Install application to test environment') {
            steps {
                echo "[Add commands to deploy your application to your test environment.]"
            }
        }
        stage('Run chaos experiment') {
            agent any
            steps {
                script {
                    ATTACK_ID = sh (
                        script: "curl -s -H 'Content-Type: application/json;charset=utf-8' -H 'Authorization: Key ${GREMLIN_API_KEY}' https://api.gremlin.com/v1/attacks/new?teamId=${GREMLIN_TEAM_ID} --data '{ \"command\": { \"type\": \"cpu\", \"args\": [\"-c\", \"$CPU_CORE\", \"-l\", \"$CPU_LENGTH\", \"-p\", \"$CPU_CAPACITY\"] },\"target\": { \"type\": \"Exact\", \"hosts\" : { \"ids\": [\"$TARGET_IDENTIFIER\"] } } }' --compressed",
                        returnStdout: true
                    ).trim()
                    echo "View your experiment at https://app.gremlin.com/attacks/${ATTACK_ID}"
                }
            }
        }
    }
}
Let's take a closer look at this script.
First, in the environment section, we retrieve our credentials (our Gremlin API key and team ID). Under parameters, we define the parameters of the attack. TARGET_IDENTIFIER is the name of the host we want to target as it appears in Gremlin (for example, here we use gremlin-demo-lab-host). You can find your list of hosts in the Gremlin web app by clicking on Clients > Hosts.
Next is the stages section. The first two stages are where we would add steps to provision and set up our test environment. The third stage, "Run chaos experiment," is where we call the Gremlin API to start the attack. Note the script field, which contains the complete call to the Gremlin API. You can replace this field with any Gremlin API call of your choice, whether it's calling a different type of attack, running a Scenario, attacking a Kubernetes resource, or attacking a Service. You can learn more about creating API calls in our getting started tutorial.
For now, replace the default value of TARGET_IDENTIFIER with the name of the host you want to run the attack on. Optionally, change the parameters of the CPU attack by changing the CPU_LENGTH, CPU_CORE, and CPU_CAPACITY parameters. CPU_LENGTH is how long the attack will run (in seconds), CPU_CORE is the number of CPU cores impacted, and CPU_CAPACITY is the percentage of total CPU capacity to consume.
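To make the mapping from these parameters to the request body easier to see, the same API call can be sketched as a standalone shell script. This is an illustration rather than the tutorial's method: the function names and the CURL override variable are hypothetical, and the script assumes GREMLIN_API_KEY and GREMLIN_TEAM_ID are available as environment variables, as they are inside a stage that uses the environment block shown earlier. The URL, headers, and JSON fields are copied from the pipeline's curl command.

```shell
#!/bin/sh
# Sketch only. Assumes GREMLIN_API_KEY and GREMLIN_TEAM_ID are exported,
# for example by the Jenkins environment block. Function names and the
# CURL override are hypothetical.

# Build the JSON body for a CPU attack.
# $1 = target host, $2 = length in seconds, $3 = cores, $4 = capacity percent
# Values are inserted as-is, so they must not contain quotes or backslashes.
build_cpu_attack_payload() {
  printf '{"command":{"type":"cpu","args":["-c","%s","-l","%s","-p","%s"]},"target":{"type":"Exact","hosts":{"ids":["%s"]}}}\n' \
    "$3" "$2" "$4" "$1"
}

# Send the attack request. The shell, not Groovy, expands the secrets here.
start_cpu_attack() {
  "${CURL:-curl}" -s \
    -H 'Content-Type: application/json;charset=utf-8' \
    -H "Authorization: Key ${GREMLIN_API_KEY}" \
    "https://api.gremlin.com/v1/attacks/new?teamId=${GREMLIN_TEAM_ID}" \
    --data "$(build_cpu_attack_payload "$@")" \
    --compressed
}

# Example (not run here):
#   start_cpu_attack gremlin-demo-lab-host 30 1 100
```

Because the shell reads the API key from the environment, a script like this can be called from a single-quoted sh step, which avoids passing the secret through Groovy string interpolation.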
Next, run the demo script by selecting “Build with Parameters”, then “Build”. Jenkins will quickly run through the first two stages, then call the Gremlin API and start the attack. The Stage View shows the result of each of the three stages.
If we open the console output by clicking on the build number and selecting Console Output, we'll see the following:
Started by user Admin
Running in Durability level: MAX_SURVIVABILITY
[Pipeline] Start of Pipeline
[Pipeline] withCredentials
Masking supported pattern matches of $GREMLIN_API_KEY
[Pipeline] {
[Pipeline] withEnv
[Pipeline] {
[Pipeline] stage
[Pipeline] { (Initialize test environment)
[Pipeline] echo
[Add commands to create a test environment.]
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Install application to test environment)
[Pipeline] echo
[Add commands to deploy your application to your test environment.]
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Run chaos experiment)
[Pipeline] node
Running on Jenkins in /var/jenkins_home/workspace/Chaos Pipeline
[Pipeline] {
[Pipeline] script
[Pipeline] {
[Pipeline] sh
Warning: A secret was passed to "sh" using Groovy String interpolation, which is insecure.
 Affected argument(s) used the following variable(s): [GREMLIN_API_KEY]
 See https://jenkins.io/redirect/groovy-string-interpolation for details.
+ curl -s -H 'Content-Type: application/json' -H 'Authorization: Key ****' https://api.gremlin.com/v1/attacks/new --data '{ "command": { "type": "cpu", "args": ["-c", "1", "-l", "30", "-p", "100"] },"target": { "type": "Exact", "hosts" : { "ids": ["gremlin-demo-lab-host"] } } }' --compressed
[Pipeline] echo
View your experiment at https://app.gremlin.com/attacks/User requires privilege for target team: TEAM_DEFAULT
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // withEnv
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] End of Pipeline
Finished: SUCCESS
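Note that in the console output above, the API returned an error message rather than an attack ID, yet the build still finished with SUCCESS, because the pipeline echoes whatever curl returns. One way to catch this is to check the response before continuing. The sketch below assumes that Gremlin attack IDs are UUID-formatted, which this tutorial does not confirm; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: a valid attack ID looks like a UUID
# (8-4-4-4-12 hexadecimal characters). Adjust the pattern if yours differ.

# Succeed only if the argument looks like an attack ID.
is_attack_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Print the experiment URL, or fail with the API response on stderr.
require_attack_id() {
  if is_attack_id "$1"; then
    echo "View your experiment at https://app.gremlin.com/attacks/$1"
  else
    echo "Gremlin API did not return an attack ID: $1" >&2
    return 1
  fi
}
```

Calling require_attack_id with the curl response from an sh step would make that step, and therefore the build, fail when the API returns an error message instead of an ID.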
Congratulations! You've now integrated chaos experiments into your CI/CD pipeline!
This tutorial is just the first step toward using Chaos Engineering effectively in your CI/CD pipeline. Expand your practice further by running a Scenario instead of an attack, adding a check to verify the completion of the experiment, or using Status Checks to automatically halt an experiment if your systems become unstable. If you have automated integration, load, or functional tests, run them alongside your chaos experiment to make sure your systems can operate reliably under stress. You can apply these same principles to other automated build and deployment tools such as Spinnaker, GitLab, or CircleCI.
For more on Gremlin and CI/CD, check out our webinar: Automating Chaos Engineering in your CI/CD Environments.