Fault Injection

Latency Experiment

The Latency experiment injects latency into IP packets, targeting traffic by the supplied port and host arguments. Gremlin injects latency on a per-packet basis at layer 3 (the network layer) of the OSI model.

Linux

The Latency experiment uses existing Quality of Service (QoS) and Differentiated Services (diffserv) facilities in the Linux kernel to emulate packet latency.

This experiment does not interact with iptables, and so it does not interfere with any existing iptables rulesets.

This experiment requires the NET_ADMIN capability, which is enabled by default at installation time. For details, see capabilities(7).
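If you want to confirm that a running process holds NET_ADMIN, one option on Linux is to read the effective capability mask (`CapEff`) from `/proc/<pid>/status` and test bit 12, which is `CAP_NET_ADMIN` in the kernel's capability numbering. The sketch below assumes you already know the PID of the process you care about (for example, the agent); the function names are hypothetical and not part of any Gremlin tooling.

```python
# Hypothetical helper: check a Linux process's effective capabilities for
# CAP_NET_ADMIN by parsing its /proc/<pid>/status file.
CAP_NET_ADMIN = 12  # capability bit number defined in linux/capability.h


def has_cap_net_admin(status_text: str) -> bool:
    """Return True if the CapEff mask in a /proc/<pid>/status dump has bit 12 set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # CapEff is a hexadecimal bitmask
            return bool(mask & (1 << CAP_NET_ADMIN))
    raise ValueError("status text has no CapEff line")


def process_has_cap_net_admin(pid: str = "self") -> bool:
    """Read /proc/<pid>/status (Linux only) and test it for CAP_NET_ADMIN."""
    with open(f"/proc/{pid}/status") as status_file:
        return has_cap_net_admin(status_file.read())
```

For example, `process_has_cap_net_admin("1234")` checks the process with PID 1234 (a placeholder), while calling it with no argument checks the current Python process.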

Windows

The Latency experiment uses a custom driver to emulate packet latency.

This experiment was introduced with Windows agent 1.0.11.

Options

Parameter	Flag	Required	Default	Version	Description
IP Addresses	-i IP address	False		0.0.1	Only impact traffic to these IP addresses. Also accepts CIDR values (e.g. `10.0.0.0/24`).
Device	-d interfaces	False	Device discovery	0.0.1	Impact traffic over these network interfaces. Comma-separated lists and multiple arguments are supported. Linux only: you can define multiple interfaces starting with agent version `2.30.0`.
Hostnames	-h hostnames	False	`^api.gremlin.com`	0.0.1	Only impact traffic to these hostnames.
Remote Ports	-p port numbers	False	`^53`	0.0.1	Only impact outgoing traffic to these destination ports. Also accepts port ranges (e.g. `8080-8085`).
Local Ports	-s port numbers	False		0.0.1	Only impact outgoing traffic from these source ports. Also accepts port ranges (e.g. `8080-8085`).
MS	-m int	False	`100`	0.0.1	How long to delay egress packets (milliseconds).
Protocol	-P {TCP, UDP, ICMP}	False	all	1.5.3	Only impact a specific protocol.
Providers	WebUI and API Only	False		0.0.1	External service providers to affect.
Tags	WebUI and API Only	False		0.0.1	Only impact traffic to hosts running Gremlin clients associated with these tags.
Length	-l int	False	`60`	0.0.1	The length of the experiment (seconds).
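To illustrate how the optional IP, port, and protocol options narrow the set of impacted packets, here is a small model of that filtering. It is a sketch under stated assumptions, not Gremlin's implementation: it assumes an unset filter places no restriction and that every filter you do set must match. Hostname, provider, and tag targeting, as well as the `^`-prefixed defaults shown in the table, are not modeled. `LatencyFilter` and `parse_ports` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional, Set


def parse_ports(specs: List[str]) -> Set[int]:
    """Expand entries such as '53' or '8080-8085' into a set of port numbers."""
    ports: Set[int] = set()
    for spec in specs:
        if "-" in spec:
            lo, hi = (int(p) for p in spec.split("-", 1))
            ports.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            ports.add(int(spec))
    return ports


@dataclass
class LatencyFilter:
    ips: List[str] = field(default_factory=list)           # like -i; CIDR allowed
    remote_ports: List[str] = field(default_factory=list)  # like -p
    local_ports: List[str] = field(default_factory=list)   # like -s
    protocol: Optional[str] = None                         # like -P; None = all

    def matches(self, dst_ip: str, proto: str,
                dst_port: Optional[int] = None,
                src_port: Optional[int] = None) -> bool:
        """Assumed semantics: every configured filter must match the packet."""
        if self.ips:
            addr = ipaddress.ip_address(dst_ip)
            nets = [ipaddress.ip_network(s, strict=False) for s in self.ips]
            if not any(addr in net for net in nets):
                return False
        if self.protocol and proto.upper() != self.protocol.upper():
            return False
        # Packets without ports (e.g. ICMP) fail any configured port filter.
        if self.remote_ports and dst_port not in parse_ports(self.remote_ports):
            return False
        if self.local_ports and src_port not in parse_ports(self.local_ports):
            return False
        return True
```

With `LatencyFilter(ips=["10.0.0.0/24"], remote_ports=["8080-8085"], protocol="TCP")`, a TCP packet to `10.0.0.7:8083` matches, while the same packet sent to `10.0.1.7` or over UDP does not.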

FAQ

Why is the observed latency much higher than what's configured in the latency experiment?

You may sometimes observe a higher latency impact on your application than you configured in your latency experiment. Gremlin injects latency on a per-packet basis at layer 3 of the OSI model. This is representative of common networking failure modes, such as queuing at network switches or latency due to increased route distance. Because the delay applies to packets rather than once per application request, observed latency can exceed the configured value in two main ways.

First, for TCP connections, if your request or response doesn't fit into a single congestion window, latency is added for each round trip. Applications that reuse large connection pools on reliable networks typically have very large congestion windows, so the latency is applied only once per request. However, if the latency increase causes requests to queue and overwhelm the existing pool, new connections may be created, each starting with a small congestion window. Each new connection must grow its congestion window over multiple round trips (TCP slow start), incurring the injected latency on every one of them.
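The effect of a fresh congestion window can be estimated with an idealized slow-start model. This sketch assumes a 1460-byte MSS, an initial window of 10 segments (the value from RFC 6928), a window that doubles every round trip with no loss and no slow-start threshold, one extra round trip for the TCP handshake on a new connection, and the injected delay being paid once per round trip. It ignores TLS and application-level round trips, and the function names are hypothetical.

```python
import math


def slow_start_round_trips(payload_bytes: int, mss: int = 1460,
                           initial_cwnd: int = 10) -> int:
    """Round trips needed to send payload_bytes under idealized slow start."""
    segments = math.ceil(payload_bytes / mss)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd   # one full window goes out per round trip
        cwnd *= 2      # window doubles after each round trip (no loss assumed)
        rounds += 1
    return rounds


def added_latency_ms(payload_bytes: int, injected_ms: float,
                     warm: bool = False) -> float:
    """Estimate extra latency for one transfer under the assumptions above."""
    if warm:
        # Assumed: an established connection's window already covers the payload.
        return injected_ms
    # New connection: handshake round trip plus the slow-start round trips.
    return (1 + slow_start_round_trips(payload_bytes)) * injected_ms
```

Under these assumptions, a 100,000-byte response is 69 segments, which slow start delivers in 3 round trips (10 + 20 + 40 segments). On a new connection that is 4 round trips including the handshake, so 100ms of injected latency adds about 400ms, compared with about 100ms on a warm connection.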

Second, applications may make multiple requests to the same dependency serially. For example, if you send two queries to the same database one after the other, adding 100ms of latency increases the application request time by 200ms (100ms for each request).
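The arithmetic generalizes: serial calls add their delays, while calls issued concurrently only add the largest delay among them. The sketch below models a request as a list of stages that run one after another, where each stage holds calls issued concurrently. It assumes each call is a single round trip on a warm connection; `request_added_latency_ms` and the dependency names are hypothetical.

```python
from typing import Dict, List


def request_added_latency_ms(stages: List[List[str]],
                             injected: Dict[str, float]) -> float:
    """Sum, over serial stages, the largest injected delay among concurrent calls."""
    total = 0.0
    for stage in stages:
        # Calls within a stage overlap, so only the slowest one extends the request.
        total += max((injected.get(dep, 0.0) for dep in stage), default=0.0)
    return total
```

For two database queries sent one after the other, `request_added_latency_ms([["db"], ["db"]], {"db": 100})` gives 200, matching the example above; issuing the same two queries concurrently, `[["db", "db"]]`, gives 100.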

In both cases, the experiment is working as intended and has revealed a reliability risk in the design of the application being tested.