A Complex Example

The simple example above should be good enough for most people, but if you're dealing with a large subset of probes (we support up to 1024), or if you're interested in comparing the current RTT value to past values, then this section is for you.

You can control how the alerts are triggered based on a few arguments in the URL:

Argument                Default  Description
max_packet_loss         75       The acceptable percentage of packet loss per probe
show_all                false    Show all RTT responses for every probe. By default, all responses are only shown for alerting probes
permitted_total_alerts  0        The total number of probes permitted to respond with an alert before a global alert is issued
lookback                1        The number of past measurement results used to calculate a median RTT value (1 to 10)
median_rtt_threshold    N/A      The threshold at which an alert is issued when the latest RTT value is compared to the median (based on lookback)

These arguments can be combined to give interesting results, so we'll break them down one-by-one and then give you some examples of combinations and the resulting output.

max_packet_loss

By default, we don't set alert: true unless the packet loss percentage exceeds 75%. If you'd like to adjust this threshold, you can pass max_packet_loss in the URL. Expanding on our simple example above, the following request would, in practice, only set an alert on a probe once all of its packets are lost:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?max_packet_loss=95

Note, however, that if you set max_packet_loss to 100, no alert will ever be set for lost packets, since the loss can never exceed 100%.

Similarly, you can make the check more sensitive by tweaking the max_packet_loss value downward:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?max_packet_loss=0

This would set an alert if even one packet was lost.
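The threshold rule can be sketched in Python. This is a minimal illustration, assuming a probe alerts on loss only when its loss percentage strictly exceeds max_packet_loss (matching the "exceeds" wording above) and assuming 3 packets per result, as in the sample responses in this section. The function name loss_alert is hypothetical and not part of the API.

```python
def loss_alert(packets_sent: int, packets_received: int, max_packet_loss: float = 75) -> bool:
    """Hypothetical helper: True if this result's packet loss strictly exceeds
    max_packet_loss (a percentage), mirroring the "exceeds" wording above."""
    loss = 100.0 * (packets_sent - packets_received) / packets_sent
    return loss > max_packet_loss


# With 3 packets per result, loss can only be 0, 33.3, 66.7 or 100 percent.
for received in (3, 2, 1, 0):
    print(
        f"received={received}:",
        f"default={loss_alert(3, received)}",
        f"95={loss_alert(3, received, 95)}",
        f"100={loss_alert(3, received, 100)}",
        f"0={loss_alert(3, received, 0)}",
    )
```

Under these assumptions, only total loss trips the default 75% threshold when 3 packets are sent, 95 behaves the same way, 100 never alerts, and 0 alerts on a single lost packet.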

show_all

In the simple example, the sample output listed only basic probe information:

# Request
GET https://atlas.ripe.net/api/v2/measurements/123456789/status-check/

# Response
...
"234": {
  "alert": false,
  "last": 14.152,
  "last_packet_loss": 0.0,
  "source": "Country: GR"
},
...

If an alert is triggered, however, the all attribute is included so that you can see further details:

# Request
GET https://atlas.ripe.net/api/v2/measurements/123456789/status-check/

# Response
...
"234": {
  "alert": true,
  "alert_reasons": [
    "loss"
  ],
  "all": [
    null,
    null,
    null
  ],
  "last": null,
  "last_packet_loss": 100.0,
  "source": "Country: GR"
},
...

By setting show_all, you ask the server to always include the all attribute in the output, whether or not an alert is issued. For an error-free result, the output would change to:

# Request
GET https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?show_all=1

# Response
...
"234": {
  "alert": false,
  "all": [
    12.123,
    14.152,
    17.321
  ],
  "last": 14.152,
  "last_packet_loss": 0.0,
  "source": "Country: GR"
},
...
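On the client side, the per-probe fields shown above are enough to summarize what went wrong. This sketch assumes only a mapping of probe IDs to objects shaped like the "234" entries in these samples; where that mapping sits inside the full response body isn't shown in this section, and probe "235" plus the helper name alert_summary are made up for illustration.

```python
import json
from typing import Any

# Probe entries shaped like the samples above; probe "235" is made up for illustration.
SAMPLE = """
{
  "234": {"alert": true, "alert_reasons": ["loss"], "all": [null, null, null],
          "last": null, "last_packet_loss": 100.0, "source": "Country: GR"},
  "235": {"alert": false, "all": [12.123, 14.152, 17.321],
          "last": 14.152, "last_packet_loss": 0.0, "source": "Country: GR"}
}
"""


def alert_summary(probes: dict[str, dict[str, Any]]) -> list[str]:
    """Describe each alerting probe using its alert_reasons and all fields."""
    lines = []
    for probe_id, info in sorted(probes.items()):
        if not info.get("alert"):
            continue
        reasons = ", ".join(info.get("alert_reasons", []))
        # In the samples above, null entries in "all" correspond to lost packets.
        rtts = info.get("all", [])
        replies = sum(1 for rtt in rtts if rtt is not None)
        lines.append(f"{probe_id}: {reasons} ({replies}/{len(rtts)} replies)")
    return lines


print(alert_summary(json.loads(SAMPLE)))
```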

permitted_total_alerts

By default, we assume that one probe failing to meet expected thresholds is cause for alarm. If you feel this is too sensitive, you can increase permitted_total_alerts. This won't change the alert value for each probe, but it will determine whether global_alert is set to true and, if change_http_status is set to 1, whether the HTTP status is changed to 418.

The following will allow for a maximum of 3 probes to alert before the global alert is set:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?permitted_total_alerts=3
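The global decision can be sketched as a simple count. This assumes, per the wording above, that global_alert is set once more than permitted_total_alerts probes are alerting; the function names are hypothetical, and returning 200 when no 418 is issued is an assumption of this sketch rather than something stated here.

```python
def is_global_alert(probes: dict[str, dict], permitted_total_alerts: int = 0) -> bool:
    """Hypothetical: global alert when more than permitted_total_alerts probes alert."""
    alerting = sum(1 for info in probes.values() if info.get("alert"))
    return alerting > permitted_total_alerts


def response_status(global_alert: bool, change_http_status: bool) -> int:
    # 418 comes from the text above; 200 otherwise is an assumption.
    return 418 if change_http_status and global_alert else 200


# Four probes (made-up IDs), three of them alerting.
probes = {"1": {"alert": True}, "2": {"alert": True}, "3": {"alert": True}, "4": {"alert": False}}
print(is_global_alert(probes))     # True: the default of 0 tolerates no alerting probes
print(is_global_alert(probes, 3))  # False: up to 3 alerting probes are permitted
```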

lookback and median_rtt_threshold

Sometimes the current median RTT isn't enough information with which to make an alert decision. Sometimes, you need a little history to determine whether an alert is warranted. This is where lookback and median_rtt_threshold come in.

Let's use our example again. Say that you've been running this measurement for a few hours now and each of our 5 probes has collected at least 10 results:

Probe       Results (RTT, ms)
Rotterdam     5    5    6    6    5    4    4  100    5    7
Athens       12   14   13   11   12   15   17   12   13   15
Vancouver    13   13   13   13   14   13   15   12   17    8
São Paulo    32   33   34   35   36   37   38   39   40   41
Brisbane    312  333  380  400  331  301  310  312  313  311

Based on these results, we can calculate a median value:

Probe       Median RTT (ms)
Rotterdam   5
Athens      13
Vancouver   13
São Paulo   36.5
Brisbane    312.5
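The medians can be computed from the results with Python's statistics.median, which uses the conventional definition: with an even number of samples, it averages the two middle values. This is a sketch for checking the arithmetic; the service's own median implementation isn't described here. Note that Rotterdam's single 100 ms outlier doesn't move its median, which is why a median is a robust baseline.

```python
from statistics import median

# RTTs (ms) from the results table above.
results = {
    "Rotterdam": [5, 5, 6, 6, 5, 4, 4, 100, 5, 7],
    "Athens": [12, 14, 13, 11, 12, 15, 17, 12, 13, 15],
    "Vancouver": [13, 13, 13, 13, 14, 13, 15, 12, 17, 8],
    "São Paulo": [32, 33, 34, 35, 36, 37, 38, 39, 40, 41],
    "Brisbane": [312, 333, 380, 400, 331, 301, 310, 312, 313, 311],
}

for probe, rtts in results.items():
    # With 10 samples, median() averages the 5th and 6th values after sorting.
    print(f"{probe}: {median(rtts)}")
```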

The lookback value mentioned above determines the total number of past measurement results we take into account to generate these median values. Values can range from 1 to 10 and the default is 1.

Once we have a median value, the next part of the equation, your specified median_rtt_threshold, comes into play. We compare the calculated median to the current value, and if the difference exceeds your threshold, we post an alert.

To continue with our example, say that you've decided you want to be alerted if any probe's latest RTT exceeds its median RTT by more than 10 ms. Your query would look like this:

# Request
GET https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=10&median_rtt_threshold=10

# Response
...
"234": {
  "alert": true,
  "alert_reasons": [
    "latency"
  ],
  "all": [
    43.103,
    43.363,
    43.517,
    45.254,
    45.303,
    45.714,
    45.72,
    46.045,
    46.907,
    46.92,
    47.338,
    48.843,
    49.831,
    50.598,
    50.834,
    55.644,
    65.612,
    73.656,
    78.739,
    81.618,
    101.793,
    105.107,
    111.606,
    138.973,
    144.736,
    154.633,
    159.825,
    199.248,
    206.075,
    314.524
  ],
  "last": 111.606,
  "last_packet_loss": 0.0,
  "median": 55.644,
  "source": "Country: GR"
},
...

You'll note that not only has an alert been triggered by the disparity between median and last, but alert_reasons now contains latency rather than loss, which is what you may have seen until now. In some cases a probe could have both enough dropped packets and enough latency to trigger an alert, so this property helps you tell which condition caused it.
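The latency comparison, in its simplest form (a positive, absolute threshold in milliseconds), might look like the sketch below. It assumes an alert fires when last exceeds median by more than the threshold, and it leaves out percentage and negative thresholds and the sanity filter; latency_alert is a hypothetical name.

```python
from typing import Optional


def latency_alert(last: Optional[float], median: Optional[float], threshold_ms: float) -> bool:
    """Hypothetical check: True if the latest RTT exceeds the median by more than
    threshold_ms (a positive, absolute median_rtt_threshold)."""
    if last is None or median is None:
        return False  # nothing to compare, e.g. every packet was lost
    return last - median > threshold_ms


# Probe 234 above: 111.606 - 55.644 = 55.962 ms, well over the 10 ms threshold.
print(latency_alert(111.606, 55.644, 10))  # True
```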

You can vary the lookback value if you like, and this will adjust the number of samples used to establish a median.

A note about the lookback value

Median calculations are based only on the non-null values available. This means that if lookback=10 and of those 10 results only 2 of them are non-null, only those two results will be used to calculate the median.
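The null-skipping rule can be sketched as a filter before the median. The helper name lookback_median is hypothetical, and returning None when every value is null is an assumption of this sketch; the text above doesn't say what the service does in that case.

```python
from statistics import median
from typing import Optional


def lookback_median(rtts: list[Optional[float]]) -> Optional[float]:
    """Median of the non-null RTTs only; None (assumed) if every value is null."""
    values = [rtt for rtt in rtts if rtt is not None]
    return median(values) if values else None


# Ten results, only two of which produced an RTT.
print(lookback_median([None, 12.0, None, None, None, None, 14.0, None, None, None]))  # 13.0
```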

Supported median_rtt_threshold values include both percentages and integers, positive and negative. Some examples:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=10&median_rtt_threshold=10
https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=10&median_rtt_threshold=10%
https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=10&median_rtt_threshold=-10
https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=10&median_rtt_threshold=-10%
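One way to interpret these four forms is sketched below: a trailing % makes the threshold relative to the median, otherwise it's an absolute difference in milliseconds. The function names are hypothetical, and the meaning of a negative threshold (alert when the RTT drops below the median by more than that amount) is an assumption; this page doesn't define it.

```python
def parse_threshold(raw: str) -> tuple[float, bool]:
    """Split a median_rtt_threshold value ("10", "10%", "-10", "-10%")
    into (number, is_percentage)."""
    if raw.endswith("%"):
        return float(raw[:-1]), True
    return float(raw), False


def exceeds_threshold(last: float, median: float, raw: str) -> bool:
    """Hypothetical comparison of the latest RTT against the median."""
    value, is_percentage = parse_threshold(raw)
    delta = last - median  # milliseconds
    if is_percentage:
        if median <= 0:
            return False  # a relative change is undefined without a positive median
        delta = 100.0 * delta / median  # change relative to the median, in percent
    # ASSUMPTION: a negative threshold alerts when the RTT drops below the
    # median by more than that amount; the text above doesn't define this.
    return delta < value if value < 0 else delta > value


print(exceeds_threshold(14.0, 10.0, "30%"))  # True: +40% against a 30% threshold
print(exceeds_threshold(5.0, 10.0, "-10%"))  # True: -50% against a -10% threshold
```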

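Note, however, that you should be careful when using integers, as there's always likely to be strong variance for probes located a long distance from their target. For example, a 10 ms swing is trivial for the Brisbane probe above, whose RTTs are all above 300 ms, but significant for Rotterdam, whose RTTs are typically around 5 ms.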

Sanity Filter

In the case of very low median values, a sanity check is applied to prevent alerts from being issued for no reason. For example, consider a probe with a median RTT of 2.3 ms and a latest RTT of 4.6 ms. That's a 100% increase, but not one worth noting, so our sanity filter won't treat it as an alert.

At present, the sanity filter ignores any delta within ±5ms.
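The filter can be sketched as an absolute-difference check that runs alongside the threshold test. The names are hypothetical, and treating a delta of exactly 5 ms as ignored is an assumption; the text doesn't say which side of the boundary it falls on.

```python
SANITY_DELTA_MS = 5.0  # deltas within ±5 ms are ignored, per the text above


def passes_sanity_filter(last: float, median: float) -> bool:
    """True if the RTT moved by more than 5 ms in either direction.
    Ignoring a delta of exactly 5 ms is an assumption of this sketch."""
    return abs(last - median) > SANITY_DELTA_MS


# The example above: 2.3 ms -> 4.6 ms is a 100% increase but only a 2.3 ms change.
print(passes_sanity_filter(4.6, 2.3))  # False
print(passes_sanity_filter(111.606, 55.644))  # True
```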

Combinations

So now that we've covered all of the different options, you can try combining them to see what kind of results you might get.

This will only alert on probes whose packet loss exceeds 50%, and will only post a global alert if more than 3 probes are alerting:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?permitted_total_alerts=3&max_packet_loss=50

Same thing, but this will always show the RTT values:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?show_all=1&permitted_total_alerts=3&max_packet_loss=50

Looking back over the last 7 results, show alerts for probes exceeding the median RTT by 30%:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=7&median_rtt_threshold=30%

The same thing, but this time including all RTT values:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=7&median_rtt_threshold=30%&show_all=1

The same thing again, but this time only sound a global alert if more than 5 probes are alerting:

https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=7&median_rtt_threshold=30%&show_all=1&permitted_total_alerts=5

And finally, a great big one that will:

  • Establish a median for each probe based on the past 10 results
  • Alert any probe whose latest RTT exceeds that of the median by 20%
  • Show all RTTs, regardless of alert status
  • Only show a global alert if more than 7 probes are alerting
  • Mark a probe as alerting if its packet loss exceeds 50%

      https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?lookback=10&median_rtt_threshold=20%&show_all=1&permitted_total_alerts=7&max_packet_loss=50
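If you generate these URLs from code, a small helper keeps the combinations readable. This is a sketch: status_check_url is a hypothetical name, and it relies on the standard library's urllib.parse.urlencode, which encodes the % in a percentage threshold as %25 (the standard URL escape for a literal percent sign).

```python
from urllib.parse import urlencode

BASE = "https://atlas.ripe.net/api/v2/measurements/{msm_id}/status-check/"


def status_check_url(msm_id: int, **params) -> str:
    """Build a status-check URL; parameters keep their keyword order."""
    url = BASE.format(msm_id=msm_id)
    return f"{url}?{urlencode(params)}" if params else url


# The "great big one" above, with the percent sign encoded as %25.
print(status_check_url(
    123456789,
    lookback=10,
    median_rtt_threshold="20%",
    show_all=1,
    permitted_total_alerts=7,
    max_packet_loss=50,
))
```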
