A Simple Example
Let's say that your website, www.example.com, is hosted on servers in Europe and North America. You're interested in availability and response time, so you create a RIPE Atlas ping measurement from five locations around the globe and begin seeing results coming back that look something like this:
- Rotterdam, Netherlands: id: 123, rtt: 9ms
- Athens, Greece: id: 234, rtt: 12ms
- Vancouver, Canada: id: 345, rtt: 13ms
- São Paulo, Brazil: id: 456, rtt: 55ms
- Brisbane, Australia: id: 567, rtt: 312ms
The ID for your new measurement is 123456789, so you can get basic information about your measurement by querying this URL:
https://atlas.ripe.net/api/v2/measurements/123456789/
The new status checks system can be found at a similar URL:
https://atlas.ripe.net/api/v2/measurements/123456789/status-check
Querying this URL alone should give you basic dashboard values for your server, which is enough to plug into a monitoring engine like Nagios. The output should look something like this:
# Request
GET https://atlas.ripe.net/api/v2/measurements/123456789/status-check
# Response
HTTP/1.1 200 OK
Date: Tue, 29 Oct 2013 14:37:37 GMT
X-RIPE-Atlas-Global-Alert: 0
Content-Type: text/plain
Cache-Control: no-cache
{
"global_alert": false,
"probes": {
"123": {
"alert": false,
"last": 107.296,
"last_packet_loss": 0.0,
"source": "Country: NL"
},
"234": {
"alert": false,
"last": 14.152,
"last_packet_loss": 0.0,
"source": "Country: GR"
},
"345": {
"alert": false,
"last": 9.328,
"last_packet_loss": 0.0,
"source": "Country: CA"
},
"456": {
"alert": false,
"last": 21.761,
"last_packet_loss": 0.0,
"source": "Country: BR"
},
"567": {
"alert": false,
"last": 28.281,
"last_packet_loss": 0.0,
"source": "Country: AU"
}
}
}
Note that for every probe above, alert is set to false. This is because your network is presently healthy. If, however, connectivity between your server and Brisbane, Australia were to degrade suddenly, the output might look something like this:
# Request
GET https://atlas.ripe.net/api/v2/measurements/123456789/status-check/
# Response
HTTP/1.1 200 OK
Date: Tue, 29 Oct 2013 14:37:37 GMT
X-RIPE-Atlas-Global-Alert: 1
Content-Type: text/plain
Cache-Control: no-cache
{
"global_alert": true,
"probes": {
"123": {
"alert": false,
"last": 107.296,
"last_packet_loss": 0.0,
"source": "Country: NL"
},
"234": {
"alert": false,
"last": 14.152,
"last_packet_loss": 0.0,
"source": "Country: GR"
},
"345": {
"alert": false,
"last": 9.328,
"last_packet_loss": 0.0,
"source": "Country: CA"
},
"456": {
"alert": false,
"last": 21.761,
"last_packet_loss": 0.0,
"source": "Country: BR"
},
"567": {
"alert": true,
"alert_reasons": [
"loss"
],
"all": [
null,
null,
null
],
"last": null,
"last_packet_loss": 100.0,
"source": "Country: AU"
}
}
}
Note that probe 567 (the ID for the probe that you're using in Brisbane) has somehow lost the ability to ping your server. This has resulted in the following changes to the output of your status check:
- The last property (the last attempt to ping your server) has a null value
- The last_packet_loss value is set to 100%
- As the last attempt could not get even one packet through, the alert property was set to true
- As one of the probes has now triggered an alert, the global_alert property is set to true
- The X-RIPE-Atlas-Global-Alert header is set to 1
- Two additional values were added to the probe definition in question: all and alert_reasons:
  - all is a list of all packet results used to calculate last. There will be more explanation about this later.
  - alert_reasons is a list of reasons why this alert was triggered. Typically this will only have one value: loss, but as we'll see later on, it may also contain latency.
The idea is to have your monitoring software parse this output and act accordingly. How you parse it is up to you. A simple approach would be to grep the output for global_alert":true and trigger your alerts based on that, while a more nuanced one might parse the JSON and look for values relevant to different users in order to page the appropriate contact.
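As a rough illustration of the more nuanced approach, the following Python sketch parses a status-check response body and returns the probes that are currently alerting, together with their reasons. The function name alerting_probes and the shortened sample body are made up for this example; only the field names (global_alert, probes, alert, alert_reasons) are taken from the output shown above.

```python
import json


def alerting_probes(status_body: str) -> dict:
    """Return {probe_id: alert_reasons} for every probe whose alert flag is set.

    Hypothetical helper: it assumes the response body has the shape shown
    in the examples above.
    """
    status = json.loads(status_body)

    # Nothing to report when the measurement as a whole is healthy.
    if not status.get("global_alert"):
        return {}

    return {
        probe_id: probe.get("alert_reasons", [])
        for probe_id, probe in status.get("probes", {}).items()
        if probe.get("alert")
    }


# Shortened sample body based on the degraded-connectivity example above.
sample = """
{
  "global_alert": true,
  "probes": {
    "123": {"alert": false, "last": 107.296, "last_packet_loss": 0.0,
            "source": "Country: NL"},
    "567": {"alert": true, "alert_reasons": ["loss"],
            "all": [null, null, null], "last": null,
            "last_packet_loss": 100.0, "source": "Country: AU"}
  }
}
"""

print(alerting_probes(sample))  # {'567': ['loss']}
```

From here, a monitoring script could look up each returned probe ID in its own contact list and page whoever is responsible for that region.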
If you're not keen on parsing the output, or want to save bandwidth by using a simpler test, we also allow you to abuse the HTTP response code system by setting the flag change_http_status=1. In this case, the above response would change to the following:
# Request
HEAD https://atlas.ripe.net/api/v2/measurements/123456789/status-check/?change_http_status=1
# Response
HTTP/1.1 418 UNKNOWN STATUS CODE
Date: Tue, 29 Oct 2013 14:37:37 GMT
X-RIPE-Atlas-Global-Alert: 1
Content-Type: text/plain
Cache-Control: no-cache
Note that the only HTTP codes currently in use are 200 and 418. There are no plans to expand the abuse of the HTTP status code system at present, as this would make it difficult to indicate whether there is a problem with the measurement in question, or the status check system itself.
With these sorts of changes, you can write server-side scripts to capture and parse the JSON output, or just note the HTTP response code and take whatever action you see fit. To use Nagios as an example, you could use the check_http script to alert if the HTTP response is anything other than 200. There's no need to write any custom code if you don't want to. Please make sure that your system sets the HTTP Host header properly, i.e. it sends a Host: atlas.ripe.net line with the HTTP request. In Nagios this is achieved by using the option -H atlas.ripe.net.
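If you would rather script the status-code check yourself, a minimal Python sketch might look like the following. It sends a HEAD request with change_http_status=1 and translates the result into the conventional Nagios plugin exit codes (0 for OK, 2 for CRITICAL, 3 for UNKNOWN). The function names and the choice to treat 418 as CRITICAL are assumptions made for this example, not part of the status check system.

```python
import urllib.error
import urllib.request

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def exit_code_for(http_status: int) -> int:
    """Map a status-check HTTP code to a Nagios exit code (assumed mapping)."""
    if http_status == 200:
        return OK        # no probe is alerting
    if http_status == 418:
        return CRITICAL  # at least one probe is alerting
    return UNKNOWN       # likely a problem with the status check itself


def check(measurement_id: int) -> int:
    """Hypothetical wrapper: query the status check and return an exit code."""
    url = (
        "https://atlas.ripe.net/api/v2/measurements/"
        f"{measurement_id}/status-check/?change_http_status=1"
    )
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return exit_code_for(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for non-2xx responses, including 418.
        return exit_code_for(error.code)
    except OSError:
        # Network failure or timeout: the state cannot be determined.
        return UNKNOWN
```

A wrapper script would call sys.exit(check(123456789)) so that the monitoring engine sees the exit code. urllib.request derives the Host header from the URL, so the Host: atlas.ripe.net requirement is met without extra configuration.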