Datadog's Missing API - Metrics Explorer

Authors
  • Trevor Rundell

We recently launched an observability portal at Mutiny to assist with service discovery, day-to-day operations, and incident response. I hope to write more on what exactly an observability portal is in another post, but the elevator pitch is that it’s a centralized place for the team to organize and traverse our observability stack. One of the primary functions of this tool is to help you link to interesting metrics on the fly - something that Datadog (where most of our observability tools live) isn’t great at doing natively.

A table of components and metrics

I looooove tables for displaying metrics, but it’s also really handy to link each cell to a timeseries visualization of that metric.

Datadog recommends a few tools for sharing metrics, notably Dashboards and Notebooks, which have the downside of being persistent, mutable resources. If you create a Dashboard or Notebook just to share a single metric graph, it will clutter up your search results with a bunch of useless junk. And if someone edits that Notebook to tweak the metric for their own investigation, it will change for everyone, making it confusing to refer back to links you may have saved during an incident.

Fortunately, Datadog does actually have a tool that’s almost exactly what I wanted. Metrics Explorer lets you create and share links to arbitrary metrics views, albeit with some limitations on the visualization. When you use Metrics Explorer it automatically serializes its state into the URL in the form of an encoded hash fragment. This makes it super easy to share the exact view you’re looking at, without any risk of your view being mutated accidentally by someone else. And if you want to save these otherwise ephemeral views, you can easily add them to a Dashboard or Notebook with one click from a context menu.

The bad news - the interface to create Metrics Explorer links is completely undocumented and obfuscated. I reached out to Datadog support about this and they (unsurprisingly) referred me back to Notebooks and Dashboards. However, since it all seems to be encoded in that location hash, all it takes is a little elbow grease to figure out how to encode and decode these URLs for yourself. Don’t worry, I’ll save you the trouble! If you find yourself wanting to create links to Metrics Explorer for your own tooling, read on…

Option 1: Encoded Location Hash

The location hash that encodes the view’s state is simply a compressed JSON blob that can be manipulated with the lz-string library (https://github.com/pieroxy/lz-string). The library’s compressToEncodedURIComponent() function turns a JSON string into a location hash, and decompressFromEncodedURIComponent() turns a location hash back into JSON.

There’s a bit more to describe about the structure of the payload itself, but if you decode one of these location hashes you’ll end up with a JSON blob that looks something like this…

{
  "widget": {
    "layout": {
      "x": 1,
      "y": 1,
      "width": 1,
      "height": 1
    },
    "definition": {
      "type": "timeseries",
      "requests": [
        {
          "response_format": "timeseries",
          "queries": [
            {
              "name": "query1",
              "data_source": "metrics",
              "query": "avg:system.cpu.user{*}"
            }
          ],
          "formulas": [
            {
              "formula": "query1"
            }
          ],
          "style": {
            "palette": "dog_classic",
            "line_type": "solid",
            "line_width": "normal"
          },
          "display_type": "line"
        }
      ]
    }
  },
  "splitConfig": null
}

The queries and formulas blobs most closely resemble the syntax used by the “Query timeseries data across multiple products” endpoint described in the Datadog Metrics API documentation, so I won’t attempt to describe this structure in more detail here. I did also publish a small library you might find useful to integrate this structure into your own application.
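To make the shape of the payload a little more concrete, here is a minimal Python sketch that builds the same blob for more than one query. The helper name explorer_payload, the second metric, and the formula query1 + query2 are assumptions chosen for illustration, and the layout and style values are simply copied from the decoded example above rather than taken from any documentation.

```python
import json


def explorer_payload(queries: dict, formulas: list) -> dict:
    """Hypothetical helper: build a state blob shaped like the decoded example.

    `queries` maps a query name (e.g. "query1") to a metric query string, and
    `formulas` is a list of expressions that refer to those names.
    """
    return {
        "widget": {
            # Layout and style values are copied verbatim from the decoded example.
            "layout": {"x": 1, "y": 1, "width": 1, "height": 1},
            "definition": {
                "type": "timeseries",
                "requests": [
                    {
                        "response_format": "timeseries",
                        "queries": [
                            {"name": name, "data_source": "metrics", "query": query}
                            for name, query in queries.items()
                        ],
                        "formulas": [{"formula": formula} for formula in formulas],
                        "style": {
                            "palette": "dog_classic",
                            "line_type": "solid",
                            "line_width": "normal",
                        },
                        "display_type": "line",
                    }
                ],
            },
        },
        "splitConfig": None,
    }


# Assumed example: add user and system CPU time together in one series.
payload = explorer_payload(
    {"query1": "avg:system.cpu.user{*}", "query2": "avg:system.cpu.system{*}"},
    ["query1 + query2"],
)
print(json.dumps(payload, indent=2))
```

Calling the helper with a single query1 entry and the formula query1 reproduces the decoded blob shown earlier.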

And that’s it - you have everything you need to generate Metrics Explorer links! All together you’ll end up with a link that looks something like this: https://app.datadoghq.com/metric/explorer?fromUser=false&start=1721748052998&end=1721751652998&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMgaWBoIcKoAdJnyeE14fsBwIhUQ2lLqIgjI8gNQeJkSUjwgPAC6VPVYaDmg8hirCA3KUDgwTpmYGhoQmYloonBOcorK6ZdQF1dO9EzKIm4eaHP8EPaYLA3BQ5ECXERKBY8PggAHiADCUmEMBQIjwaDQPCAA
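As a rough sketch of how the pieces fit together, the Python below assembles a link of the same shape as the one above: the query parameters seen in that link (fromUser, start, end, paused) followed by the compressed payload as the hash. The helper names are hypothetical, and the compressor is passed in as a function because lz-string is a JavaScript library; any faithful port of compressToEncodedURIComponent() should work, but which port you use is up to you. Whether Datadog cares about compact versus pretty-printed JSON is something I have not verified, so the compact form here is just a guess at keeping links short.

```python
import json
import time
from typing import Callable, Optional
from urllib.parse import urlencode

EXPLORER_URL = "https://app.datadoghq.com/metric/explorer"


def last_hour_ms(now_s: Optional[float] = None) -> tuple:
    """Return (start, end) for the past hour, in milliseconds since the Unix epoch."""
    end_ms = int((time.time() if now_s is None else now_s) * 1000)
    return end_ms - 60 * 60 * 1000, end_ms


def explorer_link(
    payload: dict,
    start_ms: int,
    end_ms: int,
    compress: Callable[[str], str],
) -> str:
    """Hypothetical helper: build a Metrics Explorer link with an encoded hash.

    `compress` is assumed to behave like lz-string's
    compressToEncodedURIComponent(): it takes the JSON text and returns a
    URL-safe string.
    """
    params = urlencode(
        {"fromUser": "false", "start": start_ms, "end": end_ms, "paused": "false"}
    )
    # Serialize compactly (an assumption, see above) and compress into the hash.
    fragment = compress(json.dumps(payload, separators=(",", ":")))
    return f"{EXPLORER_URL}?{params}#{fragment}"


# Usage, assuming a Python port of lz-string such as the `lzstring` package
# is installed and exposes the same function name as the JavaScript library:
#   from lzstring import LZString
#   start_ms, end_ms = last_hour_ms()
#   link = explorer_link(payload, start_ms, end_ms,
#                        LZString().compressToEncodedURIComponent)
```

The one-hour window mirrors the example link above, where end minus start is exactly 3,600,000 milliseconds.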

Option 2: Query Parameters

There is actually a second, slightly easier, but significantly less powerful way to create links to Metrics Explorer, which you might notice is used by the Metrics Summary tool in Datadog. The biggest downsides of this approach are that you can only specify a single metric at a time, and it doesn’t support formulas. But since everything is just query parameters, this can be a reasonable alternative if you can’t use the lz-string library for whatever reason. Here are the query parameters you can use…

  • exp_metric - The metric you want to visualize e.g. system.cpu.user
  • exp_agg - The aggregation function e.g. max, avg, sum
  • exp_scope [repeatable] - The scopes you want to filter by in the form key:value. You can repeat this query parameter to filter by multiple scopes.
  • exp_group [repeatable] - The dimension to group by into separate series. You can repeat this parameter to group by multiple dimensions.
  • exp_calc_as_rate - true if you want to view the metric as a rate, otherwise defaults to count. Only applies to rate metrics.
  • start / end - The start and end of the timeframe you wish to visualize, in milliseconds from the Unix epoch.
  • paused - true if you want to freeze the time period, or false if you want the graph to live update.
  • graph_layout - split to create one graph per group, multi to overlay every group’s series onto the same graph while giving each metric its own graph, or stacked to overlay all metrics / series onto the same graph.

All together you’ll end up with a link that looks something like this: https://app.datadoghq.com/metric/explorer?exp_metric=system.cpu.user&exp_agg=avg&exp_scope=environment:production
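Since this option is nothing but query parameters, it can be sketched with the Python standard library alone. The helper name explorer_query_link and its argument names are hypothetical; the parameter names it emits are the ones listed above. Leaving the colon unescaped is a choice made to match the example link, and I am assuming Datadog treats the escaped form the same way.

```python
from typing import Sequence
from urllib.parse import urlencode

EXPLORER_URL = "https://app.datadoghq.com/metric/explorer"


def explorer_query_link(
    metric: str,
    agg: str = "avg",
    scopes: Sequence[str] = (),
    groups: Sequence[str] = (),
    **extra: object,
) -> str:
    """Hypothetical helper: build a query-parameter Metrics Explorer link.

    `extra` carries the remaining parameters by name, e.g. start, end,
    paused, graph_layout or exp_calc_as_rate.
    """
    params = [("exp_metric", metric), ("exp_agg", agg)]
    # Repeatable parameters are emitted once per value.
    params += [("exp_scope", scope) for scope in scopes]
    params += [("exp_group", group) for group in groups]
    for key, value in extra.items():
        # Booleans become the lowercase "true" / "false" strings used in the URL.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        params.append((key, text))
    # Keep ":" unescaped so scopes read as key:value, as in the example link.
    return f"{EXPLORER_URL}?{urlencode(params, safe=':')}"


link = explorer_query_link("system.cpu.user", scopes=("environment:production",))
print(link)

grouped = explorer_query_link(
    "system.cpu.user", agg="max", groups=("host",), paused=True, graph_layout="stacked"
)
print(grouped)
```

The first call reproduces the example link above; the second shows a grouped, stacked view with a frozen time period.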

Not using Datadog?

I’m curious - how well do other observability products solve this problem? I did some quick skimming through some of their documentation to find out.

Grafana

Grafana has a similar “Explore” tool that also encodes queries in a JSON blob in the URL. This interface is thankfully well documented so you shouldn’t have any problem adapting this for your use case.

Honeycomb

Honeycomb also seems to have the ability to create easily sharable links to metrics via an encoded query string parameter.

New Relic

New Relic has a similar metrics explorer that can generate sharable links, but I couldn’t find any documentation to suggest how to generate these links on the fly. If you use New Relic, I’d be curious to know if these links have a well-defined structure or if they’re obfuscated like Datadog’s are.

Conclusion

Lightweight, ephemeral links are a great way to facilitate exploration and collaboration with your team. If your team uses Datadog, hopefully this post helps you fill a gap in your tooling and leverage this underutilized capability more fully!