Add support for time-since-last-action heartbeat metrics #26

Open
wants to merge 1 commit into
base: master
from

Conversation

jessekempf

@jessekempf jessekempf commented May 5, 2018

I've found there's a common pattern in services I write, where I want a heartbeat or watchdog timer for periodic jobs being run by an in-memory scheduler. I thought it'd be good to contribute it back as a change upstream.

@23Skidoo
Collaborator

23Skidoo commented May 7, 2018

/cc @tibbe

@tibbe
Collaborator

tibbe commented May 7, 2018

Could this be implemented outside the library using registerGroup?

@jessekempf
Author

It could be, but in that case why wouldn't Counter, Gauge, Label, and Distribution be implemented outside the library?

@jessekempf
Author

Also, when I took a look at all Hackage packages with "ekg" in the name, all of them reuse the primitives defined in ekg-core. And registerGroup seems to be for composites of primitive metrics, but a Heartbeat is atomic.

@tibbe
Collaborator

tibbe commented May 8, 2018

The Value type captures semantic information about the metric being monitored:

 data Value = Counter {-# UNPACK #-} !Int64
            | Gauge {-# UNPACK #-} !Int64
            | Label {-# UNPACK #-} !T.Text
            | Distribution !Distribution.Stats

Counters are monotonically increasing, gauges can go both up and down, and labels/distributions are different types. A heartbeat is simply a type of counter, not a semantically different kind of thing. Same story for MetricSampler. The different constructors there are so we can construct the semantically right Value.

(Now, could all the registerFoo functions be written in terms of registerGroup? I haven't thought about it or tried it, maybe it can be done.)

@jessekempf
Author

Right, if anything a heartbeat is a type of gauge, but where the value is a direct function of time rather than an indirect one. As an entity it has a different set of primitive operations on it because it is measuring time rather than quantity.

If the temporal semantics of a heartbeat don't matter, and instead it's a type of gauge, by the same course of reasoning the monotonically-increasing semantics of a counter shouldn't matter, because it's implementable in terms of a gauge. Of course, the reason any quantity can be implemented in terms of a gauge is that a gauge is any integer-valued function f(t). Ignoring the fact that a gauge in this implementation is a signed 64-bit integer, one could argue that a label is a kind of gauge because strings are countably infinite and so there's a bijection of them onto the integers.

But it makes sense to use Haskell's type system to encode the different usage rules for the different kinds of things we want to monitor when building software in the real world. Gauges are quantity-valued, Counters are quantity-valued but can only be incremented (though the types admit adding a negative increase), and Heartbeats operate only on times.

I will totally cop to ignoring type safety in System.Metrics.Heartbeat.read, and to following that through by making the constructor Heartbeat :: Int64 -> Value instead of Heartbeat :: UTCTime -> Value or Heartbeat :: NominalDiffTime -> Value, so that the rendering to an integer value happens in the sampling function rather than in each of the individual reporters. One of my questions for you was going to be "am I doing the conversion to an integer too early?".
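
To make the "too early?" question concrete, here is a minimal sketch of the alternative, in which the sampled value stays a typed duration and each reporter chooses its own unit. Everything in it is hypothetical: `Sampled`, `sampleHeartbeat`, `renderWith`, `wholeSeconds`, and `wholeMillis` are illustrative names, and `Sampled` is a stand-in rather than ekg-core's `Value` or the type proposed in this pull request.

```haskell
import Data.Int (Int64)
import Data.Time.Clock (NominalDiffTime, UTCTime, diffUTCTime)

-- Hypothetical stand-in for a sampled metric value; NOT ekg-core's Value.
data Sampled
  = GaugeValue Int64
  | HeartbeatAge NominalDiffTime  -- time since the last beat, still typed
  deriving (Eq, Show)

-- Sampling keeps the duration typed instead of collapsing it to Int64.
sampleHeartbeat :: UTCTime -> UTCTime -> Sampled
sampleHeartbeat lastBeat now = HeartbeatAge (diffUTCTime now lastBeat)

-- Each reporter supplies the unit conversion it wants for durations;
-- plain gauge values pass through unchanged.
renderWith :: (NominalDiffTime -> Int64) -> Sampled -> Int64
renderWith _ (GaugeValue n)    = n
renderWith f (HeartbeatAge dt) = f dt

-- Two possible reporter-side conversions, both rounding down.
wholeSeconds :: NominalDiffTime -> Int64
wholeSeconds = floor

wholeMillis :: NominalDiffTime -> Int64
wholeMillis dt = floor (dt * 1000)
```

The trade-off is the one described above: converting late keeps sub-second precision available to reporters that want it, at the cost of every reporter having to handle one more constructor.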

@tibbe
Collaborator

tibbe commented May 17, 2018

> the monotonically-increasing semantics of a counter shouldn't matter, because it's implementable in terms of a gauge.

The distinction makes a difference to the consumer and is why statsd also has this distinction: if you know you're monitoring a monotonically-increasing value, you know that if the value went down it must be because the thing you monitored was restarted (or similar). This in turn means that you can accurately graph the value over time (e.g. requests/s) in the face of e.g. failing machines.
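
The consumer-side reasoning above can be sketched roughly as follows. This is an illustrative assumption about how a consumer might treat counter samples, not code from statsd or ekg-core; `counterDelta`, `totalIncrease`, and `ratePerSecond` are hypothetical names, and the sketch assumes a restarted process begins counting again from zero.

```haskell
import Data.Int (Int64)

-- Increase between two consecutive samples of a monotonic counter.
-- A drop is taken to mean the monitored process restarted from zero,
-- so the whole current value counts as the increase.
counterDelta :: Int64 -> Int64 -> Int64
counterDelta prev cur
  | cur >= prev = cur - prev
  | otherwise   = cur

-- Total increase across a series of samples, tolerating restarts.
totalIncrease :: [Int64] -> Int64
totalIncrease xs = sum (zipWith counterDelta xs (drop 1 xs))

-- Rate over one sampling interval (interval given in seconds).
ratePerSecond :: Double -> Int64 -> Int64 -> Double
ratePerSecond interval prev cur =
  fromIntegral (counterDelta prev cur) / interval
```

For the samples `[100, 150, 20, 60]`, `totalIncrease` yields 110, whereas subtracting the first sample from the last, as one would for a gauge, gives -40. The restart is only recoverable because the value is known to be a counter.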

I still don't quite see why heartbeat can't be just a gauge, could you give a client code example showing how it will be used?
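
For reference, one way the "just a gauge" approach could look in client code is sketched below. It is not the code from this pull request: `Heartbeat`, `newHeartbeat`, `beat`, `ageToGauge`, and `heartbeatGauge` are hypothetical names, and the sketch uses only `base` and `time`. The resulting `IO Int64` action has the shape that `System.Metrics.registerGauge` accepts as its sampling action.

```haskell
import Data.Int (Int64)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Time.Clock (NominalDiffTime, UTCTime, diffUTCTime, getCurrentTime)

-- Hypothetical heartbeat: remembers when the periodic job last ran.
newtype Heartbeat = Heartbeat (IORef UTCTime)

-- A new heartbeat starts out as "just beaten".
newHeartbeat :: IO Heartbeat
newHeartbeat = Heartbeat <$> (getCurrentTime >>= newIORef)

-- Called by the scheduled job each time it completes a run.
beat :: Heartbeat -> IO ()
beat (Heartbeat ref) = getCurrentTime >>= writeIORef ref

-- Whole seconds in a duration, rounding down.
ageToGauge :: NominalDiffTime -> Int64
ageToGauge = floor

-- Sampling action: seconds elapsed since the last beat.
-- With ekg-core this could be registered as an ordinary gauge, e.g.
--   registerGauge "job.seconds_since_last_run" (heartbeatGauge hb) store
-- (metric name and `store` are assumptions for illustration).
heartbeatGauge :: Heartbeat -> IO Int64
heartbeatGauge (Heartbeat ref) = do
  lastBeat <- readIORef ref
  now <- getCurrentTime
  return (ageToGauge (diffUTCTime now lastBeat))
```

In this form the only operations exposed are `beat` and the sampling action, so the time-only usage rules are enforced by the wrapper while the reported value remains a plain gauge.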

3 participants