I talk on stage about o11y and otel stuff... basically.
Start with OpenTelemetry and the collectors. There are tonnes of articles out there.
Tests that already exist describe what the system is required to do; the telemetry tells you what it's actually doing. Couple those two things and you have a winning combo.
That talk was before coding agents were mainstream. Nowadays, that context from traces can help agents locally too!
Profiling linked to span IDs will allow more.
If you're adding the instrumentation, just add that property to the span.
`for_loop.<name>.iteration_count`
Then you can aggregate the span data and forget about the metric cardinality.
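Rough sketch of what I mean, assuming nothing about your stack. `Span` here is a tiny stand-in class so the example runs without dependencies, and `process_batch` and `aggregate` are made-up names for illustration. With the real OpenTelemetry Python API the equivalent call would be `trace.get_current_span().set_attribute(...)`, and the aggregation would happen in your backend's query language rather than in code.

```python
from statistics import mean


class Span:
    """Stand-in for a tracing span (hypothetical); only models name and attributes."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def process_batch(items, exported):
    """Loop over the items and record the iteration count on the span."""
    span = Span("process_batch")
    iterations = 0
    for _ in items:
        iterations += 1  # the real work would happen here
    # One attribute per span, written once after the loop finishes.
    span.set_attribute("for_loop.process_batch.iteration_count", iterations)
    exported.append(span)


def aggregate(spans, key):
    """Aggregate one attribute across spans at query time (assumes at least one match)."""
    values = [s.attributes[key] for s in spans if key in s.attributes]
    return {"count": len(values), "max": max(values), "mean": mean(values)}


exported_spans = []
for batch in ([1, 2, 3], [1], list(range(8))):
    process_batch(batch, exported_spans)

summary = aggregate(exported_spans, "for_loop.process_batch.iteration_count")
print(summary)
```

The count lives on the span as a plain attribute, so there is no new metric time series per loop name. You only pay for the aggregation when you actually query for it.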
I think you should have another look at profiling though, it's come a long way.
The issue is, you don't need this all the time, so a metric isn't really the right answer. When you need it, you need it.
For the profiling approach, you would be able to get a lot closer than you think; it's not "just" syscalls.
Adding the count to a trace is better, but hard to pinpoint.
Utilising continuous profiling, along with tracing, will show you the outcome you're looking for.
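To make that concrete, here's a toy sketch of the join, assuming your profiler can label each sample with the span ID that was active when it was taken. All the data, the field names, and the `hot_frames` helper are invented for illustration; a real backend does this for you.

```python
from collections import Counter

# Hypothetical data: spans from a trace, and profiler samples labelled with
# the span ID that was active when each sample was captured.
spans = [
    {"span_id": "a1", "name": "GET /orders", "duration_ms": 840},
    {"span_id": "b2", "name": "GET /health", "duration_ms": 3},
]
samples = [
    {"span_id": "a1", "frame": "orders.load_items"},
    {"span_id": "a1", "frame": "orders.price_item"},
    {"span_id": "a1", "frame": "orders.price_item"},
    {"span_id": "a1", "frame": "orders.price_item"},
    {"span_id": "b2", "frame": "health.ping"},
]


def hot_frames(span_id, samples, top=3):
    """Count which frames were sampled most often while the given span was active."""
    counts = Counter(s["frame"] for s in samples if s["span_id"] == span_id)
    return counts.most_common(top)


# Start from the slowest span in the trace, then ask the profile where it spent its time.
slowest = max(spans, key=lambda s: s["duration_ms"])
result = hot_frames(slowest["span_id"], samples)
print(slowest["name"], result)
```

The trace tells you which request was slow, and the samples tied to that span ID tell you which code was running while it was slow, without you having to instrument the loop by hand.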
Metrics, logs, and traces are meant to be intentional: you instrument the things that matter, not everything.
If people are asking to use another tool, perhaps your tool is missing something? How do you prioritise building that feature? Do you have the expertise in the current team to do that?
This measures, at a simple level, how many people would recommend your service to a friend or colleague.
Ultimately, given a choice, would people use your platform?
That's not counting how you then develop new features as the need arises.
This is why Observability teams MUST become product teams and not Infra/SRE teams.
That's just not sustainable for the majority of organisations.
Once you hit scale, both in data volume and in the number of people using those tools, it becomes a full-time job for multiple people.
The problem is normally procurement, or people being scared of them.