Release Notes

Improvement
2 months ago

Automatically filtering out healthchecks on ECS and Kubernetes

Health-checks are peculiar things

Healthchecks are a monitoring technique with a special flavor: they are fired off at regular, frequent intervals (sometimes every 10 seconds, sometimes every minute) by orchestration platforms and monitoring tools. Most healthchecks are HTTP-based, and the returned HTTP response is checked based on the status code and (sometimes) the content. But really, the only healthchecks a person needs to know about are those that fail, which usually lead to containers being torn down and other disruptive infrastructure changes.

Issues with health-checks in Lumigo

Given that Lumigo's pricing model is based on the number of requests we process, the large number of successful healthchecks that every container workload undergoes leads to undesirable consumption of quota for data that is effectively not useful. Moreover, successful healthchecks lead to noise in the Explore and Transactions views, degrading the overall experience.

Cutting the Gordian knot

Luckily, one can often recognize HTTP requests that are healthchecks pretty easily! Both AWS ELB health-checks and Kubernetes ones (including EKS) come in with specific User-Agent headers. In its data processing pipeline, the Lumigo platform now automatically drops all the spans that:

  1. Carry the User-Agent HTTP header with values that are known to be health-checks, specifically ELB-HealthChecker/* (AWS ELB, often used with Amazon ECS) and kube-probe/<kubelet_version> (Kubernetes, including Amazon EKS)
  2. Return an HTTP status code that denotes a successful response (i.e., `2xx`, like `200 OK`, `201 Created`, etc.). This is because if a health-check fails (e.g., returning HTTP status code 500), usually something bad is about to happen to your containers :-)
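
The two conditions above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this post, not Lumigo's actual pipeline code: the span shape (a dict with `user_agent` and `status_code` keys) and the function names are assumptions made for the example.

```python
import re

# Prefixes taken from the User-Agent values named above.
HEALTHCHECK_USER_AGENT = re.compile(r"^(ELB-HealthChecker|kube-probe)/")


def is_successful_healthcheck(user_agent, status_code):
    """Return True when a request looks like a health check that succeeded."""
    if not user_agent or status_code is None:
        return False
    from_healthchecker = HEALTHCHECK_USER_AGENT.match(user_agent) is not None
    succeeded = 200 <= status_code <= 299
    return from_healthchecker and succeeded


def filter_spans(spans):
    """Keep every span except successful health checks.

    Each span is a hypothetical dict with "user_agent" and "status_code" keys.
    """
    return [
        span
        for span in spans
        if not is_successful_healthcheck(
            span.get("user_agent"), span.get("status_code")
        )
    ]
```

With this rule, a `kube-probe/1.24` request answered with `200` is dropped, while the same request answered with `500` is kept, as is any `200` response to a regular client.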

What do you need to do on your end?

Nothing. It just works with every version of tracers we released so far for containers and all HTTP OpenTelemetry instrumentations we have ever seen. Enjoy :-)

P.S.: Matching health-checks by path (e.g., /health) sounds like a good solution on paper, but in practice it leads to very annoying false positives (i.e., HTTP calls that are NOT related to health-checks). Moreover, healthcheck paths are configurable, and practitioners do make use of that configurability, which would lead to false negatives (health-checks we let through). User-Agent headers, on the other hand, are far less often changed by healthcheck systems, so User-Agent matching is usually rather reliable for this use-case.

Improvement
4 months ago

Improved batch containerized workload support

Since we launched Amazon ECS support earlier this summer, we have come across many user workloads that behave like batch jobs (which, incidentally, we often see scheduled via AWS Batch and, occasionally, via AWS Step Functions). Rather than relying on long-running processes that receive requests over HTTP, these workloads execute jobs pulled from Amazon SQS or sometimes the process environment, perform computation involving databases, other services and messaging queues, and then terminate.

The most intuitive representation for such transactions consists of a "root" span representing the "main" method, with the outgoing requests to databases, messaging queues and other services nested directly under the "main" span. And this is how Lumigo will now represent these workloads, provided that you use the OpenTelemetry API to create the "root" span.
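
As a rough sketch of what creating such a "root" span can look like in Python, the snippet below wraps a batch run in a span named "main" using the OpenTelemetry API (`trace.get_tracer` and `start_as_current_span`). The function names, the job names and the `batch.job_count` attribute are made up for illustration, and the example assumes the `opentelemetry-api` package and a configured tracer (for example a Lumigo or upstream OpenTelemetry distribution) are present in your environment.

```python
def handle_job(job):
    """Hypothetical unit of work; real code would call databases, queues, etc."""
    return job.upper()


def run_batch(tracer, jobs):
    """Process all jobs inside a single root span named "main"."""
    # Spans created while this block is active (for example by HTTP or
    # database instrumentations) are nested under the "main" span.
    with tracer.start_as_current_span("main") as root_span:
        # Hypothetical attribute name, used only for illustration.
        root_span.set_attribute("batch.job_count", len(jobs))
        return [handle_job(job) for job in jobs]


def main():
    # Imported lazily so the sketch can be read and tested without the
    # opentelemetry-api package; normally this import sits at the top.
    from opentelemetry import trace

    tracer = trace.get_tracer("batch-worker")
    return run_batch(tracer, ["resize-image", "send-report"])
```

Calling `main()` from the entry point of the batch process is enough. The key design choice is that the root span is opened before any outgoing request is made and closed only after the last one completes, so that all of them share a common parent.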

Lumigo now also supports the case where the distributed trace starts with an outgoing request. However, since there is no common parent span in that case, multiple such outgoing requests will each result in a separate transaction.

Enjoy this improved support for your containerized workloads and let us know what you think about it!

P.S. If you want a hand using the OpenTelemetry API to create root spans, we are happy to help! Let us know through the support channel, and we'll gladly arrange a call to help you out. It's usually just 5 minutes coding, and then pushing it to your environment to validate :-)

P.P.S. Lumigo now also shows Elastic Load Balancers that serve HTTP requests issued by containerized workflows.


Improvement
6 months ago

Amazon ECS OpenTelemetry Resource Attributes Now Supported


We extended Lumigo's support of OpenTelemetry resource attribute semantic conventions to cover the `cloud.platform` key and the Amazon ECS semantic conventions.

Our goal with implementing the OpenTelemetry semantic conventions is to enable you to get the best integration with Lumigo for your workloads even when not using the Lumigo OpenTelemetry distributions, but rather using upstream OpenTelemetry SDKs.
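
For illustration, here is a minimal sketch of how such resource attributes could be passed to an upstream OpenTelemetry SDK through the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable. The attribute keys come from the OpenTelemetry semantic conventions; the values (account ID, cluster and task names) are placeholders, the helper name is hypothetical, and which ECS attributes Lumigo actually reads is not stated in this post, so please check the documentation referenced below.

```python
from urllib.parse import quote

# Placeholder values: the account ID, cluster name and task family are made up.
ECS_RESOURCE_ATTRIBUTES = {
    "cloud.provider": "aws",
    "cloud.platform": "aws_ecs",
    "aws.ecs.cluster.arn": "arn:aws:ecs:us-east-1:123456789012:cluster/demo",
    "aws.ecs.task.family": "demo-task",
    "aws.ecs.launchtype": "fargate",
}


def to_otel_resource_attributes(attributes):
    """Render attributes as the comma-separated key=value list that
    OTEL_RESOURCE_ATTRIBUTES expects, percent-encoding "," and "=" in values."""
    return ",".join(
        f"{key}={quote(str(value), safe=':/')}"
        for key, value in attributes.items()
    )
```

The resulting string can be set as `OTEL_RESOURCE_ATTRIBUTES` in the container definition of the ECS task, so that the SDK attaches the attributes to every span it exports.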

The Amazon ECS semantic conventions are not yet widely implemented in OpenTelemetry SDKs, but we are very much looking forward to contributing them to the community!

For more information about the OpenTelemetry semantic conventions supported by Lumigo, refer to the OpenTelemetry Supported Semantic Conventions documentation.

Improvement
9 months ago

New and improved System-Map

A new UI brings a best-in-class visual experience to your System Map, all with the same features you love, including filtering, to give you a detailed bird's eye view of your whole stack.

Improvement
11 months ago

Global Search Shortcut Has Been Improved

Search for pages, functions, and product documentation, all with a keyboard shortcut to maximize your efficiency.

Hit Ctrl + K on Windows or Cmd + K on macOS to search for pages, functions, and saved documents.


Improvement
a year ago

Debug transactions with stack trace

See the line of code where the exception was thrown in the STACK TRACE section. This important information can help you figure out where things went wrong or how various subroutines work together during execution.

When will I see the stack trace? At the moment the stack trace is available for traced Python functions.


Improvement
a year ago

Introducing the new and improved UI

We’ve just released a new user interface designed to improve the overall user experience, including a new time picker and better support for Safari and Firefox.


Improvement
a year ago

New and improved Slack notifications

We’ve revamped Slack notifications to ensure teams quickly get the information they need when issues occur. You can now see the details of the issue at a glance, and investigate the issue in Lumigo with one click.

For more information read the documentation here.

Improvement
a year ago

Function Duration widget is available on the dashboard

We've added a Function Duration widget to the dashboard to help you easily identify slow functions. 



Improvement
a year ago

New Issues and Alerts Page

To help you resolve issues faster, we've updated the Issues & Alerts page.

  • Perform bulk actions on issues.
  • Customize the issues table to match your needs.
  • View execution tags distribution.

For more information read the documentation here.