Serverless Observability in N|Solid for AWS Lambda

We are excited to release Serverless Observability for N|Solid with support for AWS Lambda. With the growth of organizations leveraging serverless increasing as they realize the performance and cost benefits, we’re excited to provide customers with this new visibility into the health and performance of their Node.js apps utilizing Serverless Functions utilizing serverless architectures.

Img 1. Serverless Cloud Providers

Observability has become a critical part of software development; understanding performance issues and why they occur is incredibly valuable. Leveraging Serverless introduces challenges for observability based on how the technology works due to added complexity and challenges in data collection and analysis. Some key observability issues include:

Cold starts: When a serverless function is first invoked, it takes time for the underlying infrastructure to spin up. This can lead to latency issues, especially for applications that require a high degree of responsiveness. In other words, Lambda must initialize a new container to execute the code, and that initialization takes time.

Debugging: Debugging serverless applications can be challenging due to the event-driven nature of the platform. This is because it can be difficult to track the flow of execution through an application comprising multiple functions invoked in response to events.

Visibility: Serverless providers typically provide limited visibility into the underlying infrastructure. This can make it difficult to troubleshoot performance issues and identify security vulnerabilities.

N|Solid now makes it easy to implement distributed tracing based on __OpenTelemetry__, providing better insights into these microservices-based systems.

What is Serverless?

Serverless computing is a cloud computing model that allows developers to run code without provisioning or managing servers. Instead, developers pay for the resources they use, such as the time their code runs and the amount of data it stores.

Some key benefits of Serverless Computing include:

Accelerated app development,
Reduced costs and improved scalability,
Increased agility

Serverless computing has become increasingly popular in recent years, offering several advantages over traditional server-based architectures. However, serverless computing can also be complex, and observability, monitoring, and debugging can be particularly challenging.

Key concepts to understand the importance of extending observability to serverless architectures:

Observability: The ability to understand the state of a system by collecting and analyzing telemetry data. This data can include logs, metrics, and traces. Traces are a particularly important type of telemetry data, as they can be used to track the flow of requests through a system.

OpenTelemetry: An open-source observability framework that can collect and analyze telemetry data from various sources. It includes support for __Distributed Tracing__, which can be used to track the flow of requests through a serverless environment.

At NodeSource, we have been at the forefront of supporting developers to solve these challenges in their Node.js applications, so we have extended this series of features and content since last year:

Distributed Tracing Support in N|Solid – https://nsrc.io/DistributedTracingNS
Enhance Observability with Opentelemetry Tracing – Part 1 – https://nsrc.io/NodeJS_OTel1
Instrument your Node.js Applications with Open Source Tools – Part 2 – https://nsrc.io/NodeJS_OTel2
AIOps Observability: Going Beyond Traditional APM – https://nsrc.io/AIOpsNSolid

In a serverless environment, applications are typically composed of several different services, each responsible for a specific task. This can make it difficult to track the flow of requests through the system and identify performance problems, but not with N|Solid:

With this release, you can now have traces of when a microservices system calls a serverless function.

Now, with N|Solid, you can:

Track the flow of requests across your microservices architecture, from the Frontend to the Backend.
Identify performance bottlenecks and errors.
Correlate traces across multiple services.
Visualize your traces in a timeline graph.

N|Solid is a lightweight and performance-oriented tracing solution that can be used with any Node.js application. It is easy to install and configure, and it does not require any changes to your code.

Introducing Serverless Support for AWS Lambda

NodeSource has introduced a new observability feature to gather information throughout a request in a serverless environment. The collected information can be used for debugging latency issues, service monitoring, and more. This is valuable for users interested in debugging request latency in a serverless environment.

Img 2. N|Solid Serverless Installation Process

With this announcement, we are enabling the opportunity to connect a process following the indications of our Wizard. You will be able to connect the traces of your Node application through requests or connected functions, by connecting the data, you will be able to find the cause of latency issues, errors, and other problems in serverless environments.

How to apply Serverless Observability through N|Solid:

Key practices include leveraging distributed tracing to understand end-to-end request flows, instrumenting functions with custom metrics, analyzing logs for debugging and troubleshooting, and employing centralized observability platforms for comprehensive visibility.

Img 3. N|Solid Serverless Configuration

To use loading Opentelemetry Instrumentation Modules, first, use the NSOLID_INSTRUMENTATION environment variable to specify and load the Opentelemetry instrumentation modules you want to utilize within your application. To enable instrumentation for specific modules, follow these steps:

For HTTP requests using the http module, set the NSOLID_INSTRUMENTATION environment variable to http.

If you’re also performing PostgreSQL queries using the pg module, include it in the NSOLID_INSTRUMENTATION environment variable like this: http,pg.

List all the relevant instrumentation modules required for your application. This will enable the tracing and monitoring of the specified modules, providing valuable insights into their performance and behavior.

Connect through integration the serverless functions; select and configure the environment variable you require. For more advanced settings, visit our NodeSource documentation.

Img 4. N|Solid Wizard Installation Summary

In our NodeSource documentation, you will find CLI – Implementation and Instructions or directly follow the instructions of our Wizard:

Install
Usage
Interactive Shell
Inline Inputs
Interactive Inputs
Help
Uninstall

Benefits of N|Solid Implementing Serverless

Serverless monitoring collects and analyzes serverless application data to identify and fix performance problems. This can be done using a variety of tools and techniques, including:

__Metrics__: Metrics are data points that measure the performance of an application, such as the number of requests per second, the average response time, and the amount of memory used.
__Logs__: Logs are records of events in an application, such as errors, warnings, and informational messages.
__Traces__: Traces are records of a request’s path through an application, including the time it spends in each function.

By collecting and analyzing this data, serverless monitoring tools can identify performance problems, such as:

__Cold starts__: Cold starts is when a serverless function is invoked for the first time. This can cause a delay in the response time.
__Overcrowding__: Overcrowding occurs when too many requests are sent to a serverless function simultaneously. This can cause the function to slow down or crash.
__Bottlenecks__: Bottlenecks occur when a particular part of an application is slow. Some factors, such as inefficient code, a lack of resources, or a bug, can cause this.

Once a performance problem has been identified, serverless monitoring tools can help developers fix the problem. This can be done by:

Tuning the application: Developers can tune the application by changing the code, configuration, or resources used.
Evolving the application: Developers can evolve it by adding new features or changing how it works.
Monitoring the application: Developers can continue monitoring the application to ensure that the problem has been fixed and the application performs as expected.

Benefits of N|Solid Implementing Serverless

Img 5. N|Solid Functions Dashboard

Now, directly in your N|Solid Console, you’ll have the Function Dashboard, which will allow you to review in-depth and detail all the functions you have connected.

By selecting the functions, you can access the __Function Detail View__.

Img 6. N|Solid Function Detail View

In the first content block, you will have the following:

The function’s name and the provider’s icon__.
__Note:
If you click on the function itself, it will open the function itself in a new tab directly in AWS.

On the other hand, in the central box, you will have:
RUNTIME VERSION
FUNCTION VERSION
ACCOUNT ID
N|Solid LAYER: (It is versioned, which is what allows us to see the AWS info)
REGIONS: The zone of the function. Where it is running that specific version.
ARCHITECTURE: Lambda can have several Linux architectures. This info comes directly from AWS.

And finally, on the right corner of our console, we will have
Vulnerability Status__, and below this, __Estimated Cost: How much are you spending on this function? For more information, please review our documentation.

About N|Solid Layer

In an __AWS Lambda__, you can have information about hundreds of instances, AWS does it for you, but you don’t have information on each invocation. They show you by default the last 5 minutes through the median, the 99th percentile, etc., but they can’t give us invocation by invocation individually; they simply group them.

There are two types of data we have from AWS Lambda:

Real-Time Metrics or Telemetry API Metrics
Cloudwatch Metrics

The metrics that CloudWatch reports are not real-time. They are extracted from the CloudWatch API by the Lambda Metrics Forwarder and displayed as AWS Lambda delivers them directly. This means there may be a delay of up to a few minutes between recording and displaying a metric.

The __Telemetry API Metrics__, which we extract, through Lambda layers, through modules that we install to your function to be executed.

We distributed this information as a layer,__N|Solid Layer__, injecting code inside. An AWS extension for N|Solid provides information about the life cycle of each Lambda function providing data to the console:

Generate the Timeline Graph
Span: Span Details, Path of the Span, and filter the results by attributes of the Span.
Changing Time Range
Generate SBOM reports

Our method tracks where the function is frozen to optimize Lambda resources within certain limits, allowing access to the information even after the function has been frozen.🤯In real-time, the registration is detected, providing the key information on each invocation in each instance.

In Closing,

We’ve been working on serverless observability for our customers for a while because we believe in the value of leveraging the technology approach. Serverless provides many benefits for developers and teams, but it can be difficult to provide quality observability without adding a lot of expensive overhead.

N|Solid can now help teams develop and maintain better software, resolve issues faster, and provide security for both traditional and serverless Node.js Applications. Try it for free, or connect with us to learn more.

It’s exciting to share this new release, our first in a series of serverless cloud platforms. Look for Azure and GCP to be released soon. Thanks for reading all about it.promising future ahead. Our heartfelt gratitude goes to our exceptional team for their unwavering dedication and hard work in bringing this vision to life.

Related Links

AWS Lambda Documentation – https://docs.aws.amazon.com/lambda/latest/dg/welcome.html
Layer Concept – https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-concepts.html#gettingstarted-concepts-layer
Cloudwatch Metrics – https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html#monitoring-metrics-types
Lambda Extensions – https://docs.aws.amazon.com/lambda/latest/dg/lambda-extensions.html
CloudFlare Serverless Computing – https://www.cloudflare.com/learning/serverless/what-is-serverless/
IBM – What is serverless? – https://www.ibm.com/topics/serverless
GCP Serverless – https://cloud.google.com/serverless?hl=es-419

Enhance Observability with Opentelemetry tracing – Part 1

Recently, conversations have been increasing around OpenTelemetry; it is gaining more and more momentum in Node.js development circles, but what is it? How can we take advantage of the key concepts and implement them in our projects?

Of note, NodeSource is a supporter of OpenTelemetry, and we have recently implemented full support of the open-source standard in our product N|Solid. It allows us to make our powerful Node.js insights accessible via the protocol.

Opentelemetry is a relatively recent vendor-agnostic emerging standard that began in 2019 when OpenCensus and OpenTracing combined to form OpenTelemetry – seeking to provide a single, well-supported integration surface for end-to-end distributed tracing telemetry. In 2021, they released V1. 0.0, offering stability guarantees for the approach.

And most important, OpenTelemetry is an open-source observability project/framework with a collection of software development kits (SDKs), APIs, and tools for instrumentation from the Cloud Native Computing Foundation (CNCF).

W3C Trace Context is the standard format for OpenTelemetry. Cloud providers are expected to adopt this standard, providing a vendor-neutral way to propagate trace IDs through their services. Organizations use OpenTelemetry to send collected telemetry data to a third-party system for analysis.

But to break down its history a bit, we think it’s important to understand the concept of __Observability__.

At Nodesource, as you likely know, we work daily in Observability, focusing exclusively on the Node.js runtime and N|Solid from N|Solid 4.8.0 supports some OpenTelemetry features. But before getting deeper in OTEL, it is important to understand Observability and try to resolve this important question: What is Observability?

Setting the foundations to talk about OpenTelemetry

It’s important to understand that when we talk about Observability, we need first to know what questions we seek to answer or clarify when detailing a system.

The first question often asked is __why__ my application has specific behavior. And to solve this and other questions, we first must instrument our system so that our application can emit signals, that is, traces, metrics, and logs. When we correctly do this, we have the necessary information needed.

Observability is the ability to measure the internal states of a system by examining its outputs. – Splunk

Detailing a system through Data Collection: Telemetry Data

Your systems and apps need proper tooling to collect the appropriate telemetry data to achieve Observability.
But what is the telemetry data that we need?

The three key concepts are :

Metrics

Logs

Traces

Ok, let’s define each of these concepts:

Metrics

__Metrics__: are aggregations over a period of time of numeric data about your infrastructure or application. Examples include system error rate, CPU utilization, and request rate for a given service.

As quoted by isitobservable.io, OpenTelemetry has three metric instruments :

__Counter__: a value that is summed over time (similar to the Prometheus counter)
__Measure__: a value that is aggregated over time (a value over some defined range)
__Observer__: captures a current set of values ​​at a given time (like a gauge in Prometheus)

The context is still very important, along with metric information like name, description, unit, kind (counter, observer, measure), label, aggregation, and time.

Logs

__Logs__: A Log is a timestamped message emitted by services or other components. They are not necessarily associated with any particular user request or transaction, but they become more valuable when they are.

The logical line would tell us that here we must jump to traces because it is part of the three key concepts. But before defining what a trace is, we must zoom in on the concept of __Span__.

Span

__Span__: A Span represents a unit of work or operation. It tracks specific operations that a request makes, painting a picture of what happened during the time in which that operation was executed.

A span is the building block of a trace and is a named, timed operation representing a piece of the workflow in the distributed system. All traces are composed of Spans.

Traces

Traces__: A Trace records the paths taken by requests (made by an application or end-user) as they __propagate through multi-service architectures, like microservice and serverless applications. It is also known as Distributed Trace. A trace is almost always an assessment of end-to-end performance.

Without tracing, it is challenging to pinpoint the cause of performance problems in a distributed system.

Suppose you realize we broke into the three pillars of Observability when introducing the concept of Span. In that case, however, the three pillars and Span conform to what is known as __Telemetry Data__, which are simple __signals emitted from applications and resources about their internal state.__

The core concept of Context Propagation

When we want to correlate events across our services’ boundaries, we look for a context that helps us identify the current trace and Span. But context is not the only thing we need; we also need __propagation__.

If you are with us following the article carefully, you will realize that in the definition of trace, we talk about the word __‘propagation’__. You might wonder what this means.

Propagation is how context is bundled and transferred in and across services, often via HTTP headers. Now, With these clear concepts, we can begin to understand the concept of __Context Propagation__.

A critical functionality required to implement Distributed Tracing is the concept of Context Propagation. We can define it as a mechanism for storing state and accessing data across the lifespan of a distributed transaction, either across execution contexts inside a process or across the boundaries of the services that conform to our system.

For In Process propagation__, we typically use something like the __AsyncLocalStorage class from the async_hooks module.

Whereas Across Processes__, it will depend on the IPC protocol used. For example, for HTTP, there’s the Trace Context specification from the _W3C__, which defines the _traceparent and tracestate headers to propagate tracing info.

Getting into a Distributed Application

Let’s say we have a distributed application like the one in the picture. It has 4 Nodejs services: API, auth, Service1 and Service2, and 1 database.

Imagine we’re having intermittent performance issues. They could come from several points:
– Database access
– Network link status,
– DNS request latency, etc.
Finding where exactly may become a very hard and time-consuming task; the harder, the more complex the system is.

Distributed tracing will help us A LOT with that, as we’ll generate tracing information on every point of the distributed system (A, B, C, D, and E). Not only that, but while the request goes through all the services, thanks to Context Propagation, some ‘tracing state’ will be passed along so all the tracing info can be linked to the very same request.

Instrument your system

To get visibility into the performance and behaviors of the different microservices, we need to instrument the code with OpenTelemetry to generate traces. But first, let’s define what Instrumentation is…

Automatic Instrumentation

With __Automatic Instrumentation__, our instrumentation libraries will automatically take the configuration provided (through code or environment variables) and do most of the work.

In the following example, using the OpenTelemetry SDK, we show how we can automatically generate spans for every HTTP transaction handled by the Nodejs HTTP core module.

Manual Instrumentation

__Manual instrumentation__, on the other hand, while requiring more work on the user/developer side, enables far more options for customization, from naming various components within OpenTelemetry (for example, spans and the traces ) to add your own attributes, specific exception handling, and more. See the following example shows how to manually generate a Span using the OpenTelemetry SDK.

How to Implement Opentelemetry in my project?

The way we historically would implement a typical observability pipeline is shown in the following picture.

In this case, having all that data at your disposal is great and can give us a valuable overview of our system, but unless we are able to correlate the observability signals somehow: metrics, logs, and traces, we won’t be able to have the best of it.
OpenTelemetry comes to solve this problem. The solution is going to come from correlating these signals. This can be done by applying the same concept of Context Propagation that was used for Traces to Metrics and Logs, so in this case, identifiers such as the trace_id and the span_id are associated with those signals.

OpenTelemetry spanId and traceId can correlate Logs and Metrics with a specific Span in a Trace.

Opentelemetry Components

OpenTelemetry is much more… Before finishing this article, it is important to describe the different components of OpenTelemetry.

Note: For more detail, read the specification overview of the Opentelemetry project.

OpenTelemetry API: It provides an API, which defines data types and operations for generating and correlating tracing, metrics, and logging data.

💚 From N|Solid 4.8.0 we provide an implementation of the OpenTelemetry TraceAPI, allowing users to instrument their own code using the de-facto standard API.

OpenTelemetry SDK: It provides Language-specific implementations of the API.

OpenTelemetry OTLP: A protocol to transport the Telemetry Data.

💚 With N|Solid 4.8.0 we support many instrumentation modules available in the OpenTelemetry ecosystem. Supporting exporting traces using the OpenTelemetry Protocol(OTLP) over HTTP.

OpenTelemetry Collector: To receive, process, and export Telemetry data.

💚 In N|Solid 4.8.0 is now possible to send N|Solid Runtime monitoring information (metrics and traces) to backends supporting the OpenTelemetry standard like multiple APMS (Dynatrace, Datadog, Newrelic).

OpenTelemetry Semantic Conventions: To have well-defined naming for the attributes associated with the signals: (service.name, http. port, etc.)

We know that there may be other key concepts to develop around Opentelemetry, and for this reason, we invite you to visit the direct website of the project or the Github Repo directly.

This introductory article gives us the basis for sharing a demo we prepared for NodeConf.EU, where we apply open-source tools to implement Opentelemetry in your project. We invite you to stay tuned for our next blog post. 😉 Wait for the second part!

Conclusions

Traces are really useful for understanding modern distributed systems.
We build better software when we get the best of our traces.
With OTel (Opentelemetry), we’re able to have maximized insights and answer future questions without having to make any code changes.
#OTel provides interoperability with observability tools.
Collect and correlate telemetry data is easy if you follow the OpenTelemetry framework.
As far as we know, the OpenTelemetry community is working hard to develop support for metrics and logs. Waiting for news soon! 🤞
Note: If you want to learn more about OpenTelemetry in Javascript, click HERE

To start getting more value out of your traces and metrics, you can use Opentelemetry with N|Solid back-end.

Achieve Your Performance Goals With N|Solid

We know that you want to get the best out of your application and to do it professionally, you will surely need a great ally to help you with various tools without affecting your performance. We do not want to stay in a ‘marketing speech’ where we tell you that we are the best… you can 👀check it directly here with this Open Source tool that also includes OTEL Results.

We’d love to hear more from you! 💚
– Feel free to TRY N|Solid and get in touch with us on Twitter at @nodesource.