Instrument your Node.js Applications with Open Source Tools – Part 2

As we mentioned in the previous article, observability is part of our day-to-day work at NodeSource, and we know that a great way to extend our reach and interoperability is to adopt the OpenTelemetry framework as a standard in our development flows. In the end, our vision is to achieve high-performance software, and that is how we want to accompany developers on their journey with Node.js-based applications.

Understanding the basics was important for getting to know the standard and its scope, but now it is time to put it into practice: how do we integrate OpenTelemetry into our application? NodeSource offers direct integration in N|Solid, in addition to more than 10 key functionalities that extend what a traditional APM offers. But, as you know, we are also great contributors to the Open Source project and support the binary distributions of the Node.js project. Helping the community is in our DNA, so in this article we want to show you how to set up OpenTelemetry and increase visibility using only Open Source tools.

In this article, you will find __How to Apply the OpenTelemetry Open Source framework in your Node.js Application__, which includes:

Step 1: Export data to the backend

Step 2: Set up the OpenTelemetry SDK

Step 3: Inspect Prometheus to confirm we’re receiving data

Step 4: Inspect Jaeger to confirm we’re receiving data

Step 5: Digging deeper with Jaeger 👀

Note: This article is an extension of the talk we had the opportunity to give at NodeConf.EU:

__Dot, line, Plane Trace!__
__Instrument your Node.js applications with Open Source Software__
Get insights into the current state of your running applications/services through OpenTelemetry. It has never been easier to collect data with Open Source SDKs and tools that help you extract metrics, generate logs and traces, and export this data in a standardized format so it can be analyzed using best practices. In this talk, we’ll show how easy it is to integrate OpenTelemetry in your Node.js applications and how to get the most out of it using Open Source tools.

To see the talks from this incredible conference, you can watch all sessions through the live-stream links below 👇
– Day 1️⃣ –
– Day 2️⃣ –
– Day 3️⃣ –

Now we are ready to start 💪 📖 👇

Apply the OpenTelemetry Open Source framework in your Node.js Application

So, going back to the distributed example we described in our previous article, here we can see what the architecture looks like after adding observability.

Every service will collect signals by using the OpenTelemetry Node.js SDK and export the data to specific backends so we can analyze it.

We are going to use the following:

Jaeger for traces and logs.

Prometheus to visualize the metrics.

_Note:_ Jaeger and Prometheus are probably the most popular open-source tools in this space.

### Step 1: Export data to the backend

How the data is exported differs for each backend: to send data to __Jaeger__, we will use OTLP over HTTP, whereas for __Prometheus__, the data will be pulled from the services over HTTP.
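To make the pull model more concrete, here is a minimal sketch that uses only Node.js core modules. It is not the OpenTelemetry Prometheus exporter: the metric name `http_requests_total`, the `route` label, and the helper names are assumptions chosen for illustration. The idea is simply that the service keeps its metrics in memory and Prometheus periodically sends a GET request to read them.

```javascript
const http = require('node:http');

// In-memory counter keyed by route; a real service would let the SDK track this.
const requestCounts = new Map();

function countRequest(route) {
  requestCounts.set(route, (requestCounts.get(route) || 0) + 1);
}

// Render the counters in the Prometheus text exposition format.
function renderMetrics(serviceName) {
  const lines = ['# TYPE http_requests_total counter'];
  for (const [route, count] of requestCounts) {
    lines.push(
      `http_requests_total{service_name="${serviceName}",route="${route}"} ${count}`
    );
  }
  return lines.join('\n') + '\n';
}

// Prometheus pulls: it periodically sends GET /metrics to this server.
function createMetricsServer(serviceName) {
  return http.createServer((req, res) => {
    if (req.url === '/metrics') {
      res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
      res.end(renderMetrics(serviceName));
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

To try it, call `createMetricsServer('api').listen(9464)` (the port is an arbitrary choice here) and point a Prometheus scrape job at that address. With OTLP the direction is reversed: the service itself sends the data to the backend.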

First, we will show you how easy it is to set up the OpenTelemetry SDK to add observability to our applications.

### Step 2: Set up the OpenTelemetry SDK

First, we have the providers in charge of collecting the signals: in our case, __NodeTracerProvider__ for traces and __MeterProvider__ for metrics.
Then, the exporters send the collected data to the specific backends.
The Resource contains attributes describing the current process: in our case, __ServiceName__ and __ContainerId__. The names of these attributes are well defined by the spec (they are in the __semantic-conventions__ module) and allow us to tell where a specific signal comes from.

So to set up traces and metrics, the process is basically the same: we create the provider passing the Resource, then register the specific exporter.

We also register instrumentations for specific modules (either core modules or popular userspace modules), which automatically create Spans for those modules.

Finally, the one important thing to remember is that we need to initialize OpenTelemetry before our actual code runs. The reason is that these instrumentation modules (in our case, for __http__ and __fastify__) __monkeypatch__ the modules they’re instrumenting.
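Why the order matters can be shown with a small sketch. Nothing here is OpenTelemetry code: `fakeHttp` is a stand-in for a module such as `http`, and `instrument()` is a hypothetical helper that mimics, in spirit, what a monkeypatching instrumentation does. Code that grabbed a reference to the function before the patch was applied keeps calling the unpatched version, so no span is recorded for it.

```javascript
// Stand-in for a module such as `http`; not a real library.
const fakeHttp = {
  request(url) {
    return `response from ${url}`;
  },
};

const recordedSpans = [];

// What an instrumentation does in spirit: wrap the original function
// so that every call is recorded as a span.
function instrument(mod) {
  const original = mod.request;
  mod.request = function patchedRequest(url) {
    recordedSpans.push({ name: 'request', url });
    return original.call(this, url);
  };
}

// Application code that ran BEFORE instrumentation keeps the unpatched function.
const earlyRequest = fakeHttp.request;

instrument(fakeHttp);

// Application code that runs AFTER instrumentation sees the patched one.
const lateRequest = fakeHttp.request;

earlyRequest('http://auth/verify'); // bypasses the patch, no span recorded
lateRequest('http://auth/verify'); // goes through the patch, span recorded

console.log(recordedSpans.length); // 1
```

This is why the OpenTelemetry setup should be loaded first, before the application requires the modules that are going to be instrumented.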

Also, we create the __meter instruments__ here because we will use them in every service: an __HTTP request counter__ and a couple of observable gauges for __CPU usage__ and __ELU__ (event loop utilization).
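As a rough idea of where the values behind such instruments can come from, the sketch below reads CPU usage and event loop utilization with Node.js core APIs only. The names `createUsageSampler` and `onRequest` are made up for this example, and the code does not use the OpenTelemetry metrics API; in the real setup, an observable gauge callback would report values like these.

```javascript
const { performance } = require('node:perf_hooks');

// Create a sampler that reports CPU usage and event loop utilization (ELU)
// for the interval since the previous call.
function createUsageSampler() {
  let lastCpu = process.cpuUsage();
  let lastElu = performance.eventLoopUtilization();
  let lastTime = performance.now();

  return function sample() {
    const now = performance.now();
    const elapsedMicros = (now - lastTime) * 1000;

    // cpuUsage(previous) returns user/system CPU time in microseconds since `previous`.
    const cpuDelta = process.cpuUsage(lastCpu);
    // eventLoopUtilization(previous) returns the utilization since `previous`.
    const eluDelta = performance.eventLoopUtilization(lastElu);

    lastCpu = process.cpuUsage();
    lastElu = performance.eventLoopUtilization();
    lastTime = now;

    return {
      // CPU time relative to wall-clock time; can exceed 1 when several threads are busy.
      cpu: elapsedMicros > 0 ? (cpuDelta.user + cpuDelta.system) / elapsedMicros : 0,
      // Fraction of time the event loop was busy, between 0 and 1.
      elu: Number.isFinite(eluDelta.utilization) ? eluDelta.utilization : 0,
    };
  };
}

// Plain counter for incoming HTTP requests.
let httpRequests = 0;
function onRequest() {
  httpRequests += 1;
}
```

A gauge is "observable" in the sense that the value is read on demand, for example each time the metrics are collected, rather than being pushed by the application on every change.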

So let’s spin up the application now and send a request to the API. It returns a 401 Unauthorized. Before trying to figure out what’s going on, let’s see if Prometheus and Jaeger are actually receiving data.

### Step 3: Inspect Prometheus to confirm we’re receiving data

Let’s look at Prometheus first:
Looking at the HTTP requests counter, we can see there are 2 data points: one for the __API service__ and another for the __AUTH service__. Notice that the attributes we put in the Resource show up as the __service_name__ and __container_id__ labels. We can also see that process_cpu is collecting data for the 4 services. The same is true for __thread_elu__.
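The labels use underscores because Prometheus label names may only contain letters, digits, and underscores, while resource attribute keys such as `service.name` and `container.id` contain dots. The sketch below illustrates that kind of conversion. It is an illustration only, not the exporter's actual code, and the metric name and attribute values are hypothetical.

```javascript
// Convert an attribute key such as `service.name` into a Prometheus-style label name.
// Prometheus label names may only contain letters, digits and underscores.
function toLabelName(attributeKey) {
  const sanitized = attributeKey.replace(/[^a-zA-Z0-9_]/g, '_');
  // Label names must not start with a digit.
  return /^[0-9]/.test(sanitized) ? `_${sanitized}` : sanitized;
}

// Render a set of attributes as a Prometheus label list.
function toLabels(resourceAttributes) {
  return Object.entries(resourceAttributes)
    .map(([key, value]) => `${toLabelName(key)}="${value}"`)
    .join(',');
}

// Hypothetical resource attributes for the API service.
const resourceAttributes = { 'service.name': 'api', 'container.id': 'abc123' };

console.log(`http_requests_total{${toLabels(resourceAttributes)}} 2`);
// http_requests_total{service_name="api",container_id="abc123"} 2
```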

### Step 4: Inspect Jaeger to confirm we’re receiving data

Let’s look at Jaeger now:
We can see that one trace corresponding to the __HTTP request__ has been generated.

Also, look at the chart, where the points represent traces, the X-axis is the timestamp, and the Y-axis is the duration. If we inspect the trace, we can see it consists of 3 spans, where every span represents an __HTTP transaction__ and has been generated automatically by the instrumentation-http module:

The 1st span is an HTTP server transaction in the API service (the incoming HTTP request).
The 2nd span represents a POST request from API to AUTH.
The 3rd span represents the incoming HTTP POST in AUTH.

If we inspect this last span a bit, apart from the typical attributes associated with the request (HTTP method, request_url, status_code…), we can see there’s a Log associated with the Span. This is very useful, as it tells us exactly which request caused the error. By inspecting it, we find that the reason for the failure was a missing auth token.

This piece of information wasn’t generated automatically, but it’s very easy to add. In the verify route of the AUTH service, if there’s an error verifying the token, we retrieve the active span from the current context and call __recordException()__ with the error. As simple as that.
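The pattern could look roughly like the sketch below. To keep it runnable without any dependencies, it uses a stub span and a hypothetical `verifyToken()` check; in the actual service, the span would be obtained from `@opentelemetry/api`, for example with `trace.getSpan(context.active())`. Setting the span status to error is an addition that the article does not describe, included here because `recordException()` alone does not mark the span as failed.

```javascript
// Numeric value of SpanStatusCode.ERROR in the OpenTelemetry API.
const STATUS_ERROR = 2;

// Attach an error to a span so that it shows up as a log on that span.
function recordError(span, err) {
  if (!span) return; // no active span, nothing to annotate
  span.recordException(err);
  span.setStatus({ code: STATUS_ERROR, message: err.message });
}

// Hypothetical token check; the real service would verify a JWT.
function verifyToken(authorizationHeader) {
  if (!authorizationHeader) {
    throw new Error('missing auth token');
  }
  return true;
}

// Stub span, used only to make this sketch runnable without the SDK.
function createStubSpan() {
  return {
    events: [],
    status: null,
    recordException(err) {
      this.events.push({ name: 'exception', message: err.message });
    },
    setStatus(status) {
      this.status = status;
    },
  };
}

const span = createStubSpan();
try {
  verifyToken(undefined);
} catch (err) {
  recordError(span, err);
}

console.log(span.events); // [ { name: 'exception', message: 'missing auth token' } ]
```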

Well, so far, so good. Knowing what the problem is, let’s add the auth token and check if everything works:

curl http://localhost:9000/ -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiIiLCJpYXQiOjE2NjIxMTQyMjAsImV4cCI6MTY5MzY1MDIyMCwiYXVkIjoid3d3LmV4YW1wbGUuY29tIiwic3ViIjoiIiwibGljZW5zZUtleSI6ImZmZmZmLWZmZmZmLWZmZmZmLWZmZmZmLWZmZmZmIiwiZW1haWwiOiJqcm9ja2V0QGV4YW1wbGUuY29tIn0.PYQoR-62ba9R6HCxxumajVWZYyvUWNnFSUEoJBj5t9I"

Ok, now it succeeded. Let’s look at Jaeger now. We can see the new trace here, and we can see that it contains 7 spans, and no error was generated.

Now it’s time to show one very nice feature of Jaeger: we can compare both traces. The Spans that are equal are shown in grey, whereas the Spans that are new are shown in green. So just by looking at this overview, we can see that if we’re correctly authorized, the API sends a GET request to SERVICE1, which then performs a couple of operations against POSTGRES. If we inspect one of the POSTGRES spans (the query), we can see useful information there, such as the actual query. This is possible because we have registered the instrumentation-pg module in SERVICE1.

And finally, let’s do a more interesting experiment. We will put the application under load for 20 seconds with autocannon…

If we look at the latency chart, we see some interesting data: up to at least the 90th percentile, the latency stays below 300 ms, whereas from at least the 97.5th percentile onward, it goes up a lot, to more than 3 seconds. This is unacceptable 🧐. Let’s see if we can figure out what’s going on 💪.
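For readers less familiar with percentiles, here is a small sketch of how such numbers can be derived from raw latencies using the nearest-rank method. The latency values are made up to resemble the shape described above (mostly fast requests with a slow tail). They are not the real benchmark data, and autocannon may compute its percentiles differently.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds: 95 fast requests and 5 slow outliers.
const latencies = [
  ...Array.from({ length: 95 }, (_, i) => 100 + i * 2), // 100..288 ms
  3100, 3200, 3300, 3400, 3500,
];

for (const p of [50, 90, 97.5, 99]) {
  console.log(`p${p}: ${percentile(latencies, p)} ms`);
}
```

With this data, the 90th percentile is still under 300 ms while the 97.5th is above 3 seconds, which shows why an average alone would hide the slow tail.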

### Step 5: Digging deeper with Jaeger 👀

Looking at Jaeger and limiting the search to about 500 spans, we can see that the graph depicts what the latency chart showed: most of the requests are fast, whereas there are some significant outliers.

Let’s compare one of the fast traces with one of the slow ones. In the slow trace, we can see that, in addition to querying the database, SERVICE1 sends a request to SERVICE2. That’s useful info for sure. Let’s take a closer look at the slow trace.

In the __Trace Graph view__, every node represents a Span, and on the left-hand side of each node we can see the percentage of the total trace duration taken by the subgraph rooted at that node. By inspecting this, we can see that the branch representing the HTTP GET from SERVICE1 to SERVICE2 takes most of the time of the trace. So it seems the main suspect is SERVICE2.

Let’s take a look at the metrics now, as they might give us more information. If we look at thread.elu, we can see that for SERVICE2 it went to 100% for some seconds. This would explain the observed behavior.

So now, going to the route code in SERVICE2, we can easily spot the issue: we were performing a __Fibonacci operation__. Of course, this was easy to spot because this is a demo. In real scenarios it would not be so simple, and we would need other methods, such as CPU profiling. Regardless, the info we collected would help us narrow down the issue quite significantly.
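The effect of such an operation on the event loop can be reproduced with a few lines of plain Node.js. The sketch below is an assumption about what the route might be doing (a naive recursive Fibonacci; the real route code is in the demo repo and is not shown here), and `measureElu` is a hypothetical helper. It measures the event loop utilization around the synchronous computation.

```javascript
const { performance } = require('node:perf_hooks');

// Naive recursive Fibonacci: exponential time and fully synchronous.
function fibonacci(n) {
  return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// Run a synchronous task inside the event loop and report how busy the loop was.
function measureElu(task) {
  return new Promise((resolve) => {
    // setImmediate ensures the measurement happens once the event loop is running.
    setImmediate(() => {
      const before = performance.eventLoopUtilization();
      const result = task();
      const delta = performance.eventLoopUtilization(before);
      resolve({ result, utilization: delta.utilization });
    });
  });
}

measureElu(() => fibonacci(30)).then(({ result, utilization }) => {
  console.log(`fibonacci(30) = ${result}, ELU while computing: ${utilization.toFixed(2)}`);
});
```

While the function runs, the thread cannot handle any other request, so the utilization measured over that interval should be close to 1. This matches the ELU spike seen in the metrics and the slow outliers seen in the traces.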

So, that’s it for the demo. We’ve created a repo where you can access the full code, so go play with it! 😎

Main Takeaways

Finally, we just want to share the main takeaways about implementing observability with Open Source tools:

Setting up observability in our Node.js apps is actually not that hard.
It allows us to observe requests as they propagate through a distributed system, giving us a clear picture of what might be happening.
It helps identify points of failure and causes of poor performance (in some cases, other tools might also be needed, such as CPU profiling or heap snapshots).
Adding observability to our code, especially tracing, comes with a cost, so be cautious! ☠️ We are not going to go deeper into this here, as it could be a topic for another article.

Before you go

If you’re looking to implement observability in your project professionally, you might want to check out N|Solid and our ‘10 key functionalities’. We invite you to follow us on Twitter and keep the conversation going!

Top 10 N|Solid —APM for Node— features you needed to use

Nearly a year ago, we launched N|Solid SaaS, and although there are still a few months to go before our anniversary, we wanted to share the top 10 features of N|Solid that make us proud every day of what we have built.

N|Solid is the best way to monitor and secure your Node.js applications (including in production), and it is trusted by developers, software teams, and global enterprise companies. It has an array of features like other APMs, but we go deeper with our insights and are more performant than the others. We created N|Solid to help companies build the best software with Node and save time resolving issues. Because there is significant risk in deploying open-source applications without knowing the security gaps, we provide features to prevent security issues and insights for resolving them.

We are confident we’re the best APM solution for Node.js-based applications; if you are using Node, you should be using our runtime. We’re a complete product/solution, not just an APM focused on Node.

About N|Solid

N|Solid is a toolset built on Node.js that provides a number of enhancements to improve troubleshooting, debugging, managing, monitoring, and securing your Node.js applications. It is 100% compatible with the open-source project and requires no instrumentation of your code.

N|Solid Console

N|Solid provides a web-based console, the ‘N|Solid Console’, to monitor your applications. If you run the __N|Solid Runtime__, you can also introspect your Node.js applications in the same way directly from the CLI.

N|Solid Runtime

If you want to introspect your Node.js applications and have the most control from your command line, you’ll run them with the N|Solid Runtime, which is shaped similarly to a typical Node.js runtime but provides some additional executables.

To install N|Solid Runtime, download and unpack an N|Solid Runtime from the N|Solid download site.

Why N|Solid is an APM

Traditionally, the acronym APM has been used to refer to Application Performance Management. However, in recent years it has also come to refer, perhaps more correctly, to Application Performance Monitoring, and that is exactly what N|Solid does, which is why it belongs in this category of applications. Something important to highlight is that it is not a polyglot APM; it is clearly an APM specialized in Node.js, which has always been our focus.

While other APMs support Node.js, none provides the level of insight N|Solid can. In many cases, APMs can become part of the problem by consuming significant resources due to how they are designed. But don’t take our word for it: you can check it with real data through this open-source project, the APM’s Benchmark tool.

APM’s Benchmark Tool – Overview Screen

N|Solid APM (Self-hosted or SaaS) is the best observability and insights tool to manage Node performance and security, and the full platform access enables you to really #KnowyourNode.

In this blog post, we want to wrap up our product series by briefly telling you about the 10 main features of N|Solid. We hope you like it and that it helps you get the most out of our product.

[1] Project & Applications Monitoring in N|Solid

Visually view application behavior and identify performance and security issues.

With Project & Application Monitoring, you can track a website or any application based on Node.js. This feature collects your log data to help developers detect bugs, monitor process usage, track downtime, and improve performance, keeping the focus on a consistent end-user experience.

N|Solid APM – Projects & Applications View

This area is made up of 3 main views, which are used by Projects & Applications and Process Monitoring:
– Applications view
– Application summary view
– Processes view

[2] Process Monitoring in N|Solid

Access deep performance insights.

This feature of the N|Solid Console displays the applications and their associated processes. You can visualize Event Loop Estimated Lag, Heap Used, or CPU Used, for example, and you can correlate these metrics on a chart. You can also select a specific process to see its general status and vulnerabilities, and choose a specific graph to represent the selected information visually.

N|Solid APM – Process Monitoring

[3] CPU Profiling in N|Solid

Shows what functions consume CPU% and how resources are allocated.

CPU Profiling allows you to understand where opportunities exist to improve the speed and load capacity of your Node processes.

N|Solid APM – Flamegraph-CPU Profile

[4] Worker Threads in N|Solid

View In-depth metrics of each worker thread.

Worker threads are treated as first-class citizens and have the same access to CPU profiles, snapshots, etc. as the main process. We are the only solution that fully supports worker threads.

With this feature, you can identify opportunities to improve the performance of CPU-intensive work.

[5] Capture Heap Snapshots in N|Solid

Understand where and how memory is being used

Taking heap snapshots is a great way to help identify the underlying problem when faced with a memory leak or performance issue. In this way, you will be able to understand where and how memory is being used, and you will be able to quickly resolve memory leaks and performance issues.

N|Solid APM – Capture Heap Snapshots

[6] Memory Anomaly Detection in N|Solid

Identify memory anomalies with a more accurate detection method.
– Insights and metrics are historical, before and after the incident happened.
– Get anomalies at different heap usage levels.
– Detect correlation between sets of memory-specific metrics.
– Filter results by specific processes inside your application.

N|Solid APM – Memory Anomaly Detection

[7] Vulnerability Scanning – NCM – in N|Solid

Know all of the potential vulnerabilities within your application.

NCM is a security, compliance, and curation tool for the 3rd-party Node.js & JavaScript package ecosystem. It provides protection against security vulnerabilities and licensing compliance issues, as well as risk assessment when working with 3rd-party packages.

The N|Solid Console can be configured to perform periodic verification of all packages loaded by all N|Solid processes.

N|Solid APM – Vulnerability Scanning from N|Solid Runtime

NCM provides:

Actionable insights.
Offline vulnerability scanning.
A “strict mode” that prevents processes in an application from launching if they have vulnerabilities.
NCM-CI (Service Tokens and CI Processes) customization.

__Note__: NCM can be viewed from 3 locations: full overview, per application, and per process.

[8] HTTP Tracing support in N|Solid

Enables the ability to debug application latency and other issues.

HTTP tracing gathers the throughput and lifecycle of any HTTP, DNS, or other type of request.
– Debug latency issues, monitor your services, and more with the collected information.
– See the density of tracked spans in a timeline graph.
– Inspect each span for more detail on the collected trace.
– Filter the results by the attributes of a span and limit them to a time range.

N|Solid APM – HTTP Tracing Support

[9] Global Alerts & Integrations in N|Solid

Be aware of issues and vulnerabilities. Pre-configured API integrations with key 3rd party services.

You can use automation to trigger alerts over integrations, CPU profiles, or heap snapshots.

For example, when a heap snapshot is created, I get a notification about it from my N|Solid Console directly in Slack; from there, I can check it by opening the Console.

N|Solid APM – Global Alerts & Integrations – Slack Example

[10] Distributed Tracing in N|Solid

Better understand the factors that affect an application’s latency.

Distributed tracing is a core component of observability. It is mainly used by site reliability engineers (SREs), but also by developers, and using it across both roles is the recommended way for a team in charge of modern distributed software to get the greatest benefit from it.

As your system scales, you’ll need to add tracing and refine your sampling capabilities, which means getting the context you need to understand the complexity of distributed architectures.

N|Solid APM – Distributed Tracing

Distributed Tracing provides several solutions, which include:

Monitoring system health
Latency trend and outliers
Control flow graph
Asynchronous process visualization
Debugging microservices

And there is more on our roadmap: we are planning and building features that will shake up the ecosystem in the coming months. Stay tuned! 😎

Top Ten Features In N|Solid

🧭 Projects & Applications Monitoring in N|Solid

🌌 Process Monitoring in N|Solid

🔍 CPU Profiling in N|Solid

🕵️‍♂️ Worker Threads Monitoring in N|Solid

📸 Capture Heap Snapshots in N|Solid

🚨 Memory Anomaly Detection in N|Solid

🚩 Vulnerability Scanning & 3rd party Modules Certification in N|Solid

👣 HTTP Tracing Support in N|Solid

⏰ Global Alerts & Integrations in N|Solid

🪄 Distributed Tracing in N|Solid
…and more

Want to try N|Solid?

To check out the top 10 features and more in N|Solid, sign up or sign in from the top right corner of our main page. More information is available here.

As always, we’re happy to hear your thoughts – feel free to get in touch with our team or reach out to us on Twitter at @nodesource.