Welcome to The Future of Software Development: Powered by Telemetry, Security, and AI

We made some big announcements during our keynote at Collision in Toronto: our AI Assistant, Adrian, and the open sourcing of our Node.js runtime, N|Solid Runtime. They are a big part of our vision for the future of software development, one powered by telemetry, security, and AI – which was the topic of our talk. In this post we will share more about that vision and, specifically, how NodeSource is enabling that future.

NodeSource began as most great companies do: with smart, passionate people who saw a problem they had to fix. There was simply no good tooling for Node.js. We were Node believers and open-source project contributors on a mission to make Node more accessible for developers and safe for enterprises to adopt. Since our beginning we have provided the ecosystem with our insights, training, and binary distributions of the open-source packages – over 110 million downloads in the last year alone – powering Node applications in production all over the globe.

As a result of countless hours of ideating, coding, and customer validation, N|Solid was born – an enterprise-grade tool providing the deepest insights with the lowest overhead, all while keeping Node apps secure. Today N|Solid is used by some of the largest organizations and developers globally. The mission that was set all those years ago is now more relevant than ever: over 30 million websites rely on Node.js, and it’s one of the most used and loved technologies by developers worldwide. It’s been an amazing journey.

Revolutionizing Software Development: Advancing Telemetry, Security, and Efficiency with AI Innovation

We have continued to innovate, with our Node experts pushing to create the most advanced telemetry and security platform possible while still providing customers with world-class support for Node. We have always believed that providing the deepest data and insights is the best way to produce better software. Making software is continually challenging; the software development life cycle (SDLC) is highly inefficient.

You begin with an idea that you turn into code, which then gets built, tested, and released for users to experience. Then you monitor for issues, triage what gets identified, and ship fixes. Those fixes are bundled with other features to build, test, and release… and the cycle continues. While significant effort has been applied to make this process more efficient, the gains have invariably been small improvements – tweaks, really, to the overall process. That is changing now, fueled by advancements in AI.

We believe that the future of software is intelligent software engineering, powered by telemetry, security & AI. The SDLC is augmented by applying AI that is trained with the right data to accelerate the production and maintenance of secure, highly performant code. It’s about building a new model – a generative loop based upon the intent of the code and its actual operation in production – bringing data and AI into the process in powerful new ways.

On the front end, AI is redefining the way code is written. From ChatGPT to GitHub’s Copilot and beyond, generative AI is creating code, documenting it, and writing test plans. These advancements are set to revolutionize the software development process on their own, replacing the familiar copy/paste of code found on Google, Stack Overflow, or in existing codebases. Developers who leverage these new tools will dramatically increase the velocity of their code while still owning the solution. While significant, this is only part of the future solution.

Unveiling the True Measure of Software: Quality and Performance in Production

The reality is that the quality and performance of software is only realized once it is in production, in use by real users. The telemetry data from the application in production is a key component for transforming the SDLC. How software performs in production, not how well it did in a test environment, is the true test of quality code. And the depth of that telemetry data is how you identify issues. This has long been our focus: not just to report on general metrics, but to go well beyond them. This is why we established the measure of event loop utilization, worker thread monitoring, and more, to enable deep insights into application health and performance.
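For readers less familiar with the metric, event loop utilization (ELU) can also be sampled in plain Node.js through perf_hooks. The sketch below is a minimal, N|Solid-agnostic illustration (assuming Node.js 14.10 or newer) of what the measure captures: the fraction of time the event loop spends busy rather than idle.

```js
const { performance } = require('perf_hooks');

// ELU is reported between 0 (completely idle) and 1 (completely busy).
let baseline = performance.eventLoopUtilization();

setInterval(() => {
  // Passing the previous sample returns the delta since that sample.
  const delta = performance.eventLoopUtilization(baseline);
  console.log(`ELU over the last interval: ${(delta.utilization * 100).toFixed(1)}%`);
  baseline = performance.eventLoopUtilization();
}, 5000);
```

A sustained value near 1 means the loop has no headroom left, which usually shows up as rising latency long before CPU metrics alone would flag a problem.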

Application health is directly tied to security; today more than ever, quality code is secure code. But security is not static: new vulnerabilities appear all the time, and visibility into them is critical, especially for production code. That is why we offer our security tooling, NCM (Node Certified Modules), as part of our platform, giving customers visibility into security issues in both development and live production code.

It’s the depth of data and security health that unlock the opportunities with AI. It’s the other half of the equation of the future of software, powered by telemetry, security and AI. This is the future NodeSource is enabling.

N|Solid – The Future of Node, Bringing the Power of Data and AI

With the announcement of our AI Assistant, “Adrian”, we are leveraging our unique and unparalleled data to help developers identify and resolve issues with tremendous speed and efficiency. Adrian will help every Node developer and DevOps engineer not just view the telemetry, security data, and alerts that matter, but understand them, know their context, and know how to solve them. It’s a game changer. It combines the power of the most advanced observability tool and the specific context of each application with our AI to resolve code issues fast.

Furthermore, our AI tools will assess code quality, identify cost optimizations, generate code and more. It’s like ‘god mode’ for Node.

This is the next step in our journey toward the future state of the SDLC. If you want to experience what Adrian can do, sign up HERE for our early access beta list and we will notify you when you can join the software development revolution.

About NodeSource, Inc.

NodeSource, Inc. is a technology company completely focused on Node.js and is dedicated to helping organizations and developers leverage the power of this technology. We offer the leading APM for monitoring and securing Node.js and provide world-class support and consulting services to help organizations navigate their Node.js journey. #KnowYourNode. For more information, visit NodeSource.com and follow @NodeSource on Twitter.

AIOps Observability: Going Beyond Traditional APM

AIOps is an emerging technology that applies machine learning and analytics techniques to IT operations. AIOps enables IT teams to leverage advanced algorithms to identify performance issues, predict outages, and optimize system performance. NodeSource sees significant advantages for developers and teams looking to increase software quality by leveraging AIOps. We have extended our platform’s (N|Solid) observability capabilities to include AIOps, enabling developers to leverage advanced machine learning and analytics techniques to optimize their Node.js applications.

Our N|Solid platform provides the most advanced visibility into Node.js applications, enabling developers to quickly identify performance issues, detect security vulnerabilities, and troubleshoot errors. N|Solid achieves this level of observability through real-time performance monitoring, comprehensive metrics, and detailed instrumentation of Node.js applications.

Last year, we integrated OpenTelemetry into our runtime, and we are nearing the release of an extension of this layer into our console. This advancement will further extend our platform to support AIOps. Santiago Gimeno, a Senior Architect, sums up our vision of the integration between OTel (OpenTelemetry) and N|Solid:

“In today’s world, where applications are becoming more complex and distributed, having a good observability solution is more important than ever. The emergence of OpenTelemetry as the de-facto standard for observability is key. It allows application developers to select solutions that adapt better to their needs. Even more, it allows for healthy competition between observability solution vendors. We support this approach and continue to take steps to ensure N|Solid stays compliant with the OpenTelemetry specification, so everyone can use what we believe is the best observability solution for Node.js.”

Key differences between an APM and Observability

APM (Application Performance Management) and observability are both methods of monitoring and managing the performance and health of software applications, but there are key differences between the two:

Scope: APM is focused on monitoring the performance of applications, while observability is a more comprehensive approach that includes monitoring the infrastructure and application stack, as well as the performance of individual services.

Metrics: APM typically relies on predefined metrics and thresholds to identify performance issues, while observability takes a more flexible approach to collect a wide range of data, including logs, metrics, and traces.

Root cause analysis: APM is designed to quickly identify the root cause of performance issues, often through alerting and automated remediation, while observability takes a more holistic approach that emphasizes the need to understand the relationships between different parts of the system to identify and fix issues.

Proactivity: APM is often reactive, focusing on identifying and fixing issues as they arise, while observability is more proactive, focusing on continuous monitoring and analysis to identify potential issues before they become critical.

Tooling: APM is often built around specific tools and technologies designed to monitor and analyze application performance, while observability is more flexible and adaptable, focusing on integrating a wide range of tools and technologies to provide a comprehensive system view.

You need both; together they are key to monitoring and managing the performance and health of software applications.

This is why N|Solid is not only an APM but also has observability within its functionalities. And now, with the implementation of ML and SBOM, it goes beyond APM and supports the growing discipline of AIOps.

AIOps: A Fundamental Concept in Modern IT Operations

AIOps (Artificial Intelligence for IT operations) is an approach to IT operations that leverages machine learning and artificial intelligence to automate and optimize IT tasks. AIOps aims to enhance the efficiency and effectiveness of IT operations by leveraging the vast amount of data generated by various IT systems and applications.

Observability refers to IT teams’ ability to observe and understand the behavior of complex systems in real-time, using a combination of monitoring, logging, and analytics tools.

AIOps and observability enable IT teams to proactively monitor and manage IT systems, applications, and infrastructure, allowing them to identify and resolve issues quickly. AIOps uses machine learning and AI algorithms to identify patterns in large amounts of data, while observability provides the visibility and context needed to understand the behavior of complex systems.

Modern Observability in Place

Support for open-source tracing tools and standards like OpenTelemetry facilitates team collaboration in resolving issues. OpenTelemetry is the second most active CNCF project, behind only Kubernetes, showcasing its importance to the industry.

_Image: Twitter, Michael Haberman (@hab_mic)_

Following this standard, N|Solid (since N|Solid 4.8.0) supports OTel:
– It implements the OpenTelemetry Trace API, allowing users to instrument their code with the de-facto standard API (see the sketch right after this list).
– It supports many of the instrumentation modules available in the OpenTelemetry ecosystem.
– It supports exporting traces using the OpenTelemetry Protocol (OTLP) over HTTP.
– With this feature, it is now possible to send N|Solid runtime monitoring information (metrics and traces) to backends that support the OpenTelemetry standard, such as multiple APMs (Dynatrace, Datadog, New Relic, etc.).
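As a rough illustration of that first point (this is generic OpenTelemetry usage, not code taken from the N|Solid docs; the tracer and span names are made up), application code only needs the vendor-neutral @opentelemetry/api package, regardless of which runtime or SDK provides the tracer provider underneath:

```js
const { trace, SpanStatusCode } = require('@opentelemetry/api');

// Any string identifying your component works as the tracer name.
const tracer = trace.getTracer('orders-service');

async function processOrder(order) {
  // startActiveSpan creates a span and makes it the active one for anything called inside.
  return tracer.startActiveSpan('process-order', async (span) => {
    try {
      span.setAttribute('order.id', order.id);
      // ... business logic; spans from instrumented modules nest under this one ...
      return { ok: true };
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}

module.exports = { processOrder };
```

Because the API package is only an interface, the same code reports to N|Solid, to a plain OpenTelemetry SDK, or to nothing at all if no provider is registered.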

Additionally, we included OTel in the ‘APM performance dashboard,’ an open-source tool we released to the community, enabling developers and organizations to understand the performance impact of APM tools.

_Img: APM's Performance Dashboard view_

Recent enhancements to the tool include the following:

– Updated the data with N|Solid 4.8.0 -> 16.16.0 and 14.20.0
– Added a few new tests, especially with different solutions for GraphQL
– Added more APMs: OpenTelemetry and AppDynamics
– Added testing of N|Solid against Datadog, Dynatrace, and New Relic

Do you want to implement OTel in your Node.js application?

Enriching telemetry data with metadata is an important aspect of observability, and OpenTelemetry provides a flexible and extensible framework for doing so. However, there can be challenges in implementing this in practice, especially when dealing with multiple tools and technologies.

One approach to addressing this challenge is to use a centralized configuration management tool, or at least a single shared module, to ensure consistency in metadata enrichment across your observability stack; a small sketch of the idea follows.
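As a rough sketch of that idea (assuming the OpenTelemetry JS SDK; the attribute keys, values, and module layout here are illustrative, not prescriptive), the shared metadata can live in one module that every service merges into its own Resource:

```js
// common/telemetry-metadata.js – one place to define metadata shared by all services.
const { Resource } = require('@opentelemetry/resources');

module.exports = new Resource({
  'deployment.environment': process.env.DEPLOY_ENV || 'staging',
  'team.name': 'payments', // illustrative custom attribute
});
```

```js
// In a service's telemetry setup: merge the shared metadata with service-specific attributes.
const { Resource } = require('@opentelemetry/resources');
const sharedMetadata = require('./common/telemetry-metadata');

const resource = sharedMetadata.merge(new Resource({ 'service.name': 'api' }));
// `resource` is then handed to the tracer and meter providers.
```

For a fuller guide to implementing OpenTelemetry in your project, please review the following articles: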

Enhance Observability with Opentelemetry tracing – Part 1

Instrument your Nodejs Applications with Open Source Tools – Part 2

However, if you want this implementation out of the box, along with other useful features, we invite you to try N|Solid.

Conclusion

N|Solid Supports OpenTelemetry Features, Integrates SBOM and ML at its Core.

By supporting OpenTelemetry features, N|Solid provides seamless integration with this framework, enabling customers to understand their applications, infrastructure behavior, and performance. This integration enhances the ability of developers and operators to troubleshoot issues, identify bottlenecks, and optimize application performance.

N|Solid’s integration of Software Bill of Materials (SBOM) provides a comprehensive list of all software components used in an application, including open-source libraries and dependencies, which helps organizations to mitigate security risks and ensure compliance with regulations. By integrating SBOM at its core, N|Solid provides organizations with an efficient and reliable way to manage the security and compliance of their software applications.

Finally, N|Solid’s integration of machine learning (ML) at its core is another critical feature that helps to identify patterns and anomalies in data, allowing developers and operators to gain insights that are not easily detectable using traditional monitoring tools. This integration of ML at the core of N|Solid enables organizations to improve the overall reliability, performance, and security of their applications and services.

N|Solid’s support of OpenTelemetry features, integration of SBOM, and integration of ML at its core provides developers and operators with a comprehensive set of tools to manage and optimize their applications and infrastructure, making N|Solid a valuable platform in the modern software development and operations landscape.

Ready to connect?

If you want to know more about our APMs Benchmark project and get the most out of your Node.js application, read this incredible article by our VP of Engineering, Adrián Estrada: ‘In-depth analysis of the APMs performance cost in Node.js.’

We also invite you to use the ✨APM’s Performance Dashboard✨ here:
Read the full blog post here: https://nsrc.io/4xFaster
Contribute here: https://github.com/nodesource/node-APMs-benchmark
If you have any questions, please contact us at [email protected] or through this form.

Experience the Benefits of N|Solid’s Integrated Features
Sign up for a Free Trial Today

To get the best out of Node.js and experience the benefits of N|Solid’s integrated features, including OpenTelemetry support, SBOM integration, and machine learning capabilities, sign up for a free trial and see how N|Solid can help you achieve your development and operations goals. #KnowyourNode

NodeSource Introduces Machine Learning on its N|Solid Platform to Help Make Better Node Apps

N|Solid is an incredibly versatile platform for helping developers and DevOps engineers build and manage highly performant and secure Node.js web applications. With the advancement of machine learning, you can unlock even more potential. Our ML solution is a powerful tool that can increase the quality of the user experience and boost efficiency for organizations running Node.js applications. In this article, we’ll explore what machine learning is and how you can use it within N|Solid, plus we’ll provide tips and best practices for leveraging this new capability to get the most out of your Node.js project.

AI – growing in value in the software development lifecycle

_Img #1: AI vs. ML concepts_

Put in context, artificial intelligence refers to the general ability of computers to emulate human thought and perform tasks in real-world environments, while machine learning refers to the technologies and algorithms that enable systems to identify patterns, make decisions, and improve themselves through experience. — https://ai.engineering.columbia.edu

The technology world has been abuzz with the growing hype of artificial intelligence (AI). This is understandable as AI promises to revolutionize business and everyday life; from self-driving cars to automated customer service, AI will shape the future of our civilization. As technology continues to advance, the potential applications for AI are seemingly endless.

AI and ML (Machine Learning) are closely related, but not identical. AI is the broader concept of machines being able to perform tasks that would normally require human intelligence, such as visual perception, speech recognition, decision-making, and language understanding. ML is a specific subset of AI that is focused on the development of algorithms and statistical models that allow computers to “learn” from data, without being explicitly programmed. In other words, ML is a method for achieving AI.

ML and AI can help developers build better software in several ways. Some examples include:

Automating repetitive tasks: ML algorithms can be used to automate repetitive tasks that would otherwise require human intervention. For example, an ML model could be trained to automatically classify and categorize emails, reducing the need for manual sorting.

Improving software performance: ML algorithms can be used to optimize the performance of software systems. For example, an ML model could be trained to predict the load on a server, allowing the software to dynamically adjust its resource usage in response.

Enhancing the user experience: AI-powered software can provide a more personalized and intuitive experience for users. For example, a chatbot powered by natural language processing (NLP) could be used to provide customer service, or a recommendation system powered by ML could be used to suggest products to customers.

Predictive Maintenance: AI and ML algorithms can be used to predict when a machine or equipment is likely to fail, allowing maintenance to be performed before the failure occurs.

Identify and Fix Bugs: AI and ML can be used to automatically identify and fix software bugs, reducing the need for human intervention.

Improve Cybersecurity: AI and ML can be used to identify and mitigate cyber threats and detect suspicious activity on a network, which helps to improve cybersecurity.

We believe there is great promise in developers leveraging new tooling that helps them focus on the solution and resolve issues as fast as possible, reducing security risks and delivering amazing user experiences. We see AI and ML as a major step forward in building better software.

Node.js exposes the potential of AI.

_Img #2: AI frameworks_

We believe Node.js is a powerful technology for leveraging the potential of AI. It allows developers to easily create and manage AI applications, as it features extensive APIs for interacting with AI-related services. With Node.js, developers can create AI-backed applications that can be deployed across various platforms, making it an invaluable asset for businesses looking to leverage the power of AI.

The combination of Node.js and AI will also make it possible to create sophisticated applications that can interpret data in real-time, allowing businesses to improve their customer experience dramatically. As AI advances, Node.js will be a key tool in helping developers make the most out of the technology.

Recently, several AI projects have ushered in a massive wave of exploration. OpenAI and its ChatGPT have become some of the fastest-adopted tools ever. We are impressed with the incredible progress of the OpenAI project and many others, and we continue to study, experiment with, and review implementations of these technologies and their potential for the ecosystem.

Links to other cool resources

GitHub OpenAI: https://github.com/openai/openai-quickstart-node

OpenAI Docs: https://beta.openai.com/docs/quickstart
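As a small, hedged illustration of the kind of integration those quickstart resources cover (this is not NodeSource code; the endpoint and payload follow OpenAI’s public chat-completions REST API, and the model name is only an example), calling the API from Node.js 18+ needs nothing beyond the built-in fetch:

```js
// Requires Node.js 18+ (global fetch) and an OPENAI_API_KEY environment variable.
const apiKey = process.env.OPENAI_API_KEY;

async function ask(question) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo', // example model name
      messages: [{ role: 'user', content: question }],
    }),
  });

  if (!res.ok) {
    throw new Error(`OpenAI API request failed: ${res.status}`);
  }
  const data = await res.json();
  return data.choices[0].message.content;
}

ask('Summarize what event loop utilization measures in Node.js.')
  .then(console.log)
  .catch(console.error);
```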

Already, Node.js is being used by many companies to power their AI-driven applications, and this trend will only continue as more companies seek to take advantage of the power of AI. Node.js also allows developers to quickly set up and deploy AI-driven applications, further accelerating the development process. With Node.js and AI, businesses can create smarter, faster, and more efficient applications.

NodeSource Introduces Machine Learning in the N|Solid Platform

N|Solid is a Node.js platform with an integrated AI development environment.

This feature allows for training models that will later detect similar patterns in your application data and fire custom events.

It also offers advanced analytics capabilities and support for various AI technologies, making it a powerful tool for businesses looking to capitalize on the potential of AI.

_Img #3: ML feature cover_

N|Solid is part of a larger trend toward making AI and ML more accessible to developers, helping them use these advancements to deliver software solutions. By providing an integrated platform for Node.js in production, N|Solid is making it easier for businesses to create sophisticated AI-driven models and reap the benefits that come with them.

Developers can start using this new feature in N|Solid immediately to:

Identify performance issues and present insights to resolve quickly
Apply insights across multiple applications
Smart analysis and detection of common Node.js performance issues with the bundled models we provide
Training of custom models to detect specific problems
Global notifications and events tracking for processes and applications

Below you will see ML in action inside N|Solid.

Machine Learning UI

In the N|Solid Console, the Machine Learning feature can be accessed from the app summary or process detail views.

Each handles different data sets and will have a different effect on the model you train.

Training ML Models

The Machine Learning models can be trained using two kinds of data sets. The models trained in the app summary view will use the aggregated data of all the processes running inside the app.

On the other hand, the models trained in the process detail view will use process-specific data.

Train a model in the app summary view.

When a process/app is first connected, it will take a certain amount of data to be successfully trained; you will find a progress loader under process configuration:

To train a model on the app summary page, click on the Train ML Model button.

Train a model in a process detail view.

To train a model on the process detail page, click on the Train ML Model button.

Model creation and training

After clicking on the Train ML Model button, a modal will open; here, you can create, filter, and train models; this modal is the same for both pages.

To create a model, click on CREATE NEW MODEL.

Name and briefly describe the model, then save.

Select the created model and click on ‘TRAIN.’

When the trained model finds a data pattern similar to the one it was trained with, it will fire an event and show a banner on top of the navbar.

Click on View Event to be redirected to the events tab; here, you will find the most recent machine learning event.

The events will also appear in the application status section; clicking on VIEW ANOMALIES will redirect to the events tab.

Manage the default and custom models.

Machine Learning models can be administered in the settings tab, where you will find a set of default models and the user-trained models; here, the frequency of events being fired can be modified, and the custom user models can be deactivated, deleted, or edited.

For a full reset of the created models, click on RESET MODELS.

Custom user models have edit and delete icons; these models are found beneath the default models.

PLEASE NOTE: Only the name and description of a user-created model can be edited; if you want to change the model data, please retrain the model on the app summary or process detail pages. Default models are active by default and can only be activated or deactivated.

Our Machine Learning feature has been live since November 2022; if you want to review the official documentation, you can do so here.

One Last Thing…

To get the best out of Enterprise Node.js, start a free trial of N|Solid SaaS, an augmented version of the Node.js runtime, enhanced to deliver low-impact performance insights and greater security for mission-critical Node.js applications.

Instrument your Nodejs Applications with Open Source Tools – Part 2

As we mentioned in the previous article, observability is part of our day-to-day at NodeSource, and we know a great way to extend our reach and interoperability is to adopt the OpenTelemetry framework as a standard in our development flows. In the end, our vision is to achieve high-performance software, and we want to accompany developers on that journey with their Node.js-based applications.

With that said, understanding the basics of the standard and its scope is important, but it also needs to be put into practice: how do you integrate OpenTelemetry into your application? NodeSource offers direct integration in its product, in addition to more than 10 key functionalities in N|Solid that extend the offering of a traditional APM. But, as you know, we are also big contributors to the open-source project and support the binary distributions of the Node.js project; helping the community is in our DNA, and we want to show how, with open-source tools alone, you can still increase your visibility. So in this article, we want to share how to set up OpenTelemetry with open-source tools.

In this article, you will find __How to Apply the OpenTelemetry OS framework in your Node.js Application__, which includes:

Step 1: Export data to the backend

Step 2: Set up the OpenTelemetry SDK

Step 3: Inspect Prometheus to verify we’re receiving data

Step 4: Inspect Jaeger to verify we’re receiving data

Step 5: Getting deeper into Jaeger 👀

Note: This article is an extension of our talk at NodeConf.EU, where we had the opportunity to present:

__Dot, line, Plane Trace!__
__Instrument your Node.js applications with Open Source Software__
Get insights into the current state of your running applications/services through OpenTelemetry. It has never been as easy as now to collect data with open-source SDKs and tools that will help you extract metrics, generate logs and traces, and export this data in a standardized format to be analyzed using best practices. In this talk, we’ll show how easy it is to integrate OpenTelemetry in your Node.js applications and how to get the most out of it using open-source tools.

To see the talks from this incredible conference, you can watch all sessions through live-stream links below 👇
– Day 1️⃣ – https://youtu.be/1WvHT7FgrAo
– Day 2️⃣ – https://youtu.be/R2RMGQhWyCk
– Day 3️⃣ – https://youtu.be/enklsLqkVdk

Now we are ready to start 💪 📖 👇

Apply the OpenTelemetry OS framework in your Node.js Application

So, going back to the distributed example we described in our previous article, here is what the architecture looks like after adding observability.

Every service will collect signals by using the OpenTelemetry Node.js SDK and export the data to specific backends so we can analyze it.

We are going to use the following:

– Jaeger for traces and logs.
– Prometheus to visualize the metrics.

_Note:_ Jaeger and Prometheus are probably the most popular open-source tools in this space.

Step 1: Export data to the backend

How the data is exported to the backends differs: to send data to __Jaeger__, we will use OTLP over HTTP, whereas for __Prometheus__, the data will be pulled from the services using HTTP.

First, we will show you how easy it is to set up the OpenTelemetry SDK to add observability to our applications.

Step 2: Set up the OpenTelemetry SDK

First, we have the providers in charge of collecting the signals: in our case, __NodeTracerProvider__ for traces and __MeterProvider__ for metrics.
Then the exporters send the collected data to the specific backends.
The Resource contains attributes describing the current process, in our case __ServiceName__ and __ContainerId__. The names of these attributes are well defined by the spec (they live in the __semantic_conventions__ module) and allow us to differentiate where a specific signal comes from.

So to set up traces and metrics, the process is basically the same: we create the provider passing the Resource, then register the specific exporter.

We also register instrumentations of specific modules (either core modules or popular userspace modules), which provide automatic Span creation of those modules.

Finally, the only important thing to remember is that we need to initialize OpenTelemetry before our actual code; the reason is that these instrumentation modules (in our case, for __http__ and __fastify__) __monkeypatch__ the modules they’re instrumenting.

We also create the __meter instruments__ we will use in every service: an __HTTP request counter__ and a couple of observable gauges for __CPU usage__ and __ELU usage__.
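A condensed sketch of that setup is shown below. It is not the exact code from the demo: package names and constructor options shift between OpenTelemetry JS SDK versions, and the service name, ports, and endpoints are illustrative.

```js
// telemetry.js – must be loaded before the rest of the application code,
// because the instrumentations monkeypatch `http` and `fastify` at load time.
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { MeterProvider } = require('@opentelemetry/sdk-metrics');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { FastifyInstrumentation } = require('@opentelemetry/instrumentation-fastify');
const { performance } = require('perf_hooks');

// Resource: attributes describing this process, so we know where each signal comes from.
const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: process.env.SERVICE_NAME || 'api',
  [SemanticResourceAttributes.CONTAINER_ID]: process.env.HOSTNAME || 'unknown',
});

// Traces: provider + OTLP-over-HTTP exporter pointing at Jaeger's collector endpoint.
const tracerProvider = new NodeTracerProvider({ resource });
tracerProvider.addSpanProcessor(
  new BatchSpanProcessor(new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }))
);
tracerProvider.register();

// Metrics: provider + Prometheus exporter (Prometheus scrapes this endpoint).
const meterProvider = new MeterProvider({ resource });
meterProvider.addMetricReader(new PrometheusExporter({ port: 9464 }));

// Automatic span creation for core http and fastify.
registerInstrumentations({
  tracerProvider,
  meterProvider,
  instrumentations: [new HttpInstrumentation(), new FastifyInstrumentation()],
});

// Meter instruments used by every service: a request counter and an ELU gauge.
const meter = meterProvider.getMeter('service-metrics');
const httpRequestCounter = meter.createCounter('http_request_count');

let lastELU = performance.eventLoopUtilization();
meter.createObservableGauge('thread_elu').addCallback((result) => {
  const delta = performance.eventLoopUtilization(lastELU);
  lastELU = performance.eventLoopUtilization();
  result.observe(delta.utilization);
});

module.exports = { httpRequestCounter };
```

Each service then loads this file first (for example with `node -r ./telemetry.js server.js`) so the monkeypatching is in place before http or fastify are required.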

So let’s spin up the application now and send a request to the API. It returns a 401 Not Authorized. Before trying to figure out what’s going on, let’s see if Prometheus and Jaeger are actually receiving data.

Step 3: Inspect Prometheus to verify we’re receiving data

Let’s look at Prometheus first:
Looking at the HTTP request counter, we can see there are 2 data points: one for the __API service__ and another for the __AUTH service__. Notice that the data we had in the Resource is __service_name__ and __container_id__. We can also see that process_cpu is collecting data for the 4 services. The same is true for __thread_elu__.

Step 4: Inspect Jaeger to verify we’re receiving data

Let’s look at Jaeger now:
We can see that one trace corresponding to the __HTTP request__ has been generated.

Also, look at this chart where the points represent traces, the X-axis is the timestamp, and the Y-axis is the duration. If we inspect the trace, we can see it consists of 3 spans, where every span represents an __HTTP transaction__ and has been automatically generated by the __instrumentation-http__ module:

The 1st span is an HTTP server transaction in the API service (the incoming HTTP request).
The 2nd span represents a POST request to AUTH from API.
The 3rd one represents the incoming HTTP POST in AUTH.

If we inspect this last span a bit, apart from the typical attributes associated with the request (HTTP method, request_url, status_code…), we can see there’s a Log associated with the Span. This is very useful because it tells us exactly which request caused the error. By inspecting it, we found out that the reason for the failure was a missing auth token.

This piece of information wasn’t generated automatically, though, but it’s very easy to add. In the verify route of the AUTH service, in case there’s an error verifying the token, we retrieve the active span from the current context and just call __recordException()__ with the error. As simple as that.
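In code, the pattern looks roughly like this (a hedged sketch, not the demo’s exact handler):

```js
const { trace, context, SpanStatusCode } = require('@opentelemetry/api');

// Inside the AUTH service's verify route, when token verification fails:
function recordVerifyError(err) {
  const activeSpan = trace.getSpan(context.active());
  if (activeSpan) {
    activeSpan.recordException(err); // shows up as a Log attached to the span in Jaeger
    activeSpan.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
  }
}
```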

Well, so far, so good. Knowing what the problem is, let’s add the auth token and check if everything works:

curl http://localhost:9000/ -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiIiLCJpYXQiOjE2NjIxMTQyMjAsImV4cCI6MTY5MzY1MDIyMCwiYXVkIjoid3d3LmV4YW1wbGUuY29tIiwic3ViIjoiIiwibGljZW5zZUtleSI6ImZmZmZmLWZmZmZmLWZmZmZmLWZmZmZmLWZmZmZmIiwiZW1haWwiOiJqcm9ja2V0QGV4YW1wbGUuY29tIn0.PYQoR-62ba9R6HCxxumajVWZYyvUWNnFSUEoJBj5t9I"

OK, now it succeeded. Let’s look at Jaeger again: we can see the new trace here; it contains 7 spans, and no error was generated.

Now it’s time to show a very nice feature of Jaeger: we can compare both traces, with the spans that are equal shown in grey and the spans that are new shown in green. Just by looking at this overview, we can see that if we’re correctly authorized, the API sends a GET request to SERVICE1, which then performs a couple of operations against POSTGRES. If we inspect one of the POSTGRES spans (the query), we can see useful information there, such as the actual QUERY. This is possible because we registered the instrumentation-pg module in SERVICE1.

And finally, let’s do a more interesting experiment. We will inject load into the application for 20 seconds with autocannon.
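The exact invocation isn’t reproduced here, but a command along these lines does the job (autocannon’s -d flag sets the duration in seconds and -c the number of concurrent connections; the values are illustrative, and the token is the one from the earlier curl):

npx autocannon -d 20 -c 100 -H "Authorization: Bearer <token from the curl example above>" http://localhost:9000/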

If we look at the latency chart, we see some interesting data: up to at least the 90th percentile, latency stays basically below 300 ms, whereas from roughly the 97.5th percentile onward it goes up a lot, to more than 3 seconds. This is unacceptable 🧐. Let’s see if we can figure out what’s going on 💪.

Step 5: Getting deeper into Jaeger 👀

Looking at Jaeger, and limiting the search to around 500 spans, we can see that the graph depicts exactly what the latency chart showed: most of the requests are fast, but there are some significant outliers.

Let’s compare one of the fast traces with a slow one. In addition to querying the database, in the slow trace we can see that SERVICE1 sends a request to SERVICE2. That’s useful info for sure. Let’s take a closer look at the slow trace.

In the __Trace Graph view__, every node represents a Span, and on the left-hand side we can see the percentage of the total trace duration taken by the subgraph rooted at that node. By inspecting this, we can see that the branch representing the HTTP GET from SERVICE1 to SERVICE2 takes most of the trace’s time, so the main suspect is SERVICE2. Let’s look at the metrics now; they might give us more information. If we look at thread.elu, we can see that for SERVICE2 it sat at 100% for several seconds. That would explain the observed behavior.

So now, going to the SERVICE2 code, we can easily spot the issue: we were performing a __Fibonacci operation__. Of course, this was easy to spot because this is a demo; in real scenarios it would not be so simple, and we would need other methods, such as CPU profiling, but regardless, the info we collected helps narrow down the issue significantly.
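To make the failure mode concrete, the shape of the problem is roughly the sketch below (illustrative code, not the demo’s actual route; the route name and input are made up). A synchronous, CPU-bound computation inside a request handler never yields to the event loop, so thread.elu climbs toward 100% and every other request queued on that process has to wait:

```js
const fastify = require('fastify')();

// Naive, synchronous Fibonacci: exponential time and fully CPU-bound.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// While fib(42) runs, this process can serve nothing else, which matches
// the slow traces and the 100% thread.elu readings seen above.
fastify.get('/compute', async () => ({ result: fib(42) }));

// Fastify v4-style listen; older versions take a plain port number.
fastify.listen({ port: 4000 });
```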

So, that’s it for the demo. We’ve created a repo where you can access the full code, so go play with it! 😎

Main Takeaways

Finally, we just want to share the main takeaways about implementing observability with Open Software Tools:

Setting up observability in our Node.js apps is actually not that hard.
It allows us to observe requests as they propagate through a distributed system, giving us a clear picture of what might be happening.
It helps identify points of failure and causes of poor performance (in some cases, other tools might also be needed: CPU profiling, heap snapshots).
Adding observability to our code, especially tracing, comes with a cost, so be cautious! ☠️ We won’t go deeper into that here, as it could be a topic for another article.

Before you go

If you’re looking to implement observability in your project professionally, you might want to check out N|Solid and our ’10 key functionalities’. We invite you to follow us on Twitter and keep the conversation going!