Optimization

Is Optimization Worth It in Node.js and MongoDB?

I have always believed that optimization is as important as the feature itself. I have seen developers skip this stage of development and end up rewriting the whole codebase. We should give optimization the same importance we give the feature. Node.js is well suited to data-intensive applications because of its lightweight, non-blocking I/O model.

What is Optimization?

Optimization is whatever makes our app, feature, or API more efficient. It improves performance by choosing the best available way to do the work.

Why Optimization?

Everything works fine while the data stays within a limit. Once the data starts growing, our systems, APIs, and apps show their true colors: they start lagging and slow down like sloths. Developers sometimes misread this situation and reach for more computing power and more server resources in the name of scalability. That strategy can work for a while, but not for long. Sooner or later the data grows again and the existing resources fail to handle it. The long-term solution is optimization.

How to Optimize?

There are many ways to optimize a Node.js app at a high level, such as load balancing, caching, and clustering. Here we will focus on the low level, i.e. code-level tricks. At this level there are two important aspects to look at: code optimization and query optimization.

Code Level Optimization

Code is the core of any application. Optimizing it improves the application's performance and lets it do the same work with fewer resources. Imagine Application 1, with ordinary code running on generous server resources (CPU power, memory, network capacity, and so on), and Application 2, with optimized code running on ordinary server resources. Both applications perform the same, but if you compare their operating and maintenance costs, there is a big difference. Application 1 costs more than Application 2, and the reason is the optimized code. Let's see a few ways to achieve code-level optimization.

1. Keep it clean

We all like our homes, living rooms, and bedrooms clean and neat. A clean home reflects our habits and behavior, improves our mood, and reduces stress and fatigue. The same applies to code. Well-written, neat, and clean code reduces confusion and improves readability and understanding. In short, “Treat your code like your living room: the cleaner, the better.”

Ugly Code:

Clean code:
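As a minimal sketch of the contrast, here is the same logic written both ways. The function, its names, and the discount rule are hypothetical, invented only for illustration.

```javascript
// Ugly: cryptic names, magic numbers, everything crammed onto one line.
function c(a, b) { var x = a * b; if (x > 1000) { return x - x * 0.1 } else { return x } }

// Clean: descriptive names, named constants, one idea per line.
const DISCOUNT_THRESHOLD = 1000; // hypothetical business rule
const DISCOUNT_RATE = 0.1;

function calculateOrderTotal(unitPrice, quantity) {
  const subtotal = unitPrice * quantity;
  if (subtotal > DISCOUNT_THRESHOLD) {
    return subtotal - subtotal * DISCOUNT_RATE;
  }
  return subtotal;
}

console.log(calculateOrderTotal(10, 5));
```

Both functions compute the same result; the second one simply tells the reader what it is doing.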

2. Split it

Even though we have all learned about functions and their benefits, developers still end up packing too much code into a single function. This makes the function hard to read, test, and reuse, and it hides the hot spots that are worth optimizing.

A rule of thumb for functions is “A function should perform a single task only.” So split the code into suitable functions, one per task.

Too much code in a single function:

Code is split into multiple functions as per designated tasks:
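The following sketch shows one way the split might look. The invoice scenario and all function names are assumptions made up for this example.

```javascript
// Before: one function validates, calculates, and formats.
function buildInvoiceAllInOne(order) {
  if (!order || !Array.isArray(order.items) || order.items.length === 0) {
    throw new Error('Order must contain at least one item');
  }
  let total = 0;
  for (let i = 0; i < order.items.length; i++) {
    total += order.items[i].price * order.items[i].quantity;
  }
  return `Invoice for ${order.customer}: ${total.toFixed(2)}`;
}

// After: each function performs a single task.
function validateOrder(order) {
  if (!order || !Array.isArray(order.items) || order.items.length === 0) {
    throw new Error('Order must contain at least one item');
  }
}

function calculateTotal(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total += items[i].price * items[i].quantity;
  }
  return total;
}

function formatInvoice(customer, total) {
  return `Invoice for ${customer}: ${total.toFixed(2)}`;
}

function buildInvoice(order) {
  validateOrder(order);
  return formatInvoice(order.customer, calculateTotal(order.items));
}
```

The small functions can now be tested and reused on their own, and `buildInvoice` reads like a summary of the steps.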

3. Use Guard Clauses with if-else statements

A guard clause is a check at the top of a function or block that serves a purpose similar to a precondition. When you have something to check or assert at the beginning of a function, do it up front and return early instead of wrapping the rest of the body in nested if-else blocks.
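A minimal sketch of the idea, using a hypothetical discount rule (the `user` shape and the numbers are assumptions for illustration):

```javascript
// Nested if-else: the main logic is buried three levels deep.
function getDiscountNested(user) {
  if (user) {
    if (user.isActive) {
      if (user.orders > 10) {
        return 0.2;
      } else {
        return 0.05;
      }
    } else {
      return 0;
    }
  } else {
    return 0;
  }
}

// Guard clauses: invalid cases exit first, the main logic stays flat.
function getDiscount(user) {
  if (!user) return 0;
  if (!user.isActive) return 0;
  if (user.orders > 10) return 0.2;
  return 0.05;
}
```

Both versions return the same values; the second is shorter and makes each exit condition visible at a glance.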

4. Dump switch case & else if

The switch statement looks clean and neat, but it comes with problems. They are less about raw execution speed and more about how easy the code is to get wrong: the control flow is procedural, fallthrough cases are easy to misalign, break statements have to be added by hand, and a missing break lets several cases run, which makes the logic hard to trace.

Similarly, long else if chains make code ugly, and the engine has to evaluate each condition in turn until one is true, so a match near the end of the chain pays for every check before it.

So what's the solution? Object literals. Objects are the heart of JavaScript, and almost everything in the language is associated with objects one way or another. We can transform a switch statement or an else if chain into an object literal lookup, which maps each key directly to its value or handler.

Traditional Switch Case / Else if:

Object Literal Lookup:
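Here is a small sketch of the transformation. The status values and messages are hypothetical.

```javascript
// Traditional switch: every case needs its own return or break.
function getStatusMessageSwitch(status) {
  switch (status) {
    case 'pending':
      return 'Order received';
    case 'shipped':
      return 'Order is on its way';
    case 'delivered':
      return 'Order delivered';
    default:
      return 'Unknown status';
  }
}

// Object literal lookup: the mapping is plain data.
const STATUS_MESSAGES = {
  pending: 'Order received',
  shipped: 'Order is on its way',
  delivered: 'Order delivered',
};

function getStatusMessage(status) {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(STATUS_MESSAGES, status)
    ? STATUS_MESSAGES[status]
    : 'Unknown status';
}

console.log(getStatusMessage('shipped'));
```

Adding a new status now means adding one line to the object rather than another case block.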

5. Say no to for...in and forEach, and use standard loops with a predefined length

Loops are an essential part of logic; they let us repeat code. JavaScript offers many ways to loop over arrays and objects, and one of them is for...in. It iterates over the enumerable property keys of an object, and it can be used on arrays as well. JavaScript arrays are a special kind of object, so besides their indexed elements they can carry other enumerable properties, either added directly or inherited through the prototype chain. When you use for...in on an array, it visits those extra properties as well and hands you the indexes as strings, which makes it both inefficient and error-prone.

The second one is forEach, which runs a function on every element of an array. It is a convenient way to loop over arrays, but you can't use break or continue inside the callback, and return only ends the current callback call, so the loop runs to the end even when that isn't necessary.

The simple solution to both problems is to loop up to a captured, predefined length using standard loop statements such as for and while, which stop at that length and accept break and continue.

Code with for .. in and forEach:

Code with for and while:
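The sketch below puts the four loop styles side by side on a small, made-up array of scores (the data and variable names are assumptions for illustration):

```javascript
const scores = [40, 75, 90, 60];
scores.label = 'round 1'; // an extra enumerable property on the array object

// for...in visits the extra property too, and the keys are strings.
const keysSeen = [];
for (const key in scores) {
  keysSeen.push(key);
}
console.log(keysSeen); // [ '0', '1', '2', '3', 'label' ]

// forEach cannot stop early: return only ends the current callback call.
let visitedByForEach = 0;
scores.forEach((score) => {
  visitedByForEach++;
  if (score > 70) return;
});

// Standard for loop with a captured length: break stops at the first match.
let firstHighIndex = -1;
let visitedByFor = 0;
for (let i = 0, len = scores.length; i < len; i++) {
  visitedByFor++;
  if (scores[i] > 70) {
    firstHighIndex = i;
    break;
  }
}

// while loop with a captured length: continue skips the rest of the body.
let belowPassMark = 0;
let index = 0;
const total = scores.length;
while (index < total) {
  const score = scores[index++];
  if (score >= 50) continue;
  belowPassMark++;
}

console.log(visitedByForEach, visitedByFor, firstHighIndex, belowPassMark);
```

Here forEach visits all four elements, while the for loop stops after two.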

6. Bye-bye synchronous functions, Hi Asynchronous functions

What happens if someone's car blocks your way while you are driving to the office? Most probably you will be late and annoyed. Synchronous functions are like that car: they block the engine until they have finished executing. While execution is blocked, no other code can run and responses are delayed, which directly reduces efficiency.

Asynchronous functions, on the other hand, do not block execution. The call starts the operation and returns immediately, so the next statement can run even though the previous operation has not produced its result yet.

The pending operation is handled in the background, and its result is delivered through a callback or promise once processing has finished. So asynchronous functions are the champions here.

Synchronous code:

Asynchronous Code:
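A minimal sketch of the difference using file reads from the built-in fs module. The temporary file name and its contents are assumptions made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical demo file, created so the example has something to read.
const filePath = path.join(os.tmpdir(), 'optimization-demo.txt');
fs.writeFileSync(filePath, 'Hello from Ever Blogs');

// Synchronous: nothing else can run until the file has been read.
function readSync() {
  return fs.readFileSync(filePath, 'utf8');
}

// Asynchronous: returns a promise right away; the read completes later.
function readAsync() {
  return fs.promises.readFile(filePath, 'utf8');
}

const order = [];
const pending = readAsync().then((text) => {
  order.push('async read finished');
  return text;
});
order.push('main code keeps running');

pending.then((text) => console.log(order, text));
```

The main code logs its entry first because the asynchronous read does not hold it up. In a server, that gap is where other requests get handled.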

7. Avoid Memory leak

In Node.js, a memory leak is a silent killer. Basically, it is a block of memory on the heap that the application no longer needs but that is still referenced from somewhere, so the garbage collector cannot release it. These useless blocks keep accumulating over time and can eventually crash your application when it runs out of memory.

There are some best practices to avoid Memory Leaks:

Reduce use of global variables

Global variables stay reachable for the whole lifetime of the process, so whatever they reference is never garbage collected unless you reassign or clear it. It's best not to overuse them.

When you assign a value to an undeclared variable, JavaScript in its default (sloppy) mode silently creates it as a property of the global object. This can happen through a simple typo and can lead to a memory leak.

To avoid such surprises, always write JavaScript in strict mode by putting the 'use strict'; directive at the top of your JS file. In strict mode, assigning to an undeclared variable throws a ReferenceError. ES modules are always in strict mode, and transpilers such as TypeScript or Babel typically emit the directive for you, so you don't need to add it yourself there. In recent versions of Node.js you can also enable strict mode globally by passing the --use_strict flag to the node command.
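As a small sketch, the function below opts into strict mode with a function-level directive, which behaves the same way as the file-level one. The variable name is a hypothetical typo.

```javascript
function assignInStrictMode() {
  'use strict';
  try {
    // Hypothetical typo: this variable was never declared.
    // Without strict mode, this line would quietly create a global.
    leakedCounter = 1;
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

console.log(assignInStrictMode()); // ReferenceError
```

Because the assignment throws, the mistake is caught during development instead of leaving a stray global behind.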

Use Stack memory efficiently

Local variables (often described as stack memory) are unavoidable, and whatever they reference stays in memory for as long as something still points to it. To use this memory efficiently you can:

1. Remove all unused and unnecessary variables.

2. Destructure the object or array and pass only the required fields to functions, closures, timers, and event handlers, instead of passing the whole object.
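A sketch of the second point, assuming a hypothetical `user` object that carries a large `history` array alongside the one field we need:

```javascript
// Before: the closure references the whole user object, so everything
// hanging off it (such as a large history array) stays reachable.
function createGreeterHoldingUser(user) {
  return () => `Hello, ${user.name}`;
}

// After: only the required field is captured by the closure.
function createGreeter(user) {
  const { name } = user;
  return () => `Hello, ${name}`;
}

const greet = createGreeter({ name: 'Asha', history: new Array(1000).fill('x') });
console.log(greet());
```

With the second version, the rest of the user object can be collected once nothing else references it, though exactly when that happens is up to the engine.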

8. Outsource the CPU Intensive task to the worker thread

As we all know, Node.js runs your JavaScript on a single main thread. It executes only one piece of JavaScript at a time, so you rarely have to worry about the concurrency issues of multi-threaded code. Despite being single-threaded, Node.js handles I/O very efficiently thanks to the event loop. But if you put CPU-intensive code on that thread, it blocks the event loop, and no other task runs until it is done. The solution is worker threads.

Worker threads are suitable for running those CPU-intensive tasks in parallel with the main thread without blocking it. The worker_threads module was introduced as an experimental feature in Node.js v10.5.0 and has been stable since Node.js v12.
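A minimal single-file sketch of the pattern follows. As an assumption made to keep the example self-contained, the worker's source is kept in a string and started with the eval option; in a real project it would normally live in its own file next to index.js. The summing loop is a stand-in for real CPU-heavy work.

```javascript
const { Worker } = require('worker_threads');

// Worker source, inlined only so the sketch fits in one file.
const workerSource = `
  const { workerData, parentPort } = require('worker_threads');

  // workerData is the value the main thread passed in at creation time.
  console.log('This article is published on ' + workerData);

  // Stand-in for a CPU-intensive task.
  let sum = 0;
  for (let i = 0; i < 1e7; i++) sum += i;

  // Send the result back to the main thread.
  parentPort.postMessage({ fileName: workerData, status: 'Done' });
`;

function runWorker(fileName) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: fileName });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

runWorker('Ever Blogs').then((result) => console.log(result));
```

While the worker is busy with the loop, the main thread's event loop stays free to handle other work.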

Here, workerData and parentPort both come from the worker_threads module. workerData holds the data that the main thread passed to the worker when creating it, and parentPort is the channel the worker uses to talk back to the main thread. The parentPort.postMessage() method sends a message to the main thread, in this case a result object containing the file name received through workerData.

So if you run index.js, the output will be:

This article is published on Ever Blogs

{ fileName: 'Ever Blogs', status: 'Done' }

Note: You can read more about worker threads here: https://nodejs.org/api/worker_threads.html

Query Level Optimization

Node.js is well suited to data-intensive I/O applications. It can handle a large number of requests for data from I/O resources such as databases. Most modern applications use cloud databases, which gives us a free hand to shift load to the DB, i.e. less computation in Node.js and more computation in the cloud DB. But is it a good idea to shift the load to the cloud DB? The answer is yes. If you are using a cloud DB, you should keep computation on your Node.js server to a minimum and push as much complex computation as possible to the DB using aggregation. But what if your DB is hosted on your own server? In that case you should balance the complexity between Node.js and the DB, and to do that you need to keep both your Node.js code and your DB queries optimized.

Query-level optimization is the process of writing a query in such a way that it can handle thousands of records, use less memory, and return its output quickly. Let's see a few ways to achieve query-level optimization:

1. Return only what you really need

Returning unnecessary data or fields is like buying unnecessary things. Buying unnecessary things puts an extra financial burden on us, and returning unnecessary data puts a burden on the network and causes latency. As developers we like to plan ahead and assume we will need a field in the future, but believe me, that rarely happens, and the extra field becomes a burden on the network and on the developer. Why is it a burden on the developer? When you later try to cut the unnecessary fields from an old query, you carry the mental load of “What if this field is needed or used somewhere?”, “Everything will break if I cut this field”, “Everyone will blame me if something breaks”, and so on. So the better option is to return only the fields you need from the moment you first write the query.

You can use either the $project aggregation stage or the select() query helper (if you are using Mongoose):
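As a sketch, the pipeline below returns only two fields. The `users` collection, the `User` model, and the field names are assumptions for illustration, and the commented lines show how the pipeline might be used.

```javascript
// Aggregation pipeline that keeps only the fields the caller needs.
const activeUsersPipeline = [
  { $match: { isActive: true } },
  { $project: { _id: 0, name: 1, email: 1 } },
];

// With the MongoDB Node.js driver (db is a connected Db instance):
//   const users = await db.collection('users').aggregate(activeUsersPipeline).toArray();
//
// With Mongoose (User is a hypothetical model):
//   const users = await User.find({ isActive: true }).select('name email -_id');

console.log(JSON.stringify(activeUsersPipeline));
```

Every other field on the document stays in the database and never crosses the network.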

2. Limit the number of records

Querying records is simple as long as there are only a few of them. Once the records grow into the thousands or hundreds of thousands, it becomes a huge burden on the network again. Querying data without a limit is like pumping water from the ocean. So the rule of thumb is: if you know your data will grow over time, you must limit the number of records being queried.

In practical terms, if you add a limit then you must also implement pagination. For that, you can use skip and limit together. If you want to know how it's done, you can visit my blog here: https://everblogs.com/angular/server-side-pagination-in-angular-11/

You can use either the $limit aggregation stage or the limit() query helper (if you are using Mongoose):
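A minimal sketch of skip and limit working together. The helper name, the `createdAt` sort field, and the page-size cap of 100 are assumptions chosen for the example.

```javascript
// Builds the pagination stages for a 1-based page number.
function buildPagePipeline(page, pageSize) {
  const safePage = Math.max(1, Math.floor(page));
  const safeSize = Math.min(100, Math.max(1, Math.floor(pageSize)));
  return [
    // A stable sort keeps pages from overlapping between requests.
    { $sort: { createdAt: -1, _id: 1 } },
    { $skip: (safePage - 1) * safeSize },
    { $limit: safeSize },
  ];
}

// With Mongoose (Post is a hypothetical model), page 3 of 20 might be:
//   Post.find().sort({ createdAt: -1, _id: 1 }).skip(40).limit(20);

console.log(buildPagePipeline(3, 20));
```

Capping the page size in the helper means a client cannot ask for the whole collection in one request.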

3. If you use Mongoose, reduce the use of .populate()

Mongoose is a really handy and popular ODM (object data modeling) library for MongoDB. Its populate() method lets us reference documents from other collections, similar to $lookup but NOT the same. The two differ significantly: populate() is very easy to use, but it fetches the referenced documents by issuing extra queries from the application, whereas $lookup performs the join inside the database as part of a single aggregation. That makes populate() a little less efficient than a standard $lookup.

Suppose you have 10 reference fields in one collection: populating all of them with populate() costs roughly 10 extra queries, one per populated field.

4. Create Indexes

What do we do when we have to find something in a book? We check the index and go directly to the page it mentions instead of reading through the whole book. This saves time and effort. Similarly, MongoDB indexes are special data structures [1] that store a small portion of the collection's data set in an easy-to-traverse form.

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
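As a sketch, the helper below creates two indexes with the driver's createIndex() method. The collection, the field names, and the choice of a unique index on email are assumptions for illustration.

```javascript
// users is assumed to be a Collection from the MongoDB Node.js driver.
async function ensureUserIndexes(users) {
  // Single-field unique index for lookups by email.
  await users.createIndex({ email: 1 }, { unique: true });
  // Compound index for "active users, newest first" listings.
  await users.createIndex({ isActive: 1, createdAt: -1 });
}

// Usage, given a connected Db instance:
//   await ensureUserIndexes(db.collection('users'));
```

Indexes also cost write time and storage, so it is worth creating only the ones your frequent queries actually use.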

5. Use $lookup with pipelines

Basically, $lookup is an aggregation stage that joins documents from one collection with matching documents from another collection. Many developers use $lookup in the traditional way (without a pipeline), which pulls in the whole joined document with all of its fields. On large collections this can exceed the memory limit of the aggregation, and you might need the allowDiskUse flag. To avoid this, you can give $lookup its own pipeline, written the same way as any other aggregation. One of the most important stages inside that pipeline is $project, which limits the fields that are brought in and thus saves memory.
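A sketch of a $lookup with its own pipeline. The `orders` and `customers` collections and their fields are assumptions made up for the example.

```javascript
// Joins each paid order with its customer, keeping only two customer fields.
const ordersWithCustomerPipeline = [
  { $match: { status: 'paid' } },
  {
    $lookup: {
      from: 'customers',
      // Expose the order's customerId to the inner pipeline as $$customerId.
      let: { customerId: '$customerId' },
      pipeline: [
        { $match: { $expr: { $eq: ['$_id', '$$customerId'] } } },
        { $project: { _id: 0, name: 1, email: 1 } },
      ],
      as: 'customer',
    },
  },
  // $lookup always produces an array; unwind it into a single object.
  { $unwind: '$customer' },
];

// Usage, given a connected Db instance:
//   const rows = await db.collection('orders').aggregate(ordersWithCustomerPipeline).toArray();

console.log(JSON.stringify(ordersWithCustomerPipeline, null, 2));
```

Only `name` and `email` are copied from each customer, so the joined result stays small.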

6. Spend more time on designing the DB schemas, avoid designing it in the SQL way

Planning and designing the database is a crucial part of software engineering. MongoDB stores data as JSON-like documents (BSON), so we should exploit this ability. Developers who migrate from SQL to MongoDB sometimes end up designing their schemas the SQL way. What exactly is the SQL way? It is a design approach that pushes developers to split tables into smaller dependent, linked tables. MongoDB allows developers to keep much of the related data in a single document. Unnecessary splitting in MongoDB forces you to write aggregations and populate calls to stitch the data back together, which puts further load on MongoDB itself.

Conclusion

Optimization is as important as development. Development without optimization is not sustainable, and it increases the risk and cost of the application and its maintenance. How much priority optimization gets varies from developer to developer, but it is always better to start now.
