In this blog post, we take a look at a survey invitation and management service built on Node.js: how we identified and resolved a critical memory-consumption issue that was causing the service to crash, and how we optimized the system for better performance.
In the week we deployed our new survey invitation and management service, we ran into a perplexing issue: the Node.js application consistently ran out of memory during end-to-end testing. The service was designed to determine survey eligibility based on various factors, including survey progress and demographic quotas. Being event-driven and built on Node.js, it appeared to have either a memory leak or excessive memory usage.
We initially built a testing tool that simulated survey responses. This tool was essential, as manually answering surveys would have been time-consuming and impractical for surveys requiring hundreds of responses. With this simulation, we were able to identify performance bottlenecks and other anomalies.
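As a rough illustration, the core of such a simulator can be as small as a function that fabricates answer payloads. The sketch below is hypothetical (the real tool submitted answers through the service's API): the function name and the 50/50 status split are made up, while the field names mirror those used later in this post.

```javascript
// Hypothetical sketch of a response simulator: fabricate N survey
// responses for a given survey. The refcode/status fields mirror the
// document shape used elsewhere in this post.
const simulateResponses = (surveyID, participantCount) =>
  Array.from({ length: participantCount }, (_, i) => ({
    surveyID,
    refcode: `participant-${i}`,
    // Arbitrarily mark roughly half the participants as qualified.
    status: Math.random() < 0.5 ? "qualified" : "open",
  }));

const batch = simulateResponses("survey-123", 600);
console.log(batch.length); // 600
```

Feeding a batch like this through the event pipeline let us reproduce the crash on demand instead of waiting for real traffic.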
As we simulated responses for approximately 600 participants, we carefully monitored the logs for unusual behavior or errors, and soon hit a fatal error message: "JavaScript heap out of memory," indicating a severe memory issue.
We confirmed this by tracing V8's heap usage, running our Node process with the --trace-gc flag.
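For readers who haven't used it, --trace-gc makes V8 print one line per garbage-collection pass, including the heap size before and after. The one-liner below (our own demonstration, not the service's actual entry point) allocates a million small objects just to force a few collections:

```shell
# Run a Node process with GC tracing enabled; V8 prints a line for each
# garbage-collection pass, with heap size before and after.
node --trace-gc -e "const a = []; for (let i = 0; i < 1e6; i++) a.push({ i });"
```

In our case we launched the service's entry point the same way and watched the heap numbers climb between passes instead of stabilizing.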
We embarked on a codebase review to identify any sections of the code that might contribute to memory leaks or intense memory usage. Our investigation led us to two main areas of concern:
The first issue was in the database queries responsible for fetching information about survey participants. These queries returned anywhere from 10,000 to 100,000 documents, which were then converted into arrays. This was memory-intensive, since both the raw documents and the resulting array had to be held in memory at the same time.
export const getRefcodesBySurveyID = async (surveyID) => {
  const responses = await responseGraph
    .find({
      surveyID,
      status: {
        $in: ["open", "qualified"],
      },
      deleted: {
        $exists: false,
      },
    })
    .toArray();
  const allRefcodes = responses.map((el) => el.refcode);
  const answeredStatuses = ["qualified"];
  const answeredResponses = responses.filter((el) =>
    answeredStatuses.includes(el.status)
  );
  return {
    invitedRefcodes: allRefcodes,
    answeredRefcodes: answeredResponses.map((el) => el.refcode),
  };
};
The second issue involved the callback functions used for event processing. A new callback was created for every event and passed to its handler, and across thousands of events this overhead added up to significant unnecessary memory consumption.
const eventProcessedCb = async () => {
  await events.setEventProcessed(SURVEY_CHANNEL, eventId);
};

if (eventName === "SURVEY_LAUNCHED") {
  await surveyLaunched(ctx, data, eventProcessedCb);
} else if (eventName === "ADD_CUSTOM_REFCODES") {
  await addCustomRefcodes(ctx, data, eventProcessedCb);
} // ...
To tackle the memory-intensive queries, we changed our approach. Instead of letting the Node.js process convert the documents into arrays, we modified the queries to use MongoDB's aggregation framework. By grouping all documents under a single empty (or null) key and pushing each one into an array, we achieved the same result while significantly reducing the Node process's memory consumption, since MongoDB now does the conversion for us.
export const getRefcodesBySurveyID = async (surveyID) => {
  const responsesAggregate = await responseGraph
    .aggregate([
      {
        $match: {
          surveyID,
          status: {
            $in: ["open", "qualified"],
          },
          deleted: {
            $exists: false,
          },
        },
      },
      {
        $group: {
          _id: "",
          responses: {
            $push: {
              refcode: "$refcode",
              status: "$status",
            },
          },
        },
      },
    ])
    .toArray();
  const responses = responsesAggregate[0] ? responsesAggregate[0].responses : [];
  const allRefcodes = responses.map((el) => el.refcode);
  const answeredStatuses = ["qualified"];
  const answeredResponses = responses.filter((el) =>
    answeredStatuses.includes(el.status)
  );
  return {
    invitedRefcodes: allRefcodes,
    answeredRefcodes: answeredResponses.map((el) => el.refcode),
  };
};
For the callback-related issue, we restructured the code to eliminate the per-event callback functions. We replaced the if/else chain with a switch statement and marked the event processed in a single call afterward, inside the handler's try-catch block.
switch (eventName) {
  case "SURVEY_LAUNCHED": {
    await surveyLaunched(ctx, data);
    break;
  }
  case "ADD_CUSTOM_REFCODES": {
    await addCustomRefcodes(ctx, data);
    break;
  }
  // ...
  case "TRIGGER_REFCODE_RETRIEVAL": {
    await fireSixtyNine(ctx, data);
    break;
  }
}
await events.setEventProcessed(SURVEY_CHANNEL, eventId);
After implementing these changes, we re-ran our end-to-end tests and closely monitored the system's behavior. The results were positive: memory usage shifted from a gradual climb ending in a crash to a consistent footprint of around 100MB during event processing (confirmed with Node.js's --trace-gc flag mentioned above). This showed that JavaScript's garbage collection was functioning as intended.
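Beyond --trace-gc, Node's built-in process.memoryUsage() offers a quick way to sample the heap from inside the process itself. A minimal sketch (the 10-second interval is an arbitrary choice for illustration):

```javascript
// Sample the heap using Node's built-in process.memoryUsage() API.
const logHeap = () => {
  const { heapUsed, heapTotal } = process.memoryUsage();
  console.log(
    `heap: ${(heapUsed / 1048576).toFixed(1)} MB used / ` +
      `${(heapTotal / 1048576).toFixed(1)} MB allocated`
  );
};

logHeap(); // print one sample immediately
// unref() lets the process exit normally even with the timer pending.
setInterval(logHeap, 10_000).unref();
```

A periodic log line like this makes it easy to spot whether the footprint stays flat or creeps upward over hours of event processing.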
Our experience highlights the importance of investigation, optimization, and problem-solving in the field of software development. The ability to identify memory-related issues, understand their causes, and implement effective solutions is crucial for ensuring the reliability and performance of complex applications.
Through our efforts, we not only resolved a critical memory-related issue but also gained insights into enhancing the overall scalability of our application.