Unmasking the Hidden Performance Menace in Magento 2.3: An In-Depth Analysis and Solution for the Distinct Count Search Query Dilemma

Performance problems in e-commerce platforms often hide in plain sight, degrading the user experience and overall business operations. In Magento 2.3, a seemingly innocuous query, SELECT DISTINCT COUNT(*) FROM search_query, has emerged as a significant culprit, causing unexpected database CPU surges and consequent slowdowns. This article examines the issue and its effects, and offers practical solutions to alleviate it.

Unveiling the Performance Disruptor: Delving into the Query Issue

To put the dilemma in context, let's start with the root cause. The query wreaking havoc, SELECT DISTINCT COUNT(*) FROM search_query, is linked to cache management of popular search terms. After the upgrade to Magento 2.3, this seemingly harmless query became a significant source of CPU load, degrading both the platform's performance and the user experience. Sites observed a distinct spike in database CPU load, and the query's duration grew with the size of the search_query table.

The core issue lies in the query structure, particularly the DISTINCT operator. In a database context, DISTINCT is used to remove duplicate rows from the result set of a query. However, the search_query table already has a unique constraint on query_text and store_id, so the DISTINCT operator is superfluous and only adds unnecessary strain on CPU resources.
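The redundancy can be checked on a toy table. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than Magento's MySQL database; the two-column schema, the helper name, and the sample rows are assumptions for illustration, and only the unique constraint on query_text and store_id comes from the description above.

```python
import sqlite3


def count_with_and_without_distinct(rows):
    """Return (count with DISTINCT, count without) for a toy search_query table."""
    conn = sqlite3.connect(":memory:")
    # Simplified, assumed schema: only the columns named in the article.
    conn.execute(
        """
        CREATE TABLE search_query (
            query_id   INTEGER PRIMARY KEY,
            query_text TEXT NOT NULL,
            store_id   INTEGER NOT NULL,
            UNIQUE (query_text, store_id)
        )
        """
    )
    conn.executemany(
        "INSERT INTO search_query (query_text, store_id) VALUES (?, ?)", rows
    )
    with_distinct = conn.execute(
        "SELECT DISTINCT COUNT(*) FROM search_query"
    ).fetchone()[0]
    without_distinct = conn.execute(
        "SELECT COUNT(*) FROM search_query"
    ).fetchone()[0]
    conn.close()
    return with_distinct, without_distinct


# Made-up sample data; the same text may appear once per store.
sample_rows = [
    ("red shoes", 1),
    ("red shoes", 2),
    ("blue hat", 1),
    ("green bag", 1),
]

print(count_with_and_without_distinct(sample_rows))
```

In this sketch both counts are identical. COUNT(*) without a GROUP BY already produces a single row, so DISTINCT has nothing left to deduplicate. How much the operator costs on a real MySQL server depends on the execution plan and is not something this toy example measures.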

Exploring the Real-time Impact: Magento Commerce 2.3.4 meets Elasticsearch 6.7.0

The issue was reported on Magento Commerce 2.3.4 integrated with Elasticsearch 6.7.0. On one live site with 2.7 million search terms, the query significantly slowed down operations. Instead of finishing within a reasonable time frame, it ran for an intolerably long time, especially with a large search_query table.

The practical consequences included high CPU usage, slow keyword search performance, and a less responsive admin panel. These problems were not isolated to version 2.3.4 but remained relevant in subsequent versions, including Magento 2.4.1-p1, 2.4.4-p1, and 2.4.6, illustrating the ongoing and pressing nature of the issue.

The Culprit Exposed: The Role of the num_results > 0 Condition and the DISTINCT Operator

On closer examination, most of the query's execution time comes from its num_results > 0 condition. The condition restricts the query to search terms that actually returned results, so that only those are cached for future reference. Evaluating it, however, puts considerable strain on the database, particularly when the search_query table is large.

The DISTINCT operator is the other significant culprit. Because the search_query table already has a unique constraint on query_text and store_id, DISTINCT adds load without adding any value, and removing it shortens the query substantially. In fact, changing the DISTINCT clause to false in the Query/Collection class within Magento reduced the query duration from 700ms to 2ms. This change does not, however, reduce the cost of MySQL inserts into search_query, which still needs to be considered for projects with high search term cardinality.

In conclusion, both the num_results > 0 condition and the DISTINCT operator play critical roles in this performance issue. Identifying these culprits is the first step towards devising necessary and efficient solutions.

Drawing from Experience: Real-world Scenarios Showcase the Issue's Pertinence

Real-world reports provide the most compelling evidence of a problem's true impact, and the Magento 2.3 performance issue is no exception. On a live site with 2.7 million search terms, the query put excessive strain on the database and caused significant slowdowns. Likewise, on a Magento Commerce 2.4.6 site with 1.4 million rows in the search_query table, keyword searches, a critical feature for user experience, took a painful ten seconds to complete.

Moreover, the issue did not only affect the storefront; it also degraded the admin panel's performance, hampering backend operations. Nor was it limited to Magento 2.3.4: it persisted through later releases, including 2.4.4-p1, 2.4.5-p4, and 2.4.6. This real-world impact underscores the issue's relevance and the urgency of finding effective solutions.

Towards a Performance-Efficient Future: Keeping Pace with Magento's Latest Updates

Despite Magento's attempts to resolve this performance issue, illustrated by the c90edaa commit, the problem persists. Therefore, the quest for constructive solutions remains crucial to ensure Magento's performance keeps pace with its latest updates and meets its users' expectations.

The removal of the DISTINCT operator from the queries within \Magento\Search\Model\ResourceModel\Query\Collection emerged as an effective solution. This approach was based on the understanding that the DISTINCT operator was unnecessary due to the unique constraint on query_text and store_id. When deployed, this solution reduced the query duration from a staggering 700ms to a mere 2ms, a significant performance boost.

However, this approach should be supplemented with strategies to manage high search term cardinality, which could also be causing the performance issue. Asynchronous insertion or reducing the number of search terms are viable alternatives worth exploring. Initial testing suggests that adding an index on the search_query table could also improve performance, although further testing is needed to evaluate the impact on write loads.
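One possible reading of asynchronous insertion is to collect search terms in memory and write them in batches instead of touching the table on every search. The sketch below illustrates that idea with Python's sqlite3 module, not with Magento code: the SearchTermBuffer class, the flush_size parameter, and the popularity column are hypothetical names chosen for this example.

```python
import sqlite3
from collections import Counter


class SearchTermBuffer:
    """Hypothetical helper: buffers search terms and writes them in batches."""

    def __init__(self, conn, flush_size=100):
        self.conn = conn
        self.flush_size = flush_size
        # (query_text, store_id) -> number of searches seen since the last flush
        self.pending = Counter()

    def record(self, query_text, store_id):
        """Count one search in memory; write only when the buffer is full."""
        self.pending[(query_text, store_id)] += 1
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        """Write all buffered terms in a single transaction, then clear the buffer."""
        with self.conn:  # commits on success, rolls back on error
            for (text, store_id), hits in self.pending.items():
                updated = self.conn.execute(
                    "UPDATE search_query SET popularity = popularity + ? "
                    "WHERE query_text = ? AND store_id = ?",
                    (hits, text, store_id),
                )
                if updated.rowcount == 0:
                    # Term not stored yet: insert it with its buffered count.
                    self.conn.execute(
                        "INSERT INTO search_query (query_text, store_id, popularity) "
                        "VALUES (?, ?, ?)",
                        (text, store_id, hits),
                    )
        self.pending.clear()


conn = sqlite3.connect(":memory:")
# Simplified, assumed schema for the example.
conn.execute(
    """
    CREATE TABLE search_query (
        query_text TEXT NOT NULL,
        store_id   INTEGER NOT NULL,
        popularity INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)

term_buffer = SearchTermBuffer(conn, flush_size=1000)
for term in ["red shoes", "red shoes", "blue hat"]:
    term_buffer.record(term, store_id=1)
term_buffer.flush()

stored = conn.execute(
    "SELECT query_text, popularity FROM search_query ORDER BY query_text"
).fetchall()
print(stored)
```

Three searches result in two row writes here, because repeated terms are merged before they reach the database. In a real deployment the flush would more likely run from a scheduled job or a queue consumer, and buffered terms would be lost if the process stopped before flushing; both points are outside the scope of this sketch.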

In conclusion, while the exact solution might vary depending on individual site conditions, these strategies offer a promising starting point towards a performance-efficient future.

Devising Innovative Solutions: Strategies to Alleviate the Performance Dilemma

Addressing the performance issue effectively calls for solutions that are practical and adaptable. One option is to override the execute function in Magento_CatalogSearch/Controller/Result/Index so that it only uses the getNotCacheableResult part. This could restore performance to its level before the Magento 2.3 update and reduce the strain on the database.

Alternatively, removing the num_results > 0 condition or finding an alternative solution could significantly reduce the query duration. The num_results > 0 condition has been identified as a major contributor to the query's extended execution time; efforts to modify or replace it could therefore result in substantial performance improvements.
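One such alternative is the index on the search_query table suggested earlier in this article. The sketch below shows, on an empty toy table in SQLite, how an index covering the filtered columns changes the query plan. The index name idx_store_num_results, the store_id = 1 filter, and the simplified schema are assumptions for illustration; MySQL's optimizer may choose differently, and the extra index adds work to every write.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified, assumed schema for the example.
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)

count_sql = (
    "SELECT COUNT(*) FROM search_query "
    "WHERE store_id = 1 AND num_results > 0"
)

plan_before = query_plan(conn, count_sql)

# Hypothetical index: equality column first, range column second.
conn.execute(
    "CREATE INDEX idx_store_num_results ON search_query (store_id, num_results)"
)

plan_after = query_plan(conn, count_sql)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions, so no output is shown here. The point of the comparison is that the second plan names the new index while the first one does not, which means the count can be answered from the index instead of reading every row.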

In more extreme cases, where the search_query table has a large number of rows, drastic measures such as stopping the tracking of search terms entirely or only inserting a fraction of the search terms could be considered. While this might seem radical, such bold steps could be the key to alleviating the performance dilemma presented by the SELECT DISTINCT COUNT(*) FROM search_query issue.
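Inserting only a fraction of the search terms can be as simple as a random sampling check in front of the write. The following is a minimal sketch of that idea; the should_record function and the sample_rate parameter are hypothetical names, not part of Magento.

```python
import random


def should_record(sample_rate, rng=random):
    """Return True for roughly `sample_rate` of calls (0.0 = never, 1.0 = always)."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    # rng.random() is in [0.0, 1.0), so 0.0 never passes and 1.0 always does.
    return rng.random() < sample_rate


rng = random.Random(42)  # fixed seed so the sketch is repeatable
searches = ["red shoes"] * 10_000

# Keep about one search in ten; only these would be written to search_query.
recorded = [term for term in searches if should_record(0.1, rng)]
print(len(recorded))
```

With a sample rate of 0.1, about a tenth of the searches reach the table. The trade-off is that stored popularity figures become estimates, and rare search terms may never be recorded at all.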

In a nutshell, a combination of these innovative strategies could significantly alleviate the performance issue, contributing to a smoother and more efficient Magento experience for users, developers, and administrators alike.

Thus, our in-depth analysis reveals two pivotal factors behind the performance dilemma in Magento 2.3: the num_results > 0 condition and the DISTINCT operator. While removing the DISTINCT operator from \Magento\Search\Model\ResourceModel\Query\Collection has significantly reduced the query duration, it is also worth considering strategies to manage high search term cardinality, such as asynchronous insertion or limiting the number of search terms. Moreover, solutions like overriding the execute function in Magento_CatalogSearch/Controller/Result/Index or altering the num_results > 0 condition could potentially unlock further performance improvements. In the final analysis, it is a combination of these measures that promises to alleviate the performance issue, paving the way for a more efficient, user-friendly Magento experience.