Business Intelligence Tool For Summarising Reviews Of Samsung's Electronic Products Retailed On Amazon.com
Problem statement: Provide a brief report to the CEO of Samsung that includes only critical information regarding reviews of Samsung’s products retailed on Amazon.com.
The objective of this work is to develop a tool that could provide useful information to the CEO using smarter and faster analysis of large amounts of data.
Figure 3 illustrates the overall project plan. More than a quarter of a million reviews have been written about the 1000 products that Samsung sells on Amazon.com. The chief executive would like to know whether the reviews of any product show an interesting trend, and the reasons for such trends.
First, I define a sharpness index that records the number of spikes in the variation of the average monthly rating. I also track the total number of reviews per month to ensure that an observed variation in rating is an actual trend and is not dominated by outliers. Based on these criteria, I shortlist 20 products whose ratings have shown an unusual degree of variation (either positive or negative). Subsequently, I plot the average rating per month and the total number of reviews per month to obtain further clues about the trends in the reviews of the shortlisted products. Once I confirm that there is an interesting trend in the reviews, I carry out topic modeling using Latent Dirichlet Allocation (LDA) to identify hidden topics. I then summarise each topic by identifying the document that is closest to the centroid of the documents dominated by that topic in the LDA space.
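The screening step can be sketched as follows. This is a minimal illustration and not the exact index used in this work: it assumes that a "spike" is a month-to-month change in average rating of at least `jump` stars, and that months with fewer than `min_reviews` reviews are skipped. The function names, both thresholds, and the toy data are hypothetical.

```python
from collections import defaultdict


def monthly_stats(reviews):
    """Group (month, rating) pairs into {month: (average rating, review count)}."""
    buckets = defaultdict(list)
    for month, rating in reviews:
        buckets[month].append(rating)
    # "YYYY-MM" strings sort chronologically, so sorted() gives time order.
    return {m: (sum(r) / len(r), len(r)) for m, r in sorted(buckets.items())}


def sharpness_index(stats, jump=1.0, min_reviews=3):
    """Count jumps of at least `jump` stars between consecutive trusted months.

    Assumption: months with fewer than `min_reviews` reviews are ignored so
    that a handful of extreme opinions cannot register as a spike.
    """
    trusted = [avg for avg, n in stats.values() if n >= min_reviews]
    return sum(1 for a, b in zip(trusted, trusted[1:]) if abs(b - a) >= jump)


# Hypothetical toy data: (month as "YYYY-MM", star rating)
reviews = [
    ("2013-01", 5), ("2013-01", 4), ("2013-01", 5),
    ("2013-02", 2), ("2013-02", 3), ("2013-02", 2),
    ("2013-03", 1),  # a single review: too few to trust
    ("2013-04", 4), ("2013-04", 5), ("2013-04", 4),
]
stats = monthly_stats(reviews)
print(sharpness_index(stats))  # prints 2
```

Ranking products by this count and keeping the top 20 would give a shortlist of the kind described above. Lowering `min_reviews` makes the index more sensitive to low-volume months.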
The plots of average rating and total number of monthly reviews for a particular product of interest are shown in Fig. 4 and Fig. 5. As can be observed, the average rating undergoes plenty of ups and downs. However, during 2013-2014, i.e., the more recent months, the values are below 3.5 on a more consistent basis.
There are very few reviews per month at the beginning, between 2011 and 2012, whereas there are many more reviews between 2013 and 2014. The average monthly rating for months with a low number of reviewers is not trustworthy, because outliers such as people with extreme opinions carry greater weight during those months. However, there is a general increase in the number of reviewers between 2013 and 2014, and it coincides with the period in which the average rating has been consistently below 3.5. To understand the reasons, I carry out topic modeling of the reviews of this product. The results of topic modeling are available at the following web-link.
From the summary of reviews of the detachable Multi-Travel Charger, it can be observed that topic 1 has the highest weight in 20.7% of the documents, and the representative document for such reviews is shown in the third column of the first row. This is the type of summary that provides useful insights to the CEO in the time available to him/her.
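The two quantities in such a summary, the share of documents dominated by each topic and the representative document, can be sketched as follows. The sketch assumes the document-topic weights have already been produced by an LDA model (for example, the matrix returned by scikit-learn's `LatentDirichletAllocation.fit_transform`; the library used in this work is not stated here). The function name `summarise_topics`, the use of Euclidean distance, and the toy weights are assumptions made for illustration.

```python
import math


def summarise_topics(doc_topic):
    """Return {topic: (share of documents it dominates, representative index)}.

    The representative is the document closest (Euclidean distance, an
    assumption) to the centroid of the documents dominated by that topic.
    """
    # Assign each document to the topic with the highest weight.
    groups = {}
    for i, weights in enumerate(doc_topic):
        dominant = max(range(len(weights)), key=weights.__getitem__)
        groups.setdefault(dominant, []).append(i)

    dim = len(doc_topic[0])
    summary = {}
    for topic, members in groups.items():
        # Centroid of the group in topic space.
        centroid = [
            sum(doc_topic[i][k] for i in members) / len(members)
            for k in range(dim)
        ]
        representative = min(
            members, key=lambda i: math.dist(doc_topic[i], centroid)
        )
        summary[topic] = (len(members) / len(doc_topic), representative)
    return summary


# Hypothetical document-topic weights: four reviews, two topics
doc_topic = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.8, 0.2],
    [0.2, 0.8],
]
summary = summarise_topics(doc_topic)
print(summary)  # prints {0: (0.75, 2), 1: (0.25, 3)}
```

In this toy example, topic 0 dominates 75% of the reviews and review 2 is reported as their representative, which mirrors the "20.7% of the documents" and "representative document" columns of the summary.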