Consider application where users rates products (e.g. 1-5 stars).
Through passage of time, there might be millions of records. One can create desired indexes and/or keep sum and count of all ratings to easily get overall average score of particular product or seller.
The problem I am trying to tackle is how to efficiently provide an average score based on ratings from last year (365 days). In other words, any rating that is older than 365 days MUST NOT be considered for average score.
Three most naïve approaches are:
- Calculate it on DB level, using
GROUP BY
andAVG
(or similar DB functions).
- cons: it might analyze thousands of rows to give actual rating
- Have a background process that updates
average_score
once per day for all products and sellers.
- cons:
average_score
is updated only once per day
- Recalculate
average_score
every time new rating is added.
- cons:
average_score
is recalculated too often
My question is if there is better/smarter way to do it, taking into account there might be hundreds of thousands ratings per day?