1

Imagine an app like Instagram/Reddit with a feed of posts.

Problem: We want to show users posts they have not seen before.

When the user first opens the app, we retrieve 30 latest posts from the backend and show them to the user.

The user sees 10 posts and leaves the app.

At this point, the user has seen post with id 30 - 21 and not seen post with id 20 - 1

When the user comes back the next day, the database in total has 40 posts.

This time, when the API retrieves the latest posts, it also fetches 10 of the same old posts that the user has already seen.

We want to be able to skip the posts that the user has already seen.

Probable solution:

Divide the entire feed into SEEN and UNSEEN ranges. These ranges can be identified by their start and end markers, AKA post IDs.

For example, when the user leaves the app for the first time, we store it as a range. -> (30,20)

This range identifies posts already seen by the user.

Next time when the user opens the app, we send these ranges to the API, and then the API filters posts such as posts on either side of this range are returned. Posts between this range are not.

Sample SQL:

SELECT * from post where Id > 30 OR Id < 20

If we have multiple ranges, for example (50, 40) and (30, 20) the SQL becomes:

SELECT * from post where (Id > 50) OR (Id < 40 AND Id > 30) OR (Id < 20)

Essentially, we are dividing the feed into black/white or seen/unseen markers.

However, while this seems plausible, there are cases when the ranges become overlapping. For example, when the user opens the app with 50 posts, he/she might actually go down to post id = 7 and then leave the app. The correct range for this user should now only be (50, 7). How would these be efficiently merged then?

Are there any other/better solutions to this problem?

code
  • 121
  • 3

1 Answers1

3

I would recommend not to tamper with ID ranges, this becomes overly complicated and makes unjustified assumptions about the ID value ranges or the order in which a users looks at the blog posts.

The posts have unique IDs. Store the IDs of the seen posts per user somewhere, maybe in a table "seen_posts". Then use something along the lines of

 SELECT * from post WHERE Id NOT IN 
          (SELECT Id FROM seen_posts WHERE user_id=:uid);
Doc Brown
  • 199,015
  • 33
  • 367
  • 565
  • Would this be performant if the number of seen posts are 2000 ? What about when they are 10000? – code Jan 08 '21 at 14:31
  • 2
    @AbhishekSoni - this is going to perform fine. Joins in SQL are not as slow as people think. Modern relational databases were built for this kind of use case --- and highly optimized for it. – Greg Burghardt Jan 08 '21 at 15:24
  • And really this should be accomplished using JOINs: `SELECT ... FROM post JOIN user ON ... JOIN seen_posts ON ...` – Greg Burghardt Jan 08 '21 at 15:26
  • @GregBurghardt: you are aware that this is a "NOT IN" clause, not an "IN" clause? – Doc Brown Jan 08 '21 at 15:27
  • Ick. Sorry about that. Disregard my previous comment. – Greg Burghardt Jan 08 '21 at 15:28
  • 2
    @AbhishekSoni: how performant this is depends heavily on the database, but for most serious systems, I would expect exactly what Greg wrote. Moreover, there is no guarantee a range based approach would become quicker - the opposite may be true. Guessing around won't bring you further. I would recommend to try this out and measure the performance. – Doc Brown Jan 08 '21 at 15:35
  • Thanks @DocBrown. It works fine for smaller entries, and I figured out a balance so as to keep this list small: I only need to ever care about the latest X posts in the database. It's practically impossible for a "legit" user to scroll past that many posts, given new content is always getting created. Thanks a lot. – code Jan 08 '21 at 18:38
  • Thanks @GregBurghardt Gonna go with this solution – code Jan 08 '21 at 18:39