Introduction to recommender systems

[Early, early draft]

This chapter introduces recommender systems (commonly called RecSys), tools that recommend items to users. Many of the most popular uses of recommender systems involve suggesting products to customers. Amazon, for example, uses recommender systems to choose which retail products to display. Recommender systems aren’t limited to physical products. For example, the algorithms that Pandora and Spotify use to curate playlists are recommender systems. Personalized suggestions on news websites are recommender systems. And as of this writing, several carousels on the home page of Amazon’s Prime Video contain personalized TV and movie recommendations.

I (Zack) honestly have no idea why Amazon wants me to watch Bubble Guppies. It’s possible that Bubble Guppies is a masterpiece, and the recommender system knows that my life will change upon watching it. It’s also possible that the recommender made a mistake. For example, it might have extrapolated incorrectly from my affinity for the anime Death Note, thinking that I would similarly love any animated series. And, since I’ve never rated a Nickelodeon series (either positively or negatively), the system may have no knowledge to the contrary. It’s also possible that this series is a new addition to the catalogue, and thus the system needs to recommend the item to many users in order to develop a sense of who likes Bubble Guppies. This problem, of sorting out how to handle a new item, is called the cold-start problem.

A recommender system doesn’t have to use any sophisticated machine learning techniques. And it doesn’t even have to be personalized. One reasonable baseline for most applications is to suggest the most popular items to everyone. But we have to be careful. Depending on how we define popularity, we might create a feedback loop: the most popular items get recommended, which makes them even more popular, which makes them even more frequently recommended, and so on.
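
As a minimal sketch of this non-personalized baseline, the snippet below counts how often each item appears in an interaction log and recommends the same top items to everyone. The `interactions` log and the `most_popular` helper are hypothetical names made up for illustration, and "popularity" is assumed here to mean the raw interaction count, which is only one of several possible definitions.

```python
from collections import Counter


def most_popular(interactions, k=2):
    """Return the k item ids that appear most often in the interaction log."""
    counts = Counter(item for _user, item in interactions)
    return [item for item, _count in counts.most_common(k)]


# Hypothetical (user, item) interaction log, invented for this example.
interactions = [
    ("u1", "diapers"), ("u2", "diapers"), ("u3", "diapers"),
    ("u1", "phone case"), ("u2", "phone case"),
    ("u3", "coffee"),
]

# Every user receives the same list, regardless of their own history.
top_items = most_popular(interactions, k=2)
print(top_items)
```

If the recommendations themselves generate new rows in the log, the counts of the recommended items grow faster than the rest, which is the feedback loop described above.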

For services with diverse users, however, personalization can be essential. Diapers are among the most popular items on Amazon, but we probably shouldn’t recommend diapers to adolescents. We also probably should not recommend anything associated with Justin Bieber to a user who isn’t an adolescent. Moreover, we might want to personalize, not only to the user, but to the context. For example, just after I bought a Pixel phone, I was in the market for a phone case. But I have no interest in buying a phone case one year later.

Many ways to pose the problem

While it might seem obvious that personalization is a good strategy, it’s not immediately obvious how best to articulate recommendation as a machine learning problem.

Discuss:

  • Rating prediction
  • Passive feedback (view / no view)
  • Content-based recommendation
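
To make the first two formulations concrete, here is a small sketch of how the same log could be turned into training examples for each. The three records are invented for illustration; their field names mirror the Amazon review records shown later in this chapter. Content-based recommendation, which would additionally use item attributes such as the review text, is not covered by this sketch.

```python
# Hypothetical records, invented for this example.
reviews = [
    {"reviewerID": "A", "asin": "X", "overall": 4.0},
    {"reviewerID": "A", "asin": "Y", "overall": 2.0},
    {"reviewerID": "B", "asin": "X", "overall": 5.0},
]

# Rating prediction: the target is the star rating for an observed (user, item) pair.
rating_examples = [((r["reviewerID"], r["asin"]), r["overall"]) for r in reviews]

# Passive feedback: the target is only whether an interaction happened at all,
# so every (user, item) pair gets a label, including the unobserved ones.
users = sorted({r["reviewerID"] for r in reviews})
items = sorted({r["asin"] for r in reviews})
observed = {(r["reviewerID"], r["asin"]) for r in reviews}
feedback_examples = [((u, i), int((u, i) in observed)) for u in users for i in items]

print(rating_examples)
print(feedback_examples)
```

Note the difference in what is missing: in the rating formulation the pair (B, Y) simply has no example, while in the passive-feedback formulation it becomes a negative example.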

Amazon review dataset

  • introduce dataset
In [5]:
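import ast
import gzip
import urllib.request  # a plain "import urllib" does not reliably expose urllib.request

import mxnet
import mxnet.ndarray as nd
In [10]:
# Download the 5-core Grocery and Gourmet Food reviews; the file holds one record per line.
# ast.literal_eval only accepts Python literals, so unlike eval it cannot
# execute code embedded in the downloaded file.
url = "http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Grocery_and_Gourmet_Food_5.json.gz"
with gzip.open(urllib.request.urlopen(url), "rt") as f:
    data = [ast.literal_eval(l) for l in f]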

In [11]:
data[0]
Out[11]:
{'asin': '616719923X',
 'helpful': [0, 0],
 'overall': 4.0,
 'reviewText': 'Just another flavor of Kit Kat but the taste is unique and a bit different.  The only thing that is bothersome is the price.  I thought it was a bit expensive....',
 'reviewTime': '06 1, 2013',
 'reviewerID': 'A1VEELTKS8NLZB',
 'reviewerName': 'Amazon Customer',
 'summary': 'Good Taste',
 'unixReviewTime': 1370044800}

[Do some dataset exploration]

  • Look at the average rating
  • Look at the number of unique users and items
  • Plot a histogram of the number of ratings/reviews corresponding to each user
  • Plot the same histogram for items
In [17]:
users = [d['reviewerID'] for d in data]
In [18]:
items = [d['asin'] for d in data]
In [14]:
ratings = [d['overall'] for d in data]
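
The exploration steps listed above could be sketched roughly as follows, using only the standard library. The `summarize` helper and the toy lists are hypothetical stand-ins invented for illustration; in the notebook, the `users`, `items`, and `ratings` lists built from `data` would be passed in instead. The "histograms" here are plain counts that could be handed to any plotting library.

```python
from collections import Counter


def summarize(users, items, ratings):
    """Compute basic statistics from parallel lists of user ids, item ids, and ratings."""
    ratings_per_user = Counter(users)
    ratings_per_item = Counter(items)
    return {
        "average_rating": sum(ratings) / len(ratings),
        "num_users": len(ratings_per_user),
        "num_items": len(ratings_per_item),
        # Histogram: maps n to the number of users (items) with exactly n ratings.
        "user_histogram": Counter(ratings_per_user.values()),
        "item_histogram": Counter(ratings_per_item.values()),
    }


# Toy data, invented for this example.
toy_users = ["A", "A", "B", "C", "C", "C"]
toy_items = ["X", "Y", "X", "X", "Y", "Z"]
toy_ratings = [4.0, 2.0, 5.0, 3.0, 4.0, 5.0]

stats = summarize(toy_users, toy_items, toy_ratings)
for name, value in stats.items():
    print(name, value)
```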

Models

  • Just the average
  • Offset plus user and item biases
  • Latent factor model / matrix factorization
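
As a rough sketch of how these three models relate, the pure-Python code below fits each of them to a handful of toy ratings with stochastic gradient descent on squared error. Everything here is an assumption made for illustration: the function names (`global_average`, `fit`, `mse`), the hyperparameters, the L2 regularization, and the toy `triples`. It is not the implementation this chapter will eventually use, which would more likely be written with `mxnet`.

```python
import random
from collections import defaultdict


def global_average(triples):
    """Model 1: predict the mean rating for every (user, item) pair."""
    return sum(r for _u, _i, r in triples) / len(triples)


def fit(triples, k=0, epochs=100, lr=0.05, reg=0.02, seed=0):
    """Fit mu + b_user + b_item + p_user . q_item by SGD on squared error.

    k=0 gives Model 2 (offset plus user and item biases);
    k>0 adds k latent factors per user and item (Model 3).
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    mu = global_average(triples)
    b_user, b_item = defaultdict(float), defaultdict(float)
    # Small random initial factors; all-zero factors would receive zero gradient.
    p = defaultdict(lambda: [rng.gauss(0.0, 0.1) for _ in range(k)])
    q = defaultdict(lambda: [rng.gauss(0.0, 0.1) for _ in range(k)])

    def predict(u, i):
        return mu + b_user[u] + b_item[i] + sum(a * b for a, b in zip(p[u], q[i]))

    for _ in range(epochs):
        for u, i, r in triples:
            err = r - predict(u, i)
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(k):
                pu, qi = p[u][f], q[i][f]  # use the old values for both updates
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return predict


def mse(predict, triples):
    """Mean squared error of predict(user, item) over the given ratings."""
    return sum((r - predict(u, i)) ** 2 for u, i, r in triples) / len(triples)


# Toy (user, item, rating) triples, invented for this example.
triples = [
    ("A", "X", 5.0), ("A", "Y", 4.0),
    ("B", "X", 4.0), ("B", "Y", 2.0),
    ("C", "X", 3.0), ("C", "Z", 1.0),
]

mu = global_average(triples)
mse_average = mse(lambda u, i: mu, triples)
mse_biases = mse(fit(triples, k=0), triples)
mse_factors = mse(fit(triples, k=2), triples)
print(mse_average, mse_biases, mse_factors)
```

These errors are measured on the training ratings themselves, so they only show that the richer models fit the observed data more closely; a real comparison would evaluate on held-out ratings.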