I have been using a home-brew feedreader for the last six years or so. It’s a river-of-news style aggregator that ranks posts in order of “interestingness” rather than date, with the most interesting entries that I haven’t seen yet at the top.
Interestingness is derived from my click interactions: if my feedreader shows me an article and I click on it, I’m implicitly voting that the article is interesting. If it shows me something and I don’t click on it, I’m saying it’s not interesting. All of this click data is used to train a naive Bayesian classifier, which scores each new entry as it comes in.
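
To make the idea concrete, here is a minimal sketch of a click-trained naive Bayes scorer. It is an illustration under assumptions, not the actual code behind the feedreader: the `ClickClassifier` name, the word-level tokenizer, and the add-one smoothing are all choices made for this example.

```python
import math
import re
from collections import Counter


class ClickClassifier:
    """Tiny multinomial naive Bayes over entry text (hypothetical sketch)."""

    def __init__(self):
        # Word counts and entry counts, kept separately for clicked / not clicked.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, clicked):
        """Record one shown entry and whether it was clicked."""
        self.doc_counts[clicked] += 1
        self.word_counts[clicked].update(self.tokenize(text))

    def _log_score(self, tokens, label, vocab_size):
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothed class prior.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            # Add-one smoothed word likelihood, so unseen words never give log(0).
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def interestingness(self, text):
        """Return an estimate of P(clicked | text) between 0 and 1."""
        tokens = self.tokenize(text)
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        vocab_size = max(len(vocab), 1)
        log_yes = self._log_score(tokens, True, vocab_size)
        log_no = self._log_score(tokens, False, vocab_size)
        # Clamp the difference so math.exp cannot overflow on long texts.
        diff = max(min(log_no - log_yes, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Each shown entry would be fed to `train(text, clicked)`, and unread entries would then be sorted by `interestingness(text)` in descending order.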
There are some great advantages to sorting things by interest: there’s no sense of digital guilt when I don’t visit my reader for a few days (if something interesting happens while I’m on a week-long vacation, for instance, it’ll still stay at the top of my queue). Also, I can subscribe to a decently large number of feeds (300 or so, at last count) without feeling information overload.
This ranking system has been very good from a precision point of view (the items that are recommended to me are usually stuff I’m interested in). However, I’ve been feeling lately that recall or coverage is lacking (my top-ranked items are too similar, too echo-chambery, too drawn from the same sources).
(One difficulty I’m having is that precision is very easy to measure. Recall or coverage is much harder to quantify with just the click/attention data I have available.)
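
As a rough illustration of that asymmetry, the sketch below computes precision directly from a click log, and pairs it with a crude stand-in for the echo-chamber problem: how much of the top of the list comes from a single source. The function names and the `(source, clicked)` log format are assumptions for this example, and source concentration is only a proxy, not a real recall measure.

```python
from collections import Counter


def precision_at_k(shown, k):
    """Fraction of the top-k shown entries that were clicked.

    `shown` is a list of (source, clicked) pairs in ranked order.
    """
    top = shown[:k]
    if not top:
        return 0.0
    return sum(1 for _, clicked in top if clicked) / len(top)


def source_concentration(shown, k):
    """Fraction of the top-k slots taken by the single most common source.

    Values near 1.0 suggest the top of the ranking is dominated by one feed.
    """
    top = shown[:k]
    if not top:
        return 0.0
    counts = Counter(source for source, _ in top)
    return counts.most_common(1)[0][1] / len(top)
```

A ranking can score well on the first number while the second creeps upward, which is one way the "too drawn from the same sources" feeling could show up in the data.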
So… I’m now beginning to consider changing my ranking algorithm up. Maybe I can capture different views:
- show me stuff from my friends only (or some pre-specified list of must-reads)
- show me stuff from blogs I haven’t looked at in a while
- show me stuff from blogs that publish very infrequently
- show me stuff that each blog considers abnormally interesting (does it have an abnormally high number of comments/likes/upboats/etc. compared to the typical post?)
- show me stuff restricted by content type (video, img, comic, news, blog)
- or, to generalize, show me stuff that will take a certain estimated time to consume (an image is quick, as is a tweet; a short blog entry is longer; a multi-page Economist article is longer still)
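
Two of the views above lend themselves to simple scoring functions. The sketch below is one possible way to do it, under assumptions: "abnormally interesting" is taken to mean a high z-score of a post's engagement count against that blog's own history, and time to consume is estimated from word count at an assumed 200 words per minute. The function names and that reading speed are hypothetical choices for this example.

```python
import statistics


def engagement_zscore(history, current):
    """How unusual is `current` engagement compared with a blog's past posts?

    `history` is a list of past engagement counts (comments, likes, ...)
    for the same blog. Returns 0.0 when there is too little history or
    no variation to compare against.
    """
    if len(history) < 2:
        return 0.0
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0
    return (current - mean) / stdev


def estimated_minutes(word_count, words_per_minute=200):
    """Rough reading time; the default reading speed is an assumption."""
    return word_count / words_per_minute
```

Because the z-score is computed per blog, a post with 30 comments on a blog that usually gets 10 stands out, while the same count on a blog that routinely gets hundreds does not.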