
Aggregators for the New Age

I’ve never been completely satisfied with the state of feed readers. Too many of them follow the “mail client” paradigm: the user interface is modeled after email readers, with the implicit assumption that you want to read every single item. This becomes a cognitive burden once one’s list of subscriptions grows too big (25, or 50, or 200, or more feeds). Viz.: google “information overload” or “digital guilt”.

Obviously, we need something structured more like a newspaper (I’ve heard this called a “river of news” before). I see two important lessons to learn from newspapers:

  • Reading patterns: Very rarely will someone read the newspaper from front page to back page, in its entirety. People browse, thumb, skim.
  • Formatting to direct attention: Newspaper formatting is designed with this skimming interaction in mind. Important stories are set in more attention-grabbing type (big, big fonts) and placed in more prominent locations (the front and back pages).

But aggregators are not just newspapers. Because they’re computer-driven, they have the chance to be smarter and more personalized. It’s funny: people have been talking about “smart” aggregators for a while. A short while ago I did a google for “bayesian feedreader” and found plenty of insightful stuff written as far back as 2003. However… while there’s been plenty of punditry, nothing “smart” has made it to the mainstream yet.

I don’t know why.

But I have my theories. Maybe it’s because simple aggregation is “good enough”. That’s definitely part of it. But I suspect a lot of it is also because AI techniques like Bayesian classifiers are a much better fit for filtering spam than they are for filtering “interestingness”. “Interestingness” is such a broad target to hit that I’m not sure classifiers built on naive keywords alone are going to cut it.
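To be concrete about what I mean by naive keywords, here’s the sort of flat bag-of-words Bayes I’m skeptical of (a throwaway sketch, not code from any real aggregator): every word votes independently, which is plenty for spam/ham but feels awfully blunt for something as fuzzy as interestingness.

    import math
    import re
    from collections import Counter

    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    class NaiveKeywordClassifier:
        """Bag-of-words Bayes: every word votes independently of context."""

        def __init__(self):
            self.words = {"interesting": Counter(), "boring": Counter()}
            self.items = {"interesting": 0, "boring": 0}

        def train(self, text, label):
            self.words[label].update(tokenize(text))
            self.items[label] += 1

        def log_odds_interesting(self, text):
            # Prior: how many items of each kind have been rated so far.
            score = math.log((self.items["interesting"] + 1.0) /
                             (self.items["boring"] + 1.0))
            totals = {k: sum(c.values()) + 1.0 for k, c in self.words.items()}
            for word in tokenize(text):
                p_int = (self.words["interesting"][word] + 1.0) / totals["interesting"]
                p_bor = (self.words["boring"][word] + 1.0) / totals["boring"]
                score += math.log(p_int / p_bor)
            return score  # > 0 leans "interesting", < 0 leans "boring"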

Well, I’m going to try building my own. I’ll probably model the UI on something like kinja’s, and make the backend pluggable so that I can hot-swap different methods of calculating “interestingness”. More details to follow, but I’m thinking of experimenting with some machine learning based on explicit feedback plus implicit browsing patterns, along with popularity metrics drawn from the general population and from a more specific network of trust. Between digg, technorati, delicious, other social bookmarking sites, and all of these aggregated feeds, there are certainly plenty of tools available for calculating an “interestingness” metric.
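To give a feel for the plumbing I have in mind (all the names below are placeholders; none of this is written yet), each method of calculating interestingness would just be a scorer plugin behind a common interface, and the frontend would rank items by a weighted sum of whatever plugins happen to be active:

    class Scorer:
        """Interface every "interestingness" plugin would implement."""
        def score(self, item):
            raise NotImplementedError

    class ExplicitFeedbackScorer(Scorer):
        """Scores from thumbs-up/down the user has given a feed."""
        def __init__(self, feed_ratings):
            self.feed_ratings = feed_ratings          # {feed_url: -1.0 .. 1.0}
        def score(self, item):
            return self.feed_ratings.get(item["feed_url"], 0.0)

    class PopularityScorer(Scorer):
        """Scores from bookmark/vote counts cached from digg, del.icio.us, etc."""
        def __init__(self, counts):
            self.counts = counts                      # {item_url: count}
        def score(self, item):
            return self.counts.get(item["link"], 0)

    class Aggregator:
        def __init__(self, weighted_scorers):
            self.weighted_scorers = weighted_scorers  # [(scorer, weight), ...]
        def rank(self, items):
            def interestingness(item):
                return sum(w * s.score(item) for s, w in self.weighted_scorers)
            return sorted(items, key=interestingness, reverse=True)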

I might want to have a separate “boringness” metric too. We’ll let a little hacking show which yields the best results.
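If the two metrics do end up separate, combining them could stay this dumb (again, just a sketch of the idea, not settled code):

    def display_order(items, interestingness, boringness):
        # Two independent signals, combined only at display time, so either
        # model can be swapped out or retuned without touching the other.
        def net(item):
            return interestingness(item) - boringness(item)
        return sorted(items, key=net, reverse=True)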

Some links:

4 Comments

  1. mote

    Also, I’m noticing that a lot of things (e.g. the rise of folksonomy, “web 2.0” (whatever that means)) have changed since these ideas were first thrown around three years back. Untapped potential there, too.

    Posted on 06-Jun-06 at 13:42 | Permalink
  2. Les

    One thing, about Bayes and aggregators: I don’t think anyone’s found a way to apply Bayesian filtering to RSS aggregation that produces a satisfying result – at least not one worth cheering to the web about.

    I know I haven’t – and I’m up to my 6th private attempt or so at different arrangements with Bayes in particular. I think a different form of filtering is what’s needed. Maybe LSI, maybe some other form of valued scoring that doesn’t result in a flat spam/ham answer (i.e. “interestingness”, as you say).

    But, in the time since 2003, what’s really shown promise are more and more varied ways of soliciting and exploiting human intelligence in the course of finding and filtering news and feed items. See: del.icio.us, digg, et al. The best news lately is pre-scanned and filtered by human domain experts.

    Posted on 07-Jun-06 at 07:22 | Permalink
  3. mote

    Les, thanks for stopping by. I’ll be emailing you once I’m ready to start writing the intelligent part of my aggregator – I’m curious what sort of features you’ve tried for machine learning.

    A few thoughts on what you wrote:
    1. I’m quite sure a straight binary classifier is not the answer. Interestingness is different from the spam/ham problem because the answer is fuzzy rather than black & white. My naive guess is that the best user interface will be one that subtly signals to the user that an article is worth reading (say, a brighter red used for the entry header) or not worth reading (a duller grey for the background). Ordering (higher noise-to-signal stuff down near the bottom) is also a possibility. I find it helpful, when dealing with AI judgments that are not too accurate, to let the user interface be as vague as possible.

    2. I definitely agree with you about the advances since 2003 – the mechanical turk has been great! I’ve been thinking of different ways to use digg or del.icio.us data, but I’m not sure if Joshua or Kevin would be too happy with me pounding their servers every time a new feed item comes in.

    Posted on 08-Jun-06 at 08:47 | Permalink
  4. mote

    (Hmmm… looks like there’s a bug in wordpress’s “\” and “‘” escaping)

    Posted on 08-Jun-06 at 08:50 | Permalink