Outward-facing Questions:
- The great thing about delicious and folksonomy is that it creates an ontology as an emergent byproduct of individual self-serving efforts (that is, personal bookmarking). I’m wondering if we can take a similar tack to solve other AI problems.
Inward-facing Questions:
- What is the best way to represent the evolution of a tag’s meaning (evolution on both the individual and the group scale)? Folksonomy is a lot more dynamic than a fixed ontology, so we might not be able to use the same old tools.
- Folksonomy is the relationship between three types of information: tags, tagged objects, and the users who tag them. What information can we derive from each that is not explicit in the structure? You can call this “tag grouping”, “neighbor search”, “related items”… but it’s really all just clustering. What are the differences when you cluster each?
- Continuing from the last question: it’s most intuitive to hierarchically cluster tags, since this maps well onto the formal “ontology” model that information architects and NLP researchers are comfortable dealing with. But what happens when we hierarchically cluster users and tagged items? What does hierarchy imply about the relationships between parents, children, and siblings in the resulting structure?
- What are the differences in (tags, users, items) between digg, delicious, flickr, and citeulike?
Ah, to have time to pursue these….
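As a rough starting point for the clustering questions above, here is a minimal sketch that treats a folksonomy as a list of (user, tag, item) triples and computes pairwise similarity for any one of the three, using any other as context. The sample triples, the `FIELDS` mapping, the function names, and the choice of Jaccard similarity are all assumptions made for illustration; a real experiment would need real bookmarking data and probably a better similarity measure.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (user, tag, item) triples, invented for illustration.
TRIPLES = [
    ("alice", "python", "url1"),
    ("alice", "programming", "url1"),
    ("bob", "python", "url1"),
    ("bob", "python", "url2"),
    ("bob", "programming", "url2"),
    ("carol", "cooking", "url3"),
    ("carol", "recipes", "url3"),
    ("alice", "recipes", "url4"),
]

# Position of each kind of information inside a triple.
FIELDS = {"user": 0, "tag": 1, "item": 2}


def profiles(triples, target, context):
    """Map each `target` value to the set of `context` values it co-occurs with."""
    t, c = FIELDS[target], FIELDS[context]
    result = defaultdict(set)
    for triple in triples:
        result[triple[t]].add(triple[c])
    return dict(result)


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarities(triples, target, context):
    """Pairwise Jaccard similarity between all values of `target`."""
    prof = profiles(triples, target, context)
    return {
        (x, y): jaccard(prof[x], prof[y])
        for x, y in combinations(sorted(prof), 2)
    }


if __name__ == "__main__":
    # Tags that land on the same items, then users who share tags.
    print(similarities(TRIPLES, "tag", "item"))
    print(similarities(TRIPLES, "user", "tag"))
```

Swapping the `target` and `context` arguments is the whole point: the same few lines give “related tags”, “neighboring users”, or “related items”, and the resulting similarity table could be fed to any hierarchical clustering routine to compare what the hierarchy looks like in each case.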
Not too nervous about it, just want to get it over with. Putting on the finishing touches to my slides for tomorrow.
Ahhh, I’m done. Now, don’t that feel good. 71 pages on building a computational model of language learner errors. Phew, now to sleep.
Talking to a friend last week, an interesting idea came up: We don’t just consume information, information also consumes us.
My attention is a scarce resource, and different ideas, media, schools of thought, compete for it. (This is what makes multidisciplinarity hard).
It makes me think twice about metaphors for learning that compare research and knowledge acquisition to foraging for food. What if, instead of likening ourselves to the predators and farmers, we liken ourselves to the prey and the farmed?
There’s plenty of discussion of memes as pseudo-genetic entities (evolving, reproducing, self-transmitting)… but underlying this is the idea that we are the medium of transmission, we are the host to the virus.
It certainly puts a new spin on the way I look at sites like All Consuming.
I don’t like this metaphor of being consumed, it feels too passive and fatalistic to me. But maybe it’s true.
I’ve never been completely satisfied with the ongoing state of feed readers. Too many of them take the “mail client” paradigm, in which the user interface is modeled after email readers, and make the implicit assumption that you want to read every single item. This becomes a cognitive dilemma once one’s list of subscriptions becomes too big (25, or 50, or 200, or more feeds). See: google for “information overload”, “digital guilt”.
Obviously, we need something structured more like a newspaper (I’ve heard this called a “river of news” before). I see two important lessons to learn from newspapers:
- Reading patterns: Very rarely will someone read the newspaper from front page to back page, in its entirety. People browse, thumb, skim.
- Formatting to direct attention: Newspaper formatting is designed with this skimming interaction in mind. Important stuff is placed in more attention-grabbing type, using format (big, big font) and location (important stuff in the frontmost and backmost pages).
But aggregators are not just newspapers. Because they’re computer-driven, they have the chance to be smarter, more personalized. It’s funny, people have been talking about “smart” aggregators for a while. A short while ago I did a google for “bayesian feedreader” and found plenty of insightful stuff written as far back as 2003. However… while there’s been plenty of punditry, nothing “smart” has made it to the mainstream yet.
I don’t know why.
But I have my theories. Maybe it’s because simple aggregation is “good enough”. This is definitely part of it. But I suspect a lot of it is also because AI techniques like bayesian classifiers are a much better fit for filtering spam than they are for filtering “interestingness”. “Interestingness” is such a broad target to hit, I’m not sure if classifiers built on just naive keywords are going to cut it.
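For concreteness, this is roughly what a classifier “built on just naive keywords” amounts to: a minimal naive Bayes sketch with add-one smoothing, written from scratch. The class name, the labels, and the tokenizer are assumptions for illustration and are not taken from any existing feed reader.

```python
import math
import re
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over lowercase keyword counts."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()           # every word seen during training

    @staticmethod
    def tokenize(text):
        # Deliberately naive: lowercase alphanumeric runs, no stemming or stop words.
        return re.findall(r"[a-z0-9]+", text.lower())

    def train(self, text, label):
        words = self.tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        words = self.tokenize(text)
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero out a label.
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

Trained on a handful of items marked “interesting” or “boring”, this will happily pick a label for a new headline, but it only ever sees word overlap. That is fine for spam, where the vocabulary is narrow, and is exactly why I doubt it is enough for something as broad as interestingness.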
Well, I’m going to try building my own. I’ll probably model the UI to look something like kinja’s, and make the backend very plugin-able so that I can hotswitch different methods of calculating “interestingness”. More details to follow, but I’m thinking of experimenting with some machine learning based on explicit feedback + implicit browsing patterns, plus popularity metrics based on general population and on a more specific network of trust. Between digg, technorati, delicious, other social bookmarking sites, and all of these aggregated feeds, there are certainly a lot of tools available for calculating an “interestingness metric”.
I might want to have a separate “boringness” metric too. We’ll let a little hacking show which yields the best results.
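Here is a minimal sketch of what I mean by a plugin-able backend, assuming each “interestingness” (or “boringness”) method is just a function from an item to a number, combined as a weighted sum. The `Item` fields, the `Ranker` class, the two example scorers, and the weights are all hypothetical placeholders, not a finished design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Item:
    title: str
    feed: str
    bookmarks: int = 0  # hypothetical popularity count from a bookmarking site


Scorer = Callable[[Item], float]


class Ranker:
    """Combines named scorers as a weighted sum; scorers can be swapped at runtime."""

    def __init__(self) -> None:
        self._scorers: Dict[str, Tuple[Scorer, float]] = {}

    def register(self, name: str, scorer: Scorer, weight: float = 1.0) -> None:
        self._scorers[name] = (scorer, weight)

    def unregister(self, name: str) -> None:
        self._scorers.pop(name, None)

    def score(self, item: Item) -> float:
        return sum(weight * scorer(item) for scorer, weight in self._scorers.values())

    def rank(self, items: List[Item]) -> List[Item]:
        # Highest combined score first.
        return sorted(items, key=self.score, reverse=True)


def popularity(item: Item) -> float:
    """General-population popularity, capped at 1.0 (the cap of 100 is arbitrary)."""
    return min(item.bookmarks / 100, 1.0)


def boringness(item: Item) -> float:
    """Placeholder 'boringness' plugin: flags one hard-coded keyword."""
    return 1.0 if "celebrity" in item.title.lower() else 0.0


if __name__ == "__main__":
    ranker = Ranker()
    ranker.register("popularity", popularity)
    ranker.register("boringness", boringness, weight=-1.0)  # negative weight penalizes
    items = [
        Item("Ten celebrity diets", "blogB", bookmarks=90),
        Item("Clustering delicious tags", "blogA", bookmarks=80),
    ]
    for item in ranker.rank(items):
        print(round(ranker.score(item), 2), item.title)
```

Registering “boringness” with a negative weight is one way to keep it as a separate metric while still folding it into a single ranking; whether that beats a single learned score is exactly the kind of thing the hacking should show.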
Some links: