
Fragmented thoughts on readworthiness

12-Aug-06
  • Ultragleeper, by the guy who made Beautiful Soup. A “Recommendation Engine”. Ultragleeper takes group data from Technorati, del.icio.us, and Google, plus personal recommendations, regarding the readworthiness of web pages. This is a neat subset of what I’m looking at for a good feedreader. I need to browse his source (Python, yay) to take a look at how he implements ranking algorithms. I like that he bootstraps recommendations from the user’s OPML feed list and del.icio.us links… what a neat idea! (A rough sketch of that bootstrapping idea follows this list.)
  • Statistical analysis of “Front Page” Digg entries aggregated over time shows that while a small percentage of users is responsible for a large share of the posts that make it to the front page, these prolific users are only responsible for posts in the mediocre-to-good range (they easily clear the 50+ diggs requirement to hit the front page, but usually end up with a final rating under 500 by the time they scroll off the page). The real high-scoring entries are typically submitted by relative unknowns. I need to think more about how this relates to personal recommendations.
  • (Off that last note… I need to consider whether group recommendation sites (digg, del.icio.us’s front page) have different demands/qualities/characteristics than personal recommendation does: a wider vs. denser range of interests (a homogeneous group has much denser interest fields than an individual–this is why the del.icio.us popular page is filled with ajax and css links), a different tolerance for signal vs. noise (is this true?), …)
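
Since I keep coming back to this bootstrapping idea, here’s a minimal sketch of what it might look like: seed the “readworthy” model with everything the user already subscribes to or has bookmarked. The OPML handling is standard; the seeding itself is my guess at the general approach, not Ultragleeper’s actual code, and the file name and URLs are placeholders.

    # A rough sketch (mine, not Ultragleeper's) of bootstrapping a
    # recommendation seed set from OPML subscriptions + del.icio.us links.
    import xml.etree.ElementTree as ET

    def parse_opml(path):
        """Return the feed URLs from an OPML subscription list."""
        tree = ET.parse(path)
        return [node.get("xmlUrl")
                for node in tree.iter("outline")
                if node.get("xmlUrl")]

    def seed_interest_model(opml_path, delicious_links):
        """Treat everything already subscribed to or bookmarked as an
        implicit positive example of readworthiness."""
        return set(parse_opml(opml_path)) | set(delicious_links)

    seeds = seed_interest_model("subscriptions.opml",            # placeholder
                                ["http://example.com/bookmark"]) # placeholder
    print("%d positive seed URLs" % len(seeds))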

Meta

19-Jul-06

It’s kind of meta, isn’t it?
These guys are not marketing Che, but rather marketing the marketing of Che. Oh, you hipsters and your rapier wit.

English Teaching plus Aerobics Mashup

10-Jul-06

This surreal Japanese video combines aerobics with practical English instruction.

Looks to be from about 15 years ago, but it’s a timeless masterpiece–from the rakish, bandana-wearing gaijin to the M.I.A.-esque techno soundtrack to the serene happiness on the aerobics dancers’ faces.

Or… if you’re a foreigner who would like to learn Japanese

Migrating From Quicken

08-Jul-06

A short while back, my wife appropriated my iBook. She couldn’t resist its OSX-y goodness. Until now, I’d been tracking my finances in Quicken on the thing, and, as I have very little love for Quicken on the Mac, this gave me a good excuse to finally migrate away.

My primary machine is a gentoo box, so that shaped my options somewhat.

First, I tried a few versions of Quicken (2003, 2005) under Wine. Neither worked. Neither even installed. Googling turned up a mix of successes and failures. Perhaps if I invested in CrossOver or Cedega I’d have better luck–but for now, I’m a poor PhD student, so I’d rather go the Free as in Beer route. Scratch the easy solution.

There seem to be two main free personal finance apps for Linux: KMyMoney (v0.8.4) and GnuCash (v1.8.11).

KMyMoney

  • First step was to import my years of Quicken data. Its QIF import was FINICKY—it took quite a while of manually regexp-tweaking the Quicken data file before the data would import correctly (this step alone told me it’s not ready for the everyday user). A sketch of the kind of tweaking involved follows this list.
  • The UI is pretty friendly (lots of icons, non-imposing). I’ll need to try it out more before I can make a judgment on big-picture usability.
  • Handles OFX imports (through AQBanking) even better than Quicken does.
  • It’s missing graph-based reports. The text-based reports are nice, but they don’t compare.
  • The real game-ender, though, was its epilepsy-inducing flashing red text (the first google hit for “kmymoney flashing red” was this same question, greeted by an RTFM and the original poster deciding to try GnuCash). Sigh. Maybe that’s what I’ll try next.
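
For the morbidly curious, here’s the flavor of tweak I mean: a minimal sketch assuming the usual Quicken quirks (DOS line endings, and the M/D'YY apostrophe date format Quicken emits for post-2000 dates). The file names are placeholders, and your QIF dialect will almost certainly differ.

    import re

    def clean_qif(text):
        """Massage a Quicken QIF export into something importable."""
        # Normalize DOS line endings.
        text = text.replace("\r\n", "\n")
        # Rewrite D-lines from Quicken's M/D'YY form to plain M/D/YYYY.
        text = re.sub(r"^D(\d{1,2})/(\d{1,2})'(\d{2})",
                      lambda m: "D%s/%s/20%s" % m.groups(),
                      text,
                      flags=re.MULTILINE)
        return text

    with open("quicken-export.qif") as f:  # placeholder file name
        cleaned = clean_qif(f.read())
    with open("cleaned.qif", "w") as f:
        f.write(cleaned)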

GnuCash

  • I imported my QIF data without a problem (and without data tweaking; gnucash +1).
  • The UI looks to have been made circa the late 1980s. It is ugly and imposing. The feelings I get are more “Generic industrial-strength cleaning product” than “Friendly personal finance management”.

I’ll update this entry as I hack around with both over the weekend. I’m not too impressed with the Free Software solutions for this. Maybe I’ll try Moneydance; I’ve heard good things about it.

update 20060709:

  • Entries in KMyMoney flash red because they are missing categories. While I agree with the spirit of this “feature” (I do eventually want to categorize everything), that doesn’t make the implementation any less epilepsy-inducing. I long for the day when I can no longer say “Good idea, ugly UI. What else can you expect from Free Software?”.
  • I ended up using KMyMoney because it seems to be more actively developed than GnuCash. For Free Software, I consider forward momentum to be just as important as the current feature set, and KMyMoney appears to have won here (GnuCash is still on GTK1? Maybe that bit about a “late 80’s UI” for GnuCash wasn’t as large an exaggeration as I’d thought! Phew!).
  • a few more reviews

update 20060710:

  • How timely: GnuCash released 2.0 just last night, and it appears to be using a modern version of GTK, too. So much for that last comment about “forward momentum”. I’ll give the new version a try.
  • Half an hour later: Errr, this isn’t that much better. Oh well…

update 20070131:

  • I’ve been using KMyMoney for half a year now, and for the most part it’s been good to me. OFX support is decent, and it hasn’t disappeared any of my data. Functionally, the only things it’s really lacking are decent graph visualization and budgeting. However, with this new year, I’ve found that the real deal-breaker for me is the lack of cross-platform support. KMyMoney is all right if it’s only me hacking on the finances, but if I want to get my wife involved then we’ll need something that runs on Mac and/or Windows.
  • Enter Moneydance. It’s written in Java with cross-platform-ness in mind, which means it works equally well on Mac/Linux/Windows/Solaris/whatever.
  • It has the decent graph visualization and budgeting that KMyMoney lacks. Its UI is better than either KMyMoney’s or GnuCash’s… but the best part is that it has API hooks into Python (!!!)—this means (hopefully) I can automate lots of the drudgery of, say, categorizing stuff. A rough sketch of what I have in mind follows this list.
  • The downside to Moneydance is that it’s neither free nor Free, but: the Python API lets me get my data out if I need it (mitigating the data lock-in and the fact that it’s not libre Free)… and as far as gratis free goes, $30 for a license with free upgrades for multiple years… that’s not that bad.
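
To make that categorization idea concrete, here’s a hedged sketch of the script I have in mind. Moneydance scripts run under Jython (hence the Python 2 syntax) with a moneydance object in scope, but the specific method names below (getRootAccount, getTransactionSet, getAllTransactions, getDescription) are my assumptions rather than a verified API, and the merchant rules are made up.

    # Hypothetical Moneydance auto-categorizer; all method names are
    # assumptions, not a verified API.
    RULES = {
        "SAFEWAY": "Groceries",
        "SHELL":   "Auto:Fuel",
    }

    def guess_category(txn):
        desc = txn.getDescription().upper()   # assumed accessor
        for needle, category in RULES.items():
            if needle in desc:
                return category
        return None

    root = moneydance.getRootAccount()        # assumed entry point
    for txn in root.getTransactionSet().getAllTransactions():  # assumed
        category = guess_category(txn)
        if category:
            print "would file %s under %s" % (txn.getDescription(), category)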

What we can learn from Folksonomy

24-Jun-06

Outward-facing Questions:

  • The great thing about del.icio.us and folksonomy is that it creates an ontology as an emergent byproduct of individual self-serving efforts (that is, personal bookmarking). I’m wondering if we can take a similar tack to solve other AI problems.

Inward-facing Questions:

  • What is the best way to represent the evolution of a tag’s meaning (evolution on both the individual and the group scale)? Folksonomy is a lot more dynamic than a fixed ontology, so we might not be able to use the same old tools.
  • Folksonomy is the relationship between three types of information: tags, tagged objects, and the users who tag them. What information can we derive about each that is not explicit in the structure? You can call this “tag grouping”, “neighbor search”, “related items”… but it’s really all just clustering. What are the differences when you cluster each? (A toy sketch of one such clustering follows this list.)
  • Continuing from the last question: it’s most intuitive to hierarchically cluster tags—this maps well onto the formal “ontology” model that information architects and NLP researchers are comfortable dealing with. But what happens when we hierarchically cluster users and tagged items? What does hierarchy imply about the relationships between parents, children, and siblings in the resulting structure?
  • What are the differences in (tags, users, items) between digg, del.icio.us, flickr, and citeulike?
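
To make the clustering question a bit more concrete, here’s a toy sketch: represent each tag as a vector of counts over the items it’s applied to, and use cosine similarity as the “relatedness” measure. The triples are invented, and swapping which field you index on makes the same machinery cluster users or items instead.

    # Tags as count vectors over items; cosine similarity = "relatedness".
    from collections import defaultdict
    from math import sqrt

    def tag_vectors(taggings):
        """taggings: iterable of (user, tag, item) triples."""
        vecs = defaultdict(lambda: defaultdict(int))
        for user, tag, item in taggings:
            vecs[tag][item] += 1
        return vecs

    def cosine(a, b):
        dot = sum(v * b[k] for k, v in a.items() if k in b)
        norm_a = sqrt(sum(v * v for v in a.values()))
        norm_b = sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    taggings = [("u1", "ajax", "pageA"), ("u2", "javascript", "pageA"),
                ("u1", "ajax", "pageB"), ("u3", "javascript", "pageB"),
                ("u2", "recipes", "pageC")]
    vecs = tag_vectors(taggings)
    print(cosine(vecs["ajax"], vecs["javascript"]))  # 1.0: identical items
    print(cosine(vecs["ajax"], vecs["recipes"]))     # 0.0: no shared items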

Ah, to have time to pursue these….

Quals Tomorrow

22-Jun-06

Not too nervous about it, just want to get it over with. Putting on the finishing touches to my slides for tomorrow.

Finished the Dissertation Proposal

19-Jun-06

Ahhh, I’m done. Now, don’t that feel good. 71 pages on building a computational model of language learner errors. Phew, now to sleep.

Consuming

12-Jun-06

Talking with a friend last week, I ran into an interesting idea: we don’t just consume information; information also consumes us.

My attention is a scarce resource, and different ideas, media, and schools of thought compete for it. (This is what makes multidisciplinarity hard.)

It makes me think twice about metaphors for learning that compare research and knowledge acquisition to foraging for food. What if, instead of likening ourselves to the predators and farmers, we liken ourselves to the prey and the farmed?

There’s plenty of discussion of memes as pseudo-genetic entities (evolving, reproducing, self-transmitting)… but underlying this is the idea that we are the medium of transmission, we are the host to the virus.

It certainly puts a new spin on the way I look at sites like All Consuming.

I don’t like this metaphor of being consumed, it feels too passive and fatalistic to me. But maybe it’s true.

Aggregators for the New Age

05-Jun-06

I’ve never been completely satisfied with the current state of feed readers. Too many of them take the “mail client” paradigm, in which the user interface is modeled after email readers and makes the implicit assumption that you want to read every single item. This becomes a cognitive dilemma once one’s list of subscriptions grows too big (25, or 50, or 200, or more feeds). Viz.: google for “information overload” or “digital guilt”.

Obviously, we need something structured more like a newspaper (I’ve heard this called a “river of news” before). I see two important lessons to learn from newspapers:

  • Reading patterns: Very rarely will someone read the newspaper from front page to back page, in its entirety. People browse, thumb, skim.
  • Formatting to direct attention: Newspaper formatting is designed with this skimming interaction in mind. Important stuff is set in more attention-grabbing type (big, big fonts) and placed in more prominent locations (the frontmost and backmost pages).

But aggregators are not just newspapers. Because they’re computer-driven, they have the chance to be smarter, more personalized. It’s funny, people have been talking about “smart” aggregators for a while. A short while ago I did a google for “bayesian feedreader” and found plenty of insightful stuff written as far back as 2003. However… while there’s been plenty of punditry, nothing “smart” has made it to the mainstream yet.

I don’t know why.

But I have my theories. Maybe it’s because simple aggregation is “good enough”. This is definitely part of it. But I suspect a lot of it is also because AI techniques like bayesian classifiers are a much better fit for filtering spam than they are for filtering “interestingness”. “Interestingness” is such a broad target to hit, I’m not sure classifiers built on just naive keywords are going to cut it.

Well, I’m going to try building my own. I’ll probably model the UI to look something like kinja’s, and make the backend very plugin-able so that I can hot-swap different methods of calculating “interestingness” (a rough sketch of what I mean follows below). More details to follow, but I’m thinking of experimenting with some machine learning based on explicit feedback + implicit browsing patterns, plus popularity metrics based on the general population and on a more specific network of trust. Between digg, technorati, del.icio.us, other social bookmarking sites, and all of these aggregated feeds, there are certainly a lot of tools available for calculating an “interestingness metric”.
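
Here’s a hand-wavy sketch of that pluggable backend: each scorer maps an item onto [0, 1], and the aggregator blends (or hot-swaps) them. The scorer classes, attribute names, and weights are all illustrative placeholders, not a real design.

    # Illustrative plugin interface for computing "interestingness".
    class Scorer(object):
        def score(self, item):
            """Return an interestingness estimate in [0, 1]."""
            raise NotImplementedError

    class KeywordBayesScorer(Scorer):
        """Wraps some trained text classifier (naive Bayes, say)."""
        def __init__(self, classifier):
            self.classifier = classifier
        def score(self, item):
            # prob_interesting() is a placeholder for whatever the
            # classifier actually exposes.
            return self.classifier.prob_interesting(item.text)

    class PopularityScorer(Scorer):
        """Crude group-popularity signal, e.g. inbound link counts."""
        def score(self, item):
            return min(item.inbound_links / 100.0, 1.0)

    def blended_score(item, weighted_scorers):
        """weighted_scorers: list of (Scorer, weight) pairs."""
        total = sum(w for _, w in weighted_scorers)
        return sum(s.score(item) * w for s, w in weighted_scorers) / total

Hot-swapping a method then just means changing which (Scorer, weight) pairs get passed in.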

I might want to have a separate “boringness” metric too. We’ll let a little hacking show which yields the best results.

Some links:

back.

20-May-06

I spent last week attending the CALICO symposium. The attendees were an interesting mix of language teachers, managers of university language-learning computer labs, linguists, and a few Computer Science/Linguistics hybrid folk like me, interested in the intersection of Natural Language Processing and AI-driven computer pedagogy.

The signal-to-noise ratio would have been horrendously low, were it not for the handy “NLP in CALL” preconference workshop–it was easy to see early on who had research similar to mine.

The biggest surprise for me at the workshop was that I recognized nearly no one there. Now, I’m not an academic hermit by any means, but one of the biggest bits of take-home knowledge for me from this conference was not technical in nature, but more meta: AI in CALL is a horribly fragmented field. Viz.:

  • On one hand you have the ASR-heavy research groups that focus on modeling learner pronunciation: Project LISTEN, SRI’s EduSpeak, John Morgan @ USMA, and conferences like EuroSpeech.
  • On another hand, you have the AI-focused AI in Education groups that deal with CALL as a subset of general Computer-Aided Pedagogy. AIEd is a good example of this.
  • On the third hand, you have these folks from CALICO, a tight-knit group of (what seems to be mostly German) language teachers and linguists, interested in the practical business of building tools to automatically assess student input.

I am still constantly surprised by the lack of NLP in the CALL field. Even CALICO researchers seem to be stuck in the 90s, what with their rule-based rather than statistical approaches. Or perhaps there are facets of learner speech (strong language-interference effects, small corpora) that don’t lend themselves as well to pure statistical solutions.

As an aside, I should note the conference was held in Honolulu, Hawaii (yes, yes; as one of my fellow conference-goers said, “it’s funny, people have such different preconceptions when you tell them you’re going to a conference in Hawaii, as opposed to, say, Detroit”). I’ve put up a bunch of pictures from this year’s CALICO on my flickr account.