Consolidating Music Metadata

04-Dec-06

Finally finished a script a couple weekends ago to synchronize data between Amarok, Rhythmbox, and iTunes. I now use Amarok exclusively, and it’d been bugging me for a long time that my old metadata from multiple machines and multiple apps was locked away and unexploitable. So I fixed that, for myself at least. I harvest everything into a common format and populate a big ol’ database with it. Then I merge all the metadata together (averaging and adding, whatever, where necessary).

The code is ugly for now, so no public release. I might clean it up some time if anybody else wants it. Just ask.
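
For the curious, the merge step described above could look roughly like this. To be clear, this is not the actual script, just a minimal Python sketch: the `TrackStats` record, its fields, and the (artist, title) matching key are all assumptions made up for illustration. Play counts get added and ratings get averaged.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class TrackStats:
    """One player's view of a track (hypothetical common format)."""
    artist: str
    title: str
    play_count: int = 0
    rating: Optional[float] = None  # None means "never rated" in that player


def track_key(rec: TrackStats) -> Tuple[str, str]:
    # Assumption: tracks are matched on normalized artist + title only.
    return (rec.artist.strip().lower(), rec.title.strip().lower())


def merge_stats(records: Iterable[TrackStats]) -> Dict[Tuple[str, str], TrackStats]:
    """Merge per-player records: add play counts, average the ratings."""
    groups = defaultdict(list)
    for rec in records:
        groups[track_key(rec)].append(rec)

    merged = {}
    for key, recs in groups.items():
        # Unrated entries are skipped so they don't drag the average down.
        ratings = [r.rating for r in recs if r.rating is not None]
        merged[key] = TrackStats(
            artist=recs[0].artist,
            title=recs[0].title,
            play_count=sum(r.play_count for r in recs),
            rating=sum(ratings) / len(ratings) if ratings else None,
        )
    return merged
```

Leaving unrated tracks out of the average (instead of counting them as zero) is a design choice in this sketch, not something the real script necessarily does.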

Finishing up the semester.

04-Dec-06
  • Finished up a semester of TAing. I really like teaching; it’s so immediately rewarding to see minds grow week by week. Research is so long-term by contrast (start working on a problem, get good results after 3 months, publish after 6 months).
  • Signed Google’s job offer last week (will be working in their Santa Monica location part time next semester while I finish my thesis). Am not yet sure how much I can talk publicly about what I’ll be working on, but it’s got a lot to do with ontologies and tags.
  • Attending what promises to be an excellent workshop on language learner modeling at OSU in mid December. It fits perfectly in with my research of modeling learner errors; I have high hopes for my time there.
  • Later in December, the wife and I are taking my parents to Taiwan this Christmas. Two weeks. I’m pretty sure it is their first international trip, outside of a few days we spent in Vancouver, British Columbia (which is as American as you can get and still be in a foreign country) on a family vacation we took when I was in Junior High. Not to say that they aren’t culturally open (I grew up eating foods from a variety of different cultures), they just haven’t traveled too much. It will be awesome (we hope). Lots of places to visit out there (San Yi, Lu Gang, Hua Lian, Ying Ge…)

Grassroots Journalism

14-Nov-06

A friend of mine went to Korea this past summer and took an amazing bunch of pictures. It struck me as I was browsing them: these are not just vacation photos to be filed away, they are of high enough quality to appear in most any mainstream publication.

I look forward to the day when photostreams like this get passed around and we can get National Geographic-caliber stories written grassroots by the masses instead of by magazines. Do you think it’ll happen? I give it 10-15 years max before this becomes mainstream =)…

(note: if you look carefully, the BBC solicits reader information and photos at the bottom of most breaking articles, so I suppose we’re already getting there).

p.s. the author of these photos says:

… Yeah, I think grassroots story telling will go mainstream but I think it might happen in 5-7 years. We’re in the 1.0 phase now with bubblings of what is to come: there are quite a few publicly generated news sites and larger operations aggregating stories/photos from regular folks … It’s just a matter of time before the next big idea hits that brings the various streams/ideas together.

Spam as Turing Test

08-Nov-06

I received an impressive spam a while ago. It was a comment on my SQLObject post from a while back, telling me “Have you tried Ruby language? It has quite good database object system.” Not a bad comment, taken by itself. But the poster’s submitted website was clearly some search engine optimization type spam site. I’m still not sure if the spam message was generated automatically or by a human. But it does give me a nightmarish vision of separating ham from spam in a post-Turing-test world.

Thoughts on Publishing in Academia

08-Nov-06

To paraphrase/quote David Klein:

publications would be so much better if we were forward-thinking instead of rigorous in our testing. It seems like people judge a paper’s value by “in 10 years, will someone find a hole in the rigor of my testing procedure”. I would rather judge a paper by “does this make me have an interesting idea about the field that I’ve never thought of before”

Advice on Writing One’s Dissertation

02-Nov-06

All dissertations require four months of uninterrupted work.

  • The last month of work takes 0.5 calendar months.
  • The second to last month takes 1.5 calendar months.
  • The first two months can take years, and they usually do.

Prof. Daniel Berry, U. Waterloo

Sigh… if only this were less true.

Cousin Nancy

31-Oct-06

Cousin Nancy

Miss Nancy Ellicott
Strode across the hills and broke them,
Rode across the hills and broke them —
The barren New England hills —
Riding to hounds
Over the cow-pasture.

Miss Nancy Ellicott smoked
And danced all the modern dances;
And her aunts were not quite sure how they felt about it,
But they knew that it was modern.

Upon the glazen shelves kept watch
Matthew and Waldo, guardians of the faith,
The army of unalterable law.

by T.S. Eliot. In his earlier years IIRC. I love that stanza in the middle.

break from the hiatus

27-Oct-06

somehow haven’t posted here for a month and a half, phew.

  • thesis progress still continuing but slow (I think I’ve said “Yeah, I’ll be graduating in a year and a half” for the past two years now).
  • speaking of theses, learning the hard way the difference between science and systems. ugh, it’s so hard to get academically rigorous results out of a research project that has such practical demands on resources and functionality. just another game to play, I guess.
  • More in the academic world, am trying to figure out how my work fits in with the rest of the Computer Aided Language Learning field. It seems that statistical language modeling of learner errors is commonplace for pronunciation modeling, but an afterthought for any of the other fields. How is this possible?? Am I overlooking some major bodies of research (the field is horribly fragmented and it’s all too easy to do this)? Or do I really get the chance to carve a niche for myself with this thesis I’m writing?
  • funding shortage in the research group. will be with Google part time from January until at least May. Promises to be a fun experience.
  • went to Shik Do Rak three nights ago. Easily the best restaurant in Korea-town. I think I’m still full =).
  • Short backpacking trip tomorrow and Sunday in the Angeles mountains, a little north of Arcadia, with Eldwin and Christina. Hopefully the San Gorgonio fires won’t ruin the air quality. This will be my wife’s first time hiking! (and my first time in the past decade, nearly! where does the time go?)
  • hacking on a feedreader in my spare time. it’s been slowly gathering statistics of my use over this past month or so, which I’ll use to train some different measures of interestingness and boringness. next up is to expose those measures to the UI, for some usefulness goodness
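
One very simple measure of “interestingness” is a smoothed read rate per feed. The sketch below is only an illustration of that idea, not what the feedreader actually does: the `(feed, was_read)` log format and the smoothing constants are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def interestingness(
    events: Iterable[Tuple[str, bool]],
    prior_reads: float = 1.0,
    prior_shown: float = 2.0,
) -> Dict[str, float]:
    """Score each feed by the fraction of its shown items that were read.

    `events` is a hypothetical usage log of (feed_name, was_read) pairs.
    The priors act as additive smoothing, so a feed with only a handful
    of items starts near 0.5 instead of jumping straight to 0.0 or 1.0.
    """
    shown = Counter()
    read = Counter()
    for feed, was_read in events:
        shown[feed] += 1
        if was_read:
            read[feed] += 1
    return {
        feed: (read[feed] + prior_reads) / (shown[feed] + prior_shown)
        for feed in shown
    }
```

“Boringness” in this toy version is just one minus the score; anything per-item (keywords, authors, time of day) would need a richer log than this sketch assumes.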

How pale the stars, that burn in pallid splendor.

Five Years Ago Today

11-Sep-06

Five years ago today I had literally just moved to Taipei. At the time, I was sharing an apartment with an Indonesian dude who spoke very little English (and I, at that point, spoke very little Chinese…we communicated through lots of gestures and a Chinese-English pidgin, needless to say).

So, it’s late afternoon (time difference, remember), we’re watching some Taiwanese variety show, when the normal programming gets interrupted with some fast-speaking, serious-looking news anchor. The language-newbie that I was, I couldn’t understand a thing he was saying. The one thing I could understand, though, was the infographic in the corner of the screen behind him: lifted straight from CNN (I assume) was a picture of a couple buildings on fire, subtitled “America Under Attack”.

Wow.

I had no idea what was going on, and I could only think of the worst. My flatmate understood, but there was no easy way for him to communicate “terrorists just hijacked some planes and ran them into a building” in pantomime. Phone calls to friends and family back in the States naturally wouldn’t connect through.

Our apartment had a dial-up internet connection… the mainstream media sites (CNN et al) were overwhelmed with traffic and not working, if I recall. (“Is America’s infrastructure wiped out?” I remember thinking!). Finally, I was able to connect to less mainstream sites like Slashdot, which were still up and had excellent running commentary of the situation. I remember being logged on for most of the afternoon, refreshing the few functioning pages every couple of minutes, hungry for any new information. There was a lot of confusion and a lot of unintentional misinformation. Was it the PLO? Was it missiles? Was it nuclear? I remember wondering about the global repercussions of this attack. At this point we didn’t know who was behind it, the extent of the damage, how long the attacks would continue, or how the United States was going to respond. Having just arrived in a foreign country, life was that much more uncertain–Would I need to return home? To be drafted?? How will the global community treat Americans? How will my neighbors treat me? Do I even know where the American Embassy (or, Taiwanese embassy-alternative) is located?

I remember, on the morning after the bombings, my (normally tactful!) language instructor told the class “and this situation is unique because it is the first time anyone has attacked America since the Japanese bombed Pearl Harbor”. Now, you must understand, I was the only American in the class, and the remainder of the class was an assortment of Japanese, Korean, and European. When the teacher said that, I was sitting right across from the Japanese guy. It was an awkward moment, to say the least. How to react? I think we both just shrugged at each other.

Just a few five-years-old-now scattered thoughts.

Where were you?

Semantic Web 2.0

22-Aug-06

Attended a talk this morning by Stefan Decker of DERI in Ireland. “Semantic Web 2.0” was the title of the talk, and I think Stefan wins the Most Buzzwords in a Talk Title award.

I must admit when I came into the talk that I was a bit skeptical–Semantic Web 1.0 never got off the ground (not the way the WWW did, at least!), so are we really ready for a 2.0?

Stefan is of the view that Semantic Web (1.0) never took off at the time because we didn’t have the tools and connectedness for it to reach critical mass of adopters and ease of use. It was like “people dreaming of building a fighter jet when they only have parts to make a bicycle”, as he puts it.

So, I guess with all this Ajax and social networking and folksonomic whatnot, we finally have the tools to help the Semantic Web really succeed? He thinks so. Oh, that I could be so optimistic!

Snarkiness aside, here are some of the goals he sees within SemWeb 2.0:

  • semantic interlinking of online community sites
  • semantic blogging
  • semantic wikis (structuring and browsing the web and desktop)
  • social semantic collaborative filtering (using explicit relationships for information delivering and assessment)

A few short notes on each of these:

“Semantically-Interlinked Online Communities (SIOC)”

  • motivation: there’s lots of latent information to be gleaned from all these socially enabled websites, but this information is hidden (there’s the underlying database, but all we see is the HTML. We can write wrappers, but when HTML changes we’re sore outta luck).
  • SIOC is trying to expose the underlying structure via RDF, via plugins
  • this stuff is getting integrated into lots of open source projects via plugins (phpBB, wordpress, drupal, more)

Semantic Blogging

  • instead of just blogging for human eyes, blog for machine eyes too. e.g. automatically tag ppl with foaf info, or events with event xml, so that ppl can automatically add events to calendar or ppl to address book
  • While this might be useful, I’m not sure if it will be the panacea that its proponents claim. And, how are you going to get everyone to agree on standards? Are Microsoft, Google, and all the rest of the biggies going to agree to cooperate?
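
To make “blogging for machine eyes” a little more concrete, here is a minimal Python sketch that emits a FOAF description of a person as RDF/XML, the kind of thing a blog plugin might attach to a post. The function name and the example person are made up for illustration; the only real pieces are the RDF and FOAF namespace URIs and the `foaf:Person`, `foaf:name`, and `foaf:homepage` terms.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Use the conventional prefixes instead of auto-generated ns0/ns1.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def foaf_person(name: str, homepage: str) -> str:
    """Return an RDF/XML string describing one person with FOAF terms."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF_NS}}}Person")
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = name
    # The homepage is a link to a resource, not a literal, so it goes in
    # an rdf:resource attribute.
    ET.SubElement(
        person,
        f"{{{FOAF_NS}}}homepage",
        {f"{{{RDF_NS}}}resource": homepage},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical person, purely for illustration.
    print(foaf_person("Alice Example", "http://example.org/alice"))
```

An address-book tool that understands FOAF could pick the name and homepage straight out of this, with no HTML scraping. Whether everyone agrees to publish it is, of course, the hard part.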

Semantic Wikis

  • Addresses traditional wiki problems of structured access and information re-use
  • of all the things he’s talking about, I think this one has the most potential for immediate adoption. In fact, it’s already being adopted in the small picture: categories and templates on Wikipedia.
  • However, I’m not sold on it being implemented on the grander scale. It looks like a lot of work, both for the readers and writers of the information (tools will make this easier, granted), but I also don’t think it will have the reader-base that eyeball wikis have. And I suspect it will be harder to motivate people to contribute information (more detached/less “social”)

Perhaps I’m being overly harsh. I WANT this to work, though. I’d love to see all this happening. But, I’m afraid it’s all a new re-release of the old “Semantic Web 1.0” hype, with a shiny new rounded-corners, pastel-colored-gradient logo (and a “beta” on the side to boot). We’ll see…

If we are going to get a successful “Semantic Web 2.0”, I think we need to take a few lessons from the social software that has succeeded (these unfortunately were not described in the talk):

  • Be Bottom-Up: Like folksonomy, don’t rely on the Powers that Be (W3C, Microsoft, Google, whoever) to set standards. Instead, let consensus bubble up, folksonomy style. Users are lazy, so make doing the Right Thing easier than the Divergent Thing (see del.icio.us’s auto-suggest of tag labels for a good example of how this works). The big thing I’ve been wondering lately is whether we can have a folksonomy of standards as well as a folksonomy of data. Will this work??
  • Be Top-Down: Find a use that everyone really needs/wants, and they’ll jump through the hoops. Drawing from the folksonomy example again, look at what del.icio.us and Flickr have done… Users are all functioning selfishly: they want to store their pictures and their bookmarks, and when they tag things, a lot of the time they’re only doing it for personal use. The folksonomic patterns bubble up on their own.