Dude, Portuguese Fado has got to be the most meta music on earth.
Off to Portugal
02-Sep-05
I’ll most likely be incommunicado for the next week. I’ll be in Lisbon, Portugal, presenting at the Eurospeech conference. Look me up if you’re in the area ;)
God’s Grandeur
01-Sep-05
The world is charged with the grandeur of God.
It will flame out, like shining from shook foil;
It gathers to a greatness, like the ooze of oil
Crushed. Why do men then now not reck his rod?
Generations have trod, have trod, have trod;
And all is seared with trade; bleared, smeared with toil;
And wears man’s smudge and shares man’s smell: the soil
Is bare now, nor can foot feel, being shod.

And for all this, nature is never spent;
There lives the dearest freshness deep down things;
And though the last lights off the black West went
Oh, morning, at the brown brink eastward, springs —
Because the Holy Ghost over the bent
World broods with warm breast and with ah! bright wings.
Gerard Manley Hopkins, 1877
Accelerando
26-Aug-05
The last great transglobal trade empire, run from the arcologies of Hong Kong, has collapsed along with capitalism, rendered obsolete by a bunch of superior deterministic resource allocation algorithms collectively known as Economics 2.0
—Accelerando, a post-cyberpunk novel by Charles Stross
A good read, and Creative-Commons-Released-For-Free-Download. Aside from a too-explicit-for-my-puritan-tastes S&M scene near the beginning, I thoroughly enjoyed the time I spent in this book. It’s only 160 short pages, but the prose is so thick and packed with ideas that I spent more time reading than with most books 3 times its length. Stross is refreshingly technologically literate (you’ll find no refrains of Gibson writing Neuromancer on an old typewriter here), and the book is packed with ideas of both near-future and post-singularity-future life. And it’s brimming over with good-natured satire directed towards this current era (awkward adolescence of human culture that it is). I love the quote at the top of this entry. “Economics 2.0”, I’m still chuckling…
Extracting Semantics from Folksonomy?
25-Aug-05
So, we have some really nice folksonomies out there now. And they’re really good for humans. But what can machines do with them? Can we use del.icio.us to further the Sisyphean goals of the semantic web?
More specifically, I was talking to HaoChuan tonight about how one might automatically use a folksonomy to populate an ontology.
The difficult part in this is to extract meaningful semantic relationships from a folksonomy. In your standard, nonhierarchical tag-based folksonomy, all you know is “a is-a-tag-of b” or “b is-tagged-by a”. You don’t know what the tag implies about the page: content? origin? summary? intended use?
The problem, then, can be re-stated as this:
A tagsonomy is a large collection of (tag, object) pairs. How can I extrapolate the relationship between the tag and the tagee, and expand these pairs into (tag, relationship, object) triplets?
Looking at the popular tags in delicious, there are a number of relationships. “software” “web” “reference” “toread” “tutorial” “fun” “free” “cool”.
There are a few interesting things about these popular tags:
- there are several distinct relationship types one can classify the tag into. “summarizes-content”, “is-of-format”, “is-of-use”, “makes-reader-feel”.
- the relationship is easily predictable and relatively unambiguous, looking at the tag alone without the corresponding tagged web pages.
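To make the restated problem concrete, here is a minimal sketch of expanding (tag, object) pairs into (tag, relationship, object) triplets by looking at the tag alone. The `Triple` type, the `expand` function, the example URLs, and the tag-to-relationship assignments are all my own illustrative guesses, not anything del.icio.us provides. Tags the table doesn’t know get `None` rather than a made-up relationship.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    tag: str
    relationship: Optional[str]  # None when the tag alone does not tell us
    obj: str


# Hand-assigned guesses for some of the popular tags (illustrative only).
TAG_RELATIONSHIP = {
    "software": "summarizes-content",
    "web": "summarizes-content",
    "reference": "is-of-format",
    "tutorial": "is-of-format",
    "toread": "is-of-use",
    "fun": "makes-reader-feel",
    "cool": "makes-reader-feel",
}


def expand(pairs: Iterable[Tuple[str, str]]) -> List[Triple]:
    """Turn (tag, object) pairs into (tag, relationship, object) triples."""
    return [
        Triple(tag, TAG_RELATIONSHIP.get(tag.lower()), obj)
        for tag, obj in pairs
    ]


if __name__ == "__main__":
    pairs = [
        ("tutorial", "http://example.com/a"),
        ("Cool", "http://example.com/b"),
        ("lisbon", "http://example.com/c"),  # not in the table
    ]
    for triple in expand(pairs):
        print(triple)
```

The obvious weakness is the hand-built table: it only covers tags someone has already classified, which is exactly why the less popular tags are the interesting test.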
So how could we go about automating the process of seeing what tags mean in context?
- look at the tags themselves, as I have done. This will not work in all cases (in fact, I am curious to see how it will work on the less popular tags)
- look at examples of data mining to extract relationships in unstructured plain text documents.
Data mining, in its barest essence, does extract relationships, but the relationships are fixed and predefined–and it’s all about many instances of data pairs that fit one strictly defined relationship. This problem is related, but different in some nonsubtle ways.
Thoughts?
Updates
Related reading
- Mimi Yin of Chandler has a good entry on the relative strengths of faceted vs hierarchical organization, and how facet use easily devolves into chaotic tagsonomy. It addresses the lack of semantics in tagsonomic relations as a problem, but doesn’t treat it as a solvable or even surmountable problem.
- Clay Shirky talks about how cool and useful folksonomy is. His focus there was not the specific usefulness of ontology (machine-parsability) that I was looking for… But his examples (e.g. looking at sets of tags assigned to URLs, how they are different per-user and per-time) do provide interesting data sets for this thought problem of mine.
Spent a little time talking to Patrick Pantel, one of ISI’s resident automated ontology generation researchers. He brought up another merging of folksonomy and ontology, namely automatically creating an ontology of tags. This problem is straightforward, interesting, and a bit easier than what I’m trying to do here. Joshua has already accomplished this, in the form of his “related tags” function.
Addressing my problem, he suggests that, because the tags themselves seem to be good enough predictors of the semantic relationship between the tags and tagees, I could come up with a list of possible relationships, and use some pre-existing ontologies to map tags and their ontological neighbors into these relationship sets. Might be a good first stab at things, and I wouldn’t even need to begin the computationally intensive task of examining the text of tagged objects.
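A rough sketch of how that suggestion might look, under heavy assumptions: `NEIGHBORS` is a toy dictionary standing in for a real pre-existing ontology, `SEEDS` is a small hand-labeled list of relationships, and `guess_relationship` is a hypothetical helper of mine. An unlabeled tag simply takes a majority vote over whatever labeled neighbors it has.

```python
from collections import Counter
from typing import Dict, Optional, Set

# Toy stand-in for a pre-existing ontology: tag -> neighboring terms.
NEIGHBORS: Dict[str, Set[str]] = {
    "howto": {"tutorial", "guide"},
    "hilarious": {"fun", "funny"},
    "python": {"software", "programming"},
}

# Small hand-labeled seed list of tag -> relationship (illustrative only).
SEEDS: Dict[str, str] = {
    "tutorial": "is-of-format",
    "fun": "makes-reader-feel",
    "software": "summarizes-content",
}


def guess_relationship(
    tag: str,
    seeds: Dict[str, str] = SEEDS,
    neighbors: Dict[str, Set[str]] = NEIGHBORS,
) -> Optional[str]:
    """Guess a tag's relationship from its labeled ontological neighbors."""
    tag = tag.lower()
    if tag in seeds:
        return seeds[tag]
    # Count the relationships of every neighbor that has a seed label.
    votes = Counter(seeds[n] for n in neighbors.get(tag, ()) if n in seeds)
    if not votes:
        return None  # no labeled neighbor, so make no guess
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for tag in ["tutorial", "howto", "hilarious", "python", "lisbon"]:
        print(tag, "->", guess_relationship(tag))
```

Nothing here touches the text of the tagged pages, which is the appeal: the whole thing runs on the tags plus whatever ontology gets plugged in for `NEIGHBORS`.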
Learner Language Modeling at NICT
22-Aug-05
About a month ago, some researchers from NICT (Japan’s National Institute of Information and Communications Technology) came to visit ISI and give a series of short presentations on their work. Among those presenting was Emi Izumi, a woman who is involved in research very similar to mine. She, and a few others over there, have been working on modeling mistakes in learner language–specifically, typical Japanese school-taught learners of English. I expect they have encountered far fewer logistical hurdles than we have with tactical language (namely, shortage of language-learner-speakers, native-speaker-annotators, and pre-existing speech data models)… lucky them.
Interestingly, their work is very complementary to my own–while I have concentrated on phonology-related errors, they have put more effort into syntax and morphosyntax. It looks like there’s a lot of future for further cooperation here =).
They have also created a healthy-sized annotated database of learner speech, the NICT JLE Corpus. In accordance with their research, the corpus is rich with syntactic errors (but, unfortunately, mispronunciations are replaced with learner-intended words where they are understandable).
I’m curious how I can use this corpus to benefit my own research. While I expect many errors to be language-dependent–unique to the interaction between the L1 and L2 involved–I am sure there are some language universals that come into play–and, as I’m dealing with a paucity of data, I can at least use a Japanese model as a bootstrap.
Of course, once I get enough data, it will be really cool to compare relative statistics–get a glimpse of what exactly is universal…
I have uploaded a few of Izumi’s papers here, to my citeulike page.
i heart my apartment
16-Aug-05
Ahhhh. Moved in on Sunday (thanks Jeff, Bassam, Yuko, Ben!).
- 10 minute commute to work. (wow. just wow).
- I no longer need to fear walking around at night after reading about the weekly armed mugging published in the USC DPS Crime Alert
- restaurants abound
- 2 minutes walk to Cafe Brazil (pricey, but their weekend slow-roasted pork is evidently to die for)
- multiple thai restaurants
- 10 minutes walk from Versailles (well, “walk” is to be taken nonliterally–the portions are so huge that you could roll home on your gorged belly rather than walk home)
Ithaca HOURS
11-Aug-05
This is really neat. Ithaca, New York, has created a micro-economy by printing their own fiat money called HOURS. Given that the county’s average hour of labor is worth $10.00, they tie their currency to this rate. Only local companies pay using HOURS, and only local stores/businesses accept HOURS as currency, so the money stays local. HOURS are slowly introduced to the local economy, and are touted as a way to boost both local economy and local identity/pride. And, yes, it’s legal.
An interesting idea.
More on their about page. Transaction.net has a good summary and set of related links, and Paul Glover has a good article on the subject, Grassroots Economics.
Interesting thought, that true “culture” springs more from the common man than from the elite.
craziness of late
09-Aug-05
- Visited an immigration attorney yesterday morning, to see what legal bumps we might encounter for the upcoming marriage in December, and subsequent applications for permanent residency and/or citizenship. While our situation is on the more typical side, I felt the high price of an official legal opinion was a worthy exchange for the security-of-mind that we now have. Whenever I think about lawyers, I always wonder if it’s possible to have a legal system that is intuitive for the everyday citizen.
- Signed the lease on the new place yesterday afternoon. A 2 bedroom joint near Venice and Sepulveda. Has everything on our desired-list: lots of light but not noisy, good airflow, sizeable kitchen counter-space (why are Los Angeles kitchens so small? They usually demand such acrobatics and pre-planning to find enough space to prepare a good meal), newish appliances, and a good location (quiet, safe, near buses). A two-bedroom seemed like the way to go once I’m married, in order to maintain a healthy worklife/homelife dichotomy. Mindy is moving in in December, once we get married. For now, the price is a bit steep ($1,500), considering I’m living off the pittance they give graduate research assistants at school (though I have healthy savings that I’ve slowly amassed over my years of studenthood), so perhaps I should put feelers out for a subletter. Anyone interested in renting a room in a nice place on the Westside for half a year? Drop me a line.