engaged.

Last night, 6:30pm, at Hong Shu Lin station (紅樹林站) (on the Red Line of the subway, the last stop before Danshui (淡水), where the subway goes above ground, out of Taipei city, in the more scenic semi-countryside), we watched the sunset and I proposed to my girlfriend.

I guess this begins a new era of my life.

taiwan

been in taiwan for about a week now. forgot how oppressive the humidity is, and how good the fruits and vegetables are (amazing how things can taste when you breed fruits for flavor instead of resistance to pesticides/long-ripeness for shipping).

had dinner with a couple blogger/social-software guys from Academia Sinica a couple nights ago. it is wonderful, how nerdiness transcends cultural boundaries: Ilya, sitting next to me, out of the blue starts talking about where the different free-wireless-internet coffee shops are, around our area. Something I totally wanted to know, but would never have thought of asking. And I would do the same to someone visiting me in Los Angeles. Talk of wardriving in Taiwan followed.

The best part of the night was that no one, not even once, complimented me on my Chinese. The significance of this is that, in a rare moment, those in attendance were relating to me, not as a ‘foreigner’, but as a fellow researcher/nerd/academic. This is a rarer circumstance than the typical American might guess. Taiwan is a very homogeneous society, and the status quo assumption here is that anyone european-looking speaks only English (I can talk to waiters or bus drivers all I want in Chinese, and they will STILL always insist on answering back in broken, monosyllabic English. I don’t understand this.). On the rare occasion that I do carry on a conversation in Chinese with a local (aside from my friends from back when I lived here before), things rarely progress beyond the “your Chinese is quite good” stage (though, honestly, this feels more like flattery than actual commentary, as most Taiwanese will say this to a foreigner if he or she can speak even one or two simple words).

OK, end-rant. My point is that it was a wonderful evening, that I could carry on a real conversation with locals, the first night that I met some of them, talking about the Taiwanese cell phone industry as it compares to U.S. and Europe, the state of social software (Taiwan has a huge BBS/message-board culture that the U.S. doesn’t have, and it changes the way students there approach the social-software table), the economic development of the country, etc. etc. I think it’s easy, as a white male in America, to forget about what it’s like to be objectified, to be treated as a “white person” instead of as an individual. I wonder how much minorities in the U.S. face objectification. It is easy to feel like every time someone compliments me on my Chinese here, they are objectifying me, treating me as “foreigner” instead of “Nick”.

In other news, the Taiwanese TV news media sucks. It’s the same gossip-type stories, repeated 10 times a day. There is no real news here. i am curious how blogging-as-mass-media will affect this country. In Iran, you have blogging-as-media as a valid method for the public getting real information beyond the government-controlled newspapers. There is country-wide firewalling censorship, but from what I hear it’s pretty unsophisticated. In China, the Great Firewall is a bit more sophisticated, plus the government is making everyone register their personal web pages, thereby squelching free speech. need to think more about these different case studies: china, america, taiwan, iran. hmmm…

Anyways, I’m off to a wedding feast in a couple of hours. mmm, my first one.

latex

Urgh, LaTeX.
Tools with clunky UIs, nonintuitive markup, unhelpful error messages on failed compiles, and a glass-cannon bibliography manager.

Resourcelist:

Mad Paper-Reading

Expanding our Eurospeech paper (which we found out last week was accepted!) on modeling language learner spoken disfluencies into a full journal paper. Been re-acquainting myself with the masses of related work this weekend.

A fun side effect is that I don’t feel so alone in my research any more. Here, where I sit, at the dovetail of Second Language Acquisition, Natural Language Processing, Artificial Intelligence and Automatic Speech Recognition, I don’t get to meet people who deal with the same questions I deal with day-to-day. It’s not exactly pure interdisciplinary work, but it’s definitely in the same order of merging of academic cultures and demands.

But, it’s always nice to realize there are other people (even if only a handful) who’ve trodden this same road.

pydev 0.9.4

neat! a new version of pydev (a python IDE plugin for Eclipse) is out. Among other fixes are good $PYTHONPATH handling and a more robust debugger. My day just got so much brighter =).

Programming-Binge and Crash-Sustainable Development

Saturday morning I had a great idea. I realized my work in modeling second language learner phonology errors was really an approximation of a finite state transducer framework. And the bugs and speed issues I had been dealing with for much of the last 3 months were not on the linguistics side but more on the FSA framework architecture side.

“So if your system approximates a finite state machine, why not use a real one?”. This was my idea. I’ve used Carmel (ISI’s most excellent Finite State engine, that is free and open source and mighty powerful) a lot in the past, so I’m comfortable thinking in FSA terms… I sat down to code, and basically binged this weekend. I can’t remember the last time I was so dedicated or so motivated. Perhaps it was because it was a change of pace, a change of looking at the problem. Perhaps my brain had gotten a little stale this semester, like cookies left sitting out on the kitchen counter for too long. It was fun, and it was productive, and I reproduced about 3 months’ work in a weekend, and my system’s speed is orders of magnitude faster than before and a little more accurate. (Just goes to show how much the right tool for the job can win you. I should have realized this a year and a half ago =/ )
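
For anyone wondering what "thinking in FSA terms" looks like, here is a toy sketch in Python. It is not Carmel and not the actual system described above: it is a hypothetical one-state weighted transducer that rewrites each intended phoneme into what a learner might produce, and every symbol, rule, and probability in it is invented for illustration.

```python
from collections import defaultdict

# Hypothetical substitution arcs: intended phoneme -> [(produced phoneme, probability)].
# All symbols and numbers are made up; a real model would estimate them from data.
RULES = {
    "th": [("th", 0.6), ("s", 0.3), ("t", 0.1)],
    "r": [("r", 0.7), ("l", 0.3)],
}


def transduce(phonemes):
    """Map an intended phoneme sequence to {produced sequence: probability}."""
    hypotheses = {(): 1.0}
    for phoneme in phonemes:
        # Phonemes with no rule pass through unchanged with probability 1.
        arcs = RULES.get(phoneme, [(phoneme, 1.0)])
        extended = defaultdict(float)
        for prefix, prob in hypotheses.items():
            for produced, weight in arcs:
                extended[prefix + (produced,)] += prob * weight
        hypotheses = dict(extended)
    return hypotheses


def best(phonemes, n=3):
    """Return the n most probable produced sequences, most probable first."""
    ranked = sorted(transduce(phonemes).items(), key=lambda kv: -kv[1])
    return ranked[:n]


if __name__ == "__main__":
    for sequence, prob in best(["th", "r", "i"]):
        print(" ".join(sequence), round(prob, 3))
```

A real finite state engine adds what this toy lacks: multiple states (so rules can depend on context), composition of several transducers, and efficient search, which is roughly where the speed win from a proper tool would come from.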

But damn am I tired. 3 days in a row, coding 12-14 hours a day (basically, all my time not spent in meetings, church, commute, or eating). And there went my weekend.

If programming was some sort of drug, then I came off my high yesterday to a huge crash. And I realized that life like that isn’t too healthy. “Moderation is a virtue”, they say.

CiteULike

Richard Cameron’s brainchild CiteULike is a social software driven, web-based content management system for academic papers. It’s a lot like del.icio.us (socially-browsable, public bookmarks, organized by tags and folksonomy rather than by strict hierarchy), but with more support for the metadata typical to academic papers.

It also imports and exports to bibtex, for low barrier-to-entry.

I’ve known about CiteULike for a while now, but somehow never got around to using it. Paper-reading is too-often a chore (I’d all-too-often rather be coding!), and entering in metadata and tracking is not the sexiest of tasks. Well, an impending self-imposed deadline for a journal submission made me realize “uh oh, need a literature survey!”, combined with “well, I’ve surveyed a lot, but it’s all buried in a (literally!) two-foot stack of printed-out journal and conference papers, hilited and with notes in the margins”. So, in a bout of structured procrastination last Friday, I got around to picking out the most succulent of the papers and entered them into my citeulike page.

My first impressions:

  1. The social aspect of it has a lot of promise. It’ll be great to see other users who read the same papers I do, or use a categorization tagset that overlaps with mine. The only problem is that not too many NLP (much less pedagogy, second language acquisition, or even linguistics) people use it. This kind of social software has usefulness roughly proportional to the square of the number of its users. I’m still a little skeptical that the userbase will ever grow to make the system as useful as del.icio.us: academics are a small subsection of society, the pieslice of academics with research areas overlapping with mine is minuscule indeed, and the forkful of those that discover citeulike? I suspect I might finish dinner still hungry.
  2. That said, the personal content management aspect of citeulike is a win. That was what was so good for del.icio.us: it works in the social-software arena, but it’s useful from an anti-social arena as well (I know people who use it just to store their bookmarks from a web-accessible location, tag for their own future lookup, and don’t really care that other people are bookmarking what they bookmark). I’ve always had trouble tracking and documenting my reading binges, and maintaining an up-to-date bibtex file of everything I consume/produce. This system looks like it can solve that.
  3. Use of paper metadata remains unexploited. Let me browse “Other papers that were published at this conference”, “Order-by-publish-date”, “Order-by-last-read-date”. And citations and references inside the papers themselves form a rich web of data that I’m sure can be mined for reading-recommendation-goodness.
  4. The social aspect remains unexploited. Yes, Richard is probably lacking both data and CPU cycles, but I can’t wait to see “people who bookmarked the same things you did have rated these other papers very highly, that you haven’t read yet”
  5. I’ve always had a problem of following journals, and this (along with Google Scholar) has helped a lot. The RSS-viewable watchlists and the searchability are really, really handy (speaking of which, what about some more integration with google scholar?).
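
The “people who bookmarked the same things you did” wish in item 4 can be sketched in a few lines of Python. This is only an illustration under made-up assumptions, not how citeulike works: the user names, paper ids, and the overlap-count scoring are all hypothetical, and a real system would also use ratings and normalize for library size.

```python
from collections import Counter


def recommend(user, libraries, top_n=3):
    """Suggest unread papers from users whose libraries overlap with `user`'s."""
    mine = libraries[user]
    scores = Counter()
    for other, papers in libraries.items():
        if other == user:
            continue
        overlap = len(mine & papers)  # shared bookmarks = crude similarity
        if overlap == 0:
            continue
        for paper in papers - mine:  # only papers `user` has not bookmarked
            scores[paper] += overlap
    return [paper for paper, _ in scores.most_common(top_n)]


# Invented example data.
libraries = {
    "nick": {"fst-phonology", "disfluency-asr"},
    "alice": {"fst-phonology", "disfluency-asr", "l2-error-corpus"},
    "bob": {"disfluency-asr", "smt-alignment"},
    "carol": {"topic-models"},
}

if __name__ == "__main__":
    print(recommend("nick", libraries))
```

Even this crude version shows why the data-sparsity worry in item 1 matters: with no overlapping users there is nothing to recommend.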

Update: More thoughts:

  1. Let me rank papers that I have read, perhaps with criteria similar to those used by paper-reviewers for conferences
  2. Tag intersection, union, difference. These are the tools that make a tag-based organizational system as powerful as (or even more powerful than) your traditional strict hierarchy system. Why don’t more folksonomic systems implement these? They’re not that much more expensive, computationally.
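
To back up the “not that much more expensive” claim, here is a minimal sketch in Python, assuming tags are stored as an inverted index from tag to a set of item ids. The bookmarks and tag names are invented, and `build_index` is a hypothetical helper, not part of any real folksonomy system.

```python
def build_index(bookmarks):
    """Invert {item: tags} into {tag: set of items}."""
    index = {}
    for item, tags in bookmarks.items():
        for tag in tags:
            index.setdefault(tag, set()).add(item)
    return index


# Invented example data.
bookmarks = {
    "paper-a": {"nlp", "sla"},
    "paper-b": {"nlp", "asr"},
    "paper-c": {"asr"},
    "paper-d": {"sla", "pedagogy"},
}

index = build_index(bookmarks)

nlp_and_sla = index["nlp"] & index["sla"]  # intersection: tagged with both
nlp_or_asr = index["nlp"] | index["asr"]   # union: tagged with either
nlp_not_asr = index["nlp"] - index["asr"]  # difference: nlp but not asr

if __name__ == "__main__":
    print(sorted(nlp_and_sla), sorted(nlp_or_asr), sorted(nlp_not_asr))
```

Each operation is linear in the sizes of the sets involved, so once the index exists, combining tags costs little more than looking up a single tag.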

Yes, I know the requested feature list grows and grows, and I realize that citeulike is just a side project for Richard… alas…

Google Does Machine Translation

Neat, google is finally getting around to using their in-house statistical machine translation stuff, instead of Systran’s old rule-based system.

Been expecting this for a while, ever since they bought Franz Och from us here at ISI a year or two ago.

Hehe, it’s funny to read layperson commentary on slashdot:

There are five levels of machine translation:
1) word substitution.
2) phrase substitution.
3) cohesive paragraphs and idioms.
4) light literature, magazine articles, and business.
5) classical literature, law, and diplomacy.

Each level requires at least an order of magnitude more computing power than the previous one. Babel fish is on level two and systran is on three. Google is positioning themselves to be between levels four and five.

Oh man, this commenter is seriously confused, in so many ways =). I like how he (one can assume the commenter is male, as he is commenting on Slashdot) transitions from algorithmic criteria in 1-3 to genre criteria in 4-5. And then there’s the bit about “law/diplomacy” being harder than “magazine articles” (untrue from a statistical perspective, as there are so many more bilingual corpora available for those genres, created by the U.N. or the Canadian Parliament, for example). And then there’s the hubris to assume 4 & 5 are even POSSIBLE given the current level of technology.

Sigh…

</rant>