I can has consciousness?

Conversations at work recently have turned again and again to consciousness and self-awareness (what, you thought “Android” was just a phone? ;) ). Now, I’m not going to belabor the point with discussions of artificial intelligence and yet another amateur’s resummarization of Searle’s Chinese Room[1]. Instead, I’ve been thinking about self-awareness in groups of humans.

A bullet-point braindump:

  • As background, remember that short story in Gödel, Escher, Bach, where the anteater communicated with the colony of ants (not the ants themselves, but the colony), and ate certain individual ants as a way to shape the colony into something that’s more intelligently connected?
  • It’s a clichéd remark that groups of humans begin to resemble organisms in their own right. Corporations seek the good of the corporation rather than the good of any of their individuals. Cultures grow, intermingle, and reproduce, spawning new cultures. OK, so these macro-groups of humans are animals, that’s for sure. But are they self-aware? Conscious? Would we recognize it if they were?
  • It’s interesting when a group of people who’ve been meeting for a while realize that they are in fact behaving as a group, and in turn have a group identity. Is this awareness of group identity the same as self-awareness in the group? (answer: I don’t think so, this is something different).
  • To extend the brain metaphor, imagine humans to be the neurons in a larger collective brain. Urgh, the speed of signal transmission across the gap between one human “neuron” and the next is horribly slow compared to an axon-dendrite gap. What effect does this slowness have? Also, humans are damn intelligent signal processors compared to neurons. What effect would our individual intelligences have on the larger structure?
  • Would such a self-aware “organism” think thoughts that are entirely separate and entirely transcendent above the thoughts of its constituents?
  • Scale? Seems to be the general belief that intelligence is the emergent result of massive amounts of highly, highly interconnected neurons. How many people do you need in a group before it can be considered an organism? A self-aware organism? Is the interconnectedness of humans even on a large enough order of magnitude to support a functionally processing organism? What are such an organism’s inputs, outputs? Would human sub-organizations specialize into computational functional tools, similar to how neurons in the brain are specialized into groups like the PFC, the amygdala, etc?
  • I imagine an extraterrestrial coming to the earth, and conversing with society as opposed to individuals. That would be an interesting story. But not the kind of sci-fi that would entertain a puny human mind, though, that’s for sure.

Hmm, I’ll have to think more about this… so many premature thoughts… And most of them the result of only 4 hours of sleep for the last couple days. My apologies, dear anonymous reader, for the unpolished words, the undeveloped concepts, the flaws. “Time past and time future / Allow but a little consciousness.”

[1] (In any case, I love Ben Goertzel’s take on the situation, which, to paraphrase: “When the time comes, and you’re actually arguing with the computer whether it is self-aware or not, then the point is already moot, isn’t it?”)

What we can learn from Folksonomy

Outward-facing Questions:

  • The great thing about delicious and folksonomy is that it creates an ontology as an emergent byproduct of individual self-serving efforts (that is, personal bookmarking). I’m wondering if we can take a similar tack to solve other AI problems.

Inward-facing Questions:

  • What is the best way to represent the evolution of a tag’s meaning (evolution on both the individual and group scale)? Folksonomy is a lot more dynamic than a fixed ontology, so we might not be able to use the same old tools.
  • Folksonomy is the relationship between three types of information: tags, tagged objects, and the users who tag them. What information can we derive from each that is not explicit in the structure? You can call this “tag grouping”, “neighbor search”, “related items”… but it’s really all just clustering. What are the differences when you cluster each?
  • Continuing from the last question: it’s most intuitive to hierarchically cluster tags, since this maps well onto the formal “ontology” model that information architects and NLP researchers are comfortable dealing with. But what happens when we hierarchically cluster users and tagged items? What does hierarchy imply about the relationships between parents, children, and siblings in the resulting structure?
  • What are the differences in (tags, users, items) between digg, delicious, flickr, and citeulike?
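
To make the “it’s really all just clustering” point a bit more concrete, here is a minimal sketch of the first step: treating a folksonomy as (user, tag, item) triples and finding neighbors along any one of the three dimensions. The sample data, the function names, and the choice of Jaccard similarity are all assumptions made for illustration; none of them come from delicious or any other real service.

```python
from collections import defaultdict

# Hypothetical sample data: (user, tag, item) triples.
TRIPLES = [
    ("alice", "python", "doc1"),
    ("alice", "programming", "doc1"),
    ("bob", "python", "doc1"),
    ("bob", "python", "doc2"),
    ("bob", "snakes", "doc3"),
    ("carol", "programming", "doc2"),
    ("carol", "snakes", "doc3"),
]

USER, TAG, ITEM = 0, 1, 2  # positions within a triple


def project(triples, key_index, value_index):
    """Describe each key (e.g. a tag) by the set of values (e.g. items) it occurs with."""
    profiles = defaultdict(set)
    for triple in triples:
        profiles[triple[key_index]].add(triple[value_index])
    return dict(profiles)


def jaccard(a, b):
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(profiles, target):
    """Rank every other key by similarity to target (ties broken by name)."""
    scores = [
        (other, jaccard(profiles[target], profiles[other]))
        for other in profiles
        if other != target
    ]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


tag_items = project(TRIPLES, TAG, ITEM)    # tags described by the items they label
user_tags = project(TRIPLES, USER, TAG)    # users described by the tags they use
item_users = project(TRIPLES, ITEM, USER)  # items described by who bookmarked them

print(neighbors(tag_items, "python"))
print(neighbors(user_tags, "alice"))
```

The same `neighbors` call works for tags, users, or items; only the projection changes. A real clustering pass would feed these similarities into a hierarchical clusterer, which this sketch leaves out.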

Ah, to have time to pursue these….

Finished the Dissertation Proposal

Ahhh, I’m done. Now, don’t that feel good. 71 pages on building a computational model of language learner errors. Phew, now to sleep.

Consuming

Talking to a friend last week, an interesting idea came up: We don’t just consume information, information also consumes us.

My attention is a scarce resource, and different ideas, media, schools of thought, compete for it. (This is what makes multidisciplinarity hard).

It makes me think twice about metaphors for learning that compare research and knowledge acquisition to foraging for food. What if, instead of likening ourselves to the predators and farmers, we liken ourselves to the prey and the farmed?

There’s plenty of discussion of memes as pseudo-genetic entities (evolving, reproducing, self-transmitting)… but underlying this is the idea that we are the medium of transmission, we are the host to the virus.

It certainly puts a new spin on the way I look at sites like All Consuming.

I don’t like this metaphor of being consumed, it feels too passive and fatalistic to me. But maybe it’s true.

Aggregators for the New Age

I’ve never been completely satisfied with the ongoing state of feed readers. Too many of them take the “mail client” paradigm, in which the user interface is modeled after email readers, and make the implicit assumption that you want to read every single item. This becomes a cognitive dilemma once one’s list of subscriptions becomes too big (25, or 50, or 200, or more feeds). For evidence, google “information overload” or “digital guilt”.

Obviously, we need something structured more like a newspaper (I’ve heard this called a “river of news” before). I see two important lessons to learn from newspapers:

  • Reading patterns: Very rarely will someone read the newspaper from front page to back page, in its entirety. People browse, thumb, skim.
  • Formatting to direct attention: Newspaper formatting is designed with this skimming interaction in mind. Important stuff is placed in more attention-grabbing type, using format (big, big font) and location (important stuff in the frontmost and backmost pages).

But aggregators are not just newspapers. Because they’re computer-driven, they have the chance to be smarter, more personalized. It’s funny, people have been talking about “smart” aggregators for a while. A short while ago I googled “bayesian feedreader” and found plenty of insightful stuff written as far back as 2003. However… while there’s been plenty of punditry, nothing “smart” has made it to the mainstream yet.

I don’t know why.

But I have my theories. Maybe it’s because simple aggregation is “good enough”. This is definitely part of it. But I suspect a lot of it is also because AI techniques like Bayesian classifiers are a much better fit for filtering spam than they are for filtering “interestingness”. “Interestingness” is such a broad target to hit that I’m not sure classifiers built on just naive keywords are going to cut it.

Well, I’m going to try building my own. I’ll probably model the UI to look something like kinja’s, and make the backend very plugin-able so that I can hotswitch different methods of calculating “interestingness”. More details to follow, but I’m thinking of experimenting with some machine learning based on explicit feedback + implicit browsing patterns, plus popularity metrics based on general population and on a more specific network of trust. Between digg, technorati, delicious, other social bookmarking sites, and all of these aggregated feeds, there are certainly a lot of tools available for calculating an “interestingness metric”.

I might want to have a separate “boringness” metric too. We’ll let a little hacking show which yields the best results.
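
As a rough idea of what a “plugin-able” backend could look like, here is a minimal sketch in which each way of calculating interestingness is a plain function that can be registered or removed at runtime. Everything in it (`FeedItem`, `Ranker`, the two toy scorers, the `bookmark_count` field) is a hypothetical name invented for this illustration, not a design that has actually been built.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class FeedItem:
    title: str
    text: str
    bookmark_count: int = 0  # hypothetical popularity signal, e.g. from a bookmarking site


# A scorer plugin maps an item to a score, by convention in [0, 1].
Scorer = Callable[[FeedItem], float]


def keyword_scorer(liked_words: Iterable[str]) -> Scorer:
    """Naive keyword plugin: fraction of an item's words the reader has liked before."""
    liked = {w.lower() for w in liked_words}

    def score(item: FeedItem) -> float:
        words = item.text.lower().split()
        if not words:
            return 0.0
        return sum(w in liked for w in words) / len(words)

    return score


def popularity_scorer(cap: int = 100) -> Scorer:
    """Popularity plugin: bookmark count, clipped at cap and scaled to [0, 1]."""

    def score(item: FeedItem) -> float:
        return min(item.bookmark_count, cap) / cap

    return score


class Ranker:
    """Combines registered plugins as a weighted average of their scores."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Tuple[Scorer, float]] = {}

    def register(self, name: str, scorer: Scorer, weight: float = 1.0) -> None:
        self._plugins[name] = (scorer, weight)

    def unregister(self, name: str) -> None:
        self._plugins.pop(name, None)

    def score(self, item: FeedItem) -> float:
        # Normalize by absolute weights so a negative weight can act as a penalty.
        total = sum(abs(weight) for _, weight in self._plugins.values())
        if total == 0:
            return 0.0
        return sum(s(item) * weight for s, weight in self._plugins.values()) / total

    def rank(self, items: Iterable[FeedItem]) -> List[FeedItem]:
        return sorted(items, key=self.score, reverse=True)
```

With this shape, a “boringness” metric would just be another plugin registered with a negative weight, and trying out a different method means swapping one `register` call rather than touching the ranking code.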

Some links:

On Rexa

Rexa, a new player in community bibliography management, was opened to the public a couple weeks ago.

Here’s a blog post from the PI on this project (Andrew McCallum) detailing the announcement, and a little more here, from Matthew Hurst’s Data Mining blog.

A cursory use of the system shows it to be a sort of “new generation citeseer”, with a little smarter NLP and data mining, and a halfhearted attempt at facet-driven organization. They mention folksonomy in explaining their tags, but from what I can tell, implementation seems to be more like straight-up facet-based personal information management, rather than actual tag-sharing and folksonomy. But, it’s a start. And, the release is accompanied by promises to make it smarter (especially on the data mining side).

All I can say is, you can tell it was made by NLP guys and data miners and not social software guys. Interface-wise, it’s not too friendly (eh? I need to create an account before I can even begin browsing through it?? Before I’m even presented with a link to the “about” page??). And the interface looks like it was designed by a C++ monkey rather than an HTML monkey.

And I won’t even comment on the poor coverage of publications (Andrew promises to improve this). Err, actually looks like I did just comment.

These things being said, they have some GREAT approaches: smart data mining, as well as automatic extraction of author and grant profiles along with the usual paper aggregation (and with promise of forthcoming extraction/aggregation of conferences and research communities!)… it looks like they realize that research (like soylent green) is made of PEOPLE and not just papers.

The thing that really excites me is the suggested examples of tags that they use as seeds for the future folksonomy:

“hot”, “seminal”, “classic”, “controversial”, “enjoyable”

This is exciting because, if this tagging becomes more widespread and mainstream, we’ll FINALLY have a better metric of the value of a publication in academia. Think about it: right now, there are only two kinds of people that can tell the rest of the academic world that a paper is “valuable”: (1) the people on the acceptance/review committee for a conference or journal, and (2) the people who choose to cite a paper in the bibliography of their own publications. And neither of these is too good–the first group is very exclusive and small in number (and at best biased, and at worst unknowledgeable in the research niche of a paper’s focus), and the second group requires a high investment of effort to communicate value (you need to publish a paper just to put in a vote–and who ever reads bibliographies closely anyway, unless they’re already looking for something specific?).

The upshot is that, of so many people who read an article, only a very small few get to formally, aggregatably comment on its worth. That’s a lot of untapped, already-invested effort. I would love to see some sort of paper ranking system become more mainstream!

On The Success of LaTeX

I suspect that the success of LaTeX–and its ubiquity as a format for thesis-writing–is in part due to the fact that learning its arcane subtleties is a wonderful source of procrastination.

What a glorious escape from having to do actual paper-writing!

Tagging, Searching, Linking

categorization vs ranked search. the old google-vs-yahoo! war of 1998-1999.
it struck me the other day that these two paradigms aren’t as orthogonal as we make them out to be in our minds.

  • full-text search is really just categorization where the categorical tags are the words in the text.
  • it’s a very rough heuristic, but works much of the time.  you might miss big-picture categorization, but as far as content-driven tagging, odds are that the categories you care about are mentioned in the article.
  • the big problem is scale.  250 tags per item, and it gets too noisy to browse easily.  so you need to prune your tags.  This is an NLP problem.  What words do we emphasize, what words do we de-emphasize?  Stemming and removing stop words are the standard ways of de-emphasizing.  Emphasizing?  That’s quite nontrivial.  I don’t know of anyone in our field that’s tried it yet.
  • Things get even more interesting (and even more nontrivial) when we can create terms ex nihilo, creating categorizations whose names aren’t found anywhere in the articles.
  • Clustering of tags will prove hugely useful.  But can you generate human-readable names for your clusters (and hierarchical clusters, if that’s your style)?
  • Hyperlinking is a form of tagging too.

Machine-generated categorizations are a huge unexploited area in folksonomy and tag-based IA.  I don’t know why no-one’s done anything with them yet.
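
Here is a small sketch of the “words as tags” idea from the list above: tokenize each article, drop stop words, keep only the few highest-weighted words as its tags, and invert the result into a tag-to-articles index. Using TF-IDF as the emphasis weight is purely an assumption made for this example (the post doesn’t name a weighting scheme), and the stop-word list, sample documents, and function names are likewise invented for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}

# Hypothetical sample articles.
DOCS = {
    "d1": "Folksonomy is the tagging of links. Tagging links is social.",
    "d2": "Search engines rank links. Search is ranked retrieval.",
    "d3": "The cat sat on the mat.",
}


def tokenize(text):
    """Lowercase, split into alphabetic words, and drop stop words (de-emphasis)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def auto_tags(docs, top_n=3):
    """Keep the top_n words per document, ranked by TF-IDF (emphasis, by assumption)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))  # count each word once per document
    n_docs = len(docs)
    tags = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        weights = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        tags[doc_id] = ranked[:top_n]
    return tags


def invert(tags):
    """Build the browsable view: tag -> set of documents carrying it."""
    index = {}
    for doc_id, words in tags.items():
        for w in words:
            index.setdefault(w, set()).add(doc_id)
    return index


tags = auto_tags(DOCS, top_n=2)
index = invert(tags)
print(tags)
print(index)
```

Note what this does not do: every tag it produces is a word that literally appears in the text, so it cannot create category names ex nihilo or name a cluster, which are the harder problems raised above.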

Ning

Phew, Ning is meme-du-jour. It’s basically a web toolkit to create social software. It’s the first product I’ve seen to come out of Marc Andreessen’s stealth startup 24 Hour Laundry. (reference)

My hunch might be wrong, but it seems to be web2.0 applied to raw application development. What I mean is this: the typical read-write-web facilitates user-contributed data, and the social sharing of user-contributed data. Ning looks like it facilitates user-contributed code, and the social sharing of user-contributed code.

And, by providing a good development platform, it encourages mash-ups between applications, data-sharing, etc. I am curious if this enablement is just inward-focused or also outward-focused. That is, is it just as easy to API into a Ning app from another webapp outside Ning as it is for one Ning app to talk to another?

I’ll be able to tell more after I get my beta developer account, which according to Gordon should be “any hour now” =p.

But, wow, this looks like it could be the sandbox to end all sandboxen.

Update: ahh, here is the business model (i.e. where the money’s gonna come from):

the third party ad networks such as Google AdSense don’t look warmly upon more than one person running ads on an App or a page. Hence the trade for running apps on Ning is that we offer free app creation, management, hosting, security, and shared services, and - in return - you open your code to inspire other developers and refrain from running third party ads. We totally understand if this is not for everybody.

CALICO 2006 Call for Papers

                        CALL FOR PARTICIPATION

                      CALICO 2006 ANNUAL SYMPOSIUM
                  Online Learning: Come Ride the Wave

                               Hosted by

                      University of Hawaii at Manoa
                            Honolulu, Hawaii
                            May 16-20, 2006

Preconference Workshops: Tuesday, May 16 - Wednesday, May 17
Courseware Showcase: Thursday, May 18
Presentation Sessions: Thursday, May 18 - Saturday, May 20

Use CALICO's on-line proposal submission form at

          http://calico1.modlang.txstate.edu

or click on CALICO 2006 on the homepage: http://calico.org

You will need to register on the site ("Proposer registration")
before being able to submit.

DEADLINE FOR PROPOSALS: OCTOBER 31, 2005

All presenters must be current members of CALICO by the time of the
conference and are responsible for their own expenses, including
registration fees.

The Computer Assisted Language Instruction Consortium (CALICO) is a
professional organization dedicated to the use of technology in
foreign/second language learning and teaching. CALICO's symposia bring
together educators, administrators, materials developers, researchers,
government representatives, vendors of hardware and software, and others
interested in the field of computer-assisted language learning.

For more information or if you have questions or problems, contact

Mrs. Esther Horn
CALICO Coordinator              512/245-1417 (phone)
214 Centennial Hall             512/245-9089 (fax)
601 University Drive            http://calico.org
San Marcos, TX 78666            e-mail: info@calico.org or ec06@txstate.edu