computer science – sardonick
http://motespace.com/blog
Disclaimer: The following web space does not contain my own opinions, merely linguistic representations thereof.

Tracking Browsing History
http://motespace.com/blog/2011/06/26/tracking-browsing-history/ (Mon, 27 Jun 2011 02:59:38 +0000)

I’ve long wanted a way to track/store (and later search?) my browser history.
Why do modern browsers throw any of this away? I estimate I consume far less than 50 MB of HTML per day (flash videos and large files excluded), I want to be able to search over all of it, and I’d like to gather stats about my browsing habits.

Until now I’d thought of doing this with a local proxy running on my machine; this morning I realized that a much simpler greasemonkey script should be able to do it. My hypothetical script will inject into every web page a 1x1 image URL served from some directory on a server I own. In the image’s URL, I’ll encode the page’s URL and any parameters passed in (and also the current time or some random number, to keep the browser from caching the image). Then tracking my browser history is just grep’ing the server logs for everything served from that directory and decoding the metadata.

I like this because it’s centralized (it aggregates browsing across many machines into one central place) and super simple to install/manage. It doesn’t save content, unfortunately (though a separate script running on my server could do that, parsing the logs and fetching page contents; that wouldn’t work for dynamic pages or ajax, but every click in gmail or calendar isn’t as important as other pages).
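
For the decoding side, here is a minimal sketch of what I have in mind (not working code from my server), assuming an Apache-style access log and a hypothetical tracking directory /px/; the greasemonkey script would append something like ?u=<encoded page URL>&t=<timestamp> to the image source, and this just recovers the page URLs:

    import re
    from urllib.parse import parse_qs, urlparse

    TRACK_DIR = "/px/"   # hypothetical directory the 1x1 images are served from
    # Apache-style log line: we only need the timestamp and the request path.
    LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "GET (?P<path>\S+) HTTP')

    def browsing_history(logfile):
        """Yield (timestamp, page_url) pairs recovered from tracking-pixel hits."""
        with open(logfile) as f:
            for line in f:
                m = LINE_RE.search(line)
                if not m or not m.group("path").startswith(TRACK_DIR):
                    continue
                qs = parse_qs(urlparse(m.group("path")).query)
                if "u" in qs:                 # "u" = the URL-encoded page address
                    yield m.group("ts"), qs["u"][0]

    for ts, url in browsing_history("access.log"):
        print(ts, url)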

Ranking Algorithms for My Feedreader
http://motespace.com/blog/2011/03/20/ranking-algorithms-for-my-feedreader/ (Mon, 21 Mar 2011 05:46:59 +0000)

I have been using a home-brew feedreader for the last 6 years or so. It’s a river-of-news style aggregator that ranks posts in order of “interestingness” rather than date, with the most interesting entries that I haven’t seen yet at the top.

Interestingness is derived from my click interactions: if my feedreader shows me an article and I click on it, I’m implicitly voting that the article is interesting. If it shows me something and I don’t click on it, I’m saying it’s not interesting. All of this click data is used to train a naive Bayes classifier, which classifies each new entry as it comes in.
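
As a rough illustration of the idea (a sketch, not the actual code behind my reader; the word counting and add-one smoothing are simplifications):

    import math
    from collections import Counter

    class InterestModel:
        """Tiny naive Bayes scorer: clicked entries are positives, shown-but-ignored are negatives."""
        def __init__(self):
            self.words = {True: Counter(), False: Counter()}
            self.docs = {True: 0, False: 0}

        def train(self, text, clicked):
            self.docs[clicked] += 1
            self.words[clicked].update(text.lower().split())

        def interestingness(self, text):
            """Log-odds that an entry will be clicked, given its words."""
            score = math.log((self.docs[True] + 1) / (self.docs[False] + 1))
            for w in text.lower().split():
                p_click = (self.words[True][w] + 1) / (sum(self.words[True].values()) + 1)
                p_skip = (self.words[False][w] + 1) / (sum(self.words[False].values()) + 1)
                score += math.log(p_click / p_skip)   # add-one smoothing keeps this finite
            return score

    model = InterestModel()
    model.train("new emacs release with org-mode goodies", clicked=True)
    model.train("celebrity gossip roundup", clicked=False)
    print(model.interestingness("emacs org-mode tips"))

New entries then get sorted by this score, highest first.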

There are some great advantages to sorting things by interest: there’s no sense of digital guilt when I don’t visit my reader for a few days (if something interesting happens while I’m on a week-long vacation, for instance, it’ll still stay at the top of my queue). Also, I can subscribe to a decently large number of feeds (300 or so, at last count) without feeling information overload.

This ranking system has been very good from a precision point of view (the items that are recommended to me are usually stuff I’m interested in). However, I’ve been feeling lately that recall or coverage is lacking (my top-ranked items are too similar, too echo-chambery, too drawn from the same sources).

(One difficulty I’m having is that precision is very easy to measure, while recall or coverage is much harder to quantify with just the click/attention data I have available.)

So… I’m now beginning to consider changing up my ranking algorithm. Maybe I can capture different views:

  • show me stuff from my friends only (or some pre-specified list of must-reads)
  • show me stuff from blogs I haven’t looked at in a while
  • show me stuff from blogs that publish very infrequently
  • show me stuff that each blog considers abnormally interesting (does it have an abnormally high number of comments/likes/upboats/etc. compared to the typical post? see the sketch after this list)
  • show me stuff restricted by content type (video, img, comic, news, blog)
  • or to generalize, show me stuff that will take a certain estimated time to consume (an image is quick, as is a tweet; a short blog entry is longer; a multi-page Economist article is longer still)
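
Here is a minimal sketch of the “abnormally interesting for this blog” idea from the list above, assuming I can fetch a comment (or like) count per entry; it scores each entry by how many standard deviations its count sits above that feed’s own average:

    from statistics import mean, pstdev

    def abnormality(entries):
        """entries: list of (feed, engagement_count). Returns {(feed, index): z-score}."""
        by_feed = {}
        for feed, count in entries:
            by_feed.setdefault(feed, []).append(count)
        scores = {}
        for feed, counts in by_feed.items():
            mu, sigma = mean(counts), pstdev(counts) or 1.0   # avoid dividing by zero
            for i, c in enumerate(counts):
                scores[(feed, i)] = (c - mu) / sigma
        return scores

    # the third xkcd entry stands out relative to its own feed, not in absolute terms
    print(abnormality([("xkcd", 40), ("xkcd", 35), ("xkcd", 400), ("nyt", 5), ("nyt", 7)]))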

Visualizing Command Line History
http://motespace.com/blog/2011/03/13/visualizing-command-line-history/ (Sun, 13 Mar 2011 07:12:34 +0000)

So, after documenting how I save a timestamped log of my bash history, I got curious about what kind of analyses I could pull out of it.

(caveat: I only started this logging about a month ago, so there aren’t as many data points as I’d like. However, there is enough to see some interesting trends emerging).

Day of Week

First, here is the spread of activity over day-of-week for my machine at home. I found this surprising! I’d expected my weekend hacking projects to show a significant weekend effect, but I hadn’t anticipated the Thursday slump. It’s interesting when data shows us stuff about ourselves that we didn’t realize. I have no idea what causes the Tuesday mini-spike.

Next, I have activity per hour-of-day, broken up by weekends-only and weekdays-only (because my behavior differs significantly between these two sets).
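
For reference, here is roughly how the bucketing works, assuming the timestamped shell log described in the companion “Saving Command Line History” post (lines like 20110306.1819.03 /home/mote/dev/load 515 ls); a sketch, not the exact script behind the charts:

    from collections import Counter
    from datetime import datetime

    weekday_hours, weekend_hours = Counter(), Counter()

    with open("bash_history.log") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            try:
                t = datetime.strptime(parts[0], "%Y%m%d.%H%M.%S")
            except ValueError:
                continue                      # skip malformed lines
            bucket = weekend_hours if t.weekday() >= 5 else weekday_hours
            bucket[t.hour] += 1

    for h in range(24):
        print("%02d:00  weekday %4d  weekend %4d" % (h, weekday_hours[h], weekend_hours[h]))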

Weekends

Both charts clearly show my average sleeping times. Weekends show a bump of morning hacking and evening hacking, with less computer time than I’d have expected in the middle of the day.

Weekdays

I love the evening just-got-home-from-work-and-finished-with-dinner spike for the weekdays, followed by evidence of late-night hacking (probably too late for my own good).

Where to go from here

I wonder if the unexpected Tuesday spike and the 6pm weekday spike are legitimate phenomena or artifacts of data sparsity. It will be interesting to check back in with this data in a few more months to see how it smooths out. (Ugh, daylight saving time is going to mess with this a bit =/ ).

Also, this only measures one aspect of my activity in a day–stuff typed at the command line, which is mostly programming-related. I would love to plot other information alongside it (emails sent, lines of code written, instant messages sent, songs played, GPS-based movement). I’m tracking much of this already. I’ll need a good way of visualizing all of these signals together, as the graph is going to get a bit crowded. Maybe I’ll pick up that Tufte book again…

(And, speaking of visualization, I think a heatmap of activity per hour of the week would be interesting as well… Google Spreadsheets doesn’t do those, though, so while I have the data I couldn’t whip one up easily tonight).

Lastly, what’s the purpose of this all? What do I want to accomplish from this analysis? They’re nice-looking graphs, for sure. And honestly there is a bit of narcissistic pleasure in self-discovery. And I suppose it’s good to realize things like the mid-week slump (exhaustion from work? external calendar factors?) are happening.

But I’m eventually hoping for something less passive than just observation. I look forward to using this data to change myself: setting goals (in bed by a certain hour, up by a certain hour, no coding on day x vs. more coding on day y) and letting the statistics show my progress towards those goals.

Saving Command Line History
http://motespace.com/blog/2011/03/12/saving-command-line-history/ (Sun, 13 Mar 2011 00:19:04 +0000)

I’ve never been satisfied with the defaults for the way Linux & OS X save command line history. For all practical purposes, when we’re talking about text files, we have infinite hard drive space. Why not save every command we ever type?

First, A Roundup of What’s Out There

Here’s the baseline of what I started with, in bash:

declare -x HISTFILESIZE=1000000000
declare -x HISTSIZE=1000000

But there are a few problems with this: bash and zsh sometimes corrupt their history files, and multiple terminals sometimes don’t interact properly. A few pages have suggested hacks to PROMPT_COMMAND to get terminals to play well together:

briancarper.net

  • relatedly, shopt -s histappend (for bashrc)
  • export PROMPT_COMMAND="history -n; history -a" (upon every bash prompt, write out to history and read in the latest history). While this works, it feels a bit hacky

tonyscelfo.com has a more formalized version of the above.

Further down the rabbit-hole, this guy has a quite complicated script to output each session’s history to a uniquely ID’d .bash_history file. Good, but it only exports upon exit from a session (which I rarely do… for me, sessions either crash (which doesn’t trigger the write) or I don’t close them… still, it’s an interesting idea).

(Aside: shell-shink was an interesting solution to this issue, though it had its own set of problems — privacy implications… in case I type passwords in the command-prompt, I would really rather not have this stuff live on the web. Also, it’s now obsolete and taken down, so it’s not even an alternative now). Links, for posterity:
[1] [2] [3]

Now, what I finally decided to use

Talking to some folks at work, I found this wonderful hack: modify $PROMPT_COMMAND to output to a history file manually… but also output a little context — the timestamp and current path, along with the command. Beautiful!

export PROMPT_COMMAND='if [ "$(id -u)" -ne 0 ]; then echo "`date` `pwd` `history 1`" >> ~/.shell.log; fi'

ZSH doesn’t have $PROMPT_COMMAND, but it does have an equivalent: the precmd() hook function.

For posterity, here’s what I ended up with:

  • zsh:

    function precmd() {
      if [ "$(id -u)" -ne 0 ]; then
        FULL_CMD_LOG=/export/hda3/home/mote/logs/zsh_history.log;
        echo "`/bin/date +%Y%m%d.%H%M.%S` `pwd` `history -1`" >> ${FULL_CMD_LOG};
      fi
    }

  • bash:


    case "$TERM" in
      xterm*|rxvt*)
        DISP='echo -ne "\033]0;${USER}@${HOSTNAME}: ${PWD/$HOME/~}\007"'
        BASHLOG='/home/mote/logs/bash_history.log'
        SAVEBASH='if [ "$(id -u)" -ne 0 ]; then echo "`/home/mote/bin/ndate` `pwd` `history 1`" >> ${BASHLOG}; fi'
        PROMPT_COMMAND="${DISP};${SAVEBASH}"
        ;;
      *)
        ;;
    esac

This gets ya a wonderful logfile, full of context, with no risk of corruption:

20110306.1819.03 /home/mote/dev/load 515 ls
20110306.1819.09 /home/mote/dev/load 516 gvim run_all.sh
20110306.1819.32 /home/mote/dev/load 517 svn st
20110306.1819.35 /home/mote/dev/load 518 svn add log_screensaver.py
20110306.1819.49 /home/mote/dev/load 519 svn ci -m "script to log if screensaver is running"

(As an aside, you’ll notice that these commands are all timestamped. Imagine the wealth of personal infometrics data that we can mine from here! When am I most productive (as measured by command-density-per-time-of-day?). What really are my working hours? When do I wake? Sleep? Lunch? )

Next up, need to make a `history`-like command to tail more copy-pastable stuff out of this file.
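
Something along these lines would probably do, assuming the log format shown above (date, path, history number, then the command); the filename and default count are placeholders:

    #!/usr/bin/env python
    """Print the last N logged commands in copy-pastable form."""
    import sys

    def last_commands(path, n=20):
        with open(path) as f:
            lines = f.readlines()[-n:]
        cmds = []
        for line in lines:
            parts = line.split(None, 3)       # date, cwd, history number, command
            if len(parts) == 4:
                cmds.append(parts[3].rstrip("\n"))
        return cmds

    if __name__ == "__main__":
        n = int(sys.argv[1]) if len(sys.argv) > 1 else 20
        for cmd in last_commands("/home/mote/logs/bash_history.log", n):
            print(cmd)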

Authority, Influence in Social Networks [tentative thoughts]
http://motespace.com/blog/2011/01/29/authority-influence-in-social-networks-tentative-thoughts/ (Sun, 30 Jan 2011 04:47:30 +0000)

I spent the day fiddling around with twitter and buzz, to see what signals I have at my disposal.

Eventually I’d like to get some metrics that quantify a few different aspects of human relationships:

  • Global influence (how much influence does this user have upon the world). This is pretty straightforward.
  • Local influence (how much influence does the user have within his more personal social sphere). This is less straightforward and much more interesting. Relatedly, who are the top influencers for an individual or for a clique of people. And can we get an InfluenceRank(a, b) between any two people, or a person and a group, etc.
  • Level of friendship, or closeness (how vague is that, huh?)
  • sub-graphs within a user’s FOAFs & FOAFOAFs that correspond to different social circles/publics/social identities. I’m pretty sure this is a well-studied problem, but it’s interesting to run the numbers for myself.

I’m just getting started, so here’s a working braindump…

I’d like to come up with some more rigorous definitions for these metrics (maybe look in some social psychology journals? read up on social networks?). And there is plenty of other stuff I want to measure, too…

Note: some of these are by definition unidirectional (influence). Are any relationships or relationship-metrics bidirectional? (Is friendship itself?)

Now, the signals that I have access to:

  • num followers
  • num followers in FOAF network
  • num followers in FOAFOAF network
  • num_replies(a, b)
  • num_reshares(a, b) (not in buzz, though…)
  • num_likes(a, b)
  • more?

These signals should also be normalized over how much a person communicates or follows in general. All we have is the observation “a is following b” or “a is talking to b”; we don’t know the internal impedance in a’s mind. Do they follow lots of people, or is the fact that they are following this one person a more significant event?

I should probably also look at reciprocity. min(replies(a, b), replies(b, a)) for 2 users a and b will be very useful. Add on a minimum threshold (say, 3), and there’s a good proxy for friendship.
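
As a toy sketch of that reciprocity proxy (the reply data and the threshold of 3 are made up for illustration):

    from collections import Counter

    replies = Counter()    # (sender, recipient) -> number of replies observed
    for a, b in [("mote", "alice"), ("alice", "mote"), ("mote", "alice"), ("alice", "mote"),
                 ("mote", "bob"), ("alice", "mote"), ("mote", "alice")]:
        replies[(a, b)] += 1

    def friendship(a, b, threshold=3):
        """min of replies in each direction, gated by a minimum threshold."""
        score = min(replies[(a, b)], replies[(b, a)])
        return score if score >= threshold else 0

    print(friendship("mote", "alice"))   # min(3, 3) = 3, meets the threshold -> friends
    print(friendship("mote", "bob"))     # no replies from bob -> 0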

Another problem is that many of these metrics are so sparse! Just because A is friends with B doesn’t mean that A is going to necessarily comment/like/reshare that often.

I should probably also eliminate the “celebrities” of the network (people with friend/follower counts above a certain threshold), or at least treat them differently. These users are closer to proxies for measuring the ideology or worldview of their followers, rather than “friends” in the canonical sense.

The hardest (most interesting?) part of all this will be evaluation. Once I have a metric, how can I quantify how good it is, beyond just eyeballing it? I have no labeled data…

This afternoon, I had some decent success approximating local influence as

num_followers_in_foaf_network - 0.01 * num_followers_globally

(varying that 0.01 constant was a means of penalizing the global popularity of a person… keeping it at 0.01 got me the tech people who influence me personally, 0.05-0.1 got me my non-computery real-life-friends).

This one also worked nicely:

num_followers_in_foaf_network / (1 + log(num_followers_globally))
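
Just to pin down what those two scores look like in code (a sketch; the follower counts are invented, and the twitter/buzz API plumbing is left out):

    import math

    def local_influence_linear(foaf_followers, global_followers, penalty=0.01):
        # penalty ~0.01 surfaced tech people; 0.05-0.1 surfaced real-life friends
        return foaf_followers - penalty * global_followers

    def local_influence_log(foaf_followers, global_followers):
        return foaf_followers / (1 + math.log(max(global_followers, 1)))

    candidates = {"tech_blogger": (40, 120000), "college_friend": (25, 300)}
    for name, (foaf, glob) in candidates.items():
        print(name, local_influence_linear(foaf, glob), round(local_influence_log(foaf, glob), 1))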

p.s. Many thanks to the authors of python-twitter and buzz-python-client, you made my life a lot easier…

On Microblogging and Feedreaders
http://motespace.com/blog/2010/10/31/microblogging-and-feedreaders/ (Mon, 01 Nov 2010 00:17:26 +0000)

I’ve been hearing more and more, “I don’t really use my feedreader any more; I keep up to date with news and interesting links via Twitter, Buzz, and Facebook”.

So, now I’m wondering: can I extract and distill this interesting stuff from microblogs?

  • Brute-force solution: extract all hyperlinks from the twitter feeds of friends and friends-of-friends, and expose them in an RSS feed for syndication (see the sketch after this list)
  • More work: Keep track of the source of information (from a friend, from a friend’s network), and aggregate it (x people posted this same link, x different friends’ networks posted this same link)
    • (aside: I remember reading somewhere that while the immediate-friend relationship provides lots of utility for news, friend-of-friend is mostly noise)
  • More work: Provide some sort of feedback/voting mechanism, to see which sources are actually useful (it’d be easy to hook these up as features to a machine learning classifier)
  • Tangent: Aside from links, retweets could be an interesting thing to explore in the same way; less sanguine about this though.
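
A minimal sketch of the brute-force version, assuming I already have each friend’s recent tweet text in hand (the fetching via python-twitter is left out); it pulls links out with a regex and counts how many distinct friends shared each one:

    import re
    from collections import defaultdict

    URL_RE = re.compile(r"https?://\S+")

    def shared_links(tweets):
        """tweets: iterable of (friend, text). Returns {url: set of friends who posted it}."""
        sources = defaultdict(set)
        for friend, text in tweets:
            for url in URL_RE.findall(text):
                sources[url.rstrip(".,)")].add(friend)
        return sources

    tweets = [("alice", "great read http://example.com/post"),
              ("bob", "this: http://example.com/post"),
              ("carol", "lunch was good")]
    for url, who in sorted(shared_links(tweets).items(), key=lambda kv: -len(kv[1])):
        print(len(who), url, sorted(who))

Exposing the result as an RSS feed is then just templating the sorted list into feed entries.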

I can has consciousness?
http://motespace.com/blog/2007/11/28/i-can-has-consciousness/ (Thu, 29 Nov 2007 06:50:11 +0000)

Conversations at work recently have turned again and again to consciousness and self-awareness (what, you thought “Android” was just a phone? ;) ). Now, I’m not going to belabor the point with discussions of artificial intelligence and yet another amateur’s resummarization of Searle’s Chinese Room [1]. Instead, I’ve been thinking about self-awareness in groups of humans.

A bullet-point braindump:

  • As background, remember that short story in Gödel, Escher, Bach, where the anteater communicated with the colony of ants (not the ants themselves, but the colony), and ate certain individual ants as a way to shape the colony into something more intelligently connected?
  • It’s a clichéd remark that groups of humans begin to resemble organisms in their own right. Corporations seek after the good of the corporation rather than the good of any of their individual members. Cultures grow, intermingle, and reproduce, spawning new cultures. OK, so these macro-groups of humans are animals, that’s for sure. But are they self-aware? Conscious? Would we recognize it if they were?
  • It’s interesting when a group of people who’ve been meeting for a while realize that they are in fact behaving as a group, and in turn have a group identity. Is this awareness of group identity the same as self-awareness in the group? (answer: I don’t think so, this is something different).
  • To extend the brain metaphor, imagine humans to be the neurons in a larger collective brain. Urgh, the speed of signal transmission across the axon-dendrite gap is horribly slow. What effect does this slowness have? Also, humans are damn intelligent signal processors compared to neurons. What effect would our individual intelligences have on the larger structure?
  • Would such a self-aware “organism” think thoughts that are entirely separate and entirely transcendent above the thoughts of its constituents?
  • Scale? The general belief seems to be that intelligence is the emergent result of massive numbers of highly, highly interconnected neurons. How many people do you need in a group before it can be considered an organism? A self-aware organism? Is the interconnectedness of humans even on a large enough order of magnitude to support a functionally processing organism? What are such an organism’s inputs and outputs? Would human sub-organizations specialize into computational functional tools, similar to how neurons in the brain are specialized into groups like the PFC, the amygdala, etc.?
  • I imagine an extraterrestrial coming to the earth, and conversing with society as opposed to individuals. That would be an interesting story. But not the kind of sci-fi that would entertain a puny human mind, though, that’s for sure.

Hmm, I’ll have to think more about this… so many premature thoughts… And most of them the result of only 4 hours of sleep for the last couple days. My apologies, dear anonymous reader, for the unpolished words, the undeveloped concepts, the flaws. “Time past and time future / Allow but a little consciousness.”

[1] (In any case, I love Ben Goertzel‘s take on the situation, which, to paraphrase: “When the time comes, and you’re actually arguing with the computer whether it is self-aware or not, then the point is already moot, isn’t it?”)

What we can learn from Folksonomy
http://motespace.com/blog/2006/06/24/what-we-can-learn-from-folksonomy-and-delicious/ (Sun, 25 Jun 2006 01:07:06 +0000)

Outward-facing Questions:

  • The great thing about delicious and folksonomy is that it creates an ontology as an emergent byproduct of individual self-serving efforts (that is, personal bookmarking). I’m wondering if we can take a similar tack to solve other AI problems.

Inward-facing Questions:

  • What is the best way to represent the evolution of a tag’s meaning (evolution on both the individual and the group scale)? Folksonomy is a lot more dynamic than a fixed ontology, so we might not be able to use the same old tools.
  • Folksonomy is the relationship between three types of information: tags, tagged objects, and the users who tag them. What information can we derive about each that is not explicit in the structure? You can call this “tag grouping”, “neighbor search”, “related items”… but it’s really all just clustering. What are the differences when you cluster each? (See the sketch after this list.)
  • Continuing from the last question: it’s most intuitive to hierarchically cluster tags: this maps well onto the formal “ontology” model that information architects and NLP researchers are comfortable dealing with. But what happens when we hierarchically cluster users and tagged items? What does hierarchy imply about the relationships between parents, children, and siblings in the resulting structure?
  • What are the differences in (tags, users, items) between digg, delicious, flickr, and citeulike?
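
To make the “it’s really all just clustering” point concrete, here is a tiny sketch of one of the three views (clustering tags by the items they are applied to); the triples are invented, and real delicious data would be far larger:

    from collections import defaultdict

    # (user, item, tag) triples -- the three-way folksonomy relation
    triples = [("u1", "python.org", "programming"), ("u2", "python.org", "coding"),
               ("u1", "nyt.com", "news"), ("u3", "python.org", "programming"),
               ("u2", "bbc.co.uk", "news"), ("u3", "bbc.co.uk", "world")]

    items_per_tag = defaultdict(set)
    for user, item, tag in triples:
        items_per_tag[tag].add(item)

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    # pairwise tag similarity: the raw material for clustering tags (the same trick
    # works for users or items by swapping which column you group on)
    tags = sorted(items_per_tag)
    for i, t1 in enumerate(tags):
        for t2 in tags[i + 1:]:
            print(t1, t2, round(jaccard(items_per_tag[t1], items_per_tag[t2]), 2))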

Ah, to have time to pursue these….

Finished the Dissertation Proposal
http://motespace.com/blog/2006/06/19/finished-the-dissertation-proposal/ (Mon, 19 Jun 2006 07:37:08 +0000)

Ahhh, I’m done. Now, don’t that feel good. 71 pages on building a computational model of language learner errors. Phew, now to sleep.

Consuming
http://motespace.com/blog/2006/06/12/consuming/ (Mon, 12 Jun 2006 17:20:53 +0000)

Talking to a friend last week, an interesting idea came up: we don’t just consume information; information also consumes us.

My attention is a scarce resource, and different ideas, media, and schools of thought compete for it. (This is what makes multidisciplinarity hard.)

It makes me think twice about metaphors for learning that compare research and knowledge acquisition to foraging for food. What if, instead of likening ourselves to the predators and farmers, we liken ourselves to the prey and the farmed?

There’s plenty of discussion of memes as pseudo-genetic entities (evolving, reproducing, self-transmitting)… but underlying this is the idea that we are the medium of transmission, we are the host to the virus.

It certainly puts a new spin on the way I look at sites like All Consuming.

I don’t like this metaphor of being consumed, it feels too passive and fatalistic to me. But maybe it’s true.
