I’ve long wanted a way to track/store (and later search?) my browser history.
Why do modern browsers throw any of this away? I estimate I consume far less than 50 MB of HTML per day (Flash videos and large files excluded).
I want to be able to search over this, and gather stats about my browsing habits.
Until now I’d thought of doing this with a local proxy running on my machine. This morning I realized that a much simpler Greasemonkey script should be able to do this.
My hypothetical script will inject into every web page a 1×1 image served from some directory on a server I own. In the image’s URL, I’ll encode the page’s URL and any parameters passed in, plus the current time or a random number to keep the browser from caching the image.
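Here is a rough sketch of what that image URL could look like. The real userscript would build it in JavaScript inside the browser; this Python version only illustrates the encoding. The server address `BEACON_BASE` and the parameter names `u`, `t` and `r` are placeholders I made up for illustration, not part of any existing tool.

```python
import secrets
import time
from urllib.parse import urlencode

# Hypothetical location of the 1x1 image; substitute a directory on your own server.
BEACON_BASE = "https://example.com/track/pixel.gif"


def build_beacon_url(page_url, now=None, nonce=None):
    """Return the image URL that records one visit to page_url.

    Parameter names are assumptions for this sketch:
      u = the full page URL, including its own query string
      t = the visit time as a Unix timestamp
      r = a random value so the browser never reuses a cached image
    """
    params = {
        "u": page_url,
        "t": int(time.time() if now is None else now),
        "r": secrets.token_hex(4) if nonce is None else nonce,
    }
    # urlencode percent-escapes the page URL so its "?" and "&" cannot be
    # confused with the separators of the image URL itself.
    return BEACON_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_beacon_url("https://example.org/search?q=browser history&page=2"))
```

Escaping the page URL is the one detail that matters here: without it, the page’s own parameters would be mixed in with the tracking parameters.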
Then, tracking my browser history is just grep’ing server logs for everything served from that directory, and decoding the metadata.
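The decoding step might look something like the sketch below. It assumes the server writes access logs with the request line in quotes (as in the common Apache and nginx formats), that the image lives at the made-up path `/track/pixel.gif`, and that the page URL and timestamp were sent as hypothetical query parameters `u` and `t`. Adjust all three to match the real setup.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical path of the tracking image on the server.
BEACON_PATH = "/track/pixel.gif"

# Matches the quoted request line, e.g. "GET /track/pixel.gif?u=... HTTP/1.1"
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[0-9.]+"')


def visits_from_log(lines):
    """Yield (unix_time, page_url) for each tracking-image request found."""
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        target = urlsplit(match.group(1))
        if target.path != BEACON_PATH:
            continue  # some other file served by the same host
        params = parse_qs(target.query)  # also undoes the percent-escaping
        if "u" not in params:
            continue
        raw_time = params.get("t", [""])[0]
        # Time is None when the parameter is missing or not a plain integer.
        unix_time = int(raw_time) if raw_time.isdigit() else None
        yield unix_time, params["u"][0]


if __name__ == "__main__":
    import sys
    for when, page in visits_from_log(sys.stdin):
        print(when, page)
```

Run as a script, it reads a log on standard input and prints one visit per line, which is enough to pipe into `sort`, `uniq -c` or `grep` for quick stats.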
I like this because it’s centralized (it aggregates browsing across many machines in one place) and super simple to install and manage.
This doesn’t save content, unfortunately, though a separate script running on my server could do that by parsing the logs and fetching each page’s contents. That wouldn’t work for dynamic pages or Ajax, but every click in Gmail or Calendar isn’t as important as other pages.
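A minimal sketch of that server-side fetcher, assuming the list of visited URLs has already been pulled out of the logs. The function names and the choice to name each saved file after a SHA-256 hash of its URL are my own illustration. Note that the server fetches each page fresh and without my cookies, so it stores what the page looks like to an anonymous visitor at fetch time, not necessarily what I saw.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def archive(urls, out_dir, fetcher=fetch):
    """Save each distinct URL's body under out_dir and return {url: path}.

    Pages already on disk are not fetched again; pages that fail to
    download are skipped and left out of the result.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = {}
    for url in urls:
        if url in saved:
            continue  # same page visited more than once
        # Hash the URL to get a filename that is safe on any filesystem.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
        path = out / name
        if not path.exists():
            try:
                path.write_bytes(fetcher(url))
            except (OSError, ValueError):
                continue  # network error, HTTP error, or malformed URL
        saved[url] = path
    return saved
```

Passing the fetcher in as an argument keeps the download step swappable, for example to add a delay between requests or to try the script out without touching the network.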