
Parsing Sigalert

Been wanting to do this for a while:

sigalert.com
has wonderful data on freeway traffic patterns that is just waiting to be mined. It’d be great to harvest it and analyze traffic trends.

TODO:

  • email Ken (urban planning/transportation PhD at UCI) to see if they have any data I can steal
  • email Sigalert to see if they can give me access to their raw data (probably not… or should I just covertly spider it?)

Pages to gather from:

  • http://www.sigalert.com/speeds.asp?Region=Greater+Los+Angeles&Road=405%20South
  • http://www.sigalert.com/speeds.asp?Region=Greater+Los+Angeles&Road=405%20North
  • http://www.sigalert.com/speeds.asp?Region=Greater+Los+Angeles&Road=10%20East
  • http://www.sigalert.com/speeds.asp?Region=Greater+Los+Angeles&Road=10%20West
  • http://www.sigalert.com/speeds.asp?Region=Greater+Los+Angeles&Road=5%20North
  • http://www.sigalert.com/speeds.asp?Region=Greater+Los+Angeles&Road=5%20South
  • (and perhaps others, though I don’t want to overload their servers or alert them to my presence)
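The URLs above all follow one pattern, so the page list (and the "cron and wget" part of the plan below) can be generated instead of typed by hand. A minimal sketch, assuming the `speeds.asp` query format shown above stays stable; `road_url`, `snapshot_filename` and `fetch` are hypothetical helper names, and the timestamped file-naming scheme is just one possible choice.

```python
import os
import urllib.request
from datetime import datetime
from urllib.parse import quote

BASE_URL = "http://www.sigalert.com/speeds.asp"
REGION = "Greater+Los+Angeles"  # kept form-encoded, exactly as in the URLs above
ROADS = ["405 South", "405 North", "10 East", "10 West", "5 North", "5 South"]


def road_url(road: str) -> str:
    """Build a speeds.asp URL in the same format as the list above."""
    # quote() encodes the space as %20, e.g. "405 South" -> "405%20South"
    return f"{BASE_URL}?Region={REGION}&Road={quote(road)}"


def snapshot_filename(road: str, when: datetime) -> str:
    """Hypothetical naming scheme: one HTML file per road per fetch, stamped to the minute."""
    return f"{road.replace(' ', '_')}_{when:%Y%m%d_%H%M}.html"


def fetch(road: str, outdir: str = ".") -> str:
    """Download one road's page and save it, much as `wget -O <file> <url>` would."""
    with urllib.request.urlopen(road_url(road), timeout=30) as resp:
        body = resp.read()
    path = os.path.join(outdir, snapshot_filename(road, datetime.now()))
    with open(path, "wb") as f:
        f.write(body)
    return path
```

A cron job would then call `fetch(road)` for each entry in `ROADS`. One cron detail worth knowing for the "every 25 minutes or so" plan: a minute field of `*/25` fires at :00, :25 and :50 past each hour, so the gaps are 25, 25, then 10 minutes; `*/20` or `*/30` divides the hour evenly.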

Programming TODO:

  1. figure out how to use cron and wget
  2. start sucking down pages, every 25 minutes or so
  3. write a parser to grab the necessary data from the pages in question
  4. find a good DB backend to store my data in
  5. cron extract freeway speeds and write them to DB (get it all working together)
  6. data processing
    • current speeds
    • average speeds (model this after web-traffic data: avg per day-of-week, month, weekday/weekend, etc)
  7. figure out distance measures between points on the map
  8. use those distances, together with the current speeds, to calculate a numerical travel time (e.g. “25 minutes if you leave now”)
  9. explore equations that fit traffic
    • accidents
      • measure severity by time-to-normal
      • clustering of severity
      • resolution with respect to location (since cars at the front of the jam are accelerating away while cars at the back are still braking, congestion should travel backwards along the road like a wave, even after the original accident site has cleared, huh?)
      • resolution with respect to time (how long does it take for an accident’s backup to dissolve?)
    • rush-hour trends (can I treat rush-hour congestion as an accident without a specific center?)
    • smoothing, with respect to day-of-week, accident resolution, hour-of-day
    • the effects of reverse commutes (e.g. the 405, where both directions are backed up)
    • average traffic speeds for specific exits
  10. combine the travel-time estimate with the averages to see if I can answer “how long will it take me if I leave in 5 minutes, 10 minutes, 20 minutes, etc.?”
  11. measure the ratio of minutes spent waiting here to minutes spent on the road
  12. put all of this into a web frontend
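Step 3, the parser, mostly amounts to walking the page's HTML tags and pulling out the rows of the speed table. The sketch below uses Python's standard-library `html.parser`, a token-driven approach similar in spirit to the HTML::TokeParser suggestion further down. It assumes (hypothetically, since Sigalert's real markup isn't shown here) that each speed reading is a `<tr>` whose first `<td>` is the location and second `<td>` is the speed in mph; nested tables would need extra handling.

```python
from html.parser import HTMLParser


class SpeedTableParser(HTMLParser):
    """Token-driven parser: collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells, e.g. <th> header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_speeds(html):
    """Return (location, mph) pairs, ASSUMING location in cell 1 and speed in cell 2."""
    parser = SpeedTableParser()
    parser.feed(html)
    parser.close()
    return [
        (row[0], int(row[1]))
        for row in parser.rows
        if len(row) >= 2 and row[1].isdigit()
    ]
```

Rows whose speed cell isn't a plain number (blank, "n/a", etc.) are simply skipped, which seems safer than guessing a value for them.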
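Steps 7 and 8 come down to one formula: if the route is split into segments of length $d_i$ travelled at speed $v_i$, the trip takes $T = \sum_i d_i / v_i$. A minimal sketch, assuming each segment runs between two consecutive points Sigalert reports a speed for, and that their latitude/longitude are known (an assumption; the pages above list speeds, not coordinates). The great-circle (haversine) distance is a swapped-in approximation: it underestimates the real road distance on curvy stretches, so actual freeway mileage would be better if it can be found.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def travel_minutes(segments):
    """segments: list of (miles, mph) pairs; each takes miles / mph hours."""
    hours = 0.0
    for miles, mph in segments:
        if mph <= 0:
            # a stopped segment has no finite travel time under this simple model
            raise ValueError("speed must be positive")
        hours += miles / mph
    return hours * 60
```

For example, 10 miles at 60 mph plus 5 miles at 15 mph is 10 + 20 = 30 minutes: the slow stretch dominates even though it is half as long, which is why per-segment speeds matter more than an overall average.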

Leonard recommends

  • this example of WWW::Mechanize and HTML::TokeParser
  • DBI to hook Perl into MySQL
  • PHP reading from MySQL for the web output
  • JpGraph as an alternative to gnuplot, which I was thinking of using
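Leonard's storage suggestion is Perl's DBI talking to MySQL. To illustrate the same idea (steps 4 through 6: store every reading, then average it web-traffic style), here is a sketch in Python with the standard-library `sqlite3` module swapped in for MySQL; that substitution is mine, not the recommendation. The table and column names are hypothetical. SQLite's `strftime('%w', ...)` returns the day of week (0 = Sunday) and `'%H'` the hour, so grouping on both gives an average speed per weekday and hour.

```python
import sqlite3
from datetime import datetime


def open_db(path=":memory:"):
    """Open (or create) the database with a hypothetical one-table schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS speeds (
               road        TEXT NOT NULL,
               location    TEXT NOT NULL,
               observed_at TEXT NOT NULL,
               mph         INTEGER NOT NULL
           )"""
    )
    return conn


def record(conn, road, when: datetime, rows):
    """Store one fetch's (location, mph) pairs, stamped with the fetch time."""
    stamp = when.isoformat(sep=" ", timespec="seconds")  # e.g. "2006-04-05 21:57:00"
    conn.executemany(
        "INSERT INTO speeds (road, location, observed_at, mph) VALUES (?, ?, ?, ?)",
        [(road, loc, stamp, mph) for loc, mph in rows],
    )
    conn.commit()


def average_by_weekday_hour(conn, road, location):
    """Mean speed per (day-of-week, hour) for one point; day 0 is Sunday."""
    cur = conn.execute(
        """SELECT CAST(strftime('%w', observed_at) AS INTEGER) AS dow,
                  CAST(strftime('%H', observed_at) AS INTEGER) AS hour,
                  AVG(mph)
           FROM speeds
           WHERE road = ? AND location = ?
           GROUP BY dow, hour
           ORDER BY dow, hour""",
        (road, location),
    )
    return cur.fetchall()
```

Per-month or weekday/weekend averages only need a different `strftime` format (e.g. `'%m'`) or a `CASE` on the day-of-week in the `GROUP BY`.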

Ken says:

  • The Transportation Research Board has a bunch of good papers
  • The University of California Transportation Center does too, specifically:

    • http://www.uctc.net/access/access.asp
  • But analyzing traffic dynamics just by knowing traffic speeds at points on the freeway is a pipe dream; there are way too many other variables involved.
  • UCI’s ITS (Institute of Transportation Studies), and UCLA’s too, should have all the data I could ever want…
    • Brian Taylor is a “transportation guru” at UCLA
    • Hiro Iseki is a PhD candidate at UCLA (hiseki @ ucla.edu); his background is in transportation engineering

2 Comments

  1. kyle

    I wish somebody would just make a page that would let you see past traffic conditions, say snapshots every hour. They would only need to be updated every few months or so. I’m currently wondering if I should leave at 4 or 5 to go to San Diego from Camarillo; which time will result in no traffic?

    Posted on 05-Apr-06 at 21:57
  2. mote

    Kyle, I agree. Big companies definitely have this information; I just wonder why this aggregate information isn’t out there.

    Actually… I would have been able to answer your question for you, but I’ve been spidering the Greater LA area, not San Diego.

    My guess is that driving habits vary so much from city to city that my localized information wouldn’t be very useful to you.

    Looks like you’re stuck with trial and error =/.

    Posted on 05-Apr-06 at 22:16