Friday, February 17, 2006

Overview of Last Quarter

I noticed that I've neglected to update this blog for the last three months or so. That means I've gone an entire quarter of the school year and not said anything. It's been a busy quarter, but I figured I should update everyone on what I've accomplished.

I formalized some of the ideas behind the search algorithm I hope to implement. Then I built a web crawler and crawled about 12,000 web pages. The crawler has found about 300,000 more pages that it hasn't managed to visit yet, and it's not terribly robust at the moment. I haven't had a chance to look at the data it collected yet, so I can't say much about it. The plan is to use this data in simulations so that the graph topology represents a real subset of the Web.
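
To illustrate the difference between pages that have been visited and pages that have only been found, here is a minimal Python sketch of that bookkeeping. It is not the actual crawler: the names `crawl`, `fetch_links`, and `TOY_WEB` are made up for this example, breadth-first order is an assumption, and a small in-memory dictionary stands in for real HTTP fetching.

```python
from collections import deque

def crawl(seed, fetch_links, max_pages):
    """Breadth-first crawl sketch.

    Visits at most max_pages pages and returns (visited, pending), where
    pending holds pages that were discovered but never visited.
    fetch_links is any callable mapping a URL to the links on that page.
    """
    visited = set()
    frontier = deque([seed])   # discovered, waiting to be visited
    queued = {seed}            # everything ever discovered (avoids duplicates)
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.add(url)
        for link in fetch_links(url):
            if link not in queued:
                queued.add(link)
                frontier.append(link)
    return visited, set(frontier)

# Hypothetical toy "web": page name -> outgoing links.
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a", "e"],
    "d": [],
    "e": ["f"],
    "f": [],
}

visited, pending = crawl("a", lambda url: TOY_WEB.get(url, []), max_pages=3)
print("visited:", sorted(visited))
print("found but not visited:", sorted(pending))
```

The visited set together with the links seen along the way is what would define the graph topology for the simulations; the pending set corresponds to the pages the crawler knows about but hasn't reached.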

I started running some simulations. Basically I generate a graph with certain interesting properties, then I set a swarm of simulated agents loose to follow links at random in search of something. While I can't say that this will be a good approximation of actual human browsing habits, it will hopefully give some insight as to whether this approach has any hope of working at all. I've had some interesting results so far, but again I haven't analyzed them enough to report anything truly useful. It looks like the amount of work I expect each node to be able to do might be a bit high, so some serious optimization of my algorithms is going to be needed.
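
For a rough idea of what a simulation like this looks like, here is a small Python sketch. It is only an illustration under several assumptions, not the actual simulation code: the graph is a plain random graph with a fixed number of outgoing links per node (the real graphs have other properties), each agent starts at a random node, the "something" being searched for is a single target node, and all function names are made up for this example.

```python
import random

def make_random_graph(n_nodes, out_degree, rng):
    """Build a directed graph where each node links to out_degree
    distinct other nodes chosen uniformly at random."""
    graph = {}
    for node in range(n_nodes):
        others = [m for m in range(n_nodes) if m != node]
        graph[node] = rng.sample(others, out_degree)
    return graph

def random_walk(graph, start, target, max_steps, rng):
    """Follow random links from start. Return the number of links
    followed to reach target, or None if the agent gives up."""
    node = start
    for step in range(max_steps + 1):
        if node == target:
            return step
        links = graph[node]
        if not links:          # dead end: nowhere left to go
            return None
        node = rng.choice(links)
    return None

def run_swarm(graph, n_agents, target, max_steps, rng):
    """Release n_agents from random start nodes and collect their results."""
    nodes = list(graph)
    return [
        random_walk(graph, rng.choice(nodes), target, max_steps, rng)
        for _ in range(n_agents)
    ]

rng = random.Random(0)  # fixed seed so runs are repeatable
graph = make_random_graph(n_nodes=50, out_degree=3, rng=rng)
results = run_swarm(graph, n_agents=100, target=7, max_steps=200, rng=rng)
found = [r for r in results if r is not None]
print(f"{len(found)} of {len(results)} agents found the target")
```

Swapping `make_random_graph` for a graph built from crawled pages would be the natural way to run the same agents on a real subset of the Web.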

As for what's coming up, I'm going to take a break to write up what I've done for my thesis advisor. Hopefully this will also include analyzing some of the data I've generated so far. I have a poster presentation the second week of March. I think my focus until then is going to be mostly on simulation, and then I will start doing serious work towards getting a prototype done. Hopefully from now on I'll have more time to keep this blog up to date as well.