Tuesday, September 13, 2005

Framing the Problem

First of all, I'd like to apologize for being so delinquent in my updates. My summer turned out to be pretty busy towards the end (that's my excuse anyway) and it definitely takes a while to get back into the swing of things when the school year starts up. The school year is looking good so far. My classes are interesting, and my schedule allows me enough time to fit in most of what I need to fit in. I'm starting some more intense work on my thesis, which should mean there will be more frequent posts here. I've been meeting weekly with my advisor, and he's doing a really good job giving me some direction for my project.

What my advisor currently wants me to do is come up with a clear working statement of the problem. I agree this is a good idea, since that will give me a good way to focus my ideas. I wrote up a simple one earlier today, which started out with "Can web search be done successfully in a decentralized manner?" Now, what do I mean by decentralized? Google could argue that their search is decentralized, since they have their army of robots and hundreds of computers that crawl the web and index most of what they come across. This is decentralized in a sense, but it's not what I mean. Google seems centralized to the user because everything is wrapped up behind www.google.com and only Google controls what search results it will return. I would like to see if the problem of searching can be successfully distributed out to the individual web users, so that each computer running a web browser is participating in building the index.

This leads me to the question of how to frame the problem. I'm looking at an extremely decentralized architecture for doing web search. I can see two different ways to look at this problem: as a swarm intelligence problem or as a peer-to-peer problem. As a peer-to-peer problem, this becomes a matter of building a peer-to-peer network for searching the web. This naturally directs one more towards the issues facing peer-to-peer networks, such as trust, accountability, and security. As a swarm problem, we would be looking at this as lots of agents out searching the web (i.e. people using web browsers to look at information) that then start cooperating to make everyone more efficient. Swarm intelligence problems look at how many seemingly independent agents that are relatively simple on their own combine to produce a very distinct group behavior. In the context of web search, it would mean comparing people browsing the web with just a few addresses to start from and following links everywhere against what happens if these people were to start communicating and make everyone more efficient by sharing previous experience.
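To make the swarm view a little more concrete, here is a rough toy sketch in Python. It is only an illustration under my own assumptions: the link graph, the page names, and the browse function are all made up, and "sharing previous experience" is reduced to sharing a set of pages somebody has already visited. It counts how many pages get fetched when three browsers work independently versus when they cooperate.

```python
# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "blog"],
    "news": ["blog", "sports"],
    "blog": ["home", "thesis"],
    "sports": [],
    "thesis": ["news"],
}

def browse(start, known=None):
    """Follow links from `start` and return the pages actually fetched.

    `known` is an optional set of already-visited pages shared between
    agents. Without it, the agent only remembers its own visits.
    """
    visited = known if known is not None else set()
    fetched = []
    stack = [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue  # somebody (maybe us) has already seen this page
        visited.add(page)
        fetched.append(page)
        stack.extend(LINKS[page])
    return fetched

starts = ["home", "thesis", "sports"]

# Independent agents: each one rediscovers everything on its own.
independent = sum(len(browse(s)) for s in starts)

# Cooperating agents: all of them consult and update one shared set.
shared = set()
cooperative = sum(len(browse(s, shared)) for s in starts)

print(independent, cooperative)  # prints "11 5"
```

In a real decentralized system there would be no single shared set, of course. The agents would have to exchange what they know with each other, and that is where the peer-to-peer issues come in.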

So which aspect would be most useful? I think the swarm aspect brings out the parts of the problem that are more interesting to me. I also think swarm intelligence and peer-to-peer are very closely related, although it seems like relatively few people are looking at peer-to-peer networking from a swarm intelligence perspective. Peer-to-peer networks involve lots of computers or nodes on a network that all participate in accomplishing some particular goal. For example, let's look at Gnutella. Gnutella is a decentralized way of sharing files. If it were just one computer, there would be a very small selection of content available. On the other hand, by combining these nodes into a network, one searches the whole network rather than the individual computers, and as a network there is a much wider range of information available. Thus we have the behavior of individual nodes and the behavior of the network as a whole as two very distinct things. This is exactly what swarm intelligence is about.
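The difference between one node and the whole network can be sketched in a few lines of Python. This is not the real Gnutella protocol, just a simplified flood-style search under my own assumptions: the Peer class, the file names, and the hop limit (ttl) are all hypothetical. A query starts at one peer and is passed on to neighbors until the hop limit runs out.

```python
from collections import deque

class Peer:
    """A hypothetical node that shares a few files and knows its neighbors."""
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []

    def connect(self, other):
        # Connections are symmetric in this toy model.
        self.neighbors.append(other)
        other.neighbors.append(self)

def local_search(peer, keyword):
    """What a single computer can answer on its own."""
    return {f for f in peer.files if keyword in f}

def network_search(start, keyword, ttl=3):
    """Flood the query outward from `start`, at most `ttl` hops away."""
    seen = {start.name}
    queue = deque([(start, ttl)])
    hits = set()
    while queue:
        peer, hops_left = queue.popleft()
        hits |= {(peer.name, f) for f in local_search(peer, keyword)}
        if hops_left == 0:
            continue  # query has travelled as far as it is allowed to
        for neighbor in peer.neighbors:
            if neighbor.name not in seen:
                seen.add(neighbor.name)
                queue.append((neighbor, hops_left - 1))
    return hits

# A small chain of peers: a - b - c - d
a = Peer("a", ["swarm_notes.txt"])
b = Peer("b", ["swarm_paper.pdf", "recipes.txt"])
c = Peer("c", ["ant_swarm.mp3"])
d = Peer("d", ["swarm_far_away.txt"])
a.connect(b)
b.connect(c)
c.connect(d)

alone = local_search(a, "swarm")               # one match, from a itself
network = network_search(a, "swarm", ttl=2)    # matches from a, b, and c
print(len(alone), len(network))
```

No single peer in this sketch knows much, but the query sees three matches with a hop limit of 2. Peer d is too far away to be reached, which hints at one of the trade-offs of searching this way.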

So peer-to-peer seems like it can be cast as a swarm intelligence problem, which then opens it up to all the insights and approaches of swarm intelligence. Swarm intelligence can also gain something from what the peer-to-peer people have been developing. All the swarm problem-solving approaches I know of assume everyone in the swarm is behaving well. In a peer-to-peer network, this is almost never the case. Using swarm intelligence in peer-to-peer networks will force swarm intelligence to develop means of dealing with misbehaving members of the swarm.

Both aspects seem valid, and they have quite a bit in common. I think for the most part I will consider this as a swarm intelligence problem, and deal with the peer-to-peer issues from a swarm intelligence standpoint, at least as far as it makes sense to do that.