Monday, July 18, 2005

The SearchActive Idea

While I'm probably going to talk mostly about peer-to-peer networking on this blog, the key idea behind SearchActive is basically a swarm intelligence solution to searching the Web. In a nutshell, swarm intelligence involves studying the behavior of social creatures and then applying these behaviors to solving other problems. Most of swarm intelligence approaches are based on social instects, like ants, bees, and termites. There has also been some work in studying the flocking behavior of birds. Taking ants for example, they basically do two things, they look for food, and then when they find it they carry the food back to their nest. While they're carrying food back, they leave a trail of pheromones, which other ants tend to follow. In essence the ants solve the shortest path problem. Anyone who's played a lot of real time strategy games has probably experienced the difficulty computers seem to have solving this same problem.

The analogy for searching the Web is closer to people wandering through town looking for something. Let's say our good friend Bob is looking for an ATM machine, and he's new to town. He starts out walking along, and he sees someone walking by. He asks that person if they know where the nearest ATM is, and preferably one from his home bank so he can avoid paying ATM fees. This person doesn't know where that particular ATM is, but he does know where an ATM is, so he points Bob in the right direction and off he goes. For Bob this isn't ideal, but it's better than nothing, so he goes with it. Along the way, he sees another person walking down the street, so he asks this person the same question. This person tells Bob where an ATM is, and even one that's from the right bank, but the trouble is it's a couple blocks further. Bob has to then decide which is the better result based on whatever factors he thinks is important, such as the distance, how trustworthy either of these people are, and how much it's worth to him to not have to pay an ATM fee.

Anyway, later on, Bob decides he needs to find a drug store. Bob remembers passing one on his way to the ATM earlier that day, so he starts out heading the same direction. Along the way he asks some of the people he passes if there's a better one. He comes across one man who happens to be the owner of a drug store that's known for having a very poor selection and ridiculously high prices. This man obviously wants to improve his business, so he explains to Bob that this is by far the best drug store in town, and that it's just a block to the east. He was actually lying, as his store is a good three miles to the east. Anyway, Bob, not knowing any better decides to try his luck. After going about a block the drug store is no where to be found, so he asks someone about it. This person responds that the store is actually a long way away and it's not really worth going to anyway, and recommends a couple of better drug stores to Bob. One of these is the one Bob was heading for in the first place, so he decides to go back to that one.

SearchActive will use a similar method to search the Web. First let's ignore the peer-to-peer aspects and just look at one it would work for one computer in isolation. SearchActive will be a program that runs on your computer and monitors all the web pages you visit. It assumes you can find at least one without a search engine, which isn't that unreasonable because everyone at the very least needs to be able to find a search engine. When you go there, SearchActive catches the page, decides what are the keywords on the page, and stores it in its index. Then, in the background while you're busy reading your web page, SearchActive is following the links on that page and adding the linked pages to its index. When you click on a link, your click takes priority over anything else it's doing, so it goes and grabs the page and gives it to you, and then proceeds to go ahead and index that page. If you're lucky, the link you clicked is one that SearchActive already indexed, so it can give you a local copy instead, which makes things seem a lot quicker for you because it hides the download time.

When you want to make a search, you give some keywords to SearchActive and it checks its index for pages it has seen that might be a good match. The process continues and before long SearchActive is able to give you pretty good search results.

Now, there needs to be some way to rank these pages, since most of the web pages don't have that great of content. I'll talk about some ideas I have for this in a later post. This is really where the hard part of searching the web is.

Now lets take your own computer running SearchActive and connect it to a few other computers running SearchActive (hereafter called nodes) which are in turn connected to a few other nodes, and so on. Now when you go to search, SearchActive not only looks at its own index, but also asks its neighbors if they know of any good pages for the keywords you gave. Each of the neighbors can either answer, or ask their own neighbors too. In essence, when each node responds it is voting for a particular page. Your particular node should decide how much each vote can count and then combine them all to figure out what the best page is. In essence you're deciding how much you trust certain nodes.

As certain nodes gain your trust in the future you'll be more likely to ask them first. You also have the option of breaking some connections and connecting to new nodes, so ideally your node would favor connections with highly trusted nodes. Since the trust should be based on the quality of search results, in essence the network should organize itself so that nodes run by users with similar interests tend to get closer together. In essence you'd end up with different regions of the network specializing in specific topics on the Web.

That's the idea behind it, but of course the devil is in the details. Future posts will likely discuss some ideas for dealing with some of these details.

0 Comments:

Post a Comment

<< Home