Exploiting Content Localities for Efficient Search in P2P Systems



Existing P2P search algorithms generally target one of two performance objectives: improving search quality from a client's perspective, or reducing search cost from an Internet management perspective. We believe that the essential issue in designing and optimizing search algorithms for unstructured P2P networks is the trade-off between these two objectives. Motivated by two observations, the locality of content serving in the peer community and the locality of search interests of individual peers, we propose CAC-SPIRP, a fast and low-cost P2P search algorithm. Our algorithm consists of two components. The first component aims to reduce search cost by constructing a CAC (Content Abundant Cluster), in which content-abundant peers identify themselves and self-organize into an interconnected cluster that provides a pool of popular objects frequently accessed by the peer community. A query is first routed to the CAC, where it is most likely to be satisfied, which significantly reduces both network traffic and search scope. The second component, SPIRP (Selectively Prefetching Indices from Responding Peers), is client-oriented and aims to improve the quality of P2P search. Each client individually identifies a small group of peers that share its interests and prefetches their entire file indices for those interests, which minimizes unnecessary outgoing queries and significantly reduces query response time. By building SPIRP on top of the CAC Internet infrastructure, our algorithm combines the merits of both components and balances the trade-off between the two performance objectives. Our trace-driven simulations show that CAC-SPIRP significantly improves overall performance from both the client's perspective and the Internet management perspective.
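The query path described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Peer` class, the `search` and `prefetch_index` functions, and the stage labels are hypothetical names introduced here. Two simplifications are also assumptions: the fallback to a network-wide search when the CAC cannot answer, and copying a responding peer's whole index rather than only the entries matching the client's interests.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical peer model: shared files plus a prefetched index."""
    name: str
    files: set[str] = field(default_factory=set)
    # Maps a file name to the name of a peer known to hold it (SPIRP index).
    prefetched: dict[str, str] = field(default_factory=dict)


def prefetch_index(client: Peer, responder: Peer) -> None:
    # Assumption: copy the responder's whole index; a fuller model would
    # keep only the entries related to the client's interests.
    for filename in responder.files:
        client.prefetched.setdefault(filename, responder.name)


def search(client: Peer, query: str, cac: list[Peer],
           all_peers: list[Peer]) -> tuple[str | None, str]:
    """Return (holder name, stage that answered the query)."""
    # Stage 1 (SPIRP): answer locally from prefetched indices, no traffic.
    if query in client.prefetched:
        return client.prefetched[query], "local-index"
    # Stage 2 (CAC): route the query to the content-abundant cluster first.
    for peer in cac:
        if query in peer.files:
            prefetch_index(client, peer)
            return peer.name, "cac"
    # Stage 3 (assumed fallback): search the remaining peers.
    for peer in all_peers:
        if peer is not client and query in peer.files:
            prefetch_index(client, peer)
            return peer.name, "flood"
    return None, "miss"
```

In this sketch, a first query for a popular object is answered by a CAC member, and a later query for another object held by the same responder is answered from the client's prefetched index without leaving the client.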