Set Similarity Query Processing



With the proliferation of data, similarity query processing plays an important role in many application areas, and there has consequently been much interest in developing efficient methods for this fundamental operation. In this tutorial, we focus on the processing of set similarity queries, where records are represented as sets of elements. Set similarity query processing is an essential step in correlating data effectively and efficiently in many applications, such as near-duplicate Web page detection, data integration and cleaning, record linkage, and pattern recognition. We will give an overview of state-of-the-art approaches to set similarity query processing, including both exact and approximate solutions. We will also discuss a series of related problems that can be converted to set similarity queries and solved efficiently by exploiting set similarity query processing techniques.
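To make the notion of a set similarity query concrete, the following minimal sketch treats each record as a set of tokens and returns every record whose similarity to a query set meets a threshold. The abstract does not name a similarity measure; Jaccard similarity is assumed here as one common choice. The function names `jaccard` and `similarity_search` and the sample records are hypothetical. The scan is deliberately brute force and does not represent the optimized exact or approximate techniques covered in the tutorial.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b| (assumed measure, not from the abstract)."""
    if not a and not b:
        # Convention chosen for this sketch: two empty sets are identical.
        return 1.0
    return len(a & b) / len(a | b)


def similarity_search(records, query, threshold):
    """Brute-force scan: return (index, similarity) for each record
    whose Jaccard similarity to the query is at least the threshold."""
    q = frozenset(query)
    results = []
    for i, rec in enumerate(records):
        s = jaccard(frozenset(rec), q)
        if s >= threshold:
            results.append((i, s))
    return results


# Hypothetical records, each represented as a set of elements (tokens).
records = [
    {"set", "similarity", "query", "processing"},
    {"set", "similarity", "join"},
    {"graph", "pattern", "matching"},
]
hits = similarity_search(records, {"set", "similarity", "query"}, 0.5)
print(hits)  # [(0, 0.75), (1, 0.5)]
```

The scan compares the query against every record, so its cost grows linearly with the collection size. Avoiding most of those comparisons is what efficient set similarity query processing aims for.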