Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

SEARCH GUIDE  Download Search Tip PDF File

  • articleNo Access

    ANSWER: APPROXIMATE NAME SEARCH WITH ERRORS IN LARGE DATABASES BY A NOVEL APPROACH BASED ON PREFIX-DICTIONARY

    The obvious need for using modern computer networking capabilities to enable the effective sharing of information has resulted in data-sharing systems, which store, and manage large amounts of data. These data need to be effectively searched and analyzed. More specifically, in the presence of dirty data, a search for specific information by a standard query (e.g., search for a name that is misspelled or mistyped) does not return all needed information, as required in homeland security, criminology, and medical applications, amongst others. Different techniques, such as soundex, phonix, n-grams, edit-distance, have been used to improve the matching rate in these name-matching applications. These techniques have demonstrated varying levels of success, but there is a pressing need for name matching approaches that provide high levels of accuracy in matching names, while at the same time maintaining low computational complexity. In this paper, such a technique, called ANSWER, is proposed and its characteristics are discussed. Our results demonstrate that ANSWER possesses high accuracy, as well as high speed and is superior to other techniques of retrieving fuzzy name matches in large databases.

  • articleNo Access

    AN EFFICIENT ENTITY RESOLUTION METHOD FOR LARGE RELATIONS

    Entity resolution (ER) is to find the data objects referring to the same real-world entity. When ER is performed on relations, the crucial operator is record matching, which is to judge whether two tuples refer to the same real-world entity. Record matching is a longstanding issue. However, with massive and complex data in applications, current methods cannot satisfy the requirements. A Sequence-rule-based record matching (SeReMatching) is presented with the consideration of both which attributes should be used and their importance in record matching. We have changed the Bloom filter and therefore the checking speed is greatly increased. The best performance of the algorithm makes the complexity of entity resolution O(n). And extensive experiments were performed to evaluate our methods.