Congress of Cultural Atlases: The Human Record
May 7-10, 2004
University of California, Berkeley

Digital Gazetteers for Cultural Atlases: Abstracts
Sunday, May 9, 2004

“Use of Feature Typing in a Digital Gazetteer: An Exploratory Statistical Analysis”
Jun Wang, Peking University, and Linda Hill, University of California, Santa Barbara

Gazetteers, as dictionaries of feature names (placenames) with their associated geospatial locations and related information, function as bridges between geospatial locations that can be mapped (formal referencing) and the feature names that are the cultural, social, administrative, and scientific ways of identifying the features. A key component of a gazetteer is its feature typing scheme (e.g., indicating whether the feature is a populated place, school, or lake); such schemes are designed to meet a chosen level of specificity and to provide a structure for classifying the places in a given gazetteer. This research project explores a method of correlating the words and phrases in feature names with three independent feature typing schemes, applying word and phrase parsing methods and statistical correlation.
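The abstract does not say which statistic measures the strength of a token-type association. As a minimal sketch, assuming simple whitespace tokenization, word unigrams and bigrams standing in for "words and phrases", and the conditional probability P(type | token) as the strength measure, a correlation table could be built from (name, type) pairs as follows. The entries and type terms are hypothetical, not taken from the ADL gazetteer, and the helper names `tokens` and `association_table` are illustrative.

```python
from collections import Counter, defaultdict


def tokens(name):
    """Lowercase a feature name and return its word unigrams and bigrams.

    Whitespace splitting and the two-word phrase limit are assumptions;
    the abstract does not specify the parser used.
    """
    words = name.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def association_table(entries):
    """Map each token to {type_term: P(type_term | token)} over (name, type) pairs."""
    token_counts = Counter()
    pair_counts = defaultdict(Counter)
    for name, type_term in entries:
        for tok in set(tokens(name)):  # count each token once per entry
            token_counts[tok] += 1
            pair_counts[tok][type_term] += 1
    return {
        tok: {t: n / token_counts[tok] for t, n in types.items()}
        for tok, types in pair_counts.items()
    }


# Hypothetical (name, type term) pairs for illustration only.
entries = [
    ("Lake Tahoe", "lakes"),
    ("Mono Lake", "lakes"),
    ("Lake Placid", "populated places"),
    ("Lincoln Elementary School", "schools"),
    ("Lincoln", "populated places"),
]
table = association_table(entries)
print(table["lake"])
```

In this toy data the token "lake" puts about two thirds of its weight on `lakes` and one third on `populated places`, because one entry uses "Lake" in the name of a town. Pointwise mutual information or a chi-square statistic could be substituted for the conditional probability without changing the shape of the table.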

The Alexandria Digital Library (ADL) gazetteer served as the raw data. It contains 4.4 million entries and 5.9 million feature names for features worldwide. Every entry is indexed with the ADL Feature Type Thesaurus (FTT); subsets of the entries are also indexed with either the National Geospatial-Intelligence Agency (NGA, formerly NIMA) gazetteer category scheme or the U.S. Geological Survey (USGS) gazetteer category scheme. Using software developed by the VISION research group at Peking University, we mined this extensive set of feature names and their associated types to develop an initial prototype of a typing advisory service. The service is intended to assist both in cataloging (assigning types to new entries) and in choosing the most likely type terms when searching distributed gazetteers that use different typing schemes.
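In its simplest form, a typing advisory service of this kind might look up each token of a new name in a per-scheme association table and rank the candidate type terms. The sketch below assumes such tables already exist; the scheme keys (`FTT`, `USGS`), the type terms, and the strengths are illustrative values, and `suggest_types` is a hypothetical name, not part of the VISION software.

```python
from collections import defaultdict

# Hypothetical token -> {type term: strength} tables, one per typing scheme.
# All values are invented for illustration.
ASSOCIATIONS = {
    "FTT": {
        "lake": {"lakes": 0.67, "populated places": 0.33},
        "school": {"schools": 1.0},
        "elementary school": {"schools": 1.0},
    },
    "USGS": {
        "lake": {"Lake": 0.9, "Populated Place": 0.1},
        "school": {"School": 1.0},
    },
}


def tokens(name):
    """Lowercased word unigrams plus bigrams (an assumed tokenization)."""
    words = name.lower().split()
    return words + [" ".join(p) for p in zip(words, words[1:])]


def suggest_types(name, scheme, top_n=3):
    """Rank a scheme's type terms for a feature name.

    Each type term is scored by its strongest association with any token
    of the name. Returns an empty list when no token is in the table.
    """
    table = ASSOCIATIONS[scheme]
    scores = defaultdict(float)
    for tok in tokens(name):
        for type_term, strength in table.get(tok, {}).items():
            scores[type_term] = max(scores[type_term], strength)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


print(suggest_types("Crystal Lake", "FTT"))
print(suggest_types("Crystal Lake", "USGS"))
print(suggest_types("Chicago", "FTT"))
```

Scoring by the strongest single token rather than a sum keeps a phrase and its constituent word (for example "elementary school" and "school") from counting the same evidence twice. Querying the same name against different schemes is one way the service could help pick search terms for gazetteers typed with different schemes.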

The method is based on the VISION group’s research on knowledge organization systems (KOS) for digital libraries (see http://vision.pku.edu.cn). The result is an explorable table of feature name tokens (i.e., words and phrases); associated with each token are the type terms from each scheme and the statistical strength of each association. We will discuss some of the development issues (e.g., the nature of stopwords in feature names; identifying the generic versus the proper portions of worldwide feature names; and the treatment of feature names without generic tokens, such as “Chicago”) and explore ideas for further development and applications.
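The generic-versus-proper issue can be illustrated with a greedy splitter. This is a minimal sketch assuming a small stopword list and a list of known generic terms, both hypothetical here, since the abstract names these issues but publishes neither list. Stopwords are dropped, two-word generic phrases are matched before single words, and every remaining word is treated as part of the proper portion.

```python
# Hypothetical lists for illustration; real lists would need to cover
# many languages and be derived from the gazetteer data.
STOPWORDS = {"of", "the", "de", "la", "del"}
GENERIC_TERMS = {"lake", "river", "school", "mount", "lac", "rio", "elementary school"}


def split_name(name):
    """Split a feature name into (generic tokens, proper tokens).

    Stopwords are removed first; then, at each position, a two-word
    generic phrase is tried before a single generic word.
    """
    words = [w for w in name.lower().split() if w not in STOPWORDS]
    generic, proper = [], []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in GENERIC_TERMS:
            generic.append(pair)
            i += 2
        elif words[i] in GENERIC_TERMS:
            generic.append(words[i])
            i += 1
        else:
            proper.append(words[i])
            i += 1
    return generic, proper


print(split_name("Rio de la Plata"))
print(split_name("Lincoln Elementary School"))
print(split_name("Chicago"))
```

A name such as "Chicago" yields an empty generic list, leaving a token-based advisory service with nothing type-bearing to work from. Stopword lists also need care across languages, since a word dropped as a stopword in one name may carry meaning in another.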