A Multilingual Gazetteer System for Integrating
Spatial and Cultural Resources


Lewis Lancaster, Electronic Cultural Atlas Initiative, UC Berkeley

January 24, 2001

A small grant is requested to enable the design, development, and testing of standards for entries in gazetteers and also for characterizing gazetteers themselves at the level of complexity needed to provide effective support for humanities and historical computing.

Digital library research has become concerned with a steadily increasing range of genres and materials and, more challengingly, with the use of diverse digital genres in conjunction with each other. Researchers associated with the Electronic Cultural Atlas Initiative are investigating means of combining textual information with geospatial data, enabling cultural, historical, and social data to be represented in time and space through Geographical Information Systems (GIS). Linking the mention of place names to maps involves three different genres: toponym-rich texts, GIS maps, and, mediating between the two, gazetteers (structured records about locations and their names).

Making gazetteers available to the users and contributors of digital library resources, permitting indirect referencing and GIS mapping of places, is key to communicability among digital resources with geospatial information. Gazetteer resources can also integrate scientific and demographic data about places with the global, multilingual records of human culture (art, literature, biography, history, and other fields) that are rapidly being digitized.

Important gazetteers are being developed by a number of digital humanities projects worldwide. The rise of a networked environment makes it possible to draw on multiple, network-accessible gazetteer servers. Unfortunately, the effective use of gazetteers in historical and humanities computing is impeded by the lack of standards both for the records about places within gazetteers and for the records describing the gazetteers themselves. The emerging standards for conventional gazetteer entries, based largely upon contemporary North American gazetteers that focus on environmental science, are inadequate for humanities computing. New work is required to extend these standards to accommodate:

  • multiple toponyms in multiple scripts that refer to the same geospatial location
  • the instability of toponyms over time
  • changing boundaries, locations, and spatial footprints of places (as towns become cities or rivers burst their banks)

In addition, the types of geographical entities currently listed in gazetteer place name type thesauri (bridge, tumulus, church) are not detailed enough to accommodate the variety of place name types found in global, historical texts about human culture.
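The three requirements listed above can be made concrete with a small data-structure sketch. It is only an illustration, not the ADL Gazetteer Content Standard or any schema proposed here: the class names, field names, and helper functions are hypothetical, and the sample names, years, and bounding boxes are illustrative values rather than authoritative data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Toponym:
    """One name for a place, valid over an optional span of years."""
    name: str
    language: str                      # hypothetical field, e.g. "zh" or "en"
    script: str                        # hypothetical field, e.g. "Hani" or "Latn"
    start_year: Optional[int] = None   # None means open-ended
    end_year: Optional[int] = None


@dataclass
class Footprint:
    """A bounding box (west, south, east, north) valid over a span of years."""
    bbox: Tuple[float, float, float, float]
    start_year: Optional[int] = None
    end_year: Optional[int] = None


@dataclass
class GazetteerEntry:
    """A place with any number of dated names and dated footprints."""
    entry_id: str
    feature_type: str
    toponyms: List[Toponym] = field(default_factory=list)
    footprints: List[Footprint] = field(default_factory=list)


def _covers(start: Optional[int], end: Optional[int], year: int) -> bool:
    """True if year falls inside the inclusive span; None bounds are open."""
    return (start is None or start <= year) and (end is None or year <= end)


def names_in_year(entry: GazetteerEntry, year: int) -> List[str]:
    """Return every name, in any script, that was in use in the given year."""
    return [t.name for t in entry.toponyms
            if _covers(t.start_year, t.end_year, year)]


def footprint_in_year(entry: GazetteerEntry, year: int):
    """Return the first footprint valid in the given year, or None."""
    for fp in entry.footprints:
        if _covers(fp.start_year, fp.end_year, year):
            return fp.bbox
    return None


# Illustrative values only (assumed for this sketch).
entry = GazetteerEntry(
    entry_id="example-1",
    feature_type="city",
    toponyms=[
        Toponym("北平", "zh", "Hani", 1928, 1948),
        Toponym("Beiping", "en", "Latn", 1928, 1948),
        Toponym("北京", "zh", "Hani", 1949, None),
        Toponym("Beijing", "en", "Latn", 1949, None),
    ],
    footprints=[
        Footprint((116.2, 39.8, 116.5, 40.0), 1928, 1948),
        Footprint((115.4, 39.4, 117.5, 41.1), 1949, None),
    ],
)

print(names_in_year(entry, 1935))
print(footprint_in_year(entry, 2000))
```

Attaching a validity span to each name and each footprint, rather than to the entry as a whole, is one way to let a single record answer "what was this place called, and where was it, in a given year".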

A small grant will support three necessary tasks for enabling interoperability between gazetteers, texts, and maps in a distributed environment:

  • Standards validation, enhancement, and development. Gazetteer content standards need to be tested on real global, historical and cultural data, and enhanced as necessary to support the international exchange of gazetteer data. Gazetteer metadata standards and protocols must be developed to allow interoperability among gazetteers and toponym-rich texts in diverse languages and formats.
  • Infrastructure design and testing. We will create a multilingual union gazetteer prototype by importing XML records from several gazetteers in Chinese and English along with qualitative textual information about those places. These records will be used to establish a Unicode-enabled database to link and enhance gazetteer and text records.
  • GIS visualization. The creation of metadata for the union gazetteer database will enable the data in the enhanced records to be viewed using the time and space visualization tools of the Electronic Cultural Atlas Initiative (ECAI) and to be linked to other globally distributed records about the same places.
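As a rough sketch of the import step described in the second task, the following code reads XML records from two gazetteers and merges them into union records. The XML layout (`gazetteer`, `entry`, `name`, `lon`, `lat`), the function names, and the rule of matching places by coordinate proximity are all assumptions made for illustration; the proposal does not specify a schema or a matching method, and the sample coordinates are approximate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample exports; the element and attribute names are assumed.
ZH_XML = """<gazetteer source="zh-sample">
  <entry id="zh-1"><name lang="zh">杭州</name><lon>120.15</lon><lat>30.27</lat></entry>
</gazetteer>"""

EN_XML = """<gazetteer source="en-sample">
  <entry id="en-7"><name lang="en">Hangzhou</name><lon>120.16</lon><lat>30.28</lat></entry>
  <entry id="en-8"><name lang="en">Suzhou</name><lon>120.62</lon><lat>31.30</lat></entry>
</gazetteer>"""


def parse_entries(xml_text):
    """Parse one gazetteer export into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    source = root.get("source", "unknown")
    records = []
    for entry in root.findall("entry"):
        records.append({
            "source_id": f"{source}:{entry.get('id')}",
            "names": [(n.get("lang"), n.text) for n in entry.findall("name")],
            "lon": float(entry.findtext("lon")),
            "lat": float(entry.findtext("lat")),
        })
    return records


def build_union(record_lists, tolerance=0.05):
    """Merge records whose coordinates lie within `tolerance` degrees.

    Proximity matching is a deliberately naive stand-in for whatever
    record-linkage rule a real union gazetteer would use.
    """
    union = []
    for records in record_lists:
        for rec in records:
            match = next(
                (p for p in union
                 if abs(p["lon"] - rec["lon"]) <= tolerance
                 and abs(p["lat"] - rec["lat"]) <= tolerance),
                None,
            )
            if match is None:
                match = {"lon": rec["lon"], "lat": rec["lat"],
                         "names": {}, "sources": []}
                union.append(match)
            for lang, name in rec["names"]:
                names = match["names"].setdefault(lang, [])
                if name not in names:
                    names.append(name)
            match["sources"].append(rec["source_id"])
    return union


union = build_union([parse_entries(ZH_XML), parse_entries(EN_XML)])
for place in union:
    print(place["names"], place["sources"])
```

Each union record keeps the identifiers of its source records, so an enhanced entry can still be traced back to the gazetteers that contributed to it.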

Extending Existing Research

Several organizations working on related projects have already agreed to participate in meetings and to collaborate in the exchange of digital materials in a testbed environment. The research, based at UC Berkeley, will be carried out in consultation with a global community of scholars.

  • The Alexandria Digital Library (ADL) (www.alexandria.ucsb.edu). The ADL Gazetteer currently contains more than 4.2 million entries with worldwide coverage. The contents are based primarily on data from two U.S. federal government gazetteers, which emphasize named features that appear on topographic maps rather than historical and cultural materials. ADL developed a Gazetteer Content Standard and a Feature Type Thesaurus to support its gazetteer development and to encourage the growth of standards-based gazetteers and interoperability among distributed gazetteer services. The standards developed under this grant will enhance future developments by ADL.
  • The Electronic Cultural Atlas Initiative (ECAI) (www.ias.berkeley.edu/ecai). ECAI is developing a globally distributed temporo-spatial library of cultural and historical resources with a centralized metadata catalogue and a GIS viewer. The development of gazetteer reference systems will enable ECAI users and project developers to conduct queries across alternative or fluid toponyms and to contribute new data to gazetteers. It will also free humanities scholars with little or no training in geography from the need to determine the geospatial location of the places they study.
  • Academia Sinica Computing Center (http://www.ascc.net/center/index_e.html). Academia Sinica is providing global access to a corpus of 2,500 years of Chinese historical writing through their Scripta Sinica project. The project currently amounts to 300 million Chinese characters. They have recently embarked on the development of a historical and contemporary gazetteer of China containing over 70,000 historical names and an additional 5,000 contemporary names. The research carried out under this grant will contribute to their goal of linking these two projects.

Subsequent Research

Gazetteer standards enhancement, gazetteer metadata standards development, and prototype linkage of diverse gazetteers and text records in a multilingual environment can be accomplished in one year with a grant of $100,000. These are valuable achievements on their own. However, these activities will realize their maximum potential through their application to subsequent projects. This small grant enables us to establish the necessary preconditions for developments of the following kinds:

  • Linking texts, gazetteers, and maps in a distributed environment. Moving beyond the database implementation described above, subsequent developments in multilingual gazetteer metadata will link the information about places found in distributed resources, even if those resources are not structured according to a uniform gazetteer record standard.
  • Creating topical indexes about places. Having prototyped the capacity to associate toponyms with the texts in which they are mentioned, we intend to link gazetteer research to ongoing developments in the automated creation of topical indexes. This will make it possible to conduct queries about places by subject matter and to create second-level thematic gazetteers on the basis of texts that name places in conjunction with people, events, or any other topic.
  • Geographical access to library catalogues. Geographical access to bibliographies and online library catalogues is currently only weakly supported, even though geographical information is provided in several parts of the internationally accepted MARC format. Ordinarily, searching is supported only for title words (20X-24X) and subject heading words (650). Gazetteer research will enable sophisticated geographical access using these clues, together with visualization in GIS (e.g., "Map the geographical spread through time of publishing on this topic" or "Find works on folklore within the area in which this language is spoken"). Such refinements can now be designed, but they will become feasible only if gazetteers can be brought to the requisite degree of completeness and standardization.
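The kind of geographical catalogue access described in the last item can be sketched as follows. This is a minimal illustration under stated assumptions: catalogue records are represented as plain dictionaries keyed by MARC tag rather than parsed from real MARC data, the gazetteer is a hypothetical name-to-bounding-box table with approximate values, only single-word place names are matched, and all function names are invented for the example.

```python
import re

# Hypothetical gazetteer: place name -> (west, south, east, north), approximate.
GAZETTEER = {
    "Kyoto": (135.6, 34.9, 135.9, 35.1),
    "Lisbon": (-9.3, 38.6, -9.0, 38.8),
}

# Hypothetical catalogue records, simplified to tag -> list of field strings.
RECORDS = [
    {"245": ["Folklore of Kyoto"], "650": ["Folklore"]},
    {"245": ["A history of Lisbon"], "650": ["Cities and towns"]},
    {"100": ["Kyoto, A."], "245": ["Collected essays"]},
]

SUBJECT_TAGS = {"650"}  # subject heading tag named in the text


def is_title_tag(tag):
    """True for tags in the 20X-24X title range."""
    return len(tag) == 3 and tag.isdigit() and 200 <= int(tag) <= 249


def searchable_words(record):
    """Collect lower-cased words from title and subject fields only."""
    words = set()
    for tag, values in record.items():
        if is_title_tag(tag) or tag in SUBJECT_TAGS:
            for value in values:
                words.update(w.lower() for w in re.findall(r"\w+", value))
    return words


def places_for_record(record, gazetteer):
    """Return gazetteer names that appear among the record's searchable words."""
    words = searchable_words(record)
    return sorted(name for name in gazetteer if name.lower() in words)


def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def records_in_area(records, gazetteer, area):
    """Return records that mention at least one place overlapping `area`."""
    return [
        rec for rec in records
        if any(intersects(gazetteer[name], area)
               for name in places_for_record(rec, gazetteer))
    ]


JAPAN_AREA = (129.0, 30.0, 146.0, 46.0)  # rough illustrative query box
for rec in records_in_area(RECORDS, GAZETTEER, JAPAN_AREA):
    print(rec["245"][0])
```

The third sample record shows why the choice of fields matters: a place name that appears only in an author field (tag 100) is ignored, because only title and subject words are searched.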

