Database and Software Development

The Database

The STEDT database is online at:

http://stedt.berkeley.edu/search

Background

To store and analyze the volume of data involved in the STEDT project, we have developed a large relational database and a number of software tools to manage it.

The database consists of a lexical file, an etyma file, a language file, a source bibliography file, and a number of linking files relating them. Lexical data has been loaded into the database from hundreds of Sino-Tibetan languages. Data sources range from published dictionaries and wordlists to questionnaires solicited from field researchers. Most of the data has been entered manually by STEDT personnel and loaded into the lexical file after extensive proofreading.
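The layout described above can be pictured as a handful of related tables. The following is only a minimal sketch in Python with SQLite, not the actual STEDT schema: every table name, column name, and row value is a hypothetical placeholder chosen for illustration, and a single linking table stands in for the several linking files mentioned above.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the names used in the real STEDT database.
SCHEMA = """
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sources   (id INTEGER PRIMARY KEY, citation TEXT NOT NULL);
CREATE TABLE etyma (
    id    INTEGER PRIMARY KEY,
    level TEXT NOT NULL,   -- level of reconstruction, e.g. 'PST' or 'PTB'
    form  TEXT NOT NULL,
    gloss TEXT NOT NULL
);
CREATE TABLE lexicon (
    id          INTEGER PRIMARY KEY,
    language_id INTEGER NOT NULL REFERENCES languages(id),
    source_id   INTEGER NOT NULL REFERENCES sources(id),
    form        TEXT NOT NULL,
    gloss       TEXT NOT NULL
);
-- Linking table: relates lexical records to proposed roots.
CREATE TABLE lexicon_etyma (
    lexicon_id INTEGER NOT NULL REFERENCES lexicon(id),
    etymon_id  INTEGER NOT NULL REFERENCES etyma(id),
    PRIMARY KEY (lexicon_id, etymon_id)
);
"""


def create_sketch_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_placeholder_rows(conn):
    """Insert placeholder rows; none of these values are real STEDT data."""
    conn.execute("INSERT INTO languages (id, name) VALUES (1, 'Language A')")
    conn.execute("INSERT INTO sources (id, citation) VALUES (1, 'Source X')")
    conn.execute(
        "INSERT INTO etyma (id, level, form, gloss) "
        "VALUES (1, 'PTB', 'root-1', 'GLOSS')"
    )
    conn.execute(
        "INSERT INTO lexicon (id, language_id, source_id, form, gloss) "
        "VALUES (1, 1, 1, 'form-1', 'GLOSS')"
    )
    conn.execute("INSERT INTO lexicon_etyma (lexicon_id, etymon_id) VALUES (1, 1)")
    conn.commit()


def records_for_etymon(conn, etymon_id):
    """Return (form, language, source) for each record linked to a root."""
    query = """
        SELECT l.form, g.name, s.citation
        FROM lexicon_etyma AS le
        JOIN lexicon   AS l ON l.id = le.lexicon_id
        JOIN languages AS g ON g.id = l.language_id
        JOIN sources   AS s ON s.id = l.source_id
        WHERE le.etymon_id = ?
        ORDER BY l.id
    """
    return conn.execute(query, (etymon_id,)).fetchall()
```

Keeping the links in their own table, rather than as a column on the lexical record, lets one record point to more than one root and lets a root be queried for all of its linked records with a single join.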

The etyma file now contains over 3,000 proposed roots at all levels of reconstruction, from Proto-Sino-Tibetan (PST) and Proto-Tibeto-Burman (PTB) down to the parent languages of individual branches. The vast majority of these roots are new, in that they were either discovered at the STEDT project or are refinements of previously posited etyma.

Researchers who would like to contribute data are encouraged to consult this page for instructions.

Tagging

Several software tools have been developed to address the problem of analyzing and etymologizing the vast amounts of data in the STEDT database. The most important of these tools is the online tagging interface.

In STEDT parlance, to “tag” is to associate a lexical record with the root from which it is deemed to descend. The current online tagging interface grew out of the “Tagger’s Assistant,” a small FoxPro program we developed that made it possible to tag etyma directly in the lexical file, facilitating the etymologization process. The online tagging interface provides numerous views of the lexical records and supports grouping and searching them by various criteria, such as gloss, phonological shape, original data source, or language. The software does not make any decisions about etymologization; it simply provides an environment in which the work of etymologizing can be carried out with speed, efficiency, and accuracy.
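The idea of tagging can be illustrated with a small Python sketch. It is not the code behind the online interface: the class, function names, and sample values are all hypothetical, and searching by phonological shape is left out. In keeping with the description above, the code only records a decision made by the researcher and never proposes an etymology itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LexicalRecord:
    """A hypothetical lexical record; field names are illustrative only."""
    record_id: int
    language: str
    source: str
    form: str
    gloss: str
    etymon_ids: list = field(default_factory=list)


def tag(record, etymon_id):
    """Record the researcher's decision that `record` descends from a root.

    The function makes no judgment of its own; it only stores the link,
    and ignores a tag that is already present.
    """
    if etymon_id not in record.etymon_ids:
        record.etymon_ids.append(etymon_id)


def group_by(records, criterion):
    """Group records by one attribute, e.g. 'gloss', 'language', 'source'."""
    groups = defaultdict(list)
    for record in records:
        groups[getattr(record, criterion)].append(record)
    return dict(groups)


def untagged(records):
    """Return the records that have not yet been linked to any root."""
    return [record for record in records if not record.etymon_ids]
```

A typical session under these assumptions would group the records by gloss, inspect one group, call `tag` on the records judged to share a root, and then use `untagged` to see what remains to be etymologized.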

STEDT Font

To accommodate all the orthographies of our source transcriptions, a special font was developed in the early years of the project. Known as the STEDT font, it includes the letters of the Roman alphabet, a large number of IPA and other phonetic symbols, and various symbols and diacritics found in sources on East and Southeast Asian languages. This font is now deprecated but is still available for download from our STEDT font page.