To store and analyze the large quantities of data involved in the STEDT project, we have developed a large relational database and a number of software tools to manage it.
The database consists of a lexical file, an etyma file, a language file, a source bibliography file, and a number of linking files relating them. Lexical data has been loaded into the database from hundreds of Sino-Tibetan languages. Data sources range from published dictionaries and wordlists to questionnaires solicited from field researchers. Most of the data is entered manually by STEDT personnel and loaded into the lexical file only after extensive proofreading. A small amount of data made available to us on computer disk can be loaded directly.
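The file layout described above can be pictured as a small relational schema. The following is only an illustrative sketch in Python with SQLite, not the project's actual FoxPro structure: every table and column name (`languages`, `sources`, `etyma`, `lexicon`, `lexicon_etyma`) is a hypothetical stand-in chosen for this example.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables mirroring the files named in the text."""
    conn.executescript(
        """
        -- language file
        CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- source bibliography file
        CREATE TABLE sources (id INTEGER PRIMARY KEY, citation TEXT NOT NULL);
        -- etyma file: one row per proposed root
        CREATE TABLE etyma (
            id INTEGER PRIMARY KEY,
            level TEXT NOT NULL,   -- level of reconstruction, e.g. 'PST' or 'PTB'
            form TEXT NOT NULL,
            gloss TEXT
        );
        -- lexical file: one row per attested form
        CREATE TABLE lexicon (
            id INTEGER PRIMARY KEY,
            language_id INTEGER NOT NULL REFERENCES languages(id),
            source_id INTEGER NOT NULL REFERENCES sources(id),
            form TEXT NOT NULL,
            gloss TEXT NOT NULL
        );
        -- linking file relating lexical records to etyma (assumed layout)
        CREATE TABLE lexicon_etyma (
            lexicon_id INTEGER NOT NULL REFERENCES lexicon(id),
            etymon_id INTEGER NOT NULL REFERENCES etyma(id),
            PRIMARY KEY (lexicon_id, etymon_id)
        );
        """
    )


def table_names(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the links in a separate table, rather than as a column in the lexical table, is one plausible reading of "linking files"; the real database may organize this differently.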
The etyma file now contains over 1,900 proposed roots at all levels of reconstruction, from PST and PTB down to parent languages of individual branches. The vast majority of these roots are new, in that they have been discovered at the STEDT project or are refinements of previously posited etyma.
Several software tools have been developed to address the problem of analyzing and etymologizing the vast amounts of data in the STEDT database. The most important of these tools is the "Tagger's Assistant".
In STEDT parlance, to "tag" is to associate a lexical record with the root from which it is deemed to descend. The "Tagger's Assistant" is a small FoxPro program that accesses the database for purposes of etymologization. Using the Tagger's Assistant, it is possible to tag etyma directly in the lexical file, facilitating the etymologization process.
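As a rough illustration of what tagging amounts to as a data operation, the sketch below records an analyst's decision by storing a root's identifier on a lexical record. It is written in Python rather than FoxPro, and the names `LexicalRecord`, `tag`, and `untagged`, as well as all of the sample values, are hypothetical placeholders rather than anything taken from the STEDT software or data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LexicalRecord:
    record_id: int
    language: str
    form: str
    gloss: str
    etymon_id: Optional[int] = None  # None means "not yet tagged"


def tag(record, etymon_id, etyma):
    """Associate a record with a root chosen by the analyst.

    The function only stores the decision; it never picks a root itself.
    """
    if etymon_id not in etyma:
        raise KeyError(f"unknown etymon id: {etymon_id}")
    record.etymon_id = etymon_id


def untagged(records):
    """Return the records that still await etymologization."""
    return [r for r in records if r.etymon_id is None]


# Placeholder data only; these are not real forms or reconstructions.
etyma = {1: "*root-1", 2: "*root-2"}
records = [
    LexicalRecord(10, "Language A", "form-1", "gloss-1"),
    LexicalRecord(11, "Language B", "form-2", "gloss-1"),
]
tag(records[0], 1, etyma)
```

The sketch assumes one root per record, following the wording above; a record that needed several tags would call for a linking structure instead of a single field.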
The Tagger's Assistant provides numerous views of the lexical records and facilitates the grouping and searching of records by various criteria, such as gloss, phonological shape, original data source, or language. In the view of the lexical data file shown here, records are displayed ordered by the GLOSS field.
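The kinds of views described above (ordering, grouping, and searching by a chosen field) can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the Tagger's Assistant itself: the field names, the helper functions `order_by`, `group_by`, and `search`, and the sample records are all invented for illustration.

```python
from itertools import groupby

# Placeholder records; the forms, languages, and sources are not real data.
RECORDS = [
    {"language": "Language A", "source": "Source 1", "form": "form-1", "gloss": "water"},
    {"language": "Language B", "source": "Source 2", "form": "form-2", "gloss": "fire"},
    {"language": "Language A", "source": "Source 2", "form": "form-3", "gloss": "fire"},
]


def order_by(records, field):
    """Return the records sorted by one field, e.g. the gloss."""
    return sorted(records, key=lambda r: r[field])


def group_by(records, field):
    """Group records sharing the same value of a field."""
    ordered = order_by(records, field)  # groupby needs sorted input
    return {
        key: list(group)
        for key, group in groupby(ordered, key=lambda r: r[field])
    }


def search(records, **criteria):
    """Return the records matching every field=value criterion exactly."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]
```

For example, `order_by(RECORDS, "gloss")` gives a gloss-ordered view, and `search(RECORDS, gloss="fire", language="Language A")` narrows it to a single language. Searching by phonological shape would need pattern matching on the form, which this sketch leaves out.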
The Tagger's Assistant shortcuts the tedious steps normally associated with using FoxPro by providing a point-and-click interface to common database functions (shown below). The computer software does not make any decisions about etymologization; it simply provides an environment in which the work of etymologizing can be carried out with speed, efficiency, and accuracy.
In order to accommodate all the orthographies of our source transcriptions, a special font was developed in the early years of the project. Known as the STEDT font, it includes the letters of the Roman alphabet, a large number of IPA and other phonetic symbols, and various symbols and diacritics found in sources on East and Southeast Asian languages. This font is available for downloading from our STEDT font page.
National Science Foundation, Division of Behavioral & Cognitive Sciences, Linguistics, Grant Nos. BNS-86-17726, BNS-90-11918, DBS-92-09481, FD-95-11034, SBR-9808952, and BCS-9904950.
National Endowment for the Humanities, Division of Research Programs, Grant Nos. RT-20789-87, RT-21203-90, RT-21420-92, PA-22843-96, PA-23353-99, and PA-24168-02.
STEDT
University of California
Department of Linguistics
1203 Dwinelle Hall
Berkeley, CA 94720-2650
Send comments to stedt@socrates.berkeley.edu.
This page was designed by David Mortensen, based on content produced by John B. Lowe, Ju Namkung, and Richard Cook. The STEDT elephant logo was designed by Nadja R. Matisoff.