The Sino-Tibetan (ST) languages constitute one of the great language families of the world. Yet despite the rapid development of Sino-Tibetan studies in recent decades, they remain poorly understood; indeed, some branches of the family are still virtually unstudied. The Sino-Tibetan Etymological Dictionary and Thesaurus (STEDT) project was launched in August 1987 to fill this gap in historical linguistic research. Funded jointly by the National Endowment for the Humanities and the National Science Foundation, it is charged with gathering data from both existing published works and the growing body of fieldwork, and with integrating them into a coherent whole. The concrete goal of the project is the publication of a multi-volume etymological dictionary of the ancestor language, Proto-Sino-Tibetan (PST), spoken perhaps 6,000 years ago. Because the Dictionary is organized by semantic fields, it will also constitute a "thesaurus" of PST.
To support our research we have created a large computer database of etymological and lexical information. This database consists of a lexical file (currently containing over 376,000 words in about 200 languages and dialects), an etyma file (containing over 2,500 reconstructed etymological roots), and ancillary files containing references to language names, bibliographic citations, extensive etymological notes, and semantic diagrams.
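As a rough illustration of how a lexical file and an etyma file of this kind might be linked, here is a minimal Python sketch. It is not the project's actual schema: the class names, field names, and the idea of tagging each morpheme with an etymon ID are assumptions made for the example, and all forms and glosses are made-up placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Etymon:
    """One reconstructed root in a hypothetical etyma file."""
    etymon_id: int
    protoform: str
    gloss: str


@dataclass
class LexicalRecord:
    """One word in a hypothetical lexical file."""
    language: str
    form: str
    gloss: str
    # One slot per morpheme in the word (assumed layout);
    # None means that morpheme has not been etymologized yet.
    morpheme_tags: List[Optional[int]] = field(default_factory=list)


def count_etymologized(records: List[LexicalRecord]) -> int:
    """Count the morphemes that have been assigned to some etymon."""
    return sum(
        1
        for record in records
        for tag in record.morpheme_tags
        if tag is not None
    )


# Placeholder data only; none of these forms are real.
etyma = {1: Etymon(1, "*ROOT-A", "placeholder gloss A")}
lexicon = [
    LexicalRecord("Language X", "form-one", "gloss A", [1]),
    LexicalRecord("Language Y", "form-two-three", "gloss A compound", [1, None]),
    LexicalRecord("Language Z", "form-four", "gloss B", [None]),
]

print(count_etymologized(lexicon))
```

Tagging at the level of the morpheme rather than the whole word lets a compound be partly analyzed, with one element linked to a root while the other remains open.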
At this point, approximately 45,000 morphemes in the lexical file have been etymologized into cognate sets. This analyzed data forms the core of Volume I of the Dictionary. The reconstructed proto-roots will serve as main entries in the Dictionary, followed by those words which derive from them in the modern ST languages. Along with the Dictionary, electronic and hardcopy versions of materials supporting the reconstructions have been or eventually will be published, including a directory of language names, a compendium of phonological inventories, and detailed sound correspondences in the form of a "sound law database." In due course, the entire etymological database will also be made available in machine-readable form to the scientific community.
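The entry layout described here, with proto-roots as main entries followed by their modern reflexes and the whole arranged by semantic field, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function name `build_entries`, and the field labels are hypothetical, and the forms are placeholders rather than actual reconstructions.

```python
from collections import defaultdict

# Placeholder etyma: each has an ID, a proto-form, and a semantic field.
etyma = [
    {"id": 1, "protoform": "*ROOT-A", "field": "body parts"},
    {"id": 2, "protoform": "*ROOT-B", "field": "body parts"},
    {"id": 3, "protoform": "*ROOT-C", "field": "kinship"},
]

# Placeholder reflexes: (etymon ID, language, modern form).
reflexes = [
    (1, "Language X", "form-a1"),
    (1, "Language Y", "form-a2"),
    (3, "Language X", "form-c1"),
]


def build_entries(etyma, reflexes):
    """Group reflexes under their proto-root, then roots under their field."""
    by_root = defaultdict(list)
    for etymon_id, language, form in reflexes:
        by_root[etymon_id].append((language, form))

    by_field = defaultdict(list)
    for etymon in etyma:
        by_field[etymon["field"]].append(
            {
                "protoform": etymon["protoform"],
                # Roots with no attested reflexes get an empty list.
                "reflexes": by_root.get(etymon["id"], []),
            }
        )
    return dict(by_field)


entries = build_entries(etyma, reflexes)
for semantic_field, roots in entries.items():
    print(semantic_field)
    for root in roots:
        print("  ", root["protoform"], root["reflexes"])
```

Keying the top level on semantic field rather than on alphabetical order of proto-forms is what gives a dictionary of this shape its thesaurus-like character.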
STEDT publications and data are already helping to unravel the tangled web of genetic relationships among languages in South and Southeast Asia. The results of our research on Sino-Tibetan, taken together with ongoing work on the other major language families of South and Southeast Asia (Mon-Khmer, Austronesian, Tai-Kadai, Hmong-Mien), will help to decide larger questions about the ultimate genetic affiliations (if any) existing among them. In addition, it is our hope that one of the ultimate benefits of the STEDT project will be to put Sino-Tibetan studies on a par with Indo-European studies in terms of their potential contributions to a wide range of general issues in historical linguistics. In many ways the problems faced by diachronic linguists are similar everywhere. Yet each area of the world has its own typological, cultural, and historical flavor, and Sino-Tibetan has many unique features which will be of interest to the general linguistic public.