01-26-2024, 06:08 PM
(This post was last modified: 01-26-2024, 07:37 PM by okarinaofsteiner.)
https://genoplot.com/discussions/topic/1...readings/1
Ryukendo Wrote: The Sino-Tibetan language family is one of the largest in the world and has one of its most extensive literatures. It is however shrouded in mystery, and is one of the least well-reconstructed of all the major language families. Among these languages, Chinese, Tibetan (which has many varieties), Newar in Nepal, and Burmese in Burma have extensive written records; some other written languages like Tangut or Zhangzhung are now extinct. Most other languages, which are spoken by ethnicities organized in small independent kingdoms or chiefdoms, down to horticulturalists and hunter-gatherer bands (in the Burma-India-China border area or along the South Asian side of the Himalayan foothills), or by ethnic groups under Chinese, Tibetan, Burmese, Bhutanese or Nepali control, were written down only in the 18th-20th centuries, and many of the languages are somewhat poorly documented or were even only recently discovered, including many of those in NE India, N Burma, and Nepal. There are many difficulties in working with this material, not least because many of the words in ST languages are very short (often monosyllabic), and some languages, such as Chinese, have become very poor in morphology (no inflections, derivations etc.) and even consonant-poor.
Nevertheless, work continues. The major source of information about ST languages is the Sino-Tibetan Etymological Database and Thesaurus (STEDT): https://stedt.berkeley.edu, which grew out of work begun at UC Berkeley in the 1930s and is a prelude to much future work. The first work meant to come from this project was never published, and the first attempt at doing anything with the (by then large) accumulation of data came out only in the 1970s, by Benedict. By this time, it was clear that these languages were related, but the subgroups of ST were not known, so not even the proto-stages of those subgroups had been reconstructed. You can imagine how difficult it was to reconstruct proto-ST--think about reconstructing proto-Indo-European from modern English, Sinhala, Albanian, Armenian etc.! There were few definite sound laws, though there were family resemblances all over the place (e.g. a c in one language becoming c-h, g or x or h in others, but in a highly irregular and unpredictable way). Nevertheless, we know at least the following: proto-Sino-Tibetan was quite inflecting, with many prefixes, suffixes and infixes. It was mostly monosyllabic, and because of the inflection often had very complex consonant clusters.
Today, people are embarking on the slow and painstaking work of collecting ever more data from less-known languages, and reconstructing subgroups of ST in individual papers, books and dissertations, such as proto-Bodic (all the varieties and languages related to Tibetan), or proto-Karen. Furthermore, many subgroups are now recognized: Rgyalrong, Himalayish, Qiangic, Lolo-Burmese etc., though there are far too many subgroups (many more than the primary branchings of Indo-European) and their deeper interrelationships are unclear, resulting in a family tree that is still far too bushy and disorganized.
Recently, major breakthroughs have taken place that have clarified the pattern of spread for this language family.
First, two breakthrough papers have been published in ST linguistics, both using Bayesian phylogenetic analysis to automatically classify languages using linguist-provided cognate sets. The two papers are https://www.pnas.org/doi/10.1073/pnas.1817972116 and https://www.nature.com/articles/s41586-019-1153-z.
An excellent picture of the distribution of ST languages can be found in the supplements for the first paper, page 3: https://www.pnas.org/doi/10.1073/pnas.18...-materials
I know many people are skeptical of such analytic methods, but the main point here is not that Bayesian Phylogenetic methods can confirm language groupings--this still requires the painstaking work of manually identifying regular sound changes--but that such methods can give us promising hypotheses that linguists can then investigate further. Furthermore, if some linguists propose a subgrouping based on some preliminary patterns they notice, the confirmation of such groupings using these methods further highlights their viability as targets of research. If multiple groups of researchers using different sets of cognates and different methodologies independently recover the same subgroupings, this may serve as strong (albeit not conclusive) evidence that these groupings are legitimate.
Another set of issues lies in the quality and size of the data. Here, the first paper relies on the STEDT data, the gold standard for ST research, and the second paper additionally uses wordlists provided by professional linguists working on the languages themselves. Bayesian phylogenetic inference requires the researchers to make a number of technical choices about parameters, including a model for how words appear and disappear over time, how much borrowing there is, how frequently regular or irregular sound changes occur, and how likely it is in general that trees of different shapes occur in language families (the tree prior). Ideally, an analysis would explore a wide range of parameters, inference methods and priors and show that the results are robust to these (which theoretically, given enough data, they should be). These issues are discussed extensively here: http://www.sfs.uni-tuebingen.de/~yanovic...9-subm.pdf, where a paper using such methods to place Yeniseian as a sub-branch of Na-Dene is extensively criticized because its results were not robust to the choice of parameters, implying that the dataset was not actually large enough and of high enough quality to perform robust inference. The supplements of the Sino-Tibetan papers linked above all demonstrate high levels of convergence regardless of the parameters chosen, so this is not an issue for the conclusions they make.
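To make the robustness point concrete, here is a minimal Python sketch of the kind of check being described: take the posterior support each run assigns to a grouping and see how much it moves when the priors change. The run names, grouping labels, support values and the 0.1 tolerance are all invented for illustration; none of them come from either paper or from any particular software.

```python
def support_range(runs, clade):
    """Return (min, max) posterior support for `clade` across all runs."""
    values = [run.get(clade, 0.0) for run in runs.values()]
    return min(values), max(values)


def is_robust(runs, clade, tolerance=0.1):
    """Call a grouping robust if its support varies by at most `tolerance`.

    The tolerance is an arbitrary illustrative threshold, not a standard.
    """
    low, high = support_range(runs, clade)
    return high - low <= tolerance


# Hypothetical support values under three hypothetical prior settings.
runs = {
    "prior_A": {"crown_group": 0.99, "sinitic_first": 0.55},
    "prior_B": {"crown_group": 0.97, "sinitic_first": 0.30},
    "prior_C": {"crown_group": 0.98, "sinitic_first": 0.62},
}

for clade in ("crown_group", "sinitic_first"):
    low, high = support_range(runs, clade)
    print(clade, low, high, is_robust(runs, clade))
```

In this toy input the first grouping barely moves across settings while the second swings widely, which is the pattern that would make you trust the former and treat the latter as unresolved.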
Something to mention here is that Bayesian inference allows us to assign a probability to each of multiple models, giving us a natural way to represent uncertainty. In the Indo-European case it thus often infers nuclear IE as a mixture of trees of different shapes, represented by a densitree (plotting all sampled trees with intensity proportional to the posterior probability of the tree being correct). We get the same issue for high-level groupings of ST here: this is not telling us about “language mixture”, but rather about uncertainty. A set of densitrees is provided for the second paper in the supplementary materials. One can think about the densitree as representing how likely linguists are to find regular sound correspondences or cognate sets conclusively showing that one grouping is correct over another if they went through things with a fine-toothed comb; the more intense the densitree for a particular grouping, the more likely. Of course, these Bayesian analyses were done over lexicon (the cognacy of words). Much more could be done using shared paradigmatic morphology; Yeniseian and Dene were connected using reconstructed paradigmatic morphology, but share very little lexicon, so this kind of Bayesian analysis would not be able to detect that.
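As a rough illustration of where a "posterior probability around 1" for a grouping comes from: the support for a clade is simply the fraction of sampled trees in which that set of languages forms a single branch. The Python sketch below assumes rooted trees written as nested tuples, and the four-tree "posterior sample" is made up for the example, not taken from either paper.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set of `tree`, set of all clades found inside it).

    A tree is either a leaf name (str) or a tuple of subtrees.
    A clade is the frozenset of leaf names under one internal node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def clade_support(sample):
    """Map each clade to the fraction of sampled trees that contain it."""
    counts = Counter()
    for tree in sample:
        _, found = clades(tree)
        counts.update(found)
    return {clade: n / len(sample) for clade, n in counts.items()}


# Invented toy sample: the deep branching order varies, one grouping never does.
sample = [
    ("Sinitic", ("Sal", ("Bodic", "Lolo-Burmese"))),
    ("Sinitic", ("Sal", ("Bodic", "Lolo-Burmese"))),
    ("Sal", ("Sinitic", ("Bodic", "Lolo-Burmese"))),
    (("Sinitic", "Sal"), ("Bodic", "Lolo-Burmese")),
]

support = clade_support(sample)
print(support[frozenset({"Bodic", "Lolo-Burmese"})])         # 1.0
print(support[frozenset({"Sal", "Bodic", "Lolo-Burmese"})])  # 0.5
```

A densitree is a picture of the same information: groupings present in every sampled tree draw as one sharp line, while the conflicting deep branches smear out.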
Strikingly, these two papers independently recover at least 6 groups of ST languages:
- A large group that includes Bodic (Tibetan and all its varieties, including the prestige language spoken in Bhutan--Dzongkha, and Sikkimese), Lolo-Burmese (Burmese needs no introduction; Lolo includes many populations in S Yunnan, N Burma and NW Thailand we find in genetic studies such as Jinuo, Hani, and Lahu which are distinct from other mainland Southeast Asians in tending to peak in Yellow River Neolithic and the "Austroasiatic" Vietnam_N component and be lacking in Dai-related, Hmong-related and Austronesian-related components), Rgyalrong, Qiangic and Naic (including pastoralists and agriculturalists of the Tibetan foothills on the Chinese side of Tibet from Qinghai down to Sichuan, featuring such populations as Naxi from genetic studies, ancient Tangut speakers who founded the Xixia state, and other warlike tribal polities the Chinese have known generically as "Qiang" in their history), plus a range of assorted groups including Nuosu, Ersu and so on. This large group has not really been emphasized in previous studies (though Blench and Post 2014 placed a bunch of these together on their tree), receives very high support in both studies (posterior probabilities around 1), and is the first long-range subgroup proposal for ST--and a very major one--that I think will stand the test of time.
- A major "Sal" group (from the word for Sun), that has only recently been proposed, that unites all Bodo-Garo tribals distributed throughout the Brahmaputra valley in NE India and W Burma, with some groups in far N Burma and S China, is also recovered with high probability in both papers.
- A major Kuki-Chin-Naga group, also only tentatively proposed so far, uniting some peoples of Nagaland in far Eastern India and populations all along Western Burma, is also supported in both studies. Karen languages (included only in the second paper's analyses), a major group of languages spoken by some militarily powerful ethnic groups on the border of Burma and Thailand, may also be in this group.
- A group uniting two sets of languages in far southeastern Tibet, termed Tani-Yidu in one paper and Tani-Digarish in the other, has only recently been proposed by some linguists but also receives strong support in both papers.
- Himalayish, including most of the ST languages spoken in Nepal and important local prestige languages such as Newar, is recovered in both papers, and is known as the “Kiranti” group in the former and Himalayish in the latter.
- Chinese/Sinitic languages, though this was never in doubt.
There are also other small, independent groups recovered, especially in the second paper, which samples more languages, such as Nungish, Kinnauri and so on, though these groups have been well-established among linguists for some time.
Excitingly, the two papers agree that the large Bodic-Lolo-Burmese-centered group is a crown group for ST. Both papers also agree that Chinese, Sal, and Kuki-Chin-Naga tend to be the stem groups/basal branches for ST (first groups to branch off), followed by Tani-Digarish, followed by various Himalayan languages, followed by the ST crown group.
Both groups’ papers agree that Chinese/Sinitic is the most likely outgroup to all other ST languages, but only weakly so: it approaches a toss-up between Chinese and the other basal groups. The shape of the overall tree is therefore much less certain than for IE (where we know for sure the split order is Anatolian, then Tocharian, then nuclear IE); however, the presence of some long-range groups, and the positioning of various groups relative to the crown group, is starting to get clearer. This is a surprising level of convergence between independent researchers!
In addition, both groups recover support for the traditional Yellow River homeland for ST languages, in the first paper's case supported by cognates for foxtail millet, pigs and sheep in both Chinese and non-Chinese branches of the family, but less so for other domesticates (such as rice, wheat, barley, horse, or cow).
The second advance is in ancient and present-day DNA, which has shown that the ST peoples expanded in two waves; after reaching the Upper Yellow River Valley, near the Qinghai region, ST speakers expanded both directly into the Tibetan plateau and in another wave southwards along the mountains and valleys, reaching Burma and wrapping around the middle elevations of the Tibetan foothills into Nepal and Kashmir. This is supported strongly by Y-chromosomal studies: https://link.springer.com/article/10.100...018-1461-2 and https://onlinelibrary-wiley-com.ezp-prod...11.00690.x, and by studies using ancient DNA, which find that present-day Tibetoburman populations derive from two waves of migration taking two different routes from the Upper Yellow River region:
https://www.nature.com/articles/s41467-0...7-2#MOESM1
The Y-chromosomes recovered from ancient DNA confirm the role of the Upper Yellow River Yangshao-derived population in the genesis of Tibetoburmans along two routes, as seen on page 7 of the supplements: https://static-content.springer.com/...MOESM1_ESM.pdf
More and more autosomal analyses have also been published for present-day ST speakers, placing more and more groups into the academic and public domains, including this one on the Tibetan-Chinese border areas: https://www.cell.com/cell-reports/fullte...22)00481-8 and this one on Thailand, which has many ST groups in the far north, near the Burma-China border: https://academic.oup.com/mbe/article/38/8/3459/6255759.
How can the tree of ST languages, in which some widely separated groups (the “Sal” group, Kuki-Chin-Naga, and Sinitic) split off first and the Tibetan-Lolo-Burmese group last, be reconciled with the aDNA and present-day DNA data? What could have happened is that initial layers of deep diversity, which had spread far from the Upper Yellow River area, were overprinted by groups successively closer to the ST crown group expanding out of a region between Tibet, Burma, and China. Furthermore, the odd centre of gravity for diversity in this family (located around the N Burma, Tibet, India area) would be explained under the authors' scenarios: diversity around the Yellow River Valley to the east has been completely purged by the historical expansion of Chinese, and the diversity in the north purged by Tibetan, pushing the centre of gravity to the SW, closer to Burma and India.
There are a few more unresolved questions that I think can be fruitfully answered with future aDNA and linguistic work:
- Why are the distributions of non-YR ancestry in present-day ST groups so different? Why is it that Lolo-Burmese groups of the Southern Lolo Branch alone have a YR_N + Vietnam_N combination out of all populations in E Asia, while all other Tibetoburman populations of the Tibet-China-Burma borders have YR_N + Vietnam_N + “Tai” components? Is it because ancient Sichuan had only Vietnam_N, followed by the addition of YR_N and then of “Tai” and “Hmong”-type ancestries? This can be fruitfully answered using aDNA work.
- Are Yi, Naxi, and the middle-elevation groups along the Himalayas similar to YR_MN because they all have a little bit of the “Tai” component that higher-elevation Tibetans do not have? Note that YR_MN is a little bit more SE Asian-shifted than Upper_YR_LN because Upper_YR_LN received a bit of gene flow from N Asian HGs (“ANA”). Groups like Yi and Naxi clearly have a little bit of the “Tai” and “Hmong”-type ancestries, whether the rest came from Upper_YR_LN or YR_MN.
- Were there extremely ancient, pre-Neolithic (as in before the fully-fledged agropastoralist package) dispersals of ST languages into the Himalayas, Tibet and the Brahmaputra valley that were subsequently overprinted by later, more fully agriculturalist dispersals? The question arises especially because the estimated dates for ST divergence in both papers are a little too old, falling before the fully Neolithic package had appeared, when hunting and gathering were still important for the Neolithic populations of the Yellow River Valley. This is something both sets of researchers talk about, and is also addressed in this short paper that includes archaeologists of the Neolithic such as Ruth Mace: https://www.nature.com/articles/s41598-020-77404-4
- What about various “para-Sinitic” languages like Bai and Tujia, which were not included in the linguistic papers but whose affinity with Chinese has always been mysterious? Linguists have always wondered whether the close similarity of these languages to Chinese was because of massive layers of ancient loans from Old Chinese, or because they were long-lost relatives of the otherwise very lonely Sinitic family of languages.
About this last point: in recent decades, work has picked up again on some extremely intriguing languages discovered and described in the 1920s-1980s in N and W Guizhou. These languages include the Caijia, Longjia and Luren languages, which are all quite poorly documented and either extinct or on their way to extinction. These languages preserve extremely interesting similarities with Old Chinese, and may be the last vestiges of para-Sinitic ST languages. They may give us a tantalizing glimpse into a universe of diverse Sino-Tibetan languages in the YR Valley and surrounds that was wiped out by millennia of Chinese domination. Andreas Hölzl from Potsdam University has been trying to chase down all the documentation he can about these languages, buying volumes of field notes from antique shops (!): http://www.elpublishing.org/docs/1/20/ldd20_02.pdf. Since Caijia is still alive, he urges linguists to document as much Caijia as they can.
Future work on Bai, Waxiang, Caijia, Luren and Longjia may help supply an Eastern wing to a family that has hitherto been lacking it, and supply some lost siblings to the Sinitic branch of the family. I consider this to be some of the most exciting work happening on the linguistic side. Watch this space.
anti-racist on here for kicks and giggles
“If you want to grant your own wish, then you should clear your own path to it”
― Okabe Rintarou
“Never doubt that a small group of thoughtful, committed, citizens can change the world. Indeed, it is the only thing that ever has.”
― Margaret Mead