Wednesday, April 26, 2017

Nasality in a world of orality

Nasal leakage: It's a real condition affecting field linguists around the world.

Does the language you work on have nasal sounds? Rhetorical question! Of course, it does! Even if nasality is a non-contrastive feature of a language, air finds its way out through the schnoz one way or another during language production [WALS; nasality]. There's no stopping the all-knowing nose.

While the burning question on everyone's mind is, if the magnificent bone structure of Worf's forehead is connected to his nasal cavity, what kind of resonance can we expect, there's a more pressing issue here (yes, really).

A survey of 372[1] grammars listed on the Grammar Watch List of the Association for Linguistic Typology revealed that empirical bases for claims regarding nasality are virtually non-existent. In cases where nasality was described, the descriptions were based solely on impressionistic observations. The reasons for this in the past had often been just: equipment for analysing nasality is intrusive, expensive, delicate and not very portable, which is a particular problem in remote areas. Deciphering nasality (especially pre-/post-, anticipatory/carryover, regressive/progressive, allophonic nasalisation and nasal harmony) from a normal audio recording can either be impossible or a huge pain in the sa'Hut.

Now for the shameless plug. If nasality is something you'd like to document empirically in your grammar, consider the Earbuds Method: an absurdly cheap and easy, yet robust, method for empirically identifying not only the presence of nasality in the speech stream, but also the duration and timing of nasal gestures. Results can be visualised in Praat or in R (scripts are included in the appendix of the article).

The basics: 
1.) Purchase 'dollar store' stereo earbuds (a pair for each consultant).
2.) Set your audio recorder to stereo.
3.) Plug the earbuds into the mic jack.
4.) Place one earbud with the silicone tip facing upwards toward a nostril (or in the nostril if the consultant is comfortable with this). Place the other one at the corner of the mouth with the tip facing forward.
5.) Press record, elicit words or sentences. Press stop.  
6.) Import the recordings into Praat, follow the steps in the paper for visualisation, and see the magic.
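
The recording that comes out of steps 2-5 is an ordinary stereo file, so the two earbud signals can also be pulled apart and compared outside Praat. What follows is a minimal Python sketch, not the script from the article's appendix. It assumes a 16-bit PCM stereo WAV file, and it assumes that the nasal earbud ended up on the left channel (this depends on how your earbuds are wired, so check before trusting it). The function names and the 10 ms window are illustrative choices.

```python
import math
import struct
import wave


def split_stereo(frames: bytes):
    """Split interleaved 16-bit stereo PCM bytes into (left, right) sample lists."""
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return list(samples[0::2]), list(samples[1::2])


def rms_envelope(samples, window):
    """Root-mean-square amplitude over consecutive non-overlapping windows."""
    env = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        env.append(math.sqrt(sum(s * s for s in chunk) / window))
    return env


def nasal_oral_envelopes(path, window_s=0.010):
    """Read a stereo WAV file and return (nasal_env, oral_env).

    Assumption: left channel = nasal earbud, right channel = oral earbud.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        window = max(1, round(wav.getframerate() * window_s))
        left, right = split_stereo(wav.readframes(wav.getnframes()))
    return rms_envelope(left, window), rms_envelope(right, window)
```

Stretches where the nasal envelope rises while the oral envelope stays flat or drops are candidates for nasal segments; the proper visualisation is still the one described in the paper.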

May thy noses never stop leaking on to thy grammars.

These graphs present a word-initial prenasalized stop and anticipatory nasalization in the [õ] vowel as the velum lowers to reach its target position for the [m] segment. The rest of the segments appear as oral. The nasal track appears as a red solid line while the oral track appears as a blue dashed line. Language: Kirundi, a Bantu language spoken in Burundi.                                                    

        La Méthode Earbuds                    

[1] The Grammar Watch List of the Association for Linguistic Typology is the most complete list of recently published grammars available. It includes grammars published worldwide between 1993 and 2007. There are 574 publications on the list that refer to spoken languages, but we only had access to 372 of these.

Thursday, April 13, 2017

LingQuest - the new language guessing game you should play!

We've just published a research article on which languages players of the Great Language Game confused for each other. The game provided us with a lot of data: over 15 million guesses by an estimated 960,000 players. The data was a lot of fun to work with, and provided an interesting insight into what non-academics think about similarities between languages.

However, the game was not originally designed for research. Lars Yencken, the brilliant inventor of the game, intended it more as a fun game to spread awareness of linguistic diversity. The game presents some drawbacks, so in parallel with doing this research, we also designed a new game! Hurray!


The game was designed by Seán Roberts, Peter Withers, Mark Dingemanse, Pashiera Barkhuysen and myself, and is a part of the Dutch research consortium Language in Interaction.

The most significant change is that in this game you don't pair a language name with an audio clip, but instead compete by saying which two sound clips are of the same language. This makes for data that is less confounded by cultural knowledge that has little to do with how the languages sound.

Go play it please!

Thursday, April 6, 2017

Why are some languages often confused for others? FAQ wanted

Seán Roberts, Lars Yencken and I have just published a paper on confusion of languages with data from The Great Language Game. Alongside, we also created a new game: LingQuest!

The Great Language Game is quite simple: players hear a short audio clip of people speaking and then have to guess which language they heard from a set of given alternatives. People very often guess correctly: in our study we found that players were right 70% of the time. But what about those other times? What's happening there? Which languages are players confusing for which?

In this paper, we explore this question and test some different ideas for what it may be that is going on. The paper is getting some media coverage, and I thought we'd answer some Frequently Asked Questions (FAQ). So, please post your questions to us and we'll answer them!

Cheesy pic of me listening to sounds in a Great Language Game T-shirt that Lars sent me
One thing that I'm really happy about with this paper, and that has nothing to do with the actual content, is that it was based on sharing and collaboration. I contacted Lars, the creator of the game, out of the blue and asked him if he couldn't release some more stats on players' behaviour on the site. He, not knowing anything about me really, sent over the raw data and encouraged me to do research on it. I then mentioned it to Seán in casual conversation at the MPI-Nijmegen and the ball started rolling. And here we are! Sharing is caring and great research comes of it.

Now, let's see what questions you all have for us!

Wednesday, April 5, 2017

Shit, There Are Dogs: Fieldwork on Austroasiatic Languages of Southwest China

This is an account of my linguistic field trips to Xishuangbanna in southwest China in 2015-2016.  I visited a total of fourteen villages, plotted on the map below, and recorded dialects of Palaungic languages in the Austroasiatic family.  I describe both my experiences doing fieldwork and some of the amazing linguistic variation in this region.

I did this study as a paper for my PhD thesis at the Max Planck Institute for Psycholinguistics, where I was lucky to meet friends and colleagues who I learnt about fieldwork from, such as Luis Miguel Berscia in Peru (who is described in this splendid article in the New Yorker), Simeon Floyd in Ecuador, Lila San Roque in New Guinea, Mark Dingemanse in Ghana, Nick Enfield in Laos, and Ewelina Wnuk in Thailand, to name just a few.

Linguistic fieldworkers like these bring to my mind an image of braving malaria and taking boats for days to remote jungle communities, such as in Dan Everett's memoir Don't Sleep, There Are Snakes (which the title of this blog post pays homage to).  By contrast, my field trips were in a comparatively tame region, where I travelled largely by bus through the mountains and where the only recurring annoyance was dogs that blocked my path on mountain roads or occasionally chased me out of temples.

The aim of the study was not to document one language in depth, but to survey variation in as many closely related languages as possible.  For that reason I spent a lot of time travelling between places, taking buses for five to ten hours at a time, and having to stop overnight in larger towns.  Towards the end of my trip I walked between some villages or was driven to more remote places in a car or taxi.

I spent a total of about three months in this region, travelling about 405 km.  I spoke to people in Mandarin, and they translated sentences and vocabulary into the varieties they spoke.  In some places I tried more complex elicitation and recording of conversations, but normally I recorded only as much of their languages as could be elicited in a day or so, superficially and with all of the potential problems of using translation.  It is not generally in the spirit of sensitive language documentation to show up in a village in a taxi, record some sentences, and drive away again; but as I explain in the sections below, sampling from several locations gives a picture of the history of these languages that the study of one language by itself would not be able to give.

It turned out that the fourteen varieties that I surveyed corresponded to five languages in earlier descriptions, according to Glottolog: Nuclear Wa, Awa, Blang, Hu and U.  There are five languages here shown in different colours, but plenty of dialectal variation also exists within these languages.  Although not quite as diverse a place as some places in the world such as the island of Malakula in Vanuatu (which has over forty languages in an area smaller than Luxembourg), the variation was more extreme than I expected, with closely related languages within a few kilometres of each other being mutually unintelligible, and differing in fundamental properties such as word orders, whether they have tone, and the way that they divide up semantic categories.

The Palaungic languages are part of the Austroasiatic language family, which I plotted on the map below, showing the family's structure (using BEAST 2 and the phylogeny from Glottolog).  The black lines are the root of the tree, while the light blue lines are the main sub-groups: Munda languages in India, Aslian languages in Malaysia, Khmer in Cambodia, Vietic in Vietnam, and so on.  The red dots are the Palaungic languages that I visited.

I describe my fieldwork experiences below in roughly chronological order, alternating between sections about the surroundings, and descriptions of linguistic variation.


I did not know how to find interviewees, and in the beginning I often spent time in my hotel room, scared to go outside and talk to people.  I eventually plucked up the courage to approach some people selling fruit on the street, who turned out not to be particularly good at answering questions about language.  My first good informant was an old lady who had a small restaurant.  I spent time drinking in her house, and going for walks around the beautiful lake.  

The park by the lake is filled with bull skulls on poles.  The Wa used to sacrifice animals and also sometimes cut off human heads until the 1970s, a practice that the lady I interviewed remembered seeing.  The buildings in many towns in the north are also adorned with the shape of bull heads.  This peters out in Wengwa (right), where there are instead discrete horns on the roofs, and disappears completely in Buddhist towns further south.

The use of bull skulls correlates neatly with whether people are called (in Mandarin) 'Wa' 佤 or 'Bulang' 布朗.  'Wa' roughly corresponds to people who speak Nuclear Wa and Awa, while 'Bulang' corresponds to speakers of Blang, Hu and U.  This is a cultural rather than linguistic distinction: Wa people preserve elements of their culture such as bull skulls and animist beliefs, while Bulang people are described as Theravada Buddhists (such as in this description by James Miller) and have been generally much more influenced by Tai culture, such as in their architecture.  ('Tai' here will be used to refer mostly to the Tai Lü language.)

The terms 'Wa' and 'Bulang' turn out to be useful for linguistic purposes too, as a way of dividing these languages based on how much they have been influenced by Tai.  This cultural division correlates especially neatly with a division in these languages in their basic word order.

Word order

'Wa' languages place the verb at the beginning of the sentence, such that the Awa for 'I eat rice' is sɔm ə 'eat.rice I' ('eat rice' is a single verb sɔm).  This is in contrast with 'Bulang' languages further south which place the subject first, ə sɔm 'I eat.rice'.

The Wa languages are unusual (even unique, according to this map from WALS) in mainland Southeast Asia in using verb-initial word order.  However, there are some interesting indications in the rest of Austroasiatic that this word order may once have been more widespread.  Car, an Austroasiatic language of the Nicobar islands, uses verb-object-subject order; Shompen, another probable branch of Austroasiatic (as Sidwell and Blench argue), uses verb-subject-object order; Semelai in the Aslian branch has both verb-subject and subject-verb order.  Most relevant is Khasi, an Austroasiatic language in northeastern India which is most closely related to the Palaungic languages, that also displays verb-subject order in some clause types (Rabel 1961, and see this book chapter for a similar summary).  This is in contrast with the surrounding language families Hmong-Mien, Tai-Kadai and Sino-Tibetan, which consistently have subject-verb order.

Verb-subject languages are shown in blue on the map below, while subject-verb languages are shown in red, along with the sentence 'I eat rice'.  One language, Wengwa shown in purple, uses both at the same time (ə ʃɔm ə ‘I eat I’), as if there is a person agreement system.

The fact that there is a neat geographical split in the use of these word orders, and furthermore that the variety that uses both simultaneously is found exactly in between them, suggests that subject-verb order has been spreading in these varieties by contact with Tai languages.

The unusual subject-verb-subject word order in Wengwa is evidence that the direction of change has been from verb-subject order to subject-verb order.  There are various reasons why a verb-subject language might innovate subject-verb order, and in fact subject-verb order is a common alternative word order available to verb-subject languages (Dryer 2014).  When this happens, the high frequency of pronouns might cause pronouns to retain their ancestral verb-subject ordering, explaining why Wengwa places full noun phrases before the verb but allows (and systematically uses) pronouns after the verb.  

In the Wa languages, verb-subject order is normally used, but subject-verb order is used in imperatives.  A wonderful MA thesis by Ma Seng Mai on a Wa variety in Myanmar points to greater complexity in this system than I was able to elicit.  It reminds me of German in its convoluted set of rules about clause types (such as placing the verb at the end of the sentence in subordinate clauses).  I reproduce two of her tables below showing the way that different clause types take different orders, and even particular subordinating verbs seem to vary:

To the extent that these complicated systems are a product of language contact, of verb-subject order and subject-verb order coming together, they also suggest one way that systems of word order in different clause types (such as in German) or person agreement markers (such as in most Indo-European languages) can arise.  

Buddhist temples

I went to Bulang towns in the south next, where people are described as Theravada Buddhists (such as this description by James Miller), and there is no trace of Wa animism or bull iconography, with instead a lot of temples resembling those found in Thailand.  Papaya trees grew by the side of the roads.  I saw many other plants for the first time: fluffy mango trees, pink pineapple bushes, coffee plants in someone's garden, and orchards of tea bushes on the mountainside.  Most beautiful were the banana tree fields, and the long rice paddies, layered like staircases, filled with water and green stalks popping out in neat rows.

I was driven to Kunge by some people that I met in Jinghong, who offered to take me to a village where they knew someone who spoke a Blang variety.  I interviewed a lady there, and then we had a Tai fish lunch by a lake.

I took a bus by myself to some other nearby towns, such as Bulangshan, which turned out to be a tiny and depressing place.  I obtained some particularly noisy data from some men who were spending the day drinking outside the supermarket.

In Zhanglang someone drove me down to a temple which had just been completed, so they were having fireworks and giving out food for the village.  I was then invited into someone’s home, and I sat drinking tea for the entire afternoon while various relatives wandered in and out and allowed themselves to be interviewed. I ate an entirely home-grown meal: the family had grown the rice and the vegetables, made the rice wine, and grown the tea.  As usual they refused payment, so I bought some of their tea to repay them.

I went to Bada next, my most enduring memory being woken by the drawn out screams of a pig outside my room.  I listened in horror for a few minutes and went outside to find the animal on the ground, a long trail of blood running onto the street and a family with small children standing over it.  I avoided going to breakfast and instead walked down the hill for a few hours to a temple, where I had my first experience of interviewing monks, who turned out to be enthusiastic informants.  I walked down further, and saw another temple in progress.  I did not attempt to interview anyone there, instead watching people lay foundations, and young monks play on the sand.

Verb semantics: the case of 'eat'

In English, ‘eat’ is a verb used with solid foods and ‘drink’ is used with liquids.  By contrast some languages only have one verb meaning ‘to consume’ which covers everything; for example Thai effectively says ‘eat water’ กินน้ำ gin naam for ‘drink water’.  Another extreme situation is found in Mayan languages in Mexico, where there is often no general verb 'to eat' but several specific verbs such as 'to eat tortilla' (see this article).

Both extremes are found in Palaungic languages, with variations in between.  I have tried to represent these different systems in the tables below.  The simplest system makes no distinctions at all.  The second simplest system has a verb for ‘eat rice’ sɔm and then a verb for ingesting everything else.  The third simplest systems make a distinction between solids and liquids (‘eat’ and ‘drink’) as well as often the separate verb for ‘eat rice’.

The columns show the dialect names, while the rows show different types of food and drink, and the required verb is in the cell.  The cells are coloured by column to show which verbs they are identical to in the column (for example sɔm is used with rice and congee for Menglian informant 1, and so those cells are both coloured blue).

More complex systems have verbs for ingesting one specific thing, such as həp 'to drink' in Ximeng Awa, which is used only with soup and not with other liquids, which take niə.  Awa in Gongxin has a verb specifically for drinking tea, krət.

The most complex system was found in Zhanglang and nearby Bada, which express the sentence ‘I eat meat’ as ‘I meat meat’, and ‘I eat vegetables’ as ‘I vegetable vegetables’.  These more complex systems are summarised in the table below.  The white spaces are where I was unable to elicit a verb for that noun, mostly because the informants did not typically eat congee or soup and hence could not tell me what verb they used.

If a verb takes a noun, it has a certain probability of being able to take other nouns; for example, if a verb can be used with ‘water’, it is likely (but not certain) that it will also be able to be used with ‘alcohol’.  The probabilities are summarised in the table below, treating each unique system as an independent data point (red >= 0.8,  orange >=0.4, yellow >=0.2).  Clusters emerge which correspond to solids (beef-vegetables), liquids (alcohol-water), and rice by itself, with some foods flitting between categories (congee, soup, vegetables).
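
One way to arrive at a table like this is sketched below in Python. It is a reconstruction under assumptions rather than the actual analysis script: it assumes each system assigns exactly one verb to each noun, it counts a pair of nouns only in systems where both were elicited, and the toy systems and verb labels (V1, V2, V3) are invented for illustration and are not the field data.

```python
from itertools import permutations


def cooccurrence(systems):
    """Estimate, for each noun pair (a, b), how often the verb for a is also the verb for b.

    Each system is a dict mapping noun -> verb. Nouns that could not be
    elicited are simply absent, and a pair is only counted in systems
    where both nouns are present.
    """
    nouns = sorted({noun for system in systems for noun in system})
    probs = {}
    for a, b in permutations(nouns, 2):
        both = [s for s in systems if a in s and b in s]
        if both:
            probs[(a, b)] = sum(s[a] == s[b] for s in both) / len(both)
    return probs


def colour(p):
    """Colour bands from the table: red >= 0.8, orange >= 0.4, yellow >= 0.2."""
    if p >= 0.8:
        return "red"
    if p >= 0.4:
        return "orange"
    if p >= 0.2:
        return "yellow"
    return "none"


# Toy systems, invented for illustration only.
toy_systems = [
    {"rice": "V1", "meat": "V1", "water": "V1"},  # one verb for everything
    {"rice": "V1", "meat": "V2", "water": "V2"},  # 'eat rice' vs everything else
    {"rice": "V1", "meat": "V2", "water": "V3"},  # rice / solids / liquids
]
probs = cooccurrence(toy_systems)
```

Because each noun takes a single verb per system under these assumptions, the resulting table is symmetric; a version that allowed several verbs per noun would need a genuinely conditional count.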

The word for 'eat' kʰʌi in Hu is the same as the word for 'rice' in the other Angkuic language U.  This may reflect a tendency in Austroasiatic languages to derive verbs from nouns, such as the verb ‘drink’ from the noun ‘water’ in Mlabri (Rischel 1995).  We also saw this in Zhanglang, where ‘I eat meat’ is expressed as ‘I meat meat’, using the noun bɔn as both a noun and a verb.

The use of verbs for specific kinds of eating is found in other parts of Austroasiatic such as the Aslian languages (e.g. Jehai) and Mlabri, suggesting that it might be quite an old feature of the family.  The simplification of these systems in some languages, such as the Angkuic languages Hu and U, may have been due to contact with Tai (which like Thai only has a single verb covering 'eat' and 'drink').

Mangjing was my favourite place, which I went to by chance because I saw a book about it in a bookshop while waiting for another bus.  On the spur of the moment I decided to catch a bus to Mangjing instead, which turned out to be a wonderful decision as I met my most patient informant there, Yuni.  She was a young woman running an inn with a few rooms in Wengwa, a very pretty village with chickens wandering around.  I spent a happy week resting and playing badminton with people in front of the temple, and writing in the room overlooking the roof tops, having a small glass bowl refilled with various types of Pu'er tea grown in the fields outside.  I visited the family's tea factory and saw the leaves being dried out in the sun and then tossed in hot vats.    I tried slightly more complicated elicitation tasks, such as recording the pear film with a few informants, and I experimented with translating Harry Potter (in Chinese) into Blang, an experience that Yuni found frustrating.  I toyed with the idea of translating parts of the Bible (the most widely translated book in the world and hence a useful linguistic parallel text), but I thought that might give the wrong impression.

The village was small but there was a lot of construction, partly because tourists from nearby Pu’er kept coming through to buy and taste tea.  When a house was completed the builders would celebrate by having an extremely loud disco in the finished house, which would reverberate through the whole village into my room.

I ate home-cooked meals there.  I went down one day and asked the woman in the kitchen what type of meat she was cooking.  She said they had found a wild cat outside and killed it.  I made no comment but decided not to touch the meat.  Three girls who had just arrived at the inn came downstairs and immediately tucked in to the food.  I let them eat for a bit before telling them that they were eating wild cat.  I then tried a bit myself (not memorable).

The people there claimed that there was a slightly different accent in each village, and also claimed that the way that people spoke was due to differences in the water (I considered telling them to send that theory to PNAS).  The three villages shown below do indeed have slightly different pronunciations of the word for ‘sun’ despite being fewer than 5 km apart, and this turns out to be part of a much larger picture of how words vary in this region, which is likely to be again due to contact with Tai.

'Eye of the day' = 'sun'

The word for 'sun' turns out to be particularly variable in these languages, with some of them employing a phrase 'eye of the day'.

This evocative expression is found over a large part of southeast Asia.  A great article by Matthias Urban shows that this expression is only found in this part of the world, mostly in Tai-Kadai and Austroasiatic languages, but also Indonesian (matahari) and stretching out to parts of New Guinea with the expansion of the Austronesian languages.

Tai has the form tʌ wʌn 'eye of the day', and in the Palaungic languages, it turns out that 'eye of the day' is found mostly in the south in the Blang and Angkuic languages marked in red (including the Angkuic language U further north), near to where Tai is spoken.

The form is normally ŋʌi sŋi 'eye day'.  Modifiers come after the main noun in these languages, meaning that this is semantically 'eye of the day'.  Wa and Awa marked in blue by contrast use just the form ʌi and its cognates, which often drop the s or change it to a h.

The phrase 'eye of the day' is a beautiful meme, a distinctive feature travelling by cultural transmission through Asia.  Perhaps another example is the word for 'rainbow', which in several of the Palaungic languages is 'drink water', niə rɔm.  I was told that this was a reference to the rainbow looking like a dragon in the sky drinking water.  It resembles the expression for 'rainbow' in Thai, รุ้งกินน้ำ roong gin naam 'rainbow drinking water'; and also the word for 'rainbow' in the Tibeto-Burman language Naxi, which I found in the dictionary shown below and which means literally 'sky tongue drink water'.  The Naxi pictographic character beside it recalls the etymology of the Chinese character for 'rainbow' 虹 hóng, the two-headed dragon.

(from Zhao Jingxiu, Dongba Xiangxingwen 赵净修,东巴象形文 p.114)

Menglian and the surroundings

While I was in Mangjing some girls invited me in a car with them to Menglian, derailing my plans to go further north.  A couple of days passed there with them and their friends in endless karaoke sessions filled with beer, cigarettes, and, for some reason, rock-paper-scissors.  I awoke with a bad hangover but was not allowed to rest, being forced instead to come along with them for lunch.  They drove in a four-person car that they somehow managed to squeeze eight people into, some of them crammed into the boot.  

We drove somewhere in the countryside, where they all proceeded to get drunk over lunch again, on a terrace in the middle of a field of tea and coffee plants.  Some people drove me back to the town and then took me on a tour of a sweltering sugar factory where one of their friends worked.  More food and karaoke followed in the evening.  

My patience was rewarded the next day when a Wa informant took me on her motorbike to a village where her grandmother sat in Wa clothing with her friends, smoking long silver pipes.  The grandmother greeted me with a Wa ritual: she tied some string around both my wrists, murmuring ɛn gʌ mɔm hrəm ('to your good health').  Then she handed me a small cup of rice wine, and I was told to pour a bit down my wrist, which then went down the sleeve of my shirt.  Depending on which way the wine went, it would forecast whether we would meet again (apparently we would).  The grandmother then handed me a ¥50 note and told me to keep it.  Apparently this was another way of welcoming someone into the family, and the amount varied depending on how much you liked the person: some people would get only ¥10.  If you were introducing a boyfriend, it might be ¥500.

I eventually managed to fulfil my aim of going further north, taking a very long bus ride to Cangyuan, which was a large ghost town, obviously rich (perhaps due to Wa opium trade) but very empty, so I took a taxi on to a village called Wengding.  We had a small crash on the way, when a car hurtled around the mountain road; our car pulled over to avoid it, and the car behind crashed into us.

After the police came, we eventually left and arrived at Wengding, a beautiful village with thatched roofs and bull skulls everywhere.   I had a guide, who I interviewed to record the Cangyuan variety of Wa, and then was introduced to a Wa official, who was sitting smoking in a room with a sort of throne adorned with a bull skull and some rifles.

I then visited some 3500-year-old cave paintings, and was struck again by the bull imagery.  On the way back the driver dropped me off in the middle of a beautiful sugarcane field, with palm trees unusually growing on the side of the mountains surrounding it.

The following day I took another six hour bus to Lincang, specifically in order to meet an American who I was told spoke fluent Wa.  I phoned him and asked how to get to his farm, and he said it was 18km from the town, by the Boshang junction, and 'just tell them to go to the foreigner's farm, waiguoren de nongchang, and they'll know where to go'.  I duly asked several taxis whether they knew a waiguoren de nongchang, and none of them knew.  I eventually asked one driver to phone him, and we found the place.  He walked around speaking Wa with the staff.  

We had another small collision with a lorry on the ten-hour trip back to Menglian, meaning that a replacement bus had to come.  I stayed for two days in Menglian, and was driven by one of them to someone's house in the mountains in Gongxin where we ate melons by the side of the road as I recorded informants, with plenty of people in the nearby village turning out to gawk at me.

In the later stages of my trip I became more adventurous and decided to take taxis to more remote places.  One time I took an extremely good value taxi ride for thirty euros around 100 km of mountain road, stopping off with the driver in random villages and asking if anyone spoke Wa.  The driver turned out to be indispensable, as nobody spoke Mandarin, so he asked people in the local dialect.

We sat around drinking tea until eventually we were pointed to a house up a vertiginous hillside, which we drove to and then had to go the rest of the way on foot.  We arrived at the house and asked whether anyone spoke Wa.  They called some elderly relatives, who arrived ten minutes later wearing blue robes and smoking silver pipes.  I interviewed them, with the driver translating my questions.  On the way back, the driver took a fancy to a tree-trunk in the middle of someone’s field, so I helped him steal it and load it into the back of his car to bring back for firewood.

The following day I took a taxi into a different area, arriving at a school in the middle of a valley where I had heard there was a teacher who spoke the language U.  From previous experience I was anxious not to disturb the children, so I crept into the school and up into a common room, where some teachers were sitting.  One saw me and immediately said ‘请坐’ (‘please sit’).  He went to fetch the teacher, who was again very happy to be interviewed (people’s friendliness there astonishes me).  I then left with her, talking as we went out of the school.

At that point some children were coming out, and saw me.  I greeted them awkwardly and tried to talk to them, but they just stared back with a mixture of shyness and curiosity.  I continued chatting with the teacher while they followed me, and within a minute there was a steadily growing mass of children walking quickly behind us up the hill to where my taxi was waiting.  I got in and we drove away, leaving the crowd of audibly disappointed children on the hillside.

Numerals and numeral classifiers

I revisited many places to ask questions about the use of numeral classifiers.  I chose 73 nouns and asked how to say 'one [noun]'.  In Mandarin and many languages of east and southeast Asia numerals require an additional word that varies depending on the noun used.

In some cases the category of nouns that a classifier is used with is clear, such as the classifier du which is used only with animals.  In other cases the category was less transparent, and may to some extent be a memorised list, such as the classifier lu in Xiaomenge, which is used with the nouns 'hat', 'stone', 'bag', and 'drum'.  Perhaps this is a class of round-ish objects, except that it is not used with either 'eye' or 'egg'.

Numeral classifiers are used differently in each variety, and the number of classifiers that I was able to elicit (using these 73 nouns) varies from 10-12 in the Wa group (Nuclear Wa and Awa) to 22 and 24 in U and Blang.  The greater number of classifiers (shown in red) is found in Bulang rather than Wa languages, again suggesting that Tai influence is responsible: Tai itself has 25 classifiers in this elicitation task.
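
Counting classifiers per variety, as in the comparison above, amounts to grouping the elicited nouns by the classifier they took. A minimal Python sketch follows. The lu set (hat, stone, bag, drum) and the animal classifier du come from the text; the particular animal nouns, and the label "X" for whichever classifier 'eye' and 'egg' actually took, are placeholders, and the sample mixes varieties purely for illustration.

```python
from collections import defaultdict


def classifier_inventory(elicited):
    """Group nouns by the classifier elicited with them in 'one [noun]'."""
    inventory = defaultdict(list)
    for noun, classifier in elicited.items():
        inventory[classifier].append(noun)
    return dict(inventory)


# Illustrative sample only: not a real elicitation sheet.
sample = {
    "hat": "lu", "stone": "lu", "bag": "lu", "drum": "lu",  # from the text
    "dog": "du", "pig": "du",  # du is for animals; these nouns are placeholders
    "eye": "X", "egg": "X",    # placeholder: the text only says these do not take lu
}
inventory = classifier_inventory(sample)
n_classifiers = len(inventory)  # number of distinct classifiers in the sample
```

Run over the full list of 73 nouns for each variety, the same count gives the figures that are compared across Wa, Bulang and Tai.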

The default numeral classifier in all of these languages is mu.  In the two Angkuic languages it has also become the word for 'one' (as opposed to the other languages in the sample which all use dɛ).  

The fact that the word for 'one' is derived from a numeral classifier results in an interesting word order rule: if a classifier is used which is not mu, such as do for animals, then the order is noun-numeral-classifier:

so    mu  do
dog one classifier
'one dog'

If the classifier used is mu itself, then the construction is the noun followed by ʌmu:

mɔk ʌmu
hat  one.classifier
‘one hat’
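
The word order rule can be written out as a small sketch in Python.  This is only an illustration of the two examples above, not a description of the full grammar: the function name and constants are hypothetical, and the only forms used are the ones given in this post (mu, ʌmu, do, so, mɔk).

```python
# Hypothetical illustration of the 'one [noun]' rule described above.
# Only the forms cited in the post are used; the names are made up.
ONE = "mu"                 # word for 'one', which is also the default classifier
FUSED_ONE_DEFAULT = "ʌmu"  # form that follows the noun when the classifier is mu


def one_noun_phrase(noun: str, classifier: str) -> str:
    """Build the phrase 'one [noun]' for a given noun and its classifier."""
    if classifier == ONE:
        # Numeral and classifier are the same word: the noun is followed by ʌmu.
        return f"{noun} {FUSED_ONE_DEFAULT}"
    # Any other classifier: noun, then numeral, then classifier.
    return f"{noun} {ONE} {classifier}"


print(one_noun_phrase("so", "do"))   # 'one dog'
print(one_noun_phrase("mɔk", "mu"))  # 'one hat'
```

The sketch covers only the numeral 'one', since that is the only numeral elicited here.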

The development of the word for 'one' from a numeral classifier is unusual but is also found in some Sinitic languages of southeast China, such as Shaowu (Ngai 2015), where the word 'one' is kɛi from the classifier written 个, and so may be a more widespread tendency in southern China.  Like 'eye of the day' it may turn out to be a good example of a linguistic meme that indicates contact between languages.


My personal reason for wanting to do this study was that I wanted to experience doing fieldwork first hand, and to see geographical variation in languages played out at a local level.  I envisaged walking between villages and hearing almost imperceptible differences between languages at each one and seeing how they gradually become mutually unintelligible.  I have similar fantasies about travelling in Europe along dialect continua to hear the way that languages can gradually change, such as along the border between France and Italy, or between Holland and Germany.  

A bigger reason is that there is a lot of language diversity in the world, which to some extent needs greater documentation at the dialectal level; many languages have been described, but not necessarily the way that they vary locally between places.  There are now documentation projects to address this issue, one of them being the Australian National University's 'Wellsprings of Linguistic Diversity' (such as Hedvig Skirgård documenting variation in Samoan) and related projects in Nijmegen (such as Luis Miguel Berscia's fieldwork on variation in Shawi in the Peruvian Amazon).  The 'sound comparisons' website is an especially good resource for dialectal variation in European languages.

Comparative fieldwork also sheds light on the question of what creates linguistic diversity.  In its crudest form, the question is: why are there so many languages in the world?  Why are there seven thousand languages, rather than seven hundred, seven million, or just seven?  This is a question that the Max Planck Institute for the Science of Human History in particular is attempting to answer both globally and through fieldwork in places where there is an especially large number of languages, such as in Vanuatu, the 'Galapagos of language evolution'.

Geographical boundaries (such as mountains and ocean) can help foster linguistic diversity.  There are also historical reasons why some places have fewer languages than others, such as in Europe, where large state languages have been promoted at the expense of regional dialects; or in the Americas, where the guns and germs of the arriving Europeans caused the decimation of whole populations and the loss of an unknown number of languages.

There are also forces which give rise to new languages, which may explain the exceptional linguistic diversity of some parts of the world.  An analogy is the spread of English around the world.  English is spoken in various parts of Asia (Singapore, Hong Kong), Africa (Nigeria, Tanzania and so on), India, and has creoles and pidgins in areas such as the Caribbean, aboriginal Australia and the Pacific.  In all of these places there has been contact between English and local languages, such as with Malay and various varieties of Chinese in Singapore, meaning that Singapore Colloquial English (Singlish) has properties of southeast Asian languages that distinguish it from Standard English.  This process of hybridisation accelerates the creation of new types of English, as documented in the Electronic World Atlas of Varieties of English.

The spread of English and the formation of hybrid languages such as Singlish is a modern phenomenon associated with imperialism and globalisation.  But similar processes have happened in the past with the spread of ancient empires (such as the Chinese or Khmer empires), or even simply the migration of people into new areas and coming into contact with populations that speak other languages.  This process could be a way of creating new languages especially quickly in some regions.

Austroasiatic languages are a good example, as they are the descendants of a single language, Proto-Austroasiatic, that came to dominate a large area of southeast Asia from India to Malaysia, Vietnam and Cambodia.  In the last two thousand years this has been due to the spread of empires such as the Khmer and the Vietnamese.  In its earlier history, one reason why Austroasiatic spread was probably the invention of rice farming, given that terms for 'rice' can be reconstructed in Proto-Austroasiatic, and as is perhaps also suggested by the age and homeland of the family (the subject of a phylogenetic research project being pursued by Sidwell, Greenhill and Gray).

Much like modern English, Proto-Austroasiatic encountered unrelated languages when it was spreading, and perhaps therefore created hybrid languages.  Here is an animation of the spread of Austroasiatic and Tai-Kadai, which I made using the phylogenetic software BEAST 2 and trees from Glottolog.  Austroasiatic spreads out first on the left, and Tai-Kadai makes a late spread from southern China at the end.  They meet at the yellow pins, which mark the places that I went to.

The overlap of these two families in the region that I visited suggests a way of explaining the unusual variation in Palaungic languages.  The 'Bulang' languages (Blang, Hu and U) have properties of Tai-Kadai, such as the expression 'eye of the day' for 'sun', more complex numeral classifiers, fewer 'eat' verbs, and subject-verb order.  At least two of these languages also have complex tonal systems like Tai (see this article on Angkuic tonogenesis).  'Wa' languages by contrast (Nuclear Wa and Awa) use the single morpheme sŋʌi for 'sun', have fewer numeral classifiers, some specific eat or drink verbs, and verb-subject order, and in these ways seem to preserve properties that are found in other parts of Austroasiatic.  To demonstrate these points statistically would require a larger database of these properties, especially for Austroasiatic; while data on subject-verb order exists (such as this map from WALS), no equivalent data set exists for the number of numeral classifiers used or the number of 'eat' distinctions, and only a partial data set exists on the semantics of terms for 'sun'.  Without this type of statistical demonstration, I view my study here as only exploratory, a way of showing the type of historical hypothesis that can be generated but which should be tested more rigorously.

As fun as it is for a researcher to travel through unfamiliar regions collecting recordings, a more effective long-term project is to encourage native speakers to record data themselves and contribute it to collective databases.  Projects such as localingual or language landscape are doing this on a large scale already by asking people to upload recordings of themselves.  A more systematic approach is to ask people to translate particular sentences or do particular tasks, such as an interesting project called tatoeba that has written translations of a set of English sentences into 100 languages.

Such datasets are going to be useful in many ways, and in particular could transform the way that we learn foreign languages (which is indeed what localingual and tatoeba are partly designed to do).  An additional use that I have been advocating in this post is to show the ways that people have been interacting in the past.  The process of sampling linguistic diversity at a lot of different places may one day reach the level of detail of population genetics, which can build a picture of migration in a region (such as this study of the Solomon Islands).  But perhaps an additional deep motive of these projects is simply aesthetic rather than scientific, to map language so that it may be cherished in different places like architecture and landscape.