Thursday, March 16, 2017

Neat map illustrations from new internet friend


We've made a new friend on the Internets (yes, the internet can be a great place to make new friends, it's not just arguing with Drunk Uncles and cats). The new friend is Stephan Steinbach from the blog Alternative Transport. His blog does very interesting work on data visualisations, in particular maps. Go check it out.

He recently made a post in relation to our old post about illustrating questions about linguistic diversity. For other posts from us with data visualisation/illustration, go here.

While we're on the topic of visualising geographic data (i.e. data somehow tied to a place, such as number of languages etc), go check out Speckman's lab in Eindhoven. They've made a neat kind of map visualization called "necklace maps". If you're really into map visualisation you can even play their game on making cartograms!

Friday, March 10, 2017

Podcasts of linguistic seminars from CoEDL

Happy news, it is possible to access lectures from the Centre of Excellence for Language Dynamics (CoEDL) on any podcasting app! Now you can also look this smart while listening to lectures about linguistic diversity on your normal podcast app!
For more on this map, go here.
The centre has long had lectures up at iTunes U, but it wasn't until now that I figured out how to get them out of the Apple bubble and into podcasting apps for Android etc. (i.e. the RSS URL)*. I tried it out yesterday and ended up listening to Russell Gray's talk on grand challenges of linguistics while grocery shopping at Aldi - a thoroughly pleasant experience that I wish you all! (I'm a podcast freak, every alone moment spent not working is spent listening to podcasts.)

The centre has several different programmes and projects, and the lectures are organised into different "courses" or "podcasts" accordingly. Below, I've included links to them all and some information about each podcast. 

In order to subscribe to these with your podcasting app, you need to enter the RSS URLs below under the options for "adding new podcasts". It's not very difficult, I promise. One of the most popular podcast apps for Android phones, "Podcast Addict", lets you do this easily (just look for the "add RSS feed" option under adding). Non-smartphone users, don't despair: it is also possible to listen to podcasts via desktop apps. If you need further assistance, comment here and we'll help.
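For the curious, here is a rough sketch of what a podcast app does with one of these RSS URLs: it downloads an XML document and reads the title and media link of each episode. The feed text below is made up for illustration (it is not copied from the CoEDL feeds), and the function name `list_episodes` is just my own label. It assumes a plain RSS 2.0 feed where each episode is an `item` with an `enclosure`.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 document standing in for what a feed URL returns.
# Titles and URLs are placeholders, not real CoEDL episodes.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example seminar series</title>
    <item>
      <title>Example lecture one</title>
      <enclosure url="https://example.org/lecture1.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>Example lecture two</title>
      <enclosure url="https://example.org/lecture2.mp4" type="video/mp4" length="0"/>
    </item>
  </channel>
</rss>
"""


def list_episodes(feed_xml):
    """Return (title, media_url, media_type) for every item that has an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        enclosure = item.find("enclosure")
        if enclosure is None:
            # Items without a media file cannot be played, so skip them.
            continue
        episodes.append((title, enclosure.get("url"), enclosure.get("type")))
    return episodes


episodes = list_episodes(SAMPLE_FEED)
for title, url, media_type in episodes:
    print(title, url, media_type)
```

The media type is what tells a player whether an episode is audio or video, which is why some of the feeds may not play as audio-only in the background.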

The lectures are from the public seminar series of the centre. Please note that when they are video feeds, you might not be able to play only audio as background (depends on your player).

Naturally, there are other institutions that offer lectures freely online via iTunes U, podcasts etc. Before I moved to Australia, I listened to the Linguistic Diversity-podcast/iTunesU by La Trobe for example (iTunes link & RSS link). This post is about the podcasts of CoEDL, since it's where I'm at currently and I just found this out, but I'd be happy to make posts about other seminar series in future. Leave recommendations in the comments! 


***

Language and society

Any analysis of language as a dynamic and evolving system must confront a self-evident truth: languages are spoken with intention by living, breathing human beings. The linguistic choices of individuals and groups point to the diverse ways that we understand our world and the complex meanings that we exchange with one another.

RSS URL: https://itunesu.itunes.apple.com/WebObjects/LZDirectory.woa/ra/directory/courses/1017184690/feed


***

Language evolution


Just as Darwin showed that species are not fundamentally immutable, it is now known that languages continue to evolve over time, adapting to societies and their environments. In recent years a wide range of disciplines have contributed new ideas towards understanding the evolution of language. We are uniquely poised to build on these new initiatives to develop a general theory of language evolution. 

Language evolution operates over many levels and time-spans: from evolution of language as a communicative system, which took place over tens or hundreds of millennia, to evolution of specific languages across generations and within speech communities. Our research aims to link the cognitive capacity of individuals and how they process language to the use of language as a public and social product in a specific cultural and ecological context. This will therefore integrate our understanding with how language works at the level of the individual to the level of the community or nation.

RSS URL: https://itunesu.itunes.apple.com/WebObjects/LZDirectory.woa/ra/directory/courses/999201494/feed

This podcast is from the evolution program of the centre, more about that here.

***

Language learning


When children and adults learn a language, they engage with its internal complexity and varietal characteristics (see Shape), use human cognitive abilities for processing which change over their lifespans (Processing), inspire computational models of potential technological application to building adaptive systems that know how to learn, and contribute over generations to language change (Evolution).


The learning program integrates all these elements, but with a twist: it is putting a spotlight on how children and adults learn languages in contexts that are acutely under-researched, but which are of social, educational, and economic importance for Australia and its place in our region.

RSS URL: https://itunesu.itunes.apple.com/WebObjects/LZDirectory.woa/ra/directory/courses/999201738/feed

This podcast is from the learning program of the centre, more about that here.

***
Language processing


How does our language processing ability enable us to rapidly perceive, produce and understand language given the massive diversity observed across both speakers and languages?


We examine language processing in a breadth and depth previously unmatched in any one project. We will map processing at multiple levels of description and observation, in monolingual and multilingual individuals, and in typical and impaired populations. All these investigations will take place across a range of languages and dialects representing the unrivalled diversity in the Indo-Pacific region.

RSS URL: https://itunesu.itunes.apple.com/WebObjects/LZDirectory.woa/ra/directory/courses/999201734/feed

This podcast is from the processing program of the centre, more about that here.

***
Language shape (aka diversity, description and documentation)


How widely do languages differ, why do they differ, and what do these differences tell us about people and their diverse communicative needs? Currently only around 10-15% of the world’s 7,000 languages are well described, and many of the remaining 85-90% are highly endangered, including almost all of the languages of our region.


The Shape program is exploring the design space of language by investigating a strategic selection of little-known languages of our region. We will push forward efforts to document this language heritage by a broad range of methods, drawing on innovative approaches and technologies; building the first large corpora of Indigenous Australian and Papuan languages; and initiating new research on how intergenerational variation can reveal different design solutions evolving in languages to solve similar social communicative problems.

RSS URL: https://itunesu.itunes.apple.com/WebObjects/LZDirectory.woa/ra/directory/courses/999201747/feed


***
Language, technology & archiving

Research technologies in the language sciences are in a period of unprecedented development, and the judicious use of new technologies can result in rapid advances, even paradigm shifts in the nature and scope of research. Big data is being collected by citizens (crowdsourcing), while corpus visualisation techniques facilitate the modelling of language as it evolves. Technology has also become vital for the assessment of language and hearing, using eye tracking, ultrasound and/or iPad-based interactive activities. Perhaps the most common application of digital technology in language research is in archiving linguistic data in the form of acoustic recordings together with script-based lexical and interlinear analyses. Due to advances in computation, archives that were once regarded as simple repositories have now been repurposed as powerful tools for corpus analysis.

RSS URL: https://itunesu.itunes.apple.com/WebObjects/LZDirectory.woa/ra/directory/courses/1017185107/feed



***

* It turns out that if you subscribe to an iTunes U channel in the iTunes app and you view it in your library and right-click the item in the list, you get the option "copy iTunes U URL" which is all you need to then feed into your podcasting app. For some weird reason, iTunes does not really explain that this is how you do this, and their Applecare customer support didn't even know this. But yeah, that's one way of getting the RSS URL. I've copy-pasted them in here for these podcasts for your convenience. 

Thursday, March 9, 2017

International Women's Day 2017

Last year, Hedvig wrote a post about the extremely prolific Joan Bresnan, co-founder of the formal grammatical framework Lexical Functional Grammar. Since the 8th of March is over in Hedvig's time zone, I thought I would make a small post with a few grammar goodies about mothers. Just because I happen to be one and being a mother is one of the important roles women take on.

from Bàsáá
híɣìí  m-ùràá      ɲ´ɛn               à-ŋ-gwés              wèè       mán.
every 1-woman 1EMPH.PRO 1.AGR-PRS-love 1.POSS 1-child
'Every woman loves her child.'

Hamlaoui, Fatima, and Emmanuel-Moselly Makasso. (2015). Focus marking and the unavailability of inversion structures in the Bantu language Bàsàá (A43). Lingua 1:35-64. p. 50.

from Tadaksahak
barr-én     i=yyasáf     s(a)        i=tǝ-keen(í)
child-PL  3p=prefer   COMP   3p=FUT-sleep
i=n           nan-én         ǝn       áaṣi-tan           ka. 
3p=GEN  mother-PL  GEN   belly.side-PL   LOC
'children prefer to sleep against the belly of their mothers.'

Christiansen-Bolli, Regula. (2010). A Grammar of Tadaksahak, a Northern Songhay Language of Mali. PhD dissertation, Leiden University. p. 210

from Siwu
Mmā ɔso ne, la losarɛ.
Because of my mother I am looking well.

Atsu, John. (2006) Siwu language learning course. Hohoe, Ghana: Volta Region Multi-Project. p. 53

from Tuwuli
a-ma                        lɛ-l-aa-do                                                     nɔ   fɔtɔ        o-sĩ
2SG.POSS-mother  NP.SUBJ.FOC-NEG.FUT-FUT-put.inside you message 2SG-refuse
'if your mother gave you an instruction, you wouldn't refuse (to do it)'

Harley, Matthew W. (2005). A descriptive grammar of Tuwuli, a Kwa language of Ghana. PhD dissertation, SOAS, University of London, p. 452.

from Mesqan
ɑj      dɑkko   tə-nə-tʃɛɲɲ               dɑkko-ɛɲɲɑ  ɑn-tə-tʃ’tʃ’ɑwɛt-Ø
CON mother  SUB-1P-come.JUS  mother-my      NEG-2-play.IPV-SM
ɑf-ɑhɛ                   jə-mɛst                    jɛ-bɑr-ɛt-e                 səlɛ-hɛnɛ 
mouth-your.2SM  3-horrible.IPV-SM  DAT-say.PV.3SF-1S  reach.PV.3S-be
ɑn-tətʃ’tʃ’ɑwɛt-Ø       jə-bbən-ɑ
NEG-2-play.IPV.SM  3-say.IPV.SM-3SF
‘He told her about what his mom told him about his rude language and that he was told not to speak.’

Getachew, Alemayehu. (2011). Mesqan folktales: A contribution to the documentation of the Mesqan language. MA thesis, Addis Ababa University. p. 48

from Lele (Chad)
na      me   lee          yé          dí-nì                           ná         me  jè        má          lay
HYP  2F   eat:FUT  mother  GEN:PL-1PL.EXCL  ASSC  2F  IMPF  die:FUT  also
'If you eat our mother, you will also die.'

Frajzyngier, Zygmunt. (2001). A grammar of Lele. Stanford: CSLI Publications. p. 186

See Hedvig's post again on why celebrating International Women's Day is important. Random lessons we can learn from these random grammar goodies: 1) it's OK to co-sleep, you are just listening to your child; 2) thank your mother for being a good person; 3) don't touch our mothers, you will suffer. Love and freedom to all!




Wednesday, March 8, 2017

Spurious correlations

*apologies for pay-walled links ahead*

I was first confronted by spurious correlations in language and culture during the EVOLANG 10 conference in Vienna in 2014, where I think I saw a poster on the relationship between tense marking and economic behaviour. If I remember correctly, this poster built on the famous findings of Keith Chen, who published a paper in 2013 on the relationship between obligatory future tense marking and various types of social and economic decisions people make.

The paper was very controversial before it was even published, with posts on Language Log (see links at bottom of the page for more posts), a reply on Language Log by Keith Chen, and a variety of media coverage that can be found here and here and here and here.

Chen found that speakers of languages lacking a distinct future tense "save more, retire with more wealth, smoke less, practice safer sex, and are less obese" (abstract). He explains his findings as follows: "[...] being required to speak in a distinct way about future events leads speakers to take fewer future-oriented actions. This hypothesis arises naturally if grammatically separating the future and the present leads speakers to disassociate the future from the present. This would make the future feel more distant, and since saving involves current costs for future rewards, would make saving harder. On the other hand, some languages grammatically equate the present and future. Those speakers would be more willing to save for a future which appears closer. Put another way, I ask whether a habit of speech which disassociates the future from the present, can cause people to devalue future rewards." (section 1).

At EVOLANG, I felt completely flabbergasted at these findings. How could such complex individual decisions, including on how much to save for later life, dietary habits, sex habits, and smoking habits, be related to a tiny aspect of the language that one speaks? It seemed completely unrealistic to me, although Chen (2013) goes through some efforts to explain the mechanisms through which this connection would work.

Save, but only if you speak German or another future-less language

Then in 2015, the Chen (2013) study was partly refuted by a follow-up study by Seán Roberts, James Winters, and Keith Chen. Lead author Seán Roberts wrote two blog posts about their study, found here and here. They raise several criticisms of Chen's (2013) paper, but focus on whether the correlation between less savings and a distinct future tense remained when controlling for the historical relatedness of the languages included in the study. As it turned out, the correlation was no longer significant when the control was included. Their point is that both language and culture have to be considered in light of history: languages are likely to inherit a particular way of marking future tense from their ancestors, and populations are likely to have economic and dietary habits similar to the populations from which they descend. Once this genealogical signal was taken into account, the relationship between the predisposition to save less money and having obligatory future marking became insignificant.

(BTW, the person who is aiming to shed more light on this is Cole Robertson, who wrote a follow-up post on a new study on the topic of future tense and economic decision making.)

The problem I talk about here is not new. Seán Roberts and James Winters wrote an article in 2013 warning against spurious correlations between cultural traits - correlations between traits that are very likely accidents of history as there is no functional mechanism that could explain the association between the two types of behaviour. They illustrate the existence of spurious correlations by showing that these exist between morphological complexity and having a siesta or not, and between the presence of acacia trees and tone languages within countries. The problems they identify include the following:

1. Galton's problem: the need to control for historical relatedness and diffusional associations in order not to overestimate the number of independent datapoints;
2. Distance from data: In many cases, the data on a language has been collected by one individual and this data is subsequently categorised into (sometimes coarse) variables, creating a distance between the dataset and reality;
3. Inverse sample size problem: given that culture data is often incomplete, complex, and based on inconsistent data, the noise-to-signal ratio increases rather than decreases in larger datasets.

They warn that correlational studies should bear in mind a realistic hypothesised mechanism for the correlation, and should attempt to control for alternative explanations, especially those relating to diffusion and historical descent. Here are some cool pictures bringing across this point, and there is far more on spurious correlations on Replicated Typo.
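To make Galton's problem (the first item in the list above) a bit more concrete, here is a toy sketch in Python. The data are entirely invented: ten languages in two hypothetical families, where every language simply inherited both traits from its ancestor. Collapsing to one datapoint per family is only a crude stand-in for proper phylogenetic controls, and it is not the method used in the papers discussed here.

```python
import math
from collections import defaultdict

# Invented toy data: (language, family, trait_a, trait_b). Not real measurements.
LANGUAGES = [
    ("A1", "FamilyA", 1, 1), ("A2", "FamilyA", 1, 1), ("A3", "FamilyA", 1, 1),
    ("A4", "FamilyA", 1, 1), ("A5", "FamilyA", 1, 1),
    ("B1", "FamilyB", 0, 0), ("B2", "FamilyB", 0, 0), ("B3", "FamilyB", 0, 0),
    ("B4", "FamilyB", 0, 0), ("B5", "FamilyB", 0, 0),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def family_means(rows):
    """Collapse language-level rows into one (mean_a, mean_b) pair per family."""
    grouped = defaultdict(list)
    for _, family, trait_a, trait_b in rows:
        grouped[family].append((trait_a, trait_b))
    return {
        family: (
            sum(a for a, _ in values) / len(values),
            sum(b for _, b in values) / len(values),
        )
        for family, values in grouped.items()
    }


# Treating every language as independent: ten datapoints, perfect correlation.
r_languages = pearson([row[2] for row in LANGUAGES], [row[3] for row in LANGUAGES])
print("languages:", len(LANGUAGES), "r =", r_languages)

# Counting one datapoint per family: only two points are left.
means = family_means(LANGUAGES)
print("independent families:", len(means), means)
```

With ten "independent" languages the correlation looks impressive, but there are really only two historical events behind it, and any two distinct points fall on a straight line.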

Use trees!

However, the story does not end here. To my surprise, there are A LOT of papers that report correlations between aspects of language and sociological aspects of culture that seem far-fetched, and clearly their authors have not read Roberts & Winters (2013). I am not talking about the overview provided by Ladd et al. (2014), blogpost here, who present a review of studies that look at correlations between languages and non-linguistic forces acting on language, including variables such as the number of second-language speakers and the type of climate. The kind of demographics that they cover in their review make much more sense to me: population size, for instance, should be expected to have an effect on certain aspects of language as it makes a huge difference whether you speak your language within a village of 100 people, or within a state with 100 million people.

I am talking about studies like the following. In 2013, Santacreu-Vasut and colleagues published a paper entitled "Do female/male distinctions in language matter? Evidence from gender political quotas" in Applied Economics Letters. This paper investigates the correlation between an index of gender variables from the World Atlas of Language Structures and the presence of a legislated quota of female members in the lower house of parliament. They find that such a correlation exists, in their words: "Countries with a higher emphasis of female/male distinctions in their dominant language (higher GII) are therefore more likely to regulate women's political participation." (p. 497). I am not even going to get started on the gender index they used, which does not adequately measure whether languages make a male/female distinction in their grammar - I am just going to say that the claim is very likely to be a spurious one. The data on gender quotas indicates that most European countries as well as many sub-Saharan African countries have gender quotas - both areas are 'gender hotbeds' (Nichols 1992: 132), as gender is both areally and phylogenetically stable in these places. So the national languages of these countries are probably driving the effect found by Santacreu-Vasut et al., making the finding an historical accident rather than a true link between gender and legalised political engagement of women.

Unlike the data on propensity to save money used by Chen (2013), the data is collected on a country level, taking the 'most spoken' language as the appropriate variable to associate with country-level variables on political participation, the Human Development Index, and the number of years since women were first allowed to run for election. I have a huge problem with this. Most countries harbor speakers of tens or hundreds of different languages (remember the map on number of languages per country?), although most have a limited set of national languages. It is completely inappropriate to set any national language to be THE language associated with country-level demographics and to present the resulting correlational findings as being in line with theory on language shaping cognition.

Non-gray countries have some form of legislated candidate quotas, from http://www.quotaproject.org

Things would not be so bad if it were not for the fact that, as Roberts & Winters (2013) say in their paper, "Since some of these studies are receiving media attention without a widespread understanding of the complexities of the issue, there is a risk that poorly controlled studies could affect policy." Santacreu-Vasut et al. (2013) has been cited over 20 times so far. I simply do not have the guts to look in detail at all of these, but some of them are:

Lucas van der Velde et al. (2015) 'Language and (the estimates of) the gender wage gap', from the conclusion: "We hypothesized that in countries where language has a more marked distinction between genders, differences in labor market outcomes will be larger. [...] The results robustly confirm the hypothesis." and explicitly on policy: "From a policy perspective, the major message of our study is that gender wage gap may be driven by some deep societal features stemming from such basic social codes as language. This suggests that if reducing GWG [gender wage gap, AV] was a policy objective, education on gender equality is needed already at early stages of education, when language characteristics are absorbed by children and translated into societal norms."

Hicks et al. (2015) 'Does mother tongue make for women's work? Linguistics, household labor, and gender identity'; from the abstract: "We use a novel approach relying on linguistic variation and document that households with individuals whose native language emphasizes gender in its grammatical structure are significantly more likely to allocate household tasks on the basis of sex and to do so more intensively."

Davis and Reynolds (2016) 'Gendered language and the educational gender gap'

Shoham et al. (2017) 'Encouraging environmental sustainability through gender: A micro-foundational approach using linguistic gender marking'

Malul et al. (2016) 'Linguistic gender marking gap and female staffing at MNC’s'

Seán Roberts also commented on an earlier paper by some of the authors featured above here. At the bottom of this post he says: "To put it cynically, it’s as if gender inequality is only due to humans being slaves to their language, rather than centuries of active patriarchal societies. The hypothesis doesn’t seem to have a good reason why distinctions in gender should disfavour women over men. Perhaps most disturbing is the authors’ clear appeal for these findings to be used in policy"

Indeed, this is scary stuff.

A more popular paper, albeit with perhaps less dire consequences for policy, is Kashima and Kashima (1998), who find a positive correlation between pronoun drop, a grammatical phenomenon where pronouns can be left out of otherwise well-formed sentences, and collectivism, a tendency for society to look after in-group members. As was the case for Santacreu-Vasut et al. (2013), the unit of analysis is the country, with the national language taken to supply information about pronoun drop. This paper has been cited over 300 times and seems to be in high standing in cross-cultural psychological research, as it is being cited in handbooks and otherwise mostly in papers that do not pertain to this exact hypothesis, suggesting its results are taken as a given. There is also a follow-up by the same authors, Kashima and Kashima (2003), including a larger set of variables on economy, climate, and geography, and an erratum to the 1998 article from 2005.

This map was made by Gert Jan Hofstede, son of Geert Hofstede, both famous social scientists focusing on cross-cultural differences (see geerthofstede.com)

The established nature of these types of papers is underlined by the fact that potentially spurious correlations are not confined to specialist journals such as Journal of Cross-Cultural Psychology and Applied Economics Letters. Science published a paper in 2014 by Talhelm et al. entitled 'Large-Scale psychological differences within China explained by rice versus wheat agriculture'. From the abstract: "We tested 1162 Han Chinese participants in six sites and found that rice-growing southern China is more interdependent and holistic-thinking than the wheat-growing north." See a popular write-up of the paper here and here.

Luckily, we can count on the absolute hero Seán Roberts to provide a commentary. See here for a blog write-up. Roberts shows that it is very likely that at least part of the correlation reported by Talhelm et al. can be explained by linguistic history, again showing the need for a control for cultural contact and genealogy.

I am sure there are more papers in this vein - please comment if you know of any! I am of half a mind to track them all down, get their data, and show that most if not all of these relations disappear when controlling for genealogical descent and/or diffusion. On the other hand, clearly this would be a waste of time as the premise of most of these papers is that the national or 'most used' language of a country can be of influence on population-level findings on social and economic behaviour, a premise which I think is inherently flawed. The mechanism behind such relationships is simply unfathomable, despite the efforts taken by the authors of these papers to demonstrate the contrary. I am thinking of the level of multilingualism in most countries, let alone the number of different communities with different psychological and economic behaviours.

Turns out that the paper on rice and wheat agriculture by Talhelm et al. (2014) is in fact an improvement on at least this issue, as it looks in detail at agricultural practices and psychological measures within a set of provinces in one country, China. However, it still doesn't account for the contingencies that arise when comparing communities that are both closely related and in close contact.

There really is no longer any excuse for not incorporating information on geographical distance and genealogical descent in cross-linguistic and cross-cultural analysis. There are reference trees available for all language families in several typological and reference databases (Glottolog, Ethnologue, AUTOTYPE, WALS) - the awesome Dan Dediu has made this extremely easy for you as explained here, and has also provided Bayesian posterior samples of phylogenetic trees for over 10 language families. On Glottolog, the latitude and longitude of speaker populations are freely available. D-PLACE has cultural variables for many societies around the world linked to language and various phylogenetic trees. All the people that have worked on potentially spurious correlations in language and culture know how to use statistics, well great! - you can use multilevel models and regressions that correct for genealogical relatedness and geographic distance and do things the way you should be doing things.
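As a small illustration of the geographic side of this, here is a sketch of how latitude and longitude can be turned into pairwise distances between languages, which is the kind of input a control for spatial proximity needs. It uses the standard haversine formula on a spherical Earth. The language names and coordinates are hypothetical placeholders, not values taken from Glottolog, and the function names are my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoots.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def pairwise_distances(coords):
    """Return {(name_a, name_b): distance_km} for every unordered pair of languages."""
    names = sorted(coords)
    return {
        (a, b): great_circle_km(*coords[a], *coords[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical languages with placeholder (latitude, longitude) pairs.
COORDS = {"LangX": (0.0, 0.0), "LangY": (0.0, 90.0), "LangZ": (90.0, 0.0)}

distances = pairwise_distances(COORDS)
for pair, km in distances.items():
    print(pair, round(km, 1))
```

A distance table like this could then be fed into a model as a spatial control, alongside a grouping factor such as language family. How best to do that depends on the analysis, so treat this only as a starting point.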

Vortices, Principia Philosophiae, René Descartes, 1644.

EDIT: Seán Roberts alerted me to the newly published The Palgrave Handbook of Economics and Language, with a paper by Nigel Fabb that is critical of these studies entitled "Linguistic Theory, Linguistic Diversity and Whorfian Economics". His main critiques are 1) the linguistic data is simplified to such an extent that it may no longer represent linguistic facts; and 2) the studies do not present a demonstration of causation despite their claims, and need to present a far more rigorous case both in theory and in experimentation. I am really happy this critical paper was published in a venue which will surely be read by economists, yay!

Selected references

Chen, M. K. (2013). The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets. American Economic Review 103.690–731.

Kashima, E. S. & Kashima, Y. (1998). Culture and language: The case of cultural dimensions and personal pronoun use. Journal of Cross-Cultural Psychology 29.461-486.

Kashima, Y. & Kashima, E. S. (2003). Individualism, GNP, climate, and pronoun drop: Is individualism determined by affluence and climate, or does language play a role? Journal of Cross-Cultural Psychology 34.125-134.

Ladd, D. R.,  Roberts, S. G. & Dediu, D. (2014). Correlational studies in typological and historical linguistics. Annual Review of Linguistics 1.221-41.

Nichols, J. (1992). Linguistic diversity in space and time. Chicago: University of Chicago Press.

Roberts, S. G. & Winters, J. (2013). Linguistic diversity and traffic accidents: Lessons from statistical studies of cultural traits. PloS One 8.e70902.

Roberts, S. G., Winters, J. & Chen, K. (2015). Future tense and economic decisions: Controlling for cultural evolution. PloS One 10.e0132145.

Santacreu-Vasut, E., Shoham, A. & Gay, V. (2013). Do female/male distinctions in language matter? Evidence from gender political quotas. Applied Economics Letters 20.495-98.


Tuesday, March 7, 2017

Listen to the world's languages pt 4!

There are a lot of websites around where you can literally listen to the diversity of the world's spoken languages. We've covered a few of them before under the tag Listen To The World's Languages.

Since we last made a post, we've found some more sites for you to enjoy. Just to reiterate here's a list of the sites we already mentioned before:
Now, to new additions in this category. We've got six for you this time, all quite spectacular! 

***
Radio garden


Radio Garden is a Transnational Radio Knowledge Platform traveling online exhibition designed by Amsterdam-based Studio Moniker and developed by the Netherlands Institute for Sound and Vision. It’s a project that aims to connect people worldwide through shared experiences.

This is a really cool thing, I'm thoroughly enjoying this. You can literally listen to radio stations from all over the world, both speech and music. Very cool, very good art project.

Interactive map of languages of the International Phonetic Association


This might not sound as cool, but trust me it is. The journal of the International Phonetic Association has a standard format for people to submit phonological descriptions of languages in. It requires recordings of certain words, minimal pairs and a certain story (the North Wind and the Sun). These language illustrations are mapped out here, so you can click and listen to them! Marija Tabain from La Trobe University in Australia and colleagues have developed this map. It's neat because it features many lesser known languages and the actual scientific publication associated with the phonological description. I took particular pleasure in finding Nen, one of the languages people here at ANU work on (in particular my PhD supervisor Nick Evans).

Localingual


This site is quite similar to Language Landscape: it lets you upload audio samples that are geo-tagged, and then you can browse the world map and listen to other clips. It's very nice, but maybe not so new. Sorry. It's got more than 18,000 clips, which is more than Language Landscape's 708 recordings. I must confess though, besides the size I don't really understand what the big differences are between these two sites. Language Landscape has you log in, and Localingual does not require that. That has advantages (more people submit) and disadvantages (more crap submitted).

Both are cool, go to both.

**EDIT** I hadn't noticed the up and down voting at Localingual until after writing this. That is very good, and it gets rid of a lot of the troubles people have been raising with the site. Thumbs up!!

***
These next three are databases of a different kind: they contain material collected in various research projects, primarily on endangered languages. It's not at all as easy to just click and listen, unfortunately, but they do contain more material and more rigorous scientific description of the languages and material!

Endangered Languages Archive (ELAR)
ELAR is an archive for endangered languages hosted at the School of Oriental and African Studies in London. It currently contains 517 languages and is a very important institution.


Pacific and Regional Archive for Digital Sources In Endangered Cultures (PARADISEC)

PARADISEC is a database of material on endangered languages, with primary focus on the Asia-Pacific region. It features 800 languages currently, both text and audio material. It is the #1 site for language material in this region.

Dokumentation bedrohter Sprachen (DOBES)

DOBES contains 68 languages and is a project for documenting endangered languages sponsored by the Volkswagen Foundation and hosted by MPI-Nijmegen.


***

We hope you enjoyed that. If you've got suggestions, be sure to comment or tweet at us.

Grand Challenges of Linguistics pt 4: response from readers

Continuing the series "Grand Challenges of Linguistics", I thought I'd share responses that came in on Twitter (you can tweet your own response here). These are questions that occupy researchers in the field today. What do you find to be the most important and interesting questions and challenges of your research field?


Nick Enfield said:
"Focus on answering research questions not on defending methods/approaches"
"Take seriously the connections between grammar and social interaction"

Piers Kelly said: 
"Work out how to locate living breathing humans in the data. If linguistics is not a humanist science, we've lost the mission!" 
"Address structural biases against cross-disciplinary research. Don't just affirm that cross-disciplinarity is a 'good thing'."

Simon Greenhill said: 
"Scaling up the data and methods to detect & quantify the global, regional, & familial processes shaping language diversity."

I'd also like to share some quotes from two round table discussions on this very topic that we talked about before. The two round table discussions had different perspectives, one more "generative" and one more "functional".

Generative round table discussion
Rose-Marie Déchaine said: 
"there remains an Indo-European bias in the field, which privileges certain data sets as being inherently more theoretically interesting than others."
Elena Anagnostopoulou said:

"Syntacticians are often highly selective in the way they read and cite, and they adopt main stream proposals without questioning their basic assumptions. At the same time, interesting theoretical work is ignored if it is not fashionable or produced at the right places. This imbalance does not encourage free thinking. Success measures are often one-sided and the pressure for increased productivity does not always outweigh the cost of decrease in depth "

At the round-table discussion with a more functional slant
Katarzyna Bromberek-Dyzman said: 
"Linguists need to employ their language-related expertise to answer bigger scale questions about the nature of language systems in connection with other systems of meaning involved in communicative sense making. Neuroimaging research shows that language processing is not computed in a mental and neurophysiological vacuum."

Eitan Grossman said: 
"The field of grammaticalization studies has turned up a massive amount of data on the regularities of change that result in grammatical structures. It has also turned up counterexamples and rarer types of change that can result in grammatical structures. Of course, cross-linguistic study and research on historical change in languages with real documented historical corpora can help us to evaluate the hypotheses of grammaticalization research, but I would like to point to another avenue of research that bridges disciplinary boundaries, forging a link between the work of ‘unhyphenated’ linguists and experimentalists."

Monday, March 6, 2017

Linguistic typology conference and workshops coming up! #lingtypconf

This year, Canberra will host the 12th biennial meeting of the Association for Linguistic Typology. This is the big event for researchers of cross-linguistic diversity from all over the world. This is, as is well known, a very cool and challenging research field. Grasping the diversity of the world's languages is an ambitious and worthwhile enterprise that can sometimes leave you feeling a bit floored.


I'm happy to be able to welcome you here in Canberra for this conference. It's going to be great! If you're floored, I'll pick you up!

The deadline for abstract submission is coming up, so you're all hereby reminded that if you want to partake then get that abstract in by the 31st of March. This goes for the general session, but also the workshops.

We (Martin Haspelmath, Hannah Haynie, Robert Forkel and myself) are organising a workshop on Design principles and comparisons of typological databases. If you are interested, do get in touch and submit an abstract. Below in this post is a longer description of our workshop.

Interested people should submit abstracts for both the general session and the workshops in the same form: http://www.dynamicsoflanguage.edu.au/alt-conference-2017/call-for-abstracts/ . You don't have to be an ALT member to send in an abstract, but you'll have to become one if you're coming here and giving a talk. But, worry about that later.

Please remember that, for those with funding problems, a limited number of scholarships for researchers are available; applications are also due 31 March 2017.

There are actually two workshops this year with similar topics: ours, and one by Round, Macklin-Cordes and Quinn titled "Quantitative analysis in typology: the logic of choice among methods". Since they overlap, we'll see if we can arrange some time together or some sort of link between them.

If you can't come, but are keen on following what's going on in the world of linguistic typology, then subscribe to this mailing list.

Longer description of the workshop: Design principles and comparisons of typological databases

What are the shared challenges and opportunities facing databases of language diversity? What kinds of databases are out there, and what can they be used for? These are questions we would like to address in this workshop, bringing together researchers who compile this kind of data and those who use it.

There are quite a few existing databases of grammatical features of languages, and several more are under construction. They differ in their design and in the kinds of research questions they aim to answer. Some are created to investigate the particular history of a certain region or family (e.g. van Gijn 2014), others a particular set of traits in a global set of languages (Stassen 1997), and so on. Despite these differences, there is often the possibility of sharing data or design between different typological surveys. 

We would like to take this opportunity during the ALT to bring together scholars who are working on designing typological databases and end users of such databases, and to discuss comparisons and possible opportunities for co-ordination. We’re interested in the design decisions that go into the construction of a database, what consequences they have for what it can be used for, and whether it can be linked to other similar databases.

Within MPI-SHH's Glottobank project (http://glottobank.org/), there have been discussions of how different typological databases relate to each other and what their different aims and uses are. We would like to engage the broader typology community in these discussions and hear viewpoints from other database designers and end-users.

We are also interested in discussing design principles in relation to end-users of the data. There are many different kinds of end-users of this data, and the methods with which they approach the material carry with them certain assumptions and prerequisites. For phylogenetic studies, for example, it is best if the features are logically independent of each other and associated with a confidence value. What does the data that is available today look like, and what should future surveys look like?

This is not only a question of adjusting to certain end-users' preferences, but also a matter of clearly communicating what the data looks like, how it was designed and why. This will make it clear which research questions the data is suited for, and which questions it should not be applied to.

For example, WALS (Dryer & Haspelmath 2013) was constructed using already existing data from a number of well-known typologists. There was also a core sample of languages (100 and 200) that all/most of the chapters covered, but there were still significant gaps in the database's coverage of features per language. This renders certain kinds of analysis impossible. In WALS, there was most likely greater consistency per feature as opposed to per language, since that was how labour was divided. This can be contrasted with APiCS (Michaelis et al. 2013), where each language was represented by experts who corresponded with the APiCS editorial team to answer a typological questionnaire. In the case of APiCS, we expect greater consistency over each language instead of over each feature. APiCS also allows for languages to be represented with several values for one feature, whereas WALS only allows for one. These design choices have consequences for the nature of the data and are interesting to discuss in relation to databases under construction, end users and comparison.

We would like to take this opportunity to invite researchers who are working on constructing typological databases of structural/grammatical features to discuss the questions below and related ones. We would also like to invite end-users who are engaging with this kind of data to present findings and engage in discussions on what the limitations and possibilities of the databases are.

The workshop aims at discussing these questions, but is also open to other related questions:
  • What kind of questions do we want to answer with our data, and which questions do we need to admit we cannot answer?
  • What does it mean if we are comparing doculects instead of languages?
  • What do linguistic descriptions, globally, enable us to research and what do they not?
  • What other feasible sources of information besides descriptions can we use?
  • What do we gain and lose by designing our features to be logically independent from each other (or conversely by including non-independent items in questionnaires)?
  • How do the circumstances of data collection (e.g. coding by feature or by language) affect the use and comparability of data from different surveys?
  • Can data from regionally oriented questionnaires be coordinated with globally oriented surveys to fruitfully build better sets of information on the world's languages? How do data design limitations impact this enterprise?
  • What elements need to be considered and what information needs to be documented when mapping between grammatical/typological datasets? (i.e. setting the stage for the grammaticon/getting input from other database designers on this concept)
  • How do we implement measures of inter-coder reliability into more databases and into comparisons of them?

Wednesday, March 1, 2017

Maybe some isolates are creoles? pt 2


It has not escaped me that a while ago we asked the question "maybe the language isolate Bangime is actually an old creole?"* and now the MPI-SHH in Jena has a research project on the genetics of Bangande people (Bangime speakers) and they also had a mini-African symposium with language experts on Bangime and Dogon.. and now later this year there's a workshop on "Language shift and substratum interference in (pre)history"...

Interesting... I will be watching this space.

/Cheeky-Cheekvig

*Basically, they've got lexicon from the surrounding Dogon and, from the perspective of a typological questionnaire (NTS), the language "lacks" a lot of grammatical things, including similarities with Dogon languages. Also, there are suggestions that they used to be slaves of the Dogon.

Thursday, December 22, 2016

Resultatives

December is traditionally a month for looking back. I have been looking back quite far for an abstract on causative motion for a conference in Paris next year. During my PhD, I collected data on both non-causative motion (Robin ran out of the house) and causative motion (Alice threw the book out the window), but I never did anything with the causative data. If the abstract is accepted, maybe I can finally do something with it.

Coding things up for the abstract led me to go even further back. Among the causative motion sentences selected from my parallel corpus, I included a resultative:

"A large rose-tree stood near the entrance of the garden: the roses growing on it were white, but there were three gardeners at it, busily painting them red." (from Alice's Adventures in Wonderland by Lewis Carroll)

To paint something red is a resultative construction :). Resultatives were the topic of my MA thesis completed now seven years ago, and they have always kept a special place in my heart. They are one of three types of secondary predication:

Manner predication:  Sue walked slowly
Depictive:                  Lisa ate her vegetables raw
Resultative:               Melissa cut the grass short

Secondary predicates 'attach' to a normal predicative constituent that encodes an event, here walk, eat, and cut, and express a state or a property regarding that event, here slow, raw, and short. They are well-known in construction grammar and generative linguistics, but in typology, major cross-linguistic work has only been done on manner predication by Flora Loeb-Diehl in her dissertation "The typology of manner expressions" from 2005.

Given that I'm not doing anything with secondary predication for the foreseeable future, I thought a little typology of the 'to paint red' resultative from Alice in translation in a few Indo-European languages would be a nice Christmas read for (perhaps one of) you.


The strategies used to translate the English construction 'to paint red' vary along several different axes. To start with the closest relatives of English, these are the Dutch, German, and Swedish translations (apologies for poor gloss alignment):

Dutch
Er   stond              een            grote rozenboom  bij de 
ER stand.PST.SG INDF.ART big    rose.tree      by  DEF.ART
ingang    van de            tuin;     de            rozen    die
entrance of    DEF.ART garden DEF.ART rose.PL DEM
eraan  groeiden        waren           wit     maar er   waren            drie 
on.it    grow.PST.PL COP.PST.PL white but    ER COP.PST.PL three
tuinlieden     druk  aan de             gang om ze    rood te   schilderen.
gardener.PL busy  on   DEF.ART  way  to   3PL red    TE paint.INF

German
Ein                              hoher                     Rosenstock        stand                  nah 
INDEF.ART.M.NOM  high.M.SG.NOM  rose.tree.M.SG  stand.3SG.PST  close
dem                       Eingang               zum                         Garten:
DEF.ART.M.DAT  entrance.M.DAT  to.DEF.ART.M.DAT  garden.M.DAT
die                           Rosen                   die   daran  wuchsen          waren         weiß   aber 
DEF.ART.PL.NOM  roses.F.NOM.PL  that  on.it    grow.3PL.PST  is.3PL.PST  white  but
drei    Gärtner                      waren         dabei               sie            geschäftig  rot  an-zu-streichen.
three  gardener.M.NOM.PL  is.3PL.PST  in.the.process  3PL.ACC  busily         red  on-to-paint.INF

Swedish
Vid  ingång-en                     stod            ett                    stort   rosenträd.     Ros-orna            
by    entrance-SG.DEF.UT  stand.PST  INDF.ART.NT large  rose.tree.SG  rose-PL.DEF.UT
som                         växte         på  det        var                vit-a             men
which.REL.PRON  grow.PST  on  3SG.N be.PST.COP  white-DEF  but
tre     trädgårdsmästare  var                sysselsatt-a      med  att  måla          dem         röd-a
three gardener.PL           be.PST.COP  occupy-ADV  with  to   paint.INF  3PL.OBJ  red-DEF

Dutch, German, and Swedish all make use of an adjective as a secondary predicate, similar to the English original. This can be an invariable form, as in Dutch, or the adjective can agree with the noun for 'roses' in terms of definiteness, person, gender, case, etc., as in Swedish. This 'bare' adjective strategy is quite big in Europe: in my MA thesis, I note that this is possible in Greek, Icelandic, Italian, Norwegian, and Spanish too. It is indeed used in the Greek translation:

Greek
Mia                               megal-i                  triantafylli-a              fytron-e
INDF.ART.F.NOM.SG  large-F.NOM.SG  rose.tree-F.NOM.SG grow-PST.IPFV.3SG
kont-a        s-tin                                 eisod-o                      toy                                 perivoli-oy. 
near-ADV  in-DEF.ART.F.ACC.SG  entrance-F.ACC.SG  DEF.ART.M.GEN.SG  garden.M-GEN.SG
Ta                               triantafyll-a           tis                 itan              aspr-a
DEF.ART.N.NOM.PL roses-N.NOM.PL POSS.F.3SG be.PST.3PL white-N.NOM.PL
alla  ypirch-an                   kont-a         tis                treis                 kipoyr-oi                      poy   me 
but   exist.PST.IPFV-3PL  near-ADV  3SG.F.OBJ  three.M.NOM  gardener-M.NOM.PL  who  with
poly-aschol-o                yf-os                    ta              e-vaf-an                            kokkin-a.
very-busy-N.ACC.SG   look-N.ACC.SG  3PL.OBJ  PST-paint-PST.IPFV.3PL  red-N.ACC.PL

And also in Irish:

Irish
Bhí       crann   mór   róis            ina                        sheasamh  gar   do  gheata
is.PST  tree      great  rose.GEN  in.3SG.M.POSS  standing    near  to  gate
an              ghairdín:       is       geal    a                  bhí        na                   rósanna
DEF.ART  garden.GEN COP  white  REL.PART  is.PST  DEF.ART.PL  rose.PL
ag  fás             air                  ach  bhí       triúr   gairneoirí             ina
at   grow.INF  on.3SG.M/N  but  is.PST  three  gardener.GEN.PL  in.3SG.M/N.POSS
thimpeall       agus  iad            go               gnóthach  á                bpéinteáil   dearg.
surrounding   and   3PL.OBJ  ADJ.PART busy         3PL.POSS  paint.INF  red

But not in Italian:

Italian
Presso  l’-entrata                                   del                            giardino 
near     DEF.ART.F.SG-entrance.F.SG  of.DEF.ART.M.SG  garden.M.SG
c’-er-a                      un         grande  rosaio:               vi        cresce-v-ano
there-be.IPFV-3SG  one.M  big.SG  rose.tree.M.SG  there   grow-IPFV.PST-3PL
rose           bianche      ma  c’-er-ano                  tre      giardinieri         tutti  indaffarati 
rose.F.PL  white.F.PL  but  there-be.IPFV-3PL  three  gardener.M.PL  all    busy.M.PL
a   diping-er-le                  di   rosso.
to  paint-INF-ACC.F.3PL of  red

Italian uses a combination of a preposition and an adjective meaning 'red'. This strategy is in fact used by all other Romance translations, even though the 'bare' adjective strategy is supposedly possible as well:

French
Un                   grand   rosier          se                 trouv-ai-t              près  de 
ART.INDF.M  big.M  rose.tree.M  REFL.3SG  found-IPFV-3SG  near  to
l'entrée                        du                       jardin:      ses                 rose-s 
ART.DEF.F-entrance  of.ART.DEF.M  garden.M  3PL.F.POSS  rose.F-PL
ét-ai-ent          blanche-s     mais  trois   jardinier-s     s-'affair-ai-ent
be-IPFV-3PL  white.F-PL  but     three  gardener-PL  REFL.3PL-be.busy-IPFV-3PL
à    les           peindre     en  rouge.
to  3PL.OBJ  paint.INF  in  red

Portuguese
Perto  d-a                          entrada       d-o                            jardim      estava
near   of-DEF.ART.F.SG  entrance.F  of-DEF.ART.M.SG  garden.M  be.IND.IPFV.3SG
uma                      grande  roseira        com  rosa-s        branca-s,     mas  havia                              três
INDF.ART.F.SG  large     rosebush.F  with  rose.F-PL  white.F-PL  but   there.be.IND.IPFV.3SG three
jardineiro-s        muito  atarefado-s               a    pintar-em-na-s                            de  vermelho.
gardener.M-PL  very    burden.PTCP.M-PL  to  paint.INF-PERS.3PL-OBJ.3F-PL  of  red.M

Romanian
Un                 arbust                     stufos  de  trandafir-i           se                 înălț-a
INDF.M.SG  shrub.M.NOM.SG bushy  of   rose-M.ACC.PL REFL.3SG  go.up-IPFV.3SG
aproape  de  intrar-ea                       în  grădină;                  trandafir-i-i                    înflori-seră 
near        of  entry-F.ACC.SG.DEF  in  garden.F.ACC.SG  rose-M.NOM.PL-DEF  bloom-PPRF.3PL
alb-i         dar   trei    grădinar-i                    dădeau              zor     pe  lângă    arbust
white-PL  but  three  gardener-M.NOM.PL  give.IPFV.3PL  haste  on  next.to  shrub.M.NOM.SG
și     vops-eau          florile                             în  roșu.
and  dye-IPFV.3PL  flower.F.ACC.PL.DEF  in  red

Perhaps the Romance translations can be related to the Latin translation, which uses the ablative case marker. Latin also uses a nominal construction, pigmento rubro 'red paint', rather than just the adjective. Many more languages do this, especially those with case-marked adjectives.

Latin
Prope              adit-um                       hort-i                        arbor 
close.to.ADV  entrance-M.ACC.SG  garden-M.GEN.SG  tree.F.NOM.SG
ros-arum            magn-a                 sit-a                                    est 
rose-F.GEN.PL  large-F.NOM.SG  lie-PASS.PFV.F.NOM.SG  be.PRS.3SG
in  ea                 ros-ae                  alb-ae                    er-ant 
in  3SG.F.ABL  rose-F.NOM.PL  white-F.NOM.PL  be-IPFV.3PL
sed  tr-es                        hort-i                         cultor-es                     
but   three-M.NOM.PL  garden-M.GEN.SG   grower-M.NOM.PL
eas                pigment-o             rubr-o                strenu-e        ping-ebant
3PL.F.ACC   paint-N.ABL.SG  red-N.ABL.SG  active-ADV  paint-IPFV.3PL

Other languages also use the adposition strategy common in the Romance languages. One is Breton:

Breton
E-tal       toull             al                liorzh               e           oa                    ur
in-front   hole.M.SG   DEF.ART   garden.M.SG   PART   be.PST.AUX   INDF.ART
bod-roz:                        gwenn        e         oa                     ar               roz            anezh-añ 
bush.M.SG-rose.F.PL   white.ADJ  PART  be.PST.AUX   DEF.ART  rose.F.PL   of-3SG.M.OBJ
met  tri       liorzhour            a          oa                   a-zevri         ouzh  o           livañ         e    ruz.
but   three   gardener.M.SG  PART  be.PST.AUX  really.ADV  to       PROG  paint.INF  in   red.ADJ

Albanian uses the instrumental preposition me (and again a nominal, 'red colour', rather than simply the adjective 'red'):

Albanian
Pranë  hyrjes                                së  kopshtit                            kishte                    një
near     entrance.F.DEF.DAT.SG  of  garden.M.DEF.DAT.SG  have.IMPRF.3SG  INDF.ART
shkurre                          të                          madhe                         trëndafilash.
bush.F.INDF.ACC.SG  INDF.F.ACC.SG  big.F.INDF.ACC.SG  rose.M.INDF.DAT.PL
Aty    lulëzonin                 lule                                   të                           bardha                             që
there  bloom.IMPRF.3PL  flower.F.INDF.NOM.PL  INDF.F.NOM.PL  white.F.INDF.NOM.PL  that
tre      kopshtarë                               lodheshin                              gjithë  ditën                          duke
three  gardener.M.INDF.NOM.PL   get.tired.PASS.IMPRF.3PL  all       day.F.DEF.ACC.SG  by
i                        lyer                     me    ngjyrë                            të                          kuqe.
3PL.F.AB.SBJ  paint.GER.PRS  with colour.F.INDF.ACC.SG  INDF.F.ACC.SG red.F.INDF.ACC.SG

Hindi does the same with the instrumental se (and yet again a nominal, 'red paint'):

Hindi
baġīce               ke                praveśdvar  ke               nazdīk            hī         gulāb     ka           ek
garden.M.OBL  GEN.OBL  entrance.M  GEN.OBL  nearby.ADV EMPH  rose.M  GEN.M  DEF.ART
baṛa             peṛ       thā.                   is-me                         safed           gulāb    lage
big.ADJ.M  tree.M  be.PST.M.SG  3SG.PROX.OBL-in  white.ADJ  rose.M  attach.PFV.PTCP.M.PL
the,                           par  tīn      mālī                  in-kī                                    pankhuṛiyoṃ
be.AUX.PST.M.PL  but  three  gardener.M.PL  3SG.PROX.OBL-GEN.F  petal.F.PL.OBL
ko      lāl           rang       se      rangane              meṃ  vyast          the.
DAT  red.ADJ  paint.M  with  paint.OBL.INF  in       busy.ADV  be.PST.M.PL

Persian uses the dative preposition be in combination with a nominal:

Persian
yek  deraḵt-e      gol-e           sorḵ-e        bozorg  dar  qesmat-e    vorud-i               bāġ 
a       tree-of.EZ  rose-of.EZ  red-of.EZ  large     in    part-of.EZ  entrance-INDF  garden
vojud       dāšt:                            gol-hā-ye           ān    sepid-rang      bud 
existence  have.AUX.PST.3SG  rose-PL-of.EZ   that  white-colour   be.COP.PST.3SG
amā  se       bāġ-bān-e                   sar-garm-e              rang=kardan-e 
but    three  garden-keeper-of.EZ  head-warm-of.EZ  colour=make.AUX.INF-of.EZ
gol-hā    be  rang-e            qermez  budand.
rose-PL  to  colour-of.EZ  red         be.PST.3PL

Polish likewise uses the dative na, in combination with an adverbial suffix:

Polish
U  wejści-a                     do  ogrod-u                    sta-ło                            spor-e
at  entrance-N.GEN.SG  to  garden-M.GEN.SG  stay.IPFV-PST.3SG.N  fair.sized-N.NOM.SG
drzew-k-o                     różan-e;                          kwit-ły                                 na  nim
tree-DIM-N.NOM.SG  rose(ADJ)-N.NOM.SG  flower.IPFV-PST.3PL.NM  in   3SG.LOC.N
biał-e                         róż-e                        ale  trzech                    ogrodnik-ów               pracowici-e
white-NM.NOM.PL  rose-NM.NOM.PL  but  three.M.GEN.PL  gardener-M.GEN.PL  diligent-ADV
przemalowywa-ło              je                       na  czerwon-o.
repaint.IPFV-PST.3SG.N   3PL.ACC.NM   to  red-ADV

Russian uses the preposition v 'in' and a nominal, 'red colour':

Russian
U      vhod-a                        v   sad                            ros                                   bol’š-oj 
near  entrance-SG.M.GEN  in  garden.SG.M.ACC  grow.PST.3SG.M.IPFV  big-SG.M.NOM
rozov-yj                 kust -                      roz-y                na  nem                by-l-i
rose-SG.M.NOM   bush.SG.M.NOM  rose-PL.NOM  on  3SG.M.OBJ   be-PST.IPFV-PL
bel-ye                no   vozle  stoja-l-i                     tri       sadovnik-a               i 
white-PL.NOM  but  near    stand-PST.IPFV-PL  three   gardener-PL.NOM  and
userdno  kras-i-l-i                     ih              v   al-yj                    cvet.
busily      paint-IPFV-PST-PL   3PL.OBJ  in  red-SG.M.ACC  colour.SG.M.ACC

The remaining languages also use spatial markers to encode that the roses are painted red, but these markers are case markers rather than free-standing adpositions. This is true for Latvian (Latvian also uses a nominal):

Latvian
pie    ieej-as                       dārz-ā                       aug-a                        liels
near  entrance-SG.F.GEN  garden-SG.M.LOC  grow-PST.IND.SG  large.SG.M.NOM
balt-u                    rož-u                   koks                     bet  trīs                dārznieki
white-PL.F.GEN   rose-PL.F.GEN  tree.SG.M.NOM  but  three.NOM   gardener.PL.M.NOM
pašreiz     steidzīgi  ņēmā-s                             pār-krāsot          ziedus                       sarkan-ā 
presently  hastily     undertake-PST.IND.SG   again-paint.INF  blossom.PL.F.ACC  red-SG.F.LOC
krās-ā
colour-SG.F.LOC

Assamese uses the locative case marker -ɔt (and yet again a nominal):

Assamese
pʰʊl-ɔrɛ            ʊposɪ      tʰɔka ɛ-jʊpa         bɔr daŋɔr  bɔga gʊlap-ɔr    gɔs-ɛ 
flower-MEANS  over.flow-CVP  stay    NUM-CLF  very big    white rose-GEN  tree-NOM
prɔtʰɔmɔtɛ  ɛlɪs-ɔr drɪʃtɪ akɔrxɔn korɪ-lɛ.  kɪntʊ  taɪr 
at.first     alice-GEN attention attract do-3.PST.PFV  but      3.SG.F.GEN
asorjy-ɔr   xima  na-tʰak-ɪl      jetɪya taɪ      dekʰɪ-lɛ         jɛ hat-ɔt 
wonder-GEN limit   NEG-stay-PST   when 3.SG.NOM   see-3.PST.PFV  that hand-LOC
rɔŋ-ɔr       tɛma arʊ bʊrʊʃ  lo-ɪ      tɪnɪ-ta malɪ-ɛ   bɔr 
colour-GEN   container and brush    take-CVP  three-CLF gardener-NOM  great
byɔstɔ  bʰabɛ bɔga   gʊlap-bʊr-ɔt     rɔŋa  rɔŋ  xan-i           pʰʊrɪ-sɛ
busy      way white  rose-CLF-LOC  red   colour  paint-CVP  roam.around-3.PST.PROG

Nepali similarly uses the locative case marker in combination with a nominal:

Nepali
bagaica  bhitra  pas-ne                   bittikai  us-le           euta   thulo  gulāph-ko   rukh  dekh-i
garden    inside  enter-IPFV.PTCP  soon      3SG-ERG  one    big      rose-GEN  tree    see-PST.3SG.F
jas-mā      gulāph-ka        seta     phul     phul-eka                      thi-e                           tehã   tin-jana
that-LOC  rose-GEN.PL  white  flower  bloom-PFV.PTCP.PL  be.AUX-PST.3PL.F  there  three-CLF
mali         ubhiy-eka                       thi-e                       tiniharu-le  hāt-ma        burus  liyera         euta 
gardener  stand.up-PFV.PTCP.PL  be.AUX-PST.3PL  3PL-ERG  hand-LOC  brush  hold-CVB  one
euta  phul-lāi        gulāphi  rang-mā      rangau-dai     thi-e
one   flower-DAT  red          colour-LOC  colour-PROG  be.AUX-PST.3PL

Armenian uses the instrumental case marker -ov and a nominal:

Armenian
Partez-i          mutk’-i              mot   ach-el                   er                            mi              mets 
garden-GEN  entrance-DAT   near  grow-PRF.PTCP  be.AUX.3SG.PST  one.INDF  large
spitak  vard-eni.                       Surjy   kangn-ats              yerek’  partizpan
white   rose-tree.NOM.INDF   round  stand-RES.PTCP  three    gardener.NOM.INDF
shtap-shtap              karmir guyn-ov      nerk-um              ein                         vard-er-y.
hurriedly-hurriedly  red       colour-INS  paint-PRS.PTCP  be.AUX.3PL.PST  rose-PL-ACC.DEF

The remaining two languages are different: Lithuanian and Serbo-Croatian both use an adverbial marker, and no nominal, to encode the secondary predicate 'red':

Lithuanian
Prie  įėjim-o                         į       sod-ą                        aug-o            didel-is                  rož-ių
near  entrance-SG.M.GEN  into  garden-SG.M.ACC  grow-3.PST  large-SG.M.NOM rose-PL.F.GEN
krūm-as.                Rož-ės                 žydė-jo                      balt-ai           bet  prie   jų 
bush-SG.M.NOM  rose-PL.F.NOM  bloom-3SG.PL.PST  white-ADV  but  near  3PL.GEN
stovė-jo       trys               sodinink-ai              ir     paskubomis   daž-ė             žied-us 
stay-3.PST   three.NOM  gardener-PL.NOM  and  hastily.ADV  paint-3.PST   blossom-PL.M.ACC
raudon-ai.
red-ADV

Serbo-Croatian 
Kraj      ulaz-a                          u  vrt                            ras-l-o
next.to  entrance-M.GEN.SG  in  garden.M.ACC.SG  grow.IPFV-PST.ACT.PTCP-N.SG
je                   velik-o                 ruž-in-o                        drv-o.                  Ruž-ic-e 
be.PRS.3SG   big-N.NOM.SG  rose-ADJ-N.NOM.SG  tree-N.NOM.SG  rose-DIM-F.NOM.PL
koj-e                    su                 na  njemu           cva-l-e 
REL-F.NOM.PL  be.PRS.3PL  on  3SG.N.LOC  bloom.IPFV-PST.ACT.PTCP-F.PL
bi-l-e                                su                bijel-e                    ali su                 oko
be-PST.ACT.PTCP-F.PL  be.PRS.3PL  white-F.NOM.PL  but  be.PRS.3PL  around
njih           radi-l-a                                            tr-i                         vrtlar-a                       i 
3PL.GEN  work.IPFV-PST.ACT.PTCP-N.PL  three-M.NOM.PL  gardener-M.GEN.SG  and
žuri-l-a                                            se                da ih              što prije
hurry.IPFV-PST.ACT.PTCP-N.PL  REFL.ACC  to  3PL.ACC   as   soon.ADV
o-boj-e                                  crven-o.
PRFX-paint.IPFV-PRS.3PL  red-ADV

So, basically we have four classes of translations of 'to paint red':

'bare' adjective:                    Dutch, English, German, Swedish, Greek, Irish
adposition plus adjective:    French, Italian, Portuguese, Romanian, Breton, Albanian, Hindi, Persian,        
                                             Polish, Russian
case marker plus adjective:  Latin, Latvian, Assamese, Nepali, Armenian
adverbial marker:                 Lithuanian, Serbo-Croatian

There are of course other ways of categorising the data. One alternative would be to code for the case relation that is employed, which gives us another typology:

no case relation:                  Dutch, English, German, Swedish, Greek (well, Greek is accusative
                                             technically), Irish, Lithuanian, Serbo-Croatian
locative:                               Italian, French, Portuguese, Romanian, Breton, Russian, Latvian,
                                             Assamese, Nepali
ablative:                               Latin
instrumental:                       Albanian, Armenian, Hindi
dative:                                 Polish, Persian

We can also look at the use of a nominal 'red colour' rather than a bare adjective/adverb 'red/red-ly', and find the following split:

'paint red':                            Dutch, English, German, Swedish, Greek, Irish, Lithuanian, Polish,
                                             Serbo-Croatian
'paint in red colour':             Italian, French, Portuguese, Romanian, Breton, Russian, Latvian,
                                             Assamese, Nepali, Latin, Albanian, Armenian, Hindi, Persian

The use of a nominal seems closely related to the use of a case relation: if a language's translation involves a case relation, it is more likely to use a construction like 'paint in a red colour' than 'paint it red'. There are probably all kinds of interesting underlying case assignment issues involved.
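
As a quick sanity check on that hunch, the two typologies can be cross-tabulated. The snippet below is only a rough sketch: the language lists are copied from this post, the names `CASE_RELATION`, `USES_NOMINAL` and `cross_tabulate` are made up for illustration, and Greek is counted as 'no case relation', as in the list above, even though it is technically accusative.

```python
from collections import Counter

# Case relation used in each translation, copied from the typology in this post.
# Assumption: Greek is grouped under "none", following the list above.
CASE_RELATION = {
    "none": ["Dutch", "English", "German", "Swedish", "Greek", "Irish",
             "Lithuanian", "Serbo-Croatian"],
    "locative": ["Italian", "French", "Portuguese", "Romanian", "Breton",
                 "Russian", "Latvian", "Assamese", "Nepali"],
    "ablative": ["Latin"],
    "instrumental": ["Albanian", "Armenian", "Hindi"],
    "dative": ["Polish", "Persian"],
}

# Languages whose translation uses a nominal ('paint in red colour').
USES_NOMINAL = {
    "Italian", "French", "Portuguese", "Romanian", "Breton", "Russian",
    "Latvian", "Assamese", "Nepali", "Latin", "Albanian", "Armenian",
    "Hindi", "Persian",
}


def cross_tabulate(case_relation, uses_nominal):
    """Count languages by (has a case relation, uses a nominal)."""
    counts = Counter()
    for relation, languages in case_relation.items():
        has_case = relation != "none"
        for language in languages:
            counts[(has_case, language in uses_nominal)] += 1
    return counts


table = cross_tabulate(CASE_RELATION, USES_NOMINAL)

# Languages that have a case relation but still use a bare 'red'.
exceptions = sorted(
    language
    for relation, languages in CASE_RELATION.items()
    if relation != "none"
    for language in languages
    if language not in USES_NOMINAL
)

for (has_case, has_nominal), n in sorted(table.items()):
    print(f"case relation={has_case!s:5}  nominal={has_nominal!s:5}  languages={n}")
print("case relation but no nominal:", exceptions)
```

With the lists as given in this post, Polish is the only language that has a case relation but no nominal, and no language uses a nominal without a case relation. This is a sample of 23 languages and a single sentence, so it illustrates the pattern rather than testing it.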

This is just one sentence in one book, so me including it in my causative motion dataset was really just butterfly collecting. But sometimes it is nice to collect butterflies and these are particularly cool ones :). Merry Christmas - if you have read this far down you especially deserve it!



EDIT: Natalia on Twitter drew my attention to the 2015 dissertation on resultatives in the European languages by Benita Riaubienė, so cool! Riaubienė (2015) discusses resultative strategies in 31 European languages, focusing on telicity, causation, and verb semantics to explain the use of different strategies in different constructions and languages. I am so happy I wrote this post now, otherwise it might have been far longer before I found out about this thesis :).