- Moeljadi, David (2024) Onomatopoeia in Indonesian. In L. Körtvélyessy & P. Štekauer (Eds.), Onomatopoeia in the World’s Languages: A Comparative Handbook (pp. 837–848). Berlin, Boston: De Gruyter Mouton. (https://doi.org/10.1515/9783111053226-070)
- Winata, Genta, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, and Sebastian Ruder (2023) NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 815–834. Dubrovnik: Association for Computational Linguistics.
Abstract
Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Despite being the second most linguistically diverse country, most languages in Indonesia are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes datasets, a multi-task benchmark, and lexicons, as well as a parallel Indonesian-English dataset. We provide extensive analyses and describe the challenges when creating such resources. We hope that our work can spark NLP research on Indonesian and other underrepresented languages.
- Cahyawijaya, Samuel, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti (2022) NusaCrowd: Open Source Initiative for Indonesian NLP Resources. arXiv preprint arXiv:2212.09648.
Abstract
We present NusaCrowd, a collaborative initiative to collect and unite existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their effectiveness has been demonstrated in multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and its local languages. Our work is intended to help advance natural language processing research in under-represented languages.
- Kratochvíl, František, David Moeljadi, Benidiktus Delpada, Václav Kratochvíl, and Jiří Vomlel (2021) Aspectual pairing and aspectual classes in Abui. STUF-Language Typology and Universals, 74(3-4), 621-657.
Abstract
This paper describes the aspectual classes in Abui, a Papuan language of the Timor-Alor-Pantar family. Abui innovated a system of aspectual stem pairing, realized by consonant mutation, vowel grading, and rime mutation. Although stem pairing is widespread (about 61% of the verbs alternate), about 38% of our 1,330 verb sample are unpaired and immutable. Abui verbal stems combine with aspectual affixes, adverbs and auxiliary verbs, whose distribution is used here together with the stem types to describe aspectual classes, which are understood as lexicalizations of transitional possibilities of lexical items (e.g. inchoative-stative vs. inchoative-gradual.inchoative-stative). The paper takes the bidimensional approach to aspect distinguishing between properties associated with the perfective-imperfective system and other aspectual marking (cf. Sasse, Hans-Jürgen. 2002. Recent activity in the theory of aspect: accomplishments, achievements, or just non-progressive state? Linguistic Typology 6(2). 199–271). Combining the features of both types of aspectual marking, we construct in a bottom-up fashion the aspectual classes in Abui and also show that these may be further refined if contextual features such as valency or degree of change (affectedness) were included. A characteristic feature of the Abui system is the elaborate system of stative-inchoative verbs sensitive to scalar and change properties (e.g. instant vs. gradual). Abui telic verbs show sensitivity to the properties of the resulting state and are formally associated with stem alternation.
- Moeljadi, David (2021) Providing Etymological Information for Sinitic Loanwords in The KBBI Indonesian Dictionary. In Amalia, Dora et al. (Eds.), Proceedings of the 14th International Conference of the Asian Association for Lexicography, 134–141. National Agency for Language Development and Cultivation, Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia.
Abstract
This paper documents the process of adding etymological information on loanwords from Sinitic languages in Indonesian to the fifth edition of the KBBI Indonesian dictionary, the most comprehensive and authoritative Indonesian monolingual dictionary, published by The Language Development and Cultivation Agency, under the Ministry of Education and Culture. It is a part of the etymology project which involves experts from universities in Indonesia (Moeljadi et al. 2019). Data on Sinitic loanwords from various sources such as Schlegel (1891), Hamilton (1924), Png (1967), Leo (1976), Kong (1994), and Jones (2009) were compiled. Data selection is based on the dictionary headwords: words which are listed in the KBBI dictionary were chosen and further analyzed. Finally, a database of Sinitic loanwords for the KBBI dictionary was built. Historically, there are four major periods associated with external cultural and linguistic influence in the Indonesian archipelago: (1) Indian, (2) Chinese, (3) Islamic, and (4) European (Blust 2009). As of February 2021, the KBBI dictionary has etymological information on loanwords from Semitic languages (especially Arabic) and Indic languages (especially Sanskrit). Since languages in the southern part of China were the early donor languages, it is worth adding etymological information on loanwords from those languages to the KBBI dictionary. The earliest instance of a Sinitic loanword is tahu ‘bean curd’, which is attested in an Old Javanese inscription from the tenth century (Jones 2009). Some words for tools, such as gunting ‘scissors’, which is also found in Old Javanese texts, might have been borrowed from a southern Chinese language (Blust 2009). In the early Ming period after 1368, various Sinitic loanwords were borrowed through trade, such as opau ‘money belt, small wallet’ and honcoe ‘smoking pipe’ (Blust 2009). I found that there are more than 350 Sinitic loanwords in the KBBI dictionary.
Regarding semantic domains, many of them are related to food, tradition and customs, and commerce; the rest are related to tools, clothes, kinship terms, martial arts, opium, prostitution, medicine, etc. Regarding donor languages, most of them are from Hokkien; others are from Cantonese, Hakka, and Mandarin.
- Moeljadi, David and Zakariya Pamuji Aminullah (2020) Building the Old Javanese Wordnet. In Nicoletta Calzolari et al. (Eds.), Proceedings of The 12th Language Resources and Evaluation Conference, Marseille, France, 2933–2939. European Language Resources Association.
Abstract
This paper discusses the construction and the ongoing development of the Old Javanese Wordnet. The words were extracted from the digitized version of the Old Javanese–English Dictionary (Zoetmulder, 1982). The wordnet is built using the ‘expansion’ approach (Vossen, 1998), leveraging the Princeton Wordnet’s core synsets and semantic hierarchy, as well as scientific names. The main goal of our project was to produce a high quality, human-curated resource. As of December 2019, the Old Javanese Wordnet contains 2,054 concepts or synsets and 5,911 senses. It is released under a Creative Commons Attribution 4.0 International License (CC BY 4.0). We are still developing it and adding more synsets and senses. We believe that the lexical data made available by this wordnet will be useful for a variety of future uses such as the development of Modern Javanese Wordnet and many language processing tasks and linguistic research on Javanese.
- Nomoto, Hiroki and David Moeljadi (2019) ‘Linguistic studies using large annotated corpora: Introduction’. In Hiroki Nomoto and David Moeljadi, eds. Linguistic studies using large annotated corpora. NUSA 67: 1–6. [Permanent URL: http://repository.tufs.ac.jp/handle/10108/94450] [doi: 10.15026/94450]
- Moeljadi, David, Aditya Kurniawan, and Debaditya Goswami (2019) Building Cendana: a Treebank for Informal Indonesian. In Ryo Otoguro, Mamoru Komachi, and Tomoko Ohkuma (Eds.), Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, Future University Hakodate, 156–164. Tokyo: Waseda Institute for the Study of Language and Information.
Abstract
This paper introduces Cendana, a treebank for informal Indonesian. The corpus is from a subset of online chat data between customer service staff and customers at Traveloka (traveloka.com), an online travel agency (OTA) from Indonesia that provides airline ticketing and hotel booking services. Lines of conversation text are parsed using the Indonesian Resource Grammar (INDRA) (Moeljadi et al., 2015), a computational grammar for Indonesian in the Head-Driven Phrase Structure Grammar (HPSG) framework (Pollard and Sag, 1994; Sag et al., 2003) and Minimal Recursion Semantics (MRS) (Copestake et al., 2005). The annotation was done using Full Forest TreeBanker (FFTB) (Packard, 2015). Our purpose is to create a treebank, as well as to develop INDRA for informal Indonesian. Testing on 2,000 lexically dense sentences, the coverage is 64.1% and 715 items or 35.8% were treebanked, with correct syntactic parses and semantics. INDRA has been developed by adding 6,741 new lexical items and 22 new rules, especially the ones for informal Indonesian. The treebank data was employed to build a Feature Forest-based Maximum Entropy Model Trainer. Testing against the annotated data, the precision was around 90%. Moreover, we leveraged the treebank data to develop a POS tagger and present benchmark results evaluating the same.
- Moeljadi, David, Ian Kamajaya, and Azhari Dasman Darnis (2019) Considerations for Providing Etymological Information in the KBBI Indonesian Dictionary. In Mehmet Gürlek, Ahmet Naim Çiçekler, and Yasin Taşdemir (Eds.), Proceedings of the 13th International Conference of the Asian Association for Lexicography, Istanbul University, 161–178. The Asian Association for Lexicography. Istanbul: Asos Publisher.
Abstract
We discuss the inclusion of etymological information in the Indonesian dictionary KBBI (Kamus Besar Bahasa Indonesia) fifth edition. KBBI is the most comprehensive and authoritative Indonesian monolingual dictionary, published/launched by The Language Development and Cultivation Agency, under the Ministry of Education and Culture (Moeljadi et al. 2017, Kamajaya et al. 2017). It is mainly online-based (https://kbbi.kemdikbud.go.id), updated regularly, and will be enriched by etymological information from October 2019. This etymological information is valuable for the Indonesian language that has loanwords from various languages and language families: Austronesian (Old Javanese), Indo-European (Sanskrit, Persian, Portuguese, Dutch, English), Dravidian (Tamil), Semitic (Arabic), and Sinitic (Hokkien, Cantonese, Mandarin). The first etymology project began in the late 2000s, after the fourth edition was published. A team was formed based on the grouping of donor languages: (1) Arabic and Semitic languages, including Persian; (2) European languages (Dutch, English, and French); (3) Old Javanese; and (4) Chinese languages. Unfortunately, this project did not meet the target. Starting from the second etymology project, since 2016 each year we focus on two groups and involve experts from universities in Indonesia. We refer to previous work and references e.g. Jones et al. (2007). Etymology projects on Sanskrit and Old Javanese were carried out (2016~2017), after that Dutch (2017~) and Arabic (2018~). Data collection is based on the dictionary headwords. The technical part involves programming and database restructuration. The existing KBBI database will be augmented with etymological information-related tables containing the original scripts of the loanwords and the relationships within them as well as between them and the entries, enabling KBBI to present etymological relations of the loanwords accurately. 
We believe that the etymological information in KBBI will serve as a valuable resource and accompaniment to Indonesian historical, linguistic, and lexicographic research.
- Moeljadi, David and Viola Ow (2018) Serial Verb Constructions in Indonesian: An HPSG Analysis and Its Computational Implementation. Journal of the Southeast Asian Linguistics Society (JSEALS) Special Publication, (2), 90–101.
Abstract
This paper discusses syntactic and semantic properties of Serial Verb Constructions (SVCs) in standard Indonesian. Analyses of Indonesian SVCs can be found in Englebretson (2003), as well as in reference grammars such as Sneddon et al. (2010). A syntactic analysis in Lexical Functional Grammar (LFG) (Kaplan and Bresnan 1982; Dalrymple 2001) was done by Arka (2000). However, no work has been done on modeling Indonesian SVCs within the Head Driven Phrase Structure Grammar (HPSG) framework (Pollard and Sag 1994) and Minimal Recursion Semantics (MRS) (Copestake et al. 2005). This paper aims to fill this gap. As for our data source, we employ the Indonesian section of the Nanyang Technological University — Multilingual Corpus (NTU-MC) (Tan and Bond 2012). We wrote a Python script to extract the Indonesian SVCs. Our HPSG analysis is implemented and tested in the Indonesian Resource Grammar (INDRA), a computational grammar for Indonesian (Moeljadi et al. 2015).
- Moeljadi, David and Francis Bond (2018) HPSG Analysis and Computational Implementation of Indonesian Passives. In Müller, Stefan and Frank Richter (Eds.), Proceedings of The 25th International Conference on Head-Driven Phrase Structure Grammar, University of Tokyo, 129–139. Stanford, CA: CSLI Publications.
Abstract
This study aims to analyze and develop a detailed model of syntax and semantics of passive sentences in standard Indonesian in the framework of Head-Driven Phrase Structure Grammar (HPSG) (Pollard & Sag, 1994; Sag et al., 2003) and Minimal Recursion Semantics (MRS) (Copestake et al., 2005), explicit enough to be interpreted by a computer, focusing on implementation rather than theory. There are two main types of passive in Indonesian, following Sneddon et al. (2010, pp. 256-260) and Alwi et al. (2014, pp. 352-356), called ‘passive type 1’ (P1) and ‘passive type 2’ (P2). Both types were analyzed and implemented in the Indonesian Resource Grammar (INDRA), a computational grammar for Indonesian (Moeljadi et al., 2015).
- Nomoto, Hiroki, Hannah Choi, David Moeljadi, and Francis Bond (2018) MALINDO Morph: Morphological dictionary and analyser for Malay/Indonesian. In Kiyoaki Shirai (ed.) Proceedings of the LREC 2018 Workshop "The 13th Workshop on Asian Language Resources", pp 36–43. Miyazaki.
Abstract
Malay/Indonesian lacked an open wide-coverage dictionary that can be used for both NLP tasks and non-NLP purposes. The MALINDO Morph morphological dictionary is the first such dictionary. It provides morphological information (root, prefix, suffix, circumfix, reduplication) for roughly 232K surface forms. The entry forms are those found in the authoritative dictionaries in Malaysia (Kamus Dewan) and Indonesia (Kamus Besar Bahasa Indonesia) (core dictionary) as well as frequent words in the Leipzig Corpora Collection (Goldhahn et al., 2012) (expanded dictionary). The morphological analyses were checked by hand for all surface forms, except for (i) basic and di- forms in the expanded dictionary whose existence is predicted from the corresponding meN- active forms in the core dictionary and (ii) the case variants of the items in the core dictionary. This paper also discusses the morphological analyser that we developed to create our morphological dictionary. Our morphological analyser is more linguistically rigorous than previous morphological analysers and stemmers/lemmatizers such as MorphInd (Larasati et al., 2011) because it takes into account circumfixes, which have previously been neglected, largely due to a misunderstanding among NLP researchers that circumfixes are no more than combinations of a prefix and a suffix.
- Nomoto, Hiroki, Kenji Okano, David Moeljadi, and Hideo Sawada (2018) TUFS Asian Language Parallel Corpus (TALPCo). In Proceedings of the Twenty-fourth Annual Meeting of the Association for Natural Language Processing, Kyoto, pp 436–439.
Abstract
The TUFS Asian Language Parallel Corpus (TALPCo) is an open parallel corpus consisting of Japanese sentences and their translations into Burmese, Malay, Indonesian and English. This paper describes how we built it and its notable features, especially those pertaining to the choice of Japanese as the source language of translation.
- Kamajaya, Ian, David Moeljadi, and Dora Amalia (2017) KBBI Daring: A Revolution in The Indonesian Lexicography. In Electronic lexicography in the 21st century. Proceedings of eLex 2017 conference, Leiden, pp 513–530.
Abstract
Kamus Besar Bahasa Indonesia (KBBI) is the official dictionary of the Indonesian language, published by Badan Pengembangan dan Pembinaan Bahasa (The Language Development and Cultivation Agency) or Badan Bahasa, under the Ministry of Education and Culture, Republic of Indonesia. The current, fifth edition of KBBI (Amalia, 2016) was launched on 28 October 2016 and contains more than 100,000 entries and 120,000 senses. It is available in three formats: printed, online, and offline mobile applications. The online version, called KBBI Dalam Jaringan or KBBI Daring (kbbi.kemdikbud.go.id), is categorized as Dictionary Writing System (DWS) (Atkins & Rundell, 2008). Through it, we invite online public participation to make proposals to add and to edit entries, senses, and examples. We are changing our workflow from manual to computerized work which has greatly reduced the time needed to make a dictionary. KBBI Daring greatly expands the database which was previously made for the fourth edition of KBBI (Sugono, 2008) using the data in Microsoft Excel and Word files (Moeljadi et al., 2017), fitting to its online usage. This paper describes our efforts in building the KBBI Daring which has revolutionized both the way people use a dictionary and the lexicographical workflow of the editorial staff in Indonesia.
- Moeljadi, David (2017) Building JATI: A Treebank for Indonesian. In Proceedings of The 4th Atma Jaya Conference on Corpus Studies (ConCorps 4), Jakarta, pp 1–9.
Abstract
This paper introduces and describes the ongoing construction of a new lexical resource for Indonesian: the JATI treebank. It is being built from a subset of parsed dictionary definition sentences. The main data for this study comes from the fifth edition of Kamus Besar Bahasa Indonesia (KBBI) (Amalia, 2016), the official and the most comprehensive dictionary for the Indonesian language. The dictionary definition sentences are parsed using the Indonesian Resource Grammar (INDRA) (Moeljadi, Bond, and Song, 2015), a computational grammar for Indonesian in the Head-Driven Phrase Structure Grammar (HPSG) framework (Sag, Wasow, and Bender, 2003). JATI will be employed to build an ontology, in which knowledge is extracted from the semantic representation in Minimal Recursion Semantics (MRS) (Copestake et al., 2005).
- Moeljadi, David, Ian Kamajaya, and Dora Amalia (2017) Building the Kamus Besar Bahasa Indonesia (KBBI) Database and Its Applications. In Proceedings of The 11th International Conference of the Asian Association for Lexicography, Guangzhou, pp 64–80.
Abstract
The official dictionary of the Indonesian language, Kamus Besar Bahasa Indonesia (KBBI), is published by Badan Pengembangan dan Pembinaan Bahasa (The Language Development and Cultivation Agency) or Badan Bahasa, under the Ministry of Education and Culture of the Republic of Indonesia. The fourth edition of KBBI (Sugono 2008) has more than 92,000 entries and 100,000 senses and contains a wealth of linguistic information and cultural diversity of Indonesia. However, the data was available only in Microsoft Excel and Word files in exactly the same format as the one in the printed dictionary. Its online edition was only meant for basic word search by entry words. Thus, in order to create an online dictionary application which has advanced search capabilities, building a database is very vital: the data structure needs to be identified and the data itself needs to be cleaned so that it can be broken down based on its components. Atkins and Rundell (2008: 114) state that a database is one of the three main components of Dictionary Writing System (DWS). This paper describes our efforts in building the KBBI database in SQLite (www.sqlite.org) using Python programming language (www.python.org) and presents some applications for lexicographic and linguistic research and analysis. The KBBI database is employed for the online DWS application called KBBI Dalam Jaringan or KBBI Daring (https://kbbi.kemdikbud.go.id) (Kamajaya et al. 2017), the offline KBBI mobile applications in Android and iOS, and the printing of the latest, fifth edition of KBBI (Amalia 2016).
- Le, Tuan Anh, David Moeljadi, Yasuhide Miura, and Tomoko Ohkuma (2016) Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets. In Proceedings of The 12th Workshop on Asian Language Resources, Osaka, pp 123–131.
Abstract
This paper describes our attempt to build a sentiment analysis system for Indonesian tweets. With this system, we can study and identify sentiments and opinions in a text or document computationally. We used four thousand manually labeled tweets collected in February and March 2016 to build the model. Because of the variety of content in tweets, we classify tweets into eight groups in total, including pos(itive), neg(ative), and neu(tral). Finally, we obtained 73.2% accuracy with Long Short Term Memory (LSTM) without a normalizer.
- Moeljadi, David, Francis Bond, and Luís Morgado da Costa (2016) Basic copula clauses in Indonesian. In Proceedings of the Joint 2016 Conference on Head-driven Phrase Structure Grammar and Lexical Functional Grammar, Warsaw, pp 442–456.
Abstract
We want to show how basic copula clauses in Indonesian can be dealt with within the framework of Head Driven Phrase Structure Grammar (HPSG) (Pollard & Sag, 1994). We analyzed three types of basic copula clauses in Indonesian: copula clauses with noun phrase complements (NP) expressing the notions of 'proper inclusion' and 'equation', adjective phrases (AP) expressing 'attribution', and prepositional phrases (PP) expressing relationships such as 'location'. Our analysis is implemented in the Indonesian Resource Grammar (INDRA), a computational grammar for Indonesian (Moeljadi et al., 2015).
- Moeljadi, David and Francis Bond (2016) Identifying and Exploiting Definitions in Wordnet Bahasa. In Proceedings of the Eighth Global WordNet Conference, Bucharest, pp 226–232.
Abstract
This paper describes our attempts to add Indonesian definitions to synsets in the Wordnet Bahasa (Nurril Hirfana Mohamed Noor et al., 2011; Bond et al., 2014), to extract semantic relations between lemmas and definitions for nouns and verbs, such as synonym, hyponym, hypernym and instance hypernym, and to generally improve Wordnet. The original, somewhat noisy, definitions for Indonesian came from the Asian Wordnet project (Riza et al., 2010). The basic method of extracting the relations is based on Bond et al. (2004). Before the relations can be extracted, the definitions were cleaned up and tokenized. We found that the definitions cannot be completely cleaned up because of many misspellings and bad translations. However, we could identify four semantic relations in 57.10% of noun and verb definitions. For the remaining 42.90%, we propose to add 149 new Indonesian lemmas and make some improvements to Wordnet Bahasa and Wordnet in general.
- Moeljadi, David, Francis Bond, and Sanghoun Song (2015) Building an HPSG-based Indonesian Resource Grammar (INDRA). In Proceedings of the Grammar Engineering Across Frameworks (GEAF) Workshop, 53rd Annual Meeting of the ACL and 7th IJCNLP, pp. 9–16, Beijing, China, July 26-31, 2015.
Abstract
This paper presents the creation and the initial stage development of a broad-coverage Indonesian Resource Grammar (INDRA) within the framework of Head Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994) and Minimal Recursion Semantics (MRS) (Copestake et al., 2005). At the present stage, INDRA focuses on verbal constructions and subcategorization since they are fundamental for argument and event structure. Verbs in INDRA were semi-automatically acquired from the English Resource Grammar (ERG) (Flickinger, 2000) via Wordnet Bahasa (Nurril Hirfana Mohamed Noor et al., 2011; Bond et al., 2014). In the future, INDRA will be used in the development process of machine translation. A preliminary evaluation of INDRA on the MRS test-suite shows promising coverage.
- Moeljadi, David (2014) Usage of Indonesian Possessive Verbal Predicates: A Statistical Analysis Based on Storytelling Survey. Tokyo University Linguistic Papers 35: 155–176 (URI: http://repository.dl.itc.u-tokyo.ac.jp/dspace/handle/2261/56385).
Abstract
This paper deals with possessive verbal predicates in Indonesian, both the present high variety which is originally based on Riau Malay, and the present low variety, which is called 'Colloquial Jakartan Indonesian' in Sneddon (2006). The eight predicates in Moeljadi (2010) are chosen as the object of discussion: three possessive verb predicates, memiliki, mempunyai, and punya; two existential verb predicates, ada and ada ...=nya; and three predicates with the denominal affixes: ber-, ber-...-kan, and -an. This paper tries to answer whether the frequency of occurrence of each possessive verbal predicate differs according to whether it appears in the high or low variety. A storytelling survey was conducted in Malang and Tokyo in 2011, in order to determine the speakers' choice of predicates in both varieties, based on the assumption that the speakers choose different possessive verbal predicates for different varieties. A statistical index, the correlation coefficient, is employed to investigate the statistical relationship between the low variety's tokens and the predicate tokens. The main result is that memiliki is primarily used in the high variety, while punya is very frequently used in the low variety.
- Nagaya, Naonori and David Moeljadi (2013) Five levels in Indonesian. In: Tasaku Tsunoda (ed.) Five levels in clause linkage (Vol. 1), pp. 281–303. Joso City, Ibaraki Prefecture, Japan: Matsueda Publishers.
Abstract
This paper examines clause linkage patterns in Indonesian, an Austronesian language of Indonesia, from a perspective of a five-level classification of clause linkage advanced by M. Tsunoda (2004, 2012, this volume). Focusing on the standard variety of Indonesian, we point out two findings about subordinate structures in this language. First, clause linkage markers in Indonesian cover a wide range of semantico-pragmatic relations: almost all markers can be used for expressing clause linkage patterns of any level. Second, the more morphologically marked a clause linkage marker is, the more specific its meaning is. The paper is organized as follows. Section 2 offers preliminary information on Indonesian, ranging from historical and sociolinguistic backgrounds to the structure of subordinate constructions. In Section 3, subordination structures are investigated in depth. In Section 4, we present the summary of our survey and discuss our findings. Lastly, this paper is concluded in Section 5.
- Moeljadi, David (2011) Possessive verbal predicate constructions in Indonesian. Tokyo University Linguistic Papers 31: 117–133 (URI: http://repository.dl.itc.u-tokyo.ac.jp/dspace/handle/2261/52732).
Abstract
This paper deals with verbal predicate constructions used to express 'possession' in Indonesian (both 'formal Indonesian' and 'Colloquial Jakartan Indonesian'). In Moeljadi (2010), I stated that there are eight possessive verbal predicate constructions in Indonesian, i.e. X memiliki Y, X mempunyai Y, X punya Y, X ada Y, X ada Y=nya, X ber-Y, X ber-Y-kan Z, and X Y-an (X represents 'possessor', Y represents 'possessee' or 'possessum', and Z represents a complement). The analysis of how Indonesian encodes one 'possession' concept in more than one construction, as shown above, has mainly been based on my intuition as a native speaker of Indonesian. The conclusion is that the 'register' and the '(in)alienability' notion play important roles in the encoding process. I previously analyzed this based on intuition in Moeljadi (2010), but this time I conducted interviews in 2010 and 2011 in order to make an objective analysis. The data I got from those interviews were then analyzed using cluster analysis. I conclude that (i) only five constructions, i.e. X memiliki Y, X mempunyai Y, X punya Y, X ada Y, X ber-Y, can be regarded as encoding the meaning of 'possession', (ii) one construction, i.e. X ber-Y, has a special characteristic and takes a different kind of possessee, and (iii) whether the possessor is singular, plural, the first, second, or third personal pronoun, the acceptability of the constructions does not change.