Wednesday, January 29, 2020

Corpus Analysis and Linguistic Theory by Charles F. Meyer (2004)

When the first computer corpus, the Brown Corpus, was being created in the early 1960s, generative grammar dominated linguistics, and there was little tolerance for approaches to linguistic study that did not adhere to what generative grammarians deemed acceptable linguistic practice. As a consequence, even though the creators of the Brown Corpus, W. Nelson Francis and Henry Kučera, are now regarded as pioneers and visionaries in the corpus linguistics community, in the 1960s their efforts to create a machine-readable corpus of English were not warmly accepted by many members of the linguistic community.

W. Nelson Francis (1992: 28) tells the story of a leading generative grammarian of the time characterizing the creation of the Brown Corpus as “a useless and foolhardy enterprise” because “the only legitimate source of grammatical knowledge” about a language was the intuitions of the native speaker, which could not be obtained from a corpus. Although some linguists still hold to this belief, linguists of all persuasions are now far more open to the idea of using linguistic corpora for both descriptive and theoretical studies of language. 

Moreover, the division and divisiveness that has characterized the relationship between the corpus linguist and the generative grammarian rests on a false assumption: that all corpus linguists are descriptivists, interested only in counting and categorizing constructions occurring in a corpus, and that all generative grammarians are theoreticians unconcerned with the data on which their theories are based. Many corpus linguists are actively engaged in issues of language theory, and many generative grammarians have shown an increasing concern for the data upon which their theories are based, even though data collection remains at best a marginal concern in modern generative theory.


To explain why corpus linguistics and generative grammar have had such an uneasy relationship, and to explore the role of corpus analysis in linguistic theory, this chapter first discusses the goals of generative grammar and the three types of adequacy (observational, descriptive, and explanatory) that Chomsky claims linguistic descriptions can meet. Investigating these three types of adequacy reveals the source of the conflict between the generative grammarian and the corpus linguist: while the generative grammarian strives for explanatory adequacy (the highest level of adequacy, according to Chomsky), the corpus linguist aims for descriptive adequacy (a lower level of adequacy), and it is arguable whether explanatory adequacy is even achievable through corpus analysis. 

However, even though generative grammarians and corpus linguists have different goals, it is wrong to assume that the analysis of corpora has nothing to contribute to linguistic theory: corpora can be invaluable resources for testing out linguistic hypotheses based on more functionally based theories of grammar, i.e. theories of language more interested in exploring language as a tool of communication. And the diversity of text types in modern corpora makes such investigations quite possible, a point illustrated in the middle section of the chapter, where a functional analysis of coordination ellipsis is presented that is based on various genres of the Brown Corpus and the International Corpus of English. Although corpora are ideal for functionally based analyses of language, they have other uses as well, and the final section of the chapter provides a general survey of the types of linguistic analyses that corpora can help the linguist conduct and the corpora available to carry out these analyses. (pp. 1–2)

REFERENCE
Charles F. Meyer. (2004). English Corpus Linguistics: An Introduction. New York: Cambridge University Press.

Linguistic Theory and Description by Charles F. Meyer (2004)

Chomsky has stated in a number of sources that there are three levels of “adequacy” upon which grammatical descriptions and linguistic theories can be evaluated: observational adequacy, descriptive adequacy, and explanatory adequacy.

If a theory or description achieves observational adequacy, it is able to describe which sentences in a language are grammatically well formed. Such a description would note that in English while a sentence such as He studied for the exam is grammatical, a sentence such as *studied for the exam is not. To achieve descriptive adequacy (a higher level of adequacy), the description or theory must not only describe whether individual sentences are well formed but in addition specify the abstract grammatical properties making the sentences well formed. Applied to the previous sentences, a description at this level would note that sentences in English require an explicit subject. Hence, *studied for the exam is ungrammatical and He studied for the exam is grammatical. The highest level of adequacy is explanatory adequacy, which is achieved when the description or theory not only reaches descriptive adequacy but does so using abstract principles which can be applied beyond the language being considered and become a part of “Universal Grammar.” At this level of adequacy, one would describe the inability of English to omit subject pronouns as a consequence of the fact that, unlike Spanish or Japanese, English is not a language which permits “pro-drop,” i.e. the omission of a subject pronoun that is recoverable from the context or deducible from inflections on the verb marking the case, gender, or number of the subject.

Within Chomsky’s theory of principles and parameters, pro-drop is a consequence of the “null-subject parameter” (Haegeman 1991: 17–20). This parameter is one of many which make up universal grammar, and as speakers acquire a language, the manner in which they set the parameters of universal grammar is determined by the norms of the language they are acquiring. Speakers acquiring English would set the null-subject parameter to negative, since English does not permit pro-drop; speakers of Italian, on the other hand, would set the parameter to positive, since Italian permits pro-drop (Haegeman 1991: 18). 
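The parameter-setting idea described above can be pictured with a deliberately tiny sketch. It assumes a toy model in which a grammar is reduced to a single binary switch; the names `Grammar`, `null_subject`, and `is_well_formed` are hypothetical and come neither from Meyer nor from Haegeman. It illustrates the logic of a binary parameter only and makes no claim about how principles-and-parameters theory is actually formalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Toy grammar reduced to one binary parameter (illustrative assumption)."""
    name: str
    null_subject: bool  # True = the language permits pro-drop


def is_well_formed(grammar: Grammar, has_overt_subject: bool) -> bool:
    """Accept a finite clause if it has an overt subject,
    or if the grammar's null-subject parameter is set to positive."""
    return has_overt_subject or grammar.null_subject


# Parameter settings as described in the text: English negative, Italian positive.
english = Grammar("English", null_subject=False)
italian = Grammar("Italian", null_subject=True)

# "He studied for the exam" versus "*studied for the exam"
print(is_well_formed(english, has_overt_subject=True))
print(is_well_formed(english, has_overt_subject=False))
print(is_well_formed(italian, has_overt_subject=False))
```

Under these assumptions the subjectless clause is rejected for the English setting and accepted for the Italian one, which is all the sketch is meant to convey.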

Because generative grammar has placed so much emphasis on universal grammar, explanatory adequacy has always been a high priority in generative grammar, often at the expense of descriptive adequacy: there has never been much emphasis in generative grammar in ensuring that the data upon which analyses are based are representative of the language being discussed, and with the notion of the ideal speaker/hearer firmly entrenched in generative grammar, there has been little concern for variation in a language, which traditionally has been given no consideration in the construction of generative theories of language. This trend has become especially evident in the most recent theory of generative grammar: minimalist theory.

In minimalist theory, a distinction is made between those elements of a language that are part of the “core” and those that are part of the “periphery.” The core is comprised of “pure instantiations of UG” and the periphery “marked exceptions” that are a consequence of “historical accident, dialect mixture, personal idiosyncracies, and the like” (Chomsky 1995: 19–20). Because “variation is limited to nonsubstantive elements of the lexicon and general properties of lexical items” (Chomsky 1995: 170), those elements belonging to the periphery of a language are not considered in minimalist theory; only those elements that are part of the core are deemed relevant for purposes of theory construction. This idealized view of language is taken because the goal of minimalist theory is “a theory of the initial state,” that is, a theory of what humans know about language “in advance of experience” (Chomsky 1995: 4) before they encounter the real world of the language they are acquiring and the complexity of structure that it will undoubtedly exhibit.

This complexity of structure, however, is precisely what the corpus linguist is interested in studying. Unlike generative grammarians, corpus linguists see complexity and variation as inherent in language, and in their discussions of language, they place a very high priority on descriptive adequacy, not explanatory adequacy. Consequently, corpus linguists are very skeptical of the highly abstract and decontextualized discussions of language promoted by generative grammarians, largely because such discussions are too far removed from actual language usage. Chafe (1994: 21) sums up the disillusionment that corpus linguists have with purely formalist approaches to language study, noting that they “exclude observations rather than . . . embrace ever more of them” and that they rely too heavily on “notational devices designed to account for only those aspects of reality that fall within their purview, ignoring the remaining richness which also cries out for understanding.” The corpus linguist embraces complexity; the generative grammarian pushes it aside, seeking an ever more restrictive view of language.

Because the generative grammarian and corpus linguist have such very different views of what constitutes an adequate linguistic description, it is clear why these two groups of linguists have had such a difficult time communicating and valuing each other’s work. As Fillmore (1992: 35) jokes, when the corpus linguist asks the theoretician (or “armchair linguist”) “Why should I think that what you tell me is true?”, the generative grammarian replies back “Why should I think that what you tell me is interesting?” (emphasis added). Of primary concern to the corpus linguist is an accurate description of language; of importance to the generative grammarian is a theoretical discussion of language that advances our knowledge of universal grammar.

Even though the corpus linguist places a high priority on descriptive adequacy, it is a mistake to assume that the analysis of corpora has nothing to offer to generative theory in particular or to theorizing about language in general. The main argument against the use of corpora in generative grammar, Leech (1992) observes, is that the information they yield is biased more towards performance than competence and is overly descriptive rather than theoretical. However, Leech (1992: 108) argues that this characterization is overstated: the distinction between competence and performance is not as great as is often claimed, “since the latter is the product of the former.” Consequently, what one discovers in a corpus can be used as the basis for whatever theoretical issue one is exploring. In addition, all of the criteria applied to scientific endeavors can be satisfied in a corpus study, since corpora are excellent sources for verifying the falsifiability, completeness, simplicity, strength, and objectivity of any linguistic hypothesis (Leech 1992: 112–13).

Despite Leech’s claims, it is unlikely that corpora will ever be used very widely by generative grammarians, even though some generative discussions of language have been based on corpora and have demonstrated their potential for advancing generative theory. Working within the framework of government and binding theory (the theory of generative grammar preceding minimalist theory), Aarts (1992) used sections of the corpus housed at the Survey of English Usage at University College London to analyze “small clauses” in English, constructions like her happy in the sentence I wanted her happy that can be expanded into a clausal unit (She is happy). By using the London Corpus, Aarts (1992) was not only able to provide a complete description of small clauses in English but to resolve certain controversies regarding small clauses, such as establishing the fact that they are independent syntactic units rather than simply two phrases, the first functioning as direct object and the second as complement of the object.

Haegeman (1987) employed government and binding theory to analyze empty categories (i.e. positions in a clause where some element is missing) in a specific genre of English: recipe language. While Haegeman’s investigation is not based on data from any currently available corpus, her analysis uses the type of data quite commonly found in corpora. Haegeman (1987) makes the very interesting claim that parametric variation (such as whether or not a language exhibits pro-drop) does not simply distinguish individual languages from one another but can be used to characterize regional, social, or register variation within a particular language. She looks specifically at examples from the genre (or register) of recipe language that contain missing objects (marked by the letters [a], [b], etc. in the example below):
(1) Skin and bone chicken, and cut [a] into thin slices. Place [b] in bowl with mushrooms. Purée remaining ingredients in blender, and pour [c] over chicken and mushrooms. Combine [d] and chill [e] well before serving. (Haegeman 1987: 236–7)
Government and binding theory, Haegeman (1987: 238) observes, recognizes four types of empty categories, and after analyzing a variety of different examples of recipe language, Haegeman concludes that this genre contains one type of empty category, wh-traces, not found in the core grammar of English (i.e. in other genres or regional and social varieties of English).

What distinguishes Haegeman’s (1987) study from most other work in generative grammar is that she demonstrates that theoretical insights into universal grammar can be obtained by investigating the periphery of a language as well as the core. And since many corpora contain samples of various genres within a language, they are very well suited to the type of analysis that Haegeman (1987) has conducted. Unfortunately, given the emphasis in generative grammar on investigations of the core of a language (especially as reflected in Chomsky’s recent work in minimalism), corpora will probably never have much of a role in generative grammar. For this reason, corpora are much better suited to functional analyses of language: analyses that are focused not simply on providing a formal description of language but on describing the use of language as a communicative tool. (pp. 2–5)

REFERENCE   
Charles F. Meyer. (2004). English Corpus Linguistics: An Introduction. New York: Cambridge University Press.
   

Wednesday, January 22, 2020

Developmental Linguistics in Language

Readers familiar with small children will know that they generally produce their first recognizable word (e.g. Dada or Mama) round about their first birthday; from then until the age of about one year, six months, children’s speech consists largely of single words spoken in isolation (e.g. a child wanting an apple will typically say ‘Apple’). At this point, children start to form elementary phrases and sentences, so that a child wanting an apple at this stage might say “Want apple”. From then on, we see a rapid growth in children’s grammatical development, so that by the age of two years, six months, most children are able to produce adult-like sentences such as “Can I have an apple?”

From this rough characterization of development, a number of tasks emerge for the developmental linguist. Firstly, it is necessary to describe the child’s development in terms of a sequence of grammars. After all, we know that children become adults, and we are supposing that, as adults, they are native speakers who have access to a mentally represented grammar. The natural assumption is that they move towards this grammar through a sequence of “incomplete” or “immature” grammars. Secondly, it is important to try to explain how it is that after a period of a year and a half in which there is no obvious sign of children being able to form sentences, between one-and-a-half and two-and-a-half years of age there is a ‘spurt’ as children start to form more and more complex sentences, and a phenomenal growth in children’s grammatical development. This uniformity and (once the ‘spurt’ has started) rapidity in the pattern of children’s linguistic development are central facts which a theory of language acquisition must seek to explain. But how?

Chomsky maintains that the most plausible explanation for the uniformity and rapidity of first language acquisition is to posit that the course of acquisition is determined by a biologically endowed innate language faculty (or language acquisition program, to borrow a computer software metaphor) within the human brain. This provides children with a genetically transmitted set of procedures for developing a grammar which enables them to produce and understand sentences in the language they are acquiring on the basis of their linguistic experience (i.e. on the basis of the speech input they receive). 

Children acquiring a language will observe people around them using the language, and the set of expressions in the language which the child hears (and the contexts in which they are used) in the course of acquiring the language constitute the child’s linguistic experience of the language. This experience serves as input to the child’s language faculty, which provides the child with a set of procedures for analyzing the experience in such a way as to devise a grammar of the language being acquired. Chomsky’s hypothesis that the course of language acquisition is determined by an innate language faculty is known popularly as the innateness hypothesis. 

Invocation of an innate language faculty becoming available to the child only at some genetically determined point may constitute a plausible approach to the questions of uniformity and rapidity, but there is an additional observation which suggests that some version of the innateness hypothesis must be correct. This is that the knowledge of a language represented by an adult grammar appears to go beyond anything supplied by the child’s linguistic experience. A simple demonstration of this is provided by the fact that adult native speakers are not only capable of combining words and phrases in acceptable ways but also of recognizing unacceptable combinations. The interesting question this raises is: where does this ability come from? An obvious answer to this question is that the child’s linguistic experience provides information on unacceptable combinations of words and phrases. But this is incorrect. Why do we assert this with such confidence?

Obviously, when people speak, they do make mistakes (although research has shown that language addressed to children is almost completely free of such mistakes). However, when this happens, there is no clear signal to the child indicating that an adult utterance contains a mistake; that is, as far as the child is concerned, an utterance containing a mistake is just another piece of linguistic experience to be treated on a par with error-free utterances. Furthermore, it has been shown that adults’ ‘corrections’ of children’s own speech do not take systematic account of whether children are producing syntactically acceptable or unacceptable combinations of words and phrases; parents do ‘correct’ their children, but when they do this, it is to ensure that children speak truthfully; grammatical correctness is not their target. Overall, there is compelling evidence that children do not receive systematic exposure to information about unacceptable sequences, and it follows that in this respect the child’s linguistic experience is not sufficient to justify the adult grammar. From this poverty of the stimulus argument it follows that something must supplement linguistic experience, and the innate language faculty fulfils this role.

Now, it is important to underline the fact that children have the ability to acquire any natural language, given appropriate experience of the language: for example, a British child born of monolingual English-speaking parents and brought up by monolingual Japanese-speaking parents in a Japanese-speaking community will acquire Japanese as a native language. From this it follows that the contents of the language faculty must not be specific to any one human language: if the language faculty accounts for the uniformity and rapidity of the acquisition of English, it must also account for the uniformity and rapidity of the acquisition of Japanese, Russian, Swahili, etc.; and if the language faculty makes up for the insufficiency of a child’s experience of English in acquiring a grammar of English, it must also make up for the insufficiency of a child’s experience of Japanese in acquiring a grammar of Japanese, for the insufficiency of a child’s experience of Russian in acquiring a grammar of Russian, for the insufficiency of a child’s experience of Swahili in acquiring a grammar of Swahili, etc. This entails, then, that the language faculty must incorporate a set of UG principles (i.e. principles of Universal Grammar) which enable the child to form and interpret sentences in any natural language. Thus, we see an important convergence of the interests of the linguist and the developmental linguist, with the former seeking to formulate UG principles on the basis of the detailed study of the grammars of adult languages and the latter aiming to uncover such principles by examining children’s grammars and the conditions under which they emerge.

In the previous paragraph, we have preceded ‘language’ with the modifier “human”, and genetic transmission suggests that a similar modifier is appropriate for ‘language faculty’. The language faculty is species-specific and the ability to develop a grammar of a language is unique to human beings. This ability distinguishes us from even our nearest primate cousins, the great apes such as chimpanzees and gorillas, and in studying it we are therefore focusing attention on one of the defining characteristics of what it means to be a human being. There have been numerous attempts to teach language to other species, and success in this area would seriously challenge the assertion we have just made. Indeed, it has proved possible to teach chimpanzees a number of signs similar to those employed in the Sign Languages used as native languages by the deaf, and it has been reported that pygmy chimpanzees can understand some words of spoken English, and even follow a number of simple commands. Such research arouses strong emotions, and, of course, we are not in a position to assert that it will never produce dramatic results. At the moment, however, we can maintain that all attempts, however intensive, to teach grammatical knowledge to apes have been spectacular failures when the apes’ accomplishments are set alongside those of a normal three-year-old child. As things stand, the evidence is firmly in favor of the species-specificity of the language faculty. (pp. 6–9)
REFERENCES

Andrew Radford, Martin Atkinson, David Britain, Harald Clahsen and Andrew Spencer. (2009). Linguistics: An Introduction. New York: Cambridge University Press.

Monday, January 13, 2020

Understanding of Phonology

Trevor Harley (2001)

  1. Phonology describes the sound categories each language uses to divide up the space of possible sounds. (p. 28) 
  2. Phonology is the study of phonemes. (p. 29) 
  3. Phonology describes the sound categories each language uses to divide up the space of possible sounds. (p. 46) 
  4. Phonology is the study of sounds and how they relate to languages; phonology describes the sound categories each language uses to divide up the space of possible sounds

Parviz Birjandi & Mohammad Ali Salmani-Nodoushan (2005)

  1. Phonology is the study of all aspects of the sounds and sound system of a language. It includes two major sub-branches: (a) phonetics, and (b) phonemics. (p.6) 
  2. The abstract nature of phonology implies firm boundaries to the segment, and a straightforward conversion from abstract to concrete (phonemic representation to phonetic representation) as phonetics realizes phonology. (p.143) 

     

Joan Bybee (2003)

  1. The goal of phonology as conceived by generative theory is to describe the following phenomena: (i) the relations among similar but physically distinct sounds that are nonetheless taken to be ‘the same’ in some sense (allophonic relations), (ii) the relations among variants of morphemes as they occur in different contexts, (iii) phonological units of various sizes -features, segments, syllables, feet, and so on, and (iv) language-specific and universal properties of these relations and units. (p. 19)   

Linda Shockey (2003)

  1. One could take the stance that phonology deals only with the relationship between sound units in a language (segmental and suprasegmental) and meaning (provided you are referring to lexical rather than indexical meaning). Truly phonological events would then involve exchanges of sound units which made a difference in meaning. (p. 4) 
  2. Phonology could be seen as the study of meaning-changing sound units and their representatives in different environments, regardless of whether they change the meaning, and with no constraints on the relationship between the abstract phoneme and its representatives in speech: anything can change to anything else, as long as the change is regular/predictable, that is, as long as the linkage to the underlying phonemic identity of each item is discoverable. (p. 4) 
  3. Phonology could be seen as the study of meaning-bearing sound units and their representatives in different environments, regardless of whether they change the meaning, with the addition of constraints as to what sorts of substitutions are likely or even possible. (p. 5) 
  4. Phonology is the systematic study of the pronunciation/perception targets and processes used by native speakers of a language in everyday life. (p. 10)


REFERENCES

Joan Bybee. (2003). Phonology and Language Use. New York: Cambridge University Press.

Linda Shockey. (2003). Sound Patterns of Spoken English. United Kingdom: Blackwell Publishing.

Parviz Birjandi & Mohammad Ali Salmani-Nodoushan. (2005). An Introduction to Phonetics. Iran: Zabankadeh Publications.

Trevor Harley. (2001). The Psychology of Language: From Data to Theory. New York: Psychology Press.

What Kind of Linguistics


Some descriptive linguists define basic linguistic documentation as reflecting the very essence of the discipline of linguistics, which they regard as taking precedence over all other areas of linguistic activity: 
The ideal apprenticeship (for a linguist) is to undertake fieldwork on some previously undescribed (or scarcely described) language: recording, transcribing and analyzing texts; observing how people use the language in the daily round; writing a grammar and phonology; compiling a dictionary; and publishing a volume of annotated texts. (Dixon 2001)


There are probably as many different ways of describing the phonology, morphology, and syntax of a language as there are linguists, though different approaches can often be gathered roughly under a variety of different theoretical labels. Linguistic descriptions that are rigidly bound to particular theoretical constructs do not tend successfully to outlast the theories which spawned them. Grammars that are written, for example, according to the strict structuralist formulae that were in vogue up to the early 1960s, or according to the transformational model which became all the rage immediately after this, are often extremely difficult for people to read today. Similarly, descriptions that are expressed exclusively in terms that derive from some of today’s theoretical traditions are often equally unreadable to those who hail from other traditions. 

The most readable (and arguably, therefore, the most valuable) accounts of languages seem to be those which are relatively open to theoretical eclecticism. By this, I mean that a linguistic description should set out to allow the linguistic data to govern the form of the description, rather than requiring a single theoretical model rigidly to dictate the shape of the entire grammar. Some would argue that there is a single theory-neutral model for grammatical description which can be used for any language, as implied by Dixon’s (1997: 128–38) use of the expression Basic Linguistic Theory. While it would perhaps be nice if this were the case, any linguistic description is in reality going to exhibit certain kinds of theoretical biases, many of which may be implicit. The most important consideration in producing a good linguistic description is that the writer’s particular theoretical assumptions should be clearly recognizable to the reader, that any terminological conventions should be clearly explained, and that the grammar should be richly exemplified with natural data so that readers coming from other theoretical persuasions are able to make sense of your discussion. With new technological tools it is possible to create rich contextual data in a media corpus for which your analysis provides a gateway (e.g. Thieberger 2004) and which allows others to confirm and extend your analysis.


Linguistics is a hugely diverse discipline, and field linguists can contribute to our knowledge in a wide variety of ways. While most linguists are primarily interested in matters of phonology, morphology, and syntax, the ways in which language is used ‘in the daily round’ noted by Dixon seldom receive more than passing mention. Only a minority of linguists (or, perhaps better, linguistic anthropologists) have shown themselves to be interested in documenting how language functions as a ‘mode of social interaction [which] provides the material out of which a group of people recognize themselves as a community’ (Duranti 1997: 99). Such accounts cannot be based on the traditional direct ‘elicitation’ of language data from ‘informants’; rather, the fieldworker must become a long-term participant observer, recording extensive collections of both audio and video data of natural language use between different individuals within the community.

At the same time, it is clear that Duranti’s (1994) discussion of the ways in which grammatical constructions in Samoan are used to achieve socio-political goals depends crucially on a prior analysis of the language in a traditional descriptive account such as that of Mosel and Hovdhaugen (1992). Even the title of Duranti’s linguistic anthropological field account, From Grammar to Politics, implies this priority. Similarly, Schieffelin and Ochs (1986) show how language acquisition by children is related to socialization into different cultures using the same kinds of basic linguistic information. Traditional descriptive accounts, then, should be promoted as an essential stepping stone for bigger (and, some would no doubt argue, better) things.

The goals of fieldwork can sometimes be much more limited than a full published grammar or dictionary, or a detailed ethnographic account of the communicative strategies that are used by people in a particular society. Sometimes, previously published materials may spark the interest of specialists who return to somebody else’s previous field site in order to analyze some particular features of the language in more detail. For instance, in some of the languages of Vanuatu, there are phonemically contrastive linguo-labial consonants in which nasals, stops, and fricatives are produced with the tip of the tongue touching the upper lip. Ian Maddieson undertook a fieldtrip from the United States to Vanuatu in the 1980s simply to study these speech sounds in detail in a number of languages in which they are found. A guide that is specifically geared towards instrumental phonetics is beyond the scope of this volume, though Ladefoged (2003) and Maddieson (2001) provide a great deal of very readable material.

Published grammars concentrate for the most part on regular patterns, resulting in a lack of attention devoted to the study of patterns of variable data. There is absolutely no reason why the kinds of corpus-based statistical studies that have been carried out extensively on different varieties of English could not be carried out in other languages as well, e.g. Dorian’s (2001) work on Gaelic. However, in order to do this, a linguist would need to pay close attention to a much wider range of sampling and data gathering issues than is commonly done in order to ensure that statistically representative samples of different categories of speakers have been recorded. While I have also decided to exclude sociolinguistic studies from the scope of this volume, readers who are interested in an up-to-date guide to data-gathering of this kind could consult Milroy (1987).
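A minimal sketch of the kind of corpus-based count mentioned above, assuming a handful of invented transcribed utterances tagged by speaker group. The data, the group labels, and the function name `rate_per_thousand` are hypothetical and are not drawn from Crowley, Dorian, or any real corpus. It shows only the arithmetic of normalizing a variant's frequency by sample size; it does not address the sampling and representativeness issues the passage raises.

```python
from collections import Counter

# Hypothetical mini-corpus: (speaker group, transcribed utterance).
samples = [
    ("older", "I was going to the shop"),
    ("older", "she was walking home"),
    ("younger", "I were going to the shop"),
    ("younger", "she was walking home"),
    ("younger", "they were going home"),
]


def rate_per_thousand(samples, variant):
    """Return occurrences of `variant` per 1,000 tokens for each speaker group."""
    hits, tokens = Counter(), Counter()
    for group, text in samples:
        words = text.lower().split()
        tokens[group] += len(words)           # total tokens recorded for the group
        hits[group] += words.count(variant)   # tokens matching the variant
    return {group: 1000 * hits[group] / tokens[group] for group in tokens}


rates = rate_per_thousand(samples, "were")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.1f} per 1,000 tokens")
```

Normalizing per 1,000 tokens keeps groups of different sample sizes comparable; with samples this small the figures mean nothing statistically, which is the point the passage makes about needing representative recordings.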

REFERENCE
Terry Crowley. (2007). Field Linguistics: A Beginner’s Guide. New York: Oxford University Press.

What Is the Nature of Linguistics as the Science of Language?

  Humans, who constantly communicate using language as a means of communication, will of course ...