1. What is a corpus?
In corpus linguistics, a corpus can be generally defined as ‘a collection of naturally-occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software’ (Kennedy, 1998; McEnery & Wilson, 2001; O’Keeffe, McCarthy, & Carter, 2007; Teubert & Cermakova, 2007).
2. Sources of language corpora
- ‘Paraconc’ (http://www.athel.com/para.html)
3. Designing a specialized corpus
Corpus size
- There are no fixed rules; size depends on research purposes, the availability of data, and time.
- Large, general corpora may be less useful than small, focused corpora if searches are made on context-specific terms.
- Corpora that are ‘too small’ have limitations, e.g. not enough examples of the concepts, terms, or patterns under investigation.
- It is preferable to create a ‘monitor’ or ‘open’ corpus (one that is continually updated), because specialized words and usage are dynamic.
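One rough way to judge whether a corpus is ‘too small’ for a given search is to count its tokens and types and check how often the term of interest actually occurs. The following is a minimal sketch in Python, not part of any corpus tool: the function names `tokenize` and `corpus_profile`, the tokenization rule, and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumed rule: runs of letters/digits, optionally joined by
    an apostrophe or hyphen (e.g. "context-specific").
    """
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())

def corpus_profile(texts, term):
    """Return token count, type count, and the frequency of `term`."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {
        "tokens": sum(counts.values()),   # running words
        "types": len(counts),             # distinct word forms
        "term_frequency": counts[term.lower()],
    }

# Hypothetical two-text mini-corpus, for illustration only.
sample_texts = [
    "The concordance lists each keyword in context.",
    "A keyword list ranks words by keyness.",
]
profile = corpus_profile(sample_texts, "keyword")
print(profile)
```

If the term under investigation occurs only a handful of times, that is a sign the corpus may need more texts before patterns can be studied.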
Text extracts vs. full texts
- The choice depends on the aim of corpus compilation.
- Whole texts offer more coverage, because the words or terms of interest may be randomly distributed throughout the text.
- Specific sections may be helpful if we are looking for words or phrases in particular content areas or want to create purposeful sub-corpora.
Number of texts
- A choice can be made between collecting a few large texts or a larger number of smaller texts.
- A choice can also be made between selecting texts written by one or two key writers or sources and texts retrieved from different sources or written by different authors.
- The choice depends on your research focus, e.g. studying overall language use or studying the idiosyncrasies or linguistic choices preferred by particular writers.
Medium
- Texts can be spoken, written, or mixed.
- The choice depends on the research questions.
- Some practical factors should also be considered, e.g. compiling spoken corpora can be time-consuming and needs special types of tagging.
Subject and text type
- The corpus should mainly focus on the specialized texts under investigation, although this is less clear-cut in multidisciplinary subjects.
- Texts may come from different subjects if the research focus is on the study of particular language features rather than term extraction.
- Text types within a specialized subject field may vary from ‘expert-to-expert’ texts to ‘expert-to-non-expert’ texts, in other words, from technical to popular texts.
Other considerations
- Authorship: Texts written by experts in a field tend to present more reliable and authentic examples of specialized language.
- Language: Specialized texts can be stored and retrieved in the form of monolingual, comparable, or parallel corpora.
- Publication date: Texts should come from recent publications unless queries are made in relation to particular periods of time.
4. Sources of specialized texts
- Printed materials
- Word documents
- CD-ROMs
- Texts on the Web
- Online databases
5. Getting started with AntConc
Download the latest version of AntConc and watch the YouTube tutorials at http://www.antlab.sci.waseda.ac.jp/antconc_index.html
1. Run the program.
2. Use Open Files (to browse and select targeted files) or Open Dir (to select targeted folders).
3. Choose the function.
4. Use Clear All Tools and Files before opening new files.
5. Use Save Output to Text File to save output, e.g. concordance lines.
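To make the idea of concordance lines concrete, the steps above can be loosely imitated in a few lines of Python. This is only an illustrative sketch and not how AntConc itself works: the function names `load_dir`, `kwic`, and `save_output`, the `.txt`-only file filter, the UTF-8 encoding, and the sample sentence are all assumptions.

```python
import re
from pathlib import Path

def load_dir(folder):
    """Read every .txt file in a folder (loosely like 'Open Dir')."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(folder).glob("*.txt"))
    }

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for whole-word, case-insensitive matches."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Collapse line breaks and repeated spaces so each hit fits on one line.
        left = " ".join(left.split())
        right = " ".join(right.split())
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}  {match.group(0)}  {right}")
    return lines

def save_output(lines, path):
    """Write the concordance lines to a text file (loosely like 'Save Output to Text File')."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")

# Hypothetical sample text, for illustration only.
sample = "A corpus is a collection of texts. Each corpus has a purpose."
for line in kwic(sample, "corpus", width=20):
    print(line)
```

Each printed line shows one occurrence of the search word with a fixed amount of text on either side, which is the same layout a concordancer displays.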