Translation testing is a controversial issue in
translation studies. Translation testing, especially in its multiple-choice test
format, seems to lack a comprehensive theory. This article is an attempt to
investigate the performances of translation trainees on two kinds of translation
tests, namely production and multiple-choice tests. The researcher hypothesizes
that translators' performance on production tests is different from that on
other tests. To this end, 45 senior students (both men and women) of the
translation training program of the Islamic Azad University of Rasht were
selected and participated in the experiment. The subjects were tested on the
two forms of translation test (with the same content). The data analysis
showed that the subjects’ performances on the two kinds
of tests were different; therefore, the hypothesis of the
study was supported. The results and implications of this study, as well as
suggestions for future research, are discussed.
Keywords: Translation, Evaluation, Translation Testing, Production test, Multiple-choice test
1. Introduction
In translation studies, various training and evaluation methods are currently being
used and developed. According to Hubscher-Davidson (2007), the aim of those
methods is to improve translation pedagogy by focusing on the students' needs.
However, one of the main issues in translation studies is translation testing,
which is related to translation evaluation and assessment. Although issues
related to translation evaluation have received considerable attention over the
years, translation testing itself has received comparatively little.
Mousavi (1999:394) defines a test as “any procedure used to measure
a factor or assess some ability.” He (ibid) also defines testing as “the
use of tests or the study of the theory and practice of their use, development,
Translation evaluation can be defined as the
act of making judgments on a translation. Such judgments might be affected by
some subjective factors; therefore, it seems that evaluation in translation is a
problematic area. It is also possible that such evaluation is difficult even for
experienced translation instructors. In this regard, Goff-Kfouri (2004)
emphasizes that translation instructors need to become competent in test writing,
but they must know that there is no perfect test and no foolproof grading or
marking system. Rahimi (2005) describes translation and testing as two
“controversial” and “challenging” fields in language
One of the main issues in translation studies is translation testing.
Surprisingly, despite the importance of evaluation in
translation, few studies have been carried out on translation testing, and
much more research is needed in this field. Rahimi (2005: 62)
claims that “testing is underlined because there can be no science without
measurement.” Arguably, the importance of testing,
particularly language testing, is even greater since it is rooted in
many complex scientific disciplines such as linguistics, psychology and
sociology. Ghonsooly (1993) notes that translation testing methodology
has been criticized for its subjective character, and he therefore studied
objectivity and scorability in translation testing methodology. Rahimi (2005)
investigates the relationship between test form and trainees’ translation
performance and concludes that differences in translation test form
do not affect the subjects’ performance. On the other hand, Schmidt
and McCutcheon (1994:118 as cited in Goff-Kfouri, 2004) state that, “it
seems that the instructor’s testing methods do have a lasting effect on
the learning experience, the students’ attitude, as well as the
The present study is
interdisciplinary research spanning translation teaching and translation
testing. Its results have pedagogical implications for
teaching and testing translation. The study examines the
possible relationship between two kinds of translation test forms and the
performance of translation trainees on them. First, some general concepts
related to translation, translation evaluation and testing are introduced.
Then, the experiment is described.
2. What is Translation?
As Munger (1999:5) holds, the word
“translate” comes from the Latin word “transferre”, which
means “to bear/carry/bring across; to transfer.” The term
translation can have several meanings: it can refer to the subject field, the
product, and the process. Newmark (1988:5) defines translation as,
“[…] rendering the meaning of a text into another language in the
way that the author intended the text.” Nida (1975:95) claims that
translation is, “reproducing in the receptor language (target language)
the closest natural equivalent of the message of the source language; first in
terms of meaning and second in terms of style.” According to Guthknecht
(2003), translation is a communicative device that is a necessity on economic
and on general human grounds. Bassnett (1991:12) says that “what is
generally understood as translation involves the rendering of a source language
(SL) text into the target language.” Similarly, Catford (1965: 20) states
that translation is “a process of substituting a text in one language for
a text in another.”
3. Types of Translation
According to Beekman and Callow (1989), there are four main
types of translation: 1) highly literal, 2) modified literal, 3) idiomatic, 4)
unduly free. Beekman and Callow (1989:21) explain: “if its form [linguistic form
of translation] corresponds more to the form of the original text, it is
classified as literal; if its form corresponds more to the form of the receptor
language (RL), then it is classified as idiomatic.” They (ibid) call
literal and idiomatic translations “two basic approaches to
translation.” Manafi Anari (2004:85) quotes Beekman and Callow (1989):
“the highly literal translation is that in which the obligatory
grammatical rules of the RL are set aside and translation follows the order of
the original word for word and with high consistency.” He continues
(ibid), “the modified literal translation occurs when the translator makes
some lexical or grammatical adjustment to correct the errors arising from
literalism, and produce something which is equivalent to the original.”
Larson (1984: 10) also defines idiomatic translation as “[…] one which has the same
meaning as the source language but is expressed in the natural form of the
receptor language.” Beekman and Callow (1989:23) point out, in the unduly
free translation “the purpose is to make the message as relevant and clear
as possible.” Newmark (1981) classifies translation into two types: communicative and
semantic. He (ibid: 22) states that, “in communicative translation, the
translator takes into consideration all parameters of two languages involved in
translation process. In this kind of translation, readers get the same
impression from the translated text as the readers of the author’s work
experience.” According to him (ibid: 12), semantic translation is,
“[…] the precise contextual meaning of the author.”
Larson (1984) also divides translation into two categories: meaning-based
translation and form-based translation. According to her (ibid: 114), a meaning-based
translation focuses on communicating the meaning and contextual
elements of the source text, whereas a form-based translation
focuses on the lexical and grammatical factors of translation.
4. Translation Evaluation
According to Carry and Jumpelt
(1963), the question of defining translation quality was first discussed at the third
conference of the International Federation of Translators, on quality in translation, in 1959. Since
then, translation evaluation has
received much attention within translation studies, and there have been continued efforts to investigate
the issue both in theory and practice.
As House (2001: 255) states:
“Translation quality is a problematic concept if it is taken to involve
individual and externally motivated value judgment alone. Obviously, passing any
‘final judgment’ on the quality of a translation that fulfils the
demands of scientific objectivity is very difficult indeed.”
Lauscher (2000) mentions that “translation scholars have tried to improve practical
translation quality assessment by developing models which allow for
reproducible, intersubjective judgment (e.g. Reiss 1972: 12-13; Wilss 1977: 251;
Ammann 1993: 433-34; Gerzymisch-Arbogast 1997)”. Lauscher (2000, ibid)
claims that “they [the translation scholars] hoped to achieve this goal
[improving a practical translation quality assessment] by building their models
on scientific theories of translation, which can provide a yardstick, and by
introducing a systematic procedure for evaluation.” House
(2001) takes a similar view, claiming that translation quality
assessment requires a theory of translation. She (ibid) adds that different
views of translation lead to different concepts of translation quality and
different ways of assessing it.
Similarly, in the context of
translation teaching, several scholars have put forward proposals for
translation evaluation (e.g. Delisle 1993; Hurtado 1995; Nord 1988 and 1996;
Kussmaul 1995; Pym 1996; Gouadec 1981 and 1989; Presas 1996).
5. Translation Testing
Translation production test
In this form
of translation test, trainees are given a number of separate
sentences or a full paragraph and asked to translate them into the target
language. An essay-type test is a kind of production test. Scoring in this
form of test is generally subjective, since there is no objective, generally
agreed-on answer key, and there are few, if any, standard production tests of
translation available. A teacher-made translation test is a test of this kind.
Multiple-choice translation test
These tests
consist of a sentence from the source language as the stem and two, three or
four translations into the target language as alternative choices. One of
the choices is the correct answer. Here, the testees do not produce the
translation but recognize the answer. The responses are scored
objectively because there is a fixed answer key; thus, any scorer can score such
tests objectively. (ibid)
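The objectivity of scoring against a fixed answer key can be illustrated with a minimal Python sketch. The function name, the answer key and the responses below are hypothetical and are not taken from the test used in this study.

```python
def score_multiple_choice(answer_key, responses):
    """Count the responses that match a fixed answer key (one point per item)."""
    if len(answer_key) != len(responses):
        raise ValueError("answer key and responses must have the same length")
    return sum(1 for key, given in zip(answer_key, responses) if key == given)


# Hypothetical key and responses for a four-item test.
answer_key = ["b", "d", "a", "c"]
responses = ["b", "a", "a", "c"]
print(score_multiple_choice(answer_key, responses))  # 3
```

Because the key is fixed, any scorer applying it to the same responses obtains the same score.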
Farahzad (1992) argues that these kinds of
translation tests limit the examinees’ creativity, and that it is
not useful for the students to conclude that none of the options is adequate.
She therefore suggests two kinds of translation test: the limited-response item test and
the controlled free-response item test.
The limited-response item test is an
integrated test which examines several components of translation at a time, such as comprehension of the
source text, accuracy in terms of content, appropriateness of grammatical forms,
and choice of words. In this type of translation
test, translators are free to select certain equivalents from among a series of
synonyms, to adopt certain grammatical arrangements, to ignore lexical or
grammatical adjustments in order to secure fidelity to the source text, etc.
She believes that limited-response item tests prevent innovation
by translators. She (ibid: 274) also states “that there is doubt about
appropriateness of selected items in multiple-choice translation test, but the
exact problem with the translation knowledge of examinees can be determined
through this test at once.”
For the controlled free-response item test,
her emphasis is on the appropriateness of the selected texts. She believes that in
this kind of translation test, the selected text must be authentic, self-contained
and at the level of the testees’ competence. She also states (ibid:
272) that “the examiner should give some general information about the
text, the author and the name of the book, and the text selected, to the testees
in this case.”
6. Four models for Translation
Waddington (2001: 311-325) describes four methods
for evaluating student translations, as follows:
6.1. Method A
This method is taken from Hurtado (1995); it is based on error analysis,
and possible mistakes are grouped under the following headings:
(1) Inappropriate renderings which affect the understanding of the source text; these
are divided into eight categories: countersense, faux sense, nonsense, addition,
omission, unresolved extralinguistic references, loss of meaning, and
inappropriate linguistic variation (register, style, dialect, etc.).
(2) Inappropriate renderings which affect expression in the target language;
these are divided into five categories: spelling, grammar, lexical items, text,
(3) Inadequate renderings which affect the transmission of
either the main function or secondary functions of the source text.
In each of these categories a distinction is made between serious errors (-2
points) and minor errors (-1 point). There is a fourth category which describes
the plus points to be awarded for good (+1 point) or exceptionally good
solutions (+2 points) to translation problems. In the case of the translation
exam where this method was used, the sum of the negative points was subtracted
from a total of 110 and then divided by 11 to reach a mark from 0 to 10. For
example, if a student gets a total of –66 points, his result would be
calculated as follows: (110 - 66) / 11 = 44 / 11 = 4 (which fails to pass; the lowest pass mark
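The Method A calculation described above can be sketched in Python as follows. This is only an illustration: the function and parameter names are hypothetical, and the way bonus points are added back and the clamping of the result to the 0-10 range are assumptions, since the description above does not spell them out.

```python
SERIOUS_PENALTY = 2  # points deducted per serious error (from the description above)
MINOR_PENALTY = 1    # points deducted per minor error (from the description above)


def method_a_mark(serious_errors, minor_errors, bonus_points=0,
                  total=110, divisor=11):
    """Return a 0-10 mark from error counts, following the Method A arithmetic."""
    penalty = SERIOUS_PENALTY * serious_errors + MINOR_PENALTY * minor_errors
    # Assumption: plus points for good solutions are simply added to the total.
    raw = total - penalty + bonus_points
    # Assumption: the mark is kept inside the 0-10 scale.
    return max(0.0, min(10.0, raw / divisor))


# 30 serious and 6 minor errors give 66 penalty points, as in the example above.
print(method_a_mark(serious_errors=30, minor_errors=6))  # 4.0
```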
6.2. Method B
Method B is also based on error analysis and was designed to take into account
the negative effect of errors on the overall quality of the translations (cf.
Kussmaul 1995: 129, and Waddington 1999: chapter 7). The corrector first has to
determine whether each mistake is a translation mistake or just a language
mistake; this is done by deciding whether or not the mistake affects the
transfer of meaning from the source to the target text: if it does not, it is a
language error (and is penalized with –1 point); if it does, it is a
translation error (and is penalized with –2 points). However, in the case
of translation errors, the corrector has to judge the importance of the negative
effect that each one of these errors has on the translation, taking into
consideration the objective and the target reader specified in the instructions
to the candidates in the exam paper. In order to judge this importance, the
corrector is given the following table:
Table 1. Typology of errors in Method B. The table pairs the extent of the negative effect on words in the ST (ranging from 1-5 words to the whole text) with a penalty for that negative effect.
The final mark for each translation is calculated in the same
way as for Method A: that is to say, the examiner fixes a total number of
positive points (in the case of method B, this was 85), then subtracts the total
number of negative points from this figure, and finally divides the result by
8.5. For example, if a student is given 30 penalty points, his total mark would be
6.5 (pass): (85 - 30) / 8.5 = 55 / 8.5 ≈ 6.5.
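A minimal Python sketch of the Method B arithmetic is given below. The names are hypothetical, and because the penalty values of Table 1 are not reproduced here, the penalty for each translation error is passed in as a number already chosen by the corrector (an assumption made for illustration).

```python
LANGUAGE_ERROR_PENALTY = 1  # each language error costs 1 point (from the description above)


def method_b_mark(language_errors, translation_error_penalties,
                  total=85, divisor=8.5):
    """Return the Method B mark: (total - penalty points) / divisor."""
    penalty = (LANGUAGE_ERROR_PENALTY * language_errors
               + sum(translation_error_penalties))
    return (total - penalty) / divisor


# Hypothetical paper: 10 language errors and 10 translation errors of 2 points
# each, i.e. the 30 penalty points used in the example above.
mark = method_b_mark(language_errors=10, translation_error_penalties=[2] * 10)
print(round(mark, 1))  # 6.5
```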
6.3. Method C
Method C is a holistic method of assessment. The
scale is unitary and treats the translation competence as a whole, but requires
the corrector to consider three different aspects of the student’s
performance, as shown in the table below. For each of the five levels there are
two possible marks, so as to comply with the Spanish marking system of 0 –
10; this allows the corrector freedom to award the higher mark to the candidate
who fully meets the requirements of a particular level and the lower mark to the
candidate who falls between two levels but is closer to the upper one.
Table 2. Scale for holistic Method C

Accuracy of transfer of ST content: Complete transfer of ST information; only minor revision needed to reach professional standard.
Quality of expression in TL: Almost all the translation reads like a piece originally written in target language. There may be minor lexical, grammatical or spelling errors.

Accuracy of transfer of ST content: Almost complete transfer; there may be one or two insignificant inaccuracies; requires certain amount of revision to reach professional standard.
Quality of expression in TL: Large sections read like a piece originally written in target language. There are a number of lexical, grammatical or spelling errors.
Degree of task completion: Almost completely successful

Accuracy of transfer of ST content: Transfer of the general idea(s) but with a number of lapses in accuracy; needs considerable revision to reach professional standard.
Quality of expression in TL: Certain parts read like a piece originally written in target language, but others read like a translation. There are a considerable number of lexical, grammatical or spelling errors.

Accuracy of transfer of ST content: Transfer undermined by serious inaccuracies; thorough revision required to reach professional standard.
Quality of expression in TL: Almost the entire text reads like a translation; there are continual lexical, grammatical or spelling errors.

Accuracy of transfer of ST content: Totally inadequate transfer of ST content; the translation is not worth revising.
Quality of expression in TL: The candidate reveals a total lack of ability to express himself adequately in English.
6.4. Method D
Method D consists of combining error analysis
Method B and holistic Method C in a proportion of 70/30; that is to say, Method
B accounts for 70% of the total result and Method C for the remaining 30%.
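The 70/30 combination can be written as a simple weighted sum. The sketch below is illustrative only; the function name and the sample marks are hypothetical, and both input marks are assumed to be on the same 0-10 scale.

```python
def method_d_mark(method_b_mark, method_c_mark, weight_b=0.7, weight_c=0.3):
    """Combine a Method B mark and a Method C mark in a 70/30 proportion."""
    return weight_b * method_b_mark + weight_c * method_c_mark


# Hypothetical paper: 6.5 under Method B and 8 under Method C.
print(round(method_d_mark(6.5, 8.0), 2))  # 6.95
```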
7. The study
7.1. Research question
The purpose of this study is to find out the answer to the following question:
Does the translation performance of translation trainees differ on
translation production and multiple-choice tests?
7.2. Research hypothesis
In order to
investigate the above-mentioned question, the following null hypothesis was
formulated: There is no difference between the performances of
translation trainees on translation production and multiple-choice tests.
The subjects of the study were 45 senior students of the translation training program studying at
the Islamic Azad University. They were randomly selected from among 100 students
who participated in an Oxford Placement Test. The purpose of this test was to
assure the homogeneity of the subjects’ general proficiency. They were
also tested on principles of translation for relative homogeneity of their
On the whole, three measures were used in
this study. The first measure was an Oxford Placement Test to determine the
general proficiency of the subjects. All the subjects were asked to complete the
test in a limited time. The reliability of this test was calculated by
estimating Cronbach’s alpha (internal consistency) and turned out to be
.90, which is a highly satisfactory reliability coefficient.
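For readers unfamiliar with the statistic, Cronbach's alpha for a test of k items is k / (k - 1) multiplied by (1 - sum of item variances / variance of total scores). The following Python sketch computes it from item-level scores using only the standard library; the data are invented for illustration and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of item scores (one row per test taker)."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented data: four test takers, three dichotomously scored items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))
```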
The second measurement technique was a translation production test which had three
paragraphs and 186 words. In this test, the students had to create the target
The third measurement technique was a multiple-choice
translation test which was based on exactly the same sentences as the production test;
only its form was different. This test consisted of 20 isolated sentences,
each with four equivalents in the target language, of which only one was
correct (the sentences were contextually rich enough).
This study was carried out
in four experiments. In the first experiment, 120 translation
students, both male and female, who had passed "translation principles and
methods", were randomly selected to take an Oxford Placement Test. In
the second experiment, based on the results of the Placement Test, 90 of
them were selected to be tested on two kinds of translation test forms. They
were assigned to two equal groups.
In the third experiment, the first group was given a production test and the
second group was given a multiple-choice test. In the fourth experiment, each
group was given the test that it had not taken in the third experiment. For
the evaluation of the papers, two translation scholars holding MAs in translation
studies were asked for help. Both had experience teaching translation
and were asked to evaluate the production test papers using Method D,
explained above. Once all the tests had been administered and all the data
gathered, SPSS (Version 16) was used to analyze the data.
7.6. Data Analysis
The outcome of the statistical analysis of this study is presented in Tables 3 and 4.
Table 3. Group statistics for comparison of the performances on production and multiple-choice tests (including the standard error of the mean).
Table 4. Independent samples test for comparison of the performances on production and multiple-choice tests: Levene's Test for Equality of Variances and t-test for Equality of Means (with standard error of the difference and 95% confidence interval of the difference), reported for equal variances assumed and equal variances not assumed.
Table 4 indicates that the significance of Levene's test is .033, which is
lower than the significance level of 0.05; therefore, H0
(equality of variances) is rejected. So we consider the second row (equal variances not assumed) for
drawing conclusions about the means.
The significance of the test of equality of
means, assuming unequal variances, is lower than 0.05; therefore, we
reject the null hypothesis and accept that the means of the two
groups differ.
As can be seen in Table 3, the means in the
production and multiple-choice tests are 15.1667 and 12.9778 respectively, a
statistically significant difference.
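When equal variances cannot be assumed, the comparison of two means is commonly based on Welch's form of the t statistic, with the Welch-Satterthwaite approximation for the degrees of freedom. The sketch below shows that computation in plain Python. It is not the SPSS procedure used in the study, the function name is hypothetical, and the scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    # Squared standard error of each sample mean.
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Invented scores out of 20 on two test forms.
production = [16, 15, 17, 14, 15, 16]
multiple_choice = [12, 14, 11, 15, 13, 12]
t_value, degrees_of_freedom = welch_t(production, multiple_choice)
print(t_value, degrees_of_freedom)
```

The resulting t value is then compared with the t distribution at the computed degrees of freedom to obtain the significance level, which statistical packages report directly.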
8. Conclusion, Pedagogical Implications
This study focused on the performances of translation trainees on two kinds of translation
tests. The researchers aimed to investigate whether different forms of
translation tests can affect translators' performances. The two kinds of
translation tests had the same content. The results of the study showed
that there was a significant difference between the mean scores of the two
groups. This provides evidence to reject the null hypothesis of the
study. Therefore, it can be said that translation trainees’ performances
differ on translation production tests and multiple-choice tests, which means that
there seems to be a relationship between translation test forms and
translation trainees' performances on them.
This study can have
pedagogical implications for translation teachers, students, evaluators, and
test makers. Translation teachers will be able to design suitable kind(s) of
translation tests for the students. The organizations which are responsible for
designing translation exams or interviews will be able to choose suitable
kind(s) of translation tests.
Translation students themselves, for their part,
will be able to gauge their ability on different kinds of
translation tests and work to improve it, which in turn can improve
the quality of their translations.
Suggestions for further research
The present study focused only on two kinds of
translation tests, production and multiple-choice tests. Further
research could therefore examine other kinds of translation tests, or apply a similar
design to different kinds of interpretation tests.
References
Ammann, M. (1993). Kriterien für eine allgemeine Kritik der Praxis des translatorischen Handelns. In J. Holz-Mänttäri & C. Nord (Eds.), Traducere Navem. Festschrift für Katharina Reiß zum 70. Geburtstag (pp. 433-466). Tampere: University of Tampere.
Carry, E. and R. W. Jumpelt (Eds.) (1963). Quality in Translation: Proceedings of the 3rd Congress of the International Federation of Translators (Bad Godesberg, 1959). New York: Macmillan/Pergamon
Delisle, J. (1993). La traduction raisonnée: Manuel d’initiation à la traduction professionnelle de l’anglais vers le français (Collection Pédagogie de la traduction). Ottawa: Presses de
Farahzad, F. (1992). Testing achievement in translation classes. In C. Dollerup and A. Loddegaard (Eds.), Teaching Translation and Interpreting: Training, Talent and Experience (pp. 271-278). Amsterdam: John Benjamins.
Farahzad, F. (2004). Meaning in Translation. Translation Studies Quarterly, 2(7 & 8), 81.
Gerzymisch-Arbogast, H. (1997). Wissenschaftliche Grundlagen für die Evaluierung von Übersetzungsleistungen. In E. Fleischmann (Ed.), Translationsdidaktik: Grundfragen der Übersetzungswissenschaft (pp. 573-579). Tübingen: Narr.
Ghonsooly, B. (1993). Development and Validation of a Translation Test. Edinburgh Working Papers in Applied Linguistics, 4, 54-62.
Goff-Kfouri, C.A. (2004). Language Learning in Translation Classrooms. [Online] Translation Journal, Volume 9, No. 2. Available from: http://accurapid.com/journal/32ed
Golavar, E. (2009). The Role of Perceptual Learning Styles of Translation Students in Their Performance on Two Kinds of Translation Tests (Unpublished Master's thesis). University of Chabahar, Iran.
Gouadec, D. (1981). Paramètres de l’évaluation des traductions. Meta, 26(2), 99-116.
Gouadec, D. (1989). Comprendre, évaluer, prévenir. Special issue: L’erreur en traduction. TTR, 2(2), 35-54.
House, J. (2001). Translation Quality Assessment: Linguistic Description versus Social Evaluation. Meta, 46(2), 243-257.
Hubscher-Davidson, S. (2007). Meeting Students' Expectations in Undergraduate Translation Programs. [Online] Translation Journal, Volume 11, No. 1. Available from: http://acurapid.com/journal/39edu.h
Hurtado, A.A. (1995). La didáctica de la traducción. Evolución y estado actual. In P. Hernandez & J.M. Bravo (Eds.), Perspectivas de la traducción (pp. 49-74). Universidad de Valladolid.
Kussmaul, P. (1995). Training the translator. Amsterdam and Philadelphia:
Mousavi, S.A. (1999). A Dictionary of Language Testing (2nd ed.). Tehran: Rahnama Publications. 394-404.
Newmark, P. (1988). A Textbook of Translation. Prentice Hall.
Nida, E. (1975). Language Structure and Translation: Essays by Nida. Stanford University Press.
Nord, C. (1991). Text Analysis in Translation. Amsterdam: Rodopi.
Presas, M. (1996). Problemes de traducció i competència traductora. Master’s thesis, Universitat Autònoma de Barcelona.
Rahimi (2005). Test Forms and Trainees’ Translation Performance. Translation Studies, 9(3), 61-74.
Reiss, K. (2000). Translation criticism, the potentials and limitations: Categories and criteria for translation quality assessment (E.F. Rhodes, Trans.). Manchester: St. Jerome. (Original work
Tajvidi, G.R. (2003). Fields of research in translation studies. In F. Farahzad (Ed.), (2004). Proceedings of Translation Studies Conferences (pp. 101-121). Tehran: Setarhe Sabz.
R., (2006). Translating Texts in Politics (4th ed.). Tehran: Payame Noor University Press.
Waddington, C. (2001). Different Methods of Evaluating Student Translations: The Question of Validity. Meta, XLVI, pp. 311-325.
Wilss, W. (1977). Übersetzungswissenschaft: Probleme und Methoden (p. 251). Stuttgart: Ernst Klett. (Trans. 1982 as The science of translation: problems and methods. Tübingen: Gunter Narr).