Volume 16, No. 2 
April 2012


  Translation Journal

The Translator & the Computer

Identification of Terms Marked by the Japanese and Indian Cultures:

an empirical practice using a multilingual comparable corpus of wellness and beauty tourism (Spanish, English, Italian, French) in a translation-classroom environment¹

by Cristina Castillo Rodríguez, Ph.D.


Comparable corpora are usually seen as highly useful resources for identifying the terminological units (TU) of a specific knowledge domain or sub‑domain. Moreover, corpus management programs help users exploit these resources more fully, since they shed light on certain aspects of a given language.

The purpose of this paper is to show students how to use the corpus management program AntConc to exploit a multilingual comparable corpus (Spanish, English, Italian, and French) whose texts were compiled directly from the Internet in the tourism sub‑domain known as ‘wellness and beauty.’ This segment of tourism contains a large number of TU denoting ancient and new treatments offered by a growing number of hotels and spas. Most of these TU were originally coined in the Indian and Japanese cultures; they are therefore loan words in the languages analysed in this work.

Given the strong presence of these two cultures in the multilingual comparable corpus, this article shows students (in a multilingual translation‑classroom environment) a methodology consisting of three main steps: the compilation of a multilingual comparable corpus; the identification of TU and their variants (if any) in the four subcorpora of original texts in the languages involved in this study; and the storage of the TU and variants in a terminology management system.

Keywords: Corpus linguistics, terminology, corpus management program, corpus analysis, wellness and beauty tourism, Indian and Japanese terminological units

1. Introduction

The tourism industry, like any other market, has undergone a process of segmentation, which Morrison (1996: 160) defines as “the division of the overall market for a service into groups with common characteristics.” As a result, several types (or segments) of tourism exist today. Among the segments that have emerged in response to the different needs and requirements of potential tourists are the well-known “sun and beach” segment and newer ones such as “rural tourism,” “cultural tourism,” and “sport tourism,” among others².

Another segment that has appeared in the tourism industry is ‘wellness and beauty’, whose main services are usually offered in beauty centers, hotels with spas³, and thalassotherapy establishments. This type of tourism involves: i) the use of some TU that were already used in other fields; ii) the birth of new TU denoting new concepts in this field; and iii) the use of old TU from other cultures, which, owing to the rapid development of the tourism industry, have become loan terms in our society.

We are aware that gathering all the TU used in a given knowledge domain or sub‑domain is a difficult task. Since terminologists and translators need to collect a very large number of terms in a given field, a methodology covering corpus compilation and management, followed by TU identification and storage with the appropriate programs, can serve them as a starting point for a well‑organised terminological database for future translation or glossary projects in any combination of the languages involved in this study (Spanish, English, Italian, and French).

This article aims to show students (in a multilingual translation‑classroom environment) a methodology consisting of three main steps: the compilation of a multilingual comparable corpus; the identification of TU and their variants, if any, in the four subcorpora of original texts in the languages involved in this study; and the storage of the TU and their variants in a terminology management system.

2. Compilation of wellness and beauty multilingual comparable corpus

Although defining “corpus” and the notions of “comparable” and “ad hoc” corpora is not the purpose of this study, it is worth mentioning some of the most relevant definitions of these three concepts.

Although there are a number of definitions of the concept of “corpus,” one of the most cited is the one proposed by EAGLES (1996), whose report defines a corpus as “a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.” Today, however, a corpus cannot be conceived without computational applications, as McEnery and Wilson (2004: 18‑19) state: “whatever philosophical advantages we may eventually see in a corpus, it is the computer which allows us to exploit corpora on a large scale with speed and accuracy, and we must never forget that technology has allowed a pseudo‑procedure to become a valuable linguistic methodology.” This is why Bowker (2003: 169), in a broader sense, describes modern corpora as “large collections of machine‑readable texts that have been gathered according to specific criteria, and these criteria vary depending on the nature of the work being undertaken.”

As for the concept of “comparable corpora,” Corpas Pastor (2001: 158) states that they are corpora containing similar types of texts originally written in a source language and compiled following the same design criteria. The same author states that “ad hoc corpora” do not necessarily include a huge number of texts, but they do include texts that are very similar in terms of subject field, text type, and genre, among other aspects, since the main purpose of this kind of corpus is to create an easy-to-use and faithful resource for translators' work (Corpas Pastor 2004: 236). Since such corpora can also be a rich resource for terminological tasks, terminologists can likewise profit from compiling an ad hoc corpus.

Having briefly described these concepts, we turn to the first step of our study: the compilation of our multilingual comparable corpus, which involves collecting texts originally written in Spanish, English, Italian, and French. To carry out this task, we follow a compilation protocol based on the protocolised methodology for comparable corpus compilation proposed by Seghiri (2006) and also described in Corpas and Seghiri (2009). This protocol pays special attention to establishing design criteria so that the comparable corpus is representative of the knowledge sub‑domain chosen for this study.

2.1. Design criteria

Corpora whose texts are compiled from the Internet represent a considerable advance in managing and extracting the documentation available online. However, the texts included in the comparable corpus cannot be selected at random: they must be compiled according to specific design criteria in order to achieve the highest possible level of representativeness for the sample gathered in each language. Otherwise, we could end up with a corpus compiled for a specific purpose that is nonetheless not representative of the sub‑domain with which we are concerned.

Many authors have researched and proposed design criteria for compiling a representative comparable corpus; nevertheless, there is no consensus on the basic criteria for this task. For the compilation of our multilingual comparable corpus we follow the guidelines proposed by Bowker and Pearson (2002: 54), who highlight the following criteria: size, full text or extract, number of texts, medium, subject field, text type, authorship, language, and publication date.

Of these nine criteria, we base our compilation, and therefore our design criteria, on seven, since size (that is, the number of words in a corpus) and the number of texts to be compiled cannot be predicted a priori.

The first criterion concerns the inclusion of full texts or extracts: this study is based on the compilation and analysis of virtual subcorpora composed of full texts encoded in HTML⁴, PDF (.pdf), and Word (.doc) formats. Accordingly, under the second criterion, the medium is a computer-readable file.

As for the third criterion, the subject field, all texts belong to the tourism sub‑domain called ‘wellness and beauty’, and in particular to its promotional texts, which corresponds to the fourth criterion, text type. These promotional texts include tourist brochures published online, which contain the most typical collocations of both advertising messages and descriptive texts; both constitute the essential part of the tourism marketing of this segment.

Authorship and publication date are the fifth and sixth criteria of our compilation. The texts in our corpus were extracted from the websites of hotel chains and independent hotels, so they are assumed to have been written by experts. As for the publication date, we aimed to compile recent texts. Most websites show on their home page the date on which their content was last updated, which can serve as one indicator of whether a text has been recently published or updated, together with the dates of the promotional offers for some treatments and other services. Lastly, as mentioned above, the languages involved in this study are English, French, Italian, and Spanish.

As for the number of documents needed for a multilingual corpus of these characteristics, since this number cannot be set a priori, we used the computer application ReCor to measure the degree of representativeness of each subcorpus.

This application is based on a method developed by Corpas and Seghiri (2006; 2007a; 2007b) which, for the first time, measures the representativeness of a corpus a posteriori by means of the “N‑Cor” algorithm. This algorithm calculates the minimum number of documents and words that a corpus should include to be considered statistically representative of a given knowledge domain.
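ReCor and the N‑Cor algorithm are not described in detail here, so the following Python sketch does not reproduce them. It only illustrates the general idea behind an a posteriori check: add documents one at a time, track how many distinct word types the subcorpus contains, and see how many new types the most recent documents still contribute. The function names, the simple regex tokeniser, and the `new_type_rate` measure are assumptions made for illustration.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Lower-cased word tokens; in Python 3, \w also matches accented letters.
    return re.findall(r"\w+", text.lower())


def type_growth(documents: List[str]) -> List[int]:
    """Cumulative number of distinct word types after each document is added."""
    seen = set()
    curve = []
    for doc in documents:
        seen.update(tokenize(doc))
        curve.append(len(seen))
    return curve


def new_type_rate(documents: List[str], last_n: int = 1) -> float:
    """New types contributed by the last `last_n` documents, divided by their token count.

    A simplified stability indicator for illustration only; it is NOT the N-Cor formula.
    """
    if not 1 <= last_n <= len(documents):
        raise ValueError("last_n must be between 1 and len(documents)")
    earlier = set()
    for doc in documents[:-last_n]:
        earlier.update(tokenize(doc))
    tail_tokens = [tok for doc in documents[-last_n:] for tok in tokenize(doc)]
    if not tail_tokens:
        return 0.0
    new_types = set(tail_tokens) - earlier
    return len(new_types) / len(tail_tokens)


# Three invented mini-documents.
docs = ["masaje Shiatsu relajante", "masaje Reiki", "masaje Shiatsu"]
print(type_growth(docs))       # [3, 4, 4]
print(new_type_rate(docs, 2))  # 0.25
```

A curve that flattens out, with a rate approaching zero, suggests that adding more texts of the same kind brings in few new word types; this is the intuition behind measuring representativeness after compilation rather than fixing corpus size in advance.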

For the 166 texts originally written in Spanish (Spain), ReCor returned a representativeness result of 0.060, which means that the probability of finding new types (of words) is 0.060 when around 200,000 words are analysed. The results for the remaining subcorpora were as follows: 0.040 for the 64 texts of the English subcorpus (United Kingdom); 0.070 for the 112 texts of the Italian subcorpus (Italy); and 0.060 for the 84 texts of the French subcorpus (France).

In this respect, it is worth noting that, although the texts compiled for the whole corpus are internally very varied, the application reported a level of representativeness for each subcorpus. This is significant given that the lexicon and terminology of advertising and promotional texts, particularly in the tourism sector, are constantly changing. The representativeness of our multilingual corpus can also be assessed through external compilation criteria; that is, its quality can be “measured” according to the type of corpus and the needs of the user (a translator or a terminologist, for example) who manages and analyses it.

2.2. Document‑coding protocol

Every compilation protocol entails a document- or text-coding protocol so that each text compiled for the subcorpora can be identified unambiguously. There are no fixed rules on how to name and code the texts; it may be enough to record the most relevant information about each text and to store the texts properly.

Our proposal is to create a table for each language in Excel, the spreadsheet application of the commercial Microsoft Office suite. The table has five columns for each compiled text: the first contains the Code assigned to the text; the second, the Complete reference of the text (for instance, the title of the website); the third, the Domain, specifying whether the source is a spa, a thalassotherapy establishment, or a combination of the two; the fourth, the Text type, indicating whether the text is exclusively promotional, purely informative, or a combination of the two; and the fifth, the URL of the site from which the text was downloaded.
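The same five-column table can also be kept as a plain CSV file, which Excel and other spreadsheet programs can open. The Python sketch below assumes that alternative; the `TextRecord` class, the `records_to_csv` function, and the sample row (including the example.com URL) are hypothetical and do not come from the actual corpus.

```python
import csv
import io
from dataclasses import astuple, dataclass
from typing import Iterable

# Column headings of the five-column table described above.
HEADER = ["Code", "Complete reference", "Domain", "Text type", "URL"]


@dataclass
class TextRecord:
    code: str        # e.g. 2001CTEN
    reference: str   # e.g. the title of the website
    domain: str      # spa, thalassotherapy, or both
    text_type: str   # promotional, informative, or both
    url: str         # page the text was downloaded from


def records_to_csv(records: Iterable[TextRecord]) -> str:
    """Serialise the records as CSV text, one row per compiled text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    for record in records:
        writer.writerow(astuple(record))
    return buffer.getvalue()


# Hypothetical placeholder row; not an actual entry from the corpus.
rows = [TextRecord("2001CTEN", "Example Hotel Spa", "spa", "promotional",
                   "https://www.example.com/spa")]
print(records_to_csv(rows))
```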

To code our texts we used a very simple system for each language. For the Spanish comparable subcorpus we assigned alphanumeric codes from 1001CTES to 1999CTES, where 1 is the number corresponding to the country (Spain), CT stands for comparable text, and ES for the Spanish language. The same system was adopted for the other subcorpora: codes 2001CTEN to 2999CTEN for the English subcorpus, where 2 identifies the country (the United Kingdom) and EN the English language; codes 3001CTIT to 3999CTIT for the Italian subcorpus, where 3 identifies Italy and IT the Italian language; and codes 4001CTFR to 4999CTFR for the French subcorpus, where 4 identifies France and FR the French language. In every case, CT stands for comparable text.
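Because the codes follow a fixed pattern (country digit, three-digit text number, CT, language code), they can be generated and checked automatically. The Python sketch below is one possible way to do this; the function names and the validation rules are illustrative assumptions, not part of the protocol described above.

```python
import re

# Country digit for each language code, following the scheme described above.
COUNTRY_DIGIT = {"ES": 1, "EN": 2, "IT": 3, "FR": 4}

CODE_PATTERN = re.compile(r"([1-4])(\d{3})CT(ES|EN|IT|FR)")


def make_code(language: str, index: int) -> str:
    """Build a code such as 1001CTES from a language code and a 1-based text index."""
    if not 1 <= index <= 999:
        raise ValueError("index must be between 1 and 999")
    return f"{COUNTRY_DIGIT[language]}{index:03d}CT{language}"


def parse_code(code: str) -> dict:
    """Split a code back into country digit, text index, and language code."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid document code: {code!r}")
    country, index, language = int(match[1]), int(match[2]), match[3]
    # Reject text number 000 and codes whose digit does not match the language.
    if index == 0 or COUNTRY_DIGIT[language] != country:
        raise ValueError(f"inconsistent document code: {code!r}")
    return {"country": country, "index": index, "language": language}


print(make_code("ES", 1))      # 1001CTES
print(parse_code("3112CTIT"))  # {'country': 3, 'index': 112, 'language': 'IT'}
```

The consistency check catches typing errors such as 2001CTES, where the country digit (United Kingdom) contradicts the language code (Spanish).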

As an example of this coding system, the first 20 texts compiled for our English subcorpus are shown in the following table:

Figure 1. Sample of texts compiled from UK websites for the English-language subcorpus

3. Management and identification of terms contained in the four subcorpora

Texts belonging to wellness and beauty tourism are characterised by a high presence of very specific TU. Although some of them have their own name in each language (for instance, Californian massage, Swedish massage, or hot stone massage), most retain their original Japanese or Indian names, perhaps because these TU were introduced rapidly into each of the languages involved in this study.

In order to identify the TU marked by these two cultures we used the corpus management software AntConc 3.2.3w⁵.

3.1. Identification of terms marked by the Japanese culture

Once we had launched AntConc four times⁶, one instance per subcorpus, and loaded the texts compiled for each subcorpus, we applied a methodology consisting mainly of typing keywords to search for the terms under study in each subcorpus.

3.1.1. Spanish terms

The keyword typed in the ‘Search Term’ box to identify the TU in the Spanish subcorpus is japonés. However, since Spanish marks gender and number with endings added to the lemma, the most suitable search term is [jap+n*], because the [*] wildcard matches zero or more characters at the end of a word. The software thus recognises all the forms with the following endings: [‑és] (masculine singular), [‑esa] (feminine singular), [‑eses] (masculine plural), and [‑esas] (feminine plural). In addition, the [+] wildcard, which matches zero or one character, lets the software also recognise the sequence containing [‑ó‑] that occurs in the Spanish name of the country, Japón.
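The effect of a search term such as [jap+n*] can be reproduced outside AntConc by translating the wildcards into a regular expression. The Python sketch below assumes the wildcard meanings given above ([*] as zero or more characters, [+] as zero or one character) and case-insensitive matching of whole tokens; the function names and the sample sentence are illustrative, and this is not AntConc's own implementation.

```python
import re
from typing import Iterable, Set


def wildcard_to_regex(term: str, case_sensitive: bool = False):
    """Translate '*' (zero or more characters) and '+' (zero or one character)
    into a regular expression; every other character is matched literally."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "+":
            parts.append(r"\w?")
        else:
            parts.append(re.escape(ch))
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile("".join(parts), flags)


def matching_forms(term: str, tokens: Iterable[str]) -> Set[str]:
    """Distinct tokens that match the search term as a whole word."""
    pattern = wildcard_to_regex(term)
    return {tok for tok in tokens if pattern.fullmatch(tok)}


# Invented sample sentence, already split into tokens.
tokens = "el masaje japonés Shiatsu , la técnica japonesa , Japón y masajes japoneses".split()
print(sorted(matching_forms("jap+n*", tokens)))
# ['Japón', 'japonesa', 'japoneses', 'japonés']
```

Restricting the wildcards to word characters (`\w`) keeps each match inside a single token; in Python 3, `\w` also covers accented letters such as ó and é.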

After observing and analysing all the concordance lines displayed by the software's Concordance tool, we identified four TU marked by the Japanese culture: Shiatsu, Reiki, Amma, and Zen. When these TU were in turn typed in the ‘Search Term’ box to find their concordances across the Spanish subcorpus, some terminological variants also emerged: a) Shiatsu: masaje Shiatsu, masaje japonés Shiatsu, masaje Shiatsu japonés, masaje Shiatsu completo; b) Amma: masaje Amma, masaje japonés Amma; c) Zen: no terminological variants; d) Reiki: no terminological variants.

To obtain these results we used the Concordance tool's option to re-sort the concordance lines with the Kwic Sort buttons: 0 is the search word; 1L, 2L, 3L... are the words to the left of the search word; and 1R, 2R, 3R... are the words to its right. The following screen shot illustrates one way of finding terminological variants, in this case for the TU Shiatsu:

Figure 2. Sample of search for terminological variants in the Spanish subcorpus
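The Kwic Sort positions can be modelled with a small keyword-in-context routine. The Python sketch below is an illustration under stated assumptions, not AntConc's code: each concordance line is stored as (left context, search word, right context), positions such as 1L or 2R are read from those contexts, and several positions can be combined into a multi-level sort key. The names `kwic`, `word_at`, and `sort_key`, as well as the sample sentence, are hypothetical.

```python
from typing import Callable, List, Tuple

# A concordance line: (left context, search word, right context).
Line = Tuple[List[str], str, List[str]]


def kwic(tokens: List[str], term: str, width: int = 3) -> List[Line]:
    """Collect every case-insensitive occurrence of `term` with `width` words on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            lines.append((tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width]))
    return lines


def word_at(line: Line, position: str) -> str:
    """Word at a position such as '0', '1L', '2L', '1R', '2R' ('' if out of range)."""
    left, hit, right = line
    if position == "0":
        return hit.lower()
    n, side = int(position[:-1]), position[-1]
    context = left[::-1] if side == "L" else right  # 1L is the word nearest the hit
    return context[n - 1].lower() if n <= len(context) else ""


def sort_key(*positions: str) -> Callable[[Line], Tuple[str, ...]]:
    """Multi-level sort key, e.g. sort_key('1L', '2L') sorts by 1L, then by 2L."""
    return lambda line: tuple(word_at(line, p) for p in positions)


tokens = "un masaje Shiatsu completo y un masaje japonés Shiatsu con un Shiatsu japonés".split()
lines = kwic(tokens, "shiatsu", width=2)
for left, hit, right in sorted(lines, key=sort_key("1L")):
    print(f"{' '.join(left):>16}  {hit}  {' '.join(right)}")
```

Sorting by 1L groups lines such as “masaje Shiatsu” together, while sorting by 1R groups lines such as “Shiatsu japonés”, which is how candidate variants become visible at a glance.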

3.1.2. English terms

In the English subcorpus, the search term typed in the ‘Search Term’ box was [japan*], so that TU containing this sequence could be found. The Concordance tool returned concordances for only two words: Japan and Japanese.

The TU marked by the Japanese culture found in the English subcorpus are: Shiatsu, Reiki, Zen Therapy, Amatsu, Aikido, Seitai, Jin Shin Jyutsu, Judo, and Iaido.

Terminological variants were also found in this subcorpus with the same procedure as before, that is, by typing these TU in the ‘Search Term’ box and re-sorting the concordance lines with the Kwic Sort buttons. The terminological variants were: a) Shiatsu: Shiatsu massage, Shiatsu technique, Shiatsu treatment; b) Reiki: Reiki treatment; c) Zen Therapy: no terminological variants; d) Amatsu: Amatsu practitioner, Amatsu therapists; e) Aikido: no terminological variants; f) Seitai: no terminological variants; g) Jin Shin Jyutsu: no terminological variants; h) Judo: no terminological variants; i) Iaido: no terminological variants.
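Part of this variant search can be pre-screened automatically by counting the two-word sequences formed by a TU and its immediate neighbours, which is roughly what reading the 1L and 1R columns of sorted concordances reveals. The Python sketch below is a simplified illustration, not the procedure used in this study: the stop list is a tiny hypothetical one, tokens are assumed to be pre-split, and every candidate still has to be checked by a human before it is accepted as a terminological variant.

```python
from collections import Counter
from typing import List

# A tiny, hypothetical stop list; a real one would be larger and language-specific.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "our", "with", "for", "is"}


def is_content_word(word: str) -> bool:
    return word.isalpha() and word.lower() not in STOPWORDS


def candidate_variants(tokens: List[str], term: str) -> Counter:
    """Count two-word sequences formed by the term and an adjacent content word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.lower() != term.lower():
            continue
        if i > 0 and is_content_word(tokens[i - 1]):
            counts[f"{tokens[i - 1]} {tok}"] += 1
        if i + 1 < len(tokens) and is_content_word(tokens[i + 1]):
            counts[f"{tok} {tokens[i + 1]}"] += 1
    return counts


# Invented sample text, already split into tokens.
tokens = "Book a Shiatsu massage or our Shiatsu treatment . The Shiatsu massage lasts".split()
print(candidate_variants(tokens, "shiatsu").most_common())
# [('Shiatsu massage', 2), ('Shiatsu treatment', 1)]
```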

To illustrate the search for terminological variants of one of our examples in the English subcorpus, the following screen shot is shown:

Figure 3. Sample of search for terminological variants in the English subcorpus

3.1.3. Italian terms

The search term used to find TU marked by the Japanese culture in the Italian subcorpus is [giappon*], so that the Concordance tool returns all the related forms: giapponese (masculine and feminine singular), giapponesi (masculine and feminine plural), and the Italian name of the country, Giappone. The only two TU found in this subcorpus were Shiatsu and Reiki.

To find variants of these TU, we followed the same procedure as for the previous subcorpora: we typed each TU in the ‘Search Term’ box and re-sorted the concordance lines. The results were the following: a) Shiatsu: massaggio Shiatsu, trattamento Shiatsu; b) Reiki: massaggio Reiki, trattamento Reiki.

The following figure illustrates one of the examples for searching more terminological variants:

Figure 4. Sample of search for terminological variants in the Italian subcorpus

3.1.4. French terms

To identify the TU in the French subcorpus the search term used was [japon*], so that the Concordance tool would return concordances for the following words: japonais (masculine singular and plural), japonaise (feminine singular), japonaises (feminine plural), and the French name of the country, Japon. The TU found were Shiatsu, Reiki, Amma, and Kobido.

Once again we used these TU as search terms to look for further terminological variants and re-sorted all the concordance lines using the Kwic Sort options. The results for this language were the following: a) Shiatsu: massage Shiatsu, modelage Shiatsu; b) Reiki: no terminological variants; c) Amma: no terminological variants; d) Kobido: no terminological variants.

The following figure shows one of the procedures carried out to find the terminological variants for the TU found in this subcorpus:

Figure 5. Sample of search for terminological variants in the French subcorpus
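The four language-specific search terms used for the Japanese-marked TU can be run in one pass over all subcorpora. The Python sketch below rewrites [jap+n*], [japan*], [giappon*], and [japon*] by hand as regular expressions (assuming [*] means zero or more characters and [+] zero or one) and counts the matching word forms per language. The function name and the two invented token lists are illustrative only and stand in for the real subcorpora.

```python
import re
from collections import Counter
from typing import Dict, List

# The four search terms used above, hand-translated into regular expressions
# ('*' -> \w*, '+' -> \w?) and matched case-insensitively against whole tokens.
JAPANESE_SEARCH_TERMS = {
    "ES": re.compile(r"jap\w?n\w*", re.IGNORECASE),  # [jap+n*]
    "EN": re.compile(r"japan\w*", re.IGNORECASE),    # [japan*]
    "IT": re.compile(r"giappon\w*", re.IGNORECASE),  # [giappon*]
    "FR": re.compile(r"japon\w*", re.IGNORECASE),    # [japon*]
}


def forms_per_subcorpus(subcorpora: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """For each language, count how often each matching word form occurs."""
    result = {}
    for lang, tokens in subcorpora.items():
        pattern = JAPANESE_SEARCH_TERMS[lang]
        counts = Counter(tok.lower() for tok in tokens if pattern.fullmatch(tok))
        result[lang] = dict(counts)
    return result


# Invented token lists standing in for the real subcorpora.
sample = {
    "IT": "massaggio giapponese Shiatsu e tecniche giapponesi dal Giappone".split(),
    "FR": "massage japonais Amma et Kobido du Japon".split(),
}
print(forms_per_subcorpus(sample))
```

The matching forms only point to the relevant concordance lines; the TU themselves (Shiatsu, Amma, Kobido, and so on) are still identified by reading those lines in context.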

3.2. Identification of terms marked by the Indian culture

Since AntConc was already running for the analysis of the TU marked by the Japanese culture, there is no need to relaunch it. In this section we identify the TU marked by the Indian culture in the four subcorpora compiled for this study, as described in the following subsections devoted to the identification of TU and their terminological variants (if any) in each language.

3.2.1. Spanish terms

The first search term typed for the Spanish subcorpus is [indi+]. Here we add only the [+] wildcard so that the software returns concordances for indio (masculine singular), india (feminine singular), and India, the name of the country. We did not type [indi*] because the software would then return concordances for other words outside the scope of our study, such as indicaciones, individual, indicado, and indispensable. Nor did we type [indi++], for the same reason: the software would return results of no interest for our purposes, such as indica and índice. To retrieve the plural forms of these words, we had to type a second search term, [indi+s].
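The difference between [indi+] and [indi*] is essentially one of precision: the broader term retrieves many unrelated words. The Python sketch below illustrates this with hand-written regular expressions (assuming [+] means zero or one character and [*] zero or more) applied to an invented sample sentence; the `matches` helper is hypothetical.

```python
import re
from typing import List


def matches(regex: str, tokens: List[str]) -> List[str]:
    """Distinct tokens, in order of first appearance, that fully match `regex` (case-insensitive)."""
    pattern = re.compile(regex, re.IGNORECASE)
    found: List[str] = []
    for tok in tokens:
        if pattern.fullmatch(tok) and tok not in found:
            found.append(tok)
    return found


# Invented sample text, already split into tokens.
tokens = ("masaje de cabeza indio según las indicaciones , tratamiento "
          "individual indicado para la India y masajes indios").split()

narrow = matches(r"indi\w?", tokens)   # [indi+]: zero or one extra character
plural = matches(r"indi\w?s", tokens)  # [indi+s]: plural forms
broad = matches(r"indi\w*", tokens)    # [indi*]: zero or more extra characters

print(narrow)  # ['indio', 'India']
print(plural)  # ['indios']
print(broad)   # ['indio', 'indicaciones', 'individual', 'indicado', 'India', 'indios']
```

Two narrow searches together retrieve the relevant forms, whereas the single broad search adds noise (indicaciones, individual, indicado) that the user would then have to discard by hand.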

After analysing all the concordances returned by these two search terms, we found no TU actually marked by the Indian culture. The only TU found with them was masaje de cabeza indio, which gives no indication of the original name marked by the Indian culture.

Spanish, however, has another word that can be used to search for the kind of terms we are interested in: hindú, which the software retrieves with the search term [hind*]. The TU found with this search term are Abhyanga, Kerala, and Ayurveda. Other TU were also found, but they are not culturally marked and have therefore been translated into Spanish, as in the case of masaje de cabeza indio above. These include masaje cráneo‑facial hindú, masaje hindú de cabeza, masaje de cabeza hindú, and masaje del cuero cabelludo, all of which are terminological variants of the TU found before.

To find more terminological variants we searched for each of these TU (Abhyanga, Kerala, and Ayurveda), with the following results: a) Abhyanga: ayurvedico abhyanga, Ayurveda Abhyanga, masaje Abhyanga; b) Kerala: no terminological variants; c) Ayurveda⁷: masaje Ayurveda, masaje ayurvédico, Ayurvédico.

The following screen shot shows one of the procedures carried out to identify the terminological variants:

Figure 6. Sample of search for terminological variants in the Spanish subcorpus

3.2.2. English terms

For the English subcorpus the search term typed in the search box was [india+], which returned Indian and India. After analysing all the concordances, we found the following TU: Shirobhyanga, Shirodhara, and Champissage. Other TU whose names are not culturally marked were also found, such as Indian Head Massage and Indian Scalp Massage. Their contexts reveal that Indian Head Massage is a synonym, and therefore a terminological variant, of the TU Shirobhyanga, Champissage, and Shirodhara. A more thorough search of this subcorpus led us to conclude that Shirobhyanga is a head massage performed with essential or ayurvedic oils, whereas in Champissage the head massage technique is oil‑free. Shirodhara also uses oil, but it refers to a head massage in which a constant flow of warm oil runs onto the so-called ‘third eye’.

As with the previous subcorpus, for the English subcorpus we typed a second search term, [ayurved*], so that the software would return words such as Ayurveda and Ayurvedic: since these relate to ancient Indian techniques, they may ‘hide’ other TU of interest for our purposes. This search yielded the TU Kerala and Pindasweda, as well as some terminological variants: a) Ayurveda: Ayurvedic massage, Ayurvedic treatment; b) Kerala: no terminological variants; c) Pindasweda: Pinda sweda.

One of the procedures carried out to identify the variants and other TU is shown in the following figure:

Figure 7. Sample of search for terminological variants and other TU in the English subcorpus

3.2.3. Italian terms

The search term typed in the search box for the Italian subcorpus is [india*], so that the Concordance tool returns results for the following words: indiano (masculine singular), indiana (feminine singular), indiani (masculine plural), indiane (feminine plural), and the name of the country, India. After examining the context of each concordance, we found the following TU marked by the Indian culture: Shirodhara, Abhyanga, and Ayurveda. We then typed a second search term, [ayurved*], to find more TU marked by this culture, which yielded Kerala and Pindasweda.

The variants found in the Italian subcorpus for the TU registered before were: a) Shirodhara: no terminological variants; b) Abhyanga: no terminological variants; c) Ayurveda: massaggio Ayurveda, massaggio ayurvedico, trattamento ayurvedico; d) Kerala: massaggio Kerala; e) Pindasweda: pinda sweda.

To illustrate one of the procedures carried out to find more terminological variants and other TU the following screen shot is shown:

Figure 8. Sample of search for terminological variants and other TU in the Italian subcorpus


3.2.4. French terms

For our last subcorpus, the French one, the first search term typed in the search box was [indien*], so that the tool would show concordances for the following words: indien (masculine singular), indienne (feminine singular), indiens (masculine plural), and indiennes (feminine plural). Once again we had to type another search term, [inde], to see the concordances for the name of the country, Inde. We opted for these two separate searches because the broader search term [ind*] would have returned concordances for other words outside the scope of our study, such as indiqué, indiqués, individuel, and individuels, among others. After examining the contexts of all the concordances, we gathered the following TU for this subcorpus: Shirodhara, Abhyanga, and Ayurveda.

These TU were typed in the search box in order to find their terminological variants, and the results were the following: a) Shirodhara: no terminological variants; b) Abhyanga: massage Abhyanga, massage Ayurvédique Abhyanga, modelage Abhyanga, soin Abhyanga; c) Ayurveda: massage Ayurveda, massage Ayurvédique, modelage Ayurvédique.

The following figure shows one of the procedures carried out to identify the terminological variants of these TU in the French subcorpus:

Figure 9. Sample of search for terminological variants in the French subcorpus


4. Terminological unit storage

The third step of the methodology proposed in this paper is the storage of all the culturally marked TU found in the four subcorpora, together with their variants. For this task we used the terminology management system SDL TermBase⁸.

This system, apart from allowing users to create their own database fields, offers a simple and straightforward interface.

For our purposes, a database was created with the following fields for Spanish, English, Italian, and French: definition, gender, and number, as shown in the following figure⁹:

Figure 10. Sample of fields created for our multilingual Japanese and Indian TU and variants database


Once the template with the appropriate fields has been created, all the TU and their terminological variants in the four languages can be entered. Creating a database can also prompt the translator or terminologist to compile more texts in order to cover TU that were not found, and therefore not recorded, in some languages. The idea is that these professionals store all the terms found in their subcorpora; however, as the cases analysed above show, some terms appear in only one or two subcorpora.
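The entry structure and the “gap” check described above can be sketched as a small concept-oriented data model. The Python sketch below is a hypothetical structure for illustration, not SDL's data model or export format: each concept holds definitions and a list of terms per language, each term carries optional gender and number fields, and `missing_languages` lists the languages for which more texts might need to be compiled. The sample entries use TU reported in this study (Kobido was found only in the French subcorpus; Shiatsu in all four).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LANGUAGES = ("es", "en", "it", "fr")


@dataclass
class Term:
    text: str
    gender: Optional[str] = None  # e.g. "m" or "f"; None if not recorded
    number: Optional[str] = None  # e.g. "sg" or "pl"; None if not recorded


@dataclass
class Entry:
    """One concept with its definitions and terms (main term plus variants) per language."""
    concept: str
    definitions: Dict[str, str] = field(default_factory=dict)
    terms: Dict[str, List[Term]] = field(default_factory=dict)

    def add_term(self, language: str, term: Term) -> None:
        self.terms.setdefault(language, []).append(term)

    def missing_languages(self) -> List[str]:
        """Languages with no recorded term: candidates for compiling more texts."""
        return [lang for lang in LANGUAGES if not self.terms.get(lang)]


kobido = Entry("Kobido")
kobido.add_term("fr", Term("Kobido"))
print(kobido.missing_languages())  # ['es', 'en', 'it']

shiatsu = Entry("Shiatsu")
shiatsu.add_term("es", Term("masaje Shiatsu", gender="m", number="sg"))
shiatsu.add_term("en", Term("Shiatsu massage", number="sg"))
shiatsu.add_term("it", Term("massaggio Shiatsu", gender="m", number="sg"))
shiatsu.add_term("fr", Term("massage Shiatsu", gender="m", number="sg"))
print(shiatsu.missing_languages())  # []
```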

5. Conclusions

Today it is very important to obtain comprehensive information about any knowledge domain in a short time. Moreover, translators and terminologists face constant change in specialised languages. Tourism entails a wide range of specialised languages, since the tourism market has undergone a process of segmentation according to the requirements of the new types of tourists that have emerged. One of these segments is known as wellness and beauty tourism.

This kind of tourism is characterized by the use of certain TU that can be found in other fields, but it also involves the birth of new TU and the use of old TU belonging to other cultures. The latter is the case of the TU marked by the Japanese and Indian cultures. Collecting all the TU of a given domain (in our case, the tourism sub‑domain called ‘wellness and beauty’) can be a very difficult task for translators and terminologists, and the difficulty increases when the TU must be collected in more than one language. To make this task easier, the compilation of a multilingual corpus and the subsequent management of its texts with corpus management software are considered essential for the work of translators and terminologists.

The purpose of this paper was to show students in a translation‑classroom environment how to use corpus management software and a terminology management system to compile and analyse a multilingual corpus in four languages (Spanish, English, Italian, and French), with the aim of identifying and storing the TU and terminological variants marked by the Japanese and Indian cultures. A simple three-step methodology was proposed: i) the compilation of a multilingual comparable corpus in the sub‑domain of wellness and beauty tourism; ii) the identification, using corpus management software, of the TU contained in the four subcorpora of texts originally written in each of the languages involved in this study; and iii) the storage of the TU and their variants in a database using a terminology management system.
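Step ii) can be illustrated with a bare-bones keyword-in-context (KWIC) concordancer. This is a sketch under stated assumptions, not the behaviour of the corpus management software used in the classroom: the functions `wildcard_to_regex` and `concordance` are hypothetical, `*` is assumed to match zero or more word characters (as in the “Ayurved*” search mentioned in footnote 7), matching is case-insensitive, and the two one-sentence “subcorpora” are invented examples.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a search string such as 'Ayurved*' into a whole-word regex.

    Assumption: '*' stands for zero or more word characters.
    """
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def concordance(text: str, pattern: str, width: int = 30) -> list[tuple[str, str, str]]:
    """Return KWIC lines as (left context, matched form, right context)."""
    regex = wildcard_to_regex(pattern)
    lines = []
    for m in regex.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left.rjust(width), m.group(0), right.ljust(width)))
    return lines


# Invented one-sentence "subcorpora", one per language; the study used four.
subcorpora = {
    "en": "Our spa offers Ayurveda treatments and an Ayurvedic massage.",
    "es": "El spa ofrece tratamientos de ayurveda.",
}

# Each subcorpus is searched separately, as with one software instance per subcorpus.
for lang, text in subcorpora.items():
    for left, keyword, right in concordance(text, "Ayurved*"):
        print(lang, left, f"[{keyword}]", right)
```

A real concordancer also sorts lines by context and offers frequency lists; note, too, that with this sketch “Ayurved*” would not match a form spelt with an accented “é”.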

The information extracted for each TU and its variants in the four languages could also lead translators and terminologists to draft their own definitions for each of the terms marked by these two cultures. This information, together with that gathered for the other fields created in the database, can be highly beneficial to translators and terminologists, since this proposal can meet their needs, especially if they are involved in major translation, documentation or terminology projects concerning this emerging sub‑domain of tourism called ‘wellness and beauty’.

6. References

Bowker, L. 2003. “Corpus-based applications for translator training: exploring the possibilities.” In S. Granger, J. Lerot & S. Petch‑Tyson (Eds.), Corpus‑based Approaches to Contrastive Linguistics and Translation Studies. Amsterdam & New York: Rodopi, pp. 169‑183.

Bowker, L. & J. Pearson. 2002. Working with Specialised Language: A Practical Guide to Using Corpora. London: Routledge.

Castillo Rodríguez, C. 2010. “El término spa.” PuntoyComa. Boletín de los traductores españoles de las instituciones de la Unión Europea, 120, pp. 8‑9. <http://ec.europa.eu/translation/bulletins/puntoycoma/120/pyc120.pdf>. [accessed: 09/08/2011].

Castillo Rodríguez, C. 2011. “La conceptualización de los segmentos turísticos en Andalucía: una breve aproximación.” Revista de investigación en turismo y desarrollo (TURyDES), 4 (10). <http://www.eumed.net/rev/turydes/10/ccr.pdf>. [accessed: 09/08/2011].

Corpas Pastor, G. 2001. “Compilación de un corpus ad hoc para la enseñanza de la traducción inversa especializada.” TRANS. Revista de Traductología, 5, pp. 155‑184.

Corpas Pastor, G. 2004. “Localización de recursos y compilación de corpus vía Internet: Aplicaciones para la didáctica de la traducción médica especializada.” In C. Gonzalo García & V. García Yebra (Eds.), Manual de documentación y terminología para la traducción especializada. Madrid: Arco/Libros, pp. 223‑257.

Corpas Pastor, G. & M. Seghiri Domínguez. 2006. El concepto de representatividad en lingüística de corpus: aproximaciones teóricas y consecuencias para la traducción. Technical report. Department of Translation and Interpreting. University of Málaga. [BFF2003‑04616 MCYT/TI‑DT‑2006‑1].

Corpas Pastor, G. & M. Seghiri Domínguez. 2007a. “Determinación del umbral de representatividad de un corpus mediante el algoritmo N‑Cor.” Procesamiento del lenguaje natural, 39, pp. 165‑172.

Corpas Pastor, G. & M. Seghiri Domínguez. 2007b. “Specialized Corpora for Translators: A Quantitative Method to Determine Representativeness.” Translation Journal, 11 (3) (July 2007), <http://www.translationjournal.net/journal/41corpus.htm>. [accessed: 09/08/2011].

Corpas Pastor, G. & M. Seghiri Domínguez. 2009. “Virtual Corpora as Documentation Resources: Translating Travel Insurance Documents (English‑Spanish).” In A. Beeby, P. Rodríguez Inés & P. Sánchez‑Gijón (Eds.), Corpus Use and Translating. Amsterdam & Philadelphia: John Benjamins Publishing Company, pp. 75‑107.

EAGLES. 1996. “Text Corpora Working Group Reading Guide.” EAGLES Document EAG-TCWG-FR-2. [Version report of May 1996], <http://www.ilc.cnr.it/EAGLES/corpintr/corpintr.html>. [accessed: 09/08/2011].

McEnery, A. M. & A. Wilson. 2004. Corpus Linguistics. Edinburgh: Edinburgh University Press.

Morrison, A. M. 1996. Hospitality and Travel Marketing. New York: Delmar Publishers.

Seghiri Domínguez, M. 2006. Compilación de un corpus trilingüe de seguros turísticos (español‑inglés‑italiano): aspectos de evaluación, catalogación, diseño y representatividad. Málaga: Servicio de publicaciones de la Universidad de Málaga.


1 The present work has been partially carried out within the framework of the Ecosistema project (reference number: FFI2008‑06080‑C03‑03/FILO).

2 Please see Castillo Rodríguez (2011) for an overview of the main tourist segments in Andalusia.

3 For further information about controversies around the term “spa,” please see Castillo Rodríguez (2010).

4 We consider HTML texts to be full texts: although each HTML page could be regarded as an extract, taken together, the pages downloaded by following certain hypertext-browsing guidelines make up the full texts.

5 This new version of the software, as well as the versions for Mac and Linux, is freeware and can be downloaded directly from: http://www.antlab.sci.waseda.ac.jp/software.html

6 The user can launch as many instances of the software as there are corpora to be managed and analysed. In our case, since we had four subcorpora to manage and analyse, we launched the software four times.

7 To identify further terminological variants, in this case we typed the search string “Ayurved*”, so that the Concordance tool would also retrieve adjectival forms in the concordance lines.

8 To install this terminology management system, it is necessary to install the full SDLX Lite package, although the terminology software can be run independently of the translation memory system. For further information about this software, and to download the trial version after completing a form, please visit the following URL: http://www.translationzone.com/en/resources/downloads/demodownloads/SDLX_trial.asp.

9 A dedicated field for terminological variants was created, since the fixed fields included in the software itself are ‘synonyms’ and ‘related items’.