TCA2 – Parallel text processing at UiB.no

TCA2 (Translation Corpus Aligner) is a Java program developed at UiB for aligning a pair of parallel texts, that is, a text and its translation.

The system lets the user load a predefined list of “anchor words” for each language, or add other words based on the frequencies in their own files. The program supports “manual” alignment at the sentence level (à la bitext2tmx) as well as 1:1 alignment, in which source and target sentences are matched one to one. It also offers semi-automatic alignment, where the user enters a number N of sentences to be aligned after each click. A color scheme makes the alignment easier to follow and reduces eye strain. Once the alignment is finished, the program saves two TEI-compliant files that can be opened in ParaConc or fed directly into the project’s password-protected corpus query interface on the web. There, the user can enter queries for lexical units and find collocations or n-grams. The frequencies of the 5 tokens to the left and right of the search term are displayed as a table. The interface also generates graphs showing the absolute and relative frequency of each query or n-gram. Word counts can be computed over the whole corpus or over a single text, which is useful for different research aims.
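To give a rough idea of what such a frequency table involves, here is a minimal Python sketch of counting the tokens around a search term. It is not the code behind TCA2 or the query interface; the function names `window_frequencies` and `term_frequency`, the whitespace tokenization, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter


def window_frequencies(tokens, term, span=5):
    """Count the tokens found up to `span` positions left and right of `term`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        # Slice the context on each side, clipping at the start of the text.
        left.update(tokens[max(0, i - span):i])
        right.update(tokens[i + 1:i + 1 + span])
    return left, right


def term_frequency(tokens, term):
    """Return (absolute count, relative frequency) of `term` in `tokens`."""
    absolute = tokens.count(term)
    relative = absolute / len(tokens) if tokens else 0.0
    return absolute, relative


# Hypothetical sample text, tokenized naively on whitespace.
sample = "the cat sat on the mat and the dog sat on the rug".split()
left, right = window_frequencies(sample, "sat", span=2)
absolute, relative = term_frequency(sample, "sat")
```

Running the same two functions over one text or over the concatenated tokens of all texts would give the single-text and whole-corpus counts, respectively.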
