Suppose you have an aligned corpus in Excel (or any other delimited format) and you wish to reuse that content in your favorite CAT tool. It’s actually very easy to convert a bilingual text file to TMX with a handy, free, open-source tool called Oliphant.
- Ensure you have an aligned corpus in Excel, with the leftmost column containing the source text and the next column containing the target text. If your corpus is not perfectly aligned, you may want to check out my earlier post about a quick-and-dirty alignment tool.
- Copy the bilingual table and paste it into Notepad (the columns will be separated by tabs), then save the file as .txt, ensuring the encoding is set to UTF-8.
- Download, install and launch Oliphant.
- Press Ctrl+N to create a new TM and add your language code to the target field.
- Then go to File > Import and choose Tab-delimited files (.txt) from the dropdown menu. Locate the file you created in step 2 and click Open.
- In the Destination Field, set the Field Type of Column 1 to Text and its Language to EN-US (or whatever source language you’re working with). For Column 2, set the Field Type to Text again and enter your target language code in the Language field.
- Press OK and hit Save. Your bilingual corpus has been converted to a TMX file!
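If you would rather script the conversion than click through it, the same idea can be sketched in a few lines of Python. This is not what Oliphant does internally, just a minimal illustration using only the standard library. The function name `tsv_to_tmx`, the header values and the default language codes are my own assumptions, and the sketch assumes one segment pair per line with no tabs or line breaks inside a cell.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tsv_to_tmx(tsv_text, src_lang="en-US", tgt_lang="fr-FR"):
    """Build a TMX 1.4 document (as a string) from tab-delimited text.

    Column 1 is treated as the source, column 2 as the target.
    Lines with fewer than two non-empty columns are skipped.
    """
    tmx = ET.Element("tmx", {"version": "1.4"})
    # Header values below are placeholders chosen for this sketch.
    ET.SubElement(tmx, "header", {
        "creationtool": "tsv-to-tmx-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")

    for line in tsv_text.splitlines():
        cells = line.split("\t")
        if len(cells) < 2:
            continue  # blank line or missing target column
        source, target = cells[0].strip(), cells[1].strip()
        if not source or not target:
            continue  # skip half-empty pairs
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text

    return ET.tostring(tmx, encoding="unicode")
```

To use it on the file saved from Notepad, read the text with `open(path, encoding="utf-8-sig").read()` (the `utf-8-sig` codec drops the byte order mark Notepad may add), pass it to `tsv_to_tmx` with your own language codes, and write the result to a `.tmx` file encoded as UTF-8. Special characters such as `&` are escaped by ElementTree, so the output stays well-formed XML.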
Oliphant comes with powerful editing tools, including advanced Find/Replace, and lets you delete, add, merge and edit segments on the spot.