As linguists, we spend a considerable share of our time doing research on the web. Ads and pop-ups can be extremely distracting, but there are other, more serious threats to your machine that normally stay hidden. Ransomware and scamware, for example, take over your computer and threaten to delete your files unless you pay up, and some strains actually start erasing them until you transfer a hefty ransom to an account the attackers specify.
It was only very recently that I became aware of Adblock Plus, a free extension for Firefox and Chrome that lets you block internet ads, pop-ups, unwanted images and, generally speaking, a good deal of malware. The extension is supported by over 40 filter subscriptions in dozens of languages, which automatically configure it for purposes ranging from removing online advertising to blocking all known malware domains. Adblock Plus also lets you customize your filters with a variety of useful features, including a context-menu option for images, a block tab for Flash and Java objects, and a list of blockable items for removing scripts and stylesheets.
I confess that at the beginning it felt rather strange to browse the web without ads, but I’m definitely loving it now.
Sometimes it may be useful to repurpose the contents of a TMX file in MS Excel, Word or other applications (say you want to perform a thorough spellcheck or combine the content of the TMX with other bilingual data you may have).
To do so, proceed as follows:
1. Download, install and launch Oliphant
2. In Oliphant, press Ctrl+O and locate your TMX file
3. Go to File > Export and select Wordfast file (.txt) from the Save as type options
4. Set the export parameters as needed
5. Copy the contents of the .txt file into Excel.
If you then need to recreate a TMX file from your export, check out my earlier post.
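If you would rather script the TMX-to-Excel step, a few lines of Python can do roughly what the export above does. This is a minimal sketch, not Oliphant's method: it assumes a TMX file without an XML namespace (the usual case for TMX 1.4) whose units carry the requested languages in the `xml:lang` (or older `lang`) attribute, and the function names `rows_from_tmx` and `tmx_to_tsv` are hypothetical. It writes a tab-delimited .txt file that Excel can open directly.

```python
import csv
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under this expanded XML-namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rows_from_tmx(root, src_lang, tgt_lang):
    """Collect (source, target) text pairs from a parsed TMX root element."""
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    rows = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions used a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() joins all text inside <seg>, including any native
                # markup stored in inline elements such as <bpt> or <ph>.
                segs[lang] = "".join(seg.itertext())
        # Keep only units that contain both requested languages.
        if src_lang in segs and tgt_lang in segs:
            rows.append((segs[src_lang], segs[tgt_lang]))
    return rows


def tmx_to_tsv(tmx_path, tsv_path, src_lang, tgt_lang):
    """Write a tab-delimited file with a header row and one unit per row."""
    rows = rows_from_tmx(ET.parse(tmx_path).getroot(), src_lang, tgt_lang)
    # The UTF-8 byte-order mark ("utf-8-sig") usually helps Excel detect the encoding.
    with open(tsv_path, "w", encoding="utf-8-sig", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow([src_lang, tgt_lang])
        writer.writerows(rows)
    return len(rows)
```

For example, `tmx_to_tsv("memory.tmx", "memory.txt", "en-US", "es-ES")` (hypothetical file names and language codes) would write one row per translation unit. Language codes are matched exactly, ignoring case, so "en" will not match "en-US".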
ApSIC Xbench is a free QA tool that allows localization professionals to perform many checks typically available in full-fledged CAT/TEnT environments. One of its lesser-known features, however, is the ability to convert the main proprietary and open-source translation file formats to TMX or plain text for further processing.
Here’s how you do it:
1. Download ApSIC Xbench
2. Select Project > New
3. Select a file type from the list and click on Add
4. Locate the file on your machine and hit Next
5. Then go to the Tools menu and select Export Items…
6. Tick the relevant boxes in the Filtering section. Under Output, choose TMX or TXT as the export file type and select the location of the export file. If you are exporting to TMX, you will also need to specify the source and target languages.
By the way, the tool also accepts SDLXLIFF files, so you could, for example, grab a bunch of these files, convert them to TMX and import them back into your main TM. To do so, in Step 3 above select XLIFF file, hit Next and then choose All files (*.*) from the Files of type dropdown menu.
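For the curious, the plain-text route can be approximated in Python for standard XLIFF 1.2 files. This is a minimal sketch, not how Xbench works internally: `xliff_pairs` and `xliff_to_txt` are hypothetical names, untranslated units are skipped, and inline tags are flattened to plain text. SDLXLIFF files use the same XLIFF 1.2 namespace, but they typically keep their segmentation in `<mrk mtype="seg">` elements, so this sketch yields one row per trans-unit (often a whole paragraph) rather than one per segment.

```python
import csv
import xml.etree.ElementTree as ET

# Namespace declared by XLIFF 1.2 files (SDLXLIFF files declare it too).
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def xliff_pairs(root):
    """Return (source, target) text pairs for every translated trans-unit."""
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if source is None or target is None:
            continue  # skip units that have no translation yet
        # itertext() flattens inline markup such as <g> or <mrk> into plain text.
        pairs.append(("".join(source.itertext()), "".join(target.itertext())))
    return pairs


def xliff_to_txt(xliff_path, txt_path):
    """Write the pairs as tab-delimited text, one trans-unit per line."""
    pairs = xliff_pairs(ET.parse(xliff_path).getroot())
    with open(txt_path, "w", encoding="utf-8", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerows(pairs)
    return len(pairs)
```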
Even though Excel boasts a decent track changes feature (in Excel 2007-2010, it sits under the Review tab, under Track Changes > Highlight Changes), it still lacks a Compare Documents tool like the one in Word.
Spreadsheet Compare is a free Excel add-in that allows you to perform cell-based comparisons between two workbooks. The differences are highlighted in yellow.
The tool can also automatically generate a very handy track-changes report.
After installing the plug-in from the link above, proceed as follows:
- Open the workbook(s) to be compared.
- Start Spreadsheet Compare.
- On the ‘Select two spreadsheets to be compared’ form, select the workbook(s) to be compared. To compare two worksheets within the same workbook, select that workbook in both drop-down lists. Click Next.
- On the ‘Processing Options’ form, select the processing options that you want:
  - Start Row – Starts the comparison from a particular row (useful when running long comparisons that have previously failed because of a mismatch).
  - Delete Change Column – Self-explanatory.
  - Clear Existing Sheet Colours – Removes any cell colouring from the worksheet.
  - Case Sensitive Comparison – Self-explanatory. If this is unchecked, the cell values are converted to strings, changed to upper case and then compared. It should be checked by default.
  - Add Change Column to show – Adds a column to the spreadsheet to indicate changes.
  - Mismatched Column Name – Useful if the first row contains column names (for DB comparisons).
  - Count of Mismatched Cells – Self-explanatory.
  - Eye-catchers – Self-explanatory.
  - Highlight changes with: – Sets a colour to highlight the changes.
- On the ‘Select Worksheets from First Workbook’ form, if the workbook contains worksheets that you do not want to compare, select them and click the remove button. If you are comparing worksheets within the same workbook, remove one of the two worksheets that you want to compare. Click Next.
- On the ‘Select Worksheets from Second Workbook’ form, if the workbook contains worksheets that you do not want to compare, select them and click the remove button. If you are comparing worksheets within the same workbook, remove the worksheet you kept in the previous form. Click Next.
- On the ‘Final Options’ form, check any final options:
  - Stop on miscompare – Stops on any miscompare.
  - Stop on miscompare in the first column only – Self-explanatory.
  - Generate Report – Self-explanatory.
- Click ‘Start Compare’
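To make "cell-based comparison" concrete, here is a rough Python sketch of the idea; it is not the add-in's actual code. It assumes both worksheets have already been read into lists of rows (for example with a spreadsheet library, not shown here). `compare_sheets` is a hypothetical name, and its `case_sensitive` flag imitates the Case Sensitive Comparison option as described above.

```python
from itertools import zip_longest


def compare_sheets(first, second, case_sensitive=True):
    """Compare two sheets (lists of rows) cell by cell.

    Returns (row, column, old, new) tuples with 1-based indices, as in Excel.
    Rows or cells missing on one side are treated as empty.
    """
    changes = []
    for r, (row_a, row_b) in enumerate(zip_longest(first, second, fillvalue=[]), start=1):
        for c, (a, b) in enumerate(zip_longest(row_a, row_b, fillvalue=""), start=1):
            left, right = a, b
            if not case_sensitive:
                # Convert both values to text and upper-case them before comparing.
                left, right = str(a).upper(), str(b).upper()
            if left != right:
                changes.append((r, c, a, b))
    return changes
```

Each tuple flags a whole cell, so a single edited word in a long cell shows up the same way as a completely rewritten one.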
The only downside of this tool is that the comparison is made at cell level rather than at word or character level, and, as linguists, we know all too well that Excel often doubles as a word processor these days.
I have tried a couple of other tools that claim to compare Excel files at a more granular level, namely Beyond Compare and Araxis Merge, but the results were far from satisfactory with my sample files (they were either inaccurate or complete gibberish, even with nearly identical files). Mind you, these tools have other very useful applications (e.g. comparing folder structures or tagged file formats), but I can’t recommend them for this purpose.
If a cell-based comparison won’t do, you can copy the contents of the two worksheets to Notepad, then to MS Word, save them as two separate files and run Review > Compare in Word.
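If you are comfortable with a little scripting, Python's standard `difflib` module offers another way to get word-level differences inside cells that have changed. This is only an illustration: `word_diff` is a hypothetical helper, words are split on whitespace, and the `[-deleted-]` / `{+inserted+}` markers are arbitrary choices.

```python
import difflib


def word_diff(old, new):
    """Mark word-level changes between two cell values.

    Deleted words are wrapped in [- -], inserted words in {+ +}.
    """
    a, b = old.split(), new.split()
    out = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words removed (or replaced) from the old value
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words added (or replacing) in the new value
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)
```

For instance, `word_diff("Click the Save button", "Click the Export button")` returns `"Click the [-Save-] {+Export+} button"`. Note that splitting on whitespace discards the original spacing, so the output is meant for review, not for pasting back.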
The WordPress.com stats helper monkeys prepared a 2012 annual report for this blog.
Here’s an excerpt:
The new Boeing 787 Dreamliner can carry about 250 passengers. This blog was viewed about 1,400 times in 2012. If it were a Dreamliner, it would take about 6 trips to carry that many people.
Click here to see the complete report.
Here’s another great source to enhance your Translation Memories or build a new TM from scratch: the European Parliament Proceedings Parallel Corpus 1996-2011. Last updated on 15th May 2012, it contains EU parliamentary proceedings in 21 languages: Romance (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish, Hungarian, Estonian), Baltic (Latvian, Lithuanian), and Greek.
After you download the corpus for your language combination(s), you will not get a TMX file but rather a .tar archive with the source and target languages saved as .txt files. To convert them to TMX, you will first need to align them, using YouAlign, for example.
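Once you have two .txt files that are aligned line by line (line n of one file translates line n of the other, which is worth spot-checking), the final conversion to TMX can be sketched in Python. This is not a replacement for an aligner such as YouAlign: `pair_lines` and `build_tmx` are hypothetical helpers, the files are assumed to be UTF-8, the sketch refuses to run when the line counts differ, and the header values are placeholders for the attributes TMX 1.4 requires.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pair_lines(src_path, tgt_path):
    """Pair two line-aligned UTF-8 text files, dropping pairs with an empty side."""
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        src = [line.rstrip("\n") for line in fs]
        tgt = [line.rstrip("\n") for line in ft]
    if len(src) != len(tgt):
        # Different line counts mean the files are not aligned yet.
        raise ValueError(f"line counts differ ({len(src)} vs {len(tgt)}); align the files first")
    return [(s.strip(), t.strip()) for s, t in zip(src, tgt) if s.strip() and t.strip()]


def build_tmx(pairs, src_lang, tgt_lang):
    """Build a minimal TMX 1.4 document from (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Required TMX 1.4 header attributes; the values here are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "plaintext", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src_text, tgt_text in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src_text), (tgt_lang, tgt_text)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)
```

Usage might look like `build_tmx(pair_lines("corpus.fr", "corpus.en"), "fr", "en").write("corpus-fr-en.tmx", encoding="utf-8", xml_declaration=True)`, with hypothetical file names.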
TypeIt is an online tool that allows you to type accent marks, diacritics and other characters in 21 languages plus the International Phonetic Alphabet and some special symbol sets. It is very handy if you need to quote text in a different language or alphabet, if you’re struggling to find a special character, or if you don’t have immediate access to a localized keyboard.
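If you happen to be working in Python already, the standard `unicodedata` module offers a scriptable alternative for the occasional character: you can look a character up by its official Unicode name, or attach combining diacritics to a base letter and normalize the result. The helper name `compose` is hypothetical; this is a small sketch, not a substitute for an on-screen keyboard.

```python
import unicodedata


def compose(base, *mark_names):
    """Attach combining marks (given by Unicode name) to a base letter, then
    apply NFC normalization so a precomposed character is used where one exists."""
    marks = "".join(unicodedata.lookup(name) for name in mark_names)
    return unicodedata.normalize("NFC", base + marks)


# Looking a character up directly by its official Unicode name (IPA schwa):
schwa = unicodedata.lookup("LATIN SMALL LETTER SCHWA")
```

For example, `compose("e", "COMBINING ACUTE ACCENT")` gives the single character é, while a combination with no precomposed form, such as q plus an acute accent, stays as a base letter followed by a combining mark.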