Linux Setup Notes

Created Jun 27, 2008

Converting LaTeX documents to MS Word

Although LaTeX produces output that is far superior to MS Word, it suffers from a fatal flaw: LaTeX document files are non-portable. Thus, it's often necessary to convert LaTeX files to MS Word. I tested a number of ways of doing this, using three kinds of content that are hard to convert: a table, a framebox, and an equation. I also tested each converter's ability to handle Greek letters.

To test conversion of Greek letters in text mode, I put the following line in the preamble of a LaTeX file:
\font\gr = rgrrg10 scaled \magstep 1
Using this notation, the following LaTeX command produced a Greek alpha (α):
\gr a \rm.
In math mode, I used this instead: $\alpha$.
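For reference, the two approaches can be combined into one small test file. The sketch below writes such a file from the shell. The function name write_greek_test and the default file name greektest.tex are made up for illustration, and the rgrrg10 font must be installed for the text-mode line to compile.

```sh
# Write a minimal LaTeX test file that produces alpha both ways.
# The default file name greektest.tex is a placeholder, not from the notes.
write_greek_test() {
    cat > "${1:-greektest.tex}" <<'EOF'
\documentclass{article}
\font\gr = rgrrg10 scaled \magstep 1
\begin{document}
Text mode: \gr a \rm.

Math mode: $\alpha$.
\end{document}
EOF
}
```

Calling write_greek_test with no argument creates greektest.tex in the current directory, ready to be fed to each converter.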

Examples of the Word output are shown below. This list makes no attempt to be complete. If you know of a better way to convert LaTeX to Word, feel free to let me know.

Latex2rtf

latex2rtf 1.9.16a (Wed Sep 21 18:09:29 2005)

Latex2rtf is a free, open-source program that does an outstanding job on figures, thanks to its integration with Ghostscript in Linux. It produces a usable RTF (Rich Text Format) file that needs considerable editing before it looks like the original, but there were no nasty surprises. Tables were usable, and equations were rendered as text, not converted to images. Font sizes were a little off, and Greek characters converted properly only when they were created in LaTeX's math mode (see screen images below).

The biggest problem with latex2rtf is its incompatibility with the xspace package. Xspace should not be used if you plan to convert the document to Word, because it will cause latex2rtf to truncate large sections of your text.
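If a document already uses xspace, one possible workaround is to run latex2rtf on a copy of the source with xspace stripped out. The following is only a sketch: strip_xspace is a hypothetical helper name, it assumes the package is loaded with a plain \usepackage{xspace} (no options), and I have not verified that this alone cures the truncation.

```sh
# Remove the xspace package line and every \xspace call from a LaTeX
# source read on stdin. Assumes no other command name begins with \xspace.
strip_xspace() {
    sed -e 's/\\usepackage{xspace}//' \
        -e 's/\\xspace//g'
}
```

For example, strip_xspace < myfile.tex > myfile-noxspace.tex, then convert the copy. Macros that relied on \xspace lose their automatic trailing space, so calls such as \foo may need to become \foo{} in running text.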

Nuance PDF Converter

Nuance is a commercial Windows program that can convert any PDF file into MS Word format. It can even convert PDFs that consist entirely of images. To convert a LaTeX file, you first convert it to PDF:
latex myfile
dvipdf myfile
Dvipdf is a script that calls dvips, like so:
exec dvips -q -f "$infile" | gs $OPTIONS -q -dNOPAUSE \
    -dBATCH -sDEVICE=pdfwrite -sOutputFile="$outfile" $OPTIONS \
    -c .setpdfwrite -

This produces a high-quality PDF file from your LaTeX document.
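The two steps can be wrapped in a small shell function. This is a sketch rather than a tested recipe: tex2pdf is a hypothetical name, it assumes you run it from the directory that holds the source file, and it runs latex twice on the assumption that the document has cross-references to resolve. BibTeX citations would also need a bibtex run in between, which is left out here.

```sh
# Convert a LaTeX source file to PDF via DVI.
# tex2pdf is a hypothetical helper; usage: tex2pdf myfile   (or myfile.tex)
# Assumes it is run from the directory containing the source file.
tex2pdf() {
    base=${1%.tex}
    if [ ! -f "$base.tex" ]; then
        echo "tex2pdf: $base.tex not found" >&2
        return 1
    fi
    # Two passes so that cross-references are resolved.
    latex "$base.tex" && latex "$base.tex" || return 1
    # dvipdf takes the DVI file and, optionally, the output PDF name.
    dvipdf "$base.dvi" "$base.pdf"
}
```

The function stops with a non-zero status if the source file is missing or latex fails, so a stale DVI file is not silently converted.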

Nuance adds an extra menu item called "Open PDF File" to the File menu in Word. It "converts" the file using virtual optical character recognition. That means you can convert anything, but you also have to check the output for errors. Nuance sometimes gets the "nuances" wrong. For example, it often converted "USA" to "US A". But otherwise, it does a fantastic job. The margins and fonts are precisely the same as in LaTeX. Converting BibTeX references is easy, because to Nuance, they're just more pixels. Even frameboxes are converted perfectly.

The main problem comes with equations and Greek letters. Nuance hopelessly mangles equations. Greek letters are sometimes dropped, sometimes converted to images that are fixed in place at incorrect locations in your Word document (these have to be laboriously removed and replaced with symbols), and sometimes converted to gibberish ("δ" is changed to "8", for instance).

Another problem is that Nuance tries to do too much. If your LaTeX file has images, Nuance will convert them to text as well. Although the result looks very much like the original, it is actually a rat's nest of text boxes. Yes, you can edit them. But they're almost impossible to move or delete. (See the images below for examples.)

tth and latex2html

tth 3.81
latex2html Version 2002-2-1 (1.70)

Tth is a rapid LaTeX-to-HTML converter that runs in Linux. It compiles fast and runs fast. Unlike latex2html, tth puts all its results in a single file. It correctly inserted the BibTeX references and substituted the correct Greek letters using HTML 4.0 syntax like the following:
<font face="symbol">a</font>

Tth produces pretty good results within the limitations of HTML. However, when Greek letters were produced using text mode, tth incorrectly used English letters instead. Latex2html, on the other hand, converted all the Greek letters to images. Latex2html also had more difficulty with tables. It converted equations into images, but the images were correct. Tth converted the equations to HTML characters; Windows Firefox and IE rendered these correctly, but Linux Firefox and Opera did not.
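If the rendering failures come from browsers that lack a Symbol font (an assumption; I did not confirm the cause), one possible fix is to rewrite tth's Symbol-font markup as named HTML entities. The sketch below handles only a few letters and only single-letter spans written exactly like the example above; symbol_to_entities is a hypothetical name.

```sh
# Replace tth's Symbol-font markup for a few Greek letters with named
# HTML entities. The letter list is deliberately partial, and only
# single-letter <font face="symbol"> spans are handled.
symbol_to_entities() {
    sed -e 's|<font face="symbol">a</font>|\&alpha;|g' \
        -e 's|<font face="symbol">b</font>|\&beta;|g' \
        -e 's|<font face="symbol">g</font>|\&gamma;|g' \
        -e 's|<font face="symbol">d</font>|\&delta;|g' \
        -e 's|<font face="symbol">l</font>|\&lambda;|g' \
        -e 's|<font face="symbol">m</font>|\&mu;|g' \
        -e 's|<font face="symbol">p</font>|\&pi;|g' \
        -e 's|<font face="symbol">w</font>|\&omega;|g'
}
```

Usage would be along the lines of symbol_to_entities < myfile.html > myfile-entities.html. Multi-character spans and operators set in the Symbol font are left untouched and would still need attention.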

Chikrii Softlab TeX2Word 2.5

Chikrii TeX2Word is a Windows program that integrates with Word and converts the LaTeX file to Word as it's being read. It's a shareware program that is heavily copy-protected. On one computer, I had trouble getting Chikrii to start up. Apparently, I had tried an earlier version two years ago and deleted it for some reason. The new version insisted that my 30-day trial period had expired, and refused to run. To give Chikrii TeX2Word a fair test, I had to find another Windows computer and install it.

Chikrii TeX2Word was not as effective as some of the other programs: it failed to render an equation, find and convert BibTeX entries, or create a table properly. This is consistent with the truism that the more time programmers spend on copy-protection, the less time they have to make a product that is worth protecting.

GrindEQ LaTeX to Word 2007

As with Chikrii, GrindEQ works by adding a menu item to MS Word's File menu. The installation and interfacing with Word were flawless. However, if the MS Equation Editor isn't found, and you don't put your Office CD in, you get the message:
Error 1706 No valid source could be found for product Microsoft Office 2000 SR-1 Professional. The Windows installer cannot continue.
However, the conversion results were much the same whether the Equation Editor was present or not.

GrindEQ adds an extra toolbar in Word with GrindEQ options. When I converted a document, GrindEQ popped up a message saying:
Update cross-references ('Update GrindEQ labels and references' button) and save the converted document in doc, docx, or rtf format.
Clicking on the Update button produced another message:
One or more fields in the selection could not be updated.
Afterward, instead of being replaced by error messages, the BibTeX references were replaced by question marks.

GrindEQ produced clean output, especially for the tables, although it seems to have its own preferences for fonts and margins. GrindEQ gave a better result when Greek letters were produced using text mode instead of math mode. In text mode, it substituted the corresponding English letters, while in math mode, the Greek letters were omitted altogether. I was unable to convert a math formula using GrindEQ; the result was an empty cursor (see image below).

Output Results

[Image: Equations]

[Image: Tables, Greek letters, and BibTeX references]

[Image: Frame boxes]

Update (Sept. 24, 2008)

Another commercial program is AbleToExtract. This Windows program is similar to Nuance, but it handled Greek letters almost perfectly. Like Nuance, it mangles any images in the file by converting them to text boxes, which makes them impossible to move. The document is also full of hard returns, which makes editing tedious. But otherwise it does a fairly good job.

Conclusion

There was no correlation between quality and price. Nuance PDF Converter produced the best result. None of the programs handled text-mode Greek letters correctly. If all you need to do is send a Doc file to somebody else, the commercial converters work best, provided you remove the figures first. However, if you need to actually edit the file afterward, the best option is to use detex and convert the file manually to HTML. Caution is needed because detex frequently drops words from your document.
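One way to spot words that detex has dropped is to compare the word list of the LaTeX source with that of the detex output. The sketch below does this with standard tools; the function names words and missing_words are made up for illustration. LaTeX command names (section, begin, and so on) will also appear in the list and can be ignored.

```sh
# List the distinct lowercase words in a file, one per line, sorted.
words() {
    tr -cs 'A-Za-z' '\n' < "$1" | tr 'A-Z' 'a-z' | sed '/^$/d' | LC_ALL=C sort -u
}

# Print words found in the LaTeX source ($1) but absent from the
# plain-text output ($2), e.g. a file produced by detex.
missing_words() {
    src_words=$(mktemp) || return 1
    txt_words=$(mktemp) || return 1
    words "$1" > "$src_words"
    words "$2" > "$txt_words"
    # comm -23 keeps lines that appear only in the first sorted file.
    LC_ALL=C comm -23 "$src_words" "$txt_words"
    rm -f "$src_words" "$txt_words"
}
```

A typical run would be detex myfile.tex > myfile.txt followed by missing_words myfile.tex myfile.txt. The check only catches words that vanish entirely; a word dropped in one place but kept elsewhere in the document goes unnoticed.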
