
Library To Convert Word Document Text To Html

Is there a .NET open source library to convert a Word document to HTML for display inside a web page? I know several tools to convert Word docs to HTML files, but my requireme

Solution 1:

You just want to convert a *.doc file to HTML? Is saving it as an HTML file an option?

There is the standard SaveAs method, which has an option to save as HTML:

wdFormatHTML Saves all text and formatting with HTML tags so that the resulting document can be viewed in a Web browser.

from: MSDN SaveAs Method

An example tutorial on how to use the method to convert .doc to a different format can be found here: How to convert DOC into other formats using C#.
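As a rough illustration of the SaveAs approach, here is a minimal sketch using the Word interop assembly (Microsoft.Office.Interop.Word). It assumes Word is installed on the machine running the code and that the project targets C# 4 or later, so the optional COM parameters can be omitted. The class and method names (DocToHtmlSketch, Convert) are made up for this example.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper: the names here are illustrative, not from any library.
public static class DocToHtmlSketch
{
    public static void Convert(string docPath, string htmlPath)
    {
        // Starts a Word instance in the background; requires Word to be installed.
        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Open the source document read-only so it is never modified.
            doc = app.Documents.Open(docPath, ReadOnly: true);

            // wdFormatHTML keeps text and formatting as HTML tags.
            doc.SaveAs(htmlPath, Word.WdSaveFormat.wdFormatHTML);
        }
        finally
        {
            // The casts avoid the method/event name ambiguity on Close and Quit.
            if (doc != null)
            {
                ((Word._Document)doc).Close(SaveChanges: false);
            }
            ((Word._Application)app).Quit();
        }
    }
}
```

Note that this automates the Word application itself rather than using an open source library, and Microsoft does not recommend automating Office from unattended server-side code, so it may suit a one-off batch conversion better than a per-request conversion inside a web application.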

If you have *.docx files instead of *.doc files, it is even easier because you can use the Open XML API, as explained on MSDN here: Manipulating Word 2007 Files with the Open XML Format API (Part 1 of 3). Once you have the XML of the Word file, you can of course transform it into any format you want, including HTML.
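To make the "get the XML and output it as HTML" idea concrete, here is a minimal sketch that does not use the Open XML API from the article at all. It relies only on the fact that a .docx file is a ZIP package, and it assumes the main document part lives at word/document.xml, which is the usual location but is strictly determined by the package relationships. It emits one <p> per Word paragraph and ignores all formatting, tables, and images. The names DocxToHtmlSketch, BodyXmlToHtml, and ConvertFile are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper: plain-text paragraphs only, no styles or images.
public static class DocxToHtmlSketch
{
    // Main WordprocessingML namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Converts the XML of the main document part into a run of <p> elements.
    public static string BodyXmlToHtml(XDocument documentXml)
    {
        var html = new StringBuilder();
        foreach (XElement paragraph in documentXml.Descendants(W + "p"))
        {
            // Join every text run (w:t) inside this paragraph.
            string text = string.Concat(
                paragraph.Descendants(W + "t").Select(t => t.Value));

            // Encode the text so characters such as < and & stay literal.
            html.Append("<p>").Append(WebUtility.HtmlEncode(text)).Append("</p>\n");
        }
        return html.ToString();
    }

    // Opens the .docx package and converts its main document part.
    public static string ConvertFile(string docxPath)
    {
        using (ZipArchive package = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main part is at the conventional path.
            ZipArchiveEntry entry = package.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found");
            }
            using (Stream stream = entry.Open())
            {
                return BodyXmlToHtml(XDocument.Load(stream));
            }
        }
    }
}
```

On .NET Framework, ZipFile needs a reference to the System.IO.Compression.FileSystem assembly. A real converter would also have to map runs, styles, lists, and tables to HTML, which is where the Open XML API described in the article becomes useful.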

Solution 2:

Convert your .doc files to PDF with the help of JODConverter and OpenOffice (see How to convert ppt to images in Ruby? for reference), and then use pdftohtml (http://pdftohtml.sourceforge.net), a utility that converts PDF files into HTML.

You will get amazing results.
