Extract Text
Introduction
This program extracts the text content of many different types of documents.
It is based on the technology in Microsoft Index Server, which uses components
called IFilters to read the text in files so that it can be indexed.
Using The Program
The program is a command line utility that takes exactly two parameters:
the file name of the document that you want to extract text from, and the
file name of the new file that should hold the extracted text.
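
As a rough illustration of the two parameters described above, the following Python sketch builds the command line and runs the utility once for every PDF file in a folder. The executable name ExtractText.exe and the argument order (document first, output file second) are assumptions and are not confirmed by this page; adjust both to match the file you downloaded.

```python
import subprocess
from pathlib import Path

# Hypothetical executable name: replace with the file you actually unzipped.
EXTRACTOR = "ExtractText.exe"


def build_command(source, destination, exe=EXTRACTOR):
    """Return the argument list: executable, input document, output text file.

    The order (input first, output second) is an assumption.
    """
    return [exe, str(source), str(destination)]


def extract_folder(folder, exe=EXTRACTOR, pattern="*.pdf"):
    """Run the extractor once per matching file, writing <name>.txt beside it."""
    outputs = []
    for source in sorted(Path(folder).glob(pattern)):
        destination = source.with_suffix(".txt")
        # check=True raises CalledProcessError if the utility reports failure.
        subprocess.run(build_command(source, destination, exe), check=True)
        outputs.append(destination)
    return outputs
```

Calling extract_folder on a folder of PDF files would leave one .txt file next to each document, provided that a PDF filter is installed on the machine.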

Before you can run the program, the following must be
installed on your system:
- Microsoft .NET Framework 2.0
Installation
This program is just an executable file. It doesn't require any installation.
You simply unzip the downloaded file and copy the executable file to
the folder of your choice.
Extract Text From PDF Documents
The PDF filter DLL needed to extract text from PDF files was included
with Adobe Reader 7.0.5 to 9.x.
Starting with Adobe Reader 10, also known as Adobe Reader X,
this DLL is no longer part of the Adobe Reader installation.
You can still extract text from PDF files if you run Adobe Reader X or
another brand of PDF reader, because Adobe offers a separate download
that installs the filter you need.
Please follow the link below to get the IFilter from Adobe.
Download Adobe PDF IFilter v6.0
Extract Text From Office 2007 Documents
Microsoft offers a filter pack that enables you to extract text from the following file formats:
.docx, .docm, .pptx, .pptm, .xlsx, .xlsm, .xlsb, .zip, .one, .vdx, .vsd, .vss, .vst, .vsx, and .vtx.
Download the filter pack
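
If you process many files, it can help to check up front whether a file type is on the filter pack's list. The Python sketch below does only that; the function name is made up for illustration, and the set simply repeats the extensions listed above. It does not detect whether the filter pack is actually installed.

```python
from pathlib import Path

# Extensions listed above for the Microsoft filter pack.
FILTER_PACK_EXTENSIONS = frozenset({
    ".docx", ".docm", ".pptx", ".pptm", ".xlsx", ".xlsm", ".xlsb",
    ".zip", ".one", ".vdx", ".vsd", ".vss", ".vst", ".vsx", ".vtx",
})


def covered_by_filter_pack(file_name):
    """Return True if the file's extension is on the filter pack's list.

    The comparison ignores case, so "Budget.XLSX" counts as ".xlsx".
    """
    return Path(file_name).suffix.lower() in FILTER_PACK_EXTENSIONS
```

A file such as notes.pdf is not covered here, because PDF files need the separate Adobe filter.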
Extract Text From PDF Documents
If you have Adobe Reader 7.0.5 to 9.x installed, the filter you need to extract text from PDF documents is already on your computer.
Version History
2011-02-15 (2.0.0.0)
- Always runs in x86 (32-bit) mode to support more filters on 64-bit machines.
2010-02-12 (1.0.0.0)