Extracting PDF and Microsoft Word text using C#

Tags:
C#
Microsoft Word
PDF
Does anyone know how to extract specific text from PDF and Microsoft Word files in C#? Specifically, I want to strip out images and other rich-text formatting and keep only the plain text. Thank you.

We have used Bytescout's PDF Extractor SDK with good success:
https://bytescout.com/

using System;
using Bytescout.PDFExtractor;
 
namespace ExtractAllText
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Bytescout.PDFExtractor.TextExtractor instance
            TextExtractor extractor = new TextExtractor();
            extractor.RegistrationName = "demo";
            extractor.RegistrationKey = "demo";
 
            // Load sample PDF document
            extractor.LoadDocumentFromFile("sample2.pdf");
 
            // Save extracted text to file
            extractor.SaveTextToFile("output.txt");
 
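            // Open output file in default associated application.
            // UseShellExecute = true is required on .NET Core / .NET 5+, where it defaults to false.
            System.Diagnostics.Process.Start(
                new System.Diagnostics.ProcessStartInfo("output.txt") { UseShellExecute = true });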
        }
    }
}
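The code above handles PDFs only. For Word documents saved as .docx (Word 2007 and later), a third-party library is not strictly needed: a .docx file is a ZIP package, and the body text sits in w:t elements inside word/document.xml. The following is a minimal sketch using only the .NET base class library (System.IO.Compression and LINQ to XML). The class name DocxTextExtractor is hypothetical. The sketch assumes the conventional word/document.xml part name that Word writes (a stricter reader would resolve it through _rels/.rels), and it ignores headers, footers, footnotes, and comments, which are stored in separate parts. Images and formatting are dropped simply because only text runs, tabs, and line breaks are read.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper (not part of any library): pulls plain text out of a .docx package.
public static class DocxTextExtractor
{
    // WordprocessingML main namespace used by w:p, w:r, w:t, w:tab, w:br.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ExtractText(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return ExtractText(file);
        }
    }

    public static string ExtractText(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main document part uses Word's usual name.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found; not a .docx?");

            XDocument doc;
            using (Stream xml = entry.Open())
            {
                // Keep spaces inside w:t xml:space="preserve" runs intact.
                doc = XDocument.Load(xml, LoadOptions.PreserveWhitespace);
            }

            var text = new StringBuilder();
            int paragraphs = 0;

            // Walk every element in document order.
            foreach (XElement el in doc.Descendants())
            {
                if (el.Name == W + "p")
                {
                    // New paragraph: separate it from the previous one with a newline.
                    if (paragraphs++ > 0) text.Append('\n');
                }
                else if (el.Parent?.Name == W + "r")
                {
                    // Only direct children of a run count as content; this skips
                    // tab-stop definitions (w:pPr/w:tabs/w:tab) and images (w:drawing).
                    if (el.Name == W + "t") text.Append(el.Value);
                    else if (el.Name == W + "tab") text.Append('\t');
                    else if (el.Name == W + "br" || el.Name == W + "cr") text.Append('\n');
                }
            }

            return text.ToString();
        }
    }
}
```

Usage: Console.WriteLine(DocxTextExtractor.ExtractText("sample.docx")); Legacy binary .doc files are not ZIP packages, so this approach does not apply to them; they need Word automation or a library that understands the old format.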
