
Free .NET PDF-to-text converter

  • 03-11-2007 6:24am
    #1
    Closed Accounts Posts: 1


    Hey all!

    Could anybody help me? I'm looking for a free .NET component that can extract text from a PDF and write it to a text file. Do you know of any?

    My goal is to build a small console utility that can convert many PDF files from the command line.


Comments

  • Closed Accounts Posts: 3 raydenvm


    Text Mining Tool can help you. I have been using it for the last two weeks - free, stable, good software.


  • Registered Users Posts: 1,393 ✭✭✭ Inspector Gadget


    I have to admit to being a tiny bit unsure about this, but as far as I remember text extraction from PDFs isn't foolproof; this is because text can be formed in a PDF in one of three ways:
    1. As a proper marked-up text block
    2. As a series of vector shapes without any indication that what's being drawn is actually letters
    3. As a bitmap (for example, a scanned document)
    Getting text from (1) is relatively simple, though there's often work involved in figuring out how all of the text blocks are positioned relative to each other (I'm sure that text mining tool thingy does that fine). Getting text from (2) or (3), though, essentially requires an OCR engine.

    Basically, you should check the type of documents you've got in Acrobat or similar; if the text can be selected within Acrobat, you're laughing; if not, you may have more work ahead of you.
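
    For a batch of files, the same check can be roughly automated: run the extractor and see whether anything meaningful comes back. The sketch below is only an illustration, written in Python rather than .NET to stay library-agnostic. `extract_text` is a hypothetical stand-in for whichever extraction component you end up using, and the `min_chars` threshold of 20 is an arbitrary assumption, not a tested value.

```python
from pathlib import Path
from typing import Callable, List


def looks_like_real_text(text: str, min_chars: int = 20) -> bool:
    """Heuristic: True if the extractor returned a meaningful amount of text.

    min_chars is an assumed threshold; tune it for your documents.
    """
    visible = [ch for ch in text if not ch.isspace()]
    return len(visible) >= min_chars


def convert_folder(folder: Path, extract_text: Callable[[Path], str]) -> List[Path]:
    """Write a .txt file next to each PDF that yields text.

    extract_text is a hypothetical callable wrapping the real PDF library.
    Returns the PDFs that produced little or no text, which probably hold
    vector shapes or scanned bitmaps and would need OCR instead.
    """
    needs_ocr: List[Path] = []
    for pdf in sorted(folder.glob("*.pdf")):
        text = extract_text(pdf)
        if looks_like_real_text(text):
            pdf.with_suffix(".txt").write_text(text, encoding="utf-8")
        else:
            needs_ocr.append(pdf)
    return needs_ocr
```

    An empty result doesn't prove a file is scanned, and some files mix text pages with image pages, so treat the returned list as "look at these by hand" rather than a verdict.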

    Gadget

