Scanning a book to PDF

  • 13-08-2012 12:41pm
    #1
    Registered Users, Registered Users 2 Posts: 147 ✭✭


    I was trying to help someone scan a booklet (he owns the copyright) to PDF. I used the scanner on my Lexmark X1250 printer/scanner and scanned into Adobe Acrobat.

    But the files were huge, even at 300 dpi. Also, the text was not very clear: it is light blue and came out blurry even when scanned in colour.

    I am reading a PDF that looks OK. It says it was created by Ghostscript. Is such a program, or an image manipulation app, really necessary, or should I be able to scan directly to PDF and keep the file small?


Comments

  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    If you scan, it's going to be images. They're bulky compared to text. Is the booklet mostly text? If it is, I would suggest you scan it and then run OCR on it to convert the text in the image into editable text.

    I wouldn't worry too much about the scanned pictures being large. I would guess that they're bitmaps (.bmp files)? Those will compress very well when they're put in the PDF.
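
    To put rough numbers on "bulky", here is a minimal sketch of the raw, uncompressed size of one scanned page. The A4 page size and the three colour depths are assumptions for illustration (the thread only mentions 300 dpi), and the function name is made up for this example. It ignores file headers and row padding, so a real .bmp will be slightly larger.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size of a scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: A4, about 8.27 x 11.69 inches.
A4_WIDTH_IN = 8.27
A4_HEIGHT_IN = 11.69

for label, bits in [("24-bit colour", 24), ("8-bit greyscale", 8), ("1-bit black and white", 1)]:
    size = raw_scan_bytes(A4_WIDTH_IN, A4_HEIGHT_IN, 300, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB per page")
```

    Under those assumptions a colour page is about 26 MB before compression, and a 1-bit black-and-white page is roughly a twenty-fourth of that, which is why the colour mode matters so much for a text-only booklet.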


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    Thanks. Yes, the book is all text except the cover. I scanned to Adobe Acrobat; would that be an image? Can I OCR after scanning to Acrobat, or should I scan to a text file, then OCR, then Acrobat?


  • Registered Users, Registered Users 2 Posts: 37,485 ✭✭✭✭Khannie


    If you can scan to a text file, then that is probably OCR already. You could then create a PDF from the text yourself. OpenOffice (free) can export documents to PDF. Not sure whether MS Office can do that, as I haven't used it since... some very long time ago.


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    Scan to image, then OCR the image into a word processor. Spell check, regex, check the formatting, proofread the document against the book and correct errors manually. Then convert to eBook (mind you, PDF isn't exactly the most popular format, certainly not when it comes to reflowable-text eReading devices).
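
    As an illustration of the "regex" step, here is a minimal sketch of the kind of cleanup that is often useful on raw OCR output. The function name and the three rules are assumptions for this example, not something any particular OCR package does for you.

```python
import re


def tidy_ocr_text(text):
    """Apply a few simple regex cleanups to raw OCR output."""
    # Rejoin words that were hyphenated across a line break: "ro-\nad" -> "road".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn a single line break inside a paragraph into a space,
    # but leave blank lines (paragraph breaks) alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces or tabs into one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


sample = "I walked up the ro-\nad and\nthen  home.\n\nNext paragraph."
print(tidy_ocr_text(sample))
```

    Note that the first rule will also join genuinely hyphenated words that happen to break at a line end, which is one reason the proofreading pass against the book is still needed.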


  • Registered Users, Registered Users 2 Posts: 37 Born To Be Mild


    Did he print the booklet himself, or does he have the design? If so, you can download PrimoPDF, which installs as a printer. You can then hit print, select PrimoPDF as the printer, and it creates a PDF file. This would be the best way and involve the least work, but it is only possible if you have the booklet in digital form already (like in MS Word).


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    Born To Be Mild wrote: »
    Did he print the booklet himself, or does he have the design? If so, you can download PrimoPDF, which installs as a printer. You can then hit print, select PrimoPDF as the printer, and it creates a PDF file. This would be the best way and involve the least work, but it is only possible if you have the booklet in digital form already (like in MS Word).
    No, he doesn't have it in digital form.


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    I scanned a sample book laid flat on the scanner, showing two pages of just text, to an Acrobat reduced PDF, and it is 120 KB in size. Isn't that big?


  • Closed Accounts Posts: 6,224 ✭✭✭Procrastastudy


    No, that's tiny.

    120 MB is a big image file.


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    Thanks ;). Sorry, I do not understand. Is "just text to Acrobat reduced PDF" an image? Oh, I see it is.

    And while I can add comments, I cannot edit the book from, e.g., "I walked up the road" to "I ran up the road". Should I be able to with OCR?


  • Closed Accounts Posts: 6,224 ✭✭✭Procrastastudy


    Adobe is a protected format with loads of formatting and crap attached to it. Does the OCR not allow you to output to RTF?


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    Procrastastudy wrote: »
    Adobe is a protected format with loads of formatting and crap attached to it. Does the OCR not allow you to output to RTF?
    Yes, it does. I output to OpenOffice as RTF; I can edit it, and I exported it as PDF. It came to 70 KB. But some of the text is a bit smaller, and there are one or two typos that are not in the original. How would it make a misread?
    Procrastastudy wrote: »
    Adobe is a protected format with loads of formatting and crap attached to it
    Not sure what you mean by that.


  • Closed Accounts Posts: 6,224 ✭✭✭Procrastastudy


    The scanner takes an image and the software reads it into text. It's very easy for it to mistake letters, or for the software to substitute a word if the text is complex enough.

    E.g. with basic OCR, "The long walk" might come out as "The Laog wall".
    The software sees that and goes: "Laog", that's not a word, must mean "lagg".

    Stupid example, but hopefully you get the point.
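
    The "that's not a word, must mean something else" step can be sketched with Python's standard difflib module. This is only an illustration of the idea, not how any particular OCR package works; the tiny word list and the function name are made up for this example.

```python
import difflib

# Hypothetical, tiny vocabulary standing in for a real dictionary.
WORDS = ["the", "long", "walk", "wall", "lag", "log"]


def suggest(word, vocabulary=WORDS):
    """Return the word unchanged if it is known, else the closest known word."""
    if word.lower() in vocabulary:
        return word
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
    # Fall back to the original token when nothing is similar enough.
    return matches[0] if matches else word


misread = "The Laog wall"
print(" ".join(suggest(token) for token in misread.split()))
```

    Two things to notice: "wall" is left alone because it is a real word, even though the book said "walk", and the replacement chosen for "Laog" is simply whichever dictionary word scores as most similar, which need not be the original "long". Both kinds of error survive a spell check, so they only get caught by proofreading against the book.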


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    Procrastastudy wrote: »
    The scanner takes an image and the software reads it into text. It's very easy for it to mistake letters, or for the software to substitute a word if the text is complex enough.
    OK, I did not know that could happen.
    Procrastastudy wrote: »
    E.g. with basic OCR, "The long walk" might come out as "The Laog wall".
    The software sees that and goes: "Laog", that's not a word, must mean "lagg".
    I see what you mean.


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    Procrastastudy wrote: »
    Adobe is a protected format with loads of formatting and crap attached to it

    Adobe is not a format, it's a company. PDF, on the other hand, is a file format. It was formerly proprietary to said company but is no longer protected; in fact, it has been an open standard since 2008.


  • Closed Accounts Posts: 6,224 ✭✭✭Procrastastudy


    Yep, he's right, if you want to get into it. Bottom line: if you output to PDF (the file format, rather than the Defense Force), you will have a harder time editing it and a bigger file size.

    Thank you for keeping us technically correct.


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    Torqay wrote: »
    PDF, on the other hand, is a file format. It was formerly proprietary to said company but is no longer protected; in fact, it has been an open standard since 2008.
    What do you mean by no longer protected / open standard?


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    As I said before, with the rise of tablet computers and ereaders, PDF is not exactly the most popular publishing standard anymore.

    If you're going to scan the book (which I understand is text only), then OCR it, use a word processor, spell check, format the text properly and then proof it against the original. Then you can publish it in any format you want. Donkey work, I know it all too well... but the result is a helluva lot more satisfying than scanning to PDF.


  • Closed Accounts Posts: 6,224 ✭✭✭Procrastastudy


    I haven't used OCR in about a million years. What are you using, btw? Is it built into Windows these days?


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    Most folks I know swear by ABBYY FineReader. My tool of choice, however, is TopOCR (it used to be free, but it has been rebranded recently and is now quite expensive, as it's being sold bundled with a special HD camera). It was originally designed to deal with poor shots taken with cr@ppy old digicams and mobile phones; the results are stunning.

    And no, OCR is not built into Windows yet. ;)


  • Closed Accounts Posts: 6,224 ✭✭✭Procrastastudy


    Torqay wrote: »
    Most folks I know swear by ABBYY FineReader. My tool of choice, however, is TopOCR (it used to be free, but it has been rebranded recently and is now quite expensive, as it's being sold bundled with a special HD camera). It was originally designed to deal with poor shots taken with cr@ppy old digicams and mobile phones; the results are stunning.

    And no, OCR is not built into Windows yet. ;)

    I downloaded a copy of Free OCR - the issue is the books I'm trying to scan (law books are bloody expensive :D) don't sit flat. I might try and find a copy of Top so I can take photos - would save me even taking the things out of the library!

    Of course the copies would be destroyed at the end of the semester!:pac:


  • Closed Accounts Posts: 5,835 ✭✭✭Torqay


    Portableapps.com has a portable test build of TopOCR 3.1, which was the last free version.


  • Registered Users, Registered Users 2 Posts: 147 ✭✭whiteonblu


    Torqay wrote: »
    Most folks I know swear by ABBYY FineReader. My tool of choice, however, is TopOCR (it used to be free, but it has been rebranded recently and is now quite expensive, as it's being sold bundled with a special HD camera). It was originally designed to deal with poor shots taken with cr@ppy old digicams and mobile phones; the results are stunning.

    And no, OCR is not built into Windows yet. ;)
    That is what I am using. A slimmed-down version came with my scanner.

