After running OCR on PDF images, I can search in Acrobat, but I find that the OCR conversion wasn't very accurate. Is there a way to edit the text that sits behind the PDF image using Adobe Acrobat?
After running the OCR did you then check for capture suspects?
I did, but I am working w/ 1000+ page documents of poor quality. The problem I'm encountering is that when I convert the pdf through the OCR program (OmniPage Pro), I lose all of my bookmarks. After I have the converted pdf I can redo the bookmarks, but if I want to make any changes to the searchable text, I have to send it back through OmniPage and I lose my bookmarks again.
Are you talking about Catapult? Yes, I see that Adobe makes a program. It appears that the main problem here is that Adobe doesn't want to cooperate w/ other vendors' software (I have had a second problem that seems to confirm this). I am somewhat resentful about this, and it will certainly affect my decision to use Adobe products in the future.
It is called Capture.
There is a program called Adobe Capture which is essentially for converting bulk paper into PDF and running OCR.
There is Paper Capture which is a plug-in for Adobe Acrobat - it is free and is included with Acrobat 6 and it can be downloaded for Acrobat 5.
I have Acrobat 6. I was not aware that the program had OCR capabilities. In fact, I searched the help extensively and could not find anything on OCR. I can convert the document, but now I am having problems correcting OCR suspects. When I select "find first OCR suspect," it returns a "Capture Complete" message w/o finding any errors for me to fix. There are errors, however. Do you have any suggestions? Thanks for all of the help!
I see that you have to convert to "formatted text and graphics," when doing the paper capture, but I cannot compromise the integrity of the original image to make it searchable...any suggestions?
My experience with OCR is somewhat limited, but I seem to recall that (at least for some OCR software) there is an option to create an "Image with hidden text" type of file. That is, the "look" of your image would not be changed, but the text would be OCR'd. Some poor soul who has had more experience with OCR might be able to elaborate.
You have three options:
Searchable Image (Exact)
Original image visible, but the actual text is hidden.
Searchable Image (Compact)
As above, but with compression, which reduces file size but also image quality.
Formatted Text and Graphics
As it says, really.
Fr. Watson is right - that is what I want to accomplish. With OmniPage Pro I can create a PDF with the image over the text. However, I cannot seem to edit this text w/i Adobe. I work in a law firm and I cannot alter the appearance of the document for legal reasons. I need to be able to search through the PDFs, but I want to search the text behind the document. Adobe Paper Capture does a really poor job w/ documents of lower image quality. I need to be able to alter the text created by OmniPage, or at least be able to edit the Adobe text from Paper Capture, w/o altering the image.
I'm not really sure what you mean by that... I need to convert the PDF to searchable text. If I choose "Searchable Image (Exact)" then I can't make edits to the text after Adobe runs the OCR. If I choose "Formatted Text and Graphics" then I am stuck with the Adobe distortions of the text.
I wonder if there is a way that you could, say, export the OCR'd text to an RTF or text file, make the necessary changes, & then somehow 'slip it behind' the PDF image. I don't have enough experience to know if this is feasible. If I had to try, I think I would save/export the PDF as a text or RTF file, then open this in another program, and then make the changes necessary to the text. Once this was done, I would, if possible, make all the text white, overlay the text with images of the PDF document, then re-pdf the edited file. I think that this would work, but I have not done it.
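For the "export, then make the necessary changes" step, a small script can apply a hand-built table of known misreadings to the exported text before it goes back behind the image. This is only a minimal Python sketch: the function name `apply_corrections` and the sample misreadings are illustrative assumptions, not anything provided by Acrobat or OmniPage.

```python
import re


def apply_corrections(text, corrections):
    """Replace whole-word OCR misreadings using a hand-built table.

    `corrections` maps a misread word to its intended spelling.
    Matching is case-sensitive and limited to whole words, so a
    misreading inside a longer word is left alone.
    """
    for wrong, right in corrections.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, right, text)
    return text


# Hypothetical misreadings of the kind poor scans tend to produce.
sample_table = {"rnotion": "motion", "plaintifF": "plaintiff"}
fixed = apply_corrections("The plaintifF filed a rnotion.", sample_table)
```

The table has to be built by hand from the errors actually seen in the documents, so this saves retyping rather than proofreading.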
You can't edit the text of a searchable-image PDF, as the text is 'hidden' under the 'image' in the PDF. You can only review capture suspects, although I'm not sure if this will work with PDF files OCR'd outside Acrobat.
I guess then the only thing might be to first do an ordinary OCR (i.e., convert to text only & no image), make corrections in that, then make that white, overlay an image, etc. Would that possibly work? It's laborious & not ideal, but might it be a workaround?
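Rather than colouring corrected text white, the PDF format itself has an invisible text rendering mode (the `3 Tr` operator), which is one common way an "image with hidden text" file keeps its text layer out of sight. The following is a minimal Python sketch that writes a one-page PDF containing such text. The function name `build_hidden_text_pdf` and the page layout are illustrative assumptions, the scanned image that would sit on the page is omitted, and whether OmniPage or Paper Capture use exactly this mode is not confirmed here.

```python
def build_hidden_text_pdf(text):
    """Return the bytes of a one-page PDF whose text is invisible.

    Only Latin-1 text is handled. A real file would also draw the
    scanned page image in the same content stream.
    """
    # Escape the characters that are special inside a PDF string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # "3 Tr" selects text rendering mode 3: neither filled nor stroked.
    content = f"BT /F1 12 Tf 3 Tr 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset needed by the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)


pdf_bytes = build_hidden_text_pdf("motion (draft)")
```

Written to a file, the page should look blank in a viewer while the text can still be found by searching, which is the same behaviour as the text behind a scanned image.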
Hi Wynn,
Have you found a solution to the problem of editing the 'hidden text'? You are not on your own in coming across this frustrating problem after doing an 'Image' PDF to 'Image + hidden text' PDF conversion. I have also had the same experience trying to capture 'suspects' - in that no 'suspects' are found. I suspect this feature only works when scanning an original paper version rather than doing a file conversion.
It is a frustrating problem because the searchable text is obviously accessible. It can be copied and pasted into other applications - but not made visible and corrected within Acrobat.
I am trying to find out if the full version of Capture allows conversion of PDFs and editing of text.
There must be thousands of users with this problem.
Regards
David.
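Since the hidden text can be copied out into other applications, one rough substitute for the missing "suspects" pass is to scan the copied text for tokens that look like misreadings. This is a minimal Python sketch under a stated assumption: it treats any token that mixes letters and digits (for example "c1ient") as suspect. The function name `find_suspects` and that heuristic are hypothetical, and it will also flag legitimate tokens such as "3rd" or "12b".

```python
import re


def find_suspects(text):
    """Return (line_number, token) pairs for tokens mixing letters and digits."""
    suspects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"[A-Za-z0-9]+", line):
            has_letter = any(ch.isalpha() for ch in token)
            has_digit = any(ch.isdigit() for ch in token)
            if has_letter and has_digit:
                suspects.append((line_number, token))
    return suspects


# Hypothetical pasted text with two typical digit-for-letter errors.
pasted = "The c1ient signed\non 12 May 2003 at the 0ffice"
suspect_list = find_suspects(pasted)
```

The line numbers refer to the pasted text, not to PDF pages, so locating each hit in the document still means searching for it in Acrobat.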
You can edit the text that is hidden behind the printed text, but the process is rather cumbersome.
Use the TouchUp Text tool to select the line of text that you wish to edit. Select all the text that is in the box. Right-click and select 'Attributes'. There is a little box in the bottom left-hand corner of the Attributes dialog. Click this box and a colour palette will appear. Select a colour different from the printed one. For example, if the printed text is black, select red. You will then be able to see the hidden searchable text and make the changes you require.