ADOBE ACROBAT WINDOWS 70 EDITING TEXT FROM OCR CONVERSION
From: wynn_williamson@no-spam
Subject: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 08:53:58 -0800


After OCRing PDF images, I can search in Adobe, but I find that the OCR conversion wasn't entirely accurate. Is there a way to edit the text that is written behind the PDF through Adobe Acrobat?


From: Jonathan_H@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 09:21:32 -0800

After running the OCR did you then check for capture suspects?


From: wynn_williamson@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 10:00:09 -0800

I did, but I am working w/ 1000+ page documents of poor quality. The problem I'm encountering is that when I convert the pdf through the OCR program (OmniPage Pro), I lose all of my bookmarks. After I have the converted pdf I can redo the bookmarks, but if I want to make any changes to the searchable text, I have to send it back through OmniPage and I lose my bookmarks again.



From: wynn_williamson@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 11:12:32 -0800

Are you talking about Catapult? Yes, I see that Adobe makes a program. It appears that the main problem here is that Adobe doesn't want to cooperate w/ other vendors' software (I have had a second problem that seems to confirm this). I am somewhat resentful about this, and it will certainly affect my decision to use Adobe products in the future.



From: Jonathan_H@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 11:15:43 -0800

It is called Capture.

There is a program called Adobe Capture which is essentially for converting bulk paper into PDF and running OCR.


There is Paper Capture which is a plug-in for Adobe Acrobat - it is free and is included with Acrobat 6 and it can be downloaded for Acrobat 5.



From: wynn_williamson@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 12:08:37 -0800

I have Acrobat 6. I was not aware that the program had OCR capabilities. In fact, I searched the help extensively and could not find anything on OCR. I can convert the document, but now I am having problems correcting OCR suspects. When I select "find first OCR suspect," it returns a "Capture Complete" message w/o finding any errors for me to fix. There are errors, however. Do you have any suggestions? Thanks for all of the help!



From: wynn_williamson@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 12:26:27 -0800

I see that you have to convert to "formatted text and graphics," when doing the paper capture, but I cannot compromise the integrity of the original image to make it searchable...any suggestions?



From: Fr._Watson@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 12:48:28 -0800

> ...but I cannot compromise the integrity of the original image to make it searchable...any suggestions?

My experience with OCR is somewhat limited, but I seem to recall that (at least for some OCR software) there is an option to create an "Image with hidden text" type of file. That is, the "look" of your image would not be changed, but the text would be OCR'd. Some poor soul who has had more experience with OCR might be able to elaborate.



From: Jonathan_H@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 12:54:40 -0800

You have three options:

Searchable Image (Exact)
Original image visible, but the actual text is hidden.

Searchable Image (Compact)
As above, but with compression, which reduces file size but also image quality.

Formatted Text and Graphics
As it says, really.


From: wynn_williamson@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 12:59:38 -0800

Fr. Watson is right - that is what I want to accomplish. With OmniPage Pro I can create a PDF with the image over the text. However, I cannot seem to edit this text w/i Adobe. I work in a law firm and I cannot alter the appearance of the document for legal reasons. I need to be able to search through the PDFs, but I want to search text behind the document. Adobe Paper Capture does a really poor job w/ documents of lower image quality. I need to be able to alter the text created by OmniPage, or at least be able to edit the Adobe text from Paper Capture w/o altering the image.



From: wynn_williamson@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Thu, 1 Apr 2004 13:41:03 -0800

I'm not really sure what you mean by that... I need to convert the pdf to searchable text. If I choose "searchable image (exact)" then I can't make edits to the text after Adobe converts to OCR. If I choose "formattable text and images" then I am stuck with the Adobe distortions of the text.



From: Kamilyon_Bambiraptor@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Sun, 4 Apr 2004 12:25:45 -0800

> I'm not really sure what you mean by that... I need to convert the pdf to searchable text. If I choose "searchable image (exact)" then I can't make edits to the text after Adobe converts to OCR. If I choose "formattable text and images" then I am stuck with the Adobe distortions of the text.

I wonder if there is a way that you could, say, export the OCR'd text to an RTF or text file, make the necessary changes, & then somehow 'slip it behind' the PDF image. I don't have enough experience to know if this is feasible. If I had to try, I think I would save/export the PDF as a text or RTF file, then open this in another program, and then make the changes necessary to the text. Once this was done, I would, if possible, make all the text white, overlay the text with images of the PDF document, then re-pdf the edited file. I think that this would work, but I have not done it.



From: de_Siem@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Mon, 5 Apr 2004 01:37:49 -0700

You can't edit the text of a searchable image PDF, as the text is 'hidden' under the 'image' in the PDF. Your only option is to go through the capture suspects, although I'm not sure whether this will work with PDF files that were OCR'd outside Acrobat.



From: Kamilyon_Bambiraptor@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Mon, 5 Apr 2004 08:11:22 -0700

> You can't edit the text of a searchable image pdf as the text is 'hidden' under the 'image' in the pdf.

I guess then the only thing might be to first do an ordinary OCR (i.e., convert to text only & no image), make corrections in that, then make that text white, overlay an image, etc. Would that possibly work? It's laborious & not ideal, but it might be a workaround.
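
For anyone curious what "text behind the image" looks like inside the file: as far as I understand it, searchable-image PDFs usually don't use white text at all. The PDF format has a text rendering mode 3 ("neither fill nor stroke", i.e. invisible), and the OCR text is commonly drawn in that mode on the same page as the scanned image. The Python sketch below only builds the text-drawing part of a page content stream to illustrate the idea. The function names, the `/F1` font resource name, and the coordinates are made up for illustration; this is not how Acrobat or OmniPage necessarily do it, and it does not produce a complete PDF on its own.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_stream(lines, font="/F1", size=10, left=72, top=720, leading=12):
    """Build a content-stream fragment that draws `lines` as invisible text.

    Assumptions: `font` names a font resource already defined on the page
    (/F1 is a placeholder), and the text is plain ASCII.
    """
    ops = [
        "BT",                  # begin text object
        f"{font} {size} Tf",   # select font and size
        "3 Tr",                # text rendering mode 3: invisible
        f"{leading} TL",       # line spacing used by T*
        f"{left} {top} Td",    # move to the first line's position
    ]
    for line in lines:
        ops.append(f"({escape_pdf_string(line)}) Tj")  # show one line of text
        ops.append("T*")                               # move to the next line
    ops.append("ET")           # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    print(hidden_text_stream(["Corrected OCR text, line one", "Smith (plaintiff)"]))
```

Changing `3 Tr` to `0 Tr` (normal fill) in such a stream would make the text visible again, which is roughly why a hidden layer can be searched and copied even though nothing shows on screen.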



From: David_C_Rowland@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Tue, 4 May 2004 23:03:04 -0700

Hi Wynn,
Have you found a solution to the problem of editing the 'hidden text'? You are not on your own in coming across this frustrating problem after doing an 'Image' PDF to 'Image + hidden text' PDF conversion. I have also had the same experience trying to capture 'suspects' - in that no 'suspects' are found. I suspect this feature only works when scanning an original paper version rather than doing a file conversion.


It is a frustrating problem because the searchable text is obviously accessible. It can be copied and pasted into other applications - but not made visible and corrected within Acrobat.
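
Since the hidden text can be copied out, one rough way to at least locate the errors in a long document is to paste or export the text to a file and flag suspicious words outside Acrobat. The Python sketch below is only an illustration of that idea: `find_suspects` and the word list are hypothetical, and it is not a replacement for Acrobat's own suspect checking. It flags words that mix letters and digits (a typical OCR slip) and purely alphabetic words that are missing from a list of known words.

```python
import re


def find_suspects(text, known_words):
    """Return (line_number, word) pairs for words that look like OCR errors."""
    known = {w.lower() for w in known_words}
    suspects = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z0-9]+", line):
            # Letters and digits in one token, e.g. "c0urt", are suspicious.
            mixed = any(c.isdigit() for c in word) and any(c.isalpha() for c in word)
            # Purely alphabetic words are checked against the known-word list.
            unknown = word.isalpha() and word.lower() not in known
            if mixed or unknown:
                suspects.append((lineno, word))
    return suspects


if __name__ == "__main__":
    sample = "The c0urt finds\nthe defendarit liable"
    words = ["the", "court", "finds", "defendant", "liable"]
    for lineno, word in find_suspects(sample, words):
        print(f"line {lineno}: {word}")
```

The line numbers refer to the exported text, so they would still have to be matched back to pages in the PDF by hand.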


I am trying to find out if the full version of Capture allows conversion of PDFs and editing of text.


There must be thousands of users with this problem.

Regards
David.


From: Andrew_E_D_Clark@no-spam
Subject: Re: Editing text from OCR Conversion
Date: Wed, 5 May 2004 04:08:14 -0700

You can edit the text that is hidden behind the printed text, but the process is rather cumbersome.


Use the TouchUp Text Tool to select the line of text that you wish to edit. Select all the text that is in the box. Right-click and select 'Attributes'. There is a little box in the bottom left-hand corner of the Attributes dialog. Click this box and a colour palette will appear. Select a colour different from the printed one. For example, if the printed text is black, select red. You will then be able to see the hidden searchable text and make the changes you require.