Converting to grayscale or b/w for free OCR API


#1

I want to reduce bandwidth and increase recognition quality by converting locally stored images to grayscale or preferably black and white. I have tried ImageMagick but found it a bit unwieldy. Is there a preferred command line tool for this? Does the OCR service do this itself prior to recognition?


#2

We use ImageMagick ourselves for testing and development, so as far as command line tools go, I think it is a good choice.

The OCR API does not convert images to black and white before recognition, so you will have to test whether this conversion affects the recognition rate. Recognition quality can go up or down, depending on the document/image and the exact type of pre-processing you do.
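To make such a comparison repeatable, the conversion can be scripted. Below is a minimal sketch in Python that only assembles and runs an ImageMagick command. It assumes ImageMagick 7, where the executable is called `magick` (on ImageMagick 6 you would pass `binary="convert"`). The function names and the 50% default threshold are made up for this example and are not something the OCR service prescribes.

```python
import shutil
import subprocess


def build_convert_command(src, dst, mode="gray", threshold_percent=50, binary="magick"):
    """Return the ImageMagick argument list for a grayscale or b/w conversion.

    mode="gray" keeps shades of gray; mode="bw" additionally applies a
    fixed threshold so every pixel ends up pure black or pure white.
    """
    if mode not in ("gray", "bw"):
        raise ValueError("mode must be 'gray' or 'bw'")
    # -colorspace Gray discards the colour information.
    cmd = [binary, src, "-colorspace", "Gray"]
    if mode == "bw":
        # -threshold N% turns pixels above N% brightness white, the rest black.
        cmd += ["-threshold", f"{threshold_percent}%"]
    cmd.append(dst)
    return cmd


def convert_image(src, dst, **kwargs):
    """Run the conversion. Requires ImageMagick to be installed and on PATH."""
    cmd = build_convert_command(src, dst, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

Calling `convert_image("scan.png", "scan_bw.png", mode="bw")` would write a black and white copy. It is worth sending both the original and the converted file through the OCR and comparing the results, since a fixed threshold can hurt recognition on unevenly lit scans.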

If your documents have a very high resolution (say 300 dpi or higher), you can also test reducing it to e.g. 200 dpi. This can help save bandwidth and shorten the upload time. Often 200 dpi is good enough for high-quality recognition.
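As a rough illustration of what the reduction buys you, here is a small Python sketch that computes the new pixel dimensions and the fraction of pixels that remain. The helper names are hypothetical, and the actual saving in file size depends on the image format and compression, so treat the numbers as an estimate only.

```python
def resampled_size(width_px, height_px, source_dpi, target_dpi=200):
    """Pixel dimensions after resampling from source_dpi to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


def pixel_ratio(source_dpi, target_dpi=200):
    """Fraction of the original pixel count left after resampling."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    # Both width and height shrink by the same factor, hence the square.
    return (target_dpi / source_dpi) ** 2


# An A4 page scanned at 300 dpi is about 2480 x 3508 pixels.
print(resampled_size(2480, 3508, 300))   # (1653, 2339)
print(round(pixel_ratio(300), 2))        # 0.44
```

So going from 300 dpi to 200 dpi leaves a bit under half of the pixels. With ImageMagick, the `-resample 200` option performs this kind of reduction, provided the image carries correct resolution metadata.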

Last but not least, using the PRO OCR API is another option to reduce the upload time significantly. The PRO API has endpoints in the USA, Europe and Asia, so one of them is probably closer to you than the free OCR API endpoint.