Scan of lab test creates many errors


#1

I used OCR.space to scan a high-quality PDF:
http://www.davidjpotter.com/2018.06.22_Labcorp_CMP.pdf

I used table recognition.

But the scan produces many errors:

“Bilirubin” becomes “Bil i rubin”
“Iron” becomes “I ron”
“Hemoglobin” becomes “Hemogl 0b in”
“Neutrophils” becomes “Neutrophi Is”

How can I get a better scan?


#2

Thanks for the interesting test case. I confirmed your result. I would say that the overall OCR quality is good, but not great.

You can improve the result further:

  • Take high-resolution screenshots of the PDF and send these images to the OCR API. By doing the PDF-to-image conversion on your side, you are in full control of the image resolution and can optimize it for the best result. I did some tests with high-resolution screenshots and the accuracy improved significantly. For example, “Bilirubin” is now recognized correctly as “Bilirubin”.

  • Or: If you use the PRO PDF plan, please contact us directly. We can then tweak the PDF processing parameters in your account for an optimized OCR result.
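
If you want to script the first option instead of taking screenshots by hand, here is a minimal sketch in Python. It is not an official client. It assumes the third-party PyMuPDF package (imported as `fitz`) for rendering the pages, a 300 DPI target that you may need to tune, and the API's `base64Image` and `isTable` form fields. The function names and the `YOUR_API_KEY` placeholder are made up for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request

OCR_URL = "https://api.ocr.space/parse/image"  # OCR.space parse endpoint


def zoom_for_dpi(dpi: int) -> float:
    """PDF pages are measured in points (72 per inch), so zoom = dpi / 72."""
    return dpi / 72.0


def render_pages(pdf_path: str, dpi: int = 300) -> list[bytes]:
    """Render every page of the PDF to PNG bytes at the requested resolution."""
    import fitz  # PyMuPDF; assumption: installed with `pip install pymupdf`

    zoom = zoom_for_dpi(dpi)
    with fitz.open(pdf_path) as doc:
        return [
            page.get_pixmap(matrix=fitz.Matrix(zoom, zoom)).tobytes("png")
            for page in doc
        ]


def build_form(png: bytes, api_key: str) -> dict[str, str]:
    """Build the form fields for one page image, with table recognition on."""
    encoded = base64.b64encode(png).decode("ascii")
    return {
        "apikey": api_key,
        "base64Image": "data:image/png;base64," + encoded,
        "isTable": "true",
        "language": "eng",
    }


def ocr_page(png: bytes, api_key: str) -> str:
    """Send one rendered page to the API and return the recognized text."""
    data = urllib.parse.urlencode(build_form(png, api_key)).encode("ascii")
    request = urllib.request.Request(OCR_URL, data=data)
    with urllib.request.urlopen(request, timeout=60) as response:
        result = json.load(response)
    # ParsedResults can be missing or null when the API reports an error.
    pages = result.get("ParsedResults") or []
    return "\n".join(p.get("ParsedText", "") for p in pages)


# Example use (needs network access and a real API key):
# for png in render_pages("2018.06.22_Labcorp_CMP.pdf", dpi=300):
#     print(ocr_page(png, "YOUR_API_KEY"))
```

Higher DPI values produce larger images, so check the file size limit of your plan before raising the resolution much further.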