Scanned image PDFs: searchable PDF padded to landscape and top of page cropped off


#1

Hello there. I am working on a scanned invoice import system using the OCR.space API, and I am having some trouble with the searchable PDF feature. When I upload a document, the resulting PDF is padded on the right to make the page landscape (fitting the watermark in), and because of this the top of the actual page is cropped off. Here are some images to demonstrate (both have been doctored to remove any identifiable information):
Here’s the broken version as retrieved from the API:


(As a new user I'm only allowed one image per post, so the original will be in the following post.)

Doing some digging, I found that printing the PDF through the built-in Windows 10 PDF printer seems to fix whatever the issue is, but I can't have our clients doing that to every single PDF before putting it through our system. So, to help you identify what the issue could be, here's the information I can see has changed between the original and the Windows 10 version:

Original:
PDF Producer: Xerox WorkCentre 7845
PDF Version: 1.4 (Acrobat 5.x)
Page Size: 8.28 x 11.69 in

Windows 10:
PDF Producer: Microsoft: Print To PDF
PDF Version: 1.7 (Acrobat 8.x)
Page Size: 8.27 x 11.69 in

So I'm wondering whether the page size is what's throwing things off, but the difference is so small (0.01 in on the width) that I'm not sure.
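For reference, PDF viewers derive the page size from the page's /MediaBox, which is given in points (1/72 in), and a /Rotate entry of 90 or 270 swaps width and height when the page is shown. A 0.01 in difference is therefore under one point (about 0.25 mm). The sketch below is only a way to reason about the numbers: it does not read a real PDF, and the MediaBox values are assumptions (A4 is 595.28 x 841.89 pt; 596 x 842 pt is a hypothetical value that would display as 8.28 x 11.69 in). Checking whether either file carries a /Rotate entry may be worth doing, but that is a guess, not a confirmed cause.

```python
# Sketch: convert a PDF /MediaBox (in points) plus a /Rotate value into the
# displayed page size in inches. The MediaBox values used below are
# illustrative assumptions, not read from the actual invoices.

POINTS_PER_INCH = 72.0


def page_size_inches(media_box, rotate=0):
    """Return (width_in, height_in) as displayed, rounded to 0.01 in.

    media_box is [llx, lly, urx, ury] in points; rotate is the page's
    /Rotate value in degrees (a multiple of 90).
    """
    llx, lly, urx, ury = media_box
    width = abs(urx - llx) / POINTS_PER_INCH
    height = abs(ury - lly) / POINTS_PER_INCH
    if rotate % 180 != 0:  # 90 or 270 degrees swaps the displayed axes
        width, height = height, width
    return round(width, 2), round(height, 2)


def orientation(size):
    """Classify a (width, height) pair as portrait or landscape."""
    width, height = size
    return "landscape" if width > height else "portrait"


if __name__ == "__main__":
    a4 = [0, 0, 595.28, 841.89]  # A4 in points
    hypothetical_scan = [0, 0, 596, 842]  # assumed scanner MediaBox
    print(page_size_inches(a4))  # (8.27, 11.69)
    print(page_size_inches(hypothetical_scan))  # (8.28, 11.69)
    print(orientation(page_size_inches(a4, rotate=90)))  # landscape
```

Both sizes come out portrait, so a rounding-level difference alone would not explain a landscape result; a rotation flag or a mismatch between the scanned image and its page box would be more likely places to look.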

If necessary, I can talk to someone at a9t9 privately to share the original invoices securely, but I think you can appreciate that I don't want to share them publicly.

Thanks in advance to anyone who can help out.


#2

The original:

Again, this has been doctored to remove identifiable info.


#3

This seems strange. You upload a document in portrait format, and the searchable PDF is returned in landscape format? When I use the censored image as input, everything works fine.

=> Can you send me the original input for testing? You can send it directly to team (at) a9t9.com and mention this forum post.