Error when trying to pass AWS S3 Signed URL


#1

I get an error when I try to pass an AWS S3 signed URL (like https://myinstance.s3.amazonaws.com/text.png?AWSAccessKeyId=aaa&Expires=1549023120&Signature=4pKoUuyax75MpctmoRwG8d%2BA2DY%3D&x-amz-security-token=FQoGZXIvYX). I set the correct filetype (png), and the same file works fine without a signed URL. Signed PDF files fail with a similar error. This is strange, because the signed and unsigned files are identical.

{
  ParsedResults: [ {
    FileParseExitCode: -10,
    ParsedText: '',
    ErrorMessage: 'Parsing Error: Please check the image. The image is corrupt.',
    ErrorDetails: 'Please check the image. The image is corrupt.'
  } ],
  OCRExitCode: 3,
  IsErroredOnProcessing: true,
  ErrorMessage: [ 'All images/pages errored in parsing' ],
  ProcessingTimeInMilliseconds: '429',
  SearchablePDFURL: 'Searchable PDF not generated as it was not requested.'
}
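
For anyone handling this response in code, here is a minimal sketch of pulling the error details out of it. It assumes the response has already been parsed into a dict with the field names shown above; `summarize_ocr_response` is a hypothetical helper, not part of any official client.

```python
from typing import Any, Dict, List


def summarize_ocr_response(response: Dict[str, Any]) -> List[str]:
    """Collect human-readable error messages from a parsed OCR response.

    Field names come from the response shown in this post; the helper
    itself is hypothetical.
    """
    errors: List[str] = []

    # Top-level errors are only reported when the request as a whole failed
    if response.get("IsErroredOnProcessing"):
        top = response.get("ErrorMessage") or []
        # Assumption: accept either a single string or a list of strings
        if isinstance(top, str):
            top = [top]
        errors.extend(top)

    # Per-page errors live inside ParsedResults
    for index, page in enumerate(response.get("ParsedResults") or []):
        message = page.get("ErrorMessage")
        if message:
            code = page.get("FileParseExitCode")
            errors.append(f"page {index}: exit code {code}: {message}")

    return errors
```

An empty list means no error messages were found in the response.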

#2

My first guess is that the encryption confuses the automatic file type detection. Have you tried adding filetype=png to the API call?
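
One thing worth ruling out on the client side: a signed URL carries its own `&`, `=` and `%2B` sequences, so it has to be percent-encoded as a single value when it is passed as a parameter, otherwise the outer request gets split in the wrong places. A minimal sketch of that, where `filetype` is the parameter mentioned above and the `apikey` and `url` parameter names, as well as the helper name, are assumptions:

```python
from urllib.parse import parse_qs, urlencode


def build_ocr_query(signed_url: str, filetype: str, api_key: str) -> str:
    """Return a query string with the signed URL encoded as one value."""
    params = {
        "apikey": api_key,     # assumed parameter name
        "url": signed_url,     # assumed parameter name
        "filetype": filetype,  # parameter mentioned in this thread
    }
    # urlencode escapes '&', '=', '%' and '+' inside each value, so the
    # signature is not altered or cut off
    return urlencode(params)


signed = (
    "https://myinstance.s3.amazonaws.com/text.png?AWSAccessKeyId=aaa"
    "&Expires=1549023120&Signature=4pKoUuyax75MpctmoRwG8d%2BA2DY%3D"
    "&x-amz-security-token=FQoGZXIvYX"
)
query = build_ocr_query(signed, "png", "my-key")

# Decoding the query gives back the signed URL unchanged
assert parse_qs(query)["url"] == [signed]
```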

I could not test it myself because I don’t have your access key, so I only get <Message>Request has expired</Message>.
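
The `Expires` value in this style of signed URL is a Unix timestamp in seconds, so it is easy to check whether a link is still valid before testing with it. A small sketch (both helper names are hypothetical):

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url: str) -> Optional[datetime]:
    """Return the expiry time from the Expires parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("Expires")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the URL has an Expires parameter that is not in the future."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        # No Expires parameter, so nothing to compare against
        return False
    return (now or datetime.now(timezone.utc)) >= expiry
```

Calling `is_expired(url)` on the example link from the first post returns `True`, which matches the "Request has expired" message.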

If adding the filetype does not solve it, please provide a valid AWS S3 URL for us to test it. If you do not want to post it in the forum, please send it directly to us.


#3

I tried filetype, but without success.

Here is a link; I made it valid for 48 hours: https://rh-darkness-of-the-sun.s3.amazonaws.com/text.png?AWSAccessKeyId=ASIAZ5ZKREZ6EADF227E&Expires=1549206421&Signature=FfqqOpBIGaSDVVWCICns3M9XwHY%3D&x-amz-security-token=FQoGZXIvYXdzEGAaDNvPGF0FTUEQ4YgekCL8AatCjSlEOYesoosLjUFx%2B8Hv8gxPW8WBVmaj6cXWqC9MREQf6QnfKNfPYQtN0d3rSatQaHR%2Bj8Jfp5o1dF6kfSAuIgsYQGtmOlWQFZExYv42x6I%2FB97WTfJqG6pQKPZub%2FmJdd87jLLa4TKhngXvl6D%2FLNAlb8DK1n%2Fr%2BahgJ4L7C%2FiYey9P%2FzsC9XaG0cNNS6rPqh4nESGVNUWgUVwv%2BnCVzpNxemOms0jhUmIdkmwJhC39kBnTQE8Vi2bALG1%2FfyYy6Z55lPtxZY%2F4uiiYNRsC%2F%2BWG7Qzxj6bAz7q6VEzSzhcz6SnyeB6%2FIg5JzLiR%2BUbeatJz3k5Ir1hJGSiKv9HiBQ%3D%3D


#4

Thanks. I confirmed the issue and we will debug it ASAP. If possible, please keep the link valid for more than 48h; I am not sure we are that fast :wink:


#5

Here is a new link that expires in a week:

https://rh-darkness-of-the-sun.s3.amazonaws.com/text.png?AWSAccessKeyId=ASIAZ5ZKREZ6DEHFGRE2&Expires=1549842531&Signature=0LzfQ9wFFueWYyMHZG0F5oNOZzE%3D&x-amz-security-token=FQoGZXIvYXdzEJH%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDMl3J1UpiWGC9e3hsyL8AaYnVmqYo%2BFqdpWBs33fuCi5t7Jt8MmOdS0sKV2q5k135stIZnoIaJf6biNBZUuDw4qscpjcgixQClIPk2yDjnX7gD3eTloPOxUjFz02B1wShsAq9hsCBOkQKIw8fONAo8nTGEJb5PoQp9yDhVsg8uHKky5uVefKtSYtGwjz7OTAC3nL%2FMdrrd7oMCjHjYNZJpzcwvFfXk0LxBjZfbla3IRBl%2FnFKaLpqzLvDWt9HqiHbIBgbY7WeIGwgYB7kaqQo%2BYKsZhxY%2F9LFhZzpwDTDlxWyOava51oDX4snTB%2FXnRVX6KxApY9BTt%2B18X8b5qSpyWjr%2FWAbythhE4LvijXmNziBQ%3D%3D


#6

Thanks for the links. We debugged this and found out what is wrong. The current code fails to download images and PDFs from signed URLs. So the image never reaches our OCR servers. As for the solution:

  • Do you have a PRO or PRO PDF plan? Then please contact us and we can add a solution to a PRO/PRO PDF endpoint on short notice. (In your mail, just mention this forum post).

  • Do you use the free OCR plan? Then please be a bit patient. The issue is now on the to-do list for one of the next updates, but I don’t know when exactly the signed URL support will be available.
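
Until signed URL support is available, one possible client-side workaround (not something confirmed in this thread) is to download the file yourself and send its content instead of the URL. The sketch below assumes the API accepts a base64 data URI in a form field; the field names `apikey` and `base64Image`, the `endpoint` argument, and both helper names are assumptions to check against the API documentation.

```python
import base64
import json
import urllib.parse
import urllib.request


def to_data_uri(content: bytes, mime_type: str) -> str:
    """Encode raw file bytes as a data URI."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def ocr_via_base64(signed_url: str, endpoint: str, api_key: str,
                   mime_type: str = "image/png", filetype: str = "png") -> dict:
    """Fetch the file from S3 locally, then post its content to the OCR API."""
    # The signed URL is resolved here, on the client, not by the OCR server
    with urllib.request.urlopen(signed_url) as resp:
        content = resp.read()

    # Field names are assumptions, not confirmed by this thread
    form = urllib.parse.urlencode({
        "apikey": api_key,
        "base64Image": to_data_uri(content, mime_type),
        "filetype": filetype,
    }).encode("ascii")

    # Passing data makes urllib send a POST request
    request = urllib.request.Request(endpoint, data=form)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

This way the OCR server never has to fetch the signed URL, which is the step reported as failing above.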