PDF to image

Has anyone had success using OCR on a PDF file to read a specific section of a page? We were able to get a PDF reader to give us the contents of the PDF as a string, but we also need the address from the page, and the reader returns it in a mixed format. Any suggestions or solutions to this problem are appreciated.
3 answers

We recently used Azure Document Intelligence to extract data from PDF files. You can model and design your own template using a visual tool provided by Azure. Works like a charm!


https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview?view=doc-intel-4.0.0
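Once a custom model has been built in the visual tool, reading a single labeled field (such as the address) from code might look roughly like the sketch below. It assumes the `azure-ai-formrecognizer` Python SDK, whose `DocumentAnalysisClient` targets the v3.x API; the page linked above describes v4.0, which ships in the newer `azure-ai-documentintelligence` package with a somewhat different call signature, so check the SDK docs for your version. The model ID `my-invoice-model`, the field name `Address`, and the helper `extract_field` are hypothetical.

```python
from typing import Any, Optional


def extract_field(client: Any, model_id: str, pdf_bytes: bytes, field_name: str) -> Optional[str]:
    """Analyze a PDF with a custom model and return the text of one labeled field.

    `client` is assumed to behave like azure.ai.formrecognizer.DocumentAnalysisClient.
    `model_id` and `field_name` are whatever you named them when building the
    custom model (hypothetical values in the usage notes below).
    """
    # Start the long-running analysis and block until it finishes
    poller = client.begin_analyze_document(model_id, pdf_bytes)
    result = poller.result()
    # Each analyzed document exposes its labeled fields as a dict of name -> DocumentField
    for document in result.documents:
        field = (document.fields or {}).get(field_name)
        if field is not None:
            return field.content  # the field's text as it appeared on the page
    return None


# Real usage (requires `pip install azure-ai-formrecognizer` and your own endpoint/key):
# from azure.ai.formrecognizer import DocumentAnalysisClient
# from azure.core.credentials import AzureKeyCredential
# client = DocumentAnalysisClient("https://<your-resource>.cognitiveservices.azure.com/",
#                                 AzureKeyCredential("<your-key>"))
# with open("invoice.pdf", "rb") as f:
#     print(extract_field(client, "my-invoice-model", f.read(), "Address"))
```

Passing the client in as a parameter keeps the extraction logic separate from credentials, which also makes it easy to test with a stand-in object.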


Mendix also has a module called Amazon Textract, as part of its close collaboration with AWS. Did you check that out? Besides text, it can also extract other data from documents.
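The Mendix module's own actions aren't shown here; as a rough idea of what the underlying service returns, the sketch below calls Textract's `DetectDocumentText` operation directly via boto3 and keeps only the `LINE` blocks. It assumes a single-page document, since the synchronous call handles one page; multipage PDFs go through the asynchronous `start_document_text_detection` flow, which reads the file from S3. For invoice fields specifically, Textract also offers `AnalyzeExpense`. The helper name `textract_lines` is hypothetical.

```python
from typing import Any, List


def textract_lines(client: Any, document_bytes: bytes) -> List[str]:
    """Return the text of each LINE block from a synchronous Textract call.

    `client` is assumed to be a boto3 Textract client (or anything with the
    same detect_document_text method and response shape).
    """
    response = client.detect_document_text(Document={"Bytes": document_bytes})
    # Textract returns PAGE, LINE and WORD blocks; LINE blocks hold whole lines of text
    return [block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"]


# Real usage (requires boto3 and configured AWS credentials):
# import boto3
# client = boto3.client("textract")
# with open("invoice.pdf", "rb") as f:
#     for line in textract_lines(client, f.read()):
#         print(line)
```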


What we ended up doing was using the PDF reader, splitting the string on \n, and looping through the resulting lines until we reached a certain index. We were reading invoices, so there was little difference in the structure of the rows provided. As a non-cloud method, this was the best solution we could come up with.
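The split-and-index approach above can be sketched roughly as follows. This is an illustration, not the poster's actual code: `extract_lines` takes the address from a fixed line index, which only works when every document shares the same layout, while `lines_after_label` is a hypothetical, slightly sturdier variant that anchors on a label such as `Bill To:` instead of a hard-coded index. The sample text is made up.

```python
from typing import List


def extract_lines(pdf_text: str, start: int, count: int) -> List[str]:
    """Split the reader's output on newlines and return `count` lines from index `start`.

    Assumes every document shares one row layout (e.g. invoices from a single
    template), so the address always lands at the same line index.
    """
    lines = pdf_text.split("\n")
    # strip() also removes a trailing "\r" if the text used Windows line endings
    return [line.strip() for line in lines[start:start + count]]


def lines_after_label(pdf_text: str, label: str, count: int) -> List[str]:
    """Return `count` lines following the first line that starts with `label` (hypothetical label)."""
    lines = [line.strip() for line in pdf_text.split("\n")]
    for index, line in enumerate(lines):
        if line.startswith(label):
            return lines[index + 1:index + 1 + count]
    return []  # label not found


if __name__ == "__main__":
    sample = "ACME Corp\nInvoice 1234\nJane Doe\n1 Main St\nSpringfield, IL 62701\nTotal: 100.00"
    print(extract_lines(sample, 2, 3))  # ['Jane Doe', '1 Main St', 'Springfield, IL 62701']
```

Anchoring on a label tolerates a few extra header rows between documents, whereas a fixed index breaks as soon as one template adds or drops a line.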


The cloud methods that Ivo and Rudd provided are good options.
