Question

PDF to image

Has anyone had success using OCR to read a specific section of a page in a .pdf file? We were able to get a PDF reader to return the PDF's contents as a string, but we also need the address from the page, and the reader returns it in a mixed format. Any suggestions or solutions to this problem are appreciated.

asked 2024-01-18

Dominik Kreslo

3 answers

Ruud Fleskens · Answer 1 · 2024-01-20

We recently used Azure Document Intelligence to extract data from PDF files. You can model and design your own template using a visual tool provided by Azure. Works like a charm!

https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/overview?view=doc-intel-4.0.0

Ivo Sturm · Answer 2 · 2024-01-20

Mendix also has a module called Amazon Textract as part of their close collaboration with AWS. Did you check that out? Besides text, it can also extract other data from documents.

Dominik Kreslo · Answer 3 · 2024-02-07

What we ended up doing was to use the PDF reader, split the string on \n, and loop through the resulting lines until we reached a certain index. We were reading invoices, so there was little difference in the row structure between documents. As a non-cloud method, this was the best solution we could come up with.
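
For anyone wanting to try the same idea, here is a rough Python sketch of the split-and-loop approach. It is not our actual implementation: the function name `extract_address`, the `"Bill To"` marker, and the three-line address length are all assumptions made for illustration, and your PDF reader's output will likely need different values.

```python
def extract_address(pdf_text, start_marker="Bill To", max_lines=3):
    """Return the lines following start_marker as one address string.

    start_marker and max_lines are hypothetical; adjust them to the
    layout your PDF reader actually produces.
    """
    # Split the extracted text into rows, one per line of the PDF string.
    lines = [line.strip() for line in pdf_text.split("\n")]

    # Loop through the rows until we reach the one that precedes the address.
    for index, line in enumerate(lines):
        if start_marker in line:
            # Take a fixed number of rows after the marker, skipping blanks.
            block = lines[index + 1 : index + 1 + max_lines]
            return ", ".join(row for row in block if row)

    # Marker not found: the document does not match the expected layout.
    return None


# Made-up sample text standing in for the PDF reader's output.
sample = "Invoice 1001\nBill To\nJane Doe\n12 Main Street\nSpringfield\nTotal: 10.00"
address = extract_address(sample)
```

This only holds up when every document has nearly the same row structure, as our invoices did. If the layout varies, the marker or line count will need to change per template.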

The cloud methods that Ivo and Ruud suggested are also good options.