There are Java packages that extract text from an image (see the linked Stack Overflow thread).
Afterwards you could do a word count to check that a large enough amount of text could be read. If a document is not readable for the computer, then it will not be very readable for a human either.
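The word-count check could look roughly like the sketch below. It assumes the OCR step has already produced a plain `String`; the class name `ReadabilityCheck`, the `MIN_WORDS` threshold of 50, and the rule that a word is a run of at least two letters are all illustrative assumptions, not part of any OCR library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReadabilityCheck {
    // Hypothetical threshold; tune it for your own documents.
    static final int MIN_WORDS = 50;

    // Assumption: a "word" is a run of at least two letters,
    // which skips digits and most single-character OCR noise.
    private static final Pattern WORD = Pattern.compile("\\p{L}{2,}");

    // Counts the word-like tokens in the text returned by the OCR step.
    static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        Matcher matcher = WORD.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // True if enough words were recognized to treat the document as readable.
    static boolean isLikelyReadable(String ocrText, int minWords) {
        return countWords(ocrText) >= minWords;
    }

    public static void main(String[] args) {
        // Stand-in for the text produced by whichever OCR package you use.
        String ocrText = "Invoice number 12345 due on 1 March";
        System.out.println(countWords(ocrText));
        System.out.println(isLikelyReadable(ocrText, MIN_WORDS));
    }
}
```

A fixed threshold is the simplest choice; if your documents vary a lot in length, comparing the count against the page size or an expected word count per page may work better.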