Excel importer (app store) and Apache Tika

Hi,

We intend to use the Apache Tika library to extract text from our file documents. On its own this works fine. However, in an application that already uses the Excel importer app store module, Tika extraction fails for MS Office documents.

The issue seems to be caused by overlapping/duplicate jar files. The Excel importer ships the Apache POI libraries (dated 20091214), and the same libraries, in a more recent version, are also bundled in the Apache Tika jar. When I remove the jars that come with the Excel importer (poi-ooxml-3.6-20091214.jar and poi-ooxml-schemas-3.6-20091214.jar), the issue is resolved.

Since the Excel importer functionality is used frequently, I am a bit reluctant to remove the standard POI libraries and rely on the ones included in the Tika jar, even though compilation does not give any errors.

What is the best approach to ensure that all functionality of the Excel importer still works properly? Is testing the only way, or are there more advanced options?

Thanks,
Brian
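For anyone trying to confirm the duplicate-jar theory, one option is to ask the JVM which jar actually supplies a given class at runtime. The following is only a minimal sketch: `JarLocator` and `locate` are hypothetical names made up for this illustration, and the class names in `main` are examples of classes normally found in the POI and Tika jars, so adjust them to whatever your stack trace mentions.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of POI, Tika, or the Excel importer).
final class JarLocator {

    // Returns the location (usually a jar URL) the class was loaded from.
    static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "no code source (JDK/bootstrap class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Example class names (assumptions): one from poi, one from poi-ooxml, one from Tika.
        String[] names = {
            "org.apache.poi.ss.usermodel.Workbook",
            "org.apache.poi.xssf.usermodel.XSSFWorkbook",
            "org.apache.tika.Tika"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Running this inside the application (same classpath as the deployed project) shows whether the POI classes resolve to the 3.6 jars or to the copy bundled with Tika. It only reports which copy wins; it does not prove that the Excel importer is compatible with the newer version, so regression testing the importer is still needed.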
0 answers