tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Thanks for your contribution to [Apache Tika](https://tika.apache.org/)! Your help is appreciated! Before opening the pull request, please verify that * there is an open issue on the [Tika issue tracker](https://issues.apache.org/jira/projects/TIKA)...
... in the module tika-parsers (and submodules)
This change adds the FirstLanguage Translate API to Tika.
Add parsing support for AC1027 and AC1032. If `dwgread` is installed, use it to parse the document. Test data is from https://github.com/LibreDWG/libredwg/tree/master/test/test-data/2018
Adds Myanmar LanguageProfile for Apache Tika https://issues.apache.org/jira/browse/TIKA-3340
Although there is no issue in the issue tracker, hopefully this is okay to submit as it's trivial.
The SoX tool correctly pulls the duration from WAV files. I haven't seen any tests for external parsers anywhere. Sample output from `sox --info duration-test-3.wav`: Input File : 'duration-test-3.wav' Channels : 1...
Includes changes to pom.xml for FindBugs. To generate the report, run `mvn clean compile site` and check the `site` folder under `target`.