NIFI-12959: Support loading python processors from NARs
Summary
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
- [ ] Apache NiFi Jira issue created
Pull Request Tracking
- [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
- [ ] Pull Request commit message starts with Apache NiFi Jira issue number, as such `NIFI-00000`
Pull Request Formatting
- [ ] Pull Request based on current revision of the `main` branch
- [ ] Pull Request refers to a feature branch with one commit containing changes
Verification
Please indicate the verification steps performed prior to pull request creation.
Build
- [ ] Build completed using `mvn clean install -P contrib-check`
  - [ ] JDK 21
Licensing
- [ ] New dependencies are compatible with the Apache License 2.0 according to the License Policy
- [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
Documentation
- [ ] Documentation formatting appears as expected in rendered files
Nice improvement, Mark! Would this approach help with a processor like ParseDocument, where Tesseract has to be installed on the node first?
@pvillard31 no, it wouldn't really help there. The issue there is that a native library like Tesseract would need to be compiled for the correct OS and architecture. So in order to include that, we'd need to include many copies of the library. And currently, we do not set any sort of environment variables telling it to search for native libraries in the NAR's unpacked directory. It might be worth exploring as a future improvement, though. Even though it may not help for something like ParseDocument, it might be helpful at least for custom NARs, where perhaps you know that you're only going to run in containers or on a specific architecture/OS, so you can include the appropriate library for the custom NAR?
Yeah, I must admit that I tried to figure out a way to use ParseDocument with NiFi in a container and I didn't manage to find a clean way to bring Tesseract into it. I feel like it would probably be easier to have it as a side container or something, but I didn't find a way to piece everything together, so I was naively (without much hope) thinking that this improvement might help :)
@dan-s1 thanks, great catch. I removed some carriage-return-newline combos in that file and my IDE did something I didn't expect :) Fixed that.
Hi,
I found a bug. It occurs when the Python processor file name sorts before META-INF and NAR-INF in ASCII order, while following this structure:
my-nar.nar
+-- META-INF/
    +-- MANIFEST.MF
+-- NAR-INF/
    +-- bundled-dependencies/
        +-- dependency1
        +-- dependency2
        +-- etc.
+-- MyProcessor.py
You will get the following error:
java.io.FileNotFoundException: work\nar\extensions\nifi-AProcessor-nar-2.0.0-M4.nar-unpacked\MyProcessor.py (The system cannot find the path specified)
I found this while trying to convert the ChunkDocument Python processor, originally created as a Python extension, into a NAR package. I copied all the dependencies and created a MANIFEST.MF file, but I kept getting a FileNotFoundException. I noticed that if I put the file inside a directory I don't get this error, but I also don't see the processor in the NiFi UI. That could be another bug, because Jira ticket NIFI-12959 states that MyProcessor.py could be a module or a directory. It doesn't work!
If I rename the file to something that sorts after the META and NAR directories, such as MFChunkDocument (MF > ME, as in META), I don't get this error and the processor is available in the UI.
Hope that makes sense.
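To make the ordering in this report concrete, here is a minimal Python sketch that compares the entry names mentioned above in plain code-point (ASCII) order. It does not reproduce NiFi's NAR unpacking logic. The `.py` suffixes and the helper names `sorts_before` and `ordering_report` are assumptions made purely for illustration.

```python
# Entry names taken from the report above; the ".py" suffixes are assumed.
DIRECTORIES = ["META-INF/", "NAR-INF/"]
CANDIDATES = ["ChunkDocument.py", "MFChunkDocument.py", "MyProcessor.py"]


def sorts_before(name: str, other: str) -> bool:
    """Return True if name precedes other in plain code-point (ASCII) order."""
    return name < other


def ordering_report(candidates, directories):
    """Map each candidate name to the directories it would precede when sorted."""
    return {
        name: [d for d in directories if sorts_before(name, d)]
        for name in candidates
    }


if __name__ == "__main__":
    for name, preceded in ordering_report(CANDIDATES, DIRECTORIES).items():
        print(f"{name}: sorts before {preceded or 'nothing'}")
```

Under this comparison, only ChunkDocument.py precedes META-INF/, while MFChunkDocument.py and MyProcessor.py both fall between META-INF/ and NAR-INF/. That would suggest that sorting after META-INF is the part that matters, though this is an inference from the names in this one report and not a confirmed root cause.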