NIFI-12959: Support loading python processors from NARs

Open markap14 opened this issue 1 year ago • 4 comments

Summary

NIFI-00000

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • [ ] Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • [ ] Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • [ ] Pull Request based on current revision of the main branch
  • [ ] Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • [ ] Build completed using mvn clean install -P contrib-check
    • [ ] JDK 21

Licensing

  • [ ] New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • [ ] New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • [ ] Documentation formatting appears as expected in rendered files

markap14 avatar Mar 26 '24 20:03 markap14

Nice improvement Mark! Would this approach help with a processor like ParseDocument where Tesseract has to be installed on the node first?

pvillard31 avatar Mar 27 '24 10:03 pvillard31

@pvillard31 no, it wouldn't really help there. The issue there is that a native library like Tesseract would need to be compiled for the correct OS and architecture. So in order to include that, we'd need to include many copies of the library. And currently, we do not set any sort of environment variables telling it to search for native libraries in the NAR's unpacked directory. It might be worth exploring as a future improvement, though. Even though it may not help for something like ParseDocument, it might be helpful at least for custom NARs, where perhaps you know that you're only going to run in containers or on a specific architecture/OS, so you can include the appropriate library for the custom NAR.

markap14 avatar Mar 27 '24 13:03 markap14

Yeah, I must admit that I tried to figure out a way to use ParseDocument with NiFi in a container and I didn't manage to find a clean way to bring Tesseract into it. I feel like it would probably be easier to have it as a side container or something, but I didn't find a way to piece everything together, so I was naively (without too much hope) thinking that this improvement might help :)

pvillard31 avatar Mar 27 '24 13:03 pvillard31

@dan-s1 thanks, great catch. I removed some carriage-return-newline combos in that file and my IDE did something I didn't expect :) Fixed that.

markap14 avatar Mar 27 '24 13:03 markap14

Hi,

I found a bug. It happens when the Python processor file name comes earlier in ASCII order than META-INF and NAR-INF, while following this structure:

my-nar.nar
+-- META-INF/
    +-- MANIFEST.MF
+-- NAR-INF/
    +-- bundled-dependencies/
        +-- dependency1
        +-- dependency2
        +-- etc.
+-- MyProcessor.py

You will get the following error:

java.io.FileNotFoundException: work\nar\extensions\nifi-AProcessor-nar-2.0.0-M4.nar-unpacked\MyProcessor.py (The system cannot find the path specified)

I found this while trying to convert the ChunkDocument Python processor, originally created as a Python extension, into a NAR package. I copied all the dependencies and created a MANIFEST.MF file, but I kept getting the FileNotFoundException. I noticed that if I put that file inside a directory I don't get this error, but then I don't see the processor in the NiFi UI. That could be another bug, because Jira ticket NIFI-12959 states that MyProcessor.py could be a module or a directory. It doesn't work!
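
For anyone trying to reproduce this, it may help to look at the order in which entries are stored in the archive. The sketch below assumes the NAR is a standard zip-format archive and that the stored entry order is relevant to the error; neither is confirmed in this thread. The helper names `build_archive` and `entry_order` are hypothetical, and the archive is built in memory with empty placeholder files rather than a real NAR.

```python
import io
import zipfile


def build_archive(names):
    """Write empty placeholder entries to an in-memory zip, in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")  # placeholder content, not a real manifest
    buf.seek(0)
    return buf


def entry_order(archive):
    """Return entry names in the order they are stored in the archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


# Entry names follow the layout shown earlier in this comment.
names = [
    "META-INF/MANIFEST.MF",
    "NAR-INF/bundled-dependencies/dependency1",
    "MyProcessor.py",
]
for name in entry_order(build_archive(names)):
    print(name)
```

To inspect a real file, `entry_order` can be given a path to the `.nar` instead of the in-memory buffer.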

If I rename the file to something later in the order than the META-INF & NAR-INF directories, such as MFChunkDocument (MF > ME, as in META), I don't get this error and the processor is available in the UI.
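
The ordering described above can be checked with a plain string comparison. This is only an illustration of ASCII (code point) ordering, not NiFi's unpacking code. The file name `ChunkDocument.py` is an assumption about what the original file was called.

```python
# Illustration only: compare entry names by code point (ASCII) order.
# "ChunkDocument.py" is an assumed original file name; the others come
# from the structure and rename described in this comment.
entries = ["META-INF/", "NAR-INF/", "ChunkDocument.py", "MFChunkDocument.py"]

ordered = sorted(entries)  # Python compares str values by code point
for name in ordered:
    print(name)

# Position of each candidate file name relative to the two directories.
for candidate in ("ChunkDocument.py", "MFChunkDocument.py"):
    print(
        candidate,
        "after META-INF/:", candidate > "META-INF/",
        "after NAR-INF/:", candidate > "NAR-INF/",
    )
```

Under this comparison, `MFChunkDocument.py` sorts after `META-INF/` but still before `NAR-INF/`. That may mean only the position relative to META-INF matters, though that is a guess.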

Hope that makes sense.

samer1977 avatar Oct 13 '24 22:10 samer1977