neqsim-python Pandas support

Dear Even Solbraa,

Thank you for putting this very nice library online. Do you have plans in the future to add support for pandas dataframes? Running the code to calculate the fluid properties for a single point (T, p) takes a few seconds - due to the connection with Java I guess - so I am a bit afraid of the time it will take to run the code on a (large) dataframe.

Best regards,

Mar 31 '20 07:03 UniqASL

Yes, there will be more integration of pandas dataframes in future releases. Support for creating fluids via dataframes was added recently.

See some examples in the Colab sheet: https://colab.research.google.com/github/EvenSol/NeqSim-Colab/blob/master/notebooks/PVT/PVTreports.ipynb

or:

https://github.com/equinor/neqsimpython/blob/master/examples/createFluid.py

Apr 16 '20 13:04 EvenSol

The linked Colab page contains a benchmark that runs 5000 multiphase calculations for a simple gas/oil/water fluid. 5000 TPflash calculations take about 5-6 s in Colab/Python.

https://colab.research.google.com/drive/1JXaqqj1qkriY_DqT8nCpf0tCjNEcCvEW

Apr 16 '20 13:04 EvenSol

See this example filling a dataframe with properties:

https://github.com/equinor/neqsimpython/blob/master/examples/propertiesDataframes.py

Apr 16 '20 14:04 EvenSol

Thank you very much for taking the time to do that. I tried your last example. With a list of 1,000 values, it runs in approx. 14 s on my laptop. When I increase this to 10,000 (roughly one year of data at an hourly frequency), it takes > 2 min.

The problem, I guess, is that when the function calcProperties is applied to the df, Python makes a call to Java for every single point, which slows the calculations down. One option I can imagine would be to send the entire df to Java and then get the results back as a list or df once Java is done calculating everything.
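To make the idea concrete, here is a minimal sketch of the difference between per-point and batched calls. It does not use the neqsim API: `FakeBridge`, `calc_point` and `calc_batch` are hypothetical names, and the ideal-gas density formula is only a placeholder for the real flash calculation. The sketch only counts how often Python would cross over to the Java side in each approach.

```python
class FakeBridge:
    """Hypothetical stand-in for a Java-backed calculation (not the neqsim API)."""

    def __init__(self):
        self.calls = 0  # number of simulated Python -> Java crossings

    def calc_point(self, temperature, pressure):
        # One crossing per (T, p) point.
        self.calls += 1
        # Placeholder physics: ideal-gas density with R = 287 J/(kg K).
        return {"density": pressure / (287.0 * temperature)}

    def calc_batch(self, temperatures, pressures):
        # One crossing for the whole series; the loop runs on the "other side".
        self.calls += 1
        return [
            {"density": p / (287.0 * t)}
            for t, p in zip(temperatures, pressures)
        ]


temperatures = [280.0 + 0.01 * i for i in range(1000)]  # K
pressures = [1.0e5] * 1000                              # Pa

per_point = FakeBridge()
rows_a = [per_point.calc_point(t, p) for t, p in zip(temperatures, pressures)]

batched = FakeBridge()
rows_b = batched.calc_batch(temperatures, pressures)

print(per_point.calls, batched.calls)  # 1000 1
```

Both approaches return the same list of dicts, which could then be turned into a dataframe with `pandas.DataFrame(rows)`. If the fixed cost per crossing dominates, the batched variant should scale much better, but that would have to be measured with the real library.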

Apr 17 '20 16:04 UniqASL

Yes, there will be some overhead when there are many calls to Java from Python. The benchmark (https://equinor.github.io/neqsimhome/benchmark.html) indicates that the calculations run 2-3 times faster directly in Java than via Python. I will look into the reason for this and whether it can be improved. I guess every call to a Java method has some overhead (even just reading a property), and that this can be reduced by returning more information in each call. If the calculation involves a process simulation for each time step (instead of just a flash and returning the properties of a fluid), I guess this overhead will be less significant.

I will look into your suggestion of sending the whole dataframe. Thanks for the suggestion.

Apr 17 '20 17:04 EvenSol

A new method has been implemented to fill a dataframe based on a list of temperatures and pressures (method 2 in the example):

https://github.com/equinor/neqsimpython/blob/master/examples/propertiesDataframes.py

The dataframes in the PySpark project will probably be a better solution in future work. This can be looked into when PySpark 3.0 is released (the current version is based on an older version of py4j).

Apr 21 '20 21:04 EvenSol

Thanks for this! The second method is slightly faster even though it returns many more results than method 1 (18 with method 1, 63 with method 2). The overall calculation still remains quite slow, however (~50 s with method 2 for 1000 values). Maybe PySpark will improve that!

On another note, I noticed that you do your imports inside the code itself. Normally in Python you should import everything at the beginning of the file. Moreover, it is recommended to import entire modules. You can have a look here for instance, section "import".
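A small illustration of what I mean, using only the standard library (the function names are made up for this example and have nothing to do with neqsim):

```python
import math  # module-level import, placed at the top of the file


def circle_area(radius):
    # The module-qualified name makes it obvious where pi comes from.
    return math.pi * radius ** 2


def circle_area_local_import(radius):
    # Works as well, but the dependency is hidden inside the function body.
    from math import pi
    return pi * radius ** 2
```

Both functions give the same result; the first version just makes the dependencies of the file visible at a glance.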

Apr 27 '20 12:04 UniqASL