pyfilesystem2

Include in-memory fs into python path

Open Make42 opened this issue 5 years ago • 19 comments

I would like to write a module, which I decrypt during program execution, into an in-memory filesystem, so that the decrypted file is never on disk. Then I would like to import that module. How can I add the path of the file in the in-memory fs of pyfilesystem2 to the Python path? Usually I would do something like

import sys
sys.path.append('path/to/module')

but I don't see how this would work here.

Make42 avatar Apr 23 '20 15:04 Make42

Your best bet is probably to write a custom importer, as defined in PEP 302, that supports a pyfilesystem.

althonos avatar Apr 23 '20 16:04 althonos

I looked into that, but I am a bit out of my depth here and there seems to be no tutorial on how to approach this. Do you know of any tutorial, other than the official documentation (which is either not deep enough or too deep), that could help me solve this?

Make42 avatar Apr 23 '20 19:04 Make42

So it sounds like you 'only' want to be able to load a Python module which doesn't exist as plain text on disk? It's something I've never looked into myself (so I can't offer any advice), but from a quick search it looks like @althonos has suggested the best approach. However, why does pyfilesystem need to be involved? Might it be easier to just have your custom importer do the on-the-fly decryption?

Ooh! I just remembered about this in the old (deprecated) PyFilesystem1 - maybe that'll provide a good starting point for you?

lurch avatar Apr 23 '20 19:04 lurch

@Make42 An import hook is probably the way to go, and the code @lurch found is a good starting point if you want to port that to fs2.

Another option if the code is a single self-contained file would be to read the code in to memory and exec it. Of course, you would have to be absolutely certain that you can trust the code, but it would be no worse than importing it.

Yet another option, which may or may not fit your use case, is to register a custom codec with the codecs module. You can then add an encoding declaration to your code, e.g. # coding=encrypted, and let Python invoke the decryption for you.
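As a rough illustration of the registration step only, here is a minimal sketch using `codecs.register`. The XOR transform, the `KEY` value, and the codec name `encrypted` are hypothetical placeholders standing in for real decryption; wiring the codec into a `# coding=` source declaration is not shown here.

```python
import codecs

KEY = 0x5A  # hypothetical single-byte key; a toy stand-in for real decryption


def _xor(data: bytes) -> bytes:
    """Toy XOR transform; applying it twice restores the original bytes."""
    return bytes(b ^ KEY for b in data)


def _encode(text, errors="strict"):
    # Text -> "encrypted" bytes; returns (output, number of characters consumed).
    return _xor(text.encode("utf-8", errors)), len(text)


def _decode(data, errors="strict"):
    # "Encrypted" bytes -> text; returns (output, number of bytes consumed).
    raw = bytes(data)
    return _xor(raw).decode("utf-8", errors), len(raw)


def _search(name):
    # The codecs machinery calls this with the normalized, lower-case name.
    if name == "encrypted":
        return codecs.CodecInfo(encode=_encode, decode=_decode, name="encrypted")
    return None  # not ours; let other search functions handle it


codecs.register(_search)

# Round trip through the registered codec.
source = "ANSWER = 42\n"
blob = codecs.encode(source, "encrypted")
restored = codecs.decode(blob, "encrypted")
```

The search function must be registered before anything tries to decode with the codec, so in practice it would have to run early in program start-up.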

willmcgugan avatar Apr 23 '20 21:04 willmcgugan

Something else to bear in mind when dealing with "security sensitive code" is ensuring that you don't lull yourself into a false sense of security :police_officer: E.g. even if "the decrypted file is not on disk" that doesn't do you much good if the user is able to gain access to the encryption key and "manually" decrypt the encrypted file :wink:

lurch avatar Apr 23 '20 21:04 lurch

I read https://dev.to/dangerontheranger/dependency-injection-with-import-hooks-in-python-3-5hap but this does not solve the issue: there the author builds an import hook and "imports" a class that has been defined as code.

> Ooh! I just remembered about this in the old (deprecated) PyFilesystem1 - maybe that'll provide a good starting point for you?

Thanks! I will look into it. Is this still available in PyFilesystem2? Is the documentation still valid?

> However, why does pyfilesystem need to be involved? Might be easier to just have your custom importer do the on-the-fly decryption?

@lurch Yes, you are right. What I am doing is to have a companion file (let's not worry about the accessibility of this one for the sake of argument for now) which is able to decrypt the file. Now the file is in memory as a Python file object myfile, as in

with open('myfile.epyd', 'rb') as file:  # binary mode, since a .pyd is not text
    myfile = decrypt(file)

Here myfile.epyd is a pyd file that I encrypted and put on disk. There is also a file myfile.pyi. If I did not encrypt, I would have the two files myfile.pyi and myfile.pyd and I would simply import with import myfile. But since I encrypted, I have this object myfile which I want to import but do not know how. So, my idea is to write it into an in-memory fs as myfile.pyd together with myfile.pyi and give the import system the chance to find those files in the in-memory fs.

I created the pyd and pyi file with Nuitka, btw.

> Another option if the code is a single self-contained file would be to read the code in to memory and exec it. Of course, you would have to be absolutely certain that you can trust the code, but it would be no worse than importing it.

@willmcgugan I considered that as well. But, as I wrote earlier, it is not a string but a pyd file, so I am not able to use exec.

> Yet another option that may or may not fit your use case, is to register a custom codec with the codecs module. You can then add an encoding declaration to your code, e.g. # coding=encrypted, and let Python invoke the decryption for you.

I was not aware of the codecs module. Might have to look into it.

> E.g. even if "the decrypted file is not on disk" that doesn't do you much good if the user is able to gain access to the encryption key and "manually" decrypt the encrypted file 😉

@lurch: Yes, I am aware of that :-).

Make42 avatar Apr 23 '20 22:04 Make42

@lurch How can I download and install the original PyFilesystem - from the https://pypi.org/project/fs1/ repository? After all, PyFilesystem2 does not have this hook. EDIT: I think I got this part to work:

I installed via pip install fs1, then changed the project using 2to3, but now I get

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2019.1.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "C:\Users\richard.leibrandt\AppData\Local\Continuum\miniconda3\envs\Klassierung\lib\site-packages\fs1\expose\importhook.py", line 186, in load_module
    exec(code in mod.__dict__)
TypeError: exec() arg 1 must be a string, bytes or code object

caused by the line exec(code in mod.__dict__) in "fs/expose/importhook.py", line 186. This is not surprising, because code in mod.__dict__ evaluates to False. But what is the point of passing a boolean membership check to an exec call anyway?

This looks like a bug to me. Should it have been exec(code)?

I replaced the existing line with that, and now it seems to work for the example. However, this approach with exec only works if the files are .py or .pyc. As I just found out, this is actually stated in the documentation:

FSImportHook is a module finder and loader that takes its data from an arbitrary FS object. The FS must have .py or .pyc files stored in the standard module structure.

So I am not seeing how this would help with my .pyd object, since I cannot use exec.

Make42 avatar Apr 24 '20 09:04 Make42

The PyFilesystem1 code can be found at https://github.com/PyFilesystem/pyfilesystem but I'm afraid I have no idea about any of the rest of your questions - @willmcgugan may be better able to answer them.

lurch avatar Apr 24 '20 11:04 lurch

You're in for a ride.

.pyd files are Windows DLLs. You need to be able to get Windows to load and initialize a DLL from memory, which to my knowledge cannot be done purely from Python as it requires accessing the Windows API. There's an old project here that uses shellcode to install a memory importer from the py2exe project as a module. Specifically, this file.

The py2exe project hasn't had a commit in over three years so I'm assuming it's dead. Given the age, I don't know if it'll work if you install it as a dependency using a modern version of Python 3. It seems like a pretty heavyweight dependency to install too. Give that a try?

If that doesn't work, try using this. You'd have to compile it and build it yourself and install it/static link it with your program, and then call it using ctypes. A joy, for sure, but possible.

dargueta avatar Apr 24 '20 16:04 dargueta

@dargueta From my research I am aware of [pymemimporter](https://github.com/n1nj4sec/pymemimporter), but it is only available for 32-bit Windows and Python 2.

The final option that you mentioned [fancycode/MemoryModule] is what pymemimporter is building on. We know a C developer who might help me a bit, but... yeah... what joy.

I might have to consider dropping the approach of loading the .pyd. Instead I would archive my project, put the archive into the in-memory fs, unpack it there, and then import the project from there using a custom import hook. But for that I would still need a custom import hook, which brings me back to my original question, I guess.

Make42 avatar Apr 25 '20 14:04 Make42

Have you tried this: https://blog.ffledgling.com/python-imports-i.html

You really just need to define a class with two methods, and to put them in the meta path.
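For reference, a minimal sketch of such a class for current Python 3, where PEP 302's original find_module/load_module pair has been superseded by find_spec and exec_module. The `SOURCES` dict is a hypothetical stand-in for an in-memory filesystem, and `InMemoryImporter` and `secret_mod` are illustrative names, not part of PyFilesystem.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory "filesystem": module name -> source text.
SOURCES = {"secret_mod": "def greet():\n    return 'hello from memory'\n"}


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one class, backed by the SOURCES dict."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours; let the other finders on sys.meta_path try

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        # Compile the in-memory source and run it inside the module's namespace.
        filename = "<memory:%s>" % module.__name__
        code = compile(SOURCES[module.__name__], filename, "exec")
        exec(code, module.__dict__)


# Put the importer on the meta path so the import statement consults it.
sys.meta_path.insert(0, InMemoryImporter())

import secret_mod  # resolved by InMemoryImporter, not from disk
```

To read from a real in-memory filesystem instead, the dict lookups in find_spec and exec_module would be replaced by existence checks and reads against that filesystem. This only covers source modules; it does nothing for .pyd files.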

althonos avatar Apr 25 '20 15:04 althonos

@althonos I have tried something similar, which was dev.to/dangerontheranger, but by reading the post you suggested, I think, I am getting a better grasp.

A mistake I might have made is changing exec(code in mod.__dict__) into exec(code), because we need to execute the code inside the namespace of mod.__dict__, which the ffledgling post mentions. I am not sure how to do this properly though... What is the correct code here?

If I get this right, I might actually be able to even load the pyd instead of a py later: A colleague figured out how to initialize and run a pyd using its internal C-functions. Maybe we just have to do this instead of using exec when using a pyd.

I am out of office next week, but I will continue working on this the week after.

Make42 avatar Apr 26 '20 15:04 Make42

> A mistake I might have made is changing exec(code in mod.__dict__) into exec(code)

I suspect the invalid exec(code in mod.__dict__) is probably a case of the 2to3 converter getting confused by Python2-only syntax? The original line was exec code in mod.__dict__ and https://docs.python.org/2/reference/simple_stmts.html#exec suggests that the Python3-compatible version of that would be exec(code, mod.__dict__) ? :shrug: (but that's just a guess, I've not actually tried it!)

lurch avatar Apr 27 '20 10:04 lurch

@lurch Exactly! The signature is basically exec(code, globals=None, locals=None), so by passing mod.__dict__ as the second positional argument you make the source code execute in that namespace.

In Python 2, exec used to be a keyword, so the syntax was exec <code> in <globals>.
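A small sketch of the difference, using a made-up module name `demo_mod` and a one-line source string as assumptions. It shows both what the mangled call actually passes to exec and what the intended call does.

```python
import types

source = "VALUE = 6 * 7\n"  # hypothetical module source
code = compile(source, "<in-memory demo_mod>", "exec")

mod = types.ModuleType("demo_mod")

# exec(code in mod.__dict__) first evaluates the membership test, so exec
# receives a bool rather than a code object, hence the TypeError.
broken_arg = code in mod.__dict__

# The Python 3 equivalent of Python 2's `exec code in mod.__dict__`:
# run the code with the module's dict as its global namespace.
exec(code, mod.__dict__)
```

After the last line, names defined by the source live on `mod` (for example `mod.VALUE`) rather than in the caller's namespace, which is what an import hook needs.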

althonos avatar Apr 27 '20 11:04 althonos

Just did a quick search, and it looks like https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_exec.py should be able to correctly convert the Python2 exec-statement to the Python3 exec-function (and that part of 2to3 hasn't changed in 4 years), so I dunno why it didn't work properly for @Make42 :confused: However I have neither the time nor motivation to investigate further :wink:

lurch avatar Apr 27 '20 11:04 lurch

> A mistake I might have made is changing exec(code in mod.__dict__) into exec(code)

> I suspect the invalid exec(code in mod.__dict__) is probably a case of the 2to3 converter getting confused by Python2-only syntax? The original line was exec code in mod.__dict__ and https://docs.python.org/2/reference/simple_stmts.html#exec suggests that the Python3-compatible version of that would be exec(code, mod.__dict__) ? :shrug: (but that's just a guess, I've not actually tried it!)

@lurch: Well, originally the line was exec code in mod.__dict__ and considering all your links, this is most likely very true. I will try it next week. Currently I am also not interested in 2to3 in and of itself. :smile: @althonos: Thanks for the additional explanation!

I will write how I fare with this new knowledge.

Make42 avatar Apr 27 '20 15:04 Make42

I managed to get this running for the conventional case, which is to read modules from the PyFilesystem's in-memory file system using PyFilesystem's mechanics and then using exec. This enables me to load .py files.

However, I am not able to simply add PyFilesystem's in-memory fs to the Python path. I wanted to do this in order to load .dlls (here, .pyd modules built by Nuitka) with the available importers, but I have not found a way to make Python treat PyFilesystem's in-memory fs as part of the PYTHONPATH.

Is this possible?

Make42 avatar May 20 '20 09:05 Make42

> This enables me to load .py files.

Hooray! :tada:

> Is this possible?

I guess this comes back to @dargueta 's earlier comment where he said ".pyd files are Windows DLLs. You need to be able to get Windows to load and initialize a DLL from memory, which to my knowledge cannot be done purely from Python as it requires accessing the Windows API" ? :shrug:

lurch avatar May 20 '20 10:05 lurch

Don't know if the issue is still relevant for @Make42, but I stumbled upon this and wanted to point to https://github.com/n1nj4sec/pymemimporter. For your use case this might be an acceptable way to import the decrypted pyd in memory, by executing ShellCodeMemoryModule (https://blog.didierstevens.com/programs/shellcode/). You can't use a temporary fs path to import a pyd without this kind of trick, because Python calls LoadLibrary under the hood to import pyds, and LoadLibrary takes a path from the OS filesystem.

You could also have a look at https://github.com/scythe-io/in-memory-cpython. This is a different strategy: here the interpreter has been modified to allow importing pyds from memory. Personally I like to use the signed interpreter for my apps, but this can also be a good choice, even if it is a bit more complex than the former.

naksyn avatar Nov 12 '22 00:11 naksyn