Include in-memory fs into python path
I would like to write a module, which I decrypt during program execution, into an in-memory filesystem, so that the decrypted file is never on disk. Then I would like to import that module. How can I add the path of the file in the in-memory fs of pyfilesystem2 to the Python path? Usually I would do something like
import sys
sys.path.append('path/to/module')
but I don't see how this would work here.
Your best bet is probably to write a custom importer here, like defined in PEP 302, that supports a pyfilesystem.
I looked into that, but I am a bit out of my depth here and there seems to be no tutorial on how to approach this. Do you know of any tutorial besides the official documentation (which is either not deep enough or too deep) that could help me solve this?
So it sounds like you 'only' want to be able to load a Python module which doesn't exist as plain text on disk? It's something I've never looked into myself (so I can't offer any advice), but from doing a quick search it looks like @althonos has suggested the best approach. However, why does pyfilesystem need to be involved? Might it be easier to just have your custom importer do the on-the-fly decryption?
Ooh! I just remembered about this in the old (deprecated) PyFilesystem1 - maybe that'll provide a good starting point for you?
@Make42 An import hook is probably the way to go, and the code @lurch found is a good starting point if you want to port that to fs2.
Another option if the code is a single self-contained file would be to read the code in to memory and exec it. Of course, you would have to be absolutely certain that you can trust the code, but it would be no worse than importing it.
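A minimal sketch of that exec-based option, assuming the decrypted source is already available as a string. The helper name load_module_from_source, the module name inmem_demo, and the sample source are hypothetical and only serve as illustration:

```python
import sys
import types


def load_module_from_source(name, source):
    """Build a module object from source text held in memory (hypothetical helper)."""
    module = types.ModuleType(name)
    code = compile(source, f"<memory:{name}>", "exec")
    # Register first so that the code can see itself in sys.modules while running.
    sys.modules[name] = module
    try:
        # Run the code inside the module's own namespace.
        exec(code, module.__dict__)
    except BaseException:
        # Do not leave a half-initialised module behind.
        del sys.modules[name]
        raise
    return module


# Stand-in for the output of the decryption step.
decrypted_source = "VALUE = 21 * 2\n"
mod = load_module_from_source("inmem_demo", decrypted_source)

# Because the module is registered in sys.modules, a normal import finds it.
import inmem_demo
```

This only covers pure-Python source; it does not help with compiled extension modules.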
Yet another option, which may or may not fit your use case, is to register a custom codec with the codecs module. You can then add an encoding declaration to your code, e.g. # coding=encrypted, and let Python invoke the decryption for you.
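A rough sketch of how a codec could be registered with the codecs module. The codec name xordemo, the key, and the XOR "cipher" are all made up for illustration and are not real encryption. The sketch only shows the registration and a round trip; wiring it into source-file decoding has further constraints (the codec must be registered before the module is imported, for instance) that are not shown here:

```python
import codecs

KEY = 0x5A  # hypothetical single-byte key, for illustration only


def _xor(data: bytes) -> bytes:
    return bytes(b ^ KEY for b in data)


def xor_decode(data, errors="strict"):
    raw = bytes(data)
    # A decoder returns (decoded text, number of input bytes consumed).
    return _xor(raw).decode("utf-8", errors), len(raw)


def xor_encode(text, errors="strict"):
    # An encoder returns (encoded bytes, number of input characters consumed).
    return _xor(text.encode("utf-8", errors)), len(text)


def _search(name):
    # Called by the codec registry with the normalised codec name.
    if name == "xordemo":
        return codecs.CodecInfo(encode=xor_encode, decode=xor_decode, name="xordemo")
    return None


codecs.register(_search)

secret = codecs.encode("x = 1\n", "xordemo")
plain = codecs.decode(secret, "xordemo")
```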
Something else to bear in mind when dealing with "security sensitive code" is ensuring that you don't lull yourself into a false sense of security :police_officer: E.g. even if "the decrypted file is not on disk" that doesn't do you much good if the user is able to gain access to the encryption key and "manually" decrypt the encrypted file :wink:
I read https://dev.to/dangerontheranger/dependency-injection-with-import-hooks-in-python-3-5hap but this does not solve the issue: there the author builds an import hook and "imports" a class that has been defined as code.
Ooh! I just remembered about this in the old (deprecated) PyFilesystem1 - maybe that'll provide a good starting point for you?
Thanks! I will look into it. Is this still available in PyFilesystem2? Is the documentation still valid?
However, why does pyfilesystem need to be involved? Might be easier to just have your custom importer do the on-the-fly decryption?
@lurch Yes, you are right. What I am doing is to have a companion file (let's not worry about the accessibility of this one for the sake of argument for now) which is able to decrypt the file. Now the file is in memory as a Python file object myfile, as in
with open('myfile.epyd', 'rb') as file:
    myfile = decrypt(file)
Here myfile.epyd is a pyd file that I encrypted and put on disk. There is also a file myfile.pyi. If I did not encrypt, I would have the two files myfile.pyi and myfile.pyd and I would simply import with import myfile. But since I encrypted, I have this object myfile, which I want to import but do not know how. So my idea is to write it into an in-memory fs as myfile.pyd together with myfile.pyi and give the import system the chance to find those files in the in-memory fs.
I created the pyd and pyi file with Nuitka, btw.
Another option if the code is a single self-contained file would be to read the code in to memory and exec it. Of course, you would have to be absolutely certain that you can trust the code, but it would be no worse than importing it.
@willmcgugan As I wrote earlier, it is not a string but a pyd file, so I am not able to use exec. I considered that as well.
Yet another option, which may or may not fit your use case, is to register a custom codec with the codecs module. You can then add an encoding declaration to your code, e.g. # coding=encrypted, and let Python invoke the decryption for you.
I was not aware of the codecs module. Might have to look into it.
E.g. even if "the decrypted file is not on disk" that doesn't do you much good if the user is able to gain access to the encryption key and "manually" decrypt the encrypted file 😉
@lurch: Yes, I am aware of that :-).
@lurch How can I download and install the original PyFilesystem? From the https://pypi.org/project/fs1/ repository? After all, PyFilesystem2 does not have this hook. EDIT: I think I got this part to work:
I installed via pip install fs1, then changed the project using 2to3, but now I get
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2019.1.3\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "C:\Users\richard.leibrandt\AppData\Local\Continuum\miniconda3\envs\Klassierung\lib\site-packages\fs1\expose\importhook.py", line 186, in load_module
    exec(code in mod.__dict__)
TypeError: exec() arg 1 must be a string, bytes or code object
caused by the line exec(code in mod.__dict__) in "fs/expose/importhook.py", line 186. This is not surprising, because code in mod.__dict__ is False. But what is the point of a check that returns a boolean inside an exec call anyway?
This looks like a bug to me. Should it have been exec(code)? Here is what I did:
I replaced the existing line with that and now it seems to work for the example. However, this approach with exec only works if the files are .py or .pyc. As I just found out, this is actually stated in the documentation:
FSImportHook is a module finder and loader that takes its data from an arbitrary FS object. The FS must have .py or .pyc files stored in the standard module structure.
So I am not seeing how this would help with my .pyd object, since I cannot use exec.
The PyFilesystem1 code can be found at https://github.com/PyFilesystem/pyfilesystem but I'm afraid I have no idea about any of the rest of your questions - @willmcgugan may be able to better answer them.
You're in for a ride.
.pyd files are Windows DLLs. You need to be able to get Windows to load and initialize a DLL from memory, which to my knowledge cannot be done purely from Python as it requires accessing the Windows API. There's an old project here that uses shellcode to install a memory importer from the py2exe project as a module. Specifically, this file.
The py2exe project hasn't had a commit in over three years so I'm assuming it's dead. Given the age, I don't know if it'll work if you install it as a dependency using a modern version of Python 3. It seems like a pretty heavyweight dependency to install too. Give that a try?
If that doesn't work, try using this. You'd have to compile it and build it yourself and install it/static link it with your program, and then call it using ctypes. A joy, for sure, but possible.
@dargueta Due to all my research I am aware of pymemimporter (https://github.com/n1nj4sec/pymemimporter), but it is only available for 32-bit Windows and Python 2.
The final option that you mentioned [fancycode/MemoryModule] is what pymemimporter is building on. We know a C developer who might help me a bit, but... yeah... what joy.
I might have to consider dropping the approach of loading the .pyd and instead archiving my project, putting the archive into the in-memory fs, unpacking it there, and then importing the project from there using a custom import hook. But for that I would still need a custom import hook, which brings me back to my original question, I guess.
Have you tried this: https://blog.ffledgling.com/python-imports-i.html
You really just need to define a class with two methods, and to put them in the meta path.
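As a rough sketch of what such a class might look like with the current importlib API (find_spec plus exec_module, rather than the older find_module/load_module pair): the class name DictImporter and the module name secret_mod are hypothetical, and a plain dict stands in for the in-memory filesystem, so this is an illustration under those assumptions and not the PyFilesystem hook itself.

```python
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one class, serving modules from a dict of source strings."""

    def __init__(self, sources):
        # Maps module name -> source text; a stand-in for an in-memory filesystem.
        self._sources = dict(sources)

    def find_spec(self, fullname, path=None, target=None):
        # Returning None lets the remaining finders on sys.meta_path try.
        if fullname not in self._sources:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # None asks the import system to create a default module object.
        return None

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        # Execute inside the module's namespace.
        exec(code, module.__dict__)


importer = DictImporter({"secret_mod": "ANSWER = 42\n"})
sys.meta_path.insert(0, importer)

import secret_mod
```

The same shape should carry over to a real in-memory filesystem by replacing the dict lookup with a read from that filesystem. It still only handles Python source, not .pyd files.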
@althonos I have tried something similar, which was dev.to/dangerontheranger, but by reading the post you suggested, I think, I am getting a better grasp.
A mistake I might have done is changing exec(code in mod.__dict__) into exec(code), because we need to execute the code inside the namespace of mod.__dict__, which the ffledgling post mentions. I am not sure how to do this properly though... What is the correct code here?
If I get this right, I might actually be able to even load the pyd instead of a py later: A colleague figured out how to initialize and run a pyd using its internal C-functions. Maybe we just have to do this instead of using exec when using a pyd.
I am out of office next week, but I will continue working on this the week after.
A mistake I might have done is changing exec(code in mod.__dict__) into exec(code)
I suspect the invalid exec(code in mod.__dict__) is probably a case of the 2to3 converter getting confused by Python2-only syntax? The original line was exec code in mod.__dict__ and https://docs.python.org/2/reference/simple_stmts.html#exec suggests that the Python3-compatible version of that would be exec(code, mod.__dict__) ? :shrug: (but that's just a guess, I've not actually tried it!)
@lurch Exactly! The signature is basically exec(code, globals=None, locals=None), so by passing mod.__dict__ as the second positional argument you make the source code execute in that namespace.
In Python 2, exec used to be a keyword, so the syntax was exec <code> in <globals>.
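A small sketch contrasting the mis-converted call with the intended one. The module name demo_mod and the sample source are hypothetical:

```python
import types

source = "def greet():\n    return 'hello'\n"
code = compile(source, "<demo_mod>", "exec")
mod = types.ModuleType("demo_mod")

# The mis-converted form: `code in mod.__dict__` is a membership test that
# evaluates to False, so exec() receives a bool and raises TypeError.
broken_arg = code in mod.__dict__
try:
    exec(broken_arg)
    raised = False
except TypeError:
    raised = True

# The Python 3 equivalent of the Python 2 statement `exec code in mod.__dict__`:
# run the code with the module's dict as its globals.
exec(code, mod.__dict__)
```

A plain exec(code) would instead run the code in the caller's namespace, so the definitions would not end up on the module.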
Just did a quick search, and it looks like https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_exec.py should be able to correctly convert the Python2 exec-statement to the Python3 exec-function (and that part of 2to3 hasn't changed in 4 years), so I dunno why it didn't work properly for @Make42 :confused: However I have neither the time nor motivation to investigate further :wink:
A mistake I might have done is changing exec(code in mod.__dict__) into exec(code)
I suspect the invalid exec(code in mod.__dict__) is probably a case of the 2to3 converter getting confused by Python2-only syntax? The original line was exec code in mod.__dict__ and https://docs.python.org/2/reference/simple_stmts.html#exec suggests that the Python3-compatible version of that would be exec(code, mod.__dict__) ? :shrug: (but that's just a guess, I've not actually tried it!)
@lurch: Well, originally the line was exec code in mod.__dict__ and considering all your links, this is most likely very true. I will try it next week. Currently I am also not interested in 2to3 in and of itself. :smile:
@althonos: Thanks for the additional explanation!
I will write how I fare with this new knowledge.
I managed to get this running for the conventional case, which is to read modules from the PyFilesystem's in-memory file system using PyFilesystem's mechanics and then using exec. This enables me to load .py files.
However, I am not able to simply add PyFilesystem's in-memory fs to the Python path. The reason I wanted to do this was to load DLLs (here, .pyd modules built by Nuitka) with the available importers. That does not work, since I am not able to make Python believe that PyFilesystem's in-memory fs is part of the PYTHONPATH.
Is this possible?
This enables me to load .py files.
Hooray! :tada:
Is this possible?
I guess this comes back to @dargueta 's earlier comment where he said ".pyd files are Windows DLLs. You need to be able to get Windows to load and initialize a DLL from memory, which to my knowledge cannot be done purely from Python as it requires accessing the Windows API" ? :shrug:
Don't know if the issue is still relevant for @Make42, but I stumbled upon this and wanted to point to https://github.com/n1nj4sec/pymemimporter. For your use case this might be an acceptable solution to import the decrypted pyd in memory by executing ShellCodeMemoryModule (https://blog.didierstevens.com/programs/shellcode/). You can't use a temporary fs path to import a pyd without this kind of trick, because Python calls LoadLibrary under the hood to import pyds, and LoadLibrary takes a path from the OS filesystem. You could also have a look at https://github.com/scythe-io/in-memory-cpython. This is a different strategy: here the interpreter has been modified to allow importing pyds from memory. Personally I like to use the signed interpreter for my apps, but this can also be a good choice, even if it is a bit more complex than the former.