Some common filetypes are not detected
puremagic seems to be failing to detect some very common file types, like text files (.py, .txt, .md).
$ file changelog.txt
changelog.txt: ASCII English text
$ python3.6 -m puremagic ./changelog.txt
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py:125:
RuntimeWarning: 'puremagic.__main__' found in sys.modules after import of package
'puremagic', but prior to execution of 'puremagic.__main__'; this may result in
unpredictable behaviour
warn(RuntimeWarning(msg))
'./changelog.txt' : could not be Identified
$ python3.6 -m puremagic -m ./changelog.txt
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py:125:
RuntimeWarning: 'puremagic.__main__' found in sys.modules after import of package
'puremagic', but prior to execution of 'puremagic.__main__'; this may result in
unpredictable behaviour
warn(RuntimeWarning(msg))
'./changelog.txt' : could not be Identified
You are correct: it is not able to detect these, because those file types do not have magic numbers to match against. Identifying them requires additional analysis to make a best guess, and I have not written that.
For example, it does support Python files whose first line is '#!/usr/bin/env python', but it would be better to upgrade this module to do some looser matching or some analysis to give more and better results. (I already tried to capture this idea in https://github.com/cdgriffith/puremagic/issues/3, but it is better spelled out with your example.)
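As a rough illustration of what looser matching could mean for the shebang case, here is a minimal sketch. The helper name `looks_like_python_shebang` is hypothetical and is not part of puremagic; it only assumes that the first line of the file is available as bytes.

```python
import posixpath
import re

# Interpreter names accepted as "Python": python, python3, python3.6, ...
_PYTHON_NAME = re.compile(r"^python\d*(\.\d+)*$")


def looks_like_python_shebang(first_line: bytes) -> bool:
    """Hypothetical helper: loosely decide whether a shebang line runs Python."""
    if not first_line.startswith(b"#!"):
        return False
    # Split the rest of the line into whitespace-separated tokens.
    tokens = first_line[2:].decode("ascii", errors="replace").split()
    if not tokens:
        return False
    if posixpath.basename(tokens[0]) == "env":
        # Skip env options (for example -S) and look at the first remaining token.
        tokens = [t for t in tokens[1:] if not t.startswith("-")]
        if not tokens:
            return False
    return bool(_PYTHON_NAME.match(posixpath.basename(tokens[0])))
```

This accepts variants such as `#!/usr/bin/python3.6` or `#! /usr/bin/env python3` rather than one fixed string. It still says nothing about Python files without a shebang line.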
I don't currently have the time to work on it, but I do remember how I planned to implement it, so I will capture that in this issue:
- Create a subdirectory where 'detectors' live
- If detectors are enabled (probably by default), all files from that directory will be loaded
- Each detector has a standard format / entry point that will be called against each file
- Each detector is for a specific file type and will return its confidence and filetype information
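To make the list above more concrete, here is a minimal sketch of how it might fit together. Nothing here exists in puremagic: the `Guess` tuple, the `detect(head)` entry point, and the `load_detectors` / `best_guess` helpers are all assumptions chosen for illustration, and the plain-text detector is only one possible heuristic.

```python
import importlib.util
from pathlib import Path
from typing import Callable, List, NamedTuple, Optional


class Guess(NamedTuple):
    """Assumed result shape: file type information plus a confidence score."""
    extension: str
    mime_type: str
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


# Assumed entry point: each detector is a callable detect(head: bytes).
Detector = Callable[[bytes], Optional[Guess]]


def load_detectors(directory: str) -> List[Detector]:
    """Import every .py file in `directory` and collect its detect() function."""
    detectors = []
    for path in sorted(Path(directory).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        detect = getattr(module, "detect", None)
        if callable(detect):
            detectors.append(detect)
    return detectors


def detect_plain_text(head: bytes) -> Optional[Guess]:
    """Example detector: data with no NUL bytes that decodes as UTF-8 is text."""
    if not head or b"\x00" in head:
        return None
    try:
        head.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return Guess(".txt", "text/plain", 0.5)


def best_guess(head: bytes, detectors: List[Detector]) -> Optional[Guess]:
    """Run every detector and keep the most confident answer, if any."""
    guesses = [g for g in (d(head) for d in detectors) if g is not None]
    return max(guesses, key=lambda g: g.confidence, default=None)
```

A caller would read the first bytes of a file, pass them to `best_guess` along with the loaded detectors, and fall back to "could not be identified" when the result is `None`. Returning a confidence lets a specific detector (for example one for Markdown) outrank a generic one such as plain text.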