magika
Add suffix as output
Hi, perfect timing! I just restored "some" files (40 GB) with PhotoRec [1], but PhotoRec's file-type detection assigned the wrong type to some of them. It would be super nice if the program could also show me the suffix (the way -i shows the MIME type), i.e. ".md", ".py", ".rst", ..., so I can move each file into the matching folder.
[1] https://www.cgsecurity.org/wiki/PhotoRec
Current output (-i):
❯ magika -r /opt/SORTED/TXT -i
/opt/SORTED/TXT/1/100464.txt: text/plain
/opt/SORTED/TXT/1/101476.txt: text/x-c
/opt/SORTED/TXT/1/101485.txt: text/x-c
/opt/SORTED/TXT/1/101565.txt: text/x-c
/opt/SORTED/TXT/1/101700.txt: text/plain
/opt/SORTED/TXT/1/101729.txt: text/plain
/opt/SORTED/TXT/1/101786.txt: text/x-asm
/opt/SORTED/TXT/1/101812.txt: text/x-asm
/opt/SORTED/TXT/1/101941.txt: text/x-asm
/opt/SORTED/TXT/1/105997.txt: text/x-makefile
/opt/SORTED/TXT/1/107439.txt: text/plain
/opt/SORTED/TXT/1/108033.txt: text/plain
/opt/SORTED/TXT/1/109413.txt: text/markdown
/opt/SORTED/TXT/1/111266.txt: application/javascript
/opt/SORTED/TXT/1/114086.txt: text/x-python
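Until the CLI can print suffixes directly, the -i output above can already be post-processed. Below is a minimal Python sketch that assumes the "path: mime/type" line format shown above. The SUFFIX_FOR_MIME table, the plan_moves helper and the /opt/SORTED/BY_TYPE destination are hypothetical and are not part of Magika. The sketch only computes where each file would go and does not touch the disk.

```python
from pathlib import PurePosixPath

# Hypothetical, hand-written MIME type -> suffix table for illustration only;
# it is NOT taken from Magika. Types missing here go to an "unknown" folder.
SUFFIX_FOR_MIME = {
    "text/plain": ".txt",
    "text/x-c": ".c",
    "text/x-asm": ".asm",
    "text/markdown": ".md",
    "application/javascript": ".js",
    "text/x-python": ".py",
}


def parse_magika_line(line: str) -> tuple[str, str]:
    """Split one 'path: mime/type' line as printed by `magika -i`."""
    # rpartition splits on the LAST ": ", so a ": " inside the path is kept
    path, sep, mime = line.rpartition(": ")
    if not sep:
        raise ValueError(f"unexpected line: {line!r}")
    return path, mime.strip()


def plan_moves(lines, dest_root: str) -> list[tuple[str, str]]:
    """Return (source, destination) pairs; nothing is moved on disk."""
    moves = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        src, mime = parse_magika_line(line)
        suffix = SUFFIX_FOR_MIME.get(mime)
        folder = suffix.lstrip(".") if suffix else "unknown"
        # Rename with the mapped suffix, or keep the original one if unmapped
        name = PurePosixPath(src).stem + (suffix or PurePosixPath(src).suffix)
        moves.append((src, str(PurePosixPath(dest_root) / folder / name)))
    return moves


output = """\
/opt/SORTED/TXT/1/109413.txt: text/markdown
/opt/SORTED/TXT/1/114086.txt: text/x-python
/opt/SORTED/TXT/1/105997.txt: text/x-makefile
"""
for src, dst in plan_moves(output.splitlines(), "/opt/SORTED/BY_TYPE"):
    print(src, "->", dst)
# /opt/SORTED/TXT/1/109413.txt -> /opt/SORTED/BY_TYPE/md/109413.md
# /opt/SORTED/TXT/1/114086.txt -> /opt/SORTED/BY_TYPE/py/114086.py
# /opt/SORTED/TXT/1/105997.txt -> /opt/SORTED/BY_TYPE/unknown/105997.txt
```

Keeping the planning step separate from the actual move makes it easy to review the result first. Once it looks right, each pair can be applied with os.makedirs(..., exist_ok=True) followed by shutil.move(src, dst).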
Magika already has a list of expected extensions/suffixes for each content type in its Python config: https://github.com/google/magika/blob/main/python/magika/config/content_types_config.json
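Instead of maintaining a mapping by hand, you could derive one from that config. The Python sketch below assumes the file is a JSON object keyed by content-type label, where each entry may carry a "mime_type" string and an "extensions" list. This schema is an assumption and should be checked against the actual file, which may differ between versions. SAMPLE_CONFIG contains made-up entries for illustration, and extensions_by_mime is a hypothetical helper that is not part of Magika.

```python
import json

# Made-up sample in the ASSUMED schema: label -> {"mime_type", "extensions"}.
# Verify against the real content_types_config.json before relying on this.
SAMPLE_CONFIG = json.loads("""
{
  "python":     {"mime_type": "text/x-python",          "extensions": ["py"]},
  "markdown":   {"mime_type": "text/markdown",          "extensions": ["md"]},
  "javascript": {"mime_type": "application/javascript", "extensions": ["js"]},
  "unknown":    {"mime_type": null,                     "extensions": []}
}
""")


def extensions_by_mime(config: dict) -> dict[str, list[str]]:
    """Invert label -> {mime_type, extensions} into mime_type -> [".ext", ...]."""
    table: dict[str, list[str]] = {}
    for entry in config.values():
        mime = entry.get("mime_type")
        exts = entry.get("extensions") or []
        if not mime or not exts:
            continue  # nothing useful to map for this label
        bucket = table.setdefault(mime, [])
        for ext in exts:
            # Normalize to a leading dot and avoid duplicates when several
            # labels share the same MIME type
            dotted = ext if ext.startswith(".") else "." + ext
            if dotted not in bucket:
                bucket.append(dotted)
    return table


print(extensions_by_mime(SAMPLE_CONFIG)["text/x-python"])  # ['.py']
```

To use a local copy of the real file, load it with json.load(open("content_types_config.json")) and pass the result in place of SAMPLE_CONFIG.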
I opened a PR that adds the ability to output the expected extensions, via the -e argument:
https://github.com/google/magika/pull/78