pyOneNote icon indicating copy to clipboard operation
pyOneNote copied to clipboard

Fix issues with JSON format and handling of non-existing file

Open Matmaus opened this issue 2 years ago • 3 comments

Hi, I stumbled upon three issues. They are independent, but I think they are too small, so making three separate PRs seems unnecessary to me.

  1. JSON can not be printed process_onenote_file does not print document in JSON if json_output parameter is set. Instead, it always returns it. This return value is not even used in the main. This fork fixes use of json_output parameter. If True, process_onenote_file will print document in JSON form instead of the default text form. The default text form will be used otherwise. No value will be returned.
    $ git checkout 74d7b05  # upstream main
    $ pyonenote --json --file <any_onenote_file>
    $ # nothing
    $ git checkout e363ca6  # forked main
    $ pyonenote --json --file <any_onenote_file>
    {"headers": {"guidFileType": ...
    
  2. Fix typo OneDocment -> OneDocument I think it was not intended at least based on the module's name.
  3. Fix handling of non-existing file
    $ git checkout 74d7b05  # upstream main
    $ pyonenote --file nonexisting_file
    Traceback (most recent call last):
      File "/home/user/pyOneNote/virtualenv/bin/pyonenote", line 33, in <module>
        sys.exit(load_entry_point('pyOneNote', 'console_scripts', 'pyonenote')())
      File "/home/user/pyOneNote/pyOneNote/Main.py", line 84, in main
        sys.exit("File: %s doesn't exist", args.file)
    TypeError: exit expected at most 1 argument, got 2
    $ git checkout e363ca6  # forked main
    $ pyonenote --file nonexisting_file
    File: 'nonexisting_file' doesn't exist
    

Matmaus avatar Mar 06 '23 11:03 Matmaus

This JSON object is meant to be used in another application in which pyonenote is called as a library. However, I agree because it is a switch, user should see something when they use it. I will try to think about this a little more to see how we can handle both scenarios.

DissectMalware avatar Mar 09 '23 19:03 DissectMalware

If I may, I would suggest to always return either data (dictionary object) or event better document (OneDocument object). Then any user can easily convert it either into JSON (document.get_json()) or into string (json.dumps(document.get_json())). I think it is better to let an user decide.

Also, I think it could be better to have a separate function which will be expected to be used as API. Some function which never print anything. For example (only first ~11 lines are modified):

def process_onenote_file(file):
    if not check_valid(file):
        log.error("please provide valid One file")
        exit()

    file.seek(0)
    return OneDocument(file)

def process_and_print_onenote_file(file, output_dir, extension, json_output):
    document = process_onenote_file(file)
    data = document.get_json()
    if not json_output:
        print('Headers\n####################################################################')
        indent = '\t'
        for key, header in data['headers'].items():
            print('{}{}: {}'.format(indent, key, header))
        print('\n\nProperties\n####################################################################')
        indent = '\t'
        for propertySet in data['properties']:
            print('{}{}:'.format(indent, propertySet['type']))
            for property_name, property_val in propertySet['val'].items():
                print('{}{}: {}'.format(indent+'\t', property_name, str(property_val)))
            print("")
        print('\n\nEmbedded Files\n####################################################################')
        indent = '\t'
        for name, file in data['files'].items():
            print('{}{}:'.format(indent, name))
            print('\t{}Extension: {}'.format(indent, file['extension']))
            print('{}'.format( get_hex_format(file['content'][:256], 16, indent+'\t')))
        if extension and not extension.startswith("."):
            extension = "." + extension
        counter = 0
        for file_guid, file in document.get_files().items():
            with open(
                    os.path.join(output_dir,
                                 "file_{}{}{}".format(counter, file["extension"], extension)), "wb"
            ) as output_file:
                output_file.write(file["content"])
            counter += 1
    else:
        print(json.dumps(data))

Matmaus avatar Mar 09 '23 19:03 Matmaus

I think it is a good suggestion, will refactor the code based on your suggestion

DissectMalware avatar Jan 29 '24 18:01 DissectMalware