git_diff_xlsx icon indicating copy to clipboard operation
git_diff_xlsx copied to clipboard

replaced xlrd with openpyxl to display formulae

Open emorisse opened this issue 10 years ago • 3 comments

Successfully changed git_diff_xlsx.py. Attempted to change readme.md and setup.py.

emorisse avatar Jul 13 '15 17:07 emorisse

Fantastic work! I have made minor changes to the script to work with new versions of openpyxl:

# Converts a Microsoft Excel 2007+ file into plain text
# for comparison using git diff
#
# Instructions for setup:
# 1. Place this file in a folder
# 2. Add the following line to the global .gitconfig:
#     [diff "zip"]
#   	    binary = True
#	    textconv = python c:/path/to/git_diff_xlsx.py
# 3. Add the following line to the repository's .gitattributes
#    *.xlsx diff=zip
# 4. Now, typing [git diff] at the prompt will produce text versions
# of Excel .xlsx files
#
# Copyright William Usher 2013
# Contact: [email protected]
#

import sys
import openpyxl as xl
from openpyxl import load_workbook

def parse(infile,outfile):
    """
    Converts an Excel file into text
    Returns a formatted text file for comparison using git diff.
    """

    book = xl.load_workbook(infile)

    num_sheets = book.get_sheet_names()

    print num_sheets

#   print "File last edited by " + book.user_name + "\n"
    if ( book.properties.lastModifiedBy ):
        outfile.write("File last edited by " + book.properties.lastModifiedBy + "\n")

    # loop over worksheets

    for index in book.get_sheet_names():
        # find non empty cells
        sheet = book.get_sheet_by_name(index)
        outfile.write("=================================\n")
        outfile.write("Sheet: " + index + "[ " + str(sheet.max_row) + " , " + str(sheet.max_column) + " ]\n")
        outfile.write("=================================\n")
        for row in range(1,sheet.max_row+1):
            for col in range(1,sheet.max_column+1):
                content = sheet.cell(column=col, row=row).value
                if content:	
                    outfile.write("    " + xl.utils.get_column_letter(col) + str(row) + ": " + unicode(content) + "\n")
                    #outfile.write("    " + unicode(xl.utils.get_column_letter(col)) + row + ": " + unicode(content) + "\n")
        print "\n"

# output cell address and contents of cell
def main():
    args = sys.argv[1:]
    if len(args) != 1:
        print 'usage: python git_diff_xlsx.py infile.xlsx'
        sys.exit(-1)
    outfile = sys.stdout
    parse(args[0],outfile)

if __name__ == '__main__':
    main()

ghost avatar Sep 12 '19 10:09 ghost

Awesome! Anything we can do to work through the CI/CD build and resolve the conflicts so this can get released? There are several projects I'm working on that I'd love to import the latest version on.

djfurman avatar Sep 15 '19 16:09 djfurman

I'm not quite familiar with the tests and the error codes. If you can identify the problem, that would be great 😁

I did however revise the code once more. I was working with a large spreadsheet (multiple sheets, many rows and columns) and the script was very slow (2min 40sec). Turns out you can really speed it up by iterating directly on rows and columns. Only takes a couple seconds now 😄

# Converts a Microsoft Excel 2007+ file into plain text
# for comparison using git diff
#
# Instructions for setup:
# 1. Place this file in a folder
# 2. Add the following line to the global .gitconfig:
#     [diff "zip"]
#   	    binary = True
#	    textconv = python c:/path/to/git_diff_xlsx.py
# 3. Add the following line to the repository's .gitattributes
#    *.xlsx diff=zip
# 4. Now, typing [git diff] at the prompt will produce text versions
# of Excel .xlsx files
#
# Copyright William Usher 2013
# Contact: [email protected]
#

import sys
import datetime
import openpyxl as xl
from openpyxl import load_workbook
from string import ascii_uppercase

def parse(infile,outfile):
    """
    Converts an Excel file into text
    Returns a formatted text file for comparison using git diff.
    """

    book = xl.load_workbook(infile)

    num_sheets = book.get_sheet_names()

    print num_sheets

#   print "File last edited by " + book.user_name + "\n"
    if ( book.properties.lastModifiedBy ):
        outfile.write("File last edited by " + book.properties.lastModifiedBy + "\n")

    # loop over worksheets

    for index in book.get_sheet_names():
        # find non empty cells
        sheet = book.get_sheet_by_name(index)
        outfile.write("=================================\n")
        outfile.write("Sheet: " + index + "[ " + str(sheet.max_row) + " , " + str(sheet.max_column) + " ]\n")
        outfile.write("=================================\n")
        for row in sheet.rows:
            for cell in row:
                content = cell.value
                if content:
                    outfile.write("    " + cell.coordinate + ": " + unicode(content) + "\n")
        outfile.write("\n\n")

# output cell address and contents of cell
def main():
    args = sys.argv[1:]
    if len(args) != 1:
        print 'usage: python git_diff_xlsx.py infile.xlsx'
        sys.exit(-1)
    outfile = sys.stdout
    parse(args[0],outfile)

if __name__ == '__main__':
    main()

ghost avatar Sep 16 '19 09:09 ghost