replaced xlrd with openpyxl to display formulae
Successfully changed git_diff_xlsx.py. Attempted to change readme.md and setup.py.
Fantastic work! I have made minor changes to the script to work with new versions of openpyxl:
# Converts a Microsoft Excel 2007+ file into plain text
# for comparison using git diff
#
# Instructions for setup:
# 1. Place this file in a folder
# 2. Add the following line to the global .gitconfig:
# [diff "zip"]
# binary = True
# textconv = python c:/path/to/git_diff_xlsx.py
# 3. Add the following line to the repository's .gitattributes
# *.xlsx diff=zip
# 4. Now, typing [git diff] at the prompt will produce text versions
# of Excel .xlsx files
#
# Copyright William Usher 2013
# Contact: [email protected]
#
import sys
import openpyxl as xl
from openpyxl import load_workbook
def parse(infile,outfile):
"""
Converts an Excel file into text
Returns a formatted text file for comparison using git diff.
"""
book = xl.load_workbook(infile)
num_sheets = book.get_sheet_names()
print num_sheets
# print "File last edited by " + book.user_name + "\n"
if ( book.properties.lastModifiedBy ):
outfile.write("File last edited by " + book.properties.lastModifiedBy + "\n")
# loop over worksheets
for index in book.get_sheet_names():
# find non empty cells
sheet = book.get_sheet_by_name(index)
outfile.write("=================================\n")
outfile.write("Sheet: " + index + "[ " + str(sheet.max_row) + " , " + str(sheet.max_column) + " ]\n")
outfile.write("=================================\n")
for row in range(1,sheet.max_row+1):
for col in range(1,sheet.max_column+1):
content = sheet.cell(column=col, row=row).value
if content:
outfile.write(" " + xl.utils.get_column_letter(col) + str(row) + ": " + unicode(content) + "\n")
#outfile.write(" " + unicode(xl.utils.get_column_letter(col)) + row + ": " + unicode(content) + "\n")
print "\n"
# output cell address and contents of cell
def main():
args = sys.argv[1:]
if len(args) != 1:
print 'usage: python git_diff_xlsx.py infile.xlsx'
sys.exit(-1)
outfile = sys.stdout
parse(args[0],outfile)
if __name__ == '__main__':
main()
Awesome! Anything we can do to work through the CI/CD build and resolve the conflicts so this can get released? There are several projects I'm working on that I'd love to import the latest version on.
I'm not quite familiar with the tests and the error codes. If you can identify the problem, that would be great 😁
I did however revise the code once more. I was working with a large spreadsheet (multiple sheets, many rows and columns) and the script was very slow (2min 40sec). Turns out you can really speed it up by iterating directly on rows and columns. Only takes a couple seconds now 😄
# Converts a Microsoft Excel 2007+ file into plain text
# for comparison using git diff
#
# Instructions for setup:
# 1. Place this file in a folder
# 2. Add the following line to the global .gitconfig:
# [diff "zip"]
# binary = True
# textconv = python c:/path/to/git_diff_xlsx.py
# 3. Add the following line to the repository's .gitattributes
# *.xlsx diff=zip
# 4. Now, typing [git diff] at the prompt will produce text versions
# of Excel .xlsx files
#
# Copyright William Usher 2013
# Contact: [email protected]
#
import sys
import datetime
import openpyxl as xl
from openpyxl import load_workbook
from string import ascii_uppercase
def parse(infile,outfile):
"""
Converts an Excel file into text
Returns a formatted text file for comparison using git diff.
"""
book = xl.load_workbook(infile)
num_sheets = book.get_sheet_names()
print num_sheets
# print "File last edited by " + book.user_name + "\n"
if ( book.properties.lastModifiedBy ):
outfile.write("File last edited by " + book.properties.lastModifiedBy + "\n")
# loop over worksheets
for index in book.get_sheet_names():
# find non empty cells
sheet = book.get_sheet_by_name(index)
outfile.write("=================================\n")
outfile.write("Sheet: " + index + "[ " + str(sheet.max_row) + " , " + str(sheet.max_column) + " ]\n")
outfile.write("=================================\n")
for row in sheet.rows:
for cell in row:
content = cell.value
if content:
outfile.write(" " + cell.coordinate + ": " + unicode(content) + "\n")
outfile.write("\n\n")
# output cell address and contents of cell
def main():
args = sys.argv[1:]
if len(args) != 1:
print 'usage: python git_diff_xlsx.py infile.xlsx'
sys.exit(-1)
outfile = sys.stdout
parse(args[0],outfile)
if __name__ == '__main__':
main()