CSV format
It looks like the original author had plans for output formats other than plain text; I see HTML as one format in the code.
I was curious how hard it would be to add CSV. It appears I could copy _write_plain_text_report in reporter.py and tweak it?
I'm tinkering with the code now, and if I come up with anything I'll send it back.
Hi, yes, CSV would make a lot of sense, thanks!
A few guidelines if you wish to contribute to this one (otherwise I could write it, but not this week):
- use the Python csv module (it handles quoting automatically)
- if the output format is CSV, the program should write CSV to every output (email, console, file) to keep things simple.
- it would be nice to send the CSV as an email attachment, but that is something I could add later on (for now, the CSV could simply go in the body of the email)
If you have other thoughts or questions, do not hesitate to post them. Thanks!
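A minimal sketch of these guidelines using only the standard library, assuming Python 3: csv.writer quotes fields that contain commas or quotes, and the resulting text can go in the email body now and, optionally, in a text/csv attachment later. The helper names (rows_to_csv_text, build_report_email), the subject, and the report.csv filename are hypothetical, not part of pylinkvalidator.

```python
import csv
import io
from email.message import EmailMessage


def rows_to_csv_text(rows):
    """Serialize rows to a CSV string; csv.writer handles the quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: QUOTE_MINIMAL, "\r\n" line endings
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def build_report_email(csv_text, subject="pylinkvalidator report"):
    """Hypothetical helper: CSV in the body, plus the same CSV as an attachment."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(csv_text)  # for now: the CSV is simply the email body
    # Possible later step: attach the same text as a text/csv file.
    msg.add_attachment(csv_text, subtype="csv", filename="report.csv")
    return msg


rows = [
    ("Status", "Page", "Parent Page"),
    ("404 Not Found", "http://example.com/a,b", ""),
    ("200 OK", 'http://example.com/say "hi"', "http://example.com/"),
]
text = rows_to_csv_text(rows)
print(text)
message = build_report_email(text)
```

The example rows are invented; the point is that the comma in the first URL and the quotes in the second are escaped by the csv module rather than by hand.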
I'm hacking on the CSV output, but since I'm new to Python, I'm not sure about a few things.
I basically copied _write_plain_text_report to _write_csv_report and have been modifying from there...
I'm not sure how best to integrate with 'output_files'. I can hack up something with oprint:
```python
oprint("STATUS,PAGE,PARENT", files=output_files)
for page in pages.values():
    oprint("{0},{1},".format(
        page.get_status_message(), page.url_split.geturl()),
        files=output_files)
    for source in page.sources:
        oprint(",,{0}".format(source.origin.geturl()), files=output_files)
```
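A quick illustration of why hand-formatted rows like these are fragile, using a made-up URL and status: a comma inside a field shifts the columns when the line is parsed back, whereas csv.writer quotes the field.

```python
import csv
import io

status = "200 OK"
url = "http://example.com/search?q=a,b"  # hypothetical URL containing a comma

# Hand-formatted, as with oprint("{0},{1},".format(...)): nothing is quoted.
manual_line = "{0},{1},".format(status, url)

# The same fields written with the csv module, which quotes the URL.
buffer = io.StringIO()
csv.writer(buffer).writerow([status, url, ""])
csv_line = buffer.getvalue()

# Parse both lines back to see how many columns a CSV reader finds.
manual_fields = next(csv.reader([manual_line]))
csv_fields = next(csv.reader([csv_line]))
print(manual_fields)  # four fields: the URL was split in two
print(csv_fields)     # three fields: status, URL, empty parent
```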
But as you mentioned, we should probably use the csv module:
```python
import csv

# Python 3: open in text mode with newline="" (the "wb" mode only works on Python 2).
with open('/home/jpriest/wwwroot/pylinkvalidator3/pylinkvalidator/bin/test.csv', "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(('Status', 'Page', 'Parent Page'))
    for page in pages.values():
        writer.writerow((page.get_status_message(), page.url_split.geturl()))
        for source in page.sources:
            writer.writerow(('', '', source.origin.geturl()))
```
But it seems like I should integrate with output_files somehow, since that is used everywhere else.
Can you offer some guidance on how I might proceed? :)
Hi Jim, here are a few stubs to get you started. Your last snippet looks promising:
```python
# in report(...)
if config.options.format == FORMAT_PLAIN:
    _write_plain_text_report(site, config, output_files, total_time)
elif config.options.format == FORMAT_CSV:
    _write_csv_report(site, config, output_files, total_time)


def _write_csv_report(...):
    csv_writers = [csv.writer(output_file) for output_file in output_files]
    # maybe write a first row/header here
    for page in pages.values():
        # here we just output the result of each url
        # if we want to include the source of each url, I guess we would
        # need to repeat the result and the original url on each row as if we had
        # denormalized/joined multiple database tables
        writerow([page.get_status_message(), page.url_split.geturl()], writers=csv_writers)


def writerow(row, writers):
    for writer in writers:
        writer.writerow(row)
```
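Putting the stubs together, here is a self-contained sketch that runs without a crawl, assuming Python 3. FakePage and FakeSource are hypothetical stand-ins that only mimic the attributes used in this thread (get_status_message(), url_split, sources, origin); pylinkvalidator's real page objects may differ, and write_csv_report takes pages directly instead of the real (site, config, output_files, total_time) arguments. Following the denormalizing idea in the comment above, each source gets its own row that repeats the status and page URL.

```python
import csv
import io
from urllib.parse import urlsplit


class FakeSource(object):
    """Hypothetical stand-in: only the `origin` attribute used in the thread."""
    def __init__(self, origin_url):
        self.origin = urlsplit(origin_url)


class FakePage(object):
    """Hypothetical stand-in for a crawled page."""
    def __init__(self, url, status, sources=()):
        self.url_split = urlsplit(url)
        self._status = status
        self.sources = list(sources)

    def get_status_message(self):
        return self._status


def writerow(row, writers):
    """Write the same row to every CSV writer."""
    for writer in writers:
        writer.writerow(row)


def write_csv_report(pages, output_files):
    """Sketch of _write_csv_report: one row per (page, source) pair."""
    csv_writers = [csv.writer(output_file) for output_file in output_files]
    writerow(["Status", "Page", "Parent Page"], writers=csv_writers)
    for page in pages.values():
        status = page.get_status_message()
        url = page.url_split.geturl()
        if not page.sources:
            # A page with no recorded source still gets a row, with an empty parent.
            writerow([status, url, ""], writers=csv_writers)
        for source in page.sources:
            writerow([status, url, source.origin.geturl()], writers=csv_writers)


pages = {
    "home": FakePage("http://example.com/", "200 OK"),
    "missing": FakePage("http://example.com/missing", "404 Not Found",
                        [FakeSource("http://example.com/"),
                         FakeSource("http://example.com/about")]),
}
console, report_file = io.StringIO(), io.StringIO()
write_csv_report(pages, [console, report_file])
print(console.getvalue())
```

Here two StringIO objects stand in for the console and a report file. With real files on Python 3, open them with newline="" so the csv module controls the line endings.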