Invalid HTML in BillStatus documents
Hello GovInfo!
About 2% of the BILLSTATUS documents that we examine have faulty HTML in the <billSummaries> section. For example:
https://www.govinfo.gov/bulkdata/BILLSTATUS/117/s/BILLSTATUS-117s294.xml
The <billSummaries> section has unclosed <p> tags. Some of the <p> tags have a corresponding </p> tag, but others do not. Any idea why this is? Can anything be done about it?
Thanks! -Evan
@evan-benoit - updated your comment to include code fencing around the tags.
I'm looking into this. If you can provide a few additional example IDs, that will help me to investigate. My initial thinking is that this is in the source data.
Sure, here are a few other examples, all with unmatched <p> tags:
- https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr7.xml
- https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr1037.xml
- https://www.govinfo.gov/bulkdata/BILLSTATUS/117/hr/BILLSTATUS-117hr78.xml
I'm finding this problem in about 2% of the BILLSTATUS documents.
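For anyone who wants to check their own files, here is a minimal Python sketch of one way to flag summaries whose <p> and </p> counts differ. It assumes the summary HTML sits in <text> elements under <billSummaries>, as it appears to in the files linked above. The function names `unclosed_p_count` and `summaries_with_unbalanced_p` are made up for illustration, and comparing tag counts is only a heuristic, not full HTML validation.

```python
import re
import xml.etree.ElementTree as ET

# Match an opening <p> or <p attr="..."> but not <pre>, <param>, etc.
P_OPEN = re.compile(r"<p(?:\s[^>]*)?>", re.IGNORECASE)
# Match a closing </p> tag.
P_CLOSE = re.compile(r"</p\s*>", re.IGNORECASE)


def unclosed_p_count(html: str) -> int:
    """Return the number of <p> openings minus the number of </p> closings."""
    return len(P_OPEN.findall(html)) - len(P_CLOSE.findall(html))


def summaries_with_unbalanced_p(xml_source) -> list:
    """Return the positions of summary <text> elements with mismatched <p> tags.

    Assumption: summary HTML is stored (typically as CDATA) in <text>
    elements nested under <billSummaries>. Positions are counted from 0
    within each <billSummaries> section.
    """
    root = ET.fromstring(xml_source)
    bad = []
    for section in root.iter("billSummaries"):
        for idx, text_el in enumerate(section.iter("text")):
            if unclosed_p_count(text_el.text or "") != 0:
                bad.append(idx)
    return bad
```

Passing the contents of a downloaded BILLSTATUS file to `summaries_with_unbalanced_p` returns the positions of the affected summaries. An empty list means the counts match.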
Thank you. The team that supplies this data is aware of the issue and is working to address it by replacing a legacy system. I don't know the exact timeline for this to be completed.
Thanks, I appreciate the speedy response!
As an update, this is still in progress upstream of us. The Library of Congress is tracking it here: https://github.com/LibraryOfCongress/api.congress.gov/issues/2
I am closing the issue here because it will end up being resolved upstream and then we will update our BILLSTATUS and BILLSUM files.