ER: Potential XSS Vulnerability in wins.js
Emergent Requirement - Problem
I originally submitted this as a security advisory, but @roslynwythe said there isn't a way to convert it to an issue so I'm reposting it here
Original text:
Summary
The usage of innerHTML to dynamically modify DOM elements in wins.js may introduce a vulnerability to Cross-Site Scripting (XSS) attacks if wins-data.json is compromised, not properly sanitized, or not adequately vetted.
Details
The vulnerability exists on the following lines of code where DOM elements are dynamically modified using external data.
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L339
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L340
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L498
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L501
https://github.com/hackforla/website/blob/gh-pages/assets/js/wins.js#L504
To address this vulnerability, it is recommended to avoid using innerHTML and prefer safer DOM manipulation methods like createElement and textContent.
Additionally, the unformatted nature of wins-data.json can make it difficult to spot changes in the file with a git diff. Consider reformatting the JSON file to be more human-readable and version control-friendly.
PoC
The vulnerability audit states that "User input strings remain strings and escape injection through decodeURIComponent()"; however, in my testing this is not the case, and strings parsed through decodeURIComponent() are still susceptible to XSS. Proof of concept: https://codepen.io/jaasonw/pen/GRPBzGJ
Impact
Potential defacement of the website or visitors being subject to phishing attacks.
Issue you discovered this emergent requirement in
- Originally discovered while reviewing https://github.com/hackforla/website/pull/5258, however it was not introduced by that PR
Date discovered
10/1/2023
Did you have to do something temporarily
- [ ] YES
- [x] NO
Who was involved
@
What happens if this is not addressed
Potential defacement of the website or visitors being subject to phishing attacks.
Resources
https://developer.mozilla.org/en-US/docs/Web/API/Document/createElement
https://developer.mozilla.org/en-US/docs/Web/API/Node/textContent
https://medium.com/front-end-weekly/javascript-innerhtml-innertext-and-textcontent-b75ec895cbe3
https://jekyllrb.com/docs/datafiles/
Recommended Action Items
- [ ] Make a new issue
- [ ] Discuss with team
- [x] Let a Team Lead know
Potential solutions [draft]
option 1
Update code that does DOM manipulation. Instead of using innerHTML, we should update the code to use safer DOM manipulation methods like createElement and textContent.
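As a rough illustration of option 1, here is a minimal sketch of building an element with createElement and textContent instead of innerHTML. The function name buildWinCard, the win-card class, and the name/overview fields are hypothetical and are not taken from wins.js or wins-data.json. The document object is passed in as a parameter so the sketch can also be exercised outside a browser.

```javascript
// Hypothetical helper: builds one card from a data record without parsing any HTML.
// `doc` is the page's `document`. `win.name` and `win.overview` are illustrative
// field names, not the actual keys used in wins-data.json.
function buildWinCard(doc, win) {
  const card = doc.createElement("div");
  card.className = "win-card";

  const name = doc.createElement("h3");
  // textContent stores the value as literal text, so a string such as
  // "<img src=x onerror=...>" is displayed as characters, never parsed as markup.
  name.textContent = win.name;

  const overview = doc.createElement("p");
  overview.textContent = win.overview;

  card.appendChild(name);
  card.appendChild(overview);
  return card;
}
```

In a browser this would be called as container.appendChild(buildWinCard(document, win)), where container is whichever element currently receives the innerHTML assignment.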
option 2
Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled
Hi @jaasonw.
Please don't forget to add the proper labels to this issue. Currently, the labels for the following are missing: Complexity, Role, Feature
NOTE: Please ignore the adding proper labels comment if you do not have 'write' access to this directory.
To add a label, take a look at Github's documentation here.
Also, don't forget to remove the "missing labels" afterwards. To remove a label, the process is similar to adding a label, but you select a currently added label to remove it.
After the proper labels are added, the merge team will review the issue and add a "Ready for Prioritization" label once it is ready for prioritization.
Additional Resources:
- Great work @jaasonw pointing out several excellent suggestions. Regarding the XSS vulnerability, would you be willing to write an issue to update wins.js? Regarding the formatting of _wins-data.json, I agree completely and have written #4035 for that purpose. If you have any comments on that issue, they would be welcome.
Finally regarding your suggestion to load wins data at build time rather than on page load, I will discuss that with Bonnie to see if she is interested in pursuing that. Thanks again for your contributions; we are very happy that you are on the website team!
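On the JSON formatting point, a minimal sketch of what reformatting could look like, assuming the file is plain JSON. The function name prettyPrintJson is hypothetical, and this is not necessarily the approach taken in #4035.

```javascript
// Re-serializes compact JSON with two-space indentation and a trailing newline,
// so each field lands on its own line in a git diff. The data itself is unchanged.
function prettyPrintJson(text) {
  return JSON.stringify(JSON.parse(text), null, 2) + "\n";
}
```

With one field per line, a changed value shows up as a one-line diff instead of a change to a single very long line.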
@jaasonw Regarding your comment
Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled
I wonder if you could indicate which pages load data on page load. I checked wins.js and project.js and both load data at build time using a liquid assign tag.
@roslynwythe Sorry, I should clarify that by "load data" I meant the actual creation of DOM elements based on the data. Currently, data is retrieved when the visitor loads the page, and client-side javascript is used to generate the DOM elements. This becomes apparent when Javascript is disabled when visiting the website. When Javascript is disabled, the following pages fail to render properly:
- The events page does not display meeting times
- Projects on the project page do not render
- Wins on the wins page do not render
@jaasonw Thanks for the explanation.
We are open to the idea of using safer DOM manipulation methods like createElement and textContent in place of innerHTML. We should start with a relatively simple page, perhaps Events. We have been planning to update the code in the Events page, because meeting data is being retrieved by JS via an API call to VRMS instead of using _data/external/vrms-data.json, which is obviously inefficient. So one option would be to write an issue for making both changes: using Jekyll to assign data from _data/external/vrms-data.json and also avoiding use of innerHTML in favor of the safer alternatives you mention. Or we could make those two separate issues.
But it sounds like alternatively, we could completely eliminate the JS that generates the DOM, instead using Jekyll to build the complete HTML at build time. And I assume that by using strip_html, that eliminates any risk of injection in the case that the JSON data source was corrupted?
Could you offer pros and cons for that alternative? Or should we create an issue to explore those pros and cons?
We really appreciate your contributions and look forward to working with you further on these issues.
Make an issue to do option 1 so that we reduce vulnerability immediately and silence the warning.
Make an epic that looks into the refactoring option and identifies all the places on the site that it would be required.
Hi @freaky4wrld, thank you for taking up this issue! Hfla appreciates you :)
Do let fellow developers know about your:
i. Availability: (When are you available to work on the issue/answer questions other programmers might have about your issue?)
ii. ETA: (When do you expect this issue to be completed?)
You're awesome!
P.S. - You may not take up another issue until this issue gets merged (or closed). Thanks again :)
- Resolved this ER by creating issue #6303
@roslynwythe This ER is only partially resolved, we still need to do
Make an epic that looks into the refactoring option and identifies all the places on the site that it would be required.
- See #6380
@freaky4wrld I am still confused. It looks like these issues are both addressing the change of using createElement and textContent instead of innerHTML
- #6303
- #6380
And what I was saying was missing was the issue or epic to do option 2 after option 1 was complete.
Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled
Basically, option 1 is the fast, temporary fix, option 2 is the long term solution.
So can you make the issues or epics for option 2 and we can put a dependency on them of the two above issues being complete. Or is there something else I don't understand.
@ExperimentsInHonesty the issue #6303, is the issue for the fast and temporary fix that we require for the ER
- the epic is aimed to do an audit, but we might not require it as @roslynwythe suggests we have full control over the html files
- the issue is aimed to replace innerHTML with textContent and createElement where needed
- we should discuss the scope of the epic
So can you make the issues or epics for option 2 and we can put a dependency on them of the two above issues being complete. Or is there something else I don't understand.
Yes we would be making those issues.....
@freaky4wrld - Bonnie is suggesting that in addition to refactoring the use of innerHTML where needed, we also write an epic to explore option 2:
Alternatively, I noticed a lot of the data on the website is loaded in on page load with javascript rather than at build time. Seeing as we already use Jekyll, I suggest using its built-in functionality with the strip_html filter to load data into the HTML at build time statically, as this has benefits to both page load times and accessibility benefits to visitors to the site that have javascript disabled