h1. ExSponge
A real-time HTML filtering and WYSIWYG / Microsoft Word / Rich Text editor cleanup plugin for "ExpressionEngine":http://www.expressionengine.com v2
h2. Info
This plugin cleans up the mess your clients (and other filters) leave behind!
Whether your markup was originally entered via WYSIWYG (Rich Text) editors (such as TinyMCE, CKEditor, FCKEditor, Expresso, Wyvern, Wygwam, Blogger's online editor, or ExpressionEngine's own built-in Rich Text Editor), pasted in from Microsoft Word or Adobe InDesign, or bulk-imported from XML via WordPress or Blogger or another CMS, ExSponge leaves it properly formatted and free of layout-breaking cruft.
It can also remove all tags or keep only the tags you want, and it can limit tag parameters in the same way. You can even trim the fully filtered, cruft-free content down to a specified number of paragraphs.
This plugin is for developers who want neatly formatted paragraphs with minimal, semantic styling, and who do not want the proprietary tags and unnecessary parameters inserted by word processors (or the "tag soup" unwittingly generated by clients) compromising their layout.
Although undoubtedly less comprehensive than HTML TIDY or HTML Purifier, it is also more efficient, easier to set up, and focused on the specific problems you will likely encounter if you give your clients a WYSIWYG field with which to edit their channel entries, especially if they are composing in Word and pasting the content in. In my worst-case scenario (a Microsoft Word document exported to HTML and pasted into an EE Rich Text field), ExSponge reduced the data size by 97% without any loss in content.
ExSponge is not just a real-time (inline) cleaner for text markup. Used with your importing routine, it can clean up markup exported from Blogger or WordPress. And with a little Ajax, it can clean code entered in your SafeCracker forms before they hit your database (set up a simple template that sends your text through the ExSponge filter, and call it via Ajax before submitting the form; more details on that will be added here soon).
Some of what is removed by default:
- Word document garbage (including comments, proprietary styles, useless XML tags, "smart" tags, etc.)
- Empty tags (including empty paragraphs, unnecessary tag pairs like <strong></strong>, etc.)
- Purposefully empty paragraphs that WYSIWYG editors are so fond of (<p>&nbsp;</p>, etc.)
- Out-of-scope sections (head, title, style, form, script, object, applet, xml)
- Unnecessary or layout-breaking tags (html, head, iframe, object, center, etc.)
- Unnecessary parameters within tags (unless otherwise specified)
- Inline styling (unless otherwise specified)
- JavaScript (including malicious code)
- Non-printing and control characters
- Newlines (\n) and carriage returns (\r)
- Images with no source
- Extra whitespace
- Zero-width spaces
- Empty lines
- PHP
In addition, ExSponge will:
- Convert oddball characters and entities to the appropriate web-safe ASCII equivalent or entity
- Convert ampersands to entities where appropriate (including inside URLs)
- Convert smart quotes (curly quotes) to normal quotes
- Close unterminated tags and quotes
- Convert non-breaking spaces (&nbsp;) to normal spaces
- Normalize all tags to lowercase
- Reformat table text to be readable (if table tags are to be removed)
- Give special attention to paragraph formatting, and insert missing paragraph start and end tags
- Prettify the output (with newlines and tabs)
The final output will be compact, tidy, and ready to use in your layout.
h2. Demo
A live demonstration of ExSponge is available here:
http://fcgrx.com/sponge
h2. Installation
Place the ex_sponge folder in your system/expressionengine/third_party folder.
h2. Parameters
All parameters are optional:
|_. Parameter |_. Description |_. Default |
| allow_tags | Remove all HTML tags from the markup and leave only raw, unformatted text ("no"), strip most tags but keep the most useful and safe ones ("safe", which is the equivalent of "<p><br><b><a><i><em><strong><del><ins><u><ul><ol><li><img><h1><h2><h3><h4><h5><h6><blockquote><q><sup><sub><dl><dt><dd><cite><table><tr><td><th><thead><tbody><tfoot>"), strip all but a minimal set ("minimal", which is the equivalent of "<p><br><b><a><i><em><strong><del><ins><u><ul><ol><li><img><h1><h2><h3><h4><h5><h6><blockquote><q><sup><sub>"), or strip all tags except the ones you list. Tip: if you set this parameter to "<p>", text will be reduced to paragraphs only. Note that out-of-scope tags (html, head, link, header, footer, etc.) will be removed regardless. | "safe" |
| allow_breaks | Allow @<br>@ tags to remain as-is ("yes"), or only convert double-breaks (@<br><br>@) to paragraphs while leaving single breaks alone ("single"), or consolidate all breaks into paragraphs ("no"). | "no" |
| allow_parameters | Allow tag parameters to remain ("yes"), strip all but the most necessary ("no", which is the equivalent of "href|src|height|width|alt|title|name|cite|colspan"), or strip all parameters except the ones you list. | "no" |
| convert_tags | Convert presentational tags (such as @<b>@ and @<i>@) to their semantic equivalents (such as @<strong>@ and @<em>@) ("yes"), or leave them as-is ("no"). | "yes" |
| paragraphs | Clip the text after a specified number of paragraphs. Any positive number ("1", "4", "9999") will cause the text to be trimmed. "-1" will not clip the text at all. | "-1" |
NOTE: The allow_styles parameter was removed as of v0.9; it is redundant since the addition of the more flexible allow_parameters parameter.
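As a sketch of how these parameters might be combined, the call below keeps only paragraphs, line breaks, and links, limits link attributes to href and title, and leaves single breaks alone. The value formats follow the table above; the pipe-delimited custom list for allow_parameters is assumed from the format shown for "no", and the field name {body} is a hypothetical placeholder for your own field.

bc. {exp:ex_sponge allow_tags="<p><br><a>" allow_parameters="href|title" allow_breaks="single"}
{body}
{/exp:ex_sponge}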
h2. Usage
To use this plugin, simply wrap the text you want processed in this tag pair:
bc. {exp:ex_sponge} ( your mess goes here ) {/exp:ex_sponge}
In my templates, I typically wrap the above tag (with no parameters) around the output of any Rich Text or WYSIWYG field the client is allowed to edit.
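Inside a channel entries loop, that could look like the following sketch. The channel name "news" and the field name {body} are assumptions for illustration; substitute your own.

bc. {exp:channel:entries channel="news" limit="1"}
<h2>{title}</h2>
{exp:ex_sponge}{body}{/exp:ex_sponge}
{/exp:channel:entries}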
A more complex example, which reduces the markup down to the basics, keeps only the first four paragraphs, and takes advantage of EE's built-in tag caching:
bc. {exp:ex_sponge allow_tags="
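A call matching that description could look like the sketch below. It is not the author's original example: it uses only the parameters documented above plus ExpressionEngine's standard cache and refresh tag parameters, and the "minimal" tag set, the 60-minute refresh interval, and the field name {body} are all assumptions.

bc. {exp:ex_sponge allow_tags="minimal" paragraphs="4" cache="yes" refresh="60"}
{body}
{/exp:ex_sponge}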
h2. License
Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
http://creativecommons.org/licenses/by-nc-sa/3.0/
h2. Contact
- Email: "[email protected]":mailto:[email protected]
h2. Support / Feature Requests
This project is an active part of all my ExpressionEngine installations, and I'd like to keep it as fast, full-featured and bulletproof as possible.
Have a bug? Feature request? Please create an issue on GitHub at https://github.com/fcgrx/ex_sponge/issues
h2. Changelog
- v0.9.1 - Speed improvements. allow_tags now can override the default purging of out-of-scope tags. Added support for tables. Many small refinements.
- v0.9.0 - Added "minimal" option to allow_tags parameter. Further refinements to the allow_parameters parameter. Many optimizations to filters. Additional filtering for malformed HTML. Support for tables. Removed allow_styles parameter (made redundant by allow_parameters).
- v0.8.9 - Added "single" option to allow_breaks.
- v0.8.8 - Added height and width to attribute whitelist
- v0.8.7 - Refined allow_parameters
- v0.8.6 - Added allow_parameters argument, which allows control over how tag parameters are filtered. Slight rearrangement of filter order. Removed redundant filters and made minor refinements to others.
- v0.8.5 - Changed the allow_tags argument to default to new "safe" value, which includes only critical tags. Expanded the MS Word filters. Rearranged filter order for better interaction. Removed redundant filters. Fixed an issue with lost spaces. Made some searches case-insensitive. Made output HTML a little prettier.
- v0.8.4 - Minor additions to MS Word filters.
- v0.8.3 - Initial release.