Sanitizing HTMLText input with whitelist


katja

23 March 2011 at 12:09am (Last edited: 23 March 2011 12:48am), Community Member, 46 Posts

I have a form with multiple textarea fields, some of which use the HTMLText type. For those I give users a very stripped-down version of TinyMCE (buttons for bold, italic and setting links only).

If someone turned JavaScript off, though, they could just put HTML in there, and it would all get accepted, including scripts.

The data from the forms is displayed on another page. So I guess there are two issues here: how to store it in the database, and how to display what is stored in the database.

I think I would probably be happy for people to write any kind of HTML, as long as everything in script tags is removed. So that would actually be more like a blacklist than a whitelist. I am wondering how best to achieve that.

Has anybody come across this problem, or got an idea how best to approach it? Are there any built-in methods in SilverStripe for this? Or would it be enough to just write some code to remove <script>..</script> from the input of those fields? [edit]: Of course it is more complicated than that, as I have just seen at http://stackoverflow.com/questions/2698079/strip-script-tags-and-everything-in-between-with-php :( But do I really need to use HTMLPurifier?

Thanks,
Katja
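
For illustration, here is a minimal sketch of why simply removing <script>..</script> is fragile. The function name naiveStripScripts() and the sample inputs are made up for this example and are not from the linked Stack Overflow question. It only shows one way a single-pass regular expression can be defeated, plus one kind of script it never looks at.

```php
<?php
// Hypothetical helper: a single-pass "blacklist" that removes
// <script>...</script> blocks and nothing else.
function naiveStripScripts(string $html): string
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
}

// The ordinary case is handled as expected.
echo naiveStripScripts('<p>ok</p><script>alert(1)</script>'), "\n";
// prints: <p>ok</p>

// Nested fragments reassemble into a working script tag once the
// inner <script></script> pairs have been removed.
$nested = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>';
echo naiveStripScripts($nested), "\n";
// prints: <script>alert(1)</script>

// Script in an event-handler attribute is not touched at all.
$attr = '<img src="x" onerror="alert(1)">';
echo naiveStripScripts($attr), "\n";
// prints: <img src="x" onerror="alert(1)">
```

Running the replacement in a loop until nothing changes would deal with the nested case, but not with event-handler attributes or javascript: URLs, which is the usual argument for a whitelist over a blacklist.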

swaiba

24 March 2011 at 5:02am Forum Moderator, 1796 Posts

I don't think you'd need more than a regular expression...
http://www.google.co.uk/search?q=php+regex+strip+javascript

I tend not to use regexes I don't understand, so in this case I'd probably use one of the functions from http://php.net/manual/en/function.strip-tags.php that the Google search yielded. There is quite often a lot of gold in the comments on the PHP doc pages...
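
As a rough sketch of the strip_tags() route, assuming the allowed tags should mirror the TinyMCE buttons mentioned above (bold, italic, links): the helper name stripToAllowedTags() is made up for this example. strip_tags() is PHP's built-in function, and its second argument lists the tags to keep.

```php
<?php
// Hypothetical helper: keep only the tags the stripped-down editor can produce.
function stripToAllowedTags(string $html): string
{
    // strip_tags() removes every tag that is not listed in the second argument.
    return strip_tags($html, '<b><strong><i><em><a>');
}

// Disallowed tags are removed, but the text between them stays.
$input = '<p>Hello <b>world</b><script>alert(1)</script></p>';
echo stripToAllowedTags($input), "\n";
// prints: Hello <b>world</b>alert(1)

// Attributes on allowed tags are left untouched, including event handlers.
$risky = '<a href="#" onclick="alert(1)">click</a>';
echo stripToAllowedTags($risky), "\n";
// prints: <a href="#" onclick="alert(1)">click</a>
```

Two limitations show up here: the script body survives as plain text, and strip_tags() does not filter attributes, so an onclick handler or a javascript: link on an allowed tag still gets through.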

katja

24 March 2011 at 6:00am (Last edited: 24 March 2011 6:01am), Community Member, 46 Posts

Thanks for this, Barry

I did in the end implement HTMLPurifier though. I found that Andrew Short had used it in the rssconnector module, so that gave me an idea of how to implement it. I put the HTMLPurifier code in a thirdparty directory under mysite/, then created a class with a function in which the config and purifier objects get created. I can then call this whenever I have some input that I would like to be purified.

It seems to work quite well. It might be a bit heavy, but I really like the idea that only specific tags are let through, rather than stripping out certain tags.

Cheers,
Katja
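
A wrapper along the lines described above might look roughly like the sketch below. This is not Katja's actual code: the class name InputPurifier, the include path and the allowed-tag list are all assumptions made for illustration. The HTML Purifier calls used (HTMLPurifier_Config::createDefault(), the HTML.Allowed directive, and purify()) are part of the library's public API.

```php
<?php
// Assumed location of the library under mysite/thirdparty/; adjust to your setup.
$purifierPath = __DIR__ . '/thirdparty/htmlpurifier/library/HTMLPurifier.auto.php';
if (is_file($purifierPath)) {
    require_once $purifierPath;
}

// Hypothetical wrapper class: one place where the config and purifier
// objects get created.
class InputPurifier
{
    public static function purify(string $dirtyHtml): string
    {
        $config = HTMLPurifier_Config::createDefault();
        // Whitelist: only these elements, and only href on links, are kept.
        // The list is an assumption based on the bold/italic/link buttons.
        $config->set('HTML.Allowed', 'b,strong,i,em,a[href]');

        $purifier = new HTMLPurifier($config);
        return $purifier->purify($dirtyHtml);
    }
}
```

It could then be called on submitted data before writing to the database, for example `$clean = InputPurifier::purify($data['Description']);`, where the 'Description' field name is again only a placeholder.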