Ticket #5998 (new defect)

Opened 6 months ago

Last modified 4 months ago

Invalid Unicode characters

Reported by: shelleyp Assigned to: anonymous
Priority: normal Milestone: 2.9
Component: General Version: 2.3.3
Severity: normal Keywords: unicode invalid xhtml
Cc: rubys, schiller

Description

WordPress does not check for invalid Unicode characters, such as the following:

U+FFFE U+FFFF

When pages are served as XHTML, letting these characters through causes an XML parsing error.

WordPress should filter out illegal Unicode code points.

Please see http://www.w3.org/TR/REC-xml/#NT-Char
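To illustrate the kind of filter being asked for, here is a minimal sketch that strips every code point outside the XML 1.0 Char production linked above. It is written in Python for brevity and is not the WordPress code or the attached patch; the names `_ILLEGAL_XML_CHARS` and `strip_illegal_xml_chars` are hypothetical, and it assumes the input has already been decoded to a Unicode string.

```python
import re

# Complement of the XML 1.0 Char production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything that does NOT match one of those ranges is illegal in XML 1.0.
# (Hypothetical name, used only for this illustration.)
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Return text with all code points outside XML 1.0 Char removed.

    Assumes text is an already-decoded Unicode string.
    """
    return _ILLEGAL_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "ok\ufffe still ok\uffff"
    # repr() makes it easy to see that U+FFFE and U+FFFF are gone
    print(repr(strip_illegal_xml_chars(sample)))
```

A filter like this would have to run on every input path that can end up in an XHTML page, not just on one of them.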

Also, the regex here is incorrect; see [http://intertwingly.net/blog/2008/01/02/Keeping-On-Your-Toes this page].

Attachments

bug5998.patch (2.1 kB) - added by schiller on 04/05/08 03:47:35.
Patch. Assumes UTF-8. Only handles comment submission (not trackbacks, search queries, etc.).

Change History

02/25/08 18:18:08 changed by shelleyp

This affects comments and pingbacks as well as search.

02/25/08 19:39:28 changed by rubys

  • cc set to rubys.

02/25/08 20:13:08 changed by schiller

  • cc changed from rubys to rubys, schiller.

04/05/08 03:47:35 changed by schiller

  • attachment bug5998.patch added.

04/14/08 03:47:08 changed by bertilow

I think a case can be made for this not being WordPress's job at all. The checking should be done in MySQL. If the database has been correctly set to UTF-8, no invalid characters should ever be stored in the database in the first place. If they're not stored, then they will not show up or do damage.

It would still be wise to check for them, though, just in case. But perhaps this bug should be forwarded to MySQL.