Ticket #4409 (closed defect: fixed)

Opened 1 year ago

Last modified 1 year ago

KSES removes text after a non-tag less than sign

Reported by: mdawaffe Assigned to: mdawaffe
Priority: high Milestone: 2.3
Component: General Version: 2.2
Severity: critical Keywords: has-patch commit
Cc:

Description

Write a comment or a post with the following content while logged out or logged in as a user without the unfiltered_html cap.

This is a < less than sign.

The output will be the following.

This is a

Attachments

4409.diff (1.1 kB) - added by mdawaffe on 06/05/07 19:42:11.
possibility
4409b.diff (2.3 kB) - added by mdawaffe on 07/03/07 21:44:34.
A filter
4409c.diff (2.3 kB) - added by mdawaffe on 07/04/07 00:31:35.
better than b

Change History

06/05/07 19:42:11 changed by mdawaffe

  • attachment 4409.diff added.

possibility

(follow-up: ↓ 4 ) 06/05/07 19:46:29 changed by mdawaffe

4409.diff: a possible solution.

  1. Tweaks a kses regex.
  2. Converts
    This is a < less than sign.
    
    to
    This is a &lt; less than sign.
    
  3. Converts
    foo > br
    
    to
    foo <br>
    
    (and similar for any allowed tag). This is KSES' original behavior.

This will need some serious testing to ensure it doesn't open any security holes.

06/05/07 19:47:01 changed by mdawaffe

I mean

foo < br

06/05/07 19:54:01 changed by foolswisdom

  • priority changed from normal to high.
  • severity changed from normal to critical.

(in reply to: ↑ 1 ) 06/06/07 10:25:17 changed by westi

Replying to mdawaffe:

4409.diff: a possible solution. 1. Tweaks a kses regex. 2. Converts This will need some serious testing to ensure it doesn't open any security holes.

Is it worth taking an alternative approach to this and adding a new filter to post/comment content before the kses filter which converts lone < and > to &gt; and &lt; so as to not deviate from the stand kses code and preserve the current level of security?

06/13/07 16:46:06 changed by mdawaffe

Westi, Fine by me. KSES is already breaking the text up in a convenient way for looking for lone less than signs, is all.

06/13/07 20:58:22 changed by markjaquith

If you can do it outside of KSES without too much fuss or processing overhead, then we should go that route.

Note for posterity: HTML Purifier doesn't handle this any better than KSES, even though it does offer XHTML well-formedness and validity plus XSS filtering all in one package.

06/15/07 16:21:07 changed by AmbushCommander

Hi, this is the lead developer for HTML Purifier. The upcoming, newest version of HTML Purifier does in fact handle this case gracefully by changing the unescaped < into a literal. For your case, however, with one simple regex:

$html = preg_replace('/<([A-Za-z0-9])/', '&lt;$1', $html);

No mucking around kses necessary. This, however, will turn < br> into &lt; br&gt;

(follow-up: ↓ 9 ) 06/15/07 16:22:00 changed by AmbushCommander

Oops, I didn't wrap the code properly. It's really:

$html = preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);

07/03/07 21:44:34 changed by mdawaffe

  • attachment 4409b.diff added.

A filter

(in reply to: ↑ 8 ) 07/03/07 22:00:32 changed by mdawaffe

Replying to AmbushCommander:

{{{ $html = preg_replace('/<([A-Za-z0-9])/', '&lt;$1', $html); }}}

I don't think that regex is robust enough. "bob<sue" or "<3" would still get caught. Kids say the darndest things.

4409b.diff

  1. Add pre_kses filter to kses (right where it says we should), but rearrange the order slightly (in a way that does not effect kses' efficacy at all).
  2. Add regex to that filter to find and wp_specialchars()ize any lone less than signs.

Kses is not modified in any problematic way. Any strings that might have gotten stripped before but now aren't are run through both wp_specialchars and kses, so I don't believe there are any security issues.

07/04/07 00:31:35 changed by mdawaffe

  • attachment 4409c.diff added.

better than b

07/04/07 00:33:16 changed by mdawaffe

4409c.diff

  1. Use strpos() instead. substr_count() was needed for some debugging.

07/04/07 07:37:13 changed by westi

+1

Not tested but patch looks good to me

07/05/07 17:02:43 changed by mdawaffe

  • keywords set to has-patch commit.
  • owner changed from anonymous to mdawaffe.
  • status changed from new to assigned.

07/06/07 12:53:16 changed by markjaquith

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [5783]) Entitize lone less-than characters. Props mdawaffe. fixes #4409

07/06/07 17:55:17 changed by mdawaffe

  • status changed from closed to reopened.
  • resolution deleted.

Oops. The filter was supposed to have extra args.

$string = apply_filters( 'pre_kses', $string, $allowed_html, $allowed_protocols );

Sorry.

07/06/07 22:47:25 changed by markjaquith

(In [5787]) Pass extra args to pre_kses hook. Props mdawaffe. see #4409

07/19/07 17:32:43 changed by Nazgul

  • status changed from reopened to closed.
  • resolution set to fixed.