Ticket #3670 (new defect)

Opened 1 year ago

Last modified 3 months ago

Removing CDATA close tag ( ]]> ) unbalances the CDATA block

Reported by: scenic Assigned to: andy
Priority: normal Milestone: 2.5.2
Component: Template Version: 2.1
Severity: major Keywords: has-patch needs-testing dev-feedback
Cc:

Description

I'm not sure if this is a bug per se, but it breaks included JS in my posts that are escaped with a CDATA section. I've basically commented out the following line in the_content() every time I upgrade:

//$content = str_replace(']]>', ']]>', $content);

A user on the wp-testers list indicated that this was part of making sure that included CDATA blocks didn't break RSS feeds. I don't use CDATA sections directly in my RSS feeds, so I haven't tested this. In my particular case, the JS is used to embed Flash movies (e.g. YouTube? videos) in an XHTML compliant way (without embed tags). I have a custom plugin I've written that strips out the JS and replaces it with the embed tag in the RSS feed.

Perhaps we should use a flag to activate this when the app is going through a feed. Ideally, though, it would simply be removed. Odds are that the CDATA block is being used for a JS block in a post body, and since most RSS aggregators don't allow JS by default, it would be safe to simply remove CDATA blogs if is_feed() is set.

Attachments

3670.001.diff (1.6 kB) - added by AaronCampbell on 04/04/08 22:07:35.

Change History

01/25/07 06:27:31 changed by scenic

  • version set to 2.1.

01/25/07 06:34:05 changed by filosofo

Couldn't we just make this a filter that is applied for RSS feeds but not e.g. admins, like kses?

01/25/07 06:43:10 changed by andy

I was the one who pointed out the reason for that line. The JS-replacement plugin is probably saving your feeds from the problem that line was added to fix.

I think we can come up with a better solution for this. If there is a good reason to wrap some of your post content as CDATA, perhaps you would be better served by adding a filter to your plugin to wrap the_content as CDATA.

Has anyone else run into this?

01/26/07 16:10:43 changed by alexrabe

I would also appreciate an update to make difference between feed / content, my plugin has exact the same problem, cause when I want the XHTML validation, I need the CDATA implementation, cause the "socalled" flashvars for a Flash video plugin are diveded by ampersand's. At the moment it's possible to use <!-- around CDATA to covers the CDATA problematic in the conent, but it seems to be deprecate from W3C.

So more and More Flash integration trap into this problematic

02/10/07 17:58:14 changed by scenic

Andy, I agree that the filter is saving me from needing that line. But, embedded JS is a valid thing to do in the content. That replace function should only really happen when we are processing a feed. The global is_feed() (I think) method should just wrap that line at a minimum.

02/12/07 07:19:47 changed by andy

  • keywords set to dev-feedback.

Checking is_feed does not get to the root of the problem, which is that we are trying to nest CDATA sections where we should not. I don't know enough about XML but I think we ought to stop using CDATA markers in feed templates and use a function that can intelligently wrap any content in CDATA as needed, removing any nested markers.

Is there really any reason to encode them rather than simply remove them?

So what I propose, and what should be seen by somebody with XML chops, is this unintelligent function:

function wp_cdata($data) {

// Maybe test for conditions necessitating // CDATA markers and return if no need

$data = str_replace('<![CDATA[', , $data); $data = str_replace(']]>', , $data);

// Sometimes a trailing square bracket // can mess up XML parsers: if ( substr($data, -1) == ']' )

$data .= ' ';

return "<![CDATA[$data]]>';

}

02/12/07 07:22:59 changed by andy

function wp_cdata($data) {
 // Maybe test for conditions necessitating
 // CDATA markers and return if no need

 $data = str_replace('<![CDATA[', '', $data);
 $data = str_replace(']]>', '', $data);

 // Sometimes a trailing square bracket
 // can mess up XML parsers:
 if ( substr($data, -1) == ']' )
  $data .= ' ';

 return "<![CDATA[$data]]>';
}

02/13/07 19:29:04 changed by tombarta

Does wordpress support PHP < 4.0.5? If not, you could use arrays in str_replace:

$data = str_replace(array('<![CDATA[', ']]>'), array('', ''), $data);

02/21/07 16:03:17 changed by Nazgul

  • milestone changed from 2.1.1 to 2.1.2.

03/27/07 23:33:24 changed by foolswisdom

  • milestone changed from 2.1.3 to 2.3.

06/13/07 12:07:15 changed by tellyworth

Not sure how helpful this is, but it is kind of possible to nest CDATA sections. Technically I suppose it's double-encoding rather than nesting, but regardless, this is 100% valid and should work with any compliant XML parser:

<![CDATA foo <![CDATA bar ]]]><![CDATA[]> baz ]]>

It's actually two separate CDATA sections concatenated together.

When decoded by an XML parser, this becomes:

foo <![CDATA bar ]]> baz

Here's a PHP snippet to do it:

return '<![CDATA.str_replace(?]>', ']]]><![CDATA[]>', $str).']]>';

06/13/07 12:09:14 changed by tellyworth

Whoops, here's that code again:

return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';

09/12/07 22:09:17 changed by markjaquith

  • milestone changed from 2.3 to 2.4 (next).

Any updates, Andy?

03/12/08 03:14:57 changed by lloydbudd

  • milestone changed from 2.5 to 2.6.

03/30/08 10:21:54 changed by Nazgul

  • owner changed from anonymous to andy.

04/03/08 18:30:28 changed by AaronCampbell

I'm still a little unclear as to the purpose of this line of code. I keep running into this problem, so I'd like to see a fix (I'd be happy to submit a patch), but I don't understand why we are keeping it at all. If it's just needed for feeds, lets limit it to affecting feeds. If someone could clearly explain to me why it's here (when it's needed), I'll put a patch together.

04/04/08 22:07:35 changed by AaronCampbell

  • attachment 3670.001.diff added.

04/04/08 22:12:55 changed by AaronCampbell

  • keywords changed from dev-feedback to has-patch needs-testing dev-feedback.
  • priority changed from low to normal.
  • component changed from Administration to Template.
  • severity changed from normal to major.
  • milestone changed from 2.6 to 2.5.1.

This ticket has sat here forever with nothing getting done, so I added a patch to get the ball rolling. The patch will fix two problems. It will fix one expressed above by only replacing ]]> with ]]&gt; when it is processing a feed.

There was another problem because we so often use:

//<![CDATA[
alert('test');
//]]>

and strip_tags() leaves the leading // there, which looks ugly in the feeds as well. I added a simple statement in two places to handle that: $text = preg_replace('|//\s*<!\[CDATA\[|', '<![CDATA[', $text);