Ticket #2980 (closed enhancement: fixed)

Opened 2 years ago

Last modified 2 years ago

Improvements to wptexturize

Reported by: ecb29
Assigned to: anonymous
Priority: normal
Milestone: 2.1
Component: Optimization
Version: 2.0.3
Severity: normal
Keywords: wptexturize optimize
Cc:

Description

The wptexturize function in functions-formatting.php can be sped up significantly with some simple refactoring. In my measurements, this reduces the time spent inside wptexturize from 24% to 16% of total wp() time, and the time of the function itself from 600ms to 200ms. It also cuts the number of preg_replace calls from 10,439 to 3,289 (total time from 74ms to 36ms) and the number of str_replace calls from 6,410 to 1,405 (total time from 54ms to 29ms).

function wptexturize($text) {
	$next = true;
	$output = '';
	$curl = '';
	$textarr = preg_split('/(<.*>)/Us', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
	$stop = count($textarr);
	
	for($i = 0; $i < $stop; $i++){
		$curl = $textarr[$i];
		
		if (isset($curl[0]) && '<' != $curl[0] && $next) { // If it's not a tag
			// static strings
			$static_characters = array('&#8212;', ' &#8212; ', '&#8211;', 'xn--', '&#8230;', '&#8220;', '\'tain\'t', '\'twere', '\'twas', '\'tis', '\'twill', '\'til', '\'bout', '\'nuff', '\'round', '\'cause', '\'s', '\'\'', ' (tm)');
			$static_replacements = array('---', ' -- ', '--', 'xn&#8211;', '...', '``', '&#8217;tain&#8217;t', '&#8217;twere', '&#8217;twas', '&#8217;tis', '&#8217;twill', '&#8217;til', '&#8217;bout', '&#8217;nuff', '&#8217;round', '&#8217;cause', '&#8217;s', '&#8221;', ' &#8482;');
			$curl = str_replace($static_characters, $static_replacements, $curl);

			// regular expressions
			$dynamic_characters = array('/\'(\d\d(?:&#8217;|\')?s)/', '/(\s|\A|")\'/', '/(\d+)"/', '/(\d+)\'/', '/(\S)\'([^\'\s])/', '/(\s|\A)"(?!\s)/', '/"(\s|\S|\Z)/', '/\'([\s.]|\Z)/', '/(\d+)x(\d+)/');
			$dynamic_replacements = array('&#8217;$1', '$1&#8216;', '$1&#8243;', '$1&#8242;', '$1&#8217;$2', '$1&#8220;$2', '&#8221;$1', '&#8217;$1', '$1&#215;$2');
			$curl = preg_replace($dynamic_characters, $dynamic_replacements, $curl);
		} elseif (strstr($curl, '<code') || strstr($curl, '<pre') || strstr($curl, '<kbd') || strstr($curl, '<style') || strstr($curl, '<script')) {
			// strstr is fast
			// Do not texturize the text that follows one of these tags
			$next = false;
		} else {
			$next = true;
		}

		// Encode bare ampersands that do not start an entity
		$curl = preg_replace('/&([^#])(?![a-zA-Z1-4]{1,8};)/', '&#038;$1', $curl);
		$output .= $curl;
	}
	
	return $output;
}
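
The lower call counts reported in the description are consistent with the array form of str_replace and preg_replace used in the function above, where a single call applies every pair in array order instead of one call per pair. A minimal sketch of that equivalence, using three of the static pairs; the helper names replace_one_by_one and replace_batched are made up for illustration and are not part of WordPress:

```php
<?php
// Illustrative subset of the static pairs from the function above.
$search  = array("'s", "''", ' (tm)');
$replace = array('&#8217;s', '&#8221;', ' &#8482;');

// One str_replace() call per pair: three calls per text fragment.
function replace_one_by_one($text, $search, $replace) {
	foreach ($search as $i => $needle) {
		$text = str_replace($needle, $replace[$i], $text);
	}
	return $text;
}

// Array form: one str_replace() call per text fragment.
// The pairs are still applied in array order, so the result is the same.
function replace_batched($text, $search, $replace) {
	return str_replace($search, $replace, $text);
}

$sample = "Bob's widget (tm)";
$a = replace_one_by_one($sample, $search, $replace);
$b = replace_batched($sample, $search, $replace);

echo $b, "\n";                                   // Bob&#8217;s widget &#8482;
echo ($a === $b ? 'same' : 'different'), "\n";   // same
```

Because the pairs are applied in order, any pair that must run before another (a longer sequence before a shorter one it contains, for example) has to come first in the arrays.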

I'll try and attach a patch...

Attachments

patch.diff (4.4 kB) - added by ecb29 on 07/27/06 20:35:30.
Patch diff

Change History

07/27/06 20:35:30 changed by ecb29

  • attachment patch.diff added.

Patch diff

09/21/06 01:22:40 changed by foolswisdom

  • milestone changed from 2.0.4 to 2.2.

09/21/06 03:14:00 changed by ryan

Looks good offhand. We'll need to test this well to make sure we don't break anything. Since texturize is run on display rather than on save, we can easily do side-by-side comparisons. Load a page using the old version and save the page source. Load the same page using the new version and save its source. Then diff the two page sources.
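
A lighter-weight variant of this comparison is to run both versions of the function over sample strings and list any inputs where the outputs differ, instead of diffing saved page sources. The sketch below assumes the pre-patch and post-patch functions are available under different names; texturize_old, texturize_new, and find_mismatches are hypothetical names, and the two function bodies are tiny stand-ins rather than the real implementations:

```php
<?php
// Hypothetical stand-ins: in a real test these would be the pre-patch and
// post-patch copies of wptexturize(), loaded under different names.
function texturize_old($text) {
	$text = str_replace("'s", '&#8217;s', $text);
	$text = str_replace(' (tm)', ' &#8482;', $text);
	return $text;
}

function texturize_new($text) {
	return str_replace(array("'s", ' (tm)'), array('&#8217;s', ' &#8482;'), $text);
}

// Returns one entry per input for which the two implementations disagree.
function find_mismatches($samples, $old, $new) {
	$mismatches = array();
	foreach ($samples as $sample) {
		$expected = call_user_func($old, $sample);
		$actual   = call_user_func($new, $sample);
		if ($expected !== $actual) {
			$mismatches[] = array('input' => $sample, 'old' => $expected, 'new' => $actual);
		}
	}
	return $mismatches;
}

$samples = array("Bob's blog", 'Acme (tm) widgets', 'nothing to change');
$mismatches = find_mismatches($samples, 'texturize_old', 'texturize_new');
echo count($mismatches), " mismatch(es)\n";
```

Sample strings taken from real post content would make the check more meaningful than the three toy inputs used here.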

09/21/06 03:18:57 changed by ryan

Hmm, some of those replacements might be going the wrong direction.

11/19/06 03:40:07 changed by ecb29

It should be equivalent to the old function, as far as I can tell.

11/20/06 02:05:14 changed by ryan

The first six items in static_characters and static_replacements look like they should be swapped.
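
The effect of the ordering can be checked with plain str_replace calls. This is a small sketch, not part of the patch: it copies the first six pairs exactly as posted in the description, and the variable names are chosen here for illustration:

```php
<?php
// First six pairs exactly as posted in the description.
$posted_search  = array('&#8212;', ' &#8212; ', '&#8211;', 'xn--', '&#8230;', '&#8220;');
$posted_replace = array('---', ' -- ', '--', 'xn&#8211;', '...', '``');

$input = 'one --- two -- three...';

// As posted: the search strings are entities, which plain ASCII input
// does not contain, so the text comes back unchanged.
$as_posted = str_replace($posted_search, $posted_replace, $input);

// Swapped: the ASCII sequences are searched for and replaced by entities.
$swapped = str_replace($posted_replace, $posted_search, $input);

echo $as_posted, "\n"; // one --- two -- three...
echo $swapped, "\n";   // one &#8212; two &#8212; three&#8230;
```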

11/21/06 22:00:03 changed by ryan

  • milestone changed from 2.2 to 2.1.

11/21/06 22:00:11 changed by ryan

  • status changed from new to closed.
  • resolution set to fixed.

(In [4511]) Make wptexturize faster. Props ecb29. fixes #2980