Ticket #3495 (closed defect: fixed)

Opened 2 years ago

Last modified 1 year ago

Pingback excerpt-fetching code looks for a link to the SOURCE site instead of the TARGET site.

Reported by: markjaquith Assigned to: markjaquith
Priority: high Milestone: 2.1
Component: XML-RPC Version: 2.0.5
Severity: normal Keywords: has-patch
Cc:

Description

xmlrpc.php:1216-1218

$sem_regexp_pb = "/(\\/|\\\|\*|\?|\+|\.|\^|\\$|\(|\)|\[|\]|\||\{|\})/";
$sem_regexp_fix = "\\\\$1";
$link = preg_replace( $sem_regexp_pb, $sem_regexp_fix, $pagelinkedfrom );

So now $link is a regex-safe version of the SOURCE URL.

xmlrpc.php:1221-1232

foreach ( $p as $para ) {
	if ( $finished )
		continue;
	if ( strstr( $para, $pagelinkedto ) ) {
		$context = preg_replace( "/.*<a[^>]+".$link."[^>]*>([^>]+)<\/a>.*/", "$1", $para );
		$excerpt = strip_tags( $para );
		$excerpt = trim( $excerpt );
		$use     = preg_quote( $context );
		$excerpt = preg_replace("|.*?\s(.{0,100}$use.{0,100})\s|s", "$1", $excerpt);
		$finished = true;
	}
}

The SOURCE URL's paragraphs are iterated. Once one is found that contains the TARGET URL, it (mistakenly) looks for a link to the SOURCE URL and uses that as context for the excerpt. It doesn't find it, of course. But it doesn't really matter, because even if the context regex used the TARGET URL as it should, the excerpt regex matches the whole paragraph.

I thought I was going crazy when I saw this code... was pretty sure that I was missing something.

This dates back all the way to [2619]

Patch coming.

Attachments

pingback_excerpt.001.diff (2.4 kB) - added by markjaquith on 12/23/06 10:25:59.
Patch for trunk
pingback.diff (2.4 kB) - added by ruckus on 12/27/06 08:11:51.
Patch against 2.0.6-RC1

Change History

12/23/06 10:25:59 changed by markjaquith

  • attachment pingback_excerpt.001.diff added.

Patch for trunk

12/23/06 10:32:41 changed by markjaquith

  • owner changed from anonymous to markjaquith.
  • status changed from new to assigned.
  • milestone changed from 2.0.6 to 2.1.

Uploaded: pingback_excerpt.001.diff

  • use preg_quote() instead of wacky manual escaping
  • use the target URL for context
  • mark the link text to prevent more than one occurrence of that string from interfering
  • make the context-finding regex lazy
  • trim link text if it is more than 100 characters long

There. Now that should be functioning like it was intended to function. You get a nice excerpt with an absolute upper limit of 300 chars instead of the 1000+ char monstrosities that frequently showed up with the broken code.

I'll let this sit a day or two for auditing.

12/27/06 00:53:38 changed by ryan

Haven't tested yet but code looks good.

12/27/06 05:28:03 changed by tenpura

I have tested. It seems to work like it was originally intended.

12/27/06 06:10:06 changed by markjaquith

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [4667]) Pingback excerpt fetching improvements and fixes. fixes #3495

12/27/06 06:12:16 changed by markjaquith

  • milestone changed from 2.1 to 2.0.7.

Too late for 2.0.6, but definitely a 2.0.7 candidate.

12/27/06 08:11:51 changed by ruckus

  • attachment pingback.diff added.

Patch against 2.0.6-RC1

01/13/07 17:02:05 changed by ruckus

  • keywords set to bg|has-patch.
  • status changed from closed to reopened.
  • resolution deleted.
  • milestone changed from 2.0.7 to 2.0.8.

Reopening so we could maybe get this into 2.0.8

01/30/07 19:23:44 changed by markjaquith

  • milestone changed from 2.0.8 to 2.0.9.

Want to see what 2.1 users think of this before getting it into 2.0.x

02/21/07 14:56:48 changed by Nazgul

  • keywords changed from bg|has-patch to has-patch.
  • milestone changed from 2.0.9 to 2.0.10.

2.0.9 is out, so bumping the milestone to 2.0.10.

03/19/07 18:18:22 changed by foolswisdom

  • milestone changed from 2.0.10 to 2.0.11.

(follow-up: ↓ 11 ) 03/19/07 18:57:16 changed by Nazgul

@foolswisdom: Instead of bumping these tickets after each security release. Wouldn't it make more sense if we milestone them as "2.0.eventually" and set the final correct milestone when they're commited? It would save us some work. :)

(in reply to: ↑ 10 ) 03/19/07 19:44:45 changed by foolswisdom

Replying to Nazgul:

Wouldn't it make more sense if we milestone them as "2.0.eventually"

True enough. Next time, or maybe Mark J will do that ;-)

03/20/07 00:19:08 changed by ryan

Note that when a milestone is marked as completed, trac offers to move all open bugs to another milestone. It's not difficult to bulk move items from one milestone to another.

05/31/07 06:57:31 changed by Nazgul

Any change of this going in 2.0.11 or are we going to bump again?

06/05/07 03:34:33 changed by rob1n

  • milestone changed from 2.0.11 to 2.0.eventually.

I suppose we can just bump it -- if Mark wants to put it in the next 2.0.x release, then he can move the milestone and commit it.

09/13/07 05:15:58 changed by ryan

  • status changed from reopened to closed.
  • resolution set to fixed.
  • milestone changed from 2.0.eventually to 2.1.