Ticket #4554 (closed defect: fixed)

Opened 11 months ago

Last modified 8 months ago

Canonical trailing slashes

Reported by: ryan Assigned to: markjaquith
Priority: normal Milestone: 2.3
Component: Administration Version:
Severity: normal Keywords:
Cc:

Description

A post shouldn't be served via both /2007/06/foo and /2007/06/foo/. One should redirect to the other with the user's trailing slash preference as specified in the permalink structure being the canonical form.

Attachments

canonical_trailing_slash.diff (1.0 kB) - added by ryan on 06/28/07 00:10:59.
canonical.php (4.3 kB) - added by markjaquith on 06/28/07 02:16:41.
canonical_rehash.001.diff (6.7 kB) - added by markjaquith on 08/10/07 08:25:38.

Change History

06/28/07 00:10:59 changed by ryan

  • attachment canonical_trailing_slash.diff added.

06/28/07 02:16:41 changed by markjaquith

  • attachment canonical.php added.

06/28/07 02:18:39 changed by markjaquith

canonical.php is some code Matt and I threw back and forth a few times in March. Handles a whole lot of situations including trailing slashes, yes-www/no-www, and alternative permalink styles (i.e. ?p=N when using pretty permalinks).

06/28/07 02:20:51 changed by markjaquith

  • milestone set to 2.3 (trunk).

Oooh, and it also handles truncated URLs. Like if in an e-mail something got changed to:

http://example.com/2007/05/03/title-of-
my-awesome-post/

Clicking on http://example.com/2007/05/03/title-of- should forward you.

06/28/07 03:08:43 changed by matt

I remember that code! Let's get this in.

06/28/07 15:39:27 changed by ryan

We use that code on wpcom and it breaks plugins that add new links. I have to turn it off for several clients. That's why my patch is more limited. The link guessing when there's a 404 can be added back in. The stuff that builds a link from the query vars can probably be added back if we do it only when ! $wp->did_permalink. It's still pretty iffy though since it doesn't know what query vars have been added via plugin.

06/28/07 15:48:11 changed by ryan

For example, one client wanted links of the form /2007/week24/. The maps to a query of index.php?year=2007&w=24. The permalink redirector changes this to /2007/ since it doesn't know about "w". Since the link is already a permalink, simply not redirecting it will fix this situation. Mapping a request of index.php?year=2007&w=24 to a permalink will still break, however.

06/28/07 19:42:51 changed by markjaquith

How about disabling the redirection when we don't recognize all the query vars?

06/28/07 20:18:48 changed by Viper007Bond

Or having a way for plugins to register query vars and such so that it redirects properly, assuming there isn't one already.

06/28/07 20:27:01 changed by markjaquith

There is such a way... but the problem is that there are query vars we know about but don't know how to translate into a permalink. Like "w". And it's hard to tell whether a query var is default or was overridden. Big mess for little payoff.

New approach: look for known invalid alternative permalinks like ?p=N and redirect those.

We can still do trailing slash, no-www/yes-www, index.php$ stripping and 404-guessing, which will be the majority of the issues.

08/05/07 09:37:08 changed by snakefoot

Instead of just validating that the URL is using the correct format, then it would be nice if it validated that the url was the official permalink:

http://wordpress.org/extend/plugins/permalink-validator/

08/05/07 18:06:57 changed by markjaquith

There are negative side effects to such an approach... see the comments above.

08/06/07 20:09:16 changed by snakefoot

If the there was canonical urls for posts, pages and categories, then it would be a big step. Don't mind this issue being solved in multiple steps, and taking the easy ones first instead of trying to solve everything in one step.

08/10/07 08:25:38 changed by markjaquith

  • attachment canonical_rehash.001.diff added.

08/10/07 08:29:48 changed by markjaquith

  • owner changed from anonymous to markjaquith.
  • status changed from new to assigned.

canonical_rehash.001.diff is pretty much a rewrite.

The basic idea here is to look for known issues. It doesn't try to be the end-all. It fixes what it can, and doesn't bite off more than it can chew.


Features:

  1. Trailing slashes (enforces user preference)
  2. example.com vs www.example.com (enforces user preference)
  3. strips trailing /index.php or /index.php/
  4. strips /index.php/ if using mod_rewrite permalinks, which allows for seamless migration from PATHINFO to mod_rewrite
  5. Redirects ?p=N, ?page_id=N, ?m=YYYY, ?m=YYYYMM, ?m=YYYYMMDD, ?year=YYYY, ?year=YYYY&monthnum=MM, ?year=YYYY&monthnum=MM&day=DD, ?cat=N, ?author=N, ?paged=N to their "pretty" counterparts.
  6. Always preserves additional args in the query string.
  7. Redirects is_single() URLs with partial slugs. e.g. /2007/06/05/slug-fragment => /2007/06/05/slug-fragment-is-whole-now/
  8. Is non-greedy -- looks for specific cases that it can fix, but doesn't try to fix everything.

I've used dozens of test cases, but I probably forgot a few along the way. Let me know what you think.

08/12/07 17:20:55 changed by snakefoot

Since this patch expects that REQUEST_URI is available and correct, then it is even more important that #3514 is solved or else it will redirect constantly on IIS.

08/17/07 03:33:23 changed by markjaquith

Solved #3514 in [5889], so we should be good to go, here.

08/17/07 03:35:15 changed by markjaquith

(In [5890]) Canonical URLs, first swing. Props to Scott Yang, Ryan and Matt. see #4554 (and report bugs there, for now)

08/17/07 03:45:59 changed by markjaquith

(In [5891]) Oops... forgot the svn add. Thanks mdawaffe. see #4554

09/12/07 03:07:20 changed by ryan

Calling this done. Bugs can be handled in other tickets.

09/12/07 03:07:24 changed by ryan

  • status changed from assigned to closed.
  • resolution set to fixed.