Ticket #3451 (assigned defect (bug))

Opened 2 years ago

Last modified 1 month ago

Page URI canonization

Reported by: pah2 Assigned to: markjaquith (accepted)
Priority: normal Milestone: 2.8
Component: Permalinks Version: 2.1
Severity: normal Keywords: permalink slug canonization has-patch needs-testing
Cc: janbrasna

Description

The nice permalink URIs for posts or categories are case-insensitive, but the page URIs are not.

e.g. http://matt.wordpress.com/about/ cannot be reached via http://matt.wordpress.com/About/

This results in 404s being returned when a user gets the case of the URI wrong. This is particularly a problem for weblogs that have migrated old pages to WordPress and have external pages pointing to them with varying case applied to the URIs.

Attachments

3451.canonical.redirect-pages.diff (2.2 kB) - added by DD32 on 12/10/07 05:55:45.

Change History

12/07/06 04:50:19 changed by markjaquith

  • keywords set to needs-patch.
  • owner changed from anonymous to markjaquith.
  • version set to 2.1.
  • status changed from new to assigned.

For 2.2, I'd like to have some form of automatic URL correction in core, so that if you're using http://example.com/about/ and someone enters http://example.com/About/ or http://www.example.com/about/ or http://example.com/?page_id=2 or any number of "close, but not quite" URLs, it'll 301 to the real URL. That's good for search engine juice!
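
A minimal sketch of the kind of correction described above, in Python rather than WordPress's own PHP. The `CANONICAL_HOST` value, the `PAGES` table and the `canonical_redirect` function are all hypothetical names for illustration, and tolerating a missing trailing slash is an added assumption; this is not the code that later went into core.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical site data: the canonical host and a page-ID-to-path table.
CANONICAL_HOST = "example.com"
PAGES = {2: "/about/"}


def canonical_redirect(url):
    """Return (301, canonical_url) for a near-miss URL, or None if no redirect applies."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    if "page_id" in query:
        # Case 1: ?page_id=N maps straight to that page's permalink.
        try:
            path = PAGES.get(int(query["page_id"][0]))
        except ValueError:
            path = None
    else:
        # Case 2: compare paths case-insensitively, tolerating a missing
        # trailing slash (an assumption of this sketch).
        wanted = parts.path.rstrip("/").lower() + "/"
        path = next((p for p in PAGES.values() if p.lower() == wanted), None)

    if path is None:
        return None  # unknown page: leave it to the normal 404 handling

    canonical = f"{parts.scheme}://{CANONICAL_HOST}{path}"
    # Only redirect when the requested URL differs from the canonical one.
    return None if canonical == url else (301, canonical)
```

With this table, a request for http://example.com/About/, http://www.example.com/about/ or http://example.com/?page_id=2 would each yield a 301 to http://example.com/about/, while the canonical URL itself yields no redirect.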

12/07/06 07:06:15 changed by Viper007Bond

For the record, redirects for page IDs and missing trailing slashes (which we should also address) are currently handled by a great plugin which I use.

Examples:

http://www.viper007bond.com/about

http://www.viper007bond.com/?page_id=43

However, if we put something like this into the core, it needs a filter or a hook. That plugin currently has an exclusion regex, which is super helpful.

For example, this is my permalink structure:

http://www.viper007bond.com/archives/2006/12/01/blog-upgraded/

I cheat and have a page located at http://www.viper007bond.com/archives/. With the current code, you can't do that without breaking all your post permalinks (at least in 2.0.x it did). I accomplished it via a custom mod_rewrite rule in my .htaccess that loads up index.php/post-archives/.
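
A rough sketch of what an exclusion list plus a filter hook could look like, written in Python for illustration only. `EXCLUDE_PATTERNS`, `redirect_filters` and `apply_redirect_filters` are hypothetical names; neither the plugin's nor WordPress's real API is shown here.

```python
import re

# Hypothetical exclusion patterns: a path matching any of them is never redirected.
EXCLUDE_PATTERNS = [re.compile(r"^/archives/?$")]

# Hypothetical hook registry: each filter takes (requested_path, target) and
# returns the target to use, or None to cancel the redirect.
redirect_filters = []


def apply_redirect_filters(requested_path, target):
    """Run a proposed redirect target through the exclusions and registered filters."""
    if any(p.search(requested_path) for p in EXCLUDE_PATTERNS):
        return None  # excluded path: serve it as requested
    for f in redirect_filters:
        target = f(requested_path, target)
        if target is None:
            return None  # a filter vetoed the redirect
    return target
```

In this sketch a site owner with a page at /archives/ would add a pattern for it, and a plugin could append a callable to `redirect_filters` to rewrite or cancel individual redirects.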

03/27/07 22:53:44 changed by foolswisdom

  • milestone changed from 2.2 to 2.3.

09/12/07 22:02:27 changed by markjaquith

  • milestone changed from 2.3 to 2.4 (next).

Not handled by current canonical redirect code, but it's too late to start working on it.

12/09/07 13:38:16 changed by DD32

Should the canonical redirect handle this case, or should the page matches be done case-insensitively?

Do search engines treat differently capitalised URLs as separate? (Thinking duplicate content here.)

12/09/07 21:59:29 changed by Viper007Bond

My vote goes for redirect to whatever the user entered for the stub. On Linux, you could have a site.com/file.php and a site.com/File.php and those would be two different scripts.

12/10/07 05:54:37 changed by DD32

The canonical code already redirects ?page_id=43 to the correct page.

I've just made up a patch which attempts to redirect pages. However, it may be too greedy for the likes of some.

Let's say I have a page structure like this:

Sub-marine
About Me 
   Sub about me 
      sub-sub about me

The attached patch will redirect:

hostname/wordpress/abOUT-me/ => hostname/wordpress/about-me/
hostname/wordpress/abOUT-me/sub- => hostname/wordpress/about-me/sub-about-me/
hostname/wordpress/abOUT-me/sub-sub => hostname/wordpress/about-me/sub-about-me/sub-sub-about-me/
hostname/wordpress/sub- => hostname/wordpress/sub-marine/
hostname/wordpress/sub-sub => hostname/wordpress/about-me/sub-about-me/sub-sub-about-me/

Now, getting to the greedy part:
It also has the effect of redirecting things like:

hostname/wordpress/a => hostname/wordpress/about-me/
hostname/wordpress/su => hostname/wordpress/sub-marine/
hostname/wordpress/hta => hostname/wordpress/2007/11/05/htaccess/

In the case where multiple destinations are possible, it selects the uppermost item alphabetically.
So hostname/wordpress/a will choose 'about-me' over 'azzes', and 'about-me' over 'parent/aaaa-sub-page'.

If it comes across a semi-permalink such as hostname/wordpress/perma/structure/post-na, it'll still redirect to that post, hostname/wordpress/perma/structure/post-name/, rather than sending it to hostname/wordpress/post-nam/
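
The patch itself is not reproduced in this ticket, so the following Python sketch is only one matching rule that happens to be consistent with the page examples listed above. It assumes that leading request segments must match the start of a page's path case-insensitively, that the last segment is treated as a prefix of the page's slug, and that ties go to the shallowest page and then alphabetically. `PAGE_PATHS` and `guess_page` are hypothetical names, and posts are left out.

```python
# Hypothetical page tree from the comment above, stored as lowercase slug paths.
PAGE_PATHS = [
    "sub-marine",
    "about-me",
    "about-me/sub-about-me",
    "about-me/sub-about-me/sub-sub-about-me",
]


def guess_page(requested, page_paths=PAGE_PATHS):
    """Return the best-matching page path for a partial request, or None."""
    segments = [s.lower() for s in requested.strip("/").split("/") if s]
    if not segments:
        return None
    *leading, last = segments

    candidates = []
    for path in page_paths:
        parts = path.split("/")
        # Leading request segments must equal the start of the page's path.
        if parts[:len(leading)] != leading:
            continue
        # The page must sit below the leading segments, and the final request
        # segment must be a prefix of the page's own slug.
        if len(parts) > len(leading) and parts[-1].startswith(last):
            candidates.append(path)

    # Assumed tie-break: shallowest page first, then alphabetical by path.
    candidates.sort(key=lambda p: (p.count("/"), p))
    return candidates[0] if candidates else None
```

Under these assumptions, `guess_page("abOUT-me/sub-")` gives `about-me/sub-about-me` and `guess_page("sub-")` gives `sub-marine`, matching the redirects listed in the comment. The same prefix rule is what makes a one-letter request such as `a` resolve to `about-me`, which is the greedy behaviour being questioned.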

12/10/07 05:55:45 changed by DD32

  • attachment 3451.canonical.redirect-pages.diff added.

(follow-up: ↓ 9 ) 12/10/07 06:08:04 changed by Viper007Bond

Another possible solution when multiple destinations are possible is to pull up the search template and list all the matching posts/pages for the user to pick from.

(in reply to: ↑ 8 ) 12/10/07 06:11:07 changed by DD32

Replying to Viper007Bond:

Another possible solution when multiple destinations are possible is to pull up the search template and list all the matching posts/pages for the user to pick from.

That's a much nicer option IMO.

The present redirect code (in trunk) just redirects to the first item it comes across (unordered) that fits the criteria it has managed to find. On second thought, that $order statement could probably be hard-coded for the query if it's to select only one item.
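
The "list the candidates instead of guessing" idea could be sketched roughly as follows, in Python and with a plain list standing in for the database query. `resolve` and its return labels are hypothetical; the sort is there so that the outcome does not depend on the order in which rows happen to come back.

```python
def resolve(prefix, slugs):
    """Return ("redirect", slug), ("choose", [slugs]) or ("not_found", None)."""
    # Sort the matches so the result is deterministic regardless of input order.
    matches = sorted(s for s in slugs if s.startswith(prefix.lower()))
    if not matches:
        return ("not_found", None)
    if len(matches) == 1:
        # Unambiguous: safe to redirect straight to the single match.
        return ("redirect", matches[0])
    # Ambiguous: hand the full list to a search-style template instead.
    return ("choose", matches)
```

For example, `resolve("su", ["sub-marine", "about-me"])` would redirect, while `resolve("a", ["azzes", "about-me"])` would return both pages for the visitor to pick from.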

03/14/08 03:59:47 changed by djr

  • keywords changed from needs-patch to has-patch needs-testing.

12/11/08 03:56:15 changed by janbrasna

  • cc set to janbrasna.
  • summary changed from Page URIs are case-sensitive to Page URI canonization.
  • component changed from General to Permalinks.
  • milestone changed from 2.9 to 2.8.

Changing Summary since a) the original issue is no longer present in trunk (however it's not canonized, sic) and b) the comments lean towards a different issue - the canonization itself.

Please see revert [9649] and respective #6627 that caused some regression.

12/11/08 04:02:06 changed by janbrasna

  • keywords changed from has-patch needs-testing to permalink slug canonization has-patch needs-testing.