Ticket #4739 (reopened defect)

Opened 9 months ago

Last modified 3 months ago

Some icelandic/Norwegian/Danish letters do not work in page slugs

Reported by: einare Assigned to: westi
Priority: high Milestone: 2.6
Component: i18n Version: 2.2.1
Severity: major Keywords: needs-patch early
Cc:

Description (Last modified by westi)

When the page slug is generated from the post title, three Icelandic letters are not converted correctly. These three letters are Ð ð, Þ þ and Æ æ. They should be converted to D d, TH th and AE ae but are not.

For instance, when I made a post with the title ‘Þátturinn’, the post slug became ‘þatturinn’. When I entered that address in my address bar, it changed to ‘%c3%beatturinn’ and I got a ‘page not found’ error from WordPress.

This can be fixed by adding the following six lines to formatting.php, in the function remove_accents, inside the if (seems_utf8($string)) { block.

chr(195).chr(144) => 'D', 
chr(195).chr(176) => 'd',
chr(195).chr(158) => 'TH',
chr(195).chr(190) => 'th',
chr(195).chr(134) => 'AE',
chr(195).chr(166) => 'ae',
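
As a minimal sketch of what these six entries do, assuming they end up in an array that is applied to the title with strtr(): each key is the two-byte UTF-8 encoding of one letter. The function name sketch_icelandic_to_ascii is hypothetical and is not part of WordPress; it covers only the mappings proposed here, so other accented letters (such as á) pass through untouched.

```php
<?php
// Hypothetical helper, not WordPress core: applies only the six mappings
// proposed in this ticket. Assumes $string is valid UTF-8.
function sketch_icelandic_to_ascii($string) {
    $chars = array(
        chr(195).chr(144) => 'D',  // Ð (U+00D0, bytes C3 90)
        chr(195).chr(176) => 'd',  // ð (U+00F0, bytes C3 B0)
        chr(195).chr(158) => 'TH', // Þ (U+00DE, bytes C3 9E)
        chr(195).chr(190) => 'th', // þ (U+00FE, bytes C3 BE)
        chr(195).chr(134) => 'AE', // Æ (U+00C6, bytes C3 86)
        chr(195).chr(166) => 'ae', // æ (U+00E6, bytes C3 A6)
    );
    // strtr() with an array replaces every key with its value in one pass.
    return strtr($string, $chars);
}

// "Þátturinn" written as explicit UTF-8 bytes so the file encoding does not matter.
echo sketch_icelandic_to_ascii("\xC3\x9E\xC3\xA1tturinn"), "\n"; // prints "THátturinn"
```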

Also (from #5952): when the post slug is generated from the post title, the letters 'Å' and 'å' are converted to 'a'. They should be converted to 'aa', which is the general practice in countries using this character (see Wikipedia).

Furthermore, the Norwegian/Danish characters 'Æ' 'æ' and 'Ø' 'ø' should be converted to 'ae' and 'oe' respectively. As of now, they are converted to '%c3%a6' and '%c3%b8'.
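
The Norwegian/Danish request could be sketched in the same way. This is only an illustration: the function name sketch_nordic_to_ascii is hypothetical, and the uppercase targets 'AA' and 'OE' are an assumption modelled on the 'AE' entry above, since the report only names the lowercase forms.

```php
<?php
// Hypothetical helper, not WordPress core. The uppercase replacements
// ('AA', 'OE') are assumed; the ticket specifies only 'aa', 'ae' and 'oe'.
function sketch_nordic_to_ascii($string) {
    $chars = array(
        chr(195).chr(133) => 'AA', // Å (U+00C5, bytes C3 85)
        chr(195).chr(165) => 'aa', // å (U+00E5, bytes C3 A5)
        chr(195).chr(134) => 'AE', // Æ (U+00C6, bytes C3 86)
        chr(195).chr(166) => 'ae', // æ (U+00E6, bytes C3 A6)
        chr(195).chr(152) => 'OE', // Ø (U+00D8, bytes C3 98)
        chr(195).chr(184) => 'oe', // ø (U+00F8, bytes C3 B8)
    );
    return strtr($string, $chars);
}

// "Blåbærsyltetøy" as explicit UTF-8 bytes; after the mapping the string is
// pure ASCII, so strtolower() is safe to use for the slug form.
echo strtolower(sketch_nordic_to_ascii("Bl\xC3\xA5b\xC3\xA6rsyltet\xC3\xB8y")), "\n"; // prints "blaabaersyltetoey"
```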

Attachments

4739.patch (2.8 kB) - added by einare on 08/13/07 15:42:29.
Fix for the ticket

Change History

08/13/07 15:42:29 changed by einare

  • attachment 4739.patch added.

Fix for the ticket

08/13/07 16:55:41 changed by Nazgul

  • keywords set to has-patch.
  • milestone changed from 2.2.3 to 2.3 (trunk).

08/28/07 21:18:11 changed by westi

  • keywords changed from has-patch to has-patch dev-reviewed.
  • owner changed from anonymous to westi.
  • status changed from new to assigned.

+1

08/29/07 17:34:15 changed by westi

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [5969]) Add utf8->ascii mappings for icelandic letters. Fixes #4739 props einare

09/20/07 08:54:29 changed by nbachiyski

  • status changed from closed to reopened.
  • resolution deleted.

This commit breaks the permalinks of posts that contain these characters and were published with the old version of this function.

We should either revert it or pass all permalinks that weren't manually edited through the new sanitize_title. In order to achieve this, we have to compare the output of the old and the new remove_accents functions.
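
A rough sketch of that comparison, under stated assumptions: sketch_old_sanitize and sketch_new_sanitize are simplified stand-ins written for this example (they are not the real sanitize_title before and after the change), and sketch_migrate_slug is a hypothetical helper. The idea is that a stored slug which equals the old function's output for the title was presumably never edited by hand, so it can be regenerated; anything else is left alone.

```php
<?php
// Stand-in for the OLD behaviour: unmapped bytes end up percent-encoded.
function sketch_old_sanitize($title) {
    return strtolower(rawurlencode($title));
}

// Stand-in for the NEW behaviour: þ (bytes C3 BE) is mapped to 'th' first.
function sketch_new_sanitize($title) {
    return strtolower(rawurlencode(strtr($title, array("\xC3\xBE" => 'th'))));
}

// Hypothetical helper: regenerate the slug only if it matches what the old
// code would have produced from the title; otherwise keep it unchanged.
function sketch_migrate_slug($title, $stored_slug, $old_sanitize, $new_sanitize) {
    if (call_user_func($old_sanitize, $title) === $stored_slug) {
        return call_user_func($new_sanitize, $title);
    }
    return $stored_slug; // manually edited (or otherwise different): leave as-is
}

// Auto-generated slug for the title "þing" gets regenerated.
echo sketch_migrate_slug("\xC3\xBEing", "%c3%being", 'sketch_old_sanitize', 'sketch_new_sanitize'), "\n"; // prints "thing"
// A hand-written slug is preserved.
echo sketch_migrate_slug("\xC3\xBEing", "my-custom-slug", 'sketch_old_sanitize', 'sketch_new_sanitize'), "\n"; // prints "my-custom-slug"
```

Any real migration would also need the old URL to keep redirecting to the regenerated one.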

09/20/07 09:09:30 changed by nbachiyski

Or maybe we should change the query's post name matching so that it uses the raw post name from the URL, not the decoded one. If we don't do this, we should be very careful about modifying sanitize_title's behaviour.

09/20/07 09:25:41 changed by Nazgul

  • keywords changed from has-patch dev-reviewed to developer-feedback.
  • priority changed from normal to high.
  • severity changed from minor to major.

09/21/07 15:38:21 changed by ryan

Affected posts can be fixed by resaving them. The old slug redirector will handle redirecting the old URL. But, that's not very friendly. For 2.3 we should probably revert the change.

09/21/07 19:36:13 changed by ryan

(In [6150]) Revert [5969]. It can break permalinks. see #4739

09/21/07 19:36:48 changed by ryan

  • milestone changed from 2.3 to 2.4.

Reverted for 2.3. We'll try to fix it properly for 2.4.

09/22/07 17:38:07 changed by westi

  • keywords changed from developer-feedback to needs=patch early.

I guess we need to make sure that any changes we make to the slug generation code don't affect old posts the way this one currently does.

We should always be checking against the string we use to generate the permalink, not a re-sanitized one.

09/23/07 15:30:34 changed by nbachiyski

westi, we aren't always generating the permalink based on information we have in the database. Usually the title is used, but users are allowed to enter their own slugs and we don't keep the original slug -- only the sanitized one.

09/25/07 00:34:28 changed by mdawaffe

  • keywords changed from needs=patch early to needs-patch early.

02/22/08 07:32:31 changed by westi

  • summary changed from Some icelandic letters do not work in page slugs to Some icelandic/Norwegian/Danish letters do not work in page slugs.
  • description changed.
  • milestone changed from 2.5 to 2.6.

Closed #5952 as a dupe of this and updated bug with more characters to fix.

Moving to 2.6 as this needs fixing early and lots of testing so we can be sure we don't break things.