Ticket #4040 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

WXR importing duplicates categories

Reported by: takayukister
Assigned to: rob1n
Priority: low
Milestone: 2.2
Component: Administration
Version: 2.2
Severity: normal
Keywords: has-patch commit
Cc:

Description

In the admin panel under [Manage] - [Import], importing a WordPress eXtended RSS (WXR) file incorrectly generates duplicate categories with the same cat_name.

For example, this occurs under the conditions below.

At export:

cat_ID   cat_name   category_nicename
3        書評       book-review

(書評 means "book review" in Japanese.)

After importing this WXR file, the categories look like this:

cat_ID   cat_name   category_nicename
3        書評       book-review
4        書評       %e6%9b%b8%e8%a9%95

Two 書評 categories are generated. One has the original nicename "book-review" and the other has "%e6%9b%b8%e8%a9%95", which is the sanitized form of "書評". All posts in the 書評 category end up assigned to the 書評 category with the nicename "%e6%9b%b8%e8%a9%95".

It seems that this duplication occurs only when a category has a category_nicename that is not equal to its sanitized cat_name.
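The mismatch can be illustrated with a small Python sketch. It is only an approximation: approx_nicename() is a hypothetical stand-in that percent-encodes the UTF-8 bytes and lowercases the result, which matches the value reported in this ticket but is not WordPress's actual sanitize_title() implementation.

```python
from urllib.parse import quote


def approx_nicename(cat_name: str) -> str:
    """Hypothetical stand-in for sanitize_title() on non-ASCII input.

    Percent-encodes the UTF-8 bytes of the name and lowercases the result.
    """
    return quote(cat_name, safe="").lower()


exported_nicename = "book-review"   # nicename stored in the WXR file
derived = approx_nicename("書評")    # nicename derived from cat_name

print(derived)
print(derived == exported_nicename)
```

Because the derived value differs from the exported nicename, any lookup keyed on the derived value cannot find the existing category.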

To make this behavior easy to reproduce, I'll attach an example WXR file, along with a patch.

Attachments

wordpress.2007-03-28.xml (3.1 kB) - added by takayukister on 03/28/07 06:20:47.
Example WXR file. UTF-8 encoded.
import-wordpress.diff (0.8 kB) - added by takayukister on 03/28/07 06:21:24.
4040.diff (0.7 kB) - added by takayukister on 04/12/07 05:25:36.

Change History

03/28/07 06:20:47 changed by takayukister

  • attachment wordpress.2007-03-28.xml added.

Example WXR file. UTF-8 encoded.

03/28/07 06:21:24 changed by takayukister

  • attachment import-wordpress.diff added.

03/28/07 16:55:39 changed by foolswisdom

  • milestone changed from 2.4 to 2.2.

04/05/07 14:08:57 changed by rob1n

Wouldn't it be easier to use category_exists()?

04/06/07 01:14:41 changed by takayukister

category_exists() relies on category_nicename to check whether a category exists, and it uses sanitize_title(cat_name) to derive that category_nicename.

In the example described above, sanitize_title("書評") returns "%e6%9b%b8%e8%a9%95". Because no category with the category_nicename "%e6%9b%b8%e8%a9%95" has ever existed, another 書評 category (cat_ID: 4) is created.

My patch uses cat_name instead of category_nicename to check whether the category exists.
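As a rough illustration of the difference between the two checks, here is a simplified Python model. It is not the importer's code: the row layout, import_categories(), and approx_nicename() are all assumptions made for this sketch, and approx_nicename() only imitates sanitize_title() for the non-ASCII case reported here.

```python
from urllib.parse import quote


def approx_nicename(cat_name):
    """Hypothetical stand-in for sanitize_title(): percent-encode, lowercase."""
    return quote(cat_name, safe="").lower()


def import_categories(existing, incoming_names, match_by_name):
    """Return the category rows after importing the given category names.

    match_by_name=False models an existence check keyed on the derived
    nicename; match_by_name=True models a check keyed on cat_name.
    """
    rows = [dict(r) for r in existing]  # copy so the input is not mutated
    for name in incoming_names:
        if match_by_name:
            found = any(r["cat_name"] == name for r in rows)
        else:
            slug = approx_nicename(name)
            found = any(r["category_nicename"] == slug for r in rows)
        if not found:
            next_id = max(r["cat_ID"] for r in rows) + 1 if rows else 1
            rows.append({
                "cat_ID": next_id,
                "cat_name": name,
                "category_nicename": approx_nicename(name),
            })
    return rows


existing = [{"cat_ID": 3, "cat_name": "書評", "category_nicename": "book-review"}]

by_nicename = import_categories(existing, ["書評"], match_by_name=False)
by_name = import_categories(existing, ["書評"], match_by_name=True)

print(len(by_nicename), len(by_name))
```

In this model the nicename-based check adds a second 書評 row with cat_ID 4, while the name-based check leaves the single existing row untouched.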

04/10/07 18:48:19 changed by foolswisdom

  • keywords changed from has-patch to has-patch WXR.

04/10/07 23:15:25 changed by rob1n

  • keywords changed from has-patch WXR to has-patch commit.
  • owner changed from anonymous to rob1n.

04/12/07 01:49:47 changed by rob1n

  • status changed from new to closed.
  • resolution set to fixed.

(In [5246]) Use cat_name instead of cat_nicename when creating categories from import. Props takayukister. fixes #4040

04/12/07 02:08:43 changed by rob1n

(In [5247]) Use get_var and only select cat_ID. see #4040

04/12/07 05:24:44 changed by takayukister

  • status changed from closed to reopened.
  • resolution deleted.

I reopened this ticket to upload an additional patch that corrects the two points below:

1. $post_cats should always include $cat_ID, regardless of whether the category already exists.

2. Replace $post_ID with $post_id.

$post_ID is the old ID from the WXR file and $post_id is the new ID. The naming is confusing (sorry, I was the one who named $post_ID). It would be good to give them better names.
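The two points can be sketched in Python as follows. This is a minimal model, not the patch itself: resolve_post_categories(), the dictionary layout, the example IDs, and the names old_post_id and new_post_id are all hypothetical and chosen only to make the distinction visible.

```python
def resolve_post_categories(cat_names, categories):
    """Return the category IDs for one post, creating missing categories.

    `categories` maps cat_name -> cat_ID and is updated in place.
    """
    post_cats = []
    for name in cat_names:
        cat_id = categories.get(name)
        if cat_id is None:
            # Category does not exist yet: assign the next free ID.
            cat_id = max(categories.values(), default=0) + 1
            categories[name] = cat_id
        # Append outside the branch so existing categories are included too.
        post_cats.append(cat_id)
    return post_cats


categories = {"書評": 3}

old_post_id = 17  # hypothetical ID as recorded in the WXR file
new_post_id = 42  # hypothetical ID assigned by the importing site

# Category assignments are keyed on the new ID, never the old one.
assignments = {new_post_id: resolve_post_categories(["書評", "News"], categories)}

print(assignments)
```

Appending inside the "not found" branch only would drop the existing 書評 category from the post, which is the first problem described above.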

04/12/07 05:25:36 changed by takayukister

  • attachment 4040.diff added.

04/12/07 05:55:35 changed by rob1n

  • status changed from reopened to closed.
  • resolution set to fixed.

(In [5252]) Some fixes for another fix. Props takayukister. fixes #4040