Ticket #7612 (assigned enhancement)

Opened 4 months ago

Last modified 3 months ago

Tumblr importer

Reported by: hailin Assigned to: hailin (accepted)
Priority: normal Milestone: 2.8
Component: General Version:
Severity: normal Keywords:
Cc:

Description

Hao wrote a tool that converts a Tumblr blog into a WordPress XML file, which can then be imported into a WordPress blog.

The tool is at http://haochen.me/tumblr/ and he blogged about it at http://haochen.wordpress.com/2008/08/19/export-your-tumblr-blog-to-wordpress/

We want to incorporate this into WordPress core eventually.

Attachments

tumblr_export.php (15.0 kB) - added by hailin on 08/27/08 17:16:19.
export code using XML parser

Change History

08/27/08 16:11:18 changed by hailin

  • type changed from defect to enhancement.

I've reviewed and experimented with the Tumblr export code from Hao.

The current wordpress.org and wordpress.com import code relies on preg_match to extract tags, largely for legacy reasons: powerful XML parsing modules such as SimpleXMLElement are not available in PHP 4.x. Using the XML parsers built into PHP 5.x can produce much cleaner and faster code. Once we switch to PHP 5.x, I can envision significantly improving our import code by rewriting the XML parsing logic.

Hao's current Tumblr export code relies on SimpleXMLElement, which is the preferred approach. I don't think it's worthwhile to rewrite it using our existing, old preg_match approach.
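
To make the contrast concrete, here is a minimal sketch that pulls post titles out of a small XML fragment both ways. The sample XML and both helper function names are made up for illustration; this is not code from the existing importers or from Hao's tool.

```php
<?php
// Hypothetical sample input; element names are chosen for illustration only.
$xml = '<posts>'
     . '<post id="1"><title>First post</title></post>'
     . '<post id="2"><title>Second &amp; last</title></post>'
     . '</posts>';

// Regex style, roughly in the spirit of a preg_match-based importer:
// grab the text between <title> and </title>, then decode entities by hand.
function titles_with_regex($xml) {
    preg_match_all('|<title>(.*?)</title>|s', $xml, $matches);
    return array_map('html_entity_decode', $matches[1]);
}

// Parser style (PHP 5+): SimpleXMLElement handles nesting, attributes,
// and entity decoding itself.
function titles_with_simplexml($xml) {
    $doc = new SimpleXMLElement($xml);
    $titles = array();
    foreach ($doc->post as $post) {
        $titles[] = (string) $post->title;
    }
    return $titles;
}

print_r(titles_with_regex($xml));
print_r(titles_with_simplexml($xml));
```

Both helpers return the same titles for this input. The regex version would need more care as soon as the markup gains attributes on the title element, CDATA sections, or nested elements, which is where a real parser pays off.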

I've suggested a better alternative approach:

Since Tumblr has a simple format, we can parse its XML directly and create WordPress posts and categories, thus eliminating the intermediate step of exporting it to a WordPress XML file.

The approach would be similar to wp-admin/import/rss.php, where the following functions are used to create posts and categories:

wp_insert_post($post);
wp_create_categories($categories, $post_id);
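
A rough sketch of that direct approach follows. The Tumblr XML shape used here (a posts element containing post elements with regular-title, regular-body, and tag children) is an assumption for illustration, as is the helper name tumblr_xml_to_posts(); only wp_insert_post() and wp_create_categories() come from the discussion above. The WordPress calls are guarded so the mapping step can also be run on its own outside WordPress.

```php
<?php
// Hypothetical helper: map Tumblr XML to arrays shaped for wp_insert_post().
// The element names below are assumed, not taken from this ticket.
function tumblr_xml_to_posts($xml) {
    $doc = new SimpleXMLElement($xml);
    $posts = array();
    foreach ($doc->posts->post as $item) {
        // Treat each Tumblr tag as a WordPress category name.
        $categories = array();
        foreach ($item->tag as $tag) {
            $categories[] = (string) $tag;
        }
        $posts[] = array(
            'post' => array(
                'post_title'   => (string) $item->{'regular-title'},
                'post_content' => (string) $item->{'regular-body'},
                'post_status'  => 'publish',
            ),
            'categories' => $categories,
        );
    }
    return $posts;
}

// Made-up sample document in the assumed shape.
$sample = '<tumblr version="1.0"><posts>'
        . '<post id="1" type="regular">'
        . '<regular-title>Hello</regular-title>'
        . '<regular-body>&lt;p&gt;First entry&lt;/p&gt;</regular-body>'
        . '<tag>travel</tag><tag>photos</tag>'
        . '</post>'
        . '</posts></tumblr>';

foreach (tumblr_xml_to_posts($sample) as $entry) {
    if (function_exists('wp_insert_post') && function_exists('wp_create_categories')) {
        // Inside the WordPress admin: create the post, then attach its categories.
        $post_id = wp_insert_post($entry['post']);
        if (is_int($post_id) && $post_id > 0) {
            wp_create_categories($entry['categories'], $post_id);
        }
    } else {
        // Outside WordPress: just show what would be imported.
        echo $entry['post']['post_title'], ': ',
             implode(', ', $entry['categories']), "\n";
    }
}
```

Keeping the XML-to-array mapping separate from the WordPress calls means the parsing can be checked without a WordPress install. Other Tumblr post types (photo, quote, link, and so on) would each need their own mapping.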

I would suggest that we wait till we migrate to PHP 5.x to incorporate the Tumblr export code into wp core.

08/27/08 16:14:34 changed by hailin

Ryan's comment:

preg_match() is subject to the backtrack limits in php 5, one of the things that tripped us last time we migrated to php 5. I too think it best to wait for php 5 and use a real parser.

Barry's comment:

I think it was subject to that in php4 as well, just that in php 5 the default backtrack limit was reduced 10x or 100x.

I too think it best to wait for php 5 and use a real parser.
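
For context, the limit mentioned above is the pcre.backtrack_limit ini setting. The sketch below deliberately sets a tiny limit so that a failure is easy to provoke; the pattern, the input, and the helper name match_status() are illustrative assumptions. The point at which a given pattern fails depends on the PCRE version and on whether the JIT is in use, so the JIT is switched off here and no particular output is promised.

```php
<?php
// Hypothetical helper: run a match and report whether PCRE gave up.
function match_status($pattern, $subject) {
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_match() returns false on error; preg_last_error() says why.
        return preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit hit'
            : 'other PCRE error';
    }
    return $result ? 'matched' : 'no match';
}

// Assumptions for the demo: disable the JIT (where the setting exists)
// and shrink the backtrack limit far below its default.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100');

// A lazy match across a large body, the kind of pattern a
// regex-based importer might run against a long post.
$body = str_repeat('x', 10000);
echo match_status('|<content>(.*?)</content>|s', "<content>$body</content>"), "\n";
```

The practical lesson is that a false return from preg_match() is easy to mistake for "no match", so a regex-based importer can silently skip long posts unless it checks preg_last_error().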

08/27/08 16:20:11 changed by hailin

  • owner changed from anonymous to hailin.
  • status changed from new to assigned.

Matt's comment:

This doesn't mean it shouldn't go in core, just do a detect at the beginning for the needed functions and show a friendly error message if not available. You should drop the patch on a Trac ticket.
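
The detect-and-warn idea could look something like the sketch below. The function name and the wording of the message are hypothetical; class_exists() and function_exists() are standard PHP and work on PHP 4 as well, so the check itself is safe to run before any PHP 5 code is touched.

```php
<?php
// Hypothetical check: returns an error message, or '' when the parser is available.
function tumblr_importer_requirements_error() {
    if (!class_exists('SimpleXMLElement') || !function_exists('simplexml_load_string')) {
        return 'The Tumblr importer needs PHP 5 with the SimpleXML extension. '
             . 'Please ask your host to enable it, then try again.';
    }
    return '';
}

$error = tumblr_importer_requirements_error();
if ($error !== '') {
    // A real importer would render this on its admin screen instead.
    echo $error, "\n";
} else {
    echo "SimpleXML is available; the import can proceed.\n";
}
```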

Notes by Hailin:

I thought about using an XML parsing class like http://www.criticaldevelopment.net/xml/doc.php. However, this violates the principle of keeping core small and clean.

Let's see how the alternative approach goes (I believe that is better as it eliminates the middle step, and achieves one-click import).

08/27/08 17:16:19 changed by hailin

  • attachment tumblr_export.php added.

export code using XML parser

08/27/08 17:49:05 changed by Detect

Just leaving a comment here to join in on the discussion.

10/15/08 04:17:23 changed by jacobsantos

  • milestone changed from 2.7 to 2.8.