Make WordPress Core

Opened 17 years ago

Closed 15 years ago

Last modified 15 years ago

#4647 closed defect (bug) (wontfix)

Text in database should not be entity-encoded

Reported by: redsweater Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.2.1
Component: General Keywords: needs-patch
Focuses: Cc:

Description

I've noticed that some text, e.g. the names of categories and the post_content, is stored in the database with XML-compatible (I think) entity encoding. For instance, the & character is stored in the database as "&amp;".

Other fields, such as the Excerpt and Title for instance, store the same & character verbatim in the field as &.

It seems that for consistency, the text in the database should be of a standardized form. I would vote for not storing entity encoding in the database, as it seems more of a presentational thing.

To observe the issue, just write a test post in which the & character for instance appears in all possible text fields. Then observe the database directly to see what has happened.

This has particularly vulgar effects on the sanity of the text values returned by the XML-RPC interface, which I'll describe in another bug report.
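
To illustrate why mixed storage is awkward for a client, here is a minimal Python sketch (not WordPress code). The row contents and the function name `render_escaping_everything` are hypothetical; the field names are the ones mentioned in the report. A client that assumes every field is verbatim and escapes on output ends up double-encoding the fields that were already encoded in the database.

```python
import html

# Hypothetical row mimicking what the report describes: the same "&"
# is stored verbatim in one field and entity-encoded in another.
stored_row = {
    "post_title": "Trial & Tribulation",        # stored verbatim
    "post_content": "Trial &amp; Tribulation",  # stored entity-encoded
}


def render_escaping_everything(row):
    """Escape every field on output, assuming all fields are verbatim."""
    return {field: html.escape(value, quote=False) for field, value in row.items()}


rendered = render_escaping_everything(stored_row)
print(rendered["post_title"])    # escaped once
print(rendered["post_content"])  # escaped twice, because it was already encoded
```

The opposite assumption (every field is already encoded) fails the other way round, so no single client-side rule handles both fields correctly.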

Attachments (1)

default-filters.php.diff (579 bytes) - added by josephscott 16 years ago.


Change History (10)

#1 @foolswisdom
17 years ago

  • Milestone set to 2.4 (future)

#2 @redsweater
17 years ago

I was going to write another bug suggesting that the XML-RPC interface should do something to mitigate the effect this has on XML-RPC clients. But on further thought I think it's probably best that the XML-RPC interface serve as an honest interface to the content in the database. This should be fixed in the code that inappropriately inserts the entity-encoded text in the affected fields.

For example, it turns out that the XML-RPC interface's honesty is double-edged. I can write a post with content "Trial & Tribulation" and submit via XML-RPC, and it goes into the database without the problematic encoding. It's only when entered via the WordPress editor that the problematic encoding occurs.

However, a Category submitted via XML-RPC wp.newCategory does suffer the entity encoding problem, and goes into the database with it.

Long story short, anything that writes text to the database should, I think, take pains to make sure it goes in verbatim, and not entity encoded.
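
A rough Python sketch of the "store verbatim, encode on output" idea argued for here. It is only an illustration under simple assumptions: the `database` dictionary stands in for a table, and `save_field` and `render_field` are hypothetical names, not WordPress functions.

```python
import html

# Stand-in for a database table: field name -> stored text.
database = {}


def save_field(name, text):
    """Store the text exactly as the author typed it, with no entity encoding."""
    database[name] = text


def render_field(name):
    """Entity-encode only at the point where HTML output is produced."""
    return html.escape(database[name], quote=False)


save_field("category_name", "Trial & Tribulation")
print(database["category_name"])      # verbatim in storage
print(render_field("category_name"))  # encoded only for presentation
```

With this split, non-HTML consumers such as an XML-RPC client can read the stored value directly and apply whatever encoding their own output format needs.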

#3 @redsweater
16 years ago

  • Summary changed from Text in database is inconsistently entity-encoded to Text in database should not be entity-encoded

#4 @josephscott
16 years ago

  • Cc josephscott added

#5 @josephscott
16 years ago

This is happening because 'pre_term_name' in wp-includes/default-filters.php is encoding the data before it gets to the database. Removing that default filter fixes the problem for both categories and tags.
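
The mechanism can be modelled with a toy filter registry in Python. This is a sketch of the general hook pattern, not WordPress's PHP implementation: the functions below only borrow WordPress-style names, `encode_entities` and `insert_term` are hypothetical, and the ticket does not say which callback on 'pre_term_name' actually performs the encoding.

```python
import html
from collections import defaultdict

# Toy filter registry: hook name -> ordered list of callbacks.
filters = defaultdict(list)


def add_filter(hook, callback):
    filters[hook].append(callback)


def remove_filter(hook, callback):
    filters[hook].remove(callback)


def apply_filters(hook, value):
    """Pass the value through every callback registered on the hook."""
    for callback in filters[hook]:
        value = callback(value)
    return value


def encode_entities(text):
    """Hypothetical stand-in for the filter that entity-encodes term names."""
    return html.escape(text, quote=False)


def insert_term(name):
    """Return the value that would be written to the database."""
    return apply_filters("pre_term_name", name)


add_filter("pre_term_name", str.strip)
add_filter("pre_term_name", encode_entities)
before = insert_term(" Trial & Tribulation ")

# Dropping the encoding filter leaves the other clean-up filters in place.
remove_filter("pre_term_name", encode_entities)
after = insert_term(" Trial & Tribulation ")

print(before)
print(after)
```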

#6 @djr
16 years ago

  • Keywords has-patch needs-testing added

#7 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Milestone 2.9 deleted
  • Resolution set to wontfix
  • Status changed from new to closed

patch is irrelevant, as this is a huge workflow change... it needs to address the upgrade of everything that gets changed as well.

I'm closing as wontfix, pending a proper patch.

#8 follow-up: @redsweater
15 years ago

What do you mean when you say it needs to address the upgrade of everything that gets changed? Do you mean a user's existing (inconsistently encoded) data?

It seems to me that at least putting an end to the addition of inconsistent data to the database would be a valuable improvement. If Joseph's patch addresses the problem so that new users would not be building inconsistency into their database, it seems useful.

Is it normal policy of the WordPress team to close bugs as wontfix just because there is not currently an acceptable patch? Or is there more to the story of "wontfix"ing this bug than is summarized in the comment above?

Daniel

#9 in reply to: ↑ 8 @Denis-de-Bernardy
15 years ago

Replying to redsweater:

> What do you mean when you say it needs to address the upgrade of everything that gets changed? Do you mean a user's existing (inconsistently encoded) data?

yeah.

> It seems to me that at least putting an end to the addition of inconsistent data to the database would be a valuable improvement.

it could also introduce lots of issues if the data is not made consistent.

> Is it normal policy of the WordPress team to close bugs as wontfix just because there is not currently an acceptable patch? Or is there more to the story of "wontfix"ing this bug than is summarized in the comment above?

understand it as wontfix until a patch that takes care of the upgrade is added, not wontfix ever.

feel very free to re-open the ticket. I closed it because it would have stayed open for another two or three years pending the needed patch.
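
For a sense of what "taking care of the upgrade" might involve, here is a hedged Python sketch of a one-time pass that decodes fields known to have been stored entity-encoded. The row layout, the `upgrade_rows` helper, and the choice of `name` as the affected field are assumptions for illustration; no actual upgrade routine is described in this ticket.

```python
import html


def upgrade_rows(rows, encoded_fields):
    """Return copies of the rows with the listed fields entity-decoded once."""
    upgraded = []
    for row in rows:
        new_row = dict(row)
        for field in encoded_fields:
            if field in new_row:
                new_row[field] = html.unescape(new_row[field])
        upgraded.append(new_row)
    return upgraded


# Hypothetical term rows written while the encoding filter was active.
rows = [
    {"name": "Trial &amp; Tribulation", "slug": "trial-tribulation"},
    {"name": "Plain", "slug": "plain"},
]

upgraded = upgrade_rows(rows, encoded_fields=["name"])
for row in upgraded:
    print(row["name"])
```

The pass has to be limited to fields that were always written encoded. Running it over a field that was stored verbatim would corrupt any value where the author really typed the characters "&amp;", which is one reason the upgrade is harder than removing the filter.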
