There should be an option to explicitly set the encoding of the database connection to something other than the database server's default.
Many hosts leave their servers at the MySQL default, which is the 'latin1' character set with the 'latin1_swedish_ci' collation. This breaks many non-Latin-script blogs.
The current workaround is to manually add a statement such as mysql_query("SET NAMES utf8 COLLATE utf8_unicode_ci"); to wp-db.php. That edit may be overwritten on upgrade, which complicates things further.
It may be useful to make the default value of this suggested option UTF8, since it cannot harm Latin-script languages and will be more straightforward for others. Some people will still need to change the value of this option to have their language scripts correctly encoded and avoid future problems, but that will be a straightforward process by then, instead of having to manually patch the code.
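
As a rough illustration of the idea, here is a minimal sketch of how such an option could drive the SET NAMES statement. The constant names DB_CHARSET and DB_COLLATE and the helper build_set_names_query() are hypothetical, chosen only for this example. The sketch only builds the statement; wp-db.php would still have to send it right after connecting.

```php
<?php
// Hypothetical option names; a site owner would override them in wp-config.php.
if (!defined('DB_CHARSET')) {
    define('DB_CHARSET', 'utf8'); // the default suggested above
}
if (!defined('DB_COLLATE')) {
    define('DB_COLLATE', '');     // empty: let MySQL use the charset's default collation
}

// Hypothetical helper: build the SET NAMES statement from the configured values.
// Returns null when no usable charset is given, so the server default stays in effect.
function build_set_names_query($charset, $collate = '')
{
    // The values are interpolated into SQL, so accept identifier characters only.
    if ($charset === '' || !preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        return null;
    }
    $query = "SET NAMES '" . $charset . "'";
    if ($collate !== '' && preg_match('/^[A-Za-z0-9_]+$/', $collate)) {
        $query .= " COLLATE '" . $collate . "'";
    }
    return $query;
}

$query = build_set_names_query(DB_CHARSET, DB_COLLATE);
if ($query !== null) {
    // In wp-db.php this would be sent on the open connection instead of printed.
    echo $query, "\n";
}
```

With something like this in place, a blog that needs a different encoding would only change the two values in its configuration file, and an upgrade of wp-db.php would no longer undo the setting.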