UTF8 and TYPO3 (updated)

Setting up a real UTF8 TYPO3 installation can be difficult. "Real" in this case means that everything is UTF8: TYPO3 and the database!

Getting a working UTF8 database can be difficult, and it is not possible in every case (e.g. if you don't have permission to configure the database).

Additionally, you have to pay attention if you want to migrate an existing TYPO3 installation to UTF8, which can be very difficult if you already have different charsets in your database.

This article tries to provide a thorough overview of the subject.

What is a charset, and what is UTF8?

OK, just a little background on the topic: every char is represented in your computer by bits and bytes (1 byte = 8 bits). Every application needs to know the mapping between these bit codes and the characters (= the character set). That is why different standards exist, the well-known ASCII for example. Many charsets use one byte for each char, but with such one-byte charsets it is only possible to encode 256 different chars. This is the reason why there are so many different charsets: every language may need its own set of chars. The problem gets even bigger if you think of languages written in Chinese or Cyrillic script.

Therefore the Unicode standard was born; in its original design two bytes were used to encode each char. To be more compatible with other charsets, UTF8 was defined: in UTF8 a char can be encoded with one to four bytes (MySQL's "utf8" charset stores at most three). The trick is simple: if the first bit of a byte is '0', it is a one-byte char. This made it possible to encode the first 128 chars exactly as in ASCII (and in the ISO-8859 charsets, which share that range).
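The one-byte versus multi-byte rule can be checked with a few lines of Python. This is only an illustrative sketch; the helper name utf8_info is made up for this example and is not part of TYPO3 or MySQL.

```python
def utf8_info(char):
    """Return (number of bytes, bit pattern of each byte) for one char in UTF8."""
    encoded = char.encode("utf-8")
    bits = [format(byte, "08b") for byte in encoded]
    return len(encoded), bits


for char in ["a", "ä", "€"]:
    length, bits = utf8_info(char)
    # One-byte chars start with a 0 bit; the first byte of a multi-byte char starts with 11.
    print(char, length, " ".join(bits))
```

The plain "a" needs one byte with a leading 0 bit, while "ä" needs two bytes and "€" needs three.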

An example:

  • 'ä' is encoded in ISO-8859-1 (latin1) with these bits: 11100100 (=228). Plain ASCII only covers the first 128 chars and has no 'ä' at all.
  • it is encoded in UTF8 with these bits: 11000011 10100100 (195 and 164)

This means that if you interpret this UTF8 char as latin1, you will get two chars: "Ã¤".
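The example can be reproduced in Python as a quick sanity check. This is a sketch only, assuming the one-byte charset in question is ISO-8859-1, which Python calls "latin-1".

```python
char = "ä"

latin1_bytes = char.encode("latin-1")  # one byte in the one-byte charset
utf8_bytes = char.encode("utf-8")      # two bytes in UTF8

print(list(latin1_bytes))  # [228]
print(list(utf8_bytes))    # [195, 164]

# Reading the UTF8 bytes with the wrong charset yields two chars instead of one.
misread = utf8_bytes.decode("latin-1")
print(misread)             # Ã¤
```

This is exactly the kind of garbage you see on a page when the database connection and TYPO3 disagree about the charset.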

Got it?


A collation is a set of rules for comparing characters, so that a DBMS can sort and compare string values.

TYPO3 Settings

Setting up UTF8 support in TYPO3 is simple: just go to the Install Tool and set the option forceCharset to "utf-8".

MySQL Settings

This is the difficult part. The MySQL DBMS has 6 different settings for character sets.

You can see the current settings by executing the query:

show variables;
| character_set_client     | latin1                         |
| character_set_connection | latin1                         |
| character_set_database   | utf8                           |
| character_set_results    | latin1                         |
| character_set_server     | utf8                           |
| character_set_system     | utf8                           |
| character_sets_dir       | /usr/share/mysql-500/charsets/ |

Read more: http://dev.mysql.com/doc/refman/5.0/en/charset.html

A.) Configure the MySQL character sets

This needs special rights on the server; you can find more information in the MySQL reference. Normally this has to be done:

Check the MySQL server settings (recompile, or start the server with the parameter "--character-set-server" to force utf8)

Check the MySQL client settings: edit my.cnf and be sure that there is a line like:


B.) Force Charset by changing class.t3lib_db.php

Often it is easier to force the charset on every connection by executing the query "SET CHARACTER SET utf8".


So it is necessary to modify the TYPO3 database class "class.t3lib_db.php" and insert the line:

$this->admin_query('SET CHARACTER SET utf8');

You have to insert this after the mysql_pconnect() call, around line 897.

It is also possible to use the SQL "SET NAMES utf8", which, in addition to the SQL above, also sets the character set of the connection. (This may cause problems in some environments.) Read more:

C.) Force Charset with setDBinit configuration (>TYPO3 4.0)

Since TYPO3 4.0 it is not necessary to patch class.t3lib_db.php. You can use the configuration option "setDBinit":

(Thanks "pavel" for the tip)

Change Charset in an existing project

  1. Open a shell on your TYPO3 server
  2. Make a database backup using mysqldump:
    mysqldump -u user -p database > backupfile.sql
  3. Drop the existing database
  4. Create a new database
  5. Be sure this new empty database is utf8. E.g. execute:
  6. Set the TYPO3 forceCharset option (see above)
  7. Modify the backupfile if required! (*1)
    1. Change the charset to utf8, for example by using the external tool recode:
      recode latin1..utf8 backupfile.sql
    2. Change all the "create table" statements in this dump file. You have to replace "CHARSET=latin1" with "CHARSET=utf8". This can be done by using the command-line tool sed:
      sed 's/CHARSET=latin1/CHARSET=utf8/g' backupfile.sql > backupfile_utf8.sql
  8. Insert the changed databasedump:
    mysql -u user -p database < backupfile_utf8.sql

There may be some problems with TemplaVoila mappings or with special chars in some plugins. Normally this can be solved by recoding the relevant template files.

(*1) Note from ries van Twisk:
"What I wanted to mention is you don't have to recode
a MySQL dump since recent versions of mysqldump
already dump in utf-8."
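If recode and sed are not available, step 7 of the list above can be approximated in Python. This is a minimal sketch, assuming the dump really is latin1 (see the note above: a dump that is already utf-8 must not be recoded again); the function name convert_dump and the sample statement are made up for illustration.

```python
def convert_dump(data, source_charset="latin-1"):
    """Recode raw dump bytes to UTF8 and rewrite the table charset declarations."""
    text = data.decode(source_charset)
    # Same replacement as the sed call: switch the declared table charset.
    text = text.replace("CHARSET=latin1", "CHARSET=utf8")
    return text.encode("utf-8")


# Small in-memory demo instead of a real dump file.
sample = "CREATE TABLE pages (title varchar(255)) DEFAULT CHARSET=latin1; -- übung".encode("latin-1")
converted = convert_dump(sample)
print(converted.decode("utf-8"))
```

For a real dump you would read backupfile.sql in binary mode ("rb"), pass the bytes to convert_dump and write the result to backupfile_utf8.sql in binary mode ("wb"). The sketch loads the whole file into memory, so it is only suitable for dumps of moderate size.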


Go to the backend. First check which charset is selected by the browser; it should be "UNICODE (utf-8)".

Then create a new page with special chars, e.g. "ähm übung", and save it.
Go to the tool "phpMyAdmin" and search for this record in the table "pages". If you see exactly the same title, everything works fine! (If not, go to the MySQL settings again :-))
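If you are not sure whether a title you see in the database is broken, the typical symptom (UTF8 bytes read as latin1) can be detected with a small heuristic. This is only a sketch with a hypothetical helper name, looks_double_encoded; it catches the common case and is not a complete charset check.

```python
def looks_double_encoded(stored_title):
    """Guess whether a title was UTF8 that got misread as latin1."""
    try:
        # Undo the suspected misreading: back to bytes, then decode as UTF8.
        repaired = stored_title.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text cannot be UTF8 bytes in disguise, so it is probably fine.
        return False
    return repaired != stored_title


print(looks_double_encoded("ähm übung"))    # False
print(looks_double_encoded("Ã¤hm Ã¼bung"))  # True
```

A title that comes back as True was most likely written through a connection with the wrong charset.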



[...Hope this was helpful...]
