
UTF8 and TYPO3 (updated)

Setting up a real UTF-8 TYPO3 installation can be difficult. "Real" in this case means that everything is UTF-8: TYPO3 and the database!

Getting a working UTF-8 database can be tricky, and it is not always possible (e.g. if you don't have permission to configure the database server).

Additionally, you have to be careful if you want to migrate an existing TYPO3 installation to UTF-8. This can be very difficult if your database already contains different charsets.

This article aims to give an in-depth overview of this subject.

What are charsets and UTF-8?

OK, just a little background on this topic: every character is represented in your computer by bits and bytes (1 byte = 8 bits). Every application needs to know the mapping between these bit codes and the characters (= the character set). That is why different standards exist, the well-known ASCII for example. Most charsets use one byte per character, but a one-byte charset can encode only 256 different characters. This is why there are so many different charsets: every language may need its own set of characters. The problem gets even bigger with languages such as Chinese, which use thousands of characters.

This is why the Unicode standard was created. Originally it used two bytes (16 bits) per character; today it defines more than a million code points. To stay compatible with existing charsets, UTF-8 was defined: in UTF-8 a character is encoded with one to four bytes! The trick is simple: if the first bit of a byte is '0', it is a one-byte character. So the first 128 characters are encoded exactly as in ASCII (and as in the ISO-8859 charsets, which extend ASCII).
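The rule above generalizes: the leading bits of the first byte tell how many bytes the whole character uses. A minimal sketch as a shell function (`utf8_seq_len` is a hypothetical helper name, not part of any tool); it is simplified and does not reject the few lead bytes that RFC 3629 forbids.

```shell
# utf8_seq_len BYTE
# Hypothetical helper: given the decimal value of the first byte of a
# UTF-8 sequence, print how many bytes the whole character uses.
# Simplified: it does not reject the lead bytes 0xC0, 0xC1 and
# 0xF5-0xF7, which RFC 3629 forbids.
utf8_seq_len() {
  b=$1
  if   [ "$b" -lt 128 ]; then echo 1   # 0xxxxxxx: ASCII, one byte
  elif [ "$b" -lt 192 ]; then echo 0   # 10xxxxxx: continuation byte, never a start
  elif [ "$b" -lt 224 ]; then echo 2   # 110xxxxx: start of a 2-byte character
  elif [ "$b" -lt 240 ]; then echo 3   # 1110xxxx: start of a 3-byte character
  elif [ "$b" -lt 248 ]; then echo 4   # 11110xxx: start of a 4-byte character
  else echo invalid                    # 11111xxx never occurs in UTF-8
  fi
}

utf8_seq_len 97    # 'a' -> 1
utf8_seq_len 195   # first byte of UTF-8 'ä' -> 2
utf8_seq_len 164   # second byte of UTF-8 'ä' -> 0 (continuation)
```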

An example:

  • 'ä' is encoded in ISO-8859-1 (Latin-1) with the bits 11100100 (= 228)
  • in UTF-8 it is encoded with the bits 11000011 10100100 (195 and 164)

This means that if you interpret this UTF-8 character as Latin-1, you get two characters: "Ã¤".
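You can reproduce this example with standard command-line tools. This sketch assumes `iconv` and `od` are installed (they are on most Linux systems); the bytes are written as octal escapes so the result does not depend on the encoding of your terminal or script file.

```shell
# 'ä' in ISO-8859-1 (Latin-1) is the single byte 228 (octal 344).
printf '\344' | od -An -tu1

# Converting that byte from Latin-1 to UTF-8 yields two bytes: 195 164.
printf '\344' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tu1

# The classic mix-up: take the UTF-8 bytes of 'ä' (195 164), treat them
# as Latin-1 and convert them to UTF-8 again. Each byte becomes its own
# character, so a UTF-8 terminal shows "Ã¤" (double-encoded).
printf '\303\244' | iconv -f ISO-8859-1 -t UTF-8
echo
```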

Got it?

Collations

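A collation is a set of rules for comparing characters, which a DBMS uses to sort and compare string values. The collation utf8_general_ci used below, for example, compares case-insensitively ("ci").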

TYPO3 Settings

Enabling UTF-8 support in TYPO3 is simple: just go to the Install Tool and set the option forceCharset to "utf-8" (it is stored as $TYPO3_CONF_VARS['BE']['forceCharset'] in typo3conf/localconf.php).

MySQL Settings

This is the difficult part. MySQL has six different character set settings.

You can see the current settings by executing the query:

SHOW VARIABLES LIKE 'character_set%';

Example output:
| character_set_client     | latin1                         |
| character_set_connection | latin1                         |
| character_set_database   | utf8                           |
| character_set_results    | latin1                         |
| character_set_server     | utf8                           |
| character_set_system     | utf8                           |
| character_sets_dir       | /usr/share/mysql-500/charsets/ |


Read more: http://dev.mysql.com/doc/refman/5.0/en/charset.html
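To spot the offending settings quickly, you can filter this output with a small shell function. This is a sketch: `check_charsets` is a hypothetical helper, and it assumes the tab-separated output that `mysql -N -B` (skip column names, batch mode) produces. It lists the client, connection, results, database and server character sets that are not utf8.

```shell
# check_charsets: read "name<TAB>value" lines on stdin and print every
# client/connection/results/database/server character set that is not utf8.
check_charsets() {
  awk -F'\t' '
    $1 ~ /^character_set_(client|connection|results|database|server)$/ && $2 !~ /^utf8/ {
      print $1 " = " $2
    }'
}

# Usage against a live server ("user" is a placeholder):
#   mysql -N -B -u user -p -e "SHOW VARIABLES LIKE 'character_set%'" | check_charsets
```

With the example output above, it would report character_set_client, character_set_connection and character_set_results as latin1.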

A.) Configure the MySQL character sets

This requires special privileges on the server; details are in the MySQL reference manual. Normally the following has to be done:

Check the MySQL server settings (recompile, or start the server with the parameter "--character-set-server=utf8" to force UTF-8).

Check the MySQL client settings: edit my.cnf and make sure it contains a section like:

[client]
default-character-set=utf8

B.) Force Charset by changing class.t3lib_db.php

Often it is easier to force the charset by executing the query:

SET CHARACTER SET utf8;

To do this, you have to modify the TYPO3 database class "class.t3lib_db.php" and insert the line:

$this->admin_query('SET CHARACTER SET utf8');

Insert it directly after the mysql_pconnect() call, around line 897 (the exact line depends on your TYPO3 version).
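Because the line number may differ between TYPO3 versions, it can help to locate the connect call with grep first. A minimal sketch: `find_connect_calls` is a hypothetical helper name, and the path in the usage comment is an assumption that depends on where your TYPO3 source lives.

```shell
# find_connect_calls FILE
# Print "line:code" for every mysql_connect()/mysql_pconnect() call,
# so you know where to insert the SET query.
find_connect_calls() {
  grep -n -E 'mysql_p?connect\(' "$1"
}

# Example (path is an assumption):
#   find_connect_calls typo3_src/t3lib/class.t3lib_db.php
```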

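It is also possible to use the query "SET NAMES utf8". In addition to the client and result character sets, it also sets the connection character set (character_set_connection) to utf8, whereas "SET CHARACTER SET" sets it to the character set of the database. (This may cause problems in some environments.) Read more: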
http://dev.mysql.com/doc/refman/5.1/de/charset-connection.html
http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html

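C.) Force Charset with setDBinit configuration (TYPO3 4.0 and later)

Since TYPO3 4.0 it is not necessary to patch class.t3lib_db.php. Instead, you can use the configuration option "setDBinit" ($TYPO3_CONF_VARS['SYS']['setDBinit'], editable in the Install Tool): the queries entered there, such as SET NAMES utf8;, are sent to the database right after TYPO3 connects.

(Thanks to "pavel" for the tip.)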

Changing the charset in an existing project

  1. Open a shell on your TYPO3 server.
  2. Make a database backup using mysqldump:
    mysqldump -u user -p database > backupfile.sql
  3. Drop the existing database.
  4. Create a new database.
  5. Make sure this new, empty database uses utf8, e.g. by executing:
    ALTER DATABASE databasename DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
  6. Set the TYPO3 forceCharset option (see above).
  7. Modify the backup file if required! (*1)
    1. Convert its contents to UTF-8, for example with the external tool recode (it converts the file in place):
      recode latin1..utf8 backupfile.sql
    2. Change all the "CREATE TABLE" statements in the dump file: replace "CHARSET=latin1" with "CHARSET=utf8". This can be done with the command-line tool sed:
      sed 's/CHARSET=latin1/CHARSET=utf8/g' backupfile.sql > backupfile_utf8.sql
  8. Import the converted dump:
    mysql -u user -p database < backupfile_utf8.sql
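Step 7 can be wrapped in a small shell function. This is a sketch under assumptions: it uses `iconv` instead of `recode` (both can convert Latin-1 to UTF-8), `convert_dump` is a hypothetical name, and it should only be used when the dump really contains Latin-1 bytes. See note (*1) below: converting a dump that is already UTF-8 would double-encode every special character.

```shell
# convert_dump IN OUT
# Convert a Latin-1 mysqldump file to UTF-8 and switch the table
# definitions from CHARSET=latin1 to CHARSET=utf8.
# Only use this if the dump is really Latin-1 (see note (*1)).
convert_dump() {
  src=$1
  dst=$2
  iconv -f ISO-8859-1 -t UTF-8 "$src" \
    | sed 's/CHARSET=latin1/CHARSET=utf8/g' > "$dst"
}

# Example, using the file names from the steps above:
#   convert_dump backupfile.sql backupfile_utf8.sql
#   mysql -u user -p database < backupfile_utf8.sql
```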

There may be some problems with TemplaVoila mappings or with special characters in some plugins. Normally this can be solved by converting the relevant template files to UTF-8.

(*1) Note from ries van Twisk:
"What I wanted to mention is that you don't have to recode
a MySQL dump, since recent versions of mysqldump
already dump in UTF-8."

Testing

Go to the backend. First check which charset the browser has selected; it should be "Unicode (UTF-8)".

Then create a new page with special characters, e.g. "ähm übung", and save it.
Open phpMyAdmin and look up this record in the table "pages". If you see exactly the same title, everything works fine! (If not, go back to the MySQL settings :-))
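Instead of trusting how phpMyAdmin renders the title, you can also look at the raw bytes with MySQL's HEX() function. The sketch below uses a hypothetical helper `classify_hex`: it takes the hex string of a title containing 'ä' and reports whether the 'ä' is stored as UTF-8 (C3 A4), as Latin-1 (E4), or double-encoded (C3 83 C2 A4, i.e. UTF-8 bytes that were converted to UTF-8 a second time).

```shell
# classify_hex HEXSTRING
# Look for the byte pattern of 'ä' in the output of SQL HEX(title).
classify_hex() {
  # Upper-case the input and split it into space-separated byte pairs
  # (" C3 A4 68 6D ") so that patterns only match at byte boundaries.
  bytes=" $(printf '%s' "$1" | tr 'a-f' 'A-F' | sed 's/../& /g')"
  case "$bytes" in
    *" C3 83 C2 A4 "*) echo "double-encoded" ;;
    *" C3 A4 "*)       echo "utf8" ;;
    *" E4 "*)          echo "latin1" ;;
    *)                 echo "no match" ;;
  esac
}

# Usage ("user" and "database" are placeholders):
#   hex=$(mysql -N -B -u user -p -e "SELECT HEX(title) FROM pages WHERE title LIKE '%bung%'" database)
#   classify_hex "$hex"
```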

 

 

[...Hope this was helpful...]

Comments
  1. Andy 05.05.10 08:50

    Warning! Never use the following because it will create character set problems that are hard to solve:

    SET CHARACTER SET utf8;

    see: http://wiki.typo3.org/index.php/UTF-8_support#TYPO3_Install_Tool_Options

  2. michel rosinski http://www.rosinski.net/ 01.07.08 14:45

    Hello,

    Thanks for the good guide. I also wrote one for older TYPO3 versions:

    http://rosinski.net/news/typo3-auf-utf-8-umstellen/

    maybe it helps too....
    Best regards, Michel

  3. marc http://lettv.de/cms/ 11.02.07 01:55

    Huh, this doesn't seem to work properly here either

    ö,ä,ü,*-=!"§$%&/()

  4. TYPO3 Blogger http://typo3blogger.de/typo3-utf8-und-iso/ 10.12.06 18:40

    I think everyone has had problems with character sets at some point, and surely with TYPO3 too. Everyone knows the situation: you are supposed to move the working TYPO3 environment from the test system to the live server, and lo and behold, "All ä, ...

  5. Daniel Pötzinger 02.12.06 01:18

    Why only the frontend? That makes no sense to me.
    But you can set the frontend charset of a site in your TypoScript template.
    (Please refer to the TSref; it was something like config.metaCharset...)

  6. Simon Rönnqvist http://www.arcada.fi 30.11.06 17:21

    Any idea how to convert an already existing TYPO3 site to UTF-8? Only the output to the browser matters.

  7. Recently I moved our intranet, which runs on TYPO3, from a Windows server to a Linux server. Then there were some problems with the character set (umlauts were not displayed correctly, etc.) and I wanted the existing installati...

  8. Bruno, Webdesign http://www.brunodesign.de 21.11.06 13:02

    I had problems with something comparable. The local server had the right collation and TYPO3 was set to UTF-8. But when importing the data on the online server, I had to tell it explicitly to use the utf8 collation and not to translate the data to latin1 while importing. Very strange behaviour, in my opinion.
