CakePHP and Character Set in the Database

Today I found something in the CakePHP which saves me a lot of effort when I need to migrate the database as well as searching for non-English characters from the Database. What is the story:
I had 2-3 projects build with Cake which were for non-English environments. That means the localization of the projects as well as storing proper data into DB with proper encoding. The current project is the same.

In these projects I saw that the data was not stored properly in the MySQL. From the Cake project interface everything looks ok, but when I open phpMyAdmin – the chars looked like strange symbols /not ??? neither ӗ&୶/. So far I ignored this because as far it is working from CakePHP interface for me everything was ok.

Until today when I need a Search functionality which should get results even if the users searched by lowercase or uppercase. The example is: if you have an entry in the database “CakePHP” and when you searching for “cakephp” or “CAKEPHP” both results to extract “CakePHP” from the database. Looking pretty easy, right? Well, without the proper encoding was quite difficult. How I solve that problem.

Here is the solution:
In the config/database.php file as you probably know there are 2 connections set and they look something like this:

class DATABASE_CONFIG {

    var $default = array(
        'driver' => 'mysql',
        'persistent' => false,
        'host' => 'localhost',
        'login' => 'user',
        'password' => 'password',
        'database' => 'project_name',
        'prefix' => ''
    );

    var $test = array(
        'driver' => 'mysql',
        'persistent' => false,
        'host' => 'localhost',
        'login' => 'user',
        'password' => 'password',
        'database' => 'project_name-test',
        'prefix' => ''
    );
}

When I check the DBO file, I noticed that there is a missing node in these arrays. It’s optional, but I think it’s good to be placed even as empty variable when the default configuration is downloaded from svn.

 'encoding' => 'utf8'

This trigger a function which execute a query like this

SET NAMES {encoding}

after the connection is established, where encoding in my case is UTF8. With this the data is stored properly.

the new configuration will look like this:

class DATABASE_CONFIG {

    var $default = array(
        'driver' => 'mysql',
        'persistent' => false,
        'host' => 'localhost',
        'login' => 'user',
        'password' => 'password',
        'database' => 'project_name',
        'prefix' => '',
        'encoding' => 'utf8'
    );

    var $test = array(
        'driver' => 'mysql',
        'persistent' => false,
        'host' => 'localhost',
        'login' => 'user',
        'password' => 'password',
        'database' => 'project_name-test',
        'prefix' => '',
        'encoding' => 'utf8'
    );
}

Now the data is stored properly in the database and it’s possible to use functions in MySQL such as UPPER(), LOWER() etc.

23 thoughts on “CakePHP and Character Set in the Database

  1. Pingback: Johan babblar på. » Svenska tecken i CakePHP

  2. Pingback: Dummy’s guide for converting character set of a web application « memento

  3. TomNo Gravatar

    thx for a very nice post.
    i’ve been struggling with UTF-8 for quite a while already. however i still not sure about the different b/w utf8, UTF-8 utf-8

    any1 has any idea of which one we’d use?

  4. Nik ChankovNo Gravatar

    Tom, basically this is the syntax of the database format. if you look in the code of MySQL Database connector you will see there is a function:

    function setEncoding($enc) {
      return $this->_execute('SET NAMES ' . $enc) != false;
    }

    which do the magic 🙂
    So UTF8 or utf8 – it doesn’t matter in this case, about utf-8 – it is the HTML encoding

    Hope this make sense

  5. Pingback: Missing Features » Using UTF8 In Your CakePHP App

  6. WhyNotSmileNo Gravatar

    Thanks for this – solved my problem exactly! For ages I’ve been trying to get Cake (well, my website!) to allow me to enter text in Hebrew, and to save it properly and so on. I was getting really confused, but it turned out to be fairly simple – just setting the encoding to UTF-8. This piece was the final piece of the puzzle – storing the data in mySQL.

    Thank yuo!

  7. SvenNo Gravatar

    Thanks a lot! I wrote ‘encoding’ => ‘UTF-8’, instead of ‘encoding’ => ‘utf8’, – finally you made my day.

  8. RobertNo Gravatar

    Thanks Nik

    Like lots of other commenters, you saved me some time and frustration whilst trying to read in multiple languages without ending up with character soup.

    Thanks again!

Leave a Reply

Your email address will not be published. Required fields are marked *