svnsync tutorial

I had to use this tool when I wanted to move an old project's repository to a new server (until then, the repository had been hosted by a free SVN hosting provider). The provider wanted to charge me for an svnadmin dump, so I looked for an alternative, and svnsync did the job.

Unlike svnadmin dump/load, svnsync migrates a Subversion repository without svnadmin access to the source. All you need is read access to the source repository and a clean (svnadmin create) target repository. The synchronization simply reads all revisions one by one and copies them into the target repository, unchanged.

First, take a look at this article. It explains that a hook has to be created in the target repository: hooks/pre-revprop-change (copy the template without the .tmpl extension and make the script executable). What the article does not mention is that you should probably put

#!/bin/sh 
exit 0;

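as the content of the hooks/pre-revprop-change script. svnsync has to change revision properties in the target repository, and without a permissive hook it fails with an error, as described in another article.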

Now, initialize the synchronization (example call below):

svnsync init [target repository] [source repository]
svnsync init file:///home/user/svn/test-repo svn://free-hosting/that-forced-me-to-pay/for-the-svn-dump

and copy all revisions (example call below again; svnsync works with repository URLs, so a working copy of the source repository is not required):

svnsync --non-interactive sync [target repository]
svnsync --non-interactive sync file:///home/user/svn/test-repo

Be patient and wait until it's finished. Afterwards, remove the useless hooks/pre-revprop-change script - and you're done!

extending doctrine admin module: filtered sum

Another post for symfony beginners. I'll show how to display a sum (or any other aggregate function) of all elements filtered in an admin module. All elements means both the ones displayed on the current page (list pagination) and all the others that are not currently visible. This is going to be really easy.

action

Let's start with the calculations. We need to get the sum of all filtered elements and pass it to the view. As this article is not about the MVC design pattern, I'll just override the executeIndex action and put the calculation code inside (to keep it as simple as possible, although calculations belong in the model, not in the controller).

class xxxActions extends autoXxxActions
{
  public function executeIndex(sfWebRequest $request)
  {
    parent::executeIndex($request);
    $query = $this->buildQuery()->copy();
    $root_alias = $query->getRootAlias();
    $total_data = $query
      ->limit(0)
      ->select("SUM({$root_alias}.cash_total) AS sum")
      ->fetchArray();
    $this->total_count = $total_data[0]['sum'];
  }
}

As you can see, we've got an Xxx model with a cash_total decimal column, representing a sum of money. The $this->buildQuery()->copy() part does all the magic - it gives us the query with all filters set by the user in the interface. We only tell doctrine to calculate the sum of all filtered elements for us (the $total_count variable will be available in the indexSuccess.php template). The ->limit(0) call clears the SQL limit clause, of course.

templates

This part is boring, actually. We need to override two more files: copy them from the cache into the module's templates directory. The first one is indexSuccess.php, in which the line

<?php include_partial('event/list', array('pager' => $pager, 'sort' => $sort, 'helper' => $helper)) ?>

should be replaced with:

<?php include_partial('event/list', array('pager' => $pager, 'sort' => $sort, 'helper' => $helper, 'total_count' => $total_count)) ?>

and the second one is the _list.php partial, which should have a few lines added:

<tfoot>
  <tr><!-- added code starts here -->
    <th colspan="6">
      total: <?php echo Tools::priceFormat($total_count, true) ?>
    </th>
  </tr><!-- added code ends here -->
  <tr>
    <th colspan="6">
       <?php if ($pager->haveToPaginate()): ?>
         <?php include_partial('event/pagination', array('pager' => $pager)) ?>
       <?php endif; ?>

As you can see, it's trivial, yet useful.

The sum of all filtered elements is now visible in the footer of the doctrine admin module list table, but you can put it anywhere you want (as long as it's in the index action templates).

other functions

Of course, you can use other aggregate functions, such as the average, minimum or maximum - just take a look at the MySQL documentation. You may also create your own functions.
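Switching the aggregate only changes the string passed to ->select() in the action. Below is a minimal sketch of building that string; buildAggregateSelect() and its whitelist are hypothetical helpers written for this post, not part of symfony or Doctrine:

```php
<?php
// Hypothetical helper: builds the select expression for an aggregate over one column.
// The whitelist keeps arbitrary strings out of the generated query.
function buildAggregateSelect($function, $rootAlias, $column, $resultAlias = 'result')
{
  $allowed = array('SUM', 'AVG', 'MIN', 'MAX', 'COUNT');
  $function = strtoupper($function);
  if (!in_array($function, $allowed, true))
  {
    throw new InvalidArgumentException('Unsupported aggregate function: '.$function);
  }

  return sprintf('%s(%s.%s) AS %s', $function, $rootAlias, $column, $resultAlias);
}

echo buildAggregateSelect('avg', 'r', 'cash_total', 'average'), "\n";
```

The returned string could then replace the hardcoded SUM expression in executeIndex, with the result read from the chosen alias.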

symfony basics: form default values

This is some stuff for symfony beginners who still want to learn symfony 1.4. You may set default form values for all kinds of forms (including doctrine forms). Set one default value at a time:

class XxxForm extends BaseXxxForm
{
  public function configure()
  {
    $this->setDefault ('field', 'value');
  }
}

or set a whole array of them:

public function configure()
{
  $this->setDefaults(array(
    'field_1' => 'value_1',
    'field_2' => 'value_2',
    // ...
    'field_x' => 'value_x'
  ));
}

default values for new objects

Sometimes you want to set the default form values just before the object is created, because it makes it easier for the application user to fill in the data. For example, the owner/author of a blog post may default to the currently logged-in user, or the date of an event may default to now, and so on. This can be achieved with the isNew method of the doctrine form class (lib/form/doctrine/XxxForm.class.php):

if ($this->isNew())
{
  $this->setDefault('created_at', date('Y-m-d 00:00:00')); // see note 1
  $this->setDefault('created_by', sfContext::getInstance()->getUser()->getId()); // see notes 2 and 3
}

note 1: the Timestampable Doctrine behavior is used in this example,
note 2: the Signable Doctrine behavior is used in this example,
note 3: check this blog post to avoid using sfContext::getInstance().

moreover

You may implement whatever complex conditions you want your doctrine form to follow. Here are some examples:

current time - the php time function
language/localization (default country when registering a new user) - use the Accept-Language HTTP header
default settings for a registered user - fetch individual user settings from the database (doctrine query)
last used item (category/product/etc.) - when a user inserts or updates a large amount of data and chooses a specific item (category/product), it can be saved in the user's session ($user->set/getAttribute()) - when another record is processed, the last used item is used as the default (which, again, lowers the time the user needs to do the work)
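The Accept-Language case can be sketched in plain PHP. preferredLanguage() below is a hypothetical helper (not a symfony function) that picks the tag with the highest q-value; the header string would typically come from $_SERVER['HTTP_ACCEPT_LANGUAGE']:

```php
<?php
// Hypothetical helper: returns the language tag with the highest q-value
// from an Accept-Language header, or $fallback when nothing usable is found.
function preferredLanguage($header, $fallback = 'en')
{
  $best = $fallback;
  $bestQ = 0.0;
  foreach (explode(',', $header) as $part)
  {
    $pieces = explode(';', trim($part));
    $tag = trim($pieces[0]);
    if ($tag === '' || $tag === '*')
    {
      continue;
    }
    // a missing q parameter means q=1.0
    $q = 1.0;
    if (isset($pieces[1]) && preg_match('/^\s*q=([0-9.]+)\s*$/', $pieces[1], $m))
    {
      $q = (float) $m[1];
    }
    if ($q > $bestQ)
    {
      $best = $tag;
      $bestQ = $q;
    }
  }

  return $best;
}

echo preferredLanguage('pl-PL,pl;q=0.9,en;q=0.8'), "\n";
```

Inside a form's configure() method the result could be passed to setDefault() for a language or country field (the field name would depend on your own schema).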

A well designed interface includes lots of form default values, so that users don't have to waste their time picking the same values over and over again.

symfony dynamic max_per_page

max_per_page

In this post I'll show a very easy and really useful thing: a dynamic max_per_page value for the list pager. This feature lets you change the number of elements displayed in a list with just one click. It can be used both in the frontend and the backend (the pager is the same) - I'll use a doctrine generated admin module, and the interface (which lets a backend user change the max_per_page value) will be placed inside the 'filters' box.

templates

First, let's make it visible. Add the following entry to the application's config/app.yml file:

all:
  const:
    max_per_page: [ 10, 25, 50 ]

Now we may refer to the app_const_max_per_page config value, which holds a few standard max_per_page values (it can be shared by many different admin modules). Let's say we've got an admin module for our custom MyModel model. Now, override the cached _filters.php template: copy it from cache/admin/dev/modules/autoMyModel/templates to apps/APP/modules/my_model/templates. Take a look at the following part of the code:

          </td>
        </tr>
      </tfoot>
      <tbody>
        <!-- insert here -->
        <tr>
          <td colspan="2">

and insert a few lines of code (replacing the comment) to get the following:

          </td>
        </tr>
      </tfoot>
      <tbody>
        <tr>
          <td colspan="2">
            set maximum elements per page:
            <?php foreach (sfConfig::get('app_const_max_per_page') as $value): ?>
              <span class="max_per_page_selector"><?php echo link_to($value, 'my_model/setMaxPerPage?max='.$value) ?></span>
            <?php endforeach; ?>
          </td>
        </tr>
        <tr>
          <td colspan="2">

controller/action

The interface to change max_per_page is ready, so now we have to extend the controller. Let's add an action which stores the number of elements to display per page in the user session (symfony has nice setAttribute()/getAttribute() methods for that). So here it comes:

  /**
   * Sets my_model list's max per page config value, using user session
   * attribute.
   *
   * @param sfWebRequest $request
   */
  public function executeSetMaxPerPage(sfWebRequest $request)
  {
    $this->getUser()->setAttribute('my_model.max_per_page', $max = $request->getParameter('max'));
    $this->getUser()->setFlash('notice', 'max_per_page has been set to: '.$max);
    $this->redirect('@my_model');
  }
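The action above stores whatever arrives in the max parameter. A minimal sketch of checking it against the configured choices first; sanitizeMaxPerPage() is a hypothetical helper, and the list mirrors the app.yml values from earlier:

```php
<?php
// Hypothetical helper: accepts the requested value only if it is one of the
// configured choices; otherwise falls back to the default.
function sanitizeMaxPerPage($requested, array $allowed, $default)
{
  $value = (int) $requested;

  return in_array($value, $allowed, true) ? $value : $default;
}

$allowed = array(10, 25, 50); // same values as app_const_max_per_page above

var_dump(sanitizeMaxPerPage('25', $allowed, 10));     // int(25)
var_dump(sanitizeMaxPerPage('100000', $allowed, 10)); // int(10)
```

In the action, the sanitized value would be the one stored in the session, so a hand-crafted URL cannot request an arbitrarily large page.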

configuration generator

And finally, tell the pager to look for the custom value each time the list is going to be rendered. We need to override the method in configuration generator of the admin module:

class my_modelGeneratorConfiguration extends BaseMy_modelGeneratorConfiguration
{
  /**
   * Returns max_per_page config value for my_model module. If it's not
   * defined manually by the user, default value is returned.
   *
   * @return Integer
   */
  public function getPagerMaxPerPage()
  {
    if ($max = sfContext::getInstance()->getUser()->getAttribute('my_model.max_per_page'))
      return $max;
    else
      return parent::getPagerMaxPerPage();
  }
}

It's all as easy as it could be. The generator configuration looks in the current user's session and returns either the custom value (if found) or the default one (which is taken from the generator.yml file).

Unfortunately, sfContext::getInstance() is used here (it causes a lot of trouble when the 'default context does not exist' problem occurs). After a quick look I didn't find a better way to access the user from the generator configuration (but if you know how to - let me know ;).

(I'm wondering why it's not built into symfony).

doctrine act as signable plugin - new releases

After a few months, new versions 1.2.2 and 1.2.3 of sfDoctrineActAsSignablePlugin have been released. In short, the plugin provides a Signable behavior which automatically stores information on who has created or updated a given object.

what's new

A fellow symfony developer, Daniel Möllenbeck, suggested that there are some options missing in the behavior configuration. Until version 1.2.1, the onDelete option was hardcoded as CASCADE, which - as Daniel emphasised - may cause problems when a given user is supposed to be deleted (physically from the database, not softDeleted).

ALTER TABLE customer
ADD CONSTRAINT customer_created_by_sf_guard_user_id
FOREIGN KEY (created_by)
REFERENCES sf_guard_user(id)
ON DELETE CASCADE;

ALTER TABLE customer
ADD CONSTRAINT customer_updated_by_sf_guard_user_id
FOREIGN KEY (updated_by)
REFERENCES sf_guard_user(id)
ON DELETE CASCADE;

Let's imagine a user enters customers over and over, then leaves the company. Now the admin deletes the user - probably the hard way (directly in the database, maybe the user is auto-created from somewhere...). The constraint will trigger the deletion of all customers created by that user. I doubt anybody would be happy about that.

Any of the following options does the trick:

a) allow NULL values in created_by/updated_by columns:

ALTER TABLE customer
ADD CONSTRAINT customer_created_by_sf_guard_user_id
FOREIGN KEY (created_by)
REFERENCES sf_guard_user(id)
ON DELETE SET NULL;

ALTER TABLE customer
ADD CONSTRAINT customer_updated_by_sf_guard_user_id
FOREIGN KEY (updated_by)
REFERENCES sf_guard_user(id)
ON DELETE SET NULL;

b) forbid the deletion of the user:

ALTER TABLE customer
ADD CONSTRAINT customer_created_by_sf_guard_user_id
FOREIGN KEY (created_by)
REFERENCES sf_guard_user(id)
ON DELETE RESTRICT;

ALTER TABLE customer
ADD CONSTRAINT customer_updated_by_sf_guard_user_id
FOREIGN KEY (updated_by)
REFERENCES sf_guard_user(id)
ON DELETE RESTRICT;

c) do nothing

moreover

Another fellow symfony developer, Christoph Berg, helped me to trace the bug with fixtures on created_by/updated_by values. Now, the bug is fixed and all fixture data works perfectly.

the community

Taking the opportunity, I'd like to thank Daniel, Christoph and all other symfony developers who share their opinions on my work - they help to find bugs and suggest some good ideas on how to improve the code. Thanks to you, the plugin gets better and better all the time. Thanks, guys!

Please, feel free to share your opinion and comment on the plugin!

symfony custom configuration files

dynamic and cross-application configuration

If your project gets very big and has several applications, you may want to create a cross-application configuration - store it in one place and let all applications use it (no matter how big it is). Or, if your configuration is simply becoming really big and you want to organise it somehow, custom config files are exactly what you are looking for!

how to do that

It takes only three easy steps. First, let's create a custom configuration YAML file (config/something.yml, in the project main directory, not the app config directory) and put some data there:

prod:
  test:  ok

dev:
  test: ko

all:
  foo:  bar
  john: doe

The second step is to create/update a config handler YAML file (config/config_handlers.yml) with the following example content:

config/something.yml:
  class:    sfDefineEnvironmentConfigHandler
  param:
    prefix: something_

The last step is to register the configuration file in all applications you want (after this step, the configuration will be available for an application). Use the application configuration class (e.g. apps/frontend/config/frontendConfiguration.class.php):

public function configure()
{
  // ...
  require_once($this->getConfigCache()->checkConfig('config/something.yml'));
}

Now all config options defined in the new file are available in the application with the something_ prefix (e.g. sfConfig::get('something_foo')). The best way to check whether the new file is available to the application is to run it in the dev environment in the browser and open the web debug toolbar: config icon, settings section. All configuration available to the application is displayed there. If you followed all three steps above, the new configuration should be listed.
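To make the resulting option names concrete, here is a rough, framework-free illustration of how the example file resolves per environment. resolveConfig() is a hypothetical function written for this post; it only mimics merging the all section with the current environment and adding the something_ prefix from config_handlers.yml, and it is not the code of sfDefineEnvironmentConfigHandler:

```php
<?php
// Hypothetical illustration only: merges the "all" section with the section
// of the current environment and prepends the configured prefix to each key.
function resolveConfig(array $config, $environment, $prefix)
{
  $all = isset($config['all']) ? $config['all'] : array();
  $env = isset($config[$environment]) ? $config[$environment] : array();

  $resolved = array();
  // environment-specific values override the ones defined under "all"
  foreach (array_merge($all, $env) as $key => $value)
  {
    $resolved[$prefix.$key] = $value;
  }

  return $resolved;
}

// the content of config/something.yml from above, as a PHP array
$config = array(
  'prod' => array('test' => 'ok'),
  'dev'  => array('test' => 'ko'),
  'all'  => array('foo' => 'bar', 'john' => 'doe'),
);

print_r(resolveConfig($config, 'dev', 'something_'));
```

In the dev environment this yields something_foo, something_john and something_test (with the value ko), which are the names you would expect to see in the web debug toolbar.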

sfAdminDashPlugin trick

This plugin provides an easy-to-use menu divided into sections, which is configurable in YAML files. If you want to have different menus for different applications, you just have to put all the configuration in the application config (by default, it's stored in the plugin config). One way is to put the whole menu config in the app.yml file but, as mentioned before, this can make app.yml grow to an enormous size, so a better solution is to create a menu.yml file which is loaded by the config handler. Everything is done the same way as before:

create apps/APP/config/menu.yml file and define sfAdminDash menu there
create or update (if exists) the config handler file (apps/APP/config/config_handlers.yml)
finally, update the APP configuration class (apps/APP/config/APPConfiguration.class.php) by registering new config file

I hope some of you will find this flexible configuration useful :).

Doctrine Event Listeners vs symfony fixtures

introduction

This is just an easy tip on how to use symfony fixtures along with doctrine event listeners in a symfony project.

event listeners

Doctrine Event Listeners are methods you define to trigger an activity always before/after a Doctrine Record is created, updated, deleted, etc. These can be: postInsert, preInsert, postDelete, preDelete and so on. They can be very useful for keeping the code clean and logical. A good example is uploading an image and creating several files of different sizes representing the same image (custom thumbnails). This could be done by overriding the ImageForm upload mechanism, but postInsert is cleaner and easier to maintain if uploading is done by more than one symfony form class.

fixtures

Fixtures are a built-in symfony feature which enables you to easily populate the database with some data. It is usually test data, but for some projects, some initial data can be held in fixtures (e.g. initial configuration values). They should be used in all projects, since it's a lot easier to spot any bug using data that pretends to be real.

the problem

The problem occurs when you combine those two features. Imagine you've got two models in your project, A and B, related 1:1 (e.g. sf_guard_user with sf_guard_user_profile from sfDoctrineGuardPlugin). For each A record there has to be a B record. This means that when the master record is created (suppose A is the master model), the B record needs to be created immediately and related to A. A Doctrine Event Listener is the right tool - an A::postInsert() that creates the B record.

So far, so good - but we still want to use symfony fixtures. Let's say we want to define 10 sample A records along with 10 sample B records. So we write a fixture file with 10 A and 10 B fixtures. Next, we fire the

./symfony doctrine:build --all --and-load

command and what do we find? B records are doubled! And this can be quite difficult to spot!

Each time an A fixture is saved, postInsert is executed automatically, so a new B record is created. Instead of creating records in the order A,A,A,...(10),B,B,B,...(10), we get A,B,A,B,A,B,...(10),B,B,B,...(10), ending up with 10 A records and 20 B records. And this is not what we wanted. Fortunately, this can be fixed quite easily.

the solution

Just add one line to the Doctrine Event Listener:

public function postInsert($event)
{
  if ('cli' != php_sapi_name())
  {
    // some code here
  }
}

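We rely on the fact that fixtures are loaded from a task, which runs under the command line interface (cli). Now the listener code is skipped while fixtures are loaded. Keep in mind that it is also skipped in every other command line run, not only during fixture loading.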
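The condition itself is easy to try outside of Doctrine. Below is a minimal sketch; shouldRunListener() is a hypothetical helper that wraps the same check and accepts the SAPI name as an argument, so both branches can be exercised:

```php
<?php
// Hypothetical helper wrapping the check used in postInsert() above.
// The SAPI name can be passed in, which makes the condition easy to test.
function shouldRunListener($sapiName = null)
{
  if (null === $sapiName)
  {
    $sapiName = php_sapi_name();
  }

  // fixtures are loaded by a command-line task, so skip the listener there
  return 'cli' != $sapiName;
}

var_dump(shouldRunListener('apache2handler')); // bool(true)
var_dump(shouldRunListener('cli'));            // bool(false)
```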

data migration with symfony & doctrine

Diving into 'outsourcing applications with symfony': migrating data

This post was inspired by Fizyk's comment on the outsourcing applications with symfony article. Fizyk suggested that it would be a good idea to wrap all data migration functionality into one or more symfony tasks. Migrating data between different versions of the same project is the topic I want to focus on in this article.

new version of a project, written from scratch

In the previous article, we discussed developing a new version of an old project from scratch (e.g. because the old one was too inflexible to be extended). The first thing to do is to design a new database structure that will hold all the data the present project holds (I assume that the database structure has to be different, because if it were good, no one would redevelop the entire project from scratch). Depending on your luck, this can be an easy task or a nightmare [I have already redeveloped 2 projects with databases that were not even in the first normal form... yes, there are projects like this. Check this out: one of these projects was made by a really bad developer who never used any version control system. He used no development environment (only production) and no local machine to test new features before they were deployed. He made all modifications on the production server, even if it took 2 weeks until they were finished. And there was only FTP access to the server... Oh, yes, there are projects like that... and you may be asked to improve such projects :)].

data migration

Now when we have two parallel database structures, we need to copy all data from the old one to the new one. Depending on the data (its size, structure, relations, constraints, etc.), I use one of the following:

pure Doctrine (Doctrine in - Doctrine out)
mixed (Doctrine in - SQL out)
pure SQL (SQL query)

For each data structure, I choose one of the above migration types very carefully (detailed descriptions below). If the project is big (and I assume this is the case), the development phase takes a lot of time. It's very important that you can run the entire migration job:

as fast as possible: this lets you run the migration very often, because it doesn't take too much time. That gives you the chance to improve the scripts and test them carefully (each mistake made during the migration will be painful once the new project goes live in the production environment). And, finally, you can download the production (old project) database and migrate it to the new structure to compare the frontends of both projects - this helps to develop the business logic,
with a wrapped interface: click once - make yourself a cup of tea - come back in 10 minutes - click again - go out with your dog - come back in 15 minutes - the migration is done. Otherwise, imagine you have a project with 50 tables and you have to trigger 50 actions (e.g. by a click), one for each table - or call 50 symfony tasks, not to mention that you need to call them in the appropriate order (because data is related in an RDBMS) and take care not to break that order. Of course, there have to be 50 small jobs deep inside the migration system, but your interface should do the work for you automatically (check the example below),
without repeating the same manual modifications: this is difficult to discuss in general, so I'll give an example. I had an action table which stored user activity. The old project database had a record which stated that user X had replied to topic Y with post Z, and neither topic Y nor post Z existed in the database (again, if the project is so bad that it has to be redeveloped from scratch, such mistakes are very common). Of course, the migration failed each time symfony tried to insert a record related to a non-existent record. What to do?
- bad solution: remove this action record from your local copy of the old database (you download the old project database from the production server, load it into your local SQL server and everything is 100% safe)
- good solution: remove this record from the production database, because it already points to non-existing stuff.
Why? Because if such mistakes were made in the old project (data inconsistency), there can be quite a lot of such painful records, and fixing them by hand in every fresh local copy makes each migration run really long. After you have done the whole migration 2 or 3 times, you will have had enough of this dumb work and you won't want to do it any more. Again, this is only a simple example. Each situation will probably need a different solution.

There is one more important thing: no matter how complicated your migration job is, always create a detailed list of steps, describing what is being done at each step, why the steps are in this order, what has already been migrated and what is still waiting. With such a list, performing data migration is really easy and stress-free :)

Here you have an example of a migration job order from a project I did recently (all .sql files are 'pure SQL' migrations):

where the action http://polonia/backend_dev.php/migrator/migrateAll calls the following (these are the only 'pure Doctrine' migrations):

MigrationManager::migrateConfig();
MigrationManager::migrateForbiddenPhrases();
MigrationManager::migrateQuotes();
MigrationManager::migrateArticles();
MigrationManager::migrateReadingCategories();
MigrationManager::migrateReadingTextbooks();
MigrationManager::migrateCountries();
MigrationManager::migrateRegions();
MigrationManager::migrateSubpages();
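The idea of a single entry point that walks an ordered list of steps can be sketched without any framework. runMigrationSteps() and the two placeholder steps are hypothetical; in a real project each callable would be one of the MigrationManager methods listed above:

```php
<?php
// Hypothetical runner: executes named steps in the given order and stops at
// the first failure, so later steps never run against incomplete data.
function runMigrationSteps(array $steps)
{
  $done = array();
  foreach ($steps as $name => $step)
  {
    try
    {
      call_user_func($step);
    }
    catch (Exception $e)
    {
      throw new RuntimeException(sprintf('Step "%s" failed: %s', $name, $e->getMessage()), 0, $e);
    }
    $done[] = $name;
    echo 'migrated: ', $name, "\n";
  }

  return $done;
}

// placeholder steps; order matters because related data must exist first
$log = array();
$steps = array(
  'countries' => function () use (&$log) { $log[] = 'countries'; },
  'regions'   => function () use (&$log) { $log[] = 'regions'; },
);

runMigrationSteps($steps);
```

Keeping the order in one array also doubles as the written list of steps: the sequence is documented exactly where it is executed.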

pure Doctrine migration

This is the only code example in the outsourcing applications with symfony article. And this is probably the only migration type that would be wrapped in a symfony task, as Fizyk suggested. This is really easy - just fetch data from the old system and save it in the new system.

used for small amounts of data

Below is another example of this data migration technique - private messages sent between users of a social website:

  static public function migrateMessages()
  {
    $wiadomosci = Doctrine_Query::create()
      ->from('Wiadomosci w')
      ->orderBy('w.wiad_id')
      ->fetchArray();
 
    foreach ($wiadomosci as $row)
    {
      $message = new Message();
      $message->setId($row['wiad_id']);
      $message->setTitle($row['temat']);
      $message->setText($row['tresc']);
      $message->setFromId($row['wiad_od']);
      $message->setToId($row['wiad_do']);
      $message->setDisplayed($row['przeczytane']);
      $message->setProvoke($row['zaczep'] == 't');
      $message->setCreatedAt($row['kolumna_data']);
      $message->setUpdatedAt($row['kolumna_data']);
      $message->save();
    }
  }

pure SQL migration

You may ask why native SQL INSERT queries should be written and executed directly instead of using Doctrine. The answer is obvious: the time difference. Doctrine is nice, but it is really slow for this kind of job. The problem with pure Doctrine migration is that it takes a lot of time to load all the data with PHP/Doctrine and then insert it into the new structure. The table from the example above held over 50,000 records. On my local machine, it took several minutes to finish the job. That's too much (data migration should be as fast as possible). The solution is simply to create an SQL script that will do the job a thousand times faster:

INSERT INTO NEW_DB.message (
  id,
  title,
  text,
  from_id,
  to_id,
  displayed,
  provoke,
  created_at,
  updated_at
)
SELECT
  wiad_id,
  temat,
  tresc,
  wiad_od,
  wiad_do,
  przeczytane,
  IF(zaczep = 't', 1, 0),
  kolumna_data,
  kolumna_data
FROM OLD_DB.wiadomosci
ORDER BY wiad_id

Sometimes the case may be even worse. The action table mentioned before held over 400,000 records, and the PHP/Apache configuration would need really high limits (and the migration would take several hours to finish). The following is another SQL script, migrating the action data:

INSERT INTO NEW_DB.action (
  created_at,
  created_by,
  type_id_external,
  target_profile_id,
  forum_post_id,
  forum_topic_id
)
SELECT kolumna_data, u_id,
CASE akcja
  WHEN "profil" THEN 1
  WHEN "logowanie" THEN 2
  WHEN "wylogowanie" THEN 3
  WHEN "pm" THEN 4
  WHEN "zaczep" THEN 5
  WHEN "post" THEN 6
  WHEN "watek" THEN 7
END,
  IF (akcja = 'profil' OR akcja = 'pm' OR akcja = 'zaczep', akcja_id, NULL),
  IF (akcja = 'post', post_id, NULL),
  IF (akcja = 'watek', akcja_id, NULL)
FROM OLD_DB.akcje ORDER BY kolumna_data ASC

Pure SQL migration may be a little bit more complicated (processing data in PHP is easier) - but you'll surely save a lot of time.

used for huge amounts of data, as this can be the only possibility to migrate it

mixed migration: Doctrine IN / SQL OUT

Sometimes it can be impossible to create a magnificent "INSERT INTO SELECT FROM" query. PHP, although as slow as a dead snail (in comparison to SQL database management systems), still gives more possibilities simply because it is an imperative programming language. The idea of this migration type is to fetch the old system data using Doctrine, generate the SQL code and execute it. You may execute the generated script in any way you like - the db console, tools like phpmyadmin or even forcing Doctrine to execute raw SQL. If you prefer, this can also be done as a symfony task - but if the project is really big, you don't gain much from wrapping such a mechanism in a symfony task.

To fully understand the idea of this data migration type, take a look at the example below:

  static public function getInexistentProfileIds()
  {
    $u_ids = Doctrine_Query::create()
      ->from('Uzytkownicy u')
      ->select('u.u_id')
      ->orderBy('u.u_id')
      ->fetchArray();
 
    $ids = range(1, $u_ids[count($u_ids)-1]['u_id']);
 
    foreach ($u_ids as $u_id)
      unset($ids[$u_id['u_id'] - 1]);
 
    return $ids;
  }

  static public function generateSqlToRebuildUsers()
  {
    $ids = self::getInexistentProfileIds();
    $sql = 'INSERT INTO profile (`id`) VALUES ('.implode('), (', $ids).')';
    echo $sql;
    echo '<hr />';
    // build one value tuple per missing user, then join them, so that the
    // statement does not end with a dangling "), ("
    $rows = array();
    foreach ($ids as $id)
      $rows[] = '('.$id.',"NULL-'.$id.'","NULL-'.$id.'", 0)';
    $sql = 'INSERT INTO sf_guard_user (`id`, `email_address`, `username`,`is_active`) VALUES '.implode(', ', $rows);
    echo $sql;
  }

The reason for doing a mixed migration in this case is data inconsistency in the old project. Users were creating forum topics and posts, and when a user was deleted, posts and topics still referred to that user. Migrating forum data returned dozens of errors, all of them failing constraints - assigning a topic/post to a non-existent user. The solution was to create dead user records (with deleted_at set), just so that the SQL would not break the constraint. The first method fetches the IDs of all users that have been deleted, the second one generates the SQL code that inserts those dead records into the database. This is only an example, of course, and it may be found controversial :), but, believe me, migrating an entire system (which was really, really badly developed) is a difficult task in itself, the time you have is limited and sometimes you are forced to use somewhat brutal solutions. Anyway, it's up to you how you grab the generated SQL code. For me the easiest way was to display it in the browser, copy-paste it into phpmyadmin and execute it. But you can generate an SQL file, wrap it in a symfony task or do whatever you think of. The main idea is just to make the data migration faster.
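The gap detection can be tried without a database. Below is a minimal sketch with two hypothetical functions, missingIds() and buildProfileInsert(); it takes a plain list of existing IDs and uses array_diff() instead of unset(), which is a variant of the same idea rather than the code used in the project:

```php
<?php
// Hypothetical stand-alone variant: returns every ID between 1 and the
// highest existing ID that is not present in $existingIds.
function missingIds(array $existingIds)
{
  if (0 === count($existingIds))
  {
    return array();
  }

  // array_values() reindexes the result, since array_diff() preserves keys
  return array_values(array_diff(range(1, max($existingIds)), $existingIds));
}

// builds a multi-row INSERT for the placeholder profiles
function buildProfileInsert(array $ids)
{
  return 'INSERT INTO profile (`id`) VALUES ('.implode('), (', $ids).')';
}

$missing = missingIds(array(1, 2, 4, 7));
echo implode(',', $missing), "\n";          // 3,5,6
echo buildProfileInsert($missing), "\n";
```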

and that's it

I hope that some of my hints will be useful if you ever need to migrate data between different versions of the same project. Comments are welcome :)

faster queries: indexing tables

When designing complex web applications, you have to pay attention to performance so that the framework handles each request as fast as possible. This involves optimising the client side (clean CSS, clean HTML, fast JavaScript, etc.) and the server side (caching templates and queries, database usage and many others). We will concentrate on the database here. In short, the database should be structured so that all information needed to handle any request can be fetched very fast. This short article will show you a few facts and tricks about the performance of symfony projects.

built-in foreign key management

One of the brilliant features in symfony is creating indexes for foreign keys by default. This saves developers a lot of time and surely improves the overall performance of all symfony applications. Below is an example schema:

Book:
  actAs:
    Timestampable: ~
    SoftDelete: ~
  columns:
    category_id:
      type: integer
      notnull: true
      comment: "book category"
    title:
      type: string(255)
      notnull: true
      comment: "title"
    author:
      type: string(255)
      comment: "author"
    description:
      type: string
      comment: "description"
  relations:
    Category:
      class: BookCategory
      local: category_id
      foreign: id
      foreignAlias: Books

BookCategory:
  actAs:
    Timestampable: ~
    SoftDelete: ~
  columns:
    name:
      type: string(255)
      notnull: true
      comment: "name"

Such a schema will generate the following SQL code. Note that the book.category_id column is automatically both indexed (faster queries) and constrained (no data inconsistency):

CREATE TABLE book_category (id BIGINT AUTO_INCREMENT ...;
CREATE TABLE book (id BIGINT AUTO_INCREMENT ...
  INDEX category_id_idx (category_id) ...;
ALTER TABLE book ADD CONSTRAINT book_category_id_book_category_id
  FOREIGN KEY (category_id) REFERENCES book_category(id);

Obviously, you may create your own custom indexes (this will be discussed later). This section from The symfony and Doctrine book covers the topic of custom indexes.
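As a small illustration of what a custom index looks like at the SQL level, here is a minimal sketch. It assumes (hypothetically) that books are usually listed per category, newest first; the created_at column comes from the Timestampable behavior, and the index name is made up for this example, it is not something symfony generates:

```sql
-- Hypothetical custom index on the book table from the schema above.
-- Assumption: the application often lists the books of one category,
-- sorted by creation date.
ALTER TABLE book ADD INDEX category_created_at_idx (category_id, created_at);

-- List all indexes on the table to confirm that the new one is there.
SHOW INDEX FROM book;
```

Whether such an index is worth having depends entirely on the queries your application really executes.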

optimising MySQL queries

Before you do anything to speed up query execution, you need to know which queries are executed in your system. The obvious thing is to look at the powerful web debug toolbar. It's a great tool: it won't tell you what to do when your queries take too long to execute, but it can point out which queries are definitely poorly written (they are highlighted in orange). Then it's up to you to solve the problem. Often, it turns out that you need to join tables (more about this is written in the "less doctrine queries" article).

If the number of queries cannot be reduced, you probably need to add custom indexes to your tables. An index is an additional structure, bound to a table, that speeds up selecting the matching rows (there are lots of good tutorials on this topic, such as the tizag tutorials). When executing a query, the database server looks for the best structure it can use to serve the results as fast as possible. We can ask the database server to analyse any given query and tell us how it is going to be executed. The best tool for this is the EXPLAIN statement (short reference here). We will optimise a heavy query executed on the homepage of a social website, using EXPLAIN and adding a custom index.

example - social website homepage problem

The manager of the social website wants the developers to emphasise the most active users. For example, he wants to display the most recently logged-in users on the homepage. The developers figured out that they need to create an action table that will store the actions performed by users. The action and profile tables are related to each other - a simple JOIN is used each time the homepage action is executed: the x most recently logged-in profiles are fetched from the database and displayed.

The website was launched. Many users have registered and the action table grows bigger every day. After a few months, it has over 300,000 records. The manager is very happy that his project is becoming popular, but he has noticed that the homepage is served a few seconds slower than it was in the beginning. The developers tell him that they didn't run any high-load performance tests and that they have to spend some time on optimisation. The manager is not pleased that this was not considered before.

NOTE: always use test data when focusing on project performance

Symfony has a built-in fixture mechanism which allows you to easily generate lots of different data (see the jobeet tutorial). This is essential when you want to make sure that your project will cope with high popularity. However, if you decide to generate a really large amount of data, do NOT use any ORM. It consumes too much memory and generating fixtures takes a lot of your precious time. I'd suggest generating raw SQL INSERT queries instead - they'll be a lot faster.
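A raw SQL fixture for the action table could look roughly like the sketch below. The column names are taken from the query analysed later in this article; the values are made up, and the statement assumes that id is auto-incremented and that profiles with IDs 1-3 already exist:

```sql
-- Hypothetical test data for the action table.
-- Assumptions: action.id is AUTO_INCREMENT, profiles 1-3 exist,
-- type_id_external accepts the values 1 and 2.
INSERT INTO action (created_by, type_id_external, created_at) VALUES
  (1, 2, '2010-01-01 10:00:00'),
  (2, 2, '2010-01-01 10:05:00'),
  (3, 1, '2010-01-01 10:07:30');
```

Putting many rows into one INSERT statement is usually much faster than executing one INSERT per row, so a generator script can simply emit such statements in batches.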

Okay, let's move on. Once you have lots of data (either real or test), execute each action - one after another - and check its performance. The first thing you should look at is the web debug toolbar mentioned above, in the top right corner of the screen when running the dev application environment. You should be worried when you see something like the following:

There is a big problem: at least one of the queries is suboptimal (orange color) and, as a consequence, executing this action takes too much time (almost 5 seconds per execution is really long, even considering that I'm using my personal computer for testing). Left-click on the query icon (the one to the right):

One query takes almost 4 seconds to be executed. This surely causes a serious performance problem! Don't panic, just let your database server analyse the query for you:

EXPLAIN
  SELECT a.id AS a__id, a.created_by AS a__created_by
  FROM action a
  LEFT JOIN profile p ON
    (a.created_by = p.id AND p.deleted_at IS NULL)
  WHERE (
    a.type_id_external = '2'
    AND p.avatar_id IS NOT NULL
    AND p.mode = 4
  )
  ORDER BY a.created_at DESC;

Here we can see that the query has to check at least 1690 rows of the p (profile) table. And each profile record stores a lot of text data describing a website user. All this makes the query take such a long time to execute. If we want to speed it up, we have to read the query carefully and concentrate on all the columns used (and the order in which they appear). The solution is to find the best index (this topic may be quite complex and is independent of the framework you use - ask Google about indexing database tables/optimising databases - and read those articles carefully).

In this case, the developers executed the following line in MySQL:

ALTER TABLE profile ADD INDEX deleted_at_idx (deleted_at);

which created an index on the deleted_at datetime column. Thanks to this index, the EXPLAIN command shows that only 10 rows in the profile table have to be examined to execute the query. And this is a great success - the execution time went down to 0.01 seconds. Imagine the 4 seconds difference for each homepage display. This is the benefit of optimising project databases.

By the way, I cannot understand why the deleted_at column in the SoftDelete behavior is not indexed by default, especially when you can turn on the automatic null check (DQL callbacks):

$manager->setAttribute(Doctrine_Core::ATTR_USE_DQL_CALLBACKS, true);

Provided this line is present in the project configuration, each query that fetches a SoftDelete model will automatically add the "deleted_at IS NULL" condition, as in the query above. It's obvious that such a column has to be indexed - the index can be a composite one, though, and deleted_at can be the last column in it - but, anyway, a default index on deleted_at is a good idea! As you can see, you have to pay attention to which queries are executed inside your projects!
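To illustrate the remark about a composite index with deleted_at as its last column, here is a minimal sketch. It assumes that most queries on the profile table filter by mode before the soft-delete check, as the homepage query does; the index name is made up:

```sql
-- Hypothetical composite index: the equality-filtered column comes first,
-- deleted_at comes last.
ALTER TABLE profile ADD INDEX mode_deleted_at_idx (mode, deleted_at);

-- Re-run EXPLAIN on an affected query to check whether the server picks it.
EXPLAIN SELECT id FROM profile WHERE mode = 4 AND deleted_at IS NULL;
```

Which column order works best depends on your data and your queries, so treat this only as a starting point and always compare the EXPLAIN output before and after.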

Note: different database server versions use different indexes

Different database server versions may use totally different indexes to execute the same queries on the same database structure! Make sure you run your performance tests in the (future) production environment. Otherwise, you may find your application executing unoptimised queries on the production server even if you spent a lot of time optimising it on your local dev machine.

In the example above, it turned out that the production server ran a different database server version than the developer's local machine. The developer didn't check it - he was not aware of the differences and their negative impact on the project's performance. The index that had been built was useless in the production environment (so it should be deleted, because it slows down every row insert). Moreover, it turned out that the new index the developers needed to speed up the query should be built on the action table... pay attention to the database server versions you work on!
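The clean-up described above could be sketched as follows. Note that the columns of the new action index are only my assumption here (they are the action columns used in the WHERE and ORDER BY clauses of the analysed query), and the index name is made up:

```sql
-- Check which server version you are actually connected to.
SELECT VERSION();

-- Remove the index that the production server does not use.
ALTER TABLE profile DROP INDEX deleted_at_idx;

-- Hypothetical replacement index on the action table.
-- Assumption: filtering by type_id_external and sorting by created_at
-- is what makes the query slow on this server.
ALTER TABLE action ADD INDEX type_created_at_idx (type_id_external, created_at);
```

Run EXPLAIN again on the production server afterwards - only its output tells you whether the new index is really used there.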

how many indexes to create

Table indexes are really helpful and they speed up database performance. The more complex your application is, the more queries may be executed. In other words, the bigger your application is, the more indexes it may need to provide good performance. But look out - do not create too many indexes and never create an index when you are not sure that it is used somewhere. Why? It's very simple - each index is an additional structure which uses some space and needs some time to be created and maintained. When a record is inserted, updated or removed, each index has to be updated according to the change made, which surely consumes time. If you create too many indexes, you may simply slow down your database. For example, each user login inserts a new action record, so an action table with 10 indexes will be slower on inserts than an action table with only 3 indexes.
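A simple way to review the indexes of a table is sketched below. The table and column names come from the example above; unused_idx is a made-up name standing for whatever index turns out not to be used:

```sql
-- List the indexes currently defined on the action table.
SHOW INDEX FROM action;

-- Check which index (if any) a given query uses: see the "key" column
-- of the EXPLAIN output.
EXPLAIN SELECT id FROM action
  WHERE type_id_external = '2'
  ORDER BY created_at DESC;

-- Drop an index that none of your queries uses (hypothetical name).
ALTER TABLE action DROP INDEX unused_idx;
```

Repeat the EXPLAIN step for every important query before dropping anything - an index that looks useless for one query may be essential for another.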

further reading: this, this and this
