faster queries: indexing tables

When designing complex web applications, you have to pay attention to performance so that each request is handled as fast as possible. This involves optimising the client side (clean CSS, clean HTML, fast JavaScript, etc.) and the server side (caching templates and queries, efficient use of the database and much more). We will concentrate on the database here. In short, the database should be structured so that all the information needed to handle any request can be fetched very quickly. This short article presents a few facts and tricks about the performance of symfony projects.

built-in foreign key management

One of the brilliant features of symfony is that indexes are created for foreign keys by default. This saves developers a lot of time and improves the overall performance of symfony applications. Below is an example schema:

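    # Model, column and relation key names are assumed from the column
    # comments and the generated SQL shown further down.
    Book:
      actAs:
        Timestampable: ~
        SoftDelete: ~
      columns:
        category_id:
          type: integer
          notnull: true
          comment: "book category"
        title:
          type: string(255)
          notnull: true
          comment: "title"
        author:
          type: string(255)
          comment: "author"
        description:
          type: string
          comment: "description"
      relations:
        Category:
          class: BookCategory
          local: category_id
          foreign: id
          foreignAlias: Books

    BookCategory:
      actAs:
        Timestampable: ~
        SoftDelete: ~
      columns:
        name:
          type: string(255)
          notnull: true
          comment: "name"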

Such a schema will generate the following SQL code. Note that the book.category_id column is automatically both indexed (faster queries) and constrained (no data inconsistency):

CREATE TABLE book (...
  INDEX category_id_idx (category_id) ...;
ALTER TABLE book ADD CONSTRAINT book_category_id_book_category_id
  FOREIGN KEY (category_id) REFERENCES book_category(id);

Obviously, you may also create custom indexes (this will be discussed later). The symfony and Doctrine book has a section covering custom indexes.

optimising MySQL queries

Before you do anything to speed up your queries, you need to know which queries your system executes. The obvious place to look is the powerful web debug toolbar. It's a great tool: it won't tell you what to do when your queries take too long, but it does point out the queries that are definitely poorly written (they are highlighted in orange). Then it's up to you to solve the problem. Often you will need to join tables to reduce the number of queries (more about this in the "less doctrine queries" article).

If the number of queries cannot be reduced, you may need to add custom indexes to your tables. An index is an additional structure, bound to a table, that speeds up selecting the matching rows (there are lots of good tutorials on this topic, such as the tizag tutorials). When executing a query, the database server looks for the best structure it can use to return the results as fast as possible. We can ask the database server to analyse any given query and tell us how it is going to be executed. The best tool for this is the EXPLAIN statement. We will optimise a heavy query executed on the homepage of a social website, using EXPLAIN and adding a custom index.
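To get a feeling for what a query plan tells you, here is a minimal sketch. It is not MySQL: it uses Python's bundled SQLite, whose EXPLAIN QUERY PLAN plays a role similar to MySQL's EXPLAIN, so the output format differs from what your server prints. The table layout and the index name type_id_external_idx are assumptions made up for the illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the table is a simplified, assumed
# version of an "action" table and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE action ("
    " id INTEGER PRIMARY KEY,"
    " created_by INTEGER NOT NULL,"
    " type_id_external INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO action (created_by, type_id_external) VALUES (?, ?)",
    [(i % 100, i % 5) for i in range(10000)],
)

QUERY = "SELECT id FROM action WHERE type_id_external = 2"


def query_plan(connection, sql):
    """Return the plan steps SQLite reports for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


# Without an index the whole table has to be scanned.
plan_before = query_plan(conn, QUERY)

# After adding an index on the filtered column the plan should change.
conn.execute("CREATE INDEX type_id_external_idx ON action (type_id_external)")
plan_after = query_plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the whole table, the second one mentions the new index. In MySQL you would read the same information from the key and rows columns of the EXPLAIN output.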

example - social website homepage problem

The manager of the social website wants the developers to highlight the most active users. For example, he wants to display the most recently logged-in users on the homepage. The developers figured out that they need an action table that stores the actions performed by users. The action and profile tables are related to each other, so a simple JOIN is used each time the homepage action is executed: the x most recently logged-in profiles are fetched from the database and displayed.

The website was launched. Many users have registered and the action table grows bigger every day. After a few months, it has over 300,000 records. The manager is very happy that his project is becoming popular, but he has noticed that the homepage is served a few seconds slower than in the beginning. The developers tell him that they didn't run any high-load performance tests and that they have to spend some time on optimisation. The manager is not pleased that this was not considered before.

NOTE: always use test data when focusing on project performance

Symfony has a built-in fixture mechanism which allows you to easily generate lots of different data (see the jobeet tutorial). This is essential when you want to make sure that your project will cope with high popularity. However, if you decide to generate a really large amount of data, do NOT use an ORM. It consumes too much memory and generating the fixtures takes a lot of your precious time. I'd suggest generating raw SQL INSERT queries instead - they'll be a lot faster.
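One possible way to produce such raw INSERT queries is a small standalone script. The sketch below is only an illustration: the function name action_inserts, the column list of the action table, the value ranges and the batch size are all assumptions, so adjust them to your real schema.

```python
import random
from datetime import datetime, timedelta


def action_inserts(total_rows, batch_size=1000, seed=42):
    """Yield multi-row INSERT statements for an assumed action table."""
    rng = random.Random(seed)  # fixed seed makes the fixtures reproducible
    start = datetime(2010, 1, 1)
    for offset in range(0, total_rows, batch_size):
        values = []
        for n in range(offset, min(offset + batch_size, total_rows)):
            created_by = rng.randint(1, 5000)  # assumed range of profile ids
            type_id = rng.randint(1, 5)        # assumed action type ids
            created_at = start + timedelta(minutes=n)
            values.append(
                "(%d, %d, '%s')"
                % (created_by, type_id, created_at.strftime("%Y-%m-%d %H:%M:%S"))
            )
        # One statement per batch keeps the number of round trips low.
        yield (
            "INSERT INTO action (created_by, type_id_external, created_at) VALUES\n"
            + ",\n".join(values)
            + ";"
        )


# Small demonstration: 2500 rows end up in three statements.
statements = list(action_inserts(2500))
print(len(statements), "statements generated")
```

Writing the statements to a .sql file and loading that file with your database client bypasses the ORM completely.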

Okay, let's move on. Once you have lots of data (either real or test), execute each action, one after another, and check its performance. The first thing to look at is the web debug toolbar mentioned above, shown in the top right corner of the screen when running the application in the dev environment. You should be worried when you see something like the following:

There is a big problem: at least one of the queries is suboptimal (orange color) and, as a consequence, executing this action takes too much time (almost 5 seconds per execution is really long, even considering that I'm using my personal computer for testing). Left-click on the query icon (the one on the right):

One query takes almost 4 seconds to execute. This surely causes a serious performance problem! Don't panic, just let your database server analyse the query for you:

  EXPLAIN SELECT a.id AS a__id, a.created_by AS a__created_by
  FROM action a
  LEFT JOIN profile p ON
    (a.created_by = p.id AND p.deleted_at IS NULL)
  WHERE
    a.type_id_external = '2'
    AND p.avatar_id IS NOT NULL
    AND p.mode = 4
  ORDER BY a.created_at DESC;

Here we can see that the query has to check at least 1690 rows of the p (profile) table. Each profile record stores a lot of text data describing a website user. All this makes the query take so long to execute. If we want to speed it up, we have to read the query carefully and concentrate on all the columns used (and the order in which they appear). The solution is to find the best index (this topic can be quite complex and is independent of the framework you use - search for articles about indexing database tables and optimising databases, and read them carefully).

In this case, the developers executed the following line in MySQL:

ALTER TABLE profile ADD INDEX deleted_at_idx (deleted_at);
This created an index on the deleted_at datetime column. Thanks to this index, the EXPLAIN command shows that only 10 rows in the profile table have to be examined to execute the query. This is a great success - the execution time went down to 0.01 seconds. Imagine saving almost 4 seconds on each homepage display. This is the benefit of optimising project databases.

By the way, I cannot understand why the deleted_at column in the SoftDelete behavior is not indexed by default, especially when you can turn on the automatic deleted_at check (DQL callbacks):

$manager->setAttribute(Doctrine_Core::ATTR_USE_DQL_CALLBACKS, true);
Provided this line is present in the project configuration, each query that fetches a SoftDelete model will automatically get the "deleted_at IS NULL" condition added. It's obvious that such a column has to be indexed. The index can be a composite one, and deleted_at can be its last column, but a default index on deleted_at is a good idea anyway! As you can see, you have to pay attention to what queries are executed inside your projects!
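A composite index with deleted_at as its last column could look roughly like the sketch below. Again, SQLite (bundled with Python) is used as a stand-in for MySQL, and the choice of mode as the leading column, the index name mode_deleted_at_idx and the sample rows are assumptions for illustration only. Which columns should really lead the index depends on your own queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile ("
    " id INTEGER PRIMARY KEY,"
    " mode INTEGER NOT NULL,"
    " deleted_at TEXT)"  # NULL means the row has not been soft-deleted
)
# Made-up data: every tenth profile is marked as deleted.
conn.executemany(
    "INSERT INTO profile (mode, deleted_at) VALUES (?, ?)",
    [(n % 6, None if n % 10 else "2011-01-01 00:00:00") for n in range(5000)],
)

# Composite index: the filter column first, deleted_at as the last column.
conn.execute("CREATE INDEX mode_deleted_at_idx ON profile (mode, deleted_at)")

rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM profile WHERE mode = 4 AND deleted_at IS NULL"
).fetchall()
plan = [row[-1] for row in rows]  # the last column holds the description
print(plan)
```

The printed plan names the composite index, which means a single structure serves both the mode filter and the soft-delete check.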

Note: different database server versions use different indexes

Different database server versions may use totally different indexes to execute the same queries on the same database structure! Make sure you run your performance tests in the (future) production environment. Otherwise, you may find your application executing unoptimised queries on the production server even if you spent a lot of time optimising them on your local dev machine.

In the example above, it turned out that the production server ran a different database server version than the developer's local machine. The developer didn't check it - he was not aware of the differences and their negative impact on the project's performance. The index that was built is useless in the production environment (so it should be dropped, because it slows down every row insert). Moreover, it turned out that the new index needed to speed up the query should be built on the action table... Pay attention to the database server versions you work on!

how many indexes to create

Table indexes are really helpful and they speed up database performance. The more complex your application is, the more queries it may execute, and the more indexes it may need to provide good performance. But look out - do not create too many indexes and never create an index unless you are sure that it is used somewhere. Why? It's very simple - each index is an additional structure which uses some space and needs some time to be created and maintained. When a record is inserted, updated or removed, each index has to be updated accordingly, which consumes time. If you create too many indexes, you may simply slow down your database. For example, each user login inserts a new action record, so an action table with 10 indexes will be slower than an action table with only 3 indexes.
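The space cost of indexes is easy to observe. The sketch below is a rough illustration using SQLite (bundled with Python) instead of MySQL: it fills the same assumed action table once without indexes and once with three single-column indexes, then compares the number of database pages used. The helper name pages_used, the columns and the row count are made up, and the sketch measures space only, not insert time.

```python
import os
import sqlite3
import tempfile


def pages_used(index_columns, rows=20000):
    """Fill an assumed action table and return the database pages in use."""
    with tempfile.TemporaryDirectory() as folder:
        conn = sqlite3.connect(os.path.join(folder, "demo.db"))
        try:
            conn.execute(
                "CREATE TABLE action ("
                " id INTEGER PRIMARY KEY,"
                " created_by INTEGER,"
                " type_id_external INTEGER,"
                " created_at INTEGER)"
            )
            # One single-column index per requested column.
            for column in index_columns:
                conn.execute(
                    "CREATE INDEX %s_idx ON action (%s)" % (column, column)
                )
            conn.executemany(
                "INSERT INTO action (created_by, type_id_external, created_at)"
                " VALUES (?, ?, ?)",
                [(n % 5000, n % 5, n) for n in range(rows)],
            )
            conn.commit()
            return conn.execute("PRAGMA page_count").fetchone()[0]
        finally:
            conn.close()


no_indexes = pages_used([])
three_indexes = pages_used(["created_by", "type_id_external", "created_at"])
print("pages without indexes:", no_indexes)
print("pages with 3 indexes: ", three_indexes)
```

Every extra page is something the server has to write and keep up to date on each insert, which is why unused indexes are worth dropping.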

