home email about rss

MySQL: Fantastic searches using FULLTEXT

If you’ve got a website that deals with data then chances are you have a search field somewhere on that website. If you’ve got a search field on your website then chances are you want that search to return the most relevant information, chances are your customers want this even more than you do. We’ve all been frustrated by websites that provide a search box but don’t seem to provide any useful results. Using the following tutorial you can ensure that searches on your site are better than the competition.

Companies with big bucks to spend can indivually tailor their site searches to meet predicted keywords their customers are likely to use, this is backed by a powerful search engine. Searches of this scale are impractical for the smaller website designer who has to take into consideration time and budget - this is where the handy FULLTEXT index of MySQL comes into play.

Most developers are probably creating searches using SQL queries that look something like:

SELECT * FROM `books` WHERE `title` LIKE '%Computer Hardware%';

You may even use a search that uses a small piece of php (or other language) code to create a simple boolean search. By seperating two or more words in a search phrase we can apply an operater to create a more advanced search.

SELECT * FROM `books` WHERE `title` LIKE '%Computer%' OR `title` LIKE '%Hardware%';

At best searches based on the above queries will return a list of results order by the book title - at worst the results will be ordered by the sequence the data was entered into the database. What we want to achieve is a list that is ordered by relevance - this will be what your customer is wanting too. It is possible to order these results by relevance using some fancy scripting and a few mathmatical functions - it is also possible, and easier, using MySQL’s fulltext functions.

The first step to using this fulltext functionality is to create a table that meets the following requirements.

  • The table type must be MyIsam. Chances are you’re already using a MyIsam table but I do know cases where INNODB tables are preferred. If you use INNODB or another table type then i’m afraid you’ll either need to convert to MyIsam or not use a fulltext search.
  • The table must have a FULLTEXT index on one or more fields. This index is used to perform powerful fulltext searches. Only index the fields you intend to search on, excessive indexing can impair performance.
  • A FULLTEXT index can only be used on text fields (CHAR, VARCHAR or TEXT).

Creating a fulltext table is simple:

CREATE TABLE `fulltext_example` (
`id` int(11) NOT NULL auto_increment,
`author` varchar(128) NOT NULL,
`title` varchar(255) NOT NULL,
`content` text NOT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `content` (`content`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

The fulltext index (shown in red ) is the defining characteristic of a table that can use a fulltext search. If you use phpMyAdmin then you can click the phpMyAdmin Fulltext Icon icon to set a fulltext index on a field.

For the tutorial I have created a small dataset containing the first chapters of a small number of books. This table structure and the sample dataset used for this tutorial can be downloaded at the end of this article.

Like most of SQL a fulltext query uses descriptive keywords. The syntax for a fulltext search is MATCH() … AGAINST().

First performing a search on the data

SELECT author, title
FROM `fulltext_example`
WHERE MATCH (`content`)
AGAINST ('death')

This query returns a nice looking dataset ordered by relevance. Anyone who has read these books would probably agree that this is correct.

Author Title
Stephen King The Green Mile
Terry Pratchett Mort
Chuck Palahniuk Fight Club
George Orwell 1984
J. R. R. Tolkien Fellowship of the Ring

This works for a two word keyphrase too.
The query…

SELECT author, title
FROM `fulltext_example`
WHERE MATCH (`content`)
AGAINST ('human computer')

…and the dataset

Author Title
Robbie Allen, Matt Larson, Cricket Liu DNS on Windows Server 2003
Apple Mac OS X Server Open Directory Administrator
Charles Dickens Oliver Twist
Shelley Powers Learning JavaScript
Terry Pratchett Mort
George Orwell 1984

Once again the query returns pretty much what you’d expect to see - more relevant results at the top, less relevant results at the bottom.

For those who are curious about how the MySQL engine orders these rows you can view the relevance score that is assigned to each row using this query.

SELECT author, title, MATCH (`content`) AGAINST ('human computer') AS score
FROM `fulltext_example`
WHERE MATCH (`content`)
AGAINST ('human computer')

Resulting in:

Author Title Score
Robbie Allen, Matt Larson, Cricket Liu DNS on Windows Server 2003 0.54230254888535
Apple Mac OS X Server Open Directory Administrator 0.30518108606339
Charles Dickens Oliver Twist 0.10349141061306
Shelley Powers Learning JavaScript 0.089452371001244
Terry Pratchett Mort 0.043933693319559
George Orwell 1984 0.025113707408309

These scores, while impressive looking, are not actually very useful to the developer apart from to sort the data. It would possibly be useful if they highest attainable score was one - then maybe you could supply the end user with a percentage of accuracy for each row. The max score varies greatly depending on the dataset and you will frequently see scores of over one.

Fulltext searching does not end here, MySQL has added a number of useful features that I will quickly run through.

Boolean Searching: Once you fully master the possibilities of a boolean search you can open your search to a whole new world of possibilities. It’s as if you have a minature google - search using + and - symbols to add or exclude words, use the * symbol as a wildcard. There’s loads more too - see the links at the end of this article.

Query Expansion: Using query expansion with your fulltext search takes the pain away from thinking up all possible search terms and linking them to the relavent data, just to assist your end user. Query expansion aims to maximise search results by improving on the users choice of search words. A typical query expansion example changes a search for ‘database’ into a search for ‘database’,'dbms’,'mysql’ etc…

Finally I have to warn you of some of the pitfalls and traps you may fall into using a fulltext search.

  • Fulltext searches have a minimum word length. By default words of 3 or less characters are ignored. This can be changed in your mysql.ini file but the shorter your minimum word length the more impact you’ll see on performance.
  • The stopword list is used by mysql to avoid commonly used words. If your search phrase appears in the stoplist then it will be ignored.
  • Performance: A fulltext search is ideal for databases with many thousands of rows. Once you have a database with millions of rows in it then you probably want to find another solution (Oracle) as you will notice a significant decrease in performance. It is also worth noting that creating a FULLTEXT index on a table that already contains data is much slower than creating the index on an empty table then inserting the data.

I think that’s enough for today, any questions leave a comment and i’ll do my best to answer.

Dave Blake

Download: A zipped SQL file containing the table structure and sample data used in this tutorial.

fulltext_example.sql.zip

See Also:

http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html

http://dev.mysql.com/doc/refman/5.0/en/fulltext-query-expansion.html

This article was first seen on www.daveblake.co.uk and is used with permission.

Similar Posts: