Patrick Galbraith ([info]capttofu) wrote,

FULLTEXT redux

In my post from yesterday, I ended up resorting to using multiple-column fulltext indexes to make it so I was actually using the indexes on the columns specified in the query. Well, that worked. But it also resulted in HUGE indexes!

Before change:
-rw-rw---- 1 mysql mysql 1177773692 2008-06-05 12:37 items_text.MYD
-rw-rw---- 1 mysql mysql 1136713728 2008-06-05 12:37 items_text.MYI


After adding more indexes:

-rw-rw---- 1 mysql mysql 1156516200 2008-06-04 17:14 items_text.MYD
-rw-rw---- 1 mysql mysql 1978787840 2008-06-04 17:14 items_text.MYI


Furthermore, this made the table much harder to update. Replication kept lagging last night (nagios was complaining loudly).

I've since reverted back to the way I had it, not using the index, which is the least worse of my options currently. Someone asked me to try using this query:

select count(*) from items_text where MATCH (content)  AGAINST ('+iron +man' IN BOOLEAN MODE) or  MATCH (title) AGAINST ('+iron +man' IN BOOLEAN MODE)


This doesn't use indexes the way you would hope either! As soon as you add one more index to the query, even with "OR", you get this:

mysql> explain select count(*) from items_text where MATCH (content)  AGAINST ('+iron +man' IN BOOLEAN MODE)\G*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: items_text
         type: fulltext
possible_keys: content
          key: content
      key_len: 0
          ref: 
         rows: 1
        Extra: Using where
1 row in set (0.00 sec)

mysql> explain select count(*) from items_text where MATCH (content)  AGAINST ('+iron +man' IN BOOLEAN MODE) or  MATCH (title) AGAINST ('+iron +man' IN BOOLEAN MODE)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: items_text
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1325662
        Extra: Using where
1 row in set (0.00 sec)


So, back to the drawing board. Try another fulltext indexing option. This sure doesn't cut it.

  • Post a new comment

    Error

    Anonymous comments are disabled in this journal

  • 7 comments

Anonymous

June 5 2008, 18:31:47 UTC 3 years ago

FULLTEXT indexes

I found that FULLTEXT index is slow even when optimal (and it needs frequent optimizations). Since the next-best solution (Sphinx) is not yet usable , I recommend SolR/Lucene. Apart for much faster search the search itself is more flexible (you can find "iron" and "man" if there are no more than 3 words apart - for example). I know this doesn't exactly fit in Mysql replication chain but is the best solution I've fund so far. Maybe Sun should spend a bit of time to add SolR/Lucene as a storage engine.

[info]capttofu

June 6 2008, 15:17:02 UTC 3 years ago

Re: FULLTEXT indexes

It wouldn't even necessarily necessitate a storage engine - perhaps some glue to make it work as a internal index.

Anonymous

June 5 2008, 19:04:15 UTC 3 years ago

Combine the fields into one column

At first glance it appears you have 2 problems that are intertwined performance and replication.

I've hit this wall before w/ fulltext indexes. To get around it I recommend creating a new column that stores all the data you want to search (you can use triggers or add some code to populate the column).
SELECT COUNT(*) FROM items_text
WHERE (MATCH (new_column) AGAINST ('+iron +man' IN BOOLEAN MODE));

Because all the data is in the new_column you won't have to deal composite indexes and it will scale! As you add a column to the items_text, just add it to the new_column also. You're using boolean searches... you could get away with removing the single column fulltext indexes which will lessen the your diskio which should solve your replication issue.


-sean

[info]arjen_lentz

June 5 2008, 20:55:33 UTC 3 years ago

Sphinx.

Tried it?
http://www.sphinxsearch.com/

[info]capttofu

June 6 2008, 15:14:40 UTC 3 years ago

Re: Sphinx.

Not yet, but I will very soon. Our CTO is under the impression that documentation is lacking, but it's certainly worth a shot.

[info]awfief

June 6 2008, 14:25:02 UTC 3 years ago

This doesn't use indexes the way you would hope either! As soon as you add one more index to the query, even with "OR"

The problem with "OR" is that MySQL can only use 1 index per query (with exceptions, yes, but it's a general good rule).

You'll get much better performance if you do the queries separately. If you must do it in one query, use a UNION:

select count(*) from items_text
where MATCH (content) AGAINST ('+iron +man' IN BOOLEAN MODE)
UNION
select count(*) from items_text
where MATCH (title) AGAINST ('+iron +man' IN BOOLEAN MODE);

[info]capttofu

June 6 2008, 15:15:57 UTC 3 years ago

Thanks! I'll try this next as well. Antony messaged me last night with this same suggestion.
Create an Account
Forgot your login or password?
Facebook Twitter More login options
English • Español • Deutsch • Русский…