So, now I'm in the process of implementing a "Hot Cache" of feeds with an LRU (Least Recently Used) policy. The idea being, that this cache provides a smaller data set for performing the merge query against. We need to be able to handle storing much more data than we currently do while still being able to allow for fast merges.
The "hot cache" will basically work like this: If a merge request is made, it first checks a catalog table which indicates whether the merge's feeds are contained in the cache. If they aren't, an entry is made in this catalog table, triggering a UDF which calls a daemon which in turn calls a perl script to copy all the necessary data-- the items for the feeds that are part of the merge. Since the request for the merge must occur quickly as possible, the process cannot wait for the triggered script to complete, so the merge is performed against the full data set. The script that copies data to the cache, once completed, marks the entry in the catalog as being completed, and next time the merge request comes through, it uses the hot cache table instead.
The consistency of the cache is maintained by various triggers on other tables that ensure when the original data set is updated, deleted, added to, that those changes are also made in the cache.
Currently, there is only one hot cache table. To allow growth, there would also be the need for multiple cache tables. This is the part that is tricky. I would like to keep using triggers, but I'm still trying to figure out how to make it so I can use dynamic SQL in a trigger. I would want to have also a column in the catalog table that says what cache the data can be found in. If I'm to use a trigger to update the caches, I need to be able to have the table name be a variable in the trigger. The only way I've found to do this is by setting a table name variable, creating the trigger that does something like "set @query= concat('INSERT ... ', @tablename, ...) ; prepare stmt from @query; execute stmt" but unfortunately, trying to do this one gets:
ERROR 1336 (0A000): Dynamic SQL is not allowed in stored function or trigger
Argh.
So I can always fall back on using triggered UDF to call the perl script.
Why am I posting this? I'm just musing, and wondering what some other developers who read this have done to allow for being able to store a lot of data and still have a means to use that data. How to shard. I have several ideas, and each needs to be tested.