PhilPapers has been running up against the limitations of MyISAM for a while now. The problem is that updates lock the entire table on MyISAM. With 15,000 users a day at peak, many of whom update bibliographic data, PhilPapers is getting too busy for this architecture.
The solution I currently prefer is to use two copies of the main table: one for authenticated users (who can update it), and a read-only copy for anonymous users. The read-only copy would be updated periodically, maybe every hour or so. The lag wouldn't normally be apparent because selects made by authenticated users would be routed to the master copy they are updating.
I hope that splitting the load this way is going to help a lot because only 16% of search queries on PhilPapers are made by authenticated users. This strategy should eliminate 84% of query clashes. The strategy could also be extended by keeping track of which users have recently updated the main table and only routing those to the master. In this way I think we'd eliminate virtually all clashes. We could also route all queries of a certain type to the read-only copy.
One remaining question is how to do the copying. I've considered using MySQL replication, but it's not clear that it's possible to replicate tables/databases within a single server, and I'm reluctant to introduce a dependency on another machine/VM. Anyway, since replication is almost synchronous this wouldn't solve the problem: the replication updates would cause as many locks on the 'read-only' copy as the user updates cause on the master copy. The approach I favour at this point is to rebuild the read-only table like this on a periodic basis:
create table main_ro_tmp like main;
insert into main_ro_tmp select * from main where not deleted;
drop table main_ro;
rename table main_ro_tmp to main_ro;
I use a temporary table to build the new version because the insert statement currently takes about 4-5 minutes due to all the indexes on this table. It would be a little faster to create the indexes after having inserted the data, but I don't think the complication is worth the code complexification (I couldn't use "like main" anymore to automatically track whatever the index config on main is).
Ideally they're be a way of blocking selects on main_ro between statements 3 and 4, but "lock tables main_ro write, main_ro_tmp write" doesn't work: MySQL complains that there's a transaction in process. Looks like renaming is incompatible with locks. Any suggestions appreciated, though I think statement 4 will execute fast enough it's not a major concern.