This is usually where physics gets messy. It's all fun and games until someone suggests that maybe not all objects are perfectly convex, such as... the game level? There is convex decomposition, yes, but I'm not totally convinced that's a silver bullet. Convex decomposition is awfully complicated and requires heavy preprocessing. It's definitely a good idea in many cases, but there will always be raw triangles.

Convex decomposition or not, you need some sort of mid-phase that finds which triangles/primitives might collide with a specific object. A lot of work has been put into this, and I think most people today agree that a quantized, binary AABB tree is the ideal solution.
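
To make "quantized" concrete, here is a minimal sketch of how a node's box could be stored as 16-bit integers relative to the tree's root bounds. The function names, the 16-bit width and the rounding scheme are my own assumptions for illustration, not taken from any particular engine. The important part is that mins round down and maxes round up, so the compressed box can only grow.

```python
import math

BITS = 16                      # assumed storage width per coordinate
STEPS = (1 << BITS) - 1        # largest grid coordinate

def quantize_box(lo, hi, root_lo, root_hi):
    """Snap a box to the integer grid spanning the root bounds.

    Mins round down and maxes round up, so the quantized box is
    conservative (up to floating-point rounding).
    Assumes the root box has non-zero extent on every axis.
    """
    qlo, qhi = [], []
    for a, b, r0, r1 in zip(lo, hi, root_lo, root_hi):
        scale = STEPS / (r1 - r0)
        qlo.append(max(0, math.floor((a - r0) * scale)))
        qhi.append(min(STEPS, math.ceil((b - r0) * scale)))
    return tuple(qlo), tuple(qhi)

def dequantize_box(qlo, qhi, root_lo, root_hi):
    """Map grid coordinates back to world space (for inspection only)."""
    lo = tuple(r0 + q * (r1 - r0) / STEPS
               for q, r0, r1 in zip(qlo, root_lo, root_hi))
    hi = tuple(r0 + q * (r1 - r0) / STEPS
               for q, r0, r1 in zip(qhi, root_lo, root_hi))
    return lo, hi

root_lo, root_hi = (-10.0, -10.0, -10.0), (10.0, 10.0, 10.0)
lo, hi = (1.3, 2.7, -4.1), (5.9, 8.2, 3.3)

qlo, qhi = quantize_box(lo, hi, root_lo, root_hi)
back_lo, back_hi = dequantize_box(qlo, qhi, root_lo, root_hi)
```

During traversal the query box would be quantized once with the same function, after which the overlap tests are plain integer comparisons.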

What's interesting here is how people usually query these AABB trees. The output from the broad phase is a list of pairs with overlapping bounding volumes. These pairs are then processed one by one, and in the case of a triangle mesh, the mid-phase is engaged to find the relevant triangles/primitives. After that, the near phase finds the actual contacts.
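
As a minimal sketch of that conventional flow, the snippet below builds a small binary AABB tree over triangle bounding boxes and then queries it once per object. Everything here (`Node`, `build`, `query`, the median split, the toy data) is a hypothetical illustration, and the near phase is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional

def overlaps(a, b):
    """Boxes are (min_corner, max_corner) tuples of three floats."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def merge(a, b):
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

@dataclass
class Node:
    box: tuple
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    item: Optional[int] = None     # leaf payload: triangle index

def build(items):
    """items: list of (index, box). Median split along the longest axis."""
    if len(items) == 1:
        return Node(box=items[0][1], item=items[0][0])
    box = items[0][1]
    for _, b in items[1:]:
        box = merge(box, b)
    axis = max(range(3), key=lambda i: box[1][i] - box[0][i])
    items = sorted(items, key=lambda it: it[1][0][axis] + it[1][1][axis])
    mid = len(items) // 2
    return Node(box, build(items[:mid]), build(items[mid:]))

def query(node, box, out):
    """Mid-phase: collect triangle indices whose boxes overlap `box`."""
    if not overlaps(node.box, box):
        return
    if node.item is not None:
        out.append(node.item)
        return
    query(node.left, box, out)
    query(node.right, box, out)

# Toy "level": eight thin floor slabs standing in for triangle boxes.
tri_boxes = [((float(i), 0.0, 0.0), (i + 1.0, 0.1, 1.0)) for i in range(8)]
tree = build(list(enumerate(tri_boxes)))

objects = {"A": ((0.2, 0.0, 0.2), (0.8, 1.0, 0.8)),
           "B": ((2.5, 0.0, 0.0), (3.5, 1.0, 1.0))}

# Assume the broad phase reported both objects as overlapping the mesh.
# Each pair then drills down the tree from the root on its own.
candidates = {}
for name, box in objects.items():
    hits = []
    query(tree, box, hits)
    candidates[name] = sorted(hits)   # these would go to the near phase
```

Note that every object starts its descent at the root, even when several objects sit right next to each other.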

What I would like to suggest is to query the whole dynamic scene against the AABB tree all at once. That is, instead of colliding the AABB tree with a single object (single AABB) multiple times, you collide it with another AABB tree, representing all moving objects. This is especially relevant if combined with a Dynamic Bounding Volume Tree broad phase, as suggested by Erwin Coumans. In this case, all moving objects are already in a dynamic AABB tree of their own. Objects tend to appear in clusters, so objects close together also tend to collide with the same triangles. Doing the mid-phase this way saves you from drilling down the compressed AABB tree multiple times, which should give an immediate performance gain. The tree/tree traversal is a bit of a mind job at first, but the Bullet DBVT implementation is a really good reference.
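
Here is a minimal sketch of such a tree/tree traversal, assuming both the mesh and the moving objects are stored in simple binary AABB trees. It is not copied from Bullet; the names (`Node`, `build`, `collide_trees`) and the "descend into the bigger node" heuristic are my own assumptions, and a brute-force loop is included only to check the result.

```python
from dataclasses import dataclass
from typing import Optional

def overlaps(a, b):
    """Boxes are (min_corner, max_corner) tuples of three floats."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))

def merge(a, b):
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def size(box):
    """Sum of extents, used only to pick which node to split."""
    return sum(box[1][i] - box[0][i] for i in range(3))

@dataclass
class Node:
    box: tuple
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    item: Optional[int] = None     # leaf payload: object or triangle index

def build(items):
    """items: list of (index, box). Median split along the longest axis."""
    if len(items) == 1:
        return Node(box=items[0][1], item=items[0][0])
    box = items[0][1]
    for _, b in items[1:]:
        box = merge(box, b)
    axis = max(range(3), key=lambda i: box[1][i] - box[0][i])
    items = sorted(items, key=lambda it: it[1][0][axis] + it[1][1][axis])
    mid = len(items) // 2
    return Node(box, build(items[:mid]), build(items[mid:]))

def collide_trees(a, b, out):
    """Collect (item_a, item_b) for every pair of overlapping leaves."""
    if not overlaps(a.box, b.box):
        return                      # prunes whole clusters at once
    a_leaf, b_leaf = a.item is not None, b.item is not None
    if a_leaf and b_leaf:
        out.append((a.item, b.item))
    elif b_leaf or (not a_leaf and size(a.box) >= size(b.box)):
        collide_trees(a.left, b, out)    # split the object-side node
        collide_trees(a.right, b, out)
    else:
        collide_trees(a, b.left, out)    # split the mesh-side node
        collide_trees(a, b.right, out)

# Toy "level": a 4x4 grid of thin floor slabs standing in for triangles.
mesh_boxes = [((float(x), 0.0, float(z)), (x + 1.0, 0.1, z + 1.0))
              for x in range(4) for z in range(4)]
# Moving objects: two close together, one elsewhere, one off the mesh.
object_boxes = [((0.2, 0.0, 0.2), (0.8, 1.0, 0.8)),
                ((0.9, 0.0, 0.1), (1.4, 1.0, 0.6)),
                ((2.5, 0.0, 2.5), (3.5, 1.0, 3.5)),
                ((6.0, 0.0, 6.0), (7.0, 1.0, 7.0))]

mesh_tree = build(list(enumerate(mesh_boxes)))
object_tree = build(list(enumerate(object_boxes)))

pairs = []                          # (object index, triangle index)
collide_trees(object_tree, mesh_tree, pairs)

# Reference answer by testing every object against every triangle box.
brute = [(i, j) for i, ob in enumerate(object_boxes)
         for j, tb in enumerate(mesh_boxes) if overlaps(ob, tb)]
```

The two clustered objects share an internal node in the object tree, so the part of the mesh tree they both miss is rejected once for the pair rather than once per object.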

As usual, I'm really lazy and haven't done any side-by-side comparisons, but they might show up later.