Long-s versus short-s

Tim Spalding notes an interesting detail in Google's Books Ngram Viewer:

Google can’t tell between an f and an ſ, the “s without a bar” more properly known as a long, descending or medial s. … If nothing else we can now follow the demise of the ſ with precision.

So, I was curious. Some arbitrarily selected graphs:

curse vs. curfe

stall vs. ftall

search vs. fearch

In all three cases, the ſ form (which Google reads as an "f") generally dominates from approximately 1700 to 1800, but the pre-1700 period appears a lot more mixed. Some example results explain why:

All ſ, all correctly identified as “s” and not “f”. Why, I wonder, and why does it break down later? Are Google just better at training their OCR on “older” typefaces? Is there a threshold setting somewhere to look for unexpected “f”s and turn them into “s”s, which is mostly disabled for eighteenth-century and later material? And why do all three ſ-variant graphs show the same 1570-1610 step?
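
The "look for unexpected f's and turn them into s's" idea is easy enough to sketch. What follows is only a toy illustration of that guess, not a description of how Google's OCR actually works: the word list is a tiny made-up stand-in for a real dictionary, and the names `LEXICON`, `long_s_candidates` and `correct_long_s` are hypothetical. It assumes lowercase input and only rewrites a word when exactly one f-to-s respelling is a known word.

```python
from itertools import product

# Hypothetical toy lexicon; a real system would need a large,
# period-appropriate word list.
LEXICON = {"curse", "stall", "search", "first", "self", "fall"}


def long_s_candidates(word):
    """Yield every spelling made by reading any subset of the f's as s."""
    options = [("f", "s") if ch == "f" else (ch,) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def correct_long_s(word, lexicon=LEXICON):
    """Return the corrected word, or the input unchanged if unsure."""
    if word in lexicon:
        # Already a known word (e.g. "fall"), so leave the f alone.
        return word
    matches = [c for c in long_s_candidates(word) if c in lexicon]
    # Only rewrite when the respelling is unambiguous.
    return matches[0] if len(matches) == 1 else word


for ocr_word in ["curfe", "ftall", "fearch", "firft", "fall"]:
    print(ocr_word, "->", correct_long_s(ocr_word))
```

A rule like this would explain "s" readings wherever it is switched on, and "f" readings wherever it is not, but that is speculation on my part.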

Questions, questions…