Screen Scraping to the Rescue

I play this online game these days. It is a casual turn-based game. Nothing heavy. It is a bit addictive. Occasionally I like to compare my progress against other players. There is a ratings board on the web site. But here is the problem. Every so often, I get a message stating that I have visited the ratings board too often. Then I cannot see my rank. WTF?

This is a free game. So it is not like I am losing money. But this should not be that hard. There are around 1000 total players. At any given time, only 10 of them are online. How hard can it be to support a rankings page? Yeah, they are probably querying a database and sorting by character level and experience.

What? Are they running an Excel database LOL? I bet it is MySQL. And hello? Can you cache the data please? Performance problem solved. No charge. Time to take matters into my own hands. I hear the source code for the site is available. No need for me to whine. I should just implement the cache idea and demonstrate an elegant solution.
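To make the cache idea concrete, here is a rough sketch in Python. I have not looked at the site's code, so everything named here is made up for illustration: `TTLCache`, the `loader` callback standing in for the real rankings query, and the 60 second lifetime are all assumptions. The point is only that the expensive query runs at most once per time window, no matter how many people hit the page.

```python
import time


class TTLCache:
    """Keep one computed value and reuse it until it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # hypothetical: the function that runs the slow query
        self._ttl = ttl_seconds      # how long a cached result stays fresh
        self._clock = clock          # injectable so the cache can be tested without sleeping
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: run the query once and remember the result.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


def query_rankings():
    """Stand-in for the real database query (assumed, not the site's actual code)."""
    return [("Bob", 7, 300), ("Ann", 5, 1200)]


# Every page view would call rankings_cache.get() instead of hitting the database.
rankings_cache = TTLCache(query_rankings, ttl_seconds=60)
```

With 1000 players and a board that changes slowly, serving rankings that are up to a minute old seems like a fair trade for not running the sort on every page view.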

My first step was to get a snapshot of all the ratings screens. Next I am going to code up a parser to grab the raw data out of the HTML. Then I think I will import this stuff into my own database. Might not even need to do the caching if I tune the SQL correctly. For now I may just use a free Oracle database. I could just as easily use MySQL. I think I already have an instance running on my machine right now.
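The parse-and-import plan could look something like the sketch below. It makes a few assumptions worth calling out: that each saved ratings page holds an HTML table with one row per player and three `td` cells (name, level, experience), and that SQLite is good enough as a stand-in for the Oracle or MySQL database mentioned above. The real page layout may well differ, and the names `RatingsParser`, `parse_ratings`, `store_players` and `ranked` are invented for this example.

```python
import sqlite3
from html.parser import HTMLParser


class RatingsParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Assumed layout: exactly three cells per player row.
            # Header rows built from <th> cells end up empty and are skipped.
            if len(self._row) == 3:
                self.rows.append(self._row)
            self._row = None


def parse_ratings(html):
    """Return a list of (name, level, experience) tuples from one saved page."""
    parser = RatingsParser()
    parser.feed(html)
    parser.close()
    return [(name, int(level), int(exp)) for name, level, exp in parser.rows]


def store_players(conn, players):
    """Create the table if needed and insert or update each player."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS players ("
        "name TEXT PRIMARY KEY, level INTEGER, experience INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO players VALUES (?, ?, ?)", players)
    conn.commit()


def ranked(conn):
    """Return all players, highest level first, ties broken by experience."""
    return conn.execute(
        "SELECT name, level, experience FROM players "
        "ORDER BY level DESC, experience DESC"
    ).fetchall()
```

Feeding each saved snapshot through `parse_ratings`, passing the result to `store_players`, and then calling `ranked` gives the whole board in one query. With only about 1000 rows, that sort should be cheap even without a cache.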

This is going to be fun. In the end, I might even need to host the game on my own site. Pwned.