Previously I had released my SiteScan application as soon as I coded it up. Now I realize that there were some bugs in that initial release. Specifically, the program would only detect the first URL on each line of text in the HTML code. On my blogs this is not a problem, since my blog code only has one URL per line. However, when I tested my program on big sites like MSN, I found it utterly lacking.
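To make the bug concrete, here is a minimal sketch in Python. It is not the actual SiteScan code (this post does not show it or say what language it is written in); the function names and the simple `href` pattern are assumptions made for illustration. The first function stops at one match per line, the way the initial release behaved, and the second collects every match.

```python
import re

# Hypothetical pattern for illustration: captures the value of href="..." or href='...'.
# A real scanner would be better served by a proper HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def first_url_per_line(html):
    """Mimics the bug: returns at most one URL from each line."""
    urls = []
    for line in html.splitlines():
        match = HREF_RE.search(line)  # search() stops at the first match
        if match:
            urls.append(match.group(1))
    return urls


def all_urls_per_line(html):
    """Mimics the fix: returns every URL found on every line."""
    urls = []
    for line in html.splitlines():
        urls.extend(HREF_RE.findall(line))  # findall() returns all matches
    return urls
```

On a page that packs several links onto one line, which is common on large sites, the first function silently drops everything after the first link on that line.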
I will provide an updated version of the program soon that fixes this problem. Another issue that is becoming plain to me is that the program is slow. Part of the problem is that the program has to download every linked web page to find new links. It also has to keep a list of unique links so it does not go in circles when pages link to each other. However, I have some ideas for having the program do more than one thing at a time, which should significantly speed up throughput.
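One way the two ideas above could fit together, sketched in Python under stated assumptions: a set of already-seen links prevents going in circles, and a thread pool downloads several pages at once. The `crawl` function, the `fetch_links` callback, and the default limits are all hypothetical names and values, not part of SiteScan.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start_url, fetch_links, max_workers=8, max_pages=1000):
    """Collect every URL reachable from start_url.

    fetch_links is a hypothetical callback: given a URL, it downloads the
    page and returns the URLs linked from it.
    """
    seen = {start_url}      # unique list of links, so cycles are not revisited
    frontier = [start_url]  # pages discovered but not yet downloaded

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # max_pages is a soft limit: it is checked once per batch of pages.
        while frontier and len(seen) < max_pages:
            # Download all pages in the current frontier in parallel.
            results = pool.map(fetch_links, frontier)

            next_frontier = []
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier

    return seen
```

Threads are a reasonable fit here because the time goes into waiting on downloads rather than computing. The set is only touched from the main thread, so no locking is needed in this sketch.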
These are exciting times. Once I fix the SiteScan application, I plan to combine it with some other ideas to produce a truly useful and comprehensive tool. I am not sure what I am going to name this composite application. I am open to suggestions.
Work Smarter not Harder
-
We have large data sets in my current project. Every year tons more data is
loaded into the system, so we only keep the majority of the data for 4 years.
After...