SiteScan Version 1.1 Release

I have uploaded SiteScan version 1.1 to my file server. This version fixes the bug in which the program did not look for multiple URLs per line of text. Previously, when I ran SiteScan with MSN as the Home Page, it found few or no URLs. Now the updated version zooms through MSN and grabs all the links on the page. Most of the content on the main MSN page points to other domains, so SiteScan mainly crawls just the MSN main page.
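Staying on one site means deciding, for each link, whether it points to the same host as the Home Page. Here is a rough sketch of one way to do that. It is not the actual SiteScan code. The names hostOf and sameSite are made up for this example, and it assumes every link is either relative or an absolute http/https URL, with host names compared exactly as written.

```cpp
#include <string>

// Return the host part of an absolute URL such as http://host/path,
// or an empty string if the URL has no "://" (for example a relative link).
std::string hostOf(const std::string& url)
{
    std::string::size_type schemeEnd = url.find("://");
    if (schemeEnd == std::string::npos)
        return "";

    std::string::size_type start = schemeEnd + 3;
    std::string::size_type end = url.find_first_of("/?#", start);
    if (end == std::string::npos)
        return url.substr(start);          // host runs to the end of the URL
    return url.substr(start, end - start);
}

// Decide whether a link stays on the same site as the home page.
// Assumption: a link without a host is relative and therefore stays on the site.
bool sameSite(const std::string& homePage, const std::string& link)
{
    std::string linkHost = hostOf(link);
    return linkHost.empty() || linkHost == hostOf(homePage);
}
```

With this check, a crawler would follow http://www.msn.com/sports from the MSN home page but skip a link to http://www.example.com/.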

Now that I have this major bug fixed in SiteScan, it is time for me to combine the SiteScan code with some newer ideas into a new mega program. Not sure what the name of this program will be. But it should allow you to simulate traffic to all pages of your web site. This mega app could have all kinds of interesting uses - legit and otherwise. For example you could check the performance of your web server. Or if you needed to test your page view counters, you could let this app run all day and "go to town".

All the code is here for me to put this mega app together. I just need to sit down and do it.

SiteScan Known Issues

Previously I had released my SiteScan application as soon as I coded it up. Now I realize that there were some bugs in that initial release. Specifically the program would only detect the first URL per line of text in the HTML code. On my blogs this is not a problem, since my blog code only has one URL per line. However I tested out my program on big sites like MSN and found it utterly lacking.
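To make the bug concrete: a scanner that stops after the first match on a line misses every later URL on that line. Below is a minimal sketch of a scanner that keeps searching the same line until no more matches are found. It is an illustration only, not the SiteScan source. The function name extractUrls is hypothetical, and it assumes links appear as lower-case href="..." attributes with double quotes.

```cpp
#include <string>
#include <vector>

// Collect every href="..." value found in one line of HTML.
std::vector<std::string> extractUrls(const std::string& line)
{
    const std::string marker = "href=\"";
    std::vector<std::string> urls;
    std::string::size_type pos = 0;

    // Resume the search where the previous match ended, instead of
    // giving up after the first hit on the line.
    while ((pos = line.find(marker, pos)) != std::string::npos) {
        std::string::size_type start = pos + marker.size();
        std::string::size_type end = line.find('"', start);
        if (end == std::string::npos)
            break;                          // unterminated attribute: stop here
        urls.push_back(line.substr(start, end - start));
        pos = end + 1;
    }
    return urls;
}
```

Called on a line holding two anchor tags, this returns both URLs in the order they appear.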

I will provide an updated version of the program soon that fixes this problem. Another issue that is becoming plain to me is that the program is slow. Part of the problem is that the program has to download each linked web page to find new links. It also has to keep a unique list of links so it does not go in circles when pages link to each other. However I have some ideas to have the program do more than one thing at a time to significantly speed up the throughput.

These are exciting times. Once I fix the SiteScan application, I plan to combine some other ideas with the SiteScan program to produce a truly useful and comprehensive tool. Not sure what I am going to name this composite application. I am open to suggestions.

SiteScan App Released

I am proud to present my latest application SiteScan. This application will scan your entire web site and extract all the links on it. You just have to provide it with your home page's URL. Then click the Scan button. The list of URL links will be written to a file named "sitescan" located in the same directory as the SiteScan program.

Be warned that this application can take a long time to run for large web sites. I have a fast Internet connection and web server where my biggest blog is hosted. SiteScan took 15 minutes to scan the whole blog. Part of the time is because the app has to download and look for links in all your pages. The other time-consuming activity is making sure the scanner does not get stuck in link circles. If Page 1 links to Page 2, which in turn links back to Page 1, then SiteScan has to detect this and not just bounce back and forth between both pages forever.
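The link-circle problem is usually handled by remembering every URL already seen and only queuing URLs that are new. The sketch below shows that idea under some loud assumptions. It is not the SiteScan implementation. The LinkMap type and the crawl function are hypothetical, and an in-memory map stands in for downloading and parsing real pages.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Stand-in for the web: maps each page URL to the links found on it.
// A real crawler would download and parse the page instead.
typedef std::map<std::string, std::vector<std::string> > LinkMap;

// Scan every page reachable from homePage exactly once.
std::vector<std::string> crawl(const LinkMap& site, const std::string& homePage)
{
    std::set<std::string> seen;        // URLs already queued or scanned
    std::queue<std::string> pending;   // URLs waiting to be scanned
    std::vector<std::string> order;    // URLs in the order they were scanned

    seen.insert(homePage);
    pending.push(homePage);

    while (!pending.empty()) {
        std::string page = pending.front();
        pending.pop();
        order.push_back(page);

        LinkMap::const_iterator it = site.find(page);
        if (it == site.end())
            continue;                  // no links recorded for this page

        const std::vector<std::string>& links = it->second;
        for (std::vector<std::string>::size_type i = 0; i < links.size(); ++i) {
            // insert() reports whether the URL was new. Only new URLs are
            // queued, so Page 1 -> Page 2 -> Page 1 cannot loop forever.
            if (seen.insert(links[i]).second)
                pending.push(links[i]);
        }
    }
    return order;
}
```

The set lookup costs a little on every link, which is one reason the bookkeeping shows up in the run time on large sites.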

This application assumes that the site you are scanning is static. That is, if you go and change the web site while the program is running, you may get unexpected results. I plan to kick off SiteScan tonight and have it scan all of MSN. Let's hope this job finishes by morning. I will let you know how it goes. In the meantime, enjoy the SiteScan application.

For Programmers Only

I have started my next software project. It is called "sitescan". The program shall scan your entire web site, and identify all the links in it. I have given the program some thought. It should not take too long to create the first version of it. While I am busy coding up this app, I thought I would discuss some issues related to distributing software like I do.

Currently I code up most of my apps using Visual C++ version 6.0. This is actually a very old version of the C++ compiler from Microsoft. However I have found that it is still very useful. When I release a new application to the public, I want the experience to be simple. I just want to give my users a single executable file that they can double-click and run.

The executable that I distribute at release time must have all of its logic built in. I don't like bothering my users with complicated install programs. Visual C++ normally depends on some additional dynamic link libraries (DLLs) provided by Microsoft. I choose instead to have the library code embedded directly in the executable. Here is how you accomplish this with Microsoft Visual C++ version 6.0 standard edition:
  1. Choose Settings from the Project menu
  2. On the Project Settings dialog, click on the C/C++ tab
  3. On the left, change the "Settings For:" combo box to Win32 Release
  4. In the Project Options edit control on the bottom, change /MD to /MT
  5. In the Project Options edit control on the bottom, remove /D "_AFXDLL"

After you have followed steps 1 through 5 above, rebuild the release version of your application. The resulting executable has all of the dependent library code built in. This does make the executable bigger. However everything it needs is included in the executable. I don't recommend this method for huge applications. But for the size of the apps I have been releasing, this makes it very easy for the user to run my app with minimal problems during install.
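One way to double-check that the settings took effect is a small compile-time test. The Microsoft compiler predefines _DLL when /MD is in effect, and _AFXDLL is the symbol removed in step 5. The function name runtimeLinkage is made up for this sketch. On compilers other than Microsoft's, neither symbol is normally defined, so the result means nothing there.

```cpp
#include <string>

// Report how this build links the Microsoft runtime and MFC libraries.
// _DLL    : predefined by the Microsoft compiler when /MD is used.
// _AFXDLL : defined by the project when MFC is used as a shared DLL.
std::string runtimeLinkage()
{
#if defined(_DLL) || defined(_AFXDLL)
    return "dll";      // the executable still depends on external DLLs
#else
    return "static";   // the library code is linked into the executable
#endif
}
```

A release build made after steps 1 through 5 should report "static".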

Be on the lookout for an upcoming post when I release my sitescan application to the public.

Indexed by Google

I started this blog a few days ago. One of the goals I had was to get the blog indexed by Google as quickly as possible. It appears that this goal has been met. A little over 2 days after the blog was created, it got indexed by Google. I can now do searches and find my blog showing up in the Google Search results.

There were a few key elements to immediately getting my blog indexed by Google:
  1. I added links to the blog on every page of my main web site
  2. I created a new post in all my major blogs, linking to my new blog
  3. I submitted my blog URL to many web directories that have high PageRank

Now let's see if I can keep up the good work. I really want the Google bot to keep coming back and indexing the new content I add to the blog. I might continue to add my blog URL to other web directories with lower PageRank. However I believe what I really need to do is to continue to generate unique, interesting, and frequent new content for the blog.

I do have a number of other programs which are already complete, but have little to do with black hat activities. Maybe I will post one or two of them here. Perhaps I can modify them slightly to show some interesting programming techniques. Either way I shall also continue to generate new ideas and write and release programs here for your use. For now they shall all be free of charge. Enjoy.

New Idea for Prog

I have a good idea for my second program for Black of Hat. It will be a type of web crawler. You give it a web site. It then finds all the pages on that web site, and records their URLs. I can imagine a number of uses for such a program. For example you could analyze your web site.

My goal is to knock out this program and release it here on my blog in a couple of days. I hope that it is as easy as I think to write the code. This program will be the stepping stone for others like it. For example I could then write a program which identifies dead links on your site. Or I could do a link analysis of your site for Search Engine Optimization purposes.

Keep an eye out for this second program coming soon to Black of Hat. I think I shall call it Link Crawler.

Search Engine Submission

Submit Your Site To The Web's Top 50 Search Engines for Free! I manually submitted my blog to the big search engines (Google, MSN, Yahoo). Now I am looking at other methods to get the word out. My plan is to submit my site to some web directories. While researching how to do this, I came across a free service which submits your site to a lot of search engines. How nice. They don't require a reciprocal link back to them on your web site. I always feel good when a service does not try to strong-arm you with such tactics. To reward their kindness, I decided to give them a link back anyway. If you want to easily submit your web site to a ton of search engines, check out the Free Web Submission service. Click on the button in the upper left-hand corner of this post.