BlueDragon Blog
Here you'll find tips and information about making the most of BlueDragon, which offers several compelling implementation alternatives for your CFML applications. This blog was created originally by Charlie Arehart, who was New Atlanta CTO from 2003-2006. He has since moved on to become an independent consultant but continues to answer comments raised in existing blog entries. BlueDragon continues, and you should look to the newer BlueDragon blog, from New Atlanta president Vince Bonfanti, for more up-to-date information.

BlueDragon Advantage - Easier Site Spidering (and no localhost limitation)

posted Thursday, 23 December 2004

If you've ever used CFINDEX to create a searchable index of files, you've probably lamented that it only indexes the content of the files themselves, which is fine for static files. But for web application file types like .cfm, .asp, etc., you generally want to index the results of running the pages, often called spidering the site. CFINDEX can't do that--unless you're using BlueDragon.

We've added a new option, CFINDEX TYPE="website". The KEY attribute is used to specify the URL of the site (file or directory) to be spidered. Examples include:

<CFINDEX TYPE="website" KEY="http://domainname/">
<CFINDEX TYPE="website" KEY="http://domainname/dirname/filename.ext">

When spidering a web site, the KEY attribute indicates the starting page, which doesn't necessarily have to be the home page of the web site. Indeed, you could create separate search collections for sub-sections of a web site if desired.
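As a rough sketch of the sub-section idea, the following indexes two parts of a site into separate collections and then searches one of them. The collection names, the sub-directory URLs, and the search term are made up for illustration, and it assumes the collections already exist and that TYPE="website" accepts the usual ACTION and COLLECTION attributes of CFINDEX; check the BlueDragon guide for the exact attributes your version supports.

```cfml
<!--- Assumption: collections "docs_search" and "support_search" were created beforehand.
      Names and URLs are hypothetical. --->

<!--- Spider each sub-section of the site into its own collection --->
<CFINDEX ACTION="update" COLLECTION="docs_search" TYPE="website" KEY="http://domainname/docs/">
<CFINDEX ACTION="update" COLLECTION="support_search" TYPE="website" KEY="http://domainname/support/index.cfm">

<!--- Search only the docs sub-section --->
<CFSEARCH NAME="results" COLLECTION="docs_search" CRITERIA="install">

<!--- List the matching pages (TITLE and KEY are standard CFSEARCH result columns) --->
<CFOUTPUT QUERY="results">
#results.title# (#results.key#)<BR>
</CFOUTPUT>
```

Keeping each sub-section in its own collection lets a search form target just one area of the site, at the cost of running one CFINDEX per starting page.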

The spidering process simply follows the links found in the starting page, processing any links that return text/html content (.cfm, .htm, .jsp, .asp, .aspx, etc.). Recursion is automatic: the spider keeps following links from page to page through the site. It's also smart enough not to follow links to sites other than the one you're spidering.

Some will want to point out that it IS possible to do spidering in CFMX, using the vspider.exe command line utility. Its existence is a surprise to many, as it's not documented in the normal CF documentation. Unfortunately, it's also limited to only spidering your localhost. BlueDragon doesn't impose such a limit, nor is there any limit on the number of documents you can index or search.

There are other details to note, including options for letting the index be synchronous or not, so be sure to read the Compatibility Guide (or Enhancements Guide, as of 6.2) for more info.




1. a reader left...
Thursday, 30 December 2004 2:56 pm

This is a really great feature to point out--I was faced with this limitation of CFMX on a project recently and we actually have CFMX and BlueDragon both running now so we can do site spidering across multiple sites. Without getting too much into the gory details, in our case we use BlueDragon to spider several sites, then dump the spider results into a SQL Server database so it can be picked up by a CF 5 server and rolled into that server's Verity collections. This was a very crucial piece of the puzzle on a project with a very short deadline, and BlueDragon came in extremely handy to solve this quickly and easily.

Matt Woodward [mpwoodward@gmail.com]


2. Stephen Moretti left...
Wednesday, 19 January 2005 1:34 pm

Charlie.....

Have you forgotten that you can do type="custom" index using CFINDEX? This will take a query and chuck it in the text index. CFSEARCH can be passed a comma-delimited list of indexes that it will search through, so even if you don't put the static and the dynamic content into one index you can still search both at once. Or is this a limitation on BlueDragon that I missed and it can't take a query to index?

Well whatever, being able to give CFINDEX a URL is still very cool!


3. Charlie Arehart left...
Sunday, 27 February 2005 1:56 pm

Stephen, I'm not sure what you're getting at. We do indeed support TYPE="custom". This entry was just about spidering sites. Can you clarify?