If you've ever used CFINDEX to create a searchable index of files, you've probably lamented that it only indexes the content of the files themselves, which is fine for static files. But for web application file types like .cfm, .asp, etc., you generally want to index the results of running the pages, often called spidering the site. CFINDEX can't do that--unless you're using BlueDragon.
We've added a new option, CFINDEX TYPE="website". The KEY attribute is used to specify the URL of the site (file or directory) to be spidered. Examples include:
<CFINDEX TYPE="website" KEY="http://domainname/">
<CFINDEX TYPE="website" KEY="http://domainname/dirname/filename.ext">
When spidering a web site, the KEY attribute indicates the starting page, which doesn't necessarily have to be the home page of the web site. Indeed, you could create separate search collections for sub-sections of a web site if desired.
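As a rough sketch of the sub-section idea, the tags below index two areas of a site into separate collections and then search just one of them. The collection names and URLs are made up for illustration. It is also an assumption that the usual ACTION and COLLECTION attributes combine with TYPE="website" the same way they do for other CFINDEX types, so check the BlueDragon documentation for the exact requirements.

```cfml
<!--- Hypothetical collection names and URLs. Both collections are assumed
      to exist already (for example, created earlier with CFCOLLECTION). --->
<CFINDEX ACTION="update" COLLECTION="site_docs" TYPE="website" KEY="http://domainname/docs/">
<CFINDEX ACTION="update" COLLECTION="site_support" TYPE="website" KEY="http://domainname/support/index.cfm">

<!--- Search only the collection built from the docs sub-section.
      "spider" is just a sample search term. --->
<CFSEARCH NAME="docResults" COLLECTION="site_docs" CRITERIA="spider">

<!--- Key, Title and Score are standard CFSEARCH result columns; what
      BlueDragon stores in each for a spidered page is not confirmed here. --->
<CFOUTPUT QUERY="docResults">
  #docResults.Title# (#docResults.Key#) score: #docResults.Score#<br>
</CFOUTPUT>
```

Keeping each sub-section in its own collection lets you offer a scoped search per area of the site.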
The spidering process simply follows the links found in the starting page, processing any links that return text/html content (.cfm, .htm, .jsp, .asp, .aspx, etc.). Recursion is automatic: each page it finds is scanned for further links, and so on through the site. It's also smart enough not to follow links to sites other than the one you're spidering.
Some will want to point out that it IS possible to do spidering in CFMX, using the vspider.exe command line utility. Its existence is a surprise to many, as it's not documented in the normal CF documentation. Unfortunately, it can only spider your localhost. BlueDragon doesn't impose such a limit, nor is there any limit on the number of documents you can index or search.
There are other details to note, including an option controlling whether the indexing runs synchronously, so be sure to read the Compatibility Guide (or Enhancements Guide, as of 6.2) for more info.
This is a really great feature to point out--I was faced with this
limitation of CFMX on a project recently and we actually have CFMX and
BlueDragon both running now so we can do site spidering across multiple
sites. Without getting too much into the gory details, in our case we use
BlueDragon to spider several sites, then dump the spider results into a SQL
Server database so it can be picked up by a CF 5 server and rolled into
that server's Verity collections. This was a very crucial piece of the
puzzle on a project with a very short deadline, and BlueDragon came in
extremely handy to solve this quickly and easily.
Matt Woodward [mpwoodward@gmail.com]
Have you forgotten that you can do a type="custom" index using CFINDEX? This will take a query and chuck it in the text index. CFSEARCH can be passed a comma-delimited list of indexes that it will search through, so even if you don't put the static and the dynamic content into one index you can still search both at once. Or is this a limitation on BlueDragon that I missed and it can't take a query to index?
Well whatever, being able to give CFINDEX a URL is still very cool!
Stephen, I'm not sure what you're getting at. We do indeed support
TYPE="custom". This post was just about spidering sites. Can you clarify
what you meant?