The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

©2012 Cal Zant

I have our production web servers set up to email me notifications when unhandled exceptions occur, and if a site is publicly accessible, crawlers, spiders, and other search bots can make this a pain. Most crawlers revisit the pages in their index to pull new content. If you have promo-type pages that are only up for a limited time, crawlers will try to access those URLs after they are gone, and bam ... another email in my box.

There are CAPTCHA components out there, but they aren't really appropriate for this scenario ... I don't want site users to have to read some squiggly letters and type them out before the site sends me an error email. So I wrote some pretty simple code that detects whether a request came from a crawler. My 10-second Google search on the subject didn't turn up much, so maybe this will help someone else out there. If you figure out a way to improve it, please post a comment or contact me.

The solution has two parts: a regular expression pattern stored in the web.config, and a single static IsBot property that returns true if the current request is from a crawler and false otherwise. I store the pattern in the web.config because it is still evolving and probably far from "all encompassing" ... when an error email from a bot slips through, I look at the user agent it presents and add a new keyword to the pattern so that crawler is detected in the future. If I had to guess, I would say the pattern posted here detects over 90% of hits from crawlers ... as of today.

Current pattern in the appSettings section of web.config:

<add key="botRegex" value="bot|crawler|spider|slurp|ask|teoma" />

C# IsBot property:

using System.Configuration;
using System.Text.RegularExpressions;
using System.Web;

// Add this property to a (static) helper class in your web project.
/// <summary>
/// Returns true if the current request is from a bot crawling the site, and false otherwise.
/// </summary>
public static bool IsBot
{
    get
    {
        // If this property can't access the current context, the executing thread doesn't have access
        // to the current request's properties ... since we can't pull any agent information we have to
        // assume this is not a bot.
        if (HttpContext.Current == null)
            return false;

        // Read the agent once; clients aren't required to send a User-Agent header, so fall back to "".
        string userAgent = HttpContext.Current.Request.ServerVariables["HTTP_USER_AGENT"] ?? "";

        // Check to see if the user agent contains any of the terms in the botRegex set in the web.config.
        // If the setting is missing, assume this is not a bot rather than throwing.
        string expression = ConfigurationManager.AppSettings["botRegex"];
        if (string.IsNullOrEmpty(expression))
            return false;

        // ToLowerInvariant keeps the comparison independent of the server's culture (the pattern is
        // all lowercase), and the static Regex.IsMatch reuses a cached parse of the pattern.
        return Regex.IsMatch(userAgent.ToLowerInvariant(), expression);
    }
}
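If you want to unit test the matching logic without a live HttpContext, one option is to pull it into a pure function. The sketch below assumes a hypothetical BotDetector class with an IsBotUserAgent method (neither is part of the original code). It uses RegexOptions.IgnoreCase instead of lowercasing the agent first, and treats a missing agent or pattern as "not a bot".

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original post): the same check as IsBot, but with the
// user agent and pattern passed in, so it can be tested without an HttpContext.
public static class BotDetector
{
    // Returns true when the user agent contains any term from the alternation pattern
    // (e.g. "bot|crawler|spider"). A missing agent or pattern is treated as "not a bot".
    public static bool IsBotUserAgent(string userAgent, string pattern)
    {
        if (string.IsNullOrEmpty(userAgent) || string.IsNullOrEmpty(pattern))
            return false;

        // IgnoreCase | CultureInvariant replaces the ToLower() call; the static IsMatch
        // overload reuses the framework's cache of recently parsed patterns.
        return Regex.IsMatch(userAgent, pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

For example, passing "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" with the pattern "bot|crawler|spider|slurp|ask|teoma" returns true because the agent contains "bot". One trade-off of bare substring terms is that they match anywhere in the agent (ask also matches inside "task"), so anchoring a short term like that with word boundaries (\bask\b) is worth considering if false positives show up; don't do it for bot, which needs to match inside names like Googlebot.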
Tuesday, April 29, 2008 2:45:22 PM (Central Standard Time, UTC-06:00)
...
using System.Globalization;
using System.Threading;
...

string thisPhrase = "HELLO WORLD!";
lblOutput.Text = Thread.CurrentThread.CurrentCulture.TextInfo.ToTitleCase(thisPhrase.ToLower());

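That's it ... this example converts the string to "Hello World!" The ToLower() call matters: ToTitleCase leaves words that are entirely uppercase unchanged (it treats them as acronyms), so without it "HELLO WORLD!" would come back as-is. Pretty easy, huh?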
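To see why lowercasing first matters, here is a small self-contained sketch. TitleCaseDemo and its two methods are hypothetical names, and it uses the invariant culture's TextInfo instead of the thread's culture so the result doesn't depend on server settings.

```csharp
using System.Globalization;

// Hypothetical helper (illustrative name) contrasting ToTitleCase with and without ToLower first.
public static class TitleCaseDemo
{
    // The invariant culture keeps the result independent of the server's culture settings.
    private static readonly TextInfo InvariantText = CultureInfo.InvariantCulture.TextInfo;

    // Lowercase first, then title-case: every word gets a capital first letter.
    public static string ToTitle(string input)
    {
        return InvariantText.ToTitleCase(InvariantText.ToLower(input));
    }

    // Title-case only: words that are entirely uppercase are treated as acronyms and left as-is.
    public static string ToTitleWithoutLowering(string input)
    {
        return InvariantText.ToTitleCase(input);
    }
}
```

TitleCaseDemo.ToTitle("HELLO WORLD!") returns "Hello World!", while TitleCaseDemo.ToTitleWithoutLowering("HELLO WORLD!") returns "HELLO WORLD!" unchanged.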

Thursday, April 24, 2008 12:04:48 PM (Central Standard Time, UTC-06:00)

The major credit agencies (Equifax, TransUnion, Experian, and Innovis) all sell aggregate credit information to any bidder. Direct mail and credit companies buy lists based on demographics including zip code, income band, and credit payment patterns ... and then mail out to thousands or even millions of "potential customers." The Fair Credit Reporting Act entitles you to contact these agencies and request that they stop sending you card solicitations and related offers. There are two easy ways to opt out:

  • Online: Go to www.optoutprescreen.com and fill out a very short, one-page form electronically that will opt you out for 5 years. If you want to opt out permanently, you have to print out a form and mail it in.
  • Over The Phone: Call 1-888-5-OPT-OUT (available 24 hours a day). All you need is your current address, any former address from the past two years, and your Social Security number.

Both options update each of the major credit agencies for you. It will take a couple of months to see the effects of either request, because the credit agencies have to process it and update their "opt out lists" ... only after that will they stop selling your information. And if a company bought your information the day before the list was updated, you might still receive its promotion or application a couple of months later ... so be patient; eventually they should stop coming.

Wednesday, April 23, 2008 12:41:49 PM (Central Standard Time, UTC-06:00)

A lot of times I have user controls or even entire pages that are customized for the authenticated user ... but the output rendered by that user control or page doesn't change that often. It makes sense to cache this output, but I still want it to be personalized for each user. For example, if you have a menu on every page of a site that only contains links relevant to the signed-in user ... you can use the OutputCache directive to have ASP.NET cache the output of that control or page for that user for a set duration, instead of re-creating the menu on every page load. This can save a significant number of processor cycles (at the cost of some memory to hold the cached output) and make your site more scalable. Here is how you do it:

  1. Add this directive to the top of your user control (.ascx) or page (.aspx) file:

    <%@ OutputCache Duration="60" VaryByParam="none" VaryByCustom="UserID" Shared="true" %>

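    Duration: The time, in seconds, to cache the output
    VaryByParam: A semicolon-separated list of query string (or POST) parameters to vary the cache by; "none" caches one version regardless of parameters
    VaryByCustom: A custom string passed to GetVaryByCustomString in Global.asax; each distinct value that method returns gets its own cached copy
    Shared: Whether multiple pages can use the same cached output of a user control (default is false); this attribute is only valid in user controls, so omit it in an .aspx page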

  2. Then override the GetVaryByCustomString method in the Global.asax file to define how the value for the custom parameter is computed:

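    public override string GetVaryByCustomString(HttpContext context, string arg)
    {
        // Vary the cached output by the signed-in user's name (anonymous users share one copy).
        if (arg == "UserID")
            return context.User != null ? context.User.Identity.Name : string.Empty;

        // Fall back to the base implementation so built-in values like "browser" still work.
        return base.GetVaryByCustomString(context, arg);
    }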
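To make the effect of VaryByCustom="UserID" concrete, here is a simplified in-memory model of the idea: one cached copy of a control's output per distinct string returned from GetVaryByCustomString, each expiring after the Duration. VaryByCustomCache and GetOrRender are hypothetical names, and this only illustrates the keying and expiration behavior; it is not how ASP.NET implements output caching internally. The clock is passed in so expiration can be tested.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of VaryByCustom output caching: one cached copy of the
// rendered output per (control, vary-by-custom string) pair, expiring after a fixed duration.
// Illustration only; this is not ASP.NET's internal implementation.
public sealed class VaryByCustomCache
{
    private sealed class Entry
    {
        public DateTime ExpiresAt;
        public string Output;
    }

    private readonly Dictionary<Tuple<string, string>, Entry> entries =
        new Dictionary<Tuple<string, string>, Entry>();
    private readonly TimeSpan duration;
    private readonly Func<DateTime> clock;

    // duration plays the role of the Duration attribute; clock is injected so tests can control time.
    public VaryByCustomCache(TimeSpan duration, Func<DateTime> clock)
    {
        this.duration = duration;
        this.clock = clock;
    }

    // Returns the cached output for this control and vary string while it is still fresh;
    // otherwise calls render(), stores the result, and returns it.
    public string GetOrRender(string controlId, string varyByCustomString, Func<string> render)
    {
        var key = Tuple.Create(controlId, varyByCustomString);
        DateTime now = clock();

        Entry entry;
        if (entries.TryGetValue(key, out entry) && now < entry.ExpiresAt)
            return entry.Output; // cache hit: no rendering work

        string output = render();
        entries[key] = new Entry { ExpiresAt = now + duration, Output = output };
        return output;
    }
}
```

With a 60-second duration, two requests for "alice" within the minute render her menu once, a request for "bob" renders his own copy, and a request for "alice" after 61 seconds renders it again.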

The GetVaryByCustomString method is called very early in the life cycle of a request ... even before SessionState has been populated, so you can't vary a page's cached content on a SessionState value. At least that is 99% true: if you are using OutputCache inside a user control, SessionState is actually available when the method is called. So if you run into this problem on a particular page, one (not very elegant) solution is to move all the content into a user control and OutputCache that control, so you can read the SessionState value in the GetVaryByCustomString method.

Another use for customized OutputCache is on pages like product details, where the ProductID is in the QueryString. The URL for a page like this might look like Details.aspx?ProductID=34. In this scenario you can cache a separate copy of the page for each ProductID (for 7200 seconds, i.e. two hours) like this:

<%@ OutputCache Duration="7200" VaryByParam="ProductID" %>
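VaryByParam="ProductID" means only the ProductID value distinguishes cached copies; other query string values don't create new ones. The hypothetical VaryByParamKey.Build helper below sketches that idea (it is not ASP.NET's actual cache-key format), and it assumes case-insensitive parameter names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of how VaryByParam partitions an output cache: only the listed
// parameters become part of the key. Not ASP.NET's actual cache-key format.
public static class VaryByParamKey
{
    public static string Build(string path, string queryString, params string[] varyByParams)
    {
        // Parse "a=1&b=2" into a dictionary; names are compared case-insensitively (an assumption here).
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string pair in queryString.TrimStart('?').Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string key = Uri.UnescapeDataString(eq < 0 ? pair : pair.Substring(0, eq));
            string val = eq < 0 ? "" : Uri.UnescapeDataString(pair.Substring(eq + 1));
            values[key] = val;
        }

        // Only parameters named in VaryByParam contribute; a missing parameter counts as empty.
        var parts = new List<string>();
        foreach (string param in varyByParams)
        {
            string found;
            values.TryGetValue(param, out found);
            parts.Add(param + "=" + (found ?? ""));
        }
        return path + "?" + string.Join("&", parts);
    }
}
```

Build("Details.aspx", "ProductID=34&ref=email", "ProductID") produces the same key as Build("Details.aspx", "ProductID=34", "ProductID"), namely "Details.aspx?ProductID=34", so both URLs would share one cached copy, while ProductID=35 gets its own.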

OutputCache is usually the most efficient way to use caching in ASP.NET, because a cache hit skips executing and rendering the page or control entirely. I also use the HttpRuntime.Cache API frequently to cache data, but when most consultants come on board to help make a site more scalable ... they start with OutputCache. For more info, check out the OutputCache documentation, or this article on Caching Portions of an ASP.NET Page.

Tuesday, April 22, 2008 2:06:58 PM (Central Standard Time, UTC-06:00)

The new Report Designer in SQL Server 2008 is actually a combination of these two products from SQL Server 2005:

  • Report Builder - A stand-alone application distributed through SQL Server Reporting Services that is fairly high-level and designed for end-user ad-hoc reporting. The files it generated were standard .rdl files ... however, the software had many limitations, such as allowing only one report region (you couldn't make a single report that contained a few different types of information or queries) and pulling from only one data source (typically a single report model).
  • Report Designer - A tool that developers and other IT pros could use inside Visual Studio to design more specialized or complicated reports. The interface was less than intuitive, but it didn't have the limitations found in Report Builder. It also saved files in .rdl format, but Report Builder could not edit a report created with Report Designer. However, a report created in Report Builder could always be opened and modified in Report Designer.

So now there is just one product ... SQL Server 2008 Report Designer. It looks more like the stand-alone Report Builder, but has the Office 2007 look and feel, including the ribbon control. This tool is supposed to be a common platform that is easy enough for end-users (e.g. management, executives) to create ad-hoc reports, yet also contains all of the functionality IT pros need to build more complicated reports. Seems like a tall order to fill, but it isn't impossible in theory ... so maybe they got it right.

The link below is a very recent "web exclusive" article from SQL Server Magazine that gives a fairly in-depth preview of the product, along with some examples and screenshots. Check it out:

http://www.sqlmag.com/Articles/Index.cfm?ArticleID=98830

Saturday, April 19, 2008 9:42:19 AM (Central Standard Time, UTC-06:00)