Recently a customer using EPiServer 4.61 reported that Google had stopped indexing their site after they turned on friendly URLs*. After searching the EPiServer forums and investigating further on the web, it turns out that the problem is not specific to EPiServer but is indicative of a wider issue with Microsoft .NET 2.0 and URL redirection.
Apparently the issue relates to the Html32TextWriter class. A full description of the issue and fixes can be found here.
A great many people seem to be experiencing problems with this combination, but few offer clear, succinct advice about what to do about it. Hopefully the following will help anyone experiencing this problem.
Diagnosing the issue
- Your web site seems to disappear from Google and other search engines
- You are using URL rewriting to direct the user to the correct version of the page
- You are using ASP.Net 2.0
Testing the issue
- Download and install Fiddler, an HTTP debugging proxy: http://www.fiddlertool.com/fiddler/
- Hit a page on your site (deeper than the homepage) using the following request headers
Accept: */*
Accept-Encoding: gzip, x-gzip
User-Agent: Mozilla/4.0
You should get a 200 response, which indicates success.
- Change the request headers to the following and attempt to hit your site again
Accept: */*
Accept-Encoding: gzip, x-gzip
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
If you are suffering from the issue, you will get a 500 error.
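If you would rather script the check than use Fiddler, the two requests above can be approximated with a few lines of Python. This is only a sketch under assumptions: the function names are hypothetical, it uses nothing but the standard library, and it treats "200 for the browser, 500 for Googlebot" as the signature of the problem, as described in the steps above.

```python
import urllib.error
import urllib.request

# The two User-Agent strings from the Fiddler test above.
BROWSER_UA = "Mozilla/4.0"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent):
    """Build a GET request carrying the same headers as the Fiddler test."""
    return urllib.request.Request(
        url,
        headers={
            "Accept": "*/*",
            "Accept-Encoding": "gzip, x-gzip",
            "User-Agent": user_agent,
        },
    )


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the site gives to this user agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return error.code


def is_affected(browser_status, bot_status):
    """True when a normal browser succeeds but Googlebot gets a server error."""
    return browser_status == 200 and bot_status == 500


def check_site(url):
    """Request url as a browser and as Googlebot and report whether it looks affected."""
    browser_status = fetch_status(url, BROWSER_UA)
    bot_status = fetch_status(url, GOOGLEBOT_UA)
    return browser_status, bot_status, is_affected(browser_status, bot_status)
```

Call check_site with the address of a page deeper than your homepage. It returns the two status codes plus a flag, so the same call can be repeated after applying the fix.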
Fixing the issue
There seem to be a number of ways to fix the issue. If you have access to the code doing the rewriting, you can modify it. However, because we are using EPiServer and don’t want to modify the behavior of the URL rewriter, the easiest way seems to be to add a browser capabilities file to your site.
The following is an example of the browser capabilities file that I created for the Googlebot:
<!-- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) -->
<browsers>
  <browser id="Googlebot" parentID="Mozilla">
    <identification>
      <userAgent match="Googlebot/(?'version'(?'major'\d+)(?'minor'\.\d+))" />
    </identification>
    <capabilities>
      <capability name="browser" value="Googlebot" />
      <capability name="version" value="${version}" />
      <capability name="majorversion" value="${major}" />
      <capability name="minorversion" value="${minor}" />
      <capability name="activexcontrols" value="true" />
      <capability name="backgroundsounds" value="true" />
      <capability name="cookies" value="true" />
      <capability name="css1" value="true" />
      <capability name="css2" value="true" />
      <capability name="ecmascriptversion" value="1.2" />
      <capability name="frames" value="true" />
      <capability name="javaapplets" value="true" />
      <capability name="javascript" value="true" />
      <capability name="jscriptversion" value="5.0" />
      <capability name="supportsCallback" value="true" />
      <capability name="supportsFileUpload" value="true" />
      <capability name="supportsMultilineTextBoxDisplay" value="true" />
      <capability name="supportsMaintainScrollPositionOnPostback" value="true" />
      <capability name="supportsVCard" value="true" />
      <capability name="supportsXmlHttp" value="true" />
      <capability name="tables" value="true" />
      <capability name="vbscript" value="false" />
      <capability name="w3cdomversion" value="1.0" />
      <capability name="xml" value="true" />
    </capabilities>
    <!-- Render with HtmlTextWriter instead of Html32TextWriter for this user agent -->
    <controlAdapters markupTextWriterType="System.Web.UI.HtmlTextWriter"></controlAdapters>
  </browser>
</browsers>
- In Visual Studio 2005 create a new folder in the root of your website called “App_Browsers”
- Create a new file in the directory called “googlebot.browsers”
- Paste the above XML into the browser file and save. (The important bit here is the controlAdapters section, which tells the framework to use the System.Web.UI.HtmlTextWriter class instead of the Html32TextWriter class specifically for the Googlebot.)
- Retest your application using Fiddler and the second of the two sets of request headers. This time around you should get the 200 success response.
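To see what the userAgent pattern in the browser file is meant to capture, here is a rough Python equivalent. It is a sketch only: the function name is hypothetical, and Python writes named groups as (?P<name>...) where .NET uses (?'name'...), so the expression is a translation rather than the one ASP.NET evaluates.

```python
import re

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Python translation of the .NET pattern in the browser file:
#   Googlebot/(?'version'(?'major'\d+)(?'minor'\.\d+))
GOOGLEBOT_PATTERN = re.compile(
    r"Googlebot/(?P<version>(?P<major>\d+)(?P<minor>\.\d+))"
)


def parse_googlebot_version(user_agent):
    """Return the values the browser file would substitute, or None if no match."""
    match = GOOGLEBOT_PATTERN.search(user_agent)
    if match is None:
        return None
    return {
        "version": match.group("version"),
        "majorversion": match.group("major"),
        # The minor group includes the leading dot, as in the original pattern.
        "minorversion": match.group("minor"),
    }
```

For the Googlebot user agent above this gives version 2.1, major version 2 and minor version .1, which are the values substituted for ${version}, ${major} and ${minor} in the capabilities section. A user agent that does not contain Googlebot/ followed by a dotted number returns None, so other browsers are not affected by this file.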
(* a mechanism where EPiServer builds the URL for the page based on the site structure rather than a page ID as a querystring parameter)
References
- The issue described. This was the most helpful article; it also includes an example of a browser caps file for Yahoo: http://todotnet.com/archive/0001/01/01/7472.aspx
- The Fiddler HTTP debugging tool http://www.fiddlertool.com/fiddler/
- MSDN document on the browser capabilities schema http://msdn2.microsoft.com/en-us/library/ms228122.aspx
- Article from EPiServer Tech News regarding Friendly URLs http://www.episerver.com/en/NewsEvents/…
- If you are using Friendly URLs with EPiServer, I recommend the FURLX enhancements from EPiServer Research. Jerms discusses this in one of his blog entries: http://episervernz.blogspot.com/2006/08/epise…
Update: EPiServer has added a news article to their site with a downloadable copy of the browser capabilities file. Read it here