We have always used onclick events on links that take the visitor to specific places in our shopping system that we did not want Google to follow. This was a deliberate move: it lets visitors do things like set the number of products shown per section page, while stopping Google and the other search engines from indexing those pages. Otherwise multiple versions of the same content would be indexed, with little or no difference between them except the query string and the number of products shown.
Our shopping system uses a cart that is retrieved by an ID held in a cookie; if no cookie has been set, a default cart is used. The cart stores various settings that the shop uses to display the site (currency, language, prices with or without tax, products per section etc.) as well as the more standard products in the cart and, eventually, checkout details including customer information. The default cart is used until the visitor does something that requires something other than the default cart: logging in, changing a preference or adding something to the cart. At this point a database entry is created and the reference to it is stored in a cookie. This was principally done to avoid the huge number of carts created by a bot visiting the site (bots don't accept the cookie, so a new cart would be created with each page visited). We also have the facility at this point to set a "no cookies" preference which, when set, automatically appends a query string identifying the cart ID to every link within the site, allowing a visitor whose browser does not accept cookies to use the shop. This is potentially hazardous, because the last thing we want is for Google to index the pages with the query string in place: that would mean potentially infinite versions of the same content being indexed, differing only in the query string.
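To make the default-cart idea concrete, here is a minimal sketch in Python. It is not our actual shop code: the names `DEFAULT_CART`, `CartStore`, `get_cart` and `change_preference`, the cookie name `cart_id` and the default values are all assumptions made up for illustration, and a plain dictionary stands in for the database.

```python
import uuid

# Hypothetical defaults; the real shop's settings and values will differ.
DEFAULT_CART = {"currency": "GBP", "language": "en", "per_page": 20, "items": {}}


class CartStore:
    """Stand-in for the carts table in the database."""

    def __init__(self):
        self.rows = {}

    def load(self, cart_id):
        # Returns None for an unknown or missing ID.
        return self.rows.get(cart_id)

    def create(self):
        # A row is only written when a visitor actually needs their own cart.
        cart_id = uuid.uuid4().hex
        self.rows[cart_id] = {**DEFAULT_CART, "items": {}}
        return cart_id


def get_cart(cookies, store):
    """Read path: no cookie (e.g. a bot) means the shared default cart, no DB write."""
    cart = store.load(cookies.get("cart_id"))
    return cart if cart is not None else DEFAULT_CART


def change_preference(cookies, store, key, value):
    """Write path: create the cart on first change and return the ID for the cookie."""
    cart_id = cookies.get("cart_id")
    if store.load(cart_id) is None:
        cart_id = store.create()
    store.rows[cart_id][key] = value
    return cart_id
```

The point of the design is that simply viewing pages never writes to the database, so a visitor that refuses cookies only creates a cart when it triggers an action such as changing a preference.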
I was interested to see why a large number of carts were still being created, so I started storing the user agent, the IP address and the URL of the request that created each cart. That is how we started to see that Google was indeed following these onclick events.
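The logging described above could look something like the following sketch. The `CartCreation` record and both helper functions are hypothetical names chosen for illustration, and the log is assumed to be a simple in-memory list rather than a database table.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CartCreation:
    """One row per cart created: who asked for it and which URL triggered it."""
    user_agent: str
    ip: str
    url: str


def creations_by_agent(log):
    """Count how many carts each declared user agent has created."""
    return Counter(entry.user_agent for entry in log)


def querystring_creations(log, agent_substring):
    """Carts created by a given agent via URLs that carry a query string.

    On a site where such URLs are only reachable through onclick handlers,
    any hit here suggests the agent followed an onclick.
    """
    needle = agent_substring.lower()
    return [e for e in log if needle in e.user_agent.lower() and "?" in e.url]
```

Bear in mind that the user agent is only what the client declares, so this shows who claims to be creating carts rather than proving it.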
We never use links with query strings in the URL except when they are called by onclick events, as having these indexed is not ideal. So when we see carts being created by Googlebot through URLs that are only available through onclicks, we know it is following them. Interestingly, so is the Alexa bot, but there is no evidence of Bing, Yahoo etc. being able to follow the onclicks.
We have dealt with the issue of these pages being indexed by having the system set the preference, add the product etc. and then redirect to a clean URL, which stops the pages being indexed with the query string. In theory Google won't have accepted the cookie, so as soon as it is redirected it is back to browsing the site with the default cart.
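As a rough illustration of the "act, then redirect to a clean URL" step, the sketch below strips the query string using Python's standard `urllib.parse`. The function names are hypothetical, and the 302 status code is an assumption: the approach only requires that some redirect is issued after the action has been applied.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query="", fragment=""))


def redirect_after_action(requested_url):
    """Called once the preference has been set or the product added.

    Returns a (status, location) pair for the framework to send back,
    so the page that ends up being rendered never has a query string.
    """
    return 302, clean_url(requested_url)
```

For example, `clean_url("https://example.com/shop/widgets?per_page=50")` gives `"https://example.com/shop/widgets"`.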
We could have run some sort of filter that detected whether the user agent was a bot, but this seems to go against Google's guidelines on showing it the same content as a normal visitor.
I guess the next experiment is whether and how Google is submitting forms. Apparently it does submit simple forms, so does that mean it's going to get into the checkout and customer areas? These forms might be too complex with the validation that's in place, but I would imagine that the form that adds products to the cart (which usually only requires one select to be set, the quantity) would be within its capability. I think it's also going to be worth trying to work out if Google ever accepts cookies: apparently not, but if it did it would render the above redundant. I might have to add a bit of code that counts the pages each cart 'visits' and see if the carts created by Googlebot ever get to more than one page, which without the cookie they shouldn't. It might also be worth seeing if Google ever tells us its user agent is something else and re-crawls the site to check that the content is the same. We might be able to discover this by looking at the user agent's IP and seeing if it occurs with another user agent.
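Both of those checks are simple to sketch. The code below is only an outline of the idea, with hypothetical function names and in-memory dictionaries standing in for whatever the shop would really store: one function bumps a page counter per cart, the other groups logged requests by IP and reports any IP seen with more than one user agent.

```python
from collections import defaultdict


def record_page_view(page_counts, cart_id):
    """Increment the number of pages served against a given cart ID."""
    page_counts[cart_id] = page_counts.get(cart_id, 0) + 1


def carts_beyond_one_page(page_counts, cart_ids):
    """Of the given carts (e.g. those created by Googlebot), which saw more than one page?"""
    return [cid for cid in cart_ids if page_counts.get(cid, 0) > 1]


def ips_with_multiple_agents(requests):
    """Map each IP to its user agents, keeping only IPs seen with more than one.

    `requests` is an iterable of (ip, user_agent) pairs taken from the log.
    """
    agents_by_ip = defaultdict(set)
    for ip, user_agent in requests:
        agents_by_ip[ip].add(user_agent)
    return {ip: agents for ip, agents in agents_by_ip.items() if len(agents) > 1}
```

An IP showing up with several user agents is only a hint, since proxies and shared connections produce the same pattern, so any match would need checking by hand.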
I would appreciate some feedback from anyone facing similar issues or who has similar questions.
I've been counting the number of pages each cart views for about a week now, and can see that every cart created by a user agent declaring itself as Googlebot viewed zero pages. This suggests that Google is indeed not keeping any cart assigned by a cookie. I guess this follows what Google says and the common consensus, but it was worth checking.
Posted By: Paul
I was very pleased to find this article; it saved me a lot of time.