A friend of ours who is a web developer called us, disappointed with how their recent WordPress installation was performing with the search engines. While they liked using the WordPress back end, they felt they had lost enough ranking that they were considering dumping WordPress.
What’chu talkin’ ’bout, Willis?
We can’t have our favorite blogging and content management system being bad-mouthed like that, so we sprang into action to defend our beloved WordPress.
What we found was quite eye-opening, and we felt we needed to share it to keep somebody else from getting caught in the same situation.
The problem wasn’t with WordPress (yay!). It was the fact that WordPress had been installed in a directory that had previously held another website.
The robots.txt and sitemap.xml files that the old site had auto-generated were still in the root directory and were completely screwing up how the search engines were handling the new WordPress site.
The sitemap.xml file was instructing the search engines to read pages that weren’t there.
The robots.txt file was instructing the search engines to ignore pages that had been moved.
So basically, the search engines were being told to ignore everything they wanted seen and were free to see everything they didn’t want seen. Talk about a recipe for frustration!
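If you want to see this kind of mismatch for yourself, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, and the `blocked_urls` helper are all made up for illustration; they are not taken from our friend's site.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that the given robots.txt rules tell crawlers to skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt left behind by the old site.
stale_robots = """\
User-agent: *
Disallow: /blog/
Disallow: /clients/
"""

# Hypothetical pages on the new WordPress site that should be indexed.
new_site_urls = [
    "https://example.com/blog/hello-world/",
    "https://example.com/about/",
]

# Any URL printed here is a page you want seen but are telling crawlers to ignore.
print(blocked_urls(stale_robots, new_site_urls))
```

Feeding it the real robots.txt and a handful of pages you care about is a quick way to spot rules that belong to the old site structure.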
To add insult to injury, the reason for the robots.txt file in the first place was that they were installing development versions of their customers’ websites in sub-directories of their main domain. WordPress, doing its job, was telling the world about these sites. So now you’ve got a domain that’s telling the search engines that it does web development, is a steakhouse, has a soccer team, and will help you with your dry cleaning. Talk about a case of Dissociative Identity Disorder.
Here was our custom prescription to bring sanity to their website, but you should use these instructions any time you install WordPress in a directory that previously contained a site:
- Correct the robots.txt file – this file had directions to an out-of-date sitemap and also had exclusions that were only relevant to the old website structure. Make sure that anything you don’t want the search engines to see is accurately listed, and remove the reference to the old sitemap altogether. A plugin will let WordPress take care of the sitemap from here on.
- Delete the out-of-date sitemap – we don’t need it, and WordPress can do better than that.
- Install and activate the “Google XML Sitemaps” plugin to allow WordPress to create and maintain your sitemap.
- Go to Dashboard / Settings / XML-Sitemap and generate the sitemap for the first time. After that, the plugin will notify the search engines of any site updates.
- Install the plugin called “Redirection”. You want to review the 404 report in the modules section. That’s going to tell you which pages visitors (and search engines) are trying to hit and failing on. Any time you see a 404, immediately create a redirection from that bad page to a corresponding page on the new site.
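Before deleting the old sitemap, it can be worth listing which of its pages no longer exist, since those are the 404s you'll be redirecting. This is a rough sketch using only Python's standard library, not part of any plugin. The sitemap contents, the `live` set of paths, and the `redirects` mapping are hypothetical stand-ins for your own data.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_paths(sitemap_xml: str) -> list[str]:
    """Return the path of every <loc> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        urlparse(loc.text.strip()).path
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def missing_paths(sitemap_xml: str, live_paths: set[str]) -> list[str]:
    """Return sitemap paths that the new site no longer serves."""
    return [p for p in sitemap_paths(sitemap_xml) if p not in live_paths]


# Hypothetical leftover sitemap from the old site.
old_sitemap = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about/</loc></url>
  <url><loc>https://example.com/services.html</loc></url>
  <url><loc>https://example.com/contact.html</loc></url>
</urlset>
"""

# Hypothetical paths that exist on the new WordPress site.
live = {"/", "/about/", "/services/", "/contact/"}

# Hypothetical old-to-new mapping, written by hand.
redirects = {"/services.html": "/services/", "/contact.html": "/contact/"}

for old in missing_paths(old_sitemap, live):
    # Anything without a hand-picked target still needs a decision.
    print(old, "->", redirects.get(old, "NEEDS A TARGET"))
```

The resulting list is just a worksheet. You would still enter the actual redirects through the Redirection plugin.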
These were special instructions for sites that were under development. If you’re working on a site that isn’t part of your primary site, you should follow these directions to keep from confusing the search engines.
- In each test installation of WordPress, go to Dashboard / Settings / Privacy and check the box that says “I would like to block search engines, but allow normal visitors”. By default, WordPress is trying to help you by letting the world know what’s going on with your website. In the case of site development, you don’t want WordPress touting the merits of your customers’ sites on your domain name.
- If you’re doing a lot of development, it may be worth registering an alternate domain name exclusively for development. That would at least move the customer content that confuses the search engines off to another domain name. That way, if you forget to block search engines or update a robots.txt file, you’re not damaging the ranking of your primary domain.
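If you juggle several development installs, a small script can confirm that each one really is asking not to be indexed. The sketch below assumes that blocking search engines shows up as a robots meta tag containing `noindex` in the page's HTML; check the source of your own pages, since the exact tag may vary by WordPress version. The `RobotsMetaFinder` class, the `is_noindexed` helper, and the sample HTML are illustrative only.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page source from a development install.
dev_page = "<html><head><meta name='robots' content='noindex,nofollow' /></head></html>"
print(is_noindexed(dev_page))
```

In practice you would fetch the home page of each development site and pass its HTML to `is_noindexed`, flagging any site where it comes back false.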
Follow these steps to keep your website from inadvertently confusing the search engines.