I completed a backlink audit recently, and this is the post I wish I’d had when starting the tedious task of identifying the nasty links. Not all dodgy links are obvious. Heck, some are near-impossible to find, especially when you have a spreadsheet containing thousands of them.
This is not a post about how to do a backlink audit from A-Z – that’s already been written about loads of times. Instead, I’m going to take you through how to identify patterns in your backlink data to quickly and accurately uncover spammy links.
I’ve written this post for the greater good of all SEOs, and yes, you’re welcome.
Wait – do I even need to do a backlink audit?
There has been some confusion since the last Penguin update as to whether or not SEOs even need to carry out backlink audits anymore. After all, Google has said that now they only devalue spam links as opposed to penalising the site receiving them, right?
Well, the short answer is: yes, you probably should continue to carry out backlink audits and update your disavow file. You can read more about this here and here.
Why can’t I just use an automated tool to find the bad links?
I know it’s tempting to get automated backlink tools such as Kerboo to do all the heavy lifting for you. Unfortunately, though, this isn’t a great idea.
In the backlink audit I did recently, 93% of porn links were assigned a link risk of ‘neutral’ with a score of 500/1,000 (0 being the safest link and 1,000 being the riskiest). Links from the BBC also received a ‘neutral’ rating, with some getting a higher risk score than the porn links! Go figure.
Automated backlink tools can be super valuable; however, this is because of all the data they draw together into a single spreadsheet, as opposed to them being particularly accurate at rating the risk of links. To rely solely on their link risk metrics for your backlink audit is a quick ticket to trouble.
Is this guide relevant to my site?
This post is not a ‘one-size-fits-all’ strategy for a backlink audit, so please use your common sense. For example, below I suggest that root domains containing the word ‘loan’ are generally indicative of unscrupulous sites. However, if you’re doing a backlink audit for a financial services firm, then this generalisation is less likely to apply to you.
It’s up to you to think about the guidelines below in the context of the site you’re auditing and to adjust accordingly.
You will need
Before you start, you will need to have all your backlinks neatly assembled in a spreadsheet along with the following information:
- URL (one example per linking root domain)
- root domain
- anchor text
- citation flow (Majestic) or domain authority (Ahrefs or Moz)
- trust flow (Majestic) or domain trust (Ahrefs or Moz)
- IP address
- page language
- link location
- and anything else you can think of that could be useful
This article can bring you up to speed if you’re not sure how to assemble this data. Make sure to combine data from as many sources as possible, as different SEO tools will contain different information and you don’t want to miss anything! As I said earlier, I would also recommend Kerboo as one of your data sources, as it pulls a lot of the information you could want into one place.
How to spot the patterns
Fortunately for us, the bad guys almost always do their dirty work in bulk, which makes life easier for us good guys who inevitably have to clean up after them. It’s rare to find one dodgy directory submission or a single piece of spun content containing a paid link. This is a massive help – use it to your advantage!
I highly recommend creating a pivot table of your data so that you can see how many times an issue has occurred in your data set. This can help you to quickly spot patterns.
Above: spotting suspicious anchor text using a pivot table
For example, let’s say you’re doing a backlink audit for a clothing site. By pivoting for anchor text, you might be able to quickly spot that ‘buy cheap dresses’ appears several times. Given the commercial nature of this anchor text, it’s likely it could be spam. You could spot check some of these URLs to make sure, and if they’re consistently dodgy, you can reasonably assume the rest of the links with this anchor text are too.
Above: putting together a pivot table to spot anchor text frequencies
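If you’d rather script this than build the pivot table by hand, here is a minimal sketch of the same idea in Python. It assumes your spreadsheet has been loaded as a list of dictionaries with an `anchor_text` key. That column name and the sample rows are made up for illustration.

```python
from collections import Counter

def anchor_text_counts(rows):
    """Count how often each anchor text appears, most frequent first."""
    # Normalise case and surrounding whitespace so near-identical anchors group together.
    counts = Counter(row["anchor_text"].strip().lower() for row in rows)
    return counts.most_common()

# Hypothetical sample rows; in practice these come from your backlink export.
sample_rows = [
    {"url": "http://example-a.test/page", "anchor_text": "buy cheap dresses"},
    {"url": "http://example-b.test/post", "anchor_text": "Buy Cheap Dresses"},
    {"url": "http://example-c.test/", "anchor_text": "Example Clothing"},
]

for anchor, count in anchor_text_counts(sample_rows):
    print(f"{count:>3}  {anchor}")
```

Anything near the top of the output that looks commercial is worth a spot check.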
Another thing I like to do is to dump my data into a word cloud generator. This is useful because it visualises the data (the bigger the word, the more times it appears in your dataset). It can help me to quickly catch something that looks like it shouldn’t be there.
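A word cloud is really just word frequencies drawn at different sizes, so if you don’t have a generator to hand, a rough equivalent is to count the words yourself. The sketch below is one way to do that. The `min_length` cut-off is my own assumption to skip short filler words, not a rule.

```python
import re
from collections import Counter

def word_frequencies(texts, min_length=3):
    """Count individual words across anchor texts, titles or any other text column."""
    words = []
    for text in texts:
        # Split on anything that is not a letter or digit, then drop very short words.
        words.extend(
            word for word in re.findall(r"[a-z0-9]+", text.lower())
            if len(word) >= min_length
        )
    return Counter(words)

# Hypothetical sample data for illustration.
frequencies = word_frequencies(["cheap nike shoes", "Cheap dresses", "my site"])
print(frequencies.most_common(5))
```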
Keeping on top of your data
Make sure you make a note as you work that explains why you’ve decided to disavow a set of links. It helps at the end when you’re reviewing your links, and it is also a big help when you come to spot patterns. It will also stop you from revisiting the same links multiple times and asking yourself ‘why did I decide these were bad links?’
Above: screenshot from my recent backlink audit with ‘action’ and ‘reason’ columns
Examples of common patterns to find bad backlinks
I’m now going to give you specific examples of bad links which you can use to find patterns in your data.
It’s not always clear-cut whether a link is spam or not. However, the guidelines below should help guide you in the right direction.
When you’re unsure about a link, ask yourself: ‘if it wasn’t for SEO, would this link even exist?’
Words to look for in the root domain or URL
X-rated words in the URL
You’ll want to disavow any x-rated links immediately (unless, of course, they are relevant to your site). These usually contain one of the following terms in their URL:
- sex (also sexy can result in some shady sites)
- and any more dodgy words you can think of that relate to orgies, orgasms and other obscenities
Be careful not to accidentally disavow URLs where ‘sex’ is in the middle of a word – such as sussexhotels.com or essex.ac.uk. This will require some manual spot checking.
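One way to cut down on those false positives before the manual check is to only match a term when it isn’t preceded by another letter. The sketch below shows this under my own assumptions: the function names are made up, and the rule will still miss domains where the term sits at the end of a longer word, so treat it as a first pass only.

```python
import re

def build_term_pattern(terms):
    """Match a term only when it is not preceded by another letter."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"(?<![a-z])(?:{alternatives})")

def is_flagged(url, pattern):
    """Return True if the lower-cased URL contains a flagged term at a word start."""
    return pattern.search(url.lower()) is not None

# 'sex' comes from the list above; add your own terms as needed.
pattern = build_term_pattern(["sex"])

for url in ["http://sussexhotels.com/", "http://essex.ac.uk/", "http://sexy-links.example/"]:
    print(url, is_flagged(url, pattern))
```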
Root domain contains references to directories & listings
Next, you want to look for any URLs that indicate manipulative SEO link-building tactics. Directories are an obvious example of this, and while not all directories are bad (here is a good article on how to tell the difference), generally those created purely for link-building purposes contain the following words in the root domain:
- ‘directory’ – especially ‘dir’ and ‘webdir’
- ‘links’ – especially ‘weblinks’, ‘hotlinks’ or ‘toplinks’
You might notice I’ve specifically said ‘root domain’ as opposed to ‘URL’ here. There is a reason for this: you might find lots of URLs in your dataset where ‘links’ is in the URL path. As a general rule, these are a lot less likely to be manipulative links. Compare http://www.lutterworthyouththeatreacademy.co.uk/links.html with www.speedylinks.uk. One of these is spam, and the other isn’t – can you spot the difference?
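To make the root domain versus URL path distinction concrete, here is a small sketch that checks the hostname only and ignores the path. It is a simplification: it uses the full hostname rather than a true root domain (which would need a public suffix list), and substring matching on short terms like ‘dir’ will throw up false positives.

```python
from urllib.parse import urlparse

# Terms taken from the list above; 'webdir', 'weblinks' etc. are covered as substrings.
DIRECTORY_TERMS = ("directory", "dir", "links")

def hostname_of(url):
    """Return the lower-cased hostname, coping with URLs that lack a scheme."""
    if "://" not in url:
        url = "//" + url
    return (urlparse(url).hostname or "").lower()

def host_mentions_directory(url, terms=DIRECTORY_TERMS):
    """True if a directory-style term appears in the hostname (the path is ignored)."""
    host = hostname_of(url)
    return any(term in host for term in terms)

print(host_mentions_directory("http://www.lutterworthyouththeatreacademy.co.uk/links.html"))
print(host_mentions_directory("www.speedylinks.uk"))
```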
Root domain contains references to SEO
You’ll also find that if the root domain contains SEO or web-related terms, it’s likely it exists simply to serve the purpose of building links. Look out for the following words in the root domain:
Bear in mind that lots of sites have ‘search’ pages, so your best bet is to focus on the root domain for this to be an indication of anything suspect.
Content farms are another common feature of a poor backlink profile. Look for any domains that contain ‘article’.
Other dodgy root domains
The following keywords in the domain are usually indicative of dodgy link-building practices:
- ‘com’ (such as com-com-com.com – yes, really)
Root domain contains consonant or number clusters
Another obvious sign is any root domain which simply does not make sense. You’ll likely have lots of domains linking to your site consisting of bundles of consonants and numbers, such as ‘1073wkcr.com’ or ‘a0924111232.freebbs.tw’. Watch out for domains like these, as more often than not they are low quality.
You can easily find URLs like this by sorting your root domain column from A-Z. You will find that:
- any domain starting with a number will appear at the top of your list.
- scrolling to the bottom to letters x, y and z usually throws up lots of domains with consonant clusters that do not make sense.
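If sorting and scrolling gets tedious, a rough heuristic can shortlist these domains for you. The sketch below flags a domain when its first label starts with a digit, contains a long run of digits, or contains a long run of consonants. The thresholds (four digits, five consonants) are my own guesses rather than tested values, so expect both misses and false alarms.

```python
import re

# Assumed thresholds: 5+ consonants in a row (y not counted) or 4+ digits in a row.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")
DIGIT_RUN = re.compile(r"\d{4,}")

def looks_like_gibberish(root_domain):
    """Heuristically flag domains whose first label looks machine-generated."""
    domain = root_domain.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    label = domain.split(".")[0]
    return (
        label[:1].isdigit()
        or bool(DIGIT_RUN.search(label))
        or bool(CONSONANT_RUN.search(label))
    )

for domain in ["1073wkcr.com", "a0924111232.freebbs.tw", "essex.ac.uk"]:
    print(domain, looks_like_gibberish(domain))
```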
The TLD is uncommon
Uncommon TLDs are usually indicative of dodgy sites. Any site worth its salt will try to obtain the .com, .net, .org, .edu or relevant country-code TLD (ccTLD) for its domain name. Less common TLDs are an indication of a lower quality site. The examples I found in my most recent backlink audit which indicated spammy sites were:
- .properties, etc
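A quick way to surface these is to flag anything whose ending isn’t on a short allowlist. In the sketch below, the allowlist holds the four endings mentioned above, and `expected_cctlds` is a hypothetical parameter for the country endings that make sense for the site you’re auditing (I’ve assumed a UK site).

```python
# Endings named in the text above; extend to suit your own audit.
COMMON_TLDS = {"com", "net", "org", "edu"}

def has_uncommon_tld(domain, expected_cctlds=("uk",)):
    """True if the domain's final label is neither a common ending nor an expected country one."""
    tld = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    return tld not in COMMON_TLDS and tld not in set(expected_cctlds)

for domain in ["example.properties", "essex.ac.uk", "example.com"]:
    print(domain, has_uncommon_tld(domain))
```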
Looking at titles for further clues
When the domain name or URL isn’t particularly insightful, the page title is the next place to look. Look out for the same keywords listed above, as well as the following phrases:
- ‘most visited web pages’
- ‘reciprocal links’
- ‘link partner’
- ‘link exchange’
- ‘seo friendly’
Another clue is to find any site titles that are completely unrelated to the niche of your site. Titles that contain commercial terms are particularly suspect, such as:
- ‘louis vuitton belts’
- ‘nike shoes’
As I mentioned before, bad backlinks often operate in bulk, and there’s nothing like a load of duplicate titles to lead you hot on the heels of a group of spammy URLs.
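Duplicate titles are easy to find programmatically. The sketch below groups root domains by page title and returns any title shared by two or more domains. The `title` and `root_domain` keys and the sample rows are assumptions for illustration.

```python
from collections import defaultdict

def duplicate_titles(rows, min_domains=2):
    """Map each page title to the root domains using it, keeping only shared titles."""
    domains_by_title = defaultdict(set)
    for row in rows:
        # Lower-case and collapse whitespace so trivial differences don't hide duplicates.
        title = " ".join(row["title"].lower().split())
        if title:
            domains_by_title[title].add(row["root_domain"])
    return {
        title: sorted(domains)
        for title, domains in domains_by_title.items()
        if len(domains) >= min_domains
    }

# Hypothetical sample rows.
title_rows = [
    {"root_domain": "site-a.test", "title": "Nike Shoes"},
    {"root_domain": "site-b.test", "title": "nike  shoes"},
    {"root_domain": "site-c.test", "title": "Local theatre links"},
]
print(duplicate_titles(title_rows))
```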
What can anchor text tell us?
Is it keyword-heavy?
A popular SEO tactic in the pre-Penguin days was to link to your site with keyword-heavy or commercial anchor text, such as ‘cheap red dresses’. Make sure to put together a pivot table of your anchor text so you can quickly scan for any recurring anchor text that looks suspiciously well-optimised. Check out these links to see if they’re legit. They probably aren’t.
Does it make sense?
In addition, any anchor text that simply doesn’t make any sense or is completely unrelated to the site you’re auditing is highly likely to be low quality.
Is the language consistent with the rest of the page?
Finally, any anchor text that is in a different language to the rest of the content on the page is likely to be a paid link. You can use the ‘language’ column (provided by Ahrefs and Kerboo) to see what language the page is in, and you can compare this to the language of the anchor text of your links. Any mismatch is likely to be suspicious.
Duplicate root IP address
Pivot your data to see if there are several links with the same IP address. If there is a block of URLs that share the same IP address and one of these is spammy, it is likely that the rest are too.
Make sure to do a manual spot check of the sites to make sure you’re not disavowing anything harmless. For example, sites hosted at blogspot.com and wordpress.com are commonly hosted at the same IP address, and many of these will be harmless.
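As a rough sketch of that IP check, the function below groups root domains by IP address and, for every IP that hosts at least one domain you’ve already marked as spam, lists the other domains on it for review. The `ip` and `root_domain` keys are assumed column names, and the output is a to-check list rather than a to-disavow list, for exactly the shared-hosting reason above.

```python
from collections import defaultdict

def neighbours_of_known_spam(rows, known_spam_domains):
    """For each IP hosting a known spam domain, list the other domains on that IP."""
    spam = set(known_spam_domains)
    domains_by_ip = defaultdict(set)
    for row in rows:
        domains_by_ip[row["ip"]].add(row["root_domain"])

    to_review = {}
    for ip, domains in domains_by_ip.items():
        others = domains - spam
        # Only report IPs where a known spam domain has neighbours still unreviewed.
        if domains & spam and others:
            to_review[ip] = sorted(others)
    return to_review

# Hypothetical sample rows using documentation-range IP addresses.
ip_rows = [
    {"root_domain": "spam-a.test", "ip": "203.0.113.10"},
    {"root_domain": "unknown-b.test", "ip": "203.0.113.10"},
    {"root_domain": "fine-c.test", "ip": "198.51.100.7"},
]
print(neighbours_of_known_spam(ip_rows, ["spam-a.test"]))
```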
Where on the page is the link located?
In many backlink reports, there’s a column which tells you where on the page the link is located. In Kerboo, this column is called ‘link section’, and it’s another nifty tool for us to use in our hunt for dodgy links. Filter this column for links located in the footer or sidebar, then open the pages to see if any of them look suspicious.
Footer and sidebar links are prime locations for dodgy backlinks. Why? Because these are site-wide, they are often targeted for paid link placements as the recipient of the link can often benefit from the most link equity in this way.
In addition, if the link is providing no value to users on the site (for example, if it’s completely unrelated to the site content, which is likely if it’s a paid link) then the footer is a useful place to essentially ‘hide’ the link from users while still providing link equity to the recipient.
Where is the link pointing to?
In the ‘link to’ column, look out for links pointing to the ‘money pages’ on your site – these are any pages which are revenue-drivers or particularly important for other reasons, such as product pages or sign-up pages.
It’s natural in a backlink profile to have the majority of links pointing to your homepage; this is where most people will link to by default. It’s much harder to build links to pages deeper in a site, especially product pages, as it’s not particularly natural for people to link here.
By casting an eye over links which point to money pages, it’s likely you could spot a few suspicious links which have been previously built to help boost the rankings of important pages on your site.
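If your money pages live under predictable paths, you can pull out the relevant rows with a few lines of code. This is a sketch under assumptions: the `link_to` key mirrors the column name above, and the `/products/` and `/sign-up` prefixes are made-up examples that you’d swap for your own site structure.

```python
from urllib.parse import urlparse

def links_to_money_pages(rows, money_path_prefixes):
    """Return rows whose target URL path starts with one of the given prefixes."""
    hits = []
    for row in rows:
        path = urlparse(row["link_to"]).path or "/"
        if any(path.startswith(prefix) for prefix in money_path_prefixes):
            hits.append(row)
    return hits

# Hypothetical sample rows.
target_rows = [
    {"url": "http://blog.test/post", "link_to": "https://shop.example/"},
    {"url": "http://dir.test/listing", "link_to": "https://shop.example/products/red-dress"},
    {"url": "http://forum.test/thread", "link_to": "https://shop.example/sign-up"},
]
money_hits = links_to_money_pages(target_rows, ("/products/", "/sign-up"))
print([row["url"] for row in money_hits])
```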
Taking things to the next level
All the tips I’ve shared with you so far have involved mining data that is easily accessible to you in your backlink spreadsheet – things such as root domain, URL, page title and anchor text.
To take your backlink audit up a level, it’s time to get savvy. This is where Screaming Frog comes in.
Using Custom Search to spot link directories
You know how earlier I mentioned that not all directories are bad? Well, an easy way to spot if a directory exists solely for link-building purposes is to see if the page contains phrases such as ‘submit link’, ‘link exchange’ or ‘add your site’.
These telltale phrases will not necessarily be in the URL or page title of your link, so this is why it’s necessary to take things up a step.
To find pages which contain these terms, you can run a crawl of your backlink URLs using the Screaming Frog Custom Search feature.
Above: using Screaming Frog ‘Custom Search’ to find web pages containing suspicious text
Once the crawl is finished, you can then download the URLs that contain the phrases above. These will most likely be some obvious link directories that you’ll want to disavow pretty sharpish.
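If you already have the HTML of your backlink pages saved somewhere, the same kind of check can be approximated in a few lines. To be clear, this is not how Screaming Frog works internally. It’s just a plain substring search over the page source, using the three phrases mentioned above.

```python
import re

# Phrases from the text above; add any others you notice during your audit.
DIRECTORY_PHRASES = ("submit link", "link exchange", "add your site")

def matching_phrases(html, phrases=DIRECTORY_PHRASES):
    """Return the phrases found in the page source, ignoring case and extra whitespace."""
    text = re.sub(r"\s+", " ", html.lower())
    return [phrase for phrase in phrases if phrase in text]

# Hypothetical page source for illustration.
sample_html = "<a href='/add'>Add  your\nsite</a> <p>Welcome to our web directory</p>"
print(matching_phrases(sample_html))
```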
Using Custom Search to spot spun content
The Screaming Frog custom search feature isn’t just useful for finding directory links. This is where you really need to put on your detective hat and to have a good think of any patterns you’ve noticed so far in your backlink audit.
When I did my audit recently, I noticed a recurring theme with some of the paid links. There were links to other sites with commercial anchor text that kept appearing alongside the link to the site I was auditing. This was a piece of spun content that had been copied and pasted across multiple sites and forums, and whoever had done the work was clearly being lazy, lumping a load of unrelated links together in one paragraph.
Apart from the fact the text made no sense whatsoever, the anchor text of these other links was extremely commercial: ‘cheap nike free run 2 for men’ and ‘chanel outlet UK’ were a recurring theme.
Above: example of spun content that appeared in my recent backlink audit
I’d tried to find a pattern in the URLs or titles of these pages, but it was a bit hit and miss. It was then that I realised I could do what I had done to find the directory links – Screaming Frog custom search.
So I carried out a Screaming Frog crawl that looked for recurring anchor text such as ‘cheap nike’ and ‘chanel outlet’. It was extremely useful and allowed me to uncover some URLs that I had been unable to identify from the data in my spreadsheet alone.
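For anyone who prefers a script, here is a minimal sketch of the same idea using Python’s built-in HTML parser. It pulls the anchor text out of every link on a page and reports any that contain one of the tell-tale phrases. The class and function names are my own, and it assumes you’ve already downloaded the page HTML.

```python
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect the visible text of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._in_anchor = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._parts = []

    def handle_data(self, data):
        if self._in_anchor:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            # Collapse whitespace and lower-case so phrases match consistently.
            self.anchors.append(" ".join("".join(self._parts).split()).lower())
            self._in_anchor = False

def spun_content_hits(html, phrases):
    """Return anchor texts on the page that contain any of the given phrases."""
    collector = AnchorTextCollector()
    collector.feed(html)
    collector.close()
    return [a for a in collector.anchors if any(p in a for p in phrases)]

# Hypothetical page source built from the anchor texts quoted above.
spun_html = (
    '<p>text <a href="http://x.test">Cheap Nike Free Run 2 for men</a> more '
    '<a href="http://y.test">Chanel outlet UK</a> <a href="http://z.test">our site</a></p>'
)
print(spun_content_hits(spun_html, ("cheap nike", "chanel outlet")))
```

A page that returns hits for phrases unrelated to your site is a good candidate for a manual look.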
To wrap up
If you’ve made it this far, congratulations! I appreciate this post was a lot of writing, but I hope it’s really helped you to dig out any dodgy links that were lurking under the surface.
If there’s one thing to take away, it’s to look for any patterns or consistencies in the dodgy links that you find, and to then use these to dig out the less obvious links.
Do you have certain criteria that you find helpful when identifying bad backlinks? Comment below!