We're looking to develop a methodology for ranking a few hundred domains by their propensity to incur an algorithmic penalty in Google. A couple of years ago, Google got more aggressive with its determination of "unnatural links," and we want to take a list of sites and rank them accordingly. This may be a simple statistical problem, but we have a number of questions to ensure that we compute these rankings on a foundation that is sound and will stand up to the scrutiny of the persnickety SEO community.
POTENTIAL DATA INPUTS:
As inputs for this computation, we have a number of tools at our disposal. Each tool is independent of the others, indexing its own subset of the web and computing metrics loosely based on the original PageRank methodology. What follows is a non-exhaustive list of the available metrics, highlighting only the ones we believe would be useful in determining this rank score.
Moz (formerly SEOmoz) - Moz invented the "open" link index, and as such their metrics are largely the industry standard. They don't update as often as some of the other tools, but they offer very precise data points. See the following metrics, with more detail at the accompanying links.
- Domain Authority - https://moz.com/learn/seo/domain-authority
- MozRank - https://moz.com/learn/seo/mozrank
- MozTrust - https://moz.com/learn/seo/moztrust
- Spam Score - https://moz.com/blog/spam-score-mozs-new-metric-to-measure-penalization-risk
- Number of Linking Root Domains
- Total Number of Links
- Number of NoFollow Links
- Number of DoFollow Links
Majestic (formerly MajesticSEO) - Moz's main competitor, with what many believe to be the biggest index of links.
- Citation Flow - https://majestic.com/support/glossary#CitationFlow
- Trust Flow - https://majestic.com/support/glossary#TrustFlow
- Topical Trust Flow - https://majestic.com/support/glossary#TopicalTrustFlow
- Number of External Backlinks
- Number of Referring Domains
- Number of Referring IPs
- Number of Referring Subnets
Quality of Link Velocity - This is a qualitative, binary metric that we would determine ourselves, based on whether we believe a domain is building too many links too fast. (One way to back that judgment with data is sketched just below.)
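As a concrete illustration, here's a minimal sketch (in Python) of how we might make that flag reproducible, using monthly new-referring-domain counts from the tools above. The counts and the 3x spike threshold are hypothetical assumptions for illustration, not calibrated values:

    # Hypothetical sketch: flag a domain whose newest month of referring
    # domains spikes well above its trailing baseline. The 3x ratio is an
    # illustrative assumption, not a calibrated threshold.
    def velocity_flag(monthly_new_domains, spike_ratio=3.0):
        """True if the latest month's new referring domains exceed
        spike_ratio times the average of the preceding months.
        Expects at least two months of data, oldest first."""
        *history, latest = monthly_new_domains
        baseline = sum(history) / len(history)
        return latest > spike_ratio * max(baseline, 1)

    print(velocity_flag([40, 35, 50, 45, 310]))  # True -> "too many, too fast"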
Ahrefs - This is the latest entrant into the link index game. They have one of the best UIs, update the quickest, and arguably have the biggest index.
- Number of Backlinks
- Number of Referring Domains
- Domain Rating - https://blog.ahrefs.com/new-algorithm-for-domain-rank/
- Number of NoFollow Links
- Number of DoFollow Links
SearchMetrics Essentials - This is a platform that crawls a large number of domains, collects their rankings, and extracts other features from each site. It gives us an indication of the trend of a domain's visibility, and there is also a visibility score. We would make a qualitative call as to whether the domain is trending upward or downward. (One reproducible way to make that call is sketched below.)
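For illustration, this sketch assumes a time-ordered export of visibility scores; the scores and the flat-tolerance threshold below are invented, not SearchMetrics values:

    # Hypothetical sketch: classify a visibility trend from a time-ordered
    # list of visibility scores via a least-squares slope, normalized by the
    # mean score so the tolerance is scale-free.
    def visibility_trend(scores, flat_tolerance=0.01):
        """Return 'up', 'down', or 'flat'. Expects at least two scores."""
        n = len(scores)
        x_mean = (n - 1) / 2
        y_mean = sum(scores) / n
        slope = sum((x - x_mean) * (y - y_mean)
                    for x, y in enumerate(scores))
        slope /= sum((x - x_mean) ** 2 for x in range(n))
        relative = slope / y_mean if y_mean else 0.0
        if relative > flat_tolerance:
            return "up"
        if relative < -flat_tolerance:
            return "down"
        return "flat"

    print(visibility_trend([120, 118, 125, 131, 140, 146]))  # "up"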
CognitiveSEO - This tool is pretty in-depth with regard to link analysis. It offers link visualization and a specific computation of the percentage of links that are algorithmically determined to be unnatural, based on its own machine learning algorithm. The user (us) has to determine which anchor text is considered branded versus miscellaneous versus commercial. From those inputs, the tool makes a determination.
The metrics returned are:
- Percentage of unnatural links
- Number of unnatural links
- Percentage of suspect links
- Number of suspect links
- Percentage of OK links
- Number of OK links
- Number of Links Analyzed
- Average Authority
- Average of Good Authority
- Average of Low Authority
- Average of High Authority
Here's more info on how their unnatural link detection works: http://cognitiveseo.com/blog/3068/automatic-unnatural-link-detection/
We can provide all of this data, in CSV format, for a subset of the list in order to support development of the methodology.
Our main question around the use of CognitiveSEO is: how should we determine statistical significance? As it stands, the tool pulls a relatively low number of links per site; however, a recent upgrade allows us to pull a bigger sample. Should we compute the required sample size using the standard formula, or would sampling some small fixed percentage of each domain's links be enough?
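For reference, this is the kind of "standard formula" we have in mind: the sample size needed to estimate a proportion within a target margin of error, with a finite population correction for domains with small link profiles. Everything below (function names, the ±5% margin, 95% confidence) is our own illustrative assumption, not anything prescribed by CognitiveSEO:

    import math

    def sample_size_for_proportion(total_links, margin=0.05, z=1.96, p=0.5):
        """Links to sample so the unnatural-link percentage is estimated
        within +/- margin at the given confidence (z=1.96 ~ 95%).
        p=0.5 is the conservative worst case when the true rate is unknown."""
        n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
        # Finite population correction: small link profiles need fewer samples.
        return math.ceil(n0 / (1 + (n0 - 1) / total_links))

    def margin_of_error(sampled, total_links, observed_p, z=1.96):
        """Margin of error implied by the sample the tool actually pulled."""
        se = math.sqrt(observed_p * (1 - observed_p) / sampled)
        fpc = math.sqrt((total_links - sampled) / (total_links - 1))
        return z * se * fpc

    for total in (1_000, 50_000, 2_000_000):
        print(total, sample_size_for_proportion(total))
    # -> roughly 278, 382, and 385 links respectively

Note how little the required sample grows with profile size; that suggests a fixed small percentage per domain is the wrong framing, since tiny link profiles would be oversampled and very large ones badly undersampled.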
OUTPUTS OF THIS ENGAGEMENT:
We're looking for specific direction there, as well as a formula for ranking domains based on some combination of the above metrics. We'd also like a written explanation of the methodology that can be inserted into the resulting whitepaper. We're happy to give you credit for the methodology once we publish. (A rough sketch of the shape we imagine for such a formula follows.)
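This sketch standardizes each chosen metric across the domain list, orients it so that higher always means riskier, and takes a weighted sum. The metric names, directions, and weights below are placeholders of our own; choosing and justifying the real ones is the heart of this engagement:

    # Hypothetical sketch of a composite penalty-risk score. All metric
    # names, directions, and weights here are placeholder assumptions.
    from statistics import mean, pstdev

    # Per-domain metrics as they might arrive from the CSV exports above.
    domains = {
        "example-a.com": {"spam_score": 7, "trust_flow": 12, "pct_unnatural": 0.31},
        "example-b.com": {"spam_score": 2, "trust_flow": 45, "pct_unnatural": 0.06},
        "example-c.com": {"spam_score": 5, "trust_flow": 22, "pct_unnatural": 0.18},
    }

    # +1 means "higher value = higher penalty risk"; -1 means the opposite
    # (e.g., higher TrustFlow should reduce risk). Weights are illustrative.
    config = {
        "spam_score":    {"direction": +1, "weight": 0.3},
        "trust_flow":    {"direction": -1, "weight": 0.3},
        "pct_unnatural": {"direction": +1, "weight": 0.4},
    }

    def z_scores(values):
        """Standardize raw metric values to mean 0, stdev 1."""
        mu, sigma = mean(values), pstdev(values)
        return [(v - mu) / sigma if sigma else 0.0 for v in values]

    names = list(domains)
    risk = {name: 0.0 for name in names}
    for metric, cfg in config.items():
        standardized = z_scores([domains[n][metric] for n in names])
        for name, z in zip(names, standardized):
            risk[name] += cfg["weight"] * cfg["direction"] * z

    # Rank from riskiest to safest.
    for name in sorted(risk, key=risk.get, reverse=True):
        print(f"{name}: {risk[name]:+.2f}")

Again, this is only meant to show the shape of the deliverable; we expect the real work to be selecting the metrics, handling their correlations across tools, and justifying the weights.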
We're of course open to any insights as to what else we might want to consider based on the expertise of whomever we end up working with.
We estimate this project will require about 10 hours of work.