Author Archive

May 2010 Linkscape Update (and Whiteboard Explanations of How We Do It)

Posted by randfish

As some of you likely noticed, Linkscape’s index updated today with fresh data crawled over the past 30 days. Rather than simply provide the usual index update statistics, we thought it would be fun to do some whiteboard diagrams of how we make a Linkscape update happen here at the mozplex. We also felt guilty because our camera ate tonight’s WB Friday (but Scott’s working hard to get it up for tomorrow morning).

Linkscape, like most of the major web indices, starts with a seed set of trusted sites from which we crawl outwards to build our index. Over time, we’ve developed more sophisticated methods around crawl selection, but we’re quite similar to Google, in that we crawl the web primarily in descending order of (in our case) mozRank importance.
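
To make the "seed set, then crawl outwards in order of importance" idea concrete, here is a minimal sketch in Python. It is not Linkscape's crawler: the function name `crawl_order`, the toy link graph and the importance scores are all hypothetical, and a real crawler would fetch pages and update scores as it goes rather than read them from a fixed table.

```python
import heapq

def crawl_order(link_graph, seeds, importance, limit):
    """Return up to `limit` URLs, always visiting the most important known URL next."""
    # heapq is a min-heap, so scores are negated to pop the highest score first.
    frontier = [(-importance.get(url, 0.0), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    order = []
    while frontier and len(order) < limit:
        _, url = heapq.heappop(frontier)
        order.append(url)
        # Newly discovered links join the frontier, ranked by their importance.
        for target in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-importance.get(target, 0.0), target))
    return order

# Hypothetical toy data: one trusted seed and four discovered pages.
graph = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": ["d.example"],
}
scores = {"seed.example": 9.0, "a.example": 3.0, "b.example": 7.0,
          "c.example": 5.0, "d.example": 1.0}

print(crawl_order(graph, ["seed.example"], scores, limit=4))
# ['seed.example', 'b.example', 'a.example', 'c.example']
```

With a budget of four pages, the low-scoring d.example never gets crawled, which is the practical effect of crawling in descending order of importance.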

For those keeping track, this index’s raw data includes:

  • 41,404,250,804 unique URLs/pages
  • 86,691,236 unique root domains

After crawling, we need to build indices on which we can process data, metrics and sort orders for our API to access.

When we started building Linkscape in late 2007, early 2008, we quickly realized that the quantity of data would overwhelm nearly every commercial database on the market. Something massive like Oracle may be able to handle the volume, but at an exorbitant price that a startup like SEOmoz couldn’t bear. Thus, we created some unique, internal systems around flat file storage that enable us to hold data, process it and serve it without the financial and engineering burdens of a full database application.
Our next step, once the index is in place, is to calculate our key metrics as well as tabulate the standard sort orders for the API.

Algorithms like PageRank (and mozRank) are iterative and require a tremendous amount of processing power to compute. We’re able to do this in the cloud, scaling up our need for number-crunching, mozRank-calculating goodness for about a week out of every month, but we’re pretty convinced that in Google’s early days, this was likely a big barrier (and may even have been a big part of the reason the "GoogleDance" only happened once every 30 days).
After processing, we’re ready to push our data out into the SEOmoz API, where it can power our tools and those of our many partners, friends and community members.

The API currently serves more than 2 million requests for data each day (and an average request pulls ~10 metrics/pieces of data about a web page or site). That’s a lot, but our goal is to more than triple that quantity by 2011, at which point we’ll be closer to the request numbers going into a service like Yahoo! Site Explorer.
The SEOmoz API currently powers some very cool stuff:

  • Open Site Explorer – my personal favorite way to get link information
  • The mozBar – the SERPs overlay, analyze page feature and the link metrics displayed directly in the bar all come from the API
  • Classic Linkscape – we’re on our way to transitioning all of the features and functionality in Linkscape over to OSE, but in the meantime, PRO members can get access to many more granular metrics through these reports
  • Dozens of External Applications – things like Carter Cole’s Google Chrome toolbar, several tools from Virante’s suite, Website Grader and lots more (we have an application gallery coming soon)

Each month, we repeat this process, learning big and small lessons along the way. We’ve gotten tremendously more consistent, redundant and error/problem free in 2010 so far, and our next big goal is to dramatically increase the depth of our crawl into those dark crevices of the web as well as ramping up the value and accuracy of our metrics.
We look forward to your feedback around this latest index update and any of the tools powered by Linkscape. Have a great Memorial Day Weekend!

All Links are Not Created Equal: 10 Illustrations on Search Engines’ Valuation of Links

Posted by randfish

In 1997, Google’s founders created an algorithmic method to determine importance and popularity based on several key principles:

  • Links on the web can be interpreted as votes that are cast by the source for the target
  • All votes are, initially, considered equal
  • Over the course of executing the algorithm on a link graph, pages which receive more votes become more important
  • More important pages cast more important votes
  • The votes a page can cast are a function of that page’s importance, divided by the number of votes/links it casts
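
The five principles above can be sketched as a short iterative calculation. This is a simplified illustration, not Google's or SEOmoz's implementation: the function name `pagerank` and the three-page graph are hypothetical, and the damping factor of 0.85 comes from the standard published formulation of PageRank rather than from the list above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # all votes start out equal
    for _ in range(iterations):
        # Every page keeps a small baseline share (the damping assumption).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page's vote is its importance divided by the links it casts.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical graph: a and b both vote for c, and c votes for a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, c ends up most important because it receives two votes, and a outranks b because its single vote comes from the important page c, while b receives no votes at all.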

That algorithm, of course, was PageRank, and it changed the course of web search, providing tremendous value to Google’s early efforts around quality and relevancy in results. As knowledge of PageRank spread, those with a vested interest in influencing the search rankings (SEOs) found ways to leverage this information for their websites and pages.
But, Google didn’t stand still or rest on their laurels in the field of link analysis. They innovated, leveraging signals like anchor text, trust, hubs & authorities, topic modeling and even human activity to influence the weight a link might carry. Yet, unfortunately, many in the SEO field are still unaware of these changes and how they impact external marketing and link acquisition best practices.
In this post, I’m going to walk through ten principles of link valuation that can be observed, tested and, in some cases, have been patented. I’d like to extend special thanks to Bill Slawski from SEO By the Sea, whose recent posts on Google’s Reasonable Surfer Model and What Makes a Good Seed Site for Search Engine Web Crawls? were catalysts (and sources) for this post.
As you read through the following 10 issues, please note that these are not hard and fast rules. They are, from our perspective, accurate based on our experiences, testing and observation, but as with all things in SEO, this is opinion. We invite and strongly encourage readers to test these themselves. Nothing is better for learning SEO than going out and experimenting in the wild.
#1 – Links Higher Up in HTML Code Cast More Powerful Votes

Whenever we (or many other SEOs we’ve talked to) conduct tests of page or link features in (hopefully) controlled environments on the web, we/they find that links higher up in the HTML code of a page seem to pass more ranking ability/value than those lower down. This certainly fits with the recently granted Google patent application – Ranking Documents Based on User Behavior and/or Feature Data, which suggested a number of items that may be considered in the way that link metrics are passed.

Those who’ve leveraged testing environments also often struggle against the power of the "higher link wins" phenomenon, and it can take a surprising amount of on-page optimization to overcome the power the higher link carries.
#2 – External Links are More Influential than Internal Links

There’s little surprise here, but if you recall, the original PageRank concept makes no mention of external vs. internal links counting differently. It’s quite likely that other, more recently created metrics (post-1997) do reward external links over internal links. You can see this in the correlation data from our post a few weeks back noting that external mozRank (the "PageRank" sent from external pages) had a much higher correlation with rankings than standard mozRank (PageRank):

I don’t think it’s a stretch to imagine Google separately calculating/parsing out external PageRank vs. Internal PageRank and potentially using them in different ways for page valuation in the rankings.
#3 – Links from Unique Domains Matter More than Links from Previously Linking Sites

Speaking of correlation data, no single, simple metric is better correlated with rankings in Google’s results than the number of unique domains containing an external link to a given page. This strongly suggests that a diversity component is at play in the ranking systems and that it’s better to have 50 links from 50 different domains than to have 500 more links from a site that already links to you. Curiously again, the original PageRank algorithm makes no provision for this, which could be one reason sitewide links from domains with many high-PageRank pages worked so well in those early years after Google’s launch.
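
As a rough illustration of the difference between raw link counts and unique linking domains, here is a small Python sketch. The function name and URLs are hypothetical, and the hostname (minus a leading "www.") is used as a stand-in for the root domain, which is an assumption that a production tool would replace with proper root-domain extraction.

```python
from urllib.parse import urlparse

def link_and_domain_counts(linking_urls):
    """Return (total links, unique linking hosts) for a list of source URLs."""
    hosts = set()
    for url in linking_urls:
        host = urlparse(url).hostname or ""
        # Assumption: the hostname minus a leading "www." stands in for the root domain.
        if host.startswith("www."):
            host = host[4:]
        hosts.add(host)
    return len(linking_urls), len(hosts)

# Hypothetical link profiles: 50 links from 50 sites vs. 500 sitewide links from one site.
diverse = [f"http://site{i}.example/post" for i in range(50)]
sitewide = [f"http://www.bigsite.example/page{i}" for i in range(500)]

print(link_and_domain_counts(diverse))   # (50, 50)
print(link_and_domain_counts(sitewide))  # (500, 1)
```

The second profile has ten times as many links but only one linking domain, which is the distinction the correlation data points to.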
#4 – Links from Sites Closer to a Trusted Seed Set Pass More Value

We’ve talked previously about TrustRank on SEOmoz and have generally referenced the Yahoo! research paper – Combating Webspam with TrustRank. However, Google’s certainly done plenty on this front as well (as Bill covers here) and this patent application on selecting trusted seed sites certainly speaks to the ongoing need and value of this methodology. Linkscape’s own mozTrust score functions in precisely this way, using a PageRank-like algorithm that’s biased to only flow link juice from trusted seed sites rather than equally from across the web.
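
A seed-biased, PageRank-like calculation of this kind might look roughly like the sketch below. To be clear, this is not the mozTrust implementation: the function name `trust_rank`, the damping value and the four-page graph are assumptions for illustration. The only change from ordinary PageRank is that the baseline share goes to the trusted seeds instead of to every page.

```python
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """PageRank-style scores where the baseline share flows only from trusted seeds."""
    seeds = set(seeds)
    pages = set(links) | {t for targets in links.values() for t in targets} | seeds
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)  # all trust starts on the seeds
    for _ in range(iterations):
        new_trust = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * trust[page] / len(targets)
                for target in targets:
                    new_trust[target] += share
            else:
                # A page with no outgoing links hands its trust back to the seeds.
                for p in pages:
                    new_trust[p] += damping * trust[page] * seed_share[p]
        trust = new_trust
    return trust

# Hypothetical graph: seed -> near -> far, plus a spam page nobody trusted links to.
graph = {"seed": ["near"], "near": ["far"], "far": [], "spam": ["far"]}
scores = trust_rank(graph, seeds=["seed"])
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Trust decays with each hop away from the seed, and the spam page scores zero because no trusted path reaches it, even though it links out to a page that is trusted.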
#5 – Links from "Inside" Unique Content Pass More Value than Those from Footers/Sidebar/Navigation

Papers like Microsoft’s VIPS (Vision Based Page Segmentation), Google’s Document Ranking Based on Semantic Distance, and the recent Reasonable Surfer stuff all suggest that valuing links from content more highly than those in sidebars or footers can have net positive impacts on avoiding spam and manipulation. As webmasters and SEOs, we can certainly attest to the fact that a lot of paid links exist in these sections of sites and that getting non-natural links from inside content is much more difficult.
#6 – Keywords in HTML Text Pass More Value than those in Alt Attributes of Linked Images

This one isn’t covered in any papers or patents (to my knowledge), but our testing has shown (and testing from others supports) that anchor text carried through HTML is somehow more potent or valued than that from alt attributes in image links. That’s not to say we should run out and ditch image links, badges or the alt attributes they carry. It’s just good to be aware that Google seems to have this bias (perhaps it will be temporary).
#7 – Links from More Important, Popular, Trusted Sites Pass More Value (even from less important pages)

We’ve likely all experienced the sinking feeling of seeing a competitor with fewer links, from what appear to be less powerful pages, outranking us. This may be somewhat explained by the ability of a domain to pass along value via a link that may not be fully reflected in page-level metrics. It can also help search engines to combat spam and provide more trusted results in general. If links from sites that rarely link to junk pass significantly more value than those from sites whose link practices and impact on the web overall may be questionable, the engines can much better control quality.
NOTE: Having trouble digging up the papers/patents on this one; I’ll try to revisit and find them tomorrow.
#8 – Links Contained Within NoScript Tags Pass Lower (and Possibly No) Value

Over the years, this phenomenon has been reported and contradicted numerous times. Our testing certainly suggested that noscript links don’t pass value, but that may not be true in every case. It is why we included the ability to filter noscript in Linkscape, but the quantity of links overall on the web inside this tag is quite small.
#9 – A Burst of New Links May Enable a Document to Overcome "Stronger" Competition Temporarily (or in Perpetuity)

Apart from even Google’s QDF (Query Deserves Freshness) algorithm, which may value more recently created and linked-to content in certain "trending" searches, it appears that the engine also uses temporal signals around linking to both evaluate spam/manipulation and reward pages that earn a large number of references in a short period of time. Google’s patent on Information Retrieval Based on Historical Data first suggested the use of temporal data, but the model has likely seen revision and refinement since that time.
#10 – Pages that Link to WebSpam May Devalue the Other Links they Host

I was fascinated to see Richard Baxter’s own experiments on this in his post – Google Page Level Penalty for Comment Spam. Since then, I’ve been keeping an eye on some popular, valuable blog posts that have received similarly overwhelming spam and, lo and behold, the pattern seems verifiable. Webmasters would be wise to keep up to date on their spam removal to avoid arousing potential ranking penalties from Google (and the possible loss of link value).

But what about classic "PageRank" - the score of which we get a tiny inkling from the Google toolbar’s green pixels? I’d actually surmise that while many (possibly all) of the features about links discussed above make their way into the ranking process, PR has stayed relatively unchanged from its classic concept. My reasoning? SEOmoz’s own mozRank, which correlates remarkably well with toolbar PR (off on avg. by 0.42 w/ 0.25 being "perfect" due to the 2 extra significant digits we display) and is calculated with very similar intuition to that of the original PageRank paper. If I had to guess (and I really am guessing), I’d say that Google’s maintained classic PR because they find the simple heuristic useful for some tasks (likely including crawling/indexation priority), and have adopted many more metrics to fit into the algorithmic pie.
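
For anyone wondering where the 0.25 "perfect" figure might come from, here is one way to reproduce it in Python. It rests on two assumptions that are not stated in the post: that toolbar PR is an underlying score rounded to the nearest whole number, and that the two extra digits of a score are spread evenly between .00 and .99. The function name is hypothetical.

```python
def mean_rounding_gap(decimals=2):
    """Average distance between a score shown with `decimals` digits and its nearest integer."""
    steps = 10 ** decimals
    # For a fractional part of k/steps, the nearest integer is min(k, steps - k)/steps away.
    total = sum(min(k, steps - k) for k in range(steps))
    return total / (steps * steps)

print(mean_rounding_gap())  # 0.25
```

Under those assumptions, even a metric that matched toolbar PR perfectly before rounding would appear to be off by 0.25 on average, so an observed average gap of 0.42 is closer than it first looks.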
As always, we’re looking forward to your feedback and hope that some of you will take up the challenge to test these on your own sites or inside test environments and report back with your findings.
p.s. I finished this post at nearly 3am (and have a board meeting tomorrow), so please excuse the odd typo or missed link. Hopefully Jen will take a red pen to this in the morning!

How to Measure & Improve SEO: eMetrics London 2010 Presentation

Posted by randfish

Last week, I gave a 45-minute presentation at eMetrics London on a variety of analytics for SEO topics. The presentation slide deck is embedded below:
Metrics for SEO by Rand Fishkin (eMetrics London 2010)
The presentation went into more depth in person, but topics included:
Some basics:

  • Measuring traffic against macro query growth
  • Measuring against search market share
  • Measuring against temporal trends
  • Keyword selection based on traffic quantity, quality and difficulty of ranking
  • Choosing keyword messaging to optimize conversion rate
  • Tracking CTRs on search results
  • Identifying crawl errors using a variety of tools
  • Tracking rankings – when, where and why it’s useful 

A handful of intermediate level tactics:

  • Getting beyond "last-click" attribution
  • Evaluating indexation for SEO
  • Tracking vertical search results using filters

And some more advanced items:

  • Evaluating metrics for predicting search results ordering and valuing links/content
  • Applying metrics to improve your SEO
  • Valuing social media together with search
  • Discussing the relative impacts (both primary and second-order effects) that social has on rankings

Happy weekend everyone!

Debating the Value (and Meaning) of "Great Content" for SEO

Posted by randfish

The SEO industry, like many others, has private forums, chat threads and groups of connected individuals whose interactions happen largely behind closed doors. Today, I’d like to pull back a curtain and share a debate that occurred between a number of CEOs in the search marketing industry over the last few days that I think you’ll find both fascinating and, hopefully, valuable too.
The topic is the concept that content quality is highly correlated with or predictive of high rankings in the search engines. This isn’t a cut-and-dried debate, but a more nuanced and, yes, subjective look at content quality from a wide range of perspectives.
First, I’ll introduce our players (these are just the folks who agreed to have their contributions published), after which we can dive into the discussion:

Stephan Spencer is VP of SEO Strategies at Covario, co-author of The Art of SEO, founder of Netconcepts (recently acquired by Covario), and inventor of the GravityStream SEO proxy technology (now rebranded as Covario’s Organic Search Optimizer).

Gord Hotchkiss is the President and CEO of Enquiro, author of the BuyerSphere Project and a leading expert and researcher on online and search user behavior.

Thad Kahlow is the CEO of BusinessOnLine, one of the nation’s leading online marketing agencies, successfully launching hundreds of solutions for clients including American Red Cross, Caterpillar, Sony, NEC, Sybase, and Hasbro, to name a few.

Eric Enge is the President of Stone Temple Consulting, a 16 person SEO and PPC consulting firm with offices in Boston and Northern California. Eric is co-author of The Art of SEO from O’Reilly.

Chris Baggott co-founded ExactTarget and authored the popular book Email Marketing By The Numbers. He is currently co-founder/CEO of Compendium, an Enterprise Social Content Publishing software company, and writes about Best Practices for Blogging on his own blog.

 
Richard Zwicky is the Founder and President of Eightfold Logic (formerly known as Enquisite), a predictive insights and search and social analytics platform used by enterprises and agencies around the world. A serial entrepreneur, Richard is the author of multiple patents and has been involved in online marketing since the late 1990s.

Lawrence Coburn is the CEO and co-founder of DoubleDutch – the first white label geolocation platform.  He is also an Editor at The Next Web’s geolocation blog, and a mentor at io ventures – a San Francisco based startup incubator.

Will Critchlow is a co-founder of Distilled, a London & Seattle based SEO consultancy. He speaks regularly at industry conferences on analytics, data-driven optimization and data visualization.

 
Rand Fishkin is… the author of this post :-)

Thad Kahlow (in reference to these three posts):
Great Content ≠ Great Rankings
We disagree. Great content is so important to ranking well. It may not be the only factor (that content has to be found, be newsworthy and incorporate keywords that will drive traffic), but the basic principle stands: content is a huge factor in generating competitive rankings.
Rand Fishkin:
Great content ≠ Great Rankings
I’ll fight tooth and nail on this one. Great content is a really good thing to do for many reasons, but I’d doubt it correlates to great rankings any better than PageRank does (or doesn’t).
Eric Enge:
Speaking of fighting tooth and nail, you have now earned my $0.02 (:>). Great content may not equate to great rankings by itself, but if we look at the integrated whole of a web marketing strategy, where link building and social media promotion are the driving components of success, great content is a MUST.  I think it would be a disservice to put anything out there that suggests otherwise.  Accordingly, I would suggest:
"In the absence of a marketing strategy to leverage it, great content will not necessarily drive great rankings, but if you are looking to create a major web property (for your market space) then great content is a requirement. Its impact on the promotion of your web site is fundamental. Obtaining links and getting positive feedback from social media communities is far easier with great content."
Stephan Spencer:
Great content doesn’t automatically mean great rankings. In other words, it is not a foregone conclusion that great content will necessarily rank just because of its quality. The content may deserve to be ranked, but if no one knows about it, or if the site architecture is so atrocious that it repels the spiders, then it won’t rank. It’s as important to actively promote that great content as to have created it. I’m simply making an argument against that tired old phrase “Build it and they will come.” Don’t let my comments dissuade you from creating high quality content though! Indeed, it’s a likely prerequisite for SEO success, especially when the keywords being targeted are highly competitive.
Chris Baggott:
So if Eric is $0.02…..I’m $0.001  :-)
I’d like to chime in on the content question. Isn’t this an issue of competition?

I wrote a post talking about one of Rand’s slides at Web 2.0 showing 4 word phrases being the highest converting. We took a look at our own client base and found exactly the same correlation. The client I use in the example has decent domain authority but not much else other than content and categories specifically relevant to the longer tail terms they are targeting. When we talk about SEO don’t we need to differentiate the fat head tactics from the long tail tactics?
Rand, you made a great case for the long tail and conversion in your deck. Vanessa Fox in her new book makes the statement that 56% of all searches return no ads. As the world starts to appreciate that the tail is only going to get longer, it seems like content is going to be getting more and more important. Am I crazy making the assumption that the lower the query competition, the bigger the role that content relevance, recency and frequency play in driving high converting traffic?
Rand Fishkin:
Agreed – for the long tail, domain authority + enough juice to get lots of pages indexed + the mere mention of the phrase combo = you’re often ranking top 5
Eric – agree with you as well, it certainly makes many things easier, but so many people in our industry (and outside of it) think and promote the idea that "great content" (which, IMO, has been repeated so often it’s nearly lost meaning) will get you rankings. Great marketing will get you rankings, often regardless or in spite of content quality.
Lawrence Coburn:
I agree wholeheartedly.  To do well in the tail, you need deep (though not necessarily high quality) content.  To match 4-5-6 word queries at scale, you need to have a lot of content to draw from.

On a related note, the May Day update changed something around exactly these sorts of queries, and for us, not for the better.  I’m curious as to where the traffic that was going to us, and other big, broad, content sites, is now going.
Rand Fishkin:
Yeah – we also took about a 10% hit in the tail of search traffic from Google to seomoz.org and that was weird. Previous updates have always only helped us do better or stayed the same. Digging into traffic data, it appears to be fewer pages receiving any traffic, which tells me it’s most likely an indexation issue – Google getting pickier about what it keeps in the index.
Thad Kahlow:
Re: great content = great rankings.
I agree, and don’t think many could disagree, that creating great content, upon itself will deliver great rankings.  But when we look at this issue from a much broader context (30k ft), Google’s mission is to provide the most relevant experience (not just SERPs). Better content provides a better experience.
So I digress because I believe this topic addresses a systemic ailment within search (may piss off a few ole school’s with this one)… but we as SEOs spend significantly too much time obsessing (me included) over algorithmic loopholes, updates, dances, undulations… to the point of reaching diminished returns. And I humbly (as much as I can be) suggest that if we as SEOs spent more time with our clients focusing on the end users’ needs when launching a search campaign and built unique, relevant content, and focused less on the extreme nuances of the algo (yes, you need an extremely sound SEO best practices foundation + some)… Google, the client and most importantly the end users are better served = Search Industry wins. Otherwise, we are all fighting the battle of “out-optimizing” the other and not the ultimate mission: winning the “relevant experience” war.
In sum: a significant focus on creating creative, relevant content should be a major part of every search solution, yet far too few have one.
Gord Hotchkiss:
Couldn’t agree more with Thad (surprise, surprise)…
And I would go even further. Search is rapidly growing beyond relevance as a metric of success to usefulness. Relevance is, and always has been, simply a measurable proxy for usefulness. Expect Google algos to start finding signals of usefulness, across multiple content buckets, and using that to determine what gets shown when, and to whom.

So, more and more, SEO and prospect intent have to align and chasing algos becomes moot. I think we have to worry much less about systematic testing against a black box algo and worry more about understanding what our prospects want to do. That’s where the search engines have to head.
Eric Enge:
Thad – great restatement of what I was saying.
We all need to remember where Google (and Bing) are going.  They want high quality content.  Over time, they WILL get it.  Winning the "relevant experience" war will help you build great traffic now, and secure your business from the inherent risks of changes in Google’s algorithm (because those changes will likely be a positive for you)
Rand Fishkin:
I’m going to, oddly enough, say that I disagree with a few of these statements.

Much as I would love to believe the engines will eventually reverse into signals that push higher quality content above more popular content, I don’t think that will ever be the case.

Every other field is the same – it’s not the fantastic, artistic, often foreign-language, personally compelling films that win Oscars or sell big at the box office. It’s not the authentic, possibly awkward, but highly dedicated, humble and talented politicians who win elections. It’s not the news with the most substance, science and accuracy that earns the front page headlines. In every facet of human life – it’s what’s popular and what’s marketed.

I believe that as SEOs, we owe it to our clients to let them know that accessibility and quality are certainly bases they need to hit, but they won’t necessarily win the battles or the war, even in the long term.

As Google/Bing/etc turn to new signals, they’re looking at things like personalization, social search, Twitter data, usage data, etc. – these aren’t things that "can’t be gamed" or that predict "quality content" – they’re just like data points society uses to value films, politicians and news stories. That’s why my belief is that SEO isn’t about "great" content or "the most useful" content. It’s about the "most marketable" content targeted to demographics that are likely to fulfill the search engines’ signals. Today, that’s those on the web who create links. Tomorrow it could be those who tweet and share on Facebook. In years to come, it might be a wider swath of web users, but they will still be influence-able the way humans always are – through psychologies that persuade them to take action in the kinds of ways the engines measure.

I’ll ask a final question – does anyone here believe that the highest converting landing page is the one that does the best job explaining the product or the one that taps into the science of persuasion (social proof, ego, scarcity, etc.)?

At the 30K foot level, I think Google is about representing popularity and relevance on the web the same way it’s done in real life. They’re not trying to re-invent the way humans consider/judge/evaluate content.

The above is, of course, opinion.
Gord Hotchkiss:
I think you’re right Rand… increasingly, Google will try to pick up sociological and “human” based signals, rather than arbitrary semantic calculations. If you think about PageRank, it’s really a network based signal based on what they had to work with at the time, hyperlinking structures. Today, we have social networks and I’m sure there are a few people at Google smart enough to determine emergent behaviors out of the complexity of that network structure – SocialRank.

The second piece of this is personalization..identifying context relevant tasked based intent, and matching the network wide signals to that. Again, difficult to optimize against this…no universally true baseline to test against!

So, with the absence of a consistent and testable environment, we have no option but to switch our focus to people instead. If that’s where Google is going (and I know Microsoft is heading in that direction), we have to be going there too….
Richard Zwicky:
I’d disagree that there are no universal baselines, nor is it the best quality content, nor the most content that drives this.
Actually, I think that in its own way, Google always has tried to pick up on sociological and human based signals.  The reality is that in the past, the dimensions for input were quite flat, and that allowed us to consider things two-dimensionally:  Very simply; the site, and other sites that linked in, with just a little outside input.
The data points being examined were finite, and relatively easy to manipulate.  As the networks have grown, and the ability and manners in which people have interacted has changed, so have a lot of the notions.  Social networks are a dimension which doesn’t necessarily connect directly to any one site at any time, but the activity therein sends very definite market signals about complex behaviour patterns globally which can be used to alter the algorithmic concepts of relevance.
I’d disagree that there are no baselines to test against, or optimize against.  It’s just the field of perspective to provide the analysis is different. Trends, baselines and norms are hard to determine on the individual, or even among small groups, but norms can be established over time, contextual variances defined, and then norms applied to other new or unique segments.
I would argue that the change in signal measurement is analogous to the change in communities that’s occurred in the last 200 years. (in North America)  As you move through these periods, signals, outreach, measurement all change, as did the tools of marketing. Here’s a very short synopsis, to give you an idea of my perspective….
200 years ago, most people were born, raised, and died within 25 miles of the same place.  Very few people ventured out, went away to school, etc…  This was your community.  You were raised with, worked with, and socialized with the same group of people.  Their interests were your interests.  Any wonder there was a caste / class system?
~150 years ago, rail networks were established, and movement increased.  People traveled, but not too distantly, and usually only to hubs.  Your community expanded a little, but not much.  But you were exposed to more and more.
~100 years ago the automobile age started.  People now traveled through a larger area, their regular range of movement grew to a ~100 mile radius.  Now, you often were working with people you’d not encountered while growing up, your children were traveling further and further away to school, and your community was different based on interests.
~1945 – The modern automobile age began.  Now working 2 hours away from home was "normal" (funny how the Internet’s changing that part back!).  Your home community was distinct from work.  Husband and wife each had different communities and interactions during the day.  Signals became much noisier.  Marketing had to become more sophisticated. Messaging bounced around more.
~194X – Telephones in every home became common (not that long ago!) – first "Buzz marketing"  ??  Still individual to individual…
~1960 – Televisions in every home became common… mass visual communication, and marketing.
~197X – The IT age starts, you know how this goes….
Communities?  Nothing like they even were when I was growing up.
Today, like most of yours,  mine is global, not local. It’s based on a huge range of interests, and people I’ve encountered globally through my life.  I don’t have a single community I participate in regularly, I have many.  I fade in and out from time to time as interest grows and fades.  The buzz in one community is generally on different topics from one to another, and yet there are consistent common threads through all of them, no matter how disconnected.  
Marketing, measuring, and responding the way a search engine needs to? It needs to monitor all the signals, across all communities, and understand how contextual relevance shifts.  In essence, if I use the above analogy, the signals the engines used to monitor would be akin to where we were in community evolution somewhere between 100 years ago and 1945.  The dimensions to be measured and factored in are so far beyond how most traditional marketers think as to be unfathomable (which is this group’s opportunity).
Chris Baggott:
This is an opinion I agree with.  My only point from earlier has to do with popularity….compared to what?   It’s a lot easier to be "popular" in a smaller pond.    :-)
Will Critchlow:
I’m a bit late to this party. A couple of late thoughts:
Firstly, I thought today’s xkcd was appropriate:

Secondly, I think that a lot rests on how we define ‘great content’. However we define it, I think Rand is correct that it cannot (except in rare cases) be sufficient – at a minimum it needs a strategy of repeated delivery that leads to enough of a following to bring the links it needs. I would like to bundle up a degree of ‘linkability’ into the definition of ‘great content’ though. Rand – I think your definition (where you compare it to great artwork, or honest politicians) is too narrow. I believe that ‘great’ in this context can be defined as the right combination of populist within the right niche, remarkable (in the Seth Godin sense of "likely to be remarked-upon") as well as the purer content metrics.
Finally, I was thinking about this in the context of the Mayday update, when it seems to me that we saw a change in the relative likelihood of content to succeed depending on where it appears. Best estimates show that we saw a move from long-tail rankings for content on large, powerful domains to rankings for smaller, more niche domains. Although this kind of relative change is nothing new, it is a timely reminder that content doesn’t operate in a vacuum.
I think Will’s actually done a remarkable job summing things up – it all depends what we mean by "great" content and how we think about the evaluation of that word by all the signals the engines measure today and might measure tomorrow.
Hopefully, this debate has been valuable to you – we felt, after looking back through the thread, that there was a lot of great stuff that deserved wider review and more thought. We’d all love to hear what you’ve got to say/share on the subject.

SMX London: Ranking Factors in 2010

Posted by randfish
Somehow, my flight from Seattle landed just before the newest Icelandic ash cloud began shutting down airports across UK airspace. As a result, I was able to present at SMX London this morning. The presentation is included below.
This slide deck focuses on things that are probably in the search engines’ ranking algorithm today (e.g. the reasonable surfer model), might be in today and probably will be more in the future (e.g. tweet data) and may or may not get in (e.g. Facebook’s open graph).
SEO Ranking Factors 2010 SMX London


Everyone at SMX noted that this year’s event is much busier and more active than last year. Goodbye recession; hello London as a new center of the search world.
Looking forward to your comments as always!
p.s. I’m planning to write a much more comprehensive post about the "reasonable surfer" patent, but in the meantime, be sure to read Bill Slawski’s analysis.

Some Opinions on the SEO Myths & Realities Fight

Posted by randfish
A few weeks back, Stephan Spencer (one of my Art of SEO coauthors) authored a post for SearchEngineLand entitled 36 Myths that Won’t Die But Need To. I certainly recommend checking out the post, but be warned of some highly contentious comments. The tweets and offline feedback were similarly up-in-arms and it’s easy to understand why.
SEO is a field where reputation is a huge part of your ability to perform well. Because the search engines don’t publish comprehensive guidelines (or even guidelines that cover 1/10th of the material necessary for good SEO work), businesses rely on the savvy of individual consultants, contractors and employees. If your boss reads Stephan’s article and sees him contradicting advice that you’ve been giving for years, faith erodes and with it, job security. Luckily (or perhaps unluckily), there’s probably 5-10 articles you can find on the web that support your side of the story, many from quality, trusted sources.
The lack of standards sucks. But, it’s also the reason our industry is so exciting. New experiments & experiences can reveal critical data about search engine operations. The ability to become an expert is open to anyone with the skills and perseverance to see it through. But, no matter how hard you try, it’s hard to overcome some of the persistent myths of the SEO field – I’ve been caught in plenty of them myself (and who knows, maybe still am today).
This post is going to look at some of those nagging, lingering falsehoods that continue to thwart good SEO efforts, specifically those that Stephan called out and faced strong resistance. As always, this is my opinion, based on my experience (see the moz disclaimer) except in cases where research and data exists, in which case it’s my opinion that the research cited is good enough to warrant that opinion :-)
How Significantly Does Personalization Affect Rankings?
Stephan Says:
Although it is true that Google personalizes search results based on the user’s search history (and now you don’t even have to be logged in to Google for this personalization to take place), the differences between personalized results and non-personalized results are relatively minor. Check for yourself. Get in the habit of re-running your queries — the second time adding &pws=0 to the end of Google SERP URL — and observing how much (or how little) everything shifts around.
Comments Include:
I’m not sure I agree with your statement under #5 that personalization changes are “relatively minor”. I’ve been seeing some drastic rank changes due to personalization. I just posted about it at http://www.rypmarketing.com/blog/49-are-google-serp-personalizations-relatively-minor.whtml While there are still “absolute rankings” that display most of the time, your site can be ranked much higher or lower, based on personalization.
My Opinion – They’re both right. Personalization seems to primarily affect areas in which we devote tons of time, energy and repeated queries. This means for many/most "discovery" and early funnel searches, we’re going to get very standardized search results. It’s true that it can influence some searches significantly, but it’s also true that, at least in my experience, 90%+ of queries I perform are unaffected (and that goes for what I hear/see from other SEOs, too). The linked-to post above actually helps to validate this, showing that while rankings changes can be dramatic, they only happen when there’s substantive query volume from a user around a specific topic.
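For anyone who wants to run Stephan's check in bulk, here is a minimal sketch of adding pws=0 to a SERP URL. The function name and the sample URL are my own illustrations, and this only builds the URL; it doesn't fetch or compare results.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse


def without_personalization(serp_url: str) -> str:
    """Return the same SERP URL with pws=0 set (hypothetical helper name).

    Any existing pws parameter is dropped first so the result carries
    exactly one pws=0.
    """
    parts = urlparse(serp_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "pws"
    ]
    params.append(("pws", "0"))
    return urlunparse(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Illustrative query URL, not taken from the post.
    print(without_personalization("http://www.google.com/search?q=seo+tools"))
```

Run the original URL and the rewritten one side by side and note which positions move; that's the comparison Stephan describes.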
Do We Need to Update Our Homepages Every Day to Maintain Rankings?
Stephan Says:
"It’s important for your rankings that you update your home page frequently (e.g. daily.)" This is another fallacy spread by the same aforementioned fellow panelist. Plenty of stale home pages rank just fine, thank you very much.
Comments Include:
It actually is important. Sure, a stale home page might rank, but Google definitely takes freshness into account in rankings. I’ve seen rankings boosts whenever I post new content.
This varies from niche to niche, of course a site can rank well whilst remaining static, it may also have a considerable number of links pointing to it. In a competitive niche where the link volume/quality is pretty even, then regular updates to the home page, and other pages within the site can make all the difference – to describe this as a fallacy is a fallacy itself.
My Opinion - There was a time when I was pretty convinced this was true. I did lots of testing around it for my clients’ sites and would put in time each day making sure new content appeared on their homepages. Today, I’m much less of a believer. Stephan is certainly correct that plenty (if not the overwhelming majority) of homepages and, indeed, web pages that rank well for many queries are static. I do think it’s a great idea to continually have new content linked-to from homepages – by linking to the latest blog posts, YOUmoz posts and marketplace postings, the SEOmoz homepage helps drive spiders to revisit frequently and crawl these new posts (though RSS pings may make that obsolete).
Overall, I wouldn’t advise updating pages just for the sake of possibly getting a "fresh content" boost. QDF operates on unique, fresh, individual pages (or older pages that are earning newly fresh links). I’d have serious doubts as to whether anything in Google’s ranking system rewards pages that simply change frequently – it doesn’t pass my smell test.
How is Google Treating "Reciprocal" Links?
Stephan Says:
Trading links helps boost PageRank and rankings. Particularly if done on a massive scale with totally irrelevant sites, right? Umm, no. Reciprocal links are of dubious value: they are easy for an algorithm to catch and to discount. Having your own version of the Yahoo directory on your site isn’t helping your users, nor is it helping your SEO.
Comments Include:
Google places less weight on reciprocal links than they used to, but they still count. I’ve done numerous link exchange campaigns for websites, and seen huge boosts in rankings. At the end of the day, would you rather have a reciprocal link from another site in your niche, or no link at all? The answer is obvious.
Reciprocal links aren’t necessarily of dubious value. Consider this example:
I’m a news site. I link to CNN because it’s CNN and they have news. One day, CNN links to me (huzzah). Technically, this is a reciprocal link, but no way in hell is Google going to discount the value of the link because the sites are linking to each other. So now you have to determine intent — and how do you do that?
In many niches, every authority site links to every other. Not only is it natural, but these are the most relevant possible links. So what you seem to be saying is that Google lowers the value of a site’s most relevant links — thereby increasing the relative value of irrelevant or off-topic ones. That makes sense how?
My Opinion - This one really depends on how we’re defining "reciprocal links."
The post you’re reading links to Stephan’s SELand article. Would Stephan updating that post with a link here potentially hurt both our rankings? No.
However, if SEOmoz built a link directory on our site (ironically humorous because, as long time readers may recall, we used to have one) and promoted linking to your site if you reciprocated with a link back here, I’d be more concerned. This is essentially link graph manipulation and while it’s a fine line to tread, plenty of folks have crossed it in the past and, as Stephan notes, unnatural reciprocal link behavior is remarkably easy to spot on a link graph.
I wouldn’t be concerned at all with a technically "reciprocated" link, but I would watch out for schemes and directories that leverage this logic to earn their own links and promise value back to your site in exchange. Also – watch out for those who’ve evolved to build "three-way" or "four-way" reciprocal directories such that you link to them and they’ll link to you from a separate site – it’s still attempted manipulation and there’s so many relevant directories out there; why bother!?
Keyword Density is Not Used – How Many Times Do We Have to Say It?
Stephan Says:
Keyword density is da bomb. Ok, no one says “da bomb” anymore, but you get the drift. Monitoring keyword density values is pure folly.
Comments Include:
Folly? Hardly. If you’re trying to rank for a keyword, you want to make sure you use it a few times on a page. That’s just common sense. Of course, you don’t want to overuse a keyword, or it might come across as spammy. Any smart SEO pays attention to KW density.
My Opinion - Again, we’re likely coming down to semantics. The formula for keyword density – a percentage of the total number of words on the page that are the target phrase – is indeed folly. IR scientists discredited this methodology for relevance decades ago. Early search engines and information retrieval systems already leveraged TF*IDF as a far more accurate and valuable methodology.
In my opinion, the reason the myth persists is that sometimes, optimizing towards a keyword density can actually improve your relevance and targeting of TF*IDF. I’ll make an analogy – let’s say you believe flight is accomplished not by lift, thrust, drag and weight, but rather by reaching a particular velocity in a bird-shaped device. It’s entirely possible that you might stumble upon flight, or flight-like elements even without understanding the physics. That said, could you honestly call yourself an aeronautics engineer?
If we’re going to call ourselves professional SEOs, we should bother to learn the science. Yes, adding additional instances of a keyword term or phrase to a page might indeed help your rankings (usually not massively and almost never in highly competitive spaces), but that does not mean that the keyword density average you’ve been using is accurate or that engines leverage the metric. Spreading this ignorance of math and science does little to further the SEO field’s reputation - let’s end it. 
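To make the difference concrete, here's a small sketch comparing keyword density with one common textbook form of TF*IDF (term frequency times log(N / document frequency)). Assumptions to flag: the three-document corpus is invented, the tokenizer is crude, it only handles single-word terms, and nothing here claims to be what any engine actually runs.

```python
import math
import re


def tokenize(text):
    """Lowercase and split into simple word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(term, doc):
    """Share of the words on the page that are the target term."""
    words = tokenize(doc)
    return words.count(term) / len(words) if words else 0.0


def tf_idf(term, doc, corpus):
    """Term frequency weighted by how rare the term is across the corpus.

    Uses idf = log(N / df), one of several textbook variants.
    """
    words = tokenize(doc)
    if not words:
        return 0.0
    tf = words.count(term) / len(words)
    df = sum(1 for other in corpus if term in tokenize(other))
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


if __name__ == "__main__":
    # Invented toy corpus for illustration only.
    corpus = [
        "the hotel has a pool",
        "the hotel is near the beach",
        "the pool is heated",
    ]
    page = corpus[2]
    for term in ("the", "heated"):
        print(term, keyword_density(term, page), tf_idf(term, page, corpus))
```

On the last page, "the" and "heated" have identical keyword density, yet "the" appears in every document and so carries no TF*IDF weight, while "heated" does. That's the gap density can't see.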
Do Hyphens in Domain Names Really Suck for SEO?
Stephan Says:
Hyphenated domain names are best for SEO. As in: san-diego-real-estate-for-fun-and-profit.com. Separate keywords with hyphens in the rest of the URL after the .com, but not in the domain itself.
Comments Include:
Hyphens in domain names are less than ideal for flagship businesses because they’re hard to communicate, but you better believe Google ranks domains with keywords in them highly, even if they contain hyphens. Again, it’s less than ideal (a hyphen-less .org or .net is preferable to a hyphenated .com), but if the top choices aren’t available, a domain that includes a hyphen can be a decent substitute.
Don’t make a blanket statement that having hyphens in your domain hurts your potential. This is just fallacy. Yes, hyphens suck for direct traffic, as the domain is more likely to be spelled incorrectly. But when it comes to search, domains with hyphens in them do just fine.
My Opinion – They suck. Yes, I realize that technically, they may not have a formal algorithmic component (though I’m guessing part of Google’s spam filter early warning system does look at hyphens, particularly when there’s more than one in a domain name). But, they certainly correlate with worse branding value, which means fewer links and citations, less reputation in the eyes of visitors and potential business partners, less viral spread through word-of-mouth and, as the comments note, lower type-in traffic.
All of those are going to have a 2nd-order impact on rankings through metrics like inbound links, social mentions and usage data (to whatever degree you believe that may be a signal). Thus, hyphens in domain names do, indeed, suck for SEO (and lots of other stuff). I’ve never liked SEO practices that operated in a vacuum or didn’t consider usability, virality, positioning, branding or other basic marketing techniques. Going back to the analogy above, it’s like the aeronautics engineer who doesn’t consider seats a necessity. Sure, it flies, but who exactly will pay for a ride?
Does Click-Through Rate Matter?
Stephan Says:
The clickthrough rate on the SERPs matters. If this were true then those same third-world link builders would also be clicking away on search results all day long.
Comments Include:
Don’t assume that clickthrough rates don’t matter just because of some potential abuse that would happen if absolutely zero logic were built in.
In regards to CTR influencing rankings, there are a number of things that lead me to suspect that user behavior does affect search results.
I’m sure you are familiar with the so-called google "honeymoon period" that seems to occur when a new site launches. The site will rank highly for a few weeks, and then see a dramatic drop in SERPs. I’ve launched over a dozen sites in the past year, and have noticed this pattern.
I believe this goes beyond QDF, it’s a site-wide phenomenon. The hypothesis is that Google will temporarily rank a new site highly, to see how users perceive the site. If people visit the site, and then immediately hit the back button to return to the SERPs, that’s a good signal that the site did not meet the needs of the user, and that google should not rank it as highly.
I am on the fence, I could literally flip a coin whether it is myth, magic, or the CTR really does make a difference. If it does it is such a small difference it’s nothing I would ever focus on for success.
My Opinion – I’ve written and spoken about this extensively in the past and it doesn’t need a great deal of re-hashing. I will, however, say that should any SEO ever discover that it substantively impacts rankings, we’re going to be faced with an army of zombie botnets trying to take over our computers not to send email spam, but to click on links through our "reputable" Google accounts. Just look at the hacks of Facebook, Twitter & Wordpress over the past few weeks and ask yourself – if any spammer could show any financial incentive or ability of clicks to influence Google, would we really have as (organic) click-fraud free a world as we do today? 
We do have one data point from Google that suggests they look at some kinds of less manipulate-able click data. A Googler speaking at the first SMX East show in New York mentioned during his session that Google will record searches that are performed frequently with no clicks, followed by query refinement or abandonment, as potential searches that need work (because it seems no one likes the results). If this is what you mean when referring to click-data being used in the engines, I think that’s completely reasonable.
Do H1 Tags Help with Rankings?
Stephan Says:
H1 tags are a crucial element for SEO. Research by SEOmoz shows little correlation between the presence of H1 tags and rankings. Still, you should write good H1 headings, but do it primarily for usability and accessibility, not so much for SEO.
Comments Include:
H1 tags are very important, I’ve seen pages rank well for targeted keywords once the tag has been tweaked to be more targeted, not spammy or purely for SEO, but well written. Ok, in some cases it may not be “crucial” but after the title tag I think it’s up there as one of the most important on site factors.
My Opinion - Covario’s research is spot on; I got to listen to and speak with their chief scientist, Dr. Matthias Blume, at a conference in Silicon Valley. It also matches up to our correlation and rankings model data. You’re invited to repeat on-page keyword prominence testing and check the results for yourself (more on search engine testing methodologies here). H1 tags are very slightly better than Bold/Strong tags for keyword usage and both are barely better than simply using the keyword on the page (in any text format).
In every instance I’ve seen a report of H1s improving rankings, it’s been because the keyword phrase was now included as some of the first text on the page and provided an additional instance of the target term and title element in the on-page copy. As Stephan recommends in the comments, try taking a site with H1s and replacing them with CSS styles that mimic the text formatting. You may see tiny fluctuations in a few close rankings, but likely little else.
All that said, H1s are still a best practice. If you’re building a site from scratch today, you should certainly use them for headlines, and they do provide some (albeit quite tiny) benefits for SEO. However, I feel incredibly guilty about the many times in my SEO consulting career I pushed hard for engineering and development teams to get H1s right in the markup when it generated such tiny results. That time would have been far better spent on dozens of other projects. If I can, I’d love to save you that same embarrassment and disappointment. H1s may fit with SEO stereotypes, but that doesn’t make them a high priority, high value activity. If you don’t believe the research of others, do your own, then listen to the results.
Can Linking to Other Sites Help You Perform Better?
Stephan Says:
Linking out (such as to Google.com) helps rankings. Not true. Unless perhaps you’re hoarding all your PageRank by not linking out at all — in which case, that just looks unnatural. It’s the other way around, i.e. getting links to your site — that’s what makes the difference.
Comments Include:
Not true. Matt Cutts has said that linking out to high quality websites is one of the many factors that they use to evaluate a site. NOTE: the comment references the text copied below, from this post by Matt Cutts (on Google’s webspam team):
Q: Okay, but doesn’t this encourage me to link out less? Should I turn off comments on my blog?
A: I wouldn’t recommend closing comments in an attempt to “hoard” your PageRank. In the same way that Google trusts sites less when they link to spammy sites or bad neighborhoods, parts of our system encourage links to good sites.
My Opinion – I suspect there may be some small, positive effects of linking out to relevant, quality sites and pages for SEO. However, Stephan’s likely correct in his assertion that just linking to a "high Domain Authority" or "high PageRank" site won’t normally help. He’s also right to say that hoarding link juice is likely a very bad move. You can listen to the NYTimes’ SEO, Marshall Simmonds, talk about how adding external links to articles on the site had a noticeable positive impact on the Times’ rankings and traffic.
I don’t have correlation or ranking models data on this, nor have we experimented internally to the degree that I’d feel comfortable calling this a settled debate. My instincts say Google probably considers outbound links in some form or fashion, but I doubt it’s a huge ranking factor. It might be more important than H1s, though :-)
PageRank is a Good Predictor of Rankings?
Stephan Says:
Your PageRank score, as reported by Google’s toolbar server, is highly correlated to your Google rankings. If only this were true, our jobs as SEOs would be so much easier! It doesn’t take many searches with SEO for Firefox running to see that low-PageRank URLs outrank high-PR ones all the time. It would be naive to assume that the PageRank reported by the Toolbar Server is the same as what Google uses internally for their ranking algorithm.
Comments Include:
Come on now. It’s true that a lot of people place too much emphasis on PR, but let’s not take it to the opposite extreme and say it’s irrelevant. PR is not the be-all-end-all of rankings, but it still matters. Having a high PR homepage clearly means *something*.
I probably couldn’t disagree with anything more than this one. I guarantee a website that has homepage PageRank 6 and then 2 page deep pages having PageRank 5 and trailing off into 4’s and 3’s get’s WAY more traffic than the one with PageRank 3 and trails off into 2’s and 1’s. PageRank is not 100% accurate, but it’s an extremely good indicator, it’s not just make believe or useless non-sense that authoritative sites have PageRank; 6, 7, 8, 9, 10.
My Opinion - They’re both right (though the "guarantee of traffic on the PR6 vs. 5 site" sounds like a bet this commenter’s opponent could win many, many times over). Our data on PageRank correlation is very solid and suggests that yes, PR is positively correlated with rankings on Google.com (though much less so in Google.co.uk – sorry Brits!). However, the degree of correlation is not overwhelming and there are far better single metrics if rankings correlation is your goal.
I would strongly get behind Stephan’s statement that what the toolbar server reports is not what Google uses internally. They’ve messaged this many times. It’s also very true that PageRank is only one of a plethora of ranking signals, and plenty of PageRank 3 pages outrank PageRank 6 or 7 pages for given queries.
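If you'd like to test the relationship on your own SERPs, a rank correlation is one reasonable way to do it. The sketch below computes Spearman's coefficient by hand (average ranks for ties, then Pearson on the ranks). It's an illustration only: the toolbar PR values are made up, and I'm not claiming this is the exact method behind our correlation data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson applied to the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical data: toolbar PR of the URLs in positions 1..10 of one SERP.
    positions = list(range(1, 11))
    toolbar_pr = [5, 6, 4, 6, 3, 5, 4, 3, 4, 2]
    # Negate position so that "higher is better" on both sides.
    print(spearman(toolbar_pr, [-p for p in positions]))
```

A value near 1 would mean PR orders the results almost perfectly; a modest positive value fits the "correlated, but not overwhelmingly" picture described above. One SERP proves nothing, so average over many queries before drawing conclusions.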
Does Great Content Equal Great Rankings?
Stephan Says:
Great Content = Great Rankings. Just like great policies equals successful politicians, right?
Comments Include:
I see no one is criticizing "Great content = great rankings." This is job number one.
My Opinion – I think the commenter may have missed Stephan’s intended sarcasm. I am in full agreement that great content ≠ great rankings. This is no more true than the statement: "the way to win elections is to propose the best legislative ideas."
Marketing, promotion, networking, partnerships, virality, incentives and hundreds of others feed into the inputs for a site’s success on the web. Unless you believe that links are meaningless and Google’s content analysis systems can read and rank content like a human (e.g. Google thinks the Times’ article on Brown’s stepping down was more adroitly perceptive than the Post’s), the ability to draw in links, which is not and likely never will be about the "best content" will have an overwhelming impact on rankings.
The future likely holds greater usage of data from social media and social web interaction, but even this depends on far more than the content’s quality. Those brands and sites that have early-adopting, viral-sharing, people-connecting, idea-distributing users invested in promoting their work are likely to be long term winners with little regard for comparative levels of content quality. 

There’s lots more fun and interesting discussion on the SearchEngineLand post, but hopefully these will spark some interesting chats in the comments here as well.

Are Webmasters Using Canonical URL Tags, Nofollows? The Latest Linkscape Update Has the Data

Posted by randfish
It’s an exciting day at SEOmoz – Linkscape’s index has updated with fresh data crawled in the past 30 days. This update also gives us a chance to show off lots of interesting data points around the web’s usage of search-specific tags and directives. Let’s dive in!
The Canonical URL Tag Grows in Popularity
Rel Canonical is here to stay. Websites have been growing in their adoption of the tag since its announcement and this index has the highest number and percentage of URLs employing it to date.

The overall numbers are still small. Canonical URLs are on less than half of 1% of all pages, and I suspect duplicate content is much more prevalent, thus giving SEOs a lot of opportunity to help sites apply this directive. Don’t forget that you CAN use the tag on the original version of the page, too.
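If you want to audit your own pages for the tag, here's a minimal sketch using only Python's standard library. The class and function names are mine, the sample markup is invented, and this isn't how Linkscape's crawler detects the tag; it just reads the first rel="canonical" link element it finds.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html):
    """Return the canonical URL declared in the markup, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


if __name__ == "__main__":
    # Invented page that points the canonical at its own URL.
    page = (
        "<html><head>"
        '<link rel="canonical" href="http://www.example.com/widgets" />'
        "</head><body>Widgets</body></html>"
    )
    print(find_canonical(page))
```

A page whose canonical is its own URL (as in the sample) is the "original version" case mentioned above; a None result flags a page that may be worth a look if it has duplicates.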
Usage of Nofollowed Links Falls
It would appear that the nofollow directive is falling out of favor, as evidenced by the chart below:

Nofollow use is down, both on external links and internal links, though it’s taken more of a hit on internal links. Perhaps that’s a sign more SEOs are getting internal nofollows removed after Google’s announcement on the topic.
May 2010’s Linkscape Index Stats
Linkscape’s index this month has the largest number of unique, root domains we’ve ever indexed and has improved quality in several other ways as well. For example, some of you reported some link spammers that were highly effective in gaming page/domain authority scores, and those should be fixed in OSE.

  • Pages: 41,202,970,156 (41 Billion)
  • Subdomains: 289,291,281 (289 Million)
  • Root Domains: 85,725,739 (85 Million)
  • Links: 424,255,504,138 (424 Billion)

You can see a chart of growth in the number of root domains (e.g. *.domain.com) below:

This shows the growth we’ve been doing in reaching more new sites and getting a broader picture of the web. We’ve taken to heart the feedback that it’s frustrating when we don’t have any data on a site and are reaching out in accordance (these numbers may also show that there’s lots more websites getting registered and earning links).
I’ve also embedded a chart below showing Linkscape’s raw index URL count:

You’ll notice that at the beginning of this year, we ramped up index size at the request of our users. Unfortunately, we found that this didn’t correlate well to quality or usefulness in every case, so we’ve been refining our crawl selection and metrics before we attempt to scale up again. We do plan to grow the index again, but we’re much more concerned with the value of the links and pages we report back, so we won’t grow just for the sake of numbers – as Danny Sullivan and Google themselves have pointed out many times, size ≠ quality.
Changes to How OSE & Linkscape Define "Followed" vs. "Nofollowed"
Based on some more feedback from users and API partners, we’ve made a change to how we define "followed" and "nofollowed" links through our API, and you’ll see this in Open Site Explorer. Our friends noted that links containing the rel="nofollow" attribute aren’t the only ones that don’t pass link juice, so we’ve gone ahead and made two buckets as below:
Followed:

  • 301 redirects
  • normal HTML links
  • pages that meta refresh (Google appears to treat these like 301s)
  • pages with rel="canonical" directives to another URL

Nofollowed:

  • links marked with rel="nofollow"
  • links on pages with the meta robots "nofollow" directive
  • feed autodiscovery links for blogs/RSS feeds (we’re fairly sure Google doesn’t treat these as juice-passing links)
  • 302 redirects

If you’re using the API to pull in link data, you’ll see these new delineations, which should also help with previous disparities in link count numbers (because adding followed+nofollowed previously didn’t include some of these other types of links).
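The two buckets above can be sketched as a simple lookup. Note that the link-type labels below are hypothetical names I've made up for illustration; they are not the field names or flags the API returns.

```python
from collections import Counter

# Hypothetical labels for the link types listed above (not real API fields).
FOLLOWED = {
    "301_redirect",
    "html_link",
    "meta_refresh",
    "rel_canonical",
}
NOFOLLOWED = {
    "rel_nofollow",
    "meta_robots_nofollow",
    "feed_autodiscovery",
    "302_redirect",
}


def bucket(link_type):
    """Map a link type onto the followed / nofollowed buckets."""
    if link_type in FOLLOWED:
        return "followed"
    if link_type in NOFOLLOWED:
        return "nofollowed"
    raise ValueError(f"unknown link type: {link_type}")


def count_buckets(link_types):
    """Tally how many links fall into each bucket."""
    return Counter(bucket(link_type) for link_type in link_types)


if __name__ == "__main__":
    sample = ["html_link", "rel_nofollow", "301_redirect", "302_redirect"]
    print(count_buckets(sample))
```

Because every link type lands in exactly one bucket, followed plus nofollowed adds up to the total, which is the point of the change.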
Some News on the SEOmoz API
We are proud to announce the release of a Linkscape Ruby gem.  This gem contains all of the code we used to access the Site Intelligence API and power Open Site Explorer. If you were looking for a time to get started with our API, this bit of sample code should make it even easier.  For more information about the gem, check out the Ruby section of the Sample Code page here.
We’re also making it easy to track future updates via the Linkscape Schedule in our API wiki. If you haven’t yet checked out the API, now’s the time – you can build remarkable things for on-site analysis, link data extraction or anything else that requires trillions of links :-)
A Fond Farewell to Nick Gerner
Unfortunately, I’ve got some sad news to report as well. Nick Gerner, who helped to create Linkscape in 2008, is leaving the team next week. He’s been an incredible engineer and a good friend to everyone here at SEOmoz and many of our colleagues in the community as well. We wish him well and can’t wait to see what he does next (he’s assured us it’s something exciting in the startup world).
If you’ve been connecting with Nick regarding the API, you can send those requests to Sarah Bird and feel free to pass any direct questions about Linkscape to sitesupport where Ben, Chas & Phil are helping to improve the index and our tools on that front. 
Looking forward to the discussion – hope this weekend post doesn’t intrude on too much family time. Don’t forget to have a great Mother’s Day!

The Search Engine Landscape in 2010: Rand’s Presentation from Web 2.0 Expo

Posted by randfish
I spoke this afternoon at the Web 2.0 Expo in San Francisco to a room full of curious marketers & site owners. After the session, there were a lot of requests for the slide deck and thus I’ve shared it below.
Web20 Fishkin Search Landscape
I’ve also added Stephan Spencer’s presentation on technical SEO, keyword research & targeting below:
SEO_ From Soup to Nuts Presentation 1
And, last but not least, Eric Enge’s presentation on link building from the conference as well:
Link Building: The Key to Rankings by Eric Enge
This presentation opportunity comes courtesy of O’Reilly, who published The Art of SEO late last year (somehow, I’ve neglected to mention it on SEOmoz until now). Big thanks to my co-authors, Eric Enge, Stephan Spencer & Jessie Stricchiola for helping out.

30 SEO Problems & the Tools to Solve Them (Part 2 of 2)

Posted by randfish
Last November, I authored a popular post on SEOmoz detailing 15 SEO Problems and the Tools to Solve Them. It focused on a number of free tools and SEOmoz PRO tools. Today, I’m finishing up that project with a stab at another set of thorny issues that continually confound SEOs and how some new (and old) tools can come to the rescue.
Some of these are obvious and well known; others are obscure and brand new. All of them solve problems – and that’s why tools should exist in the first place. Below, you’ll find 20+ tools that answer serious issues in smart, powerful ways.
#1 – Generating XML Sitemap Files
The Problem: XML Sitemap files can be challenging to build, particularly as sites scale over a few hundred or few thousand URLs. SEOs need tools to build these, as they can substantively add to a site’s indexation and potential to earn search traffic. 
Tools to Solve It: GSiteCrawler, Google Sitemap Generator

GSiteCrawler: Downloadable software to create XML Sitemaps

Download a few files from Google Code and Install on Your Webserver

Looks like Google Webmaster Tools, doesn’t it? :-)
Both GSiteCrawler & Google Sitemap Generator require a bit of technical know-how, but even non-programmers (like me) can stumble their way through and build efficient and effective XML Sitemaps.
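For readers who want to see what these tools are producing under the hood, here is a minimal sketch of building a Sitemap file by hand. It is not how GSiteCrawler or Google Sitemap Generator work internally; it only assumes you already have a list of URLs (the example.com addresses and the `build_sitemap` name are placeholders) and writes them in the sitemaps.org 0.9 format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string for an iterable of absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs; swap in the pages from your own crawl.
    print(build_sitemap([
        "http://www.example.com/",
        "http://www.example.com/blog",
    ]))
```

The hard part the tools above solve is discovering the URLs in the first place; once you have that list, the file format itself is simple.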
#2 – Tracking the Virality of Blog/Feed Content
The Problem: Even experienced bloggers have trouble predicting which posts will "go wide" and which will fall flat. To improve your track record, you need historical data to help show you where and how your posts are performing in the wild world of social media. What’s needed is a cloud based tracking tool that can sync up with the Twitters, Facebooks, Diggs, Reddits, Stumbleupons & Delicious’ of the web to provide these metrics in an easy-to-use, historical view.
Tools to Solve It: PostRank Analytics

PostRank’s nightly emails keep me wracking my brains for better blog post ideas
PostRank sends me nightly reports on how the SEOmoz blog performs across the web – numbers from Digg, Delicious, Twitter, Facebook and more. By using this, I can get a rough sense of how posts perform in the social media marketplace and, over time, hopefully train myself to author more interesting content.
Addition: Melanie from Postrank added a discount code in the comments for SEOmoz users! Use the coupon code "SEOmoz" in order to get three free months instead of just one.
#3 – Comparing the Relative Traffic Levels of Multiple Sites
The Problem: We all want to know not only how we’re doing with web traffic, but how it compares to the competition. Free services like Compete.com and Alexa have well-documented accuracy problems and paid services like Hitwise, Comscore & Nielsen cost an arm and a leg (and even then, don’t perform particularly well with sites in the sub-million visits/month range).
Tools to Solve It: Quantcast, Google Trends for Websites

If a site has been "Quantified," no other competitive traffic tool on the web will be as accurate

Since both sites are "Quantified," I can be sure the data quality is excellent
I’ve complained previously about the inaccuracies of Alexa (as have many others). It’s really for entertainment purposes only. Compete.com is better, but still suffers from lots of inaccuracy, data gaps, directionally wrong estimates and a general feeling of unreliability in the marketplace. Quantcast, on the other hand, is excellent for comparing sites that have entered their "Quantified" program. This involves putting Quantcast’s tracking code onto each page of the site; you’re basically peeking into their analytics.
Sadly, Quantcast isn’t on every site (and their guesstimates appear no better than Compete when they don’t have direct data). Fortunately, one organization has stepped up with a surprisingly good alternative – Google.

Google Trends for Websites allows you to plug in domains and see traffic levels. Much like AdWords Keyword Tool, the numbers themselves seem to run high, but the comparison often looks much better. Google Trends has become the only traffic estimator I trust – still only as far as I could throw a Google Mini, but better than nothing.
#4 – Seeing Pages the Way Search Engines Do
The Problem: Every engineering & development team builds web pages in unique ways. This is great for making the Internet an innovative place, but it can make for nightmares when optimizing for search engines. As professional SEOs, we need to be able to see pages, whether in development environments or live on the web, the same way the engines do.
Tools to Solve It: SEO-Browser, Google Cached Snapshot, New Mozbar

A longtime favorite site of mine, SEO Browser lets you surf like an engine

Poor Google; that’s all they see when they crawl our pretty site
SEO-Browser is a great way to get a quick sense of what the engines can see as they crawl your site’s pages and links. The world of engines may seem a bit drab, but it can also save your hide in the event that you’ve put out code or pages that engines can’t properly parse.

I wonder if Googlebot ever gets tired of blue, purple and gray…
Google’s own cached snapshot of a page (available via a search query, as a bookmarklet, or in the mozbar’s dropdown) is the ultimate research tool to know what the engine "sees." The only trouble is that it works in the past only (and only on pages that allow caching). To get a preview, SEO Browser or our friend below can be useful.

The mozbar lets you dress up like Google whenever the occasion is right
One of Will Critchlow’s feature requests in the new mozbar was the ability to switch user agents, turn off JavaScript and images and, in essence, become the bot in your browser. Luckily, he also forced us to place a gray overlay in the right-hand corner that alerts you to the settings you’ve changed and gives you an easy, one-click "return to normal." Browsing like a bot = solved!
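If you’d rather do the same thing from a script, the idea can be sketched in a few lines. This is an illustration, not the mozbar’s implementation: the `build_bot_request` and `fetch_as_bot` names are made up for this example, and the user-agent string is the commonly published Googlebot one. A plain HTTP fetch never runs JavaScript or loads images, so what comes back is roughly the raw HTML a crawler starts from.

```python
import urllib.request

# Commonly published Googlebot user-agent string.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def build_bot_request(url, user_agent=GOOGLEBOT_UA):
    """Build (but do not send) a request that identifies itself as a bot."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as_bot(url, timeout=10):
    """Fetch a page with the bot user agent and return its raw HTML."""
    request = build_bot_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder URL; this line performs a real network request.
    print(fetch_as_bot("http://www.example.com/")[:500])
```

Comparing this output with what you see in a normal browser is a quick way to spot content that only appears after scripts run, or pages that serve something different to bots.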
#5 – Identifying Crawl Errors
The Problem: Discovering problems on a site like 302 redirects (that should be 301s), pages that are blocked by robots.txt (here’s why that’s a bad idea), missing title tags, duplicate/similar content, 40x and 50x errors, etc. is a task no human can efficiently perform. We need the help of robots – automated crawlers who can dig through a site, find the issues and notify us.
Tools to Solve It: GSiteCrawler, Xenu, GGWMT

Mmmm… Parallel Threads

She canna hold on much longer cap’n!
We’ve already covered GSiteCrawler in this post, but for those unaware, it can be a great diagnostic tool as well as a Sitemap builder. Xenu is much the same, though somewhat more intuitive for this purpose. Tom’s written very elegantly about it in the past, so I won’t rehash much, other than to say – it shows errors & potential issues Google Webmaster Tools doesn’t, and that can be a lifesaver.

Doh! I think we messed up some stuff when KW Difficulty relaunched :(
Google Webmaster Tools is extremely popular, well known and well used. And yet… lots of us still have crawl errors we haven’t addressed (just look at the 500+ problems on SEOmoz.org in the screenshot above). Exporting to Excel, sorting, and sending to engineering with fixes for each type of issue can save a lot of heartache and earn back a lot of lost traffic and link juice.
#6 – Determine if Links to Your Site Have Been Lost
The Problem: Sites don’t always do a great job maintaining their pages and links (according to our data, 75% of the web disappears in 6 months). Many times, these vanishing pages and links are of great interest to SEOs, who want to know whether their link acquisition and campaigning efforts are being maintained. But how do you confirm if the links to your site that were built last month are still around today?
Tools to Solve It: Virante’s Link Atrophy Diagnosis

Does that mean Stuntdubl & SEOmoz are "going steady?"
This tool comes courtesy of the great team over at Virante, and it’s a pretty terrific application of an SEO need and Linkscape data through the SEOmoz API. The tool will check the links reported from Linkscape/Open Site Explorer and determine which, if any, have been lost. Many times it’s just links off the front page of blogs or news sites as archives fall to the back, but sometimes it can help you ID a link partner or source that’s no longer pointing your way in order to facilitate a quick, painless reclamation. The best part is there’s no registration or installation required – it’s entirely plug and play.
Addition: Russ from Virante added a discount code in the comments for SEOmoz users! Use the coupon code "seomoz30" in order to get more results from these tools.
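The core check behind a link-atrophy report can be sketched roughly as follows. This is an assumption about the general approach, not Virante’s code: it takes pages that were previously reported as linking to you (supplied here as already-downloaded HTML) and lists the ones that no longer contain a link to your host. The class and function names are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def still_links_to(page_html, target_host):
    """True if any link in the page points at target_host."""
    parser = LinkCollector()
    parser.feed(page_html)
    target = target_host.lower()
    return any(urlparse(h).netloc.lower() == target for h in parser.hrefs)


def lost_links(pages, target_host):
    """pages maps a source URL to that page's current HTML.

    Returns the source URLs that no longer link to target_host.
    """
    return sorted(
        url for url, html in pages.items()
        if not still_links_to(html, target_host)
    )
```

In practice the list of source pages would come from a link index such as Linkscape, and each page would need to be re-fetched before running the check.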
#7 – Find 404 Errors on a Site (without GG WM Tools) and Create 301s
The Problem: Google’s Webmaster Tools are great for spotting 404s, but the data can be, at times, unwieldy (as when thousands of pages are 404ing, but only a few of them really matter) and it’s only available if you can get access to the Webmaster Tools account (which can stymie plenty of SEOs in the marketing department or from external consultancies). We need a tool to help spot those important, highly linked-to 404s and turn them into 301s. 
Tools to Solve It: Virante’s PageRank Recovery Tool

3.99 mozRank for ~0.00 effort
The thinking behind this tool is brilliant, because it solves a problem from end to end. By not only grabbing well-linked-to pages that 404, but actually writing the code to create an .htaccess file with 301s to your choice of pages, the tool is a "no-brainer" solution.
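To make the last step concrete, here is a small sketch of turning a list of 404ing paths into .htaccess rules. It is not the PageRank Recovery Tool’s own output; it assumes an Apache server with mod_alias enabled, and the `htaccess_redirects` name and example paths are hypothetical.

```python
def htaccess_redirects(mapping):
    """Build Apache mod_alias rules from {old_path: destination_url}.

    Each old_path must start with "/". Paths containing spaces would
    need quoting, which this sketch does not handle.
    """
    lines = []
    for old_path, new_url in sorted(mapping.items()):
        if not old_path.startswith("/"):
            raise ValueError("old path must start with '/': " + old_path)
        lines.append("Redirect 301 {} {}".format(old_path, new_url))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder paths; use the well-linked 404s from your own report.
    print(htaccess_redirects({
        "/old-page": "http://www.example.com/new-page",
        "/retired-tool": "http://www.example.com/tools",
    }))
```

Pointing each dead URL at the most relevant live page, rather than sending everything to the homepage, is usually the better choice for users.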
#8 – See New Links that are Sending Traffic (and Old Ones that Have Stopped)
The Problem: Most analytics tools have an export function that, combined with some clever Excel, could help you puzzle out the sites/pages that have started to send you traffic (and those that once were but have stopped). It’s a pain – manual labor, easy to screw up and not a particularly excellent use of your precious time.
Tools to Solve It: Enquisite

I love the ability to look across the past few months and see the trend of new pages and new domains sending links, as well as identifying links that have stopped sending traffic. Some of those may be ripe for reclamation; others might just need a nudge to mention or link over in their next piece/post. This report is also a great way to judge how link building campaigns are performing on the less SEO-focused pivot: sending direct traffic.
#9 – Research Trending/Temporal Popularity of Keywords
The Problem: Keyword demand fluctuates over time, sometimes with little warning. Knowing how search volume is impacted by trending and geography is critical to SEOs targeting fields with these demand fluxes.
Tools to Solve It: Google Insights, Trendistic

Hmmm…. Maybe we should launch Open Webmaster Tools next?

We need to make it out to India & Brazil more often, too!
Google Insights is great for seeing keyword trending, related terms and countries of popularity (though the last of these we’ve found to be somewhat suspect at times). However, sometimes you’re really interested in what’s about to become popular. For that, turning to trend sites can be a big help.

Although Trendistic doesn’t yet have a "suggest" feature to help identify terms & phrases that may soon become popular searches, it does help establish the "tipping point" at which a buzzword in Twitter may become a trend in web search. As we’ve discussed in the WhiteBoard Friday on Twitter as an SEO Research Tool, finding the spot at which search volume begins spiking can present big opportunities for fresh content.
#10 – Analyze Domain Ownership & Hosting Data
The Problem: When researching domains to buy, considering partnerships or conducting competitive analysis, digging into a site’s hosting and ownership data can be an essential step in the process.
Tools to Solve It: Domaintools

We should make sure to re-register this domain…
Long the gold standard in the domainer’s toolbox, DomainTools (once called whois.sc) provides in-depth research about a domain’s owners, their server and, sometimes most interestingly, the other domains owned by that entity. BTW – they’re spot on; SEOmoz owns about 80 other domains besides our own (though we only really use this one and OpenSiteExplorer right now).
#11 – Investigate a Site/Page’s History
The Problem: What happened on this page last month or last year? When conducting web research about links, traffic and content, we all need the ability to go "back in time" and see what had previously existed on our sites/pages (or those of competitors/link sources/etc). Did traffic referrals drop? Have search rankings changed dramatically? Did a previously available piece of content fall off the web? The question really is – how do we answer these questions?
Tools to Solve It: Wayback Machine

Before 2005, we were on a different domain!

If you remember this version of the site, you’re officially "old school"
Yeah, yeah, you’ve probably heard of the Wayback Machine, powered by Alexa’s archive of the Internet and endlessly entertaining to web researchers and pranksters alike. What might surprise you is how valuable it can be as an SEO diagnostic tool, particularly when you’re performing an investigation into a site that doesn’t keep good records of its activity. Reversing a penalty, a rankings drop, an oddity in traffic, etc. can consume massive amounts of time if you don’t know where to look and how. Add Wayback to the CSI weapons cache – it will come in handy.
#12 – Determine Semantically Connected Terms/Phrases
The Problem: Chances are, the search engines are doing some form of semantic analysis (looking at the words and phrases on a page around a topic to determine its potential relevance to the query). Thus, employing these "connected" keywords on your pages is a best practice for good SEO (and probably quite helpful to users in many cases as well). The big question is – which words & phrases are related (in the search engines’ eyes) to the ones I’m targeting?
Tools to Solve It: Google Wonder Wheel

Nothing about "Yellow Shoes?"
We don’t know for certain that this is a technique that provides massive benefit, but we’re optimistic that tests are going to show it has some value. If you’d like to participate in the experiment, take related phrases from the Wonder Wheel and employ them on your pages. Please do report back with details :-)
#13 – Analyze a Page’s Optimization of Images
The Problem: When image search and image accessibility/optimization are critical to your business/client, you need tools to help analyze a page’s consistency and adherence to best practices in handling image dimensions, alt attributes, etc.
Tools to Solve It: Image Analyzer from Juicy Studio
 
Doh! We need to add some dimensions onto our images.
It’s not the prettiest tool in the world, but it does get the job done. The image analyzer will give any page a thorough evaluation, showing missing alt attributes and image dimensions (which can help with page rendering speed) and informing you of the names/alts in a thorough list. If you have image galleries that you’re optimizing for image search, this is a great diagnostic system.
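The kind of check described above is easy to approximate yourself. The sketch below is not Juicy Studio’s analyzer; it is a bare-bones version (with made-up names `ImageAudit` and `audit_images`) that scans a page’s HTML and reports every image missing an alt, width or height attribute.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Record each <img> that lacks alt, width or height."""

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # Note: an empty alt="" (valid for purely decorative images)
        # is also reported as missing by this simple check.
        missing = [n for n in ("alt", "width", "height") if not attr.get(n)]
        if missing:
            self.issues.append((attr.get("src") or "(no src)", missing))


def audit_images(page_html):
    """Return a list of (src, [missing attribute names]) tuples."""
    audit = ImageAudit()
    audit.feed(page_html)
    return audit.issues
```

Feed it the HTML of a gallery page and you get a quick to-do list of images to fix.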
#14 – Instant Usability Testing
The Problem: Fast feedback on a new landing page, product page, tool design or web page (of any kind) can be essential to smoothing over rough launches. But tools aren’t enough – we need actual human beings (and not the biased ones in our friend groups or company) giving fast, functional feedback. That’s a challenge.
Tools to Solve It: Five Second Test, Feedback Army

It can’t be that easy, can it?

Wow… It totally is! Here I am helping give feedback to a local geek squad.

Users are easier to come by than we think
Both FeedbackArmy & FiveSecondTest offer the remarkable ability to get instant feedback from real users on any page, function or tool you want to test at a fraction of the price normal usability testing requires. What I love is that because it’s so easy, it makes that first, critical step of reaching out to users a low barrier to entry. Over time, I hope systems like these help make the web as a whole a more friendly, easy-to-use experience. Now there’s no excuse!
#15 – Measure Tweet Activity to a URL Across Multiple URL Shortener Platforms
The Problem: You’ve got your bit.ly, your j.mp, your tinyurl, your ow.ly and dozens more URL shorteners. Between this plethora of options and standard HTML links pasted into tweets, keeping up with all the places your URL is being shared can be a big challenge.
Tools to Solve It: Backtweets

Tweeting links in the middle of the night is fun!
Bit.ly can track bit.ly and many other services offer their own tracking systems, but only Backtweets is aggregating all of the sources and making it easy to see what people are saying about your pages no matter how they encode it. Now if only we could get this to integrate with PostRank and Search.Twitter.com and Trendistic and make the interface super-gorgeous and have it integrate with Google Analytics… and… and…
#16 – BONUS: Determining Keyword Competition Levels
Bonus! I mentioned last week in a comment that I’d make a post about the new Keyword Difficulty Tool. Since this post is all about tools anyway, I figured I’d toss it in and save you the trouble of clicking an extra link in your feedreader.
The Problem: Figuring out which keywords have more or less demand than others is easy (and Google does a great job of it most of the time).
Tools to Solve It: New Keyword Difficulty Tool
The real problem was that our previous keyword difficulty tool attempted to use 2nd order effects and non-direct metrics to estimate the competitiveness level of a particular keyword term/phrase. While it’s true that more popular/searched-for keywords TEND to be more competitive, this is certainly not always the case (in fact, what SEOs probably care about most is finding keywords with high traffic and relatively weak sites/pages in the SERPs). The new tool attempts to fix this by relying on Page Authority (correlation data here) and using a weighted average of the top ranking sites and pages.
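To illustrate what "a weighted average of the top ranking sites and pages" means in practice, here is a toy calculation. The actual weights the tool uses aren’t stated in this post, so the 1/rank weighting below is purely an assumption for illustration, as are the function name and the sample scores.

```python
def keyword_difficulty(page_authorities, weights=None):
    """Weighted average of Page Authority scores, ordered by ranking.

    page_authorities[0] is the #1 result. If no weights are given,
    a hypothetical 1/rank weighting is used so that higher-ranked
    pages count for more. The real tool's weights are not public here.
    """
    if not page_authorities:
        raise ValueError("need at least one Page Authority score")
    if weights is None:
        weights = [1.0 / rank for rank in range(1, len(page_authorities) + 1)]
    if len(weights) != len(page_authorities):
        raise ValueError("weights and scores must be the same length")
    total = sum(weights)
    return sum(pa * w for pa, w in zip(page_authorities, weights)) / total


if __name__ == "__main__":
    # Made-up Page Authority scores for the top five results.
    print(round(keyword_difficulty([72, 65, 58, 40, 33]), 1))
```

The point of weighting is that a SERP with a very strong #1 and weak results below it should score differently from one where every position is held by a strong page.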

Running five keywords at a time is way better than one
(we’re working to add more – promise)

The best bet here looks like "best running shoes" – relatively lower difficulty, but still high volume

Oh yeah, looking at the top positions, a few dozen good links and some on-page and we’re there
Reversing the rankings is never easy, but parsing through KW Difficulty reports certainly makes it less time-consuming. Watch out for the scores, though – a 65% is pretty darn tough, and even a 40% is no walk in the park. At last, I feel really good about this tool; it was suffering for a good 18 months, and it’s nice to have it back in my primary repertoire with such solid functionality.

I’m sure there are plenty of remarkable tools I’ve missed and there are likely questions about these problems, too. Feel free to address both in the comments!
p.s. This was written very late at night and I need to be up and on a plane at precisely butt-o’clock tomorrow morning, so editing will have to slide until Jen wakes up and gives this a good once-over. Sorry about any errors in the meantime :-)
Note from Jen: I finally woke up and made a few minor edits. :) I also added a discount code from Virante "seomoz30" AND a discount code from PostRank "SEOmoz". Tools Rule!

Take the 2010 SEO Industry Survey; Your Peers Need You!

Posted by randfish
I’m very excited to announce that we have just put the finishing touches on our second, biennial SEO Industry Survey! We ran our first industry survey in 2008, and learned and shared a lot about the SEO community. This survey follows up on a lot of questions we asked last time, but includes a greater focus on other areas of organic search marketing as well.
With this survey we hope to find out and share with the world:

  • Who are the people in the SEO community?
  • How do they learn about SEO and sharpen their skills?
  • How are companies embracing search marketing?
  • Which tools and tactics do people in the industry use to support their SEO and social media efforts?

The survey only takes about 10 minutes, and we will again share the results once we’ve had time to work through the data.

Prizes!
As an added incentive, we are offering some cool prizes for people who complete the survey, including one 32GB iPad! We are also giving away 3 Flip Mino HD Cameras, and 10 people will win their pick of brand new SEOmoz gear. We will announce the winners of the sweepstakes by June 4th. Follow us on Twitter to stay up-to-date.
A Look Back
It will be interesting to see what has changed in the search marketing industry in the past two years. In fact, let’s take a look at a few examples from the survey taken in 2008!
Way back in 2008, people thought that Yahoo! was more likely to challenge Google than Microsoft. Nice guess, but obviously we now know that’s not accurate. Ask, Baidu, and Wikia all battled for third place in people’s opinion.

 
Check out this data about the use of nofollow. Since SMX Advanced last year, when Matt Cutts told us that nofollow no longer worked the way we thought for PageRank sculpting, I wonder if these numbers have changed?

Share the Survey!
The survey ends May 21, 2010 (about a month from now), so tell your friends in the industry about it. The more people that take the survey, the more interesting and valid the results we can share.
Partners
Many thanks to our friends who are helping to promote the survey over the next month. It’s important to get a broad audience from across the industry. Be sure to check them out:


 