What is Crawling for SEO?

 

What is Crawling?

Crawling (or spidering) is when Google or another search engine sends a bot to a web page or post to “read” it. Don’t confuse this with having the page indexed. Crawling is the first step in getting a search engine to recognize your page and show it in search results. Having your page crawled, however, does not necessarily mean it was indexed and will be found.

Pages are crawled for a variety of reasons including:

  • Having an XML sitemap with the URL in question submitted to Google
  • Having internal links pointing to the page
  • Having external links pointing to the page
  • Getting a spike in traffic to the page

To help your page get crawled, submit an XML sitemap in Google Search Console (formerly Google Webmaster Tools) to give Google a roadmap of all your new content.
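
If you have never seen one, an XML sitemap is just a short file listing your URLs. Here is a minimal Python sketch of how one could be generated. The function name `build_sitemap` and the example.com URLs are made up for illustration, and a real sitemap would usually come from your CMS or an SEO plugin rather than hand-written code.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages on a hypothetical site
print(build_sitemap([
    "https://example.com/",
    "https://example.com/blog/what-is-crawling",
]))
```

You would save the output as something like sitemap.xml on your site and then submit its address in Google Search Console.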

In Google Search Console you can see what was submitted and what was indexed.

Getting crawled means that Google is looking at the page. Depending on whether Google thinks the content is “new” or otherwise has something to “give to the Internet,” it may schedule the page to be indexed, which means it has the possibility of ranking.

Also, when Google crawls a page, it looks at the links on that page and schedules Googlebot to check out those pages too. The exception is when a nofollow attribute (rel="nofollow") has been added to the link.
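
To make the nofollow idea concrete, here is a rough Python sketch of how a simple crawler might collect the links on a page while skipping any marked nofollow. This is only an illustration using Python's built-in HTML parser. The class and function names are our own, and it is not a description of how Googlebot is actually built.

```python
from html.parser import HTMLParser


class FollowableLinkParser(HTMLParser):
    """Collect href values from <a> tags, skipping links marked nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated values, e.g. "nofollow sponsored"
        rel_values = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_values:
            self.links.append(href)


def followable_links(html):
    """Return the links in `html` that a crawler would be allowed to follow."""
    parser = FollowableLinkParser()
    parser.feed(html)
    return parser.links


page = (
    '<a href="/about">About</a>'
    '<a href="https://example.com/ad" rel="nofollow">Ad</a>'
    '<a href="/contact">Contact</a>'
)
print(followable_links(page))
```

In this made-up page, the crawler would queue /about and /contact and leave the ad link alone.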

 

What is the Difference Between Crawling and Indexing?

Many terms are continually thrown around in the SEO world, and many of them seem to be synonymous. Crawling and indexing are a perfect example of two words that are often used incorrectly. Whether or not the writer understands the difference, many SEO articles lead readers to believe the two words mean the same thing, but they most definitely do not.

So, exactly what is the difference between crawling and indexing? Before we get into that, we must first explain what it means to have your site or page indexed.

In no way does having your page crawled mean that it has been indexed, or even that it has a chance to be found in a Google search.

What Does Being Indexed Mean?

Having your page indexed by Google is the next step after it gets crawled. As stated, not every page that gets crawled gets indexed, but every indexed page had to be crawled first. If Google deems your new page worthy, Google will index it. After your page is indexed, Google then works out how your page should be found in its search results.

At this point, Google decides which keywords your page will appear for and where it will rank in each of those searches. This is determined by a variety of factors that ultimately make up the entire business of SEO. Also, any links on the indexed page are now scheduled for crawling by Googlebot.

It’s not only those links that get crawled; it is said that Googlebot will follow links up to five pages back. That means if a page links to a page, which links to a page, which links to a page, which links to your page (which just got indexed), then all of them will be crawled.
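
If it helps to picture what following links for a fixed number of hops looks like, here is a small Python sketch. It walks a made-up map of which page links to which and stops after five hops. The five-hop limit is only the rule of thumb quoted above, and the function name and the `link_graph` data are our own inventions for the example.

```python
from collections import deque


def pages_within_hops(link_graph, start, max_hops=5):
    """Return {page: hops} for every page reachable from `start`
    by following at most `max_hops` links (breadth-first)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # hop limit reached, do not follow links from here
        for linked in link_graph.get(page, []):
            if linked not in hops:
                hops[linked] = hops[page] + 1
                queue.append(linked)
    return hops


# Hypothetical chain of pages: a links to b, b links to c, and so on
link_graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"],
              "e": ["f"], "f": ["g"], "g": []}
print(pages_within_hops(link_graph, "a"))
```

In this chain, page g sits six links away from page a, so it falls outside the limit and is not visited.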

This process is why external links that come to your site are so important. The higher the quality of the pages that ultimately link to you, the better your chances of ranking well in the all-powerful Google Search.

This is what many SEO companies charge big money for: creating (or allowing the creation of) many links to your site from high-quality websites, using the keywords you want to be found by. It’s not the ONLY thing that an SEO company might do, but it’s almost guaranteed to be on the list.

 

How Can I Tell What Google has Indexed?

Google Search Console showing Indexed Pages

Although you NEED your site to be crawled, you WANT it to get indexed. There are several ways to determine what Google has indexed on your site.

One is to simply go to Google.com, click on Settings at the bottom right, and choose Advanced Search. From there, scroll down to “site or domain,” enter your website, and hit Search. This will show you what Google has indexed from your site. It should include pages, posts, and photos, and possibly other items such as feeds.
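
A similar lookup can be done by typing the site: operator straight into the search box, for example site:example.com. If you check several domains regularly, a tiny Python helper like the sketch below can build the search link for you. The function name is ours and example.com is a placeholder for your own domain.

```python
from urllib.parse import urlencode


def site_search_url(domain):
    """Build a Google search URL that uses the site: operator for `domain`."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


# Placeholder domain; swap in your own
print(site_search_url("example.com"))
# prints https://www.google.com/search?q=site%3Aexample.com
```

Open the printed link in your browser to see the results for that domain.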

The preferred way to see exactly what Google has indexed (because you have some control over fixing it) is to use Google Search Console (previously named Google Webmaster Tools). We aren’t covering how to set up Google Search Console in this article, but if you have a website, it NEEDS to be done.

Google Search Console lets you upload an XML Sitemap, which lets you tell Google what you would LIKE for them to index and how often they should check back for changes. Google Search Console also provides a ton of valuable information on your website and is really the only two-way communication with Google that exists.

It is also always a good idea to run a quick, free SEO report on your website. The best automated SEO audits will provide information on your robots.txt file, a very important file that lets search engines and other crawlers know whether they CAN crawl your website. Although some of the free SEO reports you will find across the web may be nothing more than lead generation tools, One Click SEO offers (what we consider to be) the Best SEO Audit Tool, with the promise that no one will harass you.
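
If you are curious what a robots.txt check involves, Python ships with a parser for these files. The sketch below is only an illustration: the rules, the URLs, and the helper name `can_crawl` are made up, and real crawlers may interpret some rules a little differently than this standard-library parser does.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if `robots_txt` allows `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules: every crawler is kept out of /private/
rules = """User-agent: *
Disallow: /private/
"""

print(can_crawl(rules, "Googlebot", "https://example.com/blog/post"))
print(can_crawl(rules, "Googlebot", "https://example.com/private/page"))
```

With these sample rules, the blog post is open to crawlers and anything under /private/ is blocked.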

 

How Does Google Decide What to Index?

This is the real question everyone should be asking. At the end of the day, Google will index new, fresh content that it believes will improve the user experience of THEIR clients: the people who go to Google and search for something. Google is very picky about providing the most relevant websites for a specific search term. If you’re copying pages or using copy that’s otherwise already in its index, then Google has no need to index yours.

You may have heard the term “Duplicate Content” thrown around in SEO articles. Duplicate content is a point of contention for many SEO gurus, but I say that at best, it confuses Google on which page to rank, and at worst, you get penalized. At the end of the day, stay away from duplicate content.  But I digress.

If what you wrote is BETTER or provides more information, or if Google otherwise believes that showing your page instead of the others will give their clients a better experience, they are more likely to index and rank your page. This is why providing fresh, SEO-rich blog content is so important. The more quality pages you have indexed, with internal links to other pages within your site, the better for SEO.

 

YAY!  Now I Understand SEO!

Not Quite! We are just scratching the surface of what Google likes and how to effectively leverage SEO. Depending on your type of business, there are different ways to have your company found in a Google search. For instance, if you are a bricks-and-mortar business with a storefront, you’ll want to focus on Local SEO.

Local SEO focuses on searches that include a city or location. For instance, if you wanted to find an SEO service in New Orleans, you’d Google “New Orleans SEO.” That type of search will provide you with local results for search engine optimization companies. If you’re a dry cleaner, you know this type of searching is important to you, but if you provide online training, then your geographical location isn’t as important.
