What are Indexing and PageRank, and why are they so important?

Gyula Olah - electrical engineer, author, publisher

When a search engine such as Google (the king of search engines nowadays) has visited your site and added a page to its index, that page is said to be indexed.
To do this at scale, Google operates an enormous computer park that crawls billions of web pages every day. The program that does the crawling is called Googlebot.
Googlebot reads and analyzes the content and structure of your web site. In effect, it creates a special “map” that records the number and location of your keywords, the relationships between words, the links, the images and their ALT attributes, and the heading structure. All of these elements have a great impact on successful indexing. This index of your site, together with the relevant incoming links, is the major factor in where your site appears in the natural search results (also called “organic search” results) on the search engine results pages (SERPs). Googlebot can also “remember” the results of the previous crawl and compare them with the current one. This is where the sitemap becomes important.

How can a search engine determine which URLs to index (in short: what is indexing)?

Once your site is ready, create a sitemap that lists all of your page URLs. See the example below:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Index Status - Example - What Is Indexing And PageRank

Note: Your automatically generated sitemap (e.g. a Blogger or WordPress sitemap) may differ from the example above. Furthermore, if your domain does not use the “.com” extension, use your own instead. The “changefreq” value indicates how often your content changes. Always double-check your sitemap.xml file before uploading it and submitting it to the search engines.
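
If you prefer to generate the file yourself, a sitemap of this kind can be produced with a short script. The following is only a minimal sketch in Python (3.8 or newer), using the standard xml.etree.ElementTree module. The function name build_sitemap, the www.example.com URLs and the changefreq values are assumptions made up for illustration, not part of any official tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages: (URL, how often the content is expected to change).
EXAMPLE_PAGES = [
    ("http://www.example.com/", "weekly"),
    ("http://www.example.com/about.html", "monthly"),
]


def build_sitemap(pages):
    """Return the sitemap XML (as bytes) for a list of (url, changefreq) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    # xml_declaration=True adds the <?xml ...?> header line.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap(EXAMPLE_PAGES)
print(xml_bytes.decode("utf-8"))
```

To use the result, save the printed output as sitemap.xml, check it, and upload it to your site as described in the note above.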

Although sitemaps do not directly affect your place in the search results, using them is strongly recommended (for example, if several standalone WordPress-based sites belong to your main site), because their content describes your entire site’s structure (pages). If your sitemap is up to date (it should be), it also informs the spiders (bots) about changes on your site. Without it, the crawler may not notice your changes in time, which can be a disadvantage for you.

Note: Microsoft (Bing) and Yahoo both officially support the Open Sitemap Standard. Google doesn’t currently provide sitemap generating tools. If you want to create a sitemap, visit http://www.xml-sitemaps.com, where you can generate one for sites with fewer than 500 URLs.

If you want to learn more about today’s topic, “what is indexing”, and how this process looks in practice, visit my post: How Search Engines Work In Practice – Web Crawlers, Spiders.

How to discourage search engines from indexing some URLs?

If there are parts or sections (URLs) of your site that you don’t want indexed (an indexed URL may come up in the search results, which is not always useful for you), you should use a robots.txt text file.
I’ll quote www.scrubtheweb.com’s description of robots.txt, because they have done great work in summarizing it (and in other fields, too):

“There are two main reasons why we recommend you use the robots.txt file. They are:

  1. Every robot type search engine that visits your Web site will always request the robots.txt file before doing anything else. Because of this you should provide this file. If you do not have the robots.txt file on your Web site then your Web server will send a Page Not Found error message to the search engine that is visiting. This is taxing on your Web server and errors should always be avoided when possible. Simply including a robots.txt file on your Website prevents these errors and keeps search engines happy.
  2. Ask, Bing, Google™, Yahoo!®, and other search engines now allow you to use the robots.txt file to provide the Sitemap: directive to help search engines index your entire Website better.” – www.scrubtheweb.com

What is the content of this robots.txt file?

It is usually very simple (at least on an average HTML site; it is not the same on e.g. Blogger):

a. To Allow All Search Engines To Access Your Entire Website

User-agent: *
Disallow: /azr94v2hh2lg/

Note: this “/azr94v2hh2lg/” “tells Google as well as any other robot type search engine to stay out of your “/azr94v2hh2lg/” directory. Of course you do not have this directory on your Web server. By doing this we’re providing a real disallow directive which will keep search engines happy and not deliver any errors. You can replace the “/azr94v2hh2lg/” with any other directory name you wish just as long as the directory does not and will never be part of your Website and it will have the same effect.” – www.scrubtheweb.com.

b. To Exclude All Search Engines From Part Of The Website

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

The file has to be placed in your root directory, so its correct location is: http://www.example.com/robots.txt
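
Before uploading the file, you can check how rules like those in example b would be interpreted. The following is a minimal sketch using Python's standard urllib.robotparser module. The page URLs under www.example.com are made up for illustration, and real search engines may interpret some details differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from example b, exactly as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs, used only to test the rules above.
blocked = parser.can_fetch("*", "http://www.example.com/private/report.html")
allowed = parser.can_fetch("*", "http://www.example.com/index.html")

print("private page may be crawled:", blocked)  # False
print("home page may be crawled:", allowed)     # True
```

The same check works for a live site: call set_url() with the robots.txt address and then read() instead of parse().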

What is PageRank (ranking), and how do search engines rank pages?

PageRank - Example - What Is Indexing And PageRank?

When Google indexes your page(s), it also uses specific metrics to “grade” your site. As the example above shows, Google wants to return the best results when you search for something. It is not a game, it is a business. Go to the search engine and see it at work. Imagine how many web pages the search engine has to read through in order to give you the best results in return.
PageRank is a number that helps determine the position of a web page in the search results. It indicates how prominently a particular web page may come up during a search and how relevant the search engines consider it to the given topic.
For this purpose, all major search engines use special mathematical formulas, called “ranking algorithms”, to determine the rank of a web page (Google likes to name its algorithm updates after animals, such as “Panda” and “Penguin”). These algorithms are ever-changing: Google in particular revises its algorithm almost every year. Although the exact mathematical algorithms vary between search engines, the principle is the same: to assign a position to the examined web page.
Let’s quote Google’s explanation for determining the PageRank of a web page:

“Traditional search engines rely heavily on how often a word appears on a web page. Google uses PageRank™ to examine the entire link structure of the web and determine which pages are most important.”
“It then conducts hypertext-matching analysis to determine which pages are relevant to the specific search being conducted. By combining overall importance and query-specific relevance, Google is able to put the most relevant and reliable results first.”

“PageRank is patented by Stanford, and the name PageRank likely comes from Larry Page.

What Does PageRank Measure?
PageRank measures a web page’s importance.
Page and Brin’s theory is that the most important pages on the Internet are the pages with the most links leading to them. PageRank thinks of links as votes, where a page linking to another page is casting a vote.
This makes sense, because people do tend to link to relevant content, and pages with more links to them are usually better resources than pages that nobody links.
PageRank doesn’t stop there. It also looks at the importance of the page that contains the link. Pages with higher PageRank have more weight in “voting” with their links than pages with lower PageRank. It also looks at the number of links on the page casting the “vote.” Pages with more links have less weight.
This also makes a certain amount of sense. Pages that are important are probably better authorities in leading web surfers to better sources, and pages that have more links are likely to be less discriminating on where they’re linking.

How Important Is PageRank?
PageRank is one of many factors that determines where your web page appears in search result ranking, but if all other factors are equal, PageRank can have significant impact on your Google rankings.”
Source: http://google.about.com/od/searchengineoptimization/a/pagerankexplain.htm
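
To make the “links as votes” idea from the quote more concrete, here is a minimal sketch of a simplified PageRank calculation in Python. It is not Google's actual algorithm. The four-page site, the damping factor of 0.85 and the number of iterations are assumptions chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return an approximate PageRank score for every page in `links`.

    `links` maps each page to the list of pages it links to.
    Every linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page receives a small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its "vote" among the pages it links to,
                # so pages with more links pass less weight per link.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without links spreads its vote over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page site: nobody links to "orphan".
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this made-up example, “home” receives links from all three other pages and ends up with the highest score, while “orphan”, which nobody links to, ends up with the lowest.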

What does this mean? If your page has a higher PR than your competitors’ pages for the same search term, together with a good linking structure, it will come up more frequently in searches than theirs and/or will overtake them. From this short quote you can already learn the necessary ingredients of a higher ranking.

How can we reach a higher rank position?

We have examined the ranking process above. As you may remember, it is based on an ever-changing, special mathematical algorithm applied by search engines such as Google. Its components are:

1. A well-organized, well-built page and site structure with keyword-based, relevant content.

Title, Description, Meta Tags, keywords, IMG ALT Tags, site content.

How to write good site content?

As Google says: “Provide high-quality content on your pages, especially your home page. This is the single most important thing to do. If your pages contain useful information, their content will attract many visitors and entice webmasters to link to your site.”

Your site content has to be focused on your niche. You must know your product, service or offered information well and treat it as your specialty. You have to know what information people are looking for, and you should provide it. In one sentence: you have to create relevant content on your site, so that people can find what they are seeking.
Before writing your content, do keyword research. It is extremely important, because this is how you find out what people are currently looking for. The easiest way to search for keywords is to go to Google’s keyword tool (https://adwords.google.com/, now called Keyword Planner) or to Wordtracker. There you can explore the various aspects of the tool and examine the opportunities offered.
You also need to understand how people habitually search. The average number of words they type in is 3.1. This means that these three words play the most important role if you want to reach your future visitors or customers.

To make your work easier, I’ve written a completely standalone chapter on this site about the usage of the various Meta Tags. You can follow me through text and video presentations on how to optimize some simple example scripts. You will see how important these elements are: Importance And Usage Of Meta Tags, Content HTML To Get Higher Ranking.
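
The on-page elements listed above (title, description, IMG ALT tags) can also be checked automatically. The following is a minimal sketch in Python using the standard html.parser module. The class name OnPageAudit and the sample page are made up for illustration, and a real audit would check far more than this.

```python
from html.parser import HTMLParser


class OnPageAudit(HTMLParser):
    """Collect the title, the meta description and images without ALT text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.images_without_alt = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "img" and not attrs.get("alt"):
            # Missing or empty ALT attribute: remember the image source.
            self.images_without_alt.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used only to demonstrate the audit.
SAMPLE_PAGE = """<html><head>
<title>What Is Indexing And PageRank</title>
<meta name="description" content="A short introduction to indexing and PageRank.">
</head><body>
<h1>What Is Indexing?</h1>
<img src="index-status.png" alt="Index status example">
<img src="pagerank.png">
</body></html>"""

audit = OnPageAudit()
audit.feed(SAMPLE_PAGE)
print("Title:", audit.title)
print("Description:", audit.description)
print("Images without ALT text:", audit.images_without_alt)
```

For the sample page, the audit reports one image, pagerank.png, that still needs ALT text.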

2. A good and reliable linking structure; that is, you need quality links.

If you’ve read through Google’s explanation above, you already know that the number of quality incoming links is very important in the ranking process. The opposite is also true: avoid using, and accepting, low-quality or spammy links, because they can decrease your popularity and your PageRank as well.

How to obtain high quality backlinks?

First of all, the content you offer on your web site must be valuable to visitors. This is the best way to build a reader base. If someone considers your content, or a particular article on your site or blog, worth “importing” to their own site, you’ll get valuable trackbacks and backlinks in return.
There are many other ways as well, such as taking part in relevant niche forums, writing useful and interesting (not spammy) comments on others’ web sites, and giving intelligent, inoffensive and valuable answers on Yahoo Answers or ASK. Avoid low-quality, automated bulk article submitter software, because it can easily turn your advantageous position into the opposite.
Later, I’ll come back to link building strategies.

Back to Google PageRank.

I almost forgot to mention one important thing: Google measures every single page on your site. As a result, your main page (the one that appears first when someone types in your domain name) may have a higher PageRank than the other, newer pages.

Do you want to see your page’s current Google PageRank? You can check it with the simple, easy-to-use tool at http://www.prchecker.info.

Note: Google™ search engine and PageRank™ algorithm are the trademarks of Google Inc.

What is Alexa Index (Rank)?

Alexa: Main Site Login Screen - What Is Indexing And PageRank

The Alexa Index is considered to be one of the most important standards on the Internet. Everywhere you go, you will find this Alexa number or ranking. The lower the number, the better the position of your site on the Internet. The number is derived from special metrics that Alexa uses.

As they define:

Alexa.com, “The Global Leader in Analytics

Alexa is the leading provider of free, global web metrics. Search Alexa to discover the most successful sites on the web by keyword, category, or country. Use our analytics for competitive analysis, benchmarking, market research, or business development. Use Alexa’s Pro tools to optimize your company’s presence on the web.”

One thing is quite sure: everyone who deals with SEO will first ask for your site’s Google PageRank, then its Alexa Rank, the number of Google and Bing links, the number of incoming/outgoing links, and then its popularity, such as Facebook Likes (page + own account), Twitter tweets, presence on StumbleUpon, Digg, and so on.
For your guidance, the top sites as of 24th May 2015 are:
1. Google.com
2. Facebook.com
3. Youtube.com
4. Yahoo.com
5. Baidu.com
6. Amazon.com
7. Wikipedia.org
8. Taobao.com
9. Twitter.com
10. Qq.com
11. Google.co.in
12. Live.com
and more. This short list is based on the public “top 500 list” of Alexa.com, www.alexa.com.

If you want to know more about Alexa, such as its history or the Wayback Machine (interesting; see a short intro below), please visit Wikipedia’s page: http://en.wikipedia.org/wiki/Alexa_Internet.

The Wayback Machine is really interesting.

The Wayback Machine is a digital archive of the World Wide Web and other information on the Internet created by the Internet Archive, a non-profit organization, based in San Francisco, California. It was set up by Brewster Kahle and Bruce Gilliat, and is maintained with content from Alexa Internet. The service enables users to see archived versions of web pages across time, which the Archive calls a “three dimensional index.” – taken from Wikipedia, http://en.wikipedia.org/wiki/Wayback_Machine.

Wayback Machine - What Is Indexing And PageRank

As you can see, Alexa has played an important role in collecting web data for years. Using this data, supported by the Wayback Machine, you can easily travel back in time and see how your own site looked in the past (in case you forgot), or analyze other sites. This information can be very useful when you investigate or study various sites.

I hope I have answered the question “What is indexing and PageRank?” and that this information contributes to your better understanding.

As you can see, there is a lot to do. So, chin up and go ahead! Examine the Importance And Usage Of Meta Tags.

Thank you for reading!
Gyula Olah, electrical engineer, author, publisher, and owner of the SEO Service Guide site.
