PageRank Basics
PageRank measures the probability that a random user will hit a particular Web page.
It is a valid proxy measure of popularity and quality of a document. Search results
on Google are ranked by over 100 factors. PageRank is one of the most important factors.
For well-optimized pages, competing for top rankings on Google is really a competition in
PageRank, or link popularity. As I explained in
Search Engine Optimization,
a page with lower PageRank can still be placed at the top for certain search terms by optimizing
on-page ranking factors.
The mathematical formula of PageRank is simple yet elegant.
PR(A) = (1-d) + d (PR(P1)/N(P1) + ... + PR(Pi)/N(Pi) + ... + PR(Pn)/N(Pn))
PR(A) - the PageRank of a page A.
d - a damping factor between 0 and 1, usually set to 0.85.
PR(Pi) - the PageRank of an incoming page Pi, one of the n pages that link to A.
N(Pi) - the number of outgoing links on the page Pi.
The PageRank of a Web page is the damped sum of the PageRank contributed by all pages linking
to it, plus the constant (1-d). Each linking page contributes its own PageRank value divided by the number of links on that page.
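As a minimal sketch of how the formula can be evaluated, the Python below applies it repeatedly to a small link graph until the values settle. The three-page graph, the function name pagerank, the starting value of 1.0, and the fixed iteration count are illustrative assumptions, not a description of how Google computes it.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(Pi) / N(Pi)) over a link graph.

    links maps each page to the list of pages it links to
    (assumed to contain no duplicate targets).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    pr = {page: 1.0 for page in pages}  # starting guess (an assumption)
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page Pi linking here contributes PR(Pi) / N(Pi).
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical site: A and B link to each other, and C links to A.
example_links = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks = pagerank(example_links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

For this graph the values settle near 1.46 for A, 1.39 for B, and 0.15 for C. Page C has no incoming links, so it keeps only the (1-d) term.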
The algorithm doesn't distinguish internal links (links within a site) from external links
(links from other sites). The distinction still matters for page ranking,
because Web site owners control how pages are linked within their own site.
There are four factors that affect the PageRank value of a Web page.
- # of internal incoming links: the more pages within the site link to a page,
the higher the PageRank value of that page.
- # of internal outgoing links: the more pages a page links to, the lower its PageRank.
- # and quality of external incoming links: the more external pages link to a page, the higher its PageRank.
The higher the PageRank values of those external pages,
the higher the PageRank of the page.
- # of external outgoing links: the more external links a page has,
the lower the PageRank values of all pages within the site.
Displayed PR vs Calculated PR
Most people believe that an incoming link from a page with a PR value of 4 and 10 outgoing links is
worth more than a link from a page with a PR value of 6 and 100 outgoing links, based on the calculation:
4/10 = 0.4 > 6/100 = 0.06
This is the result of confusing the PR (0-10) displayed in the Google toolbar with the
PR calculated by the PageRank algorithm. How PR is calculated is public knowledge,
but how PR is displayed is a closely guarded trade secret of Google. A reasonable
assumption is that the total calculated PR at each level of displayed PR (0-10) is equal.
This assumption is supported by the fact that there are far more
pages with lower PR than pages with higher PR. My estimate shows that
calculated PR increases by a factor of 7 on average (ranging from 3 to 50 or more) for
each level of displayed PageRank. The estimated calculated PR for a displayed PR of 10
is about 3,000,000 if we assume there are 100 pages with a displayed PR of 10. The calculated PR for a displayed PR of 6 is about 2,000, and
the calculated PR for a displayed PR of 4 is about 60, so we have:
60/10 = 6 < 2000/100 = 20
The implication is simple: one quality incoming link is worth much more than
dozens of low-quality incoming links.
This calculation was performed before Google expanded its index database from 3.2 billion to 4.2 billion pages.
The expansion of the index database may increase the number of pages within each PageRank level (0-10),
but it will not necessarily decrease the PageRank levels of existing pages.
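The two comparisons above can be reproduced with a few lines of Python. The calculated PR values are the rough estimates given above (60 and 2,000), and the helper name link_value is hypothetical. The damping factor is left out because it scales both links equally.

```python
# Estimated calculated PR per displayed (toolbar) PR level, from the text above.
estimated_calculated_pr = {4: 60, 6: 2000}


def link_value(displayed_pr, outgoing_links):
    """Calculated PR passed through one link, before damping is applied."""
    return estimated_calculated_pr[displayed_pr] / outgoing_links


# Naive comparison: treats the toolbar value as if it were calculated PR.
naive_pr4 = 4 / 10
naive_pr6 = 6 / 100

# Comparison based on the estimated calculated PR.
value_pr4 = link_value(4, 10)
value_pr6 = link_value(6, 100)

print("naive:    ", naive_pr4, "vs", naive_pr6)
print("estimated:", value_pr4, "vs", value_pr6)
```

The naive figures favor the PR 4 page, while the estimated figures favor the PR 6 page even though it has ten times as many outgoing links.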
Impact of Outgoing External Links
Some believe that the PR value of a page is not affected by the number of
its outgoing links. This would be true if the algorithm
were not recursive. In the recursive calculation, however, the PageRank a page gives away affects what flows back to it.
The simulation using PageRank Calculator demonstrates that
a large number of external outgoing links significantly reduces the PageRank
values of a site.
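The effect can be illustrated with a small sketch, which is not the PageRank Calculator mentioned above. It assumes a hypothetical three-page site in which every page links to the other two, then adds external links from page A to pages that never link back. The function names, the page labels, and the graph itself are assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(Pi) / N(Pi)) over a link graph."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pr = dict.fromkeys(pages, 1.0)  # starting guess (an assumption)
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in pages
        }
    return pr


def site_total(external_links):
    """Total PR of the site pages A, B, C when A has extra external links."""
    # External pages X0, X1, ... receive a link from A and link to nothing.
    externals = [f"X{i}" for i in range(external_links)]
    links = {
        "A": ["B", "C"] + externals,
        "B": ["A", "C"],
        "C": ["A", "B"],
    }
    pr = pagerank(links)
    return pr["A"] + pr["B"] + pr["C"]


totals = {k: site_total(k) for k in (0, 2, 10)}
for k, total in totals.items():
    print(k, "external links -> site total", round(total, 2))
```

Under these assumptions the site total drops from about 3.0 with no external links to about 1.46 with two and about 1.04 with ten. The PageRank sent to the external pages never returns to the site.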