The idea of choosing a rather audacious title for the article is not just to catch a few eyeballs! It’s a genuine attempt to point out some obvious flaws in the ubiquitous Google Search. For decades now, we have grown accustomed to accepting what Google Search offers as the best results for our queries. But are they really the best results? Let’s find out!
Quite apart from the way Google performs its search, there is an alternative perspective on searching, supported by the concept of Information Foraging. This theory has fascinated many researchers, and empirical evidence suggests that search results based on it may match user queries better. More on this promising concept below.
Before pointing out what Google Search does not do well, it’s only fair that I highlight what it does remarkably well. So, let me start with a summary of how Google Search works and how it pulls search results out of trillions of web pages.
Instead of merely theorizing the search process, let’s use this article itself as a demo case and see how Google search would discover it once it gets published on the Whatfix Academy.
Google, in its own words, divides its search process into 3 steps:
60 TRILLION+ WEB PAGES are available on the internet. Google first navigates the web by a process called CRAWLING, which means links are followed from page to page. This way new pages are identified, changes to existing pages are picked up, and obsolete content is cleaned out regularly.
A massive hardware and software infrastructure is employed for this crawling process. It consists of a set of powerful computers running a complex algorithmic program nicknamed “Googlebot”, which determines which sites to crawl, how often, and how many pages to fetch from each site.
[Figure: How Googlebot Works]
Google’s crawl process begins with a list of web page URLs, generated from previous crawl processes and augmented with Sitemap data provided by webmasters. But even the most invasive web crawlers cover only about 40-70% of the web pages on the internet. Site owners can choose whether their sites are to be crawled fully, partially or not at all. This matters especially because crawlers are a big drain on system resources, and a website may not be able to afford the load.
So how would this article get crawled and recognized by Google search, for instance? First, there would already be several references to Whatfix articles in the Google Index discovered during previous crawl attempts. A link from the Whatfix Academy home page will lead the crawler to this new article page.
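The link-following idea described above can be sketched in a few lines of Python. This is only a toy illustration, not Googlebot’s actual algorithm: the page names and the `TOY_WEB` link map are made up for the example, and the "web" is an in-memory dictionary rather than real HTTP fetches.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "academy/home": ["academy/old-article", "academy/new-article"],
    "academy/old-article": ["academy/home"],
    "academy/new-article": [],
    "academy/orphan": [],  # nothing links here, so a crawler never finds it
}


def crawl(seeds, web, blocked=frozenset()):
    """Follow links breadth-first from the seed URLs.

    `seeds` plays the role of URLs known from earlier crawls;
    `blocked` stands in for pages a site owner has opted out of crawling.
    Returns the pages in the order they were discovered.
    """
    discovered = []
    visited = set()
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if url in visited or url in blocked or url not in web:
            continue
        visited.add(url)
        discovered.append(url)
        queue.extend(web[url])  # schedule every link found on this page
    return discovered


# Start from an article already known from a previous crawl.
discovered = crawl(["academy/old-article"], TOY_WEB)
print(discovered)
```

Starting from an already-known article, the crawler reaches the home page and, through it, the new article. The orphan page, which no other page links to, is never found, which is one reason no crawler covers the whole web.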
Index Process of Google
The end result of the extensive crawling process is an INDEX of all visited web pages. A list of keywords or phrases present in each web page, known as “metadata”, is compiled and mapped to the index, with particular attention to information in key content tags and attributes, such as Title tags and ALT attributes. However, rich media files and dynamic web pages get ignored.
Coming back to our example. After crawling, the Google Index would now have a reference to this article along with a few keywords picked up as metadata. We might choose to pick phrases like “How search works”, “Information Foraging”, etc., and ensure they are made visible to the Googlebot. So the article’s URL and the chosen few keywords find a place in the Google Index.
The Google index takes the content it receives from Googlebot and uses it to rank pages. Google search INDEX is supposedly over 100 million gigabytes!
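To make the idea of an index concrete, here is a minimal sketch of an inverted index in Python: a map from each keyword phrase to the pages that carry it. The URLs and phrases are invented for illustration, and a real search index is of course far more elaborate than this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each keyword phrase (lower-cased) to the set of URLs that carry it."""
    index = defaultdict(set)
    for url, phrases in pages.items():
        for phrase in phrases:
            index[phrase.lower()].add(url)
    return index


def lookup(index, phrase):
    """Return the URLs indexed under a phrase, in alphabetical order."""
    return sorted(index.get(phrase.lower(), set()))


# Hypothetical crawl output: URL -> keyword phrases picked up from the page.
crawled_pages = {
    "whatfix/how-search-works": ["How search works", "Information Foraging"],
    "example/foraging-intro": ["Information Foraging"],
}

index = build_index(crawled_pages)
print(lookup(index, "information foraging"))
print(lookup(index, "how search works"))
```

Because the index is built ahead of time, answering a query is a quick lookup rather than a fresh scan of every page.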
All the above discussion is about what Google does even before a user types a search query into the search window! That is because Google doesn’t have time to do much after the query is entered. The results are supposedly presented to the user in less than 1/8th of a second, according to Google!
So, once the query keywords are entered, Google Search simply goes to the Index and hunts for the most relevant match. Relevance is determined by many factors, the most significant one being PageRank. PageRank is a measure of how well “connected” the web page is, that is, how many incoming links from other web pages lead to it.
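The published PageRank formula goes a step beyond counting incoming links: each page passes a share of its own rank to the pages it links to, so a link from a well-ranked page counts for more. The sketch below is a simplified textbook version run on a made-up four-page link graph. The damping value of 0.85 is the commonly quoted default, and nothing here is meant to reflect how Google computes rankings in production.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps every page to the list of pages it links to; every
    link target must itself be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: pages a, b and d all link to c; c links back to a.
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page c collects the most incoming links and ends up on top. Page a has only one incoming link, but because that link comes from the highly ranked c, it still outranks b and d, which nobody links to.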
Google search has recently killed its right-hand side ads and flooded the visible part of Page 1 results with sponsored links. So even the best result according to Google gets a mention only at the bottom of the page!
Taking our example forward, let’s assume an inquisitive web user has just typed “How Google search works” in his browser. What are the tricks I could employ to make sure my article shows up in the search results somewhere on top?
If I sit back and do nothing, it would probably show up on Page 6 or 60 of Google search, which is as good as not being there. So, I start by “pushing” the article on social media. Post it on Facebook (of course already having a million friends on Facebook helps!). Share to all LinkedIn connections and insist that they share it further. Tweet about anything under the sun but insert the link of the article on the sly somewhere in the middle! In short, I need to spend more time marketing my article than I did to write it in the first place!
In all the noise about crawling, indexing and ranking, one crucial fact is sidelined. Web pages get noticed by Google search simply due to the efforts of the webmaster or the creators of web content. But what about the opinion of the consumer of the content, the one who reads the web page and evaluates whether the content is satisfactory or not?
Google’s answer to this question is this. PageRank of a web page increases if readers recommend the page explicitly, in terms of likes, shares, comments, responses, grades and so on. But herein lies the catch. How many readers of online content are active in expressing their opinion? Statistics reveal that the number of “Like” or “Share” responses is merely 1.5-2% of the total number of viewers. So, the key to this puzzle might be to find out what a passive consumer feels about the page, from his implicit behavioral actions.
Information Foraging theory provides a few answers to this question. It is a product of the Human-Computer Interaction research at the Palo Alto Research Center (previously Xerox PARC) by Stuart Card, Peter Pirolli, and colleagues. Information foraging uses the analogy of wild animals gathering food, to analyze how humans collect information online.
Let me bypass the complex mathematical models associated with the information foraging theory and cut to the chase. Key findings of the information foraging theory are as follows:
Assigning maximum weightage to the response of the reader, who is the ultimate consumer of the web content, is a paradigm shift in search result ranking. It stands in contrast to ranking a web page on its creator’s ability to build strong linkages, which is what Google follows and encourages.
When an information seeker finds a good web source, what are the things he is likely to do? Let us visualize:
It has been experimentally shown that the above implicit actions have a strong positive correlation with the user explicitly liking the contents. By assigning proportional weightage to these user actions, we can derive a page ranking based on the merit of the web page content, rather than the marketing muscle of a website. If search results are sorted based on such a metric, the quality of web content is likely to improve.
Let’s take a look at a Case Study and compare the search results the two approaches would produce.
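As a rough illustration of such a weighted metric, the Python sketch below scores pages by the implicit actions of their readers. Everything in it is an assumption made for the example: the signal names, the weights and the sample sessions are hypothetical and are not taken from the information foraging research.

```python
# Hypothetical implicit signals and weights, chosen only for illustration.
WEIGHTS = {
    "read_to_end": 0.4,
    "bookmarked": 0.3,
    "returned_later": 0.2,
    "copied_text": 0.1,
}


def implicit_score(sessions):
    """Average weighted signal total per reader session (0.0 to 1.0).

    `sessions` is a list of sets; each set holds the signals observed
    in one reader's visit. Unknown signal names are ignored.
    """
    if not sessions:
        return 0.0
    total = sum(
        sum(WEIGHTS[signal] for signal in session if signal in WEIGHTS)
        for session in sessions
    )
    return total / len(sessions)


def rank_pages(observations):
    """Sort page URLs by implicit score, best first."""
    return sorted(observations, key=lambda url: implicit_score(observations[url]), reverse=True)


# Made-up observations: a few engaged readers versus many skimming ones.
observations = {
    "local-paper/story": [{"read_to_end", "bookmarked"}, {"read_to_end", "returned_later"}],
    "national-daily/snippet": [{"copied_text"}, set(), set()],
}

ranked = rank_pages(observations)
print(ranked)
```

Averaging per session, rather than summing, is a deliberate choice in this sketch: it keeps a page with a small but engaged readership from being drowned out by a page with many visitors who barely read it.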
Take for instance an online search for a news item covering a regional event. The best coverage is present in a local online newspaper, which has a few hundred loyal and passionate readers and carries articles written with heart and soul. Ideally, this article should take precedence over a superficial, brief news snippet churned out by a national daily covering the same issue. Google search will, unfortunately, pick the national daily.
Alternative search models that give due importance to content, rather than networking clout, are the way forward. So, if you search for “interactive walkthroughs”, Whatfix pops up among the first results!