Monday, January 28, 2008

Google Q&A

from “55 Ways to Have Fun With Google” by Philipp Lenssen, 2006

Google Q&A is a fun answer feature built directly into the Google.com web search. It answers certain questions right above the search results, so there’s no need for you to visit a web page – the answers themselves are extracted from web pages.

You haven’t seen this before? Give it a try by entering the following:


Albert Einstein birthday

Above the web page results there will now be a box reading:

Albert Einstein – Date of Birth: 14 March 1879


This works with a whole lot of search queries. You can even enter Who is Clark Kent and have Google reveal to you “Clark Kent is the civilian secret identity of the fictional character Superman.” All of the following yield direct Q&A results (note that the answers are not always correct!):


Population of Germany

President of USA

President of France

Birthday of George Bush

Birthday of Albert Einstein

What is the birthday of Albert Einstein?

Who was President of the USA in 1996?

When did Isaac Asimov die?

Isaac Asimov date of birth

Isaac Asimov birthday

What is the birthplace of Bono?

Bono birth place

Who is Prime Minister of England?

Where is the Eiffel tower

Where is the Statue of Liberty

When was Star Wars released?

Who is the Queen of the United Kingdom?

Who wrote the Hitchhiker’s Guide to the Galaxy

Catch-22 author


Permutated Sentences


Before Google’s Q&A feature, a fun way to find instant facts was to move around the words of a question sentence until you hit on an answer. To explain, let’s say your question is “When was Albert Einstein born?” We remove the first word, “when”. We’ll now do a search for the several possible rearrangements of the words, and check the Google page count for each:


“Albert was Einstein born” (0 results)

“born was Albert Einstein” (0 results)

“Albert Einstein was born” (17,500 results)

“Albert was born Einstein” (5 results)

... and so on.


Of these phrase searches, the one returning the most results is our “fact finder.” In this case it would almost certainly be “Albert Einstein was born”, and the continuation of this sentence contains our answer. This can be automated, but it takes a while, as going through all permutations requires many Google searches. FindForward’s “Ask Question” search (findforward.com/?t=answer) returns the following answer (you can see there are some leftovers from the snippet which aren’t meaningful in this context):


1879, Albert Einstein was born on March 14, 1879 German born American physicist who developed the special and general theories of relativity.
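The permutation search described above could be automated roughly as follows. This is a minimal Python sketch, not FindForward's actual code: the function names are made up for illustration, and the result counts are simply the sample figures quoted earlier rather than live Google data.

```python
from itertools import permutations


def phrase_candidates(question):
    """Drop the leading question word and return every ordering
    of the remaining words as a quoted phrase search."""
    words = question.rstrip("?").split()[1:]
    return ['"' + " ".join(order) + '"' for order in permutations(words)]


def best_phrase(counts):
    """Return the phrase with the highest result count."""
    return max(counts, key=counts.get)


candidates = phrase_candidates("When was Albert Einstein born?")

# Sample counts taken from the text above (hypothetical stand-in for
# the page counts a real search would return).
sample_counts = {
    '"Albert was Einstein born"': 0,
    '"born was Albert Einstein"': 0,
    '"Albert Einstein was born"': 17500,
    '"Albert was born Einstein"': 5,
}

print(len(candidates))             # number of phrase searches needed
print(best_phrase(sample_counts))  # the "fact finder" phrase
```

Four remaining words already mean 24 phrase searches, which is why the text warns that the automated approach takes a while.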

Google Hacking

from “55 Ways to Have Fun With Google” by Philipp Lenssen, 2006

Yes, I am a criminal. My crime is that of curiosity.

Mentor, The Hacker Manifesto


There’s a sport called “Google Hacking” which is all about searching for seemingly private websites using Google. In fact, you can only find public websites using Google, because private (password-protected) pages can’t be found by Google – so it’s no real hacking (let alone “cracking,” which would consist of deleting, changing or abusing the found data). But it’s fun nevertheless, and often enables people to discover pages someone was hoping would stay private. This happens when the site is misconfigured, i.e. when the webmaster doesn’t know enough about how to set up a website.


Here are some of the most popular and powerful “Google hack” search queries. Enter them at your own risk, and know that every once in a while you step onto a so-called honeypot (a fake website set up to lure hackers into it, with the goal of finding out more about them and their tactics).


Finding Error Messages


Search for: “A syntax error has occurred” filetype:ihtml


You’ll find: Pages which caused errors the last time Google checked them. This may hint at vulnerabilities or other unwanted side-effects.


How this works: The first phrase simply looks for an error message the target server once output. The “filetype” operator, on the other hand, restricts the result pages to those which have the “ihtml” extension (used by sites running Informix). A related search is “Warning:mysql_query()”.


Finding Seemingly Private Files

Search for: (password | passcode) (username | userid | user) filetype:csv


You’ll find: Files containing user names and similar.


How this works: The “filetype” operator makes sure only “Comma Separated Values” files will be returned. Those are not typical web pages, but data files. “(password | passcode)” tells Google the file must contain either the text “password” or “passcode,” or both (the “|” character means “or”). Also, result pages are restricted to those containing any of the words “username,” “userid” or “user.”
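Queries of this shape are easy to assemble by hand, but a small helper makes the structure explicit. The following is only an illustrative Python sketch: build_query is a hypothetical name, and it assumes the | (pipe) character is used as the stand-in for OR inside each parenthesized group.

```python
def build_query(or_groups, filetype=None):
    """Join each group of alternatives with the | (OR) character,
    wrap it in parentheses, and append an optional filetype: filter."""
    parts = ["(" + " | ".join(group) + ")" for group in or_groups]
    if filetype:
        parts.append("filetype:" + filetype)
    return " ".join(parts)


query = build_query(
    [["password", "passcode"], ["username", "userid", "user"]],
    filetype="csv",
)
print(query)
```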


Finding File Listings


Search for: intitle:index-of last-modified private


You’ll find: Pages which list files found on the server.


How this works: The “intitle” operator used above will ensure that the target page contains the words “Index of” in the title. This is typical for those open directories which list files (they will have a title like “Index of /private/foo/bar”). “Last modified” on the other hand is a column header often used on those pages. And the word “private” makes sure we’ll find something of interest. A related search query which finds FTP (File Transfer Protocol) information is intitle:index.of ws_ftp.ini


Finding Webcams


Search for: “powered by webcamXP” “ProBroadcast”


You’ll find: Public webcams set up by people to film a location, or themselves.


How this works: “Powered by WebcamXP” is a text found on specific kinds of webcam pages. A related search query to find cameras is inurl:“ViewerFrame?Mode=”.


Finding Weak Servers


Search for: intitle:“the page cannot be found” inetmgr


You’ll find: Potentially weak (IIS4) servers.


How this works: An old Microsoft Internet Information Server (IIS) installation may hint at security issues. This is one of many approaches that can be used to find such a weak server.


Finding Chat Logs


Search for: something “has quit” “has joined” filetype:txt


You’ll find: Chat log files showing what people talked about in a chat room.


How this works: Though the files found are all public, not everyone chatting on IRC (the Internet Relay Chat) is aware of potential logging mechanisms. The “filetype” operator makes sure only text files are found, and “has quit”/ “has joined” are automated messages appearing in chat rooms. This search is your chance to tune into people’s chatter. Note you should replace “something” with the thing you are looking for.


Thursday, January 17, 2008

Special Syntax

from “Google Hacks” by Paul Bausch, Tara Calishain, Rael Dornfest, 2006

In addition to the basic AND, OR, and phrase searches, Google offers some rather extensive special syntax for narrowing your searches.

As a full-text search engine, Google indexes entire web pages instead of just titles and descriptions. Additional commands, called special syntax, or advanced operators, let Google users search specific parts of web pages for specific types of information. This comes in handy when you're dealing with more than eight billion web pages and need every opportunity to narrow your search results. Specifying that your query words must appear only in the title or URL of a returned web page is a great way to narrow your results without making your keywords themselves too specific. Following are descriptions of the special syntax elements, ordered by common usage and function.

intitle:

intitle: restricts your search to the titles of web pages. The variation allintitle: finds pages in which all the specified words appear in the title of the web page. Using allintitle: is basically the same as using intitle: before each keyword:

intitle:"george bush"

allintitle:"money supply" economics

You may wish to avoid the allintitle: variation because it doesn't mix well with some of the other syntax elements.

intext:

intext: searches only body text (i.e., it ignores link text, URLs, and titles). While its uses are limited, it's perfect for finding query words that might be too common in URLs or link titles:

intext:"yahoo.com"
intext:html

There's an allintext: variation; but again, this doesn't play well with others.

inanchor:

inanchor: searches for text in a page's link anchors. A link anchor is the descriptive text of a link. For example, the link anchor in the HTML code <a href="http://www.oreilly.com">O'Reilly Media</a> is "O'Reilly Media."

inanchor:"tom peters"

As with other in*: syntax elements, there's an allinanchor: variation, which works in a similar way (i.e., all the keywords specified must appear in a page's link anchors).

site:

site: allows you to narrow your search by a site or by a top-level domain. The AltaVista search engine, by contrast, has two syntax elements for this function (host: and domain:), but Google has only the one:

site:loc.gov
site:thomas.loc.gov
site:edu
site:nc.us

Be aware that site: is no good for searching for a page that exists beneath the main or default site (i.e., in a subdirectory such as /~sam/album/). For example, if you're looking for something below the main GeoCities site, you can't use site: to find all the pages in http://www.geocities.com/Heartland/Meadows/6485/; Google returns no results. Use inurl: instead.

inurl:

inurl: restricts your search to the URLs of web pages. This syntax usually works well for finding search and help pages because they tend to be regular in composition. An allinurl: variation finds all the words listed in a URL but doesn't mix well with some other special syntax:

inurl:help
allinurl:search help

You'll see that using the inurl: query instead of the site: query has one immediate advantage: you can use it to search subdirectories.
While the http:// prefix in a URL is ignored by Google when used with site:, search results come up short when it is included in an inurl: query. Be sure to remove prefixes in any inurl: query for the best (read: any) results.
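If you build inurl: queries from URLs copied out of a browser, the prefix can be stripped automatically. The helper below is a minimal Python sketch; inurl_query is a hypothetical name, and treating https:// the same way as http:// is an assumption not covered by the text.

```python
def inurl_query(url):
    """Build an inurl: query from a URL, dropping any scheme prefix first."""
    for prefix in ("http://", "https://"):
        if url.lower().startswith(prefix):
            url = url[len(prefix):]
            break
    return "inurl:" + url


print(inurl_query("http://www.geocities.com/Heartland/Meadows/6485/"))
```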

link:

link: returns a list of pages that link to the specified URL. Enter link:www.google.com and you'll get a list of pages that link to the Google home page, http://www.google.com (not anywhere in the google.com domain). Don't worry about the http:// bit; you don't need it and, indeed, Google appears to ignore it even if you do put it in. link: works just as well with "deep" URLs (http://www.raelity.org/apps/blosxom/, for instance) as with top-level URLs such as raelity.org.

cache:

cache: finds a copy of the page that Google indexed even if that page is no longer available at its original URL or has since changed its content completely:
cache:www.yahoo.com

If Google returns a result that appears to have little to do with your query, you're almost sure to find what you're looking for in the latest cached version of the page at Google.
The Google cache is particularly useful for retrieving a previous version of a page that changes often.

filetype:

filetype: searches the suffixes or filename extensions. These are usually, but not necessarily, different file types; filetype:htm and filetype:html will give you different result counts, even though they're the same file type. You can even search for different page generators (such as ASP, PHP, CGI, and so forth), presuming the site isn't hiding them behind redirection and proxying. Google indexes several different Microsoft formats, including PowerPoint (.ppt), Excel (.xls), and Word (.doc):

homeschooling filetype:pdf

"leading economic indicators" filetype:ppt


related:

related:, as you might expect, finds pages that are related to the specified page. This is a good way to find categories of pages; a search for related:google.com returns a variety of search engines, including Lycos, Yahoo!, and Northern Light:

related:www.yahoo.com
related:www.cnn.com

While an increasingly rare occurrence, you'll find that not all pages are related to other pages.

info:

info: provides a page of links to more information about a specified URL. This information includes a link to the URL's cache, a list of pages that link to the URL, pages that are related to the URL, and pages that contain the URL:

info:www.oreilly.com
info:www.nytimes.com/technology

Note that this information is dependent on whether Google has indexed the specified URL; if it hasn't, the information will obviously be far more limited.

phonebook:
phonebook:, as you might expect, looks up phone numbers:
phonebook:John Doe CA
phonebook:(510) 555-1212

define:

define: gives you a page full of definitions of a word from around the Web:
define:paradigm

Google often displays related phrases in addition to definitions and the URLs where the definitions were found.

movie:

Use the movie: syntax to find reviews of movies on the Web, like this:

movie:matrix

You can also use a zip code or a city and state combination to find local theater listings and movie showtimes:

movie:97333
movie:corvallis, or


music:

music: explicitly searches for music-related information:
music:pink floyd

You're given a page that splits results into matching artists, albums, and lyrics, and you can choose to explore any of these areas in depth.

Tuesday, January 8, 2008

Full-Word Wildcards

from “Google Hacks” by Paul Bausch, Tara Calishain, Rael Dornfest, 2006

Some search engines support a technique called stemming, in which you add a wildcard character (usually * (asterisk) but sometimes ? (question mark)) to part of your query, requesting the search engine to return variants of that query using the wildcard as a placeholder for the rest of the word. For example, moon* would find moons, moonlight, moonshot, etc.

Google doesn't support explicit stemming. It didn't use to support stemming at all, but now it implicitly stems for you. So, canine dietary will yield results for dog diet, diets, and other variations on the theme.

Google does offer a full-word wildcard. While a wildcard can't stand in for part of a word, you can insert a wildcard (Google's wildcard character is *) into a phrase, and the wildcard will act as a substitute for one full word. Searching for three * mice, therefore, finds three blind mice, three blue mice, three green mice, etc.

What good is the full-word wildcard? It's certainly not as useful as stemming, but then again, it's not as confusing to the beginner. * is a stand-in for one word; ** signifies two words, and so on. The full-word wildcard comes in handy in the following situations:

1. Checking the frequency of certain phrases and derivatives of phrases, such as: intitle:"methinks the * doth protest too much" and intitle:"the * of Seville"

2. Filling in the blanks on a fitful memory. Perhaps you remember only a short string of song lyrics; search using only what you remember rather than randomly reconstructed full lines.

Let's take as an example the disco anthem "Good Times" by Chic. Consider the following line: "You silly fool, you can't change your fate."

Perhaps you've heard that lyric, but you can't remember if the word "fool" is correct or if it's something else. If you're wrong (if the correct line is, for example, "You silly child, you can't change your fate"), your search will find no results and you'll come away with the sad conclusion that no one on the Internet has bothered to post lyrics to Chic songs.

The solution is to run the query with a wildcard in place of the unknown word, like so:

"You silly *, you can't change your fate"

You can use this technique for quotes, song lyrics, poetry, and more. You should be mindful, however, to include enough of the quote to find unique results. Searching for "you * fool" will glean far too many irrelevant hits.
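To get a feel for what a full-word wildcard matches, you can imitate it locally with a regular expression. The Python sketch below is only an approximation of the behavior described above, not Google's implementation: the function names are hypothetical, a "word" is assumed to be a run of letters, digits, or underscores, and each * in a run such as ** is assumed to stand for exactly one word.

```python
import re


def wildcard_to_regex(phrase):
    """Turn a phrase containing * wildcards into a compiled regex.
    A run of n asterisks matches exactly n whitespace-separated words."""
    pieces = []
    for chunk in re.split(r"(\*+)", phrase):
        if chunk and set(chunk) == {"*"}:
            # One \w+ per asterisk, separated by whitespace.
            pieces.append(r"\w+" + r"(?:\s+\w+)" * (len(chunk) - 1))
        else:
            # Everything else must match literally.
            pieces.append(re.escape(chunk))
    return re.compile("".join(pieces), re.IGNORECASE)


def phrase_matches(phrase, text):
    """Return True if the wildcard phrase occurs somewhere in text."""
    return wildcard_to_regex(phrase).search(text) is not None


lyric = "You silly fool, you can't change your fate"
print(phrase_matches("You silly *, you can't change your fate", lyric))
print(phrase_matches("three * mice", "Three blind mice"))
print(phrase_matches("three * mice", "three mice"))
```

The last call shows the point of the "one full word" rule: the wildcard has to consume a word, so a phrase with nothing in its place does not match.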

Google Web Search Basics

from “Google Hacks” by Paul Bausch, Tara Calishain, Rael Dornfest, 2006


Whenever you search for more than one keyword at a time, a search engine has a default strategy for handling and combining those keywords. Can those words appear individually anywhere in a page, or do they have to be right next to each other? Will the engine search for both keywords or for either keyword?

Phrase Searches

Google defaults to searching for occurrences of your specified keywords anywhere in the page, whether side by side or scattered throughout. To return the results of pages containing specifically ordered words, enclose them in quotes, turning your keyword search into a phrase search, to use Google's terminology.
On entering a search for the keywords:
to be or not to be

Google will find matches where the keywords appear anywhere on the page. If you want Google to find you matches where the keywords appear together as a phrase, surround them with quotes, like this:
"to be or not to be"

Google will return only matches in which those words appear together (not to mention explicitly including stop words such as "to" and "or").
Phrase searches are also useful when you want to find a phrase but aren't quite sure of the exact wording.

Basic Boolean

Whether an engine searches for all keywords or any of them depends on what is called its Boolean default. Search engines can default to Boolean AND (searching for all keywords) or Boolean OR (searching for any keywords). Of course, even if a search engine defaults to searching for all keywords, you can usually give it a special command to instruct it to search for any keyword. Lacking specific instructions, the engine falls back on its default setting.
Google's Boolean default is AND, which means that if you enter query words without modifiers, Google will search for all your query words. For example, if you search for:

snowblower Honda "Green Bay"

Google will search for all the words. If you prefer to specify that any one word or phrase is acceptable, put an OR between each:
snowblower OR snowmobile OR "Green Bay"

Make sure you capitalize OR; a lowercase or won't work correctly.

If you want to search for a particular term along with two or more other terms, group the other terms within parentheses, like so:
snowblower (snowmobile OR "Green Bay")

This query searches for the word "snowmobile" or phrase "Green Bay" along with the word "snowblower." A stand-in for OR, borrowed from the computer-programming realm, is the | (pipe) character, as in:
snowblower (snowmobile | "Green Bay")
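The AND-of-ORs logic behind such a grouped query can be modeled in a few lines. This is a deliberately crude Python sketch for illustration only: page_matches is a hypothetical name, a query is represented as a list of groups (every group must match, any term within a group suffices), and plain case-insensitive substring matching stands in for whatever Google actually does.

```python
def page_matches(text, required_groups):
    """Return True if, for every group, at least one of its terms
    appears in the text (AND across groups, OR within a group)."""
    lowered = text.lower()
    return all(
        any(term.lower() in lowered for term in group)
        for group in required_groups
    )


# snowblower (snowmobile OR "Green Bay")
query = [["snowblower"], ["snowmobile", "Green Bay"]]

print(page_matches("Honda snowblower dealers in Green Bay", query))
print(page_matches("Snowmobile trails near Green Bay", query))
```

The second page contains both alternatives from the parenthesized group but lacks "snowblower," so it fails the implicit AND.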


Negation

If you want to specify that a query item must not appear in your results, prepend a - (minus sign or dash):
snowblower snowmobile -"Green Bay"

This will search for pages that contain both the words "snowblower" and "snowmobile," but not the phrase "Green Bay."
Note that the - symbol must appear directly before the word or phrase that you don't want. If there's space between, as in the following query, it won't work as expected:
snowblower snowmobile - "Green Bay"

Be sure, however, to place a space before the - symbol.

Explicit Inclusion

On the whole, Google will search for all the keywords and phrases that you specify (with the exception of those you've specifically negated with -, of course). However, there are certain words that Google will ignore because they are considered too common to be of any use in the search. These words ("I," "a," "the," and "of," to name a few) are called stop words.
You can force Google to take a stop word into account by prepending a + (plus) character, as in:

+the king

Stop words that appear inside of phrase searches are not ignored. Searching for:
"the move" glam

will result in a more accurate list of matches than:
the move glam

simply because Google takes the word "the" into account in the first example but ignores it in the second.

Synonyms

Every so often, you get the feeling that you're missing out on some useful results because the keyword or keywords you've chosen aren't the only way to express what you're looking for.
The Google synonym operator, the ~ (tilde) character, prepended to any number of keywords in your query, asks Google to include not only exact matches, but also what it thinks are synonyms for each of the keywords. Searching for:
~ape

turns up results for monkey, gorilla, chimpanzee, and others (both singular and plural forms) of the ape or related family, as if you'd searched for:
monkey gorilla chimpanzee

along with results for some words you'd never have thought to include in your query.
Google figures out synonyms algorithmically, so you may be surprised to find results that your garden-variety thesaurus would not have suggested. (Synonyms are bolded along with exact keyword matches on the results page, so they're easy to spot.)

Number Range

One of the more difficult things to convey in an Internet search query is a range of dates, currency, size, weight, height, or any two arbitrary values.
The number range operator, .. (two periods), looks for results that fall inside your specified numeric range.

Looking for that perfect pair of Prada pumps, size 5 or 6? Try this for size:
prada pumps size 5..6

Perhaps you're looking to spend $800 to $1,000 on a nice digital SLR camera; Google for:
slr digital camera 3..5 megapixel $800..1000

The one thing to remember is always to provide some clue as to the meaning of the range, e.g., $, size, megapixel, kg, and so forth.

You can also use the number range syntax with just one number, making it the minimum or maximum of your query. Do you want to find some land in Montana that's at least 500 acres? No problem:

acres Montana land 500..

On the other hand, you might want to make sure that raincoat you buy for your terrier doesn't cost more than $30. That's possible too:
raincoat dog ..$30
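The three forms of the range syntax (both bounds, minimum only, maximum only) can be illustrated with a small parser. This is a Python sketch under stated assumptions: parse_range and in_range are hypothetical names, a leading $ and thousands separators are simply discarded, and both endpoints are treated as inclusive, which the text does not specify.

```python
def parse_range(token):
    """Split a token such as '$800..1000', '500..' or '..$30' into
    (low, high); a missing bound is returned as None."""
    cleaned = token.replace("$", "").replace(",", "")
    low, sep, high = cleaned.partition("..")
    if not sep:
        raise ValueError("not a number range: " + token)
    return (float(low) if low else None, float(high) if high else None)


def in_range(value, bounds):
    """Check a value against (low, high), treating None as unbounded."""
    low, high = bounds
    return (low is None or value >= low) and (high is None or value <= high)


print(parse_range("$800..1000"))
print(parse_range("500.."))
print(in_range(25, parse_range("..$30")))
```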


Google normally does not recognize special characters such as $ in the search process. But because the $ sign was necessary for the number feature, you can use it in all sorts of searches. Try the search "yard sale" bargains 10 and then "yard sale" bargains $10. Notice how the second search gives you far fewer results? That's because Google is matching $10 exactly.

Simple Searching and Feeling Lucky

The I'm Feeling Lucky™ button is a thing of beauty. Rather than giving you a list of search results from which to choose, you're whisked away to what Google believes is the most relevant page given your search (i.e., the first result in the list). Entering washington post and clicking the I'm Feeling Lucky button takes you directly to http://www.washingtonpost.com. Trying president will land you at http://www.whitehouse.gov.

Case Sensitivity

Some search engines are case-sensitive; that is, they search for queries based on how the queries are capitalized. A search for "GEORGE WASHINGTON" on such a search engine would not find "George Washington," "george washington," or any other case combination.

Google is case-insensitive. If you search for Three, three, THREE, or even ThrEE, you get the same results.

Friday, January 4, 2008

What is GOOGLE


from “55 Ways to Have Fun With Google” by Philipp Lenssen, 2006

Google is more than just a search engine, though that alone wouldn’t be too bad either, because it allows us to quickly receive answers from the web to almost any question asked. Today, while I’m writing this book, Google consists of dozens of services (google.com/sitemap.html). Some you may have heard of, like Gmail or Google Maps. Others are more obscure, like Google Base, Google Page Creator, Google Writely or Google X, and even Google experts can have a hard time keeping track.

To understand what people know of Google – and what they think is fun about it – I asked my sister Judith about the different services. Afterwards, I asked UK programmer and Google expert Tony Ruscoe (ruscoe.net/blog/) about these services. Both were urged to take a guess in case they were clueless about the answer. Well, who’s right then? I won’t judge, but instead will let you read their answers now!

Asking a Google Novice

Judith, what is Google Talk?
Judith: I believe that’s a text to speech program to read out things for you.

What is Google Earth?
Judith: I know that one! You can view the whole globe from above. You can zoom close into every country.

What is Picasa?
Judith: That’s a fun drawing program to create Picasso-like paintings.

What is Gmail?
Judith: That’s an email client.

What are the Google Labs?
Judith: That’s a place to propose interesting ideas for Google to add to their products. The suggestions are filtered by Google engineers and finally, they will be implemented.

What is Google Maps?
Judith: I don’t have a clue.

What is Google Scholar?
Judith: Google for students, without any adult websites.

What is Google Video?
Judith: That’s a search engine, similar to an image search, but for videos instead.

What is Google Images?
Judith: The same as a search engine for words, but with images.

What is Google Answers?
Judith: That’s a place where you can ask questions for other people to answer. If the answer is right, those who answered will get money.

What is Google Catalogs?
Judith: You can see pages taken from catalogs, for example when you enter “teddy bear,” you will see catalog pages containing teddy bears.

What is Froogle?
Judith: That could be a parody site acting just like Google... no matter what you enter, all you get are results containing images of frogs.

What are Google Alerts?
Judith: That’s when Google sees you are searching for illegal material online and you click on one of the result pages. This can have legal consequences.

What is Google Blogger?
Judith: That’s a weblog community run by Google.

What is Google Desktop?
Judith: That’s like Microsoft Windows but made by Google. E.g. it contains a word processor.

What are Google Groups?
Judith: Those are chat rooms on any conceivable topic. You can login to talk.

What is Google X?
Judith: I have no idea! Well, I suppose it’s a kind of Google-related riddle or puzzle game.

What do you think is fun about Google?
Judith: Searching for people. That’s nothing particularly special or uncommon, but it satisfies your curiosity about someone you want to know more about.

Asking a Google Expert

Tony, what is Picasa?
Tony: It’s a photo management/ organization application. You can download a program that allows you to manipulate your images.

What is Google Talk?
Tony: It’s an IM – Instant Messenger – application that allows online conversations and VoIP, Voice over IP.

What is Google Earth?
Tony: It’s fantastic! I’ve told my friends that it’s arguably the best thing to appear on the Internet this year! Seriously though, it’s a program that allows you to view the earth from space. You can zoom in and view certain areas really close up.

What is Google Labs?
Tony: In my view, Google Labs isn’t really a service as such. It’s simply a name they give to many new releases that don’t quite make it to Beta. It often consists of smaller projects that some of the Google Employees create in their 20% time.

What is Google Local?
Tony: It’s pretty much like an online service directory, like the Yellow Pages. In fact, Google Local UK uses Yell.com for its results, I think. It’s recently been integrated with Google Maps so that it’s easier to see where the businesses are located.

What is Google Scholar?
Tony: It’s an online search that searches educational papers and theses, things like that.

What is Google Video?
Tony: It’s a video search that searches for videos that have been uploaded by the public or by a number of different associations who have agreed to let their content be available for free. I think it only searches the description or transcript that’s been provided by the user.

What is Google Answers?
Tony: Google Answers is an “ask the expert” service where you can submit a question, name your price and, hopefully, get an answer from an expert in the field.

What is Froogle?
Tony: It’s an online price comparison service to help you with your online shopping.

What are Google Alerts?
Tony: Basically, Google will send you an email whenever something new appears in the Google web results or Google News.

What is Google Desktop?
Tony: Google Desktop started off as a desktop application – Google Desktop Search – that enabled you to search your PC for information. I think it’s turned into something much bigger now, where you can add your own bits to it. I’ve never used it.

What are Google Groups?
Tony: Google Groups encapsulates Usenet groups as well as Usenet-style groups that have been created by Google Account owners. They are basically discussion forums/mailing lists.

What was Google X?
Tony: I think it was a service similar to the existing home page that used a Mac OS X style interface. It appeared in Google Labs but then disappeared. Presumably because of legal reasons... but we don’t know. I never saw it, but I’ve seen some copies of it.

What is Google Base?
Tony: Good question. It seems to be everything! It’s an online repository where people can upload practically any data that has a structure. It can be used for storing things like recipes, people profiles and classified ads. So you can advertise anything you might have for sale – although there’s no way to take payment via Google Base at the moment. In short, it’s an online database application.

What is Google Analytics?
Tony: It’s a web stats analysis application. You place some JavaScript in your website which then collects data from your visitors using cookies. Google Analytics takes all this data and analyzes it, creating graphs and reports about your visitors’ trends.

What is Google Sets?
Tony: It’s in Google Labs. I looked at it a long time ago so I’ve forgotten exactly what it does! I think it’s a service that lets you provide several items – up to five, I think – and Google will suggest some more items that are in the same group.

What do you think is fun about Google?
Tony: There are a lot of things that make Google fun. It can be used to settle the most basic of arguments. We often use it in the office when we don’t believe what someone is saying. We run the risk of being fooled by the “If it appears on Google, it’s true!” rule! Their services are always interesting. Waiting for a new service can be exciting. It gets people talking... Very often, the services aren’t ground-breaking – but the way Google present them is. Take Gmail and Google Maps. These types of services had been around for years, yet all of a sudden you could just sit and play with Google Maps for hours!