Blown To Bits

Archive for the ‘Search’ Category

Data Mining and the Search for Terrorists

Tuesday, October 7th, 2008 by Harry Lewis

In Chapter 2 of Blown to Bits we discuss the Total Information Awareness program, which was cut short but replaced by several other programs aimed at identifying terrorists by sifting through massive quantities of everyday data. A National Research Council report released today comes to the conclusion that such data mining efforts don’t work very well and wouldn’t be a good idea even if they did. The panel is no bunch of hippie leftists; it includes a former Secretary of Defense, computer science experts, and a former president of MIT.

The CNet summary of the report is here. The bottom line:

The most extensive government report to date on whether terrorists can be identified through data mining has yielded an important conclusion: It doesn’t really work.

A National Research Council report, years in the making and scheduled to be released Tuesday, concludes that automated identification of terrorists through data mining or any other mechanism “is neither feasible as an objective nor desirable as a goal of technology development efforts.” Inevitable false positives will result in “ordinary, law-abiding citizens and businesses” being incorrectly flagged as suspects.
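To see why false positives are "inevitable," a back-of-the-envelope sketch helps. Every number below is a made-up assumption for illustration, not a figure from the NRC report: a population of 300 million, 1,000 real targets, and a screening system that catches 99% of targets while wrongly flagging 1% of everyone else.

```python
def flagged_breakdown(population, true_targets, hit_rate, false_alarm_rate):
    """Return (true_positives, false_positives, precision) for a screening test.

    precision is the share of all flagged people who are real targets.
    """
    innocents = population - true_targets
    true_positives = true_targets * hit_rate
    false_positives = innocents * false_alarm_rate
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical inputs, chosen only to illustrate the base-rate effect.
tp, fp, precision = flagged_breakdown(
    population=300_000_000,
    true_targets=1_000,
    hit_rate=0.99,
    false_alarm_rate=0.01,
)
print(f"correctly flagged: {tp:,.0f}")
print(f"wrongly flagged:   {fp:,.0f}")
print(f"share of flags that are real: {precision:.4%}")
```

Under these assumptions the wrongly flagged outnumber the correctly flagged by roughly three thousand to one, even though the imagined system sounds very accurate. The rarer the thing being searched for, the worse this ratio gets.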

This reminds me of the NRC report on strong encryption (Chapter 5), which recommended against legislative efforts to prevent the export of encryption software. It didn’t immediately settle the political argument, but eventually reason won out. Will reason prevail here, and will we go back to a probable-cause basis for searches of our personal information? Or will we act on arguments like the one Senator Gregg used in favor of regulating encryption: “Nothing’s ever perfect. If you don’t try, you’re never going to accomplish it. If you do try, you’ve at least got some opportunity for accomplishing it”?

Geolocation+BarCode Scanning = Killer Cell Phone App?

Monday, September 29th, 2008 by Harry Lewis

In a piece we published in May, we noted:

[Geolocation] data would be a goldmine for advertisers targeting their ads at cell phones — they would love to know not only who you are, but where you are. And it would be a boon for shoppers, too — imagine being able to ask, when Nordstrom’s doesn’t have your favorite stockings in your size, if any nearby store has them in stock.

But to do that, you’d have to have enough information about what you were looking for to type in the identifying information, or else spend time Web browsing, a clumsy process on a cell phone. Not very realistic as we described it.

A Japanese company has an application for Google’s new phone that does something similar to what we had in mind, but much more practical and more widely useful. You see the stockings in the store, whip out your phone, and point the camera at the bar code on the package. The camera doesn’t actually photograph the bar code; it reads it, and then gives you back a list of nearby stores with the same item, and what they are charging for it.

Brilliant. If I were running a fancy department store with high ceilings and high overhead, I’d be shaking in my shoes at the prospect of people using my nice premises for shopping, and discount stores for buying.

More on retail applications of bar code scanning by cell phone in this article by Erik Hermansen.

A Billion Dollar Search Query Mistake

Tuesday, September 9th, 2008 by Hal Abelson

Readers of Chapter 4 of Blown to Bits know that we should stop to think before acting on the information produced by search engines. Yesterday, a Florida stock analyst didn’t stop, and United Airlines stock lost 75% of its value, a billion dollars, in 15 minutes. The stock largely recovered, down only 10% by day’s end, but investors who sold at the low are stuck, and other airline stocks were affected as well.

Yesterday’s panic was the result of the Bloomberg News Wire printing a one-line note about a Florida investment newsletter’s note about an article on the web site of a Florida newspaper reporting that United had filed for bankruptcy. The article, which originally appeared in the Chicago Tribune, was accurate reporting, except that it was from 2002, and it was located in the archive section of the Florida paper’s web site.

It seems that an analyst at Income Securities Advisors did a Google search for “bankruptcy 2008”, which turned up the story, and then passed it on without checking it or, one might suspect, reading it carefully. In the inevitable finger pointing, one of the inevitable finger pointees is Google, with the newspaper asking how a link to a 6-year-old story from its archive got returned from a query indicating “2008”. The article didn’t even appear in yesterday’s newspaper, but, as Google points out in its defense, was listed as one of the “most popular” on the paper’s web site, which the Google search engine took as an indication that the article was, well, popular.

One might imagine a more careful search engine, one that would double-check the actual dates of news articles, or even identify their original sources. But more to the point, it wouldn’t hurt to have more careful people, especially those who are being paid to supposedly analyze information, not just uncritically accept and pass along the results coughed up by mysterious computer programs.
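As a rough idea of what "double-checking the dates" could look like, here is a minimal sketch. It assumes each result already carries its original publication date and source, which is the hard part in practice. The `Article` type, the `flag_stale` helper, and the sample entries are all hypothetical, and the day and month on the 2002 story are placeholders, since all we know is the year.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str        # illustrative headline, not the real one
    published: date   # original publication date (assumed to be known)
    source: str       # where the story first appeared


def flag_stale(articles, query_year):
    """Return the articles whose original publication year differs from query_year."""
    return [a for a in articles if a.published.year != query_year]


results = [
    # Day and month are placeholders; only the year 2002 comes from the story.
    Article("United files for bankruptcy", date(2002, 12, 1), "Chicago Tribune"),
    Article("Made-up current story", date(2008, 9, 8), "Example Wire"),
]

for article in flag_stale(results, query_year=2008):
    print(f"Check before forwarding: {article.title!r} "
          f"({article.source}, {article.published.year})")
```

A filter like this is no substitute for reading the story, but it would have raised a flag on a 2002 article returned for a query that mentioned 2008.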

According to the president of the securities company, his researcher didn’t verify the story before passing it on because, “we are a reading service,” and since the story appeared in the paper “I don’t think that calls for us to check it out.” (As quoted in the Chicago Tribune.)

That’s an interesting view: it’s OK for professional analysts to do their job by typing in search queries and passing on the results without having to apply any judgment. I bet we could get a computer program to do that. We could call it “Google”.

Search Histories, Caylee Anderson, and Bill Gates

Saturday, September 6th, 2008 by Harry Lewis

Caylee Anderson is the Florida toddler whose mother, Casey, failed to report her missing for a month and has been jailed for child endangerment (she’s out on bail). No one yet knows what happened to the little girl, but CNN reports this tidbit today:

Authorities said they have found traces of chloroform in the car Anderson drove and Internet searches of chloroform Web sites on her computer.

Searching computers is as much a part of criminal forensics now as searching a crime scene or the home of a suspect. And because, as we say, bits don’t go away, it can be even harder to eradicate digital fingerprints than it is to eradicate real ones.

Most likely the authorities were just checking the web browser history on Casey’s computer. If you don’t know what I’m referring to, look for a “History” menu on your browser; it’ll show where you’ve been on the Web. The default setting on Safari, a browser I use on my Mac, is to save the history for a week, but I can make it longer. It’s a convenience; every now and then I want to go back to something I was looking at a few days ago, and by using the history I can find it quickly. When I search using Google, the history records not just that I was using Google, but what I was searching for. Bingo, if you’re a gumshoe and can get access to my machine. (There is an entirely separate issue of whether Google is keeping its own record of my searches and would turn it over to law enforcement. We talk about that in Blown to Bits also.)

Suppose Casey wanted to cover her tracks — what should she have done? Well, Safari has a “Clear History” command; that would be a good place to start. There’s also a “Reset Safari” menu item (try it — it will let you choose what to reset and give you the option of canceling or following through). Firefox calls this “Clear Private Data.”

But most people are PC and Internet Explorer users. I assumed Casey is too, and checked what Microsoft says about clearing the history of Explorer searches.

Have you seen those Mac ads where a geeky Bill Gates figure fumbles about the complexities of Vista, side by side with a cooler, more normal Mac user? (As a personal caricature, it’s actually unfair to Bill; when he was the age of the actor, he was wiry and energetic, like a coiled spring, not the doughy goofball the ad depicts. Of course, the ad doesn’t claim that’s supposed to be Bill. And in any case ads aren’t required to be fair about things like that.)

Here’s what Microsoft has to say about How to Clear the History Entries in Internet Explorer for version 6:

1. Close all running instances of Internet Explorer and all browser windows.
2. In Control Panel, click Internet Options.
3. Click the General tab, and then click Clear History.
4. Click Yes, and then click OK to close the Internet Options dialog box.

If the cached addresses are still listed in the Address box in Internet Explorer, use the following steps:

1. Quit Internet Explorer.
2. Delete all of the values except for the (Default) value from the following registry key:

HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\TypedURLs

NOTE: Values in this registry key are listed as Url1, Url2, Url3, and so on. If you delete only some values and the remaining values are not in consecutive numerical order, only some of the remaining entries are listed in the Address box. To prevent this behavior from occurring, rename the remaining values so that they are in consecutive numerical order.
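The renumbering step in that NOTE can be sketched in a few lines of Python. This is only an illustration of the logic: it works on an ordinary dictionary standing in for the values under TypedURLs and does not touch the real registry. The function name `renumber_typed_urls` and the sample URLs are made up.

```python
import re


def renumber_typed_urls(values):
    """Rename surviving UrlN entries so they run Url1, Url2, ... with no gaps.

    `values` maps value names (e.g. "Url4") to URLs. Names that do not look
    like UrlN, such as "(Default)", are left out of the result.
    """
    numbered = []
    for name, url in values.items():
        match = re.fullmatch(r"url(\d+)", name, flags=re.IGNORECASE)
        if match:
            numbered.append((int(match.group(1)), url))
    numbered.sort()  # keep the original relative order of the survivors
    return {f"Url{i}": url for i, (_, url) in enumerate(numbered, start=1)}


# Hypothetical state after deleting Url2, Url3, Url5 and Url6 by hand.
remaining = {
    "Url1": "http://example.com/a",
    "Url4": "http://example.com/b",
    "Url7": "http://example.com/c",
}
print(renumber_typed_urls(remaining))
```

Applying the result to an actual machine would mean writing the values back through a registry editor or a registry API, which this sketch deliberately leaves out.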

Even if Casey had tried to cover her tracks, she probably couldn’t have managed, if she was using the version of Explorer that is most widely in use. No wonder Microsoft is mounting its own funky advertising campaign, starring Jerry Seinfeld and the real Bill Gates, to humanize its products.

And no wonder Google sees an opportunity with its new Chrome browser, as we discussed recently. And indeed, no wonder, as David Pogue noted, Chrome has

something called Incognito mode, in which no cookies, passwords or cache files are saved, and the browser’s History list records no trace of your activity. (See also: Safari, Internet Explorer 8 [which is now available in Beta].) Google cheerfully suggests that you can use Incognito mode “to plan surprises like gifts or birthdays,” but they’re not fooling anyone; the bloggers call it “porn mode.”

That’s a useful feature for anyone planning a crime, too!

P.S. There is yet another issue. Even if the history isn’t visible through the menu commands, traces of it may well still be stored on disk in a way that a brute force search of disk blocks, one by one, would reveal. “Deleted” doesn’t actually mean that the bits have been destroyed utterly. In both the offense and defense of computer forensics, you can almost always do a better job if you spend more time and money, so how confidently one can say that bits are “gone forever” depends on the cash value you attach to destroying them or discovering them.
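For the curious, a brute-force search of disk blocks is not exotic. The sketch below scans a chunk of raw bytes, one fixed-size block at a time, for anything that looks like a web address. The 512-byte block size, the URL pattern, and the fake "disk" are all assumptions for illustration; real forensic tools handle file systems, fragmentation, and encodings that this ignores.

```python
import re

BLOCK_SIZE = 512  # assumed block size, for illustration only

# URL-like runs of printable, non-space ASCII bytes after http:// or https://
URL_PATTERN = re.compile(rb"https?://[\x21-\x7e]{4,200}")


def scan_blocks(raw, block_size=BLOCK_SIZE):
    """Yield (block_number, url) for each URL-like byte string found.

    Each block is searched on its own, so a URL that straddles two blocks
    is missed or truncated. A more careful scanner would overlap its reads.
    """
    for offset in range(0, len(raw), block_size):
        block = raw[offset:offset + block_size]
        for match in URL_PATTERN.finditer(block):
            yield offset // block_size, match.group().decode("ascii")


# A fake "disk": mostly zero bytes, with a leftover fragment of history data.
disk = (
    b"\x00" * 600
    + b"junk http://example.com/search?q=surprise+gift\x00more"
    + b"\x00" * 400
)

for block_number, url in scan_blocks(disk):
    print(f"block {block_number}: {url}")
```

The point is only that "deleted" data still sitting in unallocated blocks can be found by software that never consults the browser or the file system at all.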

Those Chinese Gymnasts, Exposed Again

Wednesday, August 20th, 2008 by Harry Lewis

As previously reported by the New York Times and noted in this blog (The Google Cache Strikes Again), two of the medal-winning Chinese female gymnasts are only 14 years old, according to rosters posted on Chinese web sites at the time of earlier competitions. (They have since been furnished with passports showing them now to be the minimum Olympic eligibility age of 16.) The NYT found a copy of the roster cached at Google (see pp. 124-126 of Blown to Bits for an explanation of how this works).

Now blogger Stryde Hax has found traces of incriminating rosters at the Chinese search engine Baidu — the one controlled by Chinese authorities. Links to the two cached copies are here and here — though I don’t expect they will stay visible for long, now that they are being publicized. You need to read Chinese to pick out the gymnasts’ names.

As we say in the book, search is power. And bits don’t go away.

The whole concept of truth is being shaken by developments like this. Will the IOC be able to continue to accept the word of Chinese authorities that those new passports have the girls’ real birthdates and those old records are wrong for some reason?

Fun With Google Insights

Saturday, August 16th, 2008 by Harry Lewis

In Blown to Bits, we stress that search is a new form of power. You have to hand it to Google; they recognize that sharing the power is good for everyone. Google Insights (http://www.google.com/insights/search/#) empowers everyone to find out who is looking for what.

The site pitches itself as a set of business tools — figure out who your customers are, predict demand, etc. But you can use it for anything. For example, where are people most interested in “anthrax”? Iraq. (Locations are determined by IP addresses; there’s no way to know who’s actually doing the searching. The people in Iraq who are interested in “anthrax” could well be Americans.) How about “Nuclear bomb”? Pakistan, though the U.S. is right behind, and interest everywhere is waning. (The data go back to 2004, but you can choose a different time frame.) The next two lovely examples are courtesy of Ethan Zuckerman. “Email Extractor Lite 1.4” — a tool for extracting email addresses from large quantities of text — has most interest in the west African countries of Cote d’Ivoire and Burkina Faso. You don’t suppose people there want to use it to produce spam, do you? And “keygen” — a source of digital keys for unlocking pirated software — is of most interest in Cambodia, Russia, and Belarus.

Have fun. Your level of worldly experience — and perhaps the sickness of your mind — are the only limits to what you can learn about the interests of our fellow members of the human race.

Google News: Russians Approaching Savannah

Monday, August 11th, 2008 by Harry Lewis

After yesterday’s heavy post, I thought I’d go with something lighter today. Google News accompanied a story on the conflict between Russia and Georgia with a map locating the battles in the American South!

The Google cache strikes again

Monday, July 28th, 2008 by Harry Lewis

The New York Times had several good bits stories over the weekend. The Education Week article about de-tagging Facebook photos, for example. Cheap, ubiquitous sensors–digital cameras in the hands of teenagers and college students–combined with the vast Facebook social network have resulted in lots of embarrassing party photos appearing online every Sunday morning. When their peers tag the photos with the names of the people appearing in them, the photos turn up in searches for the names of the revelers. So every Sunday afternoon the hung-over youth “de-tag” the photos, which remain visible but unsearchable. (And if you’re the only one not tagged in the photo, well, that creates an interesting social tension–you’re saying you’re the only one who believes that your reputation is going to be damaged by being seen with the others at that party!)

But my favorite is the story about the perhaps under-age Chinese gymnasts. They have passports showing their age as 16, the minimum allowed in Olympic competition. But the enterprising reporters think some may be as young as 14. Why?

The Times found two online records of official registration lists of Chinese gymnasts that list He’s birthday as Jan. 1, 1994, which would make her 14. A 2007 national registry of Chinese gymnasts — now blocked in China but viewable through Google cache — shows He’s age as “1994.1.1.”

Another registration list that is unblocked, dated Jan. 27, 2006, and regarding an “intercity” competition in Chengdu, China, also lists He’s birthday as Jan. 1, 1994. That date differs by two years from the birth date of Jan. 1, 1992, listed on He’s passport, which was issued Feb. 14, 2008.

Nice detective work. Some earlier public list of athletes had the correct date, goes the theory; Google indexed it and kept a copy, as Google generally does; the Chinese later decided to make the athlete a couple of years older, and took the web page down; but Google’s cached copy is still visible from the U.S. site where it is stored. Just like the example on page 125 of Blown to Bits. Except in this case, the cached copy itself is blocked inside China, even though it’s a copy of a Chinese web page. Bits are awfully hard to eradicate–it will be interesting to see if this incident becomes a problem for the Chinese team.

McCain and Google

Thursday, June 19th, 2008 by Harry Lewis

In 1992, George H. W. Bush exhibited pleased astonishment when he discovered that supermarkets used barcode readers for the prices of items at the cash register. It should perhaps not have been surprising that he had not recently done any grocery shopping, but it raised the question of whether his familiarity with the way the world actually works had given him the right instincts on policy issues as president.

Last week John McCain said he was using Google (or perhaps “a Google”) to vet his vice presidential candidates. It’s certainly true, as he went on to say, that “What you can find out now on the Internet — it’s remarkable.” The remark seemed a bit off, not because it isn’t true, but because, like Bush’s, it seems to indicate a bit too much surprise for the time it was spoken, and too little sense of the technology’s limitations. (Blown to Bits might be good bedtime reading for him.)

A friend pointed me to a video from earlier in the campaign that may help explain McCain’s comment about Google. McCain acknowledges that he doesn’t use a personal computer at all. “I am an illiterate who has to rely on my wife for all the assistance that I get,” he says. He doesn’t seem proud of it, though, and maybe his more recent “Google” remark shows he is trying to catch up.

The campaign is showing plenty of heat around energy policy. Will technology policy be an issue at all? How well prepared are the candidates to discuss the challenges that lie ahead? What would be some good questions they might be asked during whatever “debates” may occur?

Google is #1

Tuesday, April 22nd, 2008 by Harry Lewis

Google is the #1 brand in the world, according to a Millward Brown report, Top 100 Most Powerful Brands ‘08. The ranking formula multiplies “Intangible earnings” by “Portion of intangible earnings attributable to brand” by “Brand earnings multiple.” Others will have to judge whether these three factors are the right ones, whether their values can be determined meaningfully, and whether that is the right way to combine them. I am a bit skeptical. The #2 brand? GE. #3 is Microsoft, #4 is Coca-Cola, and #5 is China Mobile.

If Google is the #1 brand—and that does feel right, whatever calculation produced the result—the implication is astonishing. The top brand in the world is one that almost no one had heard of a decade ago. The earliest reference I could find to “Google” in a search of newspaper archives was a May 31, 1998 column by Bradley Peniston in the Annapolis, MD Capital, entitled “Yahoo for new search engine.” (That’s leaving out all the articles about the Barney Google comic strip.) A week later, in his next column, Peniston had to explain where to find Google—on the Stanford web site!