Personal Privacy and the Public Internet
John E. Carter
Kennesaw State University
November 10, 2001
The Internet provides a wealth of sources for information, products, and services of all types, making it a convenient place for consumers to research topics and make purchases. Although Internet users know that some personal data will be required to make a purchase, they are often unaware of the personal data that can be collected without their knowledge simply by visiting a Web page or reading e-mail. This paper addresses some of the ways personal information has been and is being collected without authorization, and the steps that can be taken to prevent or avoid this collection.
To make an online purchase, an Internet user must provide a certain amount of personal information to the vendor. This information usually includes the user's name, address, telephone number, e-mail address, and credit card data. There have been many reported cases of security failures at online vendors of products and services. Because of this publicity, most Internet users are aware that the information they provide could be exposed to the world, whether by human error, careless security practices by a vendor, or a successful attack by a hacker. However, most users are not aware of the amount of personal information that can be collected without their consent when they do such ordinary things as visiting a Web page, opening a document, or reading an e-mail message.
How can personal information be collected without the user's knowledge? Information about an Internet user can be collected in many ways, including through the underlying protocol of the Web, "cookies," banner advertisements, "Web bugs," and high-tech "toys." A user may also provide information to an online vendor simply to reduce annoyances. Almost every user has seen a popup ad for the X10 wireless video cameras, and some of the ads are almost full-screen in size. X10 is aware that people can become annoyed when the same popup ad keeps appearing. Some of the ads have a "Click here to disable this ad" button that takes the user to an X10 page, which promises not to show the ad again for 30 days. (X10 popup, October 2001.) Keeping this promise requires that a "cookie" (a small text file that a Web server asks the browser to store and later send back) be stored on the user's computer. X10 has now collected a small piece of information about the user: this computer has seen an X10 ad, because the usual route to the popup inhibitor page is through an ad for an X10 product. The X10 site also learns the IP address, operating system, and browser version of that computer. If a user visits the popup inhibitor page directly by using the URL listed in the references, the same information is collected. This is covert data collection: the user did not intend to give any information but simply wanted to suppress an annoyance.
How does a server know so much about a user's computer? The protocol of the Web (HTTP) has the user's Web browser send a certain amount of information to the Web server with which it is communicating. The browser reports its type and version, and usually the operating system, because different versions have different capabilities, such as support for Java or online forms, and the server may tailor its pages accordingly. The server also learns the IP address of the user's computer, because it needs that address to send the page back. Finally, the browser normally tells each server which page the user came from. This gives a Web site "How did they find us?" (referring Web page) information, while the site's own logs show "Where did they leave us?" (last page viewed).
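To make this exchange concrete, the following Python sketch parses the kind of request a browser sends when it asks for a page. The sample request, the host names, and the parse_request helper are hypothetical; the header names User-Agent and Referer (spelled that way in the HTTP specification) are the standard ones that carry the browser/operating system and the referring page. The IP address does not appear in the request text at all, because the server reads it from the network connection itself.

```python
def parse_request(raw: str) -> tuple[str, str, dict[str, str]]:
    """Return (method, path, headers) from a raw HTTP/1.x request string."""
    head = raw.split("\r\n\r\n", 1)[0]  # ignore any request body
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; values such as URLs contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


# A hypothetical request, as a browser of the period might have sent it.
SAMPLE = (
    "GET /products/camera.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\r\n"
    "Referer: http://search.example.net/?q=wireless+camera\r\n"
    "\r\n"
)

method, path, headers = parse_request(SAMPLE)
print("Browser and OS:", headers.get("user-agent"))
print("Came from:", headers.get("referer"))
```

The Referer value reveals not only the previous site but, in this case, the search terms the user typed there.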
The Web is an inherently stateless environment: a Web server keeps no record of previous interactions with a Web browser. The connection between browser and server is repeatedly made and dropped as items of data are transferred, so some method is needed to track the state of a browser's interactions with a server. This was the original purpose of cookies.
A cookie is created when a Web server sends a Set-Cookie header to the Web browser as part of a response. The browser stores the NAME=VALUE pair it contains in a text file and sends it back to the server with later requests. A typical header has the format:
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure
The only required attribute is the initial NAME=VALUE, which names the cookie and holds its content. The expires=DATE attribute defines the lifetime of the cookie; if it is omitted, the cookie is discarded when the user's browser session ends. In the case of the X10 ad disabler, the date would be expected to be 30 days after the user visited the ad disabler page. The path=PATH attribute specifies the subset of URLs in the domain to which the cookie applies; a value of "/foo" matches both "/foobar" and "/foo/bar.html". If no path is given, the default is the path of the document that set the cookie, and a path of "/" makes the cookie available to every page on the server. The default DOMAIN_NAME is the name of the host that set the cookie, and a browser sends a cookie back only to servers in that domain (usually the same server). If "secure" is specified, the cookie is sent only over a secure connection (a URL beginning https://). Cookie handling draws on several Request For Comments (RFC) documents. RFCs are the standard way of introducing new features to the Internet: a proposal (the RFC) is published, and interested parties respond to it. The Netscape cookie specification cites RFC 822, RFC 850, RFC 1036, and RFC 1123 as the basis for the format of the expiration date. (Netscape, 1999)
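The attribute rules above can be sketched in Python as follows. The helper names and the cookie name ADS_OFF are hypothetical, and the domain check uses a tail match on a dot boundary, which is stricter than a plain string suffix; this illustrates the rules, not any browser's actual implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def set_cookie_header(name: str, value: str, days: int, path: str = "/",
                      domain: Optional[str] = None, secure: bool = False) -> str:
    """Build a Netscape-style Set-Cookie header that expires `days` from now."""
    expires = datetime.now(timezone.utc) + timedelta(days=days)
    # Netscape date format: Wdy, DD-Mon-YYYY HH:MM:SS GMT
    # (%a and %b produce English names under Python's default C locale).
    fields = [f"{name}={value}",
              "expires=" + expires.strftime("%a, %d-%b-%Y %H:%M:%S GMT"),
              f"path={path}"]
    if domain:
        fields.append(f"domain={domain}")
    if secure:
        fields.append("secure")
    return "Set-Cookie: " + "; ".join(fields)


def path_matches(cookie_path: str, request_path: str) -> bool:
    """Netscape rule: the cookie path is a plain prefix of the request path."""
    return request_path.startswith(cookie_path)


def domain_matches(cookie_domain: str, host: str) -> bool:
    """Tail match on a dot boundary: "acme.com" covers "anvil.acme.com"
    but not "notacme.com"."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == cookie_domain or host.endswith("." + cookie_domain)


# A hypothetical cookie that an ad-disabler page might set for 30 days.
print(set_cookie_header("ADS_OFF", "1", days=30))
print(path_matches("/foo", "/foobar"), path_matches("/foo", "/foo/bar.html"))
```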
Every person who has visited a Web portal or search engine has seen banner advertisements. Some are small and simple; others are large and complex. However, they all have the potential to collect information about the user. To load an image (the banner ad), the user's Web browser must contact the server that provides the image, and in the case of banner ads the link to that image also passes other information to the image server. (Smith, November 1999.) The image server logs every request that it receives. Each request must include at least the IP address of the user's computer so that the image can be sent back to it. Cookies are used to track the user's movements from page to page: the image server can set a cookie that identifies the user's computer, and its logs then associate that identifier with the date, time, and page on which each ad was viewed. As the user moves from page to page, the ads may differ but often come from the same image server or from another image server run by the same advertising company. The image server can read the cookies it previously set to determine other pages the user has visited. Some cookies are useful, such as those that retain a user's ID and preferences at a frequently visited site, but others exist only to collect marketing data.
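The tracking described above needs very little machinery. The toy model below (the class name, the ad_id cookie name, and the URLs are all hypothetical) gives each new browser a random identifier in a cookie and logs the referring page of every banner request. Because the same identifier comes back from every site that carries the company's ads, the log becomes a browsing history.

```python
import uuid
from collections import defaultdict


class AdImageServer:
    """Toy model of a third-party banner-ad image server (hypothetical)."""

    def __init__(self) -> None:
        # tracking ID -> list of pages (Referer values) on which an ad was shown
        self.log: dict[str, list[str]] = defaultdict(list)

    def serve_banner(self, cookies: dict[str, str], referer: str) -> dict[str, str]:
        """Handle one image request; return the cookie the browser should keep."""
        tracking_id = cookies.get("ad_id")
        if tracking_id is None:
            # First contact: issue a new random identifier via Set-Cookie.
            tracking_id = uuid.uuid4().hex
        self.log[tracking_id].append(referer)
        return {"ad_id": tracking_id}


PAGES = [
    "http://news.example.com/today.html",
    "http://search.example.org/results?q=wireless+camera",
    "http://shop.example.net/tv/index.html",
]

server = AdImageServer()
jar: dict[str, str] = {}  # the browser's cookies for the ad server's domain
for page in PAGES:
    jar = server.serve_banner(jar, referer=page)

print(server.log[jar["ad_id"]])  # every page viewed, filed under one ID
```

Three unrelated sites are involved, yet the ad server files all three visits under a single identifier.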
There is obviously some cost involved in developing and maintaining the infrastructure to create, install, and track banner ads, the Web bugs described below, and their associated cookies. The tracking process provides income to the collectors of data because marketing organizations will pay for user profiles. The value of a user profile depends on factors such as ZIP code and buying habits, but is generally in the range of 10 cents to $2.50. (Sullivan & Jones, November 1999.)
The difference between data collection by banner ads and a traditional survey is awareness: a person answering questions from someone with a clipboard knows that a survey is taking place, while a person viewing banner ads does not. DoubleClick.com, a major provider of banner ads, is being sued for collecting personal information in violation of the privacy rights guaranteed by California's Constitution. The trial is tentatively scheduled for January 2002. (Electronic Frontier Foundation, June 2001)
Consider the user who only browses the Web and never downloads anything from the Internet: no programs, no pictures, and no music. This user is also at risk, because "Web bugs" can be invisibly embedded in Web pages. A Web bug is an invisible graphic image (often a single pixel) whose link to the image server carries additional information. This works very much like the extra link information passed by banner ads, but there is no visible image and thus no indication that an information transfer may have taken place. Web bugs are most often used to collect advertising and usage information, such as which pages are read most often. However, a bug can also capture the user's IP address and, in combination with scripts or other active content, read a file on the user's hard drive (perhaps a cookie containing the user's name and e-mail address) or even write to the user's hard drive (for example, an executable file that collects data while the user is online). A Web bug can also be placed in an e-mail message so that a message goes back to the originator when the original message is opened, replied to, or forwarded. (Olsen, March 2001.) The collection of data for selective advertising is big business and now offers advertisers banner ads targeted to a specific household. (Naviant, October 2001.)
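A minimal sketch of the mechanism, assuming a hypothetical tracker at tracker.example.com: the page or message contains a 1x1 image whose URL carries identifying information in its query string, and the tracker's server recovers that information from the request when the image is loaded. Real Web bugs vary; the parameter names msg and to are invented for illustration.

```python
from html import escape
from urllib.parse import parse_qs, urlencode, urlparse


def web_bug_tag(tracker_url: str, **info: str) -> str:
    """Return an <img> tag for a 1x1 image whose URL carries `info`."""
    src = tracker_url + "?" + urlencode(info)
    # escape() turns "&" into "&amp;", as HTML attribute values require.
    return f'<img src="{escape(src)}" width="1" height="1" alt="">'


def decode_bug_request(requested_url: str) -> dict[str, str]:
    """What the tracker's server recovers from the image request it receives."""
    query = parse_qs(urlparse(requested_url).query)
    return {key: values[0] for key, values in query.items()}


tag = web_bug_tag("http://tracker.example.com/t.gif", msg="1234", to="reader42")
print(tag)
print(decode_bug_request("http://tracker.example.com/t.gif?msg=1234&to=reader42"))
```

The image itself is irrelevant; the information travels in the request for it, together with the IP address and any cookies the tracker previously set.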
Web bugs can also be placed in Microsoft Office documents (Word, Excel, or PowerPoint). A Web bug could serve legitimate purposes, such as tracking who sees a confidential corporate document or which media organizations have read a press release. (Lemos, August 2000.) This type of bug could also be used to determine with whom a user exchanges e-mail. A funny story in Word format or a presentation in PowerPoint could have a bug embedded and be sent to a user. When the user opens the document, the bug sends its data to the image server. If the user then forwards the document to other people, the bug sends back information about each person who opens it.
For people who use an HTML-enabled e-mail program, such as the e-mail components of Netscape or Internet Explorer, a Web bug can be embedded directly into an e-mail message. The bug sends its message to the image server when the e-mail message is read. If the message is forwarded to other recipients who also use an HTML-enabled e-mail program, the bug sends its message again each time one of them opens it. (Olavsrud, February 2001.) This might be an incentive for users to forsake the bells and whistles of Outlook and the e-mail components of Netscape and Internet Explorer in favor of a program that displays only text; older versions of Eudora, for example, are not HTML-enabled. The user loses the ability to see a butterfly flit across the text of a message, but Web bugs also lose their ability to track the user.
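The reason a text-only reader defeats e-mail Web bugs can be shown with a short sketch using Python's standard html.parser module. The TextOnlyReader class is a hypothetical illustration, not how Eudora or any other product works: it keeps only the text of an HTML message and records, without fetching, each remote image that an HTML-enabled reader would have requested when the message was opened.

```python
from html.parser import HTMLParser


class TextOnlyReader(HTMLParser):
    """Render an HTML message as plain text and list the remote images it skips."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.skipped_images: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""
            if src.startswith(("http://", "https://")):
                # An HTML-enabled reader would request this URL on opening;
                # here it is only noted, never fetched.
                self.skipped_images.append(src)

    def handle_data(self, data):
        self.chunks.append(data)


def read_as_text(message_html: str) -> tuple[str, list[str]]:
    """Return (plain text, remote image URLs that were not loaded)."""
    reader = TextOnlyReader()
    reader.feed(message_html)
    reader.close()
    text = " ".join("".join(reader.chunks).split())  # collapse whitespace
    return text, reader.skipped_images


text, skipped = read_as_text(
    '<p>Hello <b>there</b>!</p>'
    '<img src="http://tracker.example.com/t.gif?id=42" width="1" height="1">'
)
print(text)     # the message as a text-only reader shows it
print(skipped)  # the Web bug request that never happens
```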
Banner ads and cookies are not the only covert means of data collection. Technology "giveaways" should also be viewed with suspicion. One example is the :CueCat, a bar code reader shaped like a cat. Radio Shack has distributed thousands of these devices free of charge so that people can reach an item at radioshack.com by scanning its bar code in the Radio Shack catalog. Other vendors have placed :CueCat bar codes in magazine articles and printed advertisements to give consumers an easy way to reach their Web sites. The downside of the device is that its :CRQ software transmits enough information for its creator, Digital:Convergence, to record every bar code a user scans. Although Digital:Convergence states that it does not track individual users, the current software makes such tracking possible: the GUID (globally unique identifier) assigned when the product is registered is sent along with each scanned bar code. The :CueCat also comes with a TV/computer interface that allows the computer's sound card to monitor TV audio for :CueCat audio signals. When an audio cue is received, the software treats it the same as a scanned code and directs the computer's Web browser to the associated site. (Privacy Foundation, September 2000.) Thus not only a user's interest in an online product or service but also the user's choice of TV programs can be determined. Short of legislation that enforces user privacy, is it realistic to expect that a company would not use this data to profile individual users?
The covert data collection by advertisers and others is generally not intended to be malicious, although some of the programs are poorly written and sometimes cause problems with Web browsers and other programs. Malicious intent is clear, however, on the part of people who try to break into an Internet user's computer with one of the many "port scan" programs, which methodically search a computer for an unprotected connection point. These programs check commonly used ports, such as those used for file and printer sharing and by hundreds of other programs. Often the only indication that such an attack is in progress is unexpected data flow, such as modem lights flashing when the user is not doing anything on the computer. Some open ports can provide full access to a computer, giving the attacker read, write, and erase capability on the user's hard drive. (Freund, March 2000.) The defense against this type of attack is to ensure that all unused ports are closed. There are online services that will check a user's computer for open ports: Gibson Research (ShieldsUP!!) and DSLReports (SecureMe) are two of many such services. These scans can give a user a list of open access points on a computer, the risks associated with each open port, and suggestions for closing each port. Firewall software such as ZoneAlarm (www.zonelabs.com) can intercept attempts to access the ports on a computer, alerting the user to such attempts and providing a means of determining the owner of the IP address from which the attack is coming. A software firewall may also be able to detect a Web bug embedded in a document indirectly: if Microsoft Word attempts to access the Internet while a user is viewing a Word document, there is reason to believe that the document contains some type of link that is making the request.
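The core of a port check is simply an attempted connection to each port. The sketch below (the function name is hypothetical) uses Python's standard socket module to report which TCP ports on a host accept connections. It should only be pointed at a computer the user owns. Run on the computer itself, it shows which services are listening, not what a firewall exposes to the outside, which is why external services such as ShieldsUP!! are still useful.

```python
import socket


def open_tcp_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if s.connect_ex((host, port)) == 0:
                found.append(port)
    return found
```

For example, open_tcp_ports("127.0.0.1", [139, 445]) checks the two TCP ports that Windows uses for file and printer sharing.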
For users with high-speed access to the Internet such as DSL (digital subscriber line) or a cable modem, a hardware firewall/router can be configured to make the computer(s) behind it effectively invisible. This removes most of the concern about someone scanning a computer for an open port, but it does not prevent a Web bug from sending back data: the bug's requests look like normal outgoing Web traffic. The most secure solution is to use both hardware and software firewalls, which together protect both incoming and outgoing traffic.
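Why a router stops port scans but not Web bugs can be shown with a toy model of a stateful (connection-tracking) filter. The class and the addresses below are hypothetical, and the model is far simpler than real NAT, which also tracks source ports and protocols: inbound packets pass only if they answer a connection started from inside, while anything the inside computer sends out, including a Web bug's image request, is allowed.

```python
class StatefulRouter:
    """Toy model of a NAT router/firewall that tracks outgoing connections."""

    def __init__(self) -> None:
        # (inside_host, outside_host, outside_port) for connections begun inside
        self.connections: set[tuple[str, str, int]] = set()

    def outbound(self, inside_host: str, outside_host: str, outside_port: int) -> bool:
        # Outgoing traffic is allowed and remembered, so a Web bug's image
        # request leaves the network just like any other Web request.
        self.connections.add((inside_host, outside_host, outside_port))
        return True

    def inbound(self, outside_host: str, outside_port: int, inside_host: str) -> bool:
        # Only replies to connections started from inside get through;
        # an unsolicited probe (a port scan) matches nothing and is dropped.
        return (inside_host, outside_host, outside_port) in self.connections


router = StatefulRouter()
print(router.inbound("203.0.113.5", 139, "192.168.1.2"))    # scan probe: dropped
print(router.outbound("192.168.1.2", "198.51.100.7", 80))   # bug request: allowed
print(router.inbound("198.51.100.7", 80, "192.168.1.2"))    # its reply: allowed
```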
Is the collection of little pieces of information really a bad thing? "Most privacy violations don't come from whopping big intrusions but from the aggregation of hundreds of small bits of knowledge, none of which individually seems important. Who cares if someone knows your ZIP code or your social security number? What about a tossed-out receipt from your ATM or an old credit card receipt? What's your mother's maiden name? But put those violations all together and you're well on your way to identity theft -- or worse." (Vogt, March 2001.) Data mining for corporate advantage correlates all available data to find patterns, and those patterns can uniquely identify an individual who never gave anyone permission to collect or use the data.
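The aggregation problem can be illustrated with a simple record-linkage sketch: two data sets that are each fairly harmless are joined on ZIP code, birth date, and gender. All records below are fictional and the field names are assumptions; the point is only that a combination of ordinary facts that matches exactly one person re-identifies that person.

```python
def link_records(profiles, directory, keys=("zip", "birthdate", "gender")):
    """Join two data sets on shared quasi-identifiers; keep only unique matches."""
    index = {}
    for person in directory:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)
    linked = []
    for profile in profiles:
        candidates = index.get(tuple(profile[k] for k in keys), [])
        if len(candidates) == 1:  # exactly one person fits: re-identified
            linked.append({**candidates[0], **profile})
    return linked


# Fictional "anonymous" marketing profiles: no names attached.
AD_PROFILES = [
    {"zip": "30144", "birthdate": "1961-04-12", "gender": "F", "interests": "wireless cameras"},
    {"zip": "30060", "birthdate": "1975-09-02", "gender": "M", "interests": "MP3 players"},
]
# Fictional mailing list with names but no interests.
MAILING_LIST = [
    {"name": "Pat Example", "zip": "30144", "birthdate": "1961-04-12", "gender": "F"},
    {"name": "Lee Sample", "zip": "30060", "birthdate": "1975-09-02", "gender": "M"},
    {"name": "Kim Placeholder", "zip": "30060", "birthdate": "1975-09-02", "gender": "M"},
]

result = link_records(AD_PROFILES, MAILING_LIST)
print(result)
```

The second profile stays anonymous only because two people on the list share its ZIP code, birth date, and gender; one more small fact would separate them.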
Potential threats from Web bugs can be reduced by using a filter program such as Personal Sentinel from Intelytics.com. It displays the privacy risk of a Web page graphically and can operate as a personal firewall, filtering out undesirable content, online advertisements, and third-party cookies. Bugnosis.com offers a free add-in that can detect the "invisible" (very small) graphic images that usually indicate a Web bug, but the product works only with Internet Explorer 5 for Windows. When a suspect image is detected, the add-in sounds an audible alert, pops up a window that identifies the suspect image, and displays the image's URL.
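Tools of this kind look for small images loaded from another site. The Python sketch below is a simplified heuristic in that spirit, not Bugnosis's actual rules: it flags img tags whose declared width and height are 0 or 1 pixel and whose host differs from the page's host. The class name, function name, and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class WebBugFinder(HTMLParser):
    """Collect tiny images served from a host other than the page's own."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_host = urlparse(page_url).hostname
        self.suspects: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        src = a.get("src") or ""
        img_host = urlparse(src).hostname  # None for relative URLs
        tiny = a.get("width") in ("0", "1") and a.get("height") in ("0", "1")
        third_party = img_host is not None and img_host != self.page_host
        if tiny and third_party:
            self.suspects.append(src)


def find_web_bugs(page_url: str, page_html: str) -> list[str]:
    finder = WebBugFinder(page_url)
    finder.feed(page_html)
    finder.close()
    return finder.suspects


PAGE = (
    '<img src="/images/logo.gif" width="120" height="60">'
    '<img src="http://www.example.com/spacer.gif" width="1" height="1">'
    '<img src="http://tracker.example.net/b.gif?id=7" width="1" height="1">'
)
print(find_web_bugs("http://www.example.com/index.html", PAGE))
```

A heuristic this simple would miss a bug whose size is set by a style sheet and would flag harmless spacer images served from another host, so a practical tool needs more evidence than size and origin alone.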
Security usually comes at a price, but basic personal security on the Internet is relatively inexpensive. Much information on privacy and security is freely available on the Web from organizations such as the Electronic Privacy Information Center (www.epic.org), the Privacy Rights Clearinghouse (www.privacyrights.org), the Electronic Frontier Foundation (www.eff.org), the World Wide Web Consortium (www.w3c.org), and the Privacy Forum (www.vortex.com/privacy.html). The ZoneAlarm software firewall is free for personal use and can be downloaded from zonelabs.com. Personal Sentinel from intelytics.com costs $18.95, about the price of a game.
Does anyone other than "techies" know or care that spyware methods of covert data collection exist? Surprisingly, the answer is yes. The World Wide Web Consortium has proposed the Platform for Privacy Preferences (P3P) as a solution to Internet users' privacy concerns. (www.w3c.org/P3P)
There is political support for P3P in the United States House of Representatives. (Tillett, June 2001.) However, the P3P specification creates an information imbalance heavily weighted in favor of the Web site. "The required data elements (from the user) in P3P are: Name, Birthdate, Gender, Employer, Department, Job title, Home address, Business address, Bill to address, Ship to address. The Web site must identify itself (although it appears that this can be as little as its Web address) and specify its privacy practices in relation to the data being requested." (Coyle, November 1999.) This looks less like personal privacy protection and more like a gift for the Direct Marketing Association. A more equitable exchange would require the Web site to provide as much information about itself as it requests from the user. At a minimum, it seems reasonable for the site to identify its actual owner and give current, valid contact information, including a mailing address and a telephone number.
A future concern for privacy is Microsoft's Passport, which Microsoft is advertising as the ultimate way for an Internet user to be identified to an online vendor. Microsoft will maintain a master database of all people signed up as Passport users and provide authentication to vendors. Passport can also serve as a digital wallet and is incorporated into Windows XP. The concept of one database as the entry point to the Internet "seems to undermine [the Internet's] distributed nature." (Gates, August 2001.) There is also a real concern that the Passport software on a user's computer is insecure: during use, the user's name and password are kept in memory in plain text. Just as the LoveBug virus could read a user's e-mail address book and send messages, a similar virus could scan a known area of memory, capture the user's Passport name and password, and send that information to the virus originator, or to everyone in the user's address book. (Rash, September 2001.) Microsoft now has an effective monopoly in the desktop operating system and office software markets. Consider how Microsoft has handled competition in the past: it developed Internet Explorer and gave it away to take the browser market from Netscape, and it bought Cooper Software, the original developer of Visual Basic. Is it unreasonable to see the single massive Passport database as an attempt by Microsoft to control access to the Internet?
A person's level of privacy, whether on or off the Internet, depends on who is listening. Determining whether there are listeners, and who they may be, is the responsibility of the person desiring privacy. Constant diligence in keeping up with current threats (both technological and political) and with the measures available to reduce or eliminate them is the only way to maintain privacy.
Can personal privacy be protected on the Internet? Only if the user is aware of the potential risks to privacy and takes steps to counter those risks. The Electronic Frontier Foundation lists 12 ways to protect online privacy and provides detailed reasoning for each one. (McCandlish, September 2001.)
1. Do not reveal personal information inadvertently.
2. Turn on cookie notices in your Web browser, and/or use cookie management software or infomediaries.
3. Keep a "clean" e-mail address.
4. Don't reveal personal details to strangers or just-met "friends".
5. Realize you may be monitored at work, avoid sending highly personal e-mail to mailing lists, and keep sensitive files on your home computer.
6. Beware sites that offer some sort of reward or prize in exchange for your contact or other information.
7. Do not reply to spammers, for any reason.
8. Be conscious of Web security.
9. Be conscious of home computer security.
10. Examine privacy policies and seals.
11. Remember that YOU decide what information about yourself to reveal, when, why, and to whom.
12. Use encryption!
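Item 2 above, like the third-party cookie filtering that Personal Sentinel offers, rests on one decision: does a cookie come from the site the user is visiting or from someone else? A minimal sketch of that decision follows. The functions are hypothetical, and treating the last two labels of a host name as the "site" is a simplification; browsers use a list of public suffixes, because the two-label rule fails for names such as example.co.uk.

```python
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Naive 'site' of a URL: the last two labels of its (lowercased) host name."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def accept_cookie(page_url: str, request_url: str, block_third_party: bool = True) -> bool:
    """Accept a cookie unless it comes from a different site than the page."""
    third_party = site_of(page_url) != site_of(request_url)
    return not (block_third_party and third_party)


page = "http://www.example.com/news/"
print(accept_cookie(page, "http://images.example.com/logo.gif"))  # same site
print(accept_cookie(page, "http://ads.example.net/banner.gif"))   # ad network
```

With third-party cookies blocked, an ad server can still see each request's IP address and referring page, but it can no longer tie visits to different sites together under one identifier.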
It is possible to achieve a degree of personal privacy on the Internet, but this goal requires that the user continually seek education about, and be actively involved in, maintaining that privacy. New threats to privacy appear almost daily, whether a new e-mail virus or a more subtle way of collecting user information. The Internet user who seeks privacy must be alert to new developments in both privacy erosion and privacy protection. Some of the privacy-oriented organizations listed previously offer both scheduled and "new threat" e-mail updates in their areas of concern. Vendors of anti-virus software publish updated virus definition lists on at least a weekly basis and sometimes more frequently. These sites usually have current information about viruses, worms, and the "terrible new virus" e-mail messages that are only hoaxes.
"Eternal vigilance is the price of liberty." (Wendell
Phillips, 1852.) Eternal vigilance is also the price of maintaining personal privacy.
References
Coyle, Karen. Some Frequently Asked Questions About Data Privacy and P3P. Computer Professionals for Social Responsibility. November 21, 1999. www.cpsr.org/program/privacy/p3p-faq.html. Accessed: October 8, 2001.
Electronic Frontier Foundation. Judge Rules Alleged DoubleClick Privacy Violations Sufficient to Go to Trial. June 6, 2001. www.eff.org/Privacy/Marketing/20010606_eff_doubleclick_pr.html. Accessed: October 8, 2001.
Gates, Dominic. Are Microsoft's Papers in Order? The Industry Standard. August 20-27, 2001, p. 4.
Intelytics. Intelytics products. www.intelytics.com. Updated: Unknown. Accessed: October 10, 2001.
Lemos, Robert. No easy way to exterminate 'Web bugs'. ZDNet News. August 31, 2000. techupdate.zdnet.com/techupdate/stories/main/0,14179,2622610,00.html. Accessed: October 30, 2001.
McCandlish, Stanton, EFF Technology Director. EFF's Top 12 Ways to Protect Your Online Privacy. Electronic Frontier Foundation. September 27, 2001. www.eff.org/Privacy/eff_privacy_top_12.html. Accessed: October 7, 2001.
Netscape. Persistent Client State - HTTP Cookies. home.netscape.com/newsref/std/cookie_spec.html. Updated: 1999. Accessed: October 30, 2001.
Olavsrud, Thor. HTML E-mail Clients Susceptible to 'Wire-Tapping'. InternetNews. February 5, 2001. www.internetnews.com/dev-news/article/0,,10_579871,00.html. Accessed: October 30, 2001.
Olsen, Stefanie. Reversal of Fortune -- tracking Web Trackers. ZDNet. March 5, 2001. www.zdnet.com/zdnn/stories/news/0,4586,2692472,00.html. Accessed: October 8, 2001.
Privacy Foundation. The :CueCat Bar Code Reader. September 22, 2000. www.privacyfoundation.org/privacywatch/print.asp?id=44&type=0. Accessed: October 5, 2001.
Rash, Wayne. Your Stolen Passport. ZDNet. September 26, 2001. techupdate.zdnet.com/techupdate/stories/main/0,14179,2814881,00.html. Accessed: October 8, 2001.
Smith, Richard M. The Web Bug FAQ. Electronic Frontier Foundation. November 11, 1999. www.eff.org/Privacy/Marketing/web_bug.html. Accessed: October 27, 2001.
Sullivan, Jennifer and Jones, Christopher. How Much Is Your Playlist Worth? Wired News. November 3, 1999. www.wired.com/news/technology/0,1282,32258,00.html. Accessed: October 31, 2001.
Tillett, L. Scott. Pols Push Privacy Standards. InternetWeek. June 7, 2001. www.internetweek.com/story/INW20010607S0007. Accessed: October 7, 2001.
Vogt, Carlton. The issue is privacy and the outlook is grim. InfoWorld. March 20, 2001. www2.infoworld.com/articles/op/xml/01/03/23/010323opethics.xml?Template=/storypages/printfriendly.html. Accessed: October 6, 2001.
X10. X10 pop-up. www.x10.com/x10ads1.htm. Updated: Unknown. Accessed: October 9, 2001.