During a recent stint at a startup building a data marketplace on a commodities exchange model, I created a method for measuring the somewhat abstract concept of “Privacy.” The measurement and analysis of privacy factors surfaced as a Privacy Score, which provided feedback on an entire data portfolio and served as a graphic measure of privacy disclosure for a single data source.

Before I can explain how I scored privacy, I have to climb a mountain and explain a little social network and graph theory. But before we uncoil our rope, I’d like to address why I decided to create what appears to be the first practical Privacy Score.

My first task upon joining the company was to gather data samples from various sources so that we could analyze their file formats and develop the upload and conversion software. When friends and family were asked for their data, their uniform response was “Why, and what are you going to do with it?” After I explained the company’s personal data marketplace business proposition (when you’re the first, it may take a couple of tries), it turned out that a key part of the “what are you going to do with it?” question was really about our security and privacy practices. The question was expected; the level of concern from friends and family was not. People who felt I could be trusted with their key assets and personal health were asking me this question, and not sending their data until it was sufficiently answered. It must be a REALLY important question.

We responded by making privacy a key component of our business plan and service architecture. We developed new methods and designs to answer those concerns, and we filed for patents on several ideas we discovered while treading this new path. The concepts and design behind the Privacy Score were one of those filings. Yeah, I’m kind of proud of that.

That’s the “Why?” Now let’s start the climb, and I’ll explain the “How?”

As luck would have it, while I was gathering data for the marketplace inventory, I was also reading Duncan Watts’ excellent book, Six Degrees: The Science of a Connected Age. As I read, I started thinking about how closely reputation and privacy are linked, and how both are examples of a “cascade” type of power-law relationship. To put it in layman’s terms: if small rocks are stacking up on a hillside ledge, the rock pile will build up until a slide becomes likely with each new addition. It’s a gradual build-up with occasional small, limited releases, until a big rock comes along and knocks it all away.
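The rock-pile picture closely matches the classic Bak-Tang-Wiesenfeld sandpile, a standard toy model of self-organized criticality. It is swapped in here purely to illustrate the metaphor; it is not the cascade model from Watts’ book and not part of the Privacy Score. The function name and parameters below are hypothetical. Grains are dropped one at a time on a ledge (a grid); a cell holding `threshold` or more grains topples onto its neighbours, and grains pushed past the edge fall off.

```python
import random


def sandpile_avalanches(size=20, grains=5000, threshold=4, seed=1):
    """Drop grains one at a time and record how many topples each drop causes.

    Bak-Tang-Wiesenfeld rules (illustrative only): a cell with `threshold` or
    more grains topples, losing `threshold` grains and giving one to each of
    its four neighbours; grains pushed past the edge are lost off the ledge.
    """
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    avalanches = []
    for _ in range(grains):
        r, c = rng.randrange(size), rng.randrange(size)
        grid[r][c] += 1
        topples = 0
        unstable = [(r, c)]
        while unstable:
            i, j = unstable.pop()
            if grid[i][j] < threshold:
                continue  # stale entry: this cell already settled
            grid[i][j] -= threshold
            topples += 1
            if grid[i][j] >= threshold:
                unstable.append((i, j))
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < size and 0 <= nj < size:
                    grid[ni][nj] += 1
                    if grid[ni][nj] >= threshold:
                        unstable.append((ni, nj))
        avalanches.append(topples)  # 0 means the new rock caused no slide
    return grid, avalanches


grid, sizes = sandpile_avalanches()
big = sum(1 for s in sizes if s >= 50)
print(f"{len(sizes)} drops, {sizes.count(0)} caused no slide, {big} caused 50+ topples")
```

Most drops cause nothing, many cause small slides, and a few cause large ones: the gradual build-up with occasional releases described above.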

The core idea, that releasing enough information about yourself eventually costs you some privacy, is easy enough to grasp. That’s a fairly simple power-law function in which all elements are roughly equal in weight and keep building up until small slides start happening. The probability of a slide (a metaphor for an impact on your relationships) increases sharply once you reach the knee of the curve and keep adding stuff.
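The actual equation isn’t published here, so this is only a minimal sketch of the kind of curve being described, assuming the slide probability grows as a power of the number of equal-weight elements released, (n / capacity) ** exponent, capped at 1. The name `slide_probability` and the values of `capacity` and `exponent` are hypothetical and chosen only to make the knee visible.

```python
def slide_probability(n_elements: int, capacity: float = 40.0, exponent: float = 3.0) -> float:
    """Hypothetical base curve: chance of a 'slide' after releasing
    n roughly equal-weight elements.

    capacity and exponent are illustrative assumptions, not published values.
    """
    if n_elements <= 0:
        return 0.0
    return min(1.0, (n_elements / capacity) ** exponent)


for n in (5, 10, 20, 30, 40):
    print(n, round(slide_probability(n), 3))
```

With a cubic exponent the curve stays near zero for the first handful of elements, then climbs steeply as n approaches `capacity`; that bend is the knee.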

What was missing from many privacy discussions was the big thing that comes crashing through your rock pile like the Kool-Aid Man and sends your reputation flying. That’s the “KAM Factor”.

The base for the Privacy Score is a power-law function modified by a kicker: the “KAM Factor”. The next tweak to the equation addressed the question, “Once your privacy/reputation has sustained the ‘KAM’ release, how much worse can things get?” It turns out that they can get a bit worse. If Punchy (from Hawaiian Punch) and a red bull come running down the mountain with the Kool-Aid Man, the rock pile is going to release with a little more energy than it would with the Kool-Aid Man alone. If your reputation suffers because you are an embezzler, and on your way to court you also get cited for littering, you’ll find that the judge can think even less of you. Once the KAM Factor has been established as the main driver of privacy and reputation, the power law is still active as a minor component of your Privacy Score.
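The KAM Factor can be sketched as a kicker on a power-law base curve. This is a hypothetical reading, not the patented method: sensitivity weights are assumed to lie in [0, 1], any element at or above an assumed `kam_threshold` counts as a KAM, the largest KAM sets the score, and everything else (including any Punchy or red bull that follows) keeps accumulating on the power-law curve but only fills the remaining headroom. `base_curve`, `kam_score`, and all parameter values are illustrative names and numbers.

```python
from typing import Sequence


def base_curve(weights: Sequence[float], capacity: float = 40.0, exponent: float = 3.0) -> float:
    """Power-law build-up from many roughly equal, low-sensitivity elements."""
    total = sum(weights)
    return min(1.0, (total / capacity) ** exponent) if total > 0 else 0.0


def kam_score(weights: Sequence[float], kam_threshold: float = 0.8) -> float:
    """Hypothetical KAM-adjusted score in [0, 1].

    Without a KAM, the score is just the power-law curve. With one, the
    largest KAM dominates, and the remaining elements (further KAMs included)
    can push the score only part of the way from there toward 1.
    """
    if not weights:
        return 0.0
    top = max(weights)
    if top < kam_threshold:
        return base_curve(weights)
    rest = list(weights)
    rest.remove(top)  # drop one instance of the dominant KAM
    return top + (1.0 - top) * base_curve(rest)


# Embezzlement alone, then embezzlement plus a littering citation (made-up weights).
print(kam_score([0.95]), kam_score([0.95, 0.05]))
```

The littering citation nudges the score up only slightly, which mirrors the “a bit worse” behavior described above.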

The above description provides a map to the shape of, and key factors in, the privacy scoring model. What it doesn’t describe is how the sensitivity weights for the data elements are assigned. That requires understanding which data elements are KAMs. In other words, which elements will people be sensitive about releasing? There are two key components to sensitivity concerns, which can be stated simply as: “What do they know?” (we call this “Persona”) and “Do they know that it was me?” (this is “Identity”).

Persona elements describe the container that is a person: what kinds of things they like, what products they own, and what they do. A well-rounded Persona is a reputation impact that hasn’t happened yet, but could if an Identity assignment is made.

Identity elements are those that decrease one’s anonymity: things like a name, an online alias, an email address, an SSN, or an account number. A frequented location is also a type of Identity element, but one that behaves differently from the naming elements just described. A location can also be a Persona element, describing the environment in which you carried out an activity. When the Identity sensitivity score is built up, location-related elements are separated out and treated differently, before finally being added into both the Identity and Persona components of the Privacy Score.
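One way to picture the Identity/Persona split, assuming each element has already been tagged with a kind and a sensitivity weight in [0, 1]. The `Element` type, the kind labels, and the use of a noisy-OR (1 - prod(1 - w), treating each element as an independent chance of disclosure) are all assumptions for illustration. The description above says only that location elements are separated out, treated differently, and then added into both components; here they are rolled up into one location sub-score that is fed into each component.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Element:
    name: str
    kind: str      # hypothetical labels: "identity", "persona", or "location"
    weight: float  # sensitivity weight, assumed to lie in [0, 1]


def noisy_or(weights: Iterable[float]) -> float:
    """Combine independent chances of disclosure: 1 - prod(1 - w)."""
    p_none = 1.0
    for w in weights:
        p_none *= 1.0 - w
    return 1.0 - p_none


def split_components(elements: Iterable[Element]) -> Tuple[float, float]:
    """Return hypothetical (identity, persona) component scores."""
    elements = list(elements)
    identity = [e.weight for e in elements if e.kind == "identity"]
    persona = [e.weight for e in elements if e.kind == "persona"]
    location = [e.weight for e in elements if e.kind == "location"]
    # Locations are scored on their own first, then added into BOTH components.
    loc = noisy_or(location)
    return noisy_or(identity + [loc]), noisy_or(persona + [loc])


sample = [Element("email", "identity", 0.5), Element("gym", "location", 0.5)]
print(split_components(sample))
```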

Once the Identity and Persona component scores are calculated, they are factored together to produce the Privacy Score. That score is an estimate of the risk of reputation impact from the information you are releasing. It’s worth stating that the Privacy Score does not include a factor for whether the potential impact would be positive or negative for your reputation.
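Exactly how the two components are “factored together” isn’t spelled out. A product is one plausible reading, and it fits the description of a Persona as an impact that hasn’t happened yet: with no Identity, even a rich Persona scores zero. `privacy_score` is a hypothetical name, and the [0, 1] scale is an assumption.

```python
def privacy_score(identity: float, persona: float) -> float:
    """Hypothetical combination: Persona risk only matters once Identity links it to you.

    Both inputs are assumed to be component scores in [0, 1].
    """
    for value in (identity, persona):
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores are expected in [0, 1]")
    return identity * persona


print(privacy_score(0.0, 0.9))  # anonymous but detailed Persona
print(privacy_score(0.8, 0.9))  # same Persona, now largely identifiable
```

Like the score described above, this sketch says nothing about whether the impact would help or hurt; it only estimates how exposed you are.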

The Privacy Score is calculated using probabilities and sensitivity weights appropriate for a general audience. If your data contains values that are far from normal, or if someone who already holds a secret about you also receives an aggregated, but detailed, report, your probability of reputation impact increases. Likewise, the sensitivity values assigned to individual elements may differ from how you would rate those elements.
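The gap between general-audience weights and your own could be modeled as defaults with per-user overrides, plus a bump for values far from normal. Every element name, weight, and the `outlier_z`/`outlier_boost` rule below is hypothetical and made up for illustration; none of these numbers come from the actual Privacy Score.

```python
from typing import Dict, Optional

# Hypothetical general-audience defaults (illustrative numbers only).
DEFAULT_WEIGHTS: Dict[str, float] = {
    "favorite_cuisine": 0.05,
    "zip_code": 0.3,
    "email_address": 0.7,
    "medical_diagnosis": 0.9,
}


def effective_weight(element: str,
                     overrides: Optional[Dict[str, float]] = None,
                     z_score: float = 0.0,
                     outlier_z: float = 2.0,
                     outlier_boost: float = 1.5) -> float:
    """Sensitivity weight for one element, capped at 1.0.

    Personal overrides replace the general-audience default; a value whose
    z-score lies beyond outlier_z (far from normal) is boosted, on the
    assumption that unusual values are more identifying.
    """
    weights = {**DEFAULT_WEIGHTS, **(overrides or {})}
    w = weights[element]
    if abs(z_score) > outlier_z:
        w *= outlier_boost
    return min(1.0, w)


print(effective_weight("zip_code"))
print(effective_weight("zip_code", overrides={"zip_code": 0.8}))
print(effective_weight("medical_diagnosis", z_score=-3.0))
```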

The Privacy Score algorithm provides a unique view into the amount of information you are releasing about yourself, but it also has limits. Please take time to understand how those limits relate to your circumstances and feelings about privacy. Remember that the final responsibility for your Privacy and Reputation is yours.

Update: I received feedback indicating cross-cultural confusion about the Kool-Aid Man. For examples of his body of work, see here, here, and here.

Like others involved in the emerging privacy marketplace, I think a lot about what “Privacy” means. There are many ways to approach this question, and this post is just one of the ways that I have been thinking about answering it.

When people talk about online privacy, what do they mean? Most “Privacy” concerns seem to fall into the general buckets of:

  • Will I be bothered by people trying to sell me stuff?
  • Will others think badly of me?
  • Will my property be damaged or taken?
  • Will I be harmed?

The evolution of our concern for privacy is certainly a thought-provoking topic (get started here and here). Back when humans made their homes in whatever cave they could evict the current resident from, a failure to keep things private had immediate health consequences. If someone, or something, knew about my daily business, they could steal my food supply, my mate, my home, or my life simply by waiting for me to sleep in my usual place. It was a competition for survival, and the more you knew about your competitor, the likelier you were to live.

Concerns about harm to person or property are the ones that your primitive self, your atavistic side, still recognizes. Have you ever felt someone’s eyes on you even though you couldn’t see them? Has a co-worker tracked your actions so that they could gain an advantage at the next meeting? Have the hairs on the back of your neck stood up as you entered your credit card number into an online store? That’s your old lizard brain, Freud’s “id,” speaking to you.

That old lizard shouldn’t be brushed aside. In today’s online world there are stalkers waiting to do you harm. One could be sitting next to you at the coffee house, watching your WiFi packets pass by as you log in to your bank account. Another could be hacking the travel website you’re using to plan next month’s four-week safari in Kenya. That information could be sold to someone who will have a leisurely time emptying your house.

We’re lucky that most predators are simple opportunists who don’t make a business out of such things, and most methods for evading opportunists involve common-sense precautions. Still, there are a few shadowy stalkers who profit greatly by invading our privacy. Evading all of their techniques is much more difficult, and could require going completely off the grid. As one old punch line puts it, “You call that living?”

Given that the planning predator is rare, if you practice the simple personal security techniques aimed at circumventing the opportunists, you likely won’t come to the attention of the more cunning ones.

The Electronic Privacy Information Center has a list of online resources that both inform and provide methods for limiting your online exposure. In addition to those resources, please also include our friends at Reputation.com, Personal Data Ecosystem and TRUSTe in your own resource list.

While sensational and usually well publicized, it’s rare for online stalking to result in someone getting bashed over the head with a rock. Much more common is the type of economic stalking carried out by credit bureaus, insurance companies, catalog retailers, etc. Economic stalkers look for outliers in the population to target: Post about a rare disease with your real name, and you may look forward to difficulties getting health insurance. Friend the wrong organizations on Facebook, and you may watch your credit score go down.

The casual online surveillance and taking of your personal data by behavioral trackers has similarities to that time when Gronk took Mord’s stone ax and flint supply while he slept. In the Personal Data ecosystem, data taken by others is an asset that is lost forever.