While working on the Facebook Privacy Informer App, I had to tackle the issue of “Scope of Distribution” of your personal information. Actually, this should be more properly named as “Scope of (Intended) Distribution”. Facebook privacy controls allow you to set the distribution of various aspects of your Facebook profile. In general, the controls allow you to set distribution to:

  • (The inappropriately named) “Only Me”
  • A subset of your friends,
  • Your friends
  • Groups that you belong to
  • The general public

Why does Facebook say “Only Me” when you share information with Facebook? Shouldn’t the setting be labeled “Only Facebook (and whoever they decide to share it with)”? Even when you spend the time to tune those controls, there will certainly be leakage of your information beyond your intended settings.

Facebook has enough money that you would think your biggest issues would be their intended privacy violations (sales of tracking ads) and your own privacy control lapses (friending people you don’t personally know). Unfortunately that’s not really true. There is a 1 in 4 chance that your account will be hacked this year. Given the information that Facebook acknowledges it holds about you, and other information it won’t tell you about, that’s somewhat alarming. With all that information, and many examples of leaky security, what happens when the almost inevitable major breach occurs?

Still… Facebook is a very useful and entertaining service for many of us. So the issue is not how fast we run away from it, but how we control our risk to value ratio. The Privacy Informer Apps that DataBanker is creating are intended to provide feedback on your risk and strategies for reducing that risk.

The DataBanker Privacy Informer for Facebook app is currently in development and has had limited demos. One of the issues I had to incorporate into the risk scoring strategy was Facebook’s distribution scope controls. Once I added that factor to the scoring model, I saw that it could also be used to incorporate security and reputation risks into the scoring. An example of a security issue is when Facebook says that it will only share information with your friends, but then one of your friends’ accounts gets hacked. A reputation issue is when Facebook gives you control over some information, but then hides other information about you that it intends to monetize. In both cases, there is an expansion of scope beyond the limit your settings indicated. In the DataBanker model, if you set that level to be “Friends”, I adjust the risk value calculation to include some leakage to the public.
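
As a rough illustration of that adjustment, here is a minimal PHP sketch. The scope names, the exposure weights, and the 10% leakage figure are all assumptions made up for the example; they are not values from the DataBanker model.

```php
<?php
// Hypothetical exposure weight for each intended distribution scope
// (0 = nobody sees it, 1 = everyone sees it). Illustrative values only.
const SCOPE_EXPOSURE = [
    'only_me'       => 0.05,
    'friend_subset' => 0.15,
    'friends'       => 0.30,
    'groups'        => 0.60,
    'public'        => 1.00,
];

/**
 * Blend the intended scope with some leakage to the public.
 * $leakage is the assumed fraction (0..1) of data that escapes the setting.
 */
function effectiveExposure(string $scope, float $leakage): float
{
    if (!array_key_exists($scope, SCOPE_EXPOSURE)) {
        throw new InvalidArgumentException("Unknown scope: $scope");
    }
    $leakage = max(0.0, min(1.0, $leakage));
    $intended = SCOPE_EXPOSURE[$scope];

    // The leaked share is treated as if it had been shared publicly.
    return (1.0 - $leakage) * $intended + $leakage * SCOPE_EXPOSURE['public'];
}

// "Friends" with an assumed 10% leakage: 0.9 * 0.30 + 0.1 * 1.00
printf("%.2f\n", effectiveExposure('friends', 0.10));
```

With these made-up numbers, a “Friends” setting is scored as somewhat more exposed than its label suggests, while a “Public” setting is unchanged by leakage.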

That adjustment raises the question: how does one know how much to tweak the value? That’s where some interesting tools and data sources can provide value.

TACO (by Abine) and Ghostery are browser add-ons that show you the tracking cookies that a website is using, and allow you to choose the ones you want to block.

Privacy Choice provides a database of tracking cookies, and information about both the cookie and the organization behind the cookie that help to quantify the reputation and risk of each cookie.  Privacy Choice also provides tools for safe surfing and privacy policy creation and analysis. That’s very useful information for web surfers, web developers and others making tools for a safer web.

Web of Trust takes the problem of creating reputation scores and crowd-sources it. It provides a browser add-on that shows, in real time, what others think of the website you are viewing, and allows you to contribute your own rating.

Taken together, these tools and the databases behind them inform algorithms that apply both mathematically derived and experiential data to the problem of assigning a reputation score to a website. I then use that reputation score to adjust the intended distribution scope variable. In the end, I provide a simple numeric value that relates to your privacy risk, along with information on those factors that stand out as being riskiest.
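
To make the last step concrete, here is a minimal PHP sketch of folding several reputation ratings into a single leakage estimate. The source names, the 0 to 100 rating scale, the weights, and the worst-case fallback are assumptions for illustration; the real services expose their data in their own formats.

```php
<?php
/**
 * Turn several 0..100 reputation ratings (higher = more trustworthy) into
 * one leakage estimate in 0..1. Source names and weights are hypothetical.
 */
function leakageFromReputation(array $ratings, array $weights): float
{
    $weighted = 0.0;
    $total = 0.0;
    foreach ($ratings as $source => $rating) {
        $w = $weights[$source] ?? 0.0;              // skip sources with no weight
        $weighted += $w * max(0.0, min(100.0, $rating));
        $total += $w;
    }
    if ($total <= 0.0) {
        return 1.0;                                 // no usable data: assume the worst
    }
    $reputation = $weighted / $total;               // weighted mean, 0..100
    return 1.0 - $reputation / 100.0;               // poor reputation => more leakage
}

$ratings = ['crowd_rating' => 80, 'tracker_database' => 60];
$weights = ['crowd_rating' => 1.0, 'tracker_database' => 1.0];
printf("%.2f\n", leakageFromReputation($ratings, $weights));
```

The resulting value is the kind of number that could be used to widen an intended distribution scope before the risk is scored.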

Cross-posted to DataBanker.com

I was on a development death march for the weeks leading into the Internet Identity Workshop #13 (conference notes to be posted soon on the IIW website), but I succeeded and showed the Facebook Privacy Informer App at the conference. The goal of the Privacy Informer App is to analyze the inherent privacy risks associated with a particular website or online service. It then convolves the inherent risk metrics with how the viewer has configured their website and browser privacy settings, and generates a final number that rates your personal privacy risk (see this earlier post for more info on the algorithm). Detail data, and strategies for controlling that risk while still getting value out of the website or service, are also provided as a result of the analysis.

Back in August, when Facebook made major changes in how they present your privacy settings and how they dynamically load their pages, I had to do a major retooling of the screen scraping code in the app. So I created a table driven, asynchronous, sequencing engine in cross-browser compatible JavaScript. I also used Kynetx to trigger the app when the browser loads the Facebook Privacy Settings page. The engine runs from the viewer’s browser, which has some advantages and disadvantages over one that runs as a web service.

To make the basic sequencing engine useful, I added several “filters” and actions that can be included in the sequence table, to scrape the information off of Facebook and send it out to my server for scoring. The weakness of that approach is that I had to put the Facebook page into an iframe.

Those of you familiar with using iframes know that while they’re useful for creating mash-ups, some websites abuse them to steal Google link “mojo” from the organization that actually created the content. For that reason, many websites include code that detects iframes and refuses to render the content. And that’s what Facebook recently did to break my Privacy Informer app again. Other apps that review your Facebook privacy settings, like the Reclaim Privacy app, appear to have been broken by that same change.
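
Besides script-based checks, a site can also refuse framing from the server side with HTTP response headers. The sketch below shows that variant in PHP. It is a different technique from detecting the iframe in JavaScript, it is not a claim about what Facebook actually deployed, and the function name is hypothetical.

```php
<?php
/**
 * Build response headers that ask browsers not to render this page in a frame.
 * $allowSameOrigin = true still permits framing by pages from the same site.
 */
function antiFramingHeaders(bool $allowSameOrigin = false): array
{
    if ($allowSameOrigin) {
        return [
            'X-Frame-Options: SAMEORIGIN',
            "Content-Security-Policy: frame-ancestors 'self'",
        ];
    }
    return [
        'X-Frame-Options: DENY',
        "Content-Security-Policy: frame-ancestors 'none'",
    ];
}

// In a web request, send the headers before any output is written.
if (PHP_SAPI !== 'cli') {
    foreach (antiFramingHeaders() as $line) {
        header($line);
    }
}
```

A page served with these headers will not render inside an iframe in browsers that honor them, which is the effect that stops an iframe-based scraper.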

Now, I have to create a true browser add-on to do the screen scraping without an iframe. That also means that I have to create an add-on for at least four browsers – Safari, Internet Explorer, Firefox and Chrome. It helps that I only need to put some of the URL detection, context data and sequencing into the add-on, and that I can leave a lot of the code in JavaScript. That should reduce the difficulty inherent in supporting multiple browsers.

I’ll be done with a Facebook and Chrome version of the app soon, and will post it on DataBanker.com. My next post will describe how Facebook helped me extend the privacy scoring algorithm to include security and reputation issues. I’ll also list a small sampling of services that provide useful data and tools for understanding your online privacy.

For the last few weeks, I’ve been helping improve the web presence of a local business organization that  promotes the independent businesses of Hunterdon County, NJ. Check it out at HunterdonFirst.org.

An interesting aspect of the website is that each business has the ability to edit their own page. A couple of businesses have taken advantage of this opportunity, and I’ll be working with more to build their own content and strategy.

The website uses WordPress as its core tool, which I extended with a child theme for Presswork, and several useful plugins including:

Modifications were made to several of the plugins to get them to work the way I wanted them to, but all provided a great starting point and I’m very thankful to their creators.

I’ll be taking some of what I did for Hunterdon First and updating this website soon.

Maslow’s Hierarchy of Needs, a popular tool for many pop-psych discussions, also provides a useful framework for discussing privacy. The privacy concerns that I described in my previous post can be mapped to Maslow’s Hierarchy as:

  • Will I be harmed? => Safety
  • Will my property be damaged or taken? => Safety
  • Will others think bad of me? => Esteem
  • Will I be bothered by people trying to sell me stuff? => Self Actualization

Let’s think about that last one. Is Self Actualization a useful label for my concern about being bothered? I do think that being bothered takes my attention and resources away from my prime task of being the best “Dwight Irving” that I can be.

It’s interesting that there is no privacy concern in my list that can be related to the levels of Physiological, Love and Belonging, or Self-Transcendence. Given those gaps, I wonder if I’m missing something. Should “Publicy” and “Publicness” (see Stowe Boyd and Jeff Jarvis) be considered as the privacy concepts that come in at the Self-Transcendence level? Some think so. Even if Publicness and Publicy should be the goals of my quest for enlightenment, I’d rather make that decision myself by controlling the release of my data, than to let others grab my data and make the decision for me.

I’m working through Facebook’s Privacy settings this morning as part of a new design and engineering project. Jonny Lang in the background singing “Good Morning Schoolgirl” seems very apropos.

Have you taken the time to look through your Facebook settings lately? While I expected most of what I saw, what really struck me as weird were the permissions that may be allowed for friends-of-friends, and for the apps that friends install. Like many others, I personally know all of my Facebook friends. Like most others who use Facebook, I have friended some who are only brief acquaintances. Even of those friends that I know well, I don’t have a lot of trust in their ability to identify online scams and data harvesters. And given what little trust I have for my friends in that area, none of it transfers to friends-of-friends or the apps my friends use.

Why would anyone give friend-of-friend access to their detailed profile and social network information? Why would Facebook, by default, allow friends-of-friends to view my birthday, wall posts (and my friends’ wall posts), political and religious views,  and photos? Why should the apps that my friends install have access to my profile by default?

If you don’t already have a good understanding of Facebook privacy settings, I suggest that you read this. If you also want to see all the permissions that Facebook apps may request, check out this.

During a recent stint at a startup creating a data marketplace based on a commodities exchange model, I created a method for measuring the somewhat abstract concept of “Privacy.” The measurement and analysis of privacy factors was surfaced in a Privacy Score, providing feedback on an entire data portfolio, and as a graphic measurement of privacy disclosure for a single data source.

Before I’m done explaining how I scored privacy, I have to climb up a mountain and explain a little of social network and graph theory. However, before we uncoil our rope, I’d like to address the question of why I decided to provide what appears to be the first practical Privacy Score.

My first task upon joining that earlier company was to gather data samples from various sources so that their file formats could be analyzed, and the upload and conversion software developed. Friends and family were asked for their data and their uniform response was “Why, and what are you going to do with it?” After explaining the company’s personal data marketplace business proposition (when you’re the first, it may take a couple of tries), it turned out that a key part of the “what are you going to do with it?” question was really about our security and privacy practices. The question was expected, but the level of concern from friends and family was unexpected. People who felt I could be trusted with key assets and personal health were asking me this question, and not sending their data until it was sufficiently answered. It must be a REALLY important question.

We responded by incorporating privacy as a key component of our business plan and service architecture. We developed innovative methods and designs so that those concerns were answered. We filed for patents on several innovative ideas that we discovered in treading this new path. The concepts and design behind the Privacy Score were among those filings. Yeah, I’m kind of proud of that.

That’s the “Why?”, now let’s start the climb and I’ll explain the “How?”.

As luck would have it, at the time I started gathering data for the marketplace inventory, I was also reading Duncan Watts’ excellent book, Six Degrees: The Science of a Connected Age. As I was reading, I started thinking about how reputation and privacy were closely linked, and how they were examples of a “cascade” type of power law relationship. To put it into layman’s terms: If small rocks are stacking up on a hillside ledge, the rock pile will build up until slides have a high probability of happening with each new addition. It’s a gradual build-up with occasional small, limited releases, until a big rock comes along and knocks it all away.

The core idea, that releasing enough information about yourself eventually causes the loss of some privacy, is easy enough to grasp. That’s a pretty simple power law function where all elements are roughly equal in weight and keep building until small slides start happening. The probability of a slide – a metaphor for an impact to your relationship – increases greatly as you reach the knee of a curve and keep adding stuff.

What was missing from many privacy discussions was the big thing that comes crashing like the Kool-Aid Man through your rockpile, and sends your reputation flying. That’s the “KAM Factor”.

The base for the Privacy Score is a power law function modified by a kicker – the “KAM Factor”. The next tweak to the equation addressed the question, “Once your privacy/reputation has sustained the “KAM” release, how much worse can things get?” It turns out that they can get a bit worse. If Punchy (from Hawaiian Punch) and a red bull come running down the mountain with the Kool-Aid Man, the rock pile is going to release with a little more energy than the Kool-Aid Man alone. If your reputation suffers from your being an embezzler, and on your way to court you also get cited for littering, you’ll find that the judge can think even less of you. The power law is still active as a minor component of your Privacy Score, once the KAM Factor has been established as the main driver of privacy and reputation.

The above description provides a map to the shape of, and key factors in, the privacy scoring model. What’s not described is how the sensitivity weights for the data elements are assigned. That requires an understanding as to which data elements are KAMs. In other words, which elements are the ones that people will be sensitive about releasing? There are two key components to sensitivity concerns that can be simply stated as: “What do they know?” (we call this “Persona”) and “Do they know that it was me?” (this is “Identity”).

Persona elements describe the container that is a person: What kind of things they like, what products they own, and what they do. A well rounded Persona is a reputation impact that hasn’t happened yet, but could if an Identity assignment is made.

Identity elements are those that decrease one’s anonymity. Identity elements are things like a name, an online alias, an email address, a SSN, or an account number. A frequented location is also a type of Identity element, but one that behaves differently from the other naming elements previously described. A location could also be a Persona element describing the environment you did your activity in. In creating a sensitivity score for adding up Identity elements, location-related elements are separated out and treated differently, before finally being added into both the Identity and Persona components of the Privacy Score.

Once the Identity and Persona component scores are calculated, they are factored together to calculate the Privacy Score. That Privacy Score is an estimate of risk for reputation impact from the information you are releasing. It’s worth stating that the Privacy Score does not include a factor for whether the potential impact would be positive or negative for your reputation.
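
The shape of that calculation can be sketched in PHP as follows. This is only an illustration of the structure described above (a power law base, a KAM kicker, and two components multiplied together). The curve, the 0.8 KAM threshold, the 0.5 headroom share, and the sample weights are all assumptions; the actual model’s functions and weights are not given here.

```php
<?php
/**
 * Score one component (Identity or Persona) from element sensitivity
 * weights in 0..1. Every constant here is a hypothetical stand-in.
 */
function componentScore(array $weights, float $kamThreshold = 0.8): float
{
    if (count($weights) === 0) {
        return 0.0;
    }
    $total   = array_sum($weights);
    $largest = min(1.0, max($weights));
    $hasKam  = $largest >= $kamThreshold;

    // Everything except the single biggest KAM feeds the power law term.
    $rest = $hasKam ? $total - $largest : $total;
    $powerLaw = 1.0 - pow(1.0 + $rest, -2.0);   // 0 with nothing released, approaches 1

    if ($hasKam) {
        // The KAM sets the level; the power law can only use part of the
        // remaining headroom, so it stays a minor component.
        return $largest + (1.0 - $largest) * 0.5 * $powerLaw;
    }
    // Without a KAM, the score stays below the KAM threshold.
    return $kamThreshold * $powerLaw;
}

/** Factor the two components together into a 0..100 score. */
function privacyScore(array $identity, array $persona): float
{
    return 100.0 * componentScore($identity) * componentScore($persona);
}

$identity = ['email' => 0.9, 'city' => 0.2];        // email treated as a KAM
$persona  = ['purchases' => 0.3, 'likes' => 0.2];   // no KAM among these
printf("%.1f\n", privacyScore($identity, $persona));
```

Because the components are multiplied, a rich Persona with no Identity elements scores zero in this sketch, which mirrors the idea that a Persona is an impact that has not happened until an Identity assignment is made.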

The Privacy Score is calculated using probabilities and sensitivity weights appropriate for a general audience. If your data contains values that are far from normal, or if someone already holds a secret about you that is also released to them in an aggregated, but detailed, report, your probability for reputation impact increases. Likewise, the sensitivity values assigned to single elements may differ from the ratings you would give those elements.

The Privacy Score algorithm provides a unique view into the amount of information you are releasing about yourself, but it also has limits. Please spend time to understand how those limits relate to your circumstances and feelings around privacy. Remember that you own the final responsibility for your Privacy and Reputation.

Update: I received feedback indicating cross-cultural confusion about the Kool-Aid Man. For examples of his body of work, see here, here, and here.

Like others involved in the emerging privacy marketplace, I think a lot about what “Privacy” means. There are many ways to approach this question, and this post is just one of the ways that I have been thinking about answering it.

When people talk about online privacy, what do they mean? Most “Privacy” concerns seem to fall into the general buckets of:

  • Will I be bothered by people trying to sell me stuff?
  • Will others think bad of me?
  • Will my property be damaged or taken?
  • Will I be harmed?

The evolution of our concern for privacy is certainly a thought-provoking topic (get started here and here). Back when humans built their homes in whatever cave they could evict the current resident from, a failure to keep private had immediate health concerns. If someone, or something, knew about my daily business, they could steal my food supply, my mate, my home, or my life simply by waiting for me to sleep in my usual place. It was a competition for survival, and the more you knew of your competitor, the likelier you were to live.

Concerns about harm to person or property are the ones that your primitive self, your atavistic side, still recognizes. Have you ever felt someone’s eyes on you even though you couldn’t see them? Has a co-worker tracked your actions so that they could gain advantage at the next meeting? Have the hairs on the back of your neck raised as you entered your credit card number into an online store? That’s your old lizard brain, Freud’s “id,” speaking to you.

That old lizard shouldn’t be brushed aside. In today’s online world there are stalkers waiting to do you harm. One could be sitting next to you at the coffee house watching your WiFi packets pass by as you log in to your bank account. Another could be hacking the travel website you’re using to plan next month’s four-week safari in Kenya. That info could be sold to someone who will have a leisurely time emptying your house.

We’re lucky that most predators are simple opportunists who don’t make a business out of such things. Most methods for evading opportunists involve common-sense precautions. Still, there are the few shadowy stalkers who greatly profit by invading our privacy. Evading all of their techniques is much more difficult, and could require one to go completely off the grid. As one old punch line puts it, “You call that living?”

Given that the planning predator is rare, if you practice the simple personal security techniques aimed at circumventing the opportunists, you likely won’t come to the attention of the more cunning ones.

The Electronic Privacy Information Center has a list of online resources that both inform and provide methods for limiting your online exposure. In addition to those resources, please also include our friends at Reputation.com, Personal Data Ecosystem and TRUSTe in your own resource list.

While sensational and usually well publicized, it’s rare for online stalking to result in someone getting bashed over the head with a rock. Much more common is the type of economic stalking carried out by credit bureaus, insurance companies, catalog retailers, etc. Economic stalkers look for outliers in the population to target: Post about a rare disease with your real name, and you may look forward to difficulties getting health insurance. Friend the wrong organizations on Facebook, and you may watch your credit score go down.

The casual online surveillance and taking of your personal data by behavioral trackers has similarities to that time when Gronk took Mord’s stone ax and flint supply while he slept. In the Personal Data ecosystem, data taken by others is an asset that is lost forever.

I spent last night as a volunteer at the Sherman Theater in Stroudsburg, PA for a Moe. concert. Long story short, Great Time! Go here for the full story.

By Dwight Irving

Note:  Updated on Dec 6, 2009 – Found a bug in the code.  I’ve updated the code below and have a notation where the bug was.

A while ago, I needed to figure out how to search for database entries that were within a specified distance from a central point. The code was needed for the search function at CrossroadsAngel.com where you can search for members by location. I found many pages that had a similar goal of calculating great circle distance between any two points on a sphere, but that’s not quite the same problem. I found a few web pages that described how to convert linear distance to delta latitude and longitude, but I didn’t find any complete code examples that worked across all boundary conditions.

The key equation for translating linear distance to latitude and longitude on a sphere is the Haversine formula.  Latitude boundary conditions for this formula occur at the  north and south poles and the equator.  For longitude, the boundary conditions occur at the prime and 180th degree meridians. For distance, the boundary condition is the circumference of the Earth.

I made a simplifying assumption that the earth is a perfect sphere.  To make the database search easier I used a rectangular bounding box rather than a circular one.  The rectangle used encloses the circle and therefore increases the distance from the centerpoint at any angles other than 0, 90, 180 and 270 degrees (0, PI/2, PI, 3PI/2 radians).
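
Because the rectangle is larger than the circle, results that land in its corners can be trimmed afterwards with a great circle check. Here is a minimal sketch of that follow-up step using the haversine formula on the same spherical earth. The function name is hypothetical, and the kilometer radius is an approximation chosen to match the 3956 mile radius.

```php
<?php
/**
 * Great circle distance between two points given in degrees, using the
 * haversine formula on a perfectly spherical earth.
 */
function greatCircleDistance(
    float $lat1, float $lon1,
    float $lat2, float $lon2,
    bool $miles = true
): float {
    $radius = $miles ? 3956.0 : 6366.6;    // same sphere, two units

    $phi1    = deg2rad($lat1);
    $phi2    = deg2rad($lat2);
    $dPhi    = deg2rad($lat2 - $lat1);
    $dLambda = deg2rad($lon2 - $lon1);

    $a = sin($dPhi / 2) ** 2
       + cos($phi1) * cos($phi2) * sin($dLambda / 2) ** 2;

    // min() guards against rounding pushing sqrt($a) slightly above 1
    return 2 * $radius * asin(min(1.0, sqrt($a)));
}

// Keep only the rows that are really within $maxMiles of the center.
$center   = ['lat' => 40.64, 'lon' => -74.91];
$rows     = [['lat' => 40.70, 'lon' => -74.90], ['lat' => 42.00, 'lon' => -73.00]];
$maxMiles = 25.0;
$nearby = array_filter($rows, function ($row) use ($center, $maxMiles) {
    return greatCircleDistance($center['lat'], $center['lon'], $row['lat'], $row['lon']) <= $maxMiles;
});
echo count($nearby), "\n";
```

The rectangle keeps the database query simple and index-friendly, and the distance check only has to run on the handful of rows the rectangle returns.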

I thought it would also be handy to be able to use either kilometers or miles in the function, so I’ve included a flag where miles are true and kilometers are false.

In PHP the code is as follows:

class diGeoBox
{

    // Edges of the bounding box, in degrees
    public $LatN;
    public $LatS;
    public $LonE;
    public $LonW;

    public function __construct($Lat1, $Lon1, $Distance, $boolMiles)
    {

        $RadCalcConst = M_PI/180;
        $InvRadCalcConst = 180/M_PI;

        // Convert degrees to radians
        $Lat1 = $Lat1 * $RadCalcConst;
        $Lon1Rad = $Lon1 * $RadCalcConst;

        // Miles or kilometers? Either way, convert the distance to radians of arc.
        if ($boolMiles)
            $DistRad = $Distance * 2.52780586e-4; // 1/3956, inverse radius of the earth in miles
        else
            $DistRad = $Distance * 1.57070574e-4; // inverse radius of the earth in km

        // A distance of a full circumference or more covers the whole globe
        if ($DistRad >= 2*M_PI)
        {
            $this->LatN = 90;
            $this->LatS = -90;
            $this->LonE = 180;
            $this->LonW = -180;
            return;
        }

        // North Latitude boundary
        $this->LatN = $Lat1 + $DistRad;
        // Convert back to degrees
        $this->LatN = $this->LatN * $InvRadCalcConst;
        // check for wrapping: a box that reaches past the pole stops at the pole,
        // the same way the south boundary is handled below
        if ($this->LatN > 90.0)
            $this->LatN = 90.0;

        // South Latitude boundary
        $this->LatS = $Lat1 - $DistRad;
        // Convert back to degrees
        $this->LatS = $this->LatS * $InvRadCalcConst;
        // check for wrapping
        if ($this->LatS < -90.0)
            $this->LatS = -90.0;

        // if at a pole, Longitude goes from -180 to 180
        if (abs($Lat1) == M_PI/2)
        {
            $this->LonE = 180;
            $this->LonW = -180;
        }
        else
        {
            $tmp = sin($Lat1);
            $tmp = $tmp*$tmp;

            // calculate East Longitude boundary
            $dLon = abs(atan2(sin($DistRad) * cos($Lat1), cos($DistRad) - $tmp));
            $this->LonE = fmod($Lon1Rad + $dLon + M_PI, 2*M_PI) - M_PI;
            // convert back to degrees
            $this->LonE = $this->LonE * $InvRadCalcConst;

            //check to see if wrapped around 180 longitude
            //**  There was a bug here in the original version where
            //**  Lon1 used to be Lon1Rad
            if ($this->LonE < $Lon1)
            {
                // LonE wrapped past 180, so unwrap it (add 360) before
                // mirroring it around $Lon1 to get LonW
                $this->LonW = 2*$Lon1 - ($this->LonE + 360);
            }
            else
            {
                // LonW is an equal amount on the other side of $Lon1 as $LonE
                $this->LonW = 2*$Lon1 - $this->LonE;
                // See if it wrapped
                if ($this->LonW < -180)
                    $this->LonW = $this->LonW + 360;
            }
        }

    }     //__construct()

}         //diGeoBox

Example PHP code that uses the diGeoBox class and tests the 180 degree meridian and the polar conditions is below.

$BoundingBox = new diGeoBox(0, -170, 15000, true);
echo 'LonE = ' . $BoundingBox->LonE . "\n";
echo 'LonW = ' . $BoundingBox->LonW . "\n";
echo 'LatN = ' . $BoundingBox->LatN . "\n";
echo 'LatS = ' . $BoundingBox->LatS . "\n";

Using this function at CrossroadsAngel.com requires that street address and zip codes be converted into latitude and longitude coordinates (geocoding).  Because CrossroadsAngel.com is a commercial website, and because I need to store converted locations in a local database for performance reasons, I am using the NAC Geographic Products webservice for stored conversion, and Google Maps API for geocoding the “search center” location.  The Google Maps API is accessed using JavaScript from the client browser and the converted coordinates are sent to the server for query execution.
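
One detail to watch when applying the box to stored coordinates: if the box crosses the 180th meridian, its west edge is numerically greater than its east edge, so a plain “between west and east” test matches nothing. A minimal sketch of a membership test that handles that case follows; the function name and the sample numbers are hypothetical, and the same AND/OR switch applies when building a database WHERE clause.

```php
<?php
/**
 * True when ($lat, $lon) falls inside a box with the given edges, all in
 * degrees. A box that crosses the 180th meridian has $lonW > $lonE, so the
 * longitude test switches from AND to OR.
 */
function inGeoBox(
    float $lat, float $lon,
    float $latN, float $latS,
    float $lonE, float $lonW
): bool {
    if ($lat > $latN || $lat < $latS) {
        return false;
    }
    if ($lonW <= $lonE) {
        return $lon >= $lonW && $lon <= $lonE;   // ordinary box
    }
    return $lon >= $lonW || $lon <= $lonE;       // box wraps past 180
}

// A box from 170 E across the date line to 170 W
var_dump(inGeoBox(0.0, 175.0, 10.0, -10.0, -170.0, 170.0));
var_dump(inGeoBox(0.0, 0.0, 10.0, -10.0, -170.0, 170.0));
```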

By Dwight Irving

Another long pause between blogs. < sigh… >

I’ve been very busy working towards launch at CrossroadsAngel.com, with our first live event on July 9th at the Black Potatoe Music Festival in Clinton, NJ. It looks like (barring a meltdown of the development process) we’ll be ready to sign up our initial batch of members at the festival, and give away some branded merch.

I’d like to describe CrossroadsAngel.com a bit more, but it’s not in me to try to hype something that I can’t show you. Instead, I’ll just repeat the basic description that CrossroadsAngel.com is a B2B social network for the performing arts industry. There is an emphasis on B2B that I believe is lacking in our potential competitors, and that is where our opportunity lies. Mining the aggregated network activity will be the basis for providing marketing services to businesses in that industry. Member privacy and “trust” are extremely important to the social business model, so anonymized / aggregated data will be the basis for quantitative methods.

I’ll be back to blogging regularly soon. I have some things to say about how the OASIS Blue Initiative is driving the home energy applications of the future, and how the energy service providers want to control your fridge. What next, we rent our appliances from the power company? Remind you of anything? Hmm?

If I don’t post earlier, I’ll definitely be back on the blog patrol when CrossroadsAngel.com goes live in the next couple of weeks.