Opt-in/Opt-out: How to Add Shades of Gray

Posted on October 26, 2013 in Privacy / Reputation, Social networking

Update: See related CMO.com article

The choice between opting in and opting out, as many businesses present it to their customers, is a limiting one. Adding shades of gray to the options improves the likelihood that your customers will find one they like. Enlightened e-newsletter publishers already apply this principle by letting readers choose from a list of newsletters that each target a different aspect of the subject matter. Many of them also let readers choose how often (daily, weekly, monthly) they receive each newsletter. As described in research by Wilson et al., giving users more choices leads to higher customer satisfaction and to users sharing more data.

A few years ago I was lucky enough to sit in on a talk in which Professor Sadeh of Carnegie Mellon University described this research. It was then that I decided to create methods that enable "shades of gray" for a broader set of data types. The research paper describes an application that lets its users share their location. Shades of gray can easily be added to location by changing the resolution of the data, either by adding noise or by decreasing its precision. A simple resolution set for location, from finest to coarsest, could be:

  1. Latitude & longitude provided to the precision of the measurement method
  2. 10 meters (e.g. in a building)
  3. Block
  4. Zip code
  5. City
  6. State
  7. Country
  8. Hemisphere
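As a minimal sketch of decreasing precision, assuming coordinates in decimal degrees, the function below maps some levels of the resolution set to rounded coordinates, and a second function shows the noise alternative. The names `LocationResolution`, `coarsen`, and `add_noise` are hypothetical, the decimal-place choices are rough approximations (one degree of latitude is about 111 km), and the zip code through country levels are deliberately left unimplemented because they need a reverse-geocoding lookup.

```python
import math
import random
from enum import IntEnum

# Hypothetical encoding of the eight-level resolution set above,
# ordered from finest (1) to coarsest (8).
class LocationResolution(IntEnum):
    AS_MEASURED = 1
    TEN_METERS = 2   # 4 decimal places of latitude is roughly 11 m
    BLOCK = 3        # 3 decimal places is roughly 110 m; a crude stand-in for a block
    ZIP_CODE = 4
    CITY = 5
    STATE = 6
    COUNTRY = 7
    HEMISPHERE = 8

METERS_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude

def coarsen(lat, lon, level):
    """Decrease the precision of a (lat, lon) pair to the requested resolution."""
    if level == LocationResolution.AS_MEASURED:
        return (lat, lon)
    if level == LocationResolution.TEN_METERS:
        return (round(lat, 4), round(lon, 4))
    if level == LocationResolution.BLOCK:
        return (round(lat, 3), round(lon, 3))
    if level == LocationResolution.HEMISPHERE:
        # Report only the north/south and east/west hemispheres.
        return ("N" if lat >= 0 else "S", "E" if lon >= 0 else "W")
    # Zip code, city, state, and country need a reverse-geocoding lookup,
    # which is outside the scope of this sketch.
    raise NotImplementedError(f"{level.name} requires reverse geocoding")

def add_noise(lat, lon, sigma_m, rng=random):
    """Alternative approach: blur a point with Gaussian noise of sigma_m meters."""
    dlat = rng.gauss(0.0, sigma_m) / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = rng.gauss(0.0, sigma_m) / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lat + dlat, lon + dlon)
```

For example, `coarsen(40.440624, -79.995888, LocationResolution.BLOCK)` returns `(40.441, -79.996)`. The noise variant breaks down near the poles, where the cosine approaches zero.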

How do you create similar resolution sets for other types of data? The first step is to create a taxonomy of the different data types you'll be dealing with. Make the taxonomy as simple as possible without losing important distinctions. A taxonomy for data related to a bike ride might be:

  • Location
  • Speed
  • Acceleration (could be calculated from other data)
  • Altitude
  • Rider age
  • Rider weight
  • Rider heartbeat
  • Rider respiration rate
  • Pedal force (could be calculated from other data)
  • Gear selection
  • Cadence (could be roughly calculated from other data)
  • Bike weight
  • Bike type
  • Bike manufacturer
  • Bike component model & manufacturer
  • Bike drive train efficiency (could be roughly calculated from other data)
  • Temperature
  • Humidity
  • Date-time
  • Tire pressure

Some of the bike ride data elements are dynamic and require short sampling intervals, while others change slowly enough to be described by a single sample stored as part of the ride's "context". As noted above, some elements can be calculated from others and can reasonably be removed in a follow-up normalization step.
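One way to record these distinctions is to tag each taxonomy element as dynamic or contextual and to mark whether it can be derived from other elements. The sketch below is hypothetical: the `Element` class, the `normalize` helper, and the dynamic/derivable assignments for this subset of the bike-ride taxonomy are assumptions for illustration, not part of the original design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    name: str
    dynamic: bool            # True: sampled repeatedly; False: one "context" value per ride
    derivable: bool = False  # True: can be calculated from other elements

# Hypothetical subset of the bike-ride taxonomy, with assumed classifications.
BIKE_RIDE = [
    Element("location", dynamic=True),
    Element("speed", dynamic=True),
    Element("acceleration", dynamic=True, derivable=True),
    Element("altitude", dynamic=True),
    Element("rider heartbeat", dynamic=True),
    Element("cadence", dynamic=True, derivable=True),
    Element("rider age", dynamic=False),
    Element("bike type", dynamic=False),
    Element("tire pressure", dynamic=False),
]

def normalize(elements):
    """Drop derivable elements, then split the rest into per-sample and per-ride context."""
    kept = [e for e in elements if not e.derivable]
    samples = [e.name for e in kept if e.dynamic]
    context = [e.name for e in kept if not e.dynamic]
    return samples, context
```

Here `normalize(BIKE_RIDE)` keeps four per-sample elements and three context elements, and drops acceleration and cadence because they can be recomputed.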

The above data set works well for comparing one bike ride with another, but what if you want to compare a bike ride with other types of transportation or recreation? One obvious way to generalize the taxonomy is to rename elements such as "bike weight" to "vehicle weight", and "rider age" to "component age", where the component in question is the "Participant". The next step is to normalize the units of the data elements into a standard set. Luckily, scientists have been doing this for a long time and have a ready-made set of units for physical measurements: the International System of Units (SI).
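A minimal sketch of both steps for speed, assuming SI meters per second as the target unit. The conversion factors are exact by definition (1 mph = 0.44704 m/s, 1 knot = 1852/3600 m/s); the `GENERALIZED` mapping and the `to_si_speed` and `generalize` helpers are hypothetical names used only for illustration.

```python
# Exact conversion factors from common speed units to SI (m/s).
TO_METERS_PER_SECOND = {
    "m/s": 1.0,
    "km/h": 1000 / 3600,
    "mph": 0.44704,        # 1 mile = 1609.344 m exactly
    "knot": 1852 / 3600,   # 1 nautical mile = 1852 m exactly
}

def to_si_speed(value, unit):
    """Convert a speed in the given unit to meters per second."""
    return value * TO_METERS_PER_SECOND[unit]

# Hypothetical mapping from bike-specific names to (generalized element, component).
GENERALIZED = {
    "bike weight": ("vehicle weight", None),
    "rider age": ("component age", "Participant"),
}

def generalize(name):
    """Return the generalized element name and its component, if any."""
    return GENERALIZED.get(name, (name, None))
```

These factors reproduce the m/s values used in the resolution sets that follow: 5 mph is 2.2352 m/s and 50 knots is about 25.72 m/s.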

Now I have a generalized taxonomy to describe the data elements and a standard set of units, so I can compare bike rides with moon trips. How do I create the shades of gray for data sharing? Each element of the taxonomy needs at least one meaningful resolution set. An element may have more than one, depending on the contexts in which it may be used.

Resolution sets by taxonomy element and context:

Speed (context: biking)
  1. As measured
  2. Nearest 2.235 m/s (5 mph)
  3. Above/below average
  4. Moving (T/F)

Speed (context: flying)
  1. As measured
  2. Nearest 25.72 m/s (50 knots)
  3. Nearest 51.44 m/s (100 knots)
  4. Above/below cruise speed
  5. Above/below Mach 1

Heart rate (context: exercise)
  1. As measured
  2. Nearest 5 beats/min
  3. Nearest 10 beats/min
  4. Above/at/below target
  5. Measured (T/F)

Heart rate (context: medical)
  1. As measured
  2. Nearest 5 beats/min
  3. Nearest 10 beats/min
  4. Healthy (T/F)
  5. Beating (T/F)
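One way to store resolution sets is a registry keyed by (element, context), where each level is a function from the measured value to what gets shared. The sketch below encodes the biking-speed and exercise-heart-rate rows, treating heart rate in beats per minute. The registry layout, the helper names, the 6.0 m/s average speed, and the 120 to 150 beats/min target zone are all hypothetical values chosen for illustration.

```python
MPH_5 = 5 * 0.44704  # 5 mph expressed in m/s (2.2352)

def nearest(step):
    """Round to the nearest multiple of step (Python's round() sends halves to even)."""
    return lambda v: round(v / step) * step

def speed_biking(average):
    return [
        lambda v: v,                                      # 1. as measured
        nearest(MPH_5),                                   # 2. nearest 5 mph, in m/s
        lambda v: "above" if v > average else "below",    # 3. vs. an assumed average
        lambda v: v > 0,                                  # 4. moving (T/F)
    ]

def heart_rate_exercise(target_low, target_high):
    def vs_target(bpm):
        if bpm < target_low:
            return "below"
        if bpm > target_high:
            return "above"
        return "at"
    return [
        lambda bpm: bpm,              # 1. as measured
        nearest(5),                   # 2. nearest 5 beats/min
        nearest(10),                  # 3. nearest 10 beats/min
        vs_target,                    # 4. above/at/below an assumed target zone
        lambda bpm: bpm is not None,  # 5. measured (T/F); None means no reading
    ]

RESOLUTION_SETS = {
    ("speed", "biking"): speed_biking(average=6.0),
    ("heart rate", "exercise"): heart_rate_exercise(120, 150),
}

def share(element, context, value, level):
    """Return what is shared at a 1-based resolution level, numbered as in the lists above."""
    return RESOLUTION_SETS[(element, context)][level - 1](value)
```

With these assumptions, sharing a 7.5 m/s biking speed at level 2 yields about 6.71 m/s (3 × 5 mph), and a heart rate of 143 beats/min at level 4 yields "at".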

A lot of work? Maybe, but we did it for the DataBanker personal data marketplace design and found it easier than many would expect. The biggest hurdle is usually digging in and creating the taxonomy. Meaningful resolution sets are easy to create once you know how the data will be used and where the privacy-sensitive points are. With this methodology, putting the shades of gray into back-office operations was simple. An additional bonus: adding privacy metrics was a breeze.