A team of computer science researchers at Princeton University engaged in a study that actively deceived hundreds of businesses, non-profits, and private citizens – costing some of them thousands in legal fees. How did they let this happen? That’s an excellent question.
A Quick History on Pushing the Boundaries of Ethics for the Sake of Research
There are plenty of stories bouncing around – well-spread even before the Internet – about rather unprincipled academic studies. Some of the best-known are the Stanford Prison Experiment and the Milgram Experiment. If you don’t remember psychology class, Milgram tested participants’ willingness to obey authority by having them administer what they believed were electric shocks to a fellow participant. Both the original Ghostbusters film and Ghostbusters: Afterlife had scenes spoofing Milgram’s work with a twist.
The reason those stories have been passed around for decades was not to point out the excellence of the work but the dangers of pushing the boundaries of ethics for research purposes. The Stanford experiment was shut down less than a week after launching, when the subjects became serious threats to each other. Outrage over Milgram’s work, however, was tempered by the fact that no shocks were actually administered, and 84% of the subjects later reported they were positively affected by the experience.
There is a fine line, and an ongoing debate in academic circles about how close to it researchers may tread. There is value in untainted results, but also in being entirely aboveboard, and the general consensus in recent years has shifted overwhelmingly toward protecting the subjects of academic studies.
According to Dr. Gerald Koocher at Harvard, “the federal standard is that the person must have all of the information that might reasonably influence their willingness to participate in a form that they can understand and comprehend.”
So was that applied here or not?
Examining Website Privacy
Per the research team’s public statement, “The study aims to advance understanding of how websites have implemented the data rights provisions of European Union and California privacy law, specifically the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).”
To give you some background: the GDPR went into effect on May 25, 2018, forever changing how personal data can be collected and used across the Internet. Shortly after, California became the first US state to follow suit; the CCPA took effect on January 1, 2020, with enforcement beginning July 1, 2020.
The two laws have many similarities, although the vocabulary sometimes differs (e.g., personally identifiable information [PII] versus personal data), but there are also significant differences. For example, the GDPR concerns only an individual’s personal data, while the CCPA also protects an entire consumer household.
The biggest difference, particularly for what follows in this article, is that the GDPR covers any EU resident’s information, while the CCPA applies only to certain for-profit businesses. Under the GDPR, if your website collects an email address or sets a cookie on a device owned by an EU resident, that resident must consent. The CCPA, by contrast, has specific thresholds: it applies to for-profit companies with annual gross revenue over $25 million, or that collect data on 50,000 or more consumers, households, or devices, or that derive at least half of their revenue from selling that data. Those numbers will become very important shortly.
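To make the applicability test concrete, here is a minimal sketch; the company figures are hypothetical, and the key point is that the statute treats the thresholds disjunctively, so meeting any one of them triggers the law:

```python
# Hypothetical sketch of the CCPA "business" test (Cal. Civ. Code 1798.140):
# a for-profit company is covered if it meets ANY ONE of three thresholds.
def ccpa_applies(annual_revenue_usd: float,
                 consumers_with_data_collected: int,
                 share_of_revenue_from_selling_data: float) -> bool:
    return (
        annual_revenue_usd > 25_000_000                 # gross revenue threshold
        or consumers_with_data_collected >= 50_000      # consumers/households/devices
        or share_of_revenue_from_selling_data >= 0.5    # half of revenue from data sales
    )

# A typical personal blog fails all three tests:
print(ccpa_applies(5_000, 1_200, 0.0))            # False
# A Google-scale company clears the revenue threshold alone:
print(ccpa_applies(250_000_000_000, 10**9, 0.1))  # True
```

In other words, a site can post cookie banners and privacy statements without ever being a “business” in the CCPA’s legal sense.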
Both laws have been pointed to as examples of measures that can be implemented across the country and the world. In fact, multiple states are using California’s laws as a model for their own legislation and watching to see how things get implemented.
Once again, from the study’s public website: “Our goals are to accurately describe how websites have operationalized these new user rights, whether websites are extending these rights to non-EU citizens and non-California residents, and whether websites are effectively authenticating users when they exercise these rights.”
On its face, the study looks like a reasonable thing to examine. Especially given the implications for most, if not all users, studying or establishing best practices moving forward is a good thing. The problem lies with the execution of the study parameters.
Weaving a Web of Deception
First, let me point out that, for whatever reason, the Princeton University Institutional Review Board determined that the privacy study did not constitute human subjects research. They could not be more wrong.
While many aspects of websites are automated – increasingly so, as AI bots get better at offering human-seeming interactions – behind every website is at least one human being affected by the study’s approach. And as I’ll explain in a moment, the specific method the team used to gather its data affected many people at every site polled; the tactics used turned it into human subjects research.
In what I assume was an attempt to simulate real users while preserving as much clinical methodology as possible, the Princeton scholars set up six email servers to contact a sampling of websites, configured so that the emails appeared to come from American, French, and Russian senders.
But they didn’t stop there. Emails were sent out under false names, purporting to come from residents of foreign countries, with “a few questions about your process for responding to California Consumer Privacy Act (CCPA) data access requests.” And in a weird, ironic twist, instead of personally vetting each site to determine the appropriate contact person, the study used automation to send duplicate copies of these emails to multiple addresses at each website selected.
A Study in Scare Tactics
Remember those CCPA numbers I mentioned before? The CCPA only applies to for-profit companies that meet at least one of those thresholds – $25 million in revenue, data on 50,000 or more consumers, or half their income from selling that data. So despite the cookie popups you likely see at every website you visit, the California law only applies to select companies. Google and Facebook need to worry about it. Personal websites, blogs, non-profit social networks, and charity sites don’t.
But the doctoral students at the Princeton Computer Science department didn’t do the work of sorting and only contacting websites that appeared to meet the criteria of the CCPA (or GDPR, for that matter). Instead, they grabbed “a sampling” of sites from an external list and blasted away.
The emails did make it clear that they weren’t submitting a request but only asking what the process was. But every email concluded with this rather threatening line, “I look forward to your reply without undue delay and at most within 45 days of this email, as required by Section 1798.130 of the California Civil Code.”
And let me reiterate: a significant number of the websites contacted were NOT subject to the above codes – even though many of them post a clear privacy statement.
The tone and tenor of the emails didn’t just rankle or put the fear of God into independent websites for which the law didn’t apply. It also sparked questions from the larger companies that were subject to the guidelines of CCPA. Emails were forwarded to lawyers, tech gurus, and webmasters – was the request legitimate? Were the emails some sort of phishing scam? Was there some encoded attack intended to cripple the site? And how should they respond?
One commenter on Twitter pointed out that his clients had spent around $10,000 trying to determine the best response to the emails. Another mentioned their “group of over 100 small businesses ALL felt threatened legally and felt they would be sued.”
Further, Jeff Kosseff, one of the leading experts on cybersecurity law, points out that the Princeton emails have now complicated matters in another way: businesses may ignore legitimate requests, mistaking them for more emails from the study.
Who Watches the Watchmen?
The earliest instance I found of the purposely misleading and disconcerting emails was a report from Joe Wein, a software engineer and anti-fraud activist. He posted back in April about receiving an email at his Tokyo-based company, asking about the GDPR, with similar warnings to respond or else. He traced the email headers back to one of the servers now listed on the Princeton study’s site – registered in March of 2021. Another fun fact: the emails themselves collected data in ways that would be considered a violation under the GDPR, if not the CCPA.
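For readers curious how that kind of tracing works: each mail server a message passes through prepends a Received header, and Python’s standard library can pull those headers out. A minimal sketch – the raw message and hostnames below are invented for illustration, not taken from the study’s actual emails:

```python
# Sketch of email header tracing using only the standard library.
# The message below is a fabricated example for demonstration purposes.
from email import message_from_string

raw = """\
Received: from mail.example-study.edu (mail.example-study.edu [203.0.113.7])
    by mx.recipient.example (Postfix); Mon, 5 Apr 2021 09:00:00 +0000
From: Jane Doe <jane@example-study.edu>
Subject: CCPA data access request process
To: webmaster@recipient.example

Body text here.
"""

msg = message_from_string(raw)

# Each relay prepends its own Received header, so the last one listed
# is the hop closest to the message's true origin.
for hop in msg.get_all("Received", []):
    # The "from ..." clause names the server that handed the message off.
    print(hop.split(" by ")[0].strip())
```

Looking up the sending server’s hostname or IP from the bottom-most Received header – and checking when its domain was registered – is essentially the technique Wein described.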
The study has been suspended, and as of December 31, 2021, the researchers, including Professor Jonathan Mayer, claim to be deleting all of the information gathered. Other than a few updates and a halfhearted apology from Principal Investigator Mayer, Princeton has been studiously silent on the subject of the study and its ethical implications.
Princeton’s Research Integrity and Assurance team has not seen fit to comment at all. Also silent are Chad Pettengill and Susan Keisling, who run Princeton's Institutional Review Board (IRB). They approved the study and decided it was ethically acceptable to falsely represent the university’s data-gathering mechanism because it didn’t affect people – which it clearly did. Perhaps they’ll come forward soon, as their winter break ends on January 10, 2022.
The Ball's In Your Court, Princeton
So the question remains: what are the ethical responsibilities of the research team? Of the IRB? And what of the costs – in time and money – incurred by hundreds of bloggers trying to decode the purpose of the email communications? Who is responsible for paying their legal and tech support fees, not to mention the hundreds of wasted man-hours, when the purpose of the study could have been achieved aboveboard?
Do these companies, many of which, again, the laws do not apply to, have to just suck it up, or will the Princeton University doctoral research team be held financially accountable? And what measures need to be put into place to prevent similar ethical issues in the future?
We’re waiting, Princeton University. What will your answer be?
This article was produced and syndicated by Wealth of Geeks.
Featured Image Credit: Pexels.