Date of Award
8-2024
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
School of Computing
Committee Chair/Advisor
Long Cheng
Committee Member
Feng Luo
Committee Member
Mert Pesé
Committee Member
Zhenkai Zhang
Abstract
Voice Personal Assistants (VPAs) such as Amazon Alexa and Google Assistant are quickly and seamlessly integrating into people’s daily lives. Meanwhile, the increased reliance on VPA services raises privacy concerns, such as the leakage of private conversations and sensitive information. Privacy policies play an important role in addressing users’ privacy concerns, and developers are required to provide privacy policies that disclose their apps’ data practices. In addition, voice apps targeting users in European countries are required to comply with the GDPR (General Data Protection Regulation). However, little is known about whether the privacy policies on emerging VPA platforms are informative and trustworthy. In this dissertation, we first evaluate the quality of privacy policies for voice apps on VPA platforms. We then develop tools that enable developers to improve the effectiveness and accessibility of privacy policies.
We first conduct large-scale data analytics to systematically measure the effectiveness of privacy policies of voice apps in the US marketplace. We analyze the privacy policies of 64,720 Amazon Alexa skills and 16,002 Google Assistant actions. Our findings reveal a worrisome reality of privacy policies in two mainstream voice app stores. Among the 17,952 skills and 9,955 actions that have privacy policies, many provide incorrect or broken privacy policy URLs: 1,755 Alexa skills and 192 Google actions link to broken privacy policy URLs, and more than 56% of Alexa skills share duplicate privacy policy URLs. In addition, 6,047 Google actions lack a privacy policy even though they are required to provide one. Amazon and Google even have official voice apps that violate their own privacy policy requirements. Our work also includes a user study to understand users’ perspectives on the privacy policies of voice apps. We reported our findings to both the Amazon Alexa and Google Assistant teams and received acknowledgments from both vendors.
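As a concrete illustration of the URL checks behind these broken-link and duplicate statistics, the sketch below validates privacy policy links and counts duplicated URLs. It is a minimal sketch, not the dissertation’s measurement pipeline: the dataset schema, the function names, and the broken-link criterion (network error or HTTP status of 400 or higher) are all assumptions.

```python
# Hypothetical audit of voice-app privacy policy URLs: broken links and duplicates.
from collections import Counter
import requests

def is_broken(url: str, timeout: int = 10) -> bool:
    """Treat network errors and non-success HTTP statuses as broken links."""
    try:
        resp = requests.get(url, timeout=timeout, allow_redirects=True)
        return resp.status_code >= 400
    except requests.RequestException:
        return True

def audit_policies(apps: list[dict]) -> dict:
    """apps: [{'id': ..., 'privacy_policy_url': ...}, ...] (assumed schema)."""
    urls = [a["privacy_policy_url"] for a in apps if a.get("privacy_policy_url")]
    counts = Counter(urls)
    return {
        "with_policy": len(urls),
        "broken": sum(1 for u in set(urls) if is_broken(u)),
        "duplicated": sum(c for u, c in counts.items() if c > 1),
    }
```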
Second, we analyze the privacy policies of Alexa skills in European marketplaces, focusing on whether their privacy policies and data collection behaviors comply with the GDPR. We collect a large-scale European skill dataset covering skills with privacy policies across all European marketplaces. To determine whether a sentence in a privacy policy provides GDPR-required information, we build a labeled dataset of privacy policy sentences and train a BERT model for classification. We then analyze the GDPR compliance of European skills’ privacy policies. Using a dynamic testing tool based on ChatGPT, we check whether skills’ privacy policies comply with the GDPR and are consistent with their actual data collection behaviors. Surprisingly, we find that 67% of privacy policies fail to comply with the GDPR and do not provide the necessary GDPR-related information. Among the 1,187 skills with data collection behaviors, 603 skills (50.8%) do not provide a complete privacy policy and 1,128 skills (95%) have GDPR non-compliance issues in their privacy policies. Meanwhile, we find that the GDPR has a positive influence on European privacy policies compared with those in non-European marketplaces such as the United States, Mexico, and Brazil.
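The following sketch shows how such a sentence-level classifier might be applied at inference time, assuming a BERT model fine-tuned on the labeled corpus of privacy policy sentences. The checkpoint name, the binary label convention, and the helper function are placeholders, not the dissertation’s actual code.

```python
# Hypothetical inference step for the sentence-level GDPR classifier.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"  # assumed base checkpoint, fine-tuned on labeled sentences
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()

def provides_gdpr_info(sentence: str) -> bool:
    """Return True if the model labels the sentence as conveying
    GDPR-required information (label 1 is an assumed convention)."""
    inputs = tokenizer(sentence, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1)) == 1
```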
Third, we develop a static analysis tool named SkillScanner to bridge the gap between platform policies and developer practices. To understand the causes of privacy and policy violations in skills, we conduct a user study with 34 third-party skill developers, focusing on whether they are aware of the various policy requirements defined by the Amazon Alexa platform. The results show a notable gap between the VPA platform’s policy requirements and skill developers’ practices. To prevent the inflow of new problematic skills, we design and develop SkillScanner, an efficient static code analysis tool that helps third-party developers detect privacy and policy violations early in the skill development lifecycle. To evaluate SkillScanner’s performance, we conduct an empirical study on 2,451 open-source skills collected from GitHub. SkillScanner identifies 1,328 distinct policy violations across 786 skills: 694 of them involve privacy issues, and 298 skills violate the content guidelines defined by Amazon Alexa. Our results suggest that 32% of these policy violations are introduced through code duplication (i.e., copying and pasting code).
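To make the flavor of such static checks concrete, here is a toy AST-based analysis in the spirit of SkillScanner that flags slot accesses whose names suggest personal data collection, so a developer can verify the collection is disclosed in the privacy policy. The sensitive-name list and the rule itself are hypothetical; SkillScanner’s real analysis is substantially broader.

```python
# Toy static check: flag skill code reading slots with sensitive names.
import ast

SENSITIVE_SLOTS = {"name", "email", "phone", "address", "location"}  # assumed list

def find_sensitive_slot_access(source: str) -> list[tuple[int, str]]:
    """Return (line, slot_name) pairs for subscript accesses such as
    slots['email'] whose key looks like personal data."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Subscript) and isinstance(node.slice, ast.Constant):
            key = node.slice.value
            if isinstance(key, str) and key.lower() in SENSITIVE_SLOTS:
                findings.append((node.lineno, key))
    return findings

skill_code = "email = request_envelope.request.intent.slots['email']"
print(find_sensitive_slot_access(skill_code))  # [(1, 'email')]
```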
Fourth, Alexa’s constrained interface poses a challenge to delivering effective privacy notices: currently, Alexa users can only access skills’ privacy policies over the Web or through smartphone apps. This creates a particular challenge for visually impaired users seeking to make informed privacy decisions. We therefore propose Privacy Notice Over Voice, an accessible and inclusive mechanism that makes users aware of the data practices of Alexa skills through the conversational interface. We conduct a user study with smart speaker users and Alexa skill developers to understand their attitudes toward the Privacy Notice Over Voice mechanism; most participants agree that it could provide better accessibility than traditional privacy policies for Alexa users. Informed by the user study results, we design and develop a tool named SkillPoV that automatically generates a reference implementation of Privacy Notice Over Voice through static code analysis and instrumentation. Through a comprehensive evaluation, we demonstrate the effectiveness of SkillPoV in capturing data collection from skill code, generating concise and accurate privacy notice content using ChatGPT, and instrumenting skill code with the new privacy notice mechanism without altering the original functionality.
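As a rough illustration of the instrumentation step, the sketch below prepends a spoken privacy notice to a skill’s launch response using the Alexa Skills Kit SDK for Python. The notice text is a placeholder standing in for content that SkillPoV would generate (e.g., with ChatGPT) from the data practices detected in the skill code; the handler itself is a generic example, not SkillPoV’s output.

```python
# Sketch of an instrumented launch handler that speaks a privacy notice first.
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type

# Placeholder for the notice SkillPoV would generate from detected data practices.
PRIVACY_NOTICE = "This skill collects your email address to send reminders. "

class LaunchRequestHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        # Deliver the notice over voice before the skill's original greeting.
        speech = PRIVACY_NOTICE + "Welcome! How can I help you today?"
        return handler_input.response_builder.speak(speech).response
```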
Recommended Citation
Liao, Song, "Ensuring the Privacy Compliance of Voice Personal Assistant Applications" (2024). All Dissertations. 3722.
https://open.clemson.edu/all_dissertations/3722
Author ORCID Identifier
0000-0002-5264-7573