Sound story: Democratizing AI voices

Authors: Gunay Kazimzade, Samuel Van Ransbeeck

The proliferation of AI virtual assistants makes for an uncanny experience in the realm of interaction. These systems „use inputs such as the user’s voice, images, and contextual information to assist users by answering a question in natural language, making recommendations, and performing actions“ ([8] Hauswald et al., 2015, p.223). Trained and developed with human voices, these virtual assistants become part of the family and an almost natural part of human lives. As we build close relationships with virtual assistants, much as we do with our pets, we allow them to become too big a part of our homes [1]. One way of attracting people is to use female voices, as they are perceived as less dominant than male voices. Amazon also recently announced that it would be adding human emotion to Alexa’s repertoire: Alexa will respond in different emotional tones, including a „happy/excited“ tone and a „disappointed/empathetic“ tone.

Attachment to fakeness and bias reinforcement

Nevertheless, these voices are constructs: large datasets are used to train the virtual assistants (VAs), which learn to hold what appear to be meaningful conversations. As such, AI assistants are just as real as humans are fake. The human voice makes people lower their guard against unwanted privacy invasions and allows the large corporations behind the VAs to use data collected from users to their advantage. For example, Amazon’s Alexa was discovered to listen to people speaking, even when they had not given a command to interact [5,9]. In humanizing the VA, Amazon and others are masking their technology’s goal: to funnel people into buying goods and services from them.

Moreover, studies show that popular digital assistants with female voices reinforce sexist stereotypes [5]. They are programmed to respond politely to insults, which normalizes sexist harassment and gender bias [6]. When a user tells Alexa, „You are hot,“ the typical answer has been a cheery „That is nice of you to say!“ In short, the ‚female‘ obsequiousness of Alexa provides a powerful illustration of the gender biases coded into today’s technology products. Many argue that AI does not need to be gendered at all, or even to imply gender, since technology that mimics aspects of human-likeness encourages humans to regard it as more of a social actor than it is capable of being. In response to this concern, a team of enthusiasts behind „Project Q“ created a gender-neutral voice assistant in an attempt to reduce sexist stereotypes [7]. Its creators hope that companies will integrate Q as a more inclusive voice option for digital assistants.

Sound Story

These are just a few instances of how the concentration of R&D power can have grave societal consequences. With that in mind, in this artistic contribution we intend to analyze the possibilities of building a democratized version of an artificial voice. In the current sound experiment, we want to address the virtual assistant’s fakeness by exposing its goals: the Alexa voice will tell us who it is, what it does, and what its ultimate goal is. We expose Alexa for what it is: a sales facilitator for Amazon.

Alexa is, in our subjective experience, the most convincing virtual assistant of the big three (Amazon Alexa, Apple Siri, and Google Assistant), so we chose to work with that assistant. We set out to ask Alexa somewhat personal questions, treating it as a person, to get to know what it is. We then wrote a monologue for Alexa to read aloud, telling who it is, using the answers it gave. In that monologue, one can hear that Alexa has human traits, but it remains evident that it is a machine. While it will not fool anyone into believing that a real human is speaking, what it says might sound like deep thoughts. In the end, however, we learn what Alexa is about: selling stuff on Amazon. We believe that it is essential not to humanize a virtual assistant too much, so as not to hand over trust that Alexa, Amazon in general, or hackers could abuse. Some cultural references that serve as inspiration are the Puppet Master in Ghost in the Shell, Tima the humanoid in Metropolis (the 2001 anime version), and HAL 9000 in 2001: A Space Odyssey.

What we have learned from the „Sound Story“ is that the diversification and democratization of VAs should start with creating datasets that include people from countries outside the Global North. Bringing cultural diversity and inclusion to artificial voices would address the risk of monotony in today’s systems, which disregard cultural differences. Thus, a decentralized system will counter the gentrification of interaction.

Also, directing long or complicated questions to Alexa dramatically affects the efficiency and quality of its responses. We experienced the „fakeness“ of Alexa in the following response: „Thousands of your and other users‘ commands are recorded, and future versions of me will use those recordings to make interactions between myself and you sound more natural.“ For that reason, creating datasets from voices that are not limited to short interactions should be prioritized in order to create natural experiences in human-VA interactions.

Last but not least, to strengthen the connection between humans and their data, creators might benefit from locally stored data rather than corporate cloud services. We strongly recommend smaller data hubs closer to the users, which will enhance transparency and trust between technology and users. The power of multinational corporations has proven to have negative consequences for citizens, as these corporations look for ways to bypass laws and guidelines to further their core mission. As such, decentralization and decoupling from these services will benefit people in the long run.

What is next?

The Amazon Echo’s restrictions on reading a long text limited the sound work that we created. We see this monologue as a starting point for future experiments with virtual assistants. In a future iteration, we want to program the Echo to speak the whole monologue in an art space, activated by the visitors. We would also like to experiment with a virtual assistant trained according to the recommendations set out above. Non-artistic applications in the urban sphere could include putting questions to assistants on the street, not unlike current information kiosks.

To experience the sound installation, scan the following QR code:

References:

[1] https://www.firstpost.com/blogs/life-blogs/conversations-with-virtual-assistants-like-siri-alexa-may-be-signs-of-loneliness-3358308.html/amp

[2] https://www.npr.org/2019/03/21/705395100/meet-q-the-gender-neutral-voice-assistant

[3] https://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=6799&context=etd

[4] https://www.weforum.org/agenda/2019/05/hey-siri-youre-sexist-finds-u-n-report-on-gendered-technology

[5] https://www.theguardian.com/technology/2019/oct/09/alexa-are-you-invading-my-privacy-the-dark-side-of-our-voice-assistants

[6] https://unesdoc.unesco.org/ark:/48223/pf0000367416.locale=en

[7] https://www.genderlessvoice.com/

[8] Hauswald, Johann, Tang, Lingjia, Mars, Jason, Laurenzano, Michael, Zhang, Yunqi, Li, Cheng, Rovinski, Austin, Khurana, Arjun, Dreslinski, Ronald, Mudge, Trevor, and Petrucci, Vinicius. (2015). Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers. ACM SIGPLAN Notices, 50, 223-238. doi:10.1145/2775054.2694347.

[9] https://www.zdnet.com/article/amazon-employees-are-listening-in-to-your-conversations-with-alexa/