Synthetic data utilizes virtual consumers for user research, leading to cost savings and increased efficiency. However, concerns arise regarding its potential inability to accurately reflect human nature and social contexts.
The surge in demand for synthetic data, particularly with the advent of ChatGPT, heightens the possibility of a widening gap between reality and data, blurring the lines between data and truth.
Therefore, synthetic data should be grounded in high-quality real-world data and developed in collaboration with social science and humanities experts to address ethical considerations.
Test your ideas and products with confidence using AI-synthesized consumers.
Synthetic Users: user research services without real users
Launched in February, the Synthetic Users service, as its name suggests, provides synthetic virtual consumers, rather than real humans, as the target audience for product-development user research. It allows interviews and surveys to be run with virtual individuals, who provide feedback on product usage experiences. It lets customers define specific target-customer scenarios (for example, a long-term European couple) and promises significant cost savings, with 100 interview datasets priced at $380. The service has drawn a wide range of reactions within communities of anthropologists, sociologists, and other social science professionals, from discomfort and a sense of crisis to outright amusement.
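The mechanics behind such a service can be sketched, speculatively, as prompting a large language model with a persona description and a set of interview questions. Synthetic Users has not published its implementation, so everything below (the `Persona` fields, the prompt wording) is a hypothetical illustration, not the actual product:

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """A hypothetical target-customer profile for a simulated interview."""
    name: str
    age: int
    location: str
    context: str  # free-text background, e.g. habits, relationships, constraints


def build_interview_prompt(persona: Persona, product: str, questions: list[str]) -> str:
    """Assemble a role-playing prompt that an LLM would answer in character.

    The model's reply would then be treated as the 'interview transcript'.
    """
    lines = [
        f"You are {persona.name}, a {persona.age}-year-old living in {persona.location}.",
        f"Background: {persona.context}",
        f"Answer the following interview questions about {product} in the first person,",
        "staying in character and drawing only on this persona's perspective.",
    ]
    lines += [f"{i}. {q}" for i, q in enumerate(questions, 1)]
    return "\n".join(lines)
```

A caller would feed the returned prompt to a chat model and collect the completion; the entire "insight" therefore derives from the model's statistical priors about such a persona, which is precisely the concern the researchers quoted below raise.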
These reactions include anxiety over whether the fundamentals of human identity, purpose, pleasure, and values, the very core of qualitative research (which seeks to understand individuals themselves, not merely to 'synthesize' creative output), can really be copied and understood so easily. There are also more cynical views that the service cannot adequately capture the intricate sociopolitical situations and interpersonal relationships that shape the complex issues people face in reality.
In fact, such synthetic data is not a novel concept. It is particularly useful when datasets are difficult to obtain. For example, it has been used in virtual car simulations by automakers to mimic driver behavior, training models in a wide array of situations. It has also been used to replicate the records of over 2.7 million COVID-19 patients, creating a dataset that is statistically identical but devoid of personally identifiable information, enabling researchers worldwide to rapidly share and study it.
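The statistical idea behind the COVID-19 example can be illustrated with a deliberately simple sketch: estimate per-column distributions from real records, then sample fresh rows that match those marginals but correspond to no real person. This is a toy model of my own construction; production systems for health data use far more sophisticated generative models and formal privacy guarantees, and sampling each column independently, as here, destroys the correlations between columns:

```python
import random
import statistics


def synthesize(records: list[dict], n: int, seed: int = 0) -> list[dict]:
    """Generate n synthetic rows matching each column's marginal distribution.

    Numeric columns are modeled as Gaussians (mean, population stdev);
    other columns are resampled from their observed values.
    """
    rng = random.Random(seed)
    models = {}
    for col in records[0]:
        vals = [r[col] for r in records]
        if all(isinstance(v, (int, float)) for v in vals):
            models[col] = ("num", statistics.mean(vals), statistics.pstdev(vals))
        else:
            models[col] = ("cat", vals)
    rows = []
    for _ in range(n):
        row = {}
        for col, m in models.items():
            if m[0] == "num":
                row[col] = rng.gauss(m[1], m[2])  # draw from fitted Gaussian
            else:
                row[col] = rng.choice(m[1])  # resample observed category
        rows.append(row)
    return rows
```

Even this crude version shows the appeal: the output is "statistically similar" at the level it models, yet no synthetic row is a real patient's record. It also shows the risk the article goes on to discuss: whatever the fitted model fails to capture is silently absent from the data everyone downstream studies.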
However, the current rapid spread of ChatGPT across various services has triggered an explosive surge in demand for synthetic data, which was already on the rise. This has led to the emergence of services that claim human daily life experiences – the very source of insights – can be replaced by synthetic data.
Above all, the Synthetic Users service starkly highlights a key concern about the use of synthetic data: the 'gap between reality and data,' which calls for a redefined understanding of 'data' and 'truth.'
We are already living in an era of misinformation, and it is becoming increasingly difficult to understand the origin and biases of all the data we encounter. The upcoming flood of synthetic data will not only blur the boundaries between ‘real’ and ‘artificial’ but also make it more challenging for regular data consumers to critically evaluate the source of original data, the methods used to collect and manipulate it, and consequently, the degree of trust that should be placed in it.
Therefore, to prevent the synthetic data revolution from inadvertently creating an unintended world, it is crucial to start by focusing on ‘small data’ rather than ‘big data.’ Many companies today exhibit a tendency towards what is known as ‘data-driven decision-making,’ where decisions are made based on all available data, even if those datasets are demonstrably biased or incomplete. Thus, synthetic data should stem from the best real-world data we can find. Furthermore, this process should be accompanied by a deep contextual understanding of what is most important within that data and why, providing the highest possible quality initial dataset.
If synthetic data is not grounded in a rigorous, up-to-date understanding of fundamental human phenomena, such as the gap between what people say and what they do, or the unforeseen ways our actions reshape our lives, we risk simulating a social world that distorts reality in ways harmful to both businesses and individuals.
Synthetic data will play an increasingly significant role in our daily lives, with the potential to reshape everything from the algorithms that mediate our experience of the world to our very understanding of data and reality. Entrusting such consequential decisions solely to data scientists, however well-intentioned, is too risky; collaboration with experts in the social sciences and humanities is needed. This is not because synthetic data might be unhelpful or worse than some of our current datasets, but because of its immense potential to accomplish too much.