In recognition of Social Science Week 2020, I would like to take the opportunity to praise the social sciences and humanities for their uniquely privileged position in bringing methodological and empirical expertise to the study of the social world. As a computer scientist, I completely understand the rhetoric about Big Data and Artificial Intelligence, but I hope the hype does not lead us to underestimate the unrivalled depth that qualitative research can offer.
Let the Symphony begin!
I fell in love with social science after reading the article by Halford and Savage (2017). The authors advocate Symphonic Social Science, using a symphonic metaphor in a call to reorient big data analysis so that it engages more fully and effectively with innovative data assemblage techniques. Broadly speaking, assemblage means bringing multiple datasets together to get a full picture of the problem at hand (analogous to data fusion).
One of my favorite parts of the article is how the authors categorize the differences between Symphonic Social Science and Big Data Analytics along five dimensions: theoretical awareness, data choice, temporality, the role of correlation, and the practice of visualization.
“… the similarities places symphonic social science on the same territory as big data analytics whilst the differences hold out the possibility of doing things differently.”
In many data science projects, data collection and analysis run ahead of theory. In my own experience working on computational social science projects, I have come across a lot of interplay between data and theory at different stages of the project lifecycle. A while ago, I spoke with a practitioner from a humanitarian organization about the difficulties of partnering with industry. They mentioned a project whose aim was to quantify individuals' Quality of Life, in which the data scientists used data on Social Cohesion instead. This is a real-world example of how a lack of theory can lead to failure right from the data collection step.
The interplay between temporality and visualization is also complex. While real-time analysis, stream processing, and ad-hoc querying are prevalent in computer science, historical data and long-term effects are at the center of attention in many social science projects. The visualization challenge arises when a historical analysis needs to be presented in a meaningful way. For a tangible example, readers are referred to Figures 5 and 6 of Project SOPHIA (Farmer et al., 2020), which present the output of topic modeling (in the form of a ribbon graph) conducted on a five-year longitudinal sample of tweets and news.
The marriage between data science and social science, whether it follows the symphony or not, has already been established and is evolving not only in academia and the research community but also within governments and foundations. Having said that, there is still a long way to go in adapting data science tools and techniques to social studies. To give a realistic picture of the current state, let's take Maslow's hierarchy of needs as a paradigm to illustrate the capacity of data analytics to address social needs.
The Big Data Maslow’s Pyramid
Briefly, Maslow's hierarchy posits that human needs span five levels arranged in the form of a pyramid, starting from more basic needs, such as those relating to one's physiology and safety, moving to moderate needs, such as those relating to a sense of belongingness, and ending with more complex needs, such as esteem and self-actualization. The premise is that unless an individual's basic needs have been met, the higher needs in the pyramid are of no relevance. Let's see how successful data science has been in satisfying these hierarchical needs.
At the bottom of the pyramid are the physiological needs, which include food, water, shelter, and the other most basic needs for human survival. The power of big data and IoT applications to fulfil the needs of this layer is undeniable. For instance, sensor data can be collected for water sanitation purposes such as detecting certain pollution events, and earth observation data can be used to detect urban sprawl and the changing spatial patterns of urban land use (Wu et al., 2016). The majority of these data are not necessarily social data. However, combining such data with human intelligence can ease the fulfillment of these needs, for instance through community empowerment in social media crisis response (Jahng and Hong, 2017).
Safety needs, the second level from the bottom of the pyramid, represent the need to be free of the fear of physical danger, the need to be free of deprivation of basic physiological needs, and the need for self-preservation. Recently we have started to observe a change in the healthcare provisioning model from clinic-centric to patient-centric, particularly during the COVID-19 pandemic, with examples such as CovidCare. Apart from enhancing physical health, AI-powered analytics can be useful in mental health treatment and crisis counseling (Divekar and Rastogi, 2017). The machine learning solution developed by the Crisis Text Line is one on-the-ground example of such efforts: it helps predict and prevent instances of self-harm among high-risk texters in a timely manner (under 5 minutes).
Next, the need for belongingness and love is a moderately complex human need compared to the first two. These needs are often fulfilled by forming social relationships with others in the hope that those connections bring happiness and love into individuals' lives. Various breakthroughs in Social Network Analysis, Network Theory, and Agent-based modeling have revolutionized researchers' understanding of social connections and relationship formation (Kurka et al., 2015). While ethnographic interviews and surveys are commonplace ways to collect relationship data, quantitative researchers and data scientists are hunting for new sources of data. Some of this data, such as search history, will include information that would otherwise never be admitted to anybody.
The two remaining needs in the hierarchy, Esteem and Self-actualisation, remain very complex and have not yet been adequately captured and analysed by the technology community. Having said that, there is evidence to suggest that the need for esteem and personal recognition has benefited from advancements in data science, where searching for, finding, and identifying information has been made easier through Natural Language Processing techniques such as Sentiment Analysis, Stance Detection, Citation Analysis, and so on. But these techniques are still far from delivering self-actualisation, that is, being all that one can be.
As is evident from the above discussion, we have put most of our innovative digital capacity into designing solutions that improve the quantity of lives in society, whereas there have been few advances that boost the quality of those lives. That said, quantity and quality of life are not mutually exclusive. The point is that the pace of data science towards improving quality of life is much slower. We are still struggling with questions such as: How can we create meaningful social connections? How do we help everyone live a fulfilling life? How do we make people happier?
At this point in the pitch, I am guessing some readers (social commentators, perhaps) are already forecasting doom-and-gloom scenarios in which technology and data science bring more harm than good into our lives. While I am not suggesting using big data to fulfil complex human needs, in my humble opinion ethics and privacy are orthogonal to my discussion. In particular, there exist many privacy-enhancing techniques with strong privacy guarantees (such as k-anonymity, l-diversity, t-closeness, and blockchain-based approaches) that can help reconcile privacy with the use of data for research or security tasks. However, many of these techniques do not come as off-the-shelf open-source solutions and do not get the recognition they deserve.
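To make one of these techniques a little more tangible, here is a minimal Python sketch of a k-anonymity check. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The records, the field names (`age`, `postcode`, `condition`), and the helper functions below are all made up for illustration; real anonymization pipelines involve far more careful choices about which attributes to generalize and how.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all quasi-identifier fields."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

def generalize_age(record, width=10):
    """Replace an exact age with a coarser age band (hypothetical helper)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}

# Toy records, invented for this example only.
records = [
    {"age": 23, "postcode": "3122", "condition": "anxiety"},
    {"age": 27, "postcode": "3122", "condition": "asthma"},
    {"age": 31, "postcode": "3122", "condition": "anxiety"},
    {"age": 38, "postcode": "3122", "condition": "diabetes"},
]

quasi_identifiers = ["age", "postcode"]

# With exact ages every record is unique, so the smallest group has size 1.
raw_k = k_anonymity(records, quasi_identifiers)

# After banding ages into decades, each record shares its band with another.
generalized = [generalize_age(r) for r in records]
generalized_k = k_anonymity(generalized, quasi_identifiers)

print(raw_k, generalized_k)
```

The trade-off is visible even in this toy case: coarser age bands raise k, but they also blur exactly the kind of detail a researcher may care about.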
I would like to conclude my pitch by making a few points:
- Technological advances that move people up the pyramid of social needs have rarely been tried or have rarely made it past the pilot phase, which frustrates efforts to understand and measure complex societal phenomena.
- Nuances and insights that can be drawn from individual qualitative studies should never be underestimated. As a good example, among the many new research works undertaken in response to the spread of #COVID19, I would like to reference the South Australia 100. Amazing work.
- Presenting historical findings in a single visualization can be great, but it can also be too much of a good thing, particularly when different stories are packed into your longitudinal data. Until we have the right vis tool in place, either split your findings into multiple sub-figures so that each tells one bold story, or make a well-thought-out decision about the optimum partitioning granularity before conducting your analysis.
Again, happy @SocSciWeek!
The original idea of the Big Data Maslow's Pyramid was first presented by Prof. Timos Sellis at the Swinburne Society4.0 seminar in 2018, and I would like to thank him for sharing his insights.
Halford, S. and Savage, M., 2017. “Speaking sociologically with big data: Symphonic social science and the future for big data research”. Sociology, 51(6), pp.1132–1148.
Farmer, J., Soltani Panah, A. and McCosker, A., 2020. “Community responses to family violence: Charting policy outcomes using novel data sources, text mining & topic modelling - Project SOPHIA”. https://apo.org.au/node/278041.
Wu, J., Guo, S., Li, J. and Zeng, D., 2016. “Big data meet green challenges: Big data toward green applications”. IEEE Systems Journal, 10(3), pp.888–900.
Jahng, M.R. and Hong, S., 2017. “How should you tweet?: The effect of crisis response voices, strategy, and prior brand attitude in social media crisis communication”. Corporate Reputation Review, 20(2), pp.147–157.
Divekar, R.R. and Rastogi, N., 2017. “Managing crises, one text at a time”. XRDS: Crossroads, The ACM Magazine for Students, 23(3), pp.36–37.
Kurka, D.B., Godoy, A. and Von Zuben, F.J., 2015. “Online social network analysis: A survey of research applications in computer science”. arXiv preprint arXiv:1504.05655.