Demystifying “big data” part 1: Network analysis

This piece is the first of four I will publish this spring, each describing a particular technique used to make sense of, or mine, large data sets. This post covers network analysis.

The ubiquitous term “big data” gets thrown around with ever-increasing regularity. The term is catchy and promises unprecedented insights into all sorts of phenomena in the sciences and society. Yet it isn’t the large quantities of *insert unit of measurement here* that make “big data” matter; it is the analysis.[i] On its own, big data is just a big lump of data. Making sense of it depends on how those large data sets are analyzed and interpreted. One such method is network analysis.

There are a couple of different ways to understand what a network means. Here, we’ll look at information networks and social networks. Social networks are a way to make sense of how we as human beings interrelate, understanding that individuals “are, as it were, tied to one another by invisible bonds which are knitted together into a criss-cross mesh of connections, much as a fishing net or a length of cloth is made from intertwined fabrics.”[ii] It is an approach to understanding social structure: how we as people are tied up with other groups of people. Technical and mathematical applications of social network analysis have increased in the last forty years, although the roots of this basic perspective are, essentially, as old as sociology itself. Social networks are explained and visualized through vertices (or nodes), which represent people or groups of people. The edges between these vertices represent some form of social interaction between them. An edge can be defined in many different ways: it could represent friendship, a professional relationship, an exchange of goods and services, a communication pattern or many other types of connection.

Information networks describe the way items of data are linked together. In this case, the nodes might be some form of information (such as a website) and the edges are the ways they are connected to one another (such as the hyperlinks between websites). The distinctions between information networks and social networks can be “fuzzy” and there are many examples that straddle the boundaries.[iii]
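For readers who like to see ideas in concrete form, here is a minimal Python sketch of the node-and-edge picture described above. The people, the ties between them and the function names are all invented for illustration; real studies typically rely on dedicated network analysis software rather than hand-rolled code like this.

```python
from collections import defaultdict

# Hypothetical example data: each edge is (person_a, person_b, kind_of_tie).
edges = [
    ("Ana", "Luis", "friendship"),
    ("Ana", "Marta", "professional"),
    ("Luis", "Marta", "exchange of goods"),
    ("Marta", "Jorge", "communication"),
]


def build_adjacency(edge_list):
    """Return an undirected adjacency map: node -> {neighbor: kind of tie}."""
    adjacency = defaultdict(dict)
    for a, b, kind in edge_list:
        adjacency[a][b] = kind
        adjacency[b][a] = kind
    return dict(adjacency)


def degree(adjacency):
    """Count how many direct ties each node has."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


network = build_adjacency(edges)
degrees = degree(network)
print(degrees)
```

The count of direct ties per node (its "degree") is one of the simplest measures a network analysis can report. In this toy network, Marta has the most ties and Jorge the fewest.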

Network analysis is the examination of these networks. It is considered a “relational” approach because it maps out relationships between phenomena. It examines the way things are connected. The word network itself should give it away; its primary definition in the Merriam-Webster dictionary is “a fabric or structure of cords or wires that cross at regular intervals and are knotted or secured at the crossings,” a definition which is in itself an apt metaphor. Yet the secondary and tertiary definitions are even more direct: “an interconnected or interrelated chain, group, or system” and “a usually informally interconnected group or association of persons.” Implicit in the term “network” is a focus on relationships between units (be these people or groups or things or concepts).

Why do people use network analysis, and to what end? Studies across a range of disciplines use this method, whether they seek to understand connections between human beings, between human beings and objects, or between objects themselves. The end goal is to conceptualize connections: network analysis measures and maps the relationships between “units.”

Examples abound of network analysis being used to explain and describe phenomena that would otherwise be difficult to conceptualize. In one article, the authors seek to understand how farmers exchange knowledge by charting the many ways in which information flows.[iv] In another, the authors map out the connections between mental health services and rural farmers in distress.[v] Network analysis allows them to see where links between these two groups exist and where they need to be strengthened. Network analysis, as we saw earlier, doesn’t always have to apply to humans, though. It can also be used to understand other kinds of data. In one case, for instance, the authors used network analysis to reveal connections between soil biodiversity and the way in which land is being used.[vi]

A month ago, I travelled to Colombia to conduct research. My collaborators and I were seeking to understand how coffee farmers exchange information about climate change in order to, ultimately, help improve climate change adaptation. We interviewed a group of farmers, and one of our questions was about their social networks. With whom do they speak about climate change? For each interviewee, we carefully noted the five people with whom they speak most often and each person’s relationship to the farmer (for instance, a neighbor, cousin, spouse or politician). In this case, social network analysis is an appropriate way to gain an understanding of this group’s networks. Mapping out each farmer’s individual network and looking for overlapping trends gives us keen insights into how information flows within this group. Who talks to whom? Where does the real information-sharing actually happen? Later, if we try to help supply climate change adaptation information to this group, we’ll better understand how information actually flows from one group to another, without relying on assumptions based on our own experiences that are almost always wrong when applied to others.
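To make this concrete, here is a minimal Python sketch of how interview answers like these could be turned into a network and summarized. The farmers, contacts and relationships below are invented for illustration and are not data from the study; the function names are likewise hypothetical, and this is only one simple way to look for overlap.

```python
from collections import Counter

# Invented interview records, for illustration only (not the study's data).
# Each farmer names up to five contacts as (contact_name, relationship).
interviews = {
    "Farmer A": [("Carlos", "neighbor"), ("Elena", "spouse"), ("Rosa", "cousin")],
    "Farmer B": [("Carlos", "neighbor"), ("Rosa", "neighbor"), ("Mayor Diaz", "politician")],
    "Farmer C": [("Carlos", "cousin"), ("Elena", "neighbor")],
}


def to_edges(records):
    """Flatten interviews into directed edges: (farmer, contact, relationship)."""
    return [
        (farmer, contact, relation)
        for farmer, contacts in records.items()
        for contact, relation in contacts
    ]


def times_named(edge_list):
    """Count how many farmers named each contact (overlap between networks)."""
    return Counter(contact for _, contact, _ in edge_list)


def relationship_mix(edge_list):
    """Count how often each kind of relationship appears across all ties."""
    return Counter(relation for _, _, relation in edge_list)


edges = to_edges(interviews)
named = times_named(edges)
mix = relationship_mix(edges)
print(named.most_common(3))
print(mix.most_common())
```

A contact named by many different farmers, like the made-up "Carlos" here, is a candidate hub for information sharing, and the relationship tally hints at whether information travels mostly through neighbors, family or officials.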

A challenge in conducting network analysis, however, is the data itself. In order to analyze large sets of data, the data needs to be “clean”: consistent and in a standardized format. This can require a considerable amount of preprocessing. Massive sets of data don’t generally arrive perfectly packaged and ready for analysis. They must be cleaned up to match the tools you use to analyze them. This can be extremely time-consuming and may lead to periodic dead-ends as you “troubleshoot” your data, so to speak. Another challenge is choosing exactly how you will analyze the various pieces. For instance, what constitutes a relationship? When is it a relationship and when is it insignificant proximity? The last challenge, however, is the most crucial: how do you interpret the analysis? Once the results roll in, how do you make sense of them in the broader context? How we interpret the results of a network analysis is really where the rubber hits the road.
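As a small illustration of what “cleaning” can involve, the Python sketch below standardizes names before building edges, so that the same person spelled in different ways is not counted as several different nodes. The names are made up, the function names are hypothetical, and the rules chosen here (ignoring case, extra spaces and accents) are assumptions that would need to be checked against real data, since they could also merge two genuinely different people.

```python
import unicodedata


def normalize_name(raw):
    """Trim, collapse inner whitespace, strip accents, and lowercase a name."""
    collapsed = " ".join(raw.split())
    decomposed = unicodedata.normalize("NFKD", collapsed)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def clean_edges(raw_edges):
    """Normalize both endpoints, drop blanks and self-loops, and
    deduplicate undirected ties."""
    seen = set()
    for a, b in raw_edges:
        a, b = normalize_name(a), normalize_name(b)
        if not a or not b or a == b:
            continue  # skip incomplete records and ties from a node to itself
        seen.add(tuple(sorted((a, b))))
    return sorted(seen)


# Made-up messy records: inconsistent spacing, case and accents.
raw = [
    ("  José Pérez", "maria  LOPEZ"),
    ("María López", "Jose Perez "),
    ("Jose Perez", "jose perez"),
    ("Ana Ruiz", ""),
]
cleaned = clean_edges(raw)
print(cleaned)
```

Four messy records collapse into a single tie here, which shows how much the cleaning rules can shape the network you end up analyzing.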

If you are interested in learning more about this method, the first place to start is a simple examination of the world around you. Envision the ways in which things are connected. Who do you speak to on a regular basis? Where does your food come from? Where do you learn about politics, and what sources of information shape your opinions? The possibilities are endless. After noticing these networks in your life, begin to ponder how they might look on the ‘macro’ scale. For instance, might there be trends in who people talk to? If we were to look at 10,000 people, would they speak to the same kinds of people as you do? Would they get information from similar places? And what might these trends mean? From there, it’s a matter of starting to read articles and research that discuss network analysis and envisioning how you yourself might conduct a study on a network. The first step is curiosity. The resources listed below can take you from there.

To conclude, network analysis is one of the ways in which we make sense of large sets of data. It helps us to understand the relationships between various components within that data and to chart out the “network” of how the units connect. These units can be people, organizations, data, concepts, etc. Methods of analysis, such as this one, are critical for our capacity to harness the power of big data and to make sense of all the noise.

Resources

[i] Alvarez, R. M. (Ed.). (2016). Computational Social Science. Cambridge University Press.

[ii] Scott, J. (1988). Trend report: Social network analysis. Sociology, 109-127.

[iii] Newman, M. (2010). Networks: An introduction. Oxford University Press.

[iv] Wood, B. A., Blair, H. T., Gray, D. I., Kemp, P. D., Kenyon, P. R., Morris, S. T., & Sewell, A. M. (2014). Agricultural science in the wild: A social network analysis of farmer knowledge exchange. PLoS ONE, 9(8), e105203.

[v] Fuller, J., Kelly, B., Sartore, G., Fragar, L., Tonna, A., Pollard, G., & Hazell, T. (2007). Use of social network analysis to describe service links for farmers’ mental health. Australian Journal of Rural Health, 15(2), 99-106.

[vi] Creamer, R. E., Hannula, S. E., Van Leeuwen, J. P., Stone, D., Rutgers, M., Schmelz, R. M., … & Buee, M. (2016). Ecological network analysis reveals the inter-connection between soil biodiversity and ecosystem function as affected by land use across Europe. Applied Soil Ecology, 97, 112-124.

4 Comments on “Demystifying “big data” part 1: Network analysis”

  1. I strongly agree about the need to “clean” the data before studying it. I do not doubt the importance and usefulness of network analysis. Doing such analysis, however, IF those very important steps you describe are not taken, can lead to misleading assumptions about causality (mistaking correlation for the impetus behind changes). I guess I’m saying consult a professional before trying this at home 🙂

