Design and Implementation of a Data Collection System for Social Network Analysis


Benham-Hutchins, M., Brewer, B. B., Carley, K., Kowalchuk, M., & Effken, J. A. (Summer 2017). Design and implementation of a data collection system for social network analysis. Online Journal of Nursing Informatics (OJNI), 21(2). Available at

Research reported in this paper was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R01GM105480 – Measuring Network Stability and Fit (Benham-Hutchins, M., Brewer, B., & Carley, K.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


Patient safety and quality initiatives emphasize the importance of communication and collaboration among health care providers across the continuum of care. This has led to increased interest in methods such as social network analysis (SNA) that support investigations of healthcare provider communication networks. However, designing and implementing a SNA data collection system for the healthcare environment can be challenging for both new and experienced researchers. In this paper, we report on the design of a novel web-based SNA data collection system and its implementation in the inpatient hospital environment. 


 Communication among healthcare providers has been identified as a crucial component in the quest to reduce medical error (Alvarez & Coiera, 2006; Colvin, Eisen, & Gong, 2016; Horak, Pauig, Keidan, & Kerns, 2004; Lancaster, Kolakowsky-Hayner, Kovacich, & Greer-Williams, 2015).  As a result, a growing number of patient safety and quality initiatives now focus on the quality of inter- and intra-disciplinary collaboration among health care providers across the continuum of care (Lancaster et al., 2015; Singh & Sittig, 2015). Solving communication problems demands that researchers look more closely at how providers access information and communicate with other providers while providing patient care (Effken et al., 2011). This has led to increased interest in novel research methods, such as network analysis (NA), that provide a framework to study complex communication networks (Poghosyan, Lucero, Knutson, Friedberg, & Poghosyan, 2016; Provan, Veazie, Teufel-Shone, & Huddleston, 2004). NA focuses on the relationship between groups or organizations, while social network analysis (SNA) concentrates on understanding the relationship among individuals (Freeman, 2004; Knoke & Yang, 2008; Scott, 2005; Wasserman & Faust, 1994).

The relationships among individuals can be analyzed based on specific characteristics (e.g., frequency or quality of communication) and contextual communication features.  For example, in today’s complex health care environment, relationships between providers may or may not be reciprocated, and individual providers can be connected to more than one other person or group.  SNA methods have provided insight into issues such as the impact of information technology on provider communication (Benham-Hutchins & Effken, 2010); provider communication in emergency departments (Houghton et al., 2006); provider information exchange networks in the intensive care unit (Lurie, Fogg, & Dozier, 2009); and medication advice-seeking in a renal unit (Creswick & Westbrook, 2007).

Typically, SNA data have been collected using pencil and paper questionnaires.  However, collecting data manually has several limitations, as we discovered in our previous studies (Benham-Hutchins & Effken, 2010; Effken et al., 2011). SNA requires a high response rate to ensure the resulting networks do not have “holes” that limit their accuracy. Although staffing on nursing units is scheduled weeks in advance, last minute changes are not uncommon due to fluctuations in patient census or acuity, or unexpected staff absences. During an earlier study (Effken et al., 2011), researchers were forced to do extensive, last-minute editing and copying of paper questionnaires to ensure that staff were presented with an accurate list of staff working on their shift. Even so, other staff members were sometimes “written in” by respondents, which had to be verified against the final assignment sheet to ensure data accuracy. In addition, researchers could not be certain that all staff completing the survey included the “written-in” staff member when recalling their own interactions. Data from the paper questionnaires were later transcribed to a spreadsheet, which introduced multiple opportunities for errors. When errors were suspected, considerable time was required to check that day’s final assignment sheets and the actual surveys to assure accuracy. 

To make the data collection process and preparation of data for analysis as efficient and accurate as possible for the Measuring Network Stability and Fit (NetFIT) study, we designed and implemented a data collection system to automate the entire process from data collection to analysis.  Here we report on the results of that effort. We begin by providing an overview of the NetFIT project, for which the system was designed, and then describe the features of the data collection system and its initial application. Finally, we evaluate our experience using the system and its potential for future use. 

Overview of the NetFIT Project

 The novel data collection system was developed as part of the multi-year NetFIT research study. The aims of the study are to compare the information sharing and decision-making networks in hospital nursing units and evaluate the stability of these networks over time. In contrast to the composition of groups commonly analyzed with SNA, the personnel working on inpatient nursing units are not consistent day-to-day. This is primarily due to the 24-hour unit staffing requirements that result in most staff working three 12-hour shifts per week. Additional factors that influence the number of staff working a specific shift include patient acuity, unit patient census and unexpected staff illnesses or absences. Possibly mediating the changes in personnel is the similarity of staff roles (e.g., registered nurse, nursing assistant) from day to day.

Institutional review board approval was obtained from the participating hospitals and universities. Three hospitals participated, providing a total of 25 nursing units and 100 data collection points. Over 1,500 licensed and unlicensed nursing staff members working on the participating units completed SNA surveys designed to identify communication patterns related to patient care. Nursing staff network data were collected from staff working on the designated units four times during the seven months of data collection. Baseline data were collected on a specific weekday (over a 24-hour period to capture all shifts) and subsequently on the same weekday one, three, and six months later to evaluate network stability over time and comparability within and across units. Because the unit of analysis for this study was the nursing unit, data from individual staff were aggregated to the nursing unit level to create 24-hour networks for analysis.
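The aggregation step described above can be illustrated with a minimal sketch. It assumes each response is a (respondent, colleague, frequency) tuple keyed by anonymous ID, and that when the same pair is reported more than once the highest reported frequency is kept. Both the data layout and the tie-breaking rule are assumptions made for illustration, not the procedure used in the NetFIT study.

```python
from collections import defaultdict

def aggregate_unit_network(responses):
    """Combine individual responses into one directed, weighted edge list.

    `responses` holds (respondent_id, colleague_id, frequency) tuples pooled
    from every shift in one 24-hour collection period. Keeping the maximum
    frequency for a repeated pair is an illustrative assumption.
    """
    edges = defaultdict(int)
    for respondent, colleague, frequency in responses:
        if respondent == colleague:
            continue  # a respondent does not rate themselves
        key = (respondent, colleague)
        edges[key] = max(edges[key], frequency)
    return dict(edges)

# Hypothetical responses from two shifts on the same unit.
day_shift = [("A01", "A02", 3), ("A02", "A01", 1)]
night_shift = [("A03", "A01", 2), ("A01", "A02", 4)]
network = aggregate_unit_network(day_shift + night_shift)
```

Because the edges are directed, the sketch preserves unreciprocated or unequal ties: A01 reports more frequent contact with A02 than A02 reports with A01.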

Survey questions were designed to compare and contrast information sharing and decision-making communication networks. To create the information sharing network, staff members were asked: “How frequently did you discuss patient care with staff members working on your unit during the last shift you worked?” and “How frequently did you give information to staff on the next shift or get information from the previous shift.” To create the decision-making network, staff members were asked how frequently they sought advice from other staff members, and how often other staff sought them out for advice. Staff members were also asked to rate the trustworthiness of information they received from other providers. This confidence measure served as a proxy for valuing what the other staff member knows and the frequency measure served as a proxy for accessibility. We anticipated that an automated SNA data collection system could facilitate accurate data collection.

The SNA Data Collection System

The SNA data collection system consists of two components, a website and an Android tablet application. Figure 1 provides an overview of system features. The development of each component is described below.


The website features a SQLite database for storing survey data, as well as administrative interfaces for adding staff members; downloading survey data in CSV, XML, or DyNetML format; adding and editing data collectors and website users; editing answer codes; downloading new versions of the Android application; and adding and editing hospital role abbreviations (Figure 2). The website runs on a university-hosted Apache web server and has two components: a front-end written in HTML, CSS, and JavaScript, and a back-end written in Python and accessed via Ajax calls.
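To make the storage and download functions concrete, the following is a minimal sketch of a SQLite table and a CSV export restricted to one nursing unit, using only the Python standard library. The table name, column names, and function names are hypothetical and are not taken from the system described here.

```python
import csv
import io
import sqlite3

COLUMNS = "survey_id, unit, respondent_id, colleague_id, question, answer_code"

def create_schema(conn):
    """Create a hypothetical table holding one row per rated colleague."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS responses (
               survey_id TEXT NOT NULL,
               unit TEXT NOT NULL,
               respondent_id TEXT NOT NULL,
               colleague_id TEXT NOT NULL,
               question TEXT NOT NULL,
               answer_code INTEGER NOT NULL
           )"""
    )

def export_csv(conn, unit=None):
    """Return survey rows as CSV text, optionally limited to one unit."""
    query = f"SELECT {COLUMNS} FROM responses"
    params = ()
    if unit is not None:
        query += " WHERE unit = ?"  # parameterized to avoid SQL injection
        params = (unit,)
    query += " ORDER BY survey_id, colleague_id"
    cursor = conn.execute(query, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()
```

The same query pattern could be extended with further filters (for example, hospital or collection period) by adding columns and conditions.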

Invisible to the end user is an Ajax application programming interface (API) that allows the Android application to synchronize data with the website. With Ajax API calls, the application can download new staff member data from the website or upload staff survey data to the SQLite database. The use of an Ajax-based API standardizes the functionality of the back-end of the website, thus allowing multiple types of usage. Using Hypertext Transfer Protocol (HTTP) to standardize communication between the website and the Android application eliminates the need for a complicated communication protocol. Together, the Ajax-based API and HTTP decrease the complexity of the system and are consistent with common website practices. The website can be deployed on any standard Apache installation with only a single directory name change.
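As an illustration of how plain HTTP can carry the synchronization traffic, the sketch below builds (but does not send) a JSON upload request with the Python standard library. The endpoint path, payload layout, and function name are assumptions made for this example; the paper does not document the actual API, and the Android client itself would not be written in Python.

```python
import json
import urllib.request

def build_upload_request(base_url, surveys):
    """Build an HTTP POST carrying completed surveys as a JSON document.

    The "/api/upload" path and the {"surveys": [...]} layout are hypothetical.
    """
    body = json.dumps({"surveys": surveys}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/upload",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical survey record keyed by a device-generated identifier.
request = build_upload_request(
    "https://example.org/sna/",
    [{"survey_id": "s1", "respondent_id": "P0001", "answers": {"P0002": 3}}],
)
```

Passing the request to `urllib.request.urlopen` would send it; that step is omitted so the sketch runs without a server.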

Data collector accounts were created by the data administrators to assign specific website and data collection capabilities (for instance, only lead data collectors were allowed to modify unit staff lists). Using the website’s staff unit editor, new staff could be added through a direct entry form or a CSV upload (Figure 3). Master staffing lists for each unit were uploaded to the web server before data collection began. Staff schedules for the actual data collection days were obtained from hospital contacts on the day of data collection. To achieve an accurate staff roster, a lead data collector selected from the unit master list the names assigned to work during the targeted 24-hour period, added new staff and staff transferring (i.e., floated) from other units, and deleted staff not working as originally scheduled (Figure 4). Because multiple Android devices were used simultaneously to collect staff data, once a unit’s staffing for the shift was judged to be accurate, the lead data collector synchronized each Android device with the web server. The time between finalization of staffing for the oncoming shift and the beginning of the shift was often very brief, requiring synchronization of multiple devices to be done in fifteen minutes or less.
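The roster adjustments described above amount to simple set operations. The following minimal sketch assumes staff are identified by unique string keys; the function and parameter names are hypothetical.

```python
def build_shift_roster(master_list, scheduled, floated_in=(), absent=()):
    """Return the final roster for one collection period.

    master_list: everyone normally assigned to the unit
    scheduled:   master-list staff scheduled for the targeted period
    floated_in:  new staff or staff floated from other units
    absent:      scheduled staff who did not work as planned
    """
    unknown = set(scheduled) - set(master_list)
    if unknown:
        # Scheduled names must come from the master list or be added explicitly.
        raise ValueError(f"not on master list: {sorted(unknown)}")
    roster = (set(scheduled) | set(floated_in)) - set(absent)
    return sorted(roster)

# Hypothetical staff keys for one unit.
roster = build_shift_roster(
    master_list=["rn-01", "rn-02", "rn-03", "na-01"],
    scheduled=["rn-01", "rn-02", "na-01"],
    floated_in=["rn-77"],
    absent=["rn-02"],
)
```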

Although survey participants must see actual staff names to complete the communication questionnaire, anonymous IDs are automatically generated prior to data analysis to maximize participant confidentiality. The same anonymous IDs are retained throughout data collection, which was critical in our situation because a particular staff member was likely to complete multiple questionnaires during the longitudinal study. This system allows researchers to tailor the questions presented to participants based on specific characteristics (such as role, shift, or licensure). In addition to network-specific questions, other typical survey question formats are supported (yes/no, numeric responses, Likert scale, multiple choice, and so on). A codebook was developed by researchers to map the full text of the response choices (visible to participants) to the numerical values used for analysis. Survey results are synchronized on a central web server so that data can be downloaded as a single CSV or XML file, or as the DyNetML file required by ORA, the network analysis software program used for the NetFIT study (Carley, Diesner, Reminga, & Tsvetovat, 2007; Carley, Pfeffer, Reminga, Storrick, & Columbus, 2013). Researchers can download all results or restrict the download to specific data collection periods, work shifts, nursing units, or hospitals.
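Two ideas in this paragraph, stable anonymous IDs and a codebook that maps response text to numeric values, can be sketched as follows. The ID format, the class name, and the response labels and values are all hypothetical; the paper does not specify how the system generates IDs or which codes were used.

```python
import itertools

class AnonymousIdRegistry:
    """Hand out one anonymous ID per staff member and reuse it afterwards."""

    def __init__(self, prefix="P"):
        self._ids = {}
        self._counter = itertools.count(1)
        self._prefix = prefix

    def get(self, staff_key):
        # The first request creates the ID; later requests return the same
        # one, so repeated surveys from one person can be linked over time.
        if staff_key not in self._ids:
            self._ids[staff_key] = f"{self._prefix}{next(self._counter):04d}"
        return self._ids[staff_key]

# Hypothetical codebook: response text shown to participants -> analysis value.
CODEBOOK = {"Never": 0, "Rarely": 1, "Sometimes": 2, "Often": 3, "Constantly": 4}

def encode_answer(response_text):
    """Translate a displayed response choice into its numeric code."""
    return CODEBOOK[response_text]

registry = AnonymousIdRegistry()
baseline_id = registry.get("staff-17")   # baseline collection
followup_id = registry.get("staff-17")   # same person, later collection
other_id = registry.get("staff-42")
```

In a deployed system the mapping would need to be stored persistently and protected, since it links names to anonymous IDs.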

Preparing and Using the Android Application

Once the SNA application has been downloaded onto an Android tablet (Figure 5), users must log in using a designated bootstrap account to synchronize tablet functions with the web server. Selecting the "Update from Server" button causes the application to issue a "pull request" to the server to download all survey customizations, as well as account information for data collectors. Once these data are on the tablet, data collectors can log in and download data specific to the hospital, unit, and staff for the relevant data collection time period. A dialog box shows when the download is complete and indicates that the device is synchronized with current web server updates, including the current unit staffing schedule. Once final staffing changes are made and all devices to be used on the unit are updated, Internet connectivity is not required.

Each data collector is associated with a specific study site and can only access nursing unit information for that hospital. To begin data collection, the data collector selects the "Staff Survey" button and the desired nursing unit. The device is now ready for survey administration. To encourage staff participation, it is important that a data collection tablet is available when the potential participant is ready to take the survey. We found that having one device for every two scheduled staff members assured quick access. To help data collectors keep track of multiple tablets during the busy data collection time period, we put bright pink cases on all the devices (Figure 5).

After staff had completed their shifts, data collectors introduced themselves to each staff member, explained the purpose of – and requirements for – the study, answered any questions and invited participation. Staff members who were willing to participate reviewed a consent form on the tablet and checked a box to register their agreement. To begin the actual survey, participants first selected their own names from a list of unit staff working during that 24-hour period. Based on the data associated with their name, each participant was presented with a list comprising only those staff members with whom they could have interacted during their shift. To prevent participants from accidentally submitting a survey, taking multiple surveys, or interacting with the main menu, data collectors entered a security code before submitting the survey and preparing the tablet for the next available participant.
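One plausible way to build the list of colleagues a participant could have interacted with is to compare working intervals. The sketch below treats two staff members as eligible for each other when their intervals overlap or touch (so that handoff partners on adjacent shifts are included). This rule and the data layout are assumptions for illustration; the paper does not describe the filtering logic actually used.

```python
from datetime import datetime

def eligible_colleagues(participant, roster):
    """List staff whose working interval overlaps or abuts the participant's.

    `roster` maps a staff key to a (start, end) pair of datetimes.
    """
    start, end = roster[participant]
    return sorted(
        name
        for name, (other_start, other_end) in roster.items()
        if name != participant and other_start <= end and start <= other_end
    )

# Hypothetical roster: a day shift, the following night shift, and a night
# shift one day later that neither overlaps nor abuts the day shift.
roster = {
    "rn-day": (datetime(2017, 3, 1, 7), datetime(2017, 3, 1, 19)),
    "rn-night": (datetime(2017, 3, 1, 19), datetime(2017, 3, 2, 7)),
    "rn-later": (datetime(2017, 3, 2, 19), datetime(2017, 3, 3, 7)),
}
day_list = eligible_colleagues("rn-day", roster)
```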

When all questionnaires for a nursing shift had been completed, the data collector verified that the tablet was connected to the Internet and then uploaded the data to the web server. The transmission has a fail-safe mechanism such that, if transmission is interrupted, survey data will not be lost. All completed surveys remain on the tablet device and are sent to the server every time the "send to server" button is selected. If a copy of a survey already exists on the server, the new copy is ignored. This design allows for redundant data across the server and devices, providing a backup in the event that a send request is interrupted. 
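The "resend everything, ignore duplicates" behavior described above is an idempotent upload. A minimal server-side sketch using SQLite's `INSERT OR IGNORE` with a primary key on the survey identifier is shown below; the table layout and function name are hypothetical, and the paper does not state how duplicates are detected in the actual system.

```python
import sqlite3

def receive_surveys(conn, surveys):
    """Store uploaded surveys, silently skipping any already on the server.

    `surveys` is an iterable of (survey_id, payload) pairs. Returns the
    number of surveys that were newly stored.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS surveys ("
        "survey_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM surveys").fetchone()[0]
    with conn:  # commit on success, roll back if an error interrupts the batch
        conn.executemany(
            "INSERT OR IGNORE INTO surveys (survey_id, payload) VALUES (?, ?)",
            surveys,
        )
    after = conn.execute("SELECT COUNT(*) FROM surveys").fetchone()[0]
    return after - before

conn = sqlite3.connect(":memory:")
first = receive_surveys(conn, [("s1", "{...}"), ("s2", "{...}")])
# The tablet resends everything it holds, plus one new survey.
second = receive_surveys(conn, [("s1", "{...}"), ("s2", "{...}"), ("s3", "{...}")])
```

Because repeating a send never creates duplicates, an interrupted transmission can simply be retried.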

Evaluation:  Strengths, Limitations and Future Applications

After a preliminary pilot test, the new system was used to collect data for the NetFIT study. The use of tablet devices to collect data requires that data collectors be present to distribute and retrieve the devices. The SNA data collection system cannot eliminate last-minute nursing unit staffing changes, but it makes them easier to accommodate. Instead of modifying a master paper questionnaire in the research lab and then making paper copies for distribution, data collectors were able to make the changes on the system website when they arrived at the hospital for data collection and then download the changes onto each device that would be used.

Although the problem did not surface in the pilot test, early participants indicated that the tablet touch screen and buttons were too small, making it difficult to make selections accurately. In response, we provided a stylus option, which improved the user experience. Another problem discovered early in the data collection process was that if a tablet lost its charge, it would lose the current date/time, resulting in the survey data collection date/time being incorrect. To solve this problem, we added an additional step, confirming the date on the device, to the device set-up procedure. At the end of the data collection period, data collectors were surveyed about their frontline data collection experiences. Data collectors reported some problems obtaining and updating last-minute staffing lists and difficulty locating specific staff members. Problems locating staff were related to staff being in a patient room, going home early, working overtime, or working other than a standard 12-hour shift.

Using hospital public wireless (Wi-Fi) Internet connections that required “pop up” sign-in screens through a browser created other problems. At times it was difficult to tell whether the tablet was fully connected to the Internet, and the device had to be restarted. Data collectors frequently reported Internet disconnects that required them to move to another area of the hospital to reconnect. Inconsistent Wi-Fi access was a particular challenge because, every time staffing changes were made, all the devices had to be connected to the server and updated with the current staffing list before data collection could commence. Fortunately, the application was designed so that the devices did not have to be connected to the Internet after the initial set up. During data collection, data were stored on the individual devices. After data collection was completed, the Wi-Fi connection had to be verified (re-connection was often required) before the data were uploaded to the secure server.

In general, data collection was accurate and efficient.  One unexpected problem with participant inclusion was due to “non-standard” or “partial” shifts that did not result in the individual’s shift ending at 7 p.m. or 7 a.m., when data collectors were present on the units. This required a great deal of post hoc work to retrofit these outliers into a shift (or more than one shift, if they were overlapping two) for analysis and to make changes to the algorithm used for assigning shifts. 
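The shift-assignment problem can be illustrated with a minimal sketch that maps any worked interval onto the standard 12-hour windows (7 a.m. to 7 p.m. and 7 p.m. to 7 a.m.) it overlaps. The one-hour minimum overlap, the labels, and the function name are assumptions made for this example; the algorithm actually used in the study is not described in the paper.

```python
from datetime import datetime, timedelta

def assign_shifts(start, end, min_overlap=timedelta(hours=1)):
    """Return labels of the standard shifts that a worked interval overlaps.

    Standard shifts run 07:00-19:00 ("day") and 19:00-07:00 ("night"); a
    night shift is labeled with the date on which it starts.
    """
    labels = []
    # Begin at 07:00 on the previous day so early-morning hours can be
    # matched to the night shift that started the evening before.
    window_start = datetime(start.year, start.month, start.day, 7) - timedelta(days=1)
    while window_start < end:
        window_end = window_start + timedelta(hours=12)
        overlap = min(end, window_end) - max(start, window_start)
        if overlap >= min_overlap:
            kind = "day" if window_start.hour == 7 else "night"
            labels.append(f"{window_start:%Y-%m-%d} {kind}")
        window_start = window_end
    return labels

# A hypothetical partial shift from 11 a.m. to 11 p.m. spans two standard shifts.
partial = assign_shifts(datetime(2017, 3, 1, 11), datetime(2017, 3, 1, 23))
standard = assign_shifts(datetime(2017, 3, 1, 7), datetime(2017, 3, 1, 19))
```

Raising `min_overlap` would assign a staff member only to shifts in which they worked a substantial number of hours.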

Currently, we are beginning to realize the greatest benefits of the system. The collected data are available to ORA so that networks can be visualized and metrics generated easily. The multiple complex analyses needed to compare the two kinds of networks and how the network metrics of each are associated with patient safety and quality outcomes will be much less labor intensive and less subject to data entry errors. Moreover, we anticipate that the time required to identify appropriate metrics for measuring network stability will be cut dramatically, as will the time needed to determine the stability of nursing networks.


We described the design and initial application of a novel SNA data collection system. The system supported the collection of SNA survey data from more than 1,500 licensed and unlicensed nursing staff members. These data were collected from 25 units, in three hospitals, and represent 100 hospital unit information-sharing and decision-making networks. Through a combination of web-based tools and an Android application, we were able to support last-minute staffing changes, minimize human data input error, and allow the survey results to be downloaded in formats compatible with statistical and network analysis software programs. Currently, analysis of the data is underway. Despite the occasional problems described, we anticipate that electronic data collection using systems such as the one we described here is a viable approach to securely, efficiently and accurately collect data for SNA in health care and other venues while maintaining the confidentiality of participants. 


Alvarez, G., & Coiera, E. (2006). Interdisciplinary communication: An uncharted source of medical error? Journal of Critical Care, 21, 236-242.

Benham-Hutchins, M., & Effken, J. (2010). Multi-professional patterns and methods of communication during patient handoffs. International Journal of Medical Informatics, 79(4), 252-267. doi:10.1016/j.ijmedinf.2009.12.005

Carley, K. M., Diesner, J., Reminga, J., & Tsvetovat, M. (2007). Toward an interoperable dynamic network analysis toolkit. Decision Support Systems, 43(4), 1324-1347.

Carley, K. M., Pfeffer, J., Reminga, J., Storrick, J., & Columbus, D. (2013). ORA User's Guide 2013. Carnegie Mellon University, School of Computer Science, Institute for Software Research Retrieved from

Colvin, M. O., Eisen, L. A., & Gong, M. N. (2016). Improving the patient handoff process in the intensive care unit: keys to reducing errors and improving outcomes. Seminars in Respiratory and Critical Care Medicine, 37(1), 96-106. doi:10.1055/s-0035-1570351

Creswick, N., & Westbrook, J. I. (2007). The medication advice-seeking network of staff in an Australian hospital renal ward. Studies in Health Technology and Informatics, 130, 217-231.

Effken, J. A., Carley, K. M., Gephart, S., Verran, J. A., Bianchi, D., Reminga, J., & Brewer, B. B. (2011). Using ORA to explore the relationship of nursing unit communication to patient safety and quality outcomes. International Journal of Medical Informatics, 80(7), 507-17. doi:10.1016/j.ijmedinf.2011.03.015

Freeman, L. C. (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Vancouver, BC Canada: Empirical Press.

Horak, B. J., Pauig, J., Keidan, B., & Kerns, J. (2004). Patient safety: a case study in team building and interdisciplinary collaboration. Journal for Healthcare Quality, 26(2), 6-12; quiz 12-13.

Houghton, R. J., Baber, C., McMaster, R., Stanton, N. A., Salmon, P., Stewart, R., . . . Walker, G. (2006). Command and control in emergency services operations: a social network analysis. Ergonomics, 49(12-13), 1204-1225.

Knoke, D., & Yang, S. (2008). Social Network Analysis (2nd ed.). Los Angeles: Sage Publications.

Lancaster, G., Kolakowsky-Hayner, S., Kovacich, J., & Greer-Williams, N. (2015). Interdisciplinary communication and collaboration among physicians, nurses, and unlicensed assistive personnel. Journal of Nursing Scholarship, 47(3), 275-284. doi:10.1111/jnu.12130

Lurie, S. J., Fogg, T. T., & Dozier, A. M. (2009). Social network analysis as a method of assessing institutional culture: three case studies. Academic Medicine, 84(8), 1029-1035. doi:10.1097/ACM.0b013e3181ad16d3

Poghosyan, L., Lucero, R. J., Knutson, A. R., Friedberg, M., & Poghosyan, H. (2016). Social networks in health care teams: evidence from the United States. Journal of Health Organization and Management, 30(7), 1119-1139. doi:10.1108/jhom-12-2015-0201

Provan, K. G., Veazie, M. A., Teufel-Shone, N. I., & Huddleston, C. (2004). Network analysis as a tool for assessing and building community capacity for provision of chronic disease services. Health Promotion Practice, 5(2), 174-181. doi:10.1177/1524839903259303

Scott, J. (2005). Social Network Analysis (2nd ed.). London: Sage.

Singh, H., & Sittig, D. F. (2015). Measuring and improving patient safety through health information technology: the health IT safety framework. BMJ Quality & Safety, 25(4). doi:10.1136/bmjqs-2015-004486

Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications. New York: Cambridge University Press.

Author Bios

Marge Benham-Hutchins, PhD, RN
University of Texas at Austin, School of Nursing, Assistant Professor

University of Texas at Arlington        BSN    12/99   Nursing
University of Texas at Arlington        MSN   05/02   Nursing Administration
University of Arizona, Tucson           PhD     05/08   Health Informatics/Nursing Systems

Barbara Brewer, MBA, PhD, RN, FAAN
The University of Arizona, College of Nursing, Associate Professor

The University of Arizona, Tucson, AZ        PhD     2002    Nursing Systems, Informatics
Columbia University, New York, NY            MBA   1992   
Yale University, New Haven, CT       MSN   1988    Clinical Nurse Specialist
Wesleyan University, Middletown, CT          MALS 1986    Literature
University of Rhode Island, Kingston, RI      BS       1972    Nursing

Kathleen Carley, PhD
Carnegie Mellon University, Institute for Software Research, Professor

Massachusetts Institute of Technology           SB       1978    Political Science
Massachusetts Institute of Technology           SB       1978    Economics
Harvard University     PhD     1984    Sociology

Michael Kowalchuck
Carnegie Mellon, Institute for Software Research, Senior Research Programmer

Judith A. Effken, PhD, RN, FAAN
The University of Arizona - Professor Emerita; Research Professor

Evangelical Deaconess School of Nursing, Milwaukee, WI Diploma          1962    Nursing
University of Hartford, Hartford, CT BA      1973    Psychology
University of Connecticut, Storrs, CT            MS      1983    Nursing Management
University of Connecticut, Storrs, CT            PhD     1993    Psychology

