
October 21, 2009

Responsive Buildings and Social Networks

In a blog entry about a year ago I talked about using sensor data to turn architecture into a dynamic force for altering social networks. I recently analyzed the office layout data from four of our sociometric badge studies, and found that the probability of interaction between two people dropped sharply as the distance between their desks (and the physical barriers between them, such as walls) increased.

I got fairly intrigued by the idea of dynamically modifying office layout to help deal with this situation, and recently for the Media Lab fall sponsor week my UROP Alex Speltz and I built a prototype of an augmented cubicle wall that changes based on the social context. Here's a picture:

The wall is a little over 2 meters tall and made of two plexiglass sheets with a wood frame. Inside the plexiglass sheets are window blinds that can be raised and lowered by an actuator mounted on the bottom of the wall.

The idea is that by detecting the stage of work for a worker (exploring vs. exploiting) we can determine if they need more face-to-face interaction or less for a certain period of time (probably at least a week). If someone needs to talk more with people around them, at night the actuator will pull down the blinds to create a window, making serendipitous interaction easier. If, on the other hand, the person is more in an exploit mode and needs to sit at their desk and work, the blinds are pulled up at night, and when they come in the next day it will give them more privacy. People can also specify their interaction preferences through a web-based system that my other UROPs Tim Kaler, Ernie Park, and Margaret Ding made, which allows us to further tailor the system output.

It's important to imagine an entire office outfitted with these, so if you knew that two groups were starting to work on a project together the barriers between those groups would disappear, while if someone was monopolizing the time of another group the barriers between them would increase. In effect, the augmented cubicle would become a social signal for availability. While people can control the blinds manually, in practice people stick with the defaults (I pulled down the blinds in my office two months ago and haven't gotten around to pulling them back up).

We're planning on deploying this in a real organization in the next few months as the design gets finalized to see if we can have a positive effect on the work environment, as well as productivity and job satisfaction. We're also currently making a demonstration video of the wall, which I'll post here as soon as it's ready.

September 21, 2009

Communities in Networks

Uncovering the "community" structure of social networks has a long history, but communities play a pivotal role in almost all networks across disciplines. Intuitively, one can think of a network community as consisting of a group of nodes that are relatively densely connected to each other but sparsely connected to other dense groups of nodes. Communities are important because they are thought to have a strong bearing on functional units in many networks. So, for example, communities in social networks can correspond to different social groups, such as family, whereas web pages dealing with a given subject tend to form topical communities.
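
To make the intuition concrete, here is a minimal sketch that scores how well a given grouping matches the "dense inside, sparse between" idea. It uses Newman-Girvan modularity, which is one common quality function and is chosen here purely for illustration; the paragraph above does not single out a measure. The function name `modularity`, the toy graph, and the group labels are all hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Observed fraction of within-group edges minus the fraction expected
    # if edges were placed at random while keeping node degrees fixed.
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Toy graph (an assumption for illustration): two triangles joined by one bridge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "all" for n in range(6)}

print(round(modularity(edges, good), 3))       # 0.357
print(round(modularity(edges, one_group), 3))  # 0.0
```

Splitting the graph along the bridge scores higher than lumping everyone together, which matches the intuitive definition. Real detection methods search over partitions rather than scoring a single hand-picked one.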

The concept is simple enough, but it turns out that coming up with precise mathematical definitions and algorithms for community detection is one of the most challenging problems in network science. Recently, a lot of the research in this area has been done using ideas from statistical physics, which has an arsenal of tools and concepts to tackle the problem. Unfortunately (but understandably) relatively few non-physicists like to read statistical physics papers.

Together with my colleagues Mason Porter (Oxford University) and Peter Mucha (University of North Carolina at Chapel Hill), we thought it would be useful to let others take a peek at some of this work. In an effort to put in context some of the hundreds of papers, we recently compiled an introductory review on some of our favorite approaches to community detection. While there are excellent existing reviews, our "Communities in Networks", published by Notices of the American Mathematical Society (AMS), tries to make sense of this smorgasbord of methods and, hopefully, lets a broader audience get a flavor of this exciting field.

I hope to be making a couple of postings on community structure and community detection later on. In the meantime, you can see for yourself if we have succeeded by checking out the freely accessible article on the AMS website, or by going to arXiv or SSRN.

The largest connected component of a coauthorship network connecting physicists who have published together on networks. Each node is colored according to community membership.

September 10, 2009

Learning from Chemical Traces

A few weeks ago a really fantastic study got a lot of press about how researchers found that 90% of US bills have trace amounts of cocaine on them. This got me thinking about some of the other interesting currency studies that have been done.

Where is George? comes to mind as another brilliantly designed study. Researchers stamped thousands of bills with a URL where people who received the bill could go and enter its current location. The researchers got a huge number of responses, allowing them to use bill mobility patterns to approximate human mobility patterns. Of course with the recent availability of high quality cell phone and sensor data this may not be the best data collection method in the future, but at the very least it's a great study design.

But the cocaine study got me thinking: what else can we learn about people's habits from chemical traces on bills? Of course the reason cocaine can be detected is that it binds to the green dye in money, but a large number of other compounds would likely also bind to this dye. Can you learn about fast food consumption from bill traces? Could you gauge the "stress level" of the country by measuring the amount of certain sweat compounds?

You can potentially get this data from other sources, but often it's hard to get a large enough cross section of society to get a broad enough picture. By combining analysis of these physical traces with digital traces, we can get closer to having a complete view of how our society is behaving.

May 14, 2009

One rank to rule them all - Notes on benchmarking eGovernment

Almost a decade ago, the EU Commission started to measure the progress of eGovernment in its Member States (now 27) and some other countries. Whenever a new version is published, the results usually receive a lot of media attention. Headlines may state "Country X is a leader in eGovernment, it ranked 2nd behind country Y".

Whenever I attend EU conferences that are in some way connected to eGovernment, representatives of Member States like to point out their country's position in the EU eGovernment benchmarking study to underline how far they have come. In fact, whenever politicians or high-level administrators from EU Member States talk about eGovernment, they refer to one particular result of the EU eGovernment benchmark: online sophistication. Therefore, the benchmark has positively influenced eGovernment policies in EU Member States (MS) and beyond.

Yet, what can the benchmark tell us?

The EU eGovernment benchmark measures 20 public services and the national portal, using four indicators: online sophistication (5 stages), online availability, user centricity and national portals. So in essence the eGovernment benchmark only tells us what is happening on the supply side of eGovernment in 20 areas. eGovernment, of course, is much more complex than that.

Other eGovernment benchmarks like the one conducted by the United Nations face similar difficulties. How do you measure a complex issue with a limited budget? How do you include new trends such as Government 2.0 (Paper / Blog) in a benchmark?

Furthermore, how can you align benchmarks? They tend to differ in scope (EU = 20 public services; UN = mix of info society indexes (e.g. from ITU) and eParticipation), underlying cause-effect framework, measures, analysis or transparency of the methodology. Results differ widely and politicians tend to focus on one result. For example, Iceland ranks EU=19, Brown/Brookings=68, UN=21. Why not agree on one cross-financed benchmark and indicators?

The EU and the United Nations are currently revising the eGovernment benchmark methodology. This happens mostly in closed circles of government representatives and experts from academia. While I don't want to criticize this process in general, revising an eGovernment benchmark could be improved by consulting the public (anyone: academics, citizens, ...) on, e.g., how to come up with a framework of measures to capture citizen-centricity, what should be measured or how that should be done. Furthermore, why not make the complete data set available to researchers after 2 years?

Let me close this entry with two recommendations for those involved in redesigning eGovernment benchmarks:

Selecting Measures:
(You should consider this for each measure and any combination of measures)
- Understandability
- Impact
- Timeliness
- Validity/Accuracy
- Uniqueness
- Comprehensiveness
- Weight
- Collection costs
- Controllability

Scope:
(You can focus on)
- Input
- Process
- Output
- Outcome
- Efficiency (outputs relative to inputs)
- Effectiveness (outcome relative to output and goals)
- Demand
- Usage / Adoption

March 14, 2009

Sunbelt Update

Many of us have been at the Sunbelt conference for the past two days, and there have been some extremely interesting talks on new research.

Some particularly interesting work is coming out of the United States Military Academy Network Science Center. One of their current projects involves giving out BlackBerries to 35 cadets and continuously logging their location with GPS, e-mail, phone calls, and other data. They are also collecting all other e-mail data and giving these cadets weekly surveys on their networks, so this is shaping up to be a very interesting data set.

Jamie Olsen from the Carnegie Mellon CASOS group also presented some fascinating work looking at shipping traffic patterns using GPS-like sensors and identifying areas of importance using clustering techniques and network measures.

I'm really encouraged to see this uptick in sensor-related research in the social network community, and the great reception that these and similar presentations received signals the increasing appeal of the reality mining technique.

February 3, 2009

Christakis and Fowler in SCIENCE, 23 Jan 2009 Issue

The golden boys of networks in public health are featured in this week's issue of SCIENCE.

From the headline of the article:

Friendship as a Health Factor
In a string of hot articles, two social scientists report that obesity, smoking, and other facets of health "spread" in networks. As the two friends expand their theory, doubters sharpen their questions.

The story is way cool. Their research, and the ground that they are breaking, are way cooler.

January 5, 2009

Google books

There was an interesting article in today's New York Times on Google books. Google books is a massive effort to scan, essentially, all print media, going back centuries. (Also see effort by Open Content Alliance.) Partially putting aside the important issues around control of the data, the digitization of texts creates the capacity to access, organize, and analyze much of what humanity has "thought" in recent history. From the perspective of a social scientist, the exciting prospect is to view this corpus as perhaps the most extraordinary data set ever assembled (especially when combined with recent developments in natural language processing). Can we see the rise and fall of social movements? Of ways of thinking about the world, linking these constructs to space and time? This is part of a broader movement, as I have written before, toward a "computational social science."

The one issue of control that this does raise is what access researchers will have to the entire Google books corpus. Indeed, part of the concern that has driven the Open Content Alliance (as I understand it) is public access to the corpus, where, for example, libraries will need to pay subscription fees for access to what could be a Google monopoly. There are similar concerns, as I see it, regarding access for those who wish to do research on these data. For those readers of the blog who have insight on this, please post comments.

November 3, 2008

Large Scale Real-Time Behavioral Feedback with Sensors

I previously blogged about how environmental sensors could transform how we think about architecture and how this data could be combined with wearable sensors. At the Media Lab's Awareness event, held last Thursday, other Media Lab researchers and I explored this in more detail. At this event, over 150 sponsors, students, and faculty wore Sociometric Badges for the entire day so that we could show how interactions and behavior varied across different areas of the lab during different parts of the day, as well as give people insights into how different companies behaved: do people from Hitachi and Canon have similar behavior patterns? Do they see the same demos? We gave participants real-time feedback on displays scattered around the lab. Here's a screenshot, but a video is coming:

This shows the activity in the different areas. Each circle represents a person, and circles grow as people stay in an area for longer. The solid portion of each circle indicates how much people are talking, and the color indicates how engaged that person is in talking (dark green implies not engaged, bright green is very engaged), which we extract in real time from the microphone on the badges. The circles also move around based on accelerometer activity; the position information itself is based only on which base station people were closest to, which tells us what room they were in.
Our badges interacted with Ubiquitous Sensor Portals created by the Responsive Environments group, which allowed people to browse video feeds of the building in real time. Eventually this system will enable interaction between the virtual world (i.e. Second Life) and the real world by allowing for voice communication through the portals to other portals or virtual partners in Second Life. Here's a picture of the portals:


At the end of the day we also gave participants feedback about their company's activity and companies that were similar to them. By using the badges to figure out who talked to whom, we grouped companies and people together that saw similar demos, met the same people, had similar behavioral patterns, etc. It was amazing that companies often had more in common with companies outside their industry than with those within it. We actually generated a network of interest similarity, which (not surprisingly) showed the Media Lab at the center with many other companies tapping in to a core of sponsors with many diverse interests, which were not always the companies with the most attendees. The personalized feedback was also very interesting, since we showed each person whom they might be interested in meeting, based on the same features we used for companies. This appeared to work extremely well, since two other people in my research group appeared in my feedback even though I never interacted with them that day. While this information is not useful in and of itself, it did convince me of the system's effectiveness, which for users would help them trust other recommendations.

The point of all of this was not to show how this technology can impact a one-day event. After all, there's not much time for real-time reflection in one day. Instead we aimed to spark discussion about how continuous deployments of these systems could fundamentally change businesses and public spaces in general. Imagine continuous feedback on behavior, personally customized by you to help increase your productivity and effectiveness. Imagine spaces where the line between virtual and physical is blurred to the point where you can just as easily have a conversation with the person next to you as with the person next to you in the virtual representation of your building in Second Life. The Sensor Portals will continue to be active at the Media Lab, allowing us to continually tailor this system to be the most beneficial to users and further research.

We are also nearing deployment of our Sensible Organization tools in the laboratory of one of the world's largest pharmaceutical companies as well as in the call center of a major financial firm. We hope to measure through these interventions whether or not we can raise productivity and enhance community within organizations, as well as answer deep theoretical questions on networks and behavior. In social network theory there is the frequent claim that central individuals tend to be more productive because they have access to more diverse information, but the causality issue has not been thoroughly studied. Maybe more productive people simply tend to be more central, so we should instead try to detect behavioral and psychological characteristics to create feedback systems. By creating recommendation systems that actually make individuals more central (see my paper describing this system), we will be able to give empirical support to theoretical arguments.

This technology will fundamentally change organizations and management as a whole, and this deployment is the first step in this direction. Over the next few months through longer term experiments we'll begin to learn exactly what this means, and how management practices can change to take advantage of this data while at the same time using it to empower employees to make the right decisions. Stay tuned.

October 23, 2008

Online tutorial for Excel .NetMap

The following event PNG is sponsoring might be of interest to readers of this blog:

Marc Smith--Online tutorial for Excel .NetMap October 27, 2008: 12-2 pm (EDT)


~Online event. Registration required, and free of charge.~


Note that there is a small chance that we will hit the capacity limit of 200, so register ahead of time:

This url also has links to where you can download the software, information on technical requirements for logging on, etc.

(Excel) .NetMap is an add-in for Office 2007 that provides social network diagram and analysis tools in the context of a spreadsheet. Adding the directed graph chart type to Excel opens up many possibilities for easily manipulating networks and controlling their display properties.

In this tutorial the steps needed to install and operate (Excel) .NetMap are reviewed. The (Excel) .NetMap add-in provides directed graph charting features within Excel, allowing users to create node-link diagrams with control over each node and edge color, size, transparency and shape. Since .NetMap builds within Excel, all of the controls and programmatic features of Office are available. Additional features of (Excel) .NetMap generate social networks from data sources like personal e-mail (drawing data from the Windows Desktop Search engine). Arbitrary edge lists (anything that can be pasted into Excel) can be visualized and analyzed in .NetMap.


Marc Smith is the soon-to-be Chief Social Scientist at Telligent, and recently of Microsoft Research, specializing in the social organization of online communities and computer mediated interaction. He founded the Community Technologies Group and is now part of the Internet Services Research Center at Microsoft Research in Silicon Valley. He is the co-editor of Communities in Cyberspace (Routledge), a collection of essays exploring the ways identity, interaction and social order develop in online groups.

Smith's research focuses on computer-mediated collective action: the ways group dynamics change when they take place in and through social cyberspaces. Many "groups" in cyberspace produce public goods and organize themselves in the form of a commons (see related papers). Smith's goal is to visualize these social cyberspaces, mapping and measuring their structure, dynamics and life cycles. He has developed the "Netscan" engine that allows researchers studying Usenet newsgroups to get reports on the rates of posting, posters, crossposting, thread length and frequency distributions of activity. These data have revealed a complex online social ecosystem populated by multiple social roles.

This session will provide a walk through the basic operation of .NetMap. Techniques for time slicing and filtering networks will be highlighted. You may download the Excel .NetMap Add-in and slides in advance of this tutorial.

April 21, 2008

International Meeting on Methodology for Empirical Research on Social Interactions, Social Networks, and Health

I was pleased to be invited to the:

International Meeting on Methodology for Empirical Research on Social Interactions, Social Networks, and Health

which will be held early in May in Cambridge. It's being organized by Chuck Manski and Nicholas Christakis, and hosted by The Institute for Quantitative Social Science (IQSS) at Harvard.


Objective: To bring together econometricians, social statisticians and social network analysts to improve research on the relationship between social interactions and health.


Speakers and topics:

Opening Remarks, Charles Manski, Northwestern University

"Social Contagion in Health Behaviors in Current and Future Longitudinally Resolved Social Network Datasets" Nicholas Christakis, Harvard University

"Stochastic Blockmodels for Networks with Mixed Membership and Challenges for Modeling Dynamically Evolving Networks" Steve Fienberg, Carnegie Mellon University

"Social Interactions from the Perspective of Economics" Steven Durlauf, University of Wisconsin-Madison

"New Models for Dynamic Analysis of Multi-Sided (Large Scale) Conflict" Peter Bearman, Columbia University

"Human Dynamics: From Priorities to Human Travel Patterns" Laszlo Barabasi, Northeastern University

"Superspreaders or Limited Access Highways? Explaining Generalized Epidemics and Prevalence Disparities in HIV" Martina Morris, University of Washington

"Separating Social Influence from Social Selection on the Basis of Longitudinal Data and Statistical Models" Tom Snijders, University of Oxford

"Longitudinal Model of Network Formation: Heider's Theory of Balance vs. Simmel's Triadic Formation", Mark Handcock, University of Washington

"Network Topology and its Implications for Model-Building" Pip Pattison, University of Melbourne

"Selection and Influence: Models for Individual Attributes and Social Network Structures" Garry Robins, University of Melbourne

"The Average Outcome and Inequality Implications of Segregation in the Presence of Social Spillovers " Bryan Graham, University of California, Berkeley

"Point Process Estimation of Large-scale Spatial Dependencies", Matthew Harding, Stanford University

Closing remarks, Nicholas Christakis, Harvard University


Contact Info:

Gabrielle Stone, IQSS Events Coordinator

tel: 617-495-9489

March 3, 2008

HBS online exhibition: The Human Relations Movement

The Harvard Business School library hosts a great online exhibition called "The Human Relations Movement" with explanations and pictures of the Hawthorne Effect.

Harvard researchers studied the Hawthorne plant between 1924 and 1933 by observing how different changes in work-related variables affect performance.

The Hawthorne effect is described by Roethlisberger as:

"the phenomenon in which subjects in behavioral studies change their performance in response to being observed"

October 31, 2007

Applying theory to managerial problems: how do you resolve communication problems inside firms?

When talking about information sharing and knowledge exchange inside firms, I am faced with the same question over and over again: "How do we know what we know and don't know?". Let me describe this to you with a small example.

HighTech Corp. is a medium sized technology firm in Europe. The communications department is responsible for ensuring a regular information flow and knowledge exchange between stakeholders inside and outside the firm. Internal stakeholders could be but are not limited to R&D engineers, sales staff or the management board of the firm. External stakeholders of the firm are distributors, clients or investors of the firm.

However, such information flow and knowledge exchange inside HighTech Corp. is often disturbed by physical, mental or psychological barriers of its personnel. Hence, the communication department faces serious problems when trying to find out what is going on inside the firm, what the latest R&D trends are inside the firm/the industry, how clients react to HighTech Corp.'s novel product line etc.

As a consequence, HighTech Corp wants to embark on a project that puts a system/method/technology in place that can help the communication department to find out what HighTech Corp knows, what HighTech Corp does and who knows what inside HighTech Corp.

The overarching question(s) I have is/are: From everything we know about theories and concepts of information sharing in social networks, what are the theories with the most predictive power that can help us to better address such real-life issues?

On a more specific level, the questions could be formulated as follows:
- How do firms find out what they know and what they don't know?
- What methods should/should not be used?
- Can technology help?

- Can you get your personnel motivated to share their thoughts and ideas?
- If yes, how?
- If no, what do you do?

September 27, 2007

More on computational social science...

I presented some of my work on "computational social science" at the Applied Stats workshop yesterday. One of the questions that came up was what the best tools are for dealing with unusual and massive data sets. Clearly, part of the answer is that there is nothing truly "off the shelf", so you have to write a lot of code from scratch. But the other part is that there are some flexible platforms/tools that are vastly better than others, and I would be interested in comments on what you think is useful for datasets, say, with millions of observations, or pulling text and link structure off of the web, etc.

August 21, 2007

How do networkers network?

Together with Timothy Huerta, Texas Tech University, and Jennifer van Stelle, Stanford University, I have written a paper on "How do networkers network?". We conducted a study of participants at the annual conference of INSNA (International Network for Social Network Analysis) to understand how young researchers are introduced into the community of senior researchers. The paper is work in progress at the moment and we would like to hear your comments, especially on our methodology.

You can find the paper in our working paper series (Working Paper # PNG07-005) and an abstract here:

This study was conceived during the 2005 INSNA conference by attendees who were interested in the evolving patterns of relationships among social network academics and consultants, and in how junior researchers were being integrated into the existing community. The study was also intended as a session- and space-planning aid for the 2006 conference organizers. Specifically, this paper describes a study of networking among social network professionals who attended the 2005 INSNA (International Network for Social Network Analysis) “Sunbelt” Conference. The attendees were asked to respond to two rounds of surveys regarding their experiences. We obtained data on existing and new ties in the first round of the survey, and tracked the maintenance or decay of those ties in the second round (approximately nine months later). We employ homophily arguments as well as theories of status and career/life cycle to determine what factors led to the establishment of ties from interactions at the conference. We consider the content of the new ties in addition to the above-mentioned theories to understand why such ties decayed or were maintained in the post-conference period. As well as applying the results of this study to the understanding of social network dynamics, we hope our findings will further the integration of new members into the existing community and enhance the session-scheduling and space-utilization aspects of conference planning.

July 31, 2007

The contagion of obesity

Remarkable article from NEJM last week:

Nicholas A. Christakis, M.D., Ph.D., M.P.H., and James H. Fowler, Ph.D.
The Spread of Obesity in a Large Social Network over 32 Years, New England Journal of Medicine, Volume 357:370-379 July 26, 2007

quoting from the summary of results:

Results: Discernible clusters of obese persons (body-mass index [the weight in kilograms divided by the square of the height in meters], ≥30) were present in the network at all time points, and the clusters extended to three degrees of separation. These clusters did not appear to be solely attributable to the selective formation of social ties among obese persons. A person's chances of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) if he or she had a friend who became obese in a given interval. Among pairs of adult siblings, if one sibling became obese, the chance that the other would become obese increased by 40% (95% CI, 21 to 60). If one spouse became obese, the likelihood that the other spouse would become obese increased by 37% (95% CI, 7 to 73). These effects were not seen among neighbors in the immediate geographic location. Persons of the same sex had relatively greater influence on each other than those of the opposite sex. The spread of smoking cessation did not account for the spread of obesity in the network.


This paper received enormous attention, with front page articles in a variety of newspapers, and the ultimate sign of popular attention, jokes by late night talk show hosts.


What Jay Leno did not bring up, however, was the deft way this paper handles the issues around causality (see my earlier posting on issues around causation and social networks).

In particular, the paper uses several approaches to rule out explanations for their findings other than social contagion. First, the analysis leverages the longitudinal nature of the data to distinguish contagion from (obese) birds flocking together. Second, auxiliary hypotheses (in particular, (a) showing that asymmetric friendships—I think that you are a friend and not vice versa—have asymmetric effects, and (b) that friendship effects matter even for geographically distant dyads) undermine alternative explanations based on omitted variables. All in all, a very nice piece of analysis.

I am sure Leno was waiting to get to these finer points until this week…

May 20, 2007

Effect of Network Structure on Consensus

I was struck by a presentation by Qiming Lu at the NetSci conference on the dramatic effect of the micro-structure of social networks on consensus formation. Essentially, the results showed that creating random graphs with the same macro properties (clustering coefficient, characteristic path length, etc.) yielded vast differences in the final consensus state of a social network in an opinion spreading simulation.

Lu used social ties between actors to denote influences that individuals have on each other, with actors having certain probabilities of changing their opinion based on their neighbors' states and their current state. In the study, the authors randomly assigned individuals to start with different words for a single concept (called the naming game) to see how many words would exist in the steady state of the system.

In simulations on real world network structures, the authors found that actors would converge to using two words to describe a common concept, while replicating the same macro properties on a random graph yielded a consensus on one word. This has tremendous implications for how we characterize networks, since it points to a lack of a measure to capture certain features of naturally formed networks.
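
For readers who want to play with the idea, here is a minimal sketch of a naming game on an arbitrary network. The update rule is the textbook minimal version (a speaker utters one of its words to a random neighbor; on a match both keep only that word, otherwise the listener adds it), which is an assumption on my part and not necessarily the exact rule used in Lu's simulations. The function names and the two toy networks are hypothetical.

```python
import random

def naming_game(neighbors, steps, seed=0):
    """Run a minimal naming game and return each agent's final vocabulary.

    neighbors: dict mapping each node to a list of adjacent nodes.
    """
    rng = random.Random(seed)
    nodes = list(neighbors)
    # As in the study described above, every agent starts with its own word.
    vocab = {n: {f"w{n}"} for n in nodes}
    for _ in range(steps):
        speaker = rng.choice(nodes)
        if not neighbors[speaker]:
            continue  # isolated agents never talk to anyone
        listener = rng.choice(neighbors[speaker])
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[listener]:
            # Success: both agents drop every other word.
            vocab[speaker] = {word}
            vocab[listener] = {word}
        else:
            # Failure: the listener learns the new word.
            vocab[listener].add(word)
    return vocab

def surviving_words(vocab):
    """Set of words still held by at least one agent."""
    return set().union(*vocab.values())

# Two toy networks (assumptions for illustration only).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

for name, graph in [("ring", ring), ("two triangles", two_triangles)]:
    words = surviving_words(naming_game(graph, steps=2000, seed=1))
    print(name, len(words))
```

The two disconnected triangles can never agree on fewer than two words, which is a crude caricature of how structure alone can block global consensus. The interesting cases in the talk are connected networks whose community structure has a similar effect.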

This also leads us to think about how we can combat groupthink, since these results imply that some larger social structures may exhibit some resistance towards groupthink. It is important to isolate these factors so that we can design our organizations and meetings to take advantage of these natural characteristics. Of course we do not randomly start with opinions, but form them over time as a function of those around us. However, these results may strengthen the notion that independent opinion formation followed by social discussion effectively combats groupthink, as has been previously demonstrated in smaller systems.

They have presented some of their results previously, in the paper Dynamics of Naming Games in Random Geometric Networks, but their NetSci paper will hopefully be available online soon.

March 14, 2007

Social Networks and Communication Networks

The University of Toronto’s NetLab has been doing some exciting research on how to measure social networks and communication behavior. Their recent conference paper, “Collecting Social Network Data to Study Social Activity-Travel Behavior: An Egocentric Approach,” discusses new methods of collecting data about social networks, travel behavior, and the use of communication technologies. This is exciting research because it shows how to effectively measure two important elements of social life – the cognitive dimension of perceiving the existence of social ties, and the behavioral dimension of interaction that actually occurs with social ties. Moreover, this research incorporates multiple types of communication, including communication that occurs in person, by telephone, and by email. The advantage of this approach is that, rather than collecting data about only certain kinds of ties or ways of interacting – such as the General Social Survey’s question about “those with whom you discuss important matters” – measuring both the cognitive and behavioral elements of social ties gives a more comprehensive understanding of the extent to which social life exists in America and how it actually occurs.

March 7, 2007

Bayesian Models of Social Networks and Text with Application to Political, Legal and Bibliometric Data

This is an abstract of this week's PNG/CCCSN seminar with Andrew McCallum (University of Massachusetts, Amherst). We encourage you to discuss his presentation via comments on the blog.

"The field of social network analysis studies mathematical models of patterns in the interactions between people or other entities. In this talk I will present several recent advances in generative, probabilistic modeling of networks and their per-edge attributes. The Author-Recipient-Topic model discovers role-similarity between entities by examining not only network connectivity, but also the words communicated on those edges; I'll demonstrate this method on a large corpus of email data subpoenaed as part of the Enron investigation. The Group-Topic model discovers groups of entities and the "topical" conditions under which different groupings arise; I'll demonstrate this on coalition discovery from many years' worth of voting records in the U.S. Senate and the U.N. I'll conclude with further examples of Bayesian networks successfully applied to relational data, as well as discussion of their applicability to trend analysis, expert-finding and bibliometrics."

Here is a link to Andrew's talk: "Bayesian Models of Social Networks and Text with Application to Political, Legal and Bibliometric Data"

February 19, 2007

Extending the Technology Enactment Framework - PNG Working Paper

Jane Fountain’s book “Building the Virtual State” introduced social science researchers to the technology enactment framework (TEF). This working paper presents further modifications to the revised TEF by Okumura, who introduced key actors that influence technology enactment. I propose a fourth actor group, the citizen, and further causal relations between existing actors and the organisational setting. The revisions towards a more hybrid TEF, between an actor-centric and an institutional approach, allow overcoming some of the limitations brought up by the framework's critics, such as the absence of socio-technical systems theory.

February 16, 2007

Structure versus Content in Network Analysis

There is a fundamental assumption underlying analyses of networks investigating the effects of ties. The assumption is of a commonality or regularity in the presence of ties among dyads and the content of the interactions taking place across these ties.

If the tie relationship is defined as the classical friendship, trust, or advice type ties, this assumption may not be particularly problematic. As more and more network studies construct networks from data harvesting (e.g., email logs, text analysis, etc.), this assumption merits more scrutiny.

Click below for additional discussion with examples.

Continue reading "Structure versus Content in Network Analysis" »

January 10, 2007

Market Analysis on Social Network Analysis Books

Together with my co-author Dr. Marina Hennig from the Humboldt-University of Berlin, I conducted a market analysis of the main social network analysis books. We included English and German books. The results are written up in two working papers, published in German and English, in our Working Paper Series.

Here is a short abstract:

We conducted a market analysis of existing books on social network analysis as a basis for a grant proposal submitted to the German Research Foundation (Deutsche Forschungsgemeinschaft - DFG) in November 2006. The results of this analysis are included in this working paper format and open for discussion. We are eager to learn about alternative interpretation(s) or analysis dimensions and will be (are) happy to update this paper as soon as we receive valid comments or requests for changes.

We appreciate any kind of comments, additions, or opinions on our analysis. Please contact Ines Mergel if you would like to use our Endnote file.

September 21, 2006

Cyberinfrastructure for Network Science

Hello all! Welcome to a new academic year. We had a wonderful kick off with two presentations by Katy Borner today. I am actually going to make a post for each of them. The first presentation was a description of cyberinfrastructure that Prof. Borner (and others, most of whom have now presented in the complexity series) are designing: "The Network Workbench". This is cyberinfrastructure aimed at the Network Science research community. I am putting the abstract and link to the project below. While any questions are welcome (and Prof. Borner will be weighing in on the discussion), of particular interest regarding a tool that is very much under construction now is: what would be of use to the social science community? The basic idea behind the infrastructure is that it should be able to incorporate various datasets and algorithms developed out there in the community; one simply has to produce something that "plugs into" the tool. Anyhow, they have developed various algorithms and the like already, and need to prioritize "what's next." So, if you have particular suggestions, please post as comments. (btw, note that in the next day or so I will post the powerpoint to her workshop here)

The Network Workbench workshop

This workshop introduced diverse cyberinfrastructures (CIs), such as the Information Visualization CI, the Network Workbench, and respective databases. These CIs serve the needs of the InfoVis and NetSci research communities respectively. They also make possible the analysis and mapping of mankind’s scholarly knowledge.

August 16, 2006

"Network elasticity" and "individual plasticity"

I briefly want to plug these two bookend constructs that I framed in a paper I wrote some years ago in the Journal of Mathematical Sociology ("The Co-evolution of Individual and Network"), which examined how networks and nodes co-evolve. Essentially, network elasticity captures how endogenous the network is: how much nodes get to choose who they connect to. Individual plasticity, in turn, captures how endogenous "attributes" are: how much individuals are affected by who they are connected to. In this paper (and the discussion here) I apply these ideas to social influence processes, but the concepts are more general than that. I would argue that different social systems differ dramatically in how elastic their networks are, and how plastic the nodes are, which, in turn, has certain systemic implications.

The idea that individuals affect their network and the idea that they are affected by their network are sometimes placed at opposite ends of a spectrum; but of course, they are really orthogonal processes. Since this paper was written, statistical tools (e.g., by Tom Snijders and his team with Siena) have been refined to examine just such a coevolutionary process. My focus is really on something different than estimating the underlying transition probabilities for the change in state of particular relationships or nodes.

Rather, what I am focused on are the long run dynamic systemic consequences of different levels of elasticity and plasticity. For example, in Co-evolution, I examined the social network within a government agency, where the social structure was very rigid, where the ties of a new person were pretty much the same as their predecessor's, and where individuals entered when they were early in their professional career and thus pretty malleable. The result was that structure drove attitudes, not the other way around. One could produce a 2 x 2 typology of networks and plausible resulting dynamics:

High plasticity and low elasticity: homophilous network, where the social structure will drive attitudes (e.g., traditional bureaucracy).

High elasticity and low plasticity: homophilous network, where social systems will polarize along nodal characteristics.

High elasticity and high plasticity: dual possibilities of emerging with a homogeneous, cohesive, network, or polarized cliques that do not talk to each other.

Low elasticity and low plasticity: heterophilous network.

Of course, the above depends a lot on the determinants of the social structure; an inelastic network that forces you to talk with likeminded people has very different implications than an inelastic network that forces you to talk with people who are different from you.
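To make the two constructs concrete, here is a toy simulation with one knob for each. It is only a sketch and not the model from the paper: the halfway-averaging influence rule, the "swap a tie for a more similar alter" rewiring rule, and all names are assumptions chosen for brevity.

```python
import random

def coevolve(attitudes, ties, elasticity, plasticity, steps, rng):
    """Toy co-evolution of attitudes and ties.

    attitudes: list of floats indexed by node id (0..n-1).
    ties: dict mapping node id -> set of neighbor ids (undirected).
    elasticity: chance per step that the focal node may rewire a tie.
    plasticity: chance per step that the focal node is influenced.
    """
    attitudes = list(attitudes)
    ties = {i: set(nbrs) for i, nbrs in ties.items()}
    nodes = list(ties)
    for _ in range(steps):
        i = rng.choice(nodes)
        if not ties[i]:
            continue
        j = rng.choice(sorted(ties[i]))
        if rng.random() < plasticity:
            # Influence: i moves halfway toward j's attitude.
            attitudes[i] += 0.5 * (attitudes[j] - attitudes[i])
        if rng.random() < elasticity:
            # Choice: i swaps j for a random non-neighbor k if k is more similar.
            candidates = [k for k in nodes if k != i and k not in ties[i]]
            if candidates:
                k = rng.choice(candidates)
                if abs(attitudes[k] - attitudes[i]) < abs(attitudes[j] - attitudes[i]):
                    ties[i].remove(j)
                    ties[j].remove(i)
                    ties[i].add(k)
                    ties[k].add(i)
    return attitudes, ties
```

Running it at the four corners of the (elasticity, plasticity) square and comparing attitude similarity among tied versus untied pairs is one way to explore the typology above; which outcomes actually emerge will depend on the starting network, as the closing caveat notes.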

"The Co-evolution of Individual and Network." Journal of Mathematical Sociology, January 2001, 69-108.

August 13, 2006

Quantifying Social Networks in Africa - EPROM

Wanted to spread the word that we are now launching EPROM (Entrepreneurial Programming and Research On Mobiles) jointly at MIT and the University of Nairobi. The premise behind the project comes from the fact that today’s mobile phones are designed to meet Western needs. Subscribers in developing countries, however, now represent the majority of mobile phone users worldwide (1.4 billion mobile phone subscribers live in the developing world!). We have put particular emphasis on Africa because it is currently the fastest growing mobile phone market in the world, and I’ve moved to Kenya for the year to get the project off the ground.

What Kenyans are starting to do with their phones is amazing. Today, in my small town of Kilifi, I can buy milk, pay for a taxi ride, even check the local vegetable prices on my mobile... I describe this phenomenon in more detail here.

To further our understanding of the underlying factors driving entrepreneurship using mobile phones, we are involving several students as research assistants to pursue research on behavioral and mobile phone usage patterns. We will be distributing Nokia ‘smartphones’ to fifty individuals in different demographics and logging their behavior over the course of six months. The phones will have a custom application that continuously logs location, nearby peers, communication and phone usage statistics, similar to the data collected for 100 people during the Reality Mining project at MIT. In this previous research, we generated models of our subjects’ lives with such precision that they could be used to accurately predict subsequent behavior. Based solely on data logged by our custom phone application, we have successfully shown that after two months of logging it is possible not only to predict behavior, but also to infer friendships, differentiate demographics, validate survey responses, and even quantify the dynamics of an organization. It is our hope that this data will provide an analogous quantitative description of Kenyan social networks and mobile phone usage behaviors.

Cheers from Kenya...

April 25, 2006

Sampling Strategy Online Forum

Hey there,

I am currently pre-testing an online survey, which I would like to launch in a couple of weeks.

However, the issue I am still struggling with is the sampling strategy. Let me briefly give you some background.

• My research objective is to identify information and advice networks in the online community. Besides a set of demographic/control questions, I ask the respondents to identify their three most important online community contacts (e.g., who comes to you to ask for information/advice AND whom do you turn to to ask for information/advice).

• The community has more than 80,000 registered users. However, the percentage of the really active members is assumed to be far below the latter figure (no clear figures exist). Furthermore, users of the community reported multiple identities of single users (several cases are known where one user utilizes several aliases).

• There is – currently – no technically feasible/easy way of extracting a list of the most active network members, which might define the sample size for the survey (besides going on the website and manually scanning through the forums etc).

• I originally planned to announce the survey and post the link to the survey on the homepage of the forum instead of sending the survey link to each member individually.

The question(s) I thus have is/are:

• How have other researchers solved the sampling issue in a similar type of situation?

• Have other studies tried to identify information and advice networks with an ego-sampling strategy, where the exact sample size wasn’t known ex-ante? If yes, did they succeed or fail?

• What are the most relevant/important papers to look at with respect to sampling strategy/research methods of online surveys in online communities?

What do you think? Do you have any ideas?

March 23, 2006

The billion dollar question

Last week, an article in the New York Times reported on the newest developments in a decade-long struggle to modernize the F.B.I. computer system (“Cost Concerns for F.B.I. Computer Overhaul”, March 14, 2006). Citing a Justice Department report, it says that the overhaul will likely cost “another half-billion dollars to complete”. The same amount has already been spent: After the report by the 9/11 commission revealed that the antiquated computer system might have played a part in the intelligence gaps before 9/11, the F.B.I. reacted by devoting $535 million to its Trilogy Program, a network “designed to provide all FBI offices with better organization, access and analysis of information” (F.B.I. press release). So far, the results have been less than satisfactory: Its core component, a case management system (known as Virtual Case File system), collapsed under technical difficulties and was abandoned by the F.B.I. after it had spent $170 million on it. And a few days ago, an audit by the GAO revealed that the F.B.I. and its contractors spent more than $17 million on “questionable payments” (Washington Post, March 18, 2006). Now, the F.B.I. plans to spend $425 million on a new case management system, partly with the same contractors, named Sentinel.

What puzzled me was a statement by the inspector general’s office of the Justice Department quoted in the Times saying that they were unsure whether the new system, “even if successful, would allow the bureau to share information adequately with other intelligence and law enforcement agencies”. The lack of information sharing was one of the main issues pointed out by the 9/11 commission, and yet, after investing a billion dollars, information sharing is not built into the system. It seems to me that the project would benefit enormously from the insights organizational researchers have into information networks. For example, my research shows that large IT projects that start off with an exploration of informational needs are more successful in the “exploitation”, or implementation phase. Rather than focusing solely on network technology, the F.B.I. should devote some resources to finding out who needs to talk to whom. Have researchers conduct interviews, run focus groups, maybe even do an ethnography to identify the information network, then build the computer network to support it.

February 20, 2006

How to measure relations: the coming paradigm shift

The dominant way to measure relations in social network analysis (SNA) is still based on self-report data. I predict that over the next 10-15 years there will be a dramatic shift in how this foundational construct is measured, from self-report to behavioral measures. The reason for this is simply that people leave vastly more traces of their behaviors now than they used to. Nathan's entry of last week offers a remarkable example of this.

We see an opening wedge of studies using observational data (e.g., see Tyler et al. and Diesner and Carley, which used e-mail data). However, if you look at sociology and OB publications (for example), the vast majority of social network articles still rely on self-report data. Further, most of those papers interpret the results as if they reflect actual behavior.

This is potentially troubling, because research suggests that the correlation between self-report and behavior is surprisingly low. The classic work in this vein comes in a series of studies co-authored by Bernard, Killworth, and sometimes Sailer. (Note that Freeman et al. found that people do better at recalling long term social structure than short term interactions.)

Putting aside the correlation between observational and self-report data, these are distinct constructs, and which one is interested in depends on one's research questions. Clearly, in certain cases behavior is all that matters; if you are interested in the transmission of STDs, then a key question is how to eliminate the deviation between self reports and actual behavior (e.g., see Brewer et al.). However, those deviations, as Corman and others have explored, are not random. Even in the context of sexual behavior, one would guess that recall is correlated with (for example) emotional significance. This is not so interesting, perhaps, in understanding the spread of sexual diseases, but might be interesting for other research questions.

In any case, I do think that this is an area that requires a great deal more attention over the next few years. In particular, there needs to be more attention to (1) the link between different types of interaction behaviors and self-reported relations, and (2) the interaction between the two (e.g., it may matter if I talk a lot with my friend or not, not just whether they are (a) someone I see as a friend, or (b) someone I talk a lot with).

(Note, btw, that this entry benefited from a recent exchange on the Socnet listserv on “CSS & ‘A Million Little Pieces’”.)

Continue reading "How to measure relations: the coming paradigm shift" »

February 14, 2006

What would you do with the telephone call network of an entire country?

I’m beginning a collaboration with British Telecom in an effort to analyze their massive call network dataset. This is a dynamic, directed network that contains ~250 million nodes (i.e., distinct phone numbers) and ~2000-5000 edges (i.e., calls) generated each second. The phone numbers are of course one-way hashed such that it is impossible to link a node’s identifier to an actual phone number. However, we do have information about the country and region to which the node belongs (i.e., country code / area code). While it is not inclusive of every call to and from the UK, it is estimated that the dataset includes approximately 80% of landline calls and 30% of mobile calls.
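For readers unfamiliar with this kind of anonymization, here is a hedged sketch of how a phone number might be turned into a stable node identifier while keeping only a coarse region prefix. The actual scheme used for this dataset is not described here; the keyed HMAC-SHA-256 construction, the fixed five-digit prefix, and the function name are all assumptions. A keyed hash is used because the space of phone numbers is small enough that a plain unkeyed hash could be reversed by hashing every possible number.

```python
import hashlib
import hmac

def pseudonymize(phone_number, secret_key, prefix_len=5):
    """Return (region_prefix, node_id) for a phone number.

    node_id is a keyed one-way hash, so the same number always maps to the
    same node but cannot be recovered without the key. The fixed-length
    prefix is a crude stand-in for country code / area code; real parsing
    would need numbering-plan data.
    """
    digits = "".join(ch for ch in phone_number if ch in "0123456789")
    node_id = hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256).hexdigest()
    return digits[:prefix_len], node_id

if __name__ == "__main__":
    key = b"replace-with-a-real-secret"  # hypothetical key for illustration
    print(pseudonymize("+44 20 7946 0000", key))
```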

So my question to the complex systems / social network community is this: what are some questions we should attempt to ask of this dataset? Possible examples include calculating the strength of a particular region’s relationships with other regions and countries, analyzing the dynamics involved in “call cascades”, inferring the average size of an individual’s hierarchical social groups (from close friend to possible acquaintance), etc...

[Figure: duration2.gif]

While many metrics may be impossible to calculate for a network of this magnitude, simple sampling can yield interesting results. For example, the plot above represents the duration of outgoing calls from 100,000 randomly sampled nodes during 6-month intervals over the course of October 1995 to March 1998. It is clear that there is an increasing number of very long calls (over 10^4.2 seconds, or about 4.4 hours), which could be a good indicator of the uptake of dial-up internet in the UK during this timeframe.
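As a rough illustration of how such a plot can be built from a sample, the sketch below bins call durations on a log10 scale and converts the 10^4.2-second threshold to hours. The function name, the five-bins-per-decade choice, and the toy durations are assumptions; nothing here reproduces the actual analysis.

```python
import math
from collections import Counter

def log10_histogram(durations_s, bins_per_decade=5):
    """Count durations (in seconds) in equal-width bins on a log10 axis.

    Returns {lower bin edge in log10 seconds: count}; non-positive
    durations are skipped because their logarithm is undefined.
    """
    counts = Counter()
    for d in durations_s:
        if d > 0:
            counts[math.floor(math.log10(d) * bins_per_decade)] += 1
    return {b / bins_per_decade: c for b, c in sorted(counts.items())}

# 10^4.2 seconds is roughly 15,849 seconds, a little over 4.4 hours.
LONG_CALL_S = 10 ** 4.2
LONG_CALL_HOURS = LONG_CALL_S / 3600

if __name__ == "__main__":
    sample = [30, 60, 90, 20000]  # made-up durations, in seconds
    print(log10_histogram(sample))
    print(round(LONG_CALL_HOURS, 2), "hours")
```

Comparing the count in the bins at or above 4.2 across successive six-month windows would be one simple way to track the growth of very long calls.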

February 13, 2006

The genetic basis (?) of political orientations

There was an interesting paper recently published in the American Political Science Review by Alford et al. that received a lot of attention, which asserted that political orientations have a major genetic component. I think this paper was done well by behavioral genetic research standards: it is a standard analysis of "concordance rates" of identical vs fraternal twins. I do want to pick at one important premise, which is that the stronger correlation of political attitudes among identical (monozygotic) twins as compared to fraternal (dizygotic) twins is not due to greater communication, and thus social influence, between identical twins. The alternative, that communication does drive the difference, is plausible, in that similarity typically predicts communication (homophily); in fact, this is one of the most robust patterns in social network analysis. I would guess that identical twins see themselves as more similar than fraternal twins do (political orientations aside), and thus talk more. Note that I am unfamiliar with any research that actually demonstrates this (please comment if you are).

The authors do directly tackle this point (see p. 155), but, at first glance, at least, it is a weak reed they rest their assertion on, which is a paper by Martin et al. in 1986. ...

Continue reading "The genetic basis (?) of political orientations" »

February 12, 2006

Longitudinal Data and the Adoption of Technology

I've spent this last week working on a paper with Kakuko Miyata and Barry Wellman. The paper uses longitudinal survey data collected in Japan to understand the causal relationship between the use of keitai (internet enabled mobile phones) and the reception of social support. This is one of the first opportunities that I've had to write a paper based on longitudinal data, and I'm thoroughly enjoying the experience. In addition to providing me with an understanding of the causal relationship between the technology and social behavior, the data is also allowing me to chart the adoption of a new technology as it has become integrated into the lives of the general public. This experience has made me wonder about the extent to which the adoption of keitai is the result of a social network structure that is more prevalent in Japan than in other countries. My hope is that more longitudinal studies of this nature will be conducted in different countries, so that I might someday better understand the extent to which adoption patterns are the result of differences in network structure, vs. other factors, such as culture, marketing, or investment in technological infrastructure.

February 7, 2006

Knowledge Networks: Knowledge Transfer within and between Organizations

In his 1999 paper on the "Search-Transfer" problem, Hansen introduced the notion of knowledge networks. In a product development context, Hansen finds that tie strength and knowledge tacitness moderate the ability of an organization to locate and exchange knowledge between organizational subunits.

Building on a current class discussion in the context of our latest course here at the Kennedy School, given by David Lazer (Network Analysis for Managers and Analysts), we have raised the question of how feasible it might be to apply social network analysis for management and consulting purposes in industry. We asked whether or not it is possible to construct knowledge networks among firms by conducting a series of interviews in various firms.

A point we have really been discussing is: To what extent will firms be willing and able to provide information about their knowledge search-and-transfer behavior in their inter- vs. intraorganizational knowledge networks?

The reference we use is:
Hansen, M. T. 1999. The Search-Transfer Problem: The Role of Weak Ties in Sharing Knowledge across Organization Subunits. Administrative Science Quarterly, 44(1): pp. 82-111

December 22, 2005

Virtual Stock Markets - Proving the Powerlaw?

Social relations between individuals can be complex systems. How the structure of a network affects the behaviour of a system has been researched recently for networks such as power grids, neural networks, the World Wide Web, and stock markets. Although different in the underlying interaction dynamics or micro-physics, all these networks have shown a tendency to self-organize in structures that share common features. In particular, the number of connections for each element, or node, of the network follows a power law distribution. Networks that fulfill this property are referred to as scale-free (SF) networks (M. Bartolozzi, D. B. Leinweber, A. W. Thomas, 2005).
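To make the power-law claim concrete, the sketch below grows a toy network by preferential attachment and estimates the exponent of its degree distribution with a standard maximum-likelihood approximation. Neither piece comes from the cited paper: the generator is just a convenient way to get a heavy-tailed degree sequence, and the function names, `xmin` cutoff, and network sizes are assumptions.

```python
import math
import random
from collections import Counter

def preferential_attachment_degrees(n, m, rng):
    """Grow a network where each new node adds m edges, favoring high-degree nodes.

    Returns a Counter mapping node -> degree.
    """
    endpoints = []  # each edge adds both ends, so uniform draws are degree-proportional
    for i in range(m + 1):  # fully connected starting core of m + 1 nodes
        for j in range(i + 1, m + 1):
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            endpoints += [new, t]
    return Counter(endpoints)

def powerlaw_alpha_mle(values, xmin, discrete=True):
    """Estimate alpha in p(x) ~ x^(-alpha) for x >= xmin by maximum likelihood.

    For integer data such as degrees, xmin is shifted by 0.5, a common
    approximation to the exact discrete estimator.
    """
    tail = [v for v in values if v >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    shift = 0.5 if discrete else 0.0
    log_sum = sum(math.log(v / (xmin - shift)) for v in tail)
    return 1.0 + len(tail) / log_sum

if __name__ == "__main__":
    degrees = preferential_attachment_degrees(5000, 2, random.Random(0))
    print("estimated exponent:", powerlaw_alpha_mle(degrees.values(), xmin=5))
```

An estimated exponent does not by itself show that a power law is the right model; that takes a goodness-of-fit test and a comparison against alternatives such as the log-normal.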

I would like to draw your attention to 2 projects which are using the power law in a direct and indirect way. First, there is the use of virtual stock markets to improve market research. Second, a recent project concerning blogs and virtual stock markets (VSMs) tries to prove the existence of a power law.

Continue reading "Virtual Stock Markets - Proving the Powerlaw?" »

December 21, 2005

Auld lang syne: networks as behavioral flows

Over this holiday season, as many of you go somewhere, or are the destination for others, I hope you will be thinking about social networks. In particular, I hope you will think about social networks using a different metaphor than is usually applied in the field. The dominant way of thinking about networks, still, is as a slow-changing structure that enables/constrains behavior, and/or through which things circulate. While I think that this metaphor has value, this season think about “the network” as a set of relational behavioral flows, where you engage in some set of behaviors (visiting, writing Christmas cards, e-mailing, etc) vis-à-vis other people. A relationship is thus simply a particular behavior at a particular moment in time, and networks are simply the accumulation of these moments over time for some set of people. Networks, thus viewed, may exhibit certain types of properties, e.g., periodicity. Nathan Eagle of the Media Lab (with Sandy Pentland) has done some particularly nifty work with devices that measure interactions (“sociometers”), demonstrating the kinds of periodicity one may see among people who work together. Holiday travel is another example, where certain pairs (and larger order aggregations) may tend to get together at a particular time of year over an extended period. Further, certain types of events (e.g., graduating, getting a new job, etc), through this lens, are simply correlates of dramatic changes in these behavioral flows.
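Here is one way to make the "accumulation of moments" idea concrete. In this sketch a network is nothing more than a list of timestamped interaction events; a conventional weighted network is derived by counting them, and periodicity is read off a simple autocorrelation. The event format, function names, and the autocorrelation approach are assumptions for illustration, not Eagle's method.

```python
from collections import Counter

def accumulate_ties(events):
    """Collapse (day, a, b) events into an undirected weighted network."""
    weights = Counter()
    for _day, a, b in events:
        weights[frozenset((a, b))] += 1
    return weights

def daily_series(events, pair, n_days):
    """Number of interactions per day (days 0..n_days-1) for one pair."""
    key = frozenset(pair)
    series = [0] * n_days
    for day, a, b in events:
        if frozenset((a, b)) == key and 0 <= day < n_days:
            series[day] += 1
    return series

def autocorrelation(series, lag):
    """Autocorrelation of a series at the given lag (0.0 if the series is constant)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

def dominant_period(series, max_lag):
    """Lag (in days) with the highest autocorrelation, searching 1..max_lag."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

if __name__ == "__main__":
    # Made-up data: A and B meet every seventh day; A and C meet once.
    events = [(d, "A", "B") for d in range(0, 56, 7)] + [(3, "A", "C")]
    print(accumulate_ties(events))
    print(dominant_period(daily_series(events, ("A", "B"), 56), 20))
```

The structural view keeps only the output of `accumulate_ties`; the flow view keeps the event list, which is what makes questions about rhythm and sequence answerable at all.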

This metaphor, I think, can take you some places that a structural metaphor cannot—e.g., in understanding the spread of things through the system. Further, in turn, it can strengthen the structural metaphor by understanding the micro-behavioral foundations of certain relationships. E.g., Eagle finds that friends have systematically different relational flows than non-friends. The relational flow metaphor also highlights sequence in a way that is invisible in the structural metaphor. This is something that David Gibson (formerly of Harvard, now at Penn) has done some nice work on (Mazel tov to David and Ann on the new addition to their family, btw!).

Just something to think about as you are sitting down for your holiday meal, and singing auld lang syne (which itself is about the ebb and flow of relationships).

Refs:

Gibson, David R. 2005. "Concurrency and Commitment: Network Scheduling and its Consequences for Diffusion." Journal of Mathematical Sociology 29:295-323

http://www.nathaneagle.com/

December 19, 2005

Causal consulting

My sense is that social network analysis has increasingly been used for consulting purposes. This raises a couple of concerns and an opportunity. The concerns are two-fold: first is that a body of complex and sometimes conflicting findings is inevitably hyped and simplified as it passes through the prism of the consulting world, I think sometimes beyond recognition. Second is that, as noted in my previous posts, a lot of these findings rest on fairly shaky causal legs, particularly when you consider the lack of studies on system-level network structure and system performance. That is, perhaps importing these ideas into practice is the organizational equivalent of hormone replacement therapy. We make prescriptions based on correlational evidence, and make recommendations that may have adverse effects.

That said, ultimately ideas only matter if they have some impact on how people think and act—that is, people outside of the insular world of academia. One hopes that SNA can offer insights into how organizations (and other collectives) function, and how to operate more effectively. This all points back to my earlier arguments about the need to strengthen the foundations of causal assertions in the field.

This, in turn, points back to what (consultant and other based) interventions can offer back to the field: better insight into cause and effect. For example, do particular types of “network strengthening” actually improve outcomes at group and individual levels as predicted? Does making expertise and social networks transparent increase knowledge sharing? Do efforts to increase relationships across silos improve coordination and access to information? And are there any unanticipated negative consequences? Etc etc. Of course, all of this presupposes building evaluative measures into the intervention, and then a rigorous evaluation of whether the intervention worked, and it may not be reasonable to expect those that recommend certain interventions to rigorously evaluate them. But one problem at a time….

December 7, 2005

An Introduction

As a way of introducing myself to this blog, I'm posting an interview that I recently did for a radio show on the Canadian Broadcasting Corporation (CBC). This interview is part of a series about fellows at Massey College, University of Toronto. The interview focuses on my research – how people maintain their relationships by way of the internet. I hope you enjoy. To Listen, Click Here.

December 5, 2005

Longitudinal data, causal inferences, and the institutional milieu

A quick follow up on my earlier post re causal inferences, in which I stated that longitudinal data are not a cure all for determining the direction of the causal arrow. Longitudinal data, in principle, should allow a tracing of what preceded what temporally, and thus (hopefully) causally. Thus, in the context of social influence, if A and B started talking at time t, and their attitudes converged at time t + 1, it would seem reasonable to assert that communication led to convergence (social influence), rather than prior similarity to communication (homophily). However, it is possible that exogenous factors are dynamically operating on either the network or attitudes (to take the social influence example) over time. For example, imagine the attitude in question has to do with the role of government in markets, and one found looking at a population that both attitudes and communication patterns converged over time (suggesting both homophily and social influence). But now add to this scenario that the population in question is an undergraduate cohort, and an alternative explanation might simply be that one's major (e.g., economics) affects both attitudes and ties over time. It is plausible that such a process would lead to incorrect inferences regarding the sources of attitudes and ties. More generally, institutions affect both outcomes of interest and the configuration of networks. Neglect of the institutional milieu (which is often ignored in SNA research) can thus lead to spurious inferences regarding the reciprocal influence of networks and individual-level outcomes, even with longitudinal data.
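The undergraduate-cohort scenario is easy to reproduce in a toy simulation. In the sketch below there is no social influence and no homophily on attitudes at all: the major alone sets later attitudes and, separately, makes ties within a major more likely. Every number and name in it (two majors, the 0.2 and 0.8 attitude levels, the tie probabilities) is an assumption chosen for illustration.

```python
import itertools
import random

def simulate_cohort(n, rng, p_same=0.5, p_diff=0.05):
    """Return (before, after, ties) for a cohort in which the major drives everything."""
    majors = [i % 2 for i in range(n)]            # two majors, assigned exogenously
    before = [rng.random() for _ in range(n)]     # attitudes on entry, unrelated to major
    major_view = {0: 0.2, 1: 0.8}                 # attitude each major instills (assumed)
    after = [major_view[m] + rng.uniform(-0.1, 0.1) for m in majors]
    ties = set()
    for i, j in itertools.combinations(range(n), 2):
        # Ties depend only on shared major, never on attitudes.
        p = p_same if majors[i] == majors[j] else p_diff
        if rng.random() < p:
            ties.add((i, j))
    return before, after, ties

def mean_gap_change(before, after, pairs):
    """Average change in attitude distance across the given pairs (negative = convergence)."""
    changes = [abs(after[i] - after[j]) - abs(before[i] - before[j]) for i, j in pairs]
    return sum(changes) / len(changes)

if __name__ == "__main__":
    rng = random.Random(1)
    before, after, ties = simulate_cohort(40, rng)
    everyone = set(itertools.combinations(range(40), 2))
    print("tied pairs:  ", mean_gap_change(before, after, ties))
    print("untied pairs:", mean_gap_change(before, after, everyone - ties))
```

Tied pairs converge far more than untied pairs even though nobody influenced anybody, which is exactly the pattern a naive reading of longitudinal data would attribute to social influence.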

December 1, 2005

Control and causation

Following up on both the Mobius paper and my earlier post on causation, how many social network related studies have incorporated a degree of control over some critical dimension of the network and/or other factors? In the Mobius paper, resources of a sort were exogenously spread through the network, with certain paths facilitated (through lower interest rates for certain dyads), and their diffusion studied. In Festinger’s classic study on social influence, people were exogenously placed in housing. In Newcomb’s dorm study the students were exogenously placed in their rooms, and various measurements were taken before they took residence. There are also small group lab experiments (Bavelas and colleagues’ work in the Small Group Network Laboratory at MIT in the 1950s) and work on social exchange theory (by Emerson and Cook and others in the 1970s and after). What other research has incorporated a degree of control? I interpret control pretty liberally here, e.g., where some exogenous proxy, correlated with communication, is used to examine the impact of the network. It’s an intrinsically difficult problem, since typically one cannot randomly assign certain types of relationships to pairs of people ("you two be friends"), but not an insurmountable one. If people have (1) ideas re particular research that did have a measure of control; or (2) ideas re how to achieve a degree of control so as to allow better causal inferences, please post a comment.

November 28, 2005

Mobius on "Measuring Trust in Social Networks Through a Microfinance Field Experiment"

Markus Mobius will be speaking today on "Measuring Trust in Social Networks Through a Microfinance Field Experiment"

Monday, November 28, 2005
Bell Hall, Kennedy School of Government
12:00 - 1:30 p.m.

We propose a methodology to measure trust within a social network and apply it in a field experiment in shantytowns of Lima, Peru. We model trust as a transaction cost which an agent pays to gain permission to use someone else's asset. Social closeness reduces the transaction costs through two channels: (1) it reduces asymmetric information and makes it more likely that the asset's owner can identify the user as a 'good' (responsible) type; (2) it gives the owner the ability to control the agent's use of the asset and hence reduce moral hazard.

We have designed a microfinance program where we invite a subsample of the Shantytown community to become 'sponsors'. Sponsors receive a line of credit and can use a fixed share of it to obtain loans for their own household. The rest of their credit line (the 'asset') is allocated for 'sponsoring'. Any household in the community can get a low-interest rate loan from our microfinance partner by finding a sponsor who agrees to cosign the loan application. We randomize interest rates across all client-sponsor pairs: this allows us to measure the tradeoff between accessing a socially close sponsor with a high interest rate and a socially distant sponsor with a low interest rate. A second randomization varies the extent to which a sponsor is responsible for a borrower's default which allows us to separate our two trust channels. In this paper we report early results from two communities. We find that social distance up to length three reduces transaction costs by about 1 to 4 percent in terms of monthly interest rates. Moreover, geographic distance is also highly significant.

Mp3 Podcast - Presentation PDF

November 21, 2005

A dictum regarding social network analysis and causal inference

Figuring out the direction of the causal arrow is perhaps the major methodological issue of social science. This challenge is particularly acute in the study of social networks. Does the position in the social network affect success, or does success affect position in the social network? Do birds of a feather flock together, or do dogs and their owners start looking alike? The strong structuralist tradition in social network analysis, which posits that networks are out of the reach of the agency of individuals, obfuscated this issue for a long time. With the increased attention to dynamic networks, and the development of tools to study how networks and individuals change at the same time (e.g., see the fine work on p* models, and related software, such as SIENA--http://stat.gamma.rug.nl/snijders/siena.html), there has been a dramatic improvement in the statistical toolkit available to deal with these issues. However:

(1) Longitudinal data do not guarantee correct conclusions regarding cause and effect. For example, one can imagine omitted variables dynamically affecting network and/or individual level variables, resulting in a spurious inference of causation in longitudinal data.

(2) Cross-sectional data can, under the correct circumstances, allow reasonable inferences of causation. Festinger’s classic study of social influence is, arguably, one such example.

(3) Despite the massive upsurge of social network related research, only a fraction of published social network research uses longitudinal data, and only a fraction of a fraction of the studies that use cross-sectional data even hazard a sentence on where the network in question came from.

So, let me propose the following dictum for social network research:

Any research on the impact of social networks must at least wrestle with the factors underlying the network(s) under study, considering the possibility that (a) the network being studied was the result of the purported “impact” (i.e., reverse causation) and (b) some plausible third factor has affected both the network and potential outcome (i.e., spurious inference).

This is a pretty low bar, actually, but in many fields I would guess that close to 0% of the research exceeds it.