Open Source Research – Following the Code


The following describes the joint research that Prof. Neil Gandal, Dr. Uriel Stettner, and I worked on. Our main goal was to study how knowledge spillovers between different OSS projects affect the projects’ success and progress.

In the case of OSS development, knowledge spillovers (if they exist) likely occur via two channels:

Spillovers from software reuse: Programmers take software code from one project and employ it in another project.
Spillovers from common programmers: Programmers take knowledge, know-how, and experience from one or more OSS projects they work on and employ that knowledge on another OSS project they work on.

The first channel includes (i) reuse from one project that a programmer is working on to another project he or she is working on, as well as (ii) reuse from a project that has no common programmers with the relevant project. The second channel includes knowledge, know-how, and experience other than software reuse. A key question is whether these spillovers exist in a large OSS network and, if they do, whether knowledge transfer enhances the performance of the projects involved. In previous work we examined how connections among software projects via common programmers affected the success of OSS projects (Fershtman and Gandal 2011; Gandal and Stettner 2016). We found evidence of positive spillovers, but since we could not measure reuse on a large scale, those spillovers bundled together knowledge, know-how, experience, and reuse from other projects the programmer was working on. By directly measuring software reuse as well as network connections, we can measure the importance of the two channels separately.

Direct knowledge spillovers occur when two projects have a common programmer who transfers knowledge, know-how and experience embedded in the code from one project to another. In contrast, indirect project spillovers occur when knowledge is transferred from one project to another when the two projects are not directly linked through a common programmer. For example, suppose that programmer “A” works on projects I and II, while programmer “B” works on projects II and III. Programmer A could take knowledge from project I and use it in project II. Programmer B might find that knowledge useful and take it from project II to project III. In such a case, knowledge is transferred from one project to another by programmers who work on more than one project. There is a direct spillover from project I to project II, and an indirect spillover from project I to project III, since projects I and III are not directly connected.
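The example with programmers A and B can be sketched in a few lines of Python. This is only an illustration of the direct/indirect distinction, not the code used in the research: the `memberships` dictionary and the function names are hypothetical, and the sketch assumes that two projects are linked whenever they share at least one programmer.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: programmer -> set of projects they work on.
memberships = {
    "A": {"I", "II"},
    "B": {"II", "III"},
}

def project_network(memberships):
    """Link two projects whenever they share at least one programmer."""
    neighbors = defaultdict(set)
    for projects in memberships.values():
        for p, q in combinations(sorted(projects), 2):
            neighbors[p].add(q)
            neighbors[q].add(p)
    return dict(neighbors)

def spillover_type(neighbors, source, target):
    """Return 'direct', 'indirect', or 'none' for a pair of projects."""
    if source == target:
        return "none"
    if target in neighbors.get(source, set()):
        return "direct"
    # Breadth-first search over intermediate projects.
    seen = {source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for project in frontier:
            for other in neighbors.get(project, set()):
                if other == target:
                    return "indirect"
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return "none"

network = project_network(memberships)
print(spillover_type(network, "I", "II"))   # direct
print(spillover_type(network, "I", "III"))  # indirect
```

Projects I and III share no programmer, so the only path between them runs through project II, which is what makes the spillover indirect.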

We calculate reuse measures for all projects in our data set and examine whether reuse of software is associated with project success (controlling for other factors).

We used the same base network and datasets as described in: Information Flows Thesis Research. We constructed a “reuse” connection network between the projects, where project A has a directed connection to project B if there is at least one pair of similar files belonging to these projects such that the original file belongs to A and the destination file belongs to B. Note that if project B copied from project A, and project C copied the same file from project B, project A gets credit as the source in both cases. In this case, project B is just a facilitator and does not get credit as the source.

Finally, we added up all of the connections and defined the variables reuse_in and reuse_out for each project. “Reuse_in” is the number of other projects from which that project reused at least one software file. “Reuse_out” is the number of projects to which the project “contributed” at least one software file.

We also account for investment and effort in the project. Hence, we compute the number of modifications and additions made to the code for each project over the period between 2005 and 2008. A modification is defined as a change made by a programmer to existing code within a distinct file, while an addition occurs when a programmer adds a new file that contains a block of code that was not previously part of a focal OSS project. Thus, a modification captures an activity that affects a particular set of code with the aim of, for example, making the code more efficient or stable. Accordingly, modifications are a good proxy for incremental innovation that, for example, improves how the software product works via the refinement, reutilization, and elaboration of established ideas and technologies. Additions are a proxy for new knowledge that may provide additional functionality (Lewin, Long, & Carroll, 1999).
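The construction of the reuse network and of reuse_in and reuse_out described above can be sketched as follows. This is a minimal illustration rather than the actual pipeline: the `sightings` records are made up, and the sketch assumes that the project in which a file first appeared is the original source, which is one simple way to give project A the credit when a file travels from A to B to C.

```python
from collections import defaultdict

# Hypothetical records: (file identifier, project, year the file
# first appeared in that project). Not real data.
sightings = [
    ("f1", "A", 2005),
    ("f1", "B", 2006),  # B copied f1 from A
    ("f1", "C", 2007),  # C copied f1 from B; A still gets the credit
    ("f2", "C", 2005),
    ("f2", "A", 2008),
]

def reuse_edges(sightings):
    """Return directed (source, destination) project pairs.

    Assumption: the earliest project holding a file is its source, so
    intermediate "facilitator" projects never appear as a source.
    """
    by_file = defaultdict(list)
    for file_id, project, first_seen in sightings:
        by_file[file_id].append((first_seen, project))
    edges = set()
    for copies in by_file.values():
        copies.sort()
        _, source = copies[0]
        for _, project in copies[1:]:
            if project != source:
                edges.add((source, project))
    return edges

def reuse_counts(edges):
    """Count distinct partner projects on each side of the reuse network."""
    reuse_in = defaultdict(int)
    reuse_out = defaultdict(int)
    for source, destination in edges:
        reuse_out[source] += 1
        reuse_in[destination] += 1
    return dict(reuse_in), dict(reuse_out)

edges = reuse_edges(sightings)
reuse_in, reuse_out = reuse_counts(edges)
print(sorted(edges))  # [('A', 'B'), ('A', 'C'), ('C', 'A')]
print(reuse_out)      # A contributed to 2 projects, C to 1
```

Because the edges are stored as a set of project pairs, a project that copies many files from the same source still counts only once toward reuse_in, matching the "at least one software file" definition.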

Our key findings are:

1. Controlling for other factors that explain success, projects that reuse code from a greater number of projects have more success.
2. Even after controlling for software re-use effects, we find knowledge spillovers via common programmers among projects: projects that have more connections are more successful. This suggests that projects receive additional (i.e., non-code) knowledge spillovers from connected projects.

We see that knowledge spillovers take place via both channels discussed above and that both channels (reuse of code and other knowledge spillovers from connected projects) yield spillover benefits.
We then delineate reuse into two categories:

Software reuse from connected projects, i.e., reuse from a project with a contributor in common with the relevant project.
Software reuse from unconnected projects, i.e., reuse from projects without a contributor in common with the relevant project.
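The two categories above can be separated with a simple check on shared contributors. The sketch below is hypothetical: the `edges` and `contributors` values are invented for illustration, and it assumes that a reuse link counts as "connected" when the source and destination projects have at least one contributor in common.

```python
from collections import defaultdict

# Hypothetical reuse links (source project, destination project).
edges = {("A", "B"), ("A", "C"), ("C", "A")}

# Hypothetical contributor sets per project.
contributors = {
    "A": {"alice", "bob"},
    "B": {"bob"},
    "C": {"carol"},
}

def split_reuse_in(edges, contributors):
    """Split each project's reuse_in into connected and unconnected sources."""
    connected = defaultdict(int)
    unconnected = defaultdict(int)
    for source, destination in edges:
        shared = contributors.get(source, set()) & contributors.get(destination, set())
        if shared:
            connected[destination] += 1
        else:
            unconnected[destination] += 1
    return dict(connected), dict(unconnected)

connected_in, unconnected_in = split_reuse_in(edges, contributors)
print(connected_in)    # {'B': 1}
print(unconnected_in)  # A and C each reuse from one unconnected project
```

Here project B reuses from A and shares the contributor "bob" with it, so that link is connected reuse; the other two links join projects with no contributor in common.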

We find that reuse from connected projects is not statistically significant in explaining success, while reuse from unconnected projects is statistically significant. Overall, our results suggest that knowledge spillovers from neighboring projects are primarily due to knowledge other than copying code, while “reuse” spillovers come from the general community of open source software projects. These results provide the first empirical support for knowledge spillovers via reused code in large open source software networks.
