Open Source History and Mormon Primary Sources

By February 20, 2012

David Golding is a PhD student in the History of Christianity at Claremont Graduate University, is a co-editor (with Loyd Ericson) of the Claremont Journal of Mormon Studies, and assisted in creating the new MHA website. He also wrote the leading book on the web programming framework CakePHP. He has been kind enough to share a little bit about an exciting new primary source project.

See his previous posts here and here.

We all have become hybrids in this day and age, haven’t we? In another life (and it still manages to remain with me no matter what I might do to shake it off), I worked in software development and desktop publishing. I can’t help but return to systems theory and technology as I build my own research agenda as a historian. For years now, I’ve anticipated historians taking advantage of what software engineers work with every day: open source data and logic. And yet nothing quite like open source technology has taken root in the archival and historical professions. It’s time for us to consider the benefits of pushing our research into a collective and open system, a system already possible (and free of charge) thanks to advances in social media, software versioning, and cloud computing.

Many of us may not be aware of the complexity and elegance of Cappuccino, a web application framework in JavaScript and Objective-J. While it is software and thus of a different genre than the historical monograph, it is still an example of the height of elegance, sophistication, and skill for which any historian has striven when crafting a narrative. Take a look at the lack of redundancy and the way that Cappuccino separates the massive body of aspects and concerns into coherent modules, and you’ll see what I mean.

And Cappuccino’s codebase was actually written by 83 contributors in well over 6,000 commits of code. Such is the magic of open source technology. Cappuccino is one of millions of projects hosted as a repository to which contributors commit changes to the code. Because these repositories are built on software versioning systems, nothing that is previously committed is destroyed and every new commit is preserved. This is “Save As...” on steroids. Not only is everything saved, but everyone has open access to every contribution made. This prevents contributors from writing over the top of each other’s work while also maintaining a single source. Over time, the collaborative efforts of dozens of people using their spare time to provide a small contribution to a single codebase accomplish what would otherwise require a company-wide effort. We all benefit. Open source software is freely available; by definition, it’s open. That doesn’t at all mean that open software is less valuable. On the contrary, sometimes the open environment takes the possibilities to a higher level than closed environments. Motorola just bought the rights to Cappuccino for $20 million, and yet any one of us can download, use, and even contribute to the Cappuccino codebase.
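For readers who think better in code, here is a minimal sketch in Python of the "nothing is destroyed" idea. It is a toy model under my own assumptions, not how any real versioning system is implemented: the `Commit` and `History` names, the authors, and the sample text are all invented for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """One saved snapshot; frozen so it cannot be altered after creation."""
    content: str
    author: str
    parent: Optional[str]  # id of the previous commit, or None for the first

    @property
    def commit_id(self) -> str:
        # The id is derived from the parent id, the author, and the content.
        raw = f"{self.parent}|{self.author}|{self.content}".encode("utf-8")
        return hashlib.sha1(raw).hexdigest()


class History:
    """Append-only log: commits are added, never overwritten or removed."""

    def __init__(self) -> None:
        self._commits: Dict[str, Commit] = {}
        self.head: Optional[str] = None

    def commit(self, content: str, author: str) -> str:
        new = Commit(content=content, author=author, parent=self.head)
        self._commits[new.commit_id] = new
        self.head = new.commit_id
        return new.commit_id

    def checkout(self, commit_id: str) -> str:
        """Return the content exactly as it was saved in that commit."""
        return self._commits[commit_id].content

    def log(self) -> List[str]:
        """Return every saved version, newest first."""
        versions = []
        cursor = self.head
        while cursor is not None:
            current = self._commits[cursor]
            versions.append(current.content)
            cursor = current.parent
        return versions


# Placeholder authors and text, for illustration only.
history = History()
first = history.commit("Draft of chapter one", author="alice")
second = history.commit("Draft of chapter one, revised", author="bob")

# The earlier version is still retrievable after the later commit.
print(history.checkout(first))
print(history.log())
```

The point of the sketch is only that a later save adds to the record instead of replacing it, so any earlier state can be pulled back out by its id.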

So what would an open-source historical monograph look like? I’m not sure. But I am sure that our profession can benefit from open source technology, especially when it is freely available and universally accessible (provided you have an Internet connection and a computer that is less than 10 years old). We have all seen how social media have brought a tremendous degree of interconnectivity and collaboration to the profession. It’s time to utilize these media and technologies to further our research goals as a field.

Jonathan Stapley has pointed out recently how an extraordinary resource, the Studies in Mormon History database, is under review as a viable project. He mentioned how the task of maintaining this database exceeds the capacity of any one individual due to the increasing flood of information. A solution to this difficulty in databasing Mormon history is to open the database to the collective energy of the users themselves. Wikipedia has done something similar with exceptional results. But giving just anyone access to this kind of data store risks compromising the reliability of the data. How could we ensure that the content meets our standards when anyone could add, edit, or delete records in the database? Open source technologies have an answer in how versioning systems and repositories work.

Distributed repositories function differently from what most of us are used to. When we use web databases, we usually interact with records or rows in a table. In an open source environment, anyone can “fork” the repository and clone it, which means that a local version of the database will be created on each user’s desktop. Each user has an exact copy of the repository. Users can then make any additions or edits to the local version of the files themselves to suit their own purposes. When they wish to contribute their changes to the repository, they “commit” their changes and submit them for review. Repository administrators have the ability to “merge” these commits into the master branch. When changes are merged into the master branch, the update becomes available to all users. Individual users can then “merge” the update with their own local codebase. This process of downloading and contributing information creates a mirror between all of the distributed local repositories of all the users and the remote, centralized master repository. The effect is compounded the more users participate: each contribution is collated into a single master branch, and that branch grows at the pace of the whole community of contributors. None of the information is destroyed during this process, ensuring that no single user can adversely affect the quality of the data. Repository administrators function as gatekeepers of the data without having to individually maintain all of the records in the database.
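The workflow described above can be sketched in a few lines of Python. This is a simplified model of the fork, commit, and merge cycle, assuming a repository is just a set of keyed records; the `Repository` class, the `merge` function, the `approve` rule, and the sample records are hypothetical and stand in for what Git and a human administrator would actually do.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Repository:
    """A toy repository: current records plus snapshots of every prior state."""
    records: Dict[str, str] = field(default_factory=dict)
    history: List[Dict[str, str]] = field(default_factory=list)

    def fork(self) -> "Repository":
        # A fork is a full, independent copy; editing it never touches the original.
        return copy.deepcopy(self)

    def commit(self, key: str, value: str) -> None:
        # Snapshot the state before changing it, so earlier versions stay recoverable.
        self.history.append(dict(self.records))
        self.records[key] = value


def merge(master: Repository, contribution: Repository,
          approve: Callable[[str, str], bool]) -> List[str]:
    """Apply changed records to master only when the administrator approves.

    Returns the keys that were accepted.
    """
    accepted = []
    for key, value in contribution.records.items():
        if master.records.get(key) != value and approve(key, value):
            master.commit(key, value)
            accepted.append(key)
    return accepted


# Placeholder records, invented for illustration.
master = Repository()
master.commit("source-001", "Journal entry, 1840")

contributor = master.fork()
contributor.commit("source-001", "Journal entry, 12 June 1840")
contributor.commit("source-002", "asdf")  # a low-quality edit


def approve(key: str, value: str) -> bool:
    # Hypothetical review rule standing in for an administrator's judgment.
    return len(value) > 10


accepted = merge(master, contributor, approve)
print(accepted)            # ['source-001']
print(master.history[-1])  # {'source-001': 'Journal entry, 1840'}
```

The contributor is free to do anything to the fork, but only the approved change reaches the master copy, and the version it replaced is still sitting in the master's history.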

The open source model accomplishes a wiki-level of open collaboration while also maintaining a closed system protected from malicious or low-quality manipulation of the data.

After years of building my own personal research database, it dawned on me that I was continually reinventing the wheel. I would labor to assemble a comprehensive list of sources relating to a particular topic or event and then discover a monograph or a bibliography that had done much of the same work. Archives and bibliographies are amazing resources, but they remain unevenly distributed across libraries, publications, and institutions. I believe open source technology and software versioning repositories can overcome these systemic hurdles we face as researchers. And so I have begun a project to “open source” my own personal database. The database is hosted as a code repository on GitHub, the same leading open source website used by Cappuccino and several major products you may have already used, like Etsy, World of Warcraft, the New York Times, and several mobile apps for Twitter and Facebook.

This Mormon Primary Sources project on GitHub aims to consolidate citations of all primary sources relating to Mormonism. It accomplishes this by remaining an open system, meaning that it can evolve and adjust (what software engineers call scaling) as the source materials themselves change over time.
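To give a sense of what a citation record in such a repository might look like, here is a rough sketch in Python. The field names (`title`, `creator`, `date`, `repository`) and the sample record are my own assumptions for illustration; the project's actual structure may well differ, so treat this as one possible shape and not a description of the real files.

```python
import json
from typing import Dict, List

# Hypothetical field names; the project's real schema may differ.
REQUIRED_FIELDS = ("title", "creator", "date", "repository")


def validate_record(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {name}")
    return problems


# Placeholder record for illustration only, not a real citation.
raw = """
{
  "title": "Example diary",
  "creator": "Example Author",
  "date": "1850",
  "repository": ""
}
"""

record = json.loads(raw)
problems = validate_record(record)
print(problems)  # ['missing or empty field: repository']
```

A check like this could run before a contribution is merged, which gives administrators a consistent first pass without replacing their own judgment about the source itself.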

The project is in its infancy, but the database structure and the overall system are in place and ready to grow. The repository is ready for browsing, cloning, contributing, and merging. Anyone can browse the database from the web browser and anyone can contribute without worrying that the database will suffer. In a real sense, the repository behaves like a sandbox environment: no one can destroy the data except for the repository’s administrators. Feel at liberty to experiment and tinker with your local copies of the database or with the Git versioning software.*

For those interested in joining this collaborative effort, or even those curious about this repository as a researching reference, I invite you to visit the repository itself. Further details will be posted there as the project progresses. I hope to see you there. Of course, reinventing the wheel is always an available option, too.


* For Mac users, GitHub simplifies this process with their own specialized app. After installing GitHub for Mac, users only need to click the “Clone in Mac” button on the Mormon Primary Sources repository page and a local copy of the database will be saved to their computer. They can immediately begin editing the files of this database, and GitHub for Mac tracks all of their changes. The “synchronize” button in the application makes sure both the remote and the local versions of the repository mirror each other. PC users can accomplish the same tasks, but the process does run a little differently.



Comments

  1. Dave you did it. You just blew open the future of Mormon studies and we all get to be a part of it. Thank you! Also, I’ve been toying around with the idea of a New LDS Biographical Encyclopedia (a la Andrew Jenson), based on a wiki-system. I’d love to create a Google group so we can coordinate outlying projects built on MPS data. 🙂

    The JSON touch is gorgeous by the way and there is plenty to chew up. Let’s be in touch!

    Comment by Tod Robbins — February 20, 2012 @ 4:14 pm

  2. Great work.
    I was pondering something like this for an encyclopedia of historical Mormon exegesis–that would require some database architecture to make it useful, but I think that would be a useful test project–You could call up any Biblical citation and see how Mormons had interpreted it over the years.

    I think if it were a straightforward interface that had some way of validating whether the uploads were actually correct, it could be quite useful.

    Comment by smb — February 20, 2012 @ 4:35 pm

  3. Great concept. I like the idea, but it made me think of “new.familysearch.org,” where the integrity of the database does not seem all that secure. I and many others have seen changes made without regard to provenance or accuracy, which frustrates me to no end. I don’t think the administrators of new.familysearch.org have exercised the same level of control over what you refer to as “commits.” Perhaps it is not a valid comparison, but it does beg the question. Any thoughts?

    Comment by kevinf — February 20, 2012 @ 4:55 pm

  4. A lot of the accuracy depends on what controls or structures the data administrators follow when merging contributions to the master branch. From what I’ve seen, the updated FamilySearch.org system has the capacity for improved data management, but the verification process is somewhat haphazard. What I like about the open source model is that software versioning prevents us from losing any contribution or edit. So if there is a redundancy somewhere or an entry in the database in need of revision, it will be visible for all to see, including the entry’s total history. You can literally see a running change log of each file in the repository, which amounts to a complete history of the changes that have been made over the life of the entry. That, as well as a total history of committed entries, are all visible through the web interface on GitHub. As I understand it, features like this aren’t totally in place on the new FamilySearch.org site.

    There is also an issue tracking feature on this project’s repository. Just look for the “Issues” tab at the top, and anyone can log an issue or a concern with the project’s administrators. If you were to, say, notice a problem with a database record, you could post that issue to the repository and site admins could then address the problem. That, too, remains open to users or researchers that might browse the repository.

    Comment by dgolding — February 20, 2012 @ 5:06 pm

  5. Tod, you’re too generous. Hopefully your optimism will inspire us to participate and contribute!

    If you’re at all interested in building a repo for your LDS biographical encyclopedia project under the Open Source History group on GitHub, let me know and we can fire it up.

    Comment by dgolding — February 20, 2012 @ 5:15 pm

  6. Fascinating project, David. I look forward to reviewing it a bit more closely.

    Comment by J. Stapley — February 20, 2012 @ 9:16 pm

  7. Thanks for the heads up, David. Like J, I look forward to looking it over in greater detail.

    Comment by Jared T — February 20, 2012 @ 9:17 pm

  8. David, I’ve been using CakePHP for a few years now as my bread and butter. Pretty cool. These days I’ve been using MongoDB + Lithium (gosh, that sounds like code for a drug, doesn’t it?), which I see you’re into now too.

    I’ve been using GitHub to put together data on Adam-God, which you might be interested in:

    https://github.com/aaronshaf/Adam-God

    I’ve also used GitHub for a few months for Theopedia.com (using a homegrown static site generator; I’d like to use something like Phrozn at some point), but this has been a big failure, as the learning curve (which seems small to us) is seemingly too much for others to submit contributions. It would be helpful perhaps to have a dirt-simple web interface for non-GitHub users for submitting contributions/changes, which in turn automatically creates a GitHub ticket or pull request.

    Comment by Aaron Shafovaloff — February 21, 2012 @ 3:50 pm

  9. I like the GitHub “Edit this File” feature for providing the simplest interface for users to commit changes. I don’t know how it can get easier than that.

    I hope GitHub can release a PC version of their GitHub for Mac application. The learning curve is significantly reduced when users can interact with a GUI. The more open source technology moves to GUI formats, the more we’ll see collaboration. I’ve often thought of developing an application specific to the historical profession that links with a Git repository to allow for distributed version control. But, well, I guess we’ll just have to wait and see how social version controlling continues to grow.

    Comment by dgolding — February 21, 2012 @ 4:00 pm


juvenileinstructor.org