Elegy for Missing Data, in Advance

By January 13, 2014

Or: All Web is Not Created Equal, have you noticed?

One of the sessions I attended at the AHA this month was Session 151, Social Media and History. It featured one of our JIers, Max Mueller, talking about tensions and complications in the church’s “I am a Mormon” campaign, including the fascinating case of one woman whose tattoos were airbrushed out of her profile pic (her profile is now gone, for other reasons). Great talk, by the way, along with several others that reflected on the ethical and methodological problems of using social media as historical sources for researching marginalized groups or threatened voices. In each of the presentations — Max’s on constructing Mormon online “diversity,” Jessica Lingel’s on underground music scenes, Sadaf Jaffer’s on online discussion boards for Pakistani atheists, and Amy Holmes-Tagchungdarpa’s on sites made by and about Tibetans — the very existence of the sites to begin with, and especially their continued life on the web, is inherently unstable. It was actually a rather terrifying session, like watching 4 canaries in a coal mine (Hey! There’s a pocket of air over here! Oh wait, never mind).

I made a comment to this effect in the session, but wanted to elaborate on the point here: this is and will be a serious problem for historians of the 21st century. And that is all of us, by default, because we live here and now. The “digital archive” is vast, welcome, and exciting, and it appears misleadingly robust – but it is in fact very fragile. What is available to historians relies largely on goodwill, technology upgrades, and the market – none of which inspire my confidence as durable forces for the preservation of historical sources. (Betamax, anyone?)

Surprisingly, perhaps, the one site that historians and academics love to vilify (while using it, of course, on the sly), i.e. Wikipedia, seems to me to have the best model for historical value of them all, because it was built around the ideals of transparency and digital democracy. That doesn’t mean you can identify the authors with precision (it’s very difficult to do so, in fact), but you can at least see all prior versions at a single click and track changes over time.

By contrast, take — oh, let’s say, mormon.org (the public portal) or even lds.org (the member resource), beautifully crafted online environments that represent the church’s best practices today and presumably the foundation of its online presence for years to come. Built, I probably do not need to point out, upon very different values. What you can find there is pretty much only what is there now. Things come, go, vanish, launch, in a constant state of (often unannounced) change that nonetheless presents itself as final, unchanging and authoritative. The transition of church curriculum to born-digital creations (the youth lessons, for example) and the ability to continually/instantaneously update materials obviously has massive benefits for a globally-internet-connected worldwide church, but it is a historian’s worst nightmare. If you cannot see the “manuscript edits” so to speak, how do you know what changed, when, how and why? And if the old just vanishes from the online environment without a trace, what happens to the possibilities for historical research? Most of what we are all busily creating in this decade has simply been written in the equivalent of vanishing ink.

Of course this is not a problem limited to the study of Mormonism, or religion, or subaltern groups in general. It is a current problem that has no obvious solution in the immediate future. Archivists worry about it a lot. The Internet Archive’s Wayback Machine is one partial attempt, but it archives only a tiny fraction of the ever-growing webiverse, like a space telescope focusing its mighty eye on a single dot in space at a time. I don’t have any bright ideas, either, I’m just feelin’ the pain of the historians in the 22nd and 23rd centuries. Unless they’ve figured out time travel by then, of course.

  1. Amen, Tona. Save a local copy of everything.

    Comment by D. Martin — January 13, 2014 @ 10:24 am

  2. I was thinking about this the other day when I was revisiting my notes on the gendering of missionary dress codes and lamented the fact that I only have the new version to work with now. Will be saving stuff like this more often now…

    Comment by Saskia — January 13, 2014 @ 11:21 am

  3. Tona–

    “Things come, go, vanish, launch, in a constant state of (often unannounced) change that nonetheless presents itself as final, unchanging and authoritative.”

    Wow, couldn’t have said it better (and certainly didn’t!).

    This isn’t a phenomenon exclusive to online resources (how many times have D&C and the Book of Mormon been edited, updated and at the same time presented as “unchanging and authoritative”?). But what is different is that with pen and paper (and printing presses), scholars can actually trace these changes, whereas with digital archives these traces disappear.

    Comment by Max — January 13, 2014 @ 11:37 am

  4. What a way to start out the semester, Tona! The digital archive has some major issues it will need to address, particularly as “Things come, go, vanish, launch, in a constant state of (often unannounced) change that nonetheless presents itself as final, unchanging and authoritative.”

    What do you think of the idea that PhDs and archivists can be put to work finding solutions to these problems? More than digitizing documents, but finding ways to prevent the digital archive from becoming as big a problem as many fear?

    Comment by J Stuart — January 13, 2014 @ 12:38 pm

  5. I guess I’m skeptical, or at least not totally reassured, by the assumption that “PhDs and archivists” will save the day. It seems bigger than that. Merely making local copies proliferates a multitude of haphazard disconnected “private” digital archives, which doesn’t help the researcher who’s not the direct recipient of those digital copies in the future. Saving only what we’re interested in studying or setting net filters based on the questions we’re asking now perpetuates the same bias problems that plague text-based archives, which reflect the judgements and biases of their creators. In other words, all the reasons to be wary of existing archives seem likely to crop up in the future ones, when didn’t Al Gore invent the internet so we didn’t have to suffer these problems all over again? 🙂

    Comment by Tona H — January 13, 2014 @ 1:09 pm

  6. I’ve long been curmudgeonly over the angst and worry over digital archives. Although I work on the 19th century, my advisor and most of my friends work on earlier periods where the existence of a single diary is a MAJOR boon and a cause for celebration. I was a bit relieved/reassured when I saw your argument in your last comment, Tona, when you mention that the internet is likely going to experience many of the same problems as text-based archives. It’ll have the same problems of bias, lack of completeness, and blind spots as more traditional archives. I wonder if what is different about the internet is not the creation and loss of knowledge – that has always existed in text-based archives in which entire collections were purposely burned and moths ate boxes upon boxes of letters – but the idea that we could ever have complete access to a history. Knowledge has always been contingent on what survives, and before the nineteenth century, when we had a proliferation of texts, it was unusual to have multiple manuscript copies of an item survive. Although it’s theoretically possible to trace how ideas have been modified in a document before the late eighteenth or early nineteenth century, in many cases it’s impossible because none of the early copies survive. It seems to me that while some of these problems are new (i.e., obsolescence – poor Betamax), the vast majority are reiterations of problems historians from earlier time periods have always dealt with.

    Comment by Amanda HK — January 13, 2014 @ 2:33 pm

  7. I’m glad you mentioned the Internet Archive, but I have lost count of how many times my heart has sunk upon finding a robots.txt file inserted into a homepage that prevents its inclusion in the Wayback Machine. If only Brewster Kahle had decided that everything on the Internet was in the public domain and no webpage should be exempt from inclusion.

    I agree one of the great hazards facing future historians of the Internet era is the untraceable provenance for the vast majority of digitally-born documents. Awash in a web of ever-changing content, future historians who try to trace the history or transmission of ideas online may face some serious challenges. I suppose we can take some comfort in knowing the Library of Congress is preserving all American tweets.

    The late Roy Rosenzweig was one of the few historians who saw the problem coming before most people even realized how ephemeral the Internet would become. At the same time, he knew information overload was just around the corner and that future historians would need new tools to sift through the unfathomable number of bytes each person might leave behind. In one of his forward-looking essays he wrote “historians need to be thinking simultaneously about how to research, write, and teach in a world of unheard-of historical abundance and how to avoid a future of record scarcity.”

    Comment by sterflu — January 13, 2014 @ 9:33 pm

  8. Amanda–

    This is not a rhetorical (or accusatory) question: Have you ever worked with digital archives from the 21st century? Your point about how what happens to survive (which is often arbitrary or just dumb luck) and what doesn’t from past periods affects history is well-taken. But having worked with both nineteenth-century and twenty-first-century materials, I find there is something tangibly different.

    As I mentioned at AHA, scholars of Mormonism can study the edits from the original transcriptions of certain revelations (made much easier, ironically, by the work at the Joseph Smith Papers Project). They are on the page, as it were. If they were being recorded for the first time in a digital age, the traces of these edits would disappear with a few key strokes.

    Comment by Max — January 15, 2014 @ 10:27 am

  9. Yes, Max, I have. I guess part of my impatience with this kind of conversation is I fear that it sometimes becomes navel gazing.

    Comment by Amanda HK — January 15, 2014 @ 1:43 pm

  10. Not that I think that’s what Tona or you are engaged in here, but too often, I feel like conversations about the archive become long, eloquent speeches that have little connection to the work that historians, archivists, and others do. One of the things that I appreciate about Tona is that her work is always connected to the real world and to real problems, but I fear that these kinds of conversations often lead to theory and not to attempts to actually solve the problem.

    I also really liked sterflu’s comment, which states the problem in one of the most succinct and accessible ways I’ve seen. The problem isn’t scarcity but that we’ve had an unprecedented profusion of data that has made us forget how fragile all information is.

    Comment by Amanda HK — January 15, 2014 @ 1:45 pm

  11. Amanda-

    I don’t see how this conversation is at all navel gazing. I’m not sure what could be more vital to the historian’s enterprise than the archive. Exactly to the point about the fragility of historical records, understanding what goes into the archive, and what leaves it, what is accessible, and what is not, directly affects what gets included in the historical narrative.

    (I’m reminded here of the ASCH discussion on Porterfield’s new book and the continuing debate over the “great awakenings.” I tend to believe that what really happened was that a lot more people were writing about the revivals: more robust “reporting” of cases rather than an actual increase in cases. That is a question that goes right to the archive and directly affects historical interpretation, no?)

    Comment by Max — January 15, 2014 @ 3:42 pm

  12. Point taken…. and no historian worth their salt doesn’t consider that. What I find troubling is disconnected discussions of the archive. In my mind, these types of discussions are best when they are accompanied by 1) subsequent discussions of what the author is doing to help preserve the digital archive or 2) in the case of historians working on other time periods, what other sources they are consulting to try to work through the gap they have encountered. I have the same problem with pure theory. For me, theoretical ruminations have to be grounded in actual events. That grounding in events or details is what makes history different from anthropology or religious studies.

    Comment by Amanda — January 15, 2014 @ 4:33 pm

  13. I agree that discussions about the archive–digital or not–need to move towards solutions. That is why I think we should all be grateful to the Joseph Smith Papers Project folks, who are doing such groundbreaking work that allows for full exploitation of both ink and pen and 1s and 0s.

    Comment by Max — January 15, 2014 @ 4:45 pm


