I wanted to highlight some of my favorite web reference tools as of late with a short post. Among the many, here are a few of my go-to tools when researching all things Mormon:
Latter-day Apostles (http://latterdayapostles.org/)
This tool provides a fun way to visually browse the organization of the leadership of the Church of Jesus Christ of Latter-day Saints from 1835 (with the formation of the Quorum of the Twelve Apostles) to the present. It’s mostly a quick reference point for me with a question like, “Who was in the quorum in 1901?” The developer of the site, Dallin Regehr, doesn’t cite the sources of his data, but I’ll assume it’s fairly accurate after checking the entries for a few people against other sources.
There are two ways to display the data:
A card view:
Or a list view:
Each provides a nice overview with reference points such as years since the church was organized and the estimated membership numbers for each year (maybe from the Church Almanac?). There is a scrubber at the top of the site that lets you move between months and years, as well as a play button so you can be mesmerized by the deaths and additions to the quorums.
You can also search for a particular apostle:
Dallin Regehr, the developer, also makes this note about the site:
The author makes no guarantee about the accuracy of the data provided by this site. It is intended for fun and curiosity, not life-or-death research.
Google’s Ngram Viewer (http://books.google.com/ngrams)
Google’s Ngram Viewer is a fabulous tool for digging into the big data of the world’s book corpus. You can easily run comparisons on word usage within the period 1800-2000. For instance, here is a comparison between the use of 'Joe Smith' and 'Joseph Smith' between 1800 and 2000 in English:
Hovering your cursor over a particular line will display the year and frequency percentage for that term. Now let’s dive a little deeper into the power of the tool. What about the French corpus with the same query?
You see what happened there? The literature picks up around 1850-1855, which corresponds well with the first missionaries arriving in France in 1849. So cool! How about Spanish? Let’s add ‘José Smith’ to our query to get a full picture:
Hmm… What was going on in Spanish-speaking nations from 1860-1880 that would cause such a spike in mentions of Joseph Smith? These are the types of research questions we can come away with from using the Ngram Viewer. I could do this all day, but suffice it to say this is a great tool for analyzing trends in the literature, with the caveat that the picture is not complete, since the data only represents the books Google has digitized thus far. Anyhow, dig in and have fun!
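For anyone curious what that "frequency percentage" roughly means, here is a minimal Python sketch of the general idea: for each year, count how often a phrase appears and divide by the total number of phrases of the same length published that year. This is not Google's code or data. The function name `ngram_frequency` and the tiny `toy_corpus` are made up purely for illustration, and the sketch assumes plain lowercase text with no punctuation.

```python
def ngram_frequency(texts_by_year, phrase):
    """Return {year: share of same-length n-grams that match `phrase`}.

    `texts_by_year` maps a year to a list of text strings (a hypothetical
    stand-in for a digitized corpus). Matching is case-insensitive and
    splits on whitespace only, so punctuation is not handled.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        matches = 0
        total = 0
        for text in texts:
            words = text.lower().split()
            # Every run of n consecutive words is one n-gram.
            grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
            total += len(grams)
            matches += sum(1 for gram in grams if gram == target)
        # Normalize by that year's total so years with more text are comparable.
        result[year] = matches / total if total else 0.0
    return result


# Hypothetical toy corpus, invented for this example only.
toy_corpus = {
    1840: ["joe smith the prophet", "joseph smith was born in vermont"],
    1850: ["joseph smith and the saints", "joseph smith junior"],
}

for name in ("Joe Smith", "Joseph Smith"):
    print(name, ngram_frequency(toy_corpus, name))
```

Because each year is divided by its own total, a year with more digitized text does not automatically produce a taller line. That only corrects for volume, though, and not for which books happened to be scanned.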
These are very cool, Tod.
How confident are we (and does Google give us a way to find out) that the N-gram results derive from approximately random book distributions and are not artifacts of what has been digitized?
As a hypothetical: Are we sure that there was a sudden surge of Iberophone interest in Joseph Smith in the 1860s or is Google going backwards in time by decade through libraries? (I’d be really surprised if that is the case, but I am curious what we know about the corpus.)
Comment by Edje Jeter — August 3, 2013 @ 10:35 pm
Edje,
Great questions. I’ll look into the book distribution aspect, but could you clarify what you mean by
Comment by Tod R. — August 3, 2013 @ 11:08 pm
Are they randomly (with regard to publication date) digitizing books, or is there an underlying order? Are they first digitizing books published in the 1890s and then from the 1880s, etc.?
Like I said, I’d be really surprised if there is any systematic temporal bias to their collections (past the usual biases for things that were stored in amenable libraries). On the other hand, I have to constantly remind myself about the copyright restrictions in the US or I’d have posted several times about the “strange disappearance of sources starting in the 1920s.”
Comment by Edje Jeter — August 3, 2013 @ 11:37 pm
Ah! The only bias there would be in the digitization process is whether or not the partner library happened to prioritize a certain time period within their collections, but I think we’re fairly safe in assuming there is a random book distribution.
PS: I had to Google Iberophone to understand your hypothetical. Ha! Thanks for keeping me on my feet!
Comment by Tod R. — August 4, 2013 @ 1:54 pm
Edje,
Here’s a partial answer to the distribution question:
They get into the nitty gritty in these two papers:
http://www.sciencemag.org/content/331/6014/176 (paywall)
http://aclweb.org/anthology-new/P/P12/P12-3029.pdf
The English corpus is 4,541,627 books.
Comment by Tod R. — August 4, 2013 @ 2:42 pm
Also, don’t miss this Atlantic article about Ngram:
http://www.theatlantic.com/technology/archive/2012/10/bigger-better-google-ngrams-brace-yourself-for-the-power-of-grammar/263487/
Comment by Tod R. — August 4, 2013 @ 2:46 pm
Thanks, Tod. That’s just the sort of information I was too lazy to ferret out myself.
Comment by Edje Jeter — August 4, 2013 @ 9:29 pm