The digital age has introduced many new tools to aid historians’ research efforts. As with the tools of any trade (e.g., carpentry, mechanics, plumbing), there are risks to using them improperly. Choosing the proper tool and using it properly helps the user accomplish the task more effectively, efficiently, and without mishaps. For every job there is a right tool. Similarly, for every research project, the proper use of the right tools can enhance the research results and lead to more confident arguments. However, the researcher must also be familiar with the limitations of each tool to avoid the risk of getting misleading or erroneous results. The purpose of this post is to provide a few examples of tools that should be in every historian’s toolbox, along with warnings about their limitations.
Network analysis is used for studying data (the “stuff,” represented as nodes) and, more specifically, the interdependent relationships (represented as edges) that connect the stuff.
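To make the idea of “stuff” and relationships concrete, here is a minimal sketch in Python. The names and the letter exchanges are hypothetical, invented only for illustration; the sketch stores each relationship as a pair and counts how many distinct correspondents each person has.

```python
from collections import defaultdict

# Hypothetical data: each pair means two people exchanged at least one letter.
letters = [
    ("Alice", "Bram"),
    ("Alice", "Clara"),
    ("Bram", "Clara"),
    ("Bram", "Dov"),
]

def build_network(pairs):
    """Map each person (node) to the set of people they are connected to."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network

def count_connections(network):
    """Count the distinct connections (edges) touching each node."""
    return {person: len(neighbors) for person, neighbors in network.items()}

network = build_network(letters)
connections = count_connections(network)

# List people from most connected to least connected.
for person, count in sorted(connections.items(), key=lambda item: (-item[1], item[0])):
    print(person, count)
```

Note how much the sketch leaves out: a pair records that a relationship exists, but nothing about its tone, direction, or duration.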
Network Analysis Warnings
Networks cannot be applied to all data. Some data does not fit well into any one category, and there are complex situations that should not be reduced. There are also theoretical and philosophical considerations that get lost when network methodology is translated from one field to another. This leads to methods being used for purposes other than those for which they were intended. Humanistic data is often uncertain and biased to begin with, so every arbitrary act of data-cutting to make the network manageable has the potential to add further uncertainty and bias, to the point where the network no longer provides meaningful results. The context (tone) or perspective of the data may also change the structure and nature of the network, which may skew your results.
Topic modeling is a form of text mining, a way of identifying patterns in a corpus. You take your corpus and run it through a tool that groups words across the corpus into ‘topics.’
This is an excellent tool for discovery. The results of the topic modeling help to uncover evidence already in the text.
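For readers curious about what such a tool does internally, the following is a toy sketch of one common approach, latent Dirichlet allocation fitted with Gibbs sampling. It is an assumption that any particular tool works this way, and real tools are far more sophisticated. The four tiny “documents,” the function name `toy_topic_model`, and all parameter values are hypothetical.

```python
import random
from collections import Counter

def toy_topic_model(docs, num_topics=2, iterations=200,
                    alpha=0.1, beta=0.01, top_n=4, seed=0):
    """Group words into topics with a bare-bones collapsed Gibbs sampler."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]      # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics
    assignments = []

    # Start by assigning every word occurrence to a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    # Repeatedly reassign each word, favoring topics that are already common
    # in its document and that already contain the same word elsewhere.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Report the most frequent words in each topic.
    return [
        [word for word, count in topic_word[t].most_common(top_n) if count > 0]
        for t in range(num_topics)
    ]

# Hypothetical corpus, already split into words.
corpus = [
    "ship harbor cargo sailor ship".split(),
    "harvest field plough harvest farmer".split(),
    "cargo ship sailor harbor".split(),
    "farmer field harvest plough".split(),
]
topics = toy_topic_model(corpus)
for number, words in enumerate(topics):
    print(number, words)
```

The output is just lists of words. Deciding whether a list amounts to a meaningful topic is left to the researcher, and changing the seed or the number of topics can change the groupings.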
Topic Modeling Warnings
Topic modeling is not necessarily useful as evidence. It is complicated and potentially messy, and its output is not entirely human-readable. One way to understand what the program is telling you is through visualization, but you must be sure you know how to read the visualization. Topic modeling tools are fallible, and if the algorithm isn’t right, they can return some strange results.
The term Big Data refers simply to the use of predictive analysis or certain other advanced methods to extract value from data. Historians identify terms and then use algorithms to search for and analyze those particular terms so that the relationships among them can be studied. Close reading and data-driven analysis can enhance each other and expand what historians can do.
Big Data Warnings
Human beings recognize tone. Algorithms are better suited to sifting through data in search of keywords. But when an algorithm highlights a word, we don’t know exactly what it means. To produce useful results, this kind of investigation depends on customized algorithms, and coming up with a good algorithm involves both code and context, a mingling of the complementary strengths of computer scientists and humanists: humans recognize tone (context), and code recognizes keywords. Code and databases are expensive, and scans are sometimes not read accurately. So data mining has limitations. Working with historical documents like newspapers can be costly, messy, and nuanced, and can defy easy computational analysis because of the different writing styles used in newspapers over time.
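The gap between keywords and tone can be seen in a small sketch. The two newspaper-style snippets below are hypothetical, written only for illustration, and the function name `keyword_counts` is my own. The code counts how often chosen keywords appear, which is roughly what a simple search algorithm does.

```python
import re
from collections import Counter

def keyword_counts(documents, keywords):
    """Count how many times each keyword appears across all documents."""
    wanted = {keyword.lower() for keyword in keywords}
    counts = Counter()
    for text in documents:
        # Lowercase the text and split it into simple word tokens.
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts

# Hypothetical snippets with opposite tones.
snippets = [
    "The epidemic is spreading and the city is gripped by fear.",
    "There is no cause for fear; the epidemic is well in hand.",
]

counts = keyword_counts(snippets, ["epidemic", "fear"])
print(counts["epidemic"], counts["fear"])  # prints: 2 2
```

Both snippets score identically on both keywords, although one is alarmed and the other reassuring. The count alone cannot tell them apart; a human reader can.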
Understanding the benefits and the limitations of a tool is the first step in determining which tool is right for your project. Regardless of their limitations, tools can be useful in achieving the goals of many research projects. Don’t be afraid to fail or to get bad results, because those will help you find the settings that give you good results. You may even discover that for some projects you get the same results without any tool, which is also a discovery. So go ahead and explore the digital toolbox, plug in some data, and see what happens. See if digital tools can help you “build” an argument.
Brett, Megan, “Topic Modeling: A Basic Introduction,” Journal of Digital Humanities 2 (Winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/
Howard, Jennifer, “Big-Data Project on 1918 Flu Reflects Key Role of Humanists,” Chronicle of Higher Education, February 27, 2015, http://chronicle.com/article/Big-Data-Project-on-1918-Flu/190457/
Weingart, Scott, “Demystifying Networks,” scottbot irregular, December 14, 2011, http://www.scottbot.net/HIAL/?p=6279