
Dear Humanists: Fear Not the Digital Revolution

이강기 2019. 3. 29. 14:41

Advances in computing will benefit traditional scholarship — not compete with it.


 
The Chronicle of Higher Education
March 27, 2019 
                                       

Ted Underwood is a professor of English and information sciences at the University of Illinois at Urbana-Champaign, and the author of Distant Horizons (University of Chicago Press, 2019).


As anyone who’s read an op-ed page in the past decade can tell you, the humanities are losing ground. Declining enrollment in our undergraduate majors is the most obvious threat, and many departments are already discussing ways to stop the bleeding. But the humanities have also been giving up intellectual ground, and that could soon be an equally serious problem. Questions that historians and literary critics used to debate are increasingly scooped up by quantitative disciplines. In 2011, for instance, a team led by evolutionary biologists cooperated with Google to analyze millions of digitized books, published a study in Science, and announced that they had founded a new field called "culturomics." Critical reaction from humanists was swift and vigorous. In The New York Times, Louis Menand pointed out that "there was not a single humanist involved" in the project. Privately, he called it "a completely superficial way to do cultural history."


------------------------------


The Digital Humanities Wars

In the last six months, a new front has opened in the often fiery disciplinary disputes over the role of quantitative methods in the humanities. The University of Chicago Press published two major new works of computational literary scholarship, Andrew Piper’s Enumerations: Data and Literary Study and Ted Underwood’s Distant Horizons: Digital Evidence and Literary Change. And this month Nan Z. Da harshly criticized what she called “computational literary studies” in the pages of Critical Inquiry. This week The Chronicle Review is featuring essays by Underwood and Da, arguing from either side of the conflict. We're also resurfacing some previous salvos in this war from our archive. — The Editors

--------------------------------

But complaints of this kind did nothing to discourage imitators. Large digital collections of books, images, and music raise intriguing questions about the human past that are visible to a wide audience, including scientists. Almost every month now, some aspect of our cultural heritage becomes the subject of a scientific article — and newspapers and magazines take notice. The public discovers that stories come in six basic shapes, for instance, or that scientists can prove hip-hop is more important than the Beatles.


I was originally trained in literary studies, and I want to say up front that humanists are right to be frustrated by this pattern. Scholars have spent hundreds of years studying the artifacts that scientists are analyzing, and quantitative analysis would almost always profit from a deeper grounding in this scholarship. More fundamentally, humanists know a thing or two about the way cultural history conveys meaning. Scientists, even social scientists, have usually been trained to seek out universal laws and constant trends. Their instinct is to taxonomize the possible types of stories, for instance, or identify laws that govern all cultural change.


Historians and literary critics started out with the same generalizing impulse. In the 18th and 19th centuries, we posited taxonomies of genres and laws of cultural evolution. But we found that those approaches weren’t usually flexible enough to describe human history. It turns out that Homo sapiens excels at making up new games — a talent that gives culture a remarkable and maddening specificity. It can be clear that there are only six possible narrative arcs — until someone invents a seventh, or until an expanding middle class decides that stories shouldn’t be characterized by clear narrative arcs at all, but should become long, baggy things called "novels" that imitate the chaos of biography. This is why humanists make such a fuss about situating every piece of evidence in a specific historical context. Since the 18th century, we have been stung repeatedly by the discovery that our descriptive categories are less universal than we thought. We have learned to temper generalization with attention to the quirks of particular places and times.


In short, humanists have spent centuries acquiring a distinctive interpretive expertise, and they are right to feel that research on cultural history would be more meaningful if it were built on that foundation. But there is, alas, another side to this story, less likely to be popular in history and English departments. While scientists usually do a better job if they work in collaboration with humanists, it must be admitted that today they can often make genuine contributions to historical understanding with or without our assistance. "Quantitative Analysis of Culture in Millions of Digitized Books" may not have created a new field called culturomics, but it did (in collaboration with Google) help produce an interactive website that journalists and schoolteachers still use to understand linguistic trends. The project wasn’t led by humanists, but it was nonetheless one of the most consequential public humanities projects of the last decade. And this was only an early, crude example of interdisciplinary interest in the humanities. More recent publications go far beyond graphing the frequencies of words. Sociologists have theorized the function of ambiguity in literary criticism; cognitive scientists have used information theory to describe historical change; the economist Thomas Piketty stormed the best seller list with Capital in the Twenty-First Century (Harvard University Press, 2014), reinterpreting the last two centuries of history with illustrations drawn from Balzac. Humanities departments really are no longer alone.


I have framed this as a painful admission, but if we remember that human self-understanding is a collaborative project, not a race between departments, the pain should be no more than a passing wince. It is actually good news for the humanities that biologists are getting excited about cultural history, and very good news that economists can make Balzac speak to contemporary interest in a wealth tax. Expanding the cast of characters involved in historical research will make conversations about the past more complex and faster-paced, but if we’re up for the challenge, these interdisciplinary connections can create opportunities that work in both directions. Some of the scientist-led teams I just described do include humanists, and there are an equal number of projects entirely led by humanists in the emerging fields of digital humanities and cultural analytics.


I am most familiar with examples in literary studies. In just the last three months, scholars have used quantitative methods to advance several important conclusions. Laura McGrath has analyzed a large dataset of "comparative titles" used in publishing; her evidence suggests that nonwhite writers are systematically pressured to compare themselves to white models. Eve Kraicer and Andrew Piper have studied millions of interactions between characters to show that contemporary fiction is still shaped by heteronormative patterns. If we were willing to own these successes — to claim them as typical examples of humanistic inquiry — we could reverse the story that began this article. Instead of saying that the humanities are besieged and giving up ground, we could truthfully say that these disciplines are discovering new missions and new ways to understand culture.


I express that optimism subjunctively because the suggestion that quantitative research should count as a typical form of humanistic inquiry is still hotly debated. For many scholars, the humanities are defined not just by their subject domains, but by a specific set of methods, emphasizing qualitative description of selected case studies. If we start from this assumption, it will look like a definitional error to describe new quantitative projects as an expansion of the humanities. At best, we will be able to say that research using numbers dilutes the distinctive character of the humanities; at worst, we will fear it as a Trojan horse subverting the humanities’ true purpose, which is presumably to serve as a counterweight to science. This argument is often put forward as a critique of the digital humanities in particular, but it is potentially something broader: a general theory of the proper relations between parts of a university.


In his Chronicle Review essay, "The Interdisciplinary Delusion," Jonathan Kramnick expresses resistance to interdisciplinarity in this fuller, broader form. It is only acceptable for scientists and humanists to cooperate, Kramnick argues, if the cooperation is framed as a dialogue between intrinsically different "methods and norms." Specifically, scientists put "a premium on simplicity — variables must be minimized and noise filtered out," whereas humanistic scholarship "adds complicating variables to arrive at its truth claim." It would be a mistake to try to fuse these methods, because such a fusion could produce only "a mere translation or reduction" of humanists’ knowledge into the simpler terms of science. A true interdisciplinarity would instead safeguard the distinctive identities of all the disciplines involved while dramatizing "tensions among different methods and truth claims."


This is a fair account of the rationale that kept the humanities separate from quantitative disciplines (even quantitative social sciences) in the 20th century. At the time, the rationale made sense. Twentieth-century scientists did seek to minimize the number of variables involved in their models. A model with too many variables might appear to explain the evidence only because it was so flexible that it could, in effect, memorize the location of every data point — a pitfall called "overfitting." So researchers were admonished to trim variables and keep their models, as the phrase went, "parsimonious."


Kramnick is right to imply that this approach never worked well for cultural artifacts. Novels and symphonies don’t condense neatly into a five-variable model. Where Kramnick goes wrong is in assuming that the tension between these divergent approaches could only ever be bridged by reducing humanistic evidence to the simpler terms of science. The intellectual shifts of the past 30 years have worked in the opposite direction. Old boundaries between the humanities and sciences are softening in part because quantitative disciplines have learned how to enlarge the number of variables they consider, even if it requires loosening their grip on old scientific ideals, such as explanatory parsimony.


It is understandable that new forms of quantitative reasoning aren’t yet visible to most humanists: they are still relatively new and controversial even in the sciences. New approaches have entered public consciousness mostly through the buzzword "big data" — an unhelpful term that seems to imply computers are just accelerating analysis and making it possible to survey more examples. But debates about "big data" were sparked by advances in machine learning, and machine learning isn’t just a bigger or faster version of 20th-century statistics. It involves shifts of approach that specifically address the barriers said to separate the humanities and sciences in accounts like the one offered by Kramnick.


       


For instance, instead of evaluating a model by maximizing its "goodness of fit" to a single set of evidence, researchers usually evaluate machine learning by asking how well a model trained on one sample predicts patterns in a second. Researchers have learned that it can be safe to include thousands of variables in a model, as long as you also add a degree of blurriness that prevents the model from memorizing or "overfitting" patterns in any single sample. A theory of modeling that reflects explicitly on the value of imperfection (the technical term is "bias") has built a new kind of bridge between quantitative and qualitative description. Instead of reducing a poem to two or three variables, we can now multiply variables as needed to reflect the complexity of the evidence, while acknowledging the necessary imperfection of our model. When Hoyt Long and Richard Jean So used machine learning to trace the diffusion of a "haiku style" in Anglo-American poetry, for instance, they didn’t just ask the model to count syllables. That would have produced a simple, correct, and boring story. Instead they asked the model to consider diction as well as form, so that it could reveal "latent, nonexplicit traces" of a pattern subtler than 5-7-5.
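The two ideas in this paragraph, scoring a model on a second sample and adding "blurriness" so that a many-variable model cannot simply memorize the first, can be made concrete with a small sketch. Everything here is an assumption made for illustration: the data are invented random numbers, the model is a plain logistic regression, and the "blur" is an L2 penalty on the weights, which is one common form of regularization and not necessarily the one used in any study mentioned here. The function names (`make_sample`, `train`, `compare_penalties`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_sample(rng, n_items, n_features, n_informative=5):
    """Invented data: only the first few features carry signal; labels are noisy."""
    rows, labels = [], []
    for _ in range(n_items):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        signal = sum(x[:n_informative]) + rng.gauss(0.0, 1.0)
        rows.append(x)
        labels.append(1 if signal > 0 else 0)
    return rows, labels


def train(rows, labels, penalty, steps=200, rate=0.1):
    """Logistic regression by gradient descent; `penalty` is the L2 strength."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(steps):
        # The penalty term pulls every weight back toward zero.
        grad = [penalty * wj for wj in w]
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(d):
                grad[j] += err * x[j] / n
        w = [wj - rate * gj for wj, gj in zip(w, grad)]
    return w


def accuracy(w, rows, labels):
    hits = 0
    for x, y in zip(rows, labels):
        guess = 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0
        hits += guess == y
    return hits / len(rows)


def compare_penalties(seed=0, n_features=60):
    """Fit with and without the penalty; score each on a second, unseen sample."""
    rng = random.Random(seed)
    # More variables (60) than training items (40): memorization is easy.
    train_rows, train_labels = make_sample(rng, 40, n_features)
    test_rows, test_labels = make_sample(rng, 200, n_features)
    report = {}
    for name, penalty in (("no blur", 0.0), ("blurred", 1.0)):
        w = train(train_rows, train_labels, penalty)
        report[name] = {
            "train": accuracy(w, train_rows, train_labels),
            "held_out": accuracy(w, test_rows, test_labels),
            "weight_norm": math.sqrt(sum(wj * wj for wj in w)),
        }
    return report


if __name__ == "__main__":
    for name, scores in compare_penalties().items():
        print(name, scores)
```

The number to watch is `held_out`, not `train`: a model with more variables than examples can look excellent on the sample it was fitted to while doing much worse on the second one. The penalty shrinks the weights, which limits memorization, though how much it helps on held-out data depends on the data and on the penalty strength chosen.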


Machine learning has been controversial in the sciences. As Matthew L. Jones has pointed out, it genuinely troubles scientific protocols of explanation. When a model includes thousands of variables, it is rarely possible to make strong causal claims about any one of them. This can be a real loss for disciplines that are in the habit of reducing natural phenomena to elegant mathematical laws. Humanists, on the other hand, have rarely come anywhere near a parsimonious causal explanation of their subject. Our ambitions are more modest, and in our fields, the blurry, complex models created by machine learning can represent an unqualified gain. For instance, it is now possible to describe the relation between literary language and prestige, in a given era, by using the text of a book to predict the likelihood that it will get reviewed in elite magazines. This doesn’t necessarily mean that we can now explain literary judgment: The models that do the predicting use thousands of variables, so explanation remains difficult. But our understanding of cultural history is so hazy that many things short of streamlined explanation still represent a significant advance. For instance, we can train a model on one quarter-century of literary history, and use it to make predictions about books published in the next half-century. That will tell us, in effect, how rapidly the criteria of literary prestige changed.
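The train-on-one-period, predict-on-a-later-one experiment can be sketched in a few lines. This is a toy under loud assumptions, not the models described in this essay: the eight-word "books" in `BOOKS` are invented, the classifier is a simple smoothed word log-odds score that ignores class frequencies, and the names `train`, `predict`, and `accuracy_over_time` are hypothetical. The resulting numbers carry no evidential weight; they only show what is being measured.

```python
import math
from collections import Counter

# Invented toy records: (publication year, text, reviewed in an elite venue?).
BOOKS = [
    (1850, "solemn verse on memory and grief", True),
    (1855, "thrilling tale of pirates and gold", False),
    (1860, "quiet meditation on memory and light", True),
    (1865, "daring pirates chase stolen gold", False),
    (1880, "grief and light in solemn lines", True),
    (1885, "gold fever and a thrilling chase", False),
    (1910, "memory fractured into quiet light", True),
    (1915, "stolen gold and daring detectives", False),
    (1920, "a thrilling chase after memory", True),
]


def books_in(books, span):
    """Records whose year falls in the half-open interval [start, end)."""
    start, end = span
    return [b for b in books if start <= b[0] < end]


def train(records):
    """Smoothed log-odds per word: positive leans 'reviewed', negative leans not."""
    counts = {True: Counter(), False: Counter()}
    for _, text, label in records:
        counts[label].update(text.split())
    vocab = set(counts[True]) | set(counts[False])
    # Add-one smoothing keeps unseen words from producing log(0).
    totals = {lab: sum(c.values()) + len(vocab) for lab, c in counts.items()}
    return {
        w: math.log((counts[True][w] + 1) / totals[True])
        - math.log((counts[False][w] + 1) / totals[False])
        for w in vocab
    }


def predict(model, text):
    # Words never seen in training contribute nothing to the score.
    return sum(model.get(w, 0.0) for w in text.split()) > 0


def accuracy(model, records):
    hits = sum(predict(model, text) == label for _, text, label in records)
    return hits / len(records)


def accuracy_over_time(books, train_span, test_spans):
    """Train once on one period, then score the same model on later periods."""
    model = train(books_in(books, train_span))
    return {span: accuracy(model, books_in(books, span)) for span in test_spans}


if __name__ == "__main__":
    spans = [(1875, 1900), (1900, 1925)]
    for span, acc in accuracy_over_time(BOOKS, (1850, 1875), spans).items():
        print(span, round(acc, 3))
```

The model is fitted only once, on the earliest period. If its accuracy holds up on later periods, the textual signals associated with prestige have changed little; if accuracy falls, the fall is a rough measure of how far the criteria have drifted.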


In my recent work, I ask how well this experiment works for 19th- and 20th-century poetry and fiction. Can we use a book’s text to predict its likelihood of being reviewed in elite venues? The models I produce are only moderately accurate (72.5 percent for fiction and 79.5 percent for poetry) — as we might expect, since literary prestige is shaped by many factors outside the text itself. (So who do you know at The New York Times Book Review?) But the interesting discovery is that models of prestige based on the text lose very little accuracy when they make predictions about the future. Even predicting 50 or 75 years beyond the works they were trained on, models of judgment about poetry lose only 4 percent accuracy on average, and models of fiction only 1.5 percent. This poses a real problem for the notion that certain period boundaries (like the advent of modernism) represent a "revolution" that overturned prevailing literary standards. There are sudden changes in literary history: new genres, for instance, can emerge in a decade. But the standards that govern reviewing seem to change very slowly.




It is important for humanists to know how machine learning works, not because we all need to use it, but because it will help us understand why the boundary between quantitative and qualitative reasoning is growing fuzzier. In the past, it was broadly right to assume that numbers couldn’t address the interpretive questions at the center of humanistic disciplines. Math might help scholars reason about literacy rates and book prices, for instance, but it couldn’t reveal much about literary judgment. The rules of that game have genuinely changed. We can see this as a threat, or as a new opening for adventurous questions about the past. I think the magnitude of the opportunity will be exciting, as soon as we stop telling ourselves that humanists are obliged to play by 20th-century rules — leaving all the new questions to scientists.


Expanding opportunities needn’t lead humanists to abandon or devalue their existing strengths. Humanities disciplines are linked by a shared subject: the human past. As humanistic methods grow more varied, preoccupation with historical change will still separate humanists from the social sciences, giving us special expertise in the malleability of institutions and cultural forms. When social scientists offer general laws of human behavior, we will always be in a position to point to exceptions. It is also part of the humanities’ mission to appreciate exceptions: it would be tragic if literary scholars became so infatuated with charts and graphs that they forgot to mention that Wuthering Heights is rather unlike other novels of its time.


But we may never need to worry about that. Students who know they enjoy math have many other majors to choose among; they don’t see history or English as obvious options. With patience, humanities departments may be able to encourage double majors and build interdisciplinary graduate concentrations. But this is simply not an environment where quantitative methods are ever likely to spread unchecked. The danger is very much on the other side: It is that self-reinforcing selection processes, compounded by rigid theories about the incompatibility of quantitative and qualitative reasoning, will turn humanities departments into defensive enclaves largely sealed off from intellectual currents in society at large.


That doesn’t have to be our future. Human history is an appealing subject, intrinsically interesting to a wide audience. The fact that even scientists want to write about history should remind us how natural it is to be curious about the past. We can choose what to do with that curiosity. If we wag our fingers at scientists and say that they’re expressing the wrong sort of curiosity, we may not stop new kinds of research from happening — but we probably will ensure that the energy of this new project flows instead toward departments of sociology and communications.


If, instead, humanists welcome new forms of curiosity about the past, and work to create a curricular foundation for them, we might be able to build new bridges between our departments and other corners of campus. If we succeed, new forms of research will benefit from humanistic expertise. Students will learn that the humanities are not a moral sanctum set apart from the world, but a mode of inquiry closely connected to other parts of their intellectual lives, including the pleasure of building models and solving problems. In this alternate future, it might eventually be possible for students interested in math to see history and English as plausible options — all the more valuable and practical because they teach us to balance mathematical abstraction against the gritty specificity of human life.


