Narrative Flow Graphs
Author: Michael DeBuse
The Narrative Flow Graph is a graph representation of the syuzhet of a story. Vertices are events or scenes, and edges represent individual plot-important entities within the story. Scenes follow the narrative ordering within the text, and each edge connects the scene vertex of an entity's most recent appearance to the scene vertex of its next consecutive appearance. The resulting graph is multi-edge and acyclic (a multi-DAG), showing which scenes must occur before others through the entities involved in those scenes.
This blog details the input and output for generating a Narrative Flow Graph. The full details of the system can be found in the paper, Plot Extraction and the Visualization of Narrative Flow (paper link forthcoming). Download links to the full output Narrative Flow Graphs from the paper are provided at the end.
Table of Contents
- Annotated Glossary
- Annotated Coreference
- From Text to an Event Graph
- Location Inference
- Condensing the Graph
- Ground Truth and Output Narrative Flow Graphs from the Paper
Here we define the relevant terminology used in this demo. More detailed definitions can be found in the affiliated paper.
- Event: A sentence containing an entity within the text.
- Entity: A text token representing a plot-important Actor, Object, or Location.
- Actor: Entity of volition in the story, one who acts or reacts to situations or events.
- Object: Entity without volition, the items, tools, and props that are important solely due to their use or existence.
- Location: Where plot-relevant scenes take place. Locations that are plot-important but are not where any scene takes place are Objects.
- Glossary: Input XML file detailing the entities the user wants the Narrative Flow Graph to track.
- Tokens: Individually partitioned words or punctuation from a text.
- Structures: The orientation of edges and vertices that follow specific patterns.
- Braid: A sequence of vertices that have the same edges interconnecting them.
- Parallel: Two or more sequences of vertices that have their own isolated edges that do not cross over to the other sequences.
- Scene: A consecutive sequence of events at the same location involving the same entities.
Assessing the plot importance of entities within a story remains an immensely challenging task. In order to fully automate the creation of Narrative Flow Graphs, inferring or estimating the plot importance of entities is a necessary step. For now, we provide a user-annotated glossary of those entities as input so that the system knows which entities to look for in the text.
The Glossary is an XML file with an individual entry for each entity, including a unique numerical ID, the entity classification (Actor, Object, or Location), and a unique, single-word moniker or label to associate with that entity. The label can be taken directly from the source text, such as a name, or it can be defined by the user. That label is what appears on the graph edges and is substituted into the coreferenced text, so it is best to make the label informative.
The formatting of the XML file is as follows:
Below is an example of a glossary for the traditional Japanese folktale, The North Wind and the Sun (found here), where the North Wind and Sun compete to remove a traveler's cloak to see who is stronger:
In this story, we have three Actors: the North Wind and the Sun, who are the main characters of this folktale, and the poor traveling man whose cloak they want to remove. There are two Locations: the sky where they argue and the road where they try to remove the man's cloak. Lastly, there are two Objects: the man's cloak, which is material, and the concept of "stronger," which is immaterial. "Stronger" is important to the plot of this story because it is the driving force of the North Wind and the Sun's argument and differentiates the methods used to remove the man's cloak. In the story, the North Wind tries to use his strength, and so "strength" is mentioned many times in that scene. The Sun, on the other hand, coaxes the man to remove his cloak. Not once is strength mentioned in that scene.
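To make the glossary contents concrete, here is a minimal Python sketch that loads such a file and checks the constraints described above (unique IDs, one of three classifications, unique single-word labels). The tag and attribute names (`glossary`, `entity`, `id`, `type`, `label`), the ID numbers, and the labels `man`, `road`, and `cloak` are assumptions made for illustration; they are not the schema or values the actual system uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element names, attribute names, and IDs are assumed.
GLOSSARY_XML = """
<glossary>
  <entity id="1" type="Actor" label="NorthWind"/>
  <entity id="2" type="Actor" label="Sun"/>
  <entity id="3" type="Actor" label="man"/>
  <entity id="4" type="Location" label="sky"/>
  <entity id="5" type="Location" label="road"/>
  <entity id="6" type="Object" label="cloak"/>
  <entity id="7" type="Object" label="stronger"/>
</glossary>
"""

VALID_TYPES = {"Actor", "Object", "Location"}


def load_glossary(xml_text):
    """Return {entity_id: (entity_type, label)} after basic validation."""
    glossary = {}
    seen_labels = set()
    for node in ET.fromstring(xml_text.strip()).iter("entity"):
        entity_id = int(node.get("id"))
        entity_type = node.get("type")
        label = node.get("label") or ""
        if entity_type not in VALID_TYPES:
            raise ValueError(f"unknown classification: {entity_type!r}")
        if entity_id in glossary:
            raise ValueError(f"duplicate ID: {entity_id}")
        # Labels must be unique and a single word.
        if label in seen_labels or len(label.split()) != 1:
            raise ValueError(f"label must be a unique single word: {label!r}")
        seen_labels.add(label)
        glossary[entity_id] = (entity_type, label)
    return glossary


glossary = load_glossary(GLOSSARY_XML)
```

Keying the result by ID keeps the later coreference step simple, since the annotation files refer to entities by ID rather than by label.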
In order to properly track entities between scenes, we need to know precisely where they appear in the text, no matter what form they appear in or what words are used to mention them. This task is called coreference resolution. Pronoun coreference remains one of the most challenging tasks in NLP. So that we can accurately coreference every entity mention to the precise entity we have in the glossary, we perform the coreference resolution by hand. We coreference using two-column CSV files with the tokens of the story in the left column and the ID of the entity in the right column. Whenever a token in the left column refers to an entity from the glossary, we put the ID of that entity in the right column. If the token refers to multiple entities, we put all of the referenced IDs in the right column, separated by commas and enclosed in quotation marks. For example, if our two entities are Jack (ID = 1) and Jill (ID = 2), and the sentence is, "Jack and Jill lost the bucket they filled." then the CSV file would be annotated as follows:
Jack, 1
and,
Jill, 2
lost,
the,
bucket,
they, "1,2"
filled,
.,
The system then replaces every annotated token with the labels of the entities it refers to from the glossary. The resulting coreferenced text may read oddly to a human, but it is necessary for the system to know where each entity appears.
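The substitution step can be sketched in Python as below, using the Jack and Jill annotation. This is an illustrative sketch, not the system's code: the function name `apply_coreference` is hypothetical, and joining multiple labels with "and" is an assumption inferred from how plural references read in the coreferenced sample sentences.

```python
import csv
import io

# IDs follow the Jack-and-Jill example; the labels are illustrative.
LABELS = {1: "Jack", 2: "Jill"}

COREF_CSV = '''Jack, 1
and,
Jill, 2
lost,
the,
bucket,
they, "1,2"
filled,
.,
'''


def apply_coreference(csv_text, labels):
    """Return one (output_text, entity_ids) pair per token row."""
    resolved = []
    # skipinitialspace lets the reader handle the space after each comma,
    # including the one before the quoted "1,2" field.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        if not row:
            continue
        token = row[0]
        id_field = row[1] if len(row) > 1 else ""
        ids = [int(part) for part in id_field.split(",") if part.strip()]
        # Assumption: several labels are joined with "and".
        text = " and ".join(labels[i] for i in ids) if ids else token
        resolved.append((text, ids))
    return resolved


resolved = apply_coreference(COREF_CSV, LABELS)
coreferenced_text = " ".join(text for text, _ in resolved)
```

Keeping the entity IDs next to each output token means the later graph-building step can read entity mentions directly instead of searching the rewritten text for labels.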
Due to the length of a full coreference CSV file for The North Wind and the Sun, I will not display it here, but the file can be downloaded HERE (google sheet view link to be added later).
Now that we have a glossary of entities and all the entities in the source text have been coreferenced, we can create vertices from every sentence that contains at least one entity. Below are the sentences from the beginning of The North Wind and the Sun:
1: “NorthWind’m stronger.”
2: “No, Sun’m stronger.”
3: The voice of the NorthWind and the Sun were heard in the sky.
4: It seemed that the NorthWind and Sun were arguing about which power was stronger.
5: “Since Sun are energetic every day, it’s hot and everyone is troubled.
6: When NorthWind blow the cool wind, everyone looks happy.
7: NorthWind’m more helpful to them all.”
8: “But, if Sun’m not here, trees and vegetables don’t grow.
9: In the winter, since the wind NorthWind blew was cold, it was very cold.
10: Everyone was unable to go out, isn’t that right?
11: These days, it’s warm and everyone is glad.”
12: “No, look at over there.
13: Since Sun shine strongly, the water in the river is going to disappear soon.
14: Without the water, everyone cannot live.”
15: Both of NorthWind and Sun just spoke NorthWind and Sun own story and didn’t listen to the other NorthWind and Sun’s one.
16: NorthWind and Sun were arguing every day.
17: One day, the NorthWind said as usual, “Let NorthWind and Sun finish the argument because NorthWind’m stronger.”
We see that the beginning of the story is an argument between the North Wind and the Sun. Recalling that our glossary contains entries for "NorthWind", "Sun", "sky", and "stronger", we can ignore sentences 10, 11, 12, and 14, which mention none of them. We then create a vertex for each of the remaining 13 sentences. The only location for this section of text, "sky," is mentioned in sentence 3, so that is the only vertex with a location marked.
The next task is to connect edges. Each edge represents a single entity from the glossary, so there may be multiple edges from one vertex to another if they have more than one entity in common. Recall that we only draw an edge from the vertex of an entity's most recent previous appearance to the vertex of its current appearance. We tend not to draw edges for Location entities, since they are shown in the event vertices, but they can be drawn if the user desires. For this demo, we will only draw the Actor and Object edges. The resulting graph now looks as follows:
Note that the directionality of the edges is implied left-to-right in the sequential order of the narrative. We see that in the beginning, the narrator is speaking of both the North Wind and the Sun, so they appear together in vertices 3 and 4. This creates a short Braid structure. Following the short Braid structure, we have a section of Parallels where the North Wind and the Sun individually argue their point, either speaking of themselves or their opponent. The narrator then returns and speaks of the two together again, creating another short Braid for vertices 15 and 16. "Stronger" is only mentioned twice, so it creates an edge from vertex 4 to 17.
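The edge rule described above (connect each entity's most recent appearance to its current one) can be sketched in a few lines of Python. The entity sets below are read off the coreferenced sentences 3 through 17, with "stronger" placed at vertices 4 and 17 as in the walkthrough; the names `Edge` and `build_edges` are hypothetical, and Location entities are left out, as in this demo.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["source", "target", "entity"])

# Vertex number (sentence number) -> Actor and Object entities mentioned.
EVENTS = {
    3: {"NorthWind", "Sun"},
    4: {"NorthWind", "Sun", "stronger"},
    5: {"Sun"},
    6: {"NorthWind"},
    7: {"NorthWind"},
    8: {"Sun"},
    9: {"NorthWind"},
    13: {"Sun"},
    15: {"NorthWind", "Sun"},
    16: {"NorthWind", "Sun"},
    17: {"NorthWind", "Sun", "stronger"},
}


def build_edges(events):
    """Link each entity's most recent appearance to its current one."""
    last_seen = {}  # entity -> vertex of its most recent appearance
    edges = []
    for vertex in sorted(events):  # narrative order
        for entity in sorted(events[vertex]):
            if entity in last_seen:
                edges.append(Edge(last_seen[entity], vertex, entity))
            last_seen[entity] = vertex
    return edges


edges = build_edges(EVENTS)
```

Because every edge points from an earlier vertex to a later one, the result is acyclic by construction, and two vertices that share several entities get one edge per entity, which is what makes the graph a multi-DAG.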
Locations are not explicitly mentioned in every event or even every scene. For this reason, we must have a way to infer the locations where the events take place. We cannot simply assume that the most recent location mentioned in the text is where the current event takes place. The narrative could be switching between parallel events, and though the location of each parallel might have been mentioned at first, the author trusts that the reader will remember it and does not restate it every time the perspective changes between the parallels. How then can we infer the location? Our graph structure gives us a natural method for this inference.
For each vertex, taken in narrative order, that has no explicitly stated Location-classified entity, we run a breadth-first search backwards, against the edge direction, until we reach the first vertex with a location. Since a dependency chain exists between the vertex without a location and the one with it, we can assume with greater confidence that, in the absence of an explicitly stated location, the two share the same location.
For our The North Wind and the Sun example, vertex 3 has the only location mention. Following the location inference method explained above, vertices 1 and 2 will regrettably not have a location applied to them. The remaining vertices 4 through 17 will have "sky" applied to each of them, since the most recent location found by the breadth-first search in reverse edge direction is at vertex 3.
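A minimal Python sketch of this backward search follows. It uses only the first six vertices of the example and stops at the first vertex with an explicitly stated location; whether the actual system also stops at previously inferred locations is not stated, so that choice, along with the function name `infer_locations`, is an assumption.

```python
from collections import deque

ORDER = [1, 2, 3, 4, 5, 6]  # vertices in narrative order
EDGES = [
    (1, 3, "NorthWind"), (2, 3, "Sun"),
    (3, 4, "NorthWind"), (3, 4, "Sun"),
    (4, 5, "Sun"), (4, 6, "NorthWind"),
]
EXPLICIT = {3: "sky"}  # only sentence 3 names a location


def infer_locations(order, edges, explicit):
    """Assign each vertex the location of the nearest upstream vertex."""
    predecessors = {vertex: [] for vertex in order}
    for source, target, _entity in edges:
        predecessors[target].append(source)

    locations = dict(explicit)
    for vertex in order:
        if vertex in explicit:
            continue
        queue = deque([vertex])
        visited = {vertex}
        while queue:
            current = queue.popleft()
            if current in explicit:  # first explicit location found
                locations[vertex] = explicit[current]
                break
            for previous in predecessors[current]:
                if previous not in visited:
                    visited.add(previous)
                    queue.append(previous)
    return locations


locations = infer_locations(ORDER, EDGES, EXPLICIT)
```

Vertices 1 and 2 have no incoming edges, so the search has nowhere to go and they stay without a location, which matches the behavior described for the example.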
A graph made from single sentences grows long rather quickly. We need a method to condense these events and entity mentions into scenes. Since scenes are consecutive events with the same entities involved, we can apply that definition directly to our graph. We look for sequences of vertices in narrative order that have the same location and entities. The similarity in entities can vary depending on how much we want to condense the graph. Do we want all entities to match perfectly, or are we okay with one or two entities being different or missing? Or do we only care about the similarity in location of the events? The allowed similarity in entities is a hyperparameter that must be chosen before condensing the graph.
To condense the graph, we iterate in narrative order over the vertices. If the next vertex matches our condensation requirements, we add it to the list of vertices we will combine until we reach a vertex that fails our requirements. We then reassign all incoming edges and outgoing edges of those vertices in our list to the combined vertex. Any self-edges created in this way are removed.
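The condensation pass can be sketched in Python as below, shown on vertices 3 through 8 of the example. This is one possible reading of the procedure: each vertex is compared with the one immediately before it, and the hypothetical `max_missing` parameter stands in for the similarity hyperparameter (how many entities of the previous vertex may be absent from the next one). Treating two vertices with no known location as the same place is a simplification of this sketch.

```python
ORDER = [3, 4, 5, 6, 7, 8]
ENTITIES = {
    3: {"NorthWind", "Sun"},
    4: {"NorthWind", "Sun", "stronger"},
    5: {"Sun"},
    6: {"NorthWind"},
    7: {"NorthWind"},
    8: {"Sun"},
}
LOCATIONS = {vertex: "sky" for vertex in ORDER}
EDGES = [
    (3, 4, "NorthWind"), (3, 4, "Sun"), (4, 5, "Sun"),
    (4, 6, "NorthWind"), (6, 7, "NorthWind"), (5, 8, "Sun"),
]


def condense(order, entities, locations, edges, max_missing=0):
    """Group consecutive matching vertices and remap their edges."""
    groups = []
    for vertex in order:
        if groups:
            previous = groups[-1][-1]
            same_place = locations.get(previous) == locations.get(vertex)
            # Entities of the previous vertex that the next one lacks.
            missing = entities[previous] - entities[vertex]
            if same_place and len(missing) <= max_missing:
                groups[-1].append(vertex)
                continue
        groups.append([vertex])

    # Each group is represented by its first vertex.
    representative = {v: group[0] for group in groups for v in group}
    merged_edges = []
    for source, target, entity in edges:
        new_source = representative[source]
        new_target = representative[target]
        if new_source != new_target:  # drop self-edges
            merged_edges.append((new_source, new_target, entity))
    return groups, merged_edges


groups, merged_edges = condense(ORDER, ENTITIES, LOCATIONS, EDGES)
```

On this excerpt the sketch merges vertices 3 and 4 and vertices 6 and 7, and leaves 5 and 8 on their own. Raising `max_missing` loosens the match and produces fewer, larger scenes.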
In our The North Wind and the Sun example, because we are only working with a small number of vertices, we will require that all entities in the current vertex must be in the following vertex for it to condense, with the same location, of course. We see that vertex 3 takes place in the sky and has both the North Wind and the Sun. Vertex 4 similarly has both and takes place in the sky. Vertex 4 introduces a previously unknown entity, "stronger," but that is permissible by our requirements. Next we see that both vertices 6 and 7 take place in the sky and have the North Wind. Lastly, vertices 15 and 16 take place in the sky and have both the North Wind and the Sun. Looking back at the content of each sentence, we see that the combined vertices for sentences 3-4, 6-7, and 15-16 logically make sense. The resulting condensation is as follows:
As an aesthetic choice, we can array the sequential vertices based on edge similarity to better show the structures within the graph. Graph visualizer libraries like GraphViz do this automatically for you. The result is the Narrative Flow Graph for the beginning of the folktale, The North Wind and the Sun.
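As an illustration of handing a condensed graph to GraphViz, the Python sketch below writes the graph as DOT text with a left-to-right layout. The function name `to_dot`, the vertex names, and the vertex labels are hypothetical, and the sketch does no escaping of special characters in labels.

```python
def to_dot(vertices, edges):
    """Return DOT text for a multi-edge graph laid out left to right."""
    lines = ["digraph NarrativeFlow {", "  rankdir=LR;"]
    for name, label in vertices.items():
        lines.append(f'  "{name}" [label="{label}"];')
    for source, target, entity in edges:
        # One DOT edge per entity keeps multi-edges visible.
        lines.append(f'  "{source}" -> "{target}" [label="{entity}"];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative vertices: condensed scenes labeled with their location.
vertices = {
    "3-4": "3-4 (sky)",
    "5": "5 (sky)",
    "6-7": "6-7 (sky)",
    "8": "8 (sky)",
}
edges = [
    ("3-4", "5", "Sun"),
    ("3-4", "6-7", "NorthWind"),
    ("5", "8", "Sun"),
]
dot_text = to_dot(vertices, edges)
```

Saving `dot_text` to a file such as `graph.dot` and running `dot -Tpdf graph.dot -o graph.pdf` renders it to a PDF.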
Narrative Flow Graphs are often rather large and long in order to preserve the narrative order of the vertices, which often makes them difficult to view. For both output and ground truth, we compile the graphs into PDFs using GraphViz. To view them, you need a PDF viewer with good zoom functionality, like Foxit PDF Reader.
The Narrative Flow Graph system is under constant improvement as methods and algorithms improve, so new output files will be added to show the progress of the system. The dates before the files show when the files were first generated.
- To Build A Fire
- Leiningen Versus the Ants
- The Sound of Thunder
- Observer 1: A Warm Home
- Observer 4: Legends
- Lion King
Date created: March 2022