The Social Structures of Harry Potter

Introduction

The focus of this project was to analyse the Harry Potter universe using social networks, sentiment analysis and word clouds. To analyse the social structures, networks were created from the movie scripts. Sentiment analysis was used to estimate and contrast the mood of the movies and the books, and the essence of each book was isolated and visualized as word clouds.

Network Dimensions

Undirected networks were constructed from the movie scripts, with character information such as each character’s house, blood status and loyalties collected from the Harry Potter Wiki. Each movie script was divided into individual scenes, and an edge was added between every pair of characters that occur in the same scene. The network includes all mentioned characters: the scripts were not structured uniformly, which made it impractical to distinguish between characters who speak in a scene, characters who are only mentioned in speech, and characters who are mentioned in stage directions. The size of each node is scaled by its degree, weighted by the number of scenes the character appears in. The number of nodes and edges for each movie is illustrated in the graph below.

Figure: number of nodes and edges per movie

The high number of nodes in Deathly Hallows Part 1 is likely due to the need to tie up all the loose ends in the story. The movie with the fewest nodes and edges is Goblet of Fire, which is likely due to its storyline being tightly centered on the Triwizard Tournament. The tournament keeps the story focused on the school and the people present; this changes as the later books introduce outside groups such as the Order of the Phoenix.
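The construction described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the project’s actual code: it assumes that every scene starts with a heading beginning with `INT.` or `EXT.`, that a hypothetical `names` mapping from each character to their aliases has been built from the wiki data, and that node size is degree multiplied by scene count, which is only one possible reading of “weighted with the number of scenes”.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: each scene starts with a heading line beginning with INT. or EXT.
SCENE_HEADING = re.compile(r"^(?:INT\.|EXT\.)", re.MULTILINE)


def split_scenes(script: str) -> list[str]:
    """Split a script into scenes at each heading (text before the first heading is dropped)."""
    starts = [m.start() for m in SCENE_HEADING.finditer(script)]
    bounds = starts + [len(script)]
    return [script[a:b] for a, b in zip(bounds, bounds[1:])]


def characters_in_scene(scene: str, names: dict[str, list[str]]) -> set[str]:
    """Every character mentioned anywhere in the scene: speech, speaker tags or stage directions."""
    return {
        character
        for character, aliases in names.items()
        if any(re.search(rf"\b{re.escape(a)}\b", scene, re.IGNORECASE) for a in aliases)
    }


def build_network(scenes: list[str], names: dict[str, list[str]]):
    """Undirected co-occurrence edges weighted by shared scenes, plus scene counts per character."""
    edges: Counter = Counter()        # (a, b) with a < b -> number of shared scenes
    scene_count: Counter = Counter()  # character -> number of scenes they appear in
    for scene in scenes:
        present = sorted(characters_in_scene(scene, names))
        scene_count.update(present)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges, scene_count


def node_sizes(edges: Counter, scene_count: Counter) -> dict[str, int]:
    """Hypothetical size rule: degree (number of distinct neighbours) times number of scenes."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {c: degree[c] * scene_count[c] for c in scene_count}
```

Calling `build_network(split_scenes(script), names)` on a whole script yields the weighted edge list, which could then be loaded into a graph library for drawing. Matching on every mention, rather than only on speakers, mirrors the decision described above to include all mentioned characters.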

Degrees Over Time

The graphs below include every character that was among the 20 most connected nodes in at least one movie, and show in how many movies the character was in the top 20 as well as the character’s degree in each movie. The goal is to illustrate which characters are most important for each movie, but also for the whole series. The characters are sorted by relevance, measured both by the number of movies in which they were among the 20 most connected nodes and by their degree in those movies. The three main characters, Harry Potter, Ron Weasley and Hermione Granger, are the most relevant characters according to this measure. Many well-known characters appear at the top of the list, but it also captures many minor characters who played a key role in only one movie, like the participants in the Triwizard Tournament or the students petrified by the Basilisk.

Figure: degree per movie of every character among the 20 most connected nodes in at least one movie (three panels)
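The ranking behind these graphs can be sketched as below, assuming the per-movie degrees are already available as a mapping from character to degree. How the two criteria were combined is not stated; as an assumption, this sketch sorts first by the number of movies in which a character is in the top 20 and then by the degree summed over those movies.

```python
from collections import Counter


def rank_top_characters(degrees_by_movie: dict[str, dict[str, int]], k: int = 20) -> list[str]:
    """Rank every character that is among the k most connected nodes in at least one movie."""
    appearances: Counter = Counter()    # movies in which the character is in the top k
    summed_degree: Counter = Counter()  # degree summed over those movies
    for degrees in degrees_by_movie.values():
        # Ties are broken alphabetically so the result is deterministic.
        top_k = sorted(degrees, key=lambda c: (-degrees[c], c))[:k]
        for character in top_k:
            appearances[character] += 1
            summed_degree[character] += degrees[character]
    return sorted(appearances, key=lambda c: (-appearances[c], -summed_degree[c], c))
```

With this ordering, a character like Cedric, who is prominent in a single movie, is ranked after characters who stay in the top 20 across several movies, but ahead of other single-movie characters with a lower degree.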

Movie Networks

The following sections contain detailed networks for each movie and some statistics that break them down, put them into context and take a closer look at the detected communities.

Philosopher's Stone

Movie Network

The network for Philosopher’s Stone illustrates the expected high-involvement characters, such as the professors, the students and the Dursleys. The Gryffindors are heavily represented, as would be expected, and the Slytherins Snape and Draco Malfoy hardly match them.

The three main characters Harry, Ron and Hermione are the largest nodes, closely followed by the characters introduced early in the movie: Dumbledore, McGonagall, Hagrid and so on. The network is quite small compared to the later movies, which reflects the need to introduce the whole universe in the first story without confusing the viewer with too many characters. It is clear from the beginning that certain characters have been cut from the movie to achieve this. Peeves the poltergeist is an example of one of these cut characters, who are magical in the story but may be superficial in the grand scheme of things. It is, however, noteworthy that many of these “simple” cuts and changes have greatly affected the later movies, as certain plotlines needed modifying to make sense to the audience.

Figure: Philosopher’s Stone network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution

A trend that roughly applies to the degree distributions of all the remaining movies as well is that most nodes have a degree in the range 8-15, roughly normally distributed around that range. Only a few characters have degrees in the range 40-50. Those nodes are the main characters, which is also reflected in the graphs from the section Degrees Over Time. There also do not seem to be any prevalent power laws in the networks.

Figure: Degree Distributions (linear scale and log scale)

Communities

Three communities are identified in the first movie. Community 0 contains Harry’s family, i.e. characters not connected to his Hogwarts life. Communities 1 and 2 are hard to separate from each other, as neither has a distinct theme: both contain characters who are integral to the mission of the book as well as side characters who are not. At first glance, Community 2 seems to contain nodes more closely connected to Harry and the mission, with the exception of a few tiny nodes, while only a few nodes from Community 1 are closely connected to the mission. The distribution of known groups in the communities helps suggest connections between the characters.

Figure panels: Community 0 to Community 2

Distribution of known groups in the communities
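The community detection algorithm is not named here. Modularity-based methods such as the Louvain method are a common choice for networks like these; as an assumption, the sketch below shows how such methods score a partition, using Newman’s modularity Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the summed degree of its nodes. A higher Q means more edges fall inside communities than a degree-preserving random graph would predict.

```python
from collections import defaultdict


def modularity(edges: list[tuple[str, str]], community: dict[str, int]) -> float:
    """Newman modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the number of edges, L_c the number of edges inside c
    and d_c the summed degree of the nodes in c."""
    m = len(edges)
    internal: defaultdict = defaultdict(int)    # L_c
    degree_sum: defaultdict = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, putting each triangle in its own community gives Q = 5/14 ≈ 0.36, while putting all six nodes in one community gives Q = 0. A low Q for a partition would be consistent with communities that are hard to tell apart, as with Communities 1 and 2 above.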

Chamber of Secrets

Movie Network

Compared to the first movie, the trio is more prominent here. The significant difference in node sizes indicates that the movie focuses more on these characters, while the number of nodes also increases with the introduction of new characters. McGonagall and Lockhart are the two prominent professors in this story, with Lockhart being a new character. A character who is never properly introduced in the series is the librarian, Mrs. Pince. Another new character is Colin Creevey, who is quite relevant to the storyline as one of the petrified students. He is among the larger nodes, alongside Ginny, Dumbledore and Voldemort. Another standout node is Justin Finch-Fletchley, a Muggle-born Hufflepuff who rejects Harry’s attempted friendship because he mistakes him for the Heir of Slytherin.

Figure: Chamber of Secrets network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution

Figure: Degree Distributions (linear scale and log scale)

Communities

A total of six communities are identified in the second movie, indicating a more ambitious storyline. Community 0 contains the characters most integral to the story alongside a few minor ones, continuing the trend from the first movie in which unlikely characters end up in the same community. Community 1 appears to revolve around the story of the Chamber and the diary, neatly including the nodes related to that storyline. Community 2 contains the nodes mainly seen at Privet Drive. Community 3 loosely covers the characters affected by or connected to the Basilisk’s petrification victims. Community 4 contains, with the exception of one node, only Quidditch players from the Gryffindor and Slytherin teams. Community 5 contains the remaining nodes, which are primarily Gryffindor students.

Figure panels: Community 0 to Community 5

Distribution of known groups in the communities
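The “distribution of known groups” figures, and statements such as “primarily Gryffindor students”, rest on a simple cross-tabulation. A minimal sketch, assuming a mapping from character to community and a mapping from character to group taken from the wiki data; as an assumption, characters without a known group are counted as “Other”.

```python
from collections import Counter, defaultdict


def group_distribution(community: dict[str, int], group: dict[str, str]) -> dict[int, dict[str, float]]:
    """Share of each known group (house, Muggle, ...) within every community.

    Characters missing from `group` are counted as "Other"."""
    labels_by_community: defaultdict = defaultdict(list)
    for character, c in community.items():
        labels_by_community[c].append(group.get(character, "Other"))
    return {
        c: {g: n / len(labels) for g, n in Counter(labels).items()}
        for c, labels in sorted(labels_by_community.items())
    }
```

Shares are used instead of raw counts so that small and large communities can be compared in the same chart.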

Prisoner of Azkaban

Movie Network

In the third movie, the trio’s nodes are around the same size as Sirius Black’s, closely followed by Dumbledore, Wormtail (Peter Pettigrew), Hagrid and Neville. As the story revolves around Black’s escape and the manhunt for him, such a big node is expected. Apart from Snape, the Slytherins are not the main rivals in this movie, and this is mirrored in the network. Voldemort is not directly a prominent villain in this story, as the network shows; however, we know from the story that the threat he represents is, for the better part of the movie, mirrored in Black. Lupin’s node is an unexpected result, as the changing Defence Against the Dark Arts teachers tend to have a large impact on the story, as seen with Lockhart. Nevertheless, his presence is strongly felt.

Figure: Prisoner of Azkaban network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution

Figure: Degree Distributions (linear scale and log scale)

Communities

Community 0 contains characters connected to the main storyline, the Marauders and Buckbeak. Community 1 contains the Weasley family and Voldemort, although the connection between them is not readily apparent. Community 2 involves the events at Privet Drive. Community 3 revolves around the nodes connected to Hagrid and the trial for Buckbeak’s life. Community 4 focuses on the nodes related to everyday happenings at Hogwarts and consists of Gryffindor students and teachers.

Figure panels: Community 0 to Community 4

Distribution of known groups in the communities

Goblet of Fire

Movie Network

In Goblet of Fire we see, for the first time, a significant difference between Hermione’s and Ron’s node sizes and Dumbledore’s, whose node surpasses theirs. Mad-Eye Moody follows closely, alongside Cedric Diggory. Although we see the return of the Dark Lord, Voldemort’s node is still small compared to other key characters. The three other Triwizard champions are both new and frequently mentioned, as illustrated in the network; however, considering the fascination with Fleur, her node is not as large as, for example, Krum’s. Draco Malfoy and Snape are usually Harry’s primary antagonists, but not in this movie, as the focus is elsewhere.

Figure: Goblet of Fire network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution

Figure: Degree Distributions (linear scale and log scale)

Communities

Community 0 contains nodes related to Voldemort’s first reign of power. Community 2 is not very specific, but overall contains the nodes relevant to the main story. By contrast, Community 1 appears to contain nodes that are not very important to the movie’s story, which differs from the book and therefore hints at the differences between the two. In Community 3 there is no apparent relation between the nodes, besides that they all fight on Harry’s side of the war.

Figure panels: Community 0 to Community 3

Distribution of known groups in the communities

Order of the Phoenix

Movie Network

As in Goblet of Fire, Order of the Phoenix has Dumbledore rivaling Ron and Hermione in prominence, likely because he is mentioned more often. In this movie we are introduced to Professor Umbridge, a delegate from the Ministry of Magic who has an enormous impact on Hogwarts and Harry. Voldemort is also more prominent, as Harry tries to convince the world of his resurrection.

Figure: Order of the Phoenix network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution

Figure: Degree Distributions (linear scale and log scale)

Communities

In Community 0 all the nodes work for the Ministry of Magic, but appear throughout the movie in different relations to Harry. Community 1 is the community of Dumbledore’s Army (DA), which Harry starts as a study group so the students can teach themselves what Umbridge will not. Bellatrix and Cedric are in this community because Neville and Cho, respectively, mention them during the meetings. Community 2 involves the characters connected to Nagini’s attack on Mr. Weasley at the Ministry of Magic. Community 3 concerns the characters involved in the storyline taking place at Privet Drive. Community 4 centers on the Weasley twins and Filch, whose rivalry is legendary in the books; in the book, the twins ultimately leave the school telling Peeves to give Umbridge hell.

Figure panels: Community 0 to Community 4

Distribution of known groups in the communities

Half Blood Prince

Movie Network

In this movie Ginny emerges as a large node because of her growing relationship with Harry. We once again see Harry’s two antagonists, Draco and Snape, grow in node size, alongside Voldemort and Slughorn. Slughorn is introduced here as the new Potions teacher and reintroduces Hogwarts to his Slugclub. The movie is largely dominated by Gryffindors and a small group of Slytherins.

Figure: Half Blood Prince network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution

Figure: Degree Distributions (linear scale and log scale)

Communities

Community 0 contains characters from the Slugclub, which Professor Slughorn founded to stay on good terms with students who have promising futures, as well as Slytherins. Community 1 groups Death Eaters, Order members and Diagon Alley shop owners, all of whom fight on one side or the other in the war between good and evil that takes place outside Hogwarts. Community 3 includes nodes connected to the school, the main nodes being Dumbledore, Slughorn and Voldemort, and those related to them.

Figure panels: Community 0 to Community 3

Distribution of known groups in the communities

Deathly Hallows

Movie Network - Deathly Hallows Part I

The golden trio dominates again, which is to be expected, as a large portion of the movie documents their solitary travels in a magical tent while hunting Horcruxes. Dumbledore and Voldemort compete for attention, as the latter is an acute threat and the former’s reputation is dragged through the mud. The other large nodes are mostly members of the Order of the Phoenix. With the largest cast of the series, the movie is clearly setting up the final battle while attempting to tie up loose ends.

Figure: Deathly Hallows Part I network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution - Deathly Hallows Part I

Figure: Degree Distributions (linear scale and log scale)

Communities - Deathly Hallows Part I

Community 0 represents the members of the Order of the Phoenix. In Community 1 the nodes are all involved in the break-in at the Ministry of Magic. Community 2 illustrates the overlap between Dumbledore’s story and the quest for the Horcruxes and the Deathly Hallows. Community 3 follows Voldemort’s storyline throughout the movie, from the murder of Professor Burbage to the kidnapping of Ollivander. In Community 4 we find mainly minor characters such as the Dursleys and low-tier Voldemort followers. Community 5 contains Hogwarts students.

Figure panels: Community 0 to Community 5

Distribution of known groups in the communities

Movie Network - Deathly Hallows Part II

Ginny, Neville and Seamus step up at Hogwarts in the absence of the trio, alongside Luna and Dean. This shows the importance and power of the DA (Dumbledore’s Army). As expected, Voldemort plays a huge part in this movie, alongside his trusted Death Eaters. We can also glimpse the names of the main characters’ offspring.

Figure: Deathly Hallows Part II network (legend: Gryffindor, Slytherin, Ravenclaw, Hufflepuff, Muggle, Other)

Degree Distribution - Deathly Hallows Part II

Figure: Degree Distributions (linear scale and log scale)

Communities - Deathly Hallows Part II

Community 0 contains the students, professors and Order members involved in the Battle of Hogwarts. Community 1 illustrates the quest for the Horcruxes and Harry’s death. Community 2 is interesting, as the common denominator between its nodes is the characters’ goal to protect or raise Harry. Community 3 contains the children of Harry and Ginny, Ron and Hermione, and Tonks and Lupin.

Figure panels: Community 0 to Community 3

Distribution of known groups in the communities

Word Clouds

The word clouds show the words that are both frequent in each book and most distinctive of that book compared to the whole series. This is achieved using TF-IDF, which reduces the weight of frequent words according to the number of other books they appear in.

For example, in the first word cloud, from Philosopher’s Stone, words like “Fluffy” and “Flamel” appear in large letters, which means that these words are frequent in, and distinctive of, that particular book. This gives a much better overview of the content of each book than a word cloud generated from the plain book text, in which the main characters would take up most of the space for every book, which would not be very interesting.
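A minimal sketch of the weighting, treating each book as one document. It assumes the books are already tokenized and uses raw counts as term frequency with idf = log(N/df); library implementations often use smoothed or normalized variants, so their exact numbers will differ.

```python
import math
from collections import Counter


def tfidf_by_book(books: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """TF-IDF weight of every word in every book, treating each book as one document.

    tf  = raw count of the word in the book
    idf = log(N / df), with N books and df = number of books containing the word,
    so a word that occurs in every book gets weight 0."""
    counts = {title: Counter(tokens) for title, tokens in books.items()}
    df: Counter = Counter()
    for book_counts in counts.values():
        df.update(book_counts.keys())  # each book counts once per word
    n_books = len(books)
    return {
        title: {word: tf * math.log(n_books / df[word]) for word, tf in book_counts.items()}
        for title, book_counts in counts.items()
    }
```

This is why a name like “Harry”, which occurs in all seven books, drops out entirely, while “Fluffy” keeps a high weight. The per-book weights could then be handed to a word cloud renderer, for example `WordCloud.generate_from_frequencies` in the `wordcloud` package.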

Sentiment Analysis

Overall

Comparing the sentiments of the movies and the books essentially means looking at a one-dimensional mood scale, from sad to happy. The values vary so little that not much can be concluded, although the overall graphs can be explained by the events of the books. The dataset used to calculate the sentiment is not representative of the Harry Potter universe, because that world is so unique. Strongly loaded terms such as Dark Lord, Death Eaters and Muggle carry huge meaning in the wizarding world, and they also show how much a word’s meaning depends on its neighbouring words. The variation in every part of the sentiment analysis is so small that any interpretation is borderline speculative; we still attempt one, as we found the results to fit the material very well.
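The sentiment dataset is not named here. As an assumption, the sketch below uses the common approach of averaging per-word scores from a word-happiness lexicon on a 1-9 scale, such as labMT; the toy words and scores are invented for illustration. It also shows the problem with terms like Dark Lord: scored word by word, the phrase only receives the score of “dark”.

```python
import re

# Toy scores on a 1-9 happiness scale; the words and values are invented for illustration.
TOY_LEXICON = {"happy": 8.0, "magic": 7.5, "dark": 3.5, "death": 1.5}


def average_sentiment(text: str, lexicon: dict[str, float]) -> float:
    """Mean lexicon score of the words in `text` that the lexicon covers.

    Words are scored one at a time, so multi-word terms such as "Dark Lord"
    lose their combined meaning. Returns 0.0 if no word is covered."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0
```

Because most words in any text land near the middle of such a scale, averages over whole books or movies tend to differ only slightly, which fits the small variation noted above.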

Book

This sentiment is calculated from the dialog in each book. Through testing, this was found to be the best indicator, as a book contains a large amount of descriptive text.

The graph nicely illustrates how we experience the books. The first is an introduction to the wizarding universe from the point of view of an 11-year-old: it is magical and fantastic, and danger is not really comprehensible. The stakes are higher in the second and third books, where the danger becomes life-threatening. In Goblet of Fire we are once again introduced to something magical and exciting. The fifth and sixth books both have discouraging and encouraging plotlines running through their narratives. As expected, in the seventh and final book Voldemort sends the wizarding world into a downward spiral from which Harry must save it.

Figure: Sentiments by Book
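Isolating the dialog of a book can be sketched with a regular expression. This assumes that speech is enclosed in double quotes, straight or curly, as in the US editions; British editions typically use single quotes, which would also match apostrophes and need a more careful pattern.

```python
import re

# Assumption: speech is wrapped in "..." or “...”; nested or multi-paragraph quotes are not handled.
QUOTED = re.compile(r'["“]([^"”]*)["”]')


def extract_dialog(text: str) -> str:
    """Return all quoted speech in the text, joined by single spaces."""
    return " ".join(part.strip() for part in QUOTED.findall(text))
```

Only the returned dialog string would then be scored, so the narration between quotes, which makes up most of a book, does not dilute the sentiment.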

Chapters

The sentiment analysis of the book chapters is also based on the dialog. Here the overall development in sentiment is very linear. Key events such as Dumbledore’s death and the resurrection of Voldemort are clearly identifiable; oddly enough, this does not apply to the death of Sirius. The second chapter of Goblet of Fire consists entirely of inner monologue and description, i.e. no dialog, and thus has a sentiment of zero.

Figure: Sentiments by Chapter

Movie-Scenes

The sentiment is calculated for each full scene, stage directions included, as it was very hard to separate the dialog; the reasons are listed in the Notebook. The average sentiment of the movies is 0.5 below that of the books, which would be interesting to investigate further. The drop in Prisoner of Azkaban is due to a faulty scene. In Order of the Phoenix the sentiment is much lower towards the end of the movie, possibly indicating that the movie conveys Harry’s grief over Sirius’ murder better than the book does. The deaths of Cedric and Dumbledore, however, are not clearly identifiable.

Figure: Sentiments by Movie-Scenes