Tuesday, October 4, 2011

Yeah, Right: How I met Wordle...

No wonder "words" is the most common word in this post!!
Every teacher will tell you that there are moments in their teaching lives when students inspire them to create a new teaching technique or simply come up with an interesting idea. Sometimes this can also be an idea for a little research, which is what happened in my case. It wasn't the discovery of Wordle, as you might have guessed, but another way of using it. Wordle is a word cloud generator which creates a cloud out of the most common words in a text. The cloud is designed so that the most common words are written in large fonts and the less common words in smaller ones. The text can be provided either by copying and pasting it or by typing in the URL of a website or a blog. I use Wordle a lot because it is a great tool for starting a lesson, especially a reading class. You can have students guess what the text is going to be about, or you can highlight unknown target words. You can spend a substantial amount of time in class on one word cloud, but that is a topic for another blog post.

So, while working on one of those clouds in one of my classes, a student asked me how to use this application. I told him about the website and asked him why he was interested in it. He told me that he was going to live in the U.S.A. for a while. He wanted to download the subtitles of all the "How I Met Your Mother" episodes and copy them into Wordle. This way he could get a cloud of the most common English words from the lives of hip/cool single people in their 30s, and he would know the basic English words needed to survive in such a culture. I thought that this was a great idea. It would be something like a corpus study of "How I Met Your Mother". I don't know if he did it, but I knew I would do it. And I did. I didn't just use "How I Met Your Mother" but also other popular series such as "Dexter", "Mad Men", "Community" and "The Office" (both the UK and American versions).


I downloaded all the subtitles from a DivX subtitles site (there are plenty of them out there). I used the subtitles of all the episodes of each first season, except for "How I Met Your Mother", where I used the sixth season as well. I merged all the episodes of a season into one text file and removed the time stamps. I copied and pasted the whole text (an entire season of discourse!) into Wordle. The approximate number of words in each of these text files is as follows:
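
For anyone who would rather not clean the files by hand, the merging and time stamp removal described above can be sketched in a few lines of Python. This is only a rough sketch and not the way the files for this post were actually prepared. It assumes the subtitles are in the common SRT format (a cue number, a timing line, then the spoken text), and the names strip_srt and merge_episodes are made up for this example.

```python
import re

# Matches an SRT timing line such as "00:01:15,200 --> 00:01:17,900".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Matches simple formatting tags such as <i> and </i>.
TAG = re.compile(r"</?[^>]+>")


def strip_srt(text):
    """Return only the spoken lines of one SRT subtitle file (assumed format)."""
    kept = []
    for line in text.splitlines():
        # Drop surrounding whitespace and a possible byte order mark.
        line = line.strip().lstrip("\ufeff")
        # Skip blank lines, cue numbers and time stamps.
        # Caveat: a spoken line that is only a number is dropped as well.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(TAG.sub("", line))
    return "\n".join(kept)


def merge_episodes(episodes):
    """Join the cleaned text of several episodes into one season text."""
    return "\n".join(strip_srt(episode) for episode in episodes)
```

Each item passed to merge_episodes is the full text of one episode's subtitle file; the result is one block of plain dialogue that can be pasted into Wordle.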

How I Met Your Mother: 63,000 (they talk a lot in comedies)
Dexter: 56,000 (lots of chopping)
Mad Men: 54,000 (lots of unbuttoning!)
The Office U.S.: 16,000 (not much to talk about in an office in the States?)
The Office U.K.: 20,000 (the Brits talk too much while at work)
Community: 64,000 (American community college students like to use words a lot!)

I used a horizontal layout and the maximum number of words in the cloud was 1000. I wanted it to be a really dense cloud since there were a lot of words to deal with. I also used different kinds of colours for all of them. 
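
If you are curious about the numbers behind such a cloud, the counting can be approximated in Python. This is a hedged sketch and not Wordle's own code: the function name top_words and the tiny stop list are made up for illustration. The stop list stands in for Wordle's option to leave out very common words like "the" and "and", which I assume was switched on for these clouds; Wordle's real list is much longer.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption, not Wordle's actual list).
STOP_WORDS = {
    "the", "a", "an", "and", "to", "of", "i", "you", "it", "is", "in", "that",
}


def top_words(text, limit=1000, stop_words=STOP_WORDS):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    # Lower-case the text and keep words, including ones like "don't".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(limit)
```

Calling top_words(season_text, limit=20) on a merged season text gives the twenty words that would appear largest in the cloud, together with how often each one was said.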

Results and Comments:

Community is a series about the lives of a bunch of interesting characters at a community college. Of course, as the title suggests, it tries to show the relationships and the sometimes difficult way of life of a "community" in a comedic way. It is actually a hilarious show. If we look at the most common words in the first season, we can see that "Oh, like, yeah, gonna, know, okay, right, get, well, think, just" are right at the top. Considering that this series tries to imitate a community, it is actually sad to realize that the most common words used are only fillers! You can't even make a sentence out of them. Or can you? "Oh, yeah? Like, you gonna know! Right, get just well, I think..."

If you haven't watched Dexter, then you are missing out on one of the best series ever. There is lots of hacking and slashing involved, so you would expect a grimmer choice of vocabulary, but as you can see, "right, get, yeah, need, gonna, just, like, got, know, think, want, time" are the most common words. Of course, "killer" is going to be there as well, since Dexter is a serial killer.

How I Met Your Mother Season 1:
I think everybody knows "How I Met Your Mother". This series is the main reason why I decided to write this post. Again, as we can see, words like "oh, know, just, right, yeah, going, like, want, get, think, really" dominate the lives of these New Yorkers in their early thirties. Of course, they are young, funny and in "love" (another word that can be spotted just between "like" and "know").

How I Met Your Mother Season 6:
The reason I chose the first and the sixth seasons of "How I Met Your Mother" was to see if there would be any change in the corpus over the course of nearly six years. One would think that six years can make a difference in character development and the choice of words. However, as can be seen above, the same words have been used again. "Oh" being right in the middle seems to be very meaningful!

The Office U.K.:

I think "The Office UK" became popular after the American version. I haven't watched all of it; however, I have watched enough to witness a hilarious and embarrassing Ricky Gervais. Nevertheless, when you look at the frequency of the words, there is nothing that indicates that this is a very British show. Here we can observe that the English language cannot get away from the "yeahs" and needs to "know" everything and "wants" everything and "oh", it "gets" it!

The Office U.S.:

I definitely thought that there would be a huge difference between the American version of "The Office" and its British ancestor in terms of word choice. As you can see, this is not the case. "Yeah", to "know" the English language is very important. I think we "just" "gonna" have to accept that. Right!

By conducting this little experiment, I have come to the conclusion that the most used words in the English language are "yeah, know, get, and oh". I am so happy that I did not spend too much money or too many resources on this research. Well, time, I have spent, but that is something that you need to sacrifice if you want to contribute to the field of science! I must say that I will continue recommending these TV shows to my students since they are still among the best shows on television. They keep everyone entertained while giving students the opportunity to be exposed to English at its most natural level. Students can also observe how important the use of "fillers" is in spoken language. This "study" shows that communication is actually dependent on those "fillers".
It should also be noted that this research/experiment is not of a serious nature, and I hope that you have not taken it too seriously. I only wanted to see how the spoken English language in popular TV series can be visualized by using Wordle. I must also admit that I was indeed a little curious about the outcome. I hope that no offense was taken by any speaker of the English language since no offense was intended.


