Tuesday, October 19, 2010

Text Analysis and Data Visualization

For the text analysis and data visualization exercise, I decided to analyze the novel Alice in Wonderland by Lewis Carroll, found in the Project Gutenberg catalogue. I chose this novel because it was one of my childhood favorites, and because of the recent movie premiere with one of my favorite actors, Johnny Depp! The story is about a girl named Alice who falls down a rabbit hole into a fantasy world, aka Wonderland. Wonderland is a place where talking animals and objects exist. Throughout her journey, Alice encounters conflicts and whimsical songs.

 ♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♠◦♣◦♥◦♦-♠◦♣◦♥◦♦


 
For my first analysis of the text, Alice in Wonderland, I created a Wordle. The colors in the Wordle are customized to how I perceived and interpreted the story. The great thing about Wordle is that it shows the frequencies of certain words. While creating this Wordle I chose the option "Remove common English words," so it shows that the words that appear most often in the text are "little" and "Alice". Alice is the main character of the story, so I expected her name to appear as a frequent word.
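
To get a feel for what the "Remove common English words" option does, here is a small Python sketch. It is not Wordle's actual code. The tiny STOP_WORDS set, the function name word_frequencies and the sample sentence are all made up for illustration (the sentence is not a quote from the novel).

```python
import re
from collections import Counter

# Hypothetical, very short stop list; a real tool would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it",
              "was", "she", "her", "very", "at"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each word appears, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Made-up sample sentence, only for demonstration.
sample = ("Alice was a little girl. The little door was very little, "
          "and Alice looked at it.")
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('little', 3), ('alice', 2)]
```

A word cloud would then draw each word bigger or smaller depending on its count.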

 ♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♠◦♣◦♥◦♦-♠◦♣◦♥◦♦



After creating a Wordle, I chose to analyze the text once again using a Word Tree. A word tree shows how often a chosen word appears and the sentences that branch off from it. I explored and entered different words into the search bar to see which words created different outcomes. I first entered the word “Alice”, and was content with the results. I then entered a more common word, “the”, and was blown away by the results. The word “the” appears approximately 10 times more frequently than the word “Alice” or “Rabbit”. One attribute that I really love about word trees is that you can control and visualize specific sentences or phrases just by clicking on a word.
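
The idea behind a word tree can be sketched roughly in Python: pick a root word and count the phrases that come right after it. This is only my simplified guess at the idea, not how the Word Tree tool is really built. The function name branches, its length parameter and the sample sentences are invented for this example.

```python
import re
from collections import Counter

def branches(text, root, length=3):
    """Count the phrases (up to `length` words) that directly follow `root`."""
    words = re.findall(r"[a-z']+", text.lower())
    root = root.lower()
    found = Counter()
    for i, word in enumerate(words):
        if word == root:
            # Take the next few words as one branch of the tree.
            phrase = " ".join(words[i + 1 : i + 1 + length])
            if phrase:
                found[phrase] += 1
    return found

# Made-up sample text, only for demonstration.
sample = "Alice was tired. Alice was curious. Alice ran after the rabbit."
for phrase, count in branches(sample, "alice", length=1).most_common():
    print(count, "alice ->", phrase)
```

Note that this sketch ignores where sentences end, so a branch can run over a period. A real word tree is more careful about that.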

 ♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♦-♠◦♣◦♥◦♠◦♣◦♥◦♦-♠◦♣◦♥◦♦

For my final analysis of this text I settled on doing a manual extraction of data, following the procedure of the metadata exercise we did in blog #2. For this exercise I extracted the information given about the book on Project Gutenberg and stated the important facts about the data.
  • Title / Name – Alice’s Adventures in Wonderland
  • Author / Creator – Lewis Carroll (1832-1898)
  • Publisher – Project Gutenberg
  • Date of Creation – June 27, 2008
  • Date viewed – October 18, 2010
  • Language – English
  • Format (.html, .pdf, .avi, etc.) – PDF or HTML
  • Media Type (if applicable) – online book (ebook)
  • Rights – Public domain in the USA
  • Subject or Topic – Fantasy
  • Category or Categories – Text
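
The same metadata could also be kept as a simple record that a program can read. Here is a minimal Python sketch. The field names are my own shorthand, not an official metadata schema, and the values are just the ones from the list above.

```python
# Field names are my own shorthand, not an official metadata standard.
metadata = {
    "title": "Alice's Adventures in Wonderland",
    "creator": "Lewis Carroll (1832-1898)",
    "publisher": "Project Gutenberg",
    "date_created": "June 27, 2008",
    "date_viewed": "October 18, 2010",
    "language": "English",
    "format": ["PDF", "HTML"],
    "media_type": "online book (ebook)",
    "rights": "Public domain in the USA",
    "subject": "Fantasy",
    "category": "Text",
}

# Print each field on its own line, like the bullet list.
for field, value in metadata.items():
    print(f"{field}: {value}")
```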

Sunday, October 17, 2010

data data datA ?

• What kinds of data can you analyze using your suite of tools?
We can analyze data retrieved from NY Times articles.

• What kind of information can you extract from the data?
Visual forms of the data presented. This includes word frequencies, amounts, geographical distribution, statistics, etc.

• What kinds of questions can you ask of your data using text analysis and data visualizations?
Questions that simplify a text and create discussion. This tool can show the data in different visual forms. For example, a word tree simplifies a text and lets you search for a certain point or word.

• What hidden patterns are revealed using text analysis and data visualization?
The hidden patterns were revealed through the different visual forms of data available. A word cloud represents the range of word frequencies, while a word tree represents the relationships between words and the sentences they appear in.

• Who would be most likely to perform this kind of text analysis or data visualization?
Anyone who is trying to single out a certain part of a text or statistic. This tool would be ideal for gathering information for a presentation or skimming through an essay to pick out the main points and/or words.

- *-* -* -* -* -* -* -* -* -* -* -* -* -* -

Why did your group choose the (type of) data/texts that you did?
My group chose President Obama’s inaugural speech because the tools we were given did not offer a good range of usable data forms. The tool was very limited with the articles it provided for us. The text we chose allowed us to test different visual forms, as opposed to the other articles, which were limited mainly to text clouds.

What parts of the tool did you use?
We were only successful using word clouds and word trees. As I said above, the tools were unfortunately limited to certain articles.

What did you find out about these texts?
We looked at the use of certain words within the text and their relation to the article.

Which elements of the tools produced the deepest kinds of analysis (i.e. which were the most useful)?
The most useful kind of text analysis was the word tree. A word tree demonstrates relations and connections between words and the article. This tool is great for picking out the points of an article.

For my text analysis and data visualization I will be focusing on Alice in Wonderland from the Project Gutenberg catalogue. I will create a Wordle to express different colors and word frequencies, build a Word Tree to show connections between frequently used words, and lastly extract data manually.

Thursday, October 7, 2010

Wonders of Wordle!

This text is based solely on Scarface, Bob Marley and Oasis: an all-time favorite movie, an all-time favorite artist and the band that I listened to throughout my childhood. This Wordle is basically positive; there is nothing negative about it other than the titles of songs (e.g. I Shot the Sheriff) and the storyline of Scarface. I was surprised at the way that Wordle created this formation. I found the randomized scattering quite fascinating. I also enjoyed how there are a lot of editing options. I loved how, when you customize your own color palette, it does not result in the solid colors that you chose; instead it blends them to create new ones, as if the colors you chose were just a base. You are able to remove, add, configure, color and change the layout completely without starting over. I played around with the word cloud and discovered that removing one word changes almost the whole formation. The placement is so precise that when you remove one word, it doesn't just leave a blank space, but re-scrambles the whole cloud.

I chose to create a tag cloud based on a Facebook message I received. Personally I liked Wordle more than Tag Crowd, just because you can be more creative and personalize your word cloud. With Tag Crowd, by contrast, there is only one style and no opportunity to edit or create your own work. One nice attribute of Tag Crowd is that you can click on a word to go to a web source of your choice. I would like to see more editing options in Tag Crowd: the ability to change colors, fonts, layouts, etc. A stop list is a list of insignificant words, designed to eliminate indexing of and retrieval by words like “an” and “the.”
Stemwedel’s “What’s the Point of a College Education?”
George Orwell’s 1984








Facebook Post!

  1. Steve Jobs’ 2005 Commencement Speech at Stanford
A personal Essay!

A poem from Representative Poetry Online
youtube!