Swaying from Truth: Candidates and Their Positions on Issues

Jyotishka Biswas, Eric Lau, Kalki Seksaria, Tiffany Wang

DataSculpture

The data say that Donald Trump and Hillary Clinton differ wildly both in political ideology and in general truthfulness, with Clinton trumping Trump in the latter. We want to tell this story because with the upcoming election, it’s important that people know both where the candidates stand and what they will say to gain political support.

In celebration (trepidation?) of the upcoming election, we looked at two data sets: the candidate files at PolitiFact and this New York Times interactive article on where the presidential candidates stand on various political issues.

We focused on Hillary Clinton and Donald Trump, who are the leading candidates (as of 3/14/2016) of their respective parties, and three issues — immigration, economy, and healthcare. From the New York Times article, we calculated how liberal each candidate is on a particular issue. From PolitiFact, we calculated the average truthfulness of statements within each issue category. In the resulting “pendulum chart”, we hoped to show a clear difference between the truthfulness and political stances of Clinton and Trump.
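
As a rough illustration of the "average truthfulness" step, here is a minimal sketch. The numeric scale assigned to each PolitiFact ruling and the sample rows are our own assumptions for illustration; they are not the scale or the data used for the sculpture.

```python
from collections import defaultdict

# Hypothetical numeric scale for PolitiFact rulings (an assumption,
# not necessarily the scale used for the sculpture).
SCORES = {
    "True": 5, "Mostly True": 4, "Half True": 3,
    "Mostly False": 2, "False": 1, "Pants on Fire": 0,
}

def average_truthfulness(statements):
    """Return {(candidate, issue): mean score} for (candidate, issue, ruling) tuples."""
    totals = defaultdict(lambda: [0, 0])  # running [sum of scores, number of statements]
    for candidate, issue, ruling in statements:
        entry = totals[(candidate, issue)]
        entry[0] += SCORES[ruling]
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up rows for illustration only.
sample = [
    ("Candidate A", "immigration", "True"),
    ("Candidate A", "immigration", "Half True"),
    ("Candidate B", "immigration", "False"),
]
result = average_truthfulness(sample)
```

Any ordered scale would work here; what matters for the pendulum chart is that the same scale is applied to both candidates.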

For our intended audience of moderately informed likely voters, we wanted to provide a lighthearted but informative view of these candidates. The sculpture was designed to be interactive: each pendulum head has example statements from the candidate on one side and a picture of the candidate’s face on the other, with the picture varying according to the candidate’s average truthfulness. The heads are designed to swing slightly when picked up, supporting the “swaying from the truth” metaphor. To help with this, we created a smaller two-sided card containing the legend and information about the display, meant to be picked up and read by the interested viewer.

Yet we wanted the presentation to be just as useful when viewed from a distance, which informed our use of bright colors, bold text, and the easy-to-understand physical variables of position and length. The result, we hope, is a presentation that provides information at finer levels of granularity as the viewer approaches it, but for which the general message is clear throughout. As for the message, our aim was to avoid showing obvious bias through visual design differences between the candidates — the goal is for the data, through the presentation, to speak for itself.

Do out of school suspensions correlate with school performance?

Team Members: Catherine Caruso, Jane Coffrin, Iris Fung, Katie Marlowe

The data say that higher school performance correlates with a lower number of out-of-school suspensions. In addition, schools that only administer in-school suspensions perform better than schools that administer out-of-school suspensions. We want to tell this story because we’d like to advise the Louisiana State School Board against current methods of discipline that may not be good for the student or the school as a whole. We would like to recommend that all out-of-school suspensions become in-school suspensions, or something of the sort. Our audience is the Louisiana State School Board.

When a child acts up in school, there are many ways to discipline him or her: in-school suspension, out-of-school suspension, even expulsion. However, some methods are better than others when it comes to the student’s academic trajectory and success throughout high school. Out-of-school suspension may seem like an attractive option for the school because the child is then off school grounds and is no longer the school’s responsibility. However, out-of-school suspension is problematic for the child. A child who is already having behavioral issues no longer has the structure, schedule and supervision that come with being in school. Removing a child from school may place them in an unsupervised home situation, or in an even worse situation on the street. Ultimately, out-of-school suspension may make the child less willing to follow rules and pay attention in class, causing his or her academic performance to decline. If there are enough out-of-school suspensions, the performance of the entire school may be negatively affected. (http://pediatrics.aappublications.org/content/112/5/1206; http://www.teachsafeschools.org/alternatives-to-suspension.html)

We targeted the Louisiana State School Board because school board members are in a position to actually make beneficial changes to the system in a way that parents or teachers cannot. It is also worth noting that Louisiana is notorious for strict disciplinary procedures – other groups have also worked to try to reduce or ban school suspensions (https://www.louisianabelieves.com/schools/public-schools/louisiana-safe-and-supportive-schools-initiative-(lsssi); http://www.nola.com/politics/index.ssf/2015/04/louisiana_student_suspensions.html) but they have not been successful yet. In addition to showing that schools with few out of school suspensions perform much higher than schools with many out of school suspensions, we also included information about the difference in school performance for schools that administer in school vs. out of school suspensions. The schools that only suspend students in school have a much higher school performance, which makes the case that in school suspensions are a better option for the students and the school as a whole.

Our choice to represent the data using a 3D diorama-like structure is a nod back to the grade school days of creating dioramas, a staple of school projects. The materials (pipe cleaners, construction paper) do the same, and the colors we chose are vibrant and eye-catching. The movement of children from middle school on the left to high school in the middle to graduation on the right leads the viewer’s eye from left to right to read the graph. The pipe cleaner colors (yellow for high performing schools and purple for low performing schools) contrast with each other well, and yellow often represents high achievement in academic settings. The suspensions are represented in red, a color commonly used to mean warning or stop. We only represented seven of the highest performing schools and seven of the lowest performing schools to simplify the information and to make the distinction between the two groups visually striking. Complete information about the schools we included, their suspension rates, and their school performance appears on the back of the sculpture for anyone seeking additional information. The inset about in-school vs. out-of-school suspensions serves to offer a viable solution to the problem we have presented, in hopes of motivating the board members not only to absorb the information, but also to start thinking about what action they can take to remedy the situation.

While a data sculpture is a rather unconventional method for presenting such serious information in a formal setting like a school board meeting, we thought our novel approach would surprise the board members, and pique their interest, giving us the opportunity to engage them on the topic and talk about the information and the issue at hand in more detail. It is also a tongue-in-cheek reference to projects their own students might be creating.

View of the front of our sculpture.
View of the back of our sculpture.

Fireworks: Fun & Dangerous

Judy Chang, Gary Burnett, Andrew Mikofalvy

We chose to use the National Electronic Injury Surveillance System (NEISS) as our dataset, accumulating the injury reports from 2009 to 2014. The dataset logs all injuries related to consumer products reported by a probability sample of hospitals across the country. We filtered the dataset to only look at injuries caused by fireworks. We want to tell this story because we want to raise awareness about the dangers of using fireworks. Our audience is consumers who may purchase fireworks to celebrate holidays, such as July 4th.

We only looked at fireworks-related injuries, and we counted the number of records by the body part injured via Tableau. Our goal was to see which parts of the body are most commonly injured by fireworks, and we found:

Body part, followed by total number of records:
Hand 329
Eyeball 260
Finger 201
Face 177
Foot 62
Trunk, upper 61
Leg, upper 49
Ear 49
Leg, lower 45
Arm, lower 35
Trunk, lower 29
>50% body 29
Ankle 27
Head 27
Neck 27
Mouth 16
Wrist 15
Knee 15
Arm, upper 13
Shoulder 9
Pubic region 9
Toe 9
Elbow 8
Not recorded 3
Internal 1
25-50% of body 1

The most common injuries are in the face, fingers, eyeballs, and hands. We wanted to demonstrate the gravity of these injuries by highlighting these body parts on the human body. We noticed there are roughly 4 clusters for the number of injuries: 0-10, 10-30, 30-100, and more than 100. Our data sculpture is hence a mannequin, where we painted each body part with the shade of red that corresponds to the number of injuries. We used yellow strings on the mannequin to demonstrate the boundaries of the body parts recorded in the dataset.
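
The four clusters above can be sketched as a simple binning function. The clusters as written share their endpoints (10, 30 and 100 each appear in two ranges), so the sketch assumes a boundary value falls in the lower cluster; no count in our table lands exactly on a boundary, so this choice does not affect the mannequin. The shade numbering (0 to 3) is also just a placeholder.

```python
def shade_for(count):
    """Map an injury count to one of four shade levels (0 = lightest red, 3 = darkest)."""
    if count > 100:
        return 3  # more than 100 injuries
    if count > 30:
        return 2  # 30-100
    if count > 10:
        return 1  # 10-30
    return 0      # 0-10

# A few rows from the table above.
counts = {"Hand": 329, "Eyeball": 260, "Finger": 201, "Face": 177,
          "Foot": 62, "Ankle": 27, "Toe": 9}
shades = {part: shade_for(n) for part, n in counts.items()}
```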

We also detached the hands to further illustrate that they are by far the most injured part of the body. The hands of the mannequin are also holding a firework, to show the audience the “source” of the red paint, and a stop sign that warns the audience that fireworks cause at least 1500 injuries every year and urges caution when using fireworks.


Our dataset is only a subset of all fireworks-related injuries; however, the number of injuries by body part is representative of fireworks-related injuries nationwide.

Can We Afford To Integrate Refugees Into the US?

By: Kenny Friedman, Mike Drachkovitch, and Felipe Lozano-Landinez

The data say that it costs about $65,000, on average, to integrate a refugee into the United States over a period of five years. We want to tell this story because in today’s political environment, which is exhibiting significant anti-immigrant and anti-refugee rhetoric, it is important to understand what it would actually take to grant asylum to global citizens in need in 2016.

The audience for our data sculpture is American citizens who reside in the State of New Hampshire. We further characterize this audience as those whose primary concern in the refugee debate is the economic impact of taking in refugees on their state resources, and would also venture to say that this audience is of a more conservative political inclination. Our goal is to help them understand the economic viability of taking in refugees in New Hampshire and encourage them to support refugee intake for this year.

In order to tell this story, we used three data sets:

The first data set is from a BuzzFeed article about US refugee data by Jeremy Singer-Vine, and can be found in raw format on GitHub. We used this data set to estimate the number of refugees that New Hampshire could expect to take in during 2016 (457), taking into account Obama’s increase in the refugee quota (from 70,000 to 85,000), the percentage of the quota that the US has fulfilled over the last 10 years (82%), and the percentage of US admitted refugees that New Hampshire took in annually between 2005 and 2015 (0.65%). This was a clean data set recommended to us by Rahul Bhargava.

The second data set is an analysis from the Center for Immigration Studies (CIS) regarding the cost of taking in a refugee over the first five years. We used this data set to estimate how much it would cost, on average, to integrate a refugee into the United States. We define “integrate” as having a refugee be resettled and established over a time span of five years in the US, to be consistent with the CIS analysis. For our “expected annual cost per single refugee integration” calculation, we used the aggregate five-year figure in the analysis and divided by 5 to get our number of $12,874.

Though we understand that CIS very much seems to have its biases against allowing immigration, we decided to use their data for two reasons: 1) Their analysis was the most thorough that we found online with regard to the economic cost of a refugee, and their methodology and data sources appear to be of good objective merit, well thought out and fairly done. The bias seems to come from the way the calculations are used, not the calculations themselves. 2) We realized that CIS’s potential bias would be of benefit to our story, because if it manifested in their calculations it would be in their interest to have the economic cost be as high as possible. Our story is about showing that this economic cost is not nearly as high as people think in the big picture; we are essentially using a “worst case” cost, and if our story can be impactful with it then it can only be stronger with a purportedly less biased estimate.

Our last data set is the 2016-2017 State of New Hampshire Budget, which provided us with the 2016 allocated state budget ($5.7 billion) information that we needed to appropriately size our data sculpture. This was taken directly from the Governor’s 2016-2017 Budget Bill.

We think our data sculpture is an appropriate and effective way to tell the data story because it re-frames large, abstract, and scary concepts of cost (spreadsheet numbers that are in the millions and billions) to more familiar conceptualizations of relative weight and relative volume. As such, comparisons can be made much more intuitively between how much it costs to integrate a refugee annually vs. the amount of money that is already in circulation for government purposes. The data is also very personal in the sense that it presents information specific to New Hampshire to citizens from New Hampshire. The experience of placing only a few Jelly Beans (each one represents $3M) from a bucket full of them (the state budget) and tipping the “balance of fate” for hundreds of refugees towards hope is very powerful, both because the audience has agency in this interactive display and also because it takes so little effort to make a huge impact.
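
To make the scale concrete, the arithmetic behind the jelly beans can be sketched as follows. Every input is a figure quoted in the text above; the variable names are ours, and the sketch is only a back-of-the-envelope check, not the spreadsheet we worked from.

```python
# Figures quoted in the text above.
refugees = 457                    # expected New Hampshire intake in 2016
annual_cost_per_refugee = 12_874  # USD per refugee per year (five-year CIS figure / 5)
state_budget = 5_700_000_000      # 2016 allocated state budget, USD
dollars_per_bean = 3_000_000      # value of one jelly bean, USD

annual_cost = refugees * annual_cost_per_refugee     # total yearly cost for the state
beans_for_refugees = annual_cost / dollars_per_bean  # beans the viewer moves
beans_in_bucket = state_budget / dollars_per_bean    # beans representing the budget
share_of_budget = annual_cost / state_budget         # fraction of the budget

print(f"Annual cost: ${annual_cost:,}")
print(f"Jelly beans: {beans_for_refugees:.2f} of {beans_in_bucket:.0f}")
print(f"Share of budget: {share_of_budget:.2%}")
```

With these inputs the annual cost comes to about $5.9 million, which is roughly two jelly beans out of a bucket of 1,900, or about 0.1% of the state budget.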

Photo:

Final Project Pic

Video Demonstration (to turn into a GIF, you can right-click and click on “Loop”):

Hubway Rides by Neighborhood over Time

Aneesh Agrawal, Kenny Friedman, and Katie Marlowe

The data show the routes that people commonly take using Hubway. We want to tell this story because Hubway can be a great transportation alternative for routes that the MBTA does not cover.

Our data came from hubwaydatachallenge.org, which was a challenge in 2012 to visualize data from Hubway rides. Our data includes information on rides from 2011-2013. We picked a chord chart to visualize this data because this type of chart emphasizes the connections between various stations. The thickness of the chords corresponds to the relative frequency of rides between the neighborhoods. The chord chart points out specific routes that are taken frequently, which leads to the question: Why are people taking Hubway bikes between these stations? Is it because the MBTA does not currently provide a good way to get between these destinations? Or is it just that there are a lot of people traveling between these areas? Specifically, we can look at the blue region (MIT) and the gray region (Back Bay). There is a thick chord between these areas. We know that it is pretty difficult to get between these areas via public transportation, since there isn’t a T line that runs between them. This could be a good indication of a route that many people travel without many options, so many of them decide to use Hubway.
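
For readers curious how chord thickness can be derived from ride records, here is a minimal sketch. The station names, the station-to-neighborhood lookup, and the choice to ignore ride direction are all assumptions made for illustration; the real lookup and the visualization code are not reproduced here.

```python
from collections import Counter

# Hypothetical station-to-neighborhood lookup (placeholder station names).
NEIGHBORHOOD = {"Station A": "MIT", "Station B": "MIT", "Station C": "Back Bay"}

def chord_weights(rides):
    """Count rides between unordered neighborhood pairs and normalize to shares."""
    counts = Counter()
    for start, end in rides:
        # Sort the pair so MIT -> Back Bay and Back Bay -> MIT feed the same chord.
        pair = tuple(sorted((NEIGHBORHOOD[start], NEIGHBORHOOD[end])))
        counts[pair] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Made-up rides as (start station, end station) tuples.
rides = [("Station A", "Station C"), ("Station C", "Station B"),
         ("Station A", "Station B"), ("Station C", "Station A")]
weights = chord_weights(rides)
```

Each share in `weights` would then set the relative thickness of the chord for that pair of neighborhoods.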

If you look at the data over time, you can see that some stations didn’t exist at the beginning, but were built partway through the period this dataset covers. By the end of the timeframe, these stations account for about half of overall monthly usage. This points to the conclusion that expanding the Hubway system is effective, and we recommend expanding it further. In late 2015, Hubway did announce some future plans for expansion.

View the visualization here.

Do Smear Campaigns work?

Group members: Michelle Thomas, Reem Alfaiz, Andrew Mikofalvy

The data show the effectiveness of negative ad campaigns as well as their tendency to be used as a last resort for gaining support and lowering support for competitors. We wanted to tell this story because of the oversaturation of campaign ads and our curiosity about the effectiveness of negativity.

We used data from the Political TV Ad Archive to look at which candidates negative ads were targeting as well as when they were being published. We compared those findings to voter results in each primary and caucus, using data from the New York Times. We chose to tell this story through a line graph representing voter results overlaid with a bar graph depicting the air count of negative ads per candidate. Both graphs are plotted over the time frame of February 1st to March 1st, that is, from the date of the first caucus until Super Tuesday, a day on which 12 states and 1 territory hold their primaries and caucuses. This layout allows people to see the race between candidates and relate it to when campaigns chose to start using negative ads, as well as whether the candidates they targeted did worse or not. We felt that this is an effective time frame as Super Tuesday is a large sample of voting results and holds a perceived weight for campaign success. We chose to use delegates won as the measure of candidate success because it accounts for the relative importance of states, and negative ads were shown more in some states than in others. We also included explanatory text so that the chart is understandable regardless of how familiar the reader is with the political system.


View infographic here

Political TV ads: not what you think they’re about

Team members: Kalki Seksaria, Gary Burnett, Michael Drachkovitch, Argyro Nicolaou

The data say that the top topic choices for political TV ads in Iowa did not always match the issues voters considered to be the most important in that state. We want to tell this story because it provides insight into the logic of political TV ads during a primary, where the emphasis is less on pitting the voters against the rival political party and more on differentiating same-party candidates and getting voters to the polls.

We mainly worked with the Political TV ad data set but also used data from the CNN entrance polls from the Iowa primaries.


We first had to clean up the data set, giving each topic its own column, since the ‘topic’ cells were populated with every topic mentioned in each ad. After aggregating the number of times each topic came up, we picked the top five issues for each party. In order to attribute ad affiliation as either Republican or Democrat we worked with the Sponsor column and not the candidate column since the latter included every candidate mentioned in the ad. We made a list of each of the Sponsors and researched their affiliation. Our charts exclude unaffiliated, non-profit donors (there were only two such organizations that advertised in Iowa anyway).
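
The clean-up step described above, giving each topic its own column and then counting how often each topic comes up, can be sketched like this. The comma separator, the column names, and the sample rows are assumptions for illustration; we did this step by hand and in Tableau rather than with this code.

```python
def topic_columns(rows, sep=","):
    """Turn a multi-topic cell into one 0/1 column per topic, then count topics."""
    # Collect every distinct topic mentioned anywhere in the data.
    topics = sorted({t.strip() for row in rows
                     for t in row["topic"].split(sep) if t.strip()})
    expanded = []
    for row in rows:
        present = {t.strip() for t in row["topic"].split(sep)}
        # Keep the original fields and add one indicator column per topic.
        expanded.append({**row, **{t: int(t in present) for t in topics}})
    totals = {t: sum(r[t] for r in expanded) for t in topics}
    return expanded, totals

# Made-up rows; the sponsor names are placeholders.
rows = [
    {"sponsor": "Sponsor X", "topic": "Economy, Healthcare"},
    {"sponsor": "Sponsor Y", "topic": "Economy"},
    {"sponsor": "Sponsor X", "topic": "Immigration, Economy"},
]
expanded, totals = topic_columns(rows)
top_five = sorted(totals, key=totals.get, reverse=True)[:5]
```

Grouping `expanded` by the researched party affiliation of each sponsor before counting would give the per-party top five.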

Having done this work, we used Tableau to create a line graph per issue per party (10 graphs total) mapping topic against time.

The CNN entrance polls on the Iowa primaries were used to make a bar chart of when Republican and Democratic voters made their decisions. Since the timing offered by the entrance poll survey came as categorical values (‘Today’; ‘last week’; ‘last month’), we had to decide which range of dates to include under each of these terms, to make sure that a relationship existed between the two datasets. Kalki describes the process: I first converted the categories into dates. Today = 2/1/16 (Iowa Caucuses Date). Last few days = the 2 days before the caucuses. For before last month, I assumed it meant between 1 and 3 months ago. I then assumed that the number of people who decided in a time window were evenly distributed over that time window. For example, if 30% of people decided last month, and “last month” included 23 days (30 day month – last 7 days are “last week” or shorter), then 30% / 23 = 1.3% decided each day.
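
The even-distribution assumption in Kalki’s description can be sketched as follows. The exact start and end dates of the “last month” window are our guess at one way to get the 23 days mentioned above; only the 30% and 23-day figures come from the example itself.

```python
from datetime import date, timedelta

def spread_evenly(share_pct, start, end):
    """Spread share_pct uniformly over the inclusive date range start..end."""
    days = (end - start).days + 1
    return {start + timedelta(days=i): share_pct / days for i in range(days)}

caucus_day = date(2016, 2, 1)
# Assumed "last month" window: from 30 days before the caucuses up to 8 days
# before, leaving the final 7 days to the "last week" and shorter categories.
window_start = caucus_day - timedelta(days=30)
window_end = caucus_day - timedelta(days=8)

per_day = spread_evenly(30.0, window_start, window_end)
daily_value = per_day[window_start]  # same value on every day in the window
```

Here the window contains 23 days and each day receives 30 / 23, or about 1.3 percentage points, matching the example above.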

Our choice to present the most popular ad topics and most important topics according to voters as a table aims at pointing to the unexpected discrepancies between the two sets of information. What other reasons could there be for pushing a specific ad topic, even if voters don’t think it is important? To try to understand this, we plotted each party’s top-5 ad topics as a line graph against time and superimposed an area chart that maps the timing of voter decisions in order to see whether there is a correlation between certain ad topics being blasted out to voters and when the voters made up their mind.

 


A Day in the Life of a Hubway


By Jyotishka Biswas, Phillip Graham, and Maddie Kim

The data say that Hubway has had a positive impact on health and the environment in the Greater Boston Area. We want to tell this story to show that choosing to bike can make a difference.

The Hubway Bike Share system launched in 2011, and completed over 1 million rides over the next two years. We decided to look at the benefits of biking on health and the environment, and to quantify the impact that Hubway has had along these dimensions.

We chose to focus on the positive message for this assignment, as if we were part of Hubway’s marketing team, which guided many of the decisions we made. The first was to de-emphasize the charts — we used only two charts, to show the age and gender breakdown of Hubway users, statistics which were fun facts rather than central to our message. When it came to the core of the infographic, we presented medians rather than distributions to avoid unnecessary complexity. Primarily, we focused on keeping the tone light and fun, to make the reader more receptive to the message.

The result is a scrollable infographic, in which the story is told in a loose sequential frame format. The numbers are communicated in the context of a day in the life of a Hubway bike, and small comments in speech bubbles are used to signpost the flow of the story and provide some humor. We used bright colors to frame our content, and large, bold text to emphasize important numbers. We made the conscious decision to have a clear opinion and message, rather than to lay out our analysis and ask readers to assess it for themselves. We believe that this resulted in a more accessible presentation, and hopefully one which is as informative as it is enjoyable.

You can find the infographic here. (It’s made up of large images, so don’t click if you’re worried about data usage.)

 

 

Campaign Strategy 101: Winning Hearts and Minds

By Felipe Lozano-Landinez, Jane Coffrin, and Julia Appel

The Political TV Ad Archive contains information about the televised ads during the 2016 primary campaign season. Our goal with this project was to explore this data set and see what interesting campaign strategy insights we could derive by looking at which candidates sponsored ads on which TV shows. To do this, we cleaned/modified the data set to specifically focus on candidates via what ads they sponsored (not which ones they appeared in), the program on which each aired, and the ad’s emotive content (i.e. positive, negative, or mixed). We took a subset of the data (all TV programs with more than 500 ads aired as of the time that we downloaded the data), and also filtered out all the presidential candidates who haven’t been relevant in the race as of the last couple of weeks. Finally, we grouped TV shows into four “Show Types”: Talk Shows, Entertainment, Game Shows, and News.
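
The subsetting and grouping described above can be sketched roughly as follows. The program-to-“Show Type” lookup lists only three programs as examples, the sponsors are placeholders, and the sample airings are made up; the real assignments covered every program that passed the 500-ad threshold.

```python
from collections import Counter

# Partial, illustrative lookup from program name to "Show Type".
SHOW_TYPE = {
    "Today": "News",
    "The Late Show": "Talk Shows",
    "The Tonight Show": "Talk Shows",
}

def ads_by_show_type(airings, min_ads=500):
    """Keep programs with more than min_ads airings, then count ads per show type."""
    per_program = Counter(program for program, _sponsor in airings)
    kept = {program for program, n in per_program.items() if n > min_ads}
    return Counter(SHOW_TYPE[program] for program, _sponsor in airings if program in kept)

# Made-up airings as (program, sponsor) tuples; a tiny threshold keeps the demo small.
airings = ([("Today", "Candidate A")] * 3
           + [("The Late Show", "Candidate A")] * 3
           + [("The Tonight Show", "Candidate B")])
by_type = ads_by_show_type(airings, min_ads=2)
```

In this toy run The Tonight Show is dropped because it has only one airing, which mirrors how low-volume programs fell out of our subset.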

We looked at the data in multiple layers through a series of increasingly granular questions: How did the ads get segmented by “Show Type”? Did a particular political party dominate a specific “Show Type”? Were Republicans more likely to advertise on certain types of shows than Democrats? Were there specific TV programs/shows that were targeted by specific candidates? Finally, were the ads sponsored by these candidates “pro” ads, meant to bolster their candidacy, or “con” ads, meant to bring down another candidate’s campaign?

We think this is an effective way to ask questions of the data, and ultimately derive an interesting story from them, because our top-down approach enabled us to look at the big picture, notice discrepancies, and then dig further to try and explain them. We wanted to tell a few stories that surprised people; our approach helped us look at something that made sense on the surface (candidates advertise more on news shows), but maybe not at a deeper level (Donald Trump advertised significantly less than the two remaining Republican candidates in the race).

We believe that campaign strategists are strategic in their message targeting, but wanted to better understand how they target TV viewers, and whether or not they have different assumptions than we do about the political inclinations of TV viewers. We also wanted to see if the actions of a candidate’s campaign would differ from the conventional wisdom that normal Americans have about those candidates. On the surface level, our views/perspectives may align, but when we dig deeper we deconstruct our perspectives and demonstrate where things begin to differ, leading to greater understanding of the larger political atmosphere.

***

If you prefer to get your late night comedy from Stephen Colbert rather than Jimmy Fallon and you’re a registered Republican planning on voting for Donald Trump, Marco Rubio’s campaign manager Terry Sullivan knows it. And he’s trying to change your mind.

While it may come as no surprise that campaign strategists profile TV viewers to target political ads and maximize impact, it may be surprising what shows they are actually targeting. What we found about the political ads you’ve seen this election cycle may give insight into the campaign strategies of the front running Republican and Democratic candidates. It also may give you a reason to change the channel.

First, we looked at which categories of TV show were most likely to air an ad from one of the nine major political candidates.

Distribution of Ads over 4 Show Types

If you’re watching a news show (the Today Show, for example), you’re almost twice as likely to see an ad for a political candidate as on any other type of TV show. Which political candidates? Bernie Sanders and Marco Rubio.


What surprised us most about the breakdown of ads on news shows wasn’t who was advertising most frequently; it was who wasn’t. 

 

Notice Donald Trump, Republican nominee frontrunner and winner of seven states on Super Tuesday. He ran nearly 2/3 fewer ads than Marco Rubio, half as many as Jeb Bush (who didn’t even make it to Super Tuesday), and 300 fewer than Ted Cruz.

Emotive Content of Republican Candidate Sponsored News Ads

Also, Donald Trump didn’t waste time attacking his opponents: he ran no attack ads (those marked with “con” emotive content) on news shows. Jeb Bush, Ted Cruz, and Marco Rubio, on the other hand, were slinging mud all over the place.

Why are we seeing fewer ads, and no negative ads from Donald Trump? Maybe he doesn’t need to run attack ads, since much of his media presence revolves around negative commentary of his opponents? Maybe he doesn’t need to spend as much money on traditional media because of his polarizing candidacy? Whatever the reason, it looks like we won’t be seeing any traditionally slanderous campaign ads from the Donald any time soon.

After looking at news shows, we looked at the category that aired the second most political ads: talk shows. This is where Marco Rubio’s campaign strategy got interesting… We wanted to see the breakdown of advertisements from Republican candidates on talk shows. 
Marco Rubio out-advertised his rivals by a margin of 2 to 1. Jeb Bush and Donald Trump were, again, distant second and third place runners-up to the TV advertising machine, Marco Rubio.

Republican Ads on The Late Show and The Tonight Show

 

Rubio’s advertisements weren’t all positive, either. Note the lack of attack ads from Donald Trump on both shows, as compared with Rubio, Bush, and Cruz.

Again we wondered: why is Marco Rubio trying to win the hearts and minds of these TV viewers? Is he trying to attract young voters, and perhaps draw them from the front-running Democratic candidate? Is he trying to appeal to young voters as a moderate candidate? Is he a moderate candidate?

Only time will tell whether these ad strategies — or lack thereof — will really influence voters; until then, we will continue to wonder how the political strategists are targeting our favorite TV shows.

Attack of the Chains?

By Catherine Caruso, Kendra Pierre-Louis and Tiffany Wang

Every year since 2008, the New York City-based nonprofit Center for an Urban Future has compiled a tally of the number of national retail chains located across the city. Underpinning their analysis is the unspoken assumption that retail chains are somehow a detriment to the fabric of the city. In that regard, they have some support.

“The unintended consequence of their [chain stores] victories through the 1970s and beyond,” writes The Geography of Nowhere author James Howard Kunstler in a 2013 post in the Huffington Post, “was the total destruction of local economic networks, that is, Main Streets and downtowns, in effect destroying many of their own livelihoods.”

A number of films, such as Wal-Mart: The High Cost of Low Price, associate chain stores like McDonald’s, Bed Bath & Beyond, Whole Foods, and Wal-Mart with the economic and cultural destruction of the communities in which they are located. Instead of buying from national retailers, we’re told that the best thing for local communities is to buy from local, independent retailers, a message promoted through what are called “buy local” campaigns.

There is some evidence suggesting that it may be better to buy local. In 2003 the Maine based Institute for Local Self-Reliance found that for every dollar spent at a local business, 45 cents stayed in the local community. Another nine percent stayed within the state. For chain stores, however, only 14 cents remained within the local community. The rest trickled out to the national management along with distant product suppliers. Their findings suggest that communities with chain stores would be economically weaker than those without them, and we wanted to see if this was the case in New York City.

We used Center for an Urban Future’s data on chain stores, and cross referenced it with income data to see if communities with higher incomes have fewer chain stores.
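
A cross-reference of this kind can be sketched as a simple join followed by a group-by. The area names, the bracket edges, and the numbers below are placeholders we made up for illustration; they are not the Center for an Urban Future data or the income brackets used in our chart.

```python
def chains_by_income_bracket(chain_counts, median_income, brackets):
    """Average chain-store count per area within each income bracket."""
    sums = {label: [0, 0] for label, _low, _high in brackets}  # [total chains, areas]
    for area, n_chains in chain_counts.items():
        income = median_income.get(area)
        if income is None:
            continue  # skip areas that are missing from the income data
        for label, low, high in brackets:
            if low <= income < high:
                sums[label][0] += n_chains
                sums[label][1] += 1
                break
    # Only report brackets that contain at least one area.
    return {label: total / n for label, (total, n) in sums.items() if n}

# Made-up inputs keyed by a hypothetical area identifier.
chain_counts = {"Area 1": 40, "Area 2": 30, "Area 3": 10}
median_income = {"Area 1": 35_000, "Area 2": 45_000, "Area 3": 120_000}
brackets = [("low", 0, 50_000), ("middle", 50_000, 100_000),
            ("high", 100_000, float("inf"))]
averages = chains_by_income_bracket(chain_counts, median_income, brackets)
```

Comparing the averages across brackets is what lets us ask whether higher-income areas tend to have fewer chain stores.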

State of the Chains_Corrected

As you can tell, for the most part that’s exactly what we found. A few exceptions existed among middle-income people, but they’re within a reasonable margin of error. Generally speaking, in New York City, if you make less than $44,000 a year your neighborhood is going to be rife with chain stores, and if you’re making more than $84,500 a year your neighborhood will have very few.

One word of caution: this doesn’t tell us why this correlation exists, merely that it does exist. It could be that chain stores remove income from communities, or it could be cultural, since one signal of gentrification is the emergence of local neighborhood shops. It could also be that higher income individuals prefer to live in neighborhoods with fewer retail chains.