Urban Unrest Database Analysis

Click To Go Back To My Portfolio

Question 1: Distribution of Total Deaths Per Problem Type

Assuming the maxdeath is true for each event available (or the mindeath is true for events with a death flag but an unknown maxdeath), visualize and compare the two following distributions:

Using data from all events globally, what proportion of all deaths resulted from each problem type?
Using data from all events that occured in the country with the most total deaths globally, what proportion of all deaths resulted from each problem type?

To collect the results, I first made a list of all events that had a death, and included their maxdeaths (or their mindeaths in the case that maxdeaths are unkown but the death flag is activated). I then grouped this list by problem type to find the sum of all deaths per problem type. I transferred this list into the first bar chart using pyplot from matplotlib.

For the second bar chart, I first had to find the country with the most total deaths. This turned out to be Rwanda. Once I had that, I made a list of all events in Rwanda that had a death, and included their maxdeaths as done before. I then grouped this list by problem type to find the sum of all deaths per problem type. I transferred this list into the second bar chart using pyplot from matplotlib.

After visualizing the two charts, It's clear that "Armed Battle/Clash" type events have alloted for a majority of the deaths both globally and in Rwanda. In Rwanda however, almost every death is from these types of events vs globally where only about 5/6 of all total deaths are from these types of events. This makes me wonder how big of an impact "Armed Battle/Clash" events from Rwanda are effecting the global totals.

Question 2: Distribution of Total Deaths Per Problem Type Excluding Armed Battle Events in Rwanda

Assuming the maxdeath is true for each event available (or the mindeath is true for events with a death flag but an unknown maxdeath), visualize and compare the following distribution:

Using data from all events globally except for Armed Battle/Clash type events in Rwanda, what proportion of all deaths resulted from each problem type?

To collect the results I had to first the country with the most total deaths, which is Rwanda, and find the tuples for "Armed Battle/Clash" events in Rwanda. I then made a list of all events globally except those in Rwanda that are type "Armed Battle/clash". I grouped this list by problem type to find the sum of all deaths per problem type. I then transferred this list into the second bar chart using pyplot from matplotlib.

This new chart gives us a much better insight as to the difference in proportions between problem types besides "Armed Battle/Clash" (though this is still the problem type with the most deaths even while excluding the Rwanda genocide). We can now see that the proportion for "General Warfare" is about twice as large than that of "Armed Attack", the next biggest proportion.

The greater visualization this new chart shows us howver is how big of a chunk of all global deaths in the database are from "Armed Battle/Clash" in Rwanda. It appears that these type of events make up slightly more than half of all deaths. As a subquestion below, I will investigate to see what proportion of these are from the Rwanda genocide of 1994 by displaying each "Armed Battle/Clash" event in Rwanda.

Question 3: Deaths Per Event of Each Problem Type (Excluding the Rwanda Genocide of 1994) vs Deaths Per Event of the Rwanda Genocide of 1994

Assuming the maxdeath is true for each event available (or the mindeath is true for events with a death flag but an unknown maxdeath), for each problem type, find the average amount of deaths per each Urban Unrest event of this problem type with a confirmed death. Do not include events from the Rwanda Genocide of 1994 while calculating the ratio of the "Armed Battle/Clash" problem type.
Display this information on two different bar charts:

Include the ratio of deaths/event of the Rwanda Genocide of 1994, and compare to all problem types.
Show only the ratios for each problem type.

To collect this data I first used the previoud list of the maximum possible deaths for each event, and grouped it into each problem type to find the sum amount of deaths, as well as the count of the amount of events. I then compared these two values to find the ratio for each problem type. I then used SQL's "UNION" syntax to join that list with another. This second list found all events that were a part of the Rwanda genocide of 1994, and found the ratio of deaths per event for only these events. I then put this unioned list into the first bar graph using pyplot from matplotlib. I then used the first list (before the union with the ratio for each problem type only) to create the second bar graph using pyplot from matplotlib.

The first bar graph shows us two things. First is that the Rwanda genocide of 1994 is only a few different events, each with an incredibly high death count, instead of a lot of events with a large death count. Second is just how much more deadly this genocide was than any other Urban Unrest events. The Problem Type with the highest deaths per event ratio is Armed Battle/Clash (not including the Rwanda genocide), but the ratio for the Rwanda genocide is over 2,500 times the ratio for all other Armed Battle/Clash events.

The second bar graph gives us insights on the differences in ratios between each problem type. In conjunction with the pie chart from question two, we can analyze if a problem type has plenty events with low deaths, or a few events with high deaths. For example, Armed Attack was the 3rd highest death total in the pie chart, but has the lowest ratio of deaths per event out of any problem type. This shows us that there's a large number of different armed attacks in the database, but each one has reletively low casualties.

Question 4: Total Deaths From All Events in Rwanda vs Total Deaths From Events in All Other Countries Combined

Assuming the maxdeath is true for each event available (or the mindeath is true for events with a death flag but an unknown maxdeath). Find the total amount of deaths that have occured from Urban Unrest events in Rwanda. Then, using a bar chart, compare this total to the total amount of deaths that have occured from Urban Unrest events in every country besides Rwanda.

To collect this data, I had two find two different totals, with similar queries. The first found the total deaths (using the previoud maximum deaths possible list) of all events that have occured in Rwanda. The second finds the total deaths from all events that occured in countries that are not rwanda. I then used SQL's UNION syntax to combine these results into a single query. I then took the outputted totals and turned them into a bar graph using pyplot from matplotlib.

The results of this query really hits the nail on the head of just how brutal the Rwanda genocide of 1994 is. With just 5 events, the death total in Rwanda is almost as big as the death total of every other country combined. This chart shows us really just how large of a proportion of the total deaths in the database were from the Rwanda Genocide of 1994.

Question 5: For Each Country, Whether or Not A Majority of Urban Unrest Deaths Occured Within the Capitol

The Rwanda Genocide of 1994 occured in the capital of Rwanda, Kigali. Let's investigate whether or not countries have had more deaths in their capital city, or outside of their capital city. Assuming the maxdeath is true for each event available (or the mindeath is true for events with a death flag but an unknown maxdeath), for each country, analyize all Urban Unrest Events in the country and compare the amount of deaths that occured in the capital city vs the amount of deaths total.
Show this data visually through a world map where a country is red if a majority of it's deaths occured in the capital, blue if the majority of deaths were outside of the capital city, and green if there are no deaths found in the database for that country.

To get this result, I first made two lists using the list of the maximum possible deaths for each event. One being the amount of total deaths for each country, the other being the amount of total deaths for each country in that countries capital city. I then did a leftjoin between the first list and second list, so that I still had data for a country if it had deaths but none in it's capital city. I then produced a ratio for each country of total deaths in capital over total deaths, the ratio being 0 if there were no recorded deaths in it's capital city. I then used the SQL 'CASE' Syntax to assign either a 1 or a 0 to each country: 1 if it has had a majority of it's deaths in the capital, 0 if not. I then translated this into a world map, showing countries as red if they have a 1, blue if they have a 0, and green if the sequal query didn't return the country, which means there were no recorded deaths from an Urban Unrest Event in that country in the database. To make this map I used pygal_maps_world.

The first and most obvious takeaway is that for a majority of countries with a death recorded in the database, they had a majority of their recorded deaths happen in the capital city. Next we can look at each region to see if we can take something away. We see in South America and Africa, a majority of countries are in the red (have had more than half of their total deaths in the capital city). For the Middle East and Oceania, it seems that almost all countries have deaths recorded, and most of them are in the red. Asia, not including the Middle East, has a lot of countries in the blue meaning that there are deaths recorded in the database for these countries, but a majority of them are outside of the capital city. For europe on the other hand, almost every country is green meaning that there are no deaths recorded in the database.

Thanks for reading, I hope you enjoyed and learned something new!