Module #9: Visualization in R
In this project, I explored the ShipAccidents.csv dataset and created three visualizations in R: a basic histogram, a bar plot using Lattice, and a customized bar chart with ggplot2.
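The first visualization was a basic histogram of the service column, which I treated as ship service years (the AER package's ShipAccidents data, which this file appears to match, documents this variable as aggregate months of service). This quick plot showed how the values are spread across the 40 rows of the dataset, and it made the extreme values easy to spot: the median is 782, while the maximum is 44,882. Base graphics take more manual work to polish, so this kind of plot is better suited to initial exploration than to detailed analysis.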
Next, I used Lattice to create a bar plot of incidents by ship type. Grouping the bars by ship type gives a quick view of which types are associated with more incidents. Lattice handles grouped data well, but I found it less flexible to customize than ggplot2.
Finally, I used ggplot2 to create another bar chart of incidents by ship type, this time with more customization: I colored the bars by ship type and rotated the x-axis labels 45 degrees for readability. The chart looks more polished, although ggplot2 takes more time to learn and set up than the other two methods. It’s my go-to for high-quality visualizations, especially when presentation matters.
In conclusion, each visualization method has its pros and cons. Base R is great for quick, simple plots; Lattice is useful for grouped data; and ggplot2 is the most flexible of the three, ideal for polished, customizable charts. The choice depends on what you need most: speed, clarity, or customization.
> # Load the dataset from the Desktop
> ship_accidents <- read.csv("/Users/zayjenings/Desktop/ShipAccidents.csv")
>
> # View the first few rows of the dataset
> head(ship_accidents)
rownames type construction operation service incidents
1 1 A 1960-64 1960-74 127 0
2 2 A 1960-64 1975-79 63 0
3 3 A 1965-69 1960-74 1095 3
4 4 A 1965-69 1975-79 1095 4
5 5 A 1970-74 1960-74 1512 6
6 6 A 1970-74 1975-79 3353 18
> # Check the column names and types of the data
> str(ship_accidents)
'data.frame': 40 obs. of 6 variables:
$ rownames : int 1 2 3 4 5 6 7 8 9 10 ...
$ type : chr "A" "A" "A" "A" ...
$ construction: chr "1960-64" "1960-64" "1965-69" "1965-69" ...
$ operation : chr "1960-74" "1975-79" "1960-74" "1975-79" ...
$ service : int 127 63 1095 1095 1512 3353 0 2244 44882 17176 ...
$ incidents : int 0 0 3 4 6 18 0 11 39 29 ...
>
> # Summarize the dataset
> summary(ship_accidents)
rownames type construction operation
Min. : 1.00 Length:40 Length:40 Length:40
1st Qu.:10.75 Class :character Class :character Class :character
Median :20.50 Mode :character Mode :character Mode :character
Mean :20.50
3rd Qu.:30.25
Max. :40.00
service incidents
Min. : 0.0 Min. : 0.0
1st Qu.: 175.8 1st Qu.: 0.0
Median : 782.0 Median : 2.0
Mean : 4089.3 Mean : 8.9
3rd Qu.: 2078.5 3rd Qu.:11.0
Max. :44882.0 Max. :58.0
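The summary above reports only Length, Class, and Mode for type, construction, and operation, because read.csv() left them as character columns. One possible follow-up, sketched here on a stand-in data frame built from the six rows printed by head() (the names `ship_head` and `cat_cols` are illustrative assumptions, not part of the original session), is to convert them to factors so that summary() reports counts per level:

```r
# Stand-in data: the six rows printed by head(ship_accidents) above.
ship_head <- data.frame(
  type         = rep("A", 6),
  construction = c("1960-64", "1960-64", "1965-69", "1965-69", "1970-74", "1970-74"),
  operation    = c("1960-74", "1975-79", "1960-74", "1975-79", "1960-74", "1975-79"),
  service      = c(127, 63, 1095, 1095, 1512, 3353),
  incidents    = c(0, 0, 3, 4, 6, 18),
  stringsAsFactors = FALSE
)

# Convert the categorical columns from character to factor
cat_cols <- c("type", "construction", "operation")
ship_head[cat_cols] <- lapply(ship_head[cat_cols], factor)

# summary() now shows how many rows fall into each level
summary(ship_head)

# Row counts per construction period
construction_counts <- table(ship_head$construction)
print(construction_counts)
```

The same two conversion lines should apply unchanged to the full ship_accidents data frame.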
> # Create a histogram for the service column
> hist(ship_accidents$service, main="Histogram of Ship Service Years",
+ xlab="Service Years", col="lightblue", border="black")
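Because the summary shows a median of 782 against a maximum of 44,882, most values in a plain histogram end up squeezed into the first bin. One possible refinement is to plot the values on a log scale. The sketch below is illustrative only: it uses the six rows printed by head() as stand-in data, and the names `ship_head`, `positive_service`, and `log_service`, as well as the choice of log10(), are assumptions rather than part of the original analysis.

```r
# Stand-in data: the service values from the six rows printed by head() above.
ship_head <- data.frame(
  service = c(127, 63, 1095, 1095, 1512, 3353)
)

# log10(0) is -Inf, and the full dataset has a minimum of 0,
# so keep only the positive values before transforming.
positive_service <- ship_head$service[ship_head$service > 0]
log_service <- log10(positive_service)

# plot = FALSE returns the bin counts without drawing anything
bins <- hist(log_service, plot = FALSE)

# Draw the histogram on the log scale, styled like the original plot
hist(log_service,
     main = "Histogram of log10(service)",
     xlab = "log10(service)", col = "lightblue", border = "black")
```

With the full data, replacing ship_head with ship_accidents should spread the bars out much more evenly than the raw scale does.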

> # Load lattice package
> library(lattice)
>
> # Create a bar plot showing incidents by ship type
> barchart(incidents ~ type, data=ship_accidents,
+ main="Incidents by Ship Type",
+ xlab="Ship Type", ylab="Number of Incidents",
+ col="lightcoral")
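One thing to watch for: the dataset has several rows per ship type (all six rows shown by head() are type A), and barchart() draws one bar per row, so bars for the same type appear to be drawn on top of each other rather than added up. If total incidents per type is what is wanted, one option is to aggregate first. This is a minimal sketch under that assumption, using only the type and incidents columns of the six head() rows as stand-in data; `ship_head`, `totals`, and `p` are illustrative names.

```r
library(lattice)

# Stand-in data: type and incidents from the six rows printed by head() above.
ship_head <- data.frame(
  type      = rep("A", 6),
  incidents = c(0, 0, 3, 4, 6, 18)
)

# Sum incidents within each ship type, giving one row per type
totals <- aggregate(incidents ~ type, data = ship_head, FUN = sum)
totals$type <- factor(totals$type)

# One bar per type; origin = 0 makes every bar start at zero
p <- barchart(incidents ~ type, data = totals, origin = 0,
              main = "Total Incidents by Ship Type",
              xlab = "Ship Type", ylab = "Number of Incidents",
              col = "lightcoral")
print(p)
```

The stand-in data contains only type A, so this draws a single bar; with ship_accidents in place of ship_head there should be one bar per ship type.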
> # Load ggplot2 package
> library(ggplot2)
>
> # Create a bar chart for incidents by ship type
> ggplot(ship_accidents, aes(x=type, y=incidents, fill=type)) +
+ geom_bar(stat="identity") +
+ labs(title="Incidents by Ship Type", x="Ship Type", y="Number of Incidents") +
+ theme(axis.text.x = element_text(angle=45, hjust=1))
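With geom_bar(stat="identity"), ggplot2 stacks every row of a ship type into a single bar, so the bar heights in the chart above are already totals per type. A slightly more explicit variant, sketched here under the assumption that totals are the intended quantity, aggregates first and uses geom_col(), which is ggplot2's shorthand for the identity stat. The stand-in data is again the six rows printed by head(), and `ship_head`, `totals`, and the geom_text() labels are illustrative additions rather than part of the original chart.

```r
library(ggplot2)

# Stand-in data: type and incidents from the six rows printed by head() above.
ship_head <- data.frame(
  type      = rep("A", 6),
  incidents = c(0, 0, 3, 4, 6, 18)
)

# One row per ship type holding the summed incidents
totals <- aggregate(incidents ~ type, data = ship_head, FUN = sum)

# geom_col() plots the y values as given; geom_text() prints each total above its bar
p <- ggplot(totals, aes(x = type, y = incidents, fill = type)) +
  geom_col() +
  geom_text(aes(label = incidents), vjust = -0.5) +
  labs(title = "Total Incidents by Ship Type",
       x = "Ship Type", y = "Number of Incidents") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
```

Aggregating up front also leaves a small `totals` table that can be inspected or reported alongside the chart.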
>

