“ggpubr” package in R for Data Visualization
We are going to use the “ggpubr” package for data visualization.
It provides easy-to-use functions for creating and customizing “ggplot2”-based, publication-ready plots.
We install the “ggpubr” package with:
install.packages("ggpubr")
We load the “ggpubr” package with:
library(ggpubr)
We set the seed of the random number generator so that the random data we create can be reproduced.
set.seed(1234)
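As a small illustration of why the seed matters, the sketch below resets the seed and draws the same numbers twice. The names first_draw and second_draw are used only for this example.

```r
# Draw three normal random numbers after setting the seed
set.seed(1234)
first_draw <- rnorm(3)

# Reset the seed and draw again: the numbers repeat exactly
set.seed(1234)
second_draw <- rnorm(3)

identical(first_draw, second_draw)
```

If you run this snippet while following along, call set.seed(1234) once more before creating the data so that your numbers match the ones in this post.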
We create a data frame that contains the variables ‘sex’ and ‘weight’. We use the rnorm() function to generate random numbers from a normal distribution: the first 300 numbers have mean 45 and the next 300 have mean 49 (the standard deviation is left at its default of 1).
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 300)),
  weight = c(rnorm(300, 45), rnorm(300, 49)))
We check the first four rows of the data frame wdata:
head(wdata, 4)
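To confirm that the simulated groups behave as intended, one can compute the mean weight per sex with base R. This sketch rebuilds wdata so that it runs on its own; group_means is a name introduced here.

```r
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 300)),
  weight = c(rnorm(300, 45), rnorm(300, 49))
)

# Mean weight within each level of sex
group_means <- tapply(wdata$weight, wdata$sex, mean)
round(group_means, 2)
```

The two means should lie close to 45 and 49, the values passed to rnorm().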
We create a density plot by using the ggdensity() function.
The first argument specifies the data frame and x names the variable to plot. The add argument draws a line at the mean of each group. Setting rug = TRUE adds a marginal rug, short tick marks along the x-axis that show the individual observations behind the density curve. The color and fill arguments color the outline and the area under the curve by the value of sex, and palette supplies the colors used for the groups.
ggdensity(wdata, x = "weight",
  add = "mean", rug = TRUE,
  color = "sex", fill = "sex",
  palette = c("#00AFBB", "#E7B800"))
We plot a histogram with the same options by using the gghistogram() function.
gghistogram(wdata, x = "weight",
  add = "mean", rug = TRUE,
  color = "sex", fill = "sex",
  palette = c("#00AFBB", "#E7B800"))
By default, gghistogram() uses 30 bins. Here we set bins = 50 instead:
gghistogram(wdata, x = "weight",
  add = "mean", rug = TRUE,
  color = "sex", fill = "sex", bins = 50,
  palette = c("#00AFBB", "#E7B800"))
We changed the number of bins to 50 to see how the histogram changes. With more bins, each bar is narrower, so the plot shows the shape of the distribution in finer detail, at the cost of noisier bar heights.
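To get a feel for what the bins argument changes, the sketch below counts observations in 30 and 50 equal-width intervals using base R's cut(). This is only an approximation of what gghistogram() does, since ggplot2 may place the bin boundaries slightly differently; the helper count_bins is hypothetical.

```r
set.seed(1234)
weight <- c(rnorm(300, 45), rnorm(300, 49))

# Count observations in n equal-width intervals spanning the data
count_bins <- function(x, n) {
  table(cut(x, breaks = n))
}

counts_30 <- count_bins(weight, 30)
counts_50 <- count_bins(weight, 50)

# Approximate width of one bin in each case
diff(range(weight)) / c(30, 50)

# Largest count in any single bin
c(max(counts_30), max(counts_50))
```

Every observation falls into exactly one interval in both cases; only the width of the intervals, and therefore the count per interval, changes.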
Next, we work with the ToothGrowth dataset. We load it with the following code:
data("ToothGrowth")
We can read the description of the ToothGrowth dataset with:
?ToothGrowth
df <- ToothGrowth
We look at the first four observations of the ToothGrowth dataset.
head(df, 4)
We create a box plot by using the ggboxplot() function. Its arguments are:
data – a data frame
x – character string containing the name of the x variable
y – character string (or vector) containing the name of one or more variables to plot
color – outline color
palette – the color palette to be used for coloring or filling by groups
add – character vector for adding another plot element; here we add "jitter" to overlay the individual points
shape – the point shape (symbol) used for the points in each group
We want to draw box plots of len (tooth length) for each dose.
We can check the dose values with the following code:
unique(df$dose)
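A quick count per dose shows how many observations each box will summarize. The sketch uses only base R, and dose_counts is a name chosen for this example.

```r
df <- ToothGrowth

# Number of observations at each dose level
dose_counts <- table(df$dose)
dose_counts

# dose is stored as a number, not as a factor
class(df$dose)
```

Although dose is numeric, it takes only three distinct values, so it works naturally as a grouping variable on the x-axis.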
p <- ggboxplot(df, x = "dose", y = "len",
  color = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  add = "jitter", shape = "dose")
p
We use the stat_compare_means() function to add mean-comparison p-values to a ggplot, such as a box plot, dot plot or strip chart.
The arguments of stat_compare_means() used here are:
comparisons – a list of length-2 vectors. The entries of each vector are either the names of two values on the x-axis or two integers giving the indices of the groups to be compared.
label.y – the y coordinate of the label; we set it to 50 to position the global p-value label absolutely.
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))
p + stat_compare_means(comparisons = my_comparisons) + # Add pairwise comparison p-values
  stat_compare_means(label.y = 50) # Add the global p-value
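To see where such p-values come from, the comparisons can be reproduced approximately with base R. This sketch assumes the defaults described in the stat_compare_means() documentation (a Wilcoxon test for two groups and a Kruskal-Wallis test for the global comparison); the exact numbers printed on the plot may differ slightly depending on how ties are handled. The helper pairwise_p is hypothetical.

```r
df <- ToothGrowth
my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))

# Wilcoxon rank-sum p-value for one pair of dose levels
pairwise_p <- function(pair) {
  a <- df$len[df$dose == as.numeric(pair[1])]
  b <- df$len[df$dose == as.numeric(pair[2])]
  # exact = FALSE avoids the warning about ties in the data
  wilcox.test(a, b, exact = FALSE)$p.value
}

p_values <- sapply(my_comparisons, pairwise_p)
names(p_values) <- sapply(my_comparisons, paste, collapse = " vs ")
p_values

# Global comparison across all three doses
global_p <- kruskal.test(len ~ dose, data = df)$p.value
global_p
```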
We create violin plots with box plots inside. The add.params argument passes parameters such as color, shape, size or fill to the added element; here it fills the inner box plots with white.
ggviolin(df, x = "dose", y = "len", fill = "dose",
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  add = "boxplot", add.params = list(fill = "white")) +
  stat_compare_means(comparisons = my_comparisons, label = "p.signif") + # Add significance levels
  stat_compare_means(label.y = 50) # Add the global p-value
We can also create dot plots and add the mean and standard deviation of each group to the plot.
ggdotplot(df, x = "dose", y = "len", color = "dose", fill = "dose",
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  add = "mean_sd", add.params = list(color = "black"))
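The values that add = "mean_sd" summarizes can be checked by hand. The sketch below computes the mean and standard deviation of len for each dose with base R; dose_stats is a name introduced for this example.

```r
df <- ToothGrowth

# Mean and standard deviation of tooth length per dose
dose_stats <- data.frame(
  dose = sort(unique(df$dose)),
  mean = as.numeric(tapply(df$len, df$dose, mean)),
  sd   = as.numeric(tapply(df$len, df$dose, sd))
)
dose_stats
```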
We create a new data frame:
df3 <- data.frame(supp = rep(c("AB", "SK"), each = 3),
  dose = rep(c("D0.5", "D1", "D2"), 2),
  len = c(7.2, 12, 34, 5, 8, 34.2))
We print df3:
print(df3)
We create a bar plot whose fill color depends on the “supp” group. Setting label = TRUE prints the values on the bars; lab.col sets the label color to white and lab.pos sets the label position, so lab.pos = "in" places the labels inside the bars.
ggbarplot(df3, x = "dose", y = "len",
  fill = "supp", color = "supp", palette = c("#00AFBB", "#E7B800"),
  label = TRUE, lab.col = "white", lab.pos = "in")
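With two supp values per dose, the bars in this plot are stacked on top of each other, assuming the default position of ggbarplot() is left unchanged. As a rough check, the total height of each stacked bar is the sum of len over both supp groups, which base R can compute; total_len is a name used only here.

```r
df3 <- data.frame(
  supp = rep(c("AB", "SK"), each = 3),
  dose = rep(c("D0.5", "D1", "D2"), 2),
  len  = c(7.2, 12, 34, 5, 8, 34.2)
)

# Sum of len over both supp groups for each dose
total_len <- tapply(df3$len, df3$dose, sum)
total_len
```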
We draw line plots with multiple groups. Here we plot len against dose, with the line type, point shape and color grouped by supp.
ggline(df3, x = "dose", y = "len",
  linetype = "supp", shape = "supp",
  color = "supp", palette = c("#00AFBB", "#E7B800"))
We can create a pie chart by using the ggpie() function.
We create a data frame df4:
df4 <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(22, 19, 45))
We check the data frame df4:
df4
We create a new variable labs that combines each group name with its value.
labs <- paste0(df4$group, " (", df4$value, "%)")
ggpie(df4, x = "value", fill = "group", color = "white",
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  label = labs, lab.pos = "in", lab.font = "white")
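Note that the values in df4 add up to 86 rather than 100, so the labels show the raw values followed by a percent sign. If true shares are wanted, one possible approach is to compute them first; pct and labs_pct are names introduced here.

```r
df4 <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(22, 19, 45)
)

# Share of each group in the total, rounded to one decimal place
pct <- round(100 * df4$value / sum(df4$value), 1)

# Labels of the form "Male (25.6%)"
labs_pct <- paste0(df4$group, " (", pct, "%)")
labs_pct
```

These labels could then be passed to ggpie() through the label argument in place of labs.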
Next, we work with the “mtcars” dataset. We load it with:
data("mtcars")
We create a new object to store the mtcars dataset.
dfm <- mtcars
We convert the cyl variable to a factor:
dfm$cyl <- as.factor(dfm$cyl)
We add a new column, name, to store the names of the cars:
dfm$name <- rownames(dfm)
We check the first observations of the dfm dataset:
head(dfm[, c("wt", "mpg", "cyl")])
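It is worth checking that the conversion worked before plotting, since the grouping in the plots depends on cyl being a factor. A short base R check follows; cyl_counts is a name chosen for this sketch.

```r
dfm <- mtcars
dfm$cyl <- as.factor(dfm$cyl)
dfm$name <- rownames(dfm)

# cyl is now a factor with three levels instead of a number
levels(dfm$cyl)

# Number of cars in each cylinder group
cyl_counts <- table(dfm$cyl)
cyl_counts
```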
We create a scatter plot with concentration ellipses and point labels. Setting mean.point = TRUE adds the group mean points, and repel = TRUE keeps the text labels from overlapping.
ggscatter(dfm, x = "wt", y = "mpg",
  color = "cyl", shape = "cyl",
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  ellipse = TRUE, mean.point = TRUE,
  rug = TRUE, label = "name", font.label = 10, repel = TRUE)
We create a bar plot and sort the values in descending order with sort.val = "desc". We fill the bars by cyl value and set the bar borders to white. We set sort.by.groups = FALSE so that the bars are not sorted within each group. We use x.text.angle = 90 to rotate the x-axis text by 90°.
ggbarplot(dfm, x = "name", y = "mpg",
  fill = "cyl",            # Change fill color by cyl
  color = "white",         # Set bar border colors to white
  palette = "jco",         # jco journal color palette, see ?ggpar
  sort.val = "desc",       # Sort the values in descending order
  sort.by.groups = FALSE,  # Don't sort inside each group
  x.text.angle = 90        # Rotate the x-axis text vertically
)
If we change sort.by.groups to TRUE, the bars are sorted within each group. Here we also sort in ascending order:
ggbarplot(dfm, x = "name", y = "mpg",
  fill = "cyl",            # Change fill color by cyl
  color = "white",         # Set bar border colors to white
  palette = "jco",         # jco journal color palette, see ?ggpar
  sort.val = "asc",        # Sort the values in ascending order
  sort.by.groups = TRUE,   # Sort inside each group
  x.text.angle = 90        # Rotate the x-axis text vertically
)
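The two orderings can be previewed with base R's order(). This is only a sketch of the idea and not how ggbarplot() is implemented; by_value and by_group are names introduced here.

```r
dfm <- mtcars
dfm$cyl <- as.factor(dfm$cyl)
dfm$name <- rownames(dfm)

# Descending by mpg, ignoring groups (similar to sort.by.groups = FALSE)
by_value <- dfm[order(dfm$mpg, decreasing = TRUE), c("name", "mpg", "cyl")]
head(by_value, 3)

# Ascending by mpg within each cyl group (similar to sort.by.groups = TRUE)
by_group <- dfm[order(dfm$cyl, dfm$mpg), c("name", "mpg", "cyl")]
head(by_group, 3)
```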
We create a dot chart with the following code:
ggdotchart(dfm, x = "name", y = "mpg",
  color = "cyl",
  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
  sorting = "ascending",
  add = "segments",
  ggtheme = theme_pubr()
)
The ggtheme = theme_pubr() argument applies the ggpubr publication theme to the plot.
You can see the full list of arguments of the ggdotchart() function with:
?ggdotchart
We create a rotated dot chart of mpg against the car names of the mtcars dataset, grouped by cyl:
ggdotchart(dfm, x = "name", y = "mpg",
  color = "cyl",                                # Color by groups
  palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
  sorting = "descending",                       # Sort values in descending order
  add = "segments",                             # Add segments from y = 0 to dots
  rotate = TRUE,                                # Rotate vertically
  group = "cyl",                                # Order by groups
  dot.size = 6,                                 # Large dot size
  label = round(dfm$mpg),                       # Add mpg values as dot labels
  font.label = list(color = "white", size = 9,
                    vjust = 0.5),               # Adjust label parameters
  ggtheme = theme_pubr()                        # ggplot2 theme
)