“ggpubr” package in R for Data Visualization

ggpubr

We are going to use the “ggpubr” package for data visualization.

It provides easy-to-use functions for creating and customizing “ggplot2”-based, publication-ready plots.

The “ggpubr” package must be installed once and then loaded in each R session before its functions can be used.
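A minimal sketch of both steps, assuming the package is installed from CRAN. The requireNamespace() guard is an addition here so that the install is skipped when ggpubr is already present:

```r
# Install ggpubr from CRAN only if it is not already available
if (!requireNamespace("ggpubr", quietly = TRUE)) {
  install.packages("ggpubr")
}

# Attach the package for the current session
library(ggpubr)
```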

We set the seed of the random number generator so that the random data we create can be reproduced.

set.seed(1234)
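As a quick illustration of what the seed does (a side sketch, not part of the walkthrough itself), resetting it to the same value reproduces the same draws:

```r
set.seed(1234)
a <- rnorm(3)

set.seed(1234)
b <- rnorm(3)

# Same seed, same numbers
identical(a, b)  # TRUE

# Reset the seed so the steps that follow start from the same state
set.seed(1234)
```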

We create a data frame that contains the variables ‘sex’ and ‘weight’. We use the rnorm() function to generate random numbers from a normal distribution: the first 300 values have mean 45 and the next 300 have mean 49. The standard deviation is left at its default of 1.

wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 300)),
  weight = c(rnorm(300, 45), rnorm(300, 49)))

We check the first four rows of the data frame wdata as:

head(wdata, 4)
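A quick sanity check on the simulated data, sketched here with base R's tapply(): the group means should land close to the 45 and 49 passed to rnorm(). The block rebuilds wdata so that it runs on its own.

```r
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 300)),
  weight = c(rnorm(300, 45), rnorm(300, 49)))

# Mean weight per group
group_means <- tapply(wdata$weight, wdata$sex, mean)
round(group_means, 2)

# Number of rows per group
table(wdata$sex)
```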

We create a density plot by using the ggdensity() function.

The first argument specifies the data frame and x names the variable to plot. add = "mean" draws a line at the mean of each group. rug = TRUE adds a rug, a row of small tick marks along the x-axis that shows the individual observations. The color and fill arguments set the outline and fill colors according to the value of sex, and palette supplies the colors used for the groups.

ggdensity(wdata, x = "weight",
          add = "mean", rug = TRUE,
          color = "sex", fill = "sex",
          palette = c("#00AFBB", "#E7B800"))

We plot a histogram with the same options by using the gghistogram() function.

gghistogram(wdata, x = "weight",
            add = "mean", rug = TRUE,
            color = "sex", fill = "sex",
            palette = c("#00AFBB", "#E7B800"))

By default, the histogram is drawn with 30 bins. The bins argument changes this number; here we set it to 50.

gghistogram(wdata, x = "weight",
            add = "mean", rug = TRUE,
            color = "sex", fill = "sex", bins = 50,
            palette = c("#00AFBB", "#E7B800"))

We changed bins to 50 to see how the histogram differs. The bars are now narrower, so the shape of each distribution is shown in finer detail.
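gghistogram() also accepts a binwidth argument, which fixes the width of each bin instead of the number of bins. A minimal sketch, where the width of 0.25 is an arbitrary value chosen for illustration and wdata is rebuilt so that the block runs on its own:

```r
library(ggpubr)

set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 300)),
  weight = c(rnorm(300, 45), rnorm(300, 49)))

# Same histogram as before, but with a fixed bin width (illustrative value)
p_bw <- gghistogram(wdata, x = "weight",
                    add = "mean", rug = TRUE,
                    color = "sex", fill = "sex",
                    binwidth = 0.25,
                    palette = c("#00AFBB", "#E7B800"))
p_bw
```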

Next, we work with the ToothGrowth dataset. We load it by using the following code:

data("ToothGrowth")

We check the description of the ToothGrowth dataset and store it in a new object df as:

?ToothGrowth

df <- ToothGrowth

We look at the first four rows of the ToothGrowth dataset.

head(df, 4)

We create a box plot by using the ggboxplot() function. The arguments used here are:

data – a data frame

x – character string containing the name of the x variable

y – character string containing one or more variables to plot

color – outline color

palette – the color palette to be used for coloring or filling by groups

add – character vector for adding another plot element; here we add “jitter” points

shape – the shape (symbol) of the added points

We want to draw a box plot of len (tooth length) for each dose.

We can check the distinct dose values by using the following code:

unique(df$dose)

p <- ggboxplot(df, x = "dose", y = "len",
               color = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
               add = "jitter", shape = "dose")

p

We use the stat_compare_means() function to add mean-comparison p-values to a ggplot such as a box plot, dot plot or stripchart.

The arguments of stat_compare_means() used here are:

comparisons – a list of length-2 vectors. The entries of each vector are either the names of two values on the x-axis or two integers giving the indices of the groups to be compared.

label.y – the absolute y position of the label. We set label.y = 50 for the global p-value.

my_comparisons <- list(c("0.5", "1"), c("1", "2"), c("0.5", "2"))

p + stat_compare_means(comparisons = my_comparisons) +  # Add pairwise comparison p-values
  stat_compare_means(label.y = 50)                      # Add the global p-value
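The p-values can also be inspected as a table rather than read off the plot. A sketch using ggpubr's compare_means(), which defaults to the Wilcoxon test; the Kruskal-Wallis call is shown here as one option for a single global test across the three doses:

```r
library(ggpubr)

# Pairwise comparisons between dose levels (Wilcoxon test by default)
pairwise <- compare_means(len ~ dose, data = ToothGrowth)
pairwise

# A single global test across all dose levels
global <- compare_means(len ~ dose, data = ToothGrowth,
                        method = "kruskal.test")
global
```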

We create violin plots with box plots inside. The add.params argument passes parameters such as color, shape and size to the added element; here it fills the box plots with white.

ggviolin(df, x = "dose", y = "len", fill = "dose",
         palette = c("#00AFBB", "#E7B800", "#FC4E07"),
         add = "boxplot", add.params = list(fill = "white")) +
  stat_compare_means(comparisons = my_comparisons, label = "p.signif") +  # Add significance levels
  stat_compare_means(label.y = 50)                                        # Add the global p-value

We can also create dot plots and add a mean and standard deviation marker for each group.

ggdotplot(df, x = "dose", y = "len", color = "dose", fill = "dose",
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          add = "mean_sd", add.params = list(color = "black"))
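To see the numbers behind the mean and standard deviation markers, they can be computed with base R. This is a sketch that is independent of ggpubr; dose_mean and dose_sd are names introduced here:

```r
df <- ToothGrowth

# Mean and standard deviation of tooth length for each dose
dose_mean <- tapply(df$len, df$dose, mean)
dose_sd   <- tapply(df$len, df$dose, sd)

data.frame(dose = names(dose_mean),
           mean = round(as.numeric(dose_mean), 2),
           sd   = round(as.numeric(dose_sd), 2))
```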

We create a new data frame as:

df3 <- data.frame(supp = rep(c("AB", "SK"), each = 3),
                  dose = rep(c("D0.5", "D1", "D2"), 2),
                  len = c(7.2, 12, 34, 5, 8, 34.2))

We print df3 as:

print(df3)

We create a bar plot with the bars filled by “supp” group. label = TRUE prints each value on its bar, lab.col sets the label color to white, and lab.pos = "in" places the labels inside the bars.

ggbarplot(df3, x = "dose", y = "len",
          fill = "supp", color = "supp", palette = c("#00AFBB", "#E7B800"),
          label = TRUE, lab.col = "white", lab.pos = "in")

We draw a line plot with multiple groups. Here, len is plotted against dose, and the line type, point shape and color are all grouped by supp.

ggline(df3, x = "dose", y = "len",
       linetype = "supp", shape = "supp",
       color = "supp", palette = c("#00AFBB", "#E7B800"))

We can create a pie chart by using the ggpie() function.

We create a data frame df4 as:

df4 <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(22, 19, 45))

We check the data frame df4 as:

df4

We create a new variable labs that combines each group name with its value.

labs <- paste0(df4$group, " (", df4$value, "%)")

ggpie(df4, x = "value", fill = "group", color = "white",
      palette = c("#00AFBB", "#E7B800", "#FC4E07"),
      label = labs, lab.pos = "in", lab.font = "white")
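Note that the values in df4 sum to 86, so labs shows the raw values followed by a percent sign. If true percentages are wanted, one option is to compute each group's share of the total first. This is a sketch; pct and labs_pct are names introduced here:

```r
df4 <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(22, 19, 45))

# Share of the total for each group, in percent
pct <- round(100 * df4$value / sum(df4$value), 1)

# Labels such as "Male (25.6%)"
labs_pct <- paste0(df4$group, " (", pct, "%)")
labs_pct
```

labs_pct can then be passed to ggpie() through the label argument in place of labs.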

Next, we work with the “mtcars” dataset. We load it as:

data("mtcars")

We create a new object to store the mtcars dataset.

dfm <- mtcars

We convert the cyl variable to a factor.

dfm$cyl <- as.factor(dfm$cyl)

We add a new column, name, to store the names of the cars.

dfm$name <- rownames(dfm)

We check the first rows of dfm:

head(dfm[, c("wt", "mpg", "cyl")])

We create a scatter plot with concentration ellipses and labels. We use repel = TRUE to keep the text labels from overlapping.

ggscatter(dfm, x = "wt", y = "mpg",
          color = "cyl", shape = "cyl",
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          ellipse = TRUE, mean.point = TRUE,
          rug = TRUE, label = "name", font.label = 10, repel = TRUE)

We create a bar plot and sort the bars in descending order of mpg by using sort.val = "desc". The bars are filled by cyl value and given white borders. sort.by.groups = FALSE sorts across all cars rather than within each group, and x.text.angle = 90 rotates the x-axis text by 90°.

ggbarplot(dfm, x = "name", y = "mpg",
          fill = "cyl",               # Change fill color by cyl
          color = "white",            # Set bar border colors to white
          palette = "jco",            # jco journal color palette, see ?ggpar
          sort.val = "desc",          # Sort the values in descending order
          sort.by.groups = FALSE,     # Don't sort inside each group
          x.text.angle = 90           # Rotate x-axis text vertically
)

When sort.by.groups is set to TRUE, the bars are sorted within each cyl group. Here we also switch to ascending order.

ggbarplot(dfm, x = "name", y = "mpg",
          fill = "cyl",               # Change fill color by cyl
          color = "white",            # Set bar border colors to white
          palette = "jco",            # jco journal color palette, see ?ggpar
          sort.val = "asc",           # Sort the values in ascending order
          sort.by.groups = TRUE,      # Sort inside each group
          x.text.angle = 90           # Rotate x-axis text vertically
)
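To make the two sorting modes concrete, here is roughly the same ordering expressed with base R's order(). This only illustrates the idea and is not ggpubr's internal implementation; global_desc and grouped_asc are names introduced here:

```r
dfm <- mtcars
dfm$cyl <- as.factor(dfm$cyl)
dfm$name <- rownames(dfm)

# Like sort.val = "desc" with sort.by.groups = FALSE:
# one global ordering by mpg, highest first
global_desc <- dfm[order(-dfm$mpg), c("name", "mpg", "cyl")]
head(global_desc, 3)

# Like sort.val = "asc" with sort.by.groups = TRUE:
# order by cyl group first, then by mpg within each group
grouped_asc <- dfm[order(dfm$cyl, dfm$mpg), c("name", "mpg", "cyl")]
head(grouped_asc, 3)
```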

We create a dot chart by using the following code:

ggdotchart(dfm, x = "name", y = "mpg",
           color = "cyl",
           palette = c("#00AFBB", "#E7B800", "#FC4E07"),
           sorting = "ascending",
           add = "segments",
           ggtheme = theme_pubr()
)

The ggtheme = theme_pubr() argument applies ggpubr's publication-ready theme to the plot.

We now create a more customized dot chart of mpg by car name from the mtcars dataset. You can see the full list of ggdotchart() arguments in its help page:

?ggdotchart

ggdotchart(dfm, x = "name", y = "mpg",
           color = "cyl",                                # Color by groups
           palette = c("#00AFBB", "#E7B800", "#FC4E07"), # Custom color palette
           sorting = "descending",                       # Sort values in descending order
           add = "segments",                             # Add segments from y = 0 to dots
           rotate = TRUE,                                # Rotate vertically
           group = "cyl",                                # Order by groups
           dot.size = 6,                                 # Large dot size
           label = round(dfm$mpg),                       # Add mpg values as dot labels
           font.label = list(color = "white", size = 9,
                             vjust = 0.5),               # Adjust label parameters
           ggtheme = theme_pubr()                        # ggplot2 theme
)
