Overlay Discrete and Continuous X Axis Ggplot
Customizing Graphs
Graph defaults are fine for quick data exploration, but when you want to publish your results to a blog, paper, article or poster, you'll probably want to customize the results. Customization can improve the clarity and attractiveness of a graph.
This chapter describes how to customize a graph's axes, gridlines, colors, fonts, labels, and legend. It also describes how to add annotations (text and lines).
Axes
The x-axis and y-axis represent numeric, categorical, or date values. You can modify the default scales and labels with the functions below.
Quantitative axes
A quantitative axis is modified using the scale_x_continuous or scale_y_continuous function.
Options include
- breaks - a numeric vector of positions
- limits - a numeric vector with the min and max for the scale
    # customize numerical x and y axes
    library(ggplot2)
    ggplot(mpg, aes(x = displ, y = hwy)) +
      geom_point() +
      scale_x_continuous(breaks = seq(1, 7, 1), limits = c(1, 7)) +
      scale_y_continuous(breaks = seq(10, 45, 5), limits = c(10, 45))
              
Figure 10.1: Customized quantitative axes
Numeric formats
The scales package provides a number of functions for formatting numeric labels. Some of the most useful are
- dollar
- comma
- percent
Let's demonstrate these functions with some synthetic data.
    # create some data
    set.seed(1234)
    df <- data.frame(xaxis = rnorm(50, 100000, 50000),
                     yaxis = runif(50, 0, 1),
                     pointsize = rnorm(50, 1000, 1000))
    library(ggplot2)
    # plot the axes and legend with formats
    ggplot(df, aes(x = xaxis, y = yaxis, size = pointsize)) +
      geom_point(color = "cornflowerblue", alpha = .6) +
      scale_x_continuous(label = scales::comma) +
      scale_y_continuous(label = scales::percent) +
      scale_size(range = c(1, 10),  # point size range
                 label = scales::dollar)
                
Figure 10.2: Formatted axes
To format currency values as euros, you can use label = scales::dollar_format(prefix = "", suffix = "\u20ac").
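The formatters used for axis labels are ordinary functions, so they can be tried on raw numbers before being attached to a scale. The following is a minimal sketch of that idea; the name euro is a hypothetical helper introduced only for illustration.

```r
library(scales)

# Apply the formatters directly to numbers to preview the labels
comma(1234567)   # adds thousands separators
percent(0.25)    # shows a proportion as a percentage
dollar(1000)     # adds a leading dollar sign

# Hypothetical helper: a euro formatter built as described above
euro <- dollar_format(prefix = "", suffix = "\u20ac")
euro(1500)
```

Previewing a formatter this way makes it easier to check the result before passing it to a scale function.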
Categorical axes
A categorical axis is modified using the scale_x_discrete or scale_y_discrete function.
Options include
- limits - a character vector (the levels of the categorical variable in the desired order)
- labels - a character vector of labels (optional labels for these levels)
    library(ggplot2)
    # customize categorical x axis
    ggplot(mpg, aes(x = class)) +
      geom_bar(fill = "steelblue") +
      scale_x_discrete(limits = c("pickup", "suv", "minivan", "midsize",
                                  "compact", "subcompact", "2seater"),
                       labels = c("Pickup\nTruck", "Sport Utility\nVehicle",
                                  "Minivan", "Mid-size", "Compact",
                                  "Subcompact", "2-Seater"))
              
Figure 10.3: Customized categorical axis
Date axes
A date axis is modified using the scale_x_date or scale_y_date function.
Options include
- date_breaks - a string giving the distance between breaks like "2 weeks" or "10 years"
- date_labels - a string giving the formatting specification for the labels
The table below gives the formatting specifications for date values.
| Symbol | Meaning | Example | 
|---|---|---|
| %d | day as a number (01-31) | 01-31 | 
| %a | abbreviated weekday | Mon | 
| %A | unabbreviated weekday | Monday | 
| %m | month (01-12) | 01-12 | 
| %b | abbreviated month | Jan | 
| %B | unabbreviated month | January | 
| %y | 2-digit year | 07 | 
| %Y | 4-digit year | 2007 | 
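These symbols are the same ones that base R's format function accepts for Date values, so a specification can be checked on a single date before it is used in date_labels. This is a small sketch; the date itself is an arbitrary value chosen for illustration.

```r
# An arbitrary example date (assumption, for illustration only)
d <- as.Date("2007-01-05")

format(d, "%d")     # day of the month, zero-padded
format(d, "%m")     # month number, zero-padded
format(d, "%y")     # 2-digit year
format(d, "%Y")     # 4-digit year
format(d, "%b-%y")  # abbreviated month name and 2-digit year
```

Month and weekday names depend on the current locale, so the text produced by %a, %A, %b, and %B may differ between systems.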
    library(ggplot2)
    # customize date scale on x axis
    ggplot(economics, aes(x = date, y = unemploy)) +
      geom_line(color = "darkgreen") +
      scale_x_date(date_breaks = "5 years", date_labels = "%b-%y")
              
Figure 10.4: Customized date axis
Here is a help sheet for modifying scales developed from the online help.
Colors
The default colors in ggplot2 graphs are functional, but often not as visually appealing as they can be. Happily this is easy to change.
Specific colors can be
- specified for points, lines, bars, areas, and text, or
 - mapped to the levels of a variable in the dataset.
 
Specifying colors manually
To specify a color for points, lines, or text, use the color = "colorname" option in the appropriate geom. To specify a color for bars and areas, use the fill = "colorname" option.
Examples:
- geom_point(color = "blue")
- geom_bar(fill = "steelblue")
Colors can be specified by name or hex code.
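As a rough illustration of the two forms, a color name can be converted to its hex code with base R, and either form can then be passed to a geom. The function name_to_hex is a hypothetical helper written for this sketch, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: look up the hex code for a named R color
name_to_hex <- function(name) {
  rgb(t(col2rgb(name)), maxColorValue = 255)
}

name_to_hex("steelblue")
name_to_hex("cornflowerblue")

# The hex code can be used wherever a color name is accepted
p_hex <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = name_to_hex("steelblue"))
p_hex
```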
To assign colors to the levels of a variable, use the scale_color_manual and scale_fill_manual functions. The former is used to specify the colors for points and lines, while the latter is used for bars and areas.
Here is an example, using the diamonds dataset that ships with ggplot2. The dataset contains the prices and attributes of nearly 54,000 round cut diamonds.
    # specify fill color manually
    library(ggplot2)
    ggplot(diamonds, aes(x = cut, fill = clarity)) +
      geom_bar() +
      scale_fill_manual(values = c("darkred", "steelblue", "darkgreen", "gold",
                                   "brown", "purple", "grey", "khaki4"))
              
Figure 10.5: Manual color selection
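The same approach works for points and lines with scale_color_manual. One option, sketched here, is to pass a named vector so that each level is tied to a specific color regardless of level order. The color choices and the name drv_colors are assumptions made for this example.

```r
library(ggplot2)

# Hypothetical color choices for the three levels of drv in the mpg data
drv_colors <- c("4" = "darkred", "f" = "steelblue", "r" = "darkgreen")

p_manual <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_manual(values = drv_colors)
p_manual
```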
If you are aesthetically challenged like me, an alternative is to use a predefined palette.
Color palettes
There are many predefined color palettes available in R.
RColorBrewer
The most popular alternative palettes are probably the ColorBrewer palettes.
                                    
                
Figure 10.6: RColorBrewer palettes
You can specify these palettes with the scale_color_brewer and scale_fill_brewer functions.
    # use a ColorBrewer fill palette
    ggplot(diamonds, aes(x = cut, fill = clarity)) +
      geom_bar() +
      scale_fill_brewer(palette = "Dark2")
                
Figure 10.7: Using RColorBrewer
Adding direction = -1 to these functions reverses the order of the colors in a palette.
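A short sketch of the reversed palette, reusing the diamonds bar chart from above (the object name p_rev is just a label for this example):

```r
library(ggplot2)

# Same chart as before, with the Dark2 palette applied in reverse order
p_rev <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2", direction = -1)
p_rev
```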
Viridis
The viridis palette is another popular choice.
For continuous scales use
- scale_fill_viridis_c
- scale_color_viridis_c
For discrete (categorical) scales use
- scale_fill_viridis_d
- scale_color_viridis_d
    # Use a viridis fill palette
    ggplot(diamonds, aes(x = cut, fill = clarity)) +
      geom_bar() +
      scale_fill_viridis_d()
                
Figure 10.8: Using the viridis palette
Other palettes
Other palettes to explore include dutchmasters, ggpomological, LaCroixColoR, nord, ochRe, palettetown, pals, rcartocolor, and wesanderson.
If you want to explore all the palette options (or nearly all), take a look at the paletter package.
To learn more about color specifications, see the R Cookbook page on ggplot2 colors. Also see the color choice advice in this book.
Points & Lines
Points
For ggplot2 graphs, the default point is a filled circle. To specify a different shape, use the shape = # option in the geom_point function. To map shapes to the levels of a categorical variable, use the shape = variablename option in the aes function.
Examples:
- geom_point(shape = 1)
- geom_point(aes(shape = sex))
Available shapes are shown below.
                                
              
Figure 10.9: Point shapes
Shapes 21 through 26 provide for both a fill color and a border color.
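For instance, a filled circle with a separate border could be drawn as in the sketch below. The particular colors and size are arbitrary choices for illustration.

```r
library(ggplot2)

# Shape 21 is a circle: fill sets the interior, color sets the border
p_shape <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 21, fill = "cornflowerblue", color = "black", size = 3)
p_shape
```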
Lines
The default line type is a solid line. To change the linetype, use the linetype = # option in the geom_line function. To map linetypes to the levels of a categorical variable, use the linetype = variablename option in the aes function.
Examples:
- geom_line(linetype = 1)
- geom_line(aes(linetype = sex))
Available linetypes are shown below.
                                
              
Figure 10.10: Linetypes
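The two options can be compared on a small example. The data frame below is synthetic and exists only for this sketch; the second plot sets the linetype by name ("dashed") rather than by number.

```r
library(ggplot2)

# Small synthetic dataset (assumption, for illustration only)
growth <- data.frame(
  day    = rep(1:5, times = 2),
  height = c(1, 2, 4, 7, 11, 1, 3, 5, 6, 8),
  group  = rep(c("a", "b"), each = 5)
)

# Map linetype to a categorical variable: one linetype per group
p_mapped <- ggplot(growth, aes(x = day, y = height, linetype = group)) +
  geom_line()

# Set a single linetype for every line
p_dashed <- ggplot(growth, aes(x = day, y = height, group = group)) +
  geom_line(linetype = "dashed")

p_mapped
p_dashed
```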
Fonts
R does not have great support for fonts, but with a bit of work, you can change the fonts that appear in your graphs. First you need to install and set up the extrafont package.
    # one time install
    install.packages("extrafont")
    library(extrafont)
    font_import()
    # see what fonts are now available
    fonts()
Apply the new font(s) using the text option in the theme function.
    # specify new font
    library(extrafont)
    ggplot(mpg, aes(x = displ, y = hwy)) +
      geom_point() +
      labs(title = "Displacement by Highway Mileage",
           subtitle = "MPG dataset") +
      theme(text = element_text(size = 16, family = "Comic Sans MS"))
            
Figure 10.11: Alternative fonts
To learn more about customizing fonts, see Working with R, Cairo graphics, custom fonts, and ggplot.
Labels
Labels are a key ingredient in rendering a graph understandable. They are added with the labs function. Available options are given below.
| option | Use | 
|---|---|
| title | main title | 
| subtitle | subtitle | 
| caption | caption (bottom right by default) | 
| x | horizontal axis | 
| y | vertical axis | 
| color | color legend title | 
| fill | fill legend title | 
| size | size legend title | 
| linetype | linetype legend title | 
| shape | shape legend title | 
| alpha | transparency legend title | 
For example
    # add plot labels
    ggplot(mpg, aes(x = displ, y = hwy, color = class, shape = factor(year))) +
      geom_point(size = 3, alpha = .5) +
      labs(title = "Mileage by engine displacement",
           subtitle = "Data from 1999 and 2008",
           caption = "Source: EPA (http://fueleconomy.gov)",
           x = "Engine displacement (litres)",
           y = "Highway miles per gallon",
           color = "Car Class",
           shape = "Year") +
      theme_minimal()
            
Figure 10.14: Graph with labels
This is not a great graph. It is too busy, making the identification of patterns difficult. It would be better to facet the year variable, the class variable, or both. Trend lines would also be helpful.
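One possible version of that suggestion is sketched below: the year variable is moved into facets and a linear trend line is added to each panel. The choice of a linear fit and of the colors is an assumption made for illustration, not a recommendation from the text.

```r
library(ggplot2)

# Facet by year and add a linear trend line to each panel
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "darkred") +
  facet_wrap(~year) +
  labs(title = "Mileage by engine displacement",
       subtitle = "Data from 1999 and 2008",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon") +
  theme_minimal()
p_facet
```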
Annotations
Annotations are additional information added to a graph to highlight important points.
Adding text
There are two primary reasons to add text to a graph.
One is to identify the numeric qualities of a geom. For example, we may want to identify points with labels in a scatterplot, or label the heights of bars in a bar chart.
Another reason is to provide additional information. We may want to add notes about the data, point out outliers, etc.
Labeling values
Consider the following scatterplot, based on the car data in the mtcars dataset.
    # basic scatterplot
    data(mtcars)
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point()
                
Figure 10.15: Simple scatterplot
Let's label each point with the name of the car it represents.
    # scatterplot with labels
    data(mtcars)
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      geom_text(label = row.names(mtcars))
                
Figure 10.16: Scatterplot with labels
The overlapping labels make this chart difficult to read. There is a package called ggrepel that can help us here.
    # scatterplot with non-overlapping labels
    data(mtcars)
    library(ggrepel)
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point() +
      geom_text_repel(label = row.names(mtcars), size = 3)
                
Figure 10.17: Scatterplot with non-overlapping labels
Much better.
Adding labels to bar charts is covered in the aptly named labeling bars section.
Adding additional information
We can place text anywhere on a graph using the annotate function. The format is
    annotate("text", x, y, label = "Some text", color = "colorname", size = textsize)
where x and y are the coordinates on which to place the text. The color and size parameters are optional.
By default, the text will be centered. Use hjust and vjust to change the alignment.
- hjust - 0 = left justified, 0.5 = centered, and 1 = right justified.
- vjust - 0 = above, 0.5 = centered, and 1 = below.
Continuing the previous example.
    # scatterplot with explanatory text
    data(mtcars)
    library(ggrepel)
    txt <- paste("The relationship between car weight",
                 "and mileage appears to be roughly linear",
                 sep = "\n")
    ggplot(mtcars, aes(x = wt, y = mpg)) +
      geom_point(color = "red") +
      geom_text_repel(label = row.names(mtcars), size = 3) +
      ggplot2::annotate("text", 6, 30, label = txt, color = "red", hjust = 1) +
      theme_bw()
                
Figure 10.18: Scatterplot with arranged labels
See this blog post for more details.
Adding lines
Horizontal and vertical lines can be added using:
- geom_hline(yintercept = a)
- geom_vline(xintercept = b)
where a is a number on the y-axis and b is a number on the x-axis. Other options include linetype and color.
    # add annotation line and text label
    min_cty <- min(mpg$cty)
    mean_hwy <- mean(mpg$hwy)
    ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
      geom_point(size = 3) +
      geom_hline(yintercept = mean_hwy, color = "darkred", linetype = "dashed") +
      ggplot2::annotate("text", min_cty, mean_hwy + 1, label = "Mean", color = "darkred") +
      labs(title = "Mileage by drive type",
           x = "City miles per gallon",
           y = "Highway miles per gallon",
           color = "Drive")
              
Figure 10.19: Graph with line annotation
We could add a vertical line for the mean city miles per gallon as well. In any case, always label annotation lines in some way. Otherwise the reader will not know what they mean.
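A sketch of the plot with both lines might look like the following. The label text, label positions, and the second line's color are arbitrary choices made for this example.

```r
library(ggplot2)

mean_cty <- mean(mpg$cty)
mean_hwy <- mean(mpg$hwy)

p_lines <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(size = 3) +
  # horizontal line at the mean highway mileage
  geom_hline(yintercept = mean_hwy, color = "darkred", linetype = "dashed") +
  # vertical line at the mean city mileage
  geom_vline(xintercept = mean_cty, color = "darkblue", linetype = "dashed") +
  # label each line so the reader knows what it represents
  annotate("text", x = max(mpg$cty), y = mean_hwy + 1,
           label = "Mean highway", color = "darkred", hjust = 1) +
  annotate("text", x = mean_cty + 0.3, y = min(mpg$hwy),
           label = "Mean city", color = "darkblue", hjust = 0) +
  labs(title = "Mileage by drive type",
       x = "City miles per gallon",
       y = "Highway miles per gallon",
       color = "Drive")
p_lines
```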
Highlighting a single group
Sometimes you want to highlight a single group in your graph. The gghighlight function in the gghighlight package is designed for this.
Here is an example with a scatterplot.
    # highlight a set of points
    library(ggplot2)
    library(gghighlight)
    ggplot(mpg, aes(x = cty, y = hwy)) +
      geom_point(color = "red", size = 2) +
      gghighlight(class == "midsize")
              
Figure 10.20: Highlighting a group
Below is an example with a bar chart.
    # highlight a single bar
    library(gghighlight)
    ggplot(mpg, aes(x = class)) +
      geom_bar(fill = "red") +
      gghighlight(class == "midsize")
              
Figure 10.21: Highlighting a group
There is nothing here that could not be done with base graphics, but it is more convenient.
Themes
ggplot2 themes control the appearance of all non-data related components of a plot. You can change the look and feel of a graph by altering the elements of its theme.
Altering theme elements
The theme function is used to modify individual components of a theme.
The parameters of the theme function are described in a cheatsheet developed from the online help.
Consider the following graph. It shows the number of male and female faculty by rank and discipline at a particular university in 2008-2009. The data come from the Salaries for Professors dataset.
    # create graph
    data(Salaries, package = "carData")
    p <- ggplot(Salaries, aes(x = rank, fill = sex)) +
      geom_bar() +
      facet_wrap(~discipline) +
      labs(title = "Academic Rank by Gender and Discipline",
           x = "Rank",
           y = "Frequency",
           fill = "Gender")
    p
              
Figure 10.22: Graph with default theme
Let's make some changes to the theme.
- Change label text from black to navy blue
 - Change the panel background color from grey to white
 - Add solid grey lines for major y-axis grid lines
 - Add dashed grey lines for minor y-axis grid lines
 - Eliminate x-axis grid lines
 - Change the strip background color to white with a grey border
 
Using the cheat sheet gives us
    p +
      theme(text = element_text(color = "navy"),
            panel.background = element_rect(fill = "white"),
            panel.grid.major.y = element_line(color = "grey"),
            panel.grid.minor.y = element_line(color = "grey", linetype = "dashed"),
            panel.grid.major.x = element_blank(),
            panel.grid.minor.x = element_blank(),
            strip.background = element_rect(fill = "white", color = "grey"))
              
Figure 10.23: Graph with modified theme
Wow, this looks pretty awful, but you get the idea.
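Because the result of theme is an ordinary R object, a set of modifications like the one above can be saved once and added to several graphs. The sketch below assumes this is a convenient way to reuse the settings; the names my_theme and p_demo are hypothetical, and the mpg bar chart is used only so the example is self-contained.

```r
library(ggplot2)

# Hypothetical reusable theme object holding the changes listed above
my_theme <- theme(
  text = element_text(color = "navy"),
  panel.background = element_rect(fill = "white"),
  panel.grid.major.y = element_line(color = "grey"),
  panel.grid.minor.y = element_line(color = "grey", linetype = "dashed"),
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),
  strip.background = element_rect(fill = "white", color = "grey")
)

# Add the saved theme to any plot
p_demo <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  facet_wrap(~year) +
  my_theme
p_demo
```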
ggThemeAssist
If you would like to create your own theme using a GUI, take a look at ggThemeAssist. After you install the package, a new menu item will appear under Addins in RStudio.
Highlight the code that creates your graph, then choose the ggThemeAssist option from the Addins drop-down menu. You can change many of the features of your theme using point-and-click. When you're done, the theme code will be appended to your graph code.
Pre-packaged themes
I'm not a very good artist (just look at the last example), so I often look for pre-packaged themes that can be applied to my graphs. There are many available.
Some come with ggplot2. These include theme_classic, theme_dark, theme_gray, theme_grey, theme_light, theme_linedraw, theme_minimal, and theme_void. We've used theme_minimal often in this book. Others are available through add-on packages.
ggthemes
The ggthemes package comes with the 20 themes listed below.
| Theme | Description | 
|---|---|
| theme_base | Theme Base | 
| theme_calc | Theme Calc | 
| theme_economist | ggplot color theme based on the Economist | 
| theme_economist_white | ggplot color theme based on the Economist | 
| theme_excel | ggplot color theme based on old Excel plots | 
| theme_few | Theme based on Few's "Practical Rules for Using Color in Charts" | 
| theme_fivethirtyeight | Theme inspired by fivethirtyeight.com plots | 
| theme_foundation | Foundation Theme | 
| theme_gdocs | Theme with Google Docs Chart defaults | 
| theme_hc | Highcharts JS theme | 
| theme_igray | Inverse gray theme | 
| theme_map | Clean theme for maps | 
| theme_pander | A ggplot theme originated from the pander package | 
| theme_par | Theme which takes its values from the current 'base' graphics parameter values in 'par'. | 
| theme_solarized | ggplot color themes based on the Solarized palette | 
| theme_solarized_2 | ggplot color themes based on the Solarized palette | 
| theme_solid | Theme with nothing other than a background color | 
| theme_stata | Themes based on Stata graph schemes | 
| theme_tufte | Tufte Maximal Data, Minimal Ink Theme | 
| theme_wsj | Wall Street Journal theme | 
To demonstrate their use, we'll first create and save a graph.
    # create basic plot
    library(ggplot2)
    p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
      geom_point(size = 3, alpha = .5) +
      labs(title = "Mileage by engine displacement",
           subtitle = "Data from 1999 and 2008",
           caption = "Source: EPA (http://fueleconomy.gov)",
           x = "Engine displacement (litres)",
           y = "Highway miles per gallon",
           color = "Car Class")
    # display graph
    p
                
Figure 10.24: Default theme
Now let's apply some themes.
    # add economist theme
    library(ggthemes)
    p +
      theme_economist()
                
Figure 10.25: Economist theme
    # add fivethirtyeight theme
    p +
      theme_fivethirtyeight()
                
Figure 10.26: Five Thirty Eight theme
    # add wsj theme
    p +
      theme_wsj(base_size = 8)
                
Figure 10.27: Wall Street Journal theme
By default, the font size for the wsj theme is usually too large. Changing the base_size option can help.
Each theme also comes with scales for colors and fills. In the next example, both the few theme and colors are used.
    # add few theme
    p +
      theme_few() +
      scale_color_few()
                
Figure 10.28: Few theme and colors
Try out different themes and scales to find one that you like.
hrbrthemes
The hrbrthemes package is focused on typography-centric themes. The results are charts that tend to have a clean look.
Continuing the example plot from above
    # add ipsum theme
    library(hrbrthemes)
    p +
      theme_ipsum()
                
Figure 10.29: Ipsum theme
See the hrbrthemes homepage for additional examples.
ggthemr
The ggthemr package offers a wide range of themes (17 as of this printing).
The package is not available on CRAN and must be installed from GitHub.
    # one time install
    install.packages("devtools")
    devtools::install_github('cttobin/ggthemr')
The functions work a bit differently. Use the ggthemr("themename") function to set future graphs to a given theme. Use ggthemr_reset() to return future graphs to the ggplot2 default theme.
Current themes include flat, flat dark, camoflauge, chalk, copper, dust, earth, fresh, grape, grass, greyscale, light, lilac, pale, sea, sky, and solarized.
    # set graphs to the flat dark theme
    library(ggthemr)
    ggthemr("flat dark")
    p
                
Figure 10.30: Flat dark theme
I would not actually use this theme for this particular graph. It is difficult to distinguish colors. Which green represents compact cars and which represents subcompact cars?
Select a theme that best conveys the graph's information to your audience.
Source: https://rkabacoff.github.io/datavis/Customizing.html