<!DOCTYPE html>
<html>
<head>
<title>Top 50 ggplot2 Visualizations - The Master List (With Full R Code)</title>
<meta charset="utf-8">
<meta name="Description" content="R Language Tutorials for Advanced Statistics">
<meta name="Keywords" content="R, Tutorial, Machine learning, Statistics, Data Mining, Analytics, Data science, Linear Regression, Logistic Regression, Time series, Forecasting">
<meta name="Distribution" content="Global">
<meta name="Author" content="Selva Prabhakaran">
<meta name="Robots" content="index, follow">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="/screenshots/iconb-64.png" type="image/x-icon" />
<link href="www/bootstrap.min.css" rel="stylesheet">
<link href="www/highlight.css" rel="stylesheet">
<link href='http://fonts.googleapis.com/css?family=Inconsolata:400,700'
rel='stylesheet' type='text/css'>
<!-- Color Script -->
<style type="text/css">
a {
color: #3675C5;
color: rgb(25, 145, 248);
color: #4582ec;
color: #3F73D8;
}
li {
line-height: 1.65;
}
/* reduce spacing around math formula*/
.MathJax_Display {
margin: 0em 0em;
}
</style>
<!-- Add Google search -->
<script language="Javascript" type="text/javascript">
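function my_search_google()
{
var query = document.getElementById("my-google-search").value;
// URL-encode the query so characters such as & or # do not break the search URL
window.open("http://google.com/search?q=" + encodeURIComponent(query)
+ "%20site:" + "http://r-statistics.co");
}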
</script>
</head>
<body>
<div class="container">
<div class="masthead">
<ul class="nav nav-pills pull-right">
<div class="input-group">
<form onsubmit="my_search_google()">
<input type="text" class="form-control" id="my-google-search" placeholder="Search..">
</form>
</div><!-- /input-group -->
</ul><!-- /.col-lg-6 -->
<h3 class="muted"><a href="/">r-statistics.co</a><small> by Selva Prabhakaran</small></h3>
<hr>
</div>
<div class="row">
<div class="col-xs-12 col-sm-3" id="nav">
<div class="well">
<li>
<ul class="list-unstyled">
<li class="dropdown-header"></li>
<li class="dropdown-header">Tutorial</li>
<li><a href="R-Tutorial.html">R Tutorial</a></li>
<li class="dropdown-header">ggplot2</li>
<li><a href="ggplot2-Tutorial-With-R.html">ggplot2 Short Tutorial</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part1-With-R-Code.html">ggplot2 Tutorial 1 - Intro</a></li>
<li><a href="Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html">ggplot2 Tutorial 2 - Theme</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html">ggplot2 Tutorial 3 - Masterlist</a></li>
<li><a href="ggplot2-cheatsheet.html">ggplot2 Quickref</a></li>
<li class="dropdown-header">Foundations</li>
<li><a href="Linear-Regression.html">Linear Regression</a></li>
<li><a href="Statistical-Tests-in-R.html">Statistical Tests</a></li>
<li><a href="Missing-Value-Treatment-With-R.html">Missing Value Treatment</a></li>
<li><a href="Outlier-Treatment-With-R.html">Outlier Analysis</a></li>
<li><a href="Variable-Selection-and-Importance-With-R.html">Feature Selection</a></li>
<li><a href="Model-Selection-in-R.html">Model Selection</a></li>
<li><a href="Logistic-Regression-With-R.html">Logistic Regression</a></li>
<li><a href="Environments.html">Advanced Linear Regression</a></li>
<li class="dropdown-header">Advanced Regression Models</li>
<li><a href="adv-regression-models.html">Advanced Regression Models</a></li>
<li class="dropdown-header">Time Series</li>
<li><a href="Time-Series-Analysis-With-R.html">Time Series Analysis</a></li>
<li><a href="Time-Series-Forecasting-With-R.html">Time Series Forecasting </a></li>
<li><a href="Time-Series-Forecasting-With-R-part2.html">More Time Series Forecasting</a></li>
<li class="dropdown-header">High Performance Computing</li>
<li><a href="Parallel-Computing-With-R.html">Parallel computing</a></li>
<li><a href="Strategies-To-Improve-And-Speedup-R-Code.html">Strategies to Speedup R code</a></li>
<li class="dropdown-header">Useful Techniques</li>
<li><a href="Association-Mining-With-R.html">Association Mining</a></li>
<li><a href="Multi-Dimensional-Scaling-With-R.html">Multi Dimensional Scaling</a></li>
<li><a href="Profiling.html">Optimization</a></li>
<li><a href="Information-Value-With-R.html">InformationValue package</a></li>
</ul>
</li>
</div>
<h4>Contents</h4>
<ul class="list-unstyled" id="toc"></ul>
</div>
<div id="content" class="col-xs-12 col-sm-8 pull-right">
<h1>Top 50 ggplot2 Visualizations - The Master List (With Full R Code)</h1>
<blockquote>
<p>Which type of visualization should you use for which kind of problem? This tutorial helps you choose the right type of chart for your specific objective and shows how to implement it in R using ggplot2.</p>
</blockquote>
<p>This is part 3 of a three-part tutorial on ggplot2, an aesthetically pleasing (and very popular) graphics framework in R. It is geared primarily towards readers who have some basic knowledge of the R programming language and want to make complex, good-looking charts with ggplot2.</p>
<ul>
<li><p><a href="Complete-Ggplot2-Tutorial-Part1-With-R-Code.html">Part 1: Introduction to ggplot2</a> covers the basics of constructing simple ggplots and modifying their components and aesthetics.</p></li>
<li><p><a href="Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html">Part 2: Customizing the Look and Feel</a> covers more advanced customization, such as manipulating legends, adding annotations, and building multiplots with faceting and custom layouts.</p></li>
<li><p><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html">Part 3: Top 50 ggplot2 Visualizations - The Master List</a> applies what was learnt in Parts 1 and 2 to construct other types of ggplots, such as bar charts and boxplots.</p></li>
</ul>
<h3><a name="top"></a>Top 50 ggplot2 Visualizations - The Master List</h3>
<p>An effective chart is one that:</p>
<ol style="list-style-type: decimal">
<li>Conveys the right information without distorting facts.</li>
<li>Is simple but elegant. It should not force you to think hard in order to understand it.</li>
<li>Has aesthetics that support the information rather than overshadow it.</li>
<li>Is not overloaded with information.</li>
</ol>
<p>The list below sorts the visualizations by their primary purpose. Broadly, there are 8 types of objectives for which you may construct plots. So, before you actually make the plot, try to figure out what findings and relationships you would like to convey or examine through the visualization. Chances are it will fall under one (or sometimes more) of these 8 categories.</p>
<ol style="list-style-type: decimal">
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#1.%20Correlation">Correlation</a>
<ul>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Scatterplot">Scatterplot</a></li>
<li><a href="#Scatterplot%20With%20Encircling">Scatterplot With Encircling</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Jitter%20Plot">Jitter Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Counts%20Chart">Counts Chart</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Bubble%20Plot">Bubble Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Animated%20Bubble%20Plot">Animated Bubble Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Marginal%20Histogram%20/%20Boxplot">Marginal Histogram / Boxplot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Correlogram">Correlogram</a></li>
</ul></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#2.%20Deviation">Deviation</a>
<ul>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Diverging%20Bars">Diverging Bars</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Diverging%20Lollipop%20Chart">Diverging Lollipop Chart</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Diverging%20Dot%20Plot">Diverging Dot Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Area%20Chart">Area Chart</a></li>
</ul></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#3.%20Ranking">Ranking</a>
<ul>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Ordered%20Bar%20Chart">Ordered Bar Chart</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Lollipop%20Chart">Lollipop Chart</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Dot%20Plot">Dot Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Slope%20Chart">Slope Chart</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Dumbbell%20Plot">Dumbbell Plot</a></li>
</ul></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#4.%20Distribution">Distribution</a>
<ul>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Histogram">Histogram</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Density%20Plot">Density Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Box%20Plot">Box Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Dot%20+%20Box%20Plot">Dot + Box Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Tufte%20Boxplot">Tufte Boxplot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Violin%20Plot">Violin Plot</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Population%20Pyramid">Population Pyramid</a></li>
</ul></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#5.%20Composition">Composition</a>
<ul>
<li><a href="#Waffle%20Chart">Waffle Chart</a></li>
<li><a href="#Pie%20Chart">Pie Chart</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Treemap">Treemap</a></li>
<li><a href="#Bar%20Chart">Bar Chart</a></li>
</ul></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#6.%20Change">Change</a>
<ul>
<li><a href="#Time%20Series%20Plot%20From%20a%20Time%20Series%20Object">Time Series Plots</a>
<ul>
<li><a href="#Time%20Series%20Plot%20From%20a%20Data%20Frame">From a Data Frame</a></li>
<li><a href="#Time%20Series%20Plot%20For%20a%20Monthly%20Time%20Series">Format to Monthly X Axis</a></li>
<li><a href="#Time%20Series%20Plot%20For%20a%20Yearly%20Time%20Series">Format to Yearly X Axis</a></li>
<li><a href="#Time%20Series%20Plot%20From%20Long%20Data%20Format">From Long Data Format</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Time%20Series%20Plot%20From%20Wide%20Data%20Format">From Wide Data Format</a></li>
</ul></li>
<li><a href="#Stacked%20Area%20Chart">Stacked Area Chart</a></li>
<li><a href="#Calendar%20Heat%20Map">Calendar Heat Map</a></li>
<li><a href="#Slope%20Chart%202">Slope Chart</a></li>
<li><a href="#Seasonal%20Plot">Seasonal Plot</a></li>
</ul></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#7.%20Groups">Groups</a>
<ul>
<li><a href="#Hierarchical%20Dendrogram">Dendrogram</a></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Clusters">Clusters</a></li>
</ul></li>
<li><a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#8.%20Spatial">Spatial</a>
<ul>
<li><a href="#Open%20Street%20Map">Open Street Map</a></li>
<li><a href="#Google%20Road%20Map">Google Road Map</a></li>
<li><a href="#Google%20Hybrid%20Map">Google Hybrid Map</a></li>
</ul></li>
</ol>
<h2>1. Correlation</h2>
<p>The following plots help to examine how well correlated two variables are.</p>
<h3><a name="Scatterplot"></a>Scatterplot</h3>
<p>The most frequently used plot for data analysis is undoubtedly the scatterplot. Whenever you want to understand the nature of the relationship between two variables, the first choice is invariably the scatterplot.</p>
<p>It can be drawn using <code>geom_point()</code>. Additionally, <code>geom_smooth()</code>, which draws a smoothing line (loess by default for datasets with fewer than 1,000 observations), can be tweaked to draw the line of best fit by setting <code>method='lm'</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># install.packages("ggplot2")</span>
<span class="co"># load package and data</span>
<span class="kw">options</span>(<span class="dt">scipen=</span><span class="dv">999</span>) <span class="co"># turn-off scientific notation like 1e+48</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>()) <span class="co"># pre-set the bw theme.</span>
<span class="kw">data</span>(<span class="st">"midwest"</span>, <span class="dt">package =</span> <span class="st">"ggplot2"</span>)
<span class="co"># midwest <- read.csv("http://goo.gl/G1K41K") # bkup data source</span>
<span class="co"># Scatterplot</span>
gg <-<span class="st"> </span><span class="kw">ggplot</span>(midwest, <span class="kw">aes</span>(<span class="dt">x=</span>area, <span class="dt">y=</span>poptotal)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">col=</span>state, <span class="dt">size=</span>popdensity)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_smooth</span>(<span class="dt">method=</span><span class="st">"loess"</span>, <span class="dt">se=</span>FALSE) +<span class="st"> </span>
<span class="st"> </span><span class="kw">xlim</span>(<span class="kw">c</span>(<span class="dv">0</span>, <span class="fl">0.1</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">ylim</span>(<span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">500000</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">subtitle=</span><span class="st">"Area Vs Population"</span>,
<span class="dt">y=</span><span class="st">"Population"</span>,
<span class="dt">x=</span><span class="st">"Area"</span>,
<span class="dt">title=</span><span class="st">"Scatterplot"</span>,
<span class="dt">caption =</span> <span class="st">"Source: midwest"</span>)
<span class="kw">plot</span>(gg)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_1.png" alt="ggplot2 Scatterplot" /></p>
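<p>The text above mentions that <code>geom_smooth()</code> can draw the line of best fit with <code>method='lm'</code>, while the example itself uses loess. The following is a minimal sketch of that variation, assuming the same <code>midwest</code> data and axis limits; the object name <code>gg_lm</code> is only illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)
data("midwest", package = "ggplot2")

# Same scatterplot as above, but with a straight line of best fit
gg_lm <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(aes(col = state, size = popdensity)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # linear fit instead of loess
  xlim(c(0, 0.1)) +
  ylim(c(0, 500000)) +
  labs(subtitle = "Area Vs Population (linear fit)",
       y = "Population",
       x = "Area",
       title = "Scatterplot",
       caption = "Source: midwest")

plot(gg_lm)</code></pre></div>
<p>Note that <code>xlim()</code> and <code>ylim()</code> drop the rows that fall outside the limits before the line is fitted, so the fit describes only the visible points. If the fit should use every row, <code>coord_cartesian()</code> zooms the view without removing data.</p>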
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Scatterplot With Encircling"></a>Scatterplot With Encircling</h3>
<p>When presenting the results, I sometimes encircle a special group of points or a region of the chart to draw attention to those peculiar cases. This can be conveniently done using <code>geom_encircle()</code> from the <code>ggalt</code> package.</p>
<p>Within <code>geom_encircle()</code>, set <code>data</code> to a new dataframe that contains only the points (rows) of interest. You can also <code>expand</code> the curve so that it passes just outside the points. The <code>color</code> and <code>size</code> (thickness) of the curve can be modified as well. See the example below.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># install 'ggalt' pkg</span>
<span class="co"># devtools::install_github("hrbrmstr/ggalt")</span>
<span class="kw">options</span>(<span class="dt">scipen =</span> <span class="dv">999</span>)
<span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(ggalt)
midwest_select <-<span class="st"> </span>midwest[midwest$poptotal ><span class="st"> </span><span class="dv">350000</span> &<span class="st"> </span>
<span class="st"> </span>midwest$poptotal <=<span class="st"> </span><span class="dv">500000</span> &<span class="st"> </span>
<span class="st"> </span>midwest$area ><span class="st"> </span><span class="fl">0.01</span> &<span class="st"> </span>
<span class="st"> </span>midwest$area <<span class="st"> </span><span class="fl">0.1</span>, ]
<span class="co"># Plot</span>
<span class="kw">ggplot</span>(midwest, <span class="kw">aes</span>(<span class="dt">x=</span>area, <span class="dt">y=</span>poptotal)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">col=</span>state, <span class="dt">size=</span>popdensity)) +<span class="st"> </span><span class="co"># draw points</span>
<span class="st"> </span><span class="kw">geom_smooth</span>(<span class="dt">method=</span><span class="st">"loess"</span>, <span class="dt">se=</span>FALSE) +<span class="st"> </span><span class="co"># draw smoothing line</span>
<span class="st"> </span><span class="kw">xlim</span>(<span class="kw">c</span>(<span class="dv">0</span>, <span class="fl">0.1</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">ylim</span>(<span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">500000</span>)) +<span class="st"> </span><span class="co"># limit the axes</span>
<span class="st"> </span><span class="kw">geom_encircle</span>(<span class="kw">aes</span>(<span class="dt">x=</span>area, <span class="dt">y=</span>poptotal),
<span class="dt">data=</span>midwest_select,
<span class="dt">color=</span><span class="st">"red"</span>,
<span class="dt">size=</span><span class="dv">2</span>,
<span class="dt">expand=</span><span class="fl">0.08</span>) +<span class="st"> </span><span class="co"># encircle</span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">subtitle=</span><span class="st">"Area Vs Population"</span>,
<span class="dt">y=</span><span class="st">"Population"</span>,
<span class="dt">x=</span><span class="st">"Area"</span>,
<span class="dt">title=</span><span class="st">"Scatterplot + Encircle"</span>,
<span class="dt">caption=</span><span class="st">"Source: midwest"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_2.png" alt="ggplot2 Scatterplot With Encircling" /> <a href="#top">[Back to Top]</a></p>
<h3><a name="Jitter Plot"></a>Jitter Plot</h3>
<p>Let’s look at a new dataset to draw the scatterplot. This time, I will use the <code>mpg</code> dataset to plot city mileage (<code>cty</code>) vs highway mileage (<code>hwy</code>).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># load package and data</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">data</span>(mpg, <span class="dt">package=</span><span class="st">"ggplot2"</span>) <span class="co"># alternate source: "http://goo.gl/uEeRGu")</span>
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>()) <span class="co"># pre-set the bw theme.</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(cty, hwy))
<span class="co"># Scatterplot</span>
g +<span class="st"> </span><span class="kw">geom_point</span>() +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_smooth</span>(<span class="dt">method=</span><span class="st">"lm"</span>, <span class="dt">se=</span>FALSE) +
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">subtitle=</span><span class="st">"mpg: city vs highway mileage"</span>,
<span class="dt">y=</span><span class="st">"hwy"</span>,
<span class="dt">x=</span><span class="st">"cty"</span>,
<span class="dt">title=</span><span class="st">"Scatterplot with overlapping points"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_3.png" alt="ggplot2 Scatterplot With Hidden Data points" /></p>
<p>What we have here is a scatterplot of city and highway mileage in the <code>mpg</code> dataset. We have seen a similar scatterplot before; this one looks neat and gives a clear idea of how well city mileage (<code>cty</code>) and highway mileage (<code>hwy</code>) are correlated.</p>
<p>But this innocent-looking plot is <em>hiding</em> something. Can you spot what it is?</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">dim</span>(mpg)</code></pre></div>
<p>The original data has 234 data points, but the chart seems to display fewer. What has happened? Many points overlap and appear as a single dot. The fact that both <code>cty</code> and <code>hwy</code> are integers in the source dataset makes this detail all the easier to miss. So be extra careful the next time you make a scatterplot with integers.</p>
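<p>One way to confirm the overlap is to compare the number of rows with the number of distinct (<code>cty</code>, <code>hwy</code>) positions. This is a small sketch using base R only; the variable names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)
data(mpg, package = "ggplot2")

n_rows <- nrow(mpg)                                    # total observations
n_distinct_xy <- nrow(unique(mpg[, c("cty", "hwy")]))  # distinct plotted positions
n_hidden <- n_rows - n_distinct_xy                     # points drawn on top of another point

cat("rows:", n_rows,
    " distinct positions:", n_distinct_xy,
    " hidden:", n_hidden, "\n")</code></pre></div>
<p>Any positive value of <code>n_hidden</code> means that the plain scatterplot shows fewer dots than there are observations.</p>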
<p>So how do we handle this? There are a few options. We can make a jitter plot with <code>geom_jitter()</code>. As the name suggests, the overlapping points are randomly jittered around their original positions, by an amount controlled by the <code>width</code> argument.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># load package and data</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">data</span>(mpg, <span class="dt">package=</span><span class="st">"ggplot2"</span>)
<span class="co"># mpg <- read.csv("http://goo.gl/uEeRGu")</span>
<span class="co"># Scatterplot</span>
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>()) <span class="co"># pre-set the bw theme.</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(cty, hwy))
g +<span class="st"> </span><span class="kw">geom_jitter</span>(<span class="dt">width =</span> .<span class="dv">5</span>, <span class="dt">size=</span><span class="dv">1</span>) +
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">subtitle=</span><span class="st">"mpg: city vs highway mileage"</span>,
<span class="dt">y=</span><span class="st">"hwy"</span>,
<span class="dt">x=</span><span class="st">"cty"</span>,
<span class="dt">title=</span><span class="st">"Jittered Points"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_4.png" alt="ggplot2 Jitter Plot" /> More points are revealed now. The larger the <code>width</code>, the farther the points are jittered from their original positions.</p>
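<p>Because the jitter is random, the plot changes slightly on every run. The sketch below assumes you want a repeatable figure: it passes <code>position_jitter()</code> to <code>geom_point()</code> so that the horizontal and vertical spread and a <code>seed</code> can be set explicitly. The <code>height</code> and <code>seed</code> values are arbitrary choices for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)
data(mpg, package = "ggplot2")

# Fixed seed: the same random offsets are applied each time the plot is built
jitter_pos <- position_jitter(width = 0.5, height = 0.5, seed = 100)

g_jit <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(position = jitter_pos, size = 1) +
  labs(subtitle = "mpg: city vs highway mileage",
       y = "hwy",
       x = "cty",
       title = "Jittered Points (reproducible)")

plot(g_jit)</code></pre></div>
<p>Keep <code>width</code> and <code>height</code> small relative to the spacing of the data (here, 1 mpg) so that jittered points are not mistaken for neighbouring values.</p>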
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Counts Chart"></a>Counts Chart</h3>
<p>The second option for overcoming the problem of overlapping data points is to use what is called a <em>counts chart</em>. Wherever more points overlap, the circle gets bigger.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># load package and data</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">data</span>(mpg, <span class="dt">package=</span><span class="st">"ggplot2"</span>)
<span class="co"># mpg <- read.csv("http://goo.gl/uEeRGu")</span>
<span class="co"># Scatterplot</span>
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>()) <span class="co"># pre-set the bw theme.</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(cty, hwy))
g +<span class="st"> </span><span class="kw">geom_count</span>(<span class="dt">col=</span><span class="st">"tomato3"</span>, <span class="dt">show.legend=</span>FALSE) +
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">subtitle=</span><span class="st">"mpg: city vs highway mileage"</span>,
<span class="dt">y=</span><span class="st">"hwy"</span>,
<span class="dt">x=</span><span class="st">"cty"</span>,
<span class="dt">title=</span><span class="st">"Counts Plot"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_5.png" alt="ggplot2 Counts Plot" /></p>
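<p>To see what the counts chart is doing, the same picture can be approximated by counting the rows at each position yourself and mapping that count to point size. This is only a sketch of the idea, not how <code>geom_count()</code> is implemented; <code>mpg_counts</code> and <code>n</code> are illustrative names.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)
data(mpg, package = "ggplot2")

# One row per (cty, hwy) position, with n = number of observations at that position
mpg_counts <- aggregate(n ~ cty + hwy,
                        data = data.frame(cty = mpg$cty, hwy = mpg$hwy, n = 1L),
                        FUN = sum)

g_counts <- ggplot(mpg_counts, aes(cty, hwy)) +
  geom_point(aes(size = n), col = "tomato3", show.legend = FALSE) +
  labs(subtitle = "mpg: city vs highway mileage",
       y = "hwy",
       x = "cty",
       title = "Counts Plot (manual counts)")

plot(g_counts)</code></pre></div>
<p>Having the counts in a data frame is also handy when you want to label or filter the most crowded positions.</p>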
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Bubble Plot"></a>Bubble plot</h3>
<p>While a scatterplot lets you compare the relationship between 2 continuous variables, a bubble chart serves well if you want to understand the relationship within underlying groups based on:</p>
<ol style="list-style-type: decimal">
<li>A Categorical variable (by changing the color) and</li>
<li>Another continuous variable (by changing the size of points).</li>
</ol>
<p>In simpler words, bubble charts are more suitable if you have 4-dimensional data, where two of the variables are numeric (X and Y), one is categorical (color), and another is numeric (size).</p>
<p>The bubble chart clearly distinguishes the range of <code>displ</code> between the manufacturers and how the slope of lines-of-best-fit varies, providing a better visual comparison between the groups.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># load package and data</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">data</span>(mpg, <span class="dt">package=</span><span class="st">"ggplot2"</span>)
<span class="co"># mpg <- read.csv("http://goo.gl/uEeRGu")</span>
mpg_select <-<span class="st"> </span>mpg[mpg$manufacturer %in%<span class="st"> </span><span class="kw">c</span>(<span class="st">"audi"</span>, <span class="st">"ford"</span>, <span class="st">"honda"</span>, <span class="st">"hyundai"</span>), ]
<span class="co"># Scatterplot</span>
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>()) <span class="co"># pre-set the bw theme.</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg_select, <span class="kw">aes</span>(displ, cty)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">subtitle=</span><span class="st">"mpg: Displacement vs City Mileage"</span>,
<span class="dt">title=</span><span class="st">"Bubble chart"</span>)
g +<span class="st"> </span><span class="kw">geom_jitter</span>(<span class="kw">aes</span>(<span class="dt">col=</span>manufacturer, <span class="dt">size=</span>hwy)) +<span class="st"> </span>
<span class="st">  </span><span class="kw">geom_smooth</span>(<span class="kw">aes</span>(<span class="dt">col=</span>manufacturer), <span class="dt">method=</span><span class="st">"lm"</span>, <span class="dt">se=</span><span class="ot">FALSE</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_6.png" alt="ggplot2 Bubble Plot" /></p>
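<p>As a variation on the example above, the following sketch draws the bubbles with <code>geom_point()</code> instead of jittering them, adds transparency so that overlapping bubbles stay visible, and uses <code>scale_size_area()</code> so that a value of zero would map to a bubble of zero area. The <code>alpha</code> and <code>max_size</code> values and the object name <code>p_bubble</code> are illustrative choices, not part of the original example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch, assuming the same four manufacturers as above
library(ggplot2)
mpg_select <- ggplot2::mpg[ggplot2::mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]

p_bubble <- ggplot(mpg_select, aes(displ, cty)) +
  geom_point(aes(col = manufacturer, size = hwy), alpha = 0.6) +  # alpha is an illustrative choice
  scale_size_area(max_size = 8) +                                 # bubble area proportional to hwy
  labs(subtitle = "mpg: Displacement vs City Mileage",
       title = "Bubble chart")
p_bubble</code></pre></div>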
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Animated Bubble Plot"></a>Animated Bubble chart</h3>
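<p>An animated bubble chart can be implemented using the <code>gganimate</code> package. It is the same as the bubble chart, except that it also shows how the values change over a fifth dimension (typically time).</p>
<p>The key thing to do is to set <code>aes(frame)</code> to the column on which you want to animate. The rest of the plot construction is the same. Once the plot is constructed, you can animate it using <code>gganimate()</code> by setting a chosen <code>interval</code>. Note that this is the interface of the original <code>dgrtwo/gganimate</code> package; the later rewrite of <code>gganimate</code> released on CRAN replaced the <code>frame</code> aesthetic and the <code>gganimate()</code> function with <code>transition_*()</code> functions and <code>animate()</code>.</p>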
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Source: https://github.com/dgrtwo/gganimate</span>
<span class="co"># install.packages("cowplot") # a gganimate dependency</span>
<span class="co"># devtools::install_github("dgrtwo/gganimate")</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(gganimate)
<span class="kw">library</span>(gapminder)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>()) <span class="co"># pre-set the bw theme.</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(gapminder, <span class="kw">aes</span>(gdpPercap, lifeExp, <span class="dt">size =</span> pop, <span class="dt">frame =</span> year)) +
<span class="st"> </span><span class="kw">geom_point</span>() +
<span class="st"> </span><span class="kw">geom_smooth</span>(<span class="kw">aes</span>(<span class="dt">group =</span> year),
<span class="dt">method =</span> <span class="st">"lm"</span>,
<span class="dt">show.legend =</span> <span class="ot">FALSE</span>) +
<span class="st"> </span><span class="kw">facet_wrap</span>(~continent, <span class="dt">scales =</span> <span class="st">"free"</span>) +
<span class="st"> </span><span class="kw">scale_x_log10</span>() <span class="co"># convert to log scale</span>
<span class="kw">gganimate</span>(g, <span class="dt">interval=</span><span class="fl">0.2</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_7.gif" alt="ggplot2 Animated Bubble Plot" /></p>
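<p>If you have the newer CRAN release of <code>gganimate</code> installed, the <code>frame</code> aesthetic and the <code>gganimate()</code> function used above are not available. The following is a rough equivalent under that assumption, using <code>transition_time()</code> and <code>animate()</code>. It drops the <code>geom_smooth()</code> layer and the free facet scales to keep the sketch simple, and the <code>fps</code> value and the object name <code>g2</code> are illustrative. Rendering also needs a GIF renderer such as the <code>gifski</code> package.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch, assuming gganimate >= 1.0 from CRAN (not dgrtwo/gganimate)
library(ggplot2)
library(gganimate)
library(gapminder)
theme_set(theme_bw())

g2 <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~continent) +
  scale_x_log10() +                      # convert to log scale
  labs(title = "Year: {frame_time}") +   # frame_time is filled in by transition_time()
  transition_time(year)                  # animate over the year column

animate(g2, fps = 5)                     # fps chosen for illustration</code></pre></div>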
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Marginal Histogram / Boxplot"></a>Marginal Histogram / Boxplot</h3>
<p>If you want to show the relationship as well as the distribution in the same chart, use the marginal histogram. It has a histogram of the X and Y variables at the margins of the scatterplot.</p>
<p>This can be implemented using the <code>ggMarginal()</code> function from the ‘<code>ggExtra</code>’ package. Apart from a <code>histogram</code>, you could choose to draw a marginal <code>boxplot</code> or <code>density</code> plot by setting the respective <code>type</code> option.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># load package and data</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(ggExtra)
<span class="kw">data</span>(mpg, <span class="dt">package=</span><span class="st">"ggplot2"</span>)
<span class="co"># mpg <- read.csv("http://goo.gl/uEeRGu")</span>
<span class="co"># Scatterplot</span>
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>()) <span class="co"># pre-set the bw theme.</span>
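<span class="co"># Note: mpg_select is not used below; the plot is drawn from the full mpg data.</span>
mpg_select <-<span class="st"> </span>mpg[mpg$hwy >=<span class="st"> </span><span class="dv">35</span> &<span class="st"> </span>mpg$cty ><span class="st"> </span><span class="dv">27</span>, ]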
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(cty, hwy)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_count</span>() +<span class="st"> </span>
<span class="st">  </span><span class="kw">geom_smooth</span>(<span class="dt">method=</span><span class="st">"lm"</span>, <span class="dt">se=</span><span class="ot">FALSE</span>)
<span class="kw">ggMarginal</span>(g, <span class="dt">type =</span> <span class="st">"histogram"</span>, <span class="dt">fill=</span><span class="st">"transparent"</span>)
<span class="kw">ggMarginal</span>(g, <span class="dt">type =</span> <span class="st">"boxplot"</span>, <span class="dt">fill=</span><span class="st">"transparent"</span>)
<span class="co"># ggMarginal(g, type = "density", fill="transparent")</span></code></pre></div>
<p><img src="screenshots/ggplot_masterlist_8_1.png" alt="ggplot2 Marginal Histogram" /> <br> <br> <img src="screenshots/ggplot_masterlist_8_2.png" alt="ggplot2 Marginal Boxplot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Correlogram"></a>Correlogram</h3>
<p>A correlogram lets you examine the correlation of multiple continuous variables present in the same dataframe. This is conveniently implemented using the <code>ggcorrplot</code> package.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># devtools::install_github("kassambara/ggcorrplot")</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(ggcorrplot)
<span class="co"># Correlation matrix</span>
<span class="kw">data</span>(mtcars)
corr <-<span class="st"> </span><span class="kw">round</span>(<span class="kw">cor</span>(mtcars), <span class="dv">1</span>)
<span class="co"># Plot</span>
<span class="kw">ggcorrplot</span>(corr, <span class="dt">hc.order =</span> <span class="ot">TRUE</span>,
<span class="dt">type =</span> <span class="st">"lower"</span>,
<span class="dt">lab =</span> <span class="ot">TRUE</span>,
<span class="dt">lab_size =</span> <span class="dv">3</span>,
<span class="dt">method=</span><span class="st">"circle"</span>,
<span class="dt">colors =</span> <span class="kw">c</span>(<span class="st">"tomato2"</span>, <span class="st">"white"</span>, <span class="st">"springgreen3"</span>),
<span class="dt">title=</span><span class="st">"Correlogram of mtcars"</span>,
<span class="dt">ggtheme=</span>theme_bw)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_9.png" alt="ggplot2 Correlogram" /></p>
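<p>A correlogram shows the strength of each correlation but not whether it is statistically significant. As a possible extension, the sketch below computes a matrix of p-values with <code>cor_pmat()</code> from <code>ggcorrplot</code> and passes it through the <code>p.mat</code> argument, blanking out the non-significant cells with <code>insig = "blank"</code>. The object name <code>p_mat</code> is illustrative, and the significance level is left at the package default.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch, assuming ggcorrplot is installed
library(ggplot2)
library(ggcorrplot)

corr  <- round(cor(datasets::mtcars), 1)   # correlation matrix, as above
p_mat <- cor_pmat(datasets::mtcars)        # matrix of correlation p-values

ggcorrplot(corr, hc.order = TRUE,
           type = "lower",
           lab = TRUE,
           lab_size = 3,
           p.mat = p_mat,                  # p-values used to flag cells
           insig = "blank",                # leave non-significant cells empty
           title = "Correlogram of mtcars (significant cells only)",
           ggtheme = theme_bw)</code></pre></div>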
<p><a href="#top">[Back to Top]</a></p>
<h2>2. Deviation</h2>
<p>Compare variation in <em>values</em> between a small number of items (or categories) with respect to a fixed reference.</p>
<h3><a name="Diverging Bars"></a>Diverging bars</h3>
<p>Diverging bars is a bar chart that can handle both negative and positive values. It can be implemented with a small tweak to <code>geom_bar()</code>. But the usage of <code>geom_bar()</code> can be quite confusing, because it can be used to make a bar chart as well as a histogram-like count chart. Let me explain.</p>
<p>By default, <code>geom_bar()</code> has the <code>stat</code> set to <code>count</code>. That means, when you provide just an X variable (and no Y variable), it counts the rows at each X value and draws those counts, which for a continuous X looks like a histogram.</p>
<p>In order to make <code>geom_bar()</code> draw bars whose heights come from the data instead of counts, you need to do two things.</p>
<ol style="list-style-type: decimal">
<li>Set <code>stat="identity"</code></li>
<li>Provide both <code>x</code> and <code>y</code> inside <code>aes()</code>, where <code>x</code> is either <code>character</code> or <code>factor</code> and <code>y</code> is numeric.</li>
</ol>
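<p>The difference between the two modes of <code>geom_bar()</code> can be sketched as follows. The <code>sales</code> data frame is a made-up example, not part of the original data. <code>geom_col()</code> is shown as well because it is the ggplot2 shorthand for <code>geom_bar(stat="identity")</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch; `sales` is hypothetical illustration data
library(ggplot2)
sales <- data.frame(item = c("a", "b", "c"), amount = c(3, -2, 5))

# Default stat = "count": only x is given, bar height = number of rows per class
p_count <- ggplot(ggplot2::mpg, aes(x = class)) + geom_bar()

# stat = "identity": bar height is taken from the y variable, negatives included
p_identity <- ggplot(sales, aes(x = item, y = amount)) + geom_bar(stat = "identity")

# geom_col() is equivalent to geom_bar(stat = "identity")
p_col <- ggplot(sales, aes(x = item, y = amount)) + geom_col()

layer_data(p_identity)$y   # the bar heights are the amounts themselves</code></pre></div>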
<p>To get diverging bars instead of just bars, make sure your categorical variable has 2 categories that switch at a certain threshold of the continuous variable. In the example below, <code>mpg</code> from the mtcars dataset is normalised by computing the z score. Vehicles with a normalised mpg above zero are marked green and those below are marked red.</p>
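<p>The z score subtracts the mean and divides by the standard deviation, so the result has mean zero and the sign tells you which side of the average a value falls on. The short sketch below checks the hand-written formula against base R's <code>scale()</code> and derives the two-category flag. It works on a copy of the <code>mpg</code> column (the name <code>x</code> is illustrative) and skips the rounding used in the full example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch using only base R
x <- datasets::mtcars$mpg

mpg_z      <- (x - mean(x)) / sd(x)     # z score by hand
mpg_scaled <- as.numeric(scale(x))      # same computation via scale()
all.equal(mpg_z, mpg_scaled)

# Two categories that switch at the threshold zero
mpg_type <- ifelse(mpg_z < 0, "below", "above")
table(mpg_type)</code></pre></div>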
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
<span class="co"># Data Prep</span>
<span class="kw">data</span>(<span class="st">"mtcars"</span>) <span class="co"># load data</span>
mtcars$<span class="st">`</span><span class="dt">car name</span><span class="st">`</span> <-<span class="st"> </span><span class="kw">rownames</span>(mtcars) <span class="co"># create new column for car names</span>
mtcars$mpg_z <-<span class="st"> </span><span class="kw">round</span>((mtcars$mpg -<span class="st"> </span><span class="kw">mean</span>(mtcars$mpg))/<span class="kw">sd</span>(mtcars$mpg), <span class="dv">2</span>) <span class="co"># compute normalized mpg</span>
mtcars$mpg_type <-<span class="st"> </span><span class="kw">ifelse</span>(mtcars$mpg_z <<span class="st"> </span><span class="dv">0</span>, <span class="st">"below"</span>, <span class="st">"above"</span>) <span class="co"># above / below avg flag</span>
mtcars <-<span class="st"> </span>mtcars[<span class="kw">order</span>(mtcars$mpg_z), ] <span class="co"># sort</span>
mtcars$<span class="st">`</span><span class="dt">car name</span><span class="st">`</span> <-<span class="st"> </span><span class="kw">factor</span>(mtcars$<span class="st">`</span><span class="dt">car name</span><span class="st">`</span>, <span class="dt">levels =</span> mtcars$<span class="st">`</span><span class="dt">car name</span><span class="st">`</span>) <span class="co"># convert to factor to retain sorted order in plot.</span>
<span class="co"># Diverging Barcharts</span>
<span class="kw">ggplot</span>(mtcars, <span class="kw">aes</span>(<span class="dt">x=</span><span class="st">`</span><span class="dt">car name</span><span class="st">`</span>, <span class="dt">y=</span>mpg_z, <span class="dt">label=</span>mpg_z)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_bar</span>(<span class="dt">stat=</span><span class="st">'identity'</span>, <span class="kw">aes</span>(<span class="dt">fill=</span>mpg_type), <span class="dt">width=</span>.<span class="dv">5</span>) +
<span class="st"> </span><span class="kw">scale_fill_manual</span>(<span class="dt">name=</span><span class="st">"Mileage"</span>,
<span class="dt">labels =</span> <span class="kw">c</span>(<span class="st">"Above Average"</span>, <span class="st">"Below Average"</span>),
<span class="dt">values =</span> <span class="kw">c</span>(<span class="st">"above"</span>=<span class="st">"#00ba38"</span>, <span class="st">"below"</span>=<span class="st">"#f8766d"</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">subtitle=</span><span class="st">"Normalised mileage from 'mtcars'"</span>,
<span class="dt">title=</span> <span class="st">"Diverging Bars"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">coord_flip</span>()</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_10.png" alt="ggplot2 Diverging Bars" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Diverging Lollipop Chart"></a>Diverging Lollipop Chart</h3>
<p>A lollipop chart conveys the same information as a bar chart or diverging bars, except that it looks more modern. Instead of <code>geom_bar</code>, I use <code>geom_point</code> and <code>geom_segment</code> to get the lollipops right. Let’s draw a lollipop chart using the same data I prepared in the previous example of diverging bars.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
<span class="kw">ggplot</span>(mtcars, <span class="kw">aes</span>(<span class="dt">x=</span><span class="st">`</span><span class="dt">car name</span><span class="st">`</span>, <span class="dt">y=</span>mpg_z, <span class="dt">label=</span>mpg_z)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_point</span>(<span class="dt">stat=</span><span class="st">'identity'</span>, <span class="dt">fill=</span><span class="st">"black"</span>, <span class="dt">size=</span><span class="dv">6</span>) +
<span class="st"> </span><span class="kw">geom_segment</span>(<span class="kw">aes</span>(<span class="dt">y =</span> <span class="dv">0</span>,
<span class="dt">x =</span> <span class="st">`</span><span class="dt">car name</span><span class="st">`</span>,
<span class="dt">yend =</span> mpg_z,
<span class="dt">xend =</span> <span class="st">`</span><span class="dt">car name</span><span class="st">`</span>),
<span class="dt">color =</span> <span class="st">"black"</span>) +
<span class="st"> </span><span class="kw">geom_text</span>(<span class="dt">color=</span><span class="st">"white"</span>, <span class="dt">size=</span><span class="dv">2</span>) +
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Diverging Lollipop Chart"</span>,
<span class="dt">subtitle=</span><span class="st">"Normalized mileage from 'mtcars': Lollipop"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">ylim</span>(-<span class="fl">2.5</span>, <span class="fl">2.5</span>) +
<span class="st"> </span><span class="kw">coord_flip</span>()</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_11.png" alt="ggplot2 Lollipop Plot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Diverging Dot Plot"></a>Diverging Dot Plot</h3>
<p>A dot plot conveys similar information. The principles are the same as what we saw in diverging bars, except that only points are used. The example below uses the same data prepared in the <a href="Top50-Ggplot2-Visualizations-MasterList-R-Code.html#Diverging%20Bars">diverging bars example</a>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
<span class="co"># Plot</span>
<span class="kw">ggplot</span>(mtcars, <span class="kw">aes</span>(<span class="dt">x=</span><span class="st">`</span><span class="dt">car name</span><span class="st">`</span>, <span class="dt">y=</span>mpg_z, <span class="dt">label=</span>mpg_z)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_point</span>(<span class="dt">stat=</span><span class="st">'identity'</span>, <span class="kw">aes</span>(<span class="dt">col=</span>mpg_type), <span class="dt">size=</span><span class="dv">6</span>) +
<span class="st"> </span><span class="kw">scale_color_manual</span>(<span class="dt">name=</span><span class="st">"Mileage"</span>,
<span class="dt">labels =</span> <span class="kw">c</span>(<span class="st">"Above Average"</span>, <span class="st">"Below Average"</span>),
<span class="dt">values =</span> <span class="kw">c</span>(<span class="st">"above"</span>=<span class="st">"#00ba38"</span>, <span class="st">"below"</span>=<span class="st">"#f8766d"</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_text</span>(<span class="dt">color=</span><span class="st">"white"</span>, <span class="dt">size=</span><span class="dv">2</span>) +
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Diverging Dot Plot"</span>,
<span class="dt">subtitle=</span><span class="st">"Normalized mileage from 'mtcars': Dotplot"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">ylim</span>(-<span class="fl">2.5</span>, <span class="fl">2.5</span>) +
<span class="st"> </span><span class="kw">coord_flip</span>()</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_12.png" alt="ggplot2 Dotplot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Area Chart"></a>Area Chart</h3>
<p>Area charts are typically used to visualize how a particular metric (such as % returns from a stock) performed compared to a certain baseline. Other types of % returns or % change data are also commonly used. <code>geom_area()</code> implements this.</p>
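<p>The returns plotted in this kind of chart are the change from one observation to the next, relative to the earlier observation. The sketch below works this through on a short made-up series (the vector <code>x</code> is hypothetical) using the same <code>diff()</code> idiom as the example that follows. The result is a fraction; multiply by 100 if you want it as a percentage.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch using only base R; x is hypothetical illustration data
x <- c(10, 11, 9.9, 9.9)

# (x[t] - x[t-1]) / x[t-1], with 0 for the first observation,
# which has no previous value to compare against
returns <- c(0, diff(x) / x[-length(x)])
returns        # fractional change per step
returns * 100  # the same values expressed in percent</code></pre></div>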
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(lubridate) <span class="co"># provides year(), used for the axis labels</span>
<span class="kw">data</span>(<span class="st">"economics"</span>, <span class="dt">package =</span> <span class="st">"ggplot2"</span>)
<span class="co"># Compute % Returns</span>
economics$returns_perc <-<span class="st"> </span><span class="kw">c</span>(<span class="dv">0</span>, <span class="kw">diff</span>(economics$psavert)/economics$psavert[-<span class="kw">length</span>(economics$psavert)])
<span class="co"># Create break points and labels for axis ticks</span>
brks <-<span class="st"> </span>economics$date[<span class="kw">seq</span>(<span class="dv">1</span>, <span class="kw">length</span>(economics$date), <span class="dv">12</span>)]
lbls <-<span class="st"> </span>lubridate::<span class="kw">year</span>(economics$date[<span class="kw">seq</span>(<span class="dv">1</span>, <span class="kw">length</span>(economics$date), <span class="dv">12</span>)])
<span class="co"># Plot</span>
<span class="kw">ggplot</span>(economics[<span class="dv">1</span>:<span class="dv">100</span>, ], <span class="kw">aes</span>(date, returns_perc)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_area</span>() +<span class="st"> </span>
<span class="st"> </span><span class="kw">scale_x_date</span>(<span class="dt">breaks=</span>brks, <span class="dt">labels=</span>lbls) +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">90</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Area Chart"</span>,
<span class="dt">subtitle =</span> <span class="st">"Perc Returns for Personal Savings"</span>,
<span class="dt">y=</span><span class="st">"% Returns for Personal savings"</span>,
<span class="dt">caption=</span><span class="st">"Source: economics"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_13.png" alt="ggplot2 Area Chart" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h2>3. Ranking</h2>
<p>Used to compare the position or performance of multiple items with respect to each other. Actual values matter somewhat less than the ranking.</p>
<h3><a name="Ordered Bar Chart"></a>Ordered Bar Chart</h3>
<p>Ordered Bar Chart is a Bar Chart that is ordered by the Y axis variable. Just sorting the dataframe by the variable of interest isn’t enough to order the bar chart. In order for the bar chart to retain the order of the rows, the X axis variable (i.e. the categories) has to be converted into a factor.</p>
<p>Let’s plot the mean city mileage for each manufacturer from <code>mpg</code> dataset. First, aggregate the data and sort it before you draw the plot. Finally, the X variable is converted to a factor.</p>
<p>Let’s see how that is done.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Prepare data: group mean city mileage by manufacturer.</span>
cty_mpg <-<span class="st"> </span><span class="kw">aggregate</span>(mpg$cty, <span class="dt">by=</span><span class="kw">list</span>(mpg$manufacturer), <span class="dt">FUN=</span>mean) <span class="co"># aggregate</span>
<span class="kw">colnames</span>(cty_mpg) <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"make"</span>, <span class="st">"mileage"</span>) <span class="co"># change column names</span>
cty_mpg <-<span class="st"> </span>cty_mpg[<span class="kw">order</span>(cty_mpg$mileage), ] <span class="co"># sort</span>
cty_mpg$make <-<span class="st"> </span><span class="kw">factor</span>(cty_mpg$make, <span class="dt">levels =</span> cty_mpg$make) <span class="co"># to retain the order in plot.</span>
<span class="kw">head</span>(cty_mpg, <span class="dv">4</span>)
<span class="co">#> make mileage</span>
<span class="co">#> 9 lincoln 11.33333</span>
<span class="co">#> 8 land rover 11.50000</span>
<span class="co">#> 3 dodge 13.13514</span>
<span class="co">#> 10 mercury 13.25000</span></code></pre></div>
<p>The X variable is now a <code>factor</code>, let’s plot.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
<span class="co"># Draw plot</span>
<span class="kw">ggplot</span>(cty_mpg, <span class="kw">aes</span>(<span class="dt">x=</span>make, <span class="dt">y=</span>mileage)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_bar</span>(<span class="dt">stat=</span><span class="st">"identity"</span>, <span class="dt">width=</span>.<span class="dv">5</span>, <span class="dt">fill=</span><span class="st">"tomato3"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Ordered Bar Chart"</span>,
<span class="dt">subtitle=</span><span class="st">"Make Vs Avg. Mileage"</span>,
<span class="dt">caption=</span><span class="st">"source: mpg"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>))</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_14.png" alt="ggplot2 Ordered Barchart" /></p>
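<p>As an alternative to sorting the data and building the factor by hand, base R's <code>reorder()</code> can set the factor levels by another variable directly inside <code>aes()</code>. The sketch below is one way to do that, not the approach used above; the object names <code>cty_avg</code> and <code>p_ordered</code> are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># A minimal sketch, assuming the mpg data shipped with ggplot2
library(ggplot2)

# Mean city mileage per manufacturer (columns: manufacturer, cty)
cty_avg <- aggregate(cty ~ manufacturer, data = ggplot2::mpg, FUN = mean)

# reorder() orders the manufacturer levels by their mean cty
p_ordered <- ggplot(cty_avg, aes(x = reorder(manufacturer, cty), y = cty)) +
  geom_col(width = .5, fill = "tomato3") +
  labs(title = "Ordered Bar Chart",
       subtitle = "Make Vs Avg. Mileage",
       x = "make", y = "mileage",
       caption = "source: mpg") +
  theme(axis.text.x = element_text(angle = 65, vjust = 0.6))
p_ordered</code></pre></div>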
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Lollipop Chart"></a>Lollipop Chart</h3>
<p>Lollipop charts convey the same information as bar charts. By reducing the thick bars into thin lines, they reduce the clutter and lay more emphasis on the value. The result looks nice and modern.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
<span class="co"># Plot</span>
<span class="kw">ggplot</span>(cty_mpg, <span class="kw">aes</span>(<span class="dt">x=</span>make, <span class="dt">y=</span>mileage)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_point</span>(<span class="dt">size=</span><span class="dv">3</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_segment</span>(<span class="kw">aes</span>(<span class="dt">x=</span>make,
<span class="dt">xend=</span>make,
<span class="dt">y=</span><span class="dv">0</span>,
<span class="dt">yend=</span>mileage)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Lollipop Chart"</span>,
<span class="dt">subtitle=</span><span class="st">"Make Vs Avg. Mileage"</span>,
<span class="dt">caption=</span><span class="st">"source: mpg"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>))</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_15.png" alt="ggplot2 Lollipop Barchart" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Dot Plot"></a>Dot Plot</h3>
<p>Dot plots are very similar to lollipops, but without the line and flipped to a horizontal position. They put more emphasis on the rank ordering of the items with respect to actual values and on how far apart the entities are from each other.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(scales)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># Plot</span>
<span class="kw">ggplot</span>(cty_mpg, <span class="kw">aes</span>(<span class="dt">x=</span>make, <span class="dt">y=</span>mileage)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_point</span>(<span class="dt">col=</span><span class="st">"tomato2"</span>, <span class="dt">size=</span><span class="dv">3</span>) +<span class="st"> </span><span class="co"># Draw points</span>
<span class="st"> </span><span class="kw">geom_segment</span>(<span class="kw">aes</span>(<span class="dt">x=</span>make,
<span class="dt">xend=</span>make,
<span class="dt">y=</span><span class="kw">min</span>(mileage),
<span class="dt">yend=</span><span class="kw">max</span>(mileage)),
<span class="dt">linetype=</span><span class="st">"dashed"</span>,
<span class="dt">size=</span><span class="fl">0.1</span>) +<span class="st"> </span><span class="co"># Draw dashed lines</span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Dot Plot"</span>,
<span class="dt">subtitle=</span><span class="st">"Make Vs Avg. Mileage"</span>,
<span class="dt">caption=</span><span class="st">"source: mpg"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">coord_flip</span>()</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_16.png" alt="ggplot2 Dot Plot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Slope Chart"></a>Slope Chart</h3>
<p>Slope charts are an excellent way of comparing the positional placements between 2 points in time. At the moment, there is no built-in function to construct this. The following code serves as a pointer on how you may approach this.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(scales)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># prep data</span>
df <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv"</span>)
<span class="kw">colnames</span>(df) <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"continent"</span>, <span class="st">"1952"</span>, <span class="st">"1957"</span>)
left_label <-<span class="st"> </span><span class="kw">paste</span>(df$continent, <span class="kw">round</span>(df$<span class="st">`</span><span class="dt">1952</span><span class="st">`</span>),<span class="dt">sep=</span><span class="st">", "</span>)
right_label <-<span class="st"> </span><span class="kw">paste</span>(df$continent, <span class="kw">round</span>(df$<span class="st">`</span><span class="dt">1957</span><span class="st">`</span>),<span class="dt">sep=</span><span class="st">", "</span>)
df$class <-<span class="st"> </span><span class="kw">ifelse</span>((df$<span class="st">`</span><span class="dt">1957</span><span class="st">`</span> -<span class="st"> </span>df$<span class="st">`</span><span class="dt">1952</span><span class="st">`</span>) <<span class="st"> </span><span class="dv">0</span>, <span class="st">"red"</span>, <span class="st">"green"</span>)
<span class="co"># Plot</span>
p <-<span class="st"> </span><span class="kw">ggplot</span>(df) +<span class="st"> </span><span class="kw">geom_segment</span>(<span class="kw">aes</span>(<span class="dt">x=</span><span class="dv">1</span>, <span class="dt">xend=</span><span class="dv">2</span>, <span class="dt">y=</span><span class="st">`</span><span class="dt">1952</span><span class="st">`</span>, <span class="dt">yend=</span><span class="st">`</span><span class="dt">1957</span><span class="st">`</span>, <span class="dt">col=</span>class), <span class="dt">size=</span>.<span class="dv">75</span>, <span class="dt">show.legend=</span>F) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_vline</span>(<span class="dt">xintercept=</span><span class="dv">1</span>, <span class="dt">linetype=</span><span class="st">"dashed"</span>, <span class="dt">size=</span>.<span class="dv">1</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_vline</span>(<span class="dt">xintercept=</span><span class="dv">2</span>, <span class="dt">linetype=</span><span class="st">"dashed"</span>, <span class="dt">size=</span>.<span class="dv">1</span>) +
<span class="st"> </span><span class="kw">scale_color_manual</span>(<span class="dt">labels =</span> <span class="kw">c</span>(<span class="st">"Up"</span>, <span class="st">"Down"</span>),
<span class="dt">values =</span> <span class="kw">c</span>(<span class="st">"green"</span>=<span class="st">"#00ba38"</span>, <span class="st">"red"</span>=<span class="st">"#f8766d"</span>)) +<span class="st"> </span><span class="co"># color of lines</span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">x=</span><span class="st">""</span>, <span class="dt">y=</span><span class="st">"Mean GdpPerCap"</span>) +<span class="st"> </span><span class="co"># Axis labels</span>
<span class="st"> </span><span class="kw">xlim</span>(.<span class="dv">5</span>, <span class="fl">2.5</span>) +<span class="st"> </span><span class="kw">ylim</span>(<span class="dv">0</span>,(<span class="fl">1.1</span>*(<span class="kw">max</span>(df$<span class="st">`</span><span class="dt">1952</span><span class="st">`</span>, df$<span class="st">`</span><span class="dt">1957</span><span class="st">`</span>)))) <span class="co"># X and Y axis limits</span>
<span class="co"># Add texts</span>
p <-<span class="st"> </span>p +<span class="st"> </span><span class="kw">geom_text</span>(<span class="dt">label=</span>left_label, <span class="dt">y=</span>df$<span class="st">`</span><span class="dt">1952</span><span class="st">`</span>, <span class="dt">x=</span><span class="kw">rep</span>(<span class="dv">1</span>, <span class="kw">NROW</span>(df)), <span class="dt">hjust=</span><span class="fl">1.1</span>, <span class="dt">size=</span><span class="fl">3.5</span>)
p <-<span class="st"> </span>p +<span class="st"> </span><span class="kw">geom_text</span>(<span class="dt">label=</span>right_label, <span class="dt">y=</span>df$<span class="st">`</span><span class="dt">1957</span><span class="st">`</span>, <span class="dt">x=</span><span class="kw">rep</span>(<span class="dv">2</span>, <span class="kw">NROW</span>(df)), <span class="dt">hjust=</span>-<span class="fl">0.1</span>, <span class="dt">size=</span><span class="fl">3.5</span>)
p <-<span class="st"> </span>p +<span class="st"> </span><span class="kw">geom_text</span>(<span class="dt">label=</span><span class="st">"Time 1"</span>, <span class="dt">x=</span><span class="dv">1</span>, <span class="dt">y=</span><span class="fl">1.1</span>*(<span class="kw">max</span>(df$<span class="st">`</span><span class="dt">1952</span><span class="st">`</span>, df$<span class="st">`</span><span class="dt">1957</span><span class="st">`</span>)), <span class="dt">hjust=</span><span class="fl">1.2</span>, <span class="dt">size=</span><span class="dv">5</span>) <span class="co"># title</span>
p <-<span class="st"> </span>p +<span class="st"> </span><span class="kw">geom_text</span>(<span class="dt">label=</span><span class="st">"Time 2"</span>, <span class="dt">x=</span><span class="dv">2</span>, <span class="dt">y=</span><span class="fl">1.1</span>*(<span class="kw">max</span>(df$<span class="st">`</span><span class="dt">1952</span><span class="st">`</span>, df$<span class="st">`</span><span class="dt">1957</span><span class="st">`</span>)), <span class="dt">hjust=</span>-<span class="fl">0.1</span>, <span class="dt">size=</span><span class="dv">5</span>) <span class="co"># title</span>
<span class="co"># Minify theme</span>
p +<span class="st"> </span><span class="kw">theme</span>(<span class="dt">panel.background =</span> <span class="kw">element_blank</span>(),
<span class="dt">panel.grid =</span> <span class="kw">element_blank</span>(),
<span class="dt">axis.ticks =</span> <span class="kw">element_blank</span>(),
<span class="dt">axis.text.x =</span> <span class="kw">element_blank</span>(),
<span class="dt">panel.border =</span> <span class="kw">element_blank</span>(),
<span class="dt">plot.margin =</span> <span class="kw">unit</span>(<span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">1</span>,<span class="dv">2</span>), <span class="st">"cm"</span>))</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_17.png" alt="ggplot2 Slope Chart" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Dumbbell Plot"></a>Dumbbell Plot</h3>
<p>Dumbbell charts are a great tool if you wish to:</p>
<ol style="list-style-type: decimal">
<li>Visualize relative positions (like growth and decline) between two points in time.</li>
<li>Compare the distance between two categories.</li>
</ol>
<p>To get the correct ordering of the dumbbells, the Y variable should be a factor, and the levels of that factor should be in the same order as they should appear in the plot.</p>
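<p>A minimal sketch of how factor levels drive the ordering, using a small made-up data frame (the names <code>toy</code>, <code>Area</code> and <code>pct</code> are illustrative and are not taken from the <code>health</code> dataset):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical data, for illustration only
toy <- data.frame(Area = c("Houston", "Dallas", "Austin"),
                  pct  = c(0.19, 0.18, 0.14),
                  stringsAsFactors = FALSE)

# Default: levels are sorted alphabetically, regardless of row order
levels(factor(toy$Area))

# Explicit levels: keep the order in which the rows appear
toy$Area <- factor(toy$Area, levels = toy$Area)
levels(toy$Area)

# To order by a value instead, sort the level names by that column
by_pct <- factor(toy$Area, levels = as.character(toy$Area)[order(toy$pct)])
levels(by_pct)</code></pre></div>
<p>The same idea applies to any discrete axis in ggplot2: the plot follows the level order, not the row order.</p>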
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># devtools::install_github("hrbrmstr/ggalt")</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(ggalt)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
health <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"https://raw.githubusercontent.com/selva86/datasets/master/health.csv"</span>)
health$Area <-<span class="st"> </span><span class="kw">factor</span>(health$Area, <span class="dt">levels=</span><span class="kw">as.character</span>(health$Area)) <span class="co"># for right ordering of the dumbbells</span>
<span class="co"># health$Area <- factor(health$Area)</span>
gg <-<span class="st"> </span><span class="kw">ggplot</span>(health, <span class="kw">aes</span>(<span class="dt">x=</span>pct_2013, <span class="dt">xend=</span>pct_2014, <span class="dt">y=</span>Area, <span class="dt">group=</span>Area)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_dumbbell</span>(<span class="dt">color=</span><span class="st">"#a3c4dc"</span>,
<span class="dt">size=</span><span class="fl">0.75</span>,
<span class="dt">point.colour.l=</span><span class="st">"#0e668b"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">scale_x_continuous</span>(<span class="dt">labels=</span>scales::percent) +<span class="st"> </span><span class="co"># percent formatter from the scales package</span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">x=</span><span class="ot">NULL</span>,
<span class="dt">y=</span><span class="ot">NULL</span>,
<span class="dt">title=</span><span class="st">"Dumbbell Chart"</span>,
<span class="dt">subtitle=</span><span class="st">"Pct Change: 2013 vs 2014"</span>,
<span class="dt">caption=</span><span class="st">"Source: https://github.com/hrbrmstr/ggalt"</span>) +
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">plot.title =</span> <span class="kw">element_text</span>(<span class="dt">hjust=</span><span class="fl">0.5</span>, <span class="dt">face=</span><span class="st">"bold"</span>),
<span class="dt">plot.background=</span><span class="kw">element_rect</span>(<span class="dt">fill=</span><span class="st">"#f7f7f7"</span>),
<span class="dt">panel.background=</span><span class="kw">element_rect</span>(<span class="dt">fill=</span><span class="st">"#f7f7f7"</span>),
<span class="dt">panel.grid.minor=</span><span class="kw">element_blank</span>(),
<span class="dt">panel.grid.major.y=</span><span class="kw">element_blank</span>(),
<span class="dt">panel.grid.major.x=</span><span class="kw">element_line</span>(),
<span class="dt">axis.ticks=</span><span class="kw">element_blank</span>(),
<span class="dt">legend.position=</span><span class="st">"top"</span>,
<span class="dt">panel.border=</span><span class="kw">element_blank</span>())
<span class="kw">plot</span>(gg)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_18.png" alt="ggplot2 Dumbbell Chart" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h2>4. Distribution</h2>
<p>When you have lots and lots of data points and want to study where and how the data points are distributed.</p>
<h3><a name="Histogram"></a>Histogram</h3>
<p>By default, if only one variable is supplied, <code>geom_bar()</code> counts the observations in each group. To make it plot precomputed values as a bar chart instead, set <code>stat="identity"</code> and provide both <code>x</code> and <code>y</code>.</p>
<h4>Histogram on a continuous variable</h4>
<p>A histogram on a continuous variable can be drawn using either <code>geom_bar()</code> or <code>geom_histogram()</code>. With <code>geom_histogram()</code>, you can control the number of bars using the <code>bins</code> option, or set the range covered by each bin using <code>binwidth</code>. The value of <code>binwidth</code> is on the same scale as the continuous variable on which the histogram is built. Because <code>geom_histogram()</code> lets you control either the number of <code>bins</code> or the <code>binwidth</code>, it is the preferred option for histograms on continuous variables.</p>
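<p>To see what the two options actually do, you can inspect the bins that ggplot2 computes. The sketch below uses <code>layer_data()</code> to pull out the computed bars; the object names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)

# Same variable, two ways of specifying the bins
p_bins  <- ggplot(mpg, aes(displ)) + geom_histogram(bins = 5)
p_width <- ggplot(mpg, aes(displ)) + geom_histogram(binwidth = 0.5)

# layer_data() returns the computed bins, one row per bar
by_count <- layer_data(p_bins)
by_width <- layer_data(p_width)

nrow(by_count)                    # number of bars requested through bins
range(by_width$xmax - by_width$xmin)  # every bar spans 0.5 units of displ</code></pre></div>
<p>With <code>bins</code> the bar width is derived from the data range, while with <code>binwidth</code> the number of bars is derived from it.</p>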
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># Histogram on a Continuous (Numeric) Variable</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(displ)) +<span class="st"> </span><span class="kw">scale_fill_brewer</span>(<span class="dt">palette =</span> <span class="st">"Spectral"</span>)
g +<span class="st"> </span><span class="kw">geom_histogram</span>(<span class="kw">aes</span>(<span class="dt">fill=</span>class),
<span class="dt">binwidth =</span> .<span class="dv">1</span>,
<span class="dt">col=</span><span class="st">"black"</span>,
<span class="dt">size=</span>.<span class="dv">1</span>) +<span class="st"> </span><span class="co"># change binwidth</span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Histogram with Auto Binning"</span>,
<span class="dt">subtitle=</span><span class="st">"Engine Displacement across Vehicle Classes"</span>)
g +<span class="st"> </span><span class="kw">geom_histogram</span>(<span class="kw">aes</span>(<span class="dt">fill=</span>class),
<span class="dt">bins=</span><span class="dv">5</span>,
<span class="dt">col=</span><span class="st">"black"</span>,
<span class="dt">size=</span>.<span class="dv">1</span>) +<span class="st"> </span><span class="co"># change number of bins</span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Histogram with Fixed Bins"</span>,
<span class="dt">subtitle=</span><span class="st">"Engine Displacement across Vehicle Classes"</span>) </code></pre></div>
<p><img src="screenshots/ggplot_masterlist_19.png" alt="ggplot2 Histogram on Numeric Variable" /> <img src="screenshots/ggplot_masterlist_20.png" alt="ggplot2 Histogram with 5 Bins - Spectral" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h4>Histogram on a categorical variable</h4>
<p>A histogram on a categorical variable results in a frequency chart showing one bar for each category. By adjusting <code>width</code>, you can adjust the thickness of the bars.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># Histogram on a Categorical variable</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(manufacturer))
g +<span class="st"> </span><span class="kw">geom_bar</span>(<span class="kw">aes</span>(<span class="dt">fill=</span>class), <span class="dt">width =</span> <span class="fl">0.5</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Histogram on Categorical Variable"</span>,
<span class="dt">subtitle=</span><span class="st">"Manufacturer across Vehicle Classes"</span>) </code></pre></div>
<p><img src="screenshots/ggplot_masterlist_22.png" alt="ggplot2 Histogram on Categorical Variable" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Density Plot"></a>Density plot</h3>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># Plot</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(cty))
g +<span class="st"> </span><span class="kw">geom_density</span>(<span class="kw">aes</span>(<span class="dt">fill=</span><span class="kw">factor</span>(cyl)), <span class="dt">alpha=</span><span class="fl">0.8</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Density plot"</span>,
<span class="dt">subtitle=</span><span class="st">"City Mileage Grouped by Number of cylinders"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>,
<span class="dt">x=</span><span class="st">"City Mileage"</span>,
<span class="dt">fill=</span><span class="st">"# Cylinders"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_23.png" alt="ggplot2 Density Plot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Box Plot"></a>Box Plot</h3>
<p>A box plot is an excellent tool to study a distribution. It can also show the distributions within multiple groups, along with the median, range and outliers, if any.</p>
<p>The dark line inside the box represents the median. The top of the box is the 75th percentile and the bottom of the box is the 25th percentile. The lines (aka whiskers) extend from the box to the most extreme data points that lie within 1.5*IQR of the box edges, where IQR, or interquartile range, is the distance between the 25th and 75th percentiles. The points outside the whiskers are marked as dots and are normally considered extreme points.</p>
<p>Setting <code>varwidth=T</code> makes the width of each box proportional to the square root of the number of observations it contains.</p>
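<p>The quantities described above can be reproduced by hand. The following is a sketch in base R on a made-up sample (the vector <code>x</code> is hypothetical), following the usual 1.5*IQR convention:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical sample with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q   <- unname(quantile(x, c(0.25, 0.5, 0.75)))  # box bottom, median line, box top
iqr <- q[3] - q[1]                              # interquartile range

# Fences sit 1.5 * IQR beyond the box edges
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
inside   <- x[x >= lower_fence & x <= upper_fence]
whiskers <- range(inside)

# Anything beyond the fences is drawn as an individual dot
outliers <- x[x < lower_fence | x > upper_fence]

whiskers
#> [1] 1 9
outliers
#> [1] 30</code></pre></div>
<p>Note that the upper whisker ends at 9, the largest observation inside the fence, and not at the fence value itself.</p>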
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># Plot</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(class, cty))
g +<span class="st"> </span><span class="kw">geom_boxplot</span>(<span class="dt">varwidth=</span>T, <span class="dt">fill=</span><span class="st">"plum"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Box plot"</span>,
<span class="dt">subtitle=</span><span class="st">"City Mileage grouped by Class of vehicle"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>,
<span class="dt">x=</span><span class="st">"Class of Vehicle"</span>,
<span class="dt">y=</span><span class="st">"City Mileage"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_24.png" alt="ggplot2 BoxPlot" /></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggthemes)
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(class, cty))
g +<span class="st"> </span><span class="kw">geom_boxplot</span>(<span class="kw">aes</span>(<span class="dt">fill=</span><span class="kw">factor</span>(cyl))) +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Box plot"</span>,
<span class="dt">subtitle=</span><span class="st">"City Mileage grouped by Class of vehicle"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>,
<span class="dt">x=</span><span class="st">"Class of Vehicle"</span>,
<span class="dt">y=</span><span class="st">"City Mileage"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_25.png" alt="ggplot2 Grouped BoxPlot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Dot + Box Plot"></a>Dot + Box Plot</h3>
<p>On top of the summary statistics provided by a box plot, the dot plot shows the individual observations within each group. The dots are staggered such that each dot represents one observation. So, in the chart below, the number of dots for a given manufacturer matches the number of rows for that manufacturer in the source data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
<span class="co"># plot</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(manufacturer, cty))
g +<span class="st"> </span><span class="kw">geom_boxplot</span>() +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_dotplot</span>(<span class="dt">binaxis=</span><span class="st">'y'</span>,
<span class="dt">stackdir=</span><span class="st">'center'</span>,
<span class="dt">dotsize =</span> .<span class="dv">5</span>,
<span class="dt">fill=</span><span class="st">"red"</span>) +
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Box plot + Dot plot"</span>,
<span class="dt">subtitle=</span><span class="st">"City Mileage vs Manufacturer: Each dot represents 1 row in source data"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>,
<span class="dt">x=</span><span class="st">"Manufacturer"</span>,
<span class="dt">y=</span><span class="st">"City Mileage"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_26.png" alt="ggplot2 Box and DotPlot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Tufte Boxplot"></a>Tufte Boxplot</h3>
<p>The Tufte box plot, provided by the <code>ggthemes</code> package, is inspired by the works of Edward Tufte. It is just a box plot made minimal and visually appealing.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggthemes)
<span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_tufte</span>()) <span class="co"># from ggthemes</span>
<span class="co"># plot</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(manufacturer, cty))
g +<span class="st"> </span><span class="kw">geom_tufteboxplot</span>() +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Tufte Styled Boxplot"</span>,
<span class="dt">subtitle=</span><span class="st">"City Mileage grouped by Manufacturer"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>,
<span class="dt">x=</span><span class="st">"Manufacturer"</span>,
<span class="dt">y=</span><span class="st">"City Mileage"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_27.png" alt="ggplot2 Tufte Boxplot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Violin Plot"></a>Violin Plot</h3>
<p>A violin plot is similar to a box plot but shows the density within groups. On its own it does not show as much summary information (median, quartiles, outliers) as a box plot. It can be drawn using <code>geom_violin()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_bw</span>())
<span class="co"># plot</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(class, cty))
g +<span class="st"> </span><span class="kw">geom_violin</span>() +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Violin plot"</span>,
<span class="dt">subtitle=</span><span class="st">"City Mileage vs Class of vehicle"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>,
<span class="dt">x=</span><span class="st">"Class of Vehicle"</span>,
<span class="dt">y=</span><span class="st">"City Mileage"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_28.png" alt="ggplot2 Violin Plot" /></p>
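<p>If you want the density shape and the box plot summaries together, one common option is to layer a narrow box plot on top of the violins. This is only a sketch; the fill colour and box width are arbitrary choices.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)

p <- ggplot(mpg, aes(class, cty)) +
  geom_violin(fill = "grey90") +                   # density shape per class
  geom_boxplot(width = 0.1, outlier.shape = NA) +  # narrow box inside each violin, outlier dots hidden
  labs(title = "Violin plot with box plot overlay",
       x = "Class of Vehicle",
       y = "City Mileage")
p</code></pre></div>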
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Population Pyramid"></a>Population Pyramid</h3>
<p>Population pyramids offer a unique way of visualizing how much of a population, or what percentage of it, falls under a certain category. The pyramid below is an excellent example of how many users are retained at each stage of an email marketing campaign funnel.</p>
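<p>The usual trick behind a pyramid is to store one group's values as negative numbers, so that its bars grow in the opposite direction. The sketch below illustrates this on made-up numbers; the data frame <code>toy</code> is hypothetical, and the assumption that the funnel dataset is prepared the same way is based only on the symmetric axis breaks used in the code that follows.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)

# Hypothetical funnel counts, for illustration only
toy <- data.frame(
  Stage  = rep(c("Sent", "Opened", "Clicked"), times = 2),
  Gender = rep(c("Male", "Female"), each = 3),
  Users  = c(100, 60, 20, 90, 70, 30)
)

# Keep the stages in funnel order rather than alphabetical order
toy$Stage <- factor(toy$Stage, levels = c("Sent", "Opened", "Clicked"))

# Mirror one group around zero by negating its values
toy$Users <- ifelse(toy$Gender == "Male", -toy$Users, toy$Users)

p <- ggplot(toy, aes(x = Stage, y = Users, fill = Gender)) +
  geom_bar(stat = "identity", width = 0.6) +
  scale_y_continuous(labels = abs) +  # show magnitudes on both sides of zero
  coord_flip()
p</code></pre></div>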
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(ggthemes)
<span class="kw">options</span>(<span class="dt">scipen =</span> <span class="dv">999</span>) <span class="co"># turns off scientific notation like 1e+40</span>
<span class="co"># Read data</span>
email_campaign_funnel <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv"</span>)
<span class="co"># X Axis Breaks and Labels </span>
brks <-<span class="st"> </span><span class="kw">seq</span>(-<span class="dv">15000000</span>, <span class="dv">15000000</span>, <span class="dv">5000000</span>)
lbls =<span class="st"> </span><span class="kw">paste0</span>(<span class="kw">as.character</span>(<span class="kw">c</span>(<span class="kw">seq</span>(<span class="dv">15</span>, <span class="dv">0</span>, -<span class="dv">5</span>), <span class="kw">seq</span>(<span class="dv">5</span>, <span class="dv">15</span>, <span class="dv">5</span>))), <span class="st">"m"</span>)
<span class="co"># Plot</span>
<span class="kw">ggplot</span>(email_campaign_funnel, <span class="kw">aes</span>(<span class="dt">x =</span> Stage, <span class="dt">y =</span> Users, <span class="dt">fill =</span> Gender)) +<span class="st"> </span><span class="co"># Fill column</span>
<span class="st"> </span><span class="kw">geom_bar</span>(<span class="dt">stat =</span> <span class="st">"identity"</span>, <span class="dt">width =</span> .<span class="dv">6</span>) +<span class="st"> </span><span class="co"># draw the bars</span>
<span class="st"> </span><span class="kw">scale_y_continuous</span>(<span class="dt">breaks =</span> brks, <span class="co"># Breaks</span>
<span class="dt">labels =</span> lbls) +<span class="st"> </span><span class="co"># Labels</span>
<span class="st"> </span><span class="kw">coord_flip</span>() +<span class="st"> </span><span class="co"># Flip axes</span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Email Campaign Funnel"</span>) +
<span class="st"> </span><span class="kw">theme_tufte</span>() +<span class="st"> </span><span class="co"># Tufte theme from ggthemes</span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">plot.title =</span> <span class="kw">element_text</span>(<span class="dt">hjust =</span> .<span class="dv">5</span>),
<span class="dt">axis.ticks =</span> <span class="kw">element_blank</span>()) +<span class="st"> </span><span class="co"># Centre plot title</span>
<span class="st"> </span><span class="kw">scale_fill_brewer</span>(<span class="dt">palette =</span> <span class="st">"Dark2"</span>) <span class="co"># Color palette</span></code></pre></div>
<p><img src="screenshots/ggplot_masterlist_29.png" alt="Population Pyramid With Ggplot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h2>5. Composition</h2>
<h3><a name="Waffle Chart"></a>Waffle Chart</h3>
<p>Waffle charts are a nice way of showing the categorical composition of the total population. Though ggplot2 has no dedicated function for them, a waffle can be built from a grid of tiles using the <code>geom_tile()</code> function. The template below should help you create your own waffle.</p>
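<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2) <span class="co"># provides the mpg dataset and the plotting functions</span>
var <-<span class="st"> </span>mpg$class <span class="co"># the categorical data </span>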
## Prep data (nothing to change here)
nrows <-<span class="st"> </span><span class="dv">10</span>
df <-<span class="st"> </span><span class="kw">expand.grid</span>(<span class="dt">y =</span> <span class="dv">1</span>:nrows, <span class="dt">x =</span> <span class="dv">1</span>:nrows)
categ_table <-<span class="st"> </span><span class="kw">round</span>(<span class="kw">table</span>(var) *<span class="st"> </span>((nrows*nrows)/(<span class="kw">length</span>(var))))
categ_table
<span class="co">#> 2seater compact midsize minivan pickup subcompact suv </span>
<span class="co">#> 2 20 18 5 14 15 26 </span>
df$category <-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">rep</span>(<span class="kw">names</span>(categ_table), categ_table))
<span class="co"># NOTE: if sum(categ_table) is not 100 (i.e. nrows^2), it will need adjustment to make the sum to 100.</span>
## Plot
<span class="kw">ggplot</span>(df, <span class="kw">aes</span>(<span class="dt">x =</span> x, <span class="dt">y =</span> y, <span class="dt">fill =</span> category)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_tile</span>(<span class="dt">color =</span> <span class="st">"black"</span>, <span class="dt">size =</span> <span class="fl">0.5</span>) +
<span class="st"> </span><span class="kw">scale_x_continuous</span>(<span class="dt">expand =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">0</span>)) +
<span class="st"> </span><span class="kw">scale_y_continuous</span>(<span class="dt">expand =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">0</span>), <span class="dt">trans =</span> <span class="st">'reverse'</span>) +
<span class="st"> </span><span class="kw">scale_fill_brewer</span>(<span class="dt">palette =</span> <span class="st">"Set3"</span>) +
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Waffle Chart"</span>, <span class="dt">subtitle=</span><span class="st">"'Class' of vehicles"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">panel.border =</span> <span class="kw">element_rect</span>(<span class="dt">size =</span> <span class="dv">2</span>),
<span class="dt">plot.title =</span> <span class="kw">element_text</span>(<span class="dt">size =</span> <span class="kw">rel</span>(<span class="fl">1.2</span>)),
<span class="dt">axis.text =</span> <span class="kw">element_blank</span>(),
<span class="dt">axis.title =</span> <span class="kw">element_blank</span>(),
<span class="dt">axis.ticks =</span> <span class="kw">element_blank</span>(),
<span class="dt">legend.title =</span> <span class="kw">element_blank</span>(),
<span class="dt">legend.position =</span> <span class="st">"right"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_30.png" alt="Waffle Chart With Ggplot" /></p>
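<p>The note in the template mentions that the rounded counts may not add up to <code>nrows^2</code>. One possible way to handle that, sketched here under the assumption that a largest-remainder adjustment is acceptable for your data, is a small helper (the name <code>round_to_total()</code> is made up for this example):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: scale counts to a fixed total and round so the sum is exact
round_to_total <- function(counts, total) {
  scaled   <- counts * total / sum(counts)  # ideal, fractional tile counts
  base     <- floor(scaled)                 # start by rounding everything down
  leftover <- total - sum(base)             # tiles still to hand out
  if (leftover > 0) {
    # give the remaining tiles to the categories with the largest fractional parts
    idx <- order(scaled - base, decreasing = TRUE)[seq_len(leftover)]
    base[idx] <- base[idx] + 1
  }
  base
}

# Three equal categories: plain round() would give 33 + 33 + 33 = 99 tiles
counts <- c(a = 1, b = 1, c = 1)
tiles  <- round_to_total(counts, 100)
sum(tiles)
#> [1] 100</code></pre></div>
<p>In the template, this would stand in for the <code>round(table(var) * ...)</code> step whenever <code>sum(categ_table)</code> differs from <code>nrows*nrows</code>.</p>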
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Pie Chart"></a>Pie Chart</h3>
<p>The pie chart, a classic way of showing composition, is equivalent to the waffle chart in terms of the information conveyed. It is, however, slightly tricky to implement in ggplot2: you draw a stacked bar and then wrap it with <code>coord_polar()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># Source: Frequency table</span>
df <-<span class="st"> </span><span class="kw">as.data.frame</span>(<span class="kw">table</span>(mpg$class))
<span class="kw">colnames</span>(df) <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"class"</span>, <span class="st">"freq"</span>)
pie <-<span class="st"> </span><span class="kw">ggplot</span>(df, <span class="kw">aes</span>(<span class="dt">x =</span> <span class="st">""</span>, <span class="dt">y=</span>freq, <span class="dt">fill =</span> <span class="kw">factor</span>(class))) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_bar</span>(<span class="dt">width =</span> <span class="dv">1</span>, <span class="dt">stat =</span> <span class="st">"identity"</span>) +
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.line =</span> <span class="kw">element_blank</span>(),
<span class="dt">plot.title =</span> <span class="kw">element_text</span>(<span class="dt">hjust=</span><span class="fl">0.5</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">fill=</span><span class="st">"class"</span>,
<span class="dt">x=</span><span class="ot">NULL</span>,
<span class="dt">y=</span><span class="ot">NULL</span>,
<span class="dt">title=</span><span class="st">"Pie Chart of class"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>)
pie +<span class="st"> </span><span class="kw">coord_polar</span>(<span class="dt">theta =</span> <span class="st">"y"</span>, <span class="dt">start=</span><span class="dv">0</span>)
<span class="co"># Source: Categorical variable.</span>
<span class="co"># mpg$class</span>
pie <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(<span class="dt">x =</span> <span class="st">""</span>, <span class="dt">fill =</span> <span class="kw">factor</span>(class))) +<span class="st"> </span>
<span class="st"> </span><span class="kw">geom_bar</span>(<span class="dt">width =</span> <span class="dv">1</span>) +
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.line =</span> <span class="kw">element_blank</span>(),
<span class="dt">plot.title =</span> <span class="kw">element_text</span>(<span class="dt">hjust=</span><span class="fl">0.5</span>)) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">fill=</span><span class="st">"class"</span>,
<span class="dt">x=</span><span class="ot">NULL</span>,
<span class="dt">y=</span><span class="ot">NULL</span>,
<span class="dt">title=</span><span class="st">"Pie Chart of class"</span>,
<span class="dt">caption=</span><span class="st">"Source: mpg"</span>)
pie +<span class="st"> </span><span class="kw">coord_polar</span>(<span class="dt">theta =</span> <span class="st">"y"</span>, <span class="dt">start=</span><span class="dv">0</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_31.png" alt="Pie Chart With Ggplot" /></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># http://www.r-graph-gallery.com/128-ring-or-donut-plot/</span></code></pre></div>
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Treemap"></a>Treemap</h3>
<p>A treemap is a nice way of displaying hierarchical data using nested rectangles. The <code>treemapify</code> package provides the functions needed to convert the data into the desired format (<code>treemapify()</code>) as well as to draw the actual plot (<code>ggplotify()</code>).</p>
<p>To create a treemap, the data must first be converted to the desired format using <code>treemapify()</code>. The important requirement is that your data has one variable each for the <code>area</code> of the tiles, the <code>fill</code> color, the tile’s <code>label</code> and, finally, the parent <code>group</code>.</p>
<p>Once the data formatting is done, just call <code>ggplotify()</code> on the treemapified data.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(treemapify)
proglangs <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"https://raw.githubusercontent.com/selva86/datasets/master/proglanguages.csv"</span>)
<span class="co"># plot</span>
treeMapCoordinates <-<span class="st"> </span><span class="kw">treemapify</span>(proglangs,
<span class="dt">area =</span> <span class="st">"value"</span>,
<span class="dt">fill =</span> <span class="st">"parent"</span>,
<span class="dt">label =</span> <span class="st">"id"</span>,
<span class="dt">group =</span> <span class="st">"parent"</span>)
treeMapPlot <-<span class="st"> </span><span class="kw">ggplotify</span>(treeMapCoordinates) +<span class="st"> </span>
<span class="st"> </span><span class="kw">scale_x_continuous</span>(<span class="dt">expand =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">0</span>)) +
<span class="st"> </span><span class="kw">scale_y_continuous</span>(<span class="dt">expand =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">0</span>)) +
<span class="st"> </span><span class="kw">scale_fill_brewer</span>(<span class="dt">palette =</span> <span class="st">"Dark2"</span>)
<span class="kw">print</span>(treeMapPlot)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_32.png" alt="Treemap With Ggplot" /></p>
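<p>As far as I know, newer releases of <code>treemapify</code> no longer ship <code>ggplotify()</code> and draw treemaps through ggplot2 geoms instead. If the code above fails on your installation, the following sketch shows that style on a made-up data frame (the values in <code>toy</code> are hypothetical; only the column roles mirror the example above):</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)
library(treemapify)

# Hypothetical data with the same column roles as above: id, value, parent
toy <- data.frame(
  id     = c("A1", "A2", "A3", "B1", "B2", "B3"),
  value  = c(30, 45, 10, 25, 20, 15),
  parent = rep(c("Group A", "Group B"), each = 3)
)

p <- ggplot(toy, aes(area = value, fill = parent, label = id, subgroup = parent)) +
  geom_treemap() +                                         # one tile per row, sized by value
  geom_treemap_subgroup_border() +                         # outline each parent group
  geom_treemap_text(colour = "white", place = "centre") +  # tile labels
  scale_fill_brewer(palette = "Dark2")
p</code></pre></div>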
<p><a href="#top">[Back to Top]</a></p>
<h3><a name="Bar Chart"></a>Bar Chart</h3>
<p>By default, <code>geom_bar()</code> has the <code>stat</code> set to <code>count</code>. That means, when you provide just a continuous X variable (and no Y variable), it tries to make a histogram out of the data.</p>
<p>To make <code>geom_bar()</code> draw bars from precomputed values instead of a histogram, you need to do two things.</p>
<ol style="list-style-type: decimal">
<li>Set <code>stat="identity"</code></li>
<li>Provide both <code>x</code> and <code>y</code> inside <code>aes()</code>, where <code>x</code> is either <code>character</code> or <code>factor</code> and <code>y</code> is numeric.</li>
</ol>
<p>A bar chart can be drawn from a categorical column variable or from a separate frequency table. By adjusting <code>width</code>, you can adjust the thickness of the bars. If your data source is a frequency table, that is, if you don’t want ggplot to compute the counts, you need to set <code>stat="identity"</code> inside <code>geom_bar()</code>.</p>
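<p>The two routes give the same bar heights, which can be checked with <code>layer_data()</code>. The sketch below also uses <code>geom_col()</code>, which ggplot2 provides as a shorthand for <code>geom_bar(stat="identity")</code>; the object names are illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)

# Frequency table computed up front
freq <- as.data.frame(table(mpg$manufacturer))  # columns: Var1, Freq

p_counted  <- ggplot(mpg, aes(manufacturer)) + geom_bar()                    # ggplot counts the rows
p_identity <- ggplot(freq, aes(Var1, Freq)) + geom_bar(stat = "identity")    # counts supplied as y
p_col      <- ggplot(freq, aes(Var1, Freq)) + geom_col()                     # shorthand for the line above

# Extract the bar heights that each plot would draw
heights <- function(p) layer_data(p)$y

all(heights(p_counted) == heights(p_identity))
all(heights(p_col) == heights(p_identity))</code></pre></div>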
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># prep frequency table</span>
freqtable <-<span class="st"> </span><span class="kw">table</span>(mpg$manufacturer)
df <-<span class="st"> </span><span class="kw">as.data.frame.table</span>(freqtable)
<span class="kw">head</span>(df)
<span class="co">#> Var1 Freq</span>
<span class="co">#> 1 audi 18</span>
<span class="co">#> 2 chevrolet 19</span>
<span class="co">#> 3 dodge 37</span>
<span class="co">#> 4 ford 25</span>
<span class="co">#> 5 honda 9</span>
<span class="co">#> 6 hyundai 14</span></code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># plot</span>
<span class="kw">library</span>(ggplot2)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())
<span class="co"># Bar heights come from the Freq column (stat = "identity")</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(df, <span class="kw">aes</span>(Var1, Freq))
g +<span class="st"> </span><span class="kw">geom_bar</span>(<span class="dt">stat=</span><span class="st">"identity"</span>, <span class="dt">width =</span> <span class="fl">0.5</span>, <span class="dt">fill=</span><span class="st">"tomato2"</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Bar Chart"</span>,
<span class="dt">subtitle=</span><span class="st">"Manufacturer of vehicles"</span>,
<span class="dt">caption=</span><span class="st">"Source: Frequency of Manufacturers from 'mpg' dataset"</span>) +
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>))</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_33.png" alt="Bar Chart With Ggplot" /></p>
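<p>As an aside, <code>ggplot2</code> (version 2.2.0 and later) also provides <code>geom_col()</code>, which is shorthand for <code>geom_bar(stat="identity")</code>. The sketch below rebuilds the frequency table so that it runs on its own; the names <code>p_bar</code> and <code>p_col</code> are illustrative only.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(ggplot2)

# Same frequency table as above, rebuilt so the sketch is self-contained
df <- as.data.frame(table(mpg$manufacturer))

# Two ways to draw bars from precomputed heights
p_bar <- ggplot(df, aes(Var1, Freq)) + geom_bar(stat = "identity", width = 0.5)
p_col <- ggplot(df, aes(Var1, Freq)) + geom_col(width = 0.5)

# Compare the bar heights computed for each layer
identical(layer_data(p_bar)$y, layer_data(p_col)$y)</code></pre></div>
<p>Either form should work with a frequency table; <code>geom_col()</code> simply saves you from spelling out the <code>stat</code>.</p>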
<p>The counts can also be computed directly from a column variable. In this case, only X is provided and <code>stat="identity"</code> is <em>not</em> set.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># From a categorical column variable</span>
g <-<span class="st"> </span><span class="kw">ggplot</span>(mpg, <span class="kw">aes</span>(manufacturer))
g +<span class="st"> </span><span class="kw">geom_bar</span>(<span class="kw">aes</span>(<span class="dt">fill=</span>class), <span class="dt">width =</span> <span class="fl">0.5</span>) +<span class="st"> </span>
<span class="st"> </span><span class="kw">theme</span>(<span class="dt">axis.text.x =</span> <span class="kw">element_text</span>(<span class="dt">angle=</span><span class="dv">65</span>, <span class="dt">vjust=</span><span class="fl">0.6</span>)) +
<span class="st"> </span><span class="kw">labs</span>(<span class="dt">title=</span><span class="st">"Categorywise Bar Chart"</span>,
<span class="dt">subtitle=</span><span class="st">"Manufacturer of vehicles"</span>,
<span class="dt">caption=</span><span class="st">"Source: Manufacturers from 'mpg' dataset"</span>)</code></pre></div>
<p><img src="screenshots/ggplot_masterlist_34.png" alt="Bar Chart With Multiple Categories in Ggplot" /></p>
<p><a href="#top">[Back to Top]</a></p>
<h2>6. Change</h2>
<h3><a name="Time Series Plot From a Time Series Object"></a>Time Series Plot From a Time Series Object (<code>ts</code>)</h3>
<p>The <code>ggfortify</code> package lets <code>autoplot()</code> plot directly from a time series object (<code>ts</code>).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## From Timeseries object (ts)
<span class="kw">library</span>(ggplot2)
<span class="kw">library</span>(ggfortify)
<span class="kw">theme_set</span>(<span class="kw">theme_classic</span>())