2025-05-06
Imagine an experiment:
And you want to plot it as two lines using Excel
R gives you a professional-looking figure
If this table is in dummy.csv:
[bash]$ cat dummy.csv
sample_id,condition,time,concentration
S1,control,1,5.1
S2,control,2,4.8
S3,control,3,5.3
S4,control,4,5.0
S5,control,5,5.2
S6,treated,1,6.7
S7,treated,2,6.9
S8,treated,3,7.0
S9,treated,4,6.8
S10,treated,5,6.6
[bash]$
Here’s how we can read this table into an R object.
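The reading command itself is not shown here, so the following is only a minimal sketch using base R's `read.csv()`. The object name `df` matches the plotting code later in the deck. Writing the file to a temporary directory is an assumption made purely so the example runs on its own.

```r
# Recreate dummy.csv in a temporary directory so the example is self-contained.
csv_lines <- c(
  "sample_id,condition,time,concentration",
  "S1,control,1,5.1",
  "S2,control,2,4.8",
  "S3,control,3,5.3",
  "S4,control,4,5.0",
  "S5,control,5,5.2",
  "S6,treated,1,6.7",
  "S7,treated,2,6.9",
  "S8,treated,3,7.0",
  "S9,treated,4,6.8",
  "S10,treated,5,6.6"
)
csv_path <- file.path(tempdir(), "dummy.csv")
writeLines(csv_lines, csv_path)

# read.csv() uses the first row as column names and returns a data.frame.
df <- read.csv(csv_path)

# Typing the object name prints its contents.
df
```

With the real file in the working directory, `df <- read.csv("dummy.csv")` would be enough.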
Let us look at the contents of this table
## sample_id condition time concentration
## 1 S1 control 1 5.1
## 2 S2 control 2 4.8
## 3 S3 control 3 5.3
## 4 S4 control 4 5.0
## 5 S5 control 5 5.2
## 6 S6 treated 1 6.7
## 7 S7 treated 2 6.9
## 8 S8 treated 3 7.0
## 9 S9 treated 4 6.8
## 10 S10 treated 5 6.6
We can also get useful information about it.
## [1] 10 4
## 'data.frame': 10 obs. of 4 variables:
## $ sample_id : chr "S1" "S2" "S3" "S4" ...
## $ condition : chr "control" "control" "control" "control" ...
## $ time : int 1 2 3 4 5 1 2 3 4 5
## $ concentration: num 5.1 4.8 5.3 5 5.2 6.7 6.9 7 6.8 6.6
We can also get summary statistics for each column.
## sample_id condition time concentration
## Length:10 Length:10 Min. :1 Min. :4.800
## Class :character Class :character 1st Qu.:2 1st Qu.:5.125
## Mode :character Mode :character Median :3 Median :5.950
## Mean :3 Mean :5.940
## 3rd Qu.:4 3rd Qu.:6.775
## Max. :5 Max. :7.000
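The three outputs above look like the results of `dim()`, `str()`, and `summary()`. That is an inference from their format, since the calls themselves are not shown. A sketch that rebuilds the table in code (instead of reading the file) and runs them:

```r
# Rebuild the example table directly so the snippet runs without dummy.csv.
df <- data.frame(
  sample_id = paste0("S", 1:10),
  condition = rep(c("control", "treated"), each = 5),
  time = rep(1:5, times = 2),
  concentration = c(5.1, 4.8, 5.3, 5.0, 5.2, 6.7, 6.9, 7.0, 6.8, 6.6)
)

dim(df)      # number of rows, then number of columns
str(df)      # column names, types, and the first few values
summary(df)  # min, quartiles, mean, max for the numeric columns
```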
Or peek into parts of the table.
## sample_id condition time concentration
## 1 S1 control 1 5.1
## 2 S2 control 2 4.8
## 3 S3 control 3 5.3
## 4 S4 control 4 5.0
## 5 S5 control 5 5.2
## 6 S6 treated 1 6.7
## sample_id condition time concentration
## 5 S5 control 5 5.2
## 6 S6 treated 1 6.7
## 7 S7 treated 2 6.9
## 8 S8 treated 3 7.0
## 9 S9 treated 4 6.8
## 10 S10 treated 5 6.6
Or summarize values in columns.
##
## control treated
## 5 5
##
## 1 2 3 4 5
## 2 2 2 2 2
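The peeking and counting outputs above are consistent with `head()`, `tail()`, and `table()`. The original calls are not shown, so treat this as one plausible way to reproduce them:

```r
# Rebuild the example table so the snippet is self-contained.
df <- data.frame(
  sample_id = paste0("S", 1:10),
  condition = rep(c("control", "treated"), each = 5),
  time = rep(1:5, times = 2),
  concentration = c(5.1, 4.8, 5.3, 5.0, 5.2, 6.7, 6.9, 7.0, 6.8, 6.6)
)

first_rows <- head(df)  # first six rows by default
last_rows <- tail(df)   # last six rows by default

# table() counts how often each value occurs in a column.
condition_counts <- table(df$condition)
time_counts <- table(df$time)

first_rows
last_rows
condition_counts
time_counts
```

Both `head()` and `tail()` accept `n`, for example `head(df, n = 3)`, to change how many rows are shown.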
time is x-axis
and concentration is y-axis

ggplot(data = df) +
  aes(x = time, y = concentration, group = condition, color = condition) +
  geom_line()

ggplot(data = df) +
  aes(x = time, y = concentration, group = condition, color = condition) +
  geom_point()

ggplot(data = df) +
  aes(x = time, y = concentration, group = condition, color = condition) +
  geom_line() + geom_point()

ggplot(data = df) +
  aes(x = time, y = concentration, group = condition, color = condition) +
  geom_line() + geom_point() + ylim(c(0,10))

ggplot(data = df) +
  aes(x = time, y = concentration, group = condition, color = condition) +
  geom_line() + geom_point() + scale_y_continuous(breaks=seq(0,10,1), limits=c(0,10))

ggplot(data = df) +
  aes(x = time, y = concentration, group = condition, color = condition) +
  geom_line() + geom_point() + scale_y_continuous(breaks=seq(0,10,1), limits=c(0,10)) +
  theme_classic()

ggplot function
First, you set up the plot with ggplot() and aes().
Then, you add a geometry layer:
geom_col(), geom_point(), etc.
geom_point(): for points
geom_line(): for lines
geom_bar(): for barplot
geom_boxplot(): for boxplot

ggplot function
ggplot() sets up the plot object and links data to axes.
aes() maps variables to visual properties (x-axis, y-axis, colors, etc.).
geom_*() draws the data using those mappings.

Why ggplot?
| Feature | Excel | ggplot2 |
|---|---|---|
| Data size | Small–Medium | Small to Very Large |
| Reproducibility | Manual clicks | Fully scriptable and reproducible |
| Styling | Limited templates | Full control via theme(), colors, facets |
| Automation | Hard | Easy: loop or script across datasets |
| Publication-ready | Time-consuming | High-quality out of the box |
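To make the "Automation" row concrete, here is a rough sketch of scripting the same plot across several datasets. The helper name `plot_timecourse` is hypothetical, and splitting the example table by condition is only a stand-in for having several real datasets. It assumes ggplot2 is installed.

```r
library(ggplot2)

# Rebuild the example table so the snippet is self-contained.
df <- data.frame(
  sample_id = paste0("S", 1:10),
  condition = rep(c("control", "treated"), each = 5),
  time = rep(1:5, times = 2),
  concentration = c(5.1, 4.8, 5.3, 5.0, 5.2, 6.7, 6.9, 7.0, 6.8, 6.6)
)

# Hypothetical helper: builds the time-course plot for any data frame
# that has time, concentration, and condition columns.
plot_timecourse <- function(data) {
  ggplot(data = data) +
    aes(x = time, y = concentration, group = condition, color = condition) +
    geom_line() + geom_point() +
    theme_classic()
}

# Stand-in for several datasets: the same table split by condition.
datasets <- split(df, df$condition)
plots <- lapply(datasets, plot_timecourse)

# Save one PDF per dataset in the temporary directory.
out_files <- file.path(tempdir(), paste0(names(plots), ".pdf"))
for (i in seq_along(plots)) {
  ggsave(out_files[i], plot = plots[[i]], width = 5, height = 4)
}
```

Because the plot is defined once in a function, a styling change made there applies to every output file on the next run.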
Biologists often have different pieces of information in separate tables:
We want to combine them using a common column (like Sample ID).
Sample Metadata
| sample_id | tissue | animal_id | date |
|---|---|---|---|
| S1 | blood | F01 | 2020-01-29 |
| S2 | blood | F01 | 2020-02-04 |
| S3 | blood | F03 | 2020-01-29 |
| S4 | blood | F03 | 2020-02-04 |
Animal Metadata
| animal_id | condition | dob | sex |
|---|---|---|---|
| F01 | control | 2019-10-31 | M |
| F03 | treated | 2019-10-27 | F |
Protein Results
| sample_id | protein_X | protein_Y | protein_Z |
|---|---|---|---|
| S1 | 5.2 | 12.2 | 9.5 |
| S2 | 5.3 | 13.1 | 10.2 |
| S3 | 6.8 | 18.9 | 15.8 |
| S4 | 7.0 | 19.2 | 17.1 |
| sample_id | tissue | animal_id | date | protein_X | protein_Y | protein_Z |
|---|---|---|---|---|---|---|
| S1 | blood | F01 | 2020-01-29 | 5.2 | 12.2 | 9.5 |
| S2 | blood | F01 | 2020-02-04 | 5.3 | 13.1 | 10.2 |
| S3 | blood | F03 | 2020-01-29 | 6.8 | 18.9 | 15.8 |
| S4 | blood | F03 | 2020-02-04 | 7.0 | 19.2 | 17.1 |
Use a common column (sample_id) to match.

left_join(left_join(sample_metadata, animal_metadata, by="animal_id"), protein_conc, by="sample_id")

| sample_id | tissue | animal_id | date | condition | dob | sex | protein_X | protein_Y | protein_Z |
|---|---|---|---|---|---|---|---|---|---|
| S1 | blood | F01 | 2020-01-29 | control | 2019-10-31 | M | 5.2 | 12.2 | 9.5 |
| S2 | blood | F01 | 2020-02-04 | control | 2019-10-31 | M | 5.3 | 13.1 | 10.2 |
| S3 | blood | F03 | 2020-01-29 | treated | 2019-10-27 | F | 6.8 | 18.9 | 15.8 |
| S4 | blood | F03 | 2020-02-04 | treated | 2019-10-27 | F | 7.0 | 19.2 | 17.1 |
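The combined table above can be reproduced end to end from the three small tables in this section. A sketch, assuming dplyr is installed. The dates are kept as plain character strings for simplicity, and `combined` is just an illustrative name for the result.

```r
library(dplyr)

# The three input tables, typed in from the slides above.
sample_metadata <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  tissue = "blood",
  animal_id = c("F01", "F01", "F03", "F03"),
  date = c("2020-01-29", "2020-02-04", "2020-01-29", "2020-02-04")
)
animal_metadata <- data.frame(
  animal_id = c("F01", "F03"),
  condition = c("control", "treated"),
  dob = c("2019-10-31", "2019-10-27"),
  sex = c("M", "F")
)
protein_conc <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  protein_X = c(5.2, 5.3, 6.8, 7.0),
  protein_Y = c(12.2, 13.1, 18.9, 19.2),
  protein_Z = c(9.5, 10.2, 15.8, 17.1)
)

# Inner call: attach animal information to each sample via animal_id.
# Outer call: attach protein measurements via sample_id.
combined <- left_join(
  left_join(sample_metadata, animal_metadata, by = "animal_id"),
  protein_conc,
  by = "sample_id"
)
combined
```

`left_join()` keeps every row of the left table, so each animal's row is repeated for every sample taken from that animal.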
%>%
The pipe '%>%' passes the result on its left to the function on its right.
Think: “and then”.

left_join()?
Without the pipe
left_join(left_join(sample_metadata, animal_metadata, by="animal_id"), protein_conc, by="sample_id")
With the pipe
sample_metadata %>%
  left_join(animal_metadata, by="animal_id") %>%
  left_join(protein_conc, by="sample_id")

| sample_id | tissue | animal_id | date | condition | dob | sex | protein_X | protein_Y | protein_Z |
|---|---|---|---|---|---|---|---|---|---|
| S1 | blood | F01 | 2020-01-29 | control | 2019-10-31 | M | 5.2 | 12.2 | 9.5 |
| S2 | blood | F01 | 2020-02-04 | control | 2019-10-31 | M | 5.3 | 13.1 | 10.2 |
| S3 | blood | F03 | 2020-01-29 | treated | 2019-10-27 | F | 6.8 | 18.9 | 15.8 |
| S4 | blood | F03 | 2020-02-04 | treated | 2019-10-27 | F | 7.0 | 19.2 | 17.1 |
Remove the sample_id column and sort by concentration
Without the pipe
## condition time concentration
## 1 control 2 4.8
## 2 control 4 5.0
## 3 control 1 5.1
## 4 control 5 5.2
## 5 control 3 5.3
## 6 treated 5 6.6
## 7 treated 1 6.7
## 8 treated 4 6.8
## 9 treated 2 6.9
## 10 treated 3 7.0
With the pipe
## condition time concentration
## 1 control 2 4.8
## 2 control 4 5.0
## 3 control 1 5.1
## 4 control 5 5.2
## 5 control 3 5.3
## 6 treated 5 6.6
## 7 treated 1 6.7
## 8 treated 4 6.8
## 9 treated 2 6.9
## 10 treated 3 7.0
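The commands behind the two outputs above are not shown. One way to obtain them with dplyr, assuming `select()` and `arrange()` are the intended verbs, is sketched below. The names `no_pipe` and `with_pipe` are illustrative.

```r
library(dplyr)

# Rebuild the example table so the snippet is self-contained.
df <- data.frame(
  sample_id = paste0("S", 1:10),
  condition = rep(c("control", "treated"), each = 5),
  time = rep(1:5, times = 2),
  concentration = c(5.1, 4.8, 5.3, 5.0, 5.2, 6.7, 6.9, 7.0, 6.8, 6.6)
)

# Without the pipe: read from the inside out.
no_pipe <- arrange(select(df, -sample_id), concentration)

# With the pipe: read top to bottom, "and then".
with_pipe <- df %>%
  select(-sample_id) %>%     # drop the sample_id column
  arrange(concentration)     # sort rows by increasing concentration

with_pipe
```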
Compute the mean concentration for each condition
Without the pipe
## # A tibble: 2 × 2
## condition mean_conc
## <chr> <dbl>
## 1 control 5.08
## 2 treated 6.8
With the pipe
## # A tibble: 2 × 2
## condition mean_conc
## <chr> <dbl>
## 1 control 5.08
## 2 treated 6.8
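The per-condition means above are consistent with dplyr's `group_by()` followed by `summarise()`. The original calls are not shown, so this is a sketch under that assumption. The column name `mean_conc` is taken from the output, and the object names are illustrative.

```r
library(dplyr)

# Rebuild the example table so the snippet is self-contained.
df <- data.frame(
  sample_id = paste0("S", 1:10),
  condition = rep(c("control", "treated"), each = 5),
  time = rep(1:5, times = 2),
  concentration = c(5.1, 4.8, 5.3, 5.0, 5.2, 6.7, 6.9, 7.0, 6.8, 6.6)
)

# Without the pipe: the grouping step is nested inside summarise().
no_pipe <- summarise(group_by(df, condition), mean_conc = mean(concentration))

# With the pipe: group the rows, and then average within each group.
with_pipe <- df %>%
  group_by(condition) %>%
  summarise(mean_conc = mean(concentration))

with_pipe
```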