[R] Play with ggplot2, Part 01: “barplot”

To get used to ggplot2, I'm going to play around with it in various ways.

Using the “Surveillance, Epidemiology, and End Results (SEER)” data published by the National Cancer Institute, I plotted the following:

  • 5-Year Relative Survival (Percent) by Year of Diagnosis
  • Stage Distribution (%) 1999-2006c, Case Counts and Percentages (stage at the time of diagnosis)
  • 5-Year Relative Survival (Percent) 1999-2006c by Stage at Diagnosis

I picked three cancer sites, for no particular reason.

5-Year Relative Survival (Percent) by Year of Diagnosis

※Based on End Results data from a series of hospital registries and one population-based registry.

library(ggplot2)
five <- read.csv('5year.csv')
head(five)
  Year.of.Diagnosis           Section     Gender Percent
1         1975-1977 Lung and Bronchus Both Sexes    12.7
2         1978-1980 Lung and Bronchus Both Sexes    13.3
3         1981-1983 Lung and Bronchus Both Sexes    13.7
4         1984-1986 Lung and Bronchus Both Sexes    13.3
5         1987-1989 Lung and Bronchus Both Sexes    13.5
6         1990-1992 Lung and Bronchus Both Sexes    14.0
qplot(Year.of.Diagnosis, Percent, data=five, facets = . ~ Section, group=Gender, color=Gender, geom="path",
      main="5-Year Relative Survival (Percent) by Year of Diagnosis")

MEMO

  • I'd like to do something about the x-axis tick labels (axis.text) being crammed together.
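One way to unclutter the x-axis is to rotate the tick labels with `theme()`. Below is a minimal sketch, written with `ggplot()` + `geom_path()` instead of `qplot()` (which is deprecated as of ggplot2 3.4.0), assuming a current ggplot2. The 45-degree angle is an arbitrary choice, and to keep the example self-contained it uses only the six rows printed by `head(five)` above instead of the full `5year.csv`.

```r
library(ggplot2)

# The six rows shown by head(five) above (Lung and Bronchus, Both Sexes only)
five_head <- data.frame(
  Year.of.Diagnosis = c("1975-1977", "1978-1980", "1981-1983",
                        "1984-1986", "1987-1989", "1990-1992"),
  Section = "Lung and Bronchus",
  Gender  = "Both Sexes",
  Percent = c(12.7, 13.3, 13.7, 13.3, 13.5, 14.0)
)

p <- ggplot(five_head, aes(x = Year.of.Diagnosis, y = Percent,
                           group = Gender, colour = Gender)) +
  geom_path() +
  # Same layout as qplot(..., facets = . ~ Section)
  facet_grid(. ~ Section) +
  ggtitle("5-Year Relative Survival (Percent) by Year of Diagnosis") +
  # Rotate the x tick labels 45 degrees and right-align them to their ticks
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

With the full data, more periods share the axis, so a steeper angle (e.g. 90 with `vjust = 0.5`) may read better.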

Stage Distribution (%) 1999-2006c, Case Counts and Percentages

※Based on the SEER 17 areas (San Francisco, Connecticut, Detroit, Hawaii, Iowa, New Mexico, Seattle, Utah, Atlanta, San Jose-Monterey, Los Angeles, Alaska Native Registry, Rural Georgia, California excluding SF/SJM/LA, Kentucky, Louisiana and New Jersey). California excluding SF/SJM/LA, Kentucky, Louisiana, and New Jersey contribute cases for diagnosis years 2000-2006. The remaining 13 SEER Areas contribute cases for the entire period 1999-2006. Based on follow-up of patients into 2007.

stage <- read.csv('stage.csv')
head(stage)
  Stage.at.Diagnosis           Section     Gender Percent
1          Localized Lung and Bronchus Both Sexes      37
2           Regional Lung and Bronchus Both Sexes      26
3            Distant Lung and Bronchus Both Sexes      19
4           Unstaged Lung and Bronchus Both Sexes      18
5          Localized Lung and Bronchus      Males      37
6           Regional Lung and Bronchus      Males      27
qplot(Stage.at.Diagnosis, weight=Percent, geom="bar", data=stage, facets = Gender ~ Section, fill=Stage.at.Diagnosis, ylab="Percent",
      main="Stage Distribution (Percent) 1999-2006, Case Counts and Percentages")

“SEER’s Glossary of Statistical Terms” defines each stage as follows:

  • In situ cancer is early cancer that is present only in the layer of cells in which it began.
  • Localized cancer is cancer that is limited to the organ in which it began, without evidence of spread.
  • Regional cancer is cancer that has spread beyond the original (primary) site to nearby lymph nodes or organs and tissues.
  • Distant cancer is cancer that has spread from the primary site to distant organs or distant lymph nodes.
  • Unstaged cancer is cancer for which there is not enough information to indicate a stage.

MEMO

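  • The factor levels come out in alphabetical order, so I'd like to customize the order.
  • “facets = Gender ~ Section” creates a grid of facets, one panel for each Gender x Section combination.
  • The bar heights don't need to be aggregated by ggplot2, so I pass the precomputed values through weight.
  • For Pancreas cancer, roughly half of the cases are detected at the Distant stage.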
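To show the stages in clinical order rather than alphabetical order, one option is to set the factor levels explicitly before plotting. The sketch below assumes the order Localized, Regional, Distant, Unstaged (my own choice, following the glossary above, with In situ left out because it does not appear in the data). It also swaps `weight=` for `geom_col()`, which takes precomputed bar heights directly from `y`, and uses `facet_grid(Gender ~ Section)` as the `ggplot()` counterpart of `facets = Gender ~ Section`. Only the six rows printed by `head(stage)` are used.

```r
library(ggplot2)

# The six rows shown by head(stage) above
stage_head <- data.frame(
  Stage.at.Diagnosis = c("Localized", "Regional", "Distant", "Unstaged",
                         "Localized", "Regional"),
  Section = "Lung and Bronchus",
  Gender  = c(rep("Both Sexes", 4), "Males", "Males"),
  Percent = c(37, 26, 19, 18, 37, 27)
)

# Explicit level order (assumed); without it the levels are sorted alphabetically
stage_order <- c("Localized", "Regional", "Distant", "Unstaged")
stage_head$Stage.at.Diagnosis <- factor(stage_head$Stage.at.Diagnosis,
                                        levels = stage_order)

p <- ggplot(stage_head, aes(x = Stage.at.Diagnosis, y = Percent,
                            fill = Stage.at.Diagnosis)) +
  # geom_col() draws the y values as bar heights, with no counting step
  geom_col() +
  # One panel per Gender x Section combination
  facet_grid(Gender ~ Section) +
  labs(y = "Percent",
       title = "Stage Distribution (Percent) 1999-2006, Case Counts and Percentages")

print(p)
```

The same `factor(..., levels = stage_order)` step could be applied to `stage$Stage.at.Diagnosis` before the original `qplot()` call, since both functions respect the factor's level order.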

5-Year Relative Survival (Percent) 1999-2006c by Stage at Diagnosis

※Based on the SEER 17 areas (San Francisco, Connecticut, Detroit, Hawaii, Iowa, New Mexico, Seattle, Utah, Atlanta, San Jose-Monterey, Los Angeles, Alaska Native Registry, Rural Georgia, California excluding SF/SJM/LA, Kentucky, Louisiana and New Jersey). California excluding SF/SJM/LA, Kentucky, Louisiana, and New Jersey contribute cases for diagnosis years 2000-2006. The remaining 13 SEER Areas contribute cases for the entire period 1999-2006. Based on follow-up of patients into 2007.

stage_survival <- read.csv('stage-survival.csv')
head(stage_survival)
  Stage.at.Diagnosis           Section     Gender Percent
1          Localized Lung and Bronchus Both Sexes      37
2           Regional Lung and Bronchus Both Sexes      26
3            Distant Lung and Bronchus Both Sexes      19
4           Unstaged Lung and Bronchus Both Sexes      18
5          Localized Lung and Bronchus      Males      37
6           Regional Lung and Bronchus      Males      27
qplot(Stage.at.Diagnosis, weight=Percent, geom="bar", data=stage_survival, facets = Gender ~ Section, fill=Stage.at.Diagnosis, ylab="Percent",
      main="5-Year Relative Survival (Percent) 1999-2006 by Stage at Diagnosis")

MEMO

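  • For Lung and Bronchus, the 5-year relative survival rate is much higher when the cancer is Localized.
  • As with the stage distribution chart, the bar heights don't need to be aggregated, so I pass the precomputed values through weight.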
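Why passing precomputed values through `weight` works: with `geom="bar"`, the count stat sums the weights of all rows that share an x value within a panel, so each bar's height is the sum of its `Percent` values. With exactly one row per stage, gender and site, that sum is just the value itself. The sketch below reuses two values from `head(stage)` and adds a hypothetical duplicated row to show that an accidental duplicate would double a bar.

```r
library(ggplot2)

# Values from head(stage) above, with the Localized row entered twice
# (a hypothetical mistake, to show what the weight aesthetic does)
dup <- data.frame(
  Stage.at.Diagnosis = c("Localized", "Localized", "Regional"),
  Percent = c(37, 37, 26)
)

# geom_bar() counts rows per x; with a weight aesthetic it sums the weights
p <- ggplot(dup, aes(x = Stage.at.Diagnosis, weight = Percent)) +
  geom_bar()

# One row per bar: x = 1 is Localized (37 + 37), x = 2 is Regional (26)
bars <- layer_data(p)
print(bars[, c("x", "count")])
```

So "no aggregation needed" really means "each bar has exactly one row"; a quick `anyDuplicated(stage_survival[, c("Stage.at.Diagnosis", "Section", "Gender")])` returning 0 is one way to check that before plotting.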
Posted in life, R
