Biostat 203B Homework 4

Due Mar 8 @ 11:59PM

Author

Jiachen Ai and UID: 206182615

Display machine information:

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.3.1    fastmap_1.1.1     cli_3.6.1        
 [5] tools_4.3.1       htmltools_0.5.7   rstudioapi_0.15.0 yaml_2.3.7       
 [9] rmarkdown_2.25    knitr_1.44        jsonlite_1.8.7    xfun_0.40        
[13] digest_0.6.33     rlang_1.1.1       evaluate_0.22    

Display my machine memory.

memuse::Sys.meminfo()
Totalram:    8.000 GiB 
Freeram:   155.156 MiB 

Load database libraries and the tidyverse frontend:

library(bigrquery)
library(dbplyr)
library(DBI)
library(gt)
library(gtsummary)
library(tidyverse)
library(dplyr)
library(stringr)
library(shiny)

Q1. Compile the ICU cohort in HW3 from the Google BigQuery database

Below is an outline of steps. In this homework, we exclusively work with the BigQuery database and should not use any MIMIC data files stored on our local computer. Transform data as much as possible in BigQuery database and collect() the tibble only at the end of Q1.7.

Q1.1 Connect to BigQuery

Authenticate with BigQuery using the service account token. Please place the service account token (shared via BruinLearn) in the working directory (same folder as your qmd file). Do not add this token to your git repository.

Answer

# path to the service account token 
satoken <- "biostat-203b-2024-winter-313290ce47a6.json"

# BigQuery authentication using service account
bq_auth(path = satoken)

Connect to BigQuery database mimic4_v2_2 in GCP (Google Cloud Platform), using the project billing account biostat-203b-2024-winter.

# connect to the BigQuery database `biostat-203b-2024-winter.mimic4_v2_2`
con_bq <- dbConnect(
    bigrquery::bigquery(),
    project = "biostat-203b-2024-winter",
    dataset = "mimic4_v2_2",
    billing = "biostat-203b-2024-winter"
)
con_bq
<BigQueryConnection>
  Dataset: biostat-203b-2024-winter.mimic4_v2_2
  Billing: biostat-203b-2024-winter

List all tables in the mimic4_v2_2 database.

dbListTables(con_bq)
 [1] "admissions"         "caregiver"          "chartevents"       
 [4] "d_hcpcs"            "d_icd_diagnoses"    "d_icd_procedures"  
 [7] "d_items"            "d_labitems"         "datetimeevents"    
[10] "diagnoses_icd"      "drgcodes"           "emar"              
[13] "emar_detail"        "hcpcsevents"        "icustays"          
[16] "ingredientevents"   "inputevents"        "labevents"         
[19] "microbiologyevents" "omr"                "outputevents"      
[22] "patients"           "pharmacy"           "poe"               
[25] "poe_detail"         "prescriptions"      "procedureevents"   
[28] "procedures_icd"     "provider"           "services"          
[31] "transfers"         

Q1.2 icustays data

Connect to the icustays table.

Answer

# full ICU stays table
icustays_tble <- tbl(con_bq, "icustays") |>
  print(width = Inf)
# Source:   table<icustays> [?? x 8]
# Database: BigQueryConnection
   subject_id  hadm_id  stay_id first_careunit last_careunit 
        <int>    <int>    <int> <chr>          <chr>         
 1   10004113 29879900 35200789 Neuro Stepdown Neuro Stepdown
 2   10018328 23786647 31269608 Neuro Stepdown Neuro Stepdown
 3   10019607 24546857 34178938 Neuro Stepdown Neuro Stepdown
 4   10023994 21824032 30439823 Neuro Stepdown Neuro Stepdown
 5   10028930 26238833 33941522 Neuro Stepdown Neuro Stepdown
 6   10067921 27475639 31097260 Neuro Stepdown Neuro Stepdown
 7   10088776 26138210 39177309 Neuro Stepdown Neuro Stepdown
 8   10088776 27242516 35695123 Neuro Stepdown Neuro Stepdown
 9   10103795 20269407 30619907 Neuro Stepdown Neuro Stepdown
10   10103795 22741814 31411464 Neuro Stepdown Neuro Stepdown
   intime              outtime                los
   <dttm>              <dttm>               <dbl>
 1 2173-03-20 20:16:36 2173-03-21 21:43:59  1.06 
 2 2154-04-24 23:03:44 2154-05-02 15:55:21  7.70 
 3 2167-03-19 17:37:39 2167-03-21 13:12:28  1.82 
 4 2146-07-19 15:32:22 2146-07-20 14:05:53  0.940
 5 2172-08-28 13:11:37 2172-08-29 14:19:35  1.05 
 6 2128-07-15 00:55:00 2128-07-16 11:41:23  1.45 
 7 2156-01-01 17:29:32 2156-01-02 13:35:41  0.838
 8 2155-05-01 15:30:41 2155-05-02 14:59:41  0.978
 9 2176-09-12 00:26:56 2176-09-17 14:52:28  5.60 
10 2176-07-01 19:45:36 2176-07-11 22:07:48 10.1  
# ℹ more rows

Q1.3 admissions data

Connect to the admissions table.

Answer

# full admissions table
admissions_tble <- tbl(con_bq, "admissions") |>
  print(width = Inf)
# Source:   table<admissions> [?? x 16]
# Database: BigQueryConnection
   subject_id  hadm_id admittime           dischtime           deathtime
        <int>    <int> <dttm>              <dttm>              <dttm>   
 1   10106244 26713233 2147-05-09 10:34:00 2147-05-12 13:43:00 NA       
 2   13700703 20448599 2172-09-25 01:01:00 2172-10-03 13:25:00 NA       
 3   15443666 27961368 2168-12-30 23:30:00 2169-01-05 16:02:00 NA       
 4   16299919 26977065 2193-05-15 08:37:00 2193-05-17 16:03:00 NA       
 5   19596808 26675460 2128-06-04 16:13:00 2128-06-05 11:36:00 NA       
 6   15868868 27142491 2191-08-07 13:06:00 2191-08-11 19:55:00 NA       
 7   11532813 24829571 2160-12-18 06:59:00 2160-12-25 14:40:00 NA       
 8   14527596 28165014 2165-11-28 19:02:00 2165-12-07 17:23:00 NA       
 9   15902500 27166666 2124-10-26 19:37:00 2124-10-28 16:30:00 NA       
10   16625434 25780525 2135-09-05 17:38:00 2135-10-14 15:08:00 NA       
   admission_type    admit_provider_id admission_location    
   <chr>             <chr>             <chr>                 
 1 DIRECT EMER.      <NA>              PHYSICIAN REFERRAL    
 2 OBSERVATION ADMIT <NA>              EMERGENCY ROOM        
 3 OBSERVATION ADMIT <NA>              EMERGENCY ROOM        
 4 OBSERVATION ADMIT <NA>              EMERGENCY ROOM        
 5 EW EMER.          P0038O            EMERGENCY ROOM        
 6 URGENT            P004U7            WALK-IN/SELF REFERRAL 
 7 OBSERVATION ADMIT P004U7            EMERGENCY ROOM        
 8 URGENT            P005HP            TRANSFER FROM HOSPITAL
 9 URGENT            P005HP            TRANSFER FROM HOSPITAL
10 URGENT            P005HP            TRANSFER FROM HOSPITAL
   discharge_location       insurance language marital_status
   <chr>                    <chr>     <chr>    <chr>         
 1 HOME                     Other     ENGLISH  SINGLE        
 2 HOME                     Other     ENGLISH  MARRIED       
 3 HOME HEALTH CARE         Other     ENGLISH  SINGLE        
 4 HOSPICE                  Medicare  ENGLISH  WIDOWED       
 5 HOME                     Medicare  ENGLISH  MARRIED       
 6 HOME HEALTH CARE         Medicare  ENGLISH  SINGLE        
 7 ACUTE HOSPITAL           Medicare  ENGLISH  SINGLE        
 8 HOME                     Medicaid  ?        MARRIED       
 9 HOME                     Medicare  ENGLISH  MARRIED       
10 SKILLED NURSING FACILITY Medicare  ?        SINGLE        
   race                   edregtime           edouttime          
   <chr>                  <dttm>              <dttm>             
 1 WHITE                  NA                  NA                 
 2 WHITE                  2172-09-24 17:38:00 2172-09-25 03:07:00
 3 BLACK/AFRICAN AMERICAN 2168-12-30 11:19:00 2168-12-31 01:22:00
 4 BLACK/AFRICAN AMERICAN 2193-05-15 04:36:00 2193-05-15 14:27:00
 5 WHITE                  2128-06-04 13:25:00 2128-06-04 16:59:00
 6 WHITE                  2191-08-06 18:50:00 2191-08-07 16:30:00
 7 WHITE                  2160-12-17 16:20:00 2160-12-18 08:43:00
 8 BLACK/CARIBBEAN ISLAND NA                  NA                 
 9 BLACK/CAPE VERDEAN     NA                  NA                 
10 WHITE                  NA                  NA                 
   hospital_expire_flag
                  <int>
 1                    0
 2                    0
 3                    0
 4                    0
 5                    0
 6                    0
 7                    0
 8                    0
 9                    0
10                    0
# ℹ more rows

Q1.4 patients data

Connect to the patients table.

Answer

# full patients table
patients_tble <- tbl(con_bq, "patients") |>
  print(width = Inf)
# Source:   table<patients> [?? x 6]
# Database: BigQueryConnection
   subject_id gender anchor_age anchor_year anchor_year_group dod   
        <int> <chr>       <int>       <int> <chr>             <date>
 1   10078138 F              18        2110 2017 - 2019       NA    
 2   10180372 M              18        2110 2008 - 2010       NA    
 3   10686175 M              18        2110 2011 - 2013       NA    
 4   10851602 F              18        2110 2014 - 2016       NA    
 5   10902424 F              18        2110 2017 - 2019       NA    
 6   11092326 M              18        2110 2008 - 2010       NA    
 7   11289691 F              18        2110 2017 - 2019       NA    
 8   11595073 M              18        2110 2011 - 2013       NA    
 9   11739764 F              18        2110 2017 - 2019       NA    
10   11776346 F              18        2110 2008 - 2010       NA    
# ℹ more rows

Q1.5 labevents data

Connect to the labevents table and retrieve a subset that only contain subjects who appear in icustays_tble and the lab items listed in HW3. Only keep the last lab measurements before the ICU stay and pivot lab items to become variables/columns. Write all steps in one chain of pipes.

Answer

From HW3, the lab items listed are: creatinine (50912), potassium (50971), sodium (50983), chloride (50902), bicarbonate (50882), hematocrit (51221), white blood cell count (51301), and glucose (50931)

# read labevents table
labevents_tble <- tbl(con_bq, "labevents") |>
  
  # filter subjects contained in icustays_tble
  semi_join(icustays_tble, by = "subject_id") |>
  
  # filter lab items interested
  filter(itemid %in% c(50912, 50971, 
                       50983, 50902, 
                       50882, 51221, 
                       51301, 50931)) |>
  
  # select columns interested for the next steps
  select(subject_id, itemid, valuenum, storetime) |>
  
  # left join with icustays_tble
  # I didn't use filter() here because SQL data reading is lazy
  left_join(icustays_tble, by = c("subject_id")) |>
  
  # then, filter the last lab measurements before the ICU stay
  filter(storetime < intime) |>

  # Since I want to find the last lab measurements for 
  # each patient, each time they enter hospital, 
  # each ICU stay, and each lab item, I need to group them
  group_by(subject_id, stay_id, itemid) |> 
  
  # sort by storetime and keep the last measurement
  slice_max(order_by = storetime, n = 1) |>
  
  # ungrouping and selecting columns for pivot_wider
  ungroup() |>
  select(subject_id, stay_id, itemid, valuenum) |>
  
  
  # pivot lab items to become variables/columns
  # set names_sort = TRUE to sort the itemid
  pivot_wider(names_from = itemid, 
              values_from = valuenum,
              names_sort = TRUE) |>
  
  # arrange by subject_id and stay_id to display in order
  arrange(subject_id, stay_id) |>
  
  # rename columns with meaningful names
  rename(creatinine = `50912`, 
         potassium = `50971`, 
         sodium = `50983`,
         chloride = `50902`,
         bicarbonate = `50882`,
         hematocrit = `51221`,
         white_blood_cell_count = `51301`,
         glucose = `50931`) |>
  
  
  # print the result
  print(width = Inf)
# Source:     SQL [?? x 10]
# Database:   BigQueryConnection
# Ordered by: subject_id, stay_id
   subject_id  stay_id bicarbonate chloride creatinine glucose potassium sodium
        <int>    <int>       <dbl>    <dbl>      <dbl>   <dbl>     <dbl>  <dbl>
 1   10000032 39553978          25       95        0.7     102       6.7    126
 2   10000980 39765666          21      109        2.3      89       3.9    144
 3   10001217 34592300          30      104        0.5      87       4.1    142
 4   10001217 37067082          22      108        0.6     112       4.2    142
 5   10001725 31205490          NA       98       NA        NA       4.1    139
 6   10001884 37510196          30       88        1.1     141       4.5    130
 7   10002013 39060235          24      102        0.9     288       3.5    137
 8   10002155 31090461          23       98        2.8     117       4.9    135
 9   10002155 32358465          26       85        1.4     133       5.7    120
10   10002155 33685454          24      105        1.1     138       4.6    139
   hematocrit white_blood_cell_count
        <dbl>                  <dbl>
 1       41.1                    6.9
 2       27.3                    5.3
 3       37.4                    5.4
 4       38.1                   15.7
 5       NA                     NA  
 6       39.7                   12.2
 7       34.9                    7.2
 8       25.5                   17.9
 9       22.4                    9.8
10       39.7                    7.9
# ℹ more rows

Q1.6 chartevents data

Connect to chartevents table and retrieve a subset that only contain subjects who appear in icustays_tble and the chart events listed in HW3. Only keep the first chart events during ICU stay and pivot chart events to become variables/columns. Write all steps in one chain of pipes.

Answer

The vitals for ICU patients from HW3 are: heart rate (220045), systolic non-invasive blood pressure (220179), diastolic non-invasive blood pressure (220180), body temperature in Fahrenheit (223761), and respiratory rate (220210)

# read chartevents table
chartevents_tble <- tbl(con_bq, "chartevents") |>
  
  # filter subjects contained in icustays_tble
  semi_join(icustays_tble, by = "subject_id") |>
  
  # filter chart events interested
  filter(itemid %in% c(220045, 220179, 
                       220180, 223761,
                       220210)) |>
  
  # left join with icustays_tble
  # didn't use filter() here because SQL data reading is lazy
  left_join(icustays_tble, by = c("hadm_id", 
                                  "subject_id", 
                                  "stay_id")) |>
  
  # then, filter the first chart events during ICU stay
  filter(charttime > intime, charttime < outtime) |>
  
  # Since I want find the first chart events for
  # each patient, each time they enter hospital,
  # each ICU stay, and each chart item, I need to group them
  group_by(subject_id, stay_id, itemid) |>
  
  # sort by charttime and keep the first measurement
  slice_min(order_by = charttime, n = 1) |>
  
  # ungrouping and selecting columns for pivot_wider
  ungroup() |>
  select(subject_id, stay_id, itemid, valuenum) |>
  
  # pivot chart events to become variables/columns
  # set names_sort = TRUE to sort the itemid
  pivot_wider(names_from = itemid, 
              values_from = valuenum,
              names_sort = TRUE) |>
  
  # arrange by subject_id and stay_id to display in order
  arrange(subject_id, stay_id) |>
  
  # rename columns with meaningful names
  rename(heart_rate = `220045`,
         systolic_non_invasive_blood_pressure = `220179`, 
         diastolic_non_invasive_blood_pressure = `220180`, 
         temperature_in_Fahrenheit = `223761`, 
         respiratory_rate = `220210`)  |>
  
  
  # print the result
  print(width = Inf)
# Source:     SQL [?? x 7]
# Database:   BigQueryConnection
# Ordered by: subject_id, stay_id
   subject_id  stay_id heart_rate systolic_non_invasive_blood_pressure
        <int>    <int>      <dbl>                                <dbl>
 1   10000032 39553978         91                                   84
 2   10000980 39765666         77                                  150
 3   10001217 34592300         96                                  167
 4   10001217 37067082         86                                  151
 5   10001725 31205490         55                                   73
 6   10001884 37510196         38                                  180
 7   10002013 39060235         80                                  104
 8   10002155 31090461         94                                  118
 9   10002155 32358465         98                                  109
10   10002155 33685454         68                                  126
   diastolic_non_invasive_blood_pressure respiratory_rate
                                   <dbl>            <dbl>
 1                                    48               24
 2                                    77               23
 3                                    95               11
 4                                    90               18
 5                                    56               19
 6                                    12               10
 7                                    70               14
 8                                    51               18
 9                                    65               23
10                                    61               18
   temperature_in_Fahrenheit
                       <dbl>
 1                      98.7
 2                      98  
 3                      97.6
 4                      98.5
 5                      97.7
 6                      98.1
 7                      97.2
 8                      96.9
 9                      97.7
10                      95.9
# ℹ more rows

Q1.7 Put things together

This step is similar to Q7 of HW3. Using one chain of pipes |> to perform following data wrangling steps: (i) start with the icustays_tble, (ii) merge in admissions and patients tables, (iii) keep adults only (age at ICU intime >= 18), (iv) merge in the labevents and chartevents tables, (v) collect the tibble.

Answer

# (i) start with the icustays_tble
mimic_icu_cohort <- icustays_tble |>
  
  
  # (ii) merge in admissions and patients tables
  
  # merge in admissions table by subject_id and hadm_id
  left_join(admissions_tble, by = c("subject_id", 
                                    "hadm_id")) |>
  
  # merge in patients table by subject_id
  left_join(patients_tble, by = "subject_id") |>
  
  
  # (iii) keep adults only (age at ICU intime >= 18)

  # calculate age at ICU intime
  mutate(age_intime =
           year(intime) - anchor_year + anchor_age) |>
  
  # convert age_intime to integer for comparison
  mutate(age_intime = as.integer(age_intime)) |>

  # keep age at ICU intime >= 18
  filter(age_intime >= 18) |>
  
  
  # (iv) merge in the labevents and chartevents tables
  
  # merge in the labevents by subject_id and stay_id
  left_join(labevents_tble, by = c("subject_id", "stay_id")) |>
  
  # merge in the chartevents by subject_id and stay_id
  left_join(chartevents_tble, by = c("subject_id", "stay_id")) |>
  
  
  # (v) collect the tibble
  
  # collect the tibble
  collect() |>

  # arrange by subject_id, hadm_id, and stay_id for display
  arrange(subject_id, hadm_id, stay_id) |>

  # print the result
  print(width = Inf)
Warning: ORDER BY is ignored in subqueries without LIMIT
ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
ORDER BY is ignored in subqueries without LIMIT
ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
# A tibble: 73,181 × 41
   subject_id  hadm_id  stay_id first_careunit                                  
        <int>    <int>    <int> <chr>                                           
 1   10000032 29079034 39553978 Medical Intensive Care Unit (MICU)              
 2   10000980 26913865 39765666 Medical Intensive Care Unit (MICU)              
 3   10001217 24597018 37067082 Surgical Intensive Care Unit (SICU)             
 4   10001217 27703517 34592300 Surgical Intensive Care Unit (SICU)             
 5   10001725 25563031 31205490 Medical/Surgical Intensive Care Unit (MICU/SICU)
 6   10001884 26184834 37510196 Medical Intensive Care Unit (MICU)              
 7   10002013 23581541 39060235 Cardiac Vascular Intensive Care Unit (CVICU)    
 8   10002155 20345487 32358465 Medical Intensive Care Unit (MICU)              
 9   10002155 23822395 33685454 Coronary Care Unit (CCU)                        
10   10002155 28994087 31090461 Medical/Surgical Intensive Care Unit (MICU/SICU)
   last_careunit                                    intime             
   <chr>                                            <dttm>             
 1 Medical Intensive Care Unit (MICU)               2180-07-23 14:00:00
 2 Medical Intensive Care Unit (MICU)               2189-06-27 08:42:00
 3 Surgical Intensive Care Unit (SICU)              2157-11-20 19:18:02
 4 Surgical Intensive Care Unit (SICU)              2157-12-19 15:42:24
 5 Medical/Surgical Intensive Care Unit (MICU/SICU) 2110-04-11 15:52:22
 6 Medical Intensive Care Unit (MICU)               2131-01-11 04:20:05
 7 Cardiac Vascular Intensive Care Unit (CVICU)     2160-05-18 10:00:53
 8 Medical Intensive Care Unit (MICU)               2131-03-09 21:33:00
 9 Coronary Care Unit (CCU)                         2129-08-04 12:45:00
10 Medical/Surgical Intensive Care Unit (MICU/SICU) 2130-09-24 00:50:00
   outtime               los admittime           dischtime          
   <dttm>              <dbl> <dttm>              <dttm>             
 1 2180-07-23 23:50:47 0.410 2180-07-23 12:35:00 2180-07-25 17:55:00
 2 2189-06-27 20:38:27 0.498 2189-06-27 07:38:00 2189-07-03 03:00:00
 3 2157-11-21 22:08:00 1.12  2157-11-18 22:56:00 2157-11-25 18:00:00
 4 2157-12-20 14:27:41 0.948 2157-12-18 16:58:00 2157-12-24 14:55:00
 5 2110-04-12 23:59:56 1.34  2110-04-11 15:08:00 2110-04-14 15:00:00
 6 2131-01-20 08:27:30 9.17  2131-01-07 20:39:00 2131-01-20 05:15:00
 7 2160-05-19 17:33:33 1.31  2160-05-18 07:45:00 2160-05-23 13:30:00
 8 2131-03-10 18:09:21 0.859 2131-03-09 20:33:00 2131-03-10 01:55:00
 9 2129-08-10 17:02:38 6.18  2129-08-04 12:44:00 2129-08-18 16:53:00
10 2130-09-27 22:13:41 3.89  2130-09-23 21:59:00 2130-09-29 18:55:00
   deathtime           admission_type              admit_provider_id
   <dttm>              <chr>                       <chr>            
 1 NA                  EW EMER.                    P30KEH           
 2 NA                  EW EMER.                    P30KEH           
 3 NA                  EW EMER.                    P4645A           
 4 NA                  DIRECT EMER.                P99698           
 5 NA                  EW EMER.                    P35SU0           
 6 2131-01-20 05:15:00 OBSERVATION ADMIT           P874LG           
 7 NA                  SURGICAL SAME DAY ADMISSION P47E1G           
 8 2131-03-10 21:53:00 EW EMER.                    P80515           
 9 NA                  EW EMER.                    P05HUO           
10 NA                  EW EMER.                    P3529J           
   admission_location discharge_location           insurance language
   <chr>              <chr>                        <chr>     <chr>   
 1 EMERGENCY ROOM     HOME                         Medicaid  ENGLISH 
 2 EMERGENCY ROOM     HOME HEALTH CARE             Medicare  ENGLISH 
 3 EMERGENCY ROOM     HOME HEALTH CARE             Other     ?       
 4 PHYSICIAN REFERRAL HOME HEALTH CARE             Other     ?       
 5 PACU               HOME                         Other     ENGLISH 
 6 EMERGENCY ROOM     DIED                         Medicare  ENGLISH 
 7 PHYSICIAN REFERRAL HOME HEALTH CARE             Medicare  ENGLISH 
 8 EMERGENCY ROOM     DIED                         Other     ENGLISH 
 9 PROCEDURE SITE     CHRONIC/LONG TERM ACUTE CARE Other     ENGLISH 
10 EMERGENCY ROOM     HOME HEALTH CARE             Other     ENGLISH 
   marital_status race                   edregtime           edouttime          
   <chr>          <chr>                  <dttm>              <dttm>             
 1 WIDOWED        WHITE                  2180-07-23 05:54:00 2180-07-23 14:00:00
 2 MARRIED        BLACK/AFRICAN AMERICAN 2189-06-27 06:25:00 2189-06-27 08:42:00
 3 MARRIED        WHITE                  2157-11-18 17:38:00 2157-11-19 01:24:00
 4 MARRIED        WHITE                  NA                  NA                 
 5 MARRIED        WHITE                  NA                  NA                 
 6 MARRIED        BLACK/AFRICAN AMERICAN 2131-01-07 13:36:00 2131-01-07 22:13:00
 7 SINGLE         OTHER                  NA                  NA                 
 8 MARRIED        WHITE                  2131-03-09 19:14:00 2131-03-09 21:33:00
 9 MARRIED        WHITE                  2129-08-04 11:00:00 2129-08-04 12:35:00
10 MARRIED        WHITE                  2130-09-23 19:59:00 2130-09-24 00:50:00
   hospital_expire_flag gender anchor_age anchor_year anchor_year_group
                  <int> <chr>       <int>       <int> <chr>            
 1                    0 F              52        2180 2014 - 2016      
 2                    0 F              73        2186 2008 - 2010      
 3                    0 F              55        2157 2011 - 2013      
 4                    0 F              55        2157 2011 - 2013      
 5                    0 F              46        2110 2011 - 2013      
 6                    1 F              68        2122 2008 - 2010      
 7                    0 F              53        2156 2008 - 2010      
 8                    1 F              80        2128 2008 - 2010      
 9                    0 F              80        2128 2008 - 2010      
10                    0 F              80        2128 2008 - 2010      
   dod        age_intime bicarbonate chloride creatinine glucose potassium
   <date>          <int>       <dbl>    <dbl>      <dbl>   <dbl>     <dbl>
 1 2180-09-09         52          25       95        0.7     102       6.7
 2 2193-08-26         76          21      109        2.3      89       3.9
 3 NA                 55          22      108        0.6     112       4.2
 4 NA                 55          30      104        0.5      87       4.1
 5 NA                 46          NA       98       NA        NA       4.1
 6 2131-01-20         77          30       88        1.1     141       4.5
 7 NA                 57          24      102        0.9     288       3.5
 8 2131-03-10         83          26       85        1.4     133       5.7
 9 2131-03-10         81          24      105        1.1     138       4.6
10 2131-03-10         82          23       98        2.8     117       4.9
   sodium hematocrit white_blood_cell_count heart_rate
    <dbl>      <dbl>                  <dbl>      <dbl>
 1    126       41.1                    6.9         91
 2    144       27.3                    5.3         77
 3    142       38.1                   15.7         86
 4    142       37.4                    5.4         96
 5    139       NA                     NA           55
 6    130       39.7                   12.2         38
 7    137       34.9                    7.2         80
 8    120       22.4                    9.8         98
 9    139       39.7                    7.9         68
10    135       25.5                   17.9         94
   systolic_non_invasive_blood_pressure diastolic_non_invasive_blood_pressure
                                  <dbl>                                 <dbl>
 1                                   84                                    48
 2                                  150                                    77
 3                                  151                                    90
 4                                  167                                    95
 5                                   73                                    56
 6                                  180                                    12
 7                                  104                                    70
 8                                  109                                    65
 9                                  126                                    61
10                                  118                                    51
   respiratory_rate temperature_in_Fahrenheit
              <dbl>                     <dbl>
 1               24                      98.7
 2               23                      98  
 3               18                      98.5
 4               11                      97.6
 5               19                      97.7
 6               10                      98.1
 7               14                      97.2
 8               23                      97.7
 9               18                      95.9
10               18                      96.9
# ℹ 73,171 more rows

Q1.8 Preprocessing

Perform the following preprocessing steps. (i) Lump infrequent levels into “Other” level for first_careunit, last_careunit, admission_type, admission_location, and discharge_location. (ii) Collapse the levels of race into ASIAN, BLACK, HISPANIC, WHITE, and Other. (iii) Create a new variable los_long that is TRUE when los is greater than or equal to 2 days. (iv) Summarize the data using tbl_summary(), stratified by los_long. Hint: fct_lump and fct_collapse from the forcats package can be useful.

Hint: Below is a numerical summary of my tibble after preprocessing:

<iframe width=95% height="500" 
src="./mimic_icu_cohort_gtsummary.html">
  </iframe>

Answer

mimic_icu_cohort_summary <- mimic_icu_cohort |>
  
  # preparation: select columns for tbl_summary
  select(first_careunit, last_careunit, los,
         admission_type, admission_location, 
         discharge_location, insurance,
         language, marital_status,
         race, hospital_expire_flag,
         gender, dod, sodium, chloride,
         creatinine, potassium, glucose,
         hematocrit, white_blood_cell_count,
         bicarbonate,
         temperature_in_Fahrenheit, 
         diastolic_non_invasive_blood_pressure,
         respiratory_rate,
         systolic_non_invasive_blood_pressure,
         heart_rate, age_intime) |>
  
  
  # (i) lump infrequent levels into "Other" level
  
  # according to the hint, first_careunit, last_careunit, 
  # admission_type, admission_location, and discharge_location 
  # should keep the top 4 frequent levels and lump the rest into "Other"
  mutate(across(c(first_careunit, last_careunit,
                  admission_type, discharge_location), 
                ~ fct_lump_n(., n = 4, other_level = "Other"))) |>
  
  # according to the hint, admission_location
  # should keep the top 3 frequent levels and lump the rest into "Other"
  mutate(across(c(admission_location),
                ~ fct_lump_n(., n = 3, other_level = "Other"))) |>
  
  
  # (ii) collapse the levels of race into 
  # ASIAN, BLACK, HISPANIC, WHITE, and Other
  
  # using regular expression to collapse the levels
  mutate(race = case_when(
    str_detect(race, "ASIAN") ~ "ASIAN",
    str_detect(race, "BLACK") ~ "BLACK",
    str_detect(race, "HISPANIC") ~ "HISPANIC",
    str_detect(race, "WHITE") ~ "WHITE",
    TRUE ~ "Other")) |>
  
  
  # (iii) create a new variable los_long that is TRUE when 
  # los is greater than or equal to 2 days
  mutate(los_long = los >= 2) |>
  
  # summarize the data using tbl_summary(), stratified by los_long
  tbl_summary(by = los_long) 
  
mimic_icu_cohort_summary
Characteristic FALSE, N = 38,0501 TRUE, N = 35,1311
first_careunit

    Cardiac Vascular Intensive Care Unit (CVICU) 5,827 (15%) 5,755 (16%)
    Medical Intensive Care Unit (MICU) 8,782 (23%) 7,116 (20%)
    Medical/Surgical Intensive Care Unit (MICU/SICU) 7,147 (19%) 5,586 (16%)
    Surgical Intensive Care Unit (SICU) 5,654 (15%) 5,507 (16%)
    Other 10,640 (28%) 11,167 (32%)
last_careunit

    Cardiac Vascular Intensive Care Unit (CVICU) 5,773 (15%) 5,614 (16%)
    Medical Intensive Care Unit (MICU) 8,929 (23%) 7,903 (22%)
    Medical/Surgical Intensive Care Unit (MICU/SICU) 7,051 (19%) 5,303 (15%)
    Surgical Intensive Care Unit (SICU) 5,665 (15%) 5,344 (15%)
    Other 10,632 (28%) 10,967 (31%)
los 1.1 (0.8, 1.5) 3.8 (2.7, 6.6)
admission_type

    EW EMER. 20,735 (54%) 17,937 (51%)
    OBSERVATION ADMIT 4,329 (11%) 4,645 (13%)
    SURGICAL SAME DAY ADMISSION 4,340 (11%) 3,033 (8.6%)
    URGENT 5,494 (14%) 6,959 (20%)
    Other 3,152 (8.3%) 2,557 (7.3%)
admission_location

    EMERGENCY ROOM 19,602 (52%) 16,279 (46%)
    PHYSICIAN REFERRAL 8,986 (24%) 7,412 (21%)
    TRANSFER FROM HOSPITAL 6,839 (18%) 8,959 (26%)
    Other 2,623 (6.9%) 2,481 (7.1%)
discharge_location

    DIED 3,298 (8.8%) 5,006 (14%)
    HOME 12,192 (33%) 5,255 (15%)
    HOME HEALTH CARE 10,347 (28%) 7,762 (22%)
    SKILLED NURSING FACILITY 6,239 (17%) 7,158 (20%)
    Other 5,402 (14%) 9,885 (28%)
    Unknown 572 65
insurance

    Medicaid 3,060 (8.0%) 2,468 (7.0%)
    Medicare 16,489 (43%) 16,602 (47%)
    Other 18,501 (49%) 16,061 (46%)
language

    ? 3,692 (9.7%) 3,751 (11%)
    ENGLISH 34,358 (90%) 31,380 (89%)
marital_status

    DIVORCED 2,843 (8.0%) 2,561 (7.9%)
    MARRIED 16,984 (48%) 15,784 (49%)
    SINGLE 11,175 (31%) 9,683 (30%)
    WIDOWED 4,699 (13%) 4,339 (13%)
    Unknown 2,349 2,764
race

    ASIAN 1,148 (3.0%) 1,007 (2.9%)
    BLACK 4,311 (11%) 3,649 (10%)
    HISPANIC 1,492 (3.9%) 1,249 (3.6%)
    Other 5,160 (14%) 5,596 (16%)
    WHITE 25,939 (68%) 23,630 (67%)
hospital_expire_flag 3,357 (8.8%) 4,972 (14%)
gender

    F 17,014 (45%) 15,349 (44%)
    M 21,036 (55%) 19,782 (56%)
dod 2110-02-08 to 2210-12-04 2110-01-25 to 2211-01-17
    Unknown 24,904 20,003
sodium 139.0 (136.0, 141.0) 138.0 (135.0, 141.0)
    Unknown 4,174 4,698
chloride 102 (98, 105) 102 (98, 105)
    Unknown 4,175 4,708
creatinine 1.00 (0.80, 1.40) 1.00 (0.80, 1.60)
    Unknown 2,599 3,171
potassium 4.20 (3.80, 4.60) 4.20 (3.80, 4.70)
    Unknown 4,189 4,712
glucose 118 (98, 153) 122 (100, 159)
    Unknown 4,282 4,817
hematocrit 36 (30, 40) 35 (29, 40)
    Unknown 2,236 2,781
white_blood_cell_count 9.0 (6.6, 12.6) 9.7 (7.0, 13.8)
    Unknown 2,272 2,822
bicarbonate 25.0 (22.0, 27.0) 24.0 (21.0, 27.0)
    Unknown 4,270 4,780
temperature_in_Fahrenheit 98.10 (97.60, 98.60) 98.20 (97.60, 98.80)
    Unknown 1,192 173
diastolic_non_invasive_blood_pressure 68 (57, 80) 66 (55, 79)
    Unknown 734 250
respiratory_rate 18.0 (15.0, 22.0) 19.0 (15.0, 23.0)
    Unknown 87 11
systolic_non_invasive_blood_pressure 122 (107, 139) 120 (104, 138)
    Unknown 733 247
heart_rate 85 (74, 99) 88 (76, 103)
    Unknown 19 1
age_intime 65 (53, 77) 67 (56, 78)
1 n (%); Median (IQR); Range

Q1.9 Save the final tibble

Save the final tibble to an R data file mimic_icu_cohort.rds in the mimiciv_shiny folder.

Answer

# make a directory mimiciv_shiny
if (!dir.exists("mimiciv_shiny")) {
  dir.create("mimiciv_shiny")
}
# save the final tibble
mimic_icu_cohort |>
  write_rds("mimiciv_shiny/mimic_icu_cohort.rds", compress = "gz")

Close database connection and clear workspace.

if (exists("con_bq")) {
  dbDisconnect(con_bq)
}
rm(list = ls())

Although it is not a good practice to add big data files to git, for grading purpose, please add mimic_icu_cohort.rds to your git repository.

Q2. Shiny app

Develop a Shiny app for exploring the ICU cohort data created in Q1. The app should reside in the mimiciv_shiny folder. The app should contain at least two tabs. One tab provides easy access to the graphical and numerical summaries of variables (demographics, lab measurements, vitals) in the ICU cohort. The other allows user to choose a specific patient in the cohort and display the patient’s ADT and ICU stay information as we did in Q1 of HW3.

Answer

I developed a Shiny app, called “app.R”, for exploring the ICU cohort data. The app is in the mimiciv_shiny folder.