Readers

Defines several methods for analyzing, plotting, and exporting wearable data, including a Pandas accessor for wearable dataframes.

Overview

The circadian.readers module contains several methods for working with wearable data such as step counts, heart rate, and sleep. It also defines a Pandas accessor called WereableData to standardize and validate wearable dataframes.
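For readers unfamiliar with Pandas accessors, the sketch below shows the general mechanism using pandas' public registration decorator. It is not the implementation of WereableData: the accessor name demo_wearable, the class DemoWearableAccessor, and the time_format method are all hypothetical and chosen only for illustration.

```python
import pandas as pd


# Hypothetical accessor name, chosen so it cannot clash with the library's own.
@pd.api.extensions.register_dataframe_accessor("demo_wearable")
class DemoWearableAccessor:
    def __init__(self, pandas_obj: pd.DataFrame):
        self._obj = pandas_obj

    def time_format(self) -> str:
        """Return 'interval' for start/end data and 'timestamp' for datetime data."""
        columns = set(self._obj.columns)
        if {"start", "end"} <= columns:
            return "interval"
        if "datetime" in columns:
            return "timestamp"
        raise ValueError("expected a 'datetime' column or 'start' and 'end' columns")


# Toy dataframes, not taken from the sample data files.
df_steps_demo = pd.DataFrame({"start": [0], "end": [60], "steps": [21.0]})
df_hr_demo = pd.DataFrame(
    {"datetime": pd.to_datetime(["1971-06-27 15:13:12"]), "heartrate": [79.0]}
)

print(df_steps_demo.demo_wearable.time_format())  # interval
print(df_hr_demo.demo_wearable.time_format())     # timestamp
```

Once registered, the accessor is available as an attribute on every dataframe, which is what lets validation and standardization logic travel with the data.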

Loading wearable data

The circadian.readers module provides functionality to import files in several formats, including raw CSV counts, JSON files, and data coming from Actiwatch readers in CSV format. For example, to load a CSV file with heart rate data we can do:

from circadian.readers import load_csv
file_path = 'circadian/sample_data/hr_data.csv'
df_hr = load_csv(file_path, timestamp_col='timestamp')
       heartrate     timestamp                       datetime
0           79.0  4.688359e+07  1971-06-27 15:13:12.693424232
1           80.0  4.688329e+07  1971-06-27 15:08:09.693448064
2           81.0  4.688306e+07  1971-06-27 15:04:20.692736632
3           80.0  4.688273e+07  1971-06-27 14:58:46.686474800
4           85.0  4.688257e+07  1971-06-27 14:56:08.187120912
...          ...           ...                            ...
99995       97.0  3.271680e+07  1971-01-14 15:59:56.779711960
99996       95.0  3.271679e+07  1971-01-14 15:59:49.779711960
99997       95.0  3.271679e+07  1971-01-14 15:59:48.779711960
99998       95.0  3.271678e+07  1971-01-14 15:59:43.779711960
99999       93.0  3.271677e+07  1971-01-14 15:59:34.779711960

100000 rows × 3 columns

By indicating which column contains the Unix timestamp information, load_csv automatically generates a new column with the datetime information. If no timestamp column is provided, it is assumed that a column named ‘datetime’ (or ‘start’ and ‘end’) is present in the file. For data specified via time intervals, such as step counts, no new column is generated and the user can choose how to process the data. For example, to load a CSV file with step counts we can do:

file_path = 'circadian/sample_data/steps_data.csv'
df_steps = load_csv(file_path)
                      start                  end      steps
0       1970-01-01 00:00:00  1970-01-01 00:01:00  21.000000
1       1970-01-01 00:49:00  1970-01-01 00:50:00   8.183578
2       1970-01-01 00:50:00  1970-01-01 00:51:00  19.816422
3       1970-01-01 01:51:00  1970-01-01 01:52:00   0.571419
4       1970-01-01 01:52:00  1970-01-01 01:53:00  26.499032
...                     ...                  ...        ...
222765  1971-06-27 14:24:00  1971-06-27 14:25:00  28.006870
222766  1971-06-27 14:25:00  1971-06-27 14:26:00  15.957981
222767  1971-06-27 14:26:00  1971-06-27 14:27:00  14.000000
222768  1971-06-27 14:37:00  1971-06-27 14:38:00  72.642453
222769  1971-06-27 14:38:00  1971-06-27 14:39:00  31.995192

222770 rows × 3 columns
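The timestamp-to-datetime conversion described for timestamped files can be approximated with plain pandas. The sketch below assumes the timestamps are Unix seconds, which is consistent with the heart rate output shown earlier; it illustrates the idea and is not the code load_csv runs. The toy values are rounded to whole seconds.

```python
import pandas as pd

# Toy data: two heart rate samples with Unix timestamps in seconds (assumed unit).
df = pd.DataFrame({
    "heartrate": [79.0, 80.0],
    "timestamp": [46883592.0, 46883289.0],
})

# Interpret the numeric column as seconds since 1970-01-01 00:00:00.
df["datetime"] = pd.to_datetime(df["timestamp"], unit="s")

print(df)
```

Interval data with ‘start’ and ‘end’ columns needs no such step, which matches the step count example where no new column appears.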

Additionally, we can import data in JSON format. For example, to load a JSON file with multiple streams of wearable data we can do:

from circadian.readers import load_json
file_path = 'circadian/sample_data/sample_data.json'
df_dict = load_json(file_path)
print(df_dict.keys())
dict_keys(['wake', 'steps', 'heartrate'])

where df_dict is a dictionary with the dataframes for each stream. The keys of the dictionary are the names of the streams. For example, to access the dataframe with the wake data we can do:

df_wake = df_dict['wake']
                           start                         end  wake
0     1970-02-03 04:49:01.000000  1970-02-03 09:01:00.000000     0
1     1970-02-03 09:02:00.000000  1970-02-03 11:25:00.000000     0
2     1970-02-04 04:51:01.000000  1970-02-04 12:35:00.000000     0
3     1970-02-04 12:36:00.000000  1970-02-04 12:37:00.000000     0
4     1970-02-04 12:38:00.000000  1970-02-04 12:39:00.000000     0
...                          ...                         ...   ...
2750  1971-06-27 07:38:31.105829  1971-06-27 08:01:01.105829     0
2751  1971-06-27 08:03:01.105829  1971-06-27 08:55:31.105829     0
2752  1971-06-27 09:05:31.105829  1971-06-27 09:07:01.105829     0
2753  1971-06-27 09:08:01.105829  1971-06-27 12:06:01.105829     0
2754  1971-06-27 12:08:01.105829  1971-06-27 12:15:31.105829     0

2755 rows × 3 columns
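To make the "one dataframe per stream" structure concrete, here is a minimal sketch that builds such a dictionary by hand. The on-disk layout of sample_data.json is not shown in this page, so the layout below (each top-level key mapping to a list of records) is an assumption, and the values are toy data rather than the contents of the sample file.

```python
import json

import pandas as pd

# Assumed layout: stream name -> list of records. Toy values only.
raw = """
{
  "steps": [
    {"start": 0, "end": 60, "steps": 21.0},
    {"start": 2940, "end": 3000, "steps": 8.0}
  ],
  "heartrate": [
    {"timestamp": 30, "heartrate": 79.0}
  ]
}
"""

data = json.loads(raw)

# One dataframe per stream, keyed by the stream name.
df_dict_demo = {name: pd.DataFrame(records) for name, records in data.items()}

print(list(df_dict_demo.keys()))
print(df_dict_demo["steps"])
```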

The circadian.readers module only accepts specific column names for wearable data. The accepted column names are stored in VALID_WEREABLE_STREAMS:

['steps', 'heartrate', 'wake', 'light_estimate', 'activity']
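A quick way to check a dataframe against this list before handing it to the module might look like the sketch below. The helper unexpected_columns and the TIME_COLUMNS list are hypothetical; the stream names are copied from the list above rather than imported, so the snippet runs without the library installed.

```python
import pandas as pd

# Copied from the list above; in the library this lives in VALID_WEREABLE_STREAMS.
VALID_STREAMS = ['steps', 'heartrate', 'wake', 'light_estimate', 'activity']
# Assumed time-related columns, based on the loaders described on this page.
TIME_COLUMNS = ['datetime', 'start', 'end', 'timestamp']


def unexpected_columns(df: pd.DataFrame) -> list:
    """Return the columns that are neither a known stream nor a time column."""
    allowed = set(VALID_STREAMS) | set(TIME_COLUMNS)
    return [col for col in df.columns if col not in allowed]


# Toy dataframe with one non-standard column name ('hr').
df_check = pd.DataFrame({
    'datetime': pd.to_datetime(['2019-02-20 12:32:00']),
    'activity': [91.0],
    'hr': [79.0],
})

print(unexpected_columns(df_check))  # ['hr']
```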

Finally, we can import data from Actiwatch readers. For example, to load a CSV file with data from an Actiwatch reader we can do:

from circadian.readers import load_actiwatch
file_path = 'circadian/sample_data/sample_actiwatch.csv'
df_actiwatch = load_actiwatch(file_path)
       activity  light_estimate  wake             datetime
0          91.0          318.16   1.0  2019-02-20 12:32:00
1         125.0          285.38   1.0  2019-02-20 12:32:30
2         154.0          312.05   1.0  2019-02-20 12:33:00
3         424.0          294.61   1.0  2019-02-20 12:33:30
4         385.0          285.06   1.0  2019-02-20 12:34:00
...         ...             ...   ...                  ...
55646       0.0            5.02   0.0  2019-03-11 08:15:00
55647      56.0            4.56   1.0  2019-03-11 08:15:30
55648      30.0            2.85   1.0  2019-03-11 08:16:00
55649       9.0            2.39   0.0  2019-03-11 08:16:30
55650       2.0            2.20   NaN  2019-03-11 08:17:00

55651 rows × 4 columns

Note that load_actiwatch automatically generates a new column with the datetime information and standardizes column names.
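The kind of standardization described here can be sketched with plain pandas. The raw column names used below ("Date", "Time", "Activity", "White Light", "Sleep/Wake") are assumptions made for illustration; the actual Actiwatch export headers and the mapping that load_actiwatch applies are not shown in this page.

```python
import pandas as pd

# Toy export with assumed raw column names (hypothetical, for illustration only).
raw_actiwatch = pd.DataFrame({
    "Date": ["2019-02-20", "2019-02-20"],
    "Time": ["12:32:00", "12:32:30"],
    "Activity": [91.0, 125.0],
    "White Light": [318.16, 285.38],
    "Sleep/Wake": [1.0, 1.0],
})

# Hypothetical mapping from raw headers to the standard stream names.
COLUMN_MAP = {
    "Activity": "activity",
    "White Light": "light_estimate",
    "Sleep/Wake": "wake",
}

df_std = raw_actiwatch.rename(columns=COLUMN_MAP)
# Build a single datetime column from the separate date and time strings.
df_std["datetime"] = pd.to_datetime(df_std["Date"] + " " + df_std["Time"])
df_std = df_std[["activity", "light_estimate", "wake", "datetime"]]

print(df_std)
```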

Resampling wearable data

The circadian.readers module provides functionality to resample data specified either via time intervals or via timestamps. For example, to resample a dataframe with step counts we can do:

from circadian.readers import resample_df
name = 'steps'
resample_freq = '1D'
agg_method = 'sum'
resampled_steps = resample_df(df_steps, name, resample_freq, agg_method)
       datetime         steps
0    1970-01-01    847.000000
1    1970-01-02   1097.000000
2    1970-01-03   1064.000000
3    1970-01-04   2076.000000
4    1970-01-05   2007.000000
...         ...           ...
538  1971-06-23   9372.098478
539  1971-06-24  10142.402971
540  1971-06-25  15012.305396
541  1971-06-26   5747.457876
542  1971-06-27   3823.642766

543 rows × 2 columns

where resample_freq is a string giving the resampling frequency in Pandas offset alias notation, name specifies the column to resample, and agg_method indicates how to aggregate the data.
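For intuition about what the three parameters control, the following sketch performs a comparable daily aggregation with pandas alone. It makes a simplifying assumption that each interval is assigned entirely to the bin containing its start time; resample_df may treat intervals that straddle a bin boundary differently. The data are toy values.

```python
import pandas as pd

# Toy interval data in the same shape as the step count dataframe.
df_steps_demo = pd.DataFrame({
    "start": pd.to_datetime([
        "1970-01-01 00:00:00", "1970-01-01 00:49:00", "1970-01-02 08:00:00",
    ]),
    "end": pd.to_datetime([
        "1970-01-01 00:01:00", "1970-01-01 00:50:00", "1970-01-02 08:01:00",
    ]),
    "steps": [21.0, 8.0, 30.0],
})

name = "steps"          # column to resample
resample_freq = "1D"    # Pandas offset alias: one calendar day
agg_method = "sum"      # how values inside each bin are combined

daily = (
    df_steps_demo.set_index("start")[name]  # assumption: bin by interval start
    .resample(resample_freq)
    .agg(agg_method)
    .rename_axis("datetime")
    .reset_index()
)

print(daily)
```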

Combining wearable data

We can combine wearable data from different streams into a single dataframe. To achieve this we can use the combine_wereable_dataframes function, which resamples and aggregates data to produce a dataframe with a single datetime index and columns for each stream. For example, to combine all the loaded dataframes from the previous section we would do:

from circadian.readers import combine_wereable_dataframes
df_dict = {
    'heartrate': df_hr,
    'steps': df_steps,
    'wake': df_wake
}
resample_freq = '1D'
combined_data = combine_wereable_dataframes(df_dict, resample_freq)
       datetime  heartrate         steps  wake
0    1970-01-04   0.000000  16188.000000   0.0
1    1970-01-11   0.000000  19199.000000   0.0
2    1970-01-18   0.000000  17888.000000   0.0
3    1970-01-25   0.000000  31880.133432   0.0
4    1970-02-01   0.000000  55150.172358   0.0
...         ...        ...           ...   ...
73   1971-05-30  79.914844  63341.399888   0.0
74   1971-06-06  97.080529  96297.437512   0.0
75   1971-06-13  93.772603  58357.605829   0.0
76   1971-06-20  99.018829  75479.093737   0.0
77   1971-06-27  97.370401   3823.642766   0.0

78 rows × 4 columns

For resampling, each wearable stream has a default aggregation method. The default methods are defined in the variable WEREABLE_RESAMPLE_METHOD:

{'steps': 'sum', 'wake': 'max', 'heartrate': 'mean', 'light_estimate': 'mean', 'activity': 'mean'}
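The role of these per-stream defaults can be illustrated with a small pandas sketch: each stream is resampled with its own method and the results are joined on a shared datetime index. The helper combine_streams is hypothetical and assumes every input already has a ‘datetime’ column; it is a simplified illustration, not the logic of combine_wereable_dataframes.

```python
import pandas as pd

# Subset of the defaults listed above, copied here so the sketch is standalone.
RESAMPLE_METHOD = {"steps": "sum", "wake": "max", "heartrate": "mean"}


def combine_streams(streams: dict, freq: str) -> pd.DataFrame:
    """Resample each stream with its default method and join on datetime."""
    resampled = {
        name: df.set_index("datetime")[name].resample(freq).agg(RESAMPLE_METHOD[name])
        for name, df in streams.items()
    }
    return pd.concat(resampled, axis=1).rename_axis("datetime").reset_index()


# Toy streams, each with a 'datetime' column (assumed input shape).
steps_demo = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["1970-01-01 08:00", "1970-01-01 20:00", "1970-01-02 09:00"]
    ),
    "steps": [100.0, 50.0, 70.0],
})
hr_demo = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["1970-01-01 10:00", "1970-01-01 11:00", "1970-01-02 12:00"]
    ),
    "heartrate": [60.0, 80.0, 70.0],
})

combined_demo = combine_streams({"steps": steps_demo, "heartrate": hr_demo}, "1D")
print(combined_demo)
```

Summing steps while averaging heart rate reflects the difference between a count accumulated over a bin and a rate sampled within it.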