An investigation into my crying patterns using data I collected on myself
I’m obsessive about collecting data on myself. Every day of 2022, I filled out a Google Form I made to track items such as whether I cried, exercised, drank coffee, or washed my hair. I also collected data from Apple Health and Google Location History to get a more complete picture of my patterns and behaviors throughout the year. In this article, I share insights into my personal experiences in 2022 through a combination of all of this data.
For me, 2022 was a year of big changes and new opportunities: I moved to New York, started a new job, and traveled to many cities. To reflect on the year that just happened in proper data scientist fashion, I combined all of this data and analyzed it to understand my patterns of crying: where I cried, when I cried, how often I cried, and a tiny bit of insight into why I cried. The hope is that these insights will prepare me for many more days of crying to come in the New Year. (Note: these analyses are purely for fun and are not meant to be very rigorous. No statistical claims are made.)
This article was originally posted on my blog. If you liked it, you can follow for more similar articles at https://weblog.yenniejun.com/.
An overview of the data sources
I combined the following data:
- Apple Health data, exported into a CSV (following instructions). Included walking speed, step asymmetry, and distance walked/ran.
- Garmin (exercise watch) data, exported into a CSV. Included heart rate, step count, and flights of stairs climbed.
- Period health data from Flo, exported into a CSV.
- Google Location History data, obtained through Google Takeout. Google location data is very granular in terms of geography (very specific latitude and longitude coordinates) as well as time (down to the minute). I rounded the lat/lng to the city level and chose the most common city I was in per day.
- Google Form survey data, exported into a CSV. Included what type of exercise I did, how much coffee I drank, and whether or not I cried.
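As a rough illustration of the location step (mapping each reading to a city, then keeping the most common city per day), here is a minimal sketch. The city table, the nearest-center matching, and the record layout are all assumptions made for the example; they are not the actual preprocessing or the Google Takeout format.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical lookup of city centers (approximate lat/lng); an assumption for this sketch.
CITY_CENTERS = {
    "New York": (40.7, -74.0),
    "Toronto": (43.7, -79.4),
    "Seattle": (47.6, -122.3),
}

def nearest_city(lat, lng):
    """Return the known city whose center is closest (crude squared-degree distance)."""
    return min(
        CITY_CENTERS,
        key=lambda c: (CITY_CENTERS[c][0] - lat) ** 2 + (CITY_CENTERS[c][1] - lng) ** 2,
    )

def city_per_day(records):
    """records: iterable of (ISO timestamp, lat, lng). Returns {date: most common city}."""
    counts = defaultdict(Counter)
    for ts, lat, lng in records:
        day = datetime.fromisoformat(ts).date()
        counts[day][nearest_city(lat, lng)] += 1
    return {day: c.most_common(1)[0][0] for day, c in counts.items()}

# Toy readings, not real location history.
records = [
    ("2022-01-03T08:15:00", 43.651, -79.383),
    ("2022-01-03T12:40:00", 43.662, -79.395),
    ("2022-01-03T21:05:00", 40.713, -74.006),
    ("2022-01-04T09:00:00", 40.730, -73.990),
]
daily_city = city_per_day(records)
```

On a travel day with readings in two cities, the city with more readings wins, which is one reasonable reading of "most common city per day".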
In 2022, I cried in many different locations
In 2022, I cried on a total of 48 days. I split my days among New York (where I live), Toronto (where my partner lives), Seattle (where my work is), visiting my parents, and traveling. Nearly 36% of my crying days were in Toronto. Even accounting for the fact that I spent 49 days in Toronto (compared to 239 in New York), I cried much more often in Toronto than anywhere else.
I also looked at how often I cried when I was in a certain location (days crying in a city divided by total days spent in that city). Compared to other locations, I spent nearly a quarter of my time in Toronto crying.
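The per-location rate is a different number from the share of all crying days that fell in a city: it divides by the days spent in that city, so a place visited briefly can still have a high rate. A minimal sketch of that calculation, using toy data rather than my actual counts (the function name and input layout are assumptions for the example):

```python
from collections import Counter

def crying_rate_by_city(days):
    """days: iterable of (city, cried) pairs, one per day.
    Returns {city: crying days / total days spent in that city}."""
    total = Counter()
    cried = Counter()
    for city, did_cry in days:
        total[city] += 1
        if did_cry:
            cried[city] += 1
    return {city: cried[city] / total[city] for city in total}

# Toy data, not the numbers from this article.
days = [
    ("Toronto", True), ("Toronto", False), ("Toronto", False), ("Toronto", True),
    ("New York", False), ("New York", False), ("New York", True),
    ("New York", False), ("New York", False),
]
rates = crying_rate_by_city(days)
```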
So why did I cry so much in Toronto? One reason could be that I spent January and February 2022 there. Not only was it excruciatingly cold, but Toronto was still under lockdown at the time. So that might explain why I spent so much time crying in Toronto. There was nothing else to do.
I waited for the weekends … to cry
Over 50% of my crying days were on the weekend. I cried a lot more on Saturdays and Sundays than on the weekdays. No fun weekend activity quite like crying.
I cried whether or not or not I exercised
Exercise is great for mental health. I guessed that I would cry less on the days I exercised. Above is a breakdown of the types of exercise I did in different locations. I did martial arts (Muay Thai, boxing, and a single Wing Chun class) mostly in New York and Toronto, while I did more cardio (running, biking, hiking) during my visits to Seattle or to see my parents. Days of dancing (salsa, hip hop), walking (a catch-all for days I walked over 10K steps but didn’t log a specific workout), and other exercise (including yoga, home workouts, and going to the regular gym) are sprinkled among the different locations.
I calculated the percentage of days I spent crying on days I did certain exercises, relative to the total days I did that exercise. At first sight, it looks like I cried much more on days I went dancing. But in reality, I only went dancing 8 days in the entire year, so the fact that I cried on 2 of those days is not a strong enough indicator that dancing causes crying (or vice versa, that crying causes dancing? Although that would be quite funny).
On the other hand, I cried the least on days I did martial arts. This makes sense, given that a big reason I go to the boxing gym is to get all of my rage and frustration out. Even if the correlation is spurious, it’s still a reason to keep doing what I’m doing.
I was surprised that I didn’t cry as much as I thought on days I didn’t exercise at all. I guess it’s good to know that on days I don’t exercise, I don’t just spend all my extra time crying at home.
I cried on different parts of my monthly cycle
I spent a large part of 2022 learning more about how different parts of women’s monthly cycles affect mood, hormonal health, and so much more. I was curious about which part of my cycle I cried more during. Anecdotally, it felt like I was always crying before my period started, so I hypothesized that I would see a lot more crying happening right before my period started.
I looked at the percentage of time I spent crying for each day of my period cycle. I colored the days corresponding to the period cycle: Menstrual phase (days 1–5), Follicular phase (days 5–14), Ovulation (days 14–15), Luteal phase (days 15–28).
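To make "day of cycle" concrete, here is a small sketch of how a calendar date could be turned into a cycle day and a phase label, given a list of period start dates. The function names and the sample dates are hypothetical. The phase ranges above overlap at days 5, 14, and 15; this sketch assigns each boundary day to the later phase, which is an assumption rather than a rule stated in the analysis.

```python
from bisect import bisect_right
from datetime import date

# (first day of phase, label). Boundary days go to the later phase: an assumption.
PHASES = [(1, "Menstrual"), (5, "Follicular"), (14, "Ovulation"), (15, "Luteal")]

def cycle_day(day, period_starts):
    """1-based day of cycle for `day`, given a sorted list of period start dates."""
    i = bisect_right(period_starts, day) - 1
    if i < 0:
        return None  # day falls before the first recorded period start
    return (day - period_starts[i]).days + 1

def phase(cday):
    """Label a cycle day with the last phase whose first day is <= cday."""
    name = None
    for first_day, label in PHASES:
        if cday >= first_day:
            name = label
    return name

# Hypothetical period start dates, 28 days apart.
period_starts = [date(2022, 1, 1), date(2022, 1, 29)]
example_day = cycle_day(date(2022, 1, 16), period_starts)   # 16th day of the first cycle
example_phase = phase(example_day)
first_day_again = cycle_day(date(2022, 1, 29), period_starts)
before_tracking = cycle_day(date(2021, 12, 31), period_starts)
```

Grouping crying days by `cycle_day` and dividing by the number of times each cycle day occurred gives the per-day percentage described above.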
A large part of my crying happens on Day 16 of my period cycle. This is usually right after ovulation happens, which makes sense given that a lot of hormonal fluctuation is happening then. I also cried a lot on Days 21 and 22, which is about a week before my period starts and usually when I feel the worst PMS. I cried the least on Days 13 and 15 (right before ovulation) and Days 5, 8, and 9 (the first few days after the menstrual phase ended).
So, the reality is not as clear-cut as “I cry a lot before my period.” I cry during all parts of my 28-day cycle, but not equally on each day. I cry more on days of greater hormonal fluctuations, such as right after ovulation and about a week before the menstrual phase starts again. But I now know to keep in mind that the time after ovulation is one especially prone to tears.
I couldn’t analyze my personal data without including at least a little bit of machine learning. In this second part of the article, I used unstructured data to further analyze my crying habits.
I journaled every single day in 2022. I used OpenAI’s text embeddings to map each day’s journal into a document-level embedding (essentially, a list of numbers that captures the essence of a snippet of text).
These embeddings are very high dimensional, so I used PCA to reduce the embeddings to two dimensions. I plotted the first two principal components and colored each document embedding based on whether or not I cried that day. (Note: the first two principal components only explained 7% of the total variance, which is not very high.) At first glance, there didn’t seem to be much of a clear difference between the embeddings for days I cried vs. days I didn’t cry. Perhaps this is because the reasons for crying vary from instance to instance, and it’s likely I didn’t write about crying in similar ways each time it happened.
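For readers who want to see what the reduction involves, here is a minimal PCA sketch written with NumPy's SVD. It is not the code behind the plot: the random matrix stands in for the journal embeddings, its 64 columns are an arbitrary choice, and `pca_2d` is a name invented for this example.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the first two principal components.
    Returns (coords, explained) where explained holds the variance ratio of each."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:2].T                       # shape (n_rows, 2)
    explained = (S ** 2) / np.sum(S ** 2)        # share of variance per component
    return coords, explained[:2]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(365, 64))          # stand-in data, one row per day
coords, ratio = pca_2d(embeddings)
```

Summing `ratio` gives the share of total variance captured by the two plotted axes, which is the quantity reported as 7% above. A low value means the 2D picture discards most of the structure in the embeddings.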
Predicting future crying days
Finally, I wanted to see if it was possible to predict which days I would be more likely to cry in the future.
For the machine learning folks: I split my dataset into train/test sets based on time (80% train, 20% test). In the test data, there were only 12 days of crying (out of 72 days). This is an example of class imbalance, in which there are many more days of not crying than crying. In terms of modeling, I kept things as simple as possible. I used an out-of-the-box Gradient Boosting Classifier from sklearn. I tried simpler models, such as logistic regression and random forest, but the results were so bad I didn’t include them. I didn’t do any hyperparameter tuning or additional feature engineering.
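A time-based split means the model is trained on earlier days and evaluated on later ones, rather than on a random shuffle. A minimal sketch, assuming each row carries an ISO date string (the row layout, feature names, and `time_based_split` are made up for the example):

```python
def time_based_split(rows, train_frac=0.8):
    """rows: list of (iso_date, features, label). Splits chronologically,
    so every training day comes before every test day."""
    ordered = sorted(rows, key=lambda r: r[0])   # ISO date strings sort chronologically
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

# Toy rows: ten days with a made-up feature and label.
rows = [(f"2022-01-{d:02d}", {"coffee": d % 3}, d % 5 == 0) for d in range(1, 11)]
train, test = time_based_split(rows)
```

Splitting this way avoids leaking information from the future into training, at the cost of a test set that reflects only the last part of the year.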
I built two classifiers. Each one predicted whether or not I cried on a given day:
- The first made predictions based on all of the structured features (e.g. Google location, Apple Health, survey data)
- The second made predictions based on the journal embeddings
For the machine learning folks, I show the confusion matrices depicting the results of each classifier. The first model (trained without embeddings) was more likely to predict a day as crying, even when it wasn’t. The second model (trained with embeddings) didn’t incorrectly predict any day as crying, but it also missed most of the actual days of crying. The two models didn’t differ greatly.
All of this to say: neither model was very good at detecting actual days of crying. It’s easy to get a high accuracy by just predicting “not crying” for every day, due to the data imbalance (there were many more days of not crying than crying). Still, it’s difficult (at least at this early stage of modeling, without doing anything fancy) to draw any simple conclusions about clear indicators for crying. My journal entries, in particular, didn’t give clear indications for crying. This further supports the idea that each crying session is varied in its cause, type, and essence. Predicting whether or not I’ll cry on a given day is pretty difficult!
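The imbalance point can be checked with the test-set counts mentioned earlier (12 crying days out of 72). The sketch below scores a baseline that always predicts "not crying". The order of the labels is invented, since only the counts matter here, and `confusion_counts` is a helper written for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for boolean labels, where True means 'cried'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fp, fn, tn

y_true = [True] * 12 + [False] * 60    # 12 crying days out of 72
y_pred = [False] * 72                  # baseline: never predict crying

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)     # 60 / 72, roughly 0.83
recall = tp / (tp + fn)                # 0 / 12 = 0.0
```

The baseline is right about 83% of the time while finding none of the crying days, which is why accuracy alone says little here and recall on the crying class is the more telling number.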
I love New Year’s; it’s my favorite holiday. I love resolving and resoluting into the new year and reflecting on the past year. There’s something special about using my personal data to reflect on my year, including my crying and exercise habits for 2022.
Not every insight was useful. According to these pie charts, I cried more often on days I washed my hair, on days I did art, and on days I drank coffee. As all three of these activities either bring me joy or are good for me, I’m not going to stop doing them.
If I had more time, I would have liked to include data from other parts of my life, such as Spotify (music listening habits), Toggl (which I use to track my working hours), and expense tracking (where does all my money go?). Additionally, I would have liked to use Apple screen time data (currently not possible to export) and sleep data (didn’t track). These are things I can aim to include in next year’s analysis!