Day 108
Quotes
Get things out of your head and into a system that you fully trust. Everything you do should have positive value – it’s either improving you (I put self care and genuine leisure time in here, but not time wasting), improving a relationship, making money, or making one of those other things more efficient. Do high energy and high focus things when you actually have energy and focus; do mindless things when you feel mindless. Do not skimp on self-care, which includes genuine leisure time, good healthy food, exercise, good personal relationships, and adequate sleep. Aim for the “flow state” in everything you do, because you’ll never be better than when you’re so engaged that you lose track of time and place and just get lost in the moment. (How I get things done)
I find that forcing myself to think about those things at the pace of my handwriting brings a ton of clarity to the ideas I’m struggling with or the life issues I’m trying to figure out. (same source)
it’s easy to sleep well when you get up early and work hard. (same source)
“No more yes. It’s either HELL YEAH! or no.” — Derek Sivers
Random
I need a system to consistently track things I’m trying to optimize in my life. Today I already read N articles about excellent things I can do with my life, and usually it would end there. Probably the first in line would be reinforcement and mental contrasting.
On a certain level I actually bump against the infinitely familiar problem of not knowing what I want.
The plan
- From now on, if I read something motivational in the morning, it should be one thing. Then focus on it, think about it, and only it.
DNB and Typing
460 cpm 98%
d4b 14% Thu 18 Apr 2019 12:54:55 PM CEST
d4b 0% Thu 18 Apr 2019 12:56:50 PM CEST
d4b 11% Thu 18 Apr 2019 12:58:46 PM CEST
d3b 85% Thu 18 Apr 2019 01:00:22 PM CEST !
d4b 50% Thu 18 Apr 2019 01:03:42 PM CEST
d4b 17% Thu 18 Apr 2019 01:05:37 PM CEST
d4b 50% Thu 18 Apr 2019 01:07:32 PM CEST
d4b 61% Thu 18 Apr 2019 01:09:28 PM CEST
d4b 67% Thu 18 Apr 2019 01:11:25 PM CEST
d4b 50% Thu 18 Apr 2019 01:13:19 PM CEST
Pandas
I’m familiar with most of this, but since I find myself googling it every time, I’ll just write it here, so I’ll know where to look.
Scipy-lectures.org
Scipy Lecture Notes seems like a very interesting place.
Concatenate dataframes
pd.concat([d, dd])
concatenates them vertically, keeping the same columns (each frame keeps its original index values).
pd.concat([d, dd], ignore_index=True)
concatenates them, keeping the same columns and renumbering the rows with a single new index (0 to n-1).
pd.concat([d, dd], axis=1)
merges them horizontally, aligned on the index, so the result has all the columns from the input dataframes.
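A minimal sketch of the three variants above. The frames `d` and `dd` here are toy stand-ins with made-up columns and values, only meant to show what happens to the index and the columns:

```python
import pandas as pd

# Two toy frames with the same columns (values are made up).
d = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
dd = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# Rows stacked; each frame keeps its own index, so labels repeat: 0, 1, 0, 1.
stacked = pd.concat([d, dd])

# Rows stacked with a fresh index: 0, 1, 2, 3.
renumbered = pd.concat([d, dd], ignore_index=True)

# Frames placed side by side, aligned on the index; columns: a, b, a, b.
side_by_side = pd.concat([d, dd], axis=1)
```

The repeated index labels in the first variant are usually what bites later, e.g. `stacked.loc[0]` returns two rows.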
Seaborn multiple distplots on the same graph
Seaborn plt and labeling
Apparently sns.plt was a bug which has since been fixed. Nice. Regardless, the correct way now is import matplotlib.pyplot as plt; plt....
Pandas multiple conditions filtering
dsa[ (dsa.char_count>190) & (dsa.char_count<220) ]
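A small sketch of why the parentheses matter, using a toy stand-in for `dsa` (the values are made up; only `char_count` comes from the line above). The `query` variant is just an alternative spelling of the same filter:

```python
import pandas as pd

# Toy stand-in for the dsa frame; only char_count matters here.
dsa = pd.DataFrame({"char_count": [150, 195, 210, 220, 300]})

# Each comparison needs its own parentheses because & binds tighter than > and <.
mask = (dsa.char_count > 190) & (dsa.char_count < 220)
filtered = dsa[mask]

# The same filter written as a query string.
queried = dsa.query("char_count > 190 and char_count < 220")
```

Both bounds are exclusive, so the row with exactly 220 characters is dropped.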
Jupyter – making cells 100% wide
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
inside a cell (SO)
Thesis
I have my semi-final dataset; today I’ll clean it, analyze it, and output it to some clean.csv file. I’ll also write a script that cleans the data, to cover all the repetitive things I’ll have to do.
Analyzing the dataset
0418-analysis-of-final-dataset
What I did
- Added quite a lot of features.
- token_count != pos_count.
- Counts of POS are relative.
- Currently I have many more UK tweets than others - but I should have at least 10000 tweets for each language.
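A quick way to check the balance mentioned in the last point could look like the sketch below. The column name `country` and the toy data are assumptions for illustration, not the real dataset; the real check would use a minimum of 10000.

```python
import pandas as pd

def groups_below(df, column, minimum):
    """Return the groups in `column` with fewer than `minimum` rows, with their counts."""
    counts = df[column].value_counts()
    return counts[counts < minimum]

# Toy data (hypothetical column name and values); the real check would use minimum=10000.
tweets = pd.DataFrame({"country": ["UK"] * 5 + ["SA"] * 2 + ["US"] * 3})
short = groups_below(tweets, "country", minimum=3)
```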
Interesting stuff
- Twitter does not count @replies in its character count
- This is why sometimes we get such bundles of joy of 964 characters:
’@FragrantFrog @BourgeoisViews @SimonHowell7 @Mr_Bo_Jangles_1 @Joysetruth @Caesar2207 @NancyParks8 @thetruthnessie @carmarsutra @Esjabe1 @DavidHuddo @rob22_re @lindale70139487 @anotherviv @AndyFish19 @Jules1602xx @EricaCantona7 @grand___wazoo @PollyGraph69 @CruftMs @ZaneZeleti @McCannFacts @ditsy_chick @Andreamariapre2 @barragirl49 @MancunianMEDlC @rambojambo9 @MrDelorean2 @Nadalena @LoverandomIeigh @cattywhites2 @Millsyj73 @strackers74 @may_shazzy @JBLittlemore @Tassie666 @justjulescolson @regretkay @Chinado59513358 @Louise42368296 @TypRussell @Anvil161Anvil16 @DuskatChristie @McCannCaseTweet @noseybugger1 @HilaryDean15 @DesireeLWiggin1 @M47Jakeman @crocodi11276514 @jonj85014 If it was in the Scenic several weeks after she was reported missing.Her body must have been put there.!\nWho by ?The people who hired the Scenic ! How hard is that to understand ?\nThis algorithmic software gives a probability of the identity of each contributer to the sample !\n😏’
- Otherwise, we get a pretty similar distribution, except for the 200-character effect that’s especially pronounced in SA - do they use old clients or something similar?
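To compare lengths without the reply prefix, something like the sketch below might do. It is only an approximation: treating the leading run of @mentions as the uncounted part is my assumption, not Twitter's exact counting rule, and `visible_length` is a made-up helper name.

```python
import re

# Leading run of @mentions (the reply prefix), each followed by whitespace.
LEADING_MENTIONS = re.compile(r"^(?:@\w+\s+)+")

def visible_length(text):
    """Length of the tweet once the leading @mention prefix is stripped."""
    return len(LEADING_MENTIONS.sub("", text))

# Made-up example tweet.
tweet = "@alice @bob How hard is that to understand?"
length = visible_length(tweet)
```

Mentions in the middle of the text are left alone and still count.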
Now playing: The Godfather II Soundtrack
Possible ideas for additional cleanup
- I can just remove from the text all the @mentions except the first two. That would still preserve the difference between replying to one person and replying to several, but I assume it would fare much better with various NLI stuff.
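A rough sketch of that cleanup idea, assuming a mention is simply `@` followed by word characters. The helper name `keep_first_mentions` is made up, and the pattern would also catch things like the domain part of an email address, so treat it as a starting point:

```python
import re

# Assumption: a mention is "@" followed by letters, digits or underscores.
MENTION = re.compile(r"@\w+")

def keep_first_mentions(text, keep=2):
    """Drop every @mention after the first `keep`, then collapse leftover spaces."""
    seen = 0

    def replace(match):
        nonlocal seen
        seen += 1
        return match.group(0) if seen <= keep else ""

    trimmed = MENTION.sub(replace, text)
    # Removing mentions leaves runs of spaces behind; squeeze them to one.
    return re.sub(r"[ \t]{2,}", " ", trimmed).strip()

# Made-up example tweet.
cleaned = keep_first_mentions("@a @b @c @d hello there")
```

A tweet with one mention stays unchanged, so single replies and multi-person replies remain distinguishable.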