In the middle of the desert you can say anything you want
Inspecting the importance of features when running Random Forest:
feature_importances = pd.DataFrame(rf.feature_importances_, index = X_train.columns, columns=['importance']).sort_values('importance', ascending=False)
df.sample(frac=1)
shuffles the rows of a DataFrame. pandas has no df.shuffle; sampling a fraction of 1 returns every row in random order.
It’s kinda logical, but if I group stuff, it gets saved in the same order.
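A minimal sketch of shuffling rows with sample(frac=1), assuming a small made-up DataFrame (the column names and the random_state value are illustrative, not from my data). The old index labels travel with the rows, so reset_index(drop=True) renumbers them afterwards.

```python
import pandas as pd

# Hypothetical toy data: rows sit grouped by label, as after grouping or concatenating.
df = pd.DataFrame({"label": ["a", "a", "b", "b", "c", "c"],
                   "value": [1, 2, 3, 4, 5, 6]})

# frac=1 samples 100% of the rows without replacement, i.e. a shuffle.
# random_state is set only to make this sketch reproducible.
shuffled = df.sample(frac=1, random_state=0)

# Drop the old index labels to get a clean 0..n-1 index.
shuffled = shuffled.reset_index(drop=True)
print(shuffled)
```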
d3b 79% Sat 20 Apr 2019 11:18:34 AM CEST
d3b 71% Sat 20 Apr 2019 11:20:10 AM CEST
d3b 71% Sat 20 Apr 2019 11:21:44 AM CEST
d3b 100% Sat 20 Apr 2019 11:23:16 AM CEST
d4b 56% Sat 20 Apr 2019 11:25:31 AM CEST
d4b 50% Sat 20 Apr 2019 11:27:26 AM CEST
d4b 50% Sat 20 Apr 2019 11:29:24 AM CEST
d4b 17% Sat 20 Apr 2019 11:31:18 AM CEST
d4b 40% Sat 20 Apr 2019 11:33:13 AM CEST
d4b 50% Sat 20 Apr 2019 11:35:15 AM CEST
d4b 56% Sat 20 Apr 2019 11:37:06 AM CEST
What would happen if I actually used the stopwords as one of my features, leaving the non-stopword text alone? Here’s a long list
sklearn.preprocessing.LabelEncoder
for converting categorical data to a numerical format.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
Can I use some of the insights/methods/ideas from stylometry for this? (After reading this article about Beowulf.)
Will become a problem. I can just remove all tweets containing any quote symbols (', ") after checking how many there are.
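A rough sketch of that check-then-remove step, assuming the tweets live in a pandas DataFrame with a text column (both the frame and the column name are hypothetical here). It only looks for the straight ' and " characters; typographic quotes would have to be added to the character class.

```python
import pandas as pd

# Hypothetical data; the real dataset and its column name are not shown in these notes.
tweets = pd.DataFrame({"text": [
    "plain tweet without quotes",
    'he said "hello" to me',
    "it's raining",
    "another plain one",
]})

# Boolean mask: True where the text contains ' or ".
has_quote = tweets["text"].str.contains("['\"]", regex=True)

# First check how many there are...
print(has_quote.sum(), "of", len(tweets), "tweets contain a quote symbol")

# ...then keep only the tweets without any quote symbol.
clean = tweets[~has_quote]
```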
Get things out of your head and into a system that you fully trust. Everything you do should have positive value – it’s either improving you (I put self care and genuine leisure time in here, but not time wasting), improving a relationship, making money, or making one of those other things more efficient. Do high energy and high focus things when you actually have energy and focus; do mindless things when you feel mindless. Do not skimp on self-care, which includes genuine leisure time, good healthy food, exercise, good personal relationships, and adequate sleep. Aim for the “flow state” in everything you do, because you’ll never be better than when you’re so engaged that you lose track of time and place and just get lost in the moment. (How I get things done)
I find that forcing myself to think about those things at the pace of my handwriting brings a ton of clarity to the ideas I’m struggling with or the life issues I’m trying to figure out. (same source)
it’s easy to sleep well when you get up early and work hard. (same source)
“No more yes. It’s either HELL YEAH! or no.” — Derek Sivers
I need a system to consistently track things I’m trying to optimize in my life. Today I already read N articles about excellent things I can do with my life, and usually it would end there. Probably the first in line would be reinforcement and mental contrasting.
On a certain level I actually bump against the infinitely familiar problem of not knowing what I want.
460 cpm 98%
d4b 14% Thu 18 Apr 2019 12:54:55 PM CEST
d4b 0% Thu 18 Apr 2019 12:56:50 PM CEST
d4b 11% Thu 18 Apr 2019 12:58:46 PM CEST
d3b 85% Thu 18 Apr 2019 01:00:22 PM CEST !
d4b 50% Thu 18 Apr 2019 01:03:42 PM CEST
d4b 17% Thu 18 Apr 2019 01:05:37 PM CEST
d4b 50% Thu 18 Apr 2019 01:07:32 PM CEST
d4b 61% Thu 18 Apr 2019 01:09:28 PM CEST
d4b 67% Thu 18 Apr 2019 01:11:25 PM CEST
d4b 50% Thu 18 Apr 2019 01:13:19 PM CEST
I’m familiar with most of this, but since I find myself googling it every time, I’ll just write it here, so I’ll know where to look.
Scipy Lecture Notes seems like a very interesting place.
pd.concat([d, dd])
concatenates them leaving the same columns.
pd.concat([d, dd], ignore_index=True)
concatenates them leaving the same columns and giving the result a fresh common index (0 to n-1) instead of the repeated original ones.
pd.concat([d, dd], axis=1)
merges them horizontally, that is there will be all the columns from the input dataframes.
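To see the three variants side by side, here is a small sketch with two made-up frames (d and dd are reused as names, their contents are invented for illustration).

```python
import pandas as pd

# Two hypothetical frames with the same columns.
d = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
dd = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

stacked = pd.concat([d, dd])                         # rows stacked, index labels repeat
renumbered = pd.concat([d, dd], ignore_index=True)   # rows stacked, index renumbered
side_by_side = pd.concat([d, dd], axis=1)            # columns placed next to each other

print(stacked.index.tolist())          # [0, 1, 0, 1]
print(renumbered.index.tolist())       # [0, 1, 2, 3]
print(side_by_side.columns.tolist())   # ['a', 'b', 'a', 'b']
```

Note that axis=1 aligns rows by index label, so frames with different indexes produce NaN-filled rows.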
Apparently sns.plt is a bug which has been fixed. Nice. Regardless, the new correct way is import matplotlib.pyplot as plt; plt....
dsa[ (dsa.char_count>190) & (dsa.char_count<220) ]
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
inside a cell makes the notebook container use the full page width (SO)
I have my semi-final dataset; today I’ll clean it, analyze it, and output it to some clean.csv file. I’ll also create a script that cleans the data, for all the repetitive things I’ll have to do.
0418-analysis-of-final-dataset.
token_count != pos_count.
’@FragrantFrog @BourgeoisViews @SimonHowell7 @Mr_Bo_Jangles_1 @Joysetruth @Caesar2207 @NancyParks8 @thetruthnessie @carmarsutra @Esjabe1 @DavidHuddo @rob22_re @lindale70139487 @anotherviv @AndyFish19 @Jules1602xx @EricaCantona7 @grand___wazoo @PollyGraph69 @CruftMs @ZaneZeleti @McCannFacts @ditsy_chick @Andreamariapre2 @barragirl49 @MancunianMEDlC @rambojambo9 @MrDelorean2 @Nadalena @LoverandomIeigh @cattywhites2 @Millsyj73 @strackers74 @may_shazzy @JBLittlemore @Tassie666 @justjulescolson @regretkay @Chinado59513358 @Louise42368296 @TypRussell @Anvil161Anvil16 @DuskatChristie @McCannCaseTweet @noseybugger1 @HilaryDean15 @DesireeLWiggin1 @M47Jakeman @crocodi11276514 @jonj85014 If it was in the Scenic several weeks after she was reported missing.Her body must have been put there.!\nWho by ?The people who hired the Scenic ! How hard is that to understand ?\nThis algorithmic software gives a probability of the identity of each contributer to the sample !\n😏’
Now playing: The Godfather II Soundtrack
Add search to this blog via this simple js
To watch: Hacking democracy with theater
It was a small Army Security Agency Station in Southeast Asia that I was doing some work for. They had a shrink and he pulled me aside. In just 10 minutes or so he taught me “breathing”. It wasn’t until the internet that I learned the term mindful breathing. Subsequently I figured out it was some sort of meditation. [..]

He said I was ‘wrapped to tight’. What ever that means. Those guys were all spooks, but I did not have the same clearances. I was an outsider in that regard, but I did eat with them when at their place. I guess he was bored.

He took my blood pressure and then taught me to breathe. Then he took it again. I was surprised at the drop. It hooked me on mindful breathing. It was probably a parlor trick, but it worked. He improved my lifetime health. For that I thank him.
(from reddit)
Okular can fill and save PDF forms. Zathura can open already filled forms.
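To convert a PDF to images:
pdftoppm input.pdf outputname -png
converts every page to its own PNG file.
pdftoppm input.pdf outputname -png -f {page} -singlefile
converts only page {page}, written as a single file.
It works much better than convert.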
timew continue
continues the last tracked thing
Even though stylistically questionable (PEP8 favours multiple single-line # comments), one possibility is to use """ mycomment """; when such strings are not a docstring they are ignored (source). They have to be indented right though, and they feel kinda wrong.
Additionally:
triple-quotes are a way to insert text that doesn’t do anything (I believe you could do this with regular single-quoted strings too), but they aren’t comments - the interpreter does actually execute the line (but the line doesn’t do anything). That’s why the indentation of a triple-quoted ‘comment’ is important. – Demis Jun 9 ‘15 at 18:35
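A small sketch of the difference, using a made-up add function: the first string in the body becomes the docstring, the second is just an expression statement that has to follow the block's indentation like any other statement.

```python
def add(a, b):
    """Return the sum of a and b."""  # first statement in the body: this IS the docstring
    """
    A bare string used as a pseudo-comment. It is an expression statement,
    so it must be indented like any other statement in this block.
    """
    # A regular comment; PEP 8 prefers this form.
    return a + b

print(add.__doc__)  # only the first string is kept as documentation
print(add(2, 3))
```

Dedenting the second string to column 0 would end the function body there and make the following return a syntax error, which is why the indentation matters.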
This is an excellent paper about Reddit and more focused on orthographic errors. Will read next!
And this is an awesome annotated dataset, exactly the kind I need.
SSH can handle commands.
From the blog post above: <Enter>~.
SSH parses commands sent after a newline and ~. The sequence ~. is the one to exit.
In ~/.ssh/config,
Host host1
    HostName ssh.example.com
    User myuser
    IdentityFile ~/.ssh/id_rsa
allows to just do ssh host1.
… Still amazed by Linux and the number of such things. If I ever planned to do Linux much more professionally, I would just sit and read through all the man pages of the typical tools, systematically.
I need to make this Diensttagebuch searchable from the website, not just locally with :Ag.
t id!=123, works with everything.
For unicode strings, do "unicode string".encode('utf-8')
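A tiny sketch of what encode does, assuming Python 3 (where every str is already Unicode); the sample text is made up and written with escapes so the file encoding does not matter.

```python
text = "na\u00efve caf\u00e9"   # a str: 10 Unicode code points
data = text.encode("utf-8")     # bytes: each non-ASCII character takes two bytes here

print(len(text), len(data))           # character count vs. byte count
print(data.decode("utf-8") == text)   # decoding with the same codec round-trips
```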
I looked again at the confusion matrix, after having made a copy. It’s quite interesting:
array([[29, 14, 28, 26],
       [38, 57, 36, 27],
       [52, 18, 58, 28],
       [18, 14, 18, 39]])
This is a simple SVM, using extremely simple features, and 2000 examples per class. The columns/rows are: ar, jp, lib, it, in that order. My first error is that Arabic and countries which are around Libya are quite similar in my world, linguistically, and we can see that they are confused quite often, in both directions. Italy and Japan do much better.
Still, I find this very promising, and definitely better than chance. And logically it makes sense. I’ll continue.
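A quick sanity check of the "better than chance" claim, computed from the matrix above. Two assumptions that are mine and not stated in the notes: rows are the true labels (the scikit-learn convention), and chance is taken as uniform guessing over the four classes.

```python
import numpy as np

# Confusion matrix copied from the note above.
cm = np.array([[29, 14, 28, 26],
               [38, 57, 36, 27],
               [52, 18, 58, 28],
               [18, 14, 18, 39]])

n_classes = cm.shape[0]

# Correct predictions are on the diagonal.
accuracy = np.trace(cm) / cm.sum()
chance = 1 / n_classes  # assumption: uniform guessing over the classes

# Assumption: rows are true labels, so diagonal / row sums is per-class recall.
recall = np.diag(cm) / cm.sum(axis=1)

print(f"accuracy={accuracy:.3f}, chance={chance:.3f}")
print("per-class recall:", recall.round(3))
```

That gives 183 correct out of 500, so about 0.37 against 0.25 for uniform guessing.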
The list. I’ll stick to Japan, UK, SA, Brazil, India, which are quite far from each other, geographically and linguistically. I leave the US alone, too mixed.
This is the picker. DublinCore format is in the identical order as Twitter wants!
d[d.co.isin(['uk','in'])]
leaves the rows where co=='uk' or co=='in'.
For multiple conditions, df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
TODO: Why is .loc used here?
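A sketch that touches the TODO, on a made-up frame (co mirrors the country column above, n and the bounds A and B are invented). For pure row filtering a boolean mask gives the same rows with or without .loc; .loc is the explicit label-based form and also takes a column selector, which is what makes it the safe choice for assignment.

```python
import pandas as pd

# Hypothetical frame; `co` mirrors the country column used above, `n` is made up.
d = pd.DataFrame({"co": ["uk", "in", "jp", "br", "uk"],
                  "n": [5, 12, 7, 20, 15]})

# Rows whose country is one of the listed values.
uk_or_in = d[d.co.isin(["uk", "in"])]

# Several conditions: wrap each in parentheses and combine with & (and) or | (or).
A, B = 6, 15
in_range = d.loc[(d["n"] >= A) & (d["n"] <= B)]

# For row filtering alone, d[mask] and d.loc[mask] return the same rows.
same = d[(d["n"] >= A) & (d["n"] <= B)].equals(in_range)
print(same)

# .loc additionally accepts a column selector, so rows and column are set in one step.
d.loc[d.co.isin(["uk", "in"]), "n"] = 0
print(d)
```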
Has a config file! This opened a new universe for me too.
The key needs to be added from the panel, adding it to the user folder as usual does not work.
Wann vs wenn: Wann has nothing to do with if, it’s a question asking for a point of time. Wenn is closer to “if”, but it’s also a translation for “when”.
If we can say at what point in time instead of when, then we need to use wann.
Wann [=at what time/when] kommt der Bus?
Bis wann musst du arbeiten?
Thomas fragt Maria, wann genau sie nach Hause kommt.
On the other hand,
Ich gehe nach Hause, wenn [!= at what time! just the “when” closer to “if”] ich fertig bin.
A wann-clause is ALWAYS functioning as the object of the verb. If I can replace the clause with a thing, then it’s wann.
Wann answers to “at what time”, we can basically replace it with “at 3 am”.
When I have finished work, I will call you and tell you when I will be at home.
When I have finished work, I will call you and tell you at what point in time I will be at home.
Wenn ich mit der Arbeit fertig bin, rufe ich dich an und sage dir, wann ich zuhause bin.
At 3 I’ll call you and tell you this thing.
$ git reset --soft HEAD~1
undoes the last commit, leaving all its changes on disk, staged but uncommitted.
$ git reset --hard 0ad5a7a6
returns to any previous version (here commit 0ad5a7a6), discarding all uncommitted changes.
Here, and it’s excellent. I should actually learn git in a normal systematic way. Additionally, what to do when your .gitignore is ignored by git@SO.
Busy person patterns as linked on HN
Testosterone seems to have different effects than the stereotypes say, and road/roid rage is actually caused by estrogen spikes.
This eggs inside avocado recipe is very interesting. Will try tomorrow. Also this avocado hummus recipe.