A scraping method resilient to IP blocking

Disclaimer: use the ideas here at your own risk.

If you are into web scraping, you probably know that websites don’t like automated bots that pay a visit just to gather information. They have set up systems that can figure out that your program is not an actual person and, after a bunch of requests from your script, you usually get the dreaded HTTP 429 Too Many Requests error. This response means the server is rate-limiting you: in practice, your IP address has been blocked from querying the website for a certain amount of time. Your bot can go home and cry.

Google’s search engine is…

Revisiting the Central Limit Theorem of Probability with combinatorics

A couple of weeks ago, my younger sister (still in high school) came to me saying that she had to choose a topic for a project in her math class. After thinking about it for a while, I suggested that she take a look at the sum of n dice rolls and see if some interesting pattern arose when selecting different values for n (1, 2, 5, 10…). She liked the idea but, of course, she wasn’t up for manually repeating the experiment 10,000 times for each set of dice. In the end, I found myself writing some code to help her…
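
To give a rough idea of what such an experiment could look like, here is a minimal sketch in Python. It is not the code from the article: the function name `simulate_dice_sums`, the fixed seed and the choice of fair six-sided dice are all assumptions made for illustration.

```python
import random
from collections import Counter


def simulate_dice_sums(n_dice, n_trials=10_000, seed=0):
    """Roll `n_dice` fair six-sided dice `n_trials` times and count each sum.

    Hypothetical helper for illustration; the seed only makes runs repeatable.
    """
    rng = random.Random(seed)
    sums = (
        sum(rng.randint(1, 6) for _ in range(n_dice))
        for _ in range(n_trials)
    )
    return Counter(sums)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        counts = simulate_dice_sums(n)
        total = sum(counts.values())
        mean = sum(s * c for s, c in counts.items()) / total
        print(f"n={n:2d}: sums from {min(counts)} to {max(counts)}, mean {mean:.2f}")
```

Plotting the counts for each n as a bar chart should make the pattern visible: flat for a single die, triangular for two, and increasingly bell-shaped as n grows.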

Juan Luis Ruiz-Tagle

I’m a Data Scientist interested in almost everything

