Abijith B Programming and random thoughts

Cool way to scrape?

My usual scraping workflow was to make a GET request for the HTML and then parse it with BeautifulSoup. Is there a way around that, a way to get structured data directly?

I always use ytfzf to search YouTube videos. It is very convenient because I can skip the recommendations on the YouTube homepage, which are a huge distraction. The one part I was missing was the comment section, so I was looking for a way to read comments in the terminal. In my quest I found this. I was curious how the package was implemented, and in it I found this cool way of doing requests.

If we want to make several requests to the same host, we can use requests.Session(). It reuses the same TCP connection and also carries cookies over from one request to the next. As a stock market enthusiast, I wondered whether this could be used to get a live stock price from the NSE. Well, it can! The quote page has a reload button, and pressing it sends a whole lot of requests to the server. This URL seems to return the data we are looking for: https://www.nseindia.com/api/quote-equity?symbol=TCS. Without a prior connection to NSE, this might give you an error. That’s where the session helps.

import json

import requests

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '\
             'AppleWebKit/537.36 (KHTML, like Gecko) '\
             'Chrome/79.0.3945.130 Safari/537.36'

BASE_URL = 'https://www.nseindia.com/'

session = requests.Session()
session.headers['User-Agent'] = USER_AGENT
# Visit the homepage first; the session keeps any cookies it sets for later calls
r = session.get(BASE_URL)


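def get_price(ticker):
    """Fetch the quote JSON for one NSE ticker, e.g. "TCS"."""
    # The API is normally called from the quote page, so send a matching referer
    session.headers['referer'] = f'{BASE_URL}get-quotes/equity?symbol={ticker}'
    payload = {'symbol': ticker}
    response = session.get(f"{BASE_URL}api/quote-equity", params=payload)
    response.raise_for_status()  # fail loudly on an HTTP error status
    return json.loads(response.text)

print(get_price("TCS"))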

I was looking for a CLI watchlist, and now I have the main ingredient.
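As a rough idea of how that ingredient could turn into a watchlist, here is a minimal sketch. It is only a guess at the shape of such a tool: the helper names last_price and format_watchlist are made up for illustration, and it assumes the price sits under "priceInfo" → "lastPrice" in the returned JSON, which should be checked against a real response. The fetch function is passed in, so the sketch runs offline with stand-in data.

```python
def last_price(quote):
    """Return the last traded price from a parsed quote, or None if missing.

    ASSUMPTION: the price lives at quote["priceInfo"]["lastPrice"];
    verify this against an actual API response.
    """
    return (quote.get("priceInfo") or {}).get("lastPrice")


def format_watchlist(tickers, fetch_quote):
    """Build one aligned text line per ticker using the given fetch function."""
    lines = []
    for ticker in tickers:
        price = last_price(fetch_quote(ticker))
        shown = "n/a" if price is None else f"{price:.2f}"
        lines.append(f"{ticker:<12}{shown:>10}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for a real fetch function; these numbers are made up.
    fake_quotes = {"TCS": {"priceInfo": {"lastPrice": 3500.5}}, "INFY": {}}
    print(format_watchlist(["TCS", "INFY"], fake_quotes.get))
```

With the session code above, calling format_watchlist(["TCS", "INFY"], get_price) would hit the live API instead of the stand-in data.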

Broad framing

Let’s start with a question: would you accept a gamble on the toss of a coin in which you could lose $100 or win $200?

I wouldn’t go for the gamble; the loss of $100 weighs more for me than the win of $200. I would take the gamble if it could be repeated 100 times. The question is taken from Thinking, Fast and Slow, and as always the book predicted my judgment correctly. I’m not surprised, as it predicts most of my judgments accurately.

There is no reason to reject the gamble if it can be repeated 100 times. Each toss has an expected return of $50, so the expected return over 100 tosses is $5000, with a very low probability of losing any money (1/2300). Someone who rejects the single gamble is also prone to reject other similar gambles, because each one is evaluated separately and rejected, like the decision we made at the beginning. This is narrow framing: the decisions are taken separately.
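The two numbers for the repeated gamble can be checked with a few lines of Python. This is a small sketch assuming a fair coin and independent tosses; the function names are my own.

```python
from math import comb

WIN, LOSS = 200, 100  # dollars won or lost on each toss


def expected_total(n):
    """Expected net result of n fair tosses."""
    return n * (0.5 * WIN - 0.5 * LOSS)


def probability_of_net_loss(n):
    """Probability that the net result of n fair tosses is below zero."""
    losing_sequences = sum(
        comb(n, wins)
        for wins in range(n + 1)
        if wins * WIN - (n - wins) * LOSS < 0
    )
    return losing_sequences / 2 ** n


print(expected_total(100))               # expected net result in dollars
print(1 / probability_of_net_loss(100))  # "1 in how many" chance of a net loss
```

Over 100 tosses you end up behind only with 33 wins or fewer, and the second number should land close to the 1 in 2300 quoted above.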

Consider three tosses of the above gamble. Let’s check the possible outcomes and their probabilities:

  • 12.5% lose $300
  • 37.5% win $0
  • 37.5% win $300
  • 12.5% win $600
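Enumerating all eight toss sequences gives the distribution directly. A minimal sketch, again assuming a fair coin and a payoff of +$200 or -$100 per toss:

```python
from collections import Counter
from itertools import product

PAYOFFS = (200, -100)  # win $200 or lose $100 on each toss


def outcome_distribution(n):
    """Map each possible net result to its probability over n fair tosses."""
    counts = Counter(sum(tosses) for tosses in product(PAYOFFS, repeat=n))
    total = 2 ** n
    return {net: count / total for net, count in sorted(counts.items())}


for net, prob in outcome_distribution(3).items():
    print(f"{prob:.1%}  net {net:+d}")
```

The only losing sequence is three tails in a row, which costs $300 and happens one time in eight.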

I will take the deal, as there is only a 12.5% chance of losing something. But humans are natural narrow framers, and if I encountered the gambles independently on different occasions I would turn each one down. We need pen and paper to calculate the probabilities; our monkey brains are not good at this. We can reduce the pain of losing a gamble by taking a broader frame: for a favourable gamble, the probability of an overall loss shrinks rapidly as more gambles are bundled together.

The author has a sermon for you if you are still not convinced about taking a favourable gamble.

“I sympathize with your aversion to losing any gamble, but it is costing you a lot of money. Please consider this question: Are you on your deathbed? Is this the last offer of a small favorable gamble that you will ever consider? Of course, you are unlikely to be offered exactly this gamble again, but you will have many opportunities to consider attractive gambles with stakes that are very small relative to your wealth. You will do yourself a large financial favor if you are able to see each of these gambles as part of a bundle of small gambles and rehearse the mantra that will get you significantly closer to economic rationality: you win a few, you lose a few. The main purpose of the mantra is to control your emotional response when you do lose. If you can trust it to be effective, you should remind yourself of it when deciding whether or not to accept a small risk with positive expected value. Remember these qualifications

  • It works when the gambles are genuinely independent of each other; it does not apply to multiple investments in the same industry, which would all go bad together.
  • It works only when the possible loss does not cause you to worry about your total wealth. If you would take the loss as significant bad news about your economic future, watch it!
  • It should not be applied to long shots, where the probability of winning is very small for each bet.

If you have the emotional discipline that this rule requires, you will never consider a small gamble in isolation or be loss averse for a small gamble until you are actually on your deathbed and not even then.”

Recommendation Engines - Part 1

Introduction

There is a mandatory final-year project in our curriculum. Our team consists of four people, and we had some very interesting ideas, but they were too sci-fi (real-time stock market analysis, workout assistance with computer vision). Finally, we ended up with a project to explore book recommendations.

In this post we will explore the basic ideas behind recommendation engines.

Recommendation Engine - 101

Recommendation engines are everywhere: Netflix, Spotify, YouTube (which recommends an unrelated video posted 8 years ago, at least for me). They are a crucial tool for enhancing the user experience. Let’s explore the basics of recommendation engines.

Content Based Filtering

Content Based Filtering uses the metadata of items that the user already likes. For example, if you like the movie The Dark Knight, the metadata includes {"Director": "Christopher Nolan", "Genre": ["DC", "Comic", "Superhero"]}. We can use this data to recommend more movies by Christopher Nolan, movies based on DC, comics, superheroes and so on. We can also use user metadata such as age, gender and location.
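One simple way to turn that idea into code is to compare metadata tags with Jaccard similarity (shared tags divided by all tags). This is only a sketch of one possible approach, not the method a real engine necessarily uses. The small catalog and the genre labels for the other movies are made up for illustration, and the function names are my own.

```python
# Illustrative catalog; only The Dark Knight entry comes from the text above.
CATALOG = {
    "The Dark Knight": {"Director": "Christopher Nolan", "Genre": ["DC", "Comic", "Superhero"]},
    "Inception": {"Director": "Christopher Nolan", "Genre": ["Sci-Fi", "Thriller"]},
    "Man of Steel": {"Director": "Zack Snyder", "Genre": ["DC", "Comic", "Superhero"]},
    "Notting Hill": {"Director": "Roger Michell", "Genre": ["Romance", "Comedy"]},
}


def features(movie):
    """Flatten a movie's metadata into a set of comparable tags."""
    tags = {f"director:{movie['Director']}"}
    tags.update(f"genre:{genre}" for genre in movie["Genre"])
    return tags


def jaccard(a, b):
    """Size of the intersection over size of the union of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(liked_title, catalog, top_n=2):
    """Rank the other movies by tag overlap with the liked one."""
    liked = features(catalog[liked_title])
    scores = {
        title: jaccard(liked, features(meta))
        for title, meta in catalog.items()
        if title != liked_title
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [title for title in ranked if scores[title] > 0][:top_n]


print(recommend_similar("The Dark Knight", CATALOG))
```

Here the movie sharing all three genre tags ranks first, the one sharing only the director ranks second, and the movie with no tags in common is dropped.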

Collaborative Filtering

Collaborative filtering uses similarities between users and items simultaneously to provide recommendations. For example, if you and your friend have similar taste in movies, he can recommend movies he likes, and you will most probably like them too. The feedback it relies on can be divided into two kinds.

  • Explicit Feedback : The user provides positive or negative feedback directly, such as Like/Dislike on YouTube or review stars on Amazon.
  • Implicit Feedback : Most users do not press like or dislike on YouTube, whether they are interested or not. If they skip a video after a few minutes, that could count as negative feedback; in the same way we can infer feedback from other attributes. We will look at this in more detail in upcoming posts.
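The friend-with-similar-taste idea can be sketched as user-based collaborative filtering on explicit like/dislike feedback. This is a toy version under stated assumptions: a like is encoded as +1 and a dislike as -1, users are compared with cosine similarity, and the users, ratings and function names are all made up for illustration.

```python
from math import sqrt

# Illustrative explicit feedback: +1 = like, -1 = dislike.
RATINGS = {
    "you": {"The Dark Knight": 1, "Inception": 1, "Notting Hill": -1},
    "friend": {"The Dark Knight": 1, "Inception": 1, "Interstellar": 1, "Notting Hill": -1},
    "stranger": {"The Dark Knight": -1, "Notting Hill": 1, "Love Actually": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Suggest unseen items, weighted by how similar each other user is."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(seen, other_ratings)
        if similarity <= 0:
            continue  # ignore users whose taste does not match
        for item, rating in other_ratings.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [item for item in ranked if scores[item] > 0][:top_n]


print(recommend("you", RATINGS))
```

The friend agrees with you on every shared movie, so the one movie only he has rated gets recommended; the stranger disagrees with you, so his likes are ignored.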

Reference

  1. DeepLearning@Institut Polytechnique de Paris
  2. CS246: Mining Massive Data Sets@stanford