1 Probabilities

Given a doctor's patient record, where each cell gives the number of patients with the corresponding symptom status and disease status :

Table 1 : Joint observations of symptoms (rows) and disease (columns).

Questions from the doctor :
Q1 : What is the joint probability that a patient has the symptom and the disease ? Ans :

Q2 : What is the probability that a patient has the symptom ? Ans :

Q3 : What is the probability that a patient has the disease ? Ans :

Q4 : What is the conditional probability that a patient has the symptom given they have the disease ? Ans :

*Q5 : What is the conditional probability that a patient has the disease given that they have the symptom ? Ans :
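The five quantities asked for above all follow from the cell counts of a 2x2 table. The sketch below illustrates the arithmetic only: the counts in `counts` are hypothetical placeholders (they are not the values of Table 1), and the names `counts` and `probabilities` are invented for this example.

```python
# Hypothetical 2x2 counts, NOT the values from Table 1.
# Keys are (symptom status, disease status).
counts = {
    ("symptom", "disease"): 30,
    ("symptom", "no_disease"): 20,
    ("no_symptom", "disease"): 10,
    ("no_symptom", "no_disease"): 40,
}


def probabilities(counts):
    """Return joint, marginal and conditional probabilities from cell counts."""
    total = sum(counts.values())
    # Q1: joint probability P(symptom, disease)
    joint = counts[("symptom", "disease")] / total
    # Q2: marginal P(symptom), summing over the disease status
    p_symptom = sum(v for (s, _), v in counts.items() if s == "symptom") / total
    # Q3: marginal P(disease), summing over the symptom status
    p_disease = sum(v for (_, d), v in counts.items() if d == "disease") / total
    return {
        "joint": joint,
        "p_symptom": p_symptom,
        "p_disease": p_disease,
        # Q4: P(symptom | disease) = P(symptom, disease) / P(disease)
        "p_symptom_given_disease": joint / p_disease,
        # Q5: P(disease | symptom) = P(symptom, disease) / P(symptom)
        "p_disease_given_symptom": joint / p_symptom,
    }


result = probabilities(counts)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Replacing the placeholder counts with the actual table entries gives the answers to Q1 to Q5 directly.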


2 Maximum Likelihood

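Q1. We flipped a coin 100 times. Given that there were 55 heads, use maximum likelihood estimation to find the probability of heads on a single toss. Ans : Binomial Distribution. Let $p$ be the probability of heads; the likelihood of observing 55 heads in 100 tosses is

$$L(p) = \binom{100}{55} p^{55} (1 - p)^{45}$$

Differentiate and solve for $p$. Taking logs first,

$$\ln L(p) = \ln \binom{100}{55} + 55 \ln p + 45 \ln (1 - p)$$

Set derivative to zero and solve:

$$\frac{d \ln L(p)}{dp} = \frac{55}{p} - \frac{45}{1 - p} = 0$$

Multiply both sides by $p(1 - p)$ :

$$55(1 - p) - 45p = 0 \implies 55 = 100p \implies \hat{p} = \frac{55}{100} = 0.55$$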
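As a numerical sanity check on the derivation, the sketch below evaluates the Binomial log-likelihood for 55 heads in 100 tosses on a grid of candidate values and picks the maximiser. The grid resolution and the function name `log_likelihood` are choices made for this illustration.

```python
import math


def log_likelihood(p, heads=55, tosses=100):
    """Binomial log-likelihood, omitting the constant binomial coefficient."""
    return heads * math.log(p) + (tosses - heads) * math.log(1 - p)


# Candidate values 0.001, 0.002, ..., 0.999 (endpoints excluded to avoid log(0)).
grid = [i / 1000 for i in range(1, 1000)]

# The grid point with the largest log-likelihood approximates the MLE.
p_hat = max(grid, key=log_likelihood)
print(p_hat)
```

The constant term is dropped because it does not depend on the probability of heads and so does not change where the maximum lies.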


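Q2 : Let $X_1, \dots, X_n$ be independent and identically distributed (i.i.d.) Poisson($\lambda$) random variables. Find the maximum likelihood estimator for $\lambda$, $\hat{\lambda}$. Calculate an estimate using the estimator when : Ans : PMF for Poisson

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$

Likelihood Function

$$L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}$$

And for the log-likelihood:

$$\ln L(\lambda) = \ln \lambda \sum_{i=1}^{n} x_i - n\lambda - \sum_{i=1}^{n} \ln (x_i!)$$

Differentiate and solve for $\lambda$

$$\frac{d \ln L(\lambda)}{d\lambda} = \frac{1}{\lambda} \sum_{i=1}^{n} x_i - n = 0 \implies \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}$$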

For the given data :
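The Poisson maximum likelihood estimate is the sample mean, so computing it is a one-line calculation. The sketch below uses a hypothetical sample (`sample` is a placeholder, not the worksheet's data) and also checks that the log-likelihood is lower at nearby values of the rate.

```python
import math


def poisson_log_likelihood(lam, xs):
    """Poisson log-likelihood; lgamma(x + 1) equals ln(x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)


def poisson_mle(xs):
    """Maximum likelihood estimate of the Poisson rate: the sample mean."""
    return sum(xs) / len(xs)


# Hypothetical observations for illustration only, NOT the worksheet's data.
sample = [2, 3, 1, 4, 0]

lam_hat = poisson_mle(sample)
print(lam_hat)
print(poisson_log_likelihood(lam_hat, sample))
```

Substituting the actual observations for `sample` gives the requested estimate.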


3 Logistic Regression

A company wants to predict whether a customer will purchase a product based on their income and age. The logistic regression model is given by :

$$P(Y = 1 \mid X_1, X_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}}$$

where :

  • $Y = 1$ if the customer buys the product, $Y = 0$ otherwise.
  • $X_1$ = Age of the customer.
  • $X_2$ = Monthly income of the customer (in £1000s).
  • $\beta_0 = $ , $\beta_1 = 0.05$, $\beta_2 = $ .

Question 1 : Compute the probability that a customer aged 30 with a monthly income of £5000 will purchase the product. Ans :
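A minimal sketch of this calculation, assuming the standard logistic form with an intercept, an age coefficient and an income coefficient. Only the age coefficient 0.05 is taken from the worksheet (it appears in the answer to Question 2); the intercept and income coefficient below are hypothetical placeholders, so the printed number illustrates the method and is not the worksheet's answer. The function name `purchase_probability` is also invented here.

```python
import math


def purchase_probability(age, income_k, beta0, beta1, beta2):
    """Logistic model: sigmoid of the linear predictor.

    income_k is the monthly income in thousands of pounds.
    """
    z = beta0 + beta1 * age + beta2 * income_k
    return 1.0 / (1.0 + math.exp(-z))


# beta1 = 0.05 comes from the worksheet.
# beta0 = -3.0 and beta2 = 0.4 are HYPOTHETICAL placeholder values.
p = purchase_probability(age=30, income_k=5, beta0=-3.0, beta1=0.05, beta2=0.4)
print(round(p, 4))
```

Note that a monthly income of £5000 enters the model as 5, because income is measured in £1000s.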


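Question 2 : Interpret the meaning of the age coefficient ($\beta_1 = 0.05$) in the context of this problem. Ans : Holding income fixed, a one-year increase in age increases the log-odds of purchasing by 0.05. Equivalently, the odds of purchasing are multiplied by $e^{0.05} \approx 1.051$, so age has a positive effect on purchase probability.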
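The log-odds interpretation can be checked numerically: adding one year of age adds 0.05 to the linear predictor, which multiplies the odds by the exponential of 0.05 whatever the starting point. In the sketch below, `base` is an arbitrary hypothetical value of the linear predictor and the helper names are invented for the example.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


beta_age = 0.05  # age coefficient from the worksheet
base = 0.2       # hypothetical linear predictor before the one-year increase

# Ratio of odds after and before a one-year increase in age.
ratio = odds(sigmoid(base + beta_age)) / odds(sigmoid(base))
print(ratio, math.exp(beta_age))
```

Changing `base` leaves the ratio unchanged, which is why the coefficient is read as a constant multiplicative effect on the odds rather than a constant change in probability.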

Question 3 : If a customer’s probability of purchasing is 0.7, should the company predict that they will buy the product? Justify your answer. Ans : Yes. Using the standard decision threshold of 0.5, since 0.7 > 0.5, the model predicts that the customer will buy the product.

Question 4 : Briefly explain one advantage and one limitation of using logistic regression for this problem. Ans : Advantage : Interpretability. Each coefficient gives the change in the log-odds of purchase per unit change in its feature. Limitation : The model assumes the log-odds are linear in age and income, which may not always hold.

4 Naive Bayes - Disease Diagnosis

A medical test is used to diagnose a rare disease based on two symptoms: Fever and Cough. The model uses a Naive Bayes classifier. The probabilities of symptoms appearing given the disease status are:

Additionally, we know that only 5% of the population has the disease (i.e., $P(\text{Disease}) = 0.05$).

Question 1 : Compute the probability that a patient with both Fever and Cough has the disease, and classify the patient as having or not having the disease. Ans :

For no disease :

Posterior probability :

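Since $P(\text{Disease} \mid \text{Fever}, \text{Cough}) < P(\text{No disease} \mid \text{Fever}, \text{Cough})$, the model does not classify the patient as having the disease.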
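The posterior calculation can be sketched as follows. The prior of 0.05 is taken from the problem statement; the four symptom likelihoods are hypothetical placeholders (they are not the worksheet's values), chosen only so that the example runs, and `naive_bayes_posterior` is a name invented for this illustration.

```python
def naive_bayes_posterior(prior, lik_disease, lik_no_disease):
    """Posterior P(disease | symptoms) under the naive Bayes assumption.

    lik_disease and lik_no_disease hold P(symptom | class) for each
    observed symptom; they are multiplied because the symptoms are
    treated as conditionally independent given the class.
    """
    joint_disease = prior
    for p in lik_disease:
        joint_disease *= p
    joint_no_disease = 1.0 - prior
    for p in lik_no_disease:
        joint_no_disease *= p
    # Normalise by the total probability of the observed symptoms.
    return joint_disease / (joint_disease + joint_no_disease)


# Prior from the problem statement; likelihoods are HYPOTHETICAL placeholders:
# P(Fever | D) = 0.8, P(Cough | D) = 0.7, P(Fever | not D) = 0.2, P(Cough | not D) = 0.3.
posterior = naive_bayes_posterior(0.05, [0.8, 0.7], [0.2, 0.3])
has_disease = posterior > 0.5
print(round(posterior, 4), has_disease)
```

With a prior this small, even symptoms that are much more likely under the disease can leave the posterior below 0.5, which is the pattern the worksheet's conclusion describes.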

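Question 2 : What is the ’naive Bayes’ assumption and why is it important in Naive Bayes classification? Ans : The assumption is that the symptoms are conditionally independent given the disease status, so that $P(\text{Fever}, \text{Cough} \mid \text{Disease}) = P(\text{Fever} \mid \text{Disease}) \, P(\text{Cough} \mid \text{Disease})$. This simplifies computation, because only per-symptom probabilities need to be estimated, and makes the model fast and scalable, though symptoms in reality may be correlated.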