1 Probabilities
Given a doctor’s patient record, where each cell gives the number of patients with the corresponding symptom status and disease status :
Table 1 : Joint observations of symptoms (rows) and disease (columns).
Questions from the doctor :
Q1 : What is the joint probability that a patient has the symptom and the disease ?
Ans :
Q2 : What is the probability that a patient has the symptom ? Ans :
Q3 : What is the probability that a patient has the disease ? Ans :
Q4 : What is the conditional probability that a patient has the symptom given they have the disease ? Ans :
*Q5 : What is the conditional probability that a patient has the disease given that they have the symptom ? Ans :
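The five questions above all reduce to ratios of counts from the table. A minimal sketch follows, assuming a 2x2 table; the counts are hypothetical placeholders and are not the values of Table 1.

```python
# Hypothetical counts (NOT the values from Table 1), keyed by
# (symptom status, disease status).
counts = {
    ("symptom", "disease"): 30,
    ("symptom", "no_disease"): 20,
    ("no_symptom", "disease"): 10,
    ("no_symptom", "no_disease"): 40,
}
total = sum(counts.values())

# Q1: joint probability P(symptom, disease)
p_joint = counts[("symptom", "disease")] / total

# Q2 and Q3: marginals, obtained by summing a row or a column
p_symptom = sum(v for (s, _), v in counts.items() if s == "symptom") / total
p_disease = sum(v for (_, d), v in counts.items() if d == "disease") / total

# Q4 and Q5: conditionals, joint divided by the relevant marginal
p_symptom_given_disease = p_joint / p_disease
p_disease_given_symptom = p_joint / p_symptom

print(p_joint, p_symptom, p_disease)
print(p_symptom_given_disease, p_disease_given_symptom)
```

Q4 and Q5 share the same numerator and differ only in the marginal used as the denominator, which is why the two conditional probabilities are generally not equal.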
2 Maximum Likelihood
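Q1. We flipped a coin 100 times. Given that there were 55 heads, use maximum likelihood estimation to find the probability of heads on a single toss. Ans : Binomial Distribution
$$L(p) = \binom{100}{55}\, p^{55} (1-p)^{45}$$
Take the log-likelihood:
$$\ell(p) = \log\binom{100}{55} + 55 \log p + 45 \log(1-p)$$
Differentiate and Solve for $p$
$$\frac{d\ell}{dp} = \frac{55}{p} - \frac{45}{1-p}$$
Set derivative to zero and solve:
$$\frac{55}{p} - \frac{45}{1-p} = 0$$
Multiply both sides by $p(1-p)$ :
$$55(1-p) - 45p = 0 \implies 55 = 100p \implies \hat{p} = \frac{55}{100} = 0.55$$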
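The closed-form answer (heads divided by tosses) can be sanity-checked numerically. The sketch below evaluates the binomial log-likelihood on a grid of candidate values; the grid resolution and the function name `log_likelihood` are illustrative choices, not part of the exercise.

```python
import math

def log_likelihood(p, heads=55, n=100):
    """Binomial log-likelihood, dropping the constant log C(n, heads)."""
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

# Candidate values strictly inside (0, 1), in steps of 0.001.
grid = [i / 1000 for i in range(1, 1000)]

p_hat_grid = max(grid, key=log_likelihood)  # numerical maximiser
p_hat_closed = 55 / 100                     # closed-form MLE

print(p_hat_grid, p_hat_closed)
```

The binomial coefficient is omitted because it does not depend on p and therefore does not move the maximum.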
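Q2 : Let $X_1, \dots, X_n$ be independent and identically distributed (i.i.d.) Poisson($\lambda$) random variables. Find the maximum likelihood estimator for $\lambda$, $\hat{\lambda}$. Calculate an estimate using the estimator for the given data. Ans : PMF for Poisson
$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$
Likelihood Function
$$L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^{n} x_i}\, e^{-n\lambda}}{\prod_{i=1}^{n} x_i!}$$
And for the log-likelihood:
$$\ell(\lambda) = \Big(\sum_{i=1}^{n} x_i\Big) \log \lambda - n\lambda - \sum_{i=1}^{n} \log(x_i!)$$
Differentiate and solve for $\lambda$
$$\frac{d\ell}{d\lambda} = \frac{\sum_{i=1}^{n} x_i}{\lambda} - n = 0 \implies \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}$$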
For the given data :
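Since the Poisson maximum likelihood estimator is the sample mean, the estimate for any data set is a single average. A minimal sketch follows; the list `sample` is a hypothetical placeholder and is not the data from the exercise.

```python
import math

def poisson_log_likelihood(lam, data):
    """Poisson log-likelihood; lgamma(x + 1) equals log(x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)

def poisson_mle(data):
    """Maximum likelihood estimate of lambda: the sample mean."""
    return sum(data) / len(data)

sample = [2, 4, 3, 1, 5]  # hypothetical observations, NOT from the exercise
lam_hat = poisson_mle(sample)

# The log-likelihood should be lower on either side of the estimate.
ll_at_hat = poisson_log_likelihood(lam_hat, sample)
ll_below = poisson_log_likelihood(lam_hat - 0.1, sample)
ll_above = poisson_log_likelihood(lam_hat + 0.1, sample)

print(lam_hat, ll_at_hat > ll_below, ll_at_hat > ll_above)
```

Substituting the actual observations for `sample` gives the requested estimate.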
Logistic Regression
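A company wants to predict whether a customer will purchase a product based on their income and age. The logistic regression model is given by :
$$P(y = 1 \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}$$
where :
- $y = 1$ if the customer buys the product, $y = 0$ otherwise.
- $x_1$ = Age of the customer.
- $x_2$ = Monthly income of the customer (in £1000s).
- $\beta_0$, $\beta_1$, $\beta_2$ are the model coefficients, with $\beta_1 = 0.05$.
Question 1 : Compute the probability that a customer aged 30 with a monthly income of £5000 will purchase the product. Ans : With $x_1 = 30$ and $x_2 = 5$ (income is measured in £1000s), the linear predictor is $z = \beta_0 + 30\beta_1 + 5\beta_2$, and the purchase probability is $P(y = 1 \mid x_1, x_2) = 1 / (1 + e^{-z})$.
Question 2 : Interpret the meaning of $\beta_1$ in the context of this problem.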
Ans : A one-year increase in age increases the log-odds of purchasing by 0.05, meaning age has a positive effect on purchase probability.
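The probability calculation and the interpretation of the age coefficient can be sketched in a few lines. Only the age coefficient 0.05 comes from the text; the intercept and income coefficient below are hypothetical placeholders chosen purely so the code runs, and the function name is illustrative.

```python
import math

def purchase_probability(age, income_k, beta0, beta1, beta2):
    """Logistic model: sigmoid of the linear predictor z."""
    z = beta0 + beta1 * age + beta2 * income_k
    return 1.0 / (1.0 + math.exp(-z))

BETA1 = 0.05              # age coefficient, as stated in the answer above
BETA0_PLACEHOLDER = -3.0  # hypothetical intercept, NOT from the exercise
BETA2_PLACEHOLDER = 0.4   # hypothetical income coefficient, NOT from the exercise

# Customer aged 30 with monthly income of £5000, i.e. 5 in units of £1000.
p = purchase_probability(30, 5, BETA0_PLACEHOLDER, BETA1, BETA2_PLACEHOLDER)

# One extra year of age multiplies the odds of purchase by exp(beta1).
odds_ratio_per_year = math.exp(BETA1)

print(round(p, 4), round(odds_ratio_per_year, 4))
```

The odds ratio of about 1.05 restates the log-odds interpretation: each additional year of age raises the odds of purchase by roughly 5%, holding income fixed.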
Question 3 : If a customer’s probability of purchasing is 0.7, should the company predict that they will buy the product? Justify your answer.
Ans : Since 0.7 > 0.5, the conventional decision threshold, the model predicts that the customer will buy the product.
Question 4 : Briefly explain one advantage and one limitation of using logistic regression for this problem.
Ans : Advantage : Interpretability - each coefficient gives the change in the log-odds of purchase per unit change in its feature.
Limitation : Assumes a linear relationship between the features and the log-odds, which may not always hold.
Naive Bayes - Disease Diagnosis
A medical test is used to diagnose a rare disease based on two symptoms: Fever and Cough. The model uses a Naive Bayes classifier. The probabilities of symptoms appearing given the disease status are:
Additionally, we know that only 5% of the population has the disease (i.e., $P(\text{Disease}) = 0.05$).
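Question 1 : Compute the probability that a patient with both Fever and Cough has the disease, and classify the patient as having or not having the disease. Ans : Under the naive Bayes assumption, the unnormalized score for disease is
$$P(\text{Fever} \mid \text{Disease})\, P(\text{Cough} \mid \text{Disease})\, P(\text{Disease})$$
For no disease :
$$P(\text{Fever} \mid \text{No Disease})\, P(\text{Cough} \mid \text{No Disease})\, P(\text{No Disease}), \quad P(\text{No Disease}) = 1 - 0.05 = 0.95$$
Posterior probability :
$$P(\text{Disease} \mid \text{Fever}, \text{Cough}) = \frac{P(\text{Fever} \mid \text{Disease})\, P(\text{Cough} \mid \text{Disease})\, P(\text{Disease})}{P(\text{Fever} \mid \text{Disease})\, P(\text{Cough} \mid \text{Disease})\, P(\text{Disease}) + P(\text{Fever} \mid \text{No Disease})\, P(\text{Cough} \mid \text{No Disease})\, P(\text{No Disease})}$$
Since $P(\text{Disease} \mid \text{Fever}, \text{Cough}) < 0.5$, the model does not classify the patient as having the disease.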
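The posterior computation can be sketched as follows. The prior of 0.05 is taken from the text; the four conditional symptom probabilities are hypothetical placeholders (the exercise's own table may differ), and the function name is illustrative.

```python
import math

def naive_bayes_posterior(prior, lik_given_disease, lik_given_no_disease):
    """Posterior P(disease | symptoms) under conditional independence."""
    score_disease = prior * math.prod(lik_given_disease)
    score_no_disease = (1.0 - prior) * math.prod(lik_given_no_disease)
    # Normalise so the two class posteriors sum to 1.
    return score_disease / (score_disease + score_no_disease)

PRIOR = 0.05  # P(Disease), from the text

# Hypothetical conditional probabilities, NOT the exercise's values.
p_fever_d, p_cough_d = 0.8, 0.7    # P(Fever | Disease), P(Cough | Disease)
p_fever_nd, p_cough_nd = 0.3, 0.2  # P(Fever | No Disease), P(Cough | No Disease)

posterior = naive_bayes_posterior(
    PRIOR, [p_fever_d, p_cough_d], [p_fever_nd, p_cough_nd]
)
has_disease = posterior > 0.5

print(round(posterior, 4), has_disease)
```

With a prior this small, both symptoms must be far more likely under disease than under no disease before the posterior crosses 0.5.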
Question 2 : What is the ‘naive Bayes’ assumption and why is it important in Naive Bayes classification? Ans : The assumption is that the features (here, the symptoms) are conditionally independent given the class (the disease status), so the joint likelihood factorizes into a product of per-symptom probabilities. This simplifies computation and makes the model fast and scalable, though symptoms in reality may be correlated.