
Machine Learning Interview Questions and Answers

Question - 91 : - What ensemble technique is used by gradient boosting trees?

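Answer - 91 : - Gradient boosting machines (GBM) use boosting: trees are added sequentially, and each new tree is fit to correct the errors of the ensemble built so far.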

Question - 92 : - If we have a high bias error what does it mean? How to treat it?

Answer - 92 : -

High bias error means that the model we are using is ignoring the important trends in the data, so the model is underfitting.

To reduce underfitting:

  • Increase the complexity of the model
  • Increase the number of informative features

Underfitting can also give the impression that the data is noisy. Removing noise from the data can help the model find the most important signals and make more effective predictions.

Question - 93 : - Which type of sampling is better for a classification model and why?

Answer - 93 : - Stratified sampling is usually better for classification problems because it preserves the balance of classes in the train and test sets. Since the class proportions are the same in every split, the model sees every class during training and the evaluation is more reliable. With simple random sampling, the data is divided without considering the class balance, so some classes (especially rare ones) might end up only in the train set or only in the validation set. The resulting model and its evaluation are then likely to be poor.
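
The idea can be sketched in a few lines of plain Python. This is only an illustration under assumed inputs: the function name `stratified_split` and the 90/10 label list are hypothetical. In scikit-learn, the usual way to get the same behaviour is `train_test_split` with its `stratify` argument.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) that keep class proportions."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class, at least one sample each.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical imbalanced labels: 90% class "a", 10% class "b".
labels = ["a"] * 90 + ["b"] * 10
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
```

Both splits keep the original 9:1 ratio between the two classes, which a purely random split of a small dataset does not guarantee.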

Question - 94 : - When can a categorical variable be treated as a continuous variable, and what effect does it have when done so?

Answer - 94 : - A categorical predictor can be treated as a continuous one when the data it represents is ordinal, that is, when its categories have a natural order. In that case the predictor can be encoded as ordered numbers, and including it in the model this way can improve the model's performance because the ordering information is preserved.
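
A minimal sketch of such an encoding, assuming a hypothetical "size" feature with three ordered levels:

```python
# Hypothetical ordinal feature: the categories have a natural order.
size_order = {"small": 0, "medium": 1, "large": 2}

sizes = ["medium", "small", "large", "medium"]

# Replace each category with its rank so a model can treat it as numeric.
encoded = [size_order[size] for size in sizes]
```

Note that integer codes also assume the levels are equally spaced, which may or may not hold for a given feature.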

Question - 95 : - What is the role of maximum likelihood in logistic regression?

Answer - 95 : - Maximum likelihood estimation is used to fit the coefficients of the predictor variables. It selects the coefficient values under which the observed class labels are most probable, that is, the values that maximize the likelihood (in practice, the log-likelihood) of the training data.
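
To make this concrete, here is a small sketch that fits a one-feature logistic regression by gradient ascent on the log-likelihood. The data, the learning rate and the function names are assumptions chosen for illustration; real libraries use more robust optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, b, xs, ys):
    """Log-probability of the observed labels under coefficients (w, b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(xs, ys, learning_rate=0.1, steps=500):
    """Climb the log-likelihood with plain gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood with respect to w and b.
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
        grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
        w += learning_rate * grad_w
        b += learning_rate * grad_b
    return w, b


# Hypothetical data with overlapping classes, so the optimum is finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = fit(xs, ys)
initial = log_likelihood(0.0, 0.0, xs, ys)
fitted = log_likelihood(w, b, xs, ys)
```

The fitted coefficients give a higher log-likelihood than the starting point, which is exactly what "most likely values" means here.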

Question - 96 : - Which distance do we measure in the case of KNN?

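Answer - 96 : - KNN most commonly uses Euclidean distance to determine the nearest neighbours. Manhattan and Minkowski distances are common alternatives for numeric features, and Hamming distance is used when the features are categorical or binary. K-means uses Euclidean distance.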
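
The sketch below shows the two distances and a bare-bones KNN vote. The helper names and the four training points are hypothetical, and ties are not handled carefully.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Return the majority label among the k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training data.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["red", "red", "blue", "blue"]
prediction = knn_predict(points, labels, (1.2, 1.4), k=3)
```

Passing `distance=hamming` would switch the same function to categorical or binary feature vectors.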

Question - 97 : - What is a pipeline?

Answer - 97 : - A pipeline chains the steps of building a model (for example preprocessing, feature extraction and the final estimator) into a single object, so that the steps run in a fixed sequence on the same data and the output of each step feeds the next. In scikit-learn, pipelines and other composite estimators let the whole chain be fitted and evaluated as one unit, and some composite estimators can run their independent parts in parallel.
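
The pattern can be sketched without any library. The classes `Center`, `ScaleByMaxAbs` and `SimplePipeline` below are hypothetical and only mimic the fit/transform convention; they are not scikit-learn's implementation.

```python
class Center:
    """Subtract the mean learned during fit."""

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self

    def transform(self, values):
        return [v - self.mean_ for v in values]


class ScaleByMaxAbs:
    """Divide by the largest absolute value learned during fit."""

    def fit(self, values):
        self.scale_ = max(abs(v) for v in values) or 1.0
        return self

    def transform(self, values):
        return [v / self.scale_ for v in values]


class SimplePipeline:
    """Run a list of steps in order, feeding each output to the next step."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        for step in self.steps:
            values = step.transform(values)
        return values


# Fit on training data, then apply the same learned steps to new data.
pipeline = SimplePipeline([Center(), ScaleByMaxAbs()]).fit([1.0, 2.0, 3.0])
scaled = pipeline.transform([1.0, 2.0, 3.0, 4.0])
```

Because the statistics are learned only in `fit`, new data is transformed with the training mean and scale, which helps avoid leaking information from validation data into preprocessing.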

Question - 98 : - Which sampling technique is most suitable when working with time-series data?

Answer - 98 : - We can use forward-chaining (expanding-window) sampling instead of random sampling, so that the model is always validated on data that comes after the data it was trained on. The train set grows iteratively: the sample used for validation in one round is added to the train set for the next round, and a new, later sample is used for validation.
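
A minimal sketch of this scheme, assuming the samples are already sorted by time. The function name and its parameters are hypothetical; scikit-learn's `TimeSeriesSplit` provides a comparable splitter.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) where train always precedes test."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")

    for fold in range(n_splits):
        test_start = first_test_start + fold * test_size
        # Everything before the validation block is used for training,
        # so earlier validation blocks are absorbed into later train sets.
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
```

With 10 samples, the first fold trains on indices 0 to 3 and validates on 4 and 5; the next fold trains on 0 to 5 and validates on 6 and 7, and so on.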

Question - 99 : - What are the benefits of pruning?

Answer - 99 : -

Pruning helps in the following:

  • Reduces overfitting
  • Shortens the size of the tree
  • Reduces complexity of the model
  • Slightly increases bias in exchange for lower variance

Question - 100 : - What is normal distribution?

Answer - 100 : -

A normal (Gaussian) distribution is a continuous, bell-shaped distribution defined by its mean μ and standard deviation σ. It has the following properties:

  • The mean, mode and median are all equal.
  • The curve is symmetric about the center (i.e. around the mean, μ).
  • Half of the probability lies to the left of the center and half lies to the right.
  • The total area under the curve is 1.
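
These properties can be checked numerically with Python's standard-library `statistics.NormalDist`. The standard normal (μ = 0, σ = 1) is used here only as an example.

```python
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)

# Probability to the left of the mean: half of the total area.
below_mean = dist.cdf(dist.mean)

# Symmetry: the two tails at equal distance from the mean have equal area.
left_tail = dist.cdf(-1.5)
right_tail = 1.0 - dist.cdf(1.5)

# Probability within one standard deviation of the mean.
within_one_sigma = dist.cdf(1.0) - dist.cdf(-1.0)
```

The last value is roughly 0.68, the first part of the familiar 68-95-99.7 rule.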

