Ans: When using a dataset for data science or machine learning, not every variable is required or helpful for building a model. To improve the efficiency of our model and avoid redundant features, smarter feature selection approaches are necessary. The three major approaches to feature selection are filter methods, wrapper methods, and embedded methods.
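The simplest of these is a filter method, which scores each feature independently of any model. The sketch below, assuming a made-up toy dataset and an arbitrary threshold, drops features whose variance is too low to carry useful signal:

```python
# A minimal sketch of a filter-style feature selector: drop features whose
# variance falls below a threshold (nearly constant features carry little
# signal). The dataset and threshold are invented for illustration.

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_by_variance(rows, threshold=0.1):
    """Return indices of feature columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    return [i for i, col in enumerate(columns) if variance(col) > threshold]

# Toy data: feature 0 is nearly constant, features 1 and 2 vary.
data = [
    [1.00, 3.2, 10.0],
    [1.00, 2.8, 20.0],
    [1.00, 3.9, 15.0],
    [1.01, 2.1, 25.0],
]
print(select_by_variance(data))  # [1, 2] — feature 0 is filtered out
```

Wrapper methods instead evaluate feature subsets by training a model on each, and embedded methods learn feature importance during training itself.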
Ans: The discrepancy between the anticipated and actual value is referred to as an error. Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) are the most commonly used methods for measuring error in data science. At the same time, a residual is the difference between a set of observed values and their arithmetic mean. An error is usually unobservable, whereas a residual can be seen on a graph. Error reflects the difference between observed data and the actual population. On the other hand, a residual indicates how the observed data differs from the sample population data.
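The three metrics above can be sketched directly from their definitions; the predicted and actual values below are illustrative only:

```python
import math

# MAE: mean of absolute differences; MSE: mean of squared differences;
# RMSE: square root of MSE, restoring the original units of the target.

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
print(mae(actual, predicted))   # 0.75
print(mse(actual, predicted))   # 0.875
print(rmse(actual, predicted))  # ~0.935
```

Note that MSE and RMSE penalize large individual errors more heavily than MAE, which is why the choice between them matters when outliers are present.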
Ans: A p-value can help you evaluate the strength of your results when performing a hypothesis test in statistics. The p-value ranges from 0 to 1, and its value denotes the strength of the results. The claim under consideration is known as the Null Hypothesis.
A low p-value (≤ 0.05) indicates strong evidence against the null hypothesis, implying that we can reject it. A high p-value (> 0.05) indicates weak evidence against the null hypothesis, suggesting that we fail to reject it. A p-value right at 0.05 is marginal and could go either way. To put it another way, a high p-value means your data are likely under a true null; a low p-value means your data are unlikely under a true null.
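One concrete way to estimate a p-value is a two-sample permutation test: under the null hypothesis that both groups come from the same distribution, shuffling the labels should produce mean differences as large as the observed one fairly often. The sketch below uses only the standard library; the group data and the 0.05 cutoff are illustrative:

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=42):
    """Estimate the p-value for the difference in group means under the
    null hypothesis that both groups share one distribution."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the pooled samples at random
        perm_a = pooled[:len(group_a)]
        perm_b = pooled[len(group_a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

a = [2.1, 2.5, 2.2, 2.8, 2.4]
b = [3.1, 3.4, 3.0, 3.6, 3.3]
p = permutation_p_value(a, b)
print(p)  # well below 0.05 → reject the null at the 0.05 level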
Ans: In the banking industry, lending is the primary source of income for banks. However, if the repayment rate is poor, there is a possibility of significant losses rather than gains. Granting loans to clients is thus a gamble: banks cannot afford to lose good customers, nor can they afford to acquire bad ones. This is a typical illustration of a case where false positives and false negatives are equally important.
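This trade-off can be made concrete by attaching a cost to each error type and comparing decision thresholds by total expected loss rather than accuracy. The counts and per-error costs below are invented purely for illustration, not real banking figures:

```python
# Hypothetical sketch: FP = loan granted to a customer who defaults (lost
# principal); FN = loan denied to a good customer (lost interest income).

def expected_loss(false_positives, false_negatives,
                  cost_fp=5000.0, cost_fn=1200.0):
    """Total cost of a classifier's mistakes under assumed per-error costs."""
    return false_positives * cost_fp + false_negatives * cost_fn

# Two candidate approval thresholds for the same model:
strict = expected_loss(false_positives=3, false_negatives=40)
lenient = expected_loss(false_positives=12, false_negatives=8)
print(strict, lenient)  # 63000.0 69600.0 — the "strict" threshold costs less
```

Because the two error types carry different costs, the cheaper threshold depends entirely on those costs; with a higher cost for turning away good customers, the lenient threshold could win instead.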
Ans: One of the most common tasks in statistics and machine learning is fitting a model to a collection of training data so that it can make trustworthy predictions on unseen data.
Overfitting occurs when a statistical model describes random error or noise rather than the underlying relationship. Overfitting happens when a model is overly complicated, such as when there are too many parameters compared to the amount of data. Overfitted models have poor prediction performance because they overreact to slight changes in the training data.
When a statistical model or machine learning method fails to capture the underlying trend of the data, this is referred to as underfitting. Fitting a linear model to non-linear data, for example, would result in underfitting. A model like this would also have poor prediction performance.
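The linear-model-on-non-linear-data case can be demonstrated with a closed-form least-squares line fit in pure Python. The data points below are synthetic, chosen so the quadratic shape is obvious:

```python
# A minimal sketch of underfitting: fitting a straight line to clearly
# quadratic data leaves a large training error because the model family
# cannot express the curvature.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ≈ slope*x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def line_mse(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]          # y = x^2, a non-linear relationship

slope, intercept = fit_line(xs, ys)
error = line_mse(xs, ys, slope, intercept)
print(slope, intercept, error)    # 0.0 2.0 2.8 — the flat line misses the curve
```

The best possible line here is flat (slope 0), so even on its own training data the model's error stays large; the symmetric counterpart, overfitting, would show near-zero training error but poor error on held-out points.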