Ch. 3, 5 & 6
Ch. 6 Define Prediction Target
Different subject matter experts may disagree on these values, so we will proceed with the assumption that only status 1 and 2 are indicative of good loans (logged as loan_is_bad: FALSE) and that status 3–7 are indicative of bad loans (logged as loan_is_bad: TRUE)
The value for loan_is_bad is FALSE, which means that it was a good loan. Given that our aim is to predict which loans are good rather than which ones are bad, we could have changed the target to loan_is_good, with the first row containing a TRUE value
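The status-to-target mapping above can be sketched in a few lines of Python. This is only an illustration: the `status` and `id` field names and the sample rows are hypothetical, and the only thing taken from the text is the assumption that status 1 and 2 are good loans and status 3–7 are bad loans.

```python
# Assumption from the text: only status 1 and 2 indicate good loans.
GOOD_STATUSES = {1, 2}

def add_targets(loan):
    """Return a copy of the loan with loan_is_bad and loan_is_good added."""
    status = loan["status"]
    if not 1 <= status <= 7:
        raise ValueError(f"unexpected status: {status}")
    labeled = dict(loan)
    labeled["loan_is_bad"] = status not in GOOD_STATUSES
    # loan_is_good is simply the flipped target.
    labeled["loan_is_good"] = not labeled["loan_is_bad"]
    return labeled

# Hypothetical sample rows, for illustration only.
loans = [{"id": "a", "status": 1}, {"id": "b", "status": 5}]
labeled_loans = [add_targets(loan) for loan in loans]
```

Either column can serve as the target; they carry the same information, so the choice only affects how the predictions are read.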
The machine learning algorithms will soon determine that thanks to the chivalrous norms of society in the time of the Titanic, the most important relationship to survival comes from the feature named sex
Learning to see the potential prediction targets of the world is a critically important skill to develop, as a well-founded understanding of prediction targets can inspire new and ingenious uses of machine learning
The first kind is classification, which predicts the category to which a new case belongs
If simplified into quantitative categories (sometimes called buckets), regression problems can be made into classification problems, as was true for the diabetes patient readmissions example
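As a rough sketch of how a numeric target can be simplified into buckets, the snippet below maps a number to a labeled category. The bucket edges, the labels, and the sample values are hypothetical choices made for illustration; they do not come from the readmissions example.

```python
from bisect import bisect_right

def to_bucket(value, edges, labels):
    """Map a numeric value to a bucket label.

    edges must be sorted ascending, and there must be one more label
    than there are edges. A value equal to an edge falls in the upper bucket.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly len(edges) + 1 labels")
    return labels[bisect_right(edges, value)]

# Hypothetical edges and labels: below 30, 30 up to (not including) 90, 90 and above.
EDGES = [30, 90]
LABELS = ["low", "medium", "high"]

values = [5, 30, 45, 200]  # hypothetical numeric target values
buckets = [to_bucket(v, EDGES, LABELS) for v in values]
```

With two labels and a single edge, the same function produces a binary yes/no style target.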
Alternatively, we can use machine learning to conduct regression, that is, to predict the target numeric values
It is possible to simplify the choice to either red or green but doing so would overlook crucial information that would inform more steady and smooth braking behavior
Here we opted to categorize the outcome of readmission with binary yes/no values. The alternative might have been to consider the number of days between release and readmission as a regression target in order to predict what leads to immediate versus more delayed readmission (if readmission occurs at all)
Ch. 5 Decide on Unit of Analysis
For a page-level analysis, you analyze all available web pages and examine the correlations between people visiting a given page
A unit (of analysis) is the what, who, where, and when of our analysis
Machine learning sometimes involves an intricate understanding of a problem’s real-world context
For each unit of analysis, there can be numerous outcomes that we might want to predict.
If the data scientist is trying to understand what drives admission satisfaction, the admission is the unit of analysis
If you lean on your subject matter expert and work with him or her to share knowledge of the problem context, you will likely be able to figure it out together
It is a good idea to factor in that each application is likely to be different, even when they come from the same person
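To make the choice of unit concrete, the sketch below builds two datasets from the same raw records: one with the application as the unit of analysis and one with the person as the unit. All field names and values are hypothetical and exist only to show the difference in row structure.

```python
from collections import defaultdict

# Hypothetical raw records: one entry per application.
applications = [
    {"application_id": 1, "person_id": "p1", "amount": 1000},
    {"application_id": 2, "person_id": "p1", "amount": 5000},
    {"application_id": 3, "person_id": "p2", "amount": 2000},
]

# Unit of analysis = application: one row per application, so two
# applications from the same person stay separate rows.
application_rows = list(applications)

# Unit of analysis = person: collapse each person's applications into one row.
amounts_by_person = defaultdict(list)
for app in applications:
    amounts_by_person[app["person_id"]].append(app["amount"])

person_rows = [
    {
        "person_id": person_id,
        "n_applications": len(amounts),
        "mean_amount": sum(amounts) / len(amounts),
    }
    for person_id, amounts in amounts_by_person.items()
]
```

The person-level rows average away the differences between applications, which is the information the application-level unit preserves.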
Ch. 3 Specify Business Problem
By carefully specifying the problem, we can evaluate our options in a precise manner: should we proceed to address this problem, or is there a better problem in which to invest our time and resources?
Once we ensure that we have the requisite data, the project must then be described in a brief to be shared with stakeholders
Immediately, you may have noticed that the statement focuses on the lack of models, rather than on the business goals of the hedge fund
It is stated in the language of business (investment amount, interest rates, loss percentage), it specifies actions that should result (screen out riskiest borrowers, invest on the site, use model to automatically choose which loans to fund)
Even here, we can see that there are details missing, but most data science projects are processes of discovery