Provost - Chapter 14
Humans in the Loop
Humans are better at identifying small sets of relevant aspects of the world from which to gather data in support of particular tasks.
Computers are better at sifting through a massive collection of data, including a huge number of possibly relevant variables, & quantifying each variable's relevance to predicting a target.
Data science involves the judicious integration of human knowledge & computer-based techniques to achieve what neither could have done alone.
Human involvement also brings creativity, knowledge & common sense, which add value in selecting the right data to mine.
Leak: The data that was needed (the location of fraudulent calls) had been removed. The data was scrubbed to improve the quality of the target variable.
What counts as data depends on our interpretation, which can change during the data mining process.
Fundamental Concepts:
- General concepts about how data science fits in an organization & the competitive landscape
Includes ways to attract, structure, & nurture data science teams
- General ways of thinking data-analytically
Concepts include the data mining process & the collection of different high-level data science tasks.
Data should be considered an asset (think carefully about what investments we should make to get the best leverage from the asset).
The expected value framework can help us structure business problems so we can see the component data mining problems & the connective tissue of costs, benefits & constraints of the business environment.
Generalization & overfitting: by looking at data you can find patterns, but you want those patterns to generalize to data you have not seen yet.
Applying data science to a well-structured problem versus doing exploratory data mining requires different levels of effort in different stages of the process.
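A minimal sketch of the expected value framework mentioned above, applied to a targeting decision. The function name and all numbers are hypothetical and chosen only for illustration; the point is that the model supplies the probability while the business supplies the costs & benefits.

```python
def expected_value(p_response, value_response, value_no_response):
    """Expected value of acting on one case: p * v_R + (1 - p) * v_NR."""
    return p_response * value_response + (1.0 - p_response) * value_no_response


# Hypothetical business inputs (assumptions, not taken from the notes).
profit_if_response = 99.0    # assumed profit when the customer responds
cost_if_no_response = -1.0   # assumed cost of an offer that is ignored

# The probability would come from a mined model; here it is just assumed.
ev = expected_value(0.05, profit_if_response, cost_if_no_response)

# Target the customer only when the expected value is positive.
should_target = ev > 0
```

With these assumed values the break-even point is a response probability of 0.01, so the data mining problem reduces to estimating that probability well.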
- General concepts for actually extracting knowledge from data, which undergird the vast array of data science techniques.
Identifying informative attributes (those that correlate with or give us information about an unknown quantity of interest).
Fitting a numeric function model to data by choosing an objective & finding a set of parameters based on that objective.
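The two ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not a method taken from the notes: information gain (entropy reduction) is used as one common measure of how informative an attribute is, & squared error is assumed as the objective for fitting a straight line. The function names & toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def fit_line(xs, ys):
    """Choose slope & intercept that minimize squared error (closed-form least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (made up): one attribute separates the target perfectly, one not at all.
target = ["yes", "yes", "no", "no"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

gain_informative = information_gain(informative, target)
gain_uninformative = information_gain(uninformative, target)

# Toy numeric data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Ranking attributes by a score like this is one way to pick what to feed a model; changing the objective in the fitting step (for example, to absolute error) would generally give different parameters.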
Privacy, Ethics & Mining Data about Individuals
The tension between privacy & improving business decisions is intense because there seems to be a direct relationship between increased use of personal information & increased effectiveness of the associated business decisions.
The more fine-grained the data you collect on people, the better you can predict things about them that are important for business decision-making.