Chapter 14: Conclusion
Fundamental Concepts of Data Science
General concepts about how data science fits in the organization and the competitive landscape, including ways to attract, structure, and nurture data science teams, ways of thinking about how data science leads to competitive advantage, ways that competitive advantage can be sustained, and tactical principles for doing well with data science projects.
General ways of thinking data-analytically, which help us to gather appropriate data and consider appropriate methods. The concepts include the data mining process, the collection of different high-level data science tasks, as well as principles such as the following:
Data should be considered an asset, and therefore we should think carefully about what investments we should make to get the best leverage from our asset
The expected value framework can help us to structure business problems so we can see the component data mining problems as well as the connective tissue of costs, benefits, and constraints imposed by the business environment
The data science team should keep in mind the problem to be solved and the use scenario throughout the data mining process
Generalization and overfitting: if we look too hard at the data, we will find patterns; we want patterns that generalize to data we have not yet seen
Applying data science to a well-structured problem and performing exploratory data mining require different levels of effort in different stages of the data mining process
General concepts for actually extracting knowledge from data, which undergird the vast array of data science techniques. These include concepts such as the following:
Fitting a numeric function model to data by choosing an objective and finding a set of parameters based on that objective
Controlling complexity is necessary to find a good trade-off between generalization and overfitting
Identifying informative attributes: those that correlate with or give us information about an unknown quantity of interest
Calculating similarity between objects described by data
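The expected value framework mentioned among the principles above can be made concrete with a small calculation. This is a minimal sketch assuming a simple targeting decision in which each case either responds or does not; the function names `expected_value` and `break_even_probability` and the dollar figures are hypothetical illustrations, not taken from this section.

```python
def expected_value(p_response: float, value_response: float, value_no_response: float) -> float:
    """Expected value of acting on one case: EV = p * v_R + (1 - p) * v_NR."""
    return p_response * value_response + (1.0 - p_response) * value_no_response


def break_even_probability(value_response: float, value_no_response: float) -> float:
    """Response probability at which EV == 0, i.e. p = -v_NR / (v_R - v_NR)."""
    return -value_no_response / (value_response - value_no_response)


# Hypothetical benefit/cost figures: a response yields $99 profit,
# a non-response loses the $1 spent on contacting the customer.
V_R, V_NR = 99.0, -1.0

for p in (0.005, 0.02):
    ev = expected_value(p, V_R, V_NR)
    decision = "target" if ev > 0 else "skip"
    print(f"p={p:.3f}  EV={ev:+.2f}  -> {decision}")

print("break-even p:", break_even_probability(V_R, V_NR))
```

The point of the sketch is the separation the framework encourages: the probability `p` is what a data mining model would estimate, while `V_R` and `V_NR` come from the business side.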
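The generalization-versus-overfitting principle above can be illustrated by comparing a model that memorizes its training data with a much simpler one, both evaluated on holdout data they never saw. This is a toy sketch on hypothetical, hand-made data; the "table model" and the single-threshold classifier are illustrative choices, and a holdout set this small is only for demonstration.

```python
from collections import Counter

# Hypothetical toy data: one numeric attribute x and a binary label y.
# Around x = 5 and x = 6 the classes overlap, as real data often do.
train_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
train_y = [0, 0, 0, 0, 1, 0, 1, 1, 1]
holdout_x = [1.5, 2.5, 7.5, 8.5]  # cases the models never saw
holdout_y = [0, 0, 1, 1]


def accuracy(predict, xs, ys):
    """Fraction of cases where predict(x) equals the true label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)


# Model 1: a "table model" that memorizes every training case and falls
# back to the majority training label for anything it has not seen.
table = dict(zip(train_x, train_y))
majority = Counter(train_y).most_common(1)[0][0]


def table_model(x):
    return table.get(x, majority)


# Model 2: a single threshold on x, chosen to maximize training accuracy.
# max() keeps the first candidate among ties.
candidates = [(a + b) / 2 for a, b in zip(train_x, train_x[1:])]
best_t = max(candidates,
             key=lambda t: accuracy(lambda x: int(x > t), train_x, train_y))


def threshold_model(x):
    return int(x > best_t)


for name, model in (("table", table_model), ("threshold", threshold_model)):
    print(f"{name:9s}  train={accuracy(model, train_x, train_y):.2f}  "
          f"holdout={accuracy(model, holdout_x, holdout_y):.2f}")
# table      train=1.00  holdout=0.50
# threshold  train=0.89  holdout=1.00
```

The memorizing model looks perfect on the data it was built from but falls back to guessing on new cases, while the simpler model gives up a little training accuracy and generalizes better.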
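Fitting a numeric function by choosing an objective, and controlling complexity within that objective, can both be seen in a one-variable linear fit. This sketch assumes a squared-error objective with an optional L2 (ridge-style) penalty on the slope; that penalty is one common way to trade fit against simplicity, chosen here for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys, lam=0.0):
    """Fit y ~ w*x + b by minimizing sum((y - w*x - b)**2) + lam * w**2.

    lam = 0 is ordinary least squares. lam > 0 adds an L2 penalty that
    shrinks the slope w toward zero; the intercept b is not penalized.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Setting both partial derivatives of the objective to zero gives:
    #   b = my - w*mx  and  w = Sxy / (Sxx + lam)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)
    b = my - w * mx
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # hypothetical data lying exactly on y = 2x + 1

print(fit_line(xs, ys))           # (2.0, 1.0): the unpenalized fit recovers the line
print(fit_line(xs, ys, lam=5.0))  # (1.0, 2.5): the penalty trades fit for a smaller slope
```

The parameters come entirely from the chosen objective: change the objective (here, by raising `lam`) and the "best" parameters change with it.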
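Identifying informative attributes is often made precise with entropy and information gain: an attribute is informative if splitting on it makes the target variable less mixed. This is a minimal sketch on hypothetical churn labels; the section does not name a specific measure, so information gain is an assumption here, and the helper names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical churn labels, split two ways by two candidate attributes.
parent = ["churn", "churn", "stay", "stay"]
split_a = [["churn", "churn"], ["stay", "stay"]]  # separates the classes perfectly
split_b = [["churn", "stay"], ["churn", "stay"]]  # tells us nothing

print(information_gain(parent, split_a))  # 1.0
print(information_gain(parent, split_b))  # 0.0
```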
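Calculating similarity between objects described by data usually means turning each object into a vector of attribute values and measuring distance between vectors. This sketch assumes Euclidean distance (one common choice among several) and hypothetical customers with two numeric attributes already on comparable scales; without such scaling, the attribute with the largest range would dominate the distance.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_neighbors(query, examples, k=1):
    """Names of the k examples closest to `query` (smaller distance = more similar)."""
    return sorted(examples, key=lambda name: euclidean(query, examples[name]))[:k]


# Hypothetical customers described by two already-scaled numeric attributes.
customers = {
    "A": (1.0, 2.0),
    "B": (4.0, 6.0),
    "C": (1.5, 1.0),
}

print(euclidean((0.0, 0.0), (3.0, 4.0)))               # 5.0
print(nearest_neighbors((1.0, 1.0), customers, k=2))  # ['C', 'A']
```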
Applying Fundamental Concepts to a New Problem: Mining Mobile Device Data
Recent shift to mobile devices
Many companies are still trying to figure out how to reach consumers on their desktops, and now many are scrambling to reach consumers on their mobile devices
Important: mobile devices provide information about a consumer's location
Potential issue: how to reach consumers on their various devices
Privacy, Ethics, and Mining Data about Individuals
Mining data, especially about individuals, raises important ethical questions
Generally, the more fine-grained data you collect on individuals, the better you can predict things about them that are important for business decision-making. This seemingly direct relationship between reduced privacy and increased business performance elicits strong feelings from both the privacy and the business perspectives (sometimes within the same person).