Chapter 14: Conclusion

Fundamental Concepts of Data Science

General ways of thinking data-analytically, which help us to gather appropriate data and consider appropriate methods. These concepts include the data mining process, the collection of high-level data science tasks, and principles such as those listed below, from treating data as an asset through the distinction between well-structured and exploratory problems.

General concepts for actually extracting knowledge from data, which undergird the vast array of data science techniques. These include the concepts listed below, from fitting a numeric function model through calculating similarity.

General concepts about how data science fits in the organization and the competitive landscape, including ways to attract, structure, and nurture data science teams, ways of thinking about how data science leads to competitive advantage, ways that competitive advantage can be sustained, and tactical principles for doing well with data science projects.

Data should be considered an asset, and therefore we should think carefully about what investments we should make to get the best leverage from our asset

The expected value framework can help us to structure business problems so we can see the component data mining problems as well as the connective tissue of costs, benefits, and constraints imposed by the business environment
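
As a rough illustration of how the expected value framework separates the data mining piece (a probability estimate) from the business pieces (costs and benefits), the sketch below scores a hypothetical targeted-offer decision. The function name `expected_value` and every number in it are assumptions made up for this example, not figures from the chapter.

```python
def expected_value(outcomes):
    """Return the sum of probability * value over mutually exclusive outcomes.

    `outcomes` is a list of (probability, value) pairs whose probabilities
    are expected to sum to 1.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical inputs (illustrative only):
p_respond = 0.05         # would come from a predictive model (the data mining problem)
value_response = 100.0   # profit if the customer responds (business input)
cost_offer = 1.0         # cost of making the offer (business input)

# Expected profit of targeting one customer.
ev_target = expected_value([
    (p_respond, value_response - cost_offer),   # customer responds
    (1 - p_respond, -cost_offer),               # customer does not respond
])

# Target only when the expected profit is positive.
should_target = ev_target > 0
```

The decomposition makes it visible which quantity a model has to estimate (`p_respond`) and which quantities have to come from the business environment.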

The data science team should keep in mind the problem to be solved and the use scenario throughout the data mining process

Generalization and overfitting: if we look too hard at the data, we will find patterns; we want patterns that generalize to data we have not yet seen

Applying data science to a well-structured problem and carrying out exploratory data mining require different levels of effort at different stages of the data mining process

Fitting a numeric function model to data by choosing an objective and finding a set of parameters based on that objective
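
One minimal sketch of this idea, assuming a straight-line model and squared error as the objective (both are illustrative choices; many other model forms and objectives are possible). The helper names `fit_line` and `sum_squared_error` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of squared errors.

    Uses the closed-form least-squares solution for a single input variable.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """The objective: total squared difference between predictions and targets."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


# Toy data that lies exactly on y = 2x + 1 (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
best_error = sum_squared_error(xs, ys, slope, intercept)
worse_error = sum_squared_error(xs, ys, slope + 0.5, intercept)
```

Any other parameter setting, such as the perturbed slope used for `worse_error`, gives a larger value of the objective on this data, which is what "finding parameters based on that objective" means here.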

Controlling complexity is necessary to find a good trade-off between generalization and overfitting

Identifying informative attributes — those that correlate with or give us information about an unknown quantity of interest
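
One common way to quantify how much information an attribute gives about a target is information gain, the reduction in entropy obtained by splitting on that attribute. The sketch below is a minimal version for categorical attributes; the function names and the toy churn data are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    the examples by the value of a categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (made up): the target is whether a customer churned.
labels = ["churn", "churn", "stay", "stay"]
plan = ["basic", "basic", "premium", "premium"]   # separates the classes perfectly
region = ["north", "south", "north", "south"]     # tells us nothing about the target

gain_plan = information_gain(plan, labels)
gain_region = information_gain(region, labels)
```

On this toy data the `plan` attribute has a higher gain than `region`, so it would be ranked as the more informative attribute.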

Calculating similarity between objects described by data
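
As a small sketch of similarity between objects described by data, the code below treats each object as a numeric feature vector and uses Euclidean distance, where a smaller distance means greater similarity. Euclidean distance is only one possible choice, and the customer names and feature values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, candidates):
    """Return the key of the candidate whose feature vector is closest to `query`."""
    return min(candidates, key=lambda name: euclidean_distance(query, candidates[name]))


# Hypothetical customers described by (age, income in thousands).
customers = {
    "A": (25, 50),
    "B": (40, 80),
    "C": (26, 52),
}

nearest = most_similar((27, 51), customers)
```

In practice the features usually need to be put on comparable scales first, since an attribute with a large numeric range would otherwise dominate the distance.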

Applying Fundamental Concepts to a New Problem: Mining Mobile Device Data

Recent shift to mobile devices

Many companies are still trying to figure out how to reach consumers on their desktops, and many are now scrambling to reach consumers on their mobile devices

Important: mobile devices provide information about a consumer's location

Potential issue: how to reach consumers on their various devices

Privacy, Ethics, and Mining Data about Individuals

Mining data, especially about individuals, raises important ethical questions

Generally, the more fine-grained data you collect on individuals, the better you can predict things about them that are important for business decision-making. This seeming direct relationship between reduced privacy and increased business performance elicits strong feelings from both the privacy and the business perspectives (sometimes within the same person)