Data Sciences
What is methodology?
General strategy that guides processes and activities within a domain
Provides the analyst with a framework for how to proceed with the methods, processes, and heuristics used to obtain answers or results
Doesn’t depend on particular technologies or tools
Not a set of techniques or recipes
In the domain of data science, we are concerned with solving a problem or answering a question through analyzing data
Often, we construct a model of some sort to predict outcomes or to discover underlying patterns: predictive and descriptive models, respectively
Goal of modeling is to gain insights from which we can formulate actions to influence future outcomes or behaviors
For our purpose, we need a data science methodology to guide us in achieving this goal
Data science project lifecycle
Another description of a data science project lifecycle:
“The lifecycle of a data science project: loops within loops”
Zumel, N., & Mount, J. Practical data science with R. Shelter Island, NY: Manning Publications, p. 7.
The right data, the right questions and the right methodology.
At Google major decisions are based on only a tiny sampling of all their data. You don’t always need a ton of data to find important insights. You need the right data. …
Most important, to squeeze insights out of Big Data, you have to ask the right questions. Just as you can’t point a telescope at the night sky and have it discover Pluto for you, you can’t download a whole bunch of data and have it discover the secrets of human nature for you.
Applying methodology to Data Science
We need a data science methodology to provide us with a guiding strategy… regardless of particular technologies & approaches
Data science methodology presented in the diagram on the next slide
Identifies similarities to recognized methodologies for data mining, notably CRISP-DM.
Updated for new considerations in data science
CRISP-DM stands for Cross-Industry Standard Process for Data Mining
In the next few slides, we will look at a number of different representations of the Data Science methodology process
Data Science Methodology
Our data science methodology:
Begins with understanding the business problem (not with data).
Is highly iterative.
Does not “end.” As long as the business problem is relevant, we continue to refine the model (including data requirements and data preparation) based on feedback and then re-implement and refresh the model.
This is a general strategy for problem-solving:
It does not depend on particular technologies or tools.
It provides a “top-down” approach, but is conceptually consistent with a “bottom-up” approach (i.e., we still need to understand data, prepare data, have an analytic approach, build a model, evaluate it, deploy it, and refine it).
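The iterative character of the methodology can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the function names (`prepare_data`, `build_model`, `evaluate`, `run_lifecycle`), the error threshold, and the sample data are all hypothetical and do not come from the methodology itself. The sketch covers only the prepare, build, evaluate, and refine stages, it omits deployment, and it caps the number of rounds even though the real process does not “end.”

```python
from statistics import mean


def prepare_data(raw):
    """Data preparation: drop records that have a missing value."""
    return [(x, y) for x, y in raw if x is not None and y is not None]


def build_model(data, use_slope):
    """Modeling: a constant baseline, or a least-squares line once refined.

    Assumes the x values are not all identical when use_slope is True.
    """
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    if not use_slope:
        return lambda x: y_bar
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    slope = sxy / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def evaluate(model, data):
    """Evaluation: mean absolute error on the prepared data."""
    return mean(abs(model(x) - y) for x, y in data)


def run_lifecycle(raw, max_error, max_rounds=5):
    """Repeat build -> evaluate -> refine until the error is acceptable."""
    data = prepare_data(raw)
    history = []
    use_slope = False  # hypothetical starting point: the simplest approach
    model = None
    for round_no in range(1, max_rounds + 1):
        model = build_model(data, use_slope)
        error = evaluate(model, data)
        history.append((round_no, error))
        if error <= max_error:
            break
        use_slope = True  # feedback: refine the analytic approach
    return model, history


# Made-up sample data; one record has a missing value.
raw = [(1, 2), (2, 4), (None, 5), (3, 6), (4, 8)]
model, history = run_lifecycle(raw, max_error=0.5)
for round_no, error in history:
    print(f"round {round_no}: mean absolute error = {error}")
```

In this toy run the first round's baseline fails the evaluation, so feedback triggers a refinement and a second round. In a real project the same loop would reach back further, to the data requirements or even to the framing of the business problem.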