Course 1: Data Scientist's Toolbox
Command Line Interface
Directory
just another name for a folder
directories in my computer are organized like a tree
directory at the very top is referred to as: root directory. shorthand notation for this: /
home directory is represented by the tilde character: ~
Getting to the Command Line Interface on Mac
Command + Space: brings up Spotlight search
search for: Terminal
when opening the command line interface, it opens in the home directory
Common Commands
pwd
stands for print working directory
typing this command shows the path to the current working directory
ls
lists files and folders in the current working directory
ls -a
lists both hidden and unhidden files in the working directory
hidden files have a period in front of the file name
ls -al
lists details for hidden and unhidden files and folders
cd
stands for change directory
if typed with no argument, it will take me to my home directory
takes as an argument the path of the directory I want to move to
cd ..
changes the directory up one level from the current directory
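The navigation commands above can be tried together in a short session. This is a minimal sketch, assuming a Unix-like shell where `mktemp -d` is available; the variable `demo` and the names `visible.txt`, `.hidden_file` and `subdir` are made up for illustration, and `touch` and `mkdir` (described in these notes as well) are used only to create something to look at.

```shell
# Work inside a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

pwd                              # prints the path of the current working directory

touch visible.txt .hidden_file   # one normal file and one hidden file
mkdir subdir

ls                               # lists subdir and visible.txt only
ls -a                            # also lists .hidden_file (plus . and ..)
ls -al                           # same entries, with permissions, size and date

cd subdir                        # move into subdir
cd ..                            # move back up one level
pwd                              # same directory as the first pwd

# Typing cd with no argument would take me back to my home directory.
```

Note the space in `cd ..`; the two dots are an argument naming the parent directory.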
mkdir
stands for "make directory"
takes as an argument the name of directory you are creating
touch
creates an empty file with the name given as its argument, e.g. touch test_file creates an empty file called test_file
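A quick way to see `mkdir` and `touch` at work is the sketch below. It assumes a Unix-like shell with `mktemp -d`; the directory name `notes` and the variable `demo` are hypothetical, while `test_file` is the name used in these notes.

```shell
# Scratch directory, so the example leaves my real files alone.
demo=$(mktemp -d)
cd "$demo"

mkdir notes             # make a directory called notes
touch notes/test_file   # create an empty file inside it
ls -l notes             # test_file is listed with a size of 0
```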
cp
stands for "copy"
takes as its first argument a file, and as its second argument the path to where you want the file to be copied
can also be used for copying the contents of entire directories but must use the -r flag
rm
stands for "remove"
takes the file you wish to remove as its argument
rm -r
will recursively remove a directory and all files inside it
mv
stands for "move"
use to move files between directories
can also be used to rename files
first argument is the old name, second argument is the new name of the file
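The copy, remove and move commands can be sketched in one short session. This assumes a Unix-like shell with `mktemp -d`; all file and directory names (`src`, `backup`, `src_copy`, `a.txt`, `b.txt`) are invented for the example, and the `>` redirection is used only to give the file some content.

```shell
# Scratch directory, so nothing important can be deleted by mistake.
demo=$(mktemp -d)
cd "$demo"
mkdir src backup
echo "hello" > src/a.txt    # a small file to work with

cp src/a.txt backup/        # copy a file into another directory
cp -r src src_copy          # -r copies a directory and its contents

mv src/a.txt src/b.txt      # rename: old name first, new name second
mv src/b.txt backup/        # move a file between directories

rm backup/a.txt             # remove a single file
rm -r src_copy              # remove a directory and everything inside it

ls backup                   # only b.txt is left
```

There is no undo for `rm`, so it is worth checking the argument with `ls` before running it on real files.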
echo
will print whatever arguments I provide
clear
clears the terminal
Intro to Git
What is Version Control?
is a system that records changes to a file or set of files over time so that you can recall specific versions later
What is Git?
A free and open source version control system designed to handle everything from small to very large projects with speed and efficiency
operated from the command line
created by Linus Torvalds, the original developer of Linux
interact with it using the terminal application
Intro to GitHub
a web-based hosting service for software development projects that use the Git revision control system
allows users to "push" local repositories to remote repositories on the web and "pull" changes from them
Basic Markdown
Markdown
Syntax
##
creates a secondary heading with bold text
###
creates a tertiary heading, with a slightly smaller heading size
*
creates a bulleted list
Types of Data Science Questions
1) Descriptive
only attempts to describe what is happening in the data set
isn't trying to generalize
2) Exploratory
goal is to find relationships I didn't know about
good for finding connections
good for defining future studies
usually not the final say
should not be used alone for generalizing or predicting
3) Inference
uses a small sample of data to generalize to a larger population
common goal of statistical models
estimates the quantity you are interested in and the uncertainty around that estimate
4) Predictive
goal is to use the data on some objects to predict values for other objects
if X predicts Y, it doesn't mean that X causes Y
5) Causal
goal is to find out what happens to one variable when you change another variable
randomized studies are usually required to identify causation
6) Mechanistic
goal is to understand the exact changes in variables that lead to changes in other variables for individual objects
Prediction Key Quantities
Sensitivity
the probability of getting a positive test given you have the disease (a true positive result)
Specificity
the probability of getting a negative test given you don't have the disease (a true negative result)
Positive Predictive Value
the probability you have the disease given you got a positive test
Negative Predictive Value
the probability you don't have the disease given you got a negative test
Accuracy
the probability of getting a correct result
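The five quantities above can be written as ratios of counts from a test-versus-truth table. The symbols are assumptions introduced here, not notation from the course notes: TP and FN are the numbers of diseased people who test positive and negative, and FP and TN are the numbers of healthy people who test positive and negative.

```latex
\begin{align*}
\text{Sensitivity} &= P(\text{positive test} \mid \text{disease}) = \frac{TP}{TP + FN} \\
\text{Specificity} &= P(\text{negative test} \mid \text{no disease}) = \frac{TN}{TN + FP} \\
\text{Positive Predictive Value} &= P(\text{disease} \mid \text{positive test}) = \frac{TP}{TP + FP} \\
\text{Negative Predictive Value} &= P(\text{no disease} \mid \text{negative test}) = \frac{TN}{TN + FN} \\
\text{Accuracy} &= P(\text{correct result}) = \frac{TP + TN}{TP + FP + FN + TN}
\end{align*}
```

Sensitivity and specificity condition on the true disease status, while the two predictive values condition on the test result, which is why the denominators differ.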