Introduction to Statistics
What is statistics?
Statistics is the science that studies data sets and their interpretation in mathematical terms, establishing methods to obtain the measures that describe them and drawing on probability theory. It is also considered a mathematically based science for decision making in the presence of uncertainty.
Brief History of Statistics
Simple forms of statistics have existed since the beginning of civilization: graphic representations and other symbols were already drawn on skins, rocks, wooden sticks and cave walls to count people, animals or other things.
By 3000 B.C. the Babylonians were already using small clay tablets to compile data in tables on agricultural production and goods sold or exchanged by barter.
The Egyptians.
They analyzed the country's population and income data long before they built the pyramids in the 31st century BC. The biblical books of Numbers and Chronicles contain passages of statistical work: the former records two censuses of the population of Israel, and the latter describes the material welfare of the various Jewish tribes.
Roman Empire
It was the first government to collect a large amount of data on the population, area and income of all the territories under its control. During the Middle Ages only a few exhaustive censuses were carried out in Europe. The Carolingian kings Pippin the Short and Charlemagne ordered detailed surveys of church property in 758 and 762 respectively.
Applications of Statistics.
Statistics is a powerful aid to many sciences and human activities: sociology, psychology, human geography, economics, etc. It is an indispensable tool for decision making and is also widely used to show the quantitative aspects of a situation.
Statistics is concerned with the study of processes whose outcome is more or less unpredictable, with the purpose of drawing conclusions that support reasonable decisions based on those observations.
Divisions of Statistics.
Traditionally, Statistics is divided into:
✅ Descriptive statistics.
Those methods that cover the collection, presentation and characterization of a data set in order to describe its various features.
✅ Inductive Statistics or Statistical Inference.
Those methods that make it possible to estimate a characteristic of a population or to make a decision concerning a population based only on the results of a sample.
Data Collection and Presentation
Basic Concepts
✅ Population.
The set of all the elements that satisfy certain properties and on which the observations fall (screws produced by a factory in a year, tosses of a coin, etc.).
✅ Sample.
The subset of the population that is studied and from which conclusions are drawn about the characteristics of the population. The sample must be representative, in the sense that the conclusions obtained must be useful for the total population.
✅ Individual.
Each of the elements of the sample or population (people, screws, hospitals, stores) on which the observation falls.
✅ Variable.
Each of the traits or characteristics of the elements of a population that vary from one individual to another (salary, eye color, sex, number of children).
✅ Census.
A study carried out on an entire population.
✅ Sampling.
A study carried out only on a sample of the population. If the sample is representative, the results obtained should be equal or very close to those that would be obtained by taking a census.
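Simple random sampling is one common way to draw a sample. In this method, every subset of size n has the same chance of being chosen. The sketch below is a minimal illustration that assumes a made-up population of 1,000 screw lengths and a hypothetical helper `simple_random_sample`. It compares the mean a census would measure with the mean estimated from a sample of 100. Any single random sample can still be unrepresentative, so this is not a guarantee.

```python
import random
from statistics import mean

# Hypothetical population: 1,000 screw lengths in millimetres (illustrative values only).
rng = random.Random(42)  # fixed seed so the example is reproducible
population = [round(rng.gauss(50.0, 2.0), 2) for _ in range(1000)]

def simple_random_sample(data, n, seed=0):
    """Hypothetical helper: draw n elements without replacement, each subset equally likely."""
    return random.Random(seed).sample(data, n)

sample = simple_random_sample(population, 100)

census_mean = mean(population)  # what a census of every screw would measure
sample_mean = mean(sample)      # estimate obtained from the sample only
print(f"census mean: {census_mean:.2f}, sample mean: {sample_mean:.2f}")
```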
Tables.
Data can be collected in different ways: questionnaires, sampling, censuses, observations, etc. In any case, once collected, the data must be summarized to facilitate their analysis. The most elementary way to summarize them is through tables. Based on these tables, any of the following data presentation tools can be constructed.
Frequency (absolute frequency) is the number of times the same value is repeated in the data set.
The relative frequency expresses how often each value appeared in relation to the total number of observations: it is the frequency of the value divided by that total. The sum of all relative frequencies (fr) must be equal to one (1.00). When the relative frequencies are rounded to a few decimals, some accuracy is lost and the total may come out as, for example, 0.99, which is acceptable.
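A frequency table can be built by counting each distinct value and dividing each count by the total number of observations. The sketch below uses Python's `collections.Counter` and a hypothetical helper `frequency_table`. The data are made up and chosen so that rounding to two decimals produces the 0.99 total mentioned above.

```python
from collections import Counter

def frequency_table(values, decimals=2):
    """Hypothetical helper: (value, absolute frequency f, relative frequency fr) rows sorted by value."""
    counts = Counter(values)         # absolute frequency of each distinct value
    n = len(values)
    rows = []
    for value in sorted(counts):
        f = counts[value]
        fr = round(f / n, decimals)  # relative frequency, rounded for display
        rows.append((value, f, fr))
    return rows

# Hypothetical data: number of children in six households.
children = [0, 0, 1, 1, 2, 2]
table = frequency_table(children)
for value, f, fr in table:
    print(value, f, fr)

total_fr = sum(fr for _, _, fr in table)
print("sum of fr:", round(total_fr, 2))  # 0.99 because each 1/3 was rounded to 0.33
```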
Graphical Presentation of Data - Part A.
✅ Histograms.
Histograms are vertical bar charts in which rectangular bars are constructed at the boundaries of each class. The random variable or phenomenon of interest is displayed along the horizontal axis; the vertical axis represents the number, proportion or percentage of observations per class interval, depending on whether the particular histogram is a frequency histogram, a relative frequency histogram or a percentage histogram.
✅ Polygons.
A frequency polygon is a graph that joins with straight lines the midpoints of the tops of the histogram bars.
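The numbers behind a histogram and its polygon can be computed directly. Each class gets a frequency, a relative frequency and a percentage, which are the three possible bar heights. Each bar then contributes one polygon vertex at the midpoint of its class.

This is a minimal sketch with made-up exam scores. It assumes half-open classes [lower, upper), with the last class also including its upper boundary. `class_frequencies` and `polygon_points` are hypothetical helper names.

```python
def class_frequencies(values, boundaries):
    """Hypothetical helper: (lower, upper, frequency, relative, percent) per class [lower, upper).
    The last class also includes its upper boundary."""
    n = len(values)
    rows = []
    for i in range(len(boundaries) - 1):
        lower, upper = boundaries[i], boundaries[i + 1]
        last = i == len(boundaries) - 2
        f = sum(1 for v in values if lower <= v < upper or (last and v == upper))
        rows.append((lower, upper, f, f / n, 100 * f / n))
    return rows

def polygon_points(boundaries, frequencies):
    """Hypothetical helper: (class midpoint, frequency) vertices of a frequency polygon."""
    points = []
    for i, f in enumerate(frequencies):
        midpoint = (boundaries[i] + boundaries[i + 1]) / 2  # centre of the top of the bar
        points.append((midpoint, f))
    return points

# Hypothetical exam scores grouped into classes 50-60, 60-70, ..., 90-100.
scores = [52, 58, 61, 64, 67, 70, 73, 75, 78, 81, 85, 90, 94, 100]
bounds = [50, 60, 70, 80, 90, 100]

rows = class_frequencies(scores, bounds)
for lower, upper, f, rel, pct in rows:
    print(f"{lower}-{upper}: f={f}, relative={rel:.2f}, percent={pct:.1f}%")

freqs = [f for _, _, f, _, _ in rows]
points = polygon_points(bounds, freqs)
print(points)  # [(55.0, 2), (65.0, 3), (75.0, 4), (85.0, 2), (95.0, 3)]
```

Some texts also add a vertex with frequency zero at the midpoint of an empty class at each end, so that the polygon meets the horizontal axis. That step is omitted here.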
Graphical Presentation of Data - Part B
Bar graphs.
In the bar chart, each category is described by a bar, the length of which represents the frequency or percentage of observations that fall into a category.
To construct a bar chart, the following suggestions are made:
✅ Bars should be constructed horizontally.
✅ All bars should have the same width.
✅ Spaces between bars should vary from half the width of one bar to the width of one bar.
✅ Scales and guides are useful aids in reading a graph and should be included.
✅ The zero point or origin must be indicated.
✅ Axes must be named.
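In practice a plotting library would normally draw the chart. As a dependency-free illustration, the sketch below renders the suggestions above as text. Bars are horizontal and all the same width (one line each). Each bar is followed by a blank gap one bar wide. The zero point is marked, and the axis is named.

Bar length is scaled so that the largest frequency fills `max_length` characters. The helper `text_bar_chart`, its parameters and the survey data are all hypothetical.

```python
def text_bar_chart(categories, frequencies, max_length=40, axis_label="Frequency"):
    """Hypothetical helper: render a horizontal bar chart as lines of text."""
    largest = max(frequencies)
    label_width = max(len(c) for c in categories)
    lines = []
    for category, f in zip(categories, frequencies):
        length = round(f / largest * max_length)  # bar length proportional to frequency
        lines.append(f"{category:<{label_width}} |{'#' * length} {f}")
        lines.append(f"{'':<{label_width}} |")    # gap between bars, one bar wide
    # Zero point (origin) and a named axis, as the suggestions require.
    lines.append(f"{'':<{label_width}} 0{'-' * max_length} {axis_label}")
    return lines

# Hypothetical survey: preferred means of transport.
chart = text_bar_chart(["Bus", "Car", "Bike"], [12, 20, 8])
for line in chart:
    print(line)
```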
Pie Chart.
A pie chart is a circle divided into parts that represent the relative frequency or percentage of different categories.
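Because a full circle has 360 degrees, each slice's angle is its relative frequency multiplied by 360. The sketch below computes the percentage and angle for each category using made-up survey counts. `pie_sectors` is a hypothetical helper name.

```python
def pie_sectors(categories, frequencies):
    """Hypothetical helper: (category, percentage, angle in degrees) for each pie slice."""
    total = sum(frequencies)
    sectors = []
    for category, f in zip(categories, frequencies):
        share = f / total  # relative frequency of the category
        sectors.append((category, 100 * share, 360 * share))
    return sectors

# Hypothetical survey: preferred means of transport.
sectors = pie_sectors(["Bus", "Car", "Bike"], [12, 20, 8])
for category, pct, angle in sectors:
    print(f"{category}: {pct:.1f}% -> {angle:.1f} degrees")
```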
Measures of Central Tendency
Definition of measures of central tendency
One of the most notable characteristics of a data distribution is its tendency to cluster toward the center of the distribution. This characteristic is called central tendency. In other words, measures of central tendency are those that indicate the center of a data set.
Some of them are:
✅ Mean or Average (the arithmetic average).
✅ Median (the central value).
✅ Mode (the most frequent value).
Average or Mean.
The mean is an average. It is calculated by adding up all the data and then dividing the total by the number of data values. When high school students add up all their partial grades and divide by the number of grades, they get their partial average; that average is the arithmetic mean of their partial grades.
If we have a set of data, we can calculate its mean and thereby summarize all the information in a single number. The data in the set are distributed around that number, the average. Instead of a table, we are left with a single number, which makes the data easier to analyze.
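The procedure above (add up all the data, then divide by how many values there are) translates directly into code. This is a minimal sketch using made-up partial grades. `arithmetic_mean` is a hypothetical name; Python's standard library also offers `statistics.mean`.

```python
def arithmetic_mean(values):
    """Hypothetical helper: sum all the data and divide by the number of data values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Hypothetical partial grades of one student.
grades = [8, 7, 9, 10, 6]
print(arithmetic_mean(grades))  # 8.0
```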
Median
It is the middle value of an ordered sequence of data. If there are no ties, half of the observations will be smaller than the median and half will be larger. The median is not affected by extreme values. To calculate the median, the data must first be put in order.
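The sketch below sorts the data and takes the middle value. When the number of observations is even, it averages the two middle values, which is the usual convention. The made-up income data include one extreme value, to show that the median stays put while the mean is pulled upward. `median` here is a hypothetical helper; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Hypothetical helper: middle value of the ordered data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)  # the data must be put in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical incomes (in thousands); 250 is an extreme value.
incomes = [30, 25, 40, 35, 250]
print(median(incomes))              # 35
print(sum(incomes) / len(incomes))  # 76.0, pulled up by the extreme value
```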
Mode.
In a data series, the mode is the most repeated value, the one that appears most frequently. The mode is not affected by the occurrence of any extreme value.
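Finding the mode amounts to counting how often each value occurs and keeping the most frequent one. A data series can have more than one value tied for the highest count, so the sketch below returns all of them. The helper name `modes` and the data are hypothetical. Python 3.8+ also provides `statistics.multimode` for the same purpose. The mode also works for non-numeric data such as eye color.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: every value that appears with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, f in counts.items() if f == top)

# Hypothetical eye-color survey.
eyes = ["brown", "blue", "brown", "green", "brown", "blue"]
print(modes(eyes))             # ['brown']
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (two modes)
```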
Measures of Dispersion
Definition of Measures of Dispersion
Although measures of central tendency tell us the center of a data set, they often do not show the whole picture, because many data values may lie far from the center. For example, suppose the average (mean) age in your classroom is 28. Most members may indeed be around that age, but some may be over 40, and others even below 20.
Variance and Standard Deviation.
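The variance and standard deviation take into account how all the observations are distributed. The sample variance is roughly the average of the squared differences between each observation in a data series and the mean: the sum of the squared differences is divided by n − 1 rather than by n, where n is the number of observations. The standard deviation is the square root of the variance and is expressed in the same units as the data.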
The variance and standard deviation measure the average dispersion around the mean, i.e., how the larger observations fluctuate above the mean and how the smaller observations are distributed below it.
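The sketch below computes the sample variance and standard deviation by hand. It divides the sum of squared deviations by n − 1, which is the sample formula. It uses two made-up classrooms that share a mean age of 28 but have very different spread, echoing the classroom example earlier in this section. The helper names are hypothetical; Python's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor.

```python
import math

def sample_variance(values):
    """Hypothetical helper: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Hypothetical helper: square root of the variance, in the same units as the data."""
    return math.sqrt(sample_variance(values))

# Two hypothetical classrooms, both with mean age 28.
class_a = [27, 28, 28, 29, 28]
class_b = [18, 40, 19, 41, 22]
for name, ages in (("A", class_a), ("B", class_b)):
    print(f"class {name}: mean={sum(ages) / len(ages)}, "
          f"variance={sample_variance(ages)}, std={sample_std(ages):.2f}")
```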
The Coefficient of Variation.
When comparing the dispersion of variables expressed in different units (kilos, etc.) or corresponding to extremely unequal populations, it is necessary to have a measure of variability that does not depend on the units of measurement. Such a measure is meaningful for variables measured on ratio scales.
One way to construct a measure of variability that meets the above requirements is the so-called coefficient of variation, defined as the standard deviation divided by the absolute value of the mean:
CV = s / |x̄|
where s is the standard deviation and x̄ is the mean. The bars in the denominator represent the absolute value, i.e., they indicate that the sign of the mean should be disregarded. The lower the coefficient of variation, the more homogeneous the distribution of the measured variable.
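The sketch below computes CV = s / |x̄| with Python's `statistics` module. It assumes s is the sample standard deviation (`statistics.stdev`). Some texts use the population standard deviation instead, or multiply the result by 100 to express it as a percentage.

The made-up heights and weights show how variables in different units can be compared. Converting the heights from metres to centimetres leaves the CV unchanged. `coefficient_of_variation` is a hypothetical helper name.

```python
import statistics

def coefficient_of_variation(values):
    """Hypothetical helper: sample standard deviation divided by the absolute value of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(values) / abs(mean)

# Hypothetical measurements of the same three people.
heights_m = [1.60, 1.70, 1.80]
heights_cm = [160.0, 170.0, 180.0]
weights_kg = [60.0, 70.0, 80.0]

cv_heights_m = coefficient_of_variation(heights_m)
cv_heights_cm = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"height CV (m): {cv_heights_m:.4f}, height CV (cm): {cv_heights_cm:.4f}")
print(f"weight CV: {cv_weights:.4f}")  # larger CV: weights are relatively more dispersed
```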