[Repost] Basic Concepts in Statistics (Explained in English)
2017-11-29 22:11
The independent variable: It is the factor that is measured,
manipulated or selected by the experimenter to determine its
relationship to an observed phenomenon. It is a stimulus variable
or input that operates within a person or within his environment to
affect behavior. An independent variable may be called a factor,
and its variations are called levels.
The dependent variable: The dependent variable is a response
variable or output. The dependent variable is the factor that is
observed and measured to determine the effect of the independent
variable; it is the factor that appears, disappears, or varies as
the researcher introduces, removes, or varies the independent
variables.
Moderator variable: It is the factor that is measured, manipulated
or selected by the experimenter to discover whether it modifies the
relationship of the independent variable to an observed phenomenon.
The term moderator variable describes a special type of independent
variable: a secondary independent variable selected to determine
whether it affects the relationship between the study’s primary
independent variable and its dependent variable.
Control variable: Control variables are factors controlled by the
experimenter to cancel out or neutralize any effect they might
otherwise have on the observed phenomena. A single study cannot
examine all of the variables in a situation (situational variables)
or in a person (dispositional variables); some must be neutralized
to guarantee that they will not exert differential or moderating
effects on the relationship between the independent and dependent
variables.
Intervening variable: An intervening variable is a factor that
theoretically affects the observed phenomena but cannot be seen,
measured, or manipulated; its effects must be inferred from the
effects of the independent and moderator variables on the observed
phenomena.
Consider the hypothesis: “Among students of the same age and
intelligence, skill performance is directly related to the number
of practice trials, the relationship being particularly strong
among boys, but also holding, though less directly, among girls.”
This hypothesis, which indicates that practice increases learning,
involves several variables:
Independent variable: number of practice trials
Dependent variable: skill performance
Control variables: age, intelligence
Moderator variable: gender
Intervening variable: learning
[Figure: Relationships among variables, from causes to effects: independent variables, moderator variables, control variables, intervening variables, dependent variables]
[Figure: Steps in data processing: raw data (questionnaires, interviews, interview guides, observation, secondary sources) → editing → coding (developing a frame of analysis, developing a code book, pre-testing the code book, coding the data, verifying the coded data) → analysis (manual or computer)]
Data
There are two types of data: qualitative data and quantitative data.
Qualitative data is further divided into nominal data and ordinal
data, and quantitative data into interval data and ratio data.
Nominal data: As is obvious from the name, nominal means “to give
names”. In the social sciences, many qualities cannot be measured
numerically. To analyze this type of data, it is named and
categorized. This named or categorized data is called nominal
data.
Ordinal data: The term ordinal means “ordered or arranged”. This
data is arranged in order, ranking individuals as more than or less
than one another. After the data have been sorted into categories,
the categories are ordered or ranked. Although ordinal measurement
may require more involved procedures, it gives more informative and
precise data than nominal measurement.
Interval data: It is data, or scores, expressed in units of equal
magnitude, but with no true zero point. Interval data can be added
and subtracted but cannot meaningfully be multiplied or divided.
Ratio data: It has a true zero value, that is, a point that
represents the complete absence of the measured characteristic, so
ratios are comparable at different points on the scale. It is much
more frequently used in the physical sciences than in the behavioral
sciences. For example, 9 ohms indicates three times the resistance
of 3 ohms, while 6 ohms stands in the same ratio to 2 ohms.
For example, listed below are the scores of a group of students on
a mid-semester English test:
64, 61, 56, 51, 52, 34, 64, 31, 31, 31, 59, 61, 34, 59, 51, 38, 38, 38, 36, 36
How many students received a score of 36? Did most of the students
receive a score above 50? When the data are unordered, such
questions are difficult to answer, and it is hard to make any sense
of the data.
We must put the data into some sort of order. One of the most common
ways to do this is to prepare a frequency distribution. This is done
by listing the scores in rank order from high to low, with tallies
to indicate the number of subjects receiving each score. Often the
scores in a distribution are grouped into intervals; this results in
a grouped frequency distribution.
Example of a frequency distribution
Raw score   Frequency
64          2
61          2
59          2
56          1
52          1
51          2
38          3
36          2
34          2
31          3
            N = 20
Table of a grouped frequency distribution
Raw score interval (width 5)   Frequency
60–64                          4
55–59                          3
50–54                          3
45–49                          0
40–44                          0
35–39                          5
30–34                          5
                               N = 20
[Figure: Frequency polygon]
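Both tables can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the interval width (5) and the bounds 30 and 64 are assumptions chosen to match the grouped table for this particular data set.

```python
from collections import Counter

# The 20 mid-semester English test scores listed in the text.
scores = [64, 61, 56, 51, 52, 34, 64, 31, 31, 31,
          59, 61, 34, 59, 51, 38, 38, 38, 36, 36]

def frequency_distribution(data):
    """Return (score, frequency) pairs in rank order from high to low."""
    return sorted(Counter(data).items(), reverse=True)

def grouped_frequency_distribution(data, width=5, low=30, high=64):
    """Count scores per interval of `width`, listed from high to low.

    `low` and `high` are assumptions fitted to this data set; empty
    intervals are kept so that gaps in the distribution stay visible.
    """
    rows = []
    for start in range(low, high + 1, width):
        end = start + width - 1
        rows.append((f"{start}-{end}", sum(1 for x in data if start <= x <= end)))
    return list(reversed(rows))

for score, freq in frequency_distribution(scores):
    print(score, freq)
print("N =", len(scores))

for interval, freq in grouped_frequency_distribution(scores):
    print(interval, freq)

# The two questions asked of the raw data:
counts = Counter(scores)
print("Scored 36:", counts[36])                               # 2
print("Scored above 50:", sum(1 for x in scores if x > 50))   # 10 of 20
```

Once ordered, the questions are easy to answer: two students scored 36, and exactly half of the students (10 of 20) scored above 50, so it is not true that most did.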
Average: An average enables a researcher to summarize the data in a
frequency distribution with a single number. There are three kinds:
the mode, the median and the mean.
The mode: The mode is the most frequent score in a distribution,
that is, the score attained by more students than any other score.
For example, in the distribution 25, 20, 19, 17, 16, 16, 13, 12, the
mode is 16. What about this distribution: 25, 20, 19, 19, 17, 16,
16, 12, 11? This distribution has two modes, 19 and 16, and is
therefore called a bimodal distribution. The mode does not tell us
very much about a distribution, and it is not often used in
educational research.
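A minimal sketch of finding the mode by counting, using the two distributions from the text. The function name `modes` is hypothetical; it returns every most-frequent score so that bimodal distributions show up as two values (Python's standard `statistics.multimode` does the same job, ordered by first occurrence).

```python
from collections import Counter

def modes(data):
    """Return every score that occurs most often, highest score first.

    One value means the distribution is unimodal; two values, bimodal.
    """
    counts = Counter(data)
    top = max(counts.values())
    return sorted((x for x, c in counts.items() if c == top), reverse=True)

print(modes([25, 20, 19, 17, 16, 16, 13, 12]))      # [16]
print(modes([25, 20, 19, 19, 17, 16, 16, 12, 11]))  # [19, 16]
```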
The median (the midpoint): The median is the point below and above
which 50% of the scores in a distribution fall. In the distribution
1, 2, 3, 4, 5, the median is 3. If a distribution contains an even
number of scores, the median is the point halfway between the two
middle scores. In the distribution 2, 4, 6, 8, 10, 12, the median
is 7.
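The odd/even rule above translates directly into code. This is a sketch with a hypothetical function name; the standard library's `statistics.median` applies the same rule.

```python
def median(data):
    """Middle score of the sorted data; with an even count, the point halfway between the two middle scores."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 5]))       # 3
print(median([2, 4, 6, 8, 10, 12]))  # 7.0
```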
The mean: The mean is determined by adding up all of the scores and
then dividing this sum by the total number of scores:
$\bar{X} = \frac{\sum X}{n}$
where $X$ represents any raw score value, $n$ represents the total
number of scores and $\bar{X}$ represents the mean. Each of these
averages summarizes the information in the data with a single value.
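As a worked instance of the formula, here is a short sketch (hypothetical function name) applied to the English test scores from the frequency-distribution example, whose sum is 925.

```python
def mean(data):
    """Arithmetic mean: the sum of all scores divided by the number of scores."""
    if not data:
        raise ValueError("mean of empty data")
    return sum(data) / len(data)

scores = [64, 61, 56, 51, 52, 34, 64, 31, 31, 31,
          59, 61, 34, 59, 51, 38, 38, 38, 36, 36]
print(sum(scores), len(scores))  # 925 20
print(mean(scores))              # 46.25
```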
But sometimes the researcher cannot get the required results from
the data by using an average alone. An average describes the overall
behavior of the data with a single value, which can sometimes lead
to confusion and ambiguity, so researchers also need measures that
describe the spread, or variability, that exists within a
distribution. There are several ways to measure how far the data
deviate from the center.
Measures of data variability: Knowing the central tendencies (mean,
median, and mode) isn’t enough. We also need a method for
determining how closely the data are clustered around the center
point(s). The most typical measures of data variability are:
– Range
– Variance
– Standard deviation
Range:
• The simplest measure of variability.
• Calculated by subtracting the smallest measurement from the
largest measurement.
• It is not a good measure of variability: if two ranges are the
same, it does not mean that the spread is the same.
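The weakness of the range can be seen with two small data sets. Both are made up for illustration (not from the text) and share a range of 10, although one is bunched at the center and the other piled at the extremes.

```python
def data_range(data):
    """Range = largest measurement minus smallest measurement."""
    return max(data) - min(data)

# Hypothetical data sets with the same range but very different spread.
clustered = [0, 5, 5, 5, 5, 5, 10]      # most values sit at the center
spread_out = [0, 0, 0, 5, 10, 10, 10]   # most values sit at the extremes

print(data_range(clustered), data_range(spread_out))  # 10 10
```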
Variance:
• For a sample, it is the sum of the squared deviations from the
mean divided by (n − 1), and is denoted by $s^2$:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
• For a population, it is the sum of the squared deviations from the
population mean $\mu$ divided by N, and is denoted by $\sigma^2$:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Note: Deviations are squared to remove the effects of negative
differences.
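The two divisors can be compared side by side. This sketch uses hypothetical function names and a small made-up data set (mean 5, squared deviations summing to 32); the standard library's `statistics.variance` and `statistics.pvariance` implement the same two formulas.

```python
def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - 1)

def population_variance(data):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean 5
print(population_variance(data))  # 4.0  (32 / 8)
print(sample_variance(data))      # about 4.571  (32 / 7)
```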
Standard deviation:
• Variance is expressed in squared units (“units squared”), which is
not a convenient metric; taking the positive square root of the
variance gives a metric in the same units as the data itself
(“units”).
– Sample standard deviation: $s = \sqrt{s^2}$
– Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Application of the mean and standard deviation to observe the
behavior of the data:
Data can be standardized using mean & standard deviation. Thus,
for a single data set, variability can be discussed in terms of how
many members of the data set fall within one, two, three, or more
standard deviations of the mean.
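Counting how many members of a data set fall within one, two or three standard deviations of the mean takes only a few lines. This sketch treats the made-up data set as a whole population (so it uses $\sigma$, not $s$), and the function names are hypothetical. For roughly normal data the shares approach the familiar 68%, 95% and 99.7%; a tiny data set like this one will not match those figures.

```python
import math

def population_sd(data):
    """sigma: positive square root of the population variance."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

def share_within(data, k):
    """Fraction of values lying within k standard deviations of the mean."""
    m = sum(data) / len(data)
    sd = population_sd(data)
    return sum(1 for x in data if abs(x - m) <= k * sd) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores: mean 5, sigma 2
for k in (1, 2, 3):
    print(k, share_within(data, k))
# 1 0.75
# 2 1.0
# 3 1.0
```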
Standard score: A standard score uses a common scale to indicate how
an individual compares to other individuals in a group. These scores
are particularly helpful in comparing an individual’s relative
position. The two standard scores most frequently used in
educational research are:
1. The z-score
2. The T-score
Z-score: The simplest form of standard score is the z-score. It
expresses how far a raw score is from the mean in standard deviation
units. A big advantage of z-scores is that they allow raw scores on
different tests to be compared. Researchers use a formula to convert
a raw score into a z-score:
$z = \frac{\text{raw score} - \text{mean}}{\text{standard deviation}}$
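The z-score formula, plus the T-score that the text lists but does not define, can be sketched as below. The T-score here uses the common convention of rescaling z to a mean of 50 and a standard deviation of 10 (T = 50 + 10z); that convention is an assumption added for illustration, and the function names and sample numbers are hypothetical.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

def t_score(z):
    """T-score under the assumed common convention: mean 50, SD 10."""
    return 50 + 10 * z

z = z_score(45, 50, 5)    # hypothetical score 45 on a test with mean 50, SD 5
print(z, t_score(z))      # -1.0 40.0
```

The T-score carries the same information as z but avoids negative numbers and decimals for most scores.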
For example, a student received a raw score of 60 on a biology test
and 80 on a chemistry test. A naïve observer might be inclined to
infer that the student was doing better in chemistry than in
biology. But this might be unwise, because how well the student is
doing comparatively cannot be determined until we know the mean and
standard deviation of each distribution of scores. Let us suppose
the mean is 50 in biology and 90 in chemistry, and that the standard
deviation is 5 in biology and 10 in chemistry. What does this tell
us?
Comparison of raw scores and z-scores on two tests
Test        Raw score   Mean   SD   z-score   Percentile rank
Biology     60          50     5    2         98
Chemistry   80          90     10   −1        16
The biology score is two standard deviations above the mean, while
the chemistry score is one standard deviation below it, so relative
to the other students the student is actually doing much better in
biology.
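The percentile ranks in the table are consistent with assuming that each test's scores are normally distributed; the text does not state this, so treat it as an assumption. Under it, Python's standard `statistics.NormalDist` can turn each z-score into the percentage of students scoring below it.

```python
from statistics import NormalDist

# Means and standard deviations supposed in the biology/chemistry example.
tests = {
    "Biology":   {"raw": 60, "mean": 50, "sd": 5},
    "Chemistry": {"raw": 80, "mean": 90, "sd": 10},
}

standard_normal = NormalDist(mu=0, sigma=1)
results = {}
for name, t in tests.items():
    z = (t["raw"] - t["mean"]) / t["sd"]
    # Assumes normally distributed scores: cdf(z) is the share scoring below z.
    pct = 100 * standard_normal.cdf(z)
    results[name] = (z, round(pct))
    print(f"{name}: z = {z:+.1f}, percentile rank about {pct:.0f}")
```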
Probability and Z score.
Probability: It refers to the likelihood of an event occurring,
usually expressed as a percentage stated in decimal form. For
example, if an event will occur 25 percent of the time, it can be
said to have a probability of .25.
Hypothesis: There are two kinds of hypotheses. The research
hypothesis is the predicted outcome of the study, whereas the null
hypothesis is the assumption that there is no relationship between
the variables in the population.
Correlational analysis: It shows the existing relationship between
variables, with no manipulation of the variables. It is used to
analyze data containing two variables, and also to examine the
reliability and validity of data collection procedures.
Types of correlation: high positive correlation (when high values of
one variable go with high values of the other); low correlation
(when there is little or no relationship between the variables);
negative correlation (when high values of one variable go with low
values of the other). When the researcher wants to make inferences
to the population, he will have to examine the statistical
significance of the correlation.
Statistical significance can be determined only if the correlations
have been obtained from randomly selected samples. The significance
of a correlation depends on the size of the correlation and the size
of the sample. The level of significance is very important since it
relates directly to whether or not the null hypothesis is rejected.
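A minimal sketch of Pearson's correlation coefficient r and a standard significance check for it, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n − 2 degrees of freedom. It shows how both the size of r and the sample size n enter the test. The practice-trial and performance numbers are hypothetical, and the resulting t would still have to be compared with a t table (or a library such as `scipy.stats.pearsonr`) at the chosen level of significance.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 pairs")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for H0 'population correlation = 0' (df = n - 2); undefined if |r| = 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

trials = [1, 2, 3, 4, 5]        # hypothetical numbers of practice trials
performance = [2, 4, 5, 4, 5]   # hypothetical skill scores
r = pearson_r(trials, performance)
t = t_for_r(r, len(trials))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(trials) - 2}")
```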
Multivariate analysis: It is used to find out the relationships
among more than two variables, whereas correlation analysis deals
with two. There are two ways: multiple regression and factor
analysis.
Multiple regression: Through multiple regression it is possible to
examine the relationship and predictive power of one or more
independent variables with respect to the dependent variable. It
shows which variables make a significant contribution to explaining
the variance in the dependent variable, and how much they
contribute.
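The core computation of multiple regression, least-squares coefficients plus R² (the share of variance in the dependent variable explained), can be sketched in plain Python by solving the normal equations. This is an illustrative sketch only: the function names are hypothetical, the data are constructed so that y = 1 + 2·x1 + 3·x2 exactly, and it omits the significance tests for individual predictors that real analyses report. In practice a statistics package (for example `numpy.linalg.lstsq` or statsmodels' OLS) would be used.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                factor = m[i][col] / m[col][col]
                for j in range(col, n + 1):
                    m[i][j] -= factor * m[col][j]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_regression(xs, y):
    """Coefficients [b0, b1, ..., bk] of y = b0 + b1*x1 + ... + bk*xk (one row of xs per subject)."""
    design = [[1.0, *row] for row in xs]   # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def r_squared(xs, y, coef):
    """Proportion of the variance in y explained by the fitted equation."""
    preds = [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in xs]
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]   # hypothetical predictor pairs
y = [9, 8, 19, 18, 26]                           # built as 1 + 2*x1 + 3*x2
coef = multiple_regression(xs, y)
print([round(c, 6) for c in coef])   # close to 1, 2 and 3
print(round(r_squared(xs, y, coef), 6))
```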
Discriminant analysis: It asks which combination of variables
distinguishes between two or more categories of the dependent
variable.
Factor analysis: Unlike regression, factor analysis does not relate
independent variables to a dependent variable; rather, it operates
within a set of variables without any need for a dependent variable.
In factor analysis the interrelationships among the variables in the
data are examined in an attempt to find out how many independent
dimensions (factors) can be identified in the data. It thus provides
information on the characteristics of the variables. This type of
analysis is based on the assumption that variables measuring the
same factor will be highly correlated, whereas variables measuring
different factors will have low correlations with one another.
Inferential techniques: Different designs call for different methods
of analysis. A statistical technique appropriate for quantitative
data will generally be inappropriate for categorical data.
Types of inferential techniques: There are two types of inferential
techniques that a researcher uses: parametric techniques and
non-parametric techniques.
Parametric: Parametric techniques are most appropriate for interval
data. They make various kinds of assumptions about the nature of the
population from which the samples in the research study are drawn.
They are generally more powerful than non-parametric techniques,
that is, more likely to reveal a true difference or relationship if
one really exists.
Non-parametric: Non-parametric techniques are most appropriate for
nominal and ordinal data. They make few assumptions about the nature
of the population from which the samples are taken.
T-test: It is used to compare the means of two groups. The t-test
determines the probability that the difference between the means of
the groups is a real difference rather than a chance variation in
the data. Types:
t-test for independent means: used to compare the mean scores of two
different, independent groups.
t-test for correlated means: used to compare the mean scores of the
same group before and after a treatment of some sort is given, to
see if any observed gain is significant, or when the research design
involves two matched groups.
The result of a t-test provides the researcher with a t-value.
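Both kinds of t-test can be sketched with the standard `statistics` module. Assumptions: the independent-means version pools the two sample variances (classic Student's t, which presumes roughly equal group variances; Welch's version drops that assumption), the function names are hypothetical, and all scores are made up. Each function returns the t-value and its degrees of freedom. Turning t into a probability needs a t table or a library such as `scipy.stats.ttest_ind` / `ttest_rel`.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 (sample) formula

def independent_t(group1, group2):
    """Student's t for two independent groups with pooled variance; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

def paired_t(before, after):
    """t for correlated means: is the mean gain (after - before) different from 0? Returns (t, df)."""
    gains = [a - b for b, a in zip(before, after)]
    n = len(gains)
    se = math.sqrt(variance(gains) / n)
    return mean(gains) / se, n - 1

experimental = [78, 85, 80, 90, 87]   # hypothetical French scores, computer group
control = [72, 75, 70, 80, 78]        # hypothetical French scores, teacher group
print(independent_t(experimental, control))

before = [10, 12, 9, 11]              # hypothetical pre-treatment scores
after = [12, 14, 10, 14]              # hypothetical post-treatment scores
print(paired_t(before, after))
```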
Example: A researcher is comparing the performance of two randomly
selected groups learning French by two different methods. The
experimental group learns with the aid of a computer, while the
control group is taught by a teacher. The researcher investigates
the effect of computer practice on students’ achievement in French.
After three months both groups take an achievement test, and the
researcher uses a t-test to examine whether there are differences in
the achievement of the two groups. To gain insight into the data
through descriptive statistics, the researcher first computes the
mean ($\bar{X}$), standard deviation (SD) and sample size (N) for
each group, experimental and control.
ANOVA (one-way analysis of variance): One-way analysis of variance
is used to examine the differences among more than two groups. The
analysis is performed on the variances of the groups, focusing on
whether the variability between the groups is greater than the
variability within the groups. The F value is the ratio of the
between-groups variance to the within-groups variance:
$F = \frac{\text{between-groups variance}}{\text{within-groups variance}}$