AIOU Solved Assignments 1 & 2 Code 1430 Autumn 2018. Solved Assignments code 1430 business statistics 2019. Allama iqbal open university old papers.

**Course: Business Statistics Level: B.A/BS Course Code: 1430 Semester: Autumn 2018 Assignment No. 1**


**Q.1 a) Distinguish between primary and secondary data, giving examples of each.**

**Answer:**

Data collection plays a very crucial role in the statistical analysis. In research, there are different methods used to gather information, all of which fall into two categories, i.e. primary data, and secondary data. As the name suggests, primary data is one which is collected for the first time by the researcher while secondary data is the data already collected or produced by others.

There are many differences between primary and secondary data, which are discussed below. The most important difference is that primary data is factual and original, whereas secondary data is the analysis and interpretation of primary data. While primary data is collected with the aim of solving the problem at hand, secondary data is collected for other purposes.


**Definition of Primary Data**

Primary data is data originated for the first time by the researcher through direct efforts and experience, specifically for the purpose of addressing his research problem. It is also known as first-hand or raw data. Primary data collection is quite expensive, as the research is conducted by the organisation or agency itself, which requires resources like investment and manpower. The data collection is under the direct control and supervision of the investigator.

The data can be collected through various methods like surveys, observations, physical testing, mailed questionnaires, questionnaire filled and sent by enumerators, personal interviews, telephonic interviews, focus groups, case studies, etc.

Definition of Secondary Data

Secondary data implies second-hand information which has already been collected and recorded by a person other than the user, for a purpose not related to the current research problem. It is the readily available form of data collected from various sources like censuses, government publications, internal records of the organisation, reports, books, journal articles, websites and so on.

Secondary data offers several advantages: it is easily available and saves the researcher time and cost. But there are some disadvantages as well. Because the data was gathered for purposes other than the problem in mind, its usefulness may be limited in a number of ways, such as relevance and accuracy.

Moreover, the objective and the method adopted for acquiring data may not be suitable to the current situation. Therefore, before using secondary data, these factors should be kept in mind.

Key Differences between Primary and Secondary Data

The fundamental differences between primary and secondary data are discussed in the following points:

- The term primary data refers to data originated by the researcher for the first time. Secondary data is already existing data, collected earlier by investigating agencies and organisations.
- Primary data is real-time data, whereas secondary data relates to the past.
- Primary data is collected to address the problem at hand, while secondary data is collected for purposes other than the problem at hand.
- Primary data collection is a very involved process. Secondary data collection, on the other hand, is rapid and easy.
- Primary data collection sources include surveys, observations, experiments, questionnaires, personal interviews, etc. Secondary data collection sources are government publications, websites, books, journal articles, internal records, etc.
- Primary data collection requires a large amount of resources such as time, money and manpower. Conversely, secondary data is relatively inexpensive and quickly available.
- Primary data is always specific to the researcher's needs, and he controls the quality of the research. In contrast, secondary data is neither specific to the researcher's needs, nor does he have control over the data quality.
- Primary data is available in raw form, whereas secondary data is the refined form of primary data. It can also be said that secondary data is obtained when statistical methods are applied to primary data.
- Data collected through primary sources are more reliable and accurate than data from secondary sources.

b) Enumerate the main sources of errors in statistics and give their effects.

Here are five common sources of errors in statistics:

I. Population Specification

This type of error occurs when the researcher selects an inappropriate population or universe from which to obtain data.

Example: Packaged goods manufacturers often conduct surveys of housewives, because they are easier to contact, and it is assumed they decide what is to be purchased and also do the actual purchasing. In this situation there often is population specification error. The husband may purchase a significant share of the packaged goods, and have significant direct and indirect influence over what is bought. For this reason, excluding husbands from samples may yield results targeted to the wrong audience.

II. Sampling

Sampling error occurs when a probability sampling method is used to select a sample, but the resulting sample is not representative of the population concerned. Unfortunately, some element of sampling error is unavoidable. This is accounted for in confidence intervals, assuming a probability sampling method is used.

Example: Suppose that we collected a random sample of 500 people from the general U.S. adult population to gauge their entertainment preferences and then, upon analysis, found it to be composed of 70% females. This sample would not be representative of the general adult population and would bias the results. The entertainment preferences of females would carry more weight, preventing accurate extrapolation to the general U.S. adult population. Sampling error is affected by the homogeneity of the population being studied and sampled from, and by the size of the sample.

III. Selection

Selection error is the sampling error for a sample selected by a nonprobability method.


Example: Interviewers conducting a mall intercept study have a natural tendency to select those respondents who are the most accessible and agreeable whenever there is latitude to do so. Such samples often comprise friends and associates who bear some degree of resemblance in characteristics to those of the desired population.

IV. Nonresponse

Nonresponse error can exist when the obtained sample differs from the originally selected sample.

Example: In telephone surveys, some respondents are inaccessible because they are not at home for the initial call or call-backs. Others have moved or are away from home for the period of the survey. Not-at-home respondents are typically younger with no small children, and have a much higher proportion of working wives than households with someone at home. People who have moved or are away for the survey period have a higher geographic mobility than the average of the population. Thus, most surveys can anticipate errors from non-contact of respondents. Online surveys seek to avoid this error through e-mail distribution, thus eliminating not-at-home respondents.

V. Measurement

Measurement error is generated by the measurement process itself, and represents the difference between the information generated and the information wanted by the researcher.

Example: A retail store would like to assess customer feedback from at-the-counter purchases. The survey is developed but fails to target those who purchase in the store. Instead, results are skewed by customers who bought items online.


**Q.2 a) What is a measure of central tendency? What is the purpose served by it? What are its desirable qualities?**

**Ans:**

In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages. The term central tendency dates from the late 1920s.

The most common measures of central tendency are the arithmetic mean, the median and the mode. A central tendency can be calculated for either a finite set of values or for a theoretical distribution, such as the normal distribution. Occasionally authors use central tendency to denote “the tendency of quantitative data to cluster around some central value.”

The central tendency of a distribution is typically contrasted with its dispersion or variability; dispersion and central tendency are the often characterized properties of distributions. Analysts may judge whether data has a strong or a weak central tendency based on its dispersion.

Measures


The following may be applied to one-dimensional data. Depending on the circumstances, it may be appropriate to transform the data before calculating a central tendency. Examples are squaring the values or taking logarithms. Whether a transformation is appropriate, and what it should be, depends heavily on the data being analyzed.

- **Arithmetic mean** (or simply, mean): the sum of all measurements divided by the number of observations in the data set.
- **Median**: the middle value that separates the higher half from the lower half of the data set. The median and the mode are the only measures of central tendency that can be used for ordinal data, in which values are ranked relative to each other but are not measured absolutely.
- **Mode**: the most frequent value in the data set. This is the only central tendency measure that can be used with nominal data, which have purely qualitative category assignments.
- **Geometric mean**: the nth root of the product of the data values, where there are n of these. This measure is valid only for data that are measured absolutely on a strictly positive scale.
- **Harmonic mean**: the reciprocal of the arithmetic mean of the reciprocals of the data values. This measure too is valid only for data that are measured absolutely on a strictly positive scale.
- **Weighted arithmetic mean**: an arithmetic mean that incorporates weighting to certain data elements.
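As a rough illustration of the measures listed above, the following sketch computes each of them with Python's standard `statistics` module. The sample `values` and the `weights` are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from the assignment.

```python
import statistics

# Hypothetical sample of strictly positive values (not from the assignment).
values = [2, 4, 4, 8, 16]

mean = statistics.mean(values)                 # sum of values / number of values
median = statistics.median(values)             # middle value of the sorted data
mode = statistics.mode(values)                 # most frequent value
geometric = statistics.geometric_mean(values)  # nth root of the product
harmonic = statistics.harmonic_mean(values)    # reciprocal of the mean of reciprocals

# Weighted arithmetic mean with hypothetical weights, one per value.
weights = [1, 1, 1, 2, 5]
weighted = sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(mean, median, mode, round(geometric, 3), round(harmonic, 3), weighted)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean; the sample above shows that ordering.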

Purpose of measures of central tendency

Measures of Central Tendency provide a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution. There are three main measures of central tendency: the mean, the median and the mode.

When data is normally distributed, the mean, median and mode should be identical, and are all effective in showing the most typical value of a data set.

Desirable qualities of a measure of central tendency

The desirable qualities of a good measure of central tendency are:

- It should be rigidly defined.
- It should include all observations.
- It should be simple to understand and easy to calculate.
- It should be capable of further mathematical treatment.
- It should be least affected by extreme observations.
- It should possess sampling stability.

b) Explain when median is more representative than mean? Calculate the median of the following distribution.

When the median is more representative than the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or large in numerical value.

Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e., the frequency distribution for our data is skewed). Consider the normal distribution, as this is the one most frequently assessed in statistics: when the data is perfectly normal, the mean, median and mode are identical, and they all represent the most typical value in the data set. However, as the data becomes skewed, the mean loses its ability to provide the best central location because the skewed data drags it away from the typical value. The median retains this position better and is not as strongly influenced by the skewed values.
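A small numerical sketch can make the effect of an outlier concrete. The salary figures below are hypothetical and are not part of the assignment; they only show how a single extreme value moves the mean much more than the median.

```python
import statistics

# Hypothetical monthly salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 35, 36, 38, 40, 400]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# The same data with the outlier removed, for comparison.
without_outlier = salaries[:-1]
mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)

print(f"with outlier:    mean={mean_salary:.2f}, median={median_salary}")
print(f"without outlier: mean={mean_trimmed:.2f}, median={median_trimmed}")
```

Here the mean more than doubles when the outlier is included, while the median moves only from 35.5 to 36, so the median stays close to what a typical salary looks like.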

Median

| Classes | Number (f) | Class boundaries | Cumulative frequency (F) |
|---|---|---|---|
| 100-104 | 4 | 99.5-104.5 | 4 |
| 105-109 | 14 | 104.5-109.5 | 18 |
| 110-114 | 60 | 109.5-114.5 | 78 |
| 115-119 | 138 | 114.5-119.5 | 216 |
| 120-124 | 236 | 119.5-124.5 | 452 |
| 125-129 | 298 | 124.5-129.5 | 750 |
| 130-134 | 380 | 129.5-134.5 | 1130 |
| 135-139 | 450 | 134.5-139.5 | 1580 |
| 140-144 | 500 | 139.5-144.5 | 2080 |
| 145-149 | 430 | 144.5-149.5 | 2510 |
| 150-154 | 260 | 149.5-154.5 | 2770 |
| 155-159 | 128 | 154.5-159.5 | 2898 |
| 160-164 | 66 | 159.5-164.5 | 2964 |
| 165-169 | 28 | 164.5-169.5 | 2992 |
| 170-174 | 12 | 169.5-174.5 | 3004 |
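The source stops at the cumulative frequencies and does not show the final computation. A minimal sketch of it, assuming the usual grouped-data formula $\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right)$, is given below. Here $l$ is the lower boundary of the median class, $h$ the class width, $f$ the frequency of the median class and $C$ the cumulative frequency of the class before it. The variable names are illustrative; the frequencies and boundaries are taken from the table.

```python
# Lower class boundaries (99.5, 104.5, ..., 169.5) and frequencies from the table.
lower_boundaries = [99.5 + 5 * i for i in range(15)]
frequencies = [4, 14, 60, 138, 236, 298, 380, 450, 500, 430, 260, 128, 66, 28, 12]
class_width = 5

n = sum(frequencies)  # total number of observations
half = n / 2          # position of the median, n/2

# Walk through the classes until the cumulative frequency reaches n/2.
cumulative = 0        # cumulative frequency of the classes before the current one
for lower, f in zip(lower_boundaries, frequencies):
    if cumulative + f >= half:
        # Median class found: apply l + (h / f) * (n/2 - C).
        median = lower + (class_width / f) * (half - cumulative)
        break
    cumulative += f

print(f"n = {n}, median class starts at {lower}, median = {median:.2f}")
```

Under these assumptions, n/2 = 1502 falls in the class 134.5-139.5, and the median comes out at about 138.63.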

**Q.4 a) Discuss the different measures of dispersion. Describe the method of computation of any two of them with suitable examples.**

**Answer:**

**Measures of Dispersion**

Measures of dispersion measure how spread out a set of data is.

For the study of dispersion, we need some measures which show whether the dispersion is small or large. There are two types of measures of dispersion:

- Absolute Measures of Dispersion
- Relative Measures of Dispersion

Absolute Measures of Dispersion

These measures give us an idea about the amount of dispersion in a set of observations. They give the answers in the same units as the units of the original observations. When the observations are in kilograms, the absolute measure is also in kilograms. If we have two sets of observations, we cannot always use the absolute measures to compare their dispersions. We shall explain later as to when the absolute measures can be used for comparison of dispersion in two or more sets of data. The absolute measures which are commonly used are:

- The Range
- The Quartile Deviation
- The Mean Deviation
- The Standard Deviation and Variance


Relative Measures of Dispersion

These measures are calculated for the comparison of dispersion in two or more sets of observations. These measures are free of the units in which the original data is measured. If the original data is in dollars or kilometers, we do not use these units with relative measures of dispersion. These measures are a sort of ratio and are called coefficients. Each absolute measure of dispersion can be converted into its relative measure. Thus the relative measures of dispersion are:

- Coefficient of Range or Coefficient of Dispersion
- Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion
- Coefficient of Mean Deviation or Mean Deviation of Dispersion
- Coefficient of Standard Deviation or Standard Coefficient of Dispersion
- Coefficient of Variation (a special case of the Standard Coefficient of Dispersion)

Methods of Dispersion

There are various methods of measuring the dispersions of a series which can be broadly classified into three categories as under:

1. Method of limits
2. Method of computation
3. Method of graphs

1. Dispersion by the method of limits

Under this method, the dispersion of a series is studied by taking into account the extreme limits of certain factors, viz. values, quartiles, deciles, percentiles, etc. of a series.

The following measures of dispersion come under this method: (i) Range, (ii) Inter-Quartile Range, (iii) Semi-Interquartile Range or Quartile Deviation, (iv) Decile Range, and (v) Percentile Range.

2. Dispersion by the method of computation

Under this method, the dispersal character of a series is studied through the process of computation. The measures in this class include the mean deviation and the standard deviation.

3. Dispersion by the method of graphs

Under this method, the dispersion of a series is studied by drawing certain suitable graphs, viz. the Lorenz curve.

Examples

Range

The simplest method of studying the variation in the distribution is the range. The range is defined as the difference between the largest item and the smallest item in the set of observations. So, in a set of observations if L is the largest item and S is the smallest item, then range is given by

$$\text{Range} = L - S$$


In a grouped frequency distribution, range is the difference between the upper limit of the largest class and lower limit of the smallest class.

The range is an absolute measure of dispersion. It cannot be used to compare two distributions with different units.
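The definition of the range can be sketched in a few lines. The observations are hypothetical. The last line also computes the coefficient of range in its common textbook form, (L - S) / (L + S), which is unit-free; that formula is stated here as an assumption because the source names the coefficient without defining it.

```python
# Hypothetical observations (not from the assignment).
observations = [12, 7, 3, 15, 9]

largest = max(observations)    # L, the largest item
smallest = min(observations)   # S, the smallest item

data_range = largest - smallest  # absolute measure, in the units of the data

# Relative counterpart (assumed textbook form): a pure ratio without units.
coefficient_of_range = (largest - smallest) / (largest + smallest)

print(data_range, round(coefficient_of_range, 3))
```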

Semi-Interquartile range or Quartile deviation

The measure of dispersion depending upon the lower and upper quartiles is known as the quartile deviation. The difference between the upper and lower quartile is known as the Interquartile range. Half the interquartile range is known as Semi-interquartile range or quartile deviation.

$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$$
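A short sketch of the quartile deviation follows. The data are hypothetical, and the quartiles are taken from `statistics.quantiles` with its default "exclusive" method, which places the quartiles at positions (n + 1)/4 and 3(n + 1)/4 in the sorted data. Other quartile conventions exist and can give slightly different values, so this is one possible computation rather than the only one.

```python
import statistics

# Hypothetical, already sorted observations (not from the assignment).
data = [2, 4, 6, 8, 10, 12, 14]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2  # semi-interquartile range

print(q1, q3, interquartile_range, quartile_deviation)
```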

Mean Deviation (Average Deviation)

Mean deviation is defined as the arithmetic mean of the deviations of the items from the mean, median or mode, when all deviations are considered positive.

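$$\text{M.D. from mean} = \frac{\sum |x - \bar{x}|}{n} = \frac{\sum |d|}{n}$$

Also, for a frequency distribution:

$$\text{M.D. from mean} = \frac{\sum f\,|x - \bar{x}|}{N} = \frac{\sum f\,|d|}{N}$$

where $d = x - \bar{x}$ and $N = \sum f$.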
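The mean deviation about the mean, as defined above, can be sketched for both raw data and a frequency distribution. All numbers below are hypothetical, since the marks for part (b) are not reproduced in the source.

```python
# Ungrouped case: hypothetical examination marks.
marks = [40, 50, 60, 70, 80]
mean_marks = sum(marks) / len(marks)
# Average of the absolute deviations from the mean.
md_ungrouped = sum(abs(x - mean_marks) for x in marks) / len(marks)

# Frequency distribution: hypothetical values x with frequencies f.
values = [10, 20, 30]
freqs = [2, 5, 3]
total = sum(freqs)  # N, the sum of the frequencies
mean_grouped = sum(f * x for x, f in zip(values, freqs)) / total
# Each absolute deviation is weighted by its frequency.
md_grouped = sum(f * abs(x - mean_grouped) for x, f in zip(values, freqs)) / total

print(md_ungrouped, md_grouped)
```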

b) Estimate the mean deviation from the arithmetic mean of the following set of examination marks.

**Q.4 a) Define variance and standard deviation. Describe their properties.**

**Answer:**

**Variance** is a term used in probability theory as well as in statistics. It measures the spread of the data points in a given data set, describing how far the data values lie from their mean. As the name suggests, variance may be regarded as a measure of the degree of variation in the data. The concept of variance is explained below.

**Variance**

Variance is a measurement of the spread between numbers in a data set. The variance measures how far each number in the set is from the mean. Variance is calculated by taking the differences between each number in the set and the mean, squaring the differences (to make them positive) and dividing the sum of the squares by the number of values in the set.

Variance Formula

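Variance is calculated using the following formula:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

where $x$ is each value in the set, $\mu$ is the mean of the set and $N$ is the number of values. The standard deviation $\sigma$ is the positive square root of the variance.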
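The steps described above (take the differences from the mean, square them, and divide the sum of squares by the number of values) can be sketched as follows. The data set is hypothetical. The result is cross-checked against `statistics.pvariance` and `statistics.pstdev`, which use the same divide-by-N (population) definition; a sample variance would divide by N - 1 instead.

```python
import statistics

# Hypothetical data set (not from the assignment).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Squared differences between each value and the mean.
squared_differences = [(x - mean) ** 2 for x in data]

# Population variance: sum of squares divided by the number of values.
variance = sum(squared_differences) / len(data)

# Standard deviation: positive square root of the variance.
std_dev = variance ** 0.5

print(variance, std_dev)
print(statistics.pvariance(data), statistics.pstdev(data))
```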

**Q.5 a) What do you mean by absolute and relative measures of dispersion? State the uses of the coefficient of variation in statistical analysis.**

**Answer:**

Absolute Measures of Dispersion

These measures give us an idea about the amount of dispersion in a set of observations. They give the answers in the same units as the units of the original observations. When the observations are in kilograms, the absolute measure is also in kilograms. If we have two sets of observations, we cannot always use the absolute measures to compare their dispersions. We shall explain later when the absolute measures can be used for comparison of dispersion in two or more sets of data. The absolute measures which are commonly used are:

- The Range
- The Quartile Deviation
- The Mean Deviation
- The Standard Deviation and Variance

Relative Measures of Dispersion

These measures are calculated for the comparison of dispersion in two or more sets of observations. These measures are free of the units in which the original data is measured. If the original data is in dollars or kilometers, we do not use these units with relative measures of dispersion. These measures are a sort of ratio and are called coefficients. Each absolute measure of dispersion can be converted into its relative measure. Thus the relative measures of dispersion are:

- Coefficient of Range or Coefficient of Dispersion
- Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion
- Coefficient of Mean Deviation or Mean Deviation of Dispersion
- Coefficient of Standard Deviation or Standard Coefficient of Dispersion
- Coefficient of Variation (a special case of the Standard Coefficient of Dispersion)

Uses of Coefficient of Variation

Coefficient of variation is used to know the consistency of the data. By consistency we mean the uniformity in the values of the data/distribution from the arithmetic mean of the data/distribution. A distribution with a smaller C.V than the other is taken as more consistent than the other.

C.V is also very useful when comparing two or more sets of data that are measured in different units of measurement.

The coefficient of variation of the observations is used to describe the level of variability within a population independently of the absolute values of the observations. If absolute values are similar, populations can be compared using their standard deviations. But if they differ markedly (for example, the weights of mice and elephants), or are of different variables (for example, weight and height), then you need to use a standardized measure such as the coefficient of variation. The coefficient of variation (CV) for a sample is the standard deviation of the observations divided by the mean. The most common use of the coefficient of variation is to assess the precision of a technique. It is also used as a measure of variability when the standard deviation is proportional to the mean, and as a means to compare the variability of measurements made in different units.
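The mice-and-elephants comparison can be illustrated with a short sketch. The weights are hypothetical, and the helper `coefficient_of_variation` is an illustrative name rather than a library function. It uses the sample standard deviation (`statistics.stdev`) divided by the mean and expresses the result as a percentage.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical weights, measured in different units.
mice_grams = [18, 20, 22, 24, 26]
elephants_kg = [4000, 4500, 5000, 5500, 6000]

cv_mice = coefficient_of_variation(mice_grams)
cv_elephants = coefficient_of_variation(elephants_kg)

print(f"mice:      sd={statistics.stdev(mice_grams):.2f} g,  CV={cv_mice:.2f}%")
print(f"elephants: sd={statistics.stdev(elephants_kg):.2f} kg, CV={cv_elephants:.2f}%")
```

The two standard deviations cannot be compared directly because they differ in both scale and units, whereas the two CV values are unit-free. In this made-up data the mice have the smaller CV and would therefore be taken as the more consistent group.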