Name: _______Uguz Ghany ______________ ID number: ______16203548 ___________

Applied Statistics 161.111

Assignment 1

Due date: Friday 24 April 2020

Total marks: 50

The population data

The population we are considering for this assignment is the 10,000 kuku (New Zealand green–lipped mussels) growing in a mussel farm in the Marlborough Sounds. The variables of interest are the length of the kuku (in millimetres), grade (small, medium or large) and sex (male or female).

Each kuku (mussel) has a unique ID. The population consists of:

· 1948 large kuku with ID numbers from 1 to 1948.

· 4457 medium kuku with ID numbers from 1949 to 6405.

· 3595 small kuku with ID numbers from 6406 to 10000.

You do not have access to the population data. You will collect a sample using the following steps.

· Click on the following link to open a Shiny app in your web-browser and type your ID number into the “Student ID:” box.

· http://shiny.massey+.ac.nz/dleader/DataDownload111/

· This will generate an Excel file containing ID, length, grade and sex for each of the kuku in your sample. This is your sample data. The file you have downloaded is unique to you.

· Keep an electronic copy of your sample data for use in Assignment 2.

· Attach a copy of your sample data to the appendix.

· Use your sample data to answer the following questions in the spaces provided. You can re-size the answer spaces.

· Use Excel and incorporate the Excel output into your answers in the document below.

Part A: The Sampling Method [11 marks]

1. Your sample has been generated using the stratified random sampling method, where the kuku grades were used as the strata. Describe the process behind taking the sample with enough detail that someone could use your description to collect a sample themselves. [7 marks]

Stratified random sampling is a sampling method that involves dividing the population into smaller subgroups known as strata. The strata are formed based on an attribute or characteristic that the members of each subgroup share.

After dividing the population into strata, a random sample is selected from each stratum, and these selections together form the sample used in the study.

For example, let the population consist of N kuku in total. The population is divided into K strata, with the ith stratum consisting of N_i kuku. Once the strata have been defined, a random sample of n_i kuku is selected from the ith stratum, and the combined selections are used for the study.

Talk about the actual stratified sampling method for the kuku, not N_i or whatever you talked about in the second paragraph.
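The process can be sketched in Python for this population. This is only an illustration, not the method the Shiny app actually uses: the ID ranges come from the population description, but the total sample size of 100 split proportionally across the grades, and the function names, are assumptions made for the example.

```python
import random

# Strata defined by the ID ranges given in the population description.
STRATA = {
    "large": range(1, 1949),      # IDs 1 to 1948
    "medium": range(1949, 6406),  # IDs 1949 to 6405
    "small": range(6406, 10001),  # IDs 6406 to 10000
}

def proportional_allocation(strata, total_n):
    """Assumed allocation: each stratum's share of the sample matches its
    share of the population. Rounding may not always sum to total_n."""
    population = sum(len(ids) for ids in strata.values())
    return {name: round(total_n * len(ids) / population)
            for name, ids in strata.items()}

def stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample (without replacement) inside each stratum."""
    rng = random.Random(seed)
    sizes = proportional_allocation(strata, total_n)
    return {name: sorted(rng.sample(strata[name], sizes[name]))
            for name in strata}

sizes = proportional_allocation(STRATA, total_n=100)
sample = stratified_sample(STRATA, total_n=100, seed=1)
print(sizes)
```

Under these assumptions the allocation is 19 large, 45 medium and 36 small kuku. The selected IDs would then be looked up to record each kuku's length, grade and sex.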

2. Why is stratified random sampling an appropriate method to use with this population? [3 marks]

The method is appropriate because kuku within the same grade are similar in length, so the variance within each stratum is small, and the objective of the study is to estimate the population parameters with high precision. The method is also appropriate because the population consists of three clear groups (large, medium and small kuku). Stratifying by grade ensures that every group is represented in the sample.

3. Paste a copy of your data in the appendix. [1 mark]

Part B: Exploratory analysis on categorical data [9 marks]

1. Use Excel to produce a table of counts for the sex of the kuku in your sample. [2 marks]

Sex of kuku table of counts

Sex       Count
Male      53
Female    47
Total     100

2. What proportion of your sample are female kuku? [2 marks]

There are 47 female kuku in the sample of 100 kuku. Thus, the proportion of female kuku is

47/100 = 0.47 (47%)
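The same calculation can be checked with a few lines of Python. This is a minimal sketch using the counts from the table above; the variable names are illustrative only.

```python
# Counts taken from the table of counts for sex.
counts = {"Male": 53, "Female": 47}

total = sum(counts.values())
proportion_female = counts["Female"] / total

print(f"Proportion female: {proportion_female:.2f}")
```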

3. Use Excel to draw an appropriate graph to display the table you created above. [4 marks]

4. What does your graph tell you? [1 mark]

From the graph, we can observe that there are more male kuku than female kuku in the sample.

Because the graph shows only the counts for each sex, it cannot tell us whether male kuku are generally longer than female kuku.

As there are only two categories to observe (male and female kuku), we cannot comment on the centre, spread, shape, or outliers.

Part C: Exploratory analysis on numerical data [15 marks]

1. Use Excel to draw a boxplot of kuku lengths. [3 marks]

2. Use Excel to draw a histogram of kuku lengths. [4 marks]

Paste your graph here

3. Use Excel to calculate the numerical summaries of the kuku lengths. Fill in the table with the values rounded to 2 decimal places. [4 marks]

Kuku Lengths

Statistic             Value
Sample size           100
Mean                  100.41
Standard deviation    47.62
Minimum               11.9
Lower Quartile        66.95
Median                92.3
Upper Quartile        122.55
Maximum               234.8

4. What do your plots and summary statistics tell you about kuku lengths in your sample? Consider the centre, spread, shape and outliers. [4 marks]

The lengths of the kuku are centred at a median of 92.3 mm, with a mean of 100.41 mm.

The range is 222.9 mm (from 11.9 mm to 234.8 mm), which tells us the lengths are widely spread.

The histogram of the lengths is positively skewed with one peak, with most of the lengths concentrated at the left (35mm).

There are three outliers in the kuku lengths: 234.8 mm, 216.1 mm, and 212.7 mm.
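The spread and outlier statements can be checked from the summary table alone. The sketch below assumes the common 1.5 x IQR boxplot rule for flagging outliers, which the assignment does not state explicitly. The quartiles and the three reported values are taken from the answers above.

```python
# Quartiles and extremes from the numerical summary table (mm).
q1, q3 = 66.95, 122.55
minimum, maximum = 11.9, 234.8

iqr = q3 - q1                   # interquartile range
data_range = maximum - minimum  # overall range

# Assumed rule: points beyond 1.5 x IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

reported = [234.8, 216.1, 212.7]
flagged = [x for x in reported if x < lower_fence or x > upper_fence]

print(f"IQR = {iqr:.2f} mm, range = {data_range:.1f} mm")
print(f"Fences: {lower_fence:.2f} mm to {upper_fence:.2f} mm")
print(f"Flagged as outliers: {flagged}")
```

Under this rule the upper fence is 205.95 mm, so all three reported lengths fall outside it, while the minimum of 11.9 mm lies inside the lower fence.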

Part D: Confidence interval for the mean [15 marks]

1. Calculate a 95% confidence interval for the mean length of kuku on the Marlborough Sounds mussel farm. To get full marks you must show your working for the following: [7 marks]

Standard error = standard deviation / √(sample size)

= 47.62 / √100

SE = 4.762 mm

Margin of error = multiplier x SE. Excel reports a margin of error of 9.45 mm; the limits below use the approximate multiplier 2 instead.

= 2 x 4.762 = 9.524 mm

Lower limit = calculated by subtracting the margin of error from the mean length

= 100.41 – 4.762 x 2

= 90.886 mm

Upper limit = calculated by adding the margin of error to the mean length

= 100.41 + 4.762 x 2

= 109.934 mm

The 95% confidence interval is from 90.886 mm to 109.934 mm.
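The working above can be reproduced in Python as a check on the arithmetic. This is a minimal sketch that assumes the approximate multiplier of 2 used in the limits. The mean, standard deviation and sample size come from the summary table, and the variable names are illustrative only.

```python
import math

# Values from the numerical summary table.
mean = 100.41  # sample mean length (mm)
sd = 47.62     # sample standard deviation (mm)
n = 100        # sample size

multiplier = 2  # assumed approximate multiplier for 95% confidence

se = sd / math.sqrt(n)   # standard error of the mean
margin = multiplier * se # margin of error
lower = mean - margin
upper = mean + margin

print(f"SE = {se:.3f} mm, margin of error = {margin:.3f} mm")
print(f"95% CI: {lower:.3f} mm to {upper:.3f} mm")
```

With a multiplier of 2 the margin of error is 9.524 mm. The margin of 9.45 mm reported by Excel corresponds to a slightly smaller multiplier of about 1.98.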

2. Write a sentence to interpret your confidence interval in context. [4 marks]

We are 95% confident that the interval from 90.886 mm to 109.934 mm captures the true mean length of kuku on the Marlborough Sounds mussel farm.

3. Are the conditions for this confidence interval met? Explain. [2 marks]

Yes, the conditions are met.

This is because the sample was collected using a random sampling method (stratified random sampling).

Also, the sample size of 100 kuku is larger than 30, so the sampling distribution of the sample mean is approximately normal.

4. Before the data was collected the manager of the mussel farm claimed that the average length of the kuku on the farm is 100mm. Does your confidence interval support the manager’s claim? Explain. [2 marks]

Yes, the confidence interval supports the manager’s claim, since the claimed mean of 100 mm lies between the lower and upper limits of the confidence interval. The claim is therefore plausible, although the interval does not prove that the mean is exactly 100 mm.

