De-Mystifying Statistical Sampling: Extrapolation and Interpretation

What Litigators Should Know About Statistical Sampling in Labor and Employment Disputes

With statistical sampling, counsel can simplify damage analyses, avoid potential issues with incomplete or missing data, and minimize the risk of error.

Questions Counsel Should Ask After the Analysis Has Been Conducted:

  • What are sample statistics?
  • How do those sample statistics relate to the population?
  • How do I extrapolate and interpret the results?

Recap From Last Time
In our prior article, we discussed sample sizes and how they are derived based on the desired margin of error (MOE) and confidence level, what statistical inference means, and different types of sampling methodologies. The goal of statistical inference is to use the sample to estimate features of the population, and the confidence and precision (i.e., MOE) of those estimates are a function of the number of observations in the population and in the sample.
Now, we address how to extrapolate the results and how to interpret the MOE and confidence level of the sample.

A Simple Case Study
Let’s assume a company is being sued for violating meal and rest break statutes in California and only has records on physical paper timesheets. In working with the company, we know that there are approximately 30,000 work shifts in the population and that employees, on average, earn $20 an hour. To estimate the number of meal period violations in the population, we drew a random sample of 380 shifts for analysis, which gives us an MOE of 5.0% (i.e., +/- 5.0 percentage points) at a 95% confidence level. As noted in our prior article, this means that if we drew 100 random samples of 380 observations each, the true population value would fall within the margin of error of the sample estimate in roughly 95 of those 100 samples.
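
As an illustration, below is a minimal Python sketch of one common way to derive this sample size: the normal-approximation formula with a finite population correction, assuming the most conservative proportion of 0.5. The specific formula and code are illustrative rather than the only way to set a sample size, but they reproduce the 380 shifts used here.

```python
import math

def required_sample_size(population, moe, z=1.96, p=0.5):
    """Sample size for estimating a proportion with a given margin of error.

    Uses the normal-approximation formula n0 = z^2 * p * (1 - p) / moe^2,
    then applies a finite population correction and rounds up.
    """
    n0 = (z ** 2) * p * (1 - p) / (moe ** 2)   # size ignoring population limit
    n = n0 / (1 + (n0 - 1) / population)       # finite population correction
    return math.ceil(n)

print(required_sample_size(population=30_000, moe=0.05))  # -> 380
```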

Sample Statistics

We analyze the 380 sampled shifts and determine that 190 of them, or 50%, have a meal period violation. With an MOE of 5.0 percentage points, we can infer that between 45% and 55% of shifts in the population have a meal period violation. In other words, we estimate that the number of shifts with a meal period violation in the population is 15,000 +/- 1,500, which implies an estimated range of 13,500 to 16,500 violations.

To then estimate potential exposure for meal period violations, we take the lower and upper bounds of that estimate and multiply each by the average hourly wage, since each violation carries a premium of one additional hour of pay; in this case, that yields approximately $270,000 to $330,000.
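
Putting the case study numbers together, the short Python sketch below walks through the extrapolation and exposure calculation described above; the violation counts, wage rate, and one-hour premium simply mirror the assumptions in this example.

```python
POPULATION = 30_000        # total work shifts
SAMPLE_SIZE = 380          # shifts reviewed
VIOLATIONS_IN_SAMPLE = 190
MOE = 0.05                 # +/- 5 percentage points at 95% confidence
AVG_HOURLY_WAGE = 20.00    # one hour of premium pay per violation

violation_rate = VIOLATIONS_IN_SAMPLE / SAMPLE_SIZE        # 0.50
low_rate, high_rate = violation_rate - MOE, violation_rate + MOE

# Extrapolate the sample violation rate to the 30,000-shift population.
low_count = low_rate * POPULATION                           # 13,500 shifts
high_count = high_rate * POPULATION                         # 16,500 shifts

# Potential exposure: one hour of pay per violating shift.
print(f"Estimated violations: {low_count:,.0f} to {high_count:,.0f}")
print(f"Estimated exposure:   ${low_count * AVG_HOURLY_WAGE:,.0f} "
      f"to ${high_count * AVG_HOURLY_WAGE:,.0f}")
```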

Takeaway

Statistical sampling, when done properly, allows for a simplified analysis while producing reliable results. In the above case study, we demonstrated that a sample that is small relative to the population provides an estimated range of values based on the pre-determined confidence level and MOE. The methodology also saves considerable cost and effort because we analyzed only 380 shifts instead of all 30,000.

Courts and regulatory agencies alike have acknowledged and allowed the use of statistical sampling in situations where data may be too voluminous, incomplete, or unorganized to analyze in its entirety. We should note that advances in artificial intelligence (AI) have made it possible, in principle, to process and review handwritten timesheets, employment documents such as waiver forms, and other types of information. However, these documents are often not entirely legible and contain annotations, cross-outs, and other contextual information in disparate places, and we have found that AI has had issues with this type of task. In other types of analyses, such as time and motion studies, AI may assist with portions of the work, for example, using computer vision to review footage and record timestamps of employee activity, but statistical sampling would still be needed to identify the times, locations, and/or employees to review.

Additionally, the data could simply be too large to analyze every single row. Imagine an analysis of an off-the-clock claim for a large retailer that uses point-of-sale data to identify transactions outside of clocked hours. This type of data could easily run to tens or hundreds of billions of rows. Although there are database systems that can handle data of this volume, doing so is not necessary when statistical sampling is a viable option that produces accurate results.

Litigators should always consider the option of statistical sampling in future labor and employment class action cases as data volumes continue to grow and companies continue to adopt varying technologies to capture relevant employee data.
