This article was originally published in NALSM VIEWS, the publication of the National Association of Litigation Support Managers, Spring 2004.
Understanding Quality Measurement in the Legal Environment
As case complexity increases, outsourcing (arguably) enables law firms to improve efficiency, decrease costs and use specialized skills that may not be available on smaller cases. On the other hand, firms lose the craftsman’s control they exercise on smaller, less complex matters. To maintain control of these complex, often partially outsourced cases, suppliers and consumers of legal services need a way to measure the quality of their work. This enables firms to maximize outsourcing efficiencies while maintaining the level of excellence clients expect. To this end, I am presenting one simple, tested method for determining quality, available to both the managing attorney and the supplier of legal work. It produces a fact-based, reproducible measurement used to determine whether the outsourced product meets the given standard of excellence or whether the firm is dealing with a substandard product that needs to be reworked.
Before we start, I would like to address attitude and the desire to excel. I doubt there is a provider of legal data who knowingly produces anything less than perfect workmanship. Yet in this world of imperfection, we know errors are inevitable. We know that humans fall asleep and that machines misread text. Because errors are inevitable, legal professionals need practical quality measurement techniques to understand the true quality of the work they produce and use.
There are a number of ways to do this wrong, all generally listed under the category of Quality Control, or QC as it is generally known. Under this banner, well-intentioned firms advocate and implement expensive, inadequate, and occasionally frivolous document inspections, data checks and overall hand wringing that achieve no defensible objective. One very common method of non-productive QC is to check ‘a bunch’ of the work to make sure it is right. Worse is the firm that employs a specialized person to check all or some of the work while having no prescribed technique to determine what to inspect, how much to inspect, and whether to accept or reject the work in front of them. In both cases time, effort and money flow freely down a drain, while no value is added to the product and no meaningful work is done.
Having taken this shot at my well-intentioned colleagues, it is time to describe a system that data providers and consumers can use to prove that they have produced the high-quality work promised. Additionally, where conflict arises, it can be used to defend or refute work quality either in court or at the bargaining table. Having promised to deliver, let me introduce, or perhaps reintroduce, the legal industry to one of the oldest, simplest methods of measuring work quality, namely MIL-STD-105.
MIL-STD-105 (pronounced mil standard 105) has been a manufacturing standard since World War II. Moreover, since it is straightforward and easy to implement, it maintains a stalwart position in modern MBA operational science courses, along with the more theoretical and statistically complex theories that make this simple system work.
MIL-STD-105 has two defining features:
1. It is easy to use.
2. It is repeatable and defendable.
Originally the standard was designed to create a non-arbitrary, meaningful and repeatable method of determining whether products supplied to the US Armed Services met agreed quality standards. As in many situations, the services distilled the complexity of quality measurement into two simple tables that are both easy to use and technically concise. However, to implement this standard, several terms need to be defined.
With this standard, there are no "re-do’s" and no "maybe I should inspect a few more". The test is statistically sound and designed in such a way that a minimum number of samples (that is, a known, minimum cost) can provide an accurate and meaningful description of the overall quality of the work on hand. This does not mean that attorneys will not debate the issue. (Heaven forbid, for those of us who support you.) Rather, it provides a concrete, reproducible test that both the consumer of data and the provider of data can implement to assure themselves that they are producing the quality of work expected and advertised.
Implementing MIL-STD-105, A Case Study
Having reviewed the merits of using MIL-STD-105, I would like to create a simple, realistic case study that we can use to learn how to implement the tool. As an example, let’s assume the following:
Using our example, before we review the first item, we need to make a tactical choice: 1) do we count each field individually, or 2) do we count documents? In the first case our batch size would be 5 fields * 15,232 documents, or 76,160 items in the batch. In the second, the batch size is 15,232 documents. In our example, I am really interested in how well each document is coded, so I am going to choose to view the document pool as the batch. Having made this decision, I am now required to look at each document in its entirety, meaning that we as the inspection team will need to verify that each of the 5 required fields is coded correctly. If any field is coded incorrectly, then I need to reject the document. If they are all correct, then the document passes. Similar reasoning could be used to view each field as a single entity, in which case each incorrect field would count as one error.
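The two counting choices and the document pass/fail rule can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not part of MIL-STD-105; the field names are hypothetical placeholders for the 5 required fields.

```python
# Hypothetical names for the 5 required coded fields (assumed for illustration).
REQUIRED_FIELDS = ["date", "author", "recipient", "doc_type", "title"]
DOCUMENTS = 15_232

# Choice 1: every field is an item. Choice 2: every document is an item.
batch_size_by_field = len(REQUIRED_FIELDS) * DOCUMENTS
batch_size_by_document = DOCUMENTS


def document_passes(field_is_correct):
    """Return True only if every required field was coded correctly.

    field_is_correct maps each field name to the inspector's verdict.
    A field with no recorded verdict is treated as an error.
    """
    return all(field_is_correct.get(name, False) for name in REQUIRED_FIELDS)


print(batch_size_by_field)     # 76160
print(batch_size_by_document)  # 15232
```

Counting by document means one bad field is enough to fail the whole document, which is the stricter of the two views.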
Determining the Correct Inspection Level
To determine how many documents to inspect, we need to establish our inspection level. In Table I, you will notice there are a variety of inspection levels available, including three general inspection levels and four special levels. As in many cases, we will start in the middle and work outwards. According to the American Society for Quality (ASQ), level II, called normal inspection, is appropriate for 1) unknown suppliers and 2) suppliers of modest quality. Consequently, we will use this level in our example; however, the standard is designed to be fluid, so inspection levels may change as the quality of the work changes over time. For example, the ASQ states that if 10 lots are inspected with no errors, then sampling can be reduced from normal level II to a reduced level I. On the other hand, if two of 5 jobs are rejected, then inspection should be tightened from level II to level III until 5 consecutive jobs are accepted. Finally, if any job is rejected at level I, inspection automatically returns to level II. The other levels, the special levels S1 through S4, are for very small jobs; in our case, we would be doing such work in-house, probably reviewing everything in its entirety rather than sampling or coding.
Knowing the batch size, in our case 15,232, and our inspection level, normal level II, we use Table I to determine the number of samples we need to inspect. We find this by following down the left-hand column until we find that 15,232 falls between 15,000 and 500,000. Reading across the top of the table, we find general inspection level II. Locating the intersection of our row and column, we determine that our sample size code letter is "P". So how many is "P"? To answer this, we need to make one more lookup. With "P" written down on the back of our hand, we go to Table II to determine the sample size as a real number. Following down the left-hand column, we find "P". Just to the right of P, we see that we need to look at 800 documents.
At this point we can see why "choose a few" is totally inadequate as a QC measure. In my experience, very few firms would actually review 800 documents to prove their work is really 99.85% accurate. Most likely they would review a few score and call it a day. Yet to demonstrate the kind of accuracy demanded, 800 is the sample size required. On the other hand, other firms might attempt to inspect all 15,000 documents. Here, 800 is far fewer than 15,000, and it is just as reliable an inspection for a number of reasons, chief among them inspection fatigue.
Having learned our sample size, and knowing that we are looking for 99.85% accuracy (an acceptable quality level, or AQL, of 0.15%), we read across Table II-A to find that in those 800 documents, we can find as many as 3 with errors and still accept the job; however, if we find 4, then we cannot state with any certainty that this job is good enough, and the whole job needs to be redone and resubmitted. The lines pointing up and down indicate that if we are reading across and do not find a number in our row, we skip up or down to the numbers provided. This allows one simple table to handle the widest possible range of AQL values and sample sizes and still remain uncluttered.
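The level changes described in this paragraph can be written down as a small rule. The Python sketch below simply encodes the paragraph as stated here; the function name and the way history is passed in are assumptions made for illustration, and the standard's own switching rules should be consulted before relying on it.

```python
def next_level(level, results_at_level):
    """Suggest the inspection level for the next job.

    level: "I", "II" or "III".
    results_at_level: True (accepted) / False (rejected) for each job inspected
    since the current level took effect, oldest first, latest job last.
    """
    if level == "I":
        # Any rejection at the reduced level returns us to level II.
        rejected_last = bool(results_at_level) and not results_at_level[-1]
        return "II" if rejected_last else "I"
    if level == "III":
        # Stay tightened until 5 consecutive jobs are accepted.
        last_five = results_at_level[-5:]
        return "II" if len(last_five) == 5 and all(last_five) else "III"
    # Level II: tighten if two of the last 5 jobs were rejected ...
    if results_at_level[-5:].count(False) >= 2:
        return "III"
    # ... or relax after 10 jobs in a row with no rejection.
    last_ten = results_at_level[-10:]
    if len(last_ten) == 10 and all(last_ten):
        return "I"
    return "II"


print(next_level("II", [True] * 10))                       # I
print(next_level("II", [True, False, True, False, True]))  # III
```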
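A quick calculation illustrates why a few score is not enough. The Python sketch below assumes, purely for illustration, a batch in which 1% of the documents contain an error (a figure not taken from this case study) and that documents are drawn independently at random.

```python
def chance_of_seeing_no_errors(error_rate, sample_size):
    """Probability that a random sample contains zero defective documents."""
    return (1.0 - error_rate) ** sample_size


# Compare a "few score" check with the 800-document sample.
for n in (40, 800):
    p = chance_of_seeing_no_errors(0.01, n)
    print(f"sample of {n}: {p:.1%} chance of finding no errors")
```

Under these assumed inputs, a 40-document check comes back clean roughly two times in three, even though the batch falls well short of 99.85% accuracy.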
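The accept/reject decision itself is a one-line comparison, sketched below in Python using the numbers from this example (800 samples, accept on 3, reject on 4). The second function is an added illustration, not part of the standard's tables: it uses a binomial model, which is an assumption, to estimate how often a batch with a given true error rate would pass this plan.

```python
from math import comb

SAMPLE_SIZE = 800    # code letter "P", from the lookup described above
ACCEPT_NUMBER = 3    # accept the job with up to 3 defective documents
REJECT_NUMBER = 4    # reject the job at 4 or more


def accept_job(defective_found):
    """Apply the accept/reject rule to the number of defective documents found."""
    if defective_found < 0:
        raise ValueError("defective_found cannot be negative")
    return defective_found <= ACCEPT_NUMBER


def probability_of_acceptance(true_error_rate, n=SAMPLE_SIZE, accept=ACCEPT_NUMBER):
    """Binomial estimate of the chance that a sample of n shows at most `accept` defects."""
    return sum(
        comb(n, k) * true_error_rate**k * (1 - true_error_rate) ** (n - k)
        for k in range(accept + 1)
    )


print(accept_job(3))  # True
print(accept_job(4))  # False
print(f"{probability_of_acceptance(0.0015):.3f}")  # batch exactly at 99.85% accuracy
print(f"{probability_of_acceptance(0.01):.3f}")    # batch at 99% accuracy
```

Under the binomial assumption, a batch that truly meets 99.85% accuracy passes nearly every time, while a batch at 99% accuracy rarely does.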
Inspecting the Job
Knowing our sample size, the job of inspecting is very simple.
At this point we have completed our review of MIL-STD-105; however, there are two matters left: gathering a truly random sample and record-keeping. Our first impulse may be to simply grab 800 documents, but this almost always favors some attribute or person (like sampling only the banker’s boxes on top and in the aisle). To prevent this unintentional skewing of the results, there are a couple of simple ways to get random numbers. Before computers, random number tables were commonly available. (Perhaps they still are.) But today, with computers on almost every desktop, generating a custom random list is a fairly simple task that we can do ourselves.
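Outside a spreadsheet, the same idea can be sketched in Python with the standard library. The function name and the fixed seed are assumptions for illustration; the seed is recorded only so that the same list can be regenerated later as part of the inspection record.

```python
import random


def pick_documents(batch_size, sample_size, seed=None):
    """Return a sorted list of distinct document numbers chosen at random.

    Documents are assumed to be numbered 1..batch_size.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, batch_size + 1), sample_size))


# 800 of the 15,232 documents in the example batch.
to_inspect = pick_documents(15_232, 800, seed=2004)
print(len(to_inspect))
print(to_inspect[:10])
```

Because every document number has the same chance of being chosen, no box, shelf or coder is favored.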
Creating a Random Number Table
There are a number of ways to create random number tables, but you can use the description below to create a custom random number table in MS Excel. Admittedly it is not perfect, but it is certainly good enough. Let’s begin.
Using Microsoft Excel:
We now have a valid random number table. The remaining steps are optional but helpful in creating a functional spreadsheet that we can use as an inspection document:
Record-Keeping
Having gone to the trouble of creating and formatting a random list, we might as well use it to record a few numbers. This will provide meaningful traceability and a way to prove that we did what we said. In our case, we already have the Document number 1-800. In another case this might be a Bates number or a DocID. In all cases it should be the unique identifier that tells us precisely which documents we reviewed. From here we could add:
Meaningful Results
Having taken all of these steps (pulling the applicable number of documents at random, reviewing the documents and recording our results), we now stand ready to state with authority that the work we are producing or purchasing meets a known quality standard. If there is a question about whether the job should be reworked, the consumer and the producer of the data can review the test documents and see the exact results of the original testing. Finally, and hopefully in most cases, both the consumer and producer of the legal service will have a meaningful, repeatable measurement that proves the work product meets the level of quality expected. This in turn is a vital first step in understanding and improving overall quality for our clients.
Conclusion
The inspection measurement technique reviewed in this article is not the most complete inspection tool available to modern litigation support managers and attorneys. Moreover, there are valid criticisms of using MIL-STD-105. The primary complaint is that it does not in any way improve quality; it simply measures each batch pushed through the process. Still, this standard has enough strength that, after 60 years of implementation, it remains a mainstay of quality measurement for several reasons:
In summary, MIL-STD-105 provides a simple means of measuring quality on a day-to-day basis without involving complex math or training, while at the same time creating a just and reproducible quality measurement system that employees and clients can understand.