We looked at the revised SWGFAST Draft for Comment, Standards for
Minimum Qualifications and Training to Competency for Friction Ridge
Examiner Trainees (Latent/Tenprint).
We now see Drs. Lyn and Ralph Haber's response to my comments on their
written testimony.
From The Detail, Issue 422, September 22, 2009.
Prepared by Ralph and Lyn Haber, October 3, 2009
Issue #422 of the Detail reprinted our testimony of September 9,
2009 to the US Senate in support of the creation of a National Institute
for Forensic Sciences. Appended to that reprint in
the Detail were some comments written by Kasey Wertheim. What follows is
our response to Kasey's concerns. We welcome this
dialogue as a chance to learn about others' reactions, and for others to
learn more about our concerns regarding fingerprint science.
Quality Control of Training, Proficiency, Method, and Laboratory
Our concerns focus on quality control assurances of the training
and experience of forensic examiners, their proficiency in the
performance of forensic comparisons, the accuracy of the method(s) used
to make those comparisons, and the regulations governing the crime
laboratory environments in which they work. Our
examples refer to forensic fingerprint comparisons and examiners, but
(with some exceptions for DNA work) these concerns pertain to all of the
forensic comparison disciplines: fingerprints, blood spatter, hair,
faces, bite marks, tool marks, bullets, tire tracks, footprints, speech,
questioned documents, and handwriting.
Forensic comparison experts, unlike experts in other scientific
disciplines, are not required by their profession to be certified
or board-approved to offer testimony in court. Unlike
experts in other scientific disciplines, forensic comparison
examiners are not required to have their proficiency tested at regular
intervals, and they are permitted to carry out their comparison work in
unaccredited laboratories. Unlike other
scientific disciplines, forensic comparison lacks profession-approved,
standardized manuals on training, procedures, and quality control.
Kasey challenged these statements as overstated, wrong, or worse.
We, like Kasey, know many examiners who are highly trained,
experienced and proficient. We agree that an unknown
number of crime laboratories have written training guidelines, and/or
documented minimum requirements for proceeding from training on to
casework, and/or some sort of operations manual. We are not referring to
individual fingerprint examiners or individual crime laboratories.
We refer to the absence of standardized requirements
for the profession as a whole.
Regulations and policy statements issued by official bodies of the
forensic comparison professions rely on the word "recommended."
There are three problems with that
wording: the content of the recommendation is not "required," it is not
necessarily observed, and there is no procedure in place to enforce it.
Examiners who provide testimony in court, in our experience, may
or may not work in an accredited laboratory; few are proficiency tested
at least annually; few are certified; and they may or may not have
graduated from a formal training program or been taught by a "trained"
trainer. We are not referring to individual examiners
or to individual laboratories. A National Institute
for Forensic Sciences (NIFS) would serve to assure standardized
requirements for the profession as a whole.
The National Academy of Sciences (NAS) report from last spring
observed that all of these requirements were missing, and that all of
them should be in place as conditions for working as a forensic
comparison examiner in a forensic crime laboratory.
Unknown Counts of Forensic Comparison Examiners
and Forensic Laboratories
The NAS noted that no accurate count exists today of the laboratories in
the United States in which forensic
comparisons are performed. Based on data compiled by
the IAI that we discussed in our testimony, the number of crime
laboratories in which forensic comparisons are performed
may exceed the number accredited by 10- or 20-fold.
Similarly, there is no accurate count of the persons performing forensic
comparisons, but the number substantially exceeds the number of IAI
members. Within the IAI membership, approximately 15%
of those who identify themselves as latent print examiners are
certified. Until such counts exist, the ratio
of accredited to non-accredited laboratories and the ratio of
IAI-certified to non-certified examiners are unknown, but both are
substantially less than 15%.
There has been widespread support for the NAS recommendations that all
forensic examiners meet the certification requirements of their
profession, and that all laboratories that perform forensic examinations
be accredited by a single body. The forensic fingerprint profession
needs a National Institute for Forensic Sciences to provide the
assurance that laboratories and examiners are part of a known database,
and that they have met mandatory requirements accepted by the
profession.
Evidence of Reliability and Validity
Kasey asserts that "absence of proof is not proof of absence" in
response to our testimony that there is no evidence of the reliability
or validity of the proficiency tests presently in use, there is no
evidence of the reliability or validity of the certification tests
presently in use, and there is no evidence of the reliability and
validity of the ACE method itself. In the absence of
affirmative evidence, Kasey argues that these may still be reliable and
valid.
First, evidence of reliability and validity is a standard
requirement for any claimed scientific procedure. If
a comparison method is to be used, it is necessary (from the viewpoint
of science) that the reliability of the method be demonstrated
(conclusions are consistent and in agreement), and that the validity of
the method be demonstrated (the test scores agree with ground truth).
These requirements are codified in the Daubert criteria established by
the US Supreme Court for scientific evidence. The same
demands apply to proficiency and certification tests.
Second, if a scientific paper is submitted for publication, the
reliability and validity of every test, every experimental manipulation,
and every scoring procedure must be demonstrated.
Otherwise, the paper is automatically returned to the author as
unsuitable for publication. The publisher does not conclude that the
absence of evidence of reliability might mean the data are reliable: the
publisher demands that the author demonstrate reliability before
publication is even considered. This is
the publisher's quality control assurance that only good science gets
distributed.
If the forensic professions claim to use scientific
methods and scientific quality control procedures, the reliability and
validity of the methods and procedures must be demonstrated.
We claim that there is no evidence to support the
reliability and validity of the ACE method, or proficiency and
certification tests in use. Kasey's appeal to
"absence of proof is not proof of absence" is nonresponsive.
The answer is to provide the evidence.
A proficiency (or certification) test is informative only if the
test is reliable and valid. The test is reliable if
the score received by an examiner can be trusted as the examiner's
consistent and true score, and not one that would be different every
time the examiner takes a comparable test. The
easiest way to measure reliability is to divide the test items (latents
on proficiency tests) into two halves, and compare the score received on
the first half with that from the second half. A
reliable test would show a comparable score for a testee on each half.
A more time-consuming method to measure reliability would be to
make up two versions of the same test (assuring comparability of the
difficulty of the test items), and then compare the scores a testee
received on each version. Both of these are standard
procedures, and provide a standardized measure of the error of
measurement in the final score. If the error of
measurement is very low (both versions produce the same score), then the
test is reliable and trustworthy.
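Both procedures are straightforward to carry out. As a purely hypothetical illustration (our own sketch, not an analysis of any actual test), the short Python example below computes a split-half reliability estimate, with the Spearman-Brown correction, from invented item-level scores; the number of examiners, the number of items, and the skill values are all made up.

import numpy as np

def split_half_reliability(item_scores):
    # Split the items into two halves (odd vs. even positions), total each
    # half for every examiner, correlate the two half-scores, and apply the
    # Spearman-Brown correction to estimate full-length reliability.
    scores = np.asarray(item_scores, dtype=float)
    first_half = scores[:, 0::2].sum(axis=1)
    second_half = scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(first_half, second_half)[0, 1]
    return 2 * r_half / (1 + r_half)

# Hypothetical data: 8 examiners of varying skill, 20 latents each,
# 1 = correct conclusion, 0 = erroneous conclusion.
rng = np.random.default_rng(42)
skill = rng.uniform(0.4, 0.95, size=8)
responses = (rng.random((8, 20)) < skill[:, None]).astype(int)
print(f"Split-half reliability estimate: {split_half_reliability(responses):.2f}")

A reliable test would yield an estimate close to 1.0; a value near zero would mean the two halves of the same test do not even agree with each other.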
We have never seen a measure of reliability of a forensic proficiency or
certification test.
To be valid, a test must first be reliable. Then it
must be demonstrated that the test measures what it is supposed to
measure. If the test is a fingerprint comparison
proficiency test, then high scores should be correlated with other
measures of proficiency recognized by the profession, such as scores on
a valid certification test, supervisor ratings, or years of practice.
If the proficiency test does not correlate highly with these
other measures of proficiency, then the test is not valid.
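As a hypothetical illustration of such a validity check (the numbers below are invented, not results from any real test), one would correlate examiners' proficiency-test scores with an independent criterion measure and examine the size of the correlation.

import numpy as np

# Invented scores for eight examiners on a proficiency test and on an
# independent criterion measure (for example, a certification test).
proficiency = np.array([72, 85, 90, 65, 78, 88, 95, 70])
criterion = np.array([70, 82, 91, 62, 74, 85, 97, 69])

r = np.corrcoef(proficiency, criterion)[0, 1]
print(f"Validity coefficient (Pearson r): {r:.2f}")
# A high correlation supports validity; a low one means the proficiency
# test does not measure what it is supposed to measure.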
We have never seen a measure of validity of a forensic proficiency or
certification test.
Other Evidence of Reliability and Validity of Proficiency and
Certification Tests
The statistical measurements of the reliability and
validity of the CTS proficiency test and the IAI certification test have
never been reported or demonstrated. However, we have
evaluated these tests in detail (see L. Haber and R.N. Haber,
"Challenges to Fingerprints," 2009, published by Lawyers & Judges
Publishing Co., Tucson, AZ), based on evidence published by CTS and
IAI, as well as our analyses of the test construction and
administration. We conclude that the reliability of
both of these tests, if it were measured, would be poor (lower than is
useful for testing purposes), and that the validity of these tests would
be below a scientifically useful standard. We offer a
very brief overview here, listing some of the criteria that predict good
reliability and validity.
A test can be reliable only if the scores of those who are tested range
from most items correct to most items wrong. In
recent years, the CTS tests have had average scores above 90% correct, and
some approach 100% correct. When nearly every testee
achieves the same score, a test is worthless for predictive purposes.
The range of scores on the IAI certification test has never been
disclosed.
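A simple, hypothetical demonstration shows why near-ceiling scoring matters: when nearly everyone scores at or near 100%, the tiny remaining differences among scores are swamped by ordinary measurement error, and the scores no longer track true skill. The skill values and error figures below are invented for illustration only.

import numpy as np

rng = np.random.default_rng(7)
true_skill = np.linspace(0.5, 1.0, 200)       # 200 hypothetical examiner skill levels
noise = rng.normal(0, 2, size=200)            # a few points of measurement error

ceiling_test = np.clip(98 + 2 * true_skill + noise, 0, 100)   # everyone near 100%
spread_test = np.clip(60 + 40 * true_skill + noise, 0, 100)   # scores spread widely

for name, scores in [("Ceiling test", ceiling_test), ("Spread test", spread_test)]:
    r = np.corrcoef(scores, true_skill)[0, 1]
    print(f"{name}: score range {np.ptp(scores):.1f}, correlation with skill {r:.2f}")

The spread-out test tracks the underlying skill closely; the near-ceiling test tracks it far more weakly, even though the same examiners and the same measurement error are involved.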
A test is more likely to be valid the more its content resembles the
tasks being predicted. Neither of the two tests meets
this criterion. For example, most casework results in
exclusions or inconclusive results, but neither of these is a typical
outcome of the proficiency or certification tests; a
substantial part of casework requires AFIS searches (preparation of the
latent for input, and comparison of the original latent to the candidates
returned), but no such requirements occur on these
tests; and casework involves latents of no value or of
great difficulty, but such latents are rarely included on these tests.
A test is most likely to be valid if procedures used on the test mirror
those on the tasks to be predicted. Every CTS
proficiency test fails this criterion. SWGFAST
restricts the examiner conclusions to one of only four possibilities: no
value, exclusion, inconclusive, or identification. However, on the CTS
proficiency tests, only two conclusions are permitted: identification or
not identified. No value and insufficient are not
allowed. Further, the “Not identified” conclusion is
not one of the SWGFAST conclusions. The test
responses do not correspond to normal casework.
A test is most likely to be valid if it is scored in the same way
casework would be scored (if ground truth were known).
The CTS proficiency test fails this criterion.
An unknown number of the CTS tests are taken by a committee of two or
more examiners, who report an identification only if each member of the
committee agrees. In these instances, the scores
reported by CTS are committee scores, not individual scores.
It is easy to show that the individual proficiency of each member
of a committee is much lower than the committee proficiency.
CTS does not distinguish committee answer sheets from individual
ones when reporting its results, which inflates any estimate
of individual proficiency based on these scores.
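To see how much committee scoring can inflate an estimate of individual proficiency, consider a hypothetical two-person committee in which each member, working independently, makes an erroneous identification on 5% of comparisons, and an identification is reported only when both members agree. The 5% figure and the independence assumption are ours, chosen only to illustrate the arithmetic.

# Hypothetical arithmetic: the committee reports an erroneous identification
# only when both members independently make the same error.
individual_error_rate = 0.05                       # invented figure
committee_error_rate = individual_error_rate ** 2  # assumes independent errors

print(f"Individual erroneous-identification rate: {individual_error_rate:.2%}")
print(f"Committee erroneous-identification rate:  {committee_error_rate:.2%}")

Treating the 0.25% committee figure as if it were an individual score would understate each member's error rate by a factor of twenty.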
Another scoring issue concerns the types of responses assessed. “Not
identified” is ambiguous and cannot be scored to assess accuracy: did
the examiner exclude the particular exemplar, or was he unable to reach
a definitive conclusion? Examiner accuracy in no
value and inconclusive conclusions is not assessed.
Only the number of erroneous identifications is scored, yet
identification is probably the least frequent conclusion an examiner
reaches in casework. Further, this scoring procedure
ignores the importance of exclusions in the work of forensic
comparisons.
A test is most likely to be reliable and valid if the individual test
items are homogeneous, or, if not homogeneous, have been subdivided into
distinguishable categories ahead of time. The CTS and
IAI tests both fail this criterion. Homogeneous in
this context means that all of the items in any part of the test are
equivalent in content, difficulty, and required technique.
If the test includes multiple categories of test items,
indicating multiple skills or skill levels, then the items need to be
subdivided accordingly beforehand, so they can be scored separately.
Since there is no measure of the difficulty of a latent print,
there is no measure of the reasons why one print differs from another
print or why different techniques are required.
Consequently, there is no independent way to determine which items to
score together for a particular skill or skill level.
A test is most likely to produce reliable and valid results if the
testees are randomly drawn from the population to be predicted.
The CTS proficiency test fails this criterion.
CTS does not provide information about who takes their test, about how
many times an individual examiner has taken a test, or any measure of
the representativeness of the sample of testees from among all
examiners. The average score on any test means
nothing unless the characteristics of the people being tested are
precisely known. (As an example, suppose one year the
majority of people taking the proficiency test were beginning examiners
and the average scores were low. The next year nearly all
the people taking an equivalent test were very highly trained and the
average score was very high. Unless you knew that the
population of people taking the test had changed, you might incorrectly
attribute the change in average score to improved examiner proficiency
or to easier test items.)
These are a few of the criteria that we used to evaluate the proficiency
and certification tests. Some of the flaws in the
current tests can be corrected with simple changes in content or
procedure. Others require substantial research
programs, such as developing a valid measure of latent print
informativeness and clarity. Kasey asserts that the
Certification boards have standard baselines that are aligned with the
IAI mission. His comment does not address the
failures in the construction, scoring and administration of these tests,
all of which must be addressed if these tests are to be reliable and
valid.
No-Value Latents
Kasey objects to our statement that no-value latents are not used for
comparisons for the purpose of excluding a suspect.
Kasey writes "In many laboratories, impressions that are not suitable
for identification may still be compared to exclude the subject as the
potential donor of the impression." This objection takes our words out
of context and misstates our point. Our point is that
no objective measure of the value of a latent for identification
or for exclusion has been developed, tested, and validated.
It is presently a subjective judgment, resulting in substantial
variation among examiners.
In our book, but not our testimony, we assess in detail the need for an
objective measure of the value of a latent that can be used for
exclusion but not for identification, as well as an objective measure of
the value of a latent that can be used for identification.
We agree with Kasey that "impressions that are not suitable for
identification may still be compared to exclude the subject as a
potential donor…” Again, to make this judgment reliable and valid, there
must be an objective measure of the amount and kind of information
needed in an impression to use it to exclude and/or identify, with a
known probability of error.
Research Documentation of the Accuracy of
Forensic Comparisons is Needed
One of the virtually unanimous reactions by the forensic
professions to the NAS report is agreement that more research is needed.
To fulfill this goal, the forensic comparison professions need the
following.
The collaboration between researchers and crime laboratories to
discuss, design and participate in research on forensic comparison
issues.
The commitment of crime laboratories and their
examiners to partake in research work, and the commitment of research
scientists to work under real-world constraints.
A cadre of researchers trained in experimental
methods who are also interested in and knowledgeable about forensic
issues.
Agreements that protect the anonymity of individual
participants in forensic research while allowing results to be published.
An advocate for research within the forensic professions and
within the governmental agencies responsible for the forensic comparison
professions.
Funding for the research team and for the released
time of examiners in laboratories.
The IAI Science and Practice committees and the FBI SWG
committees have made a number of excellent recommendations that would
answer our research concerns. We testified to Congress in support of
the NIFS because the NIFS would facilitate or make possible the creation
of a research environment within the forensic professions, an
environment not present today. Our goal is the same as Kasey's: to
improve the professions of forensic comparison examinations to meet the
needs of society and the courts. A National Institute for Forensic
Sciences would provide a means of realizing this goal across
laboratories and forensic disciplines.