There is a persistent
misunderstanding about what DOT&E (Director, Operational Test and
Evaluation), the Pentagon’s weapon and system test group, is and does. Unbelievable as it seems,
there are actually some who view DOT&E as an obstacle to
fielding weapon systems. That view can only stem
from a complete failure to understand what DOT&E is and does, so let’s make clear what they do.
Let’s start with what
DOT&E does not do.
- They do not approve or disapprove projects,
weapons, or systems. If they did,
the F-35 and LCS would have been cancelled long ago!
- They do not set criteria or specifications for
weapon systems.
- They do not decide when a system can be
fielded. The Navy routinely fields
systems that have not been completely or successfully evaluated by
DOT&E.
Now, what do they do?
At its simplest, DOT&E
takes specifications that are established
and provided by the Navy and implements test programs to verify that
the system under consideration meets those specifications.
This is important enough to
say again. DOT&E takes
specifications that are established
and provided by the Navy and simply verifies that the system meets
those specifications.
Some people believe that
DOT&E mandates unnecessary tests and becomes an obstacle to progress. Nothing could be further from the truth.
How does DOT&E determine
what tests are needed and how many? DOT&E
testing, as in industry in general, is based on statistics and experimental
design. There is an entire field of
mathematics devoted to figuring out how to set up experiments to obtain the
maximum amount of data from the minimum amount of test runs. I won’t even attempt to describe how this is
done. Suffice it to say that this is
pure mathematics. There is no personal
opinion involved. The experimental
design does not care what the designer thinks of the system being tested. In other words, DOT&E’s opinion, if they
even had one about a given system, does not enter into the design of the
experiment and has no influence on the type or number of tests required. See reference (1) for a brief discussion of
statistical experimental design.
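To make the idea concrete, here is a minimal sketch (a notional, made-up example, not an actual DOT&E test plan, and the factor names are invented) of how a designed experiment sets the number of test runs. Four factors at two levels each would take 16 runs for a full factorial; a standard half-fraction covers the main effects in 8. The run count falls out of the mathematics, not out of anyone’s opinion of the weapon.

```python
# Notional sketch only - factors and levels are invented for illustration.
from itertools import product

factors = ["target_speed", "sea_state", "crossing_angle", "ecm_present"]

# Full factorial over the first three factors: every combination of
# low (-1) and high (+1) levels.
base_runs = list(product([-1, +1], repeat=3))

# Half-fraction: the fourth factor's level is set by the defining relation
# D = A*B*C, halving the run count while keeping main effects estimable.
design = [(a, b, c, a * b * c) for (a, b, c) in base_runs]

print(f"{len(design)} runs instead of {2**len(factors)}:")
for run in design:
    print(dict(zip(factors, run)))
```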
The biggest obstacle to
rapid assessments is the Navy. The Navy
routinely declines to make ships, aircraft, and systems available for timely
testing. Consider the Navy’s habit of
putting off shock testing. If the Navy
would simply perform the shock testing with the first of class at the earliest
opportunity, the testing would be over and the ship could get on with its
service life and deployments. Instead,
for reasons unfathomable to me, the Navy spends more time fighting with
DOT&E and trying to delay testing than the time it would take to simply do
the testing. The only explanation I can
come up with for the Navy’s reluctance to perform shock testing is that recent
ship designs are so structurally weak and poorly designed that the testing will
reveal significant weaknesses and problems which will, rightly, make the Navy
look bad, so they try to avoid the testing.
Further, the Navy routinely fails to fund the necessary threat surrogates that would conclusively verify weapon and system performance. The Navy will spend several billion dollars on buying a ship but refuses to spend a hundred thousand dollars or a million dollars on simple, realistic threat surrogates to determine whether that ship can actually perform its duties effectively. I think, by now, you can reason out the why of the Navy's reluctance to perform realistic testing.
If the Navy had their way,
no testing would have been done on the LCS or its modules and we’ve seen how
that turned out. DOT&E is the only
thing standing between badly flawed weapons and sailors’ lives.
The Navy’s adversarial
relationship with DOT&E is inexplicable.
The Navy should be the strongest supporter of DOT&E since DOT&E
is the group that ensures that the Navy will get what it wants in the way of
weapon and system performance. Indeed,
the Navy’s history of being cheated and shorted by fraudulent manufacturers’
claims should be all the evidence needed for the Navy to be DOT&E’s most
enthusiastic supporter. Sadly, this is
not the case. The Navy’s reluctance to
fund proper testing and perform said testing borders on criminal negligence and
incompetence. The next time we enter high end combat, the faulty torpedo episode
of WWII will look mild compared to the failings we’ll find in today’s systems due to the
Navy’s unwillingness to embrace proper testing.
Many sailors will pay with their lives for the Navy’s gross negligence.
__________________________
Brief Background: DOT&E was established by an act of
Congress in 1983. The organization
reports directly to the Secretary of Defense, which keeps it out of the chain of
command of the services, thus ensuring its independence. The current Director, Dr. J. Michael Gilmore,
has served since 2009. Dr. Gilmore and
his group have been nothing short of spectacular in their performance and are
the saving grace of a badly flawed military acquisition process. Countless service members undoubtedly owe
their lives to DOT&E.
__________________________
Good Post. You cannot repeat it enough!
My experience is that 80% of the people involved in Acquisition spend 80% of their time trying to ignore or circumvent the system to get something out that their Boss(es) want badly. And as the saying goes, you want it badly, you get it bad.
Instead, the TRUE Acquisition Professionals spend 80% of their time reading, researching, and UNDERSTANDING how the process is designed to produce reliable, WORKING systems. These few PROFESSIONALS are not slaves to the standards and regulations; they understand why they are there and WHEN it is appropriate to waive them, or tailor them down to meet the objective.
IMHO you are ABSOLUTELY right in saying the impediment to rapid acquisition is the Navy itself. There are too few professionals and too MANY opportunistic Careerists.
This may be an excellent discussion of how DOT&E is supposed to work, or how elements of DOT&E work, but the truth is that unnecessary tests are certainly mandated (at least at the working level) by elements of DOT&E to examine features of systems that are either outside the specifications of the system or are really an assessment of how DOT&E expects the system to be used within a particular CONOPS. I have heard of test events where the on-site DOT&E reps did not allow the local testers to call off a weapons shot when the local testers knew that the weapons shot would be outside the performance bounds of the weapon. In other words, the limited numbers of weapons available for use in the testing were wasted in a scenario that everyone acknowledged and knew would result in a miss, and no value to the weapons program was derived by that event. In other cases, DOT&E has stated that the testing scenarios were being designed to test the bounds of the system and fully characterize performance rather than determining whether the system met the requirements. This has contributed to the feeling that the organization has run amok.
Perhaps an interesting topic could be the need for DOT&E considering the Navy has what is supposed to be an independent test agent (COMOPTEVFOR). Does this mean that COMOPTEVFOR is not doing their job?
I've worked with experimental design in industry and what you're describing sounds like exactly the way designed testing should run. Some sets of test parameters (meaning some tests) are quite likely to fail or perform poorly but that's how you map out the response field of the system. If you only test a known, successful region of the test grid, you won't learn what the system can do under any but one, specific, perfect set of circumstances - which is almost assured to never occur in combat. This is a case of you and the "local testers" failing to understand how an experimental design works. It needs ALL the data, successes and failures.
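To illustrate that point with a toy example (invented numbers, not data from any actual test program), here is what happens if you fit a performance model using only the "known good" corner of the test grid and then try to predict a stressing condition:

```python
# Toy illustration with invented numbers - not real weapon performance data.
import numpy as np

# Notional response: hit probability vs. target crossing rate (deg/s).
crossing_rate = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
hit_prob      = np.array([0.95, 0.94, 0.93, 0.80, 0.60, 0.35, 0.15])

# Fit the same quadratic model to (a) the whole grid and (b) only the
# benign "sweet spot" points that were sure to look good.
full_fit = np.polyfit(crossing_rate, hit_prob, 2)
good_fit = np.polyfit(crossing_rate[:3], hit_prob[:3], 2)

stressing = 25.0  # a condition the enemy gets to pick, not us
print(f"full design predicts:         {np.polyval(full_fit, stressing):.2f}")
print(f"'good points only' predicts:  {np.polyval(good_fit, stressing):.2f}")
# The truncated fit predicts about 0.90 at 25 deg/s; the (notional) measured
# value there was 0.35.  You can't map the good without the bad.
```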
Optimization of limited test materials is a branch of experimental design. The Navy is not alone in often having limited quantities of test materials. Designs can be, and are, optimized for conservation of material, when necessary.
DOT&E is exactly right and you and others fail to understand the methodology. Now, what would be nice is if DOT&E personnel took the time to explain the need and uses of even poor performance test points.
Before you comment further, you might want to read up on statistical experimental design.
One of the next few posts will be on why DOT&E is needed. Again, a seemingly obvious question but, sadly, one that some people need to have spelled out.
As far as COMOPTEVFOR goes, I've never seen a single report from them, good or bad, so it's difficult to evaluate their worth. I'll say this, though: there has been no indication that they've ever pointed out a problem and, given the sheer number and seriousness of problems that we know about, that's a pretty damning indictment. They seem to be completely ineffectual. If you can point me at a single problem they've ever identified, I'd appreciate it.
CNO. I don't think you understand the relationship between DOT&E and COTF.
COTF actually manages many of the tests that you seem to give DOT&E credit for, and analyzes the results.
In terms of what they accomplish - most of it is quite obviously classified. But they have been at the forefront of pointing out many of the problems with Navy programs. LCS in particular.
So cite any verifiable accomplishment of theirs. Any report. Don't tell me classified. DOT&E manages to report lots of information without classification issues.
In being purposely vague, I may have clouded the intent of the post. The scenarios that I attempted to describe are not within the bounds of effective implementation of design of experiments, including optimization of limited test materials. If a system is not required or designed to counter a specific threat, and the lack of effectiveness against that specific threat was proven through analysis early in the design phase and proven again during developmental testing, then why expend limited resources in operational testing to once again confirm that it lacks effectiveness against that threat? That's not testing the boundary conditions...that's conducting independent experimentation.
Similarly, during live fire events, the launch platform will confirm with the test controllers prior to launch of a weapon. The confirmation serves several purposes - the primary is to ensure that the range is clear and the shot can be taken safely, and another is to confirm that the shot will be accomplished against a valid target. In an end-to-end test, the test controllers can halt the release of the weapon if it is about to be released against a false target...why expend a limited resource? However, there have been cases where DOT&E overruled the test controllers and those limited weapon releases were done against false targets, merely confirming that the launch platform did not have a good shot, which everyone already knew prior to expending that weapon. You don't require PhD level knowledge of statistics to understand that.
COTF completes all operational testing, regardless of whether a system is under DOT&E oversight or not. They also facilitate operational assessments that can be done prior to more formalized testing, which helps in identifying problems. It's difficult to discuss effectiveness of testing of Navy systems without understanding the relationship between DOT&E, COTF, and the programs under test.
IP, you appear to have a few specific examples in mind. Not having been there, I can't comment specifically on them. I'll repeat that proper experimental design often calls for data test points in regions of anticipated poor performance - that's how you map out the entire response surface which includes the good regions. You can't map the good without the bad!
I'll also say that if a test point were omitted, it would not be recorded as a valid data point. Instead, it would simply be a missing data point from a statistical analysis point of view. This would tend to skew the analysis toward a more favorable result than there really was.
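A crude numerical illustration of that skew (numbers invented purely for the sake of the point): suppose a 12-shot design includes two stressing shots everyone "knows" will miss. Waving those two off and reporting only the shots actually taken changes the headline number.

```python
# Invented numbers - just arithmetic to show the skew from dropped test points.
planned_shots = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0]  # 12-shot design, 1 = hit
waved_off     = {10, 11}  # indices of the two stressing shots never taken

whole_design = sum(planned_shots) / len(planned_shots)
as_reported = sum(v for i, v in enumerate(planned_shots) if i not in waved_off) / (
    len(planned_shots) - len(waved_off)
)

print(f"hit rate over the whole design:        {whole_design:.2f}")  # 0.67
print(f"hit rate with stress shots waved off:  {as_reported:.2f}")   # 0.80
```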
Your statement that you don't need a PhD in statistics confirms for me that you don't have a suitable understanding of statistics and experimental design to pass judgement on this. That's not a criticism, just an observation.
Again, I'll repeat, cite even one example of something productive COMOPTEVFOR has ever done. Where were they when the Navy was pushing ahead with badly flawed LCS modules, for example, and only DOT&E provided actual data and realistic assessments?
And you've also missed the point...again, it's not a question of testing in anticipated poor performance areas but rather knowingly testing in the no performance area. Operational testing is not conducted in a vacuum. Sub-system and system level developmental testing is conducted to verify performance and inform the boundary conditions...otherwise, how do you know if you're testing the good vs the bad? Then, during the conduct of the test, you cannot hide behind "experimental design" to justify poor decisions and poor use of resources. That happens all too often in weapons testing, contributing to conflict between programs and DOT&E.
The hypothetical example (informed by a real world one) would be the testing of an anti-air missile. Part of the missile end-to-end testing would be evaluating the sensor system's ability to ID and track the incoming threat, and then evaluate whether or not to engage with the system under test (i.e., the anti-air missile). Prior to launch of the missile, the launch craft would confirm with range control that they are clear to release. If the launch craft is not properly tracking the incoming threat, then the engagement can be interrupted and evaluated without needlessly expending the system under test, and that round saved for an engagement that would better evaluate performance. That hypothetical example is not testing the boundary of poor performance in experimental design...that's wasting the weapon through a poor decision to launch.
You seem to be describing a broken test system. There is no value or point to testing a broken system. If the radar has a malfunction then the entire test point is invalid (unless you're conducting reliability testing in which case it's a perfectly valid test point!). Again, not having been there and seen whatever specific case you're describing, I can't comment further other than to repeat that you seem not to have a fundamental grasp of experimental design requirements. You seem to be suggesting that we only complete test points that we know are tracking well. That's not how testing works. If there's a safety issue or a mechanical failure that invalidates the data point, then stop it, by all means. Otherwise, the point is valid and must be performed.
I would also note that "how bad" is a good measure. You may know that you don't have a good target track and the missile will miss but perhaps the magnitude of the miss is important. I have no idea what the test design was that you're referring to but, again, bad results are just as important as good results.
DOT&E personnel are experts at this. Not that someone, sometime, can't make a poor decision but as a general statement, I'd unhesitatingly accept their decisions. It's what they do for a living. The Navy's record on testing and their clear lack of knowledge on the subject, on the other hand ...
I don't want to continue this because I have no knowledge of the specific incident you're focused on. This discussion does, however, reinforce my premise that some (many?) people have a flawed (or no?) understanding of how DOT&E functions and base their opinions on that lack of understanding. That's why I wrote this post - to educate people just a bit on how and why DOT&E functions.
Honest question, though I admit it will sound snarky:
It seems the Navy has two organizations (DOT&E and COMOPTEVFOR, which I hadn't heard of before). If we have two organizations doing all of these tests....
why haven't we had live fire tests of Aegis against realistic threats? Shouldn't that be part of the evaluations?
If I remember correctly, before battleship armor got installed it went through ballistics tests against live rounds to see how the armor would work.
We test Aegis against Coyotes, which are okay, but still nothing like a BrahMos. But these tests always seem to be one-off events (1 missile, not a raid). And as far as I can tell we've never done BMD testing against MIRVs.
So, in the fleet we have a very expensive weapons system upon which we've hung our hat, but we don't *really* know if it works.
COMOPTEVFOR is an internal Navy test group that, honestly, I've never seen do anything worthwhile and has never, to the best of my knowledge, found a problem with anything. It's in the Navy chain of command so that should tell you about its ability to be impartial and effective.
DOT&E was established by Congress because they didn't feel the services were conducting proper, realistic, accurate testing. DOT&E reports directly to SecDef, which keeps it out of the services' chains of command and thus, largely, independent, although SecDef is a potential source of conflict. Still, it seems to be working.
Remember, from the post, DOT&E can't make up tests. They can't just say, "We want to test Aegis against a full raid." That's up to the Navy to request. Once requested, DOT&E can come up with a test plan. However, DOT&E tests specifications, not battle scenarios which is what a full Aegis raid test would be. Do you see the distinction?
Forget it...you're missing the point and not understanding the issue/scenario. It's the difference between conducting an end-to-end evaluation with multiple systems, and reviewing the performance of the system under test within the context of the end-to-end evaluation with continuously changing variables. And in the hypothetical "how bad", you're describing a result with less than marginal value when there is a finite test asset availability. In my direct experience, DOT&E has a difficult time in establishing the goal posts, and then not moving those goal posts after testing has commenced, which greatly contributes to the frustration and broken relationship between DOT&E and the Navy.
While we're focusing on DOT&E, many of the findings attributed to DOT&E in the unclassified forum should have been found during developmental testing or the live fire portion of testing. There are DT and LFT&E counterparts to DOT&E within that same organizational structure...it seems those organizations should have some alignment with their DOT&E counterparts, or else the larger test organization is allowing failing systems to progress through DT and into OT.
"Remember, from the post, DOT&E can't make up tests. They can't just say, "We want to test Aegis against a full raid." That's up to the Navy to request. Once requested, DOT&E can come up with a test plan. However, DOT&E tests specifications, not battle scenarios which is what a full Aegis raid test would be. Do you see the distinction?"
This is not a true statement. That may be how DOT&E is intended to work but in practice, DOT&E absolutely makes up the tests and creates the scenarios to be tested.
Given JUST the LPD 17 and LCS, I would say YES! Along with INSURV and SupShips.
I have been in the room when test designs were discussed by Navy and Contractor people.
I can tell you that when I questioned why X number of shots - what is the significance of X? - I got the answer that it was just because another program did it. (To someone not familiar with test statistics, X sounded like a lot.)
I can further tell you that even those X shots were going to be single-face, straight-and-level inbound.
Lastly, in order to cut testing costs, the Contractor's system test was going to be limited to 8 hours, when the current Fleet standard is 25 hours.
So before someone chucks rocks at DOT&E take a real hard look at the test plans that are published, they are sorely lacking.
I think what is happening is gross corruption. The top brass hope to get cushy jobs working for the very defense industry that is ripping off the Navy and, by extension, taxpayers.
They have an adversarial relationship because they are advocates for the defense industry, not advocates for the American people. It is no different than the rest of the political system, where many politicians become lobbyists after they leave.
I believe that most senior officials do not believe in serving their nation (in actions), but rather their own wallets.
The DOT&E, if anything, should be given more powers, including the power to halt a weapons system that has been found vastly deficient in performance versus the claims of the USN or the defense industry.
If a vessel is ever put into a situation against an enemy that can fight back ... well, whoever on board had better hope that they have a vessel that has been adequately tested.
"They have an adversarial relationship because they are advocates for the defense industry ..."
DeleteI would really like you to be wrong about this but, unfortunately, I suspect you're far more likely to be right than wrong. Despite my repeated criticisms of the practice of retired Admirals taking defense industry jobs, I hadn't quite made that same connection to the Navy's adversarial relationship with DOT&E. You may have nailed this one - sadly. Great observation and comment.
This is what you're up against:
http://www.informationdissemination.net/2016/02/1980s-era-test-and-evaluation_3.html
If the Navy is providing the criteria against which DOT&E is testing... the Navy needs to get its fowl aligned before the tests. Then they could pass them.
I've seen that. It's written by an unabashed LCS fan boy (that's OK) whose understanding of the function of DOT&E is utterly lacking. He starts by saying that DOT&E "demands" survivability. As you know from having read this post, DOT&E cannot and does not demand anything. They simply test the requirements that are given them BY THE NAVY. The author is ignorant of his subject matter, to say the least.
Com Nav,
I've conducted both DT and OT tests. Vast difference.
All OT test reports for significant ACAT programs are downloadable. Peruse them.
I was OTD (op test director) for an ACAT II program during the late Reagan administration. Worth about $1-2 Billion at the time. A moderately developmental, "upgrade" program as you would call it.
Our team monitored DT and followed the technical discrepancies, then took the system into OT.
Unfortunately, the system did not function well enough to justify a full scale buy. We failed it after about 6 weeks. Junior O-3s and a couple of O-4s pulled it off!
As a result the program got an ACAT IIS designation (special reporting to DoD/congress until corrected). The system was then "fixed" in about 8 months with a lot of work and my participation in another mini-DT to verify corrections.
It then went back into OT and passed with flying colors in 4 months, exactly as planned originally. After completing the quicklook report in 72 hours, recommending full rate production, and then finishing the final report (90 days), we were done with OT. We also published the initial tactics guides, documented the R&M, etc. concurrently. Something OPTEVFOR doesn't do anymore - not enough talent.
That is what was expected in those days. We got it done like everything else.
Nowadays everything is bait and switch: avoid OT at all costs and attempt to define OT timelines, costs, and outcomes up front. Make them mushrooms by keeping them in the dark and feeding them you know what. After all, aren't DT folks (test pilot school whizzes) and the PMA/PEOs that much smarter than a bunch of OT Bozos?
It's a game. I know. I've seen both sides.
Anybody associated with the Navy air acquisition business who has been exposed to NAVAIR 5.0 and COMOPTEVFOR knows. LOL, cynically.
OT can work ComNav, I've seen it done in the late 80's, but it is going to take a culture change and some leadership to implement. We lack both today.
b2
Good feedback. Thanks.
Now, how did you conduct your testing? Experimental design or some other method? Did you use rigorous statistical analysis of the data or just some sort of pass/fail assessment? Multivariate analysis? Did you utilize statisticians? How did you determine the scope of testing? Did you test for variable interactions?
No statistical design can compete, on a cost basis, with just testing a couple conditions here and there, and passing the system if it works for those conditions. I'm sure the temptation is great and the internal structure of the services might not be able to resist this approach without an external body making a stink.
You're quite right and the problem with the military is that they have an apparent conflict of interest. They want funding to continue to flow so, if it were up to them, they'd only test a couple of points that they were sure could pass and they'd cheat even in the testing of those couple of points!
I say "apparent" conflict of interest because a logical person would think that the service's "interest" would be in getting the best, most fully functional weapon system they could. Sadly, and counter-intuitively, leadership has prioritized funding over performance.
"No statistical design can compete, on a cost basis, with just testing a couple conditions here and there ..."
And yet, when you consider the hundreds of millions (or billions) of dollars spent on a typical program, the cost of some additional testing is almost free on a comparative basis.
Also, the cost of testing is dwarfed by the cost of a sunk ship due to lack of testing.
Penny wise, pound foolish!
ComNav,
Yes. Naval Aviation DT testing uses DoD and SECNAV approved statistical analysis techniques. So does OT. Both test entities have staff ops analysis folks assigned to validate all Test and Evaluation Master Plan (TEMP) inputs as the system winds its way through first prototype, during the test itself, and after the test during report writing.
At its most basic, the NAVAIR-5.0 squadron DT tester tells the PMA/PEO that the system meets spec and that it is ready for OT. You seem to be most informed on DT. This is all underwritten by OPNAV and the NAE... who get reports regularly from the PMA and often pressurize the situation.
OT reports on the system's operational 'ilities, testing the system in the operational and inter-operational environments, under stress, as you call it. OT is conducted without external reporting until complete, unlike all the phases of DT... OT recommends buy/not buy. That is the big difference. Hence, all the apprehension towards OT.
b2