
Statistics for differentiating between groups?

yakimanoob (Active member, joined Oct 10, 2021, 254 messages)
Hey folks,

This is a very nerdy stats question. Apologies.

I'm a research and data professional at my day job, and I've had a passion project in mind for a while: writing an article on using statistical inference to choose ammo (this would apply both to choosing between factory ammo options and to choosing between handloads). But my research has hit a bit of a brick wall.

Can anyone point me in the right direction for statistically testing whether two groups are actually different or if they're just random variations? In other words, say I shoot a 5 shot group with load A that measures 1" and a 5 shot group with load B that measures 3/4". We all know (whether we admit it to ourselves or not) that sometimes the same ammo will shoot 3/4" sometimes and 1" sometimes, so I'm interested in a statistical test to tell me the probability that the two groups are actually different. In a single dimension, this would normally be done with an F-test, but I can't seem to find any examples of a 2-dimensional version of that test.
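For the 2-D version, one textbook-style option is a sketch like the one below. It assumes impacts are bivariate normal with equal, uncorrelated horizontal and vertical spread ("circular" dispersion) and independent shots. Under that assumption the x and y deviations from each group's own center can be pooled, so a group of n shots gives a variance estimate with 2(n - 1) degrees of freedom, and the ratio of two such estimates follows an F(2n_A - 2, 2n_B - 2) distribution when the loads truly disperse the same. Because both degrees of freedom are even, the F CDF reduces to a finite binomial sum, so no stats library is needed. Function names and coordinates are hypothetical.

```python
import math

def pooled_sigma2(shots):
    """Per-axis variance estimate for one group of (x, y) impacts.

    Assumes circular dispersion: the sum of squared distances from the
    group's own center, divided by 2(n - 1) degrees of freedom.
    """
    n = len(shots)
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    ss = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in shots)
    return ss / (2 * (n - 1))

def f_cdf_even(f, d1, d2):
    """P(F <= f) for an F(d1, d2) variable with even d1 and d2.

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a) for integer a, b.
    """
    a, b = d1 // 2, d2 // 2
    x = d1 * f / (d1 * f + d2)
    m = a + b - 1
    return sum(math.comb(m, j) * x ** j * (1 - x) ** (m - j) for j in range(a, m + 1))

def dispersion_f_test(shots_a, shots_b):
    """Two-sided F-test for equal dispersion of two groups of (x, y) impacts."""
    d1, d2 = 2 * (len(shots_a) - 1), 2 * (len(shots_b) - 1)
    f = pooled_sigma2(shots_a) / pooled_sigma2(shots_b)
    lower = f_cdf_even(f, d1, d2)
    return f, min(2 * min(lower, 1 - lower), 1.0)

load_a = [(0.1, 0.2), (-0.3, 0.1), (0.4, -0.2), (0.0, 0.3), (-0.2, -0.4)]  # hypothetical, inches
load_b = [(x / 2, y / 2) for x, y in load_a]  # same shape, half the spread
f, p = dispersion_f_test(load_a, load_b)
print(f"F = {f:.2f}, two-sided p = {p:.3f}")  # F = 4.00, two-sided p = 0.067
```

Even a group half the size in every direction (F = 4) only reaches p of about 0.07 with 5 shots per load under this model.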

I went down the rabbit hole of using mean radius to measure groups, and while there's a lot of intuitive value there, to the best of my knowledge there is no established way to construct confidence intervals or hypothesis tests from the radius metric, since individual radii follow a Hoyt distribution rather than a normal (Gaussian) one.
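When no convenient closed-form distribution is available, one general-purpose workaround (a swapped-in technique, not something proposed in this thread) is a permutation test: compute the mean radius for each load, then estimate how often randomly reshuffling the pooled radii between the two loads produces a difference at least as large. This sketch measures each radius from its own group's center and treats the pooled radii as exchangeable under the null hypothesis, which is only approximately true because each center is estimated from its own shots. Names and coordinates are hypothetical.

```python
import math
import random

def radii(shots):
    """Distance of each (x, y) impact from the group's own center."""
    n = len(shots)
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    return [math.hypot(x - cx, y - cy) for x, y in shots]

def permutation_test(shots_a, shots_b, n_perm=10_000, seed=0):
    """Approximate two-sided permutation p-value for a difference in mean radius."""
    ra, rb = radii(shots_a), radii(shots_b)
    observed = abs(sum(ra) / len(ra) - sum(rb) / len(rb))
    pooled = ra + rb
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(ra)], pooled[len(ra):]
        # Small tolerance so float rounding doesn't hide ties with the observed value.
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

load_a = [(0.1, 0.2), (-0.3, 0.1), (0.4, -0.2), (0.0, 0.3), (-0.2, -0.4)]  # hypothetical, inches
load_b = [(x / 2, y / 2) for x, y in load_a]
print(permutation_test(load_a, load_b))
```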

Thanks!
 
I would start by shooting more rounds per group. Five shots per group doesn't tell you much; some people suggest 20 to 30.
 
I am not a statistician, but maybe principal components analysis? I don't think ten shots would give you any kind of decent analysis, but I'm curious what you come up with.
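As a rough sketch of what principal components analysis would look at for a single group (an illustration, not an established group-comparison test), the 2x2 sample covariance of the impacts has two eigenvalues: the variances along the group's long and short axes. For a symmetric 2x2 matrix these have a closed form, so no linear-algebra library is needed. A minor-to-major ratio well below 1 means an elongated group (for example vertical stringing), which is the case where the Hoyt model rather than the simpler circular one applies. The function name and coordinates are hypothetical.

```python
import math

def principal_axes(shots):
    """Return (major_var, minor_var, angle_deg) for a group of (x, y) impacts.

    major_var and minor_var are the eigenvalues of the sample covariance
    matrix; angle_deg is the direction of the major axis from the x-axis.
    """
    n = len(shots)
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    sxx = sum((x - cx) ** 2 for x, _ in shots) / (n - 1)
    syy = sum((y - cy) ** 2 for _, y in shots) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in shots) / (n - 1)
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    mid = (sxx + syy) / 2
    half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return mid + half_gap, mid - half_gap, angle

group = [(0.02, -0.40), (-0.03, -0.15), (0.01, 0.05), (0.04, 0.30), (-0.02, 0.45)]  # hypothetical
major, minor, angle = principal_axes(group)
print(f"major var {major:.4f}, minor var {minor:.4f}, major axis at {angle:.0f} degrees")
```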
 
Shoot a 20-shot group and measure it. Over on RS there is a long thread about this that will give you a headache; the gist is: shoot 20 shots and you'll know. Hornady has a YouTube video about it as well.
 
Depending on the specific statistical question you are asking, the right test can vary, but p-test and t-test would be a good starting point for evaluation. FWIW, informally, anything less than 10 shots will be a total crapshoot.
 
Statistics guy here. You definitely need more than 5 shots. I would suggest 20 as an absolute minimum. There are a lot of very complicated ways you can analyze this. But I would do as VikingsGuy suggested and just do a standard t-test.
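The "shoot more rounds" advice can be checked with a quick Monte Carlo sketch. It assumes impacts are circular normal with the same true dispersion every time (sigma = 1 unit per axis, an arbitrary choice), draws many groups from that one "load", and reports how much the extreme spread (the largest center-to-center distance, i.e. the usual group size) varies from group to group at 5 and at 20 shots. Function names are hypothetical.

```python
import math
import random
import statistics

def extreme_spread(shots):
    """Largest center-to-center distance between any two impacts."""
    return max(math.dist(p, q) for i, p in enumerate(shots) for q in shots[i + 1:])

def simulate_group_sizes(n_shots, n_groups=2000, sigma=1.0, seed=1):
    """Extreme spreads of n_groups simulated groups fired with identical ammo."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_groups):
        shots = [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_shots)]
        sizes.append(extreme_spread(shots))
    return sizes

for n in (5, 20):
    sizes = simulate_group_sizes(n)
    cv = statistics.stdev(sizes) / statistics.mean(sizes)  # relative group-to-group scatter
    q = statistics.quantiles(sizes, n=10)  # deciles
    print(f"{n:2d} shots: 10th-90th percentile {q[0]:.2f} to {q[-1]:.2f}, CV {cv:.0%}")
```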

I did something similar many years ago when testing the strength of tippet material for fly fishing. I tested all the brands I had access to at the time for their breaking strength. There were some major differences, but mostly you got what you paid for, with a few exceptions.
 
5 shots can tell you plenty, sometimes. If one load shoots 5 shots into the same hole and the other load shoots a 6 MOA group, every reasonable shooter would (and should) conclude that the first load is better.

Conversely, no reasonable shooter would (or should) assume that a 3/4" 5 shot group is actually better than a 13/16" 5 shot group.

So there's obviously an inflection point somewhere. This is why we have the field of inferential statistics, and why I'm asking the question.
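One way to put rough numbers on that inflection point is another Monte Carlo sketch. It assumes load B truly disperses `ratio` times as much as load A (circular normal impacts, arbitrary sigma), fires one 5-shot group of each, and counts how often the truly worse load B still prints the smaller group. The ratios and names are illustrative assumptions.

```python
import math
import random

def extreme_spread(shots):
    """Largest center-to-center distance between any two impacts."""
    return max(math.dist(p, q) for i, p in enumerate(shots) for q in shots[i + 1:])

def group(rng, n_shots, sigma):
    """One simulated group of circular-normal impacts."""
    return [(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_shots)]

def prob_worse_load_wins(ratio, n_shots=5, trials=5000, seed=2):
    """Fraction of head-to-head groups in which the truly worse load
    (dispersion `ratio` times larger) prints the smaller extreme spread."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        a = extreme_spread(group(rng, n_shots, 1.0))
        b = extreme_spread(group(rng, n_shots, ratio))
        wins += b < a
    return wins / trials

for ratio in (1.0, 1.1, 1.5, 2.0, 4.0):
    print(f"B is {ratio:.1f}x worse: B prints the smaller group {prob_worse_load_wins(ratio):.1%} of the time")
```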
 
Am I missing something? Both the Wilcoxon rank-sum test and the t-test look for a difference in location (means), which for group size is not relevant (one can always adjust the scope, after all). The F-test, and hopefully the multivariate version I'm hoping for, tests for a difference in variance, which is the parameter that actually governs group size.
 
If I were getting fancy and trusted my zero, I would measure the absolute distance of each shot from the point of aim (POA) and average those distances across my sample. If I was so-so on my zero, I would take the center of the group as my POA and do the same math. I would get to 10 shots total by firing two five-shot groups per test ammo, then compare the average distance from POA between ammo types with a t-test to determine whether the difference is large enough to show they are distinct groups and not just random subsets of the same population. My guess is that with only 10 sample points, the two averages would need to differ by more than 65% to show a statistically significant distinction.
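The approach above could be sketched as follows: compute each shot's distance from the POA (or from the group's own center when the zero is uncertain), then compare the two loads with Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom, using only the standard library. The p-value would then come from a t table or, if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)`. These distances are not normally distributed, so the result is approximate at best. Names are hypothetical.

```python
import math
import statistics

def distances_from(shots, aim=None):
    """Distance of each (x, y) impact from the point of aim.

    If aim is None, the group's own center is used instead (the so-so zero case).
    """
    if aim is None:
        aim = (statistics.mean(x for x, _ in shots), statistics.mean(y for _, y in shots))
    return [math.dist(p, aim) for p in shots]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Usage would look like `t, df = welch_t(distances_from(shots_a, aim=(0, 0)), distances_from(shots_b, aim=(0, 0)))`, where `shots_a` and `shots_b` are the pooled 10 impacts per load.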
 
Statistically, 5 shots isn't enough. What if those 5 shots all went into the same hole by sheer luck? It's an extreme example, but possible with only 5 shots. Think of it like flipping a coin: if a quarter comes up heads 5 times in a row and a penny comes up heads 2 times out of 5, does that mean the quarter is better for your next flip if you need heads?
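The coin analogy can be made concrete with Fisher's exact test (a standard test for comparing two proportions, swapped in here rather than anything the poster ran): if both coins were fair and identical, how surprising is 5 of 5 heads for the quarter versus 2 of 5 for the penny? The sketch uses only `math.comb`; the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a_success, a_n, b_success, b_n):
    """Two-sided Fisher's exact test p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    total_success = a_success + b_success
    total = a_n + b_n

    def prob(k):  # P(k of the successes land in group A), margins fixed
        return comb(total_success, k) * comb(total - total_success, a_n - k) / comb(total, a_n)

    observed = prob(a_success)
    lo, hi = max(0, total_success - b_n), min(a_n, total_success)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= observed * (1 + 1e-9))

print(fisher_exact_two_sided(5, 5, 2, 5))  # about 0.167
```

A p-value around 0.17 means a 5-for-5 versus 2-for-5 split is weak evidence that the coins differ.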
 
Had a lot of statistics in college and a bit in grad school. Loved most of it and used it quite a bit during my decades of interacting with people who were not very good at even basic math. I made a bit of money using statistics, though sometimes being right does not mean you win the day, as the herd can stampede over you in its blissful ignorance.

Remember the 1/3 Pound Burger from A&W? It was a response to the success of McD's Quarter Pounder. A&W's pitch was, "Let's give the masses more meat for the same price as McD's!" Well, focus groups routinely "knew" McD's had the bigger burger, because 4 is greater than 3, thus 1/4 > 1/3.

Back on topic: unless you are in shooting contests where fractions of an inch decide the winner, if you can routinely hit a pie plate at 100 yards you will kill a lot of big game. Hit that pie plate at 200 yards and you will do quite well, even on days that are quite cold or quite warm, or when a bit of breeze is swaying tree limbs. Beyond 200 yards, well, I piled up several dozen big game animals from coast to coast, in woods, prairies, and canyon country, and maybe had to let 5 critters walk because they were a tad beyond 250 yards and I could not get closer due to terrain or fading daylight, or because the critter was getting the heck out of Dodge City.

I think the time and money spent dialing in a rifle and load beyond "pie plate" accuracy is more of a mental exercise, and would be better invested in learning woodsman skills and how your target species behaves under various circumstances related to time of year, hunting pressure, fire and smoke, drought and monsoon, etc. Of course, I am old, mostly sit on a couch these days, and avoid most hunts that need a complex rifle shot, one that requires doping wind that might be blowing in two or three vectors over longer distances, plus terrain angle, temperature, humidity, rotation of the Earth, etc.

I used to shoot 5-round groups at 200 yards with factory loads. I would focus on the tightest 3 of the 5, presuming human error was involved in the two that were not as good (likely me, but maybe the factory got a bit sloppy), so I would toss out those two. If I was hitting the "pie plate," then when I was hunting I generally used a good rest or a Trigger Stick. I rarely shot at anything with smaller vitals than a whitetail deer.

If you reach some conclusions on more rounds per group and how to interpret them, please share. I would find that interesting.
 
You would need to shoot ~10-20 five-shot groups with each load, then use a two-sample t-test (or ANOVA if you test more than two loads).

The observations to test for each load would be the group sizes.

For example:

Load 1:
.75”
.82”
.33”
Etc

Load 2:
1.4”
.66”
.91”
Etc
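For more than two loads, the one-way ANOVA F statistic mentioned above can be sketched with the standard library. The example reuses the three group sizes listed for each load purely to show the arithmetic (three groups per load is far below the suggested 10 to 20), and a third load would simply be passed as another list. The p-value needs the F distribution's CDF, for example `scipy.stats.f.sf` or `scipy.stats.f_oneway` if SciPy is available. The function name is hypothetical.

```python
import statistics

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for lists of group sizes, one list per load."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Spread of the load averages around the grand mean, weighted by group count.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Scatter of individual group sizes around their own load's average.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

load_1 = [0.75, 0.82, 0.33]  # inches, from the example above
load_2 = [1.4, 0.66, 0.91]
print(one_way_anova(load_1, load_2))
```

With exactly two loads, this F equals the square of the pooled two-sample t statistic, so the two approaches agree.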
 
The issue here is that the distance from center (i.e., the radius) does not follow a normal distribution, so the t-test is invalid. The distance from center follows something called a Nakagami-q (aka Hoyt) distribution, and it's unclear to me whether there are established hypothesis tests for such a distribution.
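The non-normality is easy to see in simulation. In the simplest special case, circular dispersion with equal, uncorrelated x and y spread, the Hoyt distribution reduces to the Rayleigh distribution, whose skewness is 2*sqrt(pi)*(pi - 3)/(4 - pi)^(3/2), about 0.63, whereas a normal distribution has skewness 0. This sketch simulates impacts under that assumption (sigma = 1, radii measured from the true center, sample size arbitrary) and compares the sample skewness of the radii to that value.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(3)
# Radii of simulated circular-normal impacts, measured from the true center.
radii = [math.hypot(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]

rayleigh_skew = 2 * math.sqrt(math.pi) * (math.pi - 3) / (4 - math.pi) ** 1.5
print(f"sample skewness {sample_skewness(radii):.3f} vs Rayleigh {rayleigh_skew:.3f}")
```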
 
I'm not sure about this. For a set of groups, does the diameter (aka group size) follow a normal distribution? I have a hard time buying that, since the groups (viewed as sets of xy coordinates) are samples from a bivariate normal distribution. The probability is highest in the middle, so I would expect the mode diameter to be smaller than the median, which would violate normality.

Non-normality invalidates the t-test.
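Whether 5-shot group sizes are close enough to normal can be checked empirically. This sketch assumes circular normal impacts with sigma = 1 and independent groups (names are hypothetical), simulates many 5-shot extreme spreads, and reports their mean, median, and skewness. A clearly positive skewness with the mean above the median would support the concern that group sizes are right-skewed; a value near zero would support treating them as approximately normal.

```python
import math
import random
import statistics

def extreme_spread(shots):
    """Largest center-to-center distance between any two impacts."""
    return max(math.dist(p, q) for i, p in enumerate(shots) for q in shots[i + 1:])

def sample_skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(4)
sizes = [
    extreme_spread([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5)])
    for _ in range(20_000)
]
print(f"mean {statistics.fmean(sizes):.3f}, median {statistics.median(sizes):.3f}, "
      f"skewness {sample_skewness(sizes):.3f}")
```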
 
I know, but I would expect a group size to vary in a way that’s fairly close to normal. At least no reason to expect a left/right skew.

The reality is, you’re picking the tiniest of nits here. If the response variable is group size, I think the method I described would get you to an answer. Can always revert to nonparametric if you’re that worried about normality.

You seem trained enough on the subject to answer this without our help.
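As a sketch of the nonparametric fallback, here is an exact Wilcoxon rank-sum (Mann-Whitney) test on group sizes, computed by enumerating every way to split the pooled ranks between the two loads. It reuses the three example sizes per load from the earlier post only to show the mechanics; exhaustive enumeration is practical only for small samples, ties are not handled, and the function name is hypothetical.

```python
from itertools import combinations

def rank_sum_exact(a, b):
    """Exact two-sided Wilcoxon rank-sum test: (rank sum of a, p-value).

    Assumes no tied values, since each value gets a single integer rank.
    """
    pooled = sorted(a + b)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    observed = sum(rank[v] for v in a)
    n, n_a = len(pooled), len(a)
    expected = n_a * (n + 1) / 2
    # Rank sums for every possible assignment of n_a ranks to load a.
    all_sums = [sum(c) for c in combinations(range(1, n + 1), n_a)]
    extreme = sum(1 for s in all_sums if abs(s - expected) >= abs(observed - expected))
    return observed, extreme / len(all_sums)

load_1 = [0.75, 0.82, 0.33]  # inches
load_2 = [1.4, 0.66, 0.91]
print(rank_sum_exact(load_1, load_2))  # (8, 0.4)
```

Even complete separation of three groups per load gives only p = 0.1, which echoes the thread's point about sample size.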
 
No nitpicking intended. Just trying to work out a valid test between groups.
 