Cubs 2007 Pitch Tracking: Pictures Worth a Thousand Curves

One of the latest and most exciting developments in baseball research is the measurement and analysis of individual
pitches. For instance, the Pitch f/x system created by the
company Sportvision
tracks the in-flight movement of pitches with two different cameras,
thereby measuring a pitch's velocity and its horizontal and vertical
movement. A bit less than a quarter of all pitches from last year were
tracked this way, and MLB has made the raw data publicly available. Better yet, there are several bloggers who, unlike me, have the
talent and dedication to transform that heaping mess of data into
meaningful findings. Most notably, Josh Kalk
has been developing player cards,
a la what's available at Baseball-Reference, Fan Graphs, or The Baseball
Cube, except with graphs incorporating this incredible new source of
information on pitch selection and pitch behavior. He also has
developed a remarkable application where you can select any
player and any pitch with just about any limiting parameter you could
want - say, Bob Howry fastballs to right-handed hitters on 0-2 counts with a velocity above 93 MPH that resulted in swinging strikes - and then view the results on a handy X/Y graph.
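
Under the hood, a query like that boils down to filtering a table of pitch records. Here's a minimal Python sketch, assuming a hypothetical `Pitch` record; the field names (`pitcher`, `pitch_type`, `batter_hand`, `balls`, `strikes`, `speed_mph`, `result`) are made up for illustration and are not the application's actual schema, and the three sample pitches are invented, not real data.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    # Hypothetical fields for illustration; not the real Pitch f/x schema.
    pitcher: str
    pitch_type: str   # e.g. "FB" for fastball
    batter_hand: str  # "R" or "L"
    balls: int
    strikes: int
    speed_mph: float
    result: str       # e.g. "swinging_strike", "called_strike", "ball"


def select_pitches(pitches, pitcher, pitch_type, batter_hand,
                   count, min_speed, result):
    """Return the pitches matching every filter; count is a (balls, strikes) tuple."""
    balls, strikes = count
    return [
        p for p in pitches
        if p.pitcher == pitcher
        and p.pitch_type == pitch_type
        and p.batter_hand == batter_hand
        and (p.balls, p.strikes) == (balls, strikes)
        and p.speed_mph > min_speed
        and p.result == result
    ]


# Invented sample records, for illustration only.
sample = [
    Pitch("Howry", "FB", "R", 0, 2, 94.1, "swinging_strike"),
    Pitch("Howry", "FB", "R", 0, 2, 92.5, "swinging_strike"),  # too slow
    Pitch("Howry", "FB", "L", 0, 2, 94.0, "swinging_strike"),  # lefty hitter
]

hits = select_pitches(sample, "Howry", "FB", "R", (0, 2), 93.0, "swinging_strike")
print(len(hits))  # 1
```

Each extra limiting parameter is just one more condition in the filter, which is why the tool can offer so many of them.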

As if that's not enough, there's the more user-friendly, if less revolutionary, pitch data commercially available from Baseball Info Solutions, which is being put to use by the talented folks at Fan Graphs.
Fan Graphs now offers data on individual players' pitch selections and
velocity, all thoroughly sortable. For instance, Tim Wakefield
and Chad Bradford feature the two slowest average fastballs in the
majors at 74.2 and 78.6 MPH, respectively, while no one threw a changeup
more often last year than Matt Wise, at 54%.

There's a gold mine of potential information available at our
fingertips, with The Baseball Analysts and The Hardball Times leading
the way in this sort of analysis. With far less sophistication than
what those guys can offer, let's see what the data can tell us about the
Cubs' staff.

First,
the most basic stuff, drawing from the Fan Graphs info: who has bragging rights for biggest
fastball on the Cubs staff? Who wants to conceal the smallness of
their fastball in shame? For 2007, Fan Graphs says it's Marmol,
at an average of 93.3 MPH, followed by Wood at 92.9. Zambrano
doesn't even medal, coming in fifth. The full results:

Marmol 93.3
Wood 92.9
Howry 92.3
Dempster 92.0
Zambrano 91.6
Hart 91.5
Eyre 91
Wuertz 90.5
Marquis 90.4
Hill 89.4
Lieber 88.5
Lilly 88.4
Marshall 86.8

You could win a lot of bar bets on the question of who throws the faster average fastball, between Dempster and Z.
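
If you'd rather settle the bet with a computer than a bartender, ranking the list is a one-line sort. A quick Python sketch using the Fan Graphs averages listed above confirms Zambrano's fifth-place finish and the margin Dempster holds over him:

```python
# 2007 average fastball velocity (MPH), as listed above from Fan Graphs.
fastball_mph = {
    "Marmol": 93.3, "Wood": 92.9, "Howry": 92.3, "Dempster": 92.0,
    "Zambrano": 91.6, "Hart": 91.5, "Eyre": 91.0, "Wuertz": 90.5,
    "Marquis": 90.4, "Hill": 89.4, "Lieber": 88.5, "Lilly": 88.4,
    "Marshall": 86.8,
}

# Sort pitcher names from fastest to slowest average fastball.
ranking = sorted(fastball_mph, key=fastball_mph.get, reverse=True)

zambrano_place = ranking.index("Zambrano") + 1
gap = round(fastball_mph["Dempster"] - fastball_mph["Zambrano"], 1)

print(zambrano_place)  # 5
print(gap)             # 0.4
```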

Ok, who throws the fastball most often? Howry stands out, throwing his
fastball a whopping 18 percentage points more often than the second most
frequent gas-passer, Zambrano: 86.2%, to Zambrano's 68.2%.
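
Because both numbers are shares of each pitcher's total pitches, that gap can be stated two ways: as a difference in percentage points, or relative to Zambrano's rate. A small Python sketch of the arithmetic:

```python
howry_fb_pct = 86.2     # Howry's fastball share, percent of all his pitches
zambrano_fb_pct = 68.2  # Zambrano's fastball share

# Absolute gap: difference between the two shares, in percentage points.
point_gap = howry_fb_pct - zambrano_fb_pct

# Relative gap: how much larger Howry's share is, as a percent of Zambrano's.
relative_gap = (howry_fb_pct / zambrano_fb_pct - 1) * 100

print(f"{point_gap:.1f} percentage points")  # 18.0 percentage points
print(f"{relative_gap:.1f}% more")           # 26.4% more
```

Either number is fair, as long as it's clear which one is being quoted.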

Five pitchers threw the fastball less than half the time last year:
Marmol, Dempster, Marshall, Wuertz and Hart, with Hart at just
31.3%. Wuertz and Marmol both throw sliders slightly more than
50% of the time, with Hart and Dempster throwing them about a third of the
time. No surprise here: Hill throws more curves than anyone, at 27.3%,
with Marshall, Lilly and Hart then following, all in the mid to
upper teens. No one else breaks ten percent. Dempster is
the only change-up artist of the group, at 20% frequency, with Lilly,
Marquis and Marshall just cracking ten percent.

That's all fun and good, but more interesting are observations like the one made at The Hardball Times, which points out that
Marmol's use of his slider jumped from 7.1% of all his pitches in 2006,
to 51% in 2007. Looking at it further, Fan Graphs' data indicate that Marmol did that at the
expense of a changeup, which he stopped throwing entirely after using
it 11.6% of the time in 2006, and of a curveball whose usage fell off
the table, if you will, from 19.7% to 1.3%. It's also
interesting to see that he gained about a mile and a half per hour of
velocity on both the fastball and slider, compared to the prior
year. How far does this change in approach go towards explaining
his breakout season? How badly does that throw off the legitimacy
of the very pessimistic projections listed at Fan Graphs, all of which
see last year as an aberration, and have Marmol falling back to earth in 2008?
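
The size of Marmol's overhaul is easiest to see as percentage-point changes per pitch. A short Python sketch using the 2006 and 2007 figures quoted above (the fastball is left out because its share isn't given here):

```python
# Marmol's pitch usage, percent of all pitches, as quoted above.
usage_2006 = {"slider": 7.1, "changeup": 11.6, "curveball": 19.7}
usage_2007 = {"slider": 51.0, "changeup": 0.0, "curveball": 1.3}

# Year-over-year change for each pitch, in percentage points.
change = {
    pitch: round(usage_2007[pitch] - usage_2006[pitch], 1)
    for pitch in usage_2006
}

print(change)  # {'slider': 43.9, 'changeup': -11.6, 'curveball': -18.4}
```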

The change in Marmol's numbers is by far the most pronounced, but
there are some other interesting year-to-year differences in pitch
compositions.

  • Marshall had the next most dramatic change, as he threw his
    slider 20.1% of the time in 2007 as opposed to just 2.4% of the time in
    2006, while his percentage of change ups and fastballs dropped by about
    10 and 9 percent respectively.
  • Wood almost completely abandoned his changeup last year,
    after throwing it just under ten percent of the time in 2006.
  • Zambrano's velocity on the fastball has dropped from 92.8 to 92.2
    to 91.6 over the three years of available data, while the average
    speeds on the curveball and changeup have both increased by one mile
    per hour, up to 72.6 and 83.5, respectively. That leaves 2.2 MPH less
    separation between his fastball and changeup than he had in
    2005. (In one early demonstration of the analytical opportunities that these new data offer, Lookout Landing
    has just posted a list of the ten pitchers who gained and lost the most
    velocity over that time span. At least Z isn't in Jason Jennings
    territory. Yet.)
  • Hill threw his fastball 10% less frequently than the previous year, with all his secondary pitches showing slight upward ticks.
  • Marquis almost doubled his use of the slider, from 8.8 to 16.4%.
  • Hart is the anti-Howry, throwing 31.3% fastballs, 33.7%
    sliders, 15.7% curveballs, 17.5% cutters, and a handful of
    changeups.
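
The 2.2 MPH figure in the Zambrano item follows from the velocities quoted there. Taking his 2005 changeup as one mile per hour slower than the 2007 figure (so 82.5 MPH, an inference from the stated one-MPH gain), a quick Python check:

```python
# Zambrano's average velocities (MPH) from the figures above.
# The 2005 changeup (82.5) is inferred from the stated +1 MPH gain to 83.5.
fastball = {2005: 92.8, 2007: 91.6}
changeup = {2005: 82.5, 2007: 83.5}

# Speed separation between fastball and changeup in each season.
separation = {yr: round(fastball[yr] - changeup[yr], 1) for yr in fastball}
shrinkage = round(separation[2005] - separation[2007], 1)

print(separation)  # {2005: 10.3, 2007: 8.1}
print(shrinkage)   # 2.2
```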

Fun fun fun. But then there's the Pitch f/x tool, which means
even more hours of wasted time. It looks like the application
still has some serious bugs in it (as does my ability to use it
correctly, no doubt) and user-friendliness issues (again, not unlike yours truly), but take a look at this image as an example:

These
are the data on 254 sliders that Marmol threw to right-handed batters
last year, as tracked by Pitch f/x. A few things to note: the
service is not yet available at all parks at all times, so it's not a
complete sample. You're looking at this chart as if from behind
home plate, with a right-handed batter standing to the left. And
again, there are several hiccups in the system, the most problematic
one being that when I enter "lefty" for batter it spits out the
graph and charts for "righty," and vice versa.

Let's clean that graph up a bit and look only at Marmol's called strikes on sliders to righties:

Not much of a pattern here; he's getting called
strikes all over the zone. But then take a look at the data on
the swinging strikes off the slider thrown to right-handed hitters:

Intuitively,
it's exactly what you'd expect - low and away out of the zone.
But visually, I still find this a very striking demonstration.

As you might have noticed from some of my game recaps, I
became sort of transfixed last season with the hypnotic quality of
Howry's relief appearances, just drilling one 94 mph fastball after
another on the low outside corner to right-handed hitters, until their
eventual demise.

Here's the graph of 275 Howry fastballs to
right-handed hitters. Again, none of the images below are
complete data sets.

It
seems to show a tendency towards the outer half, but let's clear that
up and show just the 28 available swinging strikes on fastballs to
righties:

Not quite what I'd anticipated, but not surprising
either: lots of swings and misses at high fastballs. But
what about my low and outside fastballs? Let's try 44
called-strike fastballs to righties.

There
we go. It's still not as vivid as what I thought I was seeing in
person, or as that Marmol graph, but there we have a bunch of outside
fastballs to righties. I always like it when my subjective
viewing of the game matches up with the data.
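
Eyeballing "a bunch of outside fastballs" could be made a bit more concrete by counting the share of pitches on the outer half. A minimal Python sketch, assuming each pitch has a horizontal plate location `px` in feet from the middle of the plate, seen from behind home plate as in these charts, with positive values away from a right-handed hitter. The field name is modeled on the raw Pitch f/x files, but treat both it and the sign convention as assumptions, and the sample locations are made up:

```python
def outer_half_share(px_values):
    """Fraction of pitches on the outer half to a right-handed hitter.

    Assumes px is horizontal location in feet from the middle of the
    plate, catcher's view, with positive values away from a righty.
    """
    if not px_values:
        return 0.0
    outside = sum(1 for px in px_values if px > 0)
    return outside / len(px_values)


# Made-up locations for illustration only; not actual Howry data.
toy_px = [0.6, 0.4, 0.7, -0.2, 0.5]
print(outer_half_share(toy_px))  # 0.8
```

Run on the 44 real called strikes, a number like that would say how lopsided the pattern actually is, rather than leaving it to the eye.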

That's still relatively simple stuff, but the possibilities of this
are just mind-boggling, and are keeping me up past my bedtime. Let's flip this and look at it from the hitter's perspective. Here are sliders from right-handed pitchers to Alfonso Soriano:

Cleaning that up, here are the 12 base hits Pitch f/x has for Soriano on sliders from righties:

That's some good bad-ball hitting, there.
Notice, too, that the two home runs are on the sliders that likely were
the biggest mistakes - the one furthest inside and the one highest in
the zone. Of course, there's also this chart of Soriano's
swinging-strikes on sliders from righties.

Also about what you would expect. But for me, the most interesting
thing I've found, and that I'm almost capable of understanding and
applying, deals with pitch movement. For instance, who has bigger
curves, Hill or Lilly? You can go to the application (or just look at
Hill's player card, which I'll do in a bit) and set it to show you the
results of all Hill curveballs by "break" instead of by "location."
You're then informed by a chart that Hill's curve averaged 73.85 mph,
with a horizontal break of -6.84 inches (the negative meaning it is
breaking towards right-handed hitters, or away from lefties) and a
vertical break of -8.34 inches. One of the more difficult things to
grasp (well, at least for me) is that these numbers are standardized
against a theoretical pitch thrown without spin under idealized
conditions. So Hill's curve is breaking downward 8.34 inches more than
a baseball thrown under these theoretical conditions would. A pitch like Zambrano's fastball, which
has a positive value for its vertical break, is not actually rising,
it's just sinking less than what the normalized pitch would.
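
A simplified way to picture "break relative to a spinless pitch": work out how far a ball with no spin would drop on its way to the plate, then compare that with how far the real pitch dropped. The Python sketch below ignores air resistance and assumes constant speed over a hypothetical 55-foot flight, so its reference drop is only a rough stand-in and not how Pitch f/x actually computes its numbers. The sign convention matches the one above: negative means the pitch dropped more than the reference.

```python
G_FT_PER_S2 = 32.174  # gravitational acceleration, feet per second squared


def spinless_drop_inches(speed_mph, distance_ft=55.0):
    """Drop from gravity alone over the flight to the plate, in inches.

    Simplifying assumptions: no air resistance, constant speed, and a
    hypothetical 55-foot release-to-plate distance.
    """
    speed_fps = speed_mph * 5280 / 3600  # mph -> feet per second
    flight_time = distance_ft / speed_fps
    return 0.5 * G_FT_PER_S2 * flight_time ** 2 * 12  # feet -> inches


def vertical_break(actual_drop_in, reference_drop_in):
    """Break vs. the spinless reference: negative = sank more, positive = sank less."""
    return reference_drop_in - actual_drop_in


# Hill's curve averaged 73.85 mph and dropped 8.34 inches more than the reference.
curve_ref = spinless_drop_inches(73.85)
print(round(vertical_break(curve_ref + 8.34, curve_ref), 2))  # -8.34

# A fastball with a (hypothetical) +5.0 reading still drops; it just drops
# 5 inches less than a spinless ball thrown at the same speed would.
fb_ref = spinless_drop_inches(91.6)
fb_actual = fb_ref - 5.0
print(fb_actual > 0, round(vertical_break(fb_actual, fb_ref), 2))  # True 5.0
```

The second case is the Zambrano point in miniature: a positive number doesn't mean the ball rises, only that it falls less than the spinless reference.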

If I've lost you (I may have lost myself), let's go to the image showing the movement on Hill's curve.

Again, don't confuse this graph with a depiction of the strike zone.
The (0, 0) point is where that theoretical pitch would end up. You can
see that Hill's curve has a sharp down and in break. Now, how does it
compare to Lilly's curve?

Lilly's chart, also available on his player card, tells us that he
threw his curve at an average of 71.57 mph, with a horizontal break of
-3.03 inches, or 3.03 inches in to a right-handed batter, and a
vertical break of -7.88 inches. Sorry, Ted Lilly Fan Club, but it looks
like the answer is that Hill has the bigger curve. Note the cluster of
pitches near the center of the chart - they most likely aren't
curveballs, but some other pitch that Josh Kalk's system still
struggles to classify properly. Either that, or Lilly
throws more hangers. Either way, the result is a curveball with
significantly less horizontal movement, and a bit less vertical
movement.
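
One way to fold the two break components into a single "how big is the curve" number is the straight-line distance from the (0, 0) reference point, sqrt(x² + z²). Using that combined measure is an assumption here, not something the application reports. A Python sketch with the averages quoted above:

```python
import math

# Average curveball break (inches) vs. the spinless reference, as quoted above.
# Horizontal: negative = toward a right-handed hitter. Vertical: negative = extra drop.
curves = {
    "Hill": (-6.84, -8.34),
    "Lilly": (-3.03, -7.88),
}

# Total break as the distance from the (0, 0) reference point.
total_break = {name: math.hypot(x, z) for name, (x, z) in curves.items()}

for name, inches in total_break.items():
    print(f"{name}: {inches:.1f} in")
# Hill: 10.8 in
# Lilly: 8.4 in
```

Most of the gap comes from the horizontal component, which matches the read from the charts.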

If you go to the player cards sections, you get all of this and more,
but let me take one more graph that you can either create on your own
through Pitch f/x or see pre-made on the player cards. First, here's
Rich Hill, showing the relative movement of all his pitches:

And the same for Ted Lilly:

I particularly like how the charts show Lilly's and Hill's changeups
and fastballs sharing similar vertical and horizontal movement.

Ok, I can't get enough, so one more for your viewing consideration: Is Carlos Zambrano tipping his pitches? You tell me....

I wonder how the hell he threw that one sinker that's off by itself;
it looks like he must have shot-putted it outward from between his
eyeballs. Most likely, it's just a reminder that there are still some
serious bugs in the data. But there does seem to be a slight but
noticeable difference between where he releases the sinker and where
he releases the slider.
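
To put a number on "slight but noticeable," one could average each pitch type's release coordinates and measure the distance between the two averages. A minimal Python sketch; the (x, z) release points in feet are invented for illustration, and the approach is an assumption, not something the Pitch f/x tool reports:

```python
import math
from statistics import mean


def centroid(points):
    """Average (x, z) release point of a list of (x, z) tuples."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def release_gap_inches(pitches_a, pitches_b):
    """Distance between two pitch types' average release points, in inches."""
    (ax, az), (bx, bz) = centroid(pitches_a), centroid(pitches_b)
    return math.hypot(ax - bx, az - bz) * 12  # feet -> inches


# Made-up release points (feet, catcher's view); not real Zambrano data.
sinkers = [(-2.0, 6.0), (-2.1, 6.1), (-1.9, 5.9)]
sliders = [(-2.0, 6.4), (-2.1, 6.5), (-1.9, 6.3)]

print(round(release_gap_inches(sinkers, sliders), 1))  # 4.8
```

With real data, a gap of a few inches would still have to be weighed against the scatter within each pitch type before calling it tipping.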

This piece started out as a
"hey, you guys have GOT to check out the cool things I just found!"
piece, and evolved from there. I'd like to hear what sort of studies
you can dream up, and whether you have any requests or thoughts about directions
I could take this in future articles. Things I'm missing? Burning
issues to address? Fun comparisons to make? Particular players
you'd like to see highlighted? How much confidence do you place in this
data compared to your own real-world observations? Part of
my interest in this stuff
relates to my broader real-world academic interests: we have a
new series of related technologies being invented, and do not yet quite
know exactly how they will be put to use. What future do you see
for this?

Comments

Looks like Mcfail will get a ss.

Gotta run, will be back in mid-afternoon to check out the abuse, or the Uribe updates. Take care!

Didn't Carlos try some kind of Eephus pitch once last year? Maybe that's what the outlier is.

His velocity was down out of Mesa last year for well over a month, if I recall correctly. That likely explains it.

Excellent stuff, Trans!

BTW, the release point stuff isn't that big of a breaking news story. Z's got to get up on top of that sinker to get it to drop. There are always going to be minimal differences in the release point of the pitches you throw. However, 6 or 9 inches or a foot one way or the other isn't necessarily that noticeable at that kind of arm speed.

That's actually a really cool analysis--thanks for the heads-up on the websites.

Trans - really cool - you have my utmost respect. I am not a stathead, sabre guy or Bill James disciple, but this is really interesting. What you put together is the tip of the iceberg; there are tons of applications. I am really curious about the release point data you mentioned. I never really noticed Z tipping his pitches, but the two guys that I thought did were Rich Hill and the '06 version of Sean Marshall. This is really great stuff.

Anyone else see the numbers in those dot charts? We think we saw "30" . . . Last time we saw something that confusing, we were on a leather couch and talking about our mothers. . . etc, etc.

Don't apologize about Little Richie's "bigger" curve - it's not that surprising. What really matters is how a pitcher throws it and we still think TL does that better than anyone. Best example is a guy like Farnsworth or Juan Cruz, freakish velocity or movement but ineffective on the mound.

Rich Hill has come a long way since his first few trips to the Show, but he has a long way to go before he's as good of a pitcher as TL.

Hmm...

Hill, 2007: 3.92 ERA, 1.20 WHIP, 119 ERA+, 183 K, 63 BB
Lilly, 2007: 3.83 ERA, 1.14 WHIP, 122 ERA+, 174 K, 55 BB

Long way to go?

Seems to me like they are almost identical, with Lilly having a very slight advantage.

You're forgetting the stat BDAS:

Hill 2007: .34
Lilly 2007: 1.10

Clearly, Lilly was superior. While Rich Hill only scored a 34%, Ted Lilly was a Bad Ass 110% of the time.

Advantage, TLFC.

That's all. Good stuff.

I'll be interested to see if Dempster's average fastball velocity decreases as a starting pitcher. I'm almost certain it will.

Any differences the graphs can illustrate about Marquis' first half vs his second? Or his better games vs his worse?

Thanks again T!

I too recall that there seemed to be a stretch where Rich Hill was tipping his pitches. He went from getting tons of swings and misses early to a stretch where even the outs were frozen ropes right at people. I have no stat to back up any of this off the top of my head, but Rich seems to have the type of repertoire that would be especially prone to pitch tipping (i.e., major speed changes between his different pitches, plus the fact that he only really throws 3 pitches and so often only feels comfortable with 2 of those 3).

Terrific piece, Trans. I'm off too...to make some bar bets.

And btw very nice writeup trans. You should probably be a Doctor or something.

TRANS: REALLY excellent piece. And like Stevens, I too would be interested in the first-half/second-half 2007 differentials for Jason Marquis.

Ok I'm back (but at the office grading exams, boo). At the moment, I don't see any option either at Fan Graphs or at the Pitch f/x site that allows us to break velocity, pitch selection or movement down into half-seasons, but that's a brilliant suggestion. My spring break starts in a couple days, and I'll try to do some more digging to address the questions on Marquis, Hill, Dempster, and the rest, for a follow-up article. Thanks for the positive replies; I've been kind of hesitant about wading into this.

Sean Gallagher and Jose Ascanio to the Des Moines roster

Trans-

I don't see why you can't be done grading those exams in about 5 minutes and getting on to serious TCR business. Just find the nearest stairwell and fling the exams. Those that fly farthest get best grades.

Where's the hitch in that plan?

Because it still takes hours to collect the tossed exams, and then record the data, duh!

(And that, right there, is why I use a pseudonym....) 

(As a fellow teacher) Hey, Trans -- SHUT UP!! Damn, that's how secrets get out! :)

"That's some good bad-ball hitting, there. Notice, too, that the two home runs are on the sliders that likely were the biggest mistakes - the one furthest inside and the one highest in the zone."

another reason i don't buy the whole "soriano has to hit 1st to be effective" thing...

the guy sees and swings...what's probably more important in his case is having someone BEHIND him who would allow him to see a possible better mix of "hittable" pitches.

Fantastic, Trans. I am pretty curious about those outlier "curveballs" on both Lilly and Hill's plots. They may be misclassified, as you say, but they're still some pitch or other that break a certain amount over and down, and they're a small cluster out there on their own. I wonder if those are mistake pitches--hangers or some other bad release.

Could those handful of Rich Hill's pitches that don't break down at all but are at nearly -15 on the x axis actually be real? Is it possible to get a pitch to break only sideways?

Also, what does it tell us that the clusters of Lilly's pitch movement data points are a lot tighter than Hill's (for a given pitch type)? Is Lilly more predictable, or does he just have better control?

I wonder about the sample size for those pitch velocity calculations. Cubs starters made no more than 34 starts each last year. Since less than 25% of pitches are assessed, isn't it possible that some Cubs starters only had 5 or 6 outings in pitch f/x-equipped ballparks?

Several very good things to point out in there, thank you.  One thing that pitch f/x has trouble with is intentional walks.  It could be that what we are seeing there are intentional walks to left-handed batters.  (But how many lefties would Rich Hill be intentionally walking?  ehh....)    Pitch-outs are also a possibility.  More likely, in my best guess, is that it's some different variant of his curve, or just a bad pitch

And another good question about the tighter clusters. My subjective view is that Lilly has better control and more consistent results with the curve. Stat-wise, Hill had 2.91 BB/9 last year, Lilly 2.39.

And yes, it's entirely possible that random variation in who was starting when, and what pitches they were favoring that day, could produce some weird sampling effects. That said, after some digging, I come up with 1,623 pitches charted by Pitch f/x out of the 3,070 that Hill threw last year, and 1,595 out of the 3,240 that Lilly threw. So there IS a difference in charting, but I can't imagine that it could be a statistically significant difference, when we're talking about a few dozen pitches difference out of thousands.

Excellent, thanks Trans. That's a lot of pitches charted out of the total, yes, and unlikely to be confounding. I didn't expect that there were that many.

Interesting point about IBBs and pitchouts. It makes me wonder how the system figures out movement to the left and to the right. Does it calculate the "expected", movement-free trajectory (that the standard pitch would be expected to follow) based on the pitch's initial trajectory? Surely a given release point can't be assigned a set expected target point, say, at a perfect right angle to a line drawn between 1st and 3rd bases--how could anyone know where the pitcher means to aim? That said, I'd expect that a pitchout or IBB would be assigned very little movement in the x or y directions, as opposed to turning up as a "flat" pitch with lots of sideways movement.

Now I'm just being picky.

Really, the system appears to be amazingly precise and reliable, even if there are a handful of pitches that don't fall in any of the clusters.

What I want to see tracked are those Ankiel pitches in the 2000 NLCS vs. the Mets. Talk about outliers!

Yeah, you're now officially at a level of technicality that I can't
answer, based upon my pretty close but certainly not exhaustive reading
of some of the explanations and FAQs.  I can only wish this was
around for Ankiel...  or maybe Randy Johnson's fastball behind
John Kruk in the AS game...

Another one that would
be fun would be to track Eephus pitches. Rip Sewell, Orlando
Hernandez, Casey Fossum all come to mind, though of course Sewell
pitched several decades ago.

wow, I had to look up the Eephus pitch. Hadn't heard of it. wild.

o.hernandez might wanna dust off that eephus if his velocity remains at 80-85mph

You might want to check out my blog, Cubs f/x
http://cubsfx.blogspot.com

Back in August, I started it to cover the Cubs and PITCHf/x.

During the season you'll get pitching previews, hitter profiles, umps and hopefully catchers.
Before opening day, each Cub pitcher will be profiled in detail, hitters a little later.

If you head over you'll find nearly 100 posts on the topic, amongst others, and you'll get an idea of what kind of stuff you'll see during the season.

Thanks
Harry

Regarding half seasons - it is tough to do last year, for most guys, since the system was (and is) in the process of being turned on, and tuned. Release point pick-up went from 40 to 55 to 50, and you can see park to park differences (e.g. Fenway is screwed up). Cubs have good data from July onward, too late for a good look at Marquis.

Trans, you can go very very far into this worm hole, beware :-)

Did you know that Corey Patterson swings at more high fastballs than anyone in baseball?
http://cubsfx.blogspot.com/2008/02/high-heat.html

Harry - my sincere apologies for being unaware of your fine work.  I will be sure to reference it in any future f/x related posts that I do here.  Neat stuff!

Did you know that Corey Patterson swings at more high fastballs than anyone in baseball?

I don't need a fancy-pants cyber-SABR-Moneyball-geek-in-my-Mom's-basement graph to tell me that.

"...geek-in-my-Mom's-basement"?

my?

The geek is calculating from INSIDE THE HOUSE?!

But seriously, that Corey thing is fascinating.

If you only knew what my Mom's basement was like (Ma's a pack-rat in her 70s) you'd know it would be impossible to do anything down there, let alone blog.
