Add Statcast clip link
It would be nice to have a link, or at least the hash value, that lets me find the video clip for a specific pitch. Right now I have to redo the search on the site to get it.
Is it available in the CSV export somewhere?
I've tried to build this. The best I could do was a pretty clunky process: scrape Gameday XML data to get the play ID used in Baseball Savant's video pages, then join that back to the Statcast event. It went something like:
- Find a day's games from e.g. https://gd2.mlb.com/components/game/mlb/year_2020/month_07/day_25/scoreboard.xml
- Find those games' events from e.g. http://gd2.mlb.com/components/game/mlb/year_2020/month_08/day_12/gid_2020_08_12_miamlb_tormlb_1/inning/inning_all.xml (where the game ids can be found in the previous bullet's XML)
- Take the GUIDs from those games (matching at the at-bat level with Statcast wasn't hard; I haven't tried the pitch level)
- Join those with Statcast (the columns `['game_date', 'home_team', 'away_team', 'at_bat_number']` collectively seemed like a unique combo to match a Gameday GUID with a Statcast batted ball event)
- Plug the GUID into the Baseball Savant URL structure (e.g. https://baseballsavant.mlb.com/sporty-videos?playId=f9329f41-5c6a-431e-b001-a1a4ec7fb846, where `playId` is the GUID)
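A minimal sketch of the parse-and-join steps above, using only the standard library. It assumes each `<atbat>` element in `inning_all.xml` carries `num` and `play_guid` attributes and that `num` lines up with Statcast's `at_bat_number`; the helper names (`inning_all_url`, `atbat_guids`, `attach_clip_links`) are hypothetical. Downloading is left out because the gd2 files may no longer be served.

```python
import xml.etree.ElementTree as ET

GD2_BASE = "https://gd2.mlb.com/components/game/mlb"
SPORTY_URL = "https://baseballsavant.mlb.com/sporty-videos?playId={}"


def inning_all_url(gid):
    """Build the inning_all.xml URL from a Gameday game id such as
    'gid_2020_08_12_miamlb_tormlb_1' (year/month/day are read from the id)."""
    _, year, month, day = gid.split("_")[:4]
    return (f"{GD2_BASE}/year_{year}/month_{month}/day_{day}/"
            f"{gid}/inning/inning_all.xml")


def atbat_guids(inning_all_xml):
    """Map at-bat number -> play GUID.

    Assumption: every <atbat> element has 'num' and 'play_guid' attributes."""
    root = ET.fromstring(inning_all_xml)
    guids = {}
    for atbat in root.iter("atbat"):
        num, guid = atbat.get("num"), atbat.get("play_guid")
        if num is not None and guid:
            guids[int(num)] = guid
    return guids


def attach_clip_links(statcast_rows, game_key, guids):
    """Add 'play_guid' and 'clip_url' to Statcast rows (dicts).

    game_key is the (game_date, home_team, away_team) tuple of the game the
    GUIDs came from; rows from other games get None."""
    out = []
    for row in statcast_rows:
        key = (row["game_date"], row["home_team"], row["away_team"])
        guid = guids.get(int(row["at_bat_number"])) if key == game_key else None
        out.append({**row, "play_guid": guid,
                    "clip_url": SPORTY_URL.format(guid) if guid else None})
    return out
```

Matching on at-bat number within a known game mirrors the four-column join described above; a pitch-level match would need an extra key.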
This approach was IMO way too slow to include in the default Statcast scraper, but it might be fine as a standalone Gameday ID or replay link scraper. Or maybe there's a faster way to get the GUIDs if you already know the Statcast game ID?
I think the best way to do this might actually be to ask Tom Tango nicely on Twitter.
They must have that GUID on the backend somewhere. If someone wants to try to convince Tango... Anyway, yeah. I don't see this exposed anywhere. From @jldbc's description, blah, someone can go implement that if they want to and I'll merge it happily, but 🤢
Well, I solved it for now by making a very specific search request to Baseball Savant with the pitch data I have. It looks like this:
```python
import requests
from bs4 import BeautifulSoup

SAVANT = "https://baseballsavant.mlb.com"


def savant_clip(pitch):
    """Return the video URL for one Statcast pitch (a row of the CSV export)."""
    clip_url = SAVANT + "/statcast_search?hfPT=&hfAB=&hfBBT=&hfPR=&hfZ=&stadium=&hfBBL=&hfNewZones=&hfGT=R%7C&hfC=[count]%7C&hfSea=[season]%7C&hfSit=&player_type=pitcher&hfOuts=[outs]%7C&opponent=&pitcher_throws=&batter_stands=&hfSA=&game_date_gt=[date_min]&game_date_lt=[date_max]&hfInfield=&team=&position=&hfOutfield=&hfRO=&home_road=&hfFlag=&hfPull=&pitchers_lookup%5B%5D=[pitcher_id]&metric_1=api_p_release_speed&metric_1_gt=[min_speed]&metric_1_lt=[max_speed]&hfInn=[Inning]%7C&min_pitches=0&min_results=0&group_by=name&sort_col=pitches&player_event_sort=h_launch_speed&sort_order=desc&min_pas=0&type=details&player_id=[pitcher_id]"
    game_date = str(pitch['game_date'])[:10]  # 'YYYY-MM-DD'
    pitch_map = {
        "[Inning]": int(pitch['inning']),
        # The Statcast CSV already carries the pitcher's MLBAM ID.
        "[pitcher_id]": int(pitch['pitcher']),
        # Restrict the search to the day the pitch was thrown.
        "[date_min]": game_date,
        "[date_max]": game_date,
        "[count]": str(int(pitch['balls'])) + str(int(pitch['strikes'])),
        "[season]": game_date[:4],
        "[outs]": int(pitch['outs_when_up']),
        # +/- 1 mph window around the recorded release speed.
        "[min_speed]": int(pitch['release_speed'] - 1),
        "[max_speed]": int(pitch['release_speed'] + 1),
    }
    for k, v in pitch_map.items():
        clip_url = clip_url.replace(k, str(v))

    site = requests.get(clip_url)
    soup = BeautifulSoup(site.text, features="lxml")
    for link in soup.find_all('a'):
        href = link.get('href')
        # Only follow links that point at Savant video pages.
        if not href or 'sporty-videos' not in href:
            continue
        clip_page = requests.get(SAVANT + href)
        clip_soup = BeautifulSoup(clip_page.text, features='lxml')
        video_obj = clip_soup.find("video", id="sporty")
        source = video_obj.find('source') if video_obj is not None else None
        if source is not None:
            # Return the first matching clip's direct video URL.
            return source.get('src')
    return None
```
It's of course not fully finished, and one could include more parameters to reduce the chance of getting more than one result. I haven't checked performance, but it should be OK. Plus, this could be executed in parallel with the original search request.
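One way to run these lookups in parallel is a thread pool, since each call mostly waits on HTTP. A minimal sketch, assuming a per-pitch function like `savant_clip` above; `fetch_clips_parallel` is a hypothetical helper, and swallowing per-pitch errors is a design choice, not something from the thread.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_clips_parallel(pitches, fetch_clip, max_workers=8):
    """Call fetch_clip on every pitch using a thread pool.

    Results come back in the same order as pitches. A pitch whose lookup
    raises an exception gets None instead of aborting the whole batch."""

    def safe_fetch(pitch):
        try:
            return fetch_clip(pitch)
        except Exception:  # e.g. a network error or a page without a <video>
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_fetch, pitches))
```

With a Statcast DataFrame `df`, usage might look like `fetch_clips_parallel([row for _, row in df.iterrows()], savant_clip)`.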
@schorrm I'm friendly with Tango on Twitter and can ask him for help here if still needed, although it looks like @Maradonna90 may have solved this?
Let me know if I should ping him.
@kmedved I find @Maradonna90's approach here very interesting; I am inclined to leave this open at least for a bit (especially since I never really thought we needed the feature until he asked for it :) )
I know this issue is over a year old, but I figured I'd chime in and share what I've found.
I built a wrapper around MLB Video Room's GraphQL API, querying for video feeds based on a few key pitch identifiers (similar to what @Maradonna90 shared above). I was able to get multiple video feeds and resolutions per pitch, which was cool.
I realized there is a much simpler way to do this:
- Use the normal Statcast CSV data to find the pitch you want a clip for.
- Pass its `game_pk` to this endpoint: https://baseballsavant.mlb.com/gf?game_pk={GAME_PK}
- In the response, `team_home` and `team_away` list the Statcast data for each pitch, including its `play_id`.
- Use this URL to get to the video page: https://baseballsavant.mlb.com/sporty-videos?playId={PLAY_ID}
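The steps above can be sketched with the standard library. This assumes the `gf` response is JSON in which `team_home` and `team_away` are lists of per-pitch dicts that include `play_id`; which other fields to match on (for example something like `ab_number` or `pitch_number`) is an assumption, so inspect the JSON first. The function names are hypothetical.

```python
import json
import urllib.request

GF_URL = "https://baseballsavant.mlb.com/gf?game_pk={}"
SPORTY_URL = "https://baseballsavant.mlb.com/sporty-videos?playId={}"


def fetch_game_feed(game_pk):
    """Download the Savant game feed JSON for one game_pk."""
    with urllib.request.urlopen(GF_URL.format(int(game_pk))) as resp:
        return json.load(resp)


def find_play_id(game_feed, **match):
    """Return the play_id of the first pitch record whose fields equal every
    keyword in match, searching team_home then team_away; None if no match."""
    for side in ("team_home", "team_away"):
        for pitch in game_feed.get(side, []):
            if all(pitch.get(k) == v for k, v in match.items()):
                return pitch.get("play_id")
    return None


def clip_url(play_id):
    """Build the Savant video page URL for a play_id."""
    return SPORTY_URL.format(play_id)
```

Usage might look like `clip_url(find_play_id(fetch_game_feed(game_pk), ab_number=5, pitch_number=2))`, with the keyword names adjusted to whatever fields the feed actually uses.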