RoSIN: The RetroSheet Intraplay Notation

RoSIN is a formal grammar for describing baseball plays. It is based on a language that has been used to describe baseball play-by-play for over 10 years. That language, used by RetroSheet, the baseball stats archive, has a somewhat more expansive syntax, and has been used to account for every Major League Baseball game from 1908 to 1992.

The original RetroSheet syntax is part of a larger data format called the RetroSheet "Event File." These Event Files contain several components, including:

  • Game Metadata (start time, teams involved, weather, umpires)
  • Starting Lineups
  • Player Substitutions
  • Play Descriptions

Within any given Play Description, the RetroSheet Event File includes a RoSIN-esque string, but also includes an inning value, out-count, and Player ID. For example:


The RoSIN-esque string is the final field in the line, and has been highlighted in bold. RoSIN itself is designed to be a simplified (but fully functional) subset of the play strings found in RetroSheet Event Files.
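As a rough illustration of pulling the play string out of a Play Description line, here is a minimal Python sketch. It assumes only what is stated above: the line is comma-separated and the play string is the final field. The sample line, the `PlayRecord` type, and the `split_play_record` helper are hypothetical names made up for this example, and the leading fields are deliberately left unparsed.

```python
from typing import NamedTuple, Tuple


class PlayRecord(NamedTuple):
    leading_fields: Tuple[str, ...]  # everything before the play string, unparsed
    play_string: str                 # the RoSIN-esque final field


def split_play_record(line: str) -> PlayRecord:
    """Separate the final (play string) field from the rest of a record.

    ASSUMPTION: fields are comma-separated and the play string itself
    contains no commas.
    """
    head, sep, play_string = line.strip().rpartition(",")
    if not sep or not play_string:
        raise ValueError(f"no play string found in record: {line!r}")
    return PlayRecord(tuple(head.split(",")), play_string)


# Hypothetical sample line, used only for illustration.
sample = "play,5,1,ramir001,00,,S8.3-H;1-2"
record = split_play_record(sample)
print(record.play_string)
```

Splitting from the right keeps the sketch independent of how many fields precede the play string.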

The Advantage of Validation

By zeroing in on a select subset of all available characters in RetroSheet's play descriptions, RoSIN enables software parsers to unambiguously validate play-by-play reporting.

Such validation should help ensure the accuracy of data entry tools, and should also simplify the construction of statistical presentation tools. Catching errors within play descriptions should also help prevent data integrity issues from reaching databases.
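To make the idea of character-level validation concrete, the following Python sketch rejects any play string that uses a character outside a fixed set. The allowed set shown here is an assumption chosen for illustration and is not the actual RoSIN alphabet; a real validator would parse the string against the full RoSIN grammar rather than only checking characters.

```python
import re

# ASSUMPTION: an illustrative character subset, NOT the real RoSIN alphabet.
ALLOWED_CHAR = re.compile(r"[A-Z0-9/().;\-]")


def find_invalid_characters(play_string):
    """Return (position, character) pairs for characters outside the subset."""
    return [
        (index, char)
        for index, char in enumerate(play_string)
        if not ALLOWED_CHAR.fullmatch(char)
    ]


def is_valid(play_string):
    """A play string passes if it is non-empty and uses only allowed characters."""
    return bool(play_string) and not find_invalid_characters(play_string)


print(is_valid("S8.3-H;1-2"))
print(find_invalid_characters("S8!"))
```

Reporting the position of each offending character, rather than a bare pass/fail, is what would let a data entry tool point the operator at the exact keystroke to correct.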

Because one of RoSIN's design objectives is to represent any play found in any RetroSheet Event File, it should be possible to normalize every RetroSheet play into RoSIN.

RoSIN Documentation and Tools

RoSIN Downloads

RoSIN In Context

An agreed-upon RoSIN specification can help software developers work with baseball content in many ways. This section focuses on the use of RoSIN along with two other sports data standards:

SportsML is a robust, readily extensible standard for exchanging information about all sorts of sporting events. The schema is architected such that data properties that are common across most or many sports are included in the "SportsML Core," whereas sport-specific items reside within plugin schemas. SportsML documents can house lineups, injury reports, box scores, batter-by-batter coverage, cumulative season stats, contextual stats, and much more.

Within SportsML's <event-actions> section, there are constructs to describe at-bats and player substitutions. The <action-baseball-play> and <action-baseball-score> elements contain attributes for many high-level datapoints of the play, but do not hold attributes that can trace the path of the ball among defensive players, as RoSIN does. These SportsML elements do contain an attribute for "scorekeeper notation," which should house the full RoSIN string.
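The sketch below shows, in Python, how a RoSIN string might be attached to an <action-baseball-play> element using the standard library's ElementTree module. The attribute name `scorekeeper-notation` is an assumption based on the description above and should be checked against the SportsML schema; the play string and the `make_play_element` helper are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def make_play_element(rosin_string, attribute_name="scorekeeper-notation"):
    """Build an <action-baseball-play> element carrying the full RoSIN string.

    ASSUMPTION: the attribute name is a placeholder; consult the SportsML
    schema for the real one.
    """
    element = ET.Element("action-baseball-play")
    element.set(attribute_name, rosin_string)
    return element


# Hypothetical play string, used only for illustration.
element = make_play_element("S8.3-H;1-2")
print(ET.tostring(element, encoding="unicode"))
```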

SportsML can also house details of every pitch, much like the RetroSheet Event Files do. Current SportsML pitch descriptions include pitch-type and ball-location. Other attributes could be added in the future, including pitch-velocity and vector data.

SportsDB is a relational database schema whose design objectives are to:

  • Model "Sports Reality" as effectively as possible
  • Model the SportsML standard as sensibly as possible
  • Be capable of supporting queries for the most intense of sports data applications
  • Enable simplified coding techniques, where possible

Not every SportsML attribute and RoSIN substring will or should be excerpted into its own SportsDB database field. Only those that SportsDB users would clearly want to index and query independently need to be parsed into a unique field.

Pointers to the original, full SportsML file, and a field for the full RoSIN string (as well as a YAML version of the RoSIN string) can be stored in SportsDB, to enable further querying if necessary.
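As a sketch of this storage approach, the Python example below uses an in-memory SQLite database with one indexed, broken-out field alongside the full RoSIN string and a pointer to the source file. The table name, column names, file path, and play string are all hypothetical and do not reflect the actual SportsDB schema; the YAML column is left empty because the YAML rendering is not defined here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ASSUMPTION: a made-up table for illustration, not a real SportsDB table.
conn.execute(
    """
    CREATE TABLE play (
        id            INTEGER PRIMARY KEY,
        inning        INTEGER NOT NULL,  -- parsed out because users query on it
        sportsml_file TEXT,              -- pointer to the original SportsML file
        rosin_string  TEXT NOT NULL,     -- full, unparsed RoSIN string
        rosin_yaml    TEXT               -- optional YAML version of the string
    )
    """
)
conn.execute("CREATE INDEX play_inning_idx ON play (inning)")

conn.execute(
    "INSERT INTO play (inning, sportsml_file, rosin_string, rosin_yaml) "
    "VALUES (?, ?, ?, ?)",
    (5, "games/example.xml", "S8.3-H;1-2", None),
)

# Query by the indexed field, then fall back to the full string for detail.
rows = conn.execute(
    "SELECT rosin_string FROM play WHERE inning = ?", (5,)
).fetchall()
print(rows)
```

Keeping the full string next to the parsed fields means that details not promoted to their own column can still be recovered by re-parsing, without returning to the source file.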

Other References


  • Ted Turocy has performed the initial technical analysis and development of the RoSIN grammar.
  • Project sponsored and initiated by XML Team Solutions.