Using NYPD Data and Watson Analytics to Better Understand Vehicle Accidents

What can data about vehicle accidents tell us? I recently was able to take data from the New York Police Department that spanned 2013-2015, refine it and upload it into Watson Analytics to see what I could learn about accidents and the injuries and deaths they cause. I was especially interested in discerning the latent patterns.

Before we get started
The real-world data set I used comes from the NYPD Motor Vehicle Collisions open data and covers motor vehicle accidents in the New York area. This data set was then merged with hourly weather data, matched on the latitude, longitude and time of each accident. The data was shaped by removing all rows where “Borough” was blank or where “Contributing Factor (for Vehicle 1)” was blank or unspecified, because my plan was to analyze only those accidents that had information about both the borough and the contributing factor for the primary vehicle. One calculated field was also created: # Persons killed or injured = Persons Killed + Persons Injured
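For readers who want to reproduce the shaping step outside Watson Analytics, here is a minimal sketch in plain Python. The dictionary keys (borough, factor_vehicle_1, persons_injured, persons_killed) and the sample rows are my own illustrative assumptions, not the actual column names or values in the NYPD export, so adjust them to match your file.

```python
# Toy rows standing in for the collisions export.
# Keys and values are hypothetical, chosen only for illustration.
rows = [
    {"borough": "BROOKLYN", "factor_vehicle_1": "Driver Inattention/Distraction",
     "persons_injured": 2, "persons_killed": 0},
    {"borough": "", "factor_vehicle_1": "Failure to Yield Right-of-Way",
     "persons_injured": 1, "persons_killed": 0},
    {"borough": "QUEENS", "factor_vehicle_1": "Unspecified",
     "persons_injured": 0, "persons_killed": 0},
    {"borough": "MANHATTAN", "factor_vehicle_1": "",
     "persons_injured": 3, "persons_killed": 0},
    {"borough": "QUEENS", "factor_vehicle_1": "Failure to Yield Right-of-Way",
     "persons_injured": 1, "persons_killed": 1},
]


def prepare(rows):
    """Drop rows without a usable borough or contributing factor,
    then add the combined killed-or-injured count to each kept row."""
    cleaned = []
    for row in rows:
        borough = row["borough"].strip()
        factor = row["factor_vehicle_1"].strip()
        # Skip blank boroughs and blank or "unspecified" contributing factors.
        if not borough or not factor or factor.lower() == "unspecified":
            continue
        cleaned.append({
            **row,
            "borough": borough,
            "factor_vehicle_1": factor,
            # Persons killed or injured = Persons Killed + Persons Injured
            "persons_killed_or_injured": row["persons_killed"] + row["persons_injured"],
        })
    return cleaned


prepared = prepare(rows)
```

Only the first and last toy rows survive the filter here, and each carries the new combined count.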

It begins with a simple question
The question I asked Watson Analytics was “I want to understand accidents by borough.”
Figure 1: NLP query

As you can see here, Manhattan was the most accident-prone area for the three-year period I was analyzing, while Staten Island was the least.
Figure 2: Accidents by borough

Looking at those accidents over the years, I immediately saw that accidents increased steadily in every borough, and that the increase was steeper in Brooklyn and Queens than in the others. When I replaced years with quarters, I saw the same pattern: a gradual increase in accidents from one quarter to the next. In short, accidents kept climbing throughout the period.
Figure 3: Accidents by borough by year
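The same borough-by-year view can be approximated with a simple count. This is only a sketch: the (borough, date) pairs are made-up sample data, and yearly_change is a hypothetical helper I introduce here for illustration, not something Watson Analytics exposes.

```python
from collections import Counter
from datetime import date

# Made-up (borough, accident date) pairs; not real NYPD figures.
accidents = [
    ("BROOKLYN", date(2013, 3, 1)),
    ("BROOKLYN", date(2014, 5, 9)),
    ("BROOKLYN", date(2014, 8, 2)),
    ("BROOKLYN", date(2015, 1, 15)),
    ("BROOKLYN", date(2015, 6, 30)),
    ("BROOKLYN", date(2015, 11, 4)),
    ("MANHATTAN", date(2013, 2, 11)),
    ("MANHATTAN", date(2013, 9, 23)),
    ("MANHATTAN", date(2015, 4, 7)),
    ("MANHATTAN", date(2015, 12, 19)),
]

# Count accidents per (borough, year) pair.
by_borough_year = Counter((borough, day.year) for borough, day in accidents)


def yearly_change(counts, borough, first_year, last_year):
    """Difference in accident counts between two years for one borough."""
    return counts[(borough, last_year)] - counts[(borough, first_year)]


brooklyn_change = yearly_change(by_borough_year, "BROOKLYN", 2013, 2015)
manhattan_change = yearly_change(by_borough_year, "MANHATTAN", 2013, 2015)
```

Comparing the change per borough is one simple way to see which boroughs are climbing fastest; swapping day.year for a (year, quarter) pair would give the quarterly view.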

More accidents do not necessarily mean more injuries or deaths
But not every accident results in an injury or a fatality, and the picture changed once I made that distinction. Although Manhattan had the most accidents, Brooklyn and Queens had more injuries and deaths resulting from them.
Figure 4: Boroughs by accidents vs. injuries and deaths
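To make the distinction concrete, here is a small sketch that tallies accidents and casualties side by side for each borough. The records are invented for illustration and are deliberately arranged so that the borough with the most accidents is not the one with the most casualties.

```python
from collections import defaultdict

# Invented (borough, persons killed or injured) records; not real figures.
records = [
    ("MANHATTAN", 0),
    ("MANHATTAN", 0),
    ("MANHATTAN", 1),
    ("BROOKLYN", 2),
    ("BROOKLYN", 3),
]

# Track both measures per borough in a single pass.
summary = defaultdict(lambda: {"accidents": 0, "killed_or_injured": 0})
for borough, casualties in records:
    summary[borough]["accidents"] += 1
    summary[borough]["killed_or_injured"] += casualties

# The two rankings can disagree, which is the point of the comparison.
most_accidents = max(summary, key=lambda b: summary[b]["accidents"])
most_casualties = max(summary, key=lambda b: summary[b]["killed_or_injured"])
```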

Further, it’s clear that passenger vehicles were involved in the majority of the accidents that led to injuries or deaths.
Figure 5: Injuries and deaths across vehicle types and years

If I want to bolster the value of my analysis, the transactional data can be augmented with relevant contextual data. In this case, that could be the make and model of the specific vehicle used, highway information, demographic details of the driver and other related data.

Digging deeper for validation and new factors not easily seen
Although a few important characteristics were easy to spot, the question remained of what drove injuries or deaths during an accident. So, I turned to another aspect of Watson Analytics, the simplified predictive model that can be used to:

  1. Validate existing understanding, knowledge or gut feel
  2. Discover novel factors that are not obvious when you first examine data

The contributing factor (for the primary vehicle involved in the accident) seems to be the most important driver of injuries or deaths in an accident, and its combination with the borough seems to have the most significant bearing. Not only can I see all the factor combinations ranked in order of their bearing on the target, but I can also drill down into each one of them to learn more. Interestingly, weather plays a pivotal role too.
Figure 6: Predictive drivers of injuries and deaths from accidents

Delving deeper and looking at the top 5 contributing factors across the three boroughs with the highest injuries and deaths from accidents, it’s evident that Driver inattention/distraction tops the list, followed by Failure to Yield Right-of-Way. It is striking that both are avoidable, yet they contributed to so many injuries and deaths in New York boroughs over the three years analyzed. Still, this is a useful insight because it makes it possible to fine-tune existing policies and focus areas, with more education on the topic and higher penalties for distracted or inattentive driving.
Figure 7: Contributing factors and boroughs
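A ranking like this one can be sketched with a counter keyed by contributing factor. Everything below is an assumption made for illustration: the sample rows, the casualty numbers, the choice of which three boroughs to include, and the hypothetical top_factors helper.

```python
from collections import Counter

# Invented (borough, contributing factor, persons killed or injured) rows.
rows = [
    ("BROOKLYN", "Driver Inattention/Distraction", 3),
    ("QUEENS", "Driver Inattention/Distraction", 2),
    ("MANHATTAN", "Failure to Yield Right-of-Way", 2),
    ("QUEENS", "Failure to Yield Right-of-Way", 1),
    ("BROOKLYN", "Fatigued/Drowsy", 1),
    ("STATEN ISLAND", "Backing Unsafely", 4),
]


def top_factors(rows, boroughs, n=5):
    """Rank contributing factors by total casualties within the given boroughs."""
    totals = Counter()
    for borough, factor, casualties in rows:
        if borough in boroughs:
            totals[factor] += casualties
    return totals.most_common(n)


# Assumed selection of three boroughs; substitute the ones your data supports.
selected = {"BROOKLYN", "QUEENS", "MANHATTAN"}
ranking = top_factors(rows, selected, n=2)
```

Rows from boroughs outside the selection are ignored, so the Staten Island row does not affect the ranking even though its casualty count is the largest single value.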

An analyst, an office manager or any other regular user with the data set can arrive at these insights within moments. No prior understanding of statistics is required.

How’s the weather there?
Based on what I learned from the predictive insight, I decided to look more closely at the 66% impact that the combination of Wind Direction and Weather Conditions had on injuries or deaths resulting from an accident. Although it might be common sense that the higher the wind speed, the higher the chances of accidents (and hence of injuries or deaths), it is nice to have statistical evidence that a specific wind direction could have a bearing on high injuries or deaths during an accident. The NYPD can use these insights to monitor and control such situations with alerts and real-time dashboards.
Figure 8: Wind directions and weather conditions

Displaying what I’ve learned
With Watson Analytics, I brought my insights together in easy-to-build, interactive displays. The daily view of both accidents and the resulting injuries or deaths can be filtered to show any month we want. We can further break down injuries and deaths into cyclists, pedestrians and motorists to understand them individually by region, time and other factors, and so yield actionable insights. For example, my displays show that pedestrians in all boroughs run a higher risk of an accident-related injury than cyclists.

Figure 9a: Display, overview tab

Try Watson Analytics
If you haven’t used Watson Analytics yet, today’s a great day to try it. Visit for more details.