This week, I focused on cleaning the dataset and beginning exploratory data analysis. After dropping or imputing missing values and normalizing the text entries, I moved into visualizations. It was enlightening to see trends appear: certain states had significantly more incidents, and specific demographic groups were overrepresented. This helped guide which features to include in the model and gave me a better grasp of the problem space.
With clean data and exploratory insights, I began model building. I used logistic regression and decision trees to predict the likelihood of a fatal police shooting based on features like age, race, armed status, and location. It took some iteration to tune the models, but seeing prediction accuracy improve over time helped solidify my understanding of classification techniques and their evaluation metrics like precision and recall.
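To make the evaluation metrics concrete, here is a minimal sketch of how precision and recall are computed from a classifier's predictions. The labels below are made up for illustration and are not results from my models; the function name `precision_recall` is also just a placeholder of my own.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = positive class), NOT taken from the project data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the cases the model flagged, how many were right?", while recall answers "of the real positive cases, how many did the model find?". Looking at both is more informative than accuracy alone when one class is much rarer than the other.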
week-7
Today we submitted the project and had a discussion about dataset-2.
I started looking at the Washington police shootings dataset today. One of the first challenges I encountered was data inconsistency, especially with missing values and non-standardized entries in critical columns like race, cause, and weapon. Some entries were ambiguous or labeled “unknown,” which posed difficulties for meaningful analysis. I spent time understanding the context and potential biases within the dataset, which is crucial for ensuring our model’s fairness and accuracy.
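As a rough sketch of the kind of cleanup this calls for, the snippet below standardizes free-text category entries and treats ambiguous labels as missing. The sample values and the `MISSING_TOKENS` set are assumptions for illustration only; the real columns would need their own list of placeholder labels after inspecting the data.

```python
# Placeholder labels treated as missing (an assumed list, not from the dataset docs).
MISSING_TOKENS = {"", "unknown", "undetermined", "n/a", "na"}


def normalize_label(value):
    """Lowercase, collapse whitespace, and map placeholder labels to None."""
    if value is None:
        return None
    cleaned = " ".join(str(value).split()).lower()
    return None if cleaned in MISSING_TOKENS else cleaned


# Hypothetical raw entries for a column such as weapon.
raw_weapon = ["Gun", " gun ", "UNKNOWN", "", "Knife", None, "knife  "]

cleaned_weapon = [normalize_label(v) for v in raw_weapon]
missing_share = cleaned_weapon.count(None) / len(cleaned_weapon)

print(cleaned_weapon)
print(f"share missing: {missing_share:.0%}")
```

Keeping the missing entries as `None` instead of dropping them right away makes it possible to check how much of each column is unusable, and whether the gaps are concentrated in particular groups, before deciding between dropping and imputing.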
week-6
Today, March 4th, we worked on project-1. I worked on the model building, and we focused on age disparities.
As the project deadline approached, I finalized my findings. I discovered that armed status was most influenced by factors like age, race, and whether the subject fled the scene. The model helped identify biases and patterns in police use of force, especially against unarmed individuals. I compiled the visualizations, analysis, and model results into the final report and submitted it by the March 7th deadline. The project gave me strong experience in handling real-world data and applying predictive analytics techniques.