I finalized the findings and evaluated temporal trends across the clusters. There was an increase in protests over recent years, and a decline in average fatalities per event. These insights added valuable depth to our analysis. Wrapping up the model and contributing to the report gave me a well-rounded experience in both technical and interpretive aspects of data science. Project 2 is now complete and ready for submission!
Week-10
After cleaning and pre-processing, I implemented K-means clustering on the event locations. Using the Geopy library for geodesic distance and running the elbow method, we chose 4 as the optimal number of clusters. The results clearly grouped the country into different conflict zones based on event types. This week helped deepen my understanding of unsupervised learning and clustering concepts.
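The elbow step described above can be sketched roughly as follows. This is only an illustration, not the code we ran: the points are synthetic, the hub coordinates and helper names are made up, and it uses plain Euclidean distance on (lat, lon) pairs as a stand-in for the geodesic distance we took from Geopy.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two (lat, lon) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lon = sum(m[1] for m in members) / len(members)
                new_centers.append((lat, lon))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    # Inertia: total squared distance from each point to its nearest center.
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia

def best_inertia(points, k, restarts=5):
    """Lowest inertia over a few random restarts."""
    return min(kmeans(points, k, seed=s)[1] for s in range(restarts))

def make_toy_points(seed=1):
    """Synthetic events scattered around four made-up hubs (not ACLED data)."""
    rng = random.Random(seed)
    hubs = [(47.6, -122.3), (34.0, -118.2), (41.9, -87.6), (29.8, -95.4)]
    return [(lat + rng.gauss(0, 0.3), lon + rng.gauss(0, 0.3))
            for lat, lon in hubs for _ in range(40)]

points = make_toy_points()
inertias = {k: best_inertia(points, k) for k in range(1, 8)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Reading the printed values from k = 1 upward, the "elbow" is the k after which inertia stops dropping sharply.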
This week’s focus was on interpreting the clusters and understanding what each one represented. Northern regions mostly experienced violence against civilians, while central regions had more protests. I found it interesting how clustering highlighted patterns that might be hidden in raw data. We also visualized the clusters on maps, which made our results easier to communicate to others.
Week-9
This week, I finished the final tuning and testing of the model. I compared multiple models and evaluated them using confusion matrices and ROC curves. We selected the best-performing model and generated predictions for unseen data. It was interesting to note how model performance varied depending on the features included. The results are now consolidated into a clean report with graphs and interpretation. Project 1 is complete!
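To make the ROC part concrete, here is a small sketch of how a ROC curve and its area can be computed by hand. The labels and scores are invented for illustration, and the function names are my own, not from any library.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scoring >= t is predicted positive.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve using the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical labels (1 = positive class) and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

curve = roc_points(y_true, scores)
area = auc(curve)
print(curve)
print(round(area, 4))
```

An area close to 1 means the model ranks positives above negatives most of the time, while 0.5 is what random scores would give.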
For Project 2, I began analyzing the ACLED US dataset on political violence. I noticed right away that it was much larger and more diverse in terms of event types. The data required geolocation checks, and several columns needed encoding or transformation. I also started thinking about which clustering method would suit our problem—K-means seemed like a strong candidate due to the spatial nature of the data.
Week-8
This week, I focused on cleaning the dataset and beginning exploratory data analysis. After dropping or imputing missing values and normalizing the text entries, I moved into visualizations. It was enlightening to see trends appear: certain states had significantly more incidents, and specific demographic groups were overrepresented. This helped guide which features to include in the model and gave me a better grasp of the problem space.
With clean data and exploratory insights, I began model building. I used logistic regression and decision trees to predict the likelihood of a fatal police shooting based on features like age, race, armed status, and location. It took some iteration to tune the models, but seeing prediction accuracy improve over time helped solidify my understanding of classification techniques and their evaluation metrics like precision and recall.
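As a reminder of what those metrics mean, here is a minimal sketch that derives accuracy, precision, and recall from confusion-matrix counts. The labels and predictions are made up, not output from our models.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall); ratios with an empty denominator become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    return accuracy, precision, recall

# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
accuracy, precision, recall = evaluate(y_true, y_pred)
print(counts, accuracy, precision, recall)
```

Accuracy alone can look fine on imbalanced data, which is why precision and recall are worth checking separately.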
Week-7
Today, we submitted our project and had a discussion about the second dataset.
I started looking at the Washington police shootings dataset today. One of the first challenges I encountered was data inconsistency, especially with missing values and non-standardized entries in critical columns like race, cause, and weapon. Some entries were ambiguous or labeled “unknown,” which posed difficulties for meaningful analysis. I spent time understanding the context and potential biases within the dataset, which is crucial for ensuring our model’s fairness and accuracy.
Week-6
Today, March 4th, we worked on Project 1. I focused on model building, and as a team we looked at age disparities.
As the project deadline approached, I finalized my findings. I discovered that armed status was most influenced by factors like age, race, and whether the subject fled the scene. The model helped identify biases and patterns in police use of force, especially against unarmed individuals. I compiled the visualizations, analysis, and model results into the final report and submitted it by the March 7th deadline. The project gave me strong experience in handling real-world data and applying predictive analytics techniques.
Week-5
Today, our classmate Mahnoor delivered a presentation in which she shared her insights on the percentage of people shot in a particular county. She discussed patterns in the data, focusing on how these incidents have occurred continuously over a period of time. Her analysis provided valuable perspectives on trends and potential factors contributing to these occurrences.
I began experimenting with machine learning models to predict the likelihood of a suspect being armed. I used logistic regression and decision tree classifiers and evaluated their performance using accuracy and confusion matrices. I understood how feature selection and data balance affect model predictions. Although the accuracy wasn’t very high, the models gave interesting insights into which factors mattered most.
Week-4
Today, we explored various fundamental topics in statistics, including standard deviation and the standard normal distribution. Additionally, we engaged in a discussion on inference-based questions, particularly focusing on comparisons across different fields such as age, race, and other demographic factors. This discussion helped deepen our understanding of how statistical inferences can be applied to analyze and interpret data effectively.
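A short sketch of the two ideas, using Python's built-in `statistics` module. The ages are invented for illustration and are not taken from the dataset.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical ages, for illustration only.
ages = [22, 25, 31, 34, 37, 40, 45, 52]

mu = mean(ages)
sigma = stdev(ages)  # sample standard deviation

# Standardizing: a z-score says how many standard deviations a value is from the mean.
z_scores = [(a - mu) / sigma for a in ages]

# The standard normal distribution has mean 0 and standard deviation 1.
std_normal = NormalDist(0, 1)
share_within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)

print(round(mu, 2), round(sigma, 2))
print([round(z, 2) for z in z_scores])
print(round(share_within_one_sd, 4))
```

After standardizing, the z-scores have mean 0 and standard deviation 1, which is what makes values from different fields comparable on one scale.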
Week-3
While analyzing the dataset, I came across several questions that could help me understand it better.
What is the racial and gender breakdown of individuals shot by police?
Are certain groups disproportionately affected?
What is the average age of individuals involved in police shootings?
Are younger or older individuals more frequently involved?
What percentage of the victims were unarmed versus armed?
What types of weapons were most commonly involved?
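Most of these questions come down to counting values in a column and averaging another. Here is a rough sketch of that, assuming each record is a dictionary. The field names (`group`, `armed`, `age`) and the four records are placeholders, not the dataset's real columns or values.

```python
from collections import Counter

# Placeholder records; field names and values are hypothetical.
records = [
    {"group": "group_a", "armed": "gun", "age": 30},
    {"group": "group_b", "armed": "unarmed", "age": 24},
    {"group": "group_a", "armed": "knife", "age": 41},
    {"group": "group_a", "armed": "gun", "age": 35},
]

def percent_breakdown(rows, field):
    """Percentage of rows for each distinct value of `field`."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}

def average(rows, field):
    """Mean of a numeric field across all rows."""
    return sum(row[field] for row in rows) / len(rows)

by_group = percent_breakdown(records, "group")
by_armed = percent_breakdown(records, "armed")
avg_age = average(records, "age")
most_common_weapon = Counter(
    r["armed"] for r in records if r["armed"] != "unarmed"
).most_common(1)[0][0]

print(by_group, by_armed, avg_age, most_common_weapon)
```

On the real data, missing or "unknown" entries would need to be handled before percentages like these are meaningful.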
Week-2
Today, we engaged in a discussion about the Washington shooting dataset. Students actively participated by posing various questions related to the dataset. Professor G. Davis addressed these questions and elaborated on the key concepts, providing a concise yet insightful explanation to enhance our understanding of the topic.