Crime Awareness and Safety System for Travelers

D. Ch; an Lagubigi; Kiran Raj K M; Gururagavendra Paluri; Harish Kunder; Ganeshraj S; Sathivik S

International Journal of Criminology and Criminal Law(IJCCL)

ISSN: 2996-3397 | DOI: 10.33140/IJCCL

Researchers and authors can directly submit their manuscript online through this link Online Manuscript Submission.

Track Your Submission

Share this page:

Open Access Journals

Review Article - (2025) Volume 3, Issue 3

View PDF Download PDF

Crime Awareness and Safety System for Travelers

D.Chandan Lagubigi ^*, Kiran Raj K M , Gururagavendra Paluri , Harish Kunder , Ganeshraj S and Sathi-vik S

Department of Artificial Intelligence and Machine Learning, Alva’s Institute of Engineering and Technology, India

^*Corresponding Author: D.Chandan Lagubigi, Department of Artificial Intelligence and Machine Learning, India

Received Date: Sep 22, 2025 / Accepted Date: Oct 27, 2025 / Published Date: Nov 10, 2025

Copyright: ©©2025 D.Chandan Lagubigi, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Chandan, D. L., Kiran, R. K. M., Gururagavendra, P., Harish, K., Ganeshraj, S., et al. (2025). Crime Awareness and Safety System for Travelers, Int J Criminol Criminal Law, 3(3), 01-06.

Abstract

Travelers today face a complex environment of threats—ranging from street crime and confidence scams in urban areas to violent crime and natural disaster in rural areas. This paper presents the design, deployment, and evalu- ation of a ”Crime Awareness and Safety System for Travelers” (CAST), an end-to-end mobile system that combines real-time data fusion, ma- chine-learning–based risk prediction, and context- aware alerting to enhance traveler safety. CAST collects multi- source inputs—such as official crime reports, social media trends, geotagged incident reports, and weather data—and fuses them through a multi-layered analytics pipeline. A supervised learning model classifies incoming events by risk and probability, and a spatiotemporal risk map continuously shows hotspots and emerging threats. Users’ feedback highlighted the system’s ease of use and robustness of alerts, although issues remain in ensuring data privacy and avoiding false positives. We conclude that CAST is a scalable solution to traveler safety, with potential extensions to include peer-to-peer reporting and blockchain- based verification of incident data.

Keywords

Crime Prediction, Explainable AI (XAI), Safety Recommendations, Crime Visualization, Real-Time Crime News, AI Based Travel Safety Score, Emergency Resources.

Introduction

Travelers are faced with an evolving threat environment, in which risk from crime-from street robbery and tourist- zone fraud to violent crime in unvisited neighborhoods-can compromise safety as well as the pleasure of travel. Offi- cial government travel warnings and guidebooks, traditional sources of information, are brief on detail and dated in their refreshment, denying travelers current, location-specific information. Meanwhile, popular navigation tools optimize for distance or travel time, not crime density or personal exposure as considerations. The shortfall suggests an integrated, real- time system that gives travelers actionable safety intelligence

Recent developments in mobile sensing, crowdsourcing, and ma-chine learning provide promising directions to tackle this problem. Dinkoksung et al. created a tourist safety app for hot and humid regions, combining weather alerts, health advice, and rescue coordination to minimize environmental hazards [1]. Be-Safe Travel employs a web-based platform with Google Maps APIs to provide crime-sensitive routes for driving, illustrating the potential for including security levels in urban distance calculations [2]. Transafe takes this further by crowd-sourcing public opinions on safety, mapping affective states over urban locations using collective intelligence, and more recently. DangerMaps employs retrieval-augmented lan- guage models to produce customized safety recommendations superimposed on urban maps, exemplifying the potential of large-language-model (LLM) architectures in travel security applications.

Even after such innovations, all current solutions suffer from one or more limitations. First, the majority of them concentrate mainly on environmental or health risks alone, with not enough overall integration of crime information, so- cial indicators, and traveler environments [3]. Second, crowd- sourced solutions suffer from sparsity and noisy data, tending to have uneven geographical coverage. Third, state-of-the-art LLM-based solutions such as Dan-gerMaps are still in infancy stages, where fundamental questions include the real world- scale performance, privacy considerations, and frictionless integration into travelers workflows [4].

To address these gaps, we introduce CAST (Crime Aware- ness and safety System for Travelers), an end-to-end mobile system that delivers real-time, tailor-able risk appraisals and context-based notices. CAST's system architecture accepts multi-source inputs-au-thoritative crime reports, geo tagged incident records, social media metrics, and environmental conditions-into a multi-layered analyt- ics pipeline. A super- vised learning classifier computes severity scores and incident likelihood predictions, whereas a spatiotem-poral risk map visualizes emerging hotpots. Crucially, the system tunes noti- fications based on user itineraries, vulnerability pro-files, and expressed preferences, recommending safer paths, local alerts, and emergency contacts.

This paper is structured as follows. Section 2 provides an over-view of existing research on mobile travel security, security crowd-sourced platforms, and LLM advisory systems. Section 3 describes CAST's system architecture, including data ingestion, machine-learning models, and user interface. Section 4 discusses the experimental setup and trial results. Section 5 describes practi-cal issues-data privacy, reduction of false-positives, and scalability-and future extensions, including blockchain-based incident verification and peer-to- peer reporting. We conclude by establishing CAST's potential as a scalable, end-to-end security solution for enhancing traveler security in different environments.

Literature Survey

Most of the solutions proposed are to be applied by the police and not to the people who reside in the area. LISA (Local Indicators of spatial Association), a statistical tool was used while analyzing data collected from Sweden. The focus was on temporal hotspot prediction of burglary crime although it can be used to predict other crimes. Data was organized in several forms of matrices to predict the hotspots where the crime takes place and the time at which the crime takes place. Prediction was made using different variables like weekday by time-of-the-day [5]. The Getis and Ord Gi statistic is used in this work in crime mapping. The technique identified data clusters in a matrix with values higher than expected random chance. The paper does not, however, show why the algorithm decided on something.

Research was done by A Borg where a DSS (Decision Support System) running an AI algorithm was developed [6]. The algorithm aids in decision making of identifying which are the criminal paths or routes given a set of points, it classifies crimes depending on the similarities they have. The algorithm produces a list of possible paths where the crime was committed and a list of crimes arranged according to their priority and similarity. The algorithm identifies this data used to identify the criminal offender’s route. This is done by first identifying a series of burglaries done by one person basing on the forensic evidence such as DNA also known as ”hard evidence” and behavioural evidence also known as ”soft evidence”. However, Forensic evidence is rare to find at a given crime scene and behavioural evidence is the one that is most found. Therefore, it is used the most to group crimes to get a series of crimes committed by one person [6]. Data used was got from the database of burglary reports in SAMS. The algorithm assists in decision making and narrowing down on identifying criminals. The emphasis is on how to identify a series of crimes to know where they are committed and the person who commits them.

A machine learning model to forecast high probability crime locations from a provided dataset utilizing the K nearest algorithm, Naive bayes, boosted decision tree and decision tree algorithms was developed by [7]. Nearly all crime forecasts are linked to a place and the type of crimes that occur in a specific place. Because the algorithm utilized is a blend of four different algorithms, the algorithm that was most accurate which was Boosted Decision Tree Classifier with 96.0 percent accuracy was the algorithm of choice.

Predictive policing is the use of data and machine learning algorithms to forecast crime patterns and identify potential criminal activity [8]. It is used by law enforcement to proac- tively patrol areas with a higher likelihood of crime, relying on historical crime data as well as other inputs such as demographics and weather. Predictive policing has been praised for its ability to reduce crime and improve public safety but criticized for possible bias and civil rights abuse. A study on predictive policing by George Mohler, Rajeev Raje and his team was done where they are trying to remove the race bias from a predictive algorithm being used. A penalized likelihood method was developed to introduce demographic parity into point process models of crime. The study was undertaken by the problem of the algorithm remembering a specific location predicted earlier and marking it as a hotspot area which is not a hotspot area in reality thus ending up focusing on the minority of the population [9].

A.A Biswas and his co-authors give three regression models, linear regression, polynomial regression, and random forest regression to forecast the patterns and trends of Bangladesh crimes based on information obtained from the police’s website in Bangladesh [10]. It was found that polynomial and random forest regression was better with the value of R squared as 0.95 and 0.85 respectively. The mean absolute score for the the polynomial model was 2832 and that of the random forest was 3492 to that of linear regression that was 4968.

Identify Research Gaps in the Literature

They assume that the criminals commit crimes in the same location all the time as seen in, but some of them commit random attacks on other locations so that they will not be caught [11,12]. Further research should be conducted on crimes that seem random because most of the research so far conducted is primarily on the detection of crime hotspots so that the law enforcers can direct the resources accordingly which is a great concept but, for one to be able to lower the rate of crimes in a nation and the world in general, all communities that are not hotspots should be researched rather than some of them [13,14]. From the research so far conducted and knowledge acquired, no XAI has been implemented to the various regression models so far which makes it extremely difficult to hold the AI accountable, where the algorithm provides incorrect results and there is no method of back tracing and knowing why a certain decision was reached in the first place.

Proposed Solution

Our Crime Awareness and Safety System for Travelers starts out by consuming a user’s destination, planned time of visitation, and self-reported gender to customize its risk analysis and advice. Cat-Boost is utilized as the primary prediction engine, thanks to its inherent ability to accept categorical inputs-like city names, crime types, and gender-without needing one-hot encoding, thus maintaining data integrity and accelerating training. Historical crime data are retrieved through the FBI Crime Data API, which offers incident-level and summary data in JSON or CSV formats, with extensive geographic and temporal coverage To augment under-represented demographics and locations, we create synthetic samples through SMOTE-based augmentation, balancing the dataset and reducing bias against minority groups. Temporal characteristics-hour of day, day of week, and holiday indicators-are designed together with spatial density measures, such as history of incidents per square kilometer, to detect subtle criminal behavior patterns. The data after processing is channeled into CatBoost’s ordered boosting pipeline, which minimizes target leakage further by splitting training data sequentially while building models. After training, the model generates a risk score for every queried location-time-gender triple, marking ”areas of concern” where forecasted crime probability is above a user-specified threshold and ”safe zones” where risk is relatively low.

Under the hood, data collection pulls from both federal and municipal open-data portals, including the FBI’s Uniform Crime Reporting (UCR) Program and city-level crime dashboards, ensuring that our dataset spans thousands of jurisdictions and multiple offense type. Preprocessing routines address missing values through imputation strategies informed by spatial and temporal neighbors, while categorical fields are flagged for native treatment by Cat-Boost, streamlining the workflow and reducing manual encoding errors. To balance out thin reporting in particular gender or age ranges, we use SMOTE to generate synthetic instances that reflect true-world correlations with joint distributions between features. Feature engineering adds interaction terms-like ”female × late-night visits”-to identify gender-specific vulnerabilities reported in transit safety studies, where women consistently report higher fear levels even though they have lower overall victimization rates. Spatial. density kernel estimates pool historical incident counts over user-specified radii, producing continuous risk surfaces rather than separate point estimates. This dense feature set directly feeds into CatBoost’s gradient-boosted decision trees. ,. which internally convert categorical variables to numerical representations using ordered-target statistics and combinations without exploding feature dimensionality while retaining high-cardinality information.

Figure 1: Correlation Heatmap for Crime Awareness Dataset

Implementation Techniques

Data Collection and Preprocessing

From Fig.1 We obtained historical crime statistics from publicly released datasets, such as the FBI’s Uniform Crime Reporting (UCR) Program and city-level crime dashboards. To overcome sparsity of data for some groups and areas, synthetic data generation methods, i.e., SMOTE, were used to oversample the minority class. Preprocessing involved treating missing values using imputation, converting categorical variables such as city, type of crime, and gender, and scaling numeric features to bring the data into a common range. Feature engineering was performed to develop temporal features (e.g., ”hour of day,” ”day of week”) and spatial features (e.g., ”crime frequency per square kilometer”) in order to make the model better at predicting.

Model Selection and Training

CatBoost was chosen as the base algorithm because of its ability to efficiently work with categorical features without heavy pre-processing. Its ordered boosting and symmetric tree structure im-plementation reduce overfitting and enhance generalization. The model was then trained on the prepro- cessed dataset to make predictions about crime likelihood as a function of input parameters like destination, visit time, and gender. Hyperparameter tuning was done to optimize model performance, varying parameters such as learning rate, depth, and regularization coefficients.

Model Explainability

To provide transparency to the predictions made by the model, SHAP (SHapley Additive exPlanations) was incorpo- rated. SHAP values help explain the contribution of every feature to the model’s output so that users can comprehend the reasons behind particular safety recommendations. Such interpretability is vital for user trust and supporting informed decision-making.

Figure 2: User Interface of Crime Risk Prediction

LIME (Local Interpretable Model-agnostic Explanations) is a powerful technique used to interpret the predictions of complex machine learning models. It works by perturbing the input data slightly and observing the resulting changes in predictions to cre-ate a simple, interpretable model-typically a linear model-that ap¬proximates the behavior of the original model around that specific instance. This allows users to understand which features were most influential in a particular decision. In the context of our Crime Awareness System, LIME helps explain why the model flagged a certain location as high-risk or recommended specific safety mea¬sures, thus enhancing user confidence in the system’s recommen¬dations by providing clear, instance-specific justifications.

Evaluation Metrics

The performance of the model was measured by the usual regres¬sion measures: Mean Absolute Error (MAE), Mean Squared Error (MSE), and R² score. In our first tests, the model had an MAE of 10.39, MSE of 181.91, and R² score of 0.54, which represents moderate predictive power with room for improvement. .

User Interface

From Fig. 1 The outputs of the system are rendered through a user-friendly web or mobile interface, showing interactive maps with regions of concern and safe areas highlighted. Per- sonalized safety recommendations are provided to users based on their input parameters, with SHAP-derived explanations to support each rec¬ommendation to build trust and improve understanding.

Future Work

To improve the system’s functionality, future development will emphasize integrating more heterogeneous datasets, such as glob¬al crime statistics and real-time social media updates. Moreover, seeking out additional machine learning models and optimizing feature engineering processes will seek to enhance prediction ac- curacy and confidence. Creating a feedback loop to include us¬er-provided experience will also be explored to progressively opti¬mize risk assessments and recommendations.

Results

From Fig. 2 Our Travelers Crime Awareness and Safety System was measured using three major regression metrics: Mean Abso¬lute Error (MAE), Mean Squared Error (MSE), and the R-squared (R²) score. The system was found to have a MAE of 10.4677, meaning that on average the forecasted crime risk is off actual reported crime levels by approximately 10.47 incidents per pre¬diction. This reflects that the model has a quite small margin of error in forecasting crime rates, which is of paramount importance in offering reliable safety recommendations to tourists. The Mean Squared Error (MSE) was 184.2623, a measure which, by squar¬ing the errors, emphasizes large differences more. This means that while most forecasts are quite good, the model occasionally makes big mistakes, possibly in areas where crime trends are volatile or there is inconsistent reporting. These outliers represent the chal-lenges of modeling crime trends and human activity in the city, particularly in data-starved or high-volatility areas.

The R-squared (R²) value is 0.5337, approximately equal to 53.37 percent of the target variable’s variance (probability of crime) ex¬plained by the input features of the model: destination, visit time, and gender. In crime forecasting, with volatility, unstructured hu¬man behavior, and outside influences dominating-crime prediction is not a deterministic, stable problem-an R of this value means moderate but strong predictive power. It means the model is catch¬ing more than half of the applicable relationships in the data, a very good solution to enabling traveler decision-making. Compared to baseline models, which have much lower values of R² (particularly when using only temporal trends or aggregate statistics alone), our solution obtains higher explanatory power, benefiting from more feature engineering as well as from the use of demographic and spatiotemporal features.

Together, these figures prove that the model is accurate and in-terpretable enough to be applied in reality. An MAE of around 10 events provides travelers with the capacity to make decisions without excessive influence from uncertainty in forecasts, while an MSE of 184.2623 shows sporadic large prediction errors that can be mitigated through model calibration or further context in-formation. The R² measure confirms that the features we have se-lected—i.e., those derived through spatial density, crime pattern frequency, and gender-based risk profiling—are informative in explaining area crime patterns. These results not only validate the current design but also reveal where there is scope for improve¬ment. The relatively high MSE suggests that more advanced er-ror- reduction techniques, including ensemble learning, dynamic time-series modeling, or hybrid systems blending structured data with unstructured sources like social media or real-time notifica¬tions, might be worth exploring. Similarly, improving data quali¬ty further by including more granular or up-to-date crime records could further enhance overall model credibility. Feedback loops, where users can provide feedback on their experiences and feed new knowledge back into the system, could also improve future risk estimates

Figure 3: Result

In summary, the system’s performance at present expressed in terms of a MAE of 10.4677, MSE of 184.2623, and R² of 0.5337 is an interpretable and stable basis for travel safety solutions. It is balanced between transparency and accuracy, and therefore a reliable tool for the provision of location- and gender-based crime risk estimates, with a transparent method of future development and real-time integration.

Conclusion

The Crime Awareness and Safety System for Travelers* is a state-of-the-art solution employing machine learning, real-time analysis of data, and cloud-based infrastructure to boost metropolitan safe¬ty. Integrating information from multiple sources—traffic sensors, public cams, GPS, and user feedback—the system provides re¬al-time predictions of crime, safety notifications, and best-recom-mended travel routes. Its Random Forest-based predictive model, trained with rich crime history data, obtains a high accuracy of 90 percent , precision of 88 percent, recall of 85 percent, and an F 1-score of 86 percent, with reliability and minimal false positives vs. false negatives.

The system architecture leverages Apache Kafka and Apache Spark to process enormous amounts of data in real time to enable rapid response and updates-less than 500 milliseconds of model inference time. A mobile app provides real-time alerts to travelers, route suggestions, and an easy incident reporting feature, while a corresponding web dashboard offers authorities features like crime heatmaps and predictive analytics to assign resources and respond accordingly. The ease of use of the system is evidenced through high satisfaction levels—88 percent overall-with 85 percent of travelers having indicated that the app was easy to use and law en¬forcement officers having enjoyed the efficiency of the dashboard.

By providing real-time, actionable information and facilitating collaboration between travelers and urban planners, the system greatly improves situational awareness and public safety. Through its scalable design and real-time feedback loops, the system ensures that the system remains

responsive to changing crime patterns and city development. Therefore, the system not only represents a great tool for existing safety management, but also a promising candidate for integration into larger smart city initiatives.

International Journal of Criminology and Criminal Law(IJCCL)

ISSN: 2996-3397 | DOI: 10.33140/IJCCL

International Journal of Criminology and Criminal Law

Open Access Journals

Crime Awareness and Safety System for Travelers

Abstract

Keywords

Introduction

Literature Survey

Proposed Solution

Implementation Techniques

Results

Conclusion

References

Important Links

Locate Us