Algorithms for Estimating Linear Function in Data Mining
Thomas Hoang
Abstract
This paper presents several studied algorithms for estimating a linear utility function that predicts users’ preferences. For example, when a user wants to buy a car described by several attributes (speed, color, age, etc.) that are combined in a linear function, the algorithms presented here estimate that function in order to filter out, from millions of tuples in a very large database, the small subset that would best interest the user. An estimated linear function can also help in understanding what the data can do, or in predicting future outcomes from the data used in data science, as demonstrated by the GNN and PLOD algorithms [1,2].
In the ever-evolving field of data science, deriving valuable insights from large datasets is critical for informed decision-making, particularly in predictive applications. Data analysts often identify high-quality datasets, free of missing values, duplicates, and inconsistencies, before merging diverse attributes for analysis. Taking housing price prediction as a case study, various attributes must be considered, including location factors (proximity to urban centers, crime rates), property features (size, style, modernity), and regional policies (tax implications). Experts in the field typically rank these attributes to establish a predictive utility function, which machine learning models then use to forecast outcomes such as housing prices.
Several data discovery algorithms address the challenges of predefined utility functions and human input for attribute ranking, which often result in a time-consuming iterative process that earlier work cannot overcome [1-4]. A notable enhancement is a Graph Neural Network (GNN) algorithm that builds on previous approaches. The GNN algorithm leverages graph neural networks and large language models to interpret text-based values that earlier models such as PLOD could not handle, significantly improving the reliability of outcome predictions. GNN extends PLOD’s capabilities by incorporating both numerical and textual data, offering a comprehensive approach to understanding user preferences for data science and analytics applications.
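To make the idea of estimating a linear utility function concrete, the following is a minimal Python sketch. It is not the PLOD or GNN algorithm; it swaps in a generic perceptron-style update over pairwise preferences, purely for illustration. The function names (`utility`, `estimate_weights`, `top_k`), the two normalized car attributes, and the sample preferences are all hypothetical assumptions and do not come from the paper.

```python
import heapq

def utility(weights, item):
    # Linear utility: weighted sum of the item's (normalized) attributes.
    return sum(w * x for w, x in zip(weights, item))

def estimate_weights(preferences, dim, epochs=100, lr=0.1):
    # Perceptron-style estimate (an illustrative stand-in, not PLOD/GNN).
    # `preferences` is a list of (preferred_item, other_item) pairs.
    w = [1.0 / dim] * dim
    for _ in range(epochs):
        updated = False
        for better, worse in preferences:
            # If the current weights rank the pair wrongly, nudge them
            # toward the preferred item's attributes.
            if utility(w, better) <= utility(w, worse):
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
                updated = True
        if not updated:  # every stated preference is already satisfied
            break
    return w

def top_k(weights, items, k):
    # Return the k items with the highest estimated utility.
    return heapq.nlargest(k, items, key=lambda it: utility(weights, it))

# Hypothetical cars as (speed, newness), both scaled to [0, 1].
cars = [(0.9, 0.2), (0.4, 0.8), (0.6, 0.6), (0.2, 0.3)]
# Hypothetical feedback: the user preferred the first car of each pair.
prefs = [(cars[0], cars[1]), (cars[0], cars[2]), (cars[2], cars[1])]

w = estimate_weights(prefs, dim=2)
best = top_k(w, cars, 2)
```

In this toy setting the estimated weights end up favoring speed, so the two fastest cars are returned. Real systems would need to handle noisy or inconsistent feedback, which this sketch does not attempt.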

