In network theory, collective classification is the simultaneous prediction of the labels for multiple objects, where each label is predicted using information about the object's observed features, the observed features and labels of its neighbors, and the unobserved labels of its neighbors.[1] Collective classification problems are defined in terms of networks of random variables, where the network structure determines the relationship between the random variables. Inference is performed on multiple random variables simultaneously, typically by propagating information between nodes in the network to perform approximate inference. Approaches that use collective classification can make use of relational information when performing inference. Examples of collective classification include predicting attributes (ex. gender, age, political affiliation) of individuals in a social network, classifying webpages in the World Wide Web, and inferring the research area of a paper in a scientific publication dataset.

Motivation and background edit

Traditionally, a major focus of machine learning is to solve classification problems. (For example, given a collection of e-mails, we wish to determine which are spam, and which are not.) Many machine learning models for performing this task will try to categorize each item independently, and focus on predicting the class labels separately. However, the prediction accuracy for the labels whose values must be inferred can be improved with knowledge of the correct class labels for related items. For example, it is easier to predict the topic of a webpage if we know the topics of the webpages that link to it. Similarly, the chance of a particular word being a verb increases if we know that the previous word in the sentence is a noun; knowing the first few characters in a word can make it much easier to identify the remaining characters. Many researchers have proposed techniques that attempt to classify samples in a joint or collective manner, instead of treating each sample in isolation; these techniques have enabled significant gains in classification accuracy.[1][2]

Example edit

Consider the task of inferring the political affiliation of users in a social network, where some portion of these affiliations are observed, and the remainder are unobserved. Each user has local features, such as their profile information, and links exist between users who are friends in this social network. An approach that does not collectively classify users will consider each user in the network independently and use their local features to infer party affiliations. An approach which performs collective classification might assume that users who are friends tend to have similar political views, and could then jointly infer all unobserved party affiliations while making use of the rich relational structure of the social network.

Definition edit

Consider the semi supervised learning problem of assigning labels to nodes in a network by using knowledge of a subset of the nodes' labels. Specifically, we are given a network represented by a graph   with a set of nodes   and an edge set   representing relationships among nodes. Each node   is described by its attributes: a feature vector   and its label (or class)  .

  can further be divided into two sets of nodes:  , the set of nodes for which we know the correct label values (observed variables), and  , the nodes whose labels must be inferred. The collective classification task is to label the nodes in   with a label from a label set  .

In such settings, traditional classification algorithms assume that the data is drawn independently and identically from some distribution (iid). This means that the labels inferred for nodes whose label is unobserved are independent of each other. One does not make this assumption when performing collective classification. Instead, there are three distinct types of correlations that can be utilized to determine the classification or label of  :

  1. The correlations between the label of   and the observed attributes of  . Traditional iid classifiers which make use of feature vectors are an example of approaches that use this correlation.
  2. The correlations between the label of   and the observed attributes (including observed labels) of nodes in the neighborhood of  .
  3. The correlations between the label of   and the unobserved labels of objects in the neighborhood of  .

Collective classification refers to the combined classification of a set of interlinked objects using the three above types of information.

Methods edit

There are several existing approaches to collective classification. The two major methods are iterative methods and methods based on probabilistic graphical models.[3]

Iterative methods edit

The general idea for iterative methods is to iteratively combine and revise individual node predictions so as to reach an equilibrium. When updating predictions for individual nodes is a fast operation, the complexity of these iterative methods will be the number of iterations needed for convergence. Though convergence and optimality is not always mathematically guaranteed, in practice, these approaches will typically converge quickly to a good solution, depending on the graph structure and problem complexity. The methods presented in this section are representative of this iterative approach.

Label propagation edit

A natural assumption in network classification is that adjacent nodes are likely to have the same label (i.e., contagion or homophily). The predictor for node   using the label propagation method is a weighted average of its neighboring labels  [4]

Iterative Classification Algorithms (ICA) edit

While label propagation is surprisingly effective, it may sometimes fail to capture complex relational dynamics. More sophisticated approaches can use richer predictors. Suppose we have a classifier   that has been trained to classify a node   given its features   and the features   and labels   of its neighbors  . Iterative classification applies uses a local classifier for each node, which uses information about current predictions and ground truth information about the node's neighbors, and iterates until the local predictions converge to a global solution. Iterative classification is an “algorithmic framework,” in that it is agnostic to the choice of predictor; this makes it a very versatile tool for collective classification. [5][6][7]

Collective classification with graphical models edit

Another approach to collective classification is to represent the problem with a graphical model and use learning and inference techniques for the graphical modeling approach to arrive at the correct classifications. Graphical models are tools for joint, probabilistic inference, making them ideal for collective classification. They are characterized by a graphical representation of a probability distribution  , in which random variables are nodes in a graph  . Graphical models can be broadly categorized by whether the underlying graph is directed (e.g., Bayesian networks or collections of local classifiers) or undirected (e.g., Markov random fields (MRF)).

Gibbs sampling edit

Gibbs sampling is a general framework for approximating a distribution. It is a Markov chain Monte Carlo algorithm, in that it iteratively samples from the current estimate of the distribution, constructing a Markov chain that converges to the target (stationary) distribution. The basic idea for Gibbs Sampling is to sample for the best label estimate for   given all the values for the nodes in   using local classifier   for a fixed number of iterations. After that, we sample labels for each   and maintain count statistics for the number of times we sampled label   for node  . After collecting a predefined number of such samples, we output the best label assignment for node   by choosing the label that was assigned the maximum number of times to   while collecting samples.[8][9]

Loopy belief propagation edit

For certain undirected graphical models, it is possible to efficiently perform exact inference via message passing, or belief propagation algorithms.[10] These algorithms follow a simple iterative pattern: each variable passes its "beliefs" about its neighbors' marginal distributions, then uses the incoming messages about its own value to update its beliefs. Convergence to the true marginals is guaranteed for tree-structured MRFs, but is not guaranteed for MRFs with cycles.

Statistical relational learning (SRL) related edit

Statistical relational learning is often used to address collective classification problems. A variety of SRL methods has been applied to the collective classification setting. Some of the methods include direct methods such probabilistic relational models (PRM),[11] coupled conditional models such as link-based classification,[12] and indirect methods such as Markov logic networks (MLN)[13] and Probabilistic Soft Logic (PSL).[14]

Applications edit

Collective classification is applied in many domains which exhibit relational structure, such as:

See also edit

References edit

  1. ^ a b Sen, Prithviraj; Namata, Galileo; Bilgic, Mustafa; Getoor, Lise; Galligher, Brian; Eliassi-Rad, Tina (2008). "Collective Classification in Network Data". AI Magazine. 29 (3): 93. doi:10.1609/aimag.v29i3.2157. hdl:1903/7546. ISSN 0738-4602.
  2. ^ Kajdanowicz, Tomasz; Kazienko, Przemysław (2018). "Collective Classification". Encyclopedia of Social Network Analysis and Mining. pp. 253–265. doi:10.1007/978-1-4939-7131-2_45. ISBN 978-1-4939-7130-5.
  3. ^ London, Ben; Getoor, Lise (2014). "Collective Classification of Network Data". Data Classification: Algorithms and Applications. 29: 399–416.
  4. ^ Zhu, Xiaojin (2002). Learning From Labeled and Unlabeled Data With Label Propagation (Technical report). CiteSeerX 10.1.1.14.3864.
  5. ^ Neville, Jennifer; Jensen, David (2000). Iterative classification in relational data (PDF). AAAI Workshop on Learning Statistical Models From Relational Data (SRL). AAAI. p. 8.
  6. ^ Chakrabarti, Soumen; Dom, Byron; Indyk, Piotr (1998). Enhanced Hypertext Categorization Using Hyperlinks. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery (ACM). pp. 307–318. doi:10.1145/276304.276332.
  7. ^ Jensen, David; Neville, Jennifer; Gallagher, David (2000). Why collective inference improves relational classification. ACM SIGKDD international conference on Knowledge discovery and data mining. Association for Computing Machinery (ACM). p. 5. doi:10.1145/1014052.1014125.
  8. ^ Mackskassy, Sofus; Provost, Foster (2007). "Classification in Network Data: A Toolkit and a Univariate Case Study" (PDF). Journal of Machine Learning Research: 935 - 983.
  9. ^ Geman, Stuart; Donald, Foster (1990). "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". Readings in Uncertain Reasoning. Morgan Kaufmann Publishers Inc. pp. 452–472.
  10. ^ Yedidia, J.S.; Freeman, W.T.; Y. (January 2003). "Understanding Belief Propagation and Its Generalizations". In Lakemeyer, Gerhard; Nebel, Bernhard (eds.). Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann. pp. 239–236. ISBN 978-1-55860-811-5. Retrieved 2009-03-30.
  11. ^ Getoor, Lise; Friedman, Nir; Koller, Daphne; Taskar, Benjamin (2002). "Learning Probabilistic Models of Link Structure". J. Mach. Learn. Res. 3: 679–707.
  12. ^ Lu, Qing; Getoor, Lise (2003). Link based Classification (PDF). International Conference on Machine Learning (ICML).
  13. ^ Richardson, Matthew; Domingos, Pedro M. (2006). "Markov logic networks". Mach. Learn. 62 (1–2): 107–136. doi:10.1007/S10994-006-5833-1.
  14. ^ Bach, Stephen; Broecheler, Matthias; Huang, Bert; Getoor, Lise (2017). "Hinge-Loss Markov Random Fields and Probabilistic Soft Logic". Journal of Machine Learning Research. 18: 1–67.
  15. ^ Jaafor, Omar; Birregah, Babiga (2017-07-31). "Collective classification in social networks". Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. New York, NY, USA: ACM. pp. 827–835. doi:10.1145/3110025.3110128. ISBN 978-1-4503-4993-2.
  16. ^ Fakhraei, Shobeir; Foulds, James; Shashanka, Madhusudana; Getoor, Lise (2015). "Collective Spammer Detection in Evolving Multi-Relational Social Networks". Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, New York, USA: ACM Press. pp. 1769–1778. doi:10.1145/2783258.2788606. ISBN 978-1-4503-3664-2.
  17. ^ Bhattacharya, Indrajit; Getoor, Lise (2007). "Collective entity resolution in relational data". ACM Transactions on Knowledge Discovery from Data. 1 (1). Association for Computing Machinery (ACM): 5. doi:10.1145/1217299.1217304. hdl:1903/4241. ISSN 1556-4681. S2CID 488972.
  18. ^ Luo, Ling; Yang, Zhihao; Yang, Pei; Zhang, Yin; Wang, Lei; Lin, Hongfei; Wang, Jian (2017-11-24). Wren, Jonathan (ed.). "An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition". Bioinformatics. 34 (8). Oxford University Press (OUP): 1381–1388. doi:10.1093/bioinformatics/btx761. ISSN 1367-4803. PMID 29186323.
  19. ^ Burford, Clint; Bird, Steven; Baldwin, Timothy (2015). "Collective Document Classification with Implicit Inter-document Semantic Relationships". Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics. Stroudsburg, PA, USA: Association for Computational Linguistics. pp. 106–116. doi:10.18653/v1/s15-1012.
  20. ^ Žitnik M, Zupan B (2015). "Gene network inference by fusing data from diverse distributions". Bioinformatics. 31 (12): i230-9. doi:10.1093/bioinformatics/btv258. PMC 4542780. PMID 26072487.
  21. ^ Triebel, Rudolph; Mozos, Óscar Martínez; Burgard, Wolfram (2008). "Collective Classification for Labeling of Places and Objects in 2D and 3D Range Data" (PDF). Data Analysis, Machine Learning and Applications. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 293–300. doi:10.1007/978-3-540-78246-9_35. ISBN 978-3-540-78239-1. ISSN 1431-8814.