Knowledge Management Group
Network Mining (Block Seminar)
This seminar gives an overview of current problems in data mining
on networked domains such as social networks, the WWW, and others.
We study algorithms that exploit the structure of the given network
to find relevant people or to rank web pages for a given topic.
In this context we will also cover graph-based extensions of
classical classification and graph-based clustering methods.
News:
- The discussant for each talk is listed after the presenter.
- The block seminar takes place on Mon, 12 February, at 9:00 in the
Humboldt Kabinett. Please submit your written reports at least one
week in advance (i.e., by 5 February 2007).
- Most topics have already been assigned; anyone who would still like
to take part will find the open topics below.
- Here are the slides from the introductory session: [pdf]
- Because the HK Maschinelles Lernen has been moved, we will meet
on Wednesday, 18 October, 15:00-17:00, in ESZ room 1305!
- Topics will be assigned at the end of the first meeting;
see below for topics and reading suggestions.
- We will set the exact date of the block seminar in consultation
with the participants (January/February).
Topics and reading suggestions:
- PageRank and HITS: Frank Habermann - Discussant: Daniel Renz (Michael Brückner)
- C. Ding, X. He, P. Husbands, H. Zha, H. D. Simon. PageRank, HITS and a unified framework for
link analysis, Proc. ACM SIGIR Conf. 2001
- L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing
order to the Web
- Jon M. Kleinberg: Authoritative
Sources in a Hyperlinked Environment. Stanford Digital Library
Technologies Project. Journal of the ACM. 1999.
- Diligenti et al.: A
Unified Probabilistic Framework for Web Page Scoring Systems,
2004.
- Amy N. Langville, Carl D. Meyer: A Survey of Eigenvector Methods for Web
Information Retrieval.
- PageRank extensions: topic-sensitive PageRank, temporal PageRank,
block PageRank: Daniel Renz - Discussant: Charlotte Pix
(Ulf Brefeld)
- Yen-Yu Chen, Qingqing Gan, Torsten Suel: I/O-Efficient Techniques for Computing
Pagerank.
- Taher H. Haveliwala: Topic-Sensitive PageRank.
- Bing Liu, Philip S. Yu, Xin Li: On the Temporal Dimension of Search.
- Matthew Richardson, Pedro Domingos: The Intelligent Surfer: Probabilistic
Combination of Link and Content Information in PageRank.
- Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D.
Manning, Gene H. Golub: Exploiting
the Block Structure of the Web for Computing PageRank.
- Claudia Hauff and Leif Azzopardi: Age Dependent Document Priors in Link
Structure Analysis.
- Information retrieval with probabilistic HITS: Alexandra Rostin - Discussant: Frank Habermann (Uwe Dick)
- Learning to Probabilistically Identify
Authoritative Documents Cohn D, Chang H
- The missing link - a probabilistic model of document content
and hypertext connectivity, Cohn D, Hofmann T
10-minute break
- Link spam detection and suppression: Florian Holzhauer - Discussant: Christian Krowiorsch (Isabel Drost)
- Fetterly et al.: Spam, Damn Spam, and Statistics: Using
Statistical Analysis to Locate Spam Web Pages. 2004.
- Drost and Scheffer: Thwarting the nigritude ultramarine:
learning to identify link spam. ECML 2005.
- Wu and Davison: Identifying Link Farm Spam Pages. WWW 2005.
- Gyongyi et al.: Combating web spam with TrustRank. VLDB 2004.
- AIRWeb 2005, First International Workshop on Adversarial
Information Retrieval on the Web
- Query-log ranking: Charlotte Pix - Discussant: Franziska Brosy (Isabel
Drost)
- Z. Zhuang, S. Cucerzan, and C. Lee Giles: Network Flow for Collaborative Ranking.
ECML 2006
- R. Kannan, S. Vempala, and A. Vetta. On clusterings: good, bad and spectral.
FOCS 2000
- J. Shi and J. Malik. Normalized
cuts and image segmentation. PAMI, 2000
- J. Luxenburger and G. Weikum: Query-Log
Based Authority Analysis for
Web Information Search. WISE 2004
- Soumen Chakrabarti and Alekh Agarwal: Learning Parameters in Entity-relationship
Graphs from Ranking Preferences, ECML 2006
- Almeida, Rodrigo B., Almeida, Virgilio A. F.: A Community-Aware Search Engine
- Probabilistic Relational Models: Franziska Brosy - Discussant: Alexandra Rostin (Ulf Brefeld)
- Nir Friedman et al.: Learning
Probabilistic Relational Models, 1999.
- L. Getoor, N. Friedman, D. Koller, B. Taskar, Learning probabilistic models of link
structure, JMLR 3, 2002.
- Kubica J, Moore A, Schneider J, Yang Y, Stochastic link and group detection
Lunch break - 40 min
- Community detection with shortest-path analysis: Björn Schümann - Discussant: Jochen Heyden (Laura Dietz)
- Tyler, Wilkinson and Huberman: Email as Spectroscopy:
Automated Discovery of Community Structure within Organizations
- Finding community structure in very large networks
Clauset A, Newman MEJ, Moore C
- Community detection with HITS: Jochen Heyden - Discussant: Björn Schümann (Laura Dietz)
- Gibson D, Kleinberg JM, Raghavan P, Inferring Web Communities from Link
Topology
- Flake et al.: Efficient
Identification of Web Communities, SIGKDD 2000
- Flake et al.: Methods for
Mining Web Communities: Bibliometric, Spectral, and Flow, 2003.
10-minute break
- Collective Classification: Sebastian
Schütze - Discussant: Florian Holzhauer (Peter Haider)
- B. Taskar, P. Abbeel, D. Koller, Discriminative probabilistic models for
relational data, Proceedings of UAI2002
- J. Neville, D. Jensen, Collective
Classification with Relational Dependency Networks
- P. Domingos, M Richardson, Mining
the Network Value of Customers, Proc. International Conference
on Knowledge Discovery and Data Mining, 2001
- Lu Q, Getoor L, Link-based Text Classification
- Semi-supervised classification on networked data: Christian Krowiorsch - Discussant: Sebastian Schütze (Uwe Dick)
- M. Belkin, P. Niyogi, Semi-supervised
learning on Riemannian Manifolds, Machine Learning 56 (1-3), 2004
- Dengyong Zhou, Bernhard Schölkopf, and Thomas Hofmann, Semi-supervised Learning on Directed Graphs,
Advances in Neural Information Processing Systems 2005
- Xiaojin Zhu, Zoubin Ghahramani, John Lafferty, Semi-Supervised Learning Using Gaussian
Fields and Harmonic Functions, ICML 2003
- Zhou, D., J. Huang and B. Schölkopf: Learning from Labeled and Unlabeled Data
on a Directed Graph. Proceedings of the 22nd International
Conference on Machine Learning
Open topics:
- Introduction to natural/random graphs: Habib Shakhawat (Peter Haider)
- Excerpt from Wasserman and Faust
- J. Leskovec, J. Kleinberg, C. Faloutsos, Graphs over Time: Densification Laws,
Shrinking Diameters and Possible Explanations, KDD 2005
- Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar
Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet
Wiener, Graph structure in the web,
Computer Networks, International Journal of Computer and
Telecommunications Networking, 2000
- Watts D, Strogatz S, Collective
dynamics of small-world networks, Nature, 1998
- Characterizing and Mining the Citation Graph of the Computer
Science Literature, Knowledge and Information Systems, Vol. 6, No. 6.
(November 2004), pp. 664-678.
- Simkin, Roychowdhury, A
mathematical theory of citing
- Newman, Forrest, Balthrop, Email
networks and the spread of computer viruses
- Bayesian networks: Benjamin
Werner (Ulf Brefeld)
- E. Charniak, Bayesian
Networks without Tears, AI magazine, 1991
- P. Smyth. Belief networks,
hidden Markov models, and Markov random fields: a unifying view,
Pattern Recognition Letters, 1998.
- Link prediction and community mining with stochastic block
models: Manuel Hertlein
(Steffen Bickel)
- Estimation and Prediction
for Stochastic Blockstructures Journal of the American
Statistical Association, Vol. 96, No. 455. September 2001, pp.
1077-1087.
- Kemp, Griffiths and Tenenbaum Discovering
Latent Classes in Relational
Data (2004)
- Email spam filtering using social networks: Alexey Grachev (Michael Brückner)
- Golbeck, Hendler: Reputation Network Analysis for Email
Filtering
- Comparative Graph Theoretical Characterization of Networks of
Spam and Legitimate Email
- Scalable and Reliable Collaborative Spam Filters: Harnessing
the Global Social Email Networks
- Boykin, P., & Roychowdhury, V. (2004). Personal email
networks: an effective anti-spam tool. Preprint.
- Frequent Subgraph Mining: Christian
Gebhardt
- L. T. Thomas, S. R. Valluri, K. Karlapalem, MARGIN: Maximal Frequent Subgraph Mining,
Tech Report, 2006
- T. Horvath, J. Ramon, S. Wrobel, Frequent subgraph mining in outerplanar
graphs, International Conference on Knowledge Discovery and Data
Mining, 2006
- Classification of graphs: Peter
Siemen (Ulf Brefeld)
- T. Gärtner, A Survey of
Kernels for Structured Data, SIGKDD Explorations 2003
- K.M. Borgwardt, H-P Kriegel, Shortest-path
kernels on graphs,
ICDM 2005