Summary
(This is part of the TextMiningProject.)
The idea
Many WikiTopics cover similar fields but are not always linked to each other. I wanted to build an automatic way to find the "See also" topics and display them.
Text mining assumes that topics sharing many words are similar in meaning or field. So I downloaded some tools and wrote some scripts that read the contents of a wiki, calculate a "bag of words" for each topic, and group similar topics into clusters. The result is written to the _ClusteringResults topic, and the WikiTalk script in ClusteringResultsBorders displays it, as seen now in the right border (see also: AdministratorsGuide).
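The "topics with common words are similar" assumption can be made concrete with word-count vectors and cosine similarity. The sketch below is only an illustration under that assumption: the tokenizer and the use of raw counts are choices made here, not necessarily what TextGarden does.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lower-cased words; a deliberately crude tokenizer for illustration."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors; 0.0 if either one is empty."""
    # Counter returns 0 for missing words, so only shared words add to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


fwsync = bag_of_words("FwSync is a command-line tool for editing the wiki")
wikitalk = bag_of_words("WikiTalk is a language for including dynamic content in wiki topics")
print(round(cosine(fwsync, wikitalk), 3))
```

With raw counts, function words such as "is", "a" and "for" also raise the score; removing stop words and using TF-IDF weights are common refinements.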
Right now, to use it, you have to include the following line in the topic:
Borders
Implementation
FwSync is used to get the current topics (60 sec)
  It does not remove topics that were deleted from the wiki, so the local repository should be deleted completely once in a while
A script prepares the content (30 sec):
  copies the .wiki files that have acceptable names (no special characters)
  splits the PascalCased words
  writes the topic name, summary, keywords and headings several times, to give greater weight to the words found there
Txt2Bow.exe from TextGarden calculates the bag-of-words model (5 sec)
BowKMeans.exe from TextGarden calculates the clusters; the output is an XML file (30 sec)
An XSLT stylesheet, run by msxsl.exe, creates _ClusteringResults
  There are usually character-encoding problems with the XML file, so it needs a workaround
  The topics' paths should be removed; XSLT 1.0 has no convenient way to do this, so there is a separate script for it
FwSync uploads the new _ClusteringResults
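The content-preparation step in the list above (acceptable file names, PascalCase splitting, extra weight by repetition) might look roughly like this. It is a hypothetical sketch: the repeat count `EMPHASIS_REPEAT`, the file-name rule in `ACCEPTABLE_NAME`, and the function names are assumptions, not the original script.

```python
import re

# Hypothetical weighting: the emphasized parts are written this many times.
EMPHASIS_REPEAT = 3

# Hypothetical rule for "acceptable" file names: letters and digits only.
ACCEPTABLE_NAME = re.compile(r"[A-Za-z0-9]+\.wiki")


def is_acceptable(filename: str) -> bool:
    """True if the .wiki file name contains no special characters."""
    return ACCEPTABLE_NAME.fullmatch(filename) is not None


def split_pascal_case(text: str) -> str:
    """'TextMiningProject' -> 'Text Mining Project', 'HTMLParser' -> 'HTML Parser'."""
    # Break between a lower-case letter or digit and the following capital.
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Break between an acronym and a following capitalized word.
    return re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", text)


def prepare_topic(name: str, summary: str, keywords: list[str],
                  headings: list[str], body: str) -> str:
    """Build the text passed on to the bag-of-words step.

    Repeating the name, summary, keywords and headings raises the counts of
    their words relative to words that appear only in the body.
    """
    emphasized = " ".join([name, summary, *keywords, *headings])
    parts = [emphasized] * EMPHASIS_REPEAT + [body]
    return split_pascal_case("\n".join(parts))
```

Repetition is a simple way to weight words when the downstream tool only sees plain text and offers no per-field weights.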
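To show what the BowKMeans step computes, here is plain k-means over bag-of-words vectors. This is an illustrative sketch, not TextGarden's implementation: it assumes the vectors share one vocabulary, normalizes them to unit length, uses squared Euclidean distance, and picks the first k vectors as initial centroids so the result is reproducible.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def nearest(p: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to p by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(vectors: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means on unit-length vectors; returns a cluster index per vector.

    For a reproducible illustration the first k vectors are the initial centroids.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(iterations):
        # Assignment step: put every point in its nearest cluster.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Normalizing first keeps long topics from landing far from short topics that use the same words in the same proportions.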
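The path-removal script mentioned in the list could be sketched as follows. The actual XML schema produced by BowKMeans is not given here, so this hypothetical version avoids depending on it: it rewrites any text node or attribute whose value is a path ending in a `.wiki` file, reducing it to the bare topic name.

```python
import re
import xml.etree.ElementTree as ET

# A value ending in a .wiki file name, with an optional directory prefix
# that uses Windows (\) or Unix (/) separators.
WIKI_PATH = re.compile(r"^(?:.*[\\/])?([^\\/]+)\.wiki$")


def topic_name(value: str) -> str:
    """Return 'TopicName' for '...\\TopicName.wiki'; other values are unchanged."""
    m = WIKI_PATH.match(value.strip())
    return m.group(1) if m else value


def strip_topic_paths(root: ET.Element) -> None:
    """Rewrite, in place, every text node and attribute that holds a .wiki path."""
    for elem in root.iter():
        if elem.text and elem.text.strip():
            elem.text = topic_name(elem.text)
        for key, val in list(elem.attrib.items()):
            elem.attrib[key] = topic_name(val)
```

A regular expression makes this a one-liner, whereas XSLT 1.0 would need a recursive template built on `substring-after`.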
I'll write a script that runs all of this as a batch job, so that it can be run periodically, e.g. once a day.
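A batch driver for the steps above could be as simple as running each command in order and stopping at the first failure. This page does not give the command-line arguments for FwSync, Txt2Bow.exe, BowKMeans.exe or msxsl.exe, so in this sketch the commands are supplied by the caller; `run_pipeline` is a hypothetical name, and the demo uses stand-in Python commands.

```python
import subprocess
import sys
import time


def run_pipeline(steps: list[tuple[str, list[str]]]) -> list[tuple[str, float]]:
    """Run (label, argv) steps in order; raise at the first failing step.

    Returns (label, seconds) for every step, which helps track timings
    such as the ones listed above.
    """
    timings = []
    for label, argv in steps:
        start = time.monotonic()
        result = subprocess.run(argv)
        elapsed = time.monotonic() - start
        if result.returncode != 0:
            raise RuntimeError(f"step {label!r} failed with exit code {result.returncode}")
        timings.append((label, elapsed))
    return timings


# Stand-in commands; replace them with the real tool invocations.
demo = [
    ("sync", [sys.executable, "-c", "print('sync')"]),
    ("cluster", [sys.executable, "-c", "print('cluster')"]),
]
for label, seconds in run_pipeline(demo):
    print(f"{label}: {seconds:.2f} s")
```

Stopping at the first failure keeps a half-finished run from uploading a stale or broken _ClusteringResults. The driver itself can be scheduled with Windows Task Scheduler or cron.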
-- SzaMa 2007.01.04.
Marcell Szabó, student in computer sciences at bme.hu