Announce: Perl TM::Corpus
TM::Corpus is an experimental package that extends a topic map with all the documents it references. A map corpus is then all the internal and external (text and data) content that a map covers.
Usage is simple:
use TM;
my $tm = ... # some map
use TM::Corpus;
my $co = new TM::Corpus (map => $tm) # bind with map
->update # link in all content from map
->harvest; # link in content external to map
Once such a map corpus is in your hands, applications can use all sorts of text mining operations on it.
One obvious application is full-text search, which is bundled as the trait TM::Corpus::SearchAble::Plucene.
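To give a feel for what full-text search over a corpus involves, here is a minimal sketch in plain Perl. It does not use TM::Corpus or Plucene and says nothing about their actual APIs: the `%documents` hash, `build_index` and `search` are hypothetical names made up for illustration, and the hash merely stands in for content that a corpus would have linked in.

```perl
use strict;
use warnings;

# Hypothetical stand-in for harvested corpus content: resource id => text.
# (Not the TM::Corpus data structure; purely illustrative.)
my %documents = (
    'doc:intro'  => 'Topic maps connect subjects and documents',
    'doc:search' => 'Fulltext search over harvested documents',
    'doc:mining' => 'Text mining finds patterns in a corpus',
);

# Build an inverted index: lowercased word => { resource id => count }.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $id (keys %$docs) {
        my @words = map { lc($_) } $docs->{$id} =~ /\w+/g;
        $index{$_}{$id}++ for @words;
    }
    return \%index;
}

# Return the sorted ids of all resources that contain every query word.
sub search {
    my ($index, @words) = @_;
    my %hits;
    for my $word (map { lc($_) } @words) {
        $hits{$_}++ for keys %{ $index->{$word} || {} };
    }
    my @matches = grep { $hits{$_} == @words } keys %hits;
    return sort @matches;
}

my $index = build_index(\%documents);
print join(', ', search($index, 'documents')), "\n";   # doc:intro, doc:search
```

A real search engine such as Plucene adds tokenization rules, ranking and persistent storage on top of this basic idea.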
Work supported by the Austrian Research Centers.
- rho's blog
