Pentaho Data Mining (Weka)

Welcome to the community home for Pentaho Data Mining Community Edition (CE), also known as Weka. Pentaho Data Mining is a comprehensive set of tools for machine learning and data mining. Its broad suite of classification, regression, association rule and clustering algorithms can help you understand your business better, and can also be used to improve future performance through predictive analytics. Community Edition is self-supported open source software. An Enterprise Edition (EE) of Pentaho Data Mining, including technical support and managed upgrades, is also available. For more information about EE, or for screenshots and datasheets, visit Pentaho Data Mining EE on Pentaho's corporate site.

Recent News and Releases
- 10/31/11: New Weka 3.6.6 and 3.7.5 releases are available.
Stable
New Features since 3.4
In Development
New Features in 3.7.5
In core weka:
- weka.classifiers.functions.SGDText - stochastic gradient descent for learning linear SVMs and logistic regression for text problems. Operates incrementally and directly on string attributes.
- New incremental version of the multi-class meta classifier (weka.classifiers.meta.MultiClassClassifierUpdateable).
- RandomForest now supports building trees in parallel.
- DatabaseLoader is now much faster when loading data sets with many nominal attributes.
- Database access now allows custom property files to be set at runtime, allowing access to databases different from the default one without having to restart Weka.
- TextDirectoryLoader can now operate incrementally.
- CSVLoader now supports files without a header row.
- Charts can now be exported to files from running Knowledge Flow processes via an offscreen rendering process.
- RemoveUseless filter now removes attributes with all missing values.
- Histogram visualization in the Explorer and Knowledge Flow is now faster.
- ClassifierPerformanceEvaluator in the Knowledge Flow is now multi-threaded to allow folds to be evaluated in parallel.
- File-based savers now support gzip compression.
- File-based loaders now support loading files as a resource from the classpath (including jars).
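The SGDText entry above describes a learner that trains incrementally with stochastic gradient descent and works directly on string attributes. As a rough illustration of that idea only (this is not Weka's SGDText code and does not use the Weka API), the standalone Java sketch below trains a logistic regression model one document at a time, turning each raw string into a sparse bag-of-words on the fly. The class name TextSGDSketch, the tokenizer, the toy documents, and the learning-rate and regularization values are all hypothetical choices for this example; the linear SVM (hinge loss) variant is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: incremental logistic regression trained by SGD
 * directly on raw text. Illustrative only; this is NOT Weka's SGDText.
 */
class TextSGDSketch {
    private final Map<String, Double> weights = new HashMap<>();
    private double bias = 0.0;
    private final double learningRate;
    private final double lambda; // L2 regularization strength (assumed value)

    TextSGDSketch(double learningRate, double lambda) {
        this.learningRate = learningRate;
        this.lambda = lambda;
    }

    /** Lowercases the text, splits on non-alphanumerics and counts tokens. */
    static Map<String, Integer> tokenize(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Linear score: bias plus weighted token counts (unseen tokens weigh 0). */
    private double score(Map<String, Integer> features) {
        double z = bias;
        for (Map.Entry<String, Integer> e : features.entrySet()) {
            z += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return z;
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Estimated probability that the document belongs to class 1. */
    double probability(String text) {
        return sigmoid(score(tokenize(text)));
    }

    /** One SGD step on a single labeled document (label must be 0 or 1). */
    void update(String text, int label) {
        Map<String, Integer> x = tokenize(text);
        // Gradient of the log loss with respect to the score z is (p - y).
        double gradient = sigmoid(score(x)) - label;
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 0.0);
            // Simplification: L2 shrinkage is applied only to tokens present
            // in this document, so each update stays sparse.
            weights.put(e.getKey(), w - learningRate * (gradient * e.getValue() + lambda * w));
        }
        bias -= learningRate * gradient; // the bias is not regularized
    }

    public static void main(String[] args) {
        // Toy labeled documents (hypothetical data for illustration).
        String[] docs = {"cheap pills buy now", "buy cheap watches",
                         "meeting agenda for monday", "project meeting notes"};
        int[] labels = {1, 1, 0, 0};
        TextSGDSketch model = new TextSGDSketch(0.5, 1e-4);
        // Several passes, each feeding the documents one at a time.
        for (int epoch = 0; epoch < 20; epoch++) {
            for (int i = 0; i < docs.length; i++) {
                model.update(docs[i], labels[i]);
            }
        }
        System.out.printf("P(class 1 | \"cheap pills\")    = %.3f%n", model.probability("cheap pills"));
        System.out.printf("P(class 1 | \"monday meeting\") = %.3f%n", model.probability("monday meeting"));
    }
}
```

Because each call to update touches only the weights of tokens that occur in the current document, the model never needs a fixed attribute set up front, which is what makes learning from raw strings in a single incremental stream practical.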
In packages:
- multiInstanceLearning - added MITI multi-instance tree learner and MIRI rule learner variant.
- RerankingSearch - a feature selection meta-search algorithm that speeds up the base search algorithm, contributed by Pablo Bermejo.
- timeseriesForecasting - the package now supports timestamp-based data that contains gaps in the regular time period.
- sasLoader - SAS sas7bdat file reader.
- CHIRP - A new classifier based on Composite Hypercubes on Iterated Random Projections, contributed by Leland Wilkinson.
- PSOSearch - An implementation of the Particle Swarm Optimization (PSO) algorithm to explore the space of attributes, contributed by Sebastian Luna Valero.
- wekaServer - A simple servlet-based server for executing data mining tasks (Explorer and KnowledgeFlow so far). Documentation: http://wiki.pentaho.com/display/DATAMINING/Weka+Server
- jfreechartOffscreenRenderer - Offscreen (headless) chart rendering in Knowledge Flow processes using the JFreeChart library.
Contribute to the Project
You can participate by contributing new code, reporting bugs, testing new releases, answering questions, and more. Email us your proposed contribution along with any other relevant details. Welcome to the team.