As part of our RapidMiner Studio 7.3 release, we added an in-product Beta Mode that lets you, with the click of a button, take a sneak peek at experimental features that are currently in beta testing for upcoming product releases. As opposed to a dedicated beta release, the Beta Mode allows you to interactively test features that are more substantial in nature and require broader and more extensive testing. To kick off our new Beta Mode feature, we are introducing an initial, experimental implementation of new core data management for the RapidMiner platform for you to try!
The page below outlines the goals of this implementation and its delivery through the Beta Mode feature. It also describes how to activate and deactivate the Beta Mode in order to test-drive the new feature. Please give it a try and provide as much feedback as possible via the “Submit Feedback” button at the bottom of this page. If you have any questions or need additional information, feel free to email us.
As part of a special-purpose project to improve overall product stability, lower memory consumption, and boost performance, RapidMiner’s engineering team has revised the internal in-memory data management and data processing of the RapidMiner platform. As a result of extensive benchmarking, we have begun to change the way that data is internally organized and represented during processing, moving from row-oriented to column-oriented data management. The implementations released with RapidMiner 7.3 and 7.3.1 aim to provide the following performance benefits:
- Auto-detection of sparse data (7.3.1 only)
Automatically detect sparse attributes and represent them with special data structures that have a very low memory footprint.
- Lower memory footprint
Smaller overhead for narrow (few attributes) data sets, as well as a more compact representation of dense, nominal attributes.
- Faster attribute set manipulations
Improve performance for operators that change the number of attributes, e.g., Generate Attributes.
- Improved management of temporary data
Improve memory usage and stability for processes that generate many temporary attributes, e.g., within loops or similar operators such as Optimize.
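To make the first two benefits more concrete, here is a minimal Python sketch of what column-oriented storage with dense, sparse, and dictionary-encoded nominal columns can look like. It is an illustration only and not RapidMiner's actual implementation: the class names, the `choose_numeric_column` helper, the default value of 0.0, and the 10% density threshold are all assumptions made for this example.

```python
from array import array


class DenseColumn:
    """Numeric attribute stored as one contiguous array of doubles."""

    def __init__(self, values):
        self.data = array("d", values)

    def __len__(self):
        return len(self.data)

    def get(self, row):
        return self.data[row]


class SparseColumn:
    """Numeric attribute that stores only values differing from a default."""

    def __init__(self, values, default=0.0):
        self.size = len(values)
        self.default = default
        # Only non-default entries take up memory.
        self.non_default = {i: v for i, v in enumerate(values) if v != default}

    def __len__(self):
        return self.size

    def get(self, row):
        return self.non_default.get(row, self.default)


class NominalColumn:
    """Nominal attribute: each distinct string is stored once,
    and every row holds only a small integer code."""

    def __init__(self, values):
        self.mapping = []          # code -> nominal value
        self.codes = array("i")    # one integer code per row
        index = {}                 # nominal value -> code
        for value in values:
            if value not in index:
                index[value] = len(self.mapping)
                self.mapping.append(value)
            self.codes.append(index[value])

    def __len__(self):
        return len(self.codes)

    def get(self, row):
        return self.mapping[self.codes[row]]


def choose_numeric_column(values, max_density=0.1, default=0.0):
    """Pick a sparse column when few values differ from the default.

    The 10% threshold is a hypothetical choice for this sketch.
    """
    non_default = sum(1 for v in values if v != default)
    if values and non_default / len(values) <= max_density:
        return SparseColumn(values, default)
    return DenseColumn(values)
```

In this sketch, a column with 99 zeros and a single non-zero value is stored as one dictionary entry instead of 100 doubles, and a nominal column with many repeated values stores each string only once.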
Enabling the Beta Mode in RapidMiner Studio 7.3 allows you to switch from the existing row-based data management to the new, experimental column-based data management. This capability allows you to easily test and compare the execution characteristics of the new implementation against the existing one. We believe it is critical to give our users the opportunity to test it on demand in real-world use cases, so that we gain a more comprehensive understanding before it becomes generally available. We highly encourage you to enable the Beta Mode from time to time to run a process and send us feedback on your experience, both positive and negative. All your feedback is crucial to ensure that the new implementation performs better, not worse, in your use cases.
How to activate?
You can enable the Beta Mode feature included in our recent 7.3 release of RapidMiner Studio by going to the preferences tab and checking the corresponding item in the Updates section. There is no need to restart RapidMiner Studio – the change takes effect the next time you run a process.
The status bar in the lower left shows whether the Beta Mode is turned on.
What to look out for
The current changes enabled with the Beta Mode should be most noticeable when using operators that generate temporary attributes within meta operators. One example is using Principal Component Analysis (PCA) inside a Cross Validation that in turn might be embedded in an Optimize operator. In this scenario, the PCA is executed many times, and each iteration generates a temporary set of attributes independent of the other iterations.
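The following Python sketch illustrates, in general terms, why a column-oriented layout suits this pattern. It is not RapidMiner code, and this page does not describe how the existing row-based implementation handles temporary attributes internally. The function names and the `temp_` naming scheme are hypothetical.

```python
def add_attribute_row_store(rows, values):
    """Row layout: adding an attribute widens every single row."""
    for row, value in zip(rows, values):
        row.append(value)


def add_attribute_column_store(columns, name, values):
    """Column layout: a new attribute is one new entry;
    the existing columns are not touched."""
    columns[name] = list(values)


def run_iterations(columns, n_iterations):
    """Each iteration derives a temporary attribute, uses it, and drops it.

    Dropping the column releases its memory without rewriting the table.
    """
    results = []
    for i in range(n_iterations):
        temp_name = f"temp_{i}"
        scaled = [x * (i + 1) for x in columns["x"]]
        add_attribute_column_store(columns, temp_name, scaled)
        results.append(sum(columns[temp_name]))
        del columns[temp_name]  # the temporary attribute is gone after use
    return results
```

In the column layout, each iteration adds and removes exactly one column, so the table ends up with the same attributes it started with, regardless of how many iterations ran.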
Another interesting test case involves processes in which using the Materialize operator inside meta operators (e.g., loops or optimization operators) improves the overall performance. Please try running these processes in Beta Mode without the use of Materialize.
More generally, we are interested in any performance-related differences between regular and Beta Mode. Please use the “Submit Feedback” button below to let us know about your experiences. Even letting us know that switching on the Beta Mode does not affect the memory consumption or runtime of your process is good and helpful feedback!
The experimental features enabled with the Beta Mode do not yet cover all aspects of data management or all use cases. Known limitations at this point are:
- No support for manual memory settings
The Beta Mode ignores any data management settings of operators such as Materialize.
- Third-party extensions
Extensions that generate data sets might bypass the Beta Mode, so that enabling it has no visible effect. This also applies to outdated extensions released by RapidMiner. Please make sure to install all available updates before testing the Beta Mode.
Your feedback is critical to the success of our beta program, and we are looking forward to your comments.
Please send all your feedback, positive or negative, via the “Submit Feedback” button below. Please submit a separate report for each new topic so that we are better able to track and address your comments.
If you run into a process that performs worse in Beta Mode or shows different behavior, please provide an example process. This will help us analyze what happens and why.