Welcome to the
RapidMiner 8.0 Beta!

Welcome

Thank you for participating in our RapidMiner 8.0 beta program. The program will run from October 26, 2017 to November 27, 2017 and will give you an opportunity to try the new Server architecture as well as some of the new features and improvements that we have been working on.

What’s New

RapidMiner Studio 8.0

Parallelized model optimization operators

The operators Optimize Parameters (Grid) and Loop Parameters now run in parallel, leveraging multiple CPU cores (depending on license type). This translates directly into faster processes and improved performance when tuning hyperparameters of machine learning models.

Please be aware that the old operators are deprecated. Existing processes that contain them will continue to use the previous versions; to benefit from the parallelization, replace them by hand with the new operators.
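To illustrate why a parameter grid parallelizes so well, here is a minimal Python sketch that evaluates every parameter combination independently on a pool of workers. It is not how the RapidMiner operators are implemented: the `evaluate` function, the parameter names `depth` and `min_leaf`, and the grid values are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(params):
    """Hypothetical scoring function standing in for training and validating a model."""
    depth, min_leaf = params
    # Toy score whose maximum (0) lies at depth=6, min_leaf=4.
    return -((depth - 6) ** 2) - ((min_leaf - 4) ** 2)


def grid_search(grid, max_workers=4):
    """Evaluate every parameter combination concurrently and return the best one."""
    combos = list(product(*grid.values()))
    # Threads keep the sketch simple; truly CPU-bound Python code would need
    # a process pool to make use of several cores.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, combos))  # same order as combos
    best_score, best_combo = max(zip(scores, combos))
    return dict(zip(grid.keys(), best_combo)), best_score


if __name__ == "__main__":
    grid = {"depth": [2, 4, 6, 8], "min_leaf": [1, 2, 4, 8]}
    print(grid_search(grid))
```

Because no combination depends on the result of another, the work can be split across as many workers as are available, which is the property the parallel operators exploit.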

Regression Trees and further Decision Tree enhancements

We enhanced the capabilities of our Decision Tree and Random Forest algorithms:

  • Regression Trees - Our Decision Tree and Random Forest operators can now handle numerical labels and solve regression problems as well. A new splitting criterion based on least squares has been added for these tasks.
  • Extremely Randomized Trees - For Random Forests, there is a new option to select splits randomly, which helps to build more robust trees and to prevent overfitting. When selected, cuts for numerical attributes are chosen randomly.
  • Provision of feature weights - The Decision Tree and Random Forest operators now provide a new port that outputs feature weights, which quantify the importance of each attribute in building the tree.
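As a rough illustration of the first two items, the Python sketch below picks a cut on one numerical attribute by minimizing the summed squared error of the label on both sides, and contrasts it with a randomly drawn cut. This is a generic textbook formulation under our own assumptions, not the code used in the operators; the function names and the sample data are made up for the example.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_error(xs, ys, cut):
    """Total squared error of the label after splitting the examples at x <= cut."""
    left = [y for x, y in zip(xs, ys) if x <= cut]
    right = [y for x, y in zip(xs, ys) if x > cut]
    return sse(left) + sse(right)


def best_cut(xs, ys):
    """Exhaustive search over the midpoints between neighbouring attribute values."""
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda c: split_error(xs, ys, c))


def random_cut(xs, rng):
    """Randomized variant: draw the cut uniformly between the min and max value."""
    return rng.uniform(min(xs), max(xs))


# Made-up sample: the label jumps between x=3 and x=10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 5.5, 5.2, 20.0, 21.0, 19.5]

print("least-squares cut:", best_cut(xs, ys))
print("random cut:", random_cut(xs, random.Random(0)))
```

The error reduction `sse(ys) - split_error(xs, ys, cut)` is the kind of quantity that could be accumulated per attribute to obtain a feature weight; whether the new weights port is computed this way is an assumption on our side.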


Fuzzy operator search

Searching has never been easier! Our operator search is now fuzzy, which means that you can misspell an operator name slightly and still find what you are looking for.
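To give a feel for how fuzzy matching tolerates typos, here is a minimal Python sketch based on the standard library's `difflib`. It is only an illustration: the operator list is a small hand-picked sample, the `fuzzy_search` helper is hypothetical, and the matching method used inside RapidMiner Studio may well differ.

```python
from difflib import get_close_matches

# Small sample of operator names, for illustration only.
OPERATORS = [
    "Naive Bayes", "Normalize", "k-Means", "k-NN",
    "Join", "Decision Tree", "Random Forest",
]


def fuzzy_search(query, names=OPERATORS, limit=3, cutoff=0.6):
    """Return up to `limit` names similar to `query`, best match first."""
    lowered = {name.lower(): name for name in names}  # case-insensitive lookup
    hits = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


if __name__ == "__main__":
    for typo in ["Normalise", "Randm Forest", "Decision Tre"]:
        print(typo, "->", fuzzy_search(typo))
```

Lowering `cutoff` makes the search more forgiving at the cost of more unrelated hits.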

Improved operator help

We have continued to improve the help texts of the RapidMiner operators. A new set of operators has been reviewed and improved for this release to make it easier for new users to start working with RapidMiner Studio and to help experienced users get acquainted with any operator more easily.

This is the list of improved operators:

  • Naïve Bayes
  • Normalize
  • k-Means
  • k-NN
  • Join
  • Performance (binominal classification)

RapidMiner Server 8.0

Please note that there are significant architectural changes from the 7.x versions. Carefully read this document and the installation guide before migrating your existing environments.

New fully scalable Architecture!

Big changes in this release: the new RapidMiner Server has a scalable and more robust architecture. The following diagram shows a component view:

Each blue box represents a separate machine. The big box to the left represents the central RapidMiner Server, which provides the UI and single point of entry for user requests. These tasks are carried out within the RapidMiner Server central node:

  • Scheduling of user jobs (processes)
  • User, queue and permissions management
  • Execution of processes running on the local Job Agent (this Job Agent is deployed and configured by default, but it can be removed if the user prefers that all execution is done remotely)
  • Execution of processes triggered through web services and web apps

The Job Agents are the new kids on the block. They can be deployed on remote machines, and each one is configured to point to exactly one queue. Job Agents check their queue and, if there are jobs to do, pick them up, spawn a Job Container, and execute the job (user process). Their role is to provide scalability and, together with the queues, resource sharing and management among users or projects.

Although scalability is the main new feature, it's still possible to run the RapidMiner Server on a single machine with one or more local Job Agents executing the jobs. In that case, the new architecture provides better fault tolerance and improved reliability.

Components

There are two components:

  • RapidMiner Server – includes all the services the older RapidMiner Servers provided, including scheduling, queues, web services, triggers, etc.
  • RapidMiner Job Agent – can be deployed on any local or remote machine for horizontal scalability. It needs to be configured to point to one of the queues defined in the RapidMiner Server. Its only responsibility is to pick up jobs from the queue and run them.

Process and data flow

When a user schedules a process from Studio or from the Server's UI, the process is placed into the corresponding queue. One of the Job Agents connected to that queue picks up the work and runs the process. The RapidMiner Server (and the user, through the UI or Studio) is notified and the logs become available.

The process execution is the responsibility of the Job Agent. It spawns a Job Container, which may connect to the repository or external data sources as needed depending on the process. There is no data flow from the Server to the Job Agents or the other way around.

Queues and Scheduling

Unlike in previous versions, queues are now linked to Job Agents. Multiple Job Agents can connect to the same queue, but each Job Agent can connect to only one queue. Queues have user permissions, and sending a process/job to a queue determines which Job Agents will work on it and which resources will be available for it. A single Job Agent can run many processes in parallel if it is configured to start several Job Containers. However, a single process is always run by a single Job Agent (a process execution cannot be distributed across multiple machines).

If no free resources are available when a process is scheduled, it waits in the queue until it is picked up by any free Job Agent.
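The queue rules above can be pictured with a small Python simulation. It is a toy model under our own assumptions, not the Server's code: the `JobAgent` class, its `poll` method, and the queue, agent, and job names are all invented for the example. Each agent is bound to exactly one queue, picks up jobs only while it has a free Job Container slot, and leaves the rest waiting.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobAgent:
    name: str
    queue: str        # each agent is bound to exactly one queue
    containers: int   # how many jobs it may run at the same time
    running: list = field(default_factory=list)

    def poll(self, queues):
        """Pick up waiting jobs from this agent's queue while slots are free."""
        waiting = queues[self.queue]
        while waiting and len(self.running) < self.containers:
            self.running.append(waiting.popleft())


# Two agents share the "default" queue; one agent serves "reports".
queues = {"default": deque(), "reports": deque()}
agents = [
    JobAgent("agent-a", "default", containers=2),
    JobAgent("agent-b", "default", containers=1),
    JobAgent("agent-c", "reports", containers=1),
]

for job in ["job-1", "job-2", "job-3", "job-4"]:
    queues["default"].append(job)
for job in ["report-1", "report-2"]:
    queues["reports"].append(job)

for agent in agents:
    agent.poll(queues)

for agent in agents:
    print(agent.name, "runs", agent.running)
print("still waiting:", {name: list(q) for name, q in queues.items()})
```

In this model, adding a second agent to the "reports" queue is all it takes to drain that queue faster, which mirrors how queues tie users' jobs to a pool of resources.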

What doesn't change

This is a first step toward a better architecture for all use cases. In this first release, we have focused on improving scalability and reliability for long-running processes. As you can see in the architecture diagram, processes run as web services or WebApps are still executed in the central node of RapidMiner Server, so that part is not yet scalable. With the current architecture, Job Agents also add some latency because they spawn a Job Container for every job they run. Therefore, we recommend scheduling big, long-running jobs on the Job Agents and running fast, short jobs as web services.

Main differences between the old and the new RapidMiner Servers

This table shows some of the main differences you will find in the new version.

| | RapidMiner Server 7.6 | RapidMiner Server 8.0 |
|---|---|---|
| Scalability | Limited to one machine | Single central Server for scheduling and administration, but multiple Job Agents that can be installed on multiple machines for process execution. |
| Queues | Just for logically organizing jobs | Each Job Agent is configured to only one queue (but each queue can be connected to multiple Job Agents). Job Agents are configured to use certain resources (memory and processes). Those resources become available for jobs scheduled in the corresponding queue. Therefore, queues are a means to share and limit the system's resources among users. |
| Resource Management | Everything shares the same HW resources. | The new queues allow having dedicated machines or resources for groups of users. |
| More on Resource Management | Jobs can take up as many resources as they need up to the capacity of the machine. | Job Agents provide access to the resources (memory and CPU). By configuring the queues, Administrators can increase or limit the resources available to a user or group. They can be shared or be restricted to a group. |
| Web Services and WebApps | Run on Server | Run on Server, too. They are not affected by the new architecture. They are not run on Job Agents. |
| Scheduling | Jobs run as soon as they are requested or whenever they are scheduled. They always run on the Server. | Jobs will be run on any Job Agent with free resources connected to the queue. If there are no free resources, jobs are queued. |
| Logging | All logs are written and stored in the Server's file system. | Logs remain in the (possibly remote) Job Agents. They can be retrieved from the central RapidMiner Server as long as the Job Agent is running. |
| UI | | First steps were taken towards a fresh new UI design of RapidMiner Server. The way the process list is shown and how queues are created have been renewed. |
| Extensions/JDBC drivers | Deployed on the RapidMiner Server | They have to be manually deployed on every Job Agent. Each Job Agent may have a different list of extensions, so it’s possible to create dedicated Job Agents for a particular use case. |
| Fault tolerance | Everything runs on a single machine and a single JVM. Problems in a process can affect the whole system. | Executions are run on separate JVMs (and potentially separate machines). All processes are fully independent and the whole system becomes much more robust and tolerant to problems in individual processes. |

RapidMiner Radoop 8.0

Clustering based on Spark MLlib

We have updated the K-Means clustering operator in Radoop to use Spark MLlib's algorithm instead of Mahout. Mahout is no longer an active project and we have decided to deprecate our Mahout-based clustering operators (including the K-Means algorithms and Canopy) and replace them with new and more efficient Spark equivalents.

Cloudera Spark library upgrade

We always incorporate the changes in the major Hadoop distributions. We have added a new option to support the Cloudera Spark library. We adapt, so you don’t need to worry about technology upgrades or changes.

Support for HiveContext

HiveContext lets you use HiveQL’s powerful language and functionality within a Spark script. This makes for a more flexible and easier-to-use scripting tool. Note that some cluster-side configuration may be needed to use HiveContext.

Documentation

We have set up a beta documentation page where you will find additional information about the new features of this release, including the full patch notes. It is available at docs-beta.rapidminer.com.

Differences for RC

RC (released Nov 28) contains some changes in RapidMiner Studio and RapidMiner Server compared to the first Beta:

RapidMiner Studio

  • Fixed Regression Trees
  • Fixed a bug in cluster model results when clicking on elements in the Folder view
  • Fixed Churn template input size
  • More informative error messages for Seemingly Unrelated Regression
  • Improved documentation for the following operators: Naive Bayes, Normalize, k-Means, k-NN, Join, Performance (Binominal Classification), and Seemingly Unrelated Regression
  • Fixed problem when exiting RapidMiner Studio while an error bubble was shown

RapidMiner Server

  • Triggers list now has tooltips
  • Triggers now display a warning if their queue does not exist anymore/has no Job Agents connected to it
  • It is now possible to change the target queue of scheduled processes via the UI
  • Scheduled Processes now display a warning if their queue does not exist anymore/has no Job Agents connected to it
  • All triggers are paused while waiting for pending migration steps and will be started as soon as the migration is completed
  • Fixed an issue that kept obsolete data in the database during migration to 8.0
  • Installer now contains more documentation on the first page, especially regarding upgrades
  • Version number is now shown in the installer
  • Deleting a property in the System Settings UI now resets its value properly in the database
  • Removed duplicate config.properties in the Job Agent distribution
  • Job Agents will now be killed forcibly via the stop-job-agent script
  • Job Container will no longer split the execution log file
  • Process logs found in the Execution Details can now be copied easily and the details themselves are more readable on small resolutions
  • Added com.rapidanalytics.security.x_frame_options property to allow administrators to disable embedding elements of RM Server into other websites
  • Added com.rapidanalytics.security.access_control_allow_origin and related properties to enable administrators to allow CORS. See documentation for more details about the new properties.
  • Session cookie can no longer be accessed by scripts
  • Fixed some Server error responses
  • Fixed an issue that could cause errors during LDAP authentication
  • Fixed an issue that could prevent Salesforce connections from working on Server

Downloads

Below you can download the beta releases of RapidMiner 8.0. Please note that your existing licenses will determine the products and functionality you are able to test.

RapidMiner Studio

Windows

Installation: Extract all contents of the ZIP archive and run RapidMiner-Studio.exe

Note that this release cannot be used to update existing installations!

Mac OS X

Installation: Open the disk image and drag the RapidMiner Studio 8.0 Preview App to your Applications folder.

Other Platforms

Installation: Extract all contents of the ZIP archive and either run RapidMiner-Studio.sh (Linux) or RapidMiner-Studio.bat (Windows).

Note that this release cannot be used to update existing installations!

RapidMiner Server

All Platforms

Installation: See our Installation Walkthrough for details. Note that we strongly advise against upgrading existing installations! Please install this preview release separately from any of your production or backup systems. Please contact us with any questions you may have regarding installation.

RapidMiner Radoop

All Platforms

Installation: Save the JAR file to your .RapidMiner/extensions directory (located in your user directory).

If you need to install RapidMiner Radoop functions (Hive UDFs) manually then please contact us to discuss the beta UDF upgrade on your Hadoop cluster.

Feedback

Your feedback is critical to the success of our beta program and we look forward to your comments.

Please send all your feedback – positive or negative – via the “Submit Feedback” button below. Please submit separate reports for each new topic so that we are better able to track and address your comments.

For bugs or errors, please be as specific as possible when submitting your report:

  • How can we reproduce the error?
  • What UI elements were you interacting with?
  • Attach stack traces or log files that show the error.

    • RapidMiner Studio stores log files in .RapidMiner/rapidminer-studio.log and .RapidMiner/launcher.log. You can also enable the log view in RapidMiner Studio via View > Show Panel.
    • RapidMiner Server logs to ./standalone/log/server.log (relative to its installation directory).
    • RapidMiner Radoop can export logs after a connection test using the Extract Logs... button in the Manage Radoop Connections dialog.
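
If you want to collect these files before writing a report, a small Python helper along the following lines may save some clicking. It is a sketch under assumptions: the function names are ours, the `.RapidMiner` folder is assumed to sit in your user directory, and `server_dir` stands for wherever you installed RapidMiner Server.

```python
from pathlib import Path


def candidate_logs(home=None, server_dir=None):
    """Build the log file paths listed above (server_dir is optional)."""
    home = Path(home) if home else Path.home()
    paths = [
        home / ".RapidMiner" / "rapidminer-studio.log",
        home / ".RapidMiner" / "launcher.log",
    ]
    if server_dir:
        # Server log location is relative to the Server installation directory.
        paths.append(Path(server_dir) / "standalone" / "log" / "server.log")
    return paths


def existing_logs(home=None, server_dir=None):
    """Keep only the log files that are actually present on this machine."""
    return [p for p in candidate_logs(home, server_dir) if p.is_file()]


if __name__ == "__main__":
    for path in existing_logs():
        print(path)
```

Run it as is to list the Studio logs found in your user directory, or pass `server_dir` to include the Server log as well.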