
Deploying a hybrid data & analytics platform is significantly faster and easier with the Cloudera CDP platform. What previously took several weeks or months can now be accomplished in just a few hours. One of the key components of CDP is the Shared Data Experience (SDX), which serves as the backbone for security and governance across both cluster- and container-based services. Ramsey International (RI) has developed Daedalus, a robust automation solution that enables organizations to quickly deploy a production-level data & analytics platform. Daedalus delivers a Proof-of-Production™ platform that allows an organization to immediately drive value, rather than investing time and money in throwaway proof-of-concepts.


A production-level platform requires extensive access control and resource management. To prevent the power of cloud storage from becoming a security concern, Ranger Authorization Service (RAZ) enables fine-grained access control, with policies and auditing like those of traditional on-premises deployments. RAZ, together with SDX User Management, provides access controls for AWS S3 and Azure ADLS, along with CDP resources including Data Warehouse queries.

Many organizations are interested in deploying role-based access control (RBAC) or attribute-based access control (ABAC) as their approach to access management. RBAC works well when groups of users require similar data and resource access, allowing policies to be set up for access to a collection of data resources and platform resources. ABAC works well in cases where characteristics of the data align well with the access permissions to be granted to the users. But what if the organization wants to use both approaches? What about the potential explosion in the number of groups? Who handles the manual creation and maintenance of policies in Ranger while others manually maintain the groups and memberships in your directory service, accessed via Lightweight Directory Access Protocol (LDAP)?

Let’s examine an example scenario at ACME Pharmaceuticals. ACME has 500 users on its CDP platform, which provides data warehouse (DWX), data engineering (CDE), and machine learning (CML) services. Additionally, they have a team that is using a data engineering cluster for a specific application workload. ACME is using AWS as their cloud provider, so the data is stored on S3. The organization currently has 200 databases being accessed by Hive or Impala, and 40 S3 locations where other data is stored. ACME is using Microsoft Active Directory (AD) as their user management platform.

One approach being considered by ACME is to set up a group for each CDP resource access level, and a group for each data access level. This would mean that a cdp-group-dwx-admin group would be created for users with admin access rights to DWX, and cdp-group-dwx-user group for those with only user access rights. The same approach would then be used for CDE, CML and the data clusters. For data access, a group would be created for each database to provide read-only, read-write and full rights to the resource. The same approach would again be used for the S3 storage locations, resulting in an additional 120 groups. In total, this approach will result in at least 728 groups. Once the groups have been added to AD, they then must be manually added and maintained in Ranger.
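The group count above can be reproduced with a few lines of arithmetic. This is only a sketch of the tally: it assumes the data engineering cluster counts as a single resource with the same admin and user levels as DWX, CDE and CML, which is one way to arrive at the "at least 728" figure. The variable names are illustrative.

```python
# Tally of groups for the group-per-access-level approach (ACME figures).
# Assumption: the data engineering cluster is one resource with two levels.
cdp_resources = ["dwx", "cde", "cml", "data-cluster"]
resource_levels = ["admin", "user"]

databases = 200
s3_locations = 40
data_levels = ["read-only", "read-write", "full"]

resource_groups = len(cdp_resources) * len(resource_levels)   # 4 * 2
database_groups = databases * len(data_levels)                # 200 * 3
s3_groups = s3_locations * len(data_levels)                   # 40 * 3

total_groups = resource_groups + database_groups + s3_groups

print(f"CDP resource groups: {resource_groups}")
print(f"Database groups:     {database_groups}")
print(f"S3 location groups:  {s3_groups}")
print(f"Total:               {total_groups}")
```

Every additional database or S3 location adds three more groups, so the total grows linearly with the number of data assets.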

A second approach would be to leverage RBAC to simplify and reduce the number of groups. Using this approach, ACME would identify key roles for the users accessing the platform. ACME has determined that most of the usage falls into the following roles: admin, data curator, data engineer, data scientist, business analyst, payroll analyst and management. Most of the users are in only one group; however, ACME has found some users who are in several groups. This approach yields a much smaller number of groups but requires significantly more complex policies to be created in Ranger. Adding, removing, or changing a data asset, such as an S3 location, requires reviewing each of the 7 groups for potential modification. Users in multiple groups may be problematic if one of their group memberships grants access to a resource but another carries an explicit deny.
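The multi-group problem can be checked mechanically. The following is a minimal sketch, not Ranger's evaluation logic: it assumes each role is described by plain allow and deny sets, and it flags users whose combined roles both allow and deny the same resource. The role contents, user names and bucket names are hypothetical.

```python
def find_conflicts(user_roles, role_policies):
    """Return (user, resource) pairs where one role allows and another denies."""
    conflicts = []
    for user, roles in sorted(user_roles.items()):
        allowed, denied = set(), set()
        for role in roles:
            policy = role_policies[role]
            allowed |= policy.get("allow", set())
            denied |= policy.get("deny", set())
        # A resource in both sets needs a human decision.
        for resource in sorted(allowed & denied):
            conflicts.append((user, resource))
    return conflicts


# Hypothetical role definitions and memberships.
role_policies = {
    "payroll analyst": {"allow": {"s3://acme-payroll"}},
    "data scientist": {
        "allow": {"s3://acme-research"},
        "deny": {"s3://acme-payroll"},
    },
}
user_roles = {
    "erin": ["payroll analyst", "data scientist"],
    "frank": ["data scientist"],
}

conflicts = find_conflicts(user_roles, role_policies)
print(conflicts)
```

Here only erin is flagged, because she is the only user holding both a role that allows the payroll location and a role that denies it.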

The challenge with both approaches being considered by ACME is the heavy manual workload needed to implement and manage the resource and access management. The first approach has resulted in an explosion of groups which will quickly exceed the limits of CDP. The second approach has only a handful of groups, but each one is complex, and manually managing changes is challenging.

Using the RI Daedalus solution, ACME can leverage automation for LDAP-driven access & resource management. For ACME, AD will be the key repository for the process. Using the benefits of the first approach, each CDP resource and data resource will be controlled by a unique group. This results in many groups; however, it provides the most granular control. Daedalus eliminates the manual effort of creating and managing Ranger policies, making the process less prone to error. But what about the substantial number of groups?

Rather than directly materializing each group that is defined in LDAP, Daedalus compresses the number of groups through a distillation process. Combining the granular level of control with the concept of roles results in a significant reduction in the number of materialized groups. Daedalus reviews all the access control provided by the collection of individual groups, then creates a distilled group to represent that access. If a specific user has been granted membership in a collection of 15, 20, 30 or even more detailed groups, Daedalus creates a single distilled group for that collection. All the users who have the same collection of detailed groups are then assigned to the distilled group, which results in a significant reduction in the number of groups that are created and maintained within CDP and Ranger.
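One way to picture the distillation idea is to treat each user's full set of detailed groups as a signature and create one distilled group per distinct signature. The sketch below illustrates that idea only. It is not the Daedalus implementation, and the function name, the `distilled-NNN` naming scheme and the sample group names (other than `cdp-group-dwx-user`) are hypothetical.

```python
from collections import defaultdict


def distill(memberships):
    """Collapse identical sets of detailed groups into one distilled group each.

    memberships maps a user name to the set of detailed groups the user is in.
    Returns (distilled, assignment): distilled maps a distilled group name to
    the detailed groups it represents; assignment maps each user to a
    distilled group name.
    """
    users_by_signature = defaultdict(list)
    for user, groups in memberships.items():
        users_by_signature[frozenset(groups)].append(user)

    distilled, assignment = {}, {}
    # Sort signatures so the generated names are stable between runs.
    ordered = sorted(users_by_signature, key=sorted)
    for index, signature in enumerate(ordered, start=1):
        name = f"distilled-{index:03d}"
        distilled[name] = signature
        for user in users_by_signature[signature]:
            assignment[user] = name
    return distilled, assignment


memberships = {
    "alice": {"cdp-group-dwx-user", "db-sales-read-only", "s3-raw-read-only"},
    "bob": {"cdp-group-dwx-user", "db-sales-read-only", "s3-raw-read-only"},
    "carol": {"cdp-group-cml-user", "db-sales-read-write"},
}

distilled, assignment = distill(memberships)
print(len(distilled), "distilled groups for", len(memberships), "users")
```

In this sample, alice and bob share a signature and land in the same distilled group, so five detailed groups are represented by two materialized ones. The number of distilled groups can never exceed the number of users, however many detailed groups exist in the directory.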


For organizations that are interested in using the RBAC approach, the defined roles can be collections of groups, with no requirement to define all of the specific access within each RBAC group. This allows each of the detailed access groups to be maintained independently, then combined into the RBAC group as needed. Again, rather than attempting to manage this manually, it is all controlled by simple entries in AD/LDAP and distributed to CDP and Ranger for execution. This approach also accommodates users who require unique access permissions outside of the standard group, so that RBAC, ABAC and direct user-level access control can all be handled via LDAP.
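Treating a role as a collection of detailed groups can be sketched as a simple set union. The example below is an assumption about how such a composition might be modeled, not a description of how Daedalus stores roles in LDAP. The role contents, user name and group names (other than the `cdp-group-*-user` pattern) are hypothetical.

```python
def effective_groups(user, user_roles, role_definitions, direct_grants):
    """Union of the groups granted through roles and any user-level grants."""
    groups = set(direct_grants.get(user, set()))
    for role in user_roles.get(user, ()):
        groups |= role_definitions[role]
    return groups


# Each role is only a named collection of independently maintained groups.
role_definitions = {
    "data engineer": {"cdp-group-cde-user", "db-sales-read-write"},
    "business analyst": {"cdp-group-dwx-user", "db-sales-read-only"},
}
user_roles = {"dana": ["business analyst"]}
# A one-off permission outside the standard role.
direct_grants = {"dana": {"s3-forecasts-read-only"}}

dana_groups = effective_groups("dana", user_roles, role_definitions, direct_grants)
print(sorted(dana_groups))
```

Because a role only references groups, changing what `db-sales-read-only` permits updates every role that includes it without editing the roles themselves.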

The Group Distillery module of Daedalus was designed specifically to enable LDAP-driven access management. The module reads all the groups and users from LDAP and implements the CDP resources plus IAM or Ranger policies for data access control. As groups are changed, removed, or added, the module automatically revises the underlying policies. As users move between groups, are newly added, or are deleted in LDAP, those changes are quickly reflected in the CDP platform.
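The change-propagation behavior can be illustrated as a diff between two directory snapshots. The source does not describe how the module detects changes, so this is a hypothetical sketch: it compares a previous and a current mapping of users to groups and emits the grants and revocations that would need to be applied. It makes no calls to LDAP, CDP or Ranger, and all names are illustrative.

```python
def diff_memberships(previous, current):
    """Return a sorted list of (action, user, group) changes between snapshots."""
    changes = []
    for user in sorted(previous.keys() | current.keys()):
        before = previous.get(user, set())
        after = current.get(user, set())
        for group in sorted(after - before):
            changes.append(("grant", user, group))
        for group in sorted(before - after):
            changes.append(("revoke", user, group))
    return changes


previous = {
    "alice": {"cdp-group-dwx-user", "db-sales-read-only"},
    "bob": {"cdp-group-cde-user"},
}
current = {
    "alice": {"cdp-group-dwx-user", "db-sales-read-write"},  # moved between groups
    "carol": {"cdp-group-cml-user"},                         # newly added
    # bob was deleted from the directory
}

changes = diff_memberships(previous, current)
for change in changes:
    print(change)
```

A user who disappears from the directory shows up as a revocation for every group they held, which is the case most easily missed when policies are maintained by hand.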

The RI Group Distillery module enables automated LDAP-driven access & resource management in CDP with the flexibility to deploy ABAC, RBAC, and even user-level control, without significant manual effort – reducing the risk of errors, cost and complexity.