The Data Operations team was created to accelerate the time-to-value of building ETL/ELT data applications by abstracting away infrastructure concerns and exposing a set of modular, reusable plug-ins that encapsulate business value. We explained our decision to use Google Cloud’s Cask Data Application Platform (CDAP) to fulfill that mission in an article published on Google Cloud in July 2021.
This post describes the ways CDAP exposes authentication, which option LiveRamp uses, and our solution and authentication flow.
Like all applications that contain or interact with proprietary or customer data, the data operations platform (DOP, our deployment of CDAP) needs to be secured with an authentication mechanism.
Authentication needs to keep users efficient (and therefore happy), and it needs to add as little overhead as possible, both for maintenance and for onboarding new users. In practice, this means not requiring users to log in each time they connect to a different system, not requiring them to remember yet another passphrase, and instead letting them use credentials they already have. For developers, a user base managed at the company level is an efficient, low-overhead solution.
Authentication in CDAP is described in the CDAP documentation, which offers several out-of-the-box solutions, including basic username/passphrase authentication and LDAP authentication, and gives developers the option to write a custom authentication handler. CDAP also provides a PROXY authentication mode, in which authentication happens upstream of CDAP and all incoming requests are trusted as already authenticated.
At LiveRamp, all corporate applications use Okta SSO for authentication. It provides a good user experience and enforces good security practices (e.g., password standards, multi-factor authentication, and automatic removal of access when employees leave). To extend those benefits to our customers, the Nexus engineering team at LiveRamp created LiveRamp SSO, based on Okta, which grants both LiveRamp employees and customers access to LiveRamp applications.
LiveRamp SSO is a good candidate for CDAP authentication because it meets all the constraints described above and supports our long-term vision of giving external users (e.g., LiveRamp customers) direct access to the data operations platform. Finally, the Nexus team provides an identity-aware proxy, called the web-app proxy (WAP), which sits as a layer in front of a web application and manages authentication and sessions.
Let’s dive into the technical details of our solution. Our goal was to configure CDAP to use the PROXY authentication mode, with the WAP as a layer in front of our CDAP deployment. The diagram below shows our infrastructure: our ingress points to the WAP, which redirects to Okta if no session cookie is present. Once the user is authenticated, every request is enriched with the user’s identity and passed down to the CDAP UI.
Next, we’ll describe how we tailored existing solutions to our specific needs: how we integrate with CDAP’s PROXY mode, the different proxies we use, and the changes we made to the CDAP operator to allow deploying sidecars into the UI pod.
Integration with CDAP PROXY mode
Since we want to use our own identity provider and authentication mechanism, we decided to leverage CDAP’s PROXY authentication mode, in which authentication happens in an upstream proxy, as shown in the diagram below. In this mode, CDAP expects the following from the proxy:
- User identity should be included in the x-user-id header.
- User credentials should be included in the Authorization header.
The authentication proxy layer must therefore add those headers to every request it forwards to CDAP. The rest of this section describes how we did this.
Note that with this approach you must ensure that all traffic flows through the authentication proxy before reaching your application, so that no client can communicate with the application server directly: in PROXY mode, CDAP trusts these headers as-is. As an additional layer of security, CDAP’s Kubernetes Services should be configured to listen only on the proxy’s port.
First, we need a proxy that manages authentication and sessions: the WAP. When a user accesses the DOP URL, the WAP intercepts the request and checks for an active session. If there is none, it redirects the user to the identity provider (Okta in our case) to log in, completes the flow, and sets a session cookie and an access cookie. The WAP also checks whether the session cookie has expired and restarts the flow if necessary.
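The session check described above can be sketched in Python. The WAP is an internal LiveRamp component whose code is not part of this post, so this is only a minimal illustration under stated assumptions: the cookie name `wap_session`, the login URL, and the in-memory `sessions` store are hypothetical, and the Okta login flow itself is reduced to a redirect decision.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical values: the real WAP cookie name and IdP URL are not given in the post.
SESSION_COOKIE = "wap_session"
IDP_LOGIN_URL = "https://idp.example.com/login"


@dataclass
class Session:
    user_id: str
    expires_at: float  # Unix timestamp after which the session is no longer valid


@dataclass
class Decision:
    action: str                     # "forward" or "redirect"
    location: Optional[str] = None  # IdP URL, set only when redirecting
    identity: Optional[str] = None  # user id, set only when forwarding


def check_session(cookies: Mapping[str, str],
                  sessions: Mapping[str, Session],
                  now: Optional[float] = None) -> Decision:
    """Forward requests that carry a live session; otherwise (re)start login."""
    now = time.time() if now is None else now
    session_id = cookies.get(SESSION_COOKIE)
    session = sessions.get(session_id) if session_id else None
    if session is None or session.expires_at <= now:
        # Missing, unknown, or expired session: send the user to the IdP.
        return Decision(action="redirect", location=IDP_LOGIN_URL)
    # Valid session: the identity travels downstream (in X-validated-Identity).
    return Decision(action="forward", identity=session.user_id)
```

For example, `check_session({"wap_session": "s1"}, {"s1": Session("jdoe", time.time() + 3600)})` returns a `forward` decision carrying `jdoe`, while an empty cookie mapping yields a redirect to the IdP.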
The WAP transmits two useful pieces of information:
- User identity: included in the X-validated-Identity header of downstream requests
- User credentials: included in the access cookie
Second, we created a proxy (named DOP Authn Proxy, or DAP) that adapts the user identity and credentials to the exact format CDAP requires. This proxy is responsible for the following:
- User identity: parse the WAP’s “X-validated-Identity” header, extract the user ID, and include it in the “x-user-id” header of all requests to CDAP
- User credentials: parse the WAP’s access cookie, extract the bearer token (user credentials), and include it in the “Authorization” header of all requests to CDAP
The resulting flow is depicted below.
In summary, we used a sidecar proxy that is commonly available at LiveRamp (though not fully configurable) to handle authentication and sessions, and created a second, simple proxy to adapt the data to the format required by CDAP’s PROXY authentication mode.
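The header adaptation DAP performs can be sketched as a pure Python function. The post does not specify the format of the WAP’s identity header or the name of its access cookie, so this sketch assumes, hypothetically, that `X-validated-Identity` carries JSON with a `userId` field and that the access cookie is named `access_token` and holds the raw bearer token. It also assumes the standard `Bearer <token>` form for the `Authorization` header.

```python
import json
from http.cookies import SimpleCookie
from typing import Dict

# Hypothetical formats: neither is specified in the post.
IDENTITY_HEADER = "x-validated-identity"  # lowercased for case-insensitive lookup
ACCESS_COOKIE = "access_token"

# Headers DAP sets; client-supplied copies are dropped first to prevent spoofing.
CDAP_HEADERS = ("x-user-id", "authorization")


def extract_user_id(identity_value: str) -> str:
    """Parse the (assumed JSON) identity header and return the user id."""
    return json.loads(identity_value)["userId"]


def extract_bearer_token(cookie_value: str) -> str:
    """Parse the Cookie header and return the access token it carries."""
    jar = SimpleCookie()
    jar.load(cookie_value)
    if ACCESS_COOKIE not in jar:
        raise KeyError(f"missing {ACCESS_COOKIE} cookie")
    return jar[ACCESS_COOKIE].value


def adapt_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return request headers rewritten into the shape CDAP's PROXY mode expects."""
    # HTTP header names are case-insensitive, so look them up in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    out = {name: value for name, value in headers.items()
           if name.lower() not in CDAP_HEADERS}
    out["x-user-id"] = extract_user_id(lowered[IDENTITY_HEADER])
    out["Authorization"] = "Bearer " + extract_bearer_token(lowered["cookie"])
    return out
```

Stripping any incoming `x-user-id` or `Authorization` header before setting them is a deliberate choice: since CDAP trusts these headers in PROXY mode, the proxy should be their only source.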
Deploying proxies alongside CDAP: using sidecars
If you’re reading this, you are probably trying to integrate with CDAP’s PROXY authentication mode, as we did. So far we have described our solution, but not exactly how we deploy those proxies alongside CDAP. Note that our team deploys the Kubernetes version of CDAP, so this section mentions some Kubernetes-specific elements.
These proxies are ideal candidates for sidecars: containers in the same pod share a network namespace, so each proxy reaches its upstream at 127.0.0.1, and traffic between them never leaves the pod, which keeps the added latency low for the end user. In our case, the sidecars are deployed as containers inside the CDAP UI pod.
Since CDAP does not provide a way to deploy additional containers into a pod, we added that capability to the CDAP operator. After our change, a sidecar proxy can be deployed into a pod with the following configuration (based on the DOP Authn Proxy – UI pod example):
Looking more closely at this example, we’re telling the CDAP operator to deploy an additional container named dop-authn-proxy into the userInterface pod. The arguments are specific to that container and define where the proxy listens and which address it treats as its upstream. Zooming in on the diagram shown earlier, you can now recognize one of the sidecar containers.
Another option would have been to deploy those proxies as classic pods in the same Kubernetes cluster as our CDAP instance, but we chose sidecars for a few reasons:
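To make the listen and upstream arguments concrete, here is a minimal, hypothetical sidecar proxy written with only the Python standard library. It is not DAP’s actual implementation: the flag names `--listen-port` and `--upstream`, and the default upstream `http://127.0.0.1:11011` (assumed to be the CDAP UI container’s port), are illustrative. The `inject` callable is where the identity and credential header adaptation described earlier would plug in.

```python
import argparse
import http.client
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict, Optional, Sequence
from urllib.parse import urlsplit

HOP_BY_HOP = {"connection", "keep-alive", "transfer-encoding", "host"}
# send_response() already emits Server and Date; Content-Length is recomputed.
RESPONSE_SKIP = HOP_BY_HOP | {"content-length", "server", "date"}


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Hypothetical flags mirroring "where to listen" and "what is upstream".
    parser = argparse.ArgumentParser(description="Header-injecting sidecar proxy (sketch)")
    parser.add_argument("--listen-port", type=int, default=8080)
    parser.add_argument("--upstream", default="http://127.0.0.1:11011")
    return parser.parse_args(argv)


def make_handler(upstream: str,
                 inject: Callable[[Dict[str, str]], Dict[str, str]]):
    target = urlsplit(upstream)

    class ProxyHandler(BaseHTTPRequestHandler):
        def _forward(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length) if length else None
            # Drop hop-by-hop headers, then let inject() add or replace headers.
            headers = inject({k: v for k, v in self.headers.items()
                              if k.lower() not in HOP_BY_HOP})
            conn = http.client.HTTPConnection(target.hostname, target.port or 80, timeout=30)
            try:
                conn.request(self.command, self.path, body=body, headers=headers)
                resp = conn.getresponse()
                data = resp.read()  # buffer the whole response for simplicity
            except OSError:
                self.send_error(502, "Bad Gateway")  # upstream unreachable
                return
            finally:
                conn.close()
            self.send_response(resp.status)
            for name, value in resp.getheaders():
                if name.lower() not in RESPONSE_SKIP:
                    self.send_header(name, value)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        do_GET = do_POST = do_PUT = do_DELETE = _forward

    return ProxyHandler


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = parse_args(argv)
    handler = make_handler(args.upstream, inject=lambda h: h)  # plug header adaptation in here
    # Bind to loopback: inside the pod, the upstream sidecar reaches us via 127.0.0.1.
    ThreadingHTTPServer(("127.0.0.1", args.listen_port), handler).serve_forever()


if __name__ == "__main__":
    main()
```

Chaining sidecars then only means pointing each proxy’s upstream at the next container’s loopback port, which is why no Service or DNS entry is needed between them.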
- Deployment is simpler, as only one Kubernetes manifest is required
- Communication is slightly faster inside a single pod than across pods
- The Kubernetes cluster is cleaner, as there is only one pod and the proxy containers are encapsulated in it
In another context, classic pods could be the right solution.
We have shown how we leverage our company’s single sign-on to log into CDAP, how CDAP’s PROXY authentication mode works, and how we integrate with it.
In the near future, we plan to contribute this sidecar deployment capability to the open-source CDAP project. LiveRamp believes in giving back and in the open-source ethos that makes us all better.