Automate Two-factor Authentication with RSelenium
Many websites these days require two-factor authentication (2FA) at login, which presents a challenge for automation. I ran into this situation recently when I needed to extract some data from the Salesforce platform. This article walks through how I arrived at the final solution; as always, a TL;DR is included at the end.
Before we tackle the ‘how-to’s, it helps to understand what the Salesforce 2FA process entails. This security feature is required at each login after users enter their username and password. One of the options supported by Salesforce is the free Salesforce Authenticator App on a mobile device. When a login occurs from a web browser, the App sends the user a push notification. The App keeps track of a set of data that includes the username, the device the web browser is running on, and the location of the mobile device where the App is installed. The user can then verify the information and "approve" the browser login from within the App. This setup works well if you want an attended automation process that keeps a human in the loop at login. Unattended automation, however, requires the approval step inside the App to happen without user intervention.
Thankfully, the Salesforce Authenticator App has a feature called “Einstein Automation”, which can be activated by turning on "Always approve from this location" at the notification prompt in the App. This adds the current location to a list of trusted geo-locations, and logins from these locations are then approved automatically.
This seemed to be the solution for an unattended login! Not so fast. I quickly discovered that Einstein Automation seems to work only for logins that the user starts manually; as soon as the login occurs in a browser launched by Selenium, the user is prompted again, even from the exact same location!
Looking at the list of trusted locations, it was clear that Einstein Automation had not forgotten them and that the feature had not been turned off in the App. Somehow, though, the App chose to ignore this "memory". By process of elimination, since the one thing that differed was the browser and how it was launched, my thinking cap directed me towards the browser. Maybe the App needed the browser's help to act on its memory?
BINGO! It turns out the browser has a memory too, in the form of HTTP cookies. These cookies are stored in the browser as a type of "required cookies" that allow Salesforce to record and store encrypted authentication information. Clearly, these cookies are not available to a fresh browser launched by Selenium. For Selenium-triggered logins to be recognized by Salesforce in the same way as manual logins, the browser launched by Selenium needs to retain and access the authentication cookies.
Now that we’ve identified cookies as the missing piece that connects the App and the browser in the authentication process, we need a way to supply it. The first step is to find out where these cookies are saved. Chrome is the browser I primarily use for RSelenium tasks. Chrome lets you create different profiles, and each profile has its own data directory where its cookies are saved. Therefore, if we assign a specific Chrome profile to Selenium, the browser will retain the memory of past authentication sessions through the cookies stored under that profile. This turns out to be easy to do by defining ChromeOptions with getChromeProfile.
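To make the profile idea concrete, here is a minimal sketch of the capability list that, as far as I can tell, getChromeProfile assembles: two Chrome command-line switches wrapped in a chromeOptions entry. The helper build_profile_caps and both paths are hypothetical and only there for illustration; in practice you would call RSelenium's getChromeProfile rather than build the list by hand.

```r
# Hypothetical locations: replace with your own Chrome data directory and
# the folder name of the profile you normally log in with.
data_dir    <- "C:/Users/me/AppData/Local/Google/Chrome/User Data"
profile_dir <- "Profile 1"

# Illustrative helper (not part of RSelenium): builds a capability list
# that points Chrome at an existing profile via two command-line switches.
build_profile_caps <- function(data_dir, profile_dir) {
  list(
    chromeOptions = list(
      args = list(
        paste0("--user-data-dir=", data_dir),         # where all profiles live
        paste0("--profile-directory=", profile_dir)   # which profile to load
      )
    )
  )
}

caps <- build_profile_caps(data_dir, profile_dir)
str(caps)  # inspect the nested list that would be sent to the driver
```

Because the cookies live inside the profile folder, pointing Chrome at that folder is what carries the earlier authentication over to the automated session.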
TL;DR starts here:
Step 1. Switch on Einstein Automation in the Salesforce Authenticator App.
Step 2. Use ChromeOptions to assign and retrieve the specific profile that you have used for manual login. The default location on Windows 10 is C:/Users/%USERNAME%/AppData/Local/Google/Chrome/User Data. Note that a specific IP is assigned as a third argument so that the IP cannot change between sessions.
Step 3. Load this profile as ‘extraCapabilities’ when initiating the remote driver.
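Steps 2 and 3 can be sketched roughly as follows. This is an illustration under assumptions, not the exact script: the function name open_with_profile, the server address and port, the profile folder name and the login URL are all placeholders, and a Selenium server is assumed to be running already. The IP argument mentioned in Step 2 is left out because the text does not show how it is passed.

```r
# Sketch only: assumes the RSelenium package is installed and a Selenium
# server is already listening on `server`:`port` (both are assumptions).
open_with_profile <- function(data_dir, profile_dir, login_url,
                              server = "localhost", port = 4444L) {
  # Step 2: describe the existing Chrome profile that holds the cookies.
  cprof <- RSelenium::getChromeProfile(data_dir, profile_dir)

  # Step 3: hand the profile to the remote driver as extraCapabilities.
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr  = server,
    port              = port,
    browserName       = "chrome",
    extraCapabilities = cprof
  )
  remDr$open()
  remDr$navigate(login_url)
  remDr
}

# Example call (placeholder values, not run here):
# remDr <- open_with_profile(
#   data_dir    = "C:/Users/me/AppData/Local/Google/Chrome/User Data",
#   profile_dir = "Profile 1",
#   login_url   = "https://login.salesforce.com"
# )
```

Two practical caveats: Chrome generally refuses to share a data directory between two running instances, so close any manually opened Chrome window that uses the same profile first; and newer chromedriver versions may expect the options under the key "goog:chromeOptions" rather than "chromeOptions".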
Voila, Selenium meets Einstein and they have an intellectual conversation over Cookies (and possibly milk). You can now navigate to the login page and start enjoying fully automated data scraping from Salesforce!