Posts

Showing posts from January, 2018

Assignment 02: Spark application to extract the Message ID, Date, From and To fields

Assignment 02: Scenario

Your big data consulting company has been hired by a small law firm to help them make sense of a document dump they have received for a big trial. The firm believes that the outcome of their trial depends on finding certain information in the emails from the opposition’s clients. They have secured an initial dump of employees’ emails at the company in question, but to get continuing data they need to prove that there is value in the sample. For their document analysts to do that in a timely manner, they will need some metadata extracted from each email so they can process it using their document review tools. If they are able to find what they need by the deadline, your company will get an ongoing contract to build a pipeline to process incoming document dumps (YAY!)

Assignment Description

Using the sample data, which consists of a series of emails, write a Spark application to extract the Message ID, Date, From and To fields…
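A minimal sketch of the per-email parsing step such an application needs, assuming each email in the sample dump is a raw RFC 822-style message (header block, blank line, body). It uses Python's standard-library `email.parser` rather than anything prescribed by the assignment, and `extract_fields` and `EmailMetadata` are hypothetical names. In a PySpark job, a function like this could be mapped over the `(path, content)` pairs returned by `sc.wholeTextFiles(input_dir)`, where `input_dir` is a placeholder for wherever the dump is stored.

```python
from email.parser import Parser
from typing import NamedTuple, Optional


class EmailMetadata(NamedTuple):
    """The four header fields the assignment asks for (None if a header is absent)."""
    message_id: Optional[str]
    date: Optional[str]
    sender: Optional[str]
    to: Optional[str]


def _clean(value) -> Optional[str]:
    # Folded headers (e.g. a long To: list) keep their embedded newlines/tabs;
    # collapse every run of whitespace into a single space.
    if value is None:
        return None
    return " ".join(str(value).split())


def extract_fields(raw_email: str) -> EmailMetadata:
    """Parse only the header block of one raw email (hypothetical helper)."""
    msg = Parser().parsestr(raw_email, headersonly=True)
    # Header lookup on email.message.Message is case-insensitive.
    return EmailMetadata(
        message_id=_clean(msg.get("Message-ID")),
        date=_clean(msg.get("Date")),
        sender=_clean(msg.get("From")),
        to=_clean(msg.get("To")),
    )


if __name__ == "__main__":
    # Hypothetical sample message, not taken from the assignment's data.
    sample = (
        "Message-ID: <hypothetical-001@example.com>\n"
        "Date: Mon, 14 May 2001 16:39:00 -0700\n"
        "From: alice@example.com\n"
        "To: bob@example.com,\n"
        "\tcarol@example.com\n"
        "Subject: Re: trial prep\n"
        "\n"
        "Body text is not needed for the metadata.\n"
    )
    print(extract_fields(sample))
```

Passing `headersonly=True` tells the parser to stop after the header block, which is all this step needs; a `NamedTuple` keeps each record small and easy to turn into CSV rows or a DataFrame later.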

Assignment 01 - Installing Azure CLI 2.0 and resizing VM

Week 1 Homework: Installing Azure CLI 2.0 and resizing VM

Now that we have more experience working with Azure VMs, I’d like people to become familiar with the command line interface (CLI) to Azure. Every possible operation is made available through the CLI, in contrast to the web portal, where many things are difficult and some are not even possible to do.

Install CLI for your environment

You will generally want to have the CLI available on your local machine/laptop, since at this point you are really only interacting with the VMs.

Windows/Mac users: follow the instructions at https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest

Ubuntu: There is an apt repo available, but I was a little uncomfortable with it. The CLI is basically a wrapper around some Python scripts, so the easiest approach is to just use pip:

pip install azure-cli

(use the --user option if you are not using an environment manager or sudo)

Login to CLI (Link to your NetId accou