Setting up a cloud Data Science test environment, on a budget

Before I get into another long diatribe, know that the minimum you need to get started with R is R and R Studio, and that both will run on just about anything. But if you want a more elaborate setup that includes SQL, read on.

Many years ago I took great pride in having half a dozen or more machines running every flavor of Windows and SQL Server to play with and experiment on. It did not matter what it was; it would bend to my will. And in case you are wondering, NT ran very nicely on a Packard Bell.

Once I took over the CAT lab I was in hog heaven: I had a six-figure budget, I was required to spend it on cool, fast toys, and I was expected to negotiate as much free stuff from vendors as I possibly could. It was a terrible, tough job to have. Jump forward to now: I own one MacBook Pro and one iPhone, and they serve every need I have.

When your world is Python or R, everything else is just a boot loader to get you there. For the most part, this is how I see the world now: my Mac is just a boot loader to get me to R. When I was exclusively a SQL guy, Windows was nothing more than a boot loader for SSMS, with a SQL Server living off in the distance somewhere.

Now I need R, Python, the Revo tools, and every supported version of SQL Server, so what is a Mac guy to do?

Realistically, you only need one server with SQL Server, R, and R Studio installed. From a playground perspective, if you can get all of that on one machine, all the better; it will make your self-training much easier. Worry about connectivity and security in a dev/test/prod environment at work later. First, get your own playground. So, how do we do this?

All-in-one on your primary machine?

I did this for many years. I would have as many as three instances of different SQL Server versions running on my laptop or desktop, and from a playground perspective it worked fine. Be sure to cap the memory of each instance, though, and there is no reason to have them all running at the same time, or any of them running when you are not using them.
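Capping memory is mostly arithmetic: reserve some RAM for the OS and your other apps, then split the rest across the instances. The sketch below is a hypothetical helper, not something from my own setup; the 4 GB OS reserve, the even split, and the instance names are all assumptions. It produces the T-SQL to set SQL Server's `max server memory (MB)` option, which you would run against each instance separately (from SSMS or `sqlcmd`).

```python
from __future__ import annotations


def memory_caps_mb(total_ram_mb: int, instances: list[str], os_reserve_mb: int = 4096) -> dict[str, int]:
    """Evenly split the RAM left after an OS reserve across SQL Server instances.

    The default 4096 MB reserve is an assumption; tune it for your machine.
    """
    if not instances:
        raise ValueError("need at least one instance")
    available = total_ram_mb - os_reserve_mb
    if available <= 0:
        raise ValueError("OS reserve leaves no memory for SQL Server")
    per_instance = available // len(instances)
    return {name: per_instance for name in instances}


def cap_script(cap_mb: int) -> str:
    """T-SQL that caps the instance it is executed against.

    'max server memory (MB)' is an advanced option, so advanced options
    are switched on first.
    """
    return (
        "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;\n"
        f"EXEC sp_configure 'max server memory (MB)', {cap_mb}; RECONFIGURE;"
    )


if __name__ == "__main__":
    # Hypothetical laptop: 16 GB of RAM and three named instances.
    caps = memory_caps_mb(16384, ["SQL2012", "SQL2014", "SQL2016"])
    for name, cap in caps.items():
        print(f"-- run against instance {name}")
        print(cap_script(cap))
```

On a 16 GB machine with three instances, each one ends up capped at 4096 MB. An even split is the simplest choice. If one version does most of the work, give it a bigger share.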

Work Virtual Playground?

It is possible your employer will give you space on a virtual machine to play around with; it never hurts to ask. Remember, you are looking for SQL Server, R, and R Studio, plus Python if you want to go that route. Or how about a VM running off an external SSD connected over USB 3? Yes, I have done it, and it is fast: not Tier 1 fast, but it gets the job done. I keep VMware Fusion on standby on my Mac with a 500 GB external SSD connected via USB 3, though I rarely use it.

Cloud Virtual Machine?

Well, this is how I have rounded out my lab in the sky! I have a Visual Studio subscription that gives me $150 of Azure credit every month. This is fine for me, but I have to manage it extremely closely. I have some pretty meaty machines, and the only way I can afford them is to keep them turned off when I am not using them. That is important: if an Azure VM is stopped and deallocated, you are billed only for storage, not for compute and licensing. I only run them a few hours a week, since much of my work can be done with the instances off.

Also, there is a Data Science Virtual Machine that comes with SQL Server, R, R Studio, Revolution R and R Server, as well as Python and a crap load of other data science tools installed for you. It is the perfect configuration for a data science playground. Just make sure you set the VM to auto-shutdown and deallocate every day; otherwise you will pay through the nose for compute time you do not need.
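To see why deallocating matters so much, here is a back-of-the-envelope sketch. The $1.00/hour compute-plus-license rate and the $20/month storage figure are made-up placeholders, not real Azure prices; plug in the numbers for your own VM size and disks. The 730 hours per month is the usual approximation for a month of wall-clock time.

```python
from __future__ import annotations

HOURS_PER_MONTH = 730  # common approximation of hours in a month


def monthly_cost(compute_rate_per_hour: float, hours_running: float, storage_per_month: float) -> float:
    """Compute (and license) is billed only while the VM is allocated; storage is billed all month."""
    return compute_rate_per_hour * hours_running + storage_per_month


def fits_budget(cost: float, monthly_credit: float = 150.0) -> bool:
    """True if the month's cost stays within the subscription credit."""
    return cost <= monthly_credit


if __name__ == "__main__":
    rate = 1.00     # hypothetical $/hour for compute + licensing
    storage = 20.0  # hypothetical $/month for the VM's disks

    always_on = monthly_cost(rate, HOURS_PER_MONTH, storage)
    part_time = monthly_cost(rate, 40, storage)  # about 10 hours a week

    print(f"always on: ${always_on:.2f}, fits: {fits_budget(always_on)}")
    print(f"40 h/month: ${part_time:.2f}, fits: {fits_budget(part_time)}")
```

With these placeholder numbers, a VM left running all month costs $750 and blows through a $150 credit. The same VM used 40 hours a month costs $60. Note that shutting Windows down from inside the guest leaves the VM stopped but still allocated, so compute keeps billing. Stop it from the portal or let auto-shutdown deallocate it.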

How to get some Azure credits!

So you have some options here. Visual Studio subscribers get between $50 and $150 a month in Azure credit, depending on the package. If you must pay out of pocket, I believe $45 a month (paid annually) is the lowest entry point, and I would certainly ask your employer to help with that cost, since they will benefit from your increased knowledge. Note that you need the one-year advance purchase to get the training credits. There are a few packages you can buy; the annual subscription includes training credits at Pluralsight, Opsgility, and Microsoft E-Learning, plus a couple of other training benefits. So if you are serious, you get Azure credit as well as more training than you would probably complete in a year anyway.

BizSpark?

I have not personally looked into this, but I may in the near future. BizSpark can give you a load of Azure credits if your company is less than five years old and has less than $1 million in revenue. If you are a little guy like me, it may be worth asking.

Corporate EA

If your company has an Enterprise Agreement (EA) with Microsoft, you have Azure credits in the EA that you can use, and they are not a trivial amount either. If your company is not actively using them for other work, or has some left over, see if you can get started with them. On that note, if your company has an EA, you can usually get a discounted Visual Studio or MSDN subscription per employee, which also gives you access to Azure credits. There is probably one person in your company responsible for maintaining the Microsoft relationship; find them and ask. There are LOTS of employee benefits in these agreements, and they frequently go unused.

If all else fails, just install R and R Studio on anything you can get your hands on. I use a Visual Studio Enterprise subscription that gives me $150 a month in credit, and so far I have been able to stay within my budget. The billing section of the Azure portal helps you track your current and projected burn, and there is a spending cap to prevent a huge bill.
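If you want a quick sanity check between visits to the portal, a simple linear projection of month-to-date spend gets you most of the way. This is my own rough approximation, not how the Azure portal computes its forecast. The function names and the $150 default credit are illustrative assumptions.

```python
from __future__ import annotations

import calendar
import math
from datetime import date


def projected_month_spend(spend_to_date: float, today: date) -> float:
    """Linear projection: average daily spend so far times the days in this month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def day_credit_exhausted(spend_to_date: float, today: date, credit: float = 150.0) -> int | None:
    """Day of the month the credit runs out at the current pace, or None if it lasts the month."""
    daily = spend_to_date / today.day
    if daily <= 0:
        return None
    day = math.ceil(credit / daily)
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return day if day <= days_in_month else None


if __name__ == "__main__":
    # Hypothetical: $60 spent by June 10.
    today = date(2017, 6, 10)
    print(projected_month_spend(60.0, today))  # 6.0 per day * 30 days
    print(day_credit_exhausted(60.0, today))   # credit gone on day 25
```

If the projection says you run out on day 25, that is your cue to deallocate something or delete a workspace you no longer need.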

You can see below that I spend most of my money on storage, since I keep my servers turned off all the time, and a good chunk on my Azure Machine Learning workspace, since I was in it every day prepping for Intersections. I have since deleted that workspace to keep my bill down.