Oracle Cloud: Early Days

This article was written in January 2021. As with all things in the cloud, prices, features and capabilities can change over time. Contact us if you'd like to understand what's now possible or available.

Summary

We've been using Oracle Cloud for a while, and have been trying to use it like we use all our clouds: in a "best practice" sort of way that will pass customer due diligence and internal security audits. We also like to set things up so that you can run things in production safe in the knowledge that they're not likely to go "bump in the night".

We looked at some of the major features of Oracle Cloud and how they stack up against their competitors. We've uncovered a few (glaring) shortfalls in Oracle's security policies which we're quite worried about. We've compared their prices with competitors and found something of a mixed bag, so it's not really clear who's cheaper and who isn't. We also looked at what they don't have, and how that might end up costing you more in the long run.

Introduction

Oracle Cloud Infrastructure (OCI) is the relative newcomer in the world of "big cloud" vendors. Our clients got interested because of some dramatically lower priced resources there, so we took a look to see how we could help.

Using Oracle products feels a bit uneasy. They've had a multi-decade reign of peril over their customers, utilising sharp-suited sales people to woo senior execs and go over the heads of the decision makers. They've deployed armies of very scary lawyers to write their contracts and to pursue any customer missteps. They've also got a fearsome reputation for "price gouging" their customers to extract as much money from them as possible regardless of the value they've actually added. With all that in mind, we're proceeding with caution - in the cloud, they've publicly published their prices, so whilst those prices may change, they'll change for lots of customers at once, and will affect their ability to attract new ones. We hope this reduces the chances of some of the slippery behaviour of the past.

One last thing is that if you're a big customer (think: Zoom, or similar), then you can cut a deal with Oracle that means none of the published prices apply to you. If you're doing this, then hopefully you know what you're getting into.

Features

At time of writing (early 2021), Oracle offers far fewer features than AWS, Google or Azure. That said, they've seemingly got a fairly decent line-up, presumably aimed at particular customer types (i.e. the ones who already use WebLogic and/or Oracle Database). Even so, "general purpose" customers can find things to like here too.

Sadly, there are some notably absent features too. If you've been using one of the other big players, you'll probably take some of those features for granted, so beware that things you may think "obvious" may not be present here.

Compartments

Oracle uses "compartments" to separate resources. In AWS you'd use "accounts", and in Google you use "projects". They amount to much the same thing - each one is essentially separate from the others, both logically and visibly (in the UI you can select which compartment you want to see, and API requests specify the compartment you want to operate on too).

Compartments feel easier to use than Accounts or Projects. They're quite "light weight" and simple but, unlike Accounts, they do not separate IAM users and policies, which are global. They do separate out most other resources, so you won't bump into "global namespace" clashes with two resources with the same name too often.

Global IAMs mean, like with GCP, you don't need to (re)create your primary permissions policies and groups and whatnot in every compartment. It's therefore easy for users to be given access, and then to use additional privileges for additional compartments.

All in all, compartments look completely usable and functional. They're a little less "invisible" to users who don't have permissions for them, but that's probably not a big issue in most cases.

Permissions Paradigm

Oracle uses group-based permissions (so is not role-based). This is functional enough, but means that for the most part, your users will always be operating at their highest level of privilege. That is, a developer who does some sysadmin work will always be at "sysadmin" level, where ideally they'd be working at "developer" level most of the time and only using "sysadmin" for the few times they need it.

This deficiency is more of an annoyance than a genuine problem for a lot of customers. I'd say it will be a genuine problem if you're a big organisation with lots of separate environments in Oracle cloud though. There you're likely to have lots of control-plane users, each with different job functions and probably quite a lot of "dual role" people as a result. It remains to be seen if Oracle Cloud would really be used in that manner though, and if customers would solve it by having multiple OCI accounts.

IAM and Security

Oracle has a relatively simple policy setup. Like other clouds, every resource has some permissions and you can assign "inspect", "read", "use" or "manage" to them. Unlike other providers, there's a consistent implicit deny in place, so absolutely nothing is accessible unless you've been specifically granted access to it.

Policies are relatively simple and readable. For example, you can have a policy statement such as:

Allow group Sysadmins to manage buckets in tenancy

This means the Sysadmins group can do anything they like to any buckets anywhere in your cloud. You can go further to make fairly fine-grained permissions by adding where clauses at the end (eg. where target.bucket.name='mybucket').

This all seems pretty easy to work with, and unlike other vendors, you can actually find the names of the permissions and resources you need in the online documentation. However, not all is as it seems at first glance - there are some serious problems with this scheme.

There are no Deny statements (because there's an implicit deny rule). As such, you can't say things like:

Never allow anyone except the superusers access to the Terraform state buckets
Allow the developers to do anything they like except with the monitoring instance
Never allow anyone to add users to the Administrators group

For the first one, you'd have to actually say something like:

Allow group developers to manage all-resources in compartment <123> where target.bucket.name != 'terraform-state'
Allow group networks to manage all-resources in compartment <123> where target.bucket.name != 'terraform-state'
Allow group wintel to manage all-resources in compartment <123> where target.bucket.name != 'terraform-state'
... (and so on for all the groups of users you have)
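To give a feel for how this repetition grows, here is a minimal Python sketch that generates one statement per group. The function name, the group list and the compartment name are all hypothetical, and it assumes the usual "Allow group ... in compartment ... where ..." statement layout; it only builds strings and does not talk to OCI.

```python
def exclusion_statements(groups, compartment, protected_bucket):
    """Build one Allow statement per group, each carrying the same exclusion."""
    return [
        f"Allow group {group} to manage all-resources in compartment {compartment} "
        f"where target.bucket.name != '{protected_bucket}'"
        for group in groups
    ]


# Hypothetical group and compartment names, for illustration only.
statements = exclusion_statements(
    ["developers", "networks", "wintel"], "example-compartment", "terraform-state"
)
for statement in statements:
    print(statement)
```

Every new group or protected bucket means regenerating and re-reviewing the whole set, which is exactly the maintenance burden a single Deny statement would avoid.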

Clearly, this sort of thing isn't terribly manageable in the long term. You can work around the problem with clever use of compartments and placement of restricted resources, but it's not easy because your policy statements will get pretty convoluted. This is complexity where you want it least - if you can't understand your security, then you've probably made a mistake in it somewhere.

Elsewhere there are some oddities too. For example, one we found in the documentation is that to allow access to a single user, you can't just say Allow user xyz ..., instead you have to say allow any-user ... where user-id=<ocid> (and yes, that means you need to look up the user's OCID and can't just use their username). The use of IDs in these sorts of cases makes reading the policy statements all but impossible (and there's no way to add comments to them to say what an OCID refers to).

Another one from the documentation is:

Allow group MyGroup to manage groups in tenancy where target.group.name != 'Administrators'

This looks like it's going to allow MyGroup to do all group management, so long as they don't try to touch the Administrators group. Sadly though, this messes up with the UI (and equivalent API calls) so you can't actually see the groups to manage them. To solve this you need an additional rule that says MyGroup can inspect all groups. Once you do this, things work as you'd expect and control is maintained over the Administrators group. It's a shame you need two rules where one should have sufficed.

Further, you need to specify the where target.group.name != 'Administrators' clause on all rules you create to allow user/group management. This quickly gets tedious as you try to protect all the privileged groups with where all { target.group.name != 'Administrators', target.group.name != 'superusers', target.group.name != 'networkArchitects' ... }. This gets more crazy as you end up doubling up rules so groups can use the UI and API properly. This doesn't really seem usable if you want quite complex or plentiful rules.
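As a rough illustration of the doubling-up, the sketch below builds the "where all { ... }" clause from a list of protected groups and pairs the manage rule with the extra inspect rule the UI needs. The function names are our own invention and the output is just text; treat it as a sketch of the pattern rather than a tested policy.

```python
def protect_groups_clause(protected):
    """Return a `where` clause that excludes every protected group."""
    if not protected:
        raise ValueError("at least one protected group is required")
    conditions = [f"target.group.name != '{name}'" for name in protected]
    if len(conditions) == 1:
        return f"where {conditions[0]}"
    return "where all {" + ", ".join(conditions) + "}"


def group_admin_statements(admin_group, protected):
    """Pair the restricted manage rule with the inspect rule the console needs."""
    clause = protect_groups_clause(protected)
    return [
        f"Allow group {admin_group} to inspect groups in tenancy",
        f"Allow group {admin_group} to manage groups in tenancy {clause}",
    ]


rules = group_admin_statements("MyGroup", ["Administrators", "superusers"])
for rule in rules:
    print(rule)
```

Generating the clause from one list at least keeps the set of protected groups in a single place, even though the same clause still has to appear on every rule.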

IAMs and MFA Authentication

We've broken this topic out into its own section because it turns out to be pretty complex. Like AWS, "local" users in OCI can't have MFA enforced - it's supported and users can turn it on easily enough (and the UI is far nicer than it is in AWS!), but if they forget, then what?

In AWS you can solve this problem by ensuring that users can't do anything unless they've logged on with MFA through clever use of IAM role policies. In OCI you can seemingly do something similar with where clauses such as (from the documentation):

... where request.user.mfaTotpVerified='true'

This looks perfect, but has some major limitations:

It only works for web UI users. API clients (either API Key or Token authenticated clients) do not have the mfaTotpVerified attribute (even though token clients need to authenticate with the browser, so will have entered an MFA to do so)
You have to specify the where clause on every policy statement, otherwise users will have access to things without MFA. There's no explicit Deny to do this globally.
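Because the clause has to appear on every statement, a simple check over your policy text can catch the ones you forgot. The following is a minimal sketch with a hypothetical helper name; it does a plain text match (ignoring whitespace and case) and does not parse the policy grammar, so it is a safety net rather than a guarantee.

```python
MFA_CONDITION = "request.user.mfaTotpVerified='true'"


def statements_missing_mfa(statements):
    """Return the statements that do not mention the MFA condition at all."""

    def normalise(text):
        # Strip all whitespace and lower-case so spacing differences don't matter.
        return "".join(text.split()).lower()

    needle = normalise(MFA_CONDITION)
    return [s for s in statements if needle not in normalise(s)]


policy = [
    "Allow group Sysadmins to manage buckets in tenancy where request.user.mfaTotpVerified = 'true'",
    "Allow group Developers to use instances in tenancy",
]
missing = statements_missing_mfa(policy)
print(missing)
```

A check like this could run in CI against the policy statements in your Terraform code, but it does nothing for the API clients described above, which never carry the attribute.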

Ultimately, this means that if you're expecting your users to build infrastructure using Terraform or other API-based tools (you are, right?), then you cannot enforce MFA for local users in Oracle Cloud.

This is a major security failing, and probably makes local users unsuitable for any organisation that takes security seriously.

Oracle do have a possible workaround, but it's a lot more work. You can use Federated Users. These are not "local" to the cloud and are authenticated by some other system - maybe your SAML server, or you can use Oracle's own Identity Provider.

Federated users do not set the mfaTotpVerified attribute either, but you can enforce MFA in the Identity Provider, so users don't get to "jump" from there to the Oracle Cloud until they've enabled MFA. Thus, you can enforce MFA for all Federated users (even though OCI won't "know" that itself, and can't enforce it if someone makes a mistake configuring the Identity Provider).

Setting up the Oracle Identity Provider and putting users into it is an extra piece of fairly intricate work. It makes the user experience more confusing and error-prone (so get ready for lots of "I can't log on" sort of support calls!).

Oracle's Identity Provider has provision for CSV bulk imports and such like (are they expecting a crowd?). If you don't have enough users to bother with such things, then you probably don't want the hassle of setting up federated users, and probably don't want the complexity and support overhead it adds. It doesn't seem possible to provision federated users with Terraform or an API, which means you're going to be clicking the mouse a fair bit. It seems though, that if you want to enforce MFA you're flat out of luck and will have to put up with it.

Object Storage (Buckets)

Oracle provide Object Storage (also known as buckets). They're "namespaced" into your account, so you only need the names of them to be unique in your account, which is pretty handy. You can make a bucket public or private, and the UI is quite clear if you're making a public one.

Apart from data storage though, buckets don't offer a lot of the features you might want. You can't have a bucket policy, you can't have object permissions and you can't make a website from a bucket either.

You can have lifecycles across the whole bucket, and you can have versioning too. However, the only protection you have against someone turning a private bucket into a public one is the permissions they've got. As with all permissions, you can't just say Deny all users bucket update, so you need to give away bucket permissions carefully if this sort of thing is a worry to you.

There's also no first-class way to use a bucket for Terraform remote state. However, they have provided an AWS-compatibility layer, which works pretty well. You'll need an Access Key and Secret (which is a per-user entity), but from then on things look a lot like regular S3 (you need to turn off things like region checking, account number checking etc. because they're obviously not going to work here).
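As a sketch of what "looks a lot like regular S3" means in practice, the snippet below builds the S3-compatible endpoint for a tenancy namespace and collects the sort of settings an S3-style Terraform backend would need. The namespace, bucket and key are placeholders, and the option names are the ones we understand Terraform's S3 backend to have used around the time of writing; check them against your Terraform version before relying on them.

```python
def s3_compat_endpoint(namespace, region):
    """Build the S3-compatible Object Storage endpoint for a tenancy namespace."""
    return f"https://{namespace}.compat.objectstorage.{region}.oraclecloud.com"


def terraform_s3_backend_settings(namespace, region, bucket, key):
    """Collect S3 backend settings with the AWS-only checks switched off."""
    return {
        "bucket": bucket,
        "key": key,
        "region": region,
        "endpoint": s3_compat_endpoint(namespace, region),
        # Assumed option names; these checks only make sense against real AWS.
        "skip_region_validation": True,
        "skip_credentials_validation": True,
        "skip_metadata_api_check": True,
        "force_path_style": True,
    }


# Placeholder values, for illustration only.
settings = terraform_s3_backend_settings(
    "examplenamespace", "uk-london-1", "terraform-state", "prod/terraform.tfstate"
)
for name, value in settings.items():
    print(f"{name} = {value!r}")
```

The Access Key and Secret are deliberately left out here; they would normally be supplied through the environment rather than written into configuration.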

Using an Access Key/Secret feels a bit wrong after years of having AWS drumming in the need for temporary credentials. The good news is that they only apply to buckets though, so they can't be used on any other OCI resources. They can't be restricted to specific buckets though, so they have access to all the buckets that the user that owns them does (and no, they don't have the mfaTotpVerified attribute!).

Whilst S3 compatibility is a useful add-on feature (especially for migration, I guess), as elsewhere, the security of buckets cannot be tightened up to the levels a lot of customers might want.

Networking

In order to use any cloud, you're going to need a network in it. Amazon and Google call these "Virtual Private Clouds" (VPCs). Oracle decided to call them "Virtual Cloud Networks" (VCNs). You also have to think about "availability zones" (AZs), which Oracle calls "Availability Domains" (ADs). Goodness knows why they had to change the established names of these things, or indeed what they're ever going to do if they provide a managed Active Directory service.

That said, Oracle VCNs are pretty simple. You build them out of subnets, which can be either private or public (only the latter can have optional Internet-facing IPs attached to resources in it). Like Google, their subnets can span ADs, so you don't need to build out three subnets where one would do perfectly well (like you do in AWS). This doesn't make the ADs themselves go away though; you still need to think about them when you're running instances and whatnot.

There are ways to apply firewall rules to servers, or the entire network. These aren't as flexible as in AWS, but are perfectly functional as far as we could tell. There are the usual NAT Gateways and Internet Gateways which are all highly available and easily added to a VCN (although beware that modifying the route table may incur a short downtime).

As you'd expect, load balancers are available, and it looks like you can set up some fairly complex rules and redirects with them, add certificates and whatnot if you want.

There are ways to VPN from somewhere else into OCI (or use a private network link, but as with the other vendors, such things take time and money to setup). There is no "road warrior" client VPN available, and you can't initiate a VPN from OCI to some other location (so you'll have to use an Instance to do either or both of these things if you want them).

Databases

As you would expect, Oracle Cloud can provide a managed Oracle Database. We didn't explore this at all, but we imagine this is probably where OCI excels. We'll leave others to determine if it's actually a better option than running your database on-prem or using AWS's database options to engineer out Oracle Database entirely.

Elsewhere, you can get a managed MySQL database too. This comes in roughly the same packages as you'd find in AWS or Google. You're essentially renting an instance with MySQL on it, and instances start at 1 CPU with 8GB of RAM and go up to 16 CPUs and 512GB of RAM. You can only have stock MySQL so not MariaDB as others may provide under the covers.

Instances are in a particular availability domain, and have a failover in another domain. They live on a subnet in your Virtual Cloud Network (VCN), just as you'd find in AWS. You can have backups and there are inevitable maintenance windows. There are no serverless options, or indeed any sort of non-fixed instance-based hosting that you can find at other providers.

Just in case you were wondering, there are no options for Postgres or any other SQL databases. Assuming you're not an Oracle DB customer, then you don't get many choices, which is not unexpected given this is Oracle, but a shame because Postgres has been gaining market share and a plethora of features that aren't available in MySQL (some of which give it some NoSQL features, which you'd have to build yourself using instances if you use MySQL). It's a bit of a "tough sell" to say you have to "downgrade" to MySQL to use the Oracle Cloud (if indeed changing database types is even an option for you).

Likewise in the NoSQL world there aren't many options. In fact, there's only one, simply called "NoSQL Database". This is Oracle's answer to AWS's DynamoDB. We haven't used it extensively, but at first glance it looks to be a pretty competent contender. Being proprietary, it does mean you need a specific client to use it, but presumably if you have used some good abstractions in your code, you could switch from DynamoDB or even Redis to this reasonably easily. Unless you're careful you could find yourself "engineered in" to Oracle here, so as with all such things, make sure you know what you're getting into first.

There are no Redis or Memcached options available in Oracle Cloud. If you're looking to run your LAMP or Django stack here, you'll either have to run a couple of instances to host these, or re-engineer to use the NoSQL Database. Neither is likely to be a terribly attractive option, especially as other cloud providers don't force you down that path.

Containers

Many cloud users are finding they can use containers to run their workloads at lower cost, greater flexibility and more easily than running instances. Here, Oracle provides a managed Kubernetes solution for container customers.

We haven't used Oracle's Kubernetes feature directly, but notice that it will create instances in your VCN, just as Google's GKE does there. We're assuming then that Oracle Kubernetes is similar to Google's in the general running and management of nodes and containers. Given what we've seen elsewhere in OCI, that may be an incorrect assumption, so anyone looking to go down this path should look in detail first.

We note that AWS, GCP and now Oracle all provide Kubernetes. However, Oracle is the only one that does not provide a smaller-scale container solution. For customers looking to run fewer than a few dozen containers, these small scale options (ECS in AWS and AppEngine in GCP) are usually a significant cost saving over their much larger Kubernetes counterparts.

ECS/AppEngine are not only simpler to use, but they also avoid the need to even think about "physical" nodes, capacities or other concerns. If you ask them for an extra container, they provide you with one without worries about where those containers will run. Not so with Kubernetes, and so you end up managing Kubernetes itself as much as (if not more than) managing your application within it.

For smaller customers (a rough rule of thumb is "a few dozen containers" or smaller), Kubernetes is a dramatically more expensive option than the native container services, and so here in Oracle, you probably only want to run containers at all if you've got quite a few to run. For everyone else, you may be better off with instances instead of containers, which feels like a "downgrade", but may be the most cost effective option.

Instances

Since databases, NoSQL and container services are all a bit limited, you're going to want to run some instances eventually. Here Oracle has plenty of options. You can choose from AMD Rome or Intel Skylake (or legacy). If you use AMD, you can have between 1 and 64 CPUs, 1-1024GB of RAM. You just move the sliders up and down to whatever you want - there are no predefined instance types to choose from. You get about 1Gbps networking per CPU, and have to have at least 1GB of RAM per CPU too.
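The slider rules for the AMD shapes are simple enough to write down. Below is a small sketch, with a hypothetical class name, that checks a requested CPU/RAM combination against the limits described above and estimates the network bandwidth; the limits are the ones we observed in early 2021 and may well have changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlexShape:
    """A requested AMD flexible shape: CPU count and memory in GB."""

    cpus: int
    memory_gb: int

    def problems(self):
        """Return a list of rule violations (empty if the shape is allowed)."""
        found = []
        if not 1 <= self.cpus <= 64:
            found.append("CPUs must be between 1 and 64")
        if not 1 <= self.memory_gb <= 1024:
            found.append("RAM must be between 1GB and 1024GB")
        if self.memory_gb < self.cpus:
            found.append("RAM must be at least 1GB per CPU")
        return found

    def network_gbps(self):
        """Approximate bandwidth: about 1Gbps per CPU."""
        return float(self.cpus)


print(FlexShape(4, 16).problems())
print(FlexShape(8, 4).problems())
```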

If you choose Intel Skylake then you do have to choose from predefined instance types, and they range from 1CPU, 15GB RAM to 24 CPUs and 320GB RAM. It looks like these are only here for people that can't use the AMD varieties.

The legacy types include the "always free" tier, where you can have a 1CPU, 1GB RAM server for free, although networking is limited to 0.48Gbps.

There are bare-metal options too. We've never used one, but they fall in about the middle of the possible configurations for VMs.

You get "Instance Configurations" to define how instances should be created. This really only says what Compartment to put it into, and what tags to apply. Instances can go into Pools of identical servers you can manage as a group. You can specify the number of servers in the pool, what subnets they should go on, etc. You can then go on to auto-scale a pool with rules so that more or fewer instances are created automatically.

There is also an option to make a "Cluster Network". We didn't investigate but it's a way to closely interconnect some servers for highly parallel compute tasks.

In terms of OS images, there are the usual CentOS, Ubuntu and Windows images. There are also some custom images, in the relatively small collection, from what we'd describe as "big" vendors such as Cisco, Fortinet, IBM, Palo Alto and Riverbed. There are some smaller vendors making images too, but it seems clear what sorts of customers Oracle is expecting to use their cloud.

Pricing

When it comes to pricing, Oracle have some dramatically lower prices than their competitors, but in other areas are more expensive. This makes working out "which vendor is the cheapest" an exercise in futility, and has the ever present answer of "it depends on your workload". For the purposes of these comparisons, we're going to skip over any introductory discounts or credits, and we rather assume you're going to need resources outside the free tiers.

Starting with bandwidth, Oracle comes in about a tenth the price of other vendors. Pretty much everyone charges $0 for ingress, so it's only the outgoing traffic that matters. Some examples (using the vendor's favourite region):

1TB outbound/month - AWS $92, Google $85, Oracle $0
10TB outbound/month - AWS $922, Google $650, Oracle $0
100TB outbound/month - AWS $7987, Google $6500, Oracle $783
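The AWS and Oracle figures above can be reproduced with a simple tiered calculation. The sketch below is our own reconstruction: the tier sizes and per-GB rates are assumptions chosen to match the numbers in this article (with 1TB taken as 1024GB), not an official price list, and the Google figures are left out because they don't follow the same tier pattern.

```python
GB_PER_TB = 1024

# Each tier is (size in GB, price per GB); a size of None means "everything else".
# Assumed early-2021 rates, picked because they reproduce the figures quoted above.
AWS_TIERS = [
    (10 * GB_PER_TB, 0.09),
    (40 * GB_PER_TB, 0.085),
    (100 * GB_PER_TB, 0.07),
    (None, 0.05),
]
ORACLE_TIERS = [(10 * GB_PER_TB, 0.0), (None, 0.0085)]


def egress_cost(gigabytes, tiers):
    """Charge each slice of outbound traffic at the rate of the tier it falls in."""
    remaining = gigabytes
    total = 0.0
    for size, price in tiers:
        if remaining <= 0:
            break
        billed = remaining if size is None else min(remaining, size)
        total += billed * price
        remaining -= billed
    return total


for terabytes in (1, 10, 100):
    gb = terabytes * GB_PER_TB
    aws = round(egress_cost(gb, AWS_TIERS))
    oracle = round(egress_cost(gb, ORACLE_TIERS))
    print(f"{terabytes}TB: AWS ${aws}, Oracle ${oracle}")
```

The point to notice is that Oracle's first 10TB is free under these assumed tiers, which is why the small-volume rows show $0.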

Before you get too excited about this, just think about how much outbound traffic you actually have. I'll bet that unless you're the likes of Zoom or Netflix, you use less than, or hover around, 1TB each month, and so are "only" paying $80-90 more with another vendor (which, as we'll see, may be recouped elsewhere).

Also think about where that outgoing traffic comes from. If it's primarily bucket storage, then for "standard" service (rather than an infrequent-access tier or whatever), you'll pay something like this:

100GB per month - AWS $2.33, Google $2.00, Oracle $2.55
1TB per month - AWS $23.30, Google $20.00, Oracle $25.50

Buckets also have access charges (and bandwidth), so unless you just write your data and forget about it, you'll pay more than the above with all providers. Comparisons are hard, and very closely tied to usage, so we side-stepped making any here.

For most applications, you're going to need some instances. I'm going to assume a single instance with 4 CPUs, 16GB RAM with 100GB of persistent storage (and that the server is running for the whole month):

Oracle $150 / month
Google $114.09 / month
AWS $108.11 / month

Almost nothing is going to run on a single server though, so you should multiply this up by however many instances you really are going to use (this is especially worth thinking about if you're using Kubernetes, as it'll create instances on your behalf). As we've seen, you may need instances for things that would be managed in other providers, so that will need factoring in too.

Perhaps your website needs a database? Since Oracle only really has MySQL, we'll compare that to start with. We'll compare a 2 CPU, 8GB RAM instance, running for the whole month:

Oracle $68.19
Google $98.62
AWS $102.73

However, you wouldn't really use these options in Google or AWS. In AWS you can get your monthly cost down to more like $46.85 by using Aurora Serverless (although usage really does affect the pricing there). In Google you can get down to around $55 - and of course you can choose from other database types in those and other places too (which again may reduce your need for instances to run other software, may change your need for horizontal scaling or whatever else).

For any self-respecting application, you're probably going to need a load balancer or two. For this, you'll pay something like:

Oracle $0.0113/hour + $0.0001 per Mbps/hour
AWS $0.0225/hour + $0.008 per "load balancer capacity unit"/hour
GCP $0.025/hour + $0.008 per GB ingress

We spent a couple of weeks trying to get Oracle to tell us how they calculate the "Mbps/hour" for load balancers. They bounced us around various people, asked to know the name of our client and generally looked like they were up to their old tricks. Eventually we got somewhere with Support, who told us that bandwidth is the total of the ingress and egress, and despite the name is calculated per-minute. Thus, 1Gbps for 30 minutes would be charged as 500Mbps/hour. It seems also that if you only have one load balancer in your account, you don't pay the base price. It seems somewhat strange they're so cagey about all of this, because actually it looks like a pretty good deal when viewed alongside the competition.
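Our reading of what Support told us is that the per-minute bandwidth samples (ingress plus egress) are effectively averaged over the hour. The sketch below works through the example above on that assumption; the function names are hypothetical and the rates are the published ones quoted earlier.

```python
def billed_mbps_hours(per_minute_mbps):
    """Turn per-minute Mbps samples into billable Mbps/hour units."""
    return sum(per_minute_mbps) / 60


def load_balancer_cost(per_minute_mbps, base_per_hour=0.0113, per_mbps_hour=0.0001):
    """Base charge for the time covered plus the bandwidth charge."""
    hours = len(per_minute_mbps) / 60
    return base_per_hour * hours + per_mbps_hour * billed_mbps_hours(per_minute_mbps)


# 1Gbps (1000Mbps) for 30 minutes, then idle for the rest of the hour.
samples = [1000] * 30 + [0] * 30
print(billed_mbps_hours(samples))
print(round(load_balancer_cost(samples), 4))
```

If the base price really is waived for the first load balancer in an account, pass base_per_hour=0 to see the bandwidth charge on its own.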

We haven't looked into it in detail, but Oracle have blogged about up to 72% savings over AWS DynamoDB with their Cloud NoSQL Database. You're going to need to be a fairly big DynamoDB customer to see that sort of saving though. They're obviously looking to court a particular type of customer with this dramatic discount, although it remains to be seen who that really is, or if they'll switch.

Billing

Cloud billing is always a can of worms. Vendors look like they've provided great tools to "drill down" into your bill, but when the time comes to do it, we've always found that it's impossible to ask the question you really want to answer and so you end up mentally making estimates instead. That plus the variability of the actual billing amounts, the confusing array of usage based and subscription prices, and well, you can be sure of very little except that your vendor will charge you on time, every time.

Here, I'm afraid Oracle doesn't look wildly different from the others. Given the relative simplicity of the Oracle Cloud, it may be a little easier to get to the truth you seek, but we wouldn't count on it. In all honesty though, we haven't really spent enough time with Oracle Cloud bills to really form a definitive opinion.

Elsewhere

Elsewhere Oracle Cloud looks interesting. They've got options for serverless functions (think: "lambda functions"), and have some interesting looking features around blockchain and even logging analytics. We didn't look into these in detail yet.

Whilst not OCI itself, we found Terraform support to be pretty good. Some things are a bit tedious to do, but we found ways to do most of the things we wanted. A notable exception is the provision of local users, where we couldn't find a way to generate (or insert) an initial password, or to turn on or off the attributes like "can log on to the console" and "can use API keys" etc. There doesn't appear to be any Terraform (or API) support for federated users either.

We found the oci CLI tool a little cumbersome to log on with, but workable enough (it's a pain if you're logging on with different users in different sessions, simultaneously - otherwise most users will probably find it perfectly usable). We also got some prompt acknowledgements on the customer forum (Oracle Support lurk on the forums at the moment - a good thing because the community is really tiny, especially if you discount the Oracle employees in it!).

Update: We've heard that the CLI development team may implement some changes to make life a little easier for us in future. We'll look forward to that if/when it arrives.

There are no Directory Services in Oracle Cloud, so if you're hoping for a managed Active Directory, you're out of luck. This may be a problem for Windows-heavy deployments, so we're left wondering if they'll add such a feature in due course.

Oracle do Archive Storage, which is something like "cold line" bucket storage for things you need to keep but don't need to access (like backups, or compliance logs or whatever). Unlike the likes of Glacier though, requests to the data don't really cost more than to a regular bucket, and it's pretty fast with turn-arounds taking about an hour. There isn't a way to selectively migrate objects to archive though, it's a "whole bucket" sort of proposition.

We did find some pretty useful audit logging. It seems to log just about every interaction you have with the service, so it's pretty verbose and the UI didn't make things easy, but the point is that there are (exportable) logs available without really having to do anything to turn them on. We're not sure how useful these logs would be to do some forensic analysis, or something very "formal", but they gave us a few pointers when we were looking at a permissions issue.

Cloud vendors are notoriously cagey about their uptime and reliability. As Forbes said in 2018, "Either Oracle’s cloud runs perfectly or it isn’t being used that much.". Their point being that if AWS, Google or Azure have an outage, it makes the news really fast. Outages in Oracle don't seem to make the news, although now that Zoom and TikTok are on Oracle, maybe they will...?

One observation worth noting is the quality of Oracle's documentation. For the most part, it's really excellent. Coming from AWS, with its confusing array of similar-but-different pages that each tell you part of the answer, and from the "say lots, but don't tell you much" world of GCP, Oracle's documentation is a relief. It's generally very clearly written, gets to the point, is well organised and easy to find. This is all just as well, because as we've noted, the Internet community (and Oracle's own community) is pretty tiny.

Conclusion

It's clear Oracle is much "smaller" than the incumbent competitors such as AWS and GCP. They offer far fewer features (at time of writing, at least), and the features they lack could cause some customers some problems. This makes it unlikely to be a "general purpose" cloud.

We have serious reservations about the details of security inside the Oracle Cloud. There are some pretty basic things that just aren't good enough (in our opinion). This may be a "deal breaker" for some, but more likely it'll turn into an additional implementation and running cost overhead to use the workarounds.

In terms of infrastructure itself we're pretty happy with what's on offer in Oracle. Customers should look carefully at pricing though: whilst there are some deep discounts in some areas, the missing features and occasionally higher prices may mean a higher total cost of ownership for many.

Whatever their offering is or is not, we've got to commend Oracle on their documentation, which is very good indeed. That and the inherent simplicity make on-boarding relatively painless (although beware that not everything in the cloud is easy!).

Oracle clearly has made a cloud that people want to use, but whether everyone wants to, or even should, remains to be seen.

Spotted a mistake, or just have something to say? Get in contact!

Image credit: https://flic.kr/p/2iFz6MN

Tags: oci oracle cloud
