Avoid the pitfalls: what to keep in mind for a smooth start with cloud services

Many companies are looking for ways to migrate their data centres to a public cloud platform. How do you avoid the potential pitfalls of migrating a data centre to the public cloud? How do you plan the migration so that you are satisfied with the end result and achieve the goals you have set?

Why the public cloud?

The public cloud provides the ability to scale as needed, to use a variety of convenient SaaS (Software as a Service), PaaS (Platform as a Service) and IaaS (Infrastructure as a Service) solutions, and to pay only for as much of the service as you actually use.

The public cloud gives a company the opportunity for a great leap in development: during development it can use the service provider's various ready-made services, which accelerate development and help create new functionality.

All of this can be used conveniently without having to run your own data centre.

Goal setting

The first and most important step is to set a goal for the enterprise. The goal cannot be general; it must be specific and, if possible, measurable, so that at the end of the migration it is possible to assess whether the goal has been achieved or not.

Goal setting must take the form of internal collaboration between the business side and the technical side of the company. If even one party is excluded, it is very difficult to reach a satisfactory outcome.

The goals can be, for example, the following:

  • Cost savings. Do you find that running your own data centre is too expensive and operating costs are very high? Calculate how much the company spends on it and set a goal for the percentage of savings you want to achieve. However, cost savings are not recommended as the main goal, because cloud providers also aim to make a profit. Rather, look for goals in the following areas that help you work more efficiently.
  • Agility, i.e. faster development of new functionalities and the opportunity to enter new markets.
  • Introduction of new technologies such as machine learning (ML), the Internet of Things (IoT) or artificial intelligence (AI). The cloud offers a number of ready-made services that are very easy to integrate.
  • End of life for hardware or software. Many companies are considering migrating to the cloud at the moment when their hardware or software is about to reach its end of life.
  • Security. Data security is a very important issue and it is becoming increasingly important. Cloud providers invest heavily in security: it is a top priority for them, because an insecure service would compromise customer data and make customers reluctant to buy the service.

The main reason migrations fail is the lack of a clear goal (the goal is not measurable or not fully thought through).

Mapping the architecture

The second step should be to map the services and application architecture in use. This mapping is essential to choose the right migration strategy.

In broad strokes, applications fall into two categories: applications that are easy to migrate and applications that require a more sophisticated solution. Let’s take, for example, a situation where a large monolithic application is used, the high availability of which is ensured by a Unix cluster. An application with this type of architecture is difficult to migrate to the cloud and it may not provide the desired solution.

The situation is similar with security. Although security is very important in general, it is especially important in situations where sensitive personal data of users, credit card data, etc. must be stored and processed. Cloud platforms offer great security solutions and tips on how to run your application securely in the cloud.

Security is critical to AWS, Azure and GCP, and they invest far more in it than any individual customer ever could.

Secure data handling requires prior experience. Therefore, I recommend migrating applications with sensitive personal data at a later stage of the migration, where experience has been gained. It is also recommended to use the help of different partners. Solita has previous experience in managing sensitive data in the cloud and is able to ensure the future security of data as well. Partners are able to give advice and draw attention to small details that may not be evident due to lack of previous experience.

This is why it is necessary to map the architecture and to understand what types of applications are used in the company. An accurate understanding of the application architecture will help you choose the right migration method.

Migration strategies

‘Lift and Shift’ is the easiest way, transferring an application from one environment to another without major changes to code and architecture.

Advantages of the ‘Lift and Shift’ way:

  • In terms of labour, this type of migration is the cheapest and fastest.
  • It is possible to quickly release the resource used.
  • You can quickly fulfil your business goal – to migrate to the cloud.

 Disadvantages of the ‘Lift and Shift’ way:

  • There is no opportunity to use the capabilities of the cloud, such as scalability.
  • It is difficult to achieve financial gain on infrastructure.
  • Adding new functionalities is a bit tricky.
  • Almost 75% of ‘Lift and Shift’ migrations are redone within two years: companies either move back to their own data centre or migrate again using another method. At first glance it seems like a simple and fast migration strategy, but in the long run it does not open up the cloud’s opportunities and no efficiency gains are achieved.

‘Re-Platform’ is a way to migrate where a number of changes are made to the application that enable the use of services provided by the cloud service provider, such as using the AWS Aurora database.

Benefits:

  • It is possible to achieve long-term financial gain.
  • It can be scaled as needed.
  • You can use a service, the reliability of which is the service provider’s responsibility.

 Possible shortcomings:

  • Migration takes longer than, for example, with the ‘Lift and Shift’ method.
  • The volume of migration can increase rapidly due to the relatively large changes made to the code.

‘Re-Architect’ is the most labour- and cost-intensive way to migrate, but the most cost-effective in the long run. During re-architecting, the application code is changed enough that the application runs natively in the cloud. This means that the application architecture takes advantage of the opportunities and benefits offered by the cloud.

Advantages:

  • Long-term cost savings.
  • It is possible to create a highly manageable and scalable application.
  • An application built on cloud and microservices architecture makes it easy to add new functionality and modify existing functionality.

Disadvantages:

  • It takes more time and therefore more money for the development and migration.

Start with the goal!

Successful migration starts with setting and defining a clear goal to be achieved. Once the goals have been defined and the architecture has been thoroughly mapped, it is easy to offer a suitable option from those listed above: either ‘Lift and Shift’, ‘Re-Platform’ or ‘Re-Architect’.

Each strategy has its advantages and disadvantages. To establish a clear and objective plan, it is recommended to use the help of a reliable partner with previous experience and knowledge of migrating applications to the cloud.

Turbulent times in security

We are currently living in very turbulent times: COVID-19 is still among us, and at the same time we are facing a geopolitical crisis in Europe not seen since the Second World War. You can and should prepare for the changed circumstances by getting the basics in order.

On top of the usual actors looking for financial gain, state-backed actors are now likely to activate their campaigns against services critical for society. This extends beyond the crisis zone; we have, for example, already seen Denial of Service attacks against banks. It is likely that various ransomware campaigns and data wipers will also be seen in western countries, targeting utility providers, telecommunications, media, transportation and financial institutions and their supply chains.

So what should be done differently during these times in terms of securing our business and environments? Often an old trick is better than a bagful of new ones, meaning that getting the basics right should always be the starting point. There are no shortcuts in securing systems, and there is no single magic box you can deploy to fix everything.

Business continuity and recovery plans

Make sure your business continuity plan and recovery plan are available and revised. Also require recovery plans from your service providers. Make sure that roles and responsibilities are clearly defined and everyone knows the decision-making tree. Check that contact information is up to date and that your service providers and partners have your correct contact information. It is also a good idea to practice cyberattack scenarios with your internal and external stakeholders to spot potential pitfalls in your plan in advance.

Know what you have out there!

How certain are you that your CMDB is 100% up to date? When was the last time you checked how your DNS records are configured? Do you really know which services are visible to the internet? Are you aware of what software and versions you are using in your public services? These are the same questions malicious actors go through when gathering information on where and how to attack. This information is available on the internet for everyone to find, and all organizations should also use it for their own protection. There are tools and services (such as Solita WhiteHat) available to perform reconnaissance checks against your environment. Use them or get a partner to help you with this.
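To make this concrete, even a tiny script already gives you a rough external view of a domain. The sketch below is a minimal illustration only (plain Python standard library; the host list and ports are placeholders, not a complete reconnaissance tool): it resolves a hostname and probes a few common ports, which is roughly the first thing an attacker would automate. Dedicated services go much further, covering things like certificates and exposed software versions.

```python
import socket

HOSTS = ["example.com"]        # placeholder: list your own public domains here
PORTS = [22, 80, 443, 3389]    # a few common services worth knowing about

for host in HOSTS:
    try:
        # Collect the unique IPv4 addresses the name resolves to
        infos = socket.getaddrinfo(host, None, socket.AF_INET)
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror as err:
        print(f"{host}: DNS lookup failed ({err})")
        continue
    print(f"{host} resolves to {', '.join(addresses)}")
    for port in PORTS:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            reachable = sock.connect_ex((addresses[0], port)) == 0
            print(f"  port {port}: {'open' if reachable else 'closed/filtered'}")
```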

Keep your software and systems updated

This is something every one of us hears over and over again, but still: it is of the utmost importance to keep software up to date! Every piece of software contains vulnerabilities and bugs that can be exploited. Vendors nowadays patch vulnerabilities that come to their attention rather quickly, so use that to your benefit and apply the patches.

Require MultiFactor Authentication and support strong passwords

This one is also on every recommendation list, and it’s not there for nothing. Almost all services nowadays provide the possibility to enable MFA, so why not require it? It is easy to set up and provides an additional layer of security for users, preventing brute forcing and password spraying. It doesn’t replace a good and strong password, so one rather small thing that helps users create strong passwords and avoid reusing the same password in multiple services is to provide them with a password manager, such as LastPass or 1Password. If you have an SSO service in place, make sure you get the most out of it.

Take backups and exercise recovery

Make sure you are backing up your data and services. Also make sure that backups are stored somewhere other than the production environment, to prevent, for example, ransomware from making them useless. Of course, just taking backups is not enough; recovery should be tested periodically (at least yearly) to make sure that it actually works when needed.
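If your workloads happen to run on AWS, for example, even a short script can take a backup and push a copy out of the production region. The following is only a rough boto3 sketch under that assumption; the volume ID and regions are placeholders, and the same idea applies to whatever backup tooling you already use.

```python
import boto3

SOURCE_REGION = "eu-north-1"          # placeholder production region
DR_REGION = "eu-west-1"               # placeholder off-site region
VOLUME_ID = "vol-0123456789abcdef0"   # placeholder volume

ec2 = boto3.client("ec2", region_name=SOURCE_REGION)

# Snapshot the volume and wait until the snapshot is complete
snapshot = ec2.create_snapshot(VolumeId=VOLUME_ID, Description="nightly backup")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# Copy the snapshot to another region, so ransomware in production
# cannot touch the off-site copy
dr = boto3.client("ec2", region_name=DR_REGION)
copy = dr.copy_snapshot(
    SourceRegion=SOURCE_REGION,
    SourceSnapshotId=snapshot["SnapshotId"],
    Description="off-site copy of nightly backup",
)
print("Off-site snapshot:", copy["SnapshotId"])
```

The advice above still stands: a copy you have never restored from is not a backup, so exercise the recovery path regularly.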

What if you get hit

One famous CEO once said that there are two types of companies: those that have been hacked and those that don’t know they have been hacked. So what should you do if you even suspect that you have been attacked:

Notify authorities

National authorities run CERT (Computer Emergency Response Team) teams, which maintain situational awareness and coordinate response actions at a national level. In Finland, for example, it is kyberturvallisuuskeskus.fi, and in Sweden cert.se. So if you suspect a possible data leak or attack, notify the local CERT and, at the same time, file a police report. It is also advisable to contact a service provider who can help you investigate and mitigate the situation. One good place to find a provider of Digital Forensics and Incident Response services is dfir.fi.

Isolate breached targets and change/lock credentials

When you suspect a breach, isolate the suspected targets from the environment. If possible, cut off network access but let the resources keep running; this way you are not destroying possible evidence by turning off services (shutting down servers, deleting cloud resources). At the same time, lock the credentials suspected to have been used in the breach and change all the passwords.

Verify logs

Check that you have logs available from the potentially breached systems. In the best case the logs are already available outside the systems in question. If not, back them up to external storage to make sure they cannot be altered or removed by the attacker.

Remember to communicate

Communicate with stakeholders, and remember your users, customers and the public. Although it may feel challenging to deliver this kind of news, it is much better to be open in the early stages than to get caught with your pants down later on.

To summarise

The threat level is definitely higher due to the circumstances mentioned above, but getting the basics in order helps you react if something happens. Also keep in mind that you don’t have to cope with this situation alone. Security service providers have the means and capacity to support you in an efficient way. Our teams are always willing to help keep your business and operations secure.

 


Alerting Estonian Citizens with Azure

Why not take advantage of the public cloud? Read how the system for transmitting alarm messages to consumers was born in Estonia. With this piece of writing, we go through one specific and technical real-life journey from the birth of an idea to its implementation in the Microsoft Azure cloud environment.

 

The author of the article has worked at Solita as a Data Engineer since 2019 and specialises in cloud-based data platforms. By now, he has accumulated more than 20 years of experience in various fields of IT – development, architecture, training – and has delivered many interesting projects both in Estonia and abroad.

The beginning

At the digital state’s hackathon event, Solita addressed the transmission of the government’s cultural messages to televisions via an Android app.

In parallel, the Ministry of the Interior’s IT Centre (SMIT) pointed out a similar need. Sending alarm messages from the SITREP (Situation Reporting) system to the mobile application ‘Ole valmis!’ (‘Be ready!’) was considered. The purpose of that application became to warn the user of a nearby accident or deteriorating weather (such as a snowstorm, or a coastal flood).

In conclusion, since the pattern was the same, it seemed reasonable to create a single, unified solution.

Problems and objectives

SITREP did not have a public interface for requesting messages, but it did have the functionality to send messages. So it was possible to interface the system directly with the back-end of the ‘Ole valmis!’ (‘Be ready!’) application. The following goals emerged in the process of the Hackathon, which led to the development of a separate cloud-based system.

  • Transmission of messages (push and pull) must be possible to several channels in parallel, whoever the consumer is.
  • The messaging functionality must be separate from SITREP.
  • The interface must be secure so that a malicious actor cannot send false alarms.
  • It must be easy to administer.
  • It must be possible to subscribe / categorise messages by subject and location.
  • Setting up the system must be quick and easy.
  • The system must be flexible and inexpensive to maintain.
  • The available development time is a couple of man-months.

Why Microsoft Azure?

Solita participated in the Hackathon in partnership with Microsoft, which is why Azure was chosen as the cloud environment – although similar solutions could be created with the help of AWS or Google, for example. Azure also provides some direct benefits.

  • Most components can be easily integrated with Active Directory (although Active Directory was not an end in itself in the first iteration, this was one argument to consider in the future).
  • The range of services (in other words – the arsenal of ‘building blocks’ of the completed system) is really impressive and also includes exclusive components – in the following we will take a closer look at some of them.

For example, API Management is, to put it simply, a scalable API gateway service, and as a big bonus, it includes a public Web portal (Developer Portal) that is convenient for both the user and the administrator. In addition, the interface can be redesigned to suit your needs. The main value comes precisely from the ease of use – you don’t have to have much Azure-specific knowledge to sign up, send / receive messages, set the final destination, describe conversion rules.

The Developer Portal provides developers with pre-written sample code for consuming messages (presented in cURL, C#, and Python, for example). In addition, of course, a built-in firewall, resilience, and resistance to DDoS-type attacks are provided. All of the above saves a lot of time (and money!) from the user’s, administrator’s and developer’s point of view.
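Those samples typically boil down to a plain HTTPS call with a subscription key. Here is a minimal Python sketch of what such a call can look like; the URL, key and message fields below are placeholders for illustration, not the real interface of this system.

```python
import requests

# Placeholders: the real endpoint and key come from the API Management Developer Portal
API_URL = "https://example-apim.azure-api.net/messages"
SUBSCRIPTION_KEY = "<your-subscription-key>"

message = {
    "timestamp": "2022-03-01T12:00:00Z",
    "author": "sitrep",
    "payload": {"type": "weather", "location": "Harjumaa", "text": "Coastal flood warning"},
}

response = requests.post(
    API_URL,
    json=message,
    headers={"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY},  # standard APIM subscription header
    timeout=10,
)
response.raise_for_status()
print("Accepted with status", response.status_code)
```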

Infrastructure creation process

From the architect’s point of view, the aim was to create a system based on the most standard components possible (provided by Azure itself), whose installation would be simple enough for anyone with a basic understanding of how the cloud works. We also had to take into account that the requirements were still evolving.

From the beginning, we relied on the principle of IaC (Infrastructure as Code) – the entire infrastructure of the solution in the cloud is unambiguously described as human and machine readable code. In addition, the installation process would be incremental (if a new version is created, the existing infrastructure could be updated instead of recreating), configurable and automated; the code would be as clear and editable as possible. Figuratively speaking, you press ‘deploy’ and you don’t need much else.

All of the above is made possible by a tool called Terraform, which is quite common, especially among those who administer infrastructures – the de facto standard for cloud infrastructures, so to speak. It is a free tool from HashiCorp that is perfect for situations like this: you describe in code what resources you need, and Terraform interprets the code into instructions the (cloud) environment understands in order to create, modify or delete them.

Terraform has the following strengths that were the decisive factor:

  • its spread and wide platform support,
  • the ability to ‘remember’ the state of the installed infrastructure,
  • the simple but powerful HCL language that can be used to describe even complex logic.

The method officially supported by Microsoft for achieving the same is ARM templates (essentially structured, static JSON or YAML). The entire Azure infrastructure can be described with ARM templates alone, but then more code is created and the possibilities for directing the installation logic are greatly reduced.

Changing requirements and options

The first thing the work continued with was creating a message store (for the pull scenario and for debugging).

The initial understanding of the message format was quite simple:

  • single-level JSON,
  • a few required attributes (timestamp, author, etc.),
  • rest of the schema was completely open.

Based on the above and on the principled decision to use only Microsoft Azure components + to install the entire solution with a single command, two options remained on the table for storing and reading JSON data without a defined schema:

  • Table Storage (default; although by operating principle it is a key / attribute type service),
  • Cosmos DB.

The ability to query data via HTTP(S) (less development work) and a lower cost (especially important in the prototype phase) spoke in favour of Table Storage; Cosmos DB had the advantage of flexibility and the ability to store data in several regions. However, the situation changed when SITREP’s messages turned out to arrive as multi-level JSON, with some of the critical attributes at a ‘deeper’ level. Table Storage therefore no longer met the new requirement, and the Cosmos DB component had to be introduced instead.
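Cosmos DB’s SQL-like query language can filter on those nested attributes directly, which is what made it the better fit. A rough sketch with the azure-cosmos Python SDK is shown below; the account URL, key, database, container and attribute names are all placeholders rather than the actual schema of this system.

```python
from azure.cosmos import CosmosClient

# Placeholders: use your own Cosmos DB account URL, key and resource names
client = CosmosClient("https://example-account.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("messages-db").get_container_client("messages")

# Filter on an attribute nested deeper in the multi-level JSON message
query = "SELECT * FROM c WHERE c.payload.location.county = @county"
items = container.query_items(
    query=query,
    parameters=[{"name": "@county", "value": "Harjumaa"}],
    enable_cross_partition_query=True,
)

for item in items:
    print(item["timestamp"], item["payload"]["text"])
```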

In addition, there was a real possibility that the system would be used in a context other than alarm messages – it had to be taken into account that the system could be used for transmitting virtually any message from many senders to different channels in different combinations. In essence, the goal became to create a messaging platform (Message Service Provider) that would functionally resemble the products of Twilio or MessageBird, for example.

Not a single line of ‘real’ code

So, by now, the following was architecturally in place:

  • all incoming messages and queries went through API Management,
  • all messages were stored in the Cosmos DB database.

 

At the same time, pushing messages to destinations through API Management remained an open issue. And exactly which component handles the database of messages and destination addresses?

Microsoft Azure offers options for almost any scenario, from an application hosted on a virtual machine to a serverless component called Azure Function. You can also use a number of intermediate variants (Docker, Kubernetes, Web App), where the user may or may not have direct access to the server hosting the application.

In the context of the present solution, all the above solutions would have meant the following:

  • a person with a developer background would have been needed to create the system,
  • the system as a whole could no longer be installed very easily – the application code would have been separate from the infrastructure code.

Fortunately, Azure has provided the Logic App technology that addresses the above issues. It’s a way to describe business logic as a flowchart – for example, you can visually ‘draw’ a ready-made Logic App ‘application’ in the Azure Portal, using the online interface.

It is true that in more complex cases, such as conversion operations, you will probably need to write a few lines of code, but this is far from traditional programming. Writing Logic App code is more akin to developing Excel macros than Python or Java.

The Logic App flow code can be saved as an ARM template, and the ARM template can be used as part of a Terraform installation script – making the Logic App a great fit for this context. Starting a single workflow in this solution costs in the order of 0.0005 euros per occasion (according to the consumption-based plan) – yes, there are even cheaper solutions like Azure Function, but in this case the infrastructure needs to be installed and developed separately.

Support components

Azure has well-thought-out tools for monitoring the operation of the system; in this case we focus on two of them: Azure Monitor and Log Analytics. The first, as the name suggests, is a set of dashboards provided by the Azure platform that help monitor the status of applications and components (including in real-time), such as load, memory usage, and user-defined metrics.

Since Monitor is ‘included’ with every solution by default, it may not be correct to consider it a separate component – it is simply a question of which indicators are displayed. Log Analytics, on the other hand, is a place to direct the logs of all components so that they can be conveniently analysed and queried later. This helps detect system errors and track them down quickly. You can even query Log Analytics for metrics to display later in Monitor, such as the number of errors per day.

Results and observations

In summary, the architecture of the solution came out as follows.

Messaging system architecture in Azure

 

Broadly, the objectives set out at the start were achieved and the principles were followed (IaC, Azure components only, etc.). Clearly, Microsoft Azure offers a truly comprehensive suite of services with typically 99.95-99.99% SLAs; however, ‘the seven nines’ (99.99999%) or even higher are not uncommon. Such a high percentage is achieved through redundancy of components and data, optimised hardware usage, and exceptionally strict security measures in the region’s data centres.

Installing a system from scratch on an Azure account takes 45-60 minutes, and the lion’s share of this is provisioning API Management – a kind of heavyweight in Microsoft Azure, with a number of internal components hidden from the user (firewall, web server, load balancer, etc.).

There were no real obstacles, but development revealed that, as a third-party tool, Terraform is a few steps behind Microsoft Azure – in other words, when Microsoft launches a new Azure service, it takes time for HashiCorp’s developers to add the functionality to their provider. In such cases the ARM template for the new component can be grafted into the Terraform scripts, so that the creation of the infrastructure can still be automated.

In conclusion

Public cloud providers, such as Microsoft Azure, have hundreds of different services that can be considered Lego blocks – combining services to create the solution that best meets your needs.

The article describes how an MSP-like product was created from scratch and has by now reached pre-live status. The same product could be assembled from different components – it all depends on the exact needs and on the possibility of including other competencies, such as C# or Java developers. The public cloud is diverse, secure, affordable and ever evolving – there are very few reasons not to take advantage of it.

Thank you: Janek Rozov (SMIT), Timmo Tammemäe (SMIT), Märt Reose (SMIT), Kristjan Kolbre (‘Ole valmis!’), Madis Alesmaa (‘Ole valmis!’), Elisa Jakson (Women’s Voluntary Defence Organisation / ‘Ole valmis!’), Henrik Veenpere (Rescue Board).

Cloud services

Hybrid Cloud Trends and Use Cases

Let's look at different types of cloud services and learn more about the hybrid cloud and how this cloud service model can exist at an organisation. We’ll also try to predict the future a bit and talk about what hybrid cloud trends we are expecting.

As an IT person, I dare say that today we use the cloud without even thinking about it. All kinds of data repositories, social networks, streaming services, media portals – they work thanks to cloud solutions. The cloud now plays an important role in how people interact with technology.

Cloud service providers are inventing more and more features and functionalities, bringing them to the IT market. Such innovative technologies offer even more opportunities for organisations to run a business. For example, AWS, one of the largest providers of public cloud services, announces over 100 product or service updates each year.

Cloud services

Cloud technologies are of interest to customers due to their cost-efficiency, flexibility, performance and reliability.

For IT people, one of the most exciting aspects of using cloud services is the speed at which the cloud provides access to a resource or service. A few clicks at a cloud provider’s portal – and you have a server with a multi-core processor and large storage capacity at your disposal. Or a few commands on the service provider’s command line tool – and you have a powerful database ready to use.

Cloud deployment models

In terms of the cloud deployment model, we can identify three main models:

• A public cloud – The service provider has publicly available cloud applications, machines, databases, storage, and other resources. All this wealth runs on the IT infrastructure of the public cloud service provider, who manages it. The best-known players in the public cloud business are AWS, Microsoft Azure and Google Cloud.

In my opinion, one of the most pleasant features of a public cloud is its flexibility, which we often refer to as elasticity. An organisation can embark on its public cloud journey with low resources and low starting costs, according to its current requirements.

Major public cloud players offer services globally, so we can easily launch cloud resources in the geographical regions that best fit our customer market reach.

For example, in a globally deployed public cloud environment, an organization can serve its South American customers from a South American data centre. A data centre located in one of the European countries would serve European customers. This greatly improves the latency and customer satisfaction.

There is no need to invest heavily in hardware, licensing, etc. – organisation spends money over time and only on the resources actually used.

• A private cloud – This is an infrastructure for a single organisation, managed by the organisation itself or by a service provider. The infrastructure can be located in the company’s data centre or elsewhere.

The definition of a private cloud usually includes the IT infrastructure of the organisation’s own data centre. Most of these state-of-the-art on-premise solutions are built using virtualisation software. They offer the flexibility and management capabilities of a cloud.

Here, however, we should keep in mind that the capacity of a data centre is not unlimited. At the same time, the private cloud allows an organisation to implement its own standards for data security and to follow regulations where applicable. It also allows data to be stored in a suitable geographical area in its own data centre, for example to achieve ultra-low latency.

As usual, everything good comes with trade-offs. Think how complex it might be to expand a private cloud into a new region, or even a new continent: hardware, connectivity, staffing, etc. – the organisation needs to take care of all of this in the new operating area.

• A hybrid cloud – an organisation uses both its data centre IT infrastructure (or its own private cloud) and a public cloud service. Private cloud and public cloud infrastructures are separate but interconnected.

Using this combination, an organisation can store sensitive customer data in an on-premise application according to regulation in a private cloud. At the same time, it can integrate this data with corporate business analytics software that runs in a public cloud. The hybrid cloud allows us to use the strengths of both cloud deployment models.

Hybrid cloud model

When is a hybrid cloud useful?

Before we dive into the hybrid cloud, I’d like to stress that we at Solita are devoted advocates of a cloud-first strategy, referring to the public cloud. At the same time, cloud-first does not mean cloud-only, and we recognize that there are use cases where running a hybrid model is justified, be it for regulatory reasons or very low latency requirements.

Let’s look at some examples of when and how a hybrid cloud model can benefit an organisation. 

Extra power from the cloud

Suppose that a company has not yet migrated to the public cloud, for example due to a lack of resources or cloud competence. It is running its private cloud in a colocation data centre, and the private cloud operates at a satisfactory level as long as the load and resource demand remain stable.

However, the company’s private cloud lacks the extra computing resources to handle future demand growth, and an increased load on the IT systems is expected due to an upcoming temporary marketing campaign. As a result of the campaign, the number of visitors to the organisation’s public systems will increase significantly. How to address this concern?

The traditional on-premise way used to be getting extra resources in the form of additional hardware. It means new servers, larger storage arrays, more powerful network devices, and so on. This causes additional capital investment, but it is also important that this addition of resources may not be fast.

The organisation must procure, install and configure the equipment – and these jobs cannot always be automated to save time. After the load on the IT systems decreases at the end of the marketing campaign, the acquired additional computing power may no longer be used at all.

But given the capabilities of the cloud, a better solution is to get additional resources from the public cloud. The public cloud allows this to be done flexibly and on demand, as much as the situation requires. The company spends and pays for resources only as it needs them, without large monetary commitments. Let the cloud adoption start 😊

The organisation can access additional resources from the public cloud in hours or even minutes. We can order these programmatically, in an automated fashion and in advance, according to the schedule of the marketing campaign.
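To illustrate “programmatically and in advance”, here is a rough boto3 sketch that schedules extra capacity for an Auto Scaling group for the duration of a campaign. The group name, dates, sizes and region are placeholders, and in practice the same can of course be expressed with infrastructure-as-code tooling instead of a script.

```python
from datetime import datetime, timezone
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-north-1")  # placeholder region

# Scale the web tier up for the campaign window...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-frontend",          # placeholder group name
    ScheduledActionName="campaign-scale-up",
    StartTime=datetime(2022, 6, 1, 6, 0, tzinfo=timezone.utc),
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=10,
)

# ...and back down again once the campaign is over
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-frontend",
    ScheduledActionName="campaign-scale-down",
    StartTime=datetime(2022, 6, 15, 6, 0, tzinfo=timezone.utc),
    MinSize=2,
    MaxSize=4,
    DesiredCapacity=2,
)
```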

When the time comes and there are many more visitors, the company will still keep the availability of its IT systems. They will continue to operate at the required level with the help of additional resources. This method of use is known as cloud bursting, i.e. resources “flow over” to another cloud environment.

This is the moment when a cloud journey begins for the organization. It is an extremely important point of time when the organization must carefully evaluate its cloud competence. It needs to consider possible pitfalls on the road to cloud adoption. 

For an organisation, it is often effective to find a good partner to assist with cloud migration. The partner with verified cloud competence will help to get onto cloud adoption rails and go further with cloud migration. My colleagues at Solita have written a great blog post about cloud migration and how to do it right.

High availability and recovery

Implementing high availability in your data centre and/or private cloud can be expensive. As a rule, high availability means that everything must be duplicated – machines, disk arrays, network equipment, power supply, etc. This can also mean double costs.

An additional requirement can be to ensure geo-redundancy of the data and keep a copy in another data centre. In such a case, the cost of using another data centre is added.

A good data recovery plan still requires a geographically duplicated recovery site to minimise risk. From the recovery site, a company can quickly get its IT systems back up and running in the event of a major disaster in a major data centre. Is there a good solution to this challenge? Yes, there is.

A hybrid cloud simplifies the implementation of a high availability and recovery plan at a lower cost. As in the previous scenario described above, this is often a good starting point for an organisation’s cloud adoption. Good rule of thumb is to start small, and expand your public cloud presence in controlled steps.

A warm disaster recovery site in the public cloud allows us to use cloud resources sparingly and without capital investment. Data is replicated from the main data centre and stored in the public cloud, but bulky computing resources (servers, databases, etc.) are turned off and do not incur costs.

In an emergency case, when the main data centre is down, the resources on the warm disaster recovery site turn on quickly – either automatically or at the administrator’s command. Because data already exists on the replacement site, such switching is relatively quick and the IT systems have minimal downtime.

Once there is enough cloud competence on board, the organisation will move towards cloud-first strategy. Eventually it would switch its public cloud recovery site to be a primary site, whereas recovery site would move to an on-premise environment.

Hybrid cloud past and present

For several years, the public cloud was advertised as a one-way ticket. Many assumed that organisations would either move all their IT systems to the cloud or continue in their own data centres as they were. It was like there was no other choice, as we could read a few years ago.

As we have seen since then, this paradigm has now changed. It’s remarkable that even the big cloud players AWS and Microsoft Azure don’t rule out the need for a customer to design their IT infrastructure as a hybrid cloud.

Hybrid cloud adoption

Organisations have good reasons why they cannot always move everything to a public cloud. Reasons might include an investment in the existing IT infrastructure, some legal reasons, technical challenges, or something else.

Service providers are now rather favouring the use of a hybrid cloud deployment model. They are trying to make it as convenient as possible for the customer to adopt it. According to the “RightScale 2020 State of the Cloud” report published in 2020, hybrid cloud is actually the dominant cloud strategy for large enterprises:

Hybrid cloud is the dominant strategy

Back in 2019, only 58% of respondents preferred the hybrid cloud as their main strategy. There is a clear signal that the hybrid cloud offers the strengths of several deployment models to organisations. And companies are well aware of the benefits.

Cloud vendors vs Hybrid

How do major service providers operate on the hybrid cloud front? Microsoft Azure came out with Azure Stack – a service that is figuratively speaking a public cloud experience in the organisation’s own data centre.

Developers can write the same cloud-native code, and it runs the same way both in the public Azure cloud and in a “small copy” of Azure in the enterprise’s data centre. It gives a real cloud feeling, like a modern extension to a good old house that has become too small for the family.

Speaking of multi-cloud strategy as mentioned in the above image, Azure Arc product by Microsoft is worth mentioning, as it is designed especially for managing multi-cloud environments and gaining consistency across multiple cloud services.

AWS advertises its hybrid cloud portfolio with the message that it understands that not all applications can run in the cloud – some must reside on customers’ premises, on their machines, or in a specific physical location.

A recent example of hybrid cloud thinking is AWS’s announcement of launching its new service ECS Anywhere. It’s a service that allows customers to run and manage their containers right on their own hardware, in any environment, while taking advantage of all the ECS capabilities that AWS offers in the “real” cloud to operate and monitor the containers. Among other things, it supports “bare” physical hardware and Raspberry Pi. 😊

As we’ve also seen just recently, the next step for Amazon to win hybrid cloud users was the launch of EKS Anywhere – this allows customers using Kubernetes to enjoy the convenience of a managed AWS EKS service while keeping their containers and data in their own environment, on their own data centre’s machines.

As we see, public cloud vendors are trying hard with their hybrid service offerings. It’s possible that after a critical threshold of hybrid services users is reached, it will create the next big wave of cloud adoption in the next few years. 

Hybrid cloud trends

Using the hybrid cloud related services mentioned above assumes that there is cloud competence in the organisation. These services integrate tightly with the public cloud, and it is important to have the skills to manage them correctly in a cloud-native way.

I think the general trend in the near future is that the hybrid cloud is here to stay, and the multi-cloud strategy as a whole will grow even bigger. Service providers will assist customers in deploying a hybrid cloud while maintaining a “cloud native” ecosystem, so that the customer has the same approach to developing and operating their IT systems regardless of whether a system runs in a “real” cloud or on a hybrid model.

The convergence of public, private and hybrid models will continue, while the public cloud keeps leading the cloud-first festival. Cloud competence and the skills around it will become more and more important, and modern infrastructure will no longer be achievable without leveraging the public cloud.

Growing from a small cloud team to a community of over 70 people

The cloud community at Solita has grown substantially in the past years. While the industry has been developing rapidly, we've done our best to stay ahead of the game. Our recipe for growth has two main ingredients: a sensitive ear to our clients’ evolving needs and a clear vision to build a caring work culture where people can excel in their jobs.

We are long-term Solitans, and we’ve had the opportunity to witness the growth of our cloud community from the start. The team was created when we saw the first signs from our clients that there was a need for cloud specialization. In the beginning, in 2014, we only had a couple of people on the team. But we had our eyes on the market, and we started developing our cloud services from a holistic point of view. We wanted to be a full-service cloud partner and kept this in mind when planning and developing our services.

In 2016 when we saw an increased number of opportunities in the market, we officially launched our Cloud & Connectivity unit. Since then, we have grown from a small team to a community of over 70 cloud professionals. These people are specialized in demanding cloud services, like infrastructure, cyber security, and public cloud platforms.

It all started from our clients’ needs 

The key driver in our growth has been the willingness to serve our clients better and pay close attention to their needs. When we saw a growing demand, we responded to it. In the past years, the world has turned more complex, and technologies have developed very fast. These days a big part of our job is to help our clients to choose the right solutions that serve their needs in the best possible way.

It’s also been an interesting journey to see how the attitudes towards cloud services have evolved. There was a lot of resistance and insecurity in the beginning. These days it’s quite the opposite. Our client portfolio has become diverse, including organizations from many different industries. We also work with several organizations in the public sector, and we’ve built cloud solutions for companies in the heavily regulated financial sector. Cloud has become mainstream, and we’ve been glad to be part of this development.

As an example, we’ve helped Savings Bank build a solid cloud foundation and supported them with their “cloud first” strategy, which states that as much as possible should be built on the public cloud. Solita CloudBlox has allowed them to get up and running quickly, and results so far have been great. As for billing and reporting, the contrast between the transparency of Solita CloudBlox and the traditional data center world could not be greater. For the first time ever, Savings Bank gets thoroughly granular reports on every hour worked, every euro spent, and every cloud component implemented – literally, an eye-opener for the bank.

Our people have enabled our growth

Solita is a value-driven and people-focused company, and our people play a key role in our growth equation. While managing growth is not always easy, during every step, we’ve tried to do our best to make sure that people are happy to work here. We have fostered our culture and kept the organization lean and agile.

In practice, it means that our people have a voice and their opinions are appreciated. Our professionals get plenty of freedom to execute their work in the way they think best, and everyone gets a say in which types of projects they want to work on. Caring is one of our core values, and it’s visible in many ways. Our people really care for one another: there is always help and support available, and you are never left alone.

Cloud&Connectivity Day at the Biitsi 2021 / Nea Tamminen

But being a frontrunner company in many arenas also means that there is a requirement to learn new things, explore and test new products, and investigate new services. For some, this could become overwhelming or uncomfortable. Also, the fact that we deliver a wide range of projects means that people must be adaptable, and managing oneself becomes essential. We live in constant change in this industry, so we need to stay alert and aim to be a bit ahead of the game all the time. And it’s our job as leaders to provide the best possible set-up for our people to do so.

The growth in the cloud community has also provided a personal growth journey for both of us. We’ve understood that like this community, also we personally are constantly evolving and never “ready”. We’ve gone through tough moments as well, but when we’ve had the courage and humility to ask for help, there’s been a line of colleagues providing their support. That’s the beauty of our culture and something we also want to hold on to in the future.

Writers of this blog, Jari Jolula and Karri Lehtinen, are long-term Solitans and have been leading our Cloud offering since the beginning.

Would you have something to add to our culture? Please check our openings and contact us!

Want to know how we do managed cloud? Please visit our Solita CloudBlox site.

 

AWS re:Invent 2021: Sustainability in cloud, Local zone to Finland, developer experience and more

re:Invent was held in Las Vegas last week. During and around the event AWS has again announced a bunch of new services and features. Five of them got my attention: Sustainability for Well-Architected review, Local Zone in Finland, CPU chips, better developer experience and free-tier upgrade.

AWS re:Invent was held in Las Vegas, Nevada between Nov 29th and Dec 3rd 2021. As a customer or developer, you can have mixed feelings after the main event of a cloud service provider (CSP). On the one hand, a WOW feeling from a new service that easily solves an existing problem of mine; on the other hand, I might have spent a month developing a feature that is now available by ticking a box. You get used to these mixed feelings! The best way to handle them is a strong relationship with the CSPs from both a business and a technical perspective. AWS’s CTO Dr. Werner Vogels said: “It’s our fault that AWS has hundreds of services with thousands of features!“. More than 90% of new services and features are based on direct customer feedback.

It’s All About Teamwork

For a human being, it is impossible to really know each service in detail. During a typical work week I work with several teams to find the most suitable solutions for our projects, taking into account constraints such as the overall architecture, schedule, budget and the team’s current knowledge. Developing something new is never a one-person unicorn task; it is teamwork.

Even if there is a technical problem that one person finally solves, the other team members have already done the groundwork for it. It is the same as when a group of people is trying to light a campfire and the last one succeeds: without the preliminary work of the others, he probably wouldn’t have succeeded either.

Here are my top five AWS’s announcements from the last couple of weeks.

Sustainability Pillar for AWS Well-Architected Framework

Year after year, sustainability thinking has become an ever more important theme both in the world and in information technology. For example, we have noticed that sustainability is reflected in our system tenders, where customers also score bids based on their sustainability values. This is the only right way!

AWS Well-Architected review includes sustainability

The usage of cloud providers has always been justified by economies of scale, which also cover sustainability values like energy efficiency, recycling processes and so on. The Well-Architected Framework is a structured way to analyse your workload and now also its sustainability.

AWS’s Alex Casalboni sums up the new pillar nicely in six points:

Cloud's sustainability

  1. Understand your impact
  2. Establish sustainability goals
  3. Maximize utilization
  4. Anticipate and adopt new, more efficient hardware and software offerings
  5. Use managed services
  6. Reduce the downstream impact of your cloud workloads

Read more from Alex’s blog post.

Local Zone to Finland in 2022

Finland, along with Norway and Denmark, will get its own AWS Local Zone in 2022. A Local Zone provides selected AWS services closer to end users with single-digit millisecond latency. It is a fairly new concept, starting from Las Vegas in December 2019, and it is expanding with 30 new locations in 2022. It provides typical data centre services like private networking (VPC), EC2 virtual machines, EKS and ECS for containers and EBS for storage. Only selected EC2 instance types are available in a Local Zone environment.

AWS Local Zone locations includes Finland

You can manage Local Zone resources seamlessly, like resources in regions, for example via the AWS Console. Each Local Zone has its “mother region” for configuration and larger integrations:

Local Zone settings under EC2 in mother region

The Local Zone can be very useful for many things like regulated systems and data (location inside Finland) and virtual desktop applications (low latency).

More information about Local Zones.

CPU Chip Competition Continues

For a long time, Intel held a superior market share in the CPU market. Then big ICT companies like Apple and AWS announced their intention to start using their own ARM-based processors, the M1 and Graviton. At re:Invent, the 3rd generation of the AWS Graviton chip was announced. With Graviton chips, customers benefit from a lower overall cost due to better performance and cheaper instance pricing. Graviton3 supports the bfloat16 format, which has typically been supported only by purpose-built AI processors. EC2 C7g is the first instance type to use Graviton3.

EC2 C7g uses Graviton3 AWS ARM CPU

Services like Lambda (FaaS) and Fargate (serverless containers) now also support Graviton2. For infrastructure as code, it is just a matter of changing a parameter and you are ready. But it is not so simple for compiled software such as Docker containers (OS libraries) or 3rd party libraries, because the processor architecture changes from x86 to ARM.
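To illustrate the “change a parameter” part, here is a rough AWS CDK sketch in Python that runs a Lambda function on Graviton2; the stack name, function name and asset path are placeholders, and your own infrastructure code will of course look different.

```python
from aws_cdk import Stack, aws_lambda as lambda_
from constructs import Construct

class WorkerStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        lambda_.Function(
            self,
            "WorkerFunction",                        # placeholder names and paths
            runtime=lambda_.Runtime.PYTHON_3_9,
            handler="index.handler",
            code=lambda_.Code.from_asset("lambda"),
            # The one-parameter change: run on ARM (Graviton2) instead of x86
            architecture=lambda_.Architecture.ARM_64,
        )
```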

OpenJDK as an example of multiple CPU architectures

In a migration, you should start by producing one version for x86 and one for ARM, and execute adequate testing on a case-by-case basis. Docker image repositories, for example, support multiple processor architectures for a single image. In some cases with compiled 3rd party software you are simply stuck and not able to use ARM-based chips. My dear colleague Timo pointed out that there is no efficient way to run the Oracle XE database on his M1 MacBook laptop.

Graviton3 and the EC2 C7g (C stands for compute-intensive workload) instance type.

List of Graviton enabled software.

Fargate support for Graviton2.

AWS Cloud Development Kit version 2 for better developer experience

The AWS Cloud Development Kit (CDK) is roughly two years old. In my opinion, and that of many others, it is the most efficient way to handle AWS infrastructure for two reasons: it is programmatic (TypeScript, Python, Java, etc.), and the CDK library provides sensible defaults and best-practice security policies.

AWS Cloud Development Kit for better developer experience

The main focus of version 2 of the AWS CDK library is to make the development experience better. The library structure changed completely: in v1 you handle tens of small CDK library packages (one for each AWS service), whereas in v2 all stable libraries are consolidated into a single package. For experimental libraries, the v1 model remains.

I strongly recommend that most projects start migrating to CDK v2 immediately; only projects that are now or soon in solid maintenance mode could stay on v1. v1 is fully supported for the next 6 months and receives bug fixes up to June 2023. Luckily, in most cases the migration is actually pretty easy and the code changes can be done in hours rather than days. After the code changes, you should test the deployment thoroughly. See the migration guide for CDK v1 to v2.
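To make the scale of the change concrete, here is a rough sketch in Python of a minimal stack on CDK v2, with the old v1 imports shown as comments; the bucket and stack names are just placeholders. In most cases it really is mostly the imports that move.

```python
# CDK v1: constructs were spread across per-service packages
#   from aws_cdk import core
#   from aws_cdk import aws_s3 as s3
#
# CDK v2: one stable package plus the separate `constructs` library
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct

class LogBucketStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # The construct API itself is unchanged between v1 and v2
        s3.Bucket(self, "LogBucket", versioned=True)

app = cdk.App()
LogBucketStack(app, "LogBucketStack")   # placeholder stack name
app.synth()
```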

AWS CDK has also improved the developer experience by speeding up code-change deployments with hot swap and watch mode. With the hot swap functionality you can deploy e.g. Lambda code changes in a few seconds: when no infrastructure changes exist, the CDK deployment uses direct API calls and skips the CloudFormation layer. Remember to use this feature only in development and testing environments, because the CloudFormation stack is not up to date after the bypass.

CDK watch mode makes development faster by monitoring your code and assets for changes and automatically performing the optimal form of deployment every time a file change is detected. This means that you do not need to run the cdk deploy command manually anymore.

cdk watch mode

Free Tier Data expansions

Cloudfront free tier

At least now, everybody should use Amazon CloudFront in front of any AWS-based web or media service. You can now use it for free for up to 1000 GB per month (data out). And as a reminder, data transfer from AWS services to CloudFront edge locations (origin fetches) has been free of charge for some years now.

Also the free tier of regional Internet outbound traffic increased from 1GB to 100GB per month.

More details about the announcement.

 

What are the most important new services or features that you use? Zoph.io’s GitHub page (thanks Arto, check his Youtube channel) compiled a comprehensive listing of new services and features.

What would You log?

This is the first part of our new blog series about centralized log management in Solita Cloud. We provide numerous log management platforms to our customers, and now we would like to share our experience on why centralized log management is so important and how to avoid the most common pitfalls.

This blog will provide an overview of centralized log management and why it is so important. We will share our experience on what is most important when starting a logging project and how to avoid most common pitfalls. Also lemons.

Why centralized logging?

This is the question we want to answer today. Why do we need centralized logging and why should you implement it? What are the benefits and what can you expect? Interesting topics to be sure, but at the same time, why alone is quite a boring question. The reasons for having centralized logging are rather obvious but we don’t want to focus on that alone. The saying when life gives you lemons, make lemonade, fits quite well here. The logs are the lemon and you just need to know how to turn it into tasty lemonade with the aid of centralized log management.

The reason why we compared logs to lemons is the nature of logs that exist without any form of management. It can be very chaotic and leave you feeling sour after your storage has become full one too many times. Centralized log archiving can move the logs to another place, but that alone does not solve the issue of storage constraints. Those constraints are an example of why Centralized Log Management is important. It takes a step beyond just central archival of the logs.

Some also see logs as a necessary evil, something that you are forced to store. There are quite a few laws which require that companies storing personal data must also monitor and log access to it. Even the logs themselves might contain personal data, like location or payment information. In the EU, GDPR allows the legitimate use of consumers’ personal information, as long as we take the necessary precautions to protect it and are transparent about how we use it. In practice this means monitoring and controlling who can access and view said data.

More local legal regulations can define, for example, how long a company must store its logs. Such legal requirements can quite easily push a company to adopt centralized log archiving in a hurry. This is where things can get a bit messy: the logs are collected and stored, but there is no really good way to leverage them. Using our earlier comparison of logs to lemons, logs without any management are lemons still hanging in a tree. With central log archiving, we only collect those lemons into a basket – they are all in the same place, but we are not using them. With centralized log management, we can process them into lemonade.

Tied to the legal requirements is security. Centralized log archiving can provide you with a single location to store and access all your audit, access and other logs. But security is not just about storing logs; those logs need to be taken advantage of and analyzed. Systems like Security Information and Event Management (SIEM) can help with threat analysis and detection. Alerts for certain actions or logins, dashboards for traffic based on access logs, and traceability between applications with trace tokens to see what a user has done are some of the possibilities that centralized log management enables. These features not only improve a company’s security but also bring value and important information from its systems.

Troubleshooting an issue from logs is something every developer is familiar with, and having logs centralized somewhere with easy access will certainly improve developers’ lives. The aforementioned trace and event logs from applications can be used for monitoring, troubleshooting and security purposes. Even the possibility of using your events and traces for machine learning can help you understand your users and further improve your system.
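
To make event and trace logs a bit more concrete, below is a sketch of what a single structured application log event could look like when it carries a trace token. The field names (traceId, event, durationMs and so on) are just illustrative assumptions rather than a required schema; the point is that consistently named, structured fields are what make centralized dashboards, alerts and traceability possible.

{
	"timestamp": "2022-03-01T12:34:56.789Z",
	"level": "INFO",
	"service": "webshop-checkout",
	"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
	"userId": "customer-1234",
	"event": "order.submitted",
	"message": "Order submitted successfully",
	"durationMs": 182
}

An event like this serves troubleshooting, security auditing and business dashboards alike, which is exactly the kind of lemonade we are after.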

Moving forward

To be able to take advantage of centralized logging, you need to understand your needs for both the present and the future. While sizing is more flexible today, especially when working in the cloud, the underlying architecture and logic need to be planned beforehand. It is important to know the entire lifecycle of your logs, from the applications where they’re generated all the way to the long-term storage where they’re retained.

Each log that is written should have a clear purpose. This can’t be stressed enough. Every log has to mean something important; it’s useless to log anything that doesn’t provide value.

There are exceptions, like when debug logs are needed for troubleshooting, but generally, in the default state, only the important, descriptive logs should be collected and stored. Configuring event or trace logs from applications can be very beneficial, but these need some work from the development teams. Centralized log management projects are very much a group effort. Developers, infrastructure, security and other parties need to work together and understand what they want and what they actually need from centralized log management.

This all might seem like quite a lot of work, and that’s because it is. But this is where good planning, proper project management and agile development come into play. There should be a clear start and end to the project. Not everything needs to be done immediately; nothing is built overnight, so take your time to improve and add features as your system becomes ready for them. Just keep in mind what goal you want to achieve and what kind of value you want to generate in the end. We will talk more about this in our next centralized log management blog post, which dives deeper into how to start this type of project.

There will be value

The value of centralized log management reaches further than just meeting requirements and helping accelerate development. The ability to transform your log data into alerts, dashboards, threat detection or even machine learning input can help you get value out of the logs that are filling your disks. Centralized logging doesn’t need to be just a legal obligation or a development tool; it can be both. It can provide multiple avenues of value while meeting those pesky requirements. But as already said, achieving this will take time, and the approach should be well planned and methodical.

New technologies are constantly emerging and becoming popular, which challenges us to change with them. Containerized workloads running on orchestrators like Kubernetes challenge the way we think about our software’s lifecycle. All this statelessness needs a centralized way of managing things, as the old ways are no longer applicable.

At least for your logging needs the transformation is easy: you can just contact us and we will help you design, develop, deploy and take care of your centralized log management. Or just keep reading our blogs as they come out.

Solita is a technology, data and design company with years of experience working in the cloud and on-premises. We have helped companies in their transformation into technology-driven organizations and brought centralized log management into their lives. Our experience covers both large and small setups, with data volumes from a few hundred gigabytes per day up to terabytes. The technologies behind us are commonly known and popular: Elasticsearch, OpenSearch, Graylog and cloud-specific services like CloudWatch, to mention a few. These will be the focal point in our examples, as they are what we work with on a day-to-day basis. Migration from on-premises to the cloud and changing technologies is also something we are very familiar with, as the cloud is constantly gaining popularity.

In our next centralized log management blog we will talk about how this kind of project should be started and how it’s actually done properly from start to finish. At a later date we will return with a more in-depth technical view on different features and how to use them.

The learning curve of a Cloud Service Specialist at Solita

Tommi Ritvanen is part of the Cloud Continuous Services team at Solita. The team consists of dozens of specialists who ensure that customers’ cloud solutions run smoothly. Here, Tommi shares his experiences of the learning path, culture and team collaboration.

I’ve been working at Solita for six months as a Cloud Service Specialist. I’m part of the cloud continuous services team, where we take care of our ongoing customers and ensure that their cloud solutions are running smoothly. After our colleagues have delivered a project, we basically take over and continue supporting the customer with the next steps.

What I like about my job is that every day is different. I get to work with and learn from different technologies; we work with all the major cloud platforms such as AWS, Microsoft Azure and Google Cloud Platform. What also brings variety to our days is that we have different types of customers that we serve and support. The requests we get are varied, so there is no boring day in this line of work.

What inspires me the most in my role is that I’m able to work with new topics and develop my skills in areas I haven’t worked on before. I wanted to work with public cloud, and now I’m doing it. I like the way we exchange ideas and share knowledge in the team. This way, we can find ways to improve and work smarter.

We have this mentality of challenging the status quo positively. Also, the fact that the industry is changing quickly brings a nice challenge; to be good at your job, you need to be aware of what is going on. Solita also has an attractive client portfolio and a track record of building very impactful solutions, so it’s exciting to be part of all that too.

I got responsibility from day one

Our team has grown a lot which means that we have people with different perspectives and visions. It’s a nice mix of seniors and juniors, which creates a good environment for learning. I think the collaboration in the team works well, even though we are located around Finland in different offices. While we take care of our tasks independently, there is always support available from other members of the cloud team. Sometimes we go through things together to share knowledge and spread the expertise within the team.

The overall culture at Solita supports learning and growth; there is a really low barrier to asking questions, and you can ask for help from anyone, even people outside of your team. I joined Solita with very little cloud experience, but I’ve learned so much during the past six months. I got responsibility from the beginning and learned while doing, which is the best way of learning for me.

From day one, I got the freedom to decide which direction I wanted to take in my learning path, including the technologies. We have study groups and flexible opportunities to get certified in the technologies we find interesting.

As part of the onboarding process, I did a practical training project executed in a sandbox environment. We started from scratch, built the architecture, and drew the process like we would in a real-life situation, navigating the environment and practising the technologies we needed. The process itself and the support we got from more senior colleagues were highly useful.

Being professional doesn’t mean being serious

The culture at Solita is very people-focused. I’ve felt welcome from the beginning, and regardless of the fact that I’m the only member of the cloud continuous services team here in Oulu, people have adopted me as part of the office crew. The atmosphere is casual, and people are allowed to have fun at work. Being professional doesn’t mean being serious.

People here want to improve and go the extra mile in delivering great results to our customers. This means that to be successful in this environment, you need to have the courage to ask questions and look for help if you don’t know something. The culture is inclusive, but you need to show up to be part of the community. There are many opportunities to get to know people, coffee breaks and social activities. We also share stories from our personal lives, which makes me feel that I can be my authentic self.

We are constantly looking for new colleagues in our Cloud and Connectivity Community! Check out our open positions here!

Using Azure policies to audit and automate RBAC role assignments

Different RBAC role assignments in Azure are usually inherited from the subscription or management group level, but there may come a time when that is far too broad a scope for granting permissions to an AD user group.

While it’s tempting to assign permissions on a larger scope, sometimes you might prefer to grant a RBAC role with minimal permissions only to some of the subscription’s resource groups to accomplish the task at hand. In those scenarios you’ll usually end up with one of the following options to handle the role assignments:

  1. Include the role assignments in your ARM templates / Terraform code / Bicep templates
  2. Manually add the role to the proper resource groups

If neither of these appeals to you, there’s a third option: define an Azure policy which identifies the correct resource groups and then deploys the RBAC role assignments automatically if the conditions are met. This blog will go over step-by-step instructions on how to:

  • Create a custom Azure policy definition for assigning Contributor RBAC role for an Azure AD group
  • Create a custom RBAC role for policy deployments and add it to your policy definition
  • Create an assignment for the custom policy

The example scenario is very specific, and the policy definition is created to match this particular scenario. You can use the solution provided in this post as a basis to create something that fits your needs exactly.

Azure policies in brief

Azure policies are a handy way to add automation and audit functionality to your cloud subscriptions. Policies can be applied to make sure resources are created following the company’s cloud governance guidelines, for example for resource tagging or picking the right SKUs for VMs. Microsoft provides a lot of different built-in policies that are pretty much ready for assignment. However, for specific scenarios you’ll usually end up creating a custom policy that better suits your needs.

Using Azure policies is divided into two main steps:

  1. You need to define a policy, which means creating a ruleset (policy rule) and the actions (effect) to apply if a resource matches the defined rules.
  2. Then you must assign the policy to the desired scope (management group / subscription / resource group / resource level). The assignment scope defines the maximum level at which resources are scanned against the policy criteria. Usually the preferable levels are management group / subscription.

Depending on how you prefer to govern your environment, you can use individual policies or group multiple policies into initiatives. Initiatives help you simplify assignments by working with groups instead of individual assignments. They also help with handling service principal permissions: if you create a separate policy for enforcing each of five different tags, you’ll end up with five service principals with the same permissions unless you use an initiative that groups the policies into one.
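
To illustrate, a custom initiative (policy set definition) is itself just a JSON document that references the individual policy definitions. The example below is a minimal sketch with hypothetical tag policies; the definition IDs are placeholders, not real built-in policies.

{
	"properties": {
		"displayName": "Enforce required tags",
		"policyType": "Custom",
		"description": "Groups the individual tag policies into a single assignable initiative.",
		"metadata": {
			"category": "Tags"
		},
		"parameters": {},
		"policyDefinitions": [{
				"policyDefinitionId": "/subscriptions/your_subscription_id/providers/Microsoft.Authorization/policyDefinitions/enforce-tag-costcenter" // Hypothetical custom policy definition
			}, {
				"policyDefinitionId": "/subscriptions/your_subscription_id/providers/Microsoft.Authorization/policyDefinitions/enforce-tag-owner" // Hypothetical custom policy definition
			}
		]
	}
}

When the initiative is assigned, a single service principal is created for the assignment instead of one per policy.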

Creating the policy definition for assignment of Contributor RBAC role

The RBAC role assignment can be done with a policy that targets the desired scope of resources through policy rules. First we’ll define some basic properties for our policy, which tell other users what this policy is meant for. A few notes:

  • Policy type = custom. Everything that’s not built-in is custom.
  • Mode = all, since we won’t be creating a policy that enforces tags or locations.
  • Category can be anything you like. We’ll use “Role assignment” as an example.
{
	"properties": {
		"displayName": "Assign Contributor RBAC role for an AD group",
		"policyType": "Custom",
		"mode": "All",
		"description": "Assigns Contributor RBAC role for AD group resource groups with Tag 'RbacAssignment = true' and name prefix 'my-rg-prefix'. Existing resource groups can be remediated by triggering a remediation task.",
		"metadata": {
			"category": "Role assignment"
		},
		"parameters": {},
		"policyRule": {}
	}
}

Now we have our policy’s base information set. It’s time to form the policy rule. The policy rule consists of two blocks: if and then. The first is the actual rule definition and the latter defines what should be done when the conditions are met. We want to target only a few specific resource groups, so the scope can be narrowed down with tag evaluations and resource group naming conventions. To do this, let’s slap an allOf operator (which works like the logical operator ‘and’) into the policy rule and set up the rules.

{
	"properties": {
		"displayName": "Assign Contributor RBAC role for an AD group",
		"policyType": "Custom",
		"mode": "All",
		"description": "Assigns Contributor RBAC role for AD group resource groups with Tag 'RbacAssignment = true' and name prefix 'my-rg-prefix'. Existing resource groups can be remediated by triggering a remediation task.",
		"metadata": {
			"category": "Role assignment"
		},
		"parameters": {},
		"policyRule": {
			"if": {
				"allOf": [{
						"field": "type",
						"equals": "Microsoft.Resources/subscriptions/resourceGroups"
					}, 	{
						"field": "name",
						"like": "my-rg-prefix*"
					},	{
						"field": "tags['RbacAssignment']",
						"equals": "true"
					}
				]
			},
			"then": {}
		}
	}
}

As can be seen from the JSON, the policy is applied to a resource (or actually a resource group) if

  • Its type is Microsoft.Resources/subscriptions/resourceGroups, i.e. the target resource is a resource group
  • It has a tag named RbacAssignment set to true
  • The resource group name starts with my-rg-prefix

In order for the policy to actually do something, an effect must be defined. Because we want the role assignment to be automated, the deployIfNotExists effect is perfect. A few notes on how to set up the effect:

  • The most important stuff is in the details block.
  • For RBAC role assignments, the type of the deployment and the scope of the existence check is Microsoft.Authorization/roleAssignments.
  • The existence condition is kind of another if block: the policy rule checks whether a resource matches the conditions, which makes it applicable for the policy, and the existence check then confirms whether the requirements in the details are met. If not, an ARM template is deployed to the scoped resource.

The existence condition in the then block of the code example below checks the role assignment for a principal id through a combination of Microsoft.Authorization/roleAssignments/roleDefinitionId and Microsoft.Authorization/roleAssignments/principalId. Since we want to assign the policy to a subscription, the roleDefinitionId path must include /subscriptions/<your_subscription_id>/.. in order for the policy to work properly.

{
	"properties": {
		"displayName": "Assign Contributor RBAC role for an AD group",
		"policyType": "Custom",
		"mode": "All",
		"description": "Assigns Contributor RBAC role for AD group resource groups with Tag 'RbacAssignment = true' and name prefix 'my-rg-prefix'. Existing resource groups can be remediated by triggering a remediation task.",
		"metadata": {
			"category": "Role assignment"
		},
		"parameters": {},
		"policyRule": {
			"if": {
				"allOf": [{
						"field": "type",
						"equals": "Microsoft.Resources/subscriptions/resourceGroups"
					}, 	{
						"field": "name",
						"like": "my-rg-prefix*"
					}, {
						"field": "tags['RbacAssignment']",
						"equals": "true"
					}
				]
			},
			"then": {
				"effect": "deployIfNotExists",
				"details": {
					"type": "Microsoft.Authorization/roleAssignments",
					"roleDefinitionIds": [
						"/providers/microsoft.authorization/roleDefinitions/18d7d88d-d35e-4fb5-a5c3-7773c20a72d9" // Use user access administrator role update RBAC role assignments
					],
					"existenceCondition": {
						"allOf": [{
								"field": "Microsoft.Authorization/roleAssignments/roleDefinitionId",
								"equals": "/subscriptions/your_subscription_id/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c" // RBAC role definition ID for Contributor role
							}, {
								"field": "Microsoft.Authorization/roleAssignments/principalId",
								"equals": "OBJECT_ID_OF_YOUR_AD_GROUP" // Object ID of desired AD group
							}
						]
					}
				}
			}
		}
	}
}

The last thing to add is the actual ARM template that will be deployed if the existence conditions are not met. The template itself is fairly simple, since it only contains the definition for a RBAC role assignment.

{
	"properties": {
		"displayName": "Assign Contributor RBAC role for an AD group",
		"policyType": "Custom",
		"mode": "All",
		"description": "Assigns Contributor RBAC role for AD group resource groups with Tag 'RbacAssignment = true' and name prefix 'my-rg-prefix'. Existing resource groups can be remediated by triggering a remediation task.",
		"metadata": {
			"category": "Tags",
		},
		"parameters": {},
		"policyRule": {
			"if": {
				"allOf": [{
						"field": "type",
						"equals": "Microsoft.Resources/subscriptions/resourceGroups"
					}, 	{
						"field": "name",
						"like": "my-rg-prefix*"
					}, {
						"field": "tags['RbacAssignment']",
						"equals": "true"
					}
				]
			},
			"then": {
				"effect": "deployIfNotExists",
				"details": {
					"type": "Microsoft.Authorization/roleAssignments",
					"roleDefinitionIds": [
						"/providers/microsoft.authorization/roleDefinitions/18d7d88d-d35e-4fb5-a5c3-7773c20a72d9" // Use user access administrator role update RBAC role assignments
					],
					"existenceCondition": {
						"allOf": [{
								"field": "Microsoft.Authorization/roleAssignments/roleDefinitionId",
								"equals": "/subscriptions/your_subscription_id/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c" // RBAC role definition ID for Contributor role
							}, {
								"field": "Microsoft.Authorization/roleAssignments/principalId",
								"equals": "OBJECT_ID_OF_YOUR_AD_GROUP" // Object ID of desired AD group
							}
						]
					},
					"deployment": {
						"properties": {
							"mode": "incremental",
							"template": {
								"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
								"contentVersion": "1.0.0.0",
								"parameters": {
									"adGroupId": {
										"type": "string",
										"defaultValue": "OBJECT_ID_OF_YOUR_AD_GROUP",
										"metadata": {
											"description": "ObjectId of an AD group"
										}
									},
									"contributorRbacRole": {
										"type": "string",
										"defaultValue": "[concat('/subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c')]",
										"metadata": {
											"description": "Contributor RBAC role definition ID"
										}
									}
								},
								"resources": [{
										"type": "Microsoft.Authorization/roleAssignments",
										"apiVersion": "2018-09-01-preview",
										"name": "[guid(resourceGroup().id, deployment().name)]",
										"properties": {
											"roleDefinitionId": "[parameters('contributorRbacRole')]",
											"principalId": "[parameters('adGroupId')]"
										}
									}
								]
							}
						}
					}
				}
			}
		}
	}
}

And that’s it! Now we have the policy definition set up for checking and remediating the default RBAC role assignment for our subscription. If the automated deployment feels too daunting, the effect can be swapped to the auditIfNotExists version. That way you won’t be deploying anything automatically, but you can simply audit all the resource groups in the scope for the default RBAC role assignments.

{
	"properties": {
		"displayName": "Assign Contributor RBAC role for an AD group",
		"policyType": "Custom",
		"mode": "All",
		"description": "Assigns Contributor RBAC role for AD group resource groups with Tag 'RbacAssignment = true' and name prefix 'my-rg-prefix'. Existing resource groups can be remediated by triggering a remediation task.",
		"metadata": {
			"category": "Tags",
		},
		"parameters": {},
		"policyRule": {
			"if": {
				"allOf": [{
						"field": "type",
						"equals": "Microsoft.Resources/subscriptions/resourceGroups"
					}, 	{
						"field": "name",
						"like": "my-rg-prefix*"
					}, {
						"field": "tags['RbacAssignment']",
						"equals": "true"
					}
				]
			},
			"then": {
				"effect": "auditIfNotExist",
				"details": {
					"type": "Microsoft.Authorization/roleAssignments",
					"existenceCondition": {
						"allOf": [{
								"field": "Microsoft.Authorization/roleAssignments/roleDefinitionId",
								"equals": "/subscriptions/your_subscription_id/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c" // RBAC role definition ID for Contributor role
							}, {
								"field": "Microsoft.Authorization/roleAssignments/principalId",
								"equals": "OBJECT_ID_OF_YOUR_AD_GROUP" // Object ID of desired AD group
							}
						]
					}
				}
			}
		}
	}
}

That should be enough, right? Well, it isn’t. Since we’re using ARM template deployment with our policy, we must add a role with privileges to create remediation tasks, which essentially means a role that has privileges to create and validate resource deployments. Azure doesn’t provide such a role with minimal privileges out-of-the-box, since the built-in role that has all the permissions we need is Owner. We naturally don’t want to give Owner permissions to anything if we reeeeeally don’t have to. The solution: create a custom RBAC role for Azure Policy remediation tasks.

Create custom RBAC role for policy remediation

Luckily, creating a new RBAC role for our needs is a fairly straightforward task. You can create new roles in the Azure portal or with PowerShell or the Azure CLI. Depending on your preferences and permissions in Azure, you’ll want to create the new role in a management group or a subscription to contain it to the level where it is needed. Of course there’s no harm in spreading that role to a wider area of your Azure environment, but for the sake of keeping everything tidy, we’ll create the new role in one subscription, since it’s not needed elsewhere for the moment.

Note that the custom role only allows validating and creating deployments. That’s not enough to actually do anything. You’ll need to combine the deployment role with a role that has permissions to do the things defined in the deployment. For RBAC role assignments, you’d need to add the “User Access Administrator” role to the deployer as well.

Here’s how to do it in the Azure portal (a sketch of an equivalent role definition in JSON follows the steps):

  1. Go to your subscription listing in Azure, pick the subscription you want to add the role to and head on to Access control (IAM) tab.
  2. From the top toolbar, click on the “Add” menu and select “Add custom role”.
  3. Give your role a clear, descriptive name, such as Least privilege deployer, or something else that fits your naming conventions.
  4. Add a description.
  5. Add permissions Microsoft.Resources/deployments/validate/action and Microsoft.Resources/deployments/write to the role.
  6. Set the assignable scope to your subscription.
  7. Review everything and save.
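
If you prefer doing this outside the portal, the same role can be described as a JSON role definition and created with PowerShell or the Azure CLI. The snippet below is a minimal sketch using the name and permissions from the steps above; the subscription id placeholder follows the same convention as in the policy examples.

{
	"Name": "Least privilege deployer",
	"IsCustom": true,
	"Description": "Can validate and create ARM deployments.",
	"Actions": [
		"Microsoft.Resources/deployments/validate/action",
		"Microsoft.Resources/deployments/write"
	],
	"NotActions": [],
	"AssignableScopes": [
		"/subscriptions/your_subscription_id"
	]
}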

After the role is created, check its properties and take note of the role id. Next we’ll need to update the policy definition made earlier in order to get the new RBAC role assigned to the service principal during the policy initiative assignment.

So from the template, change this in the effect block:

"roleDefinitionIds": [
	"/providers/microsoft.authorization/roleDefinitions/18d7d88d-d35e-4fb5-a5c3-7773c20a72d9" // Use user access administrator role update RBAC role assignments
]

to this:

"roleDefinitionIds": [
	"/providers/microsoft.authorization/roleDefinitions/18d7d88d-d35e-4fb5-a5c3-7773c20a72d9", // Use user access administrator role update RBAC role assignments
	"/subscriptions/your_subscription_id/providers/Microsoft.Authorization/roleDefinitions/THE_NEW_ROLE_ID" // The newly created role with permissions to create and validate deployments
]

Assigning the created policy

Creating the policy definition is not enough for the policy to take effect. As mentioned before, the definition is merely a ruleset created for assigning the policy, and it does nothing without a policy assignment. Like definitions, assignments can be set at the desired scope. Depending on your policy, you can assign it at the management group level or create individual assignments at the subscription level with property values that fit each individual subscription as needed.

Open Azure Policy and select “Assignments” from the left-side menu. You can find “Assign policy” in the top toolbar. There are a few considerations that you should go over when you’re assigning a policy (a minimal sketch of an assignment resource follows these considerations):

Basics

  • The scope: always think about your assignment scope before blindly assigning policies that modify your environment.
  • Exclusions are a possibility, not a necessity. If you find yourself adding a lot of exclusions, you should probably re-evaluate the policy definition.
  • Policy enforcement: if you have ANY doubts about the policy you have created, don’t enforce the policy. That way you won’t accidentally overwrite anything. It might be a good idea to assign the policy without enforcement the first time, review the compliance results and, if you’re happy with them, then enforce the policy.
    • You can fix all the non-compliant resources with a remediation task after the initial compliance scan.

Remediation

  • If you have a policy that changes something with either the modify or deployIfNotExists effect, you’ll be creating a service principal for implementing the changes when you assign the policy. Be sure to check that the location (region) of the service principal matches your desired location.
  • If you select to create a remediation task upon assignment, it will apply the changes in the policy to existing resources. So if you have doubts about whether the policy works as you desire, do not create a remediation task during assignment. Review the compliance results first, then create the remediation task if everything’s ok.

Non-compliance message

  • It’s usually a good idea to create a custom non-compliance message for your own custom definitions.
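
For reference, below is a minimal sketch of what an assignment could look like as an ARM resource, covering the same considerations: enforcement mode, the location of the managed identity used for remediation and a custom non-compliance message. The name and policyDefinitionId values are hypothetical placeholders for the definition created earlier in this post.

{
	"type": "Microsoft.Authorization/policyAssignments",
	"apiVersion": "2021-06-01",
	"name": "assign-contributor-rbac",
	"location": "westeurope", // Region of the managed identity used for remediation
	"identity": {
		"type": "SystemAssigned"
	},
	"properties": {
		"displayName": "Assign Contributor RBAC role for an AD group",
		"policyDefinitionId": "/subscriptions/your_subscription_id/providers/Microsoft.Authorization/policyDefinitions/assign-contributor-rbac", // Hypothetical definition ID
		"enforcementMode": "DoNotEnforce", // Compliance is evaluated but nothing is changed; switch to "Default" to enforce
		"nonComplianceMessages": [{
				"message": "Resource group is missing the Contributor role assignment for the AD group."
			}
		]
	}
}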

After you’ve set up all the relevant settings for the assignment and created it, it’s time to wait for the compliance checks to go through. When you’ve created an assignment, the first compliance check cycle is usually done within 30 minutes of the assignment creation. After the first cycle, compliance is evaluated once every 24 hours or whenever the assigned policy definitions are changed. If that’s not fast enough for you, you can always trigger an on-demand evaluation scan.

Solita Cloud Manifesto – We all love tinkering

“Tinkering” is a term for a form of tweaking. One user of the Finnish urban dictionary website puts it as “what computer people call programming or some such tweaking”. Tinkering and tweaking are words often used by specialists working with automatic data processing, especially with infrastructure and operating systems, sometimes several times a day. Many use tinkering as a general description of everything that an infrastructure expert does for a job; some of us drive cars, some of us wash cars, some of us tinker with infrastructure.

In order to find new approaches to the hard core of tinkering, I had a talk with two fresh faces at Solita Cloud, Tommi Ritvanen and Lauri Moilanen. We focused on the question of what tinkering actually is, reaching pretty far down the rabbit hole, and also discussed what is included in tinkering. Finally, we naturally considered how the newly-published Solita Cloud Manifesto has manifested in the everyday work of our professionals.

We gave a lot of thought to whether tinkering is a professional activity. It could also be seen as an amateurish term – something that refers to artisanal “gum and tape” contraptions rather than professional, fully automated and easily replicated solutions. Lauri Moilanen said that he puts in as much work as possible to minimise tinkering. This is not the full picture, as he continues to state that tinkering is a very interesting phase, but it’s only the first phase. What is even more interesting is how the chosen initial setup can be refined into a professional final product. Tommi Ritvanen had a different perspective. If there is no ready-made solution for the required product, he sees tinkering as producing the automated final product.

In the Solita Cloud Manifesto, we posit that “tinkering is a combination of interest and learning experiences”.

Tommi suggests that tinkering is not always smooth sailing. One has to – or gets to – work on, polish, iterate and grapple with the final product. One cornerstone of learning at Solita is learning by doing, and we believe that 70 per cent of creative experts’ learning happens through everyday work and experiments. An academic attitude is highly advantageous when learning from books or documentation and when coming up with hypothetical solutions for a given problem – or when considering the problem itself.

True individual learning events are related to learning by doing, and, in particular, to learning by doing something outside one’s comfort zone. Purely technical slogging is rarely what happens in Solita projects. Because our business is about people creating solutions for others who use them, the object of tinkering is often not only our customers’ processes and operating methods, but also those of Solita.

Tommi and Lauri have different profiles at Solita Cloud. Their career paths are different, but both think that they now find themselves in a position where they have wanted to be. Working at Solita is the sum of their own choices. Currently, Tommi works in the Cloud maintenance team and Lauri works as an infrastructure specialist in two projects.

Is it okay to say no at Solita, and is it possible to pursue your own interests?

Tommi says that the maintenance team has people from different backgrounds. The team members can take their work in whatever direction they wish, but their activities are limited by the reactive side of maintenance. “Maintenance requires you to take care of things, even if you don’t want to,” says Lauri, who has experience in maintenance work prior to Solita. Maintenance team members must be flexible and willing to learn and do things. There is no hiding in a silo. “You encounter many new things in maintenance, so you take them on and want to try everything,” says Tommi.

In projects, the resourcing stage largely dictates what the specialist will be doing. Although the situation may be an overly complex puzzle with indirect consequences for every move, at its best, the resourcing stage can be a dialogue between the specialist, the account, and Cloud’s resourcing manager, Jukka Kantola. The importance of communication is highlighted even before the actual work begins.

Lauri says that there is a sense of psychological safety at Solita Cloud. According to him, it is okay to say “no” to resourcing, and there is no hidden pressure in the environment forcing people to do what they would rather not. He points to the important observation that he is not afraid to say no in this environment.

The customer’s end goal may not be clear at the resourcing stage, so the team of specialists is expected to continuously investigate it and find facts – and maintain situational awareness. In other words, the work’s expectations may not be precisely worded, or the vision of the desired end result may change already in the early stages of the project. Changes are likely as the project progresses, whether the identified project model is like a waterfall or agile.

Those individuals working in Solita projects are authorised and obligated to discover and clarify what needs to be done and why, as in what needs to be achieved by the end result. The will to understand the whole connects all Solita employees, and Solita project managers are especially skilled at taking hold of all the strings and leading communication.

The specialists have personal motivations that we have collected somewhat frequently through, for example, our Moving Motivators exercises. Lauri offers his thoughts on how to express the feeling of “doing a job with a purpose”: “When you turn off your computer, you are left with a feeling that you did something meaningful and accomplished things,” says Lauri about his personal motivation. Motivated people produce better results, which equally benefit Solita, our customers, and the employee in question. Motivation, but also confusion, often manifest in people attempting to challenge external requirements and pushing themselves physically to reach high-quality results. “An attitude of ‘just get something done’ kills motivation and often surfaces when there’s a rush,” says Lauri.

In the crossfire of moving targets and ambiguous goals, we at Solita have to understand the limits of our personal ability and be able to take on technically challenging situations in projects as a community. We at Solita Cloud are a group of people from a variety of backgrounds with different motivations, and we may not be focused on a single clear target in our everyday work. But we all love tinkering.