Solita Cloud Academy: a Fast Track to Cloud Business

Solita Cloud Academy is a three-month intense training for people who are familiar with IT business and want to work with cloud solutions. Students are employed by Solita from the first day of their training.

It’s not a secret that the competition for talent in the tech industry is tough and particularly challenging among cloud professionals. According to forecasts, the cloud market will multiply in the coming years. Nordic countries are leading the way in cloud adoption, and to keep the development going, there is a huge need for skilled people.

To meet the needs in the market, Solita came up with a solution: a carefully tailored training path for IT professionals who desire to become Cloud experts.

“The talent shortage is a real issue, so we decided to train people ourselves. We want to continue growing our cloud business and serve our customers with their digital transformations also in the future”, says Karri Lehtinen, SVP of Solita Cloud Platforms.

Solita Cloud Academy educates new public cloud specialists to work on Solita projects. Right now, the main focus will be on Microsoft Azure training and Solita Cloud Platforms Way of Working, including mastery of Solita CloudBlox.

Cloud Academy will run several times during 2023, and two to four students will be recruited for each round. Academians will get a 12-month membership in the Sovelto PRO program and will complete the Azure Solution Architect Path during the first three months. The program is designed together with Sovelto Eduhouse.

“The benefit of this concept is that people can join customer projects quickly after three months of training. We want to keep the groups small to ensure sufficient support and a clear track to customer projects”, says Saila Karonen, the Talent Acquisition Owner in the Cloud Community at Solita.

The demand for Azure know-how is very high right now, which explains the chosen technology. But it is very possible that some of the future academies will focus on other technologies. Each Academy is tailored based on the prevailing needs in the market.

Solita Cloud Academy is for people who already have experience in the IT industry

This Academy is specifically targeted at people with experience in the IT industry who want to learn how to design and implement public cloud infrastructures with infra-as-code. The ideal candidates have worked on IT projects before and share Solita’s philosophy of automation coming first.

“We’d like to see cloud-curious people who know the industry and understand tech. They could be, for example, software developers or system specialists who have at least a basic understanding of IT Infrastructure Platforms”, says Lauri Siljander, the Principal of Cloud Academy.

On top of technology training, students will go through full onboarding to Solita. They will be employed by Solita from day one and receive fair compensation during these three months of studying. After graduation, there will be a salary review and discussion.

A career path to Solita’s Cloud Community

The first group of Solita Cloud Academy students is in their second month, and the experience so far has been positive. Students have felt that it’s a good combination of guided and autonomous studying, with both peer and tutorial support at hand. People are participating from different parts of Finland, so the program is fully virtual and currently conducted in Finnish.

Solita Cloud Academy is a path to Solita’s Cloud Community, a unit of about 100 Cloud Professionals in Finland. The community is passionate about quality and cares for customers’ business results. It’s a workplace where people are encouraged to craft their own path based on their interests while being open to learning and sharing their knowledge with others.

Joining Solita Cloud Community means being part of a value-driven culture where people help each other and want to make a long-lasting impact – together.


AWS re:Invent greetings from Thursday and Friday

People love visualizations, and spatial intelligence will soon be the new norm. It was the last conference day for me. After re:Play and the night, it was time to fly back home.

Today I woke up again without an alarm at 7 am. Sunny walk to the Venetian via the Caesar Forum. It is an easy 20 minute walk from the Paris hotel.

Dr. Werner Vogels Keynote (KEY005)

The keynote started with an extremely entertaining and educational video about how the world is basically asynchronous. The video showed an example of a synchronous world when Dr. Vogels visited a restaurant: each customer walked in one at a time, orders could be placed only one item at a time, food was prepared one item at a time, and so on. Basically a busy restaurant is a great example of how working asynchronously speeds up the service.


He reminded us that the world is asynchronous. Synchrony is just an illusion and a simplification. Most systems need to be event-driven, at least for background processing.

The announcement of AWS CodeCatalyst was important. It is a unified software development service. Wait a minute, why is it important? During the last few years GitHub’s ecosystem has risen to a dominant market position, and more competition from major players is strongly needed. GitHub published its equivalent product, Codespaces, a while ago. The whole idea of running development environments in the cloud can change development work dramatically in a few years.

Dr. Vogels then talked about the new era of 3D modeling, and the Unreal demos were impressive. The most natural language for humans is visualization, and AWS wants to visualize everything. Spatial intelligence is another major thing. It allows us to reason about how objects interact with the physical world, e.g. trying on new shoes virtually.

A short visit to the Expo

After the keynote I visited serverlesspresso in the Expo area to catch a cup of cappuccino. It is a concrete example of a coffee shop where you don’t need to stand in a queue and where the software costs less than 100 USD per month to run. The first step is to scan the coffee shop’s QR code with your phone’s camera. Then you choose what to order, and finally you pick up the order after a couple of minutes.

 

APIs: Critical for data transfer, but how do you keep them secure? (NET316)

AWS APIGW with an edge-optimized endpoint is a good default choice for publishing APIs. It automatically protects you from the “internet noise” up to OSI layer 4 (TCP). For more complete protection it is good to add AWS Web Application Firewall (WAF) to cover layer 7 (application). For example, it adds protection against malicious HTTP requests (SQL injection, XSS, etc.) and blocks traffic from known bad IP addresses.

Here are the two newest AWS WAF features, summarized by AWS:

  • AWS WAF Bot Control gives you visibility and control over common and pervasive bot traffic that can consume excess resources, skew metrics, cause downtime, or perform other undesired activities.
  • AWS WAF Fraud Control – Account Takeover Prevention is a managed rule group that monitors your application’s login page for unauthorized access to user accounts using compromised credentials.

I had missed the feature that WAF supports custom responses for blocked requests, e.g. a JSON response with a 200 OK status code.
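
As a rough sketch of what that looks like in the WAFv2 API (the rule name, rate limit and response body below are invented for illustration, not from the session):

```python
# Hedged sketch: the pieces of a WAFv2 web ACL definition that turn a blocked
# request into a custom JSON response instead of the default 403.
custom_response_bodies = {
    "blocked-json": {
        "ContentType": "APPLICATION_JSON",
        "Content": '{"status": "ok", "message": "request dropped"}',
    }
}

rate_limit_rule = {
    "Name": "rate-limit-per-ip",
    "Priority": 0,
    "Statement": {"RateBasedStatement": {"Limit": 1000, "AggregateKeyType": "IP"}},
    "Action": {
        "Block": {
            "CustomResponse": {
                "ResponseCode": 200,  # the "200 OK" trick mentioned above
                "CustomResponseBodyKey": "blocked-json",
            }
        }
    },
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "rate-limit-per-ip",
    },
}

# Both structures would be passed to the wafv2 create_web_acl()/update_web_acl()
# calls as the CustomResponseBodies and Rules parameters.
```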

Introducing AWS KMS external keys (SEC336)

Previously we could use AWS-managed HSM clusters (a normal KMS CMK) or our own AWS CloudHSM cluster. Now AWS announced support for external key stores (XKS). In an XKS setup you can purchase an HSM from any other vendor that supports AWS’s open-source XKS specification. The external HSM can work as a “kill switch” for all data in AWS: if you block key usage, neither AWS nor any third party can open the data anymore.

The KMS service connects to the external HSM either directly via a public interface or via the customer’s VPC using a customer-managed XKS proxy service. The XKS proxy can be a Fargate service where the customer runs an image provided by AWS. You can use any connection method from the proxy towards the external HSM, e.g. a direct connection or a VPN service.

This can be a very important feature for sensitive private or public sector data. In most cases the standard AWS KMS CMK is more than enough, and it is certified for credit card, healthcare and similar regulated data.

re:Play and heading back home on Friday

re:Play is the main festival of the AWS re:Invent conference. It is held at the Las Vegas festival grounds, two miles north of the venues.

It is a massive transportation challenge to move tens of thousands of people in a short period of time, and this year the transportation was a bit of a hassle. My friends were first in line and still waited 30 minutes for a bus. My bus then waited 20 minutes to unload, and in the end our full bus was accidentally directed straight out of the unloading area, so we got off halfway along the exit route. All good after that.

The festival area has multiple stages, a game area, a headphone listening area with a bar, food areas, you name it. Everything worked nicely. The main show was DJ Martin Garrix, and the atmosphere in the main stage’s massive tent was awesome.

My flight leaves in 30 minutes (fingers crossed). The overall experience of the conference was extremely good, so much to see, learn and experience.

AWS re:Invent greetings from Wednesday

Most cloud users develop apps on top of cloud platforms run by a managed services provider (MSP). Solita provides MSP services with Solita CloudBlox for all major clouds. A lot happens behind the scenes, for example identity and network management. Today was all about that.

My Wednesday started with a small sunny walk to the MGM Grand Conference Center for breakfast and sessions. It is remarkable how life changes just a block from the Strip. Here is one picture where you can see a wall of enormous nine-floor garage buildings stretching for hundreds of meters.

Morning sessions

Architectural innovation for highly distributed edge workloads (HYB307)

The session was mostly about AWS Outposts instances located on-premises. I had missed the hybrid term when choosing the session and was expecting to learn more about Local Zones and the like. AWS Outposts is basically an extension for your VM-related workloads into your on-premises environment. For example, data center cooling, networking and physical security are the customer’s responsibilities in the shared responsibility model. AWS owns the device and provides a 4-hour SLA for it. The Outpost connects to the AWS control plane via multiple VPNs managed by AWS and has local network connectivity. Basically it is a good fit for latency-critical software or for regulatory purposes.

Reimagine the security boundary with Zero Trust (SEC324)

In the first half we discussed the new AWS Verified Access (AVA) service. It provides an important solution: a “micro VPN” from the user’s browser up to the VPC’s private load balancer. The client is first authenticated via normal SSO (e.g. Azure AD). Then the Chrome/Firefox browser plugin creates a secure connection to the AVA endpoint. Finally the client can access the private VPC ALB/NLB if access is granted.

In AVA, access to applications can be controlled at a detailed level via access groups and their policies, and software-defined perimeters (SDP) can be adjusted. The AVA service creates access logs in the Open Cybersecurity Schema Framework (OCSF) format. A standardized log format makes integration with SIEM services easy.

The pricing is something to notice. If we have an application with three environments, the service will cost us about 21,000 USD over three years, plus 0.02 USD/GB for data transfer. I think the price is relatively high if your app is not truly a large-scale enterprise app, or if you have multiple microservices (i.e. applications) used by the client.

In the other half we talked about the basics. AWS has many fundamentally important services that people don’t associate with Zero Trust. AWS IAM, e.g. together with AWS APIGW, creates a solid managed solution to control service access. Inside a VPC, the use of security group relationships (SG A can connect to SG B’s port 443) makes a huge difference. PrivateLink for sharing services between VPCs is again a solid construct. The new AWS VPC Lattice, a.k.a. a “consolidated service mesh”, also looks promising for cross-account service mesh networking.
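
For example, a security-group-to-security-group rule of that kind could be sketched with boto3 roughly as follows (the group IDs are placeholders, not from the session):

```python
import boto3

ec2 = boto3.client("ec2")

# Hedged sketch: allow "SG A" (e.g. an app tier) to reach "SG B" (e.g. a service
# behind an internal ALB) on port 443 by referencing the security group itself
# instead of IP ranges.
ec2.authorize_security_group_ingress(
    GroupId="sg-0bbbbbbbbbbbbbbbb",  # SG B, the destination
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "UserIdGroupPairs": [{
            "GroupId": "sg-0aaaaaaaaaaaaaaaa",  # SG A, the allowed source
            "Description": "app tier to service tier over HTTPS",
        }],
    }],
)
```
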
The lunch at Caesar Forum was great.

Afternoon and evening sessions

AWS network architectures for very large environments (NET303)

When you use a single AWS account for multiple workloads, you might hit hard limits that 99 percent of AWS users don’t know exist. Previously there was a 50,000 IP address limit under the hood (not public information). During that period AWS monitored each VPC’s IP usage, and when the usage went over 10,000, AWS proactively contacted the customer to discuss the needs and maybe rethink the architecture a bit. Nowadays the limit is around 250,000.
It was time to move to Caesar Forum by the re:Invent shuttle service.

Network operations, management, and governance best practices (NET305)

The session highlighted four network operations categories: collect, monitor, troubleshoot and analyze. A price comparison can sometimes reveal strange things. For example, VPC Reachability Analyzer costs €0.10 per run, while Transit Gateway Route Analyzer is free to use. If you would like to monitor your essential routes every 5 to 10 minutes, even a small per-run fee adds up: analysing a single path every 5 minutes is roughly 8,600 runs a month, or about €860.

NetDevOps stands for network development operations. It is a huge cultural change for many organizations. I guess more than 90 percent of networks are still managed with a ClickOps approach, in the best case with documentation.

In Solita CloudBlox development we believe strongly in a DevOps culture. For example, if I need to add a new route to an AWS Transit Gateway or create a new Route 53 DNS zone, I make the change in a branch in the Git repository. After the branch’s PR is approved and merged, the CI/CD pipeline is triggered automatically, and at the end of the pipeline the AWS organization is fully up to date with the latest defined changes.
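
As an illustration only (not the actual CloudBlox codebase), the kind of change such a PR might contain could look roughly like this in AWS CDK for Python. All names and IDs are invented:

```python
from aws_cdk import Stack, aws_ec2 as ec2, aws_route53 as route53
from constructs import Construct


class NetworkStack(Stack):
    """Illustrative only: the kind of change a PR in the IaC repo might contain."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # A new public DNS zone, added in this branch (zone name is made up).
        route53.PublicHostedZone(
            self, "TeamZone",
            zone_name="team-x.example.com",
        )

        # A new Transit Gateway route, also added in this branch (L1 construct,
        # placeholder IDs).
        ec2.CfnTransitGatewayRoute(
            self, "OnPremRoute",
            transit_gateway_route_table_id="tgw-rtb-0123456789abcdef0",
            destination_cidr_block="10.42.0.0/16",
            transit_gateway_attachment_id="tgw-attach-0123456789abcdef0",
        )
```

The pipeline would then typically run `cdk diff` and `cdk deploy` (or the equivalent for whichever IaC tool is in use) against the target accounts.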

Designing a multi-account environment for disaster recovery (ARC319)

The session was about business continuity (BC). The basic solution explained was to have a reserved organizational unit (OU in AWS Organizations). In the BC OU you have one or more BC accounts with very limited access. Each solution must follow the 3-2-1 pattern: (3) one primary and two secondary copies of the data, (2) two accounts and (1) one cross-region copy (or an Outposts solution).

If one or more accounts get compromised, you need to be ready to restore the accounts from the ground up. In this kind of DR scenario you would retrieve the secondary backups from the BC account into newly re-created accounts. The old accounts would be locked and reserved for future investigations.
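
One way (not necessarily the session’s exact method) to express the extra copies is an AWS Backup rule with a copy action pointing at a vault in the BC account. The plan name, vault names and ARN below are placeholders:

```python
import boto3

backup = boto3.client("backup")

# Hedged sketch of a backup rule that keeps a local copy and pushes a second
# copy to a vault living in the separate BC account, in another region.
backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "workload-3-2-1",
        "Rules": [{
            "RuleName": "daily",
            "TargetBackupVaultName": "local-vault",
            "ScheduleExpression": "cron(0 3 * * ? *)",
            "Lifecycle": {"DeleteAfterDays": 35},
            "CopyActions": [{
                "DestinationBackupVaultArn":
                    "arn:aws:backup:eu-central-1:111122223333:backup-vault:bc-vault",
                "Lifecycle": {"DeleteAfterDays": 365},
            }],
        }],
    }
)
```

The destination vault’s access policy would also need to allow copies from the source account for this to work across account boundaries.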

The cake vending machine

After a long day I decided to walk back to the hotel to write this blog post and pack my luggage for Friday. Thursday is the awesome re:Play party again, so I will be back at the hotel late.
Every day I pass the cake vending machine twice in the Paris Hotel & Casino complex. There is often a bunch of people using it. That’s all for today, bye!

AWS re:Invent greetings from Tuesday

The second day of re:Invent started with Adam Selipsky's keynote. The most interesting announcement from Monday was Lambda SnapStart for Java functions.

Tuesday started with breakfast in the Venetian and watching the keynote of Adam Selipsky (CEO of Amazon Web Services).

Keynote Adam Selipsky (KEY302)

It turned out that everyone else wanted to watch Mr. Selipsky live too: the massive ballroom was fully occupied about 45 minutes before the start. As a backup plan I watched the keynote in the registration lounge in the Venetian. I shared a cozy sofa with my colleagues Tero and Joonas.

Mr. Selipsky went from space to deep under the water, and from there to a high mountain, with tricky transitions. He reminded us that during the next five years the amount of data in the world will double. During the talk he announced several interesting new services and features that, judging by the cheers, people loved. Multiple new integrations and services provide capabilities that make data extraction and analysis easier. For full coverage you need to visit the AWS News Blog. Here is the list of top announcements from Monday and Tuesday which got my attention.

Lambda with Java has the lowest cold start delay after today. How crazy is that?

For example, Lambda with Node.js is popular because it has a very short cold start delay, meaning the time to initialize the function before it can handle requests. Java, on the contrary, is famous for having problematic cold start delays. The function itself starts more slowly, and in addition Java apps typically use Spring Boot-like frameworks. The cold start time can then vary from a few seconds up to 30-60 seconds.

The free Lambda SnapStart feature runs the initialisation phase automatically after each code deployment. At the end, it creates a snapshot of the lambda’s memory. Later, when the lambda is actually invoked, the memory snapshot is restored and no initialisation work is required. It makes the cold start 10-1000x faster depending on the function.
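
As a hedged sketch, enabling SnapStart boils down to one configuration flag plus publishing a version; the function name below is a placeholder:

```python
import boto3

lambda_client = boto3.client("lambda")

# Hedged sketch: SnapStart applies to published versions, so a version is
# published after updating the configuration. "orders-api" is an assumed
# Java-runtime function name.
lambda_client.update_function_configuration(
    FunctionName="orders-api",
    SnapStart={"ApplyOn": "PublishedVersions"},
)
lambda_client.publish_version(FunctionName="orders-api")
```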

Check out Jeff’s blog post for more information or just test it by yourself!

Serverless AWS OpenSearch for big and spiky workloads

From the AWS website: “Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more”. After today we don’t need to scale up and down the cluster when using Amazon OpenSearch Serverless. The new release will automatically scale the domain up and down based on the load of indexing and searching.
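
As a rough sketch of what taking the serverless flavour into use could look like (the collection name and type are assumptions, and the encryption and network security policies the collection needs are omitted here):

```python
import boto3

aoss = boto3.client("opensearchserverless")

# Hedged sketch: create a serverless collection for log analytics.
aoss.create_collection(
    name="app-logs",
    type="TIMESERIES",  # SEARCH or TIMESERIES depending on the workload
    description="Centralized application logs",
)
```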

The pricing is always crucial information. The smallest cluster size costs about 690 USD/month, so it is probably the most expensive serverless product in AWS. With on-demand instances we can get almost 400% more CPU and RAM for the same price, so it can be very expensive even with relatively steady workloads.

The good news is that storing data is much more affordable than with on-demand nodes (€0.024/GB vs. €0.122/GB), because Serverless stores the data in the AWS S3 service. More information can be found in the documentation.

The AWS Local Zones for Helsinki and Copenhagen were opened

If you need to run low-latency software in Finland or Denmark, you can use the AWS Local Zones. Both work under the Stockholm region and provide a small set of services and features. More information about the available services can be found in the AWS documentation.

Sessions on Tuesday

Putting cost optimization into practice (ARC202) was a workshop session. During the session I was able to test AWS services like AWS S3 for storing files, AWS Glue for building a searchable database from the files, and finally AWS QuickSight for making visualizations and reports. The QuickSight service was new to me, like almost all business intelligence (BI) products. We extracted data from AWS Cost and Usage Reports (CUR) following well-tested, documented guidance.

Between sessions I visited the RIOT Games breakout room to have a cappuccino and dessert.

I visited What’s new in contact centers with Amazon Connect (BIZ202) quickly after the previous workshop. The Amazon Connect service is totally new to me. It can, for example, recognise the caller by voice, so the agent does not need to ask security questions if the risk factor is low. How cool is that!

The corridors are always busy when more than 50 000 people are moving around.

I also briefly visited the “AWS Infrastructure as Code Year ahead” session between other booked sessions. The last slide summarized where the IaC world is going according to AWS.

Some fruit for a snack on Tuesday.

The AWS Lambda Powertools: Lessons from the road to 10 million downloads (OPN306) session was all about how to manage an open source project for thousands of developers and collaborators. The project needed to pull the handbrake for months to work on operational excellence instead of just creating new features and documentation.

The rooms have multiple nice decorations like this all over the venues.

AWS Well-Architected Framework: Improve performance with caching (ARC402) was a chalk talk session. The session did not go very deep even though it was a 400-level session, and the questions were about common scenarios. For me it was a good recap of different caching techniques and methods.

After the session it was again time to decide which reception to go to. I decided to go to the AWS EMEA reception at The Mirage Hotel Ballroom with Joonas, Tero and Ville.

At the reception there were artists drawing pictures of the guests.

 

AWS re:Invent greetings from Weekend and Monday

It is almost the end of 2022 and time to travel to the AWS re:Invent conference in Las Vegas. I will post a few articles about the re:Invent experience, so stay tuned!

From Door to Door!

On Saturday 26 November at 7:15 I took a taxi to Tampere central railway station. From there I took a train to Helsinki-Vantaa Airport. After the security checks we had a relaxing time in the Finnair Business Lounge.

The flight to Dallas was delayed, but after a run I managed to catch the connecting flight with the help of tens of people who let me skip the line (thank you all!). Thanks to @aripalo for the taxi ride to the Las Vegas Strip! I finally arrived at the Paris Las Vegas Hotel & Casino on Saturday night at 20:00. It took a total of 23 hours of traveling from door to door. Huh!

On Sunday Matias had booked us a table at the famous (at least according to themselves) Nacho Daddy. No need to wonder why visitors end up leaving with leftovers in plastic containers. Food and drinks were excellent.

Alarm clock failure on Monday

I woke up late because my phone’s alarm did not go off (no idea why), so I woke up at 7:45. My first session started at 8:30 in Mandalay Bay (3.5 km away). I needed to hurry. After a shower, brushing my teeth and getting dressed, I grabbed my bag and left! On the walk to the elevators I booked an Uber, the only way I could make it. The ride arrived quickly, and after 20 minutes I was in the session on time.

My Sessions for Monday

AWS GameDay: The New Frontier (GHJ301) is the new revision of the GameDay. At the beginning you form a random group of four. Each group tries to solve as many quests as it can, and solving a quest earns points. The more points, the higher the position on the leaderboard. In the New Frontier version teams needed to use services like AWS IoT (SiteWise, Core), AWS CodeCommit, AWS CodePipeline, Splunk (partner), and so on to solve the quests.

It is nice to get fast feedback (points) when you are doing something right, and who doesn’t like a bit of competition! AWS organizes GameDay events throughout the year around the world. For example, in the Splunk quest we needed to do APM analysis on an application, find out which class or method was causing the slowness, and finally fix those issues by editing the code.

The GameDay had a special guest as a motivational speaker, Jeff Barr (Chief Evangelist for AWS).

To MGM Grand for afternoon and lunch

Automating threat detection & incident response in your AWS environments (SUP301) was a workshop session. All workshop sessions start with a small intro to the topic. Then each participant starts solving problems individually, step by step, following the workshop material. The latest workshop material is going to be publicly available soon, so anyone can also do it on their own.

For example, I needed to configure automated alerts for EC2 instances violating Security Hub rules (e.g. no EC2 instance profile, too wide a firewall configuration, a disallowed SSH key). These AWS-run workshops are mostly about the concepts of the services, because almost everything is pre-configured and pre-deployed. That has its downsides, but it allows you to focus on the larger concepts in a short amount of time.
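
Outside the workshop, a minimal sketch of that kind of alerting could be an EventBridge rule on imported Security Hub findings. The rule name, severity filter and SNS topic ARN below are assumptions, not the workshop’s exact solution:

```python
import json

import boto3

events = boto3.client("events")

# Hedged sketch: route high-severity Security Hub findings to an SNS topic.
events.put_rule(
    Name="securityhub-high-findings",
    EventPattern=json.dumps({
        "source": ["aws.securityhub"],
        "detail-type": ["Security Hub Findings - Imported"],
        "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
    }),
    State="ENABLED",
)
events.put_targets(
    Rule="securityhub-high-findings",
    Targets=[{
        "Id": "alerts",
        "Arn": "arn:aws:sns:eu-west-1:111122223333:security-alerts",  # placeholder
    }],
)
```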

For me it took only about 30 minutes to do the exercises, so I jumped into the next-door event about 5G and AWS Wavelength Zones. It is very interesting how the technology has brought the latency down to less than 20 ms from a mobile device to the edge computing. One example was that a customer was able to, for example, try out different applications within a couple of seconds.

It was time to have some coffee and snacks.

 

Layered VPC security and inspection (NET311) was a very fast paced lecture about different concepts of networking in AWS. The session covered various scenarios for inspecting traffic between two VPC subnets, between VPCs, between multiple regions, and between on-premises and cloud infrastructure. The techniques used AWS services like AWS TransitGateway, AWS Gateway LoadBalancer, AWS Network Firewall and AWS Firewall Manager. The pace was very fast so I might need to go through the material again.

Optimizing Amazon OpenSearch Service domains for scale and cost (ANT315) was my last session (17:30-18:30) and was a chalk talk. Again, after a basic introduction to the topic, people could ask their questions about it.

There were some questions about the roadmap, but AWS typically does not reveal anything in public events. The most interesting questions and answers related to balancing the nodes, node type recommendations for different usage scenarios, what OpenSearch indices actually consist of (index -> shard/Lucene index -> segments) and the process of optimising the different “variables” in the configuration.

Time to relax

Finally after a long day it was time to travel to the Venetian for the AWS Nordic reception event. Enjoyed a lot of Italian food and drinks at Buddy V’s Ristorante.

 

Feeling small and cozy – while being big and capable

About two months ago I started my new Cloud journey in a company that has grown - and grows - very fast. Initially I had an image of a small, nimble and modern company inside my head - and it was a surprise to realize that Solita has 1500+ employees nowadays. But has it become a massive ocean-going ship, too big to steer swiftly? A corporation slowly suffocating all creativity and culture?

Fortunately not. As our CEO Ossi Lindroos said (adapting / quoting some successful US growth company) in our starter bootcamp day:

“One day we will be a serious and proper company – but that day is not today!”

Surely, Ossi is not saying that we should not take business seriously and responsibly at all. Or that we should not act like a proper company when it comes to capabilities towards customers – or ability to take care of our own people. We do act responsibly and take our customers and our people seriously. But instead the idea – as I interpret it – is that we have that caring small community feel even when growing fast – and we want to preserve it no matter how big we grow.

Can a company with good vibrations preserve the essentials when it grows?

Preserving good vibrations

Based on my first weeks I feel that Solita has been able to maintain low hierarchy, open culture with brave and direct communication, not to forget autonomous and self-driven teams, people and communities. Like many smaller companies inside a big one, but sharing an identity without siloing too much. Diversity with unity.

I started in the Solita Cloud business unit and my first impressions are really positive. Work is done naturally across team and unit boundaries. Teams are not based on a single domain expertise but could act as self-sufficient cells if required. Everyone is really helpful and welcoming. Company daily life and culture runs on Slack, and there you can easily find help and support without even knowing the people yet. And you get to know people on some level even without meeting them: that guy likes daily haikus, that girl listens to metal music, and so on.

“One day we will be a serious and proper company – but that day is not today!”

Petrus Enjoying Good Vibrations

Some extra muscle

And size is not all about downsides. Having some extra muscle enables things like a proper, well-thought-out induction and onboarding for new people, which starts even before the first day with prepackaged online self-learning and continues with intensive bootcamp days and self-paced but comprehensive to-do lists that give the feeling that someone has put real effort into planning all this. Working tools are cutting-edge, whether you are choosing your devices and accessories or using your cloud observability system. And there is room for the little bonus things as well, such as company laptop stickers, caps, backpacks and different kinds of funny t-shirts. Not to mention all the health, commuting and childcare benefits.

And for customers, having some extra muscle means being a one-stop shop yet future-proof at the same time. Whether the needs are about leveraging data, designing and developing something new, or the cloud which enables all this, customers can trust us. Now and tomorrow. Having that small community feeling and good vibrations ensures that we’ll have brilliant, motivated and healthy people helping our customers in the future as well.

Culture enables personal growth

And when the culture is supporting and enabling, one can grow fast. A while ago I used to be a rapid-fire PowerPoint guy waving hands like windmills, and now I’m doing (apologies for the jargon) customer deployments into the cloud using Infrastructure as Code, version control and CI/CD pipelines, knowing that I have all the support I need, whether from the low-threshold and friendly chat community of a nimble company or the highly productised runbooks and knowledge bases of a serious and proper company. Nice.

Now, it’s time to enjoy some summer vacation with the kids. Have a great summertime you all, whether feeling small and cozy or big and capable!

Is cloud always the answer?

Now and then it might feel convenient to move an application to the cloud quickly. For those situations this blog won’t offer any help. But for occasions when the decision has not yet been made and a bit more analysis is required to justify the transformation, this blog post proposes a tool. We believe that it is often wise to think about the various aspects of cloud adoption before actually performing it.

For every application there comes a moment in its lifecycle when the question arises whether the application should be modernised or just updated slightly. The question is rather straightforward; the answer might not be, as there are business and technological aspects to consider. Arriving at a rational answer is not an easy task. A cloud transformation should always have a business need, and it should be technologically feasible. Often there is a temptation to make the decision hastily and just move forward, because it is difficult to gather a holistic view of the application. But neglecting rational analysis because it is difficult is not always the right path to follow. Success on the cloud journey requires guidance from business needs as well as technical knowledge.

To address this issue, companies can formalise a cloud strategy. Some companies find this an excellent way to move forward, as the cloud strategy work builds a holistic understanding and identifies guidance for the next steps. A cloud strategy also states the main reason why the cloud transition supports value generation and how it is connected to the organisation’s strategy. However, sometimes cloud strategy work may be considered too large and premature an activity, in particular when the cloud journey has not really started, the knowledge gap feels too vast to overcome, and it is challenging to think about structured utilisation of the cloud. Organisations may struggle to manoeuvre through the mist and find the right path on their cloud journey. There are expectations and there are risks. There are low-hanging fruits, but there may also be something scary ahead that does not even have a name yet.

Canvas to help the cloud journey

Benefits and risks should be considered analytically before transferring an application to the cloud. Inspired by the Business Model Canvas, we came up with a canvas to address the various aspects of the cloud transformation discussion. The Application Evaluation Canvas (AEC), presented in Figure 1, guides the evaluation to take into account aspects ranging from the current situation to expectations of the cloud.

 


Figure 1. Application Evaluation Canvas

The main expected benefit is the starting point for any further considerations. There should be a clear business need and a concrete target that justifies the cloud journey for that application. That target also makes it possible to define the specific risks that might hinder reaching the benefits. Migration to the cloud and modernisation should always have a positive impact on the value proposition.

The left-hand side of the canvas

The left-hand side of the Application Evaluation Canvas addresses the current state of the application. The current state is evaluated from four perspectives: key partners, key resources, key activities and costs. The Key Partners section advises seeking answers to questions such as who currently works with the application, since the migration and modernisation activities will inevitably affect those stakeholders. In addition to the key partners, some resources may be crucial for the current application, for example in-house competences related to rare technical expertise. These crucial resources should be identified. Furthermore, competences are not the only crucial element; a lot of activities are carried out every day to keep the application up and running, and understanding them makes the evaluation more meaningful and precise. Once key partners, resources and activities have been identified, a good understanding of the current state is established, but that is not enough. The cost structure must also be well known. Without knowledge of the costs related to the current state of the application, the whole evaluation is not on solid ground. Costs should be identified holistically, ideally covering not only the direct costs but also the indirect ones.

…and the right-hand side

On the right-hand side the focus is on the cloud and the expected outcome. The main questions to consider relate to the selection of the hyperscaler, the expected usage, awareness of how holistic a change the cloud transformation is, and naturally the exit plan.

The selection of the hyperscaler may be trivial when the organisation’s cloud governance steers the decision towards a pre-selected cloud provider. But, for example, a lack of central guidance, autonomous teams or application-specific requirements may put the hyperscaler selection on the table. In any case, a clear decision should be made when evaluating the paths towards the main benefit.

The cloud transformation will shift the cost structure from CAPEX to OPEX. Therefore a realistic forecast of the usage is highly important. Even though costs will follow usage, the overall cost level will not necessarily decrease dramatically right away, at least not at the beginning of the migration. There will be a period where the current cost structure and the cloud cost structure overlap, as CAPEX costs won’t disappear immediately while OPEX-based costs start to accrue. Furthermore, the elasticity of OPEX may not be as smooth as predicted due to contractual issues; for example, annual pricing plans for SaaS may be difficult to change during the contract period.

The cost structure is not the only thing that changes after cloud adoption. The expected benefit will depend on several impact factors. These might include success in organisational change management, finding the required new competences, or the application requiring more than a lift-and-shift type of migration before the main expected benefit can be reached.

Don’t forget exit costs

The final section of the canvas addresses exit costs. Before any migration, the exit costs should be discussed to avoid surprises if the change has to be rolled back. The exit cost often relates to vendor lock-in. Vendor lock-in itself is a vague topic, but it is crucial to understand that there is always some vendor lock-in. One cannot get rid of vendor lock-in with a multicloud approach; instead of a single-vendor lock-in there is a multicloud-vendor lock-in. Additionally, the orchestration of microservices is vendor-specific even if a microservice itself may be transferable. Utilising some kind of cloud-agnostic abstraction layer creates a lock-in to that abstraction layer’s provider. Cloud vendor lock-in is not the only kind of lock-in that has a cost. Utilising some rare technology will inevitably tie the solution to that third party, and changing the technology may be very expensive or even impossible. Furthermore, lock-in can also have an in-house flavour, especially when there is a competence that only a couple of employees master. So the main question is not how to avoid all lock-ins, as that is impossible, but how to identify the lock-ins and decide which type of lock-in is acceptable.

Conclusion

As a whole, the Application Evaluation Canvas helps to gain a holistic understanding of the current state. Connecting expectations to a more concrete form supports the decision-making process and helps justify cloud adoption with business reasons.

Log Format in Modern Architectures

In this blog, we will take a look at different modern architectures and systems, how centralized log management should be approached regarding these setups and what to take into account.

Modern solutions can grow rather complex and can carry a lot of legacy. That’s why streamlining your logs and centralizing them is, quite frankly, the only option. When we have the log data in order, it’s quite easy to transform it into dashboards, for example to visualize data like the success percentage of HTTP responses or the API request rate. It’s also much easier to implement or integrate systems like SIEM into your architecture when you already have working centralized logging.

System Complexity

The more complex your system is, the more it can benefit from centralized logging and its monitoring, if it’s done well, that is. A good example of when centralized logging is almost mandatory is any type of microservice-based architecture. Microservice architecture is an architectural style in which an application is a collection of smaller services. These services have specific functionalities and can be independently deployed, tested and developed. Compared to its counterpart, monolithic architecture, where a single unit runs everything, we can avoid issues like a single bug breaking the whole system or one update requiring a full deployment and risking outages. With microservices, a bug is limited to a single service and functionality. Updates, like security patching, can be done to a single service without disrupting the system. Below is a diagram of how microservices can work, for example, in an e-commerce application.

 

Microservices have their own downsides, like more complex deployments, where instead of one service you must take care of possibly hundreds of individual services and their integrations. Orchestration tools like Kubernetes, OpenShift and Docker Swarm can help, but these too bring additional complexity. It can also be a pain to troubleshoot when a misconfiguration in one service causes an error in another. Therefore, having applications log with unique identifiers and events is important in more complex systems.

Another common situation is a hybrid solution, where, let’s say, Kubernetes is managed by a cloud provider while the databases still live on-premises. Staying on top of what’s happening in these kinds of setups is challenging, especially when old systems cause legacy-related issues. But all hope is not lost: by following the same rules both in the cloud and on-premises, these hybrid solutions can be tamed, at least when it comes to logging. Below is an example of a hybrid environment, where part of the solution runs on AWS and some parts are still on-premises. It’s quite simple to integrate all systems into centralized logging, but due to the system’s complexity it becomes even more important to have a log format that all services follow.

Another topic worth discussing is SIEM (security information and event management). Many companies have a requirement to track and manage their security events and to know what is going on in their systems. Managing a SIEM isn’t an easy task, far from it. Anyone who works with it needs to know what they want to follow, what they need to react to and how to get that information out of the system. Usually, audit logs are delivered to the SIEM in a specific format, which enables the SIEM to understand how important and critical each log is. If logs are formatted and implemented properly, integrating your logging with a SIEM shouldn’t be an issue. But if not, delivering raw audit logs can quickly raise costs, both in data volume and in the amount of work required to get it running.

 

Standard log format

Usually, you need to know what kind of data you want to log. While many services provide metrics out of the box for easy integrations, logs usually need more attention to be useful. Each log should follow a standard log format. Now imagine if each service had a totally different log format. The issue is not only that there would be a huge number of fields but that those fields might overlap. If a field overlaps and the data types differ, one of the log lines will not be indexed. The format could include fields like the following (a minimal example follows the list):

  1. Service name
  2. Timestamp
  3. Unique identifier
  4. Log level
  5. Message
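
For illustration, a single log line following this format could be produced like this (the service name and message are invented):

```python
import json
import uuid
from datetime import datetime, timezone

# A hedged example of one structured log line with the fields listed above.
log_line = {
    "service": "order-service",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "request_id": str(uuid.uuid4()),  # the unique identifier
    "level": "INFO",
    "message": "Order 1234 accepted",
}
print(json.dumps(log_line))
```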

Creating your own log format makes the most sense when you control the logging configuration of your software. For third-party software it’s not always possible to modify the logging configuration; in that case it’s usually best to filter and mutate log data based on certain tags or fields. This can get more complex in hybrid environments, but it’s highly beneficial when everything follows the same format, even if only partly. It can also be beneficial to keep these logs in a separate index to avoid possible field conflicts. Using features like dynamic mapping in Elasticsearch can make our life and scaling easier in the future.

Unique Identifier

In addition to a standard log format, it is useful to have a unique identifier, especially with microservices. This identifier is attached to the incoming request and stays the same while the request moves through the system. It comes in handy when troubleshooting, where the first step is to identify the unique ID in the log. By searching for this ID, it’s possible to track the request’s trail from the moment it entered the system to where it failed or did something weird. The identifier can be something like a UUID in an HTTP request header, or something more human-readable. Having a standard format and a unique ID means that the organization or development unit needs to work with the same standard requirements and guidelines. While our typical log format example provides important information like the timestamp and service name, we need more information in the message field (or in some other field). Something as simple as HTTP responses is usually a good place to start, and they are easy to filter when looking for errors.
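
A minimal sketch of propagating such an identifier in Python’s standard logging, assuming the ID arrives in a request header (e.g. X-Request-Id) or is generated on entry:

```python
import contextvars
import logging
import uuid

# Store a per-request ID in a context variable and stamp it on every log
# record via a logging.Filter, so all lines for one request share the same ID.
request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        return True


logging.basicConfig(format="%(asctime)s %(name)s %(request_id)s %(levelname)s %(message)s")
logger = logging.getLogger("order-service")
logger.addFilter(RequestIdFilter())

# In HTTP middleware you would read the ID from the incoming header,
# or generate one if it is missing:
request_id.set(str(uuid.uuid4()))
logger.warning("payment provider timed out")  # this line now carries the request ID
```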

Log levels

Log levels are rather self-explanatory, for example most logging frameworks have the following log levels:

  • ERROR
  • WARN
  • INFO
  • DEBUG
  • TRACE

If our earlier blog post tried to explain anything, it’s that levels like TRACE and DEBUG shouldn’t be logged except when actual troubleshooting is needed. It’s good to plan your levels so that only ERROR and WARN are needed to notice any issues with your software, and INFO shouldn’t just be a renamed DEBUG log. Having some custom levels, like EVENT, can help to filter logs more efficiently and quickly describe what the log is about.
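
For example, in Python’s standard logging a custom EVENT level could be registered like this (the level number 25 is an arbitrary choice between INFO and WARNING):

```python
import logging

# Register a custom EVENT level so business events can be filtered separately
# from plain INFO noise.
EVENT = 25
logging.addLevelName(EVENT, "EVENT")


def event(self, message, *args, **kwargs):
    if self.isEnabledFor(EVENT):
        self._log(EVENT, message, args, **kwargs)


logging.Logger.event = event  # attach a logger.event(...) helper

logging.basicConfig(level=EVENT)
logging.getLogger("checkout").event("order_paid order_id=1234")
```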

Event logs

Event logs are really handy for improving the ability to track and troubleshoot your applications. They also have high business value if they are used to build an event-driven architecture. Having event logs requires work (again) from the application team. It’s more difficult to modify third-party applications, as maintaining those changes requires dedication. Event logs should contain information on what type of event has happened in the application. These events can have their own log level, like EVENT, or just be included in the message. In a larger system, having events tied to the unique identifier helps to keep track of users and their progress through the system. Even if events aren’t their own special category, all applications should log messages that make sense to developers and the people relying on that information. Implementing event information and unique identifiers is a large task and needs to be done in a unified way across the whole system. But the payoff is clear visibility into the system through logs and the ability to leverage log data for monitoring, troubleshooting and security purposes. When using Log4j 2 in Java-based applications, it’s possible to use the EventLogger class, which provides a simple mechanism for logging events in your application.

Conclusion

Logging is easy when you only have one piece of software or a small system to operate. But the higher we grow and the more we stack up our stack, the more difficult it becomes to see everything that’s happening. That’s why we need to put our trust in software that can handle all that information for us. Proper logging is crucial with modern architectures, yet not that many are able to take full advantage of it. Most centralized log management tools can be used to visualize and create dashboards from your log data, turning the flood of logs into something useful rather easily.

Information Security Assurance Frameworks

There are many ways to demonstrate the maturity level of information security management. National and international standards and frameworks can be used as criteria for measuring the level of information security. Here is a brief overview of common frameworks in use.

ISO/IEC 27001

The International Organization for Standardisation (ISO) and the International Electrotechnical Commission (IEC) maintain and publish the ISO/IEC 27001 standard on information security management systems (ISMS) and their requirements. It is part of the 27000 family of standards, which address information security. ISO/IEC 27001 is probably the most famous one, because it is the one that can be certified. It emphasises a risk-based approach, continuous improvement and commitment from top management. The standard itself has seven mandatory clauses and Annex A, which defines controls in 14 groups to manage information security risks. ISO/IEC 27001 certification requires a third-party audit by an independent certification body, so certified organisations can be trusted to have an implemented and maintained information security management system.

It should be noted that the audit does not necessarily provide assurance on how well the controls have worked, merely that they exist. It is also a good idea to examine the scope of the management system, as it might cover only some of the operations of the organisation. Statement of Applicability is another document that should be examined; it defines which controls have actually been implemented and which have been left out and why. 

Note: The standard is being reviewed and based on changes in ISO/IEC 27002  (implementation guide of the controls in Annex A) there will be some changes.

Mandatory clauses
  • Context of the organisation
  • Leadership
  • Planning
  • Support
  • Operation
  • Performance evaluation
  • Improvement

Annex A control groups
  • Information security policies
  • Organisation of information security
  • Human resource security
  • Asset management
  • Access control
  • Cryptography
  • Physical and environmental security
  • Operations security
  • Communications security
  • System acquisition, development and maintenance
  • Supplier relationships
  • Information security incident management
  • Information security aspects of business continuity management
  • Compliance

 

Assurance Reports

ISAE 3000, ISAE 3402 and SOC 2® are standards and frameworks for assurance reports. Assurance reports provide independent attestation that a service provider has adequate controls of the subject matter, in this case information security. They are more common in the United States and Canada, but also used in Europe. Cloud providers or other service providers utilising cloud services often have some assurance report.

ISAE 3000 

International Standard on Assurance Engagement 3000 is a standard which defines how assurance engagements other than financial audits should be conducted. It does not define any controls in itself, but rather how the auditing should be done. The reader of an ISAE 3000 assurance report automatically knows that the assurance is conducted objectively and independently in a professional manner. It is up to the subject matter and the criteria whether it provides assurance and what sort of assurance on information security.

ISAE 3402

ISAE 3402 is also an international standard on assurance engagements; it focuses on the internal controls of a service provider. Like ISAE 3000, it does not define any exact criteria to be used, but the criteria have to be applicable to the subject matter.

SOC 2®

SOC 2® or System and Organisational Controls 2 is AICPA’s (American Institute of Certified Public Accountants) framework for information security assurance for service providers. The abbreviation should not be confused with Security Operations Center! It uses the Trust Services Criteria (TSC) to assess the level of information security. The TSC includes requirements on security, availability, processing integrity, confidentiality and privacy; for a SOC 2® report, the security requirements are mandatory. An official SOC 2® report can only be issued by an AICPA certified public accountant, which the rest of the world works around with ISAE 3000 reports that are compliant with the SOC 2® framework.

ISAE 3000, ISAE 3402 and SOC 2® can be done either as Type I or Type II reports. Type I provides assurance that the controls are described and suitable and is similar to an ISO/IEC 27001 certification. Type II provides assurance that, in addition to being described and suitable, the controls have also operated effectively during the audit period (typically 12 months). For example, for Type I report the auditor might inspect that a policy and procedure for incident management exists. For Type II report the auditor would also inspect a sample of incidents that occurred during the audit period to ensure the procedure was followed. 

It is worth noting that the actual reports are not publicly available, although there might be a summary of such assessment having been done. However, the reports can be requested from business partners or when negotiating possible partnerships. It also requires some level of expertise in security and auditing to assess controls descriptions and testing procedures in the reports. 

KATAKRI and PITUKRI

KATAKRI, or Kansallinen turvallisuusauditointikriteeristö, literally “national security audit criteria”, is a comprehensive set of criteria published by the Finnish National Security Authority. It consists of three domains: security leadership, physical security and technical security. Katakri is used to assess an organisation’s suitability to handle authorities’ classified information, but as public criteria it can also be used by anyone as a benchmark.

PITUKRI or Pilvipalveluiden turvallisuuden auditointikriteeristö, literally cloud service security audit criteria is meant for assessing cloud service security in the context of Finnish requirements. 

PCI-DSS

PCI-DSS is an abbreviation for Payment Card Industry Data Security Standard. It is an international standard used to assess the security related to payment card transactions. It was created by major credit card companies and maintained by Payment Card Industry Security Standards Council. Compliance with the PCI-DSS is required in practice from businesses that process credit or debit card transactions. The standard has 12 requirements divided into six groups: secure networks and systems, cardholder data protection, vulnerability management, access control, network monitoring and testing and information security policy. 

PCI-DSS compliance is a three-step process: assessing, remediating and reporting. Assessing means identifying cardholder data and the relevant assets and processes, which are analysed to recognize vulnerabilities. Remediating means fixing the vulnerabilities, which is followed by reporting to banks and card brands. Compliance with the standard requires an annual scoping of everything that has anything to do with cardholder data, and the assessment requires a qualified security assessor.

What to think of certificates and information security assurance?

The importance of information security assurance depends on the business done with the party holding (or not holding) it. If your business partner, especially on the vendor side, has anything to do with processing or storing your information and data, you should be interested in their information security controls. And if you are providing such services, clients will come more easily if they are assured their data is safe and secure. Certifications and assurance reports can also reduce the number of audits: not every business partner has to do its own vendor audit if independent assurance is provided.

As for vendor relationships, information security frameworks might have requirements for vendors and suppliers. Although the responsibility for these controls will be on the certificate holder, they might have effects on business partners too.

If you want to do business with the public sector, there will probably be national regulations and requirements. For example, with the Finnish public sector, attention should be paid to possible Katakri requirements, such as those related to physical security and doing confidential work in Katakri-approved areas.

Trustworthy assurance requires independent and accredited parties to provide them, such as accredited certification body for ISO/IEC 27001 or CPA for ISAE 3000. The party providing assurance or certification should not provide consultation on the subject, at least not for the customer that is being certified by them. If implementation help is needed, another party should be used. For example, if you want to get ISO/IEC 27001 certified, Solita can help you in the implementation and then a certification body can conduct the audit and grant the certificate.

Most importantly everyone should be aware that certifications and assurance reports do not guarantee impenetrable security against breaches and incidents. Suppliers, customers and partners with a certificate or an assurance report are, however, more likely to be better prepared to recognise, mitigate and handle breaches and incidents when they occur. To get the most out of information security assurance, all interested parties should also know how they are achieved and what subjects they address. 

Cloud-technology is a tool

We engineers are eager to get our hands dirty and dive into difficult architecture and implementation tasks. We love to speak about technology and how to implement wonderful constructs. Good architecture and implementation are crucial, but we should not forget to put the customer in the spotlight. Do we have a solid business case and, most importantly, where is the value?

To the project-hardened seniors reading this: what would resonate in a way that gives you the chills?

You know the feeling when things really click. A peer colleague succeeds in a difficult project, or the architecture comes together.

Those of us who have experienced dozens of projects and clients tend to know where the chills of success come from.

Meaning, we have created models from past experiences. These models are formal structures explaining the world around us.

The models

If we master and understand the models, it improves our communication and enables us to explain difficult scenarios, reason and cooperate with our teammates, predict the outcome of a solution and, furthermore, explore different options.

When our thinking is logical, consistent and based on past experiences, we are more likely to make wise choices. No matter if we speak about a development team trying to implement a difficult business case or just us cooking for our family.

The same is true when a salesperson tries to find out what a customer truly needs, or when we are implementing the next cloud-enabled solution for the customer.

Building value

It is all about building value. Is the food you prepare the value, or is it merely a tool for you to have the energy to play with your children?

Nevertheless, I’m sure each and every one of us has made up models for how to build value and get the chills in different situations.

Good architecture and implementation is crucial, but we should not forget to put the customer to the spotlight.

Do we have a solid business case and, most importantly, where is the value? We should not ask what the customer wants, but what the customer needs.

Try the 3 whys the next time you’re thinking of making a big decision:

“Why are we building the infrastructure in the cloud?”

“We need to grow our capacity and we need to be highly available and resilient.”

“Why is it important to improve scalability and resilience?”

“The old infrastructure cannot handle the increased traffic.”

“Why is the old infrastructure not enough anymore?”

“We have analysed the platform usage. If we get more capacity during rush hours, we can serve more customers.”

Productise

All right! We are convinced the customer needs the cloud migration, and we start working on the implementation.

Solita has productised the whole cloud journey, so we can get up to speed quickly.

Ready-made instructions, battle-proven implementations and peer support from earlier projects help you get up to speed.

Technical solutions follow a model, meaning there are only a few sensible ways to provision a database, a few ways to do networking in the cloud, a few ways to optimise expenditure.

When we do not plan and develop everything from scratch, we build value.

Re-use

According to the study published in the book Accelerate: The Science of Lean Software and DevOps, to be productive, everything should be in version control.

The application code is surely already in version control, but system configuration, application configuration and the scripts for automated build, test and delivery should be in version control as well.

The study revealed that keeping configurations and scripts in version control correlated more with software delivery performance than keeping the application code in version control.

When building for the cloud, the infrastructure must be defined in code. And the code should reside... in version control!

This enables us to move faster without trade-offs. We can recognise modules, implement them once, create automated tests and then re-use the codebase in customer repositories.

Every time we are able to do this, we build value.

Automate

Humans are error-prone. This is an experience-based model. When computers handle the repetitive work, it frees people to solve problems.

We know from experience that automation requires investment and should be implemented from day one. Investments made here result in shorter development lead times, easier deployments and higher-quality code.

In the cloud, we describe our infrastructure as code. This goes hand in hand with automation. Based on this model, we choose to automate recurring tasks such as building code, testing and making the deployments.

As a result we speed up the feedback loops, get repeatable results each time and enable developers to write quality code and automated tests. We test our infrastructure in the pipeline and once again build value.

Deliver

Delivering continuously in small chunks is an experience-based model.

You most surely want to have your piece of code tested and delivered to production before you forget what you were doing.

Short-lived branches and Trunk-Based Development predict performance. Failures in small changes are far easier to fix.

Test automation is also a key part of continuous delivery. Reliable automated tests predict performance and improve quality. The diagram below is from the book Accelerate.

High performers who had adopted the continuous delivery model spent far more time on new work than on unplanned work or rework.

Although unplanned, emergency and refactoring work is a necessity, value is built when implementing new features.

Software Delivery Performance

Measuring software delivery performance is difficult. An incorrect productivity metric can easily lead to poor decisions.

If you want to deliver a feature as quickly as possible without trading off quality, one key metric is development lead time. This is because code that is not in production is waste.

For example Software Delivery Performance can be split to 4 topics:

  • Lead Time (from starting the implementation to delivery to production)
  • Deployment Frequency (more frequent deployments mean smaller changes)
  • Mean Time to Restore (how quickly you recover from a failure)
  • Change Failure Rate (how often changes cause a failure)

According to the study by the authors of the book Accelerate, these are the measures for different types of organisations:

Conclusion

Past experiences make us what we are. Stop and think about the models you have crafted. Challenge yourself, interact with your peers and find new models for building value.

Together with the team and with the customer, you will find the best solution for the opportunity at hand. Remember that the actual implementation is only a fraction of the whole journey.

On the purely technical side, we should re-use as much as possible and automate as much as possible. When talking about cloud migration, the infrastructure must be described as code (IaC).

Does your organisation understand and use the models?

Does your organisation productise and re-use the codebase?

Let’s build value!