My Tuesday at AWS re:Invent 2019

Please also check my blog post from Monday.

Starting from Tuesday, each event venue provides a great breakfast with lightning-fast service. It scales and works. It’s always amazing how each venue can provide food service for thousands of people in a short period of time. Most people are there for the first time, so guidance has to be very clear and simple.

Today started with the keynote by Mr. Andy Jassy. I was not able to join the live session at the Venetian because of my next session. Moving from one location to another takes at least 15 minutes, and you have to be at your session at least 10 minutes early to claim your reserved seat. Since last year, the booking system has enforced a one-hour gap between sessions in different venues.

Keynote by Andy Jassy on Tuesday

You can find the full recap written by my colleagues here: Andy Jassy’s release roller coaster at AWS re:Invent 2019

Machine learning was the thing today. The SageMaker service received tons of new features. Those are explained by our ML specialists here: Coming soon!

So I joined the overflow room at the Mirage for Jassy’s keynote session, where everyone gets personal headphones. I have more than 15 years of background in software development, so it was love at first sight with CodeGuru. There is good news and bad news. It is a service for static analysis of your code, for automated code reviews and, last but not least, for real-time profiling via an installed agent.

The profiling information is reported in 5-minute periods and covers several factors: CPU, memory and latency. The product looks promising because Mr. Jassy said that Amazon has been using it internally for a couple of years already, so it should be a mature product.

So, what was the bad news? It supports only Java. Nothing to add to that.

The other interesting announcement for me was the general availability of Outposts. Finally, also in Europe, you can have fully AWS-managed servers inside your corporate data center. Those servers integrate fully with the AWS Console and can be used e.g. for running ECS container services. The starting price of 8,300 USD per month is very competitive because it already includes roughly 200 cores, 800 GB of memory and 2.7 TB of instance storage. You can additionally add EBS storage, starting from 2.7 TB.

You can find more information here: https://aws.amazon.com/blogs/aws/aws-outposts-now-available-order-your-racks-today/

Performing analytics at the edge – IOT405

This session was a workshop and level 400 (highest). It was held by Mr. Sudhir Jena (Sr. IoT Consultant, AWS) and Mr. Rob Marano (Sr. Practice Manager, AWS).

Industry 4.0, a.k.a. IoT, was a totally new sector for me. It was a very informative and pleasant session. It was all about the AWS IoT Greengrass service, which provides low-latency responses on a still-managed platform with tons of features for handling data streams from IoT devices locally.

For many people it was their first touch with the AWS Cloud Development Kit, which I fell in love with about three months ago. It has multiple advantages, such as refactoring, strong typing and good IDE support. You can find more information about AWS CDK here: https://docs.aws.amazon.com/cdk/latest/guide/home.html

In our workshop session we received a time series data stream of temperature, humidity etc. from an IoT device. In our case the IoT device was an EC2 instance that simulated one. From the AWS IoT Greengrass console you can e.g. deploy new versions of analytics functions to the IoT devices.

Material for the workshop can be found on GitHub: https://github.com/aws-samples/aws-iot-greengrass-edge-analytics-workshop

AWS Transit Gateway reference architectures for many VPCs – NET406

This was a session held by Nick Matthews (Principal Solutions Architect, AWS) in the glamorous Ballroom F at the Mirage, which fits more than a thousand people. The session was almost full, so it was clearly a very popular one.

To summarise the topic, there are several good ways to interconnect multiple VPCs and a corporate data center. At small scale you can do things more manually, but at large scale you need automation.

One solution presented for automation was based on the use of tags. The autonomous team (owner of account A) tags its shared resources in a predefined way. The transit account picks up those changes via CloudTrail logging: each modification creates a CloudTrail audit event, which triggers a Lambda function. The function checks whether a change is required and writes a change request item to a metadata table in DynamoDB to wait for approval. The network operator is notified via SNS (Simple Notification Service) and can then approve (or decline) the modification. Another Lambda function then makes the needed route table modifications for the transit account and for account A.
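The first half of that flow, turning a tagging event into a pending change request, can be sketched in plain Python. This is only an illustration and not the code shown in the session: the tag key `tgw-attach-cidr`, the `ChangeRequest` fields and the simplified event shape are assumptions, and the DynamoDB write and SNS notification are left out.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical tag key that the team owning account A would set.
ROUTE_TAG_KEY = "tgw-attach-cidr"


@dataclass(frozen=True)
class ChangeRequest:
    """Item that would be stored in the metadata table while awaiting approval."""
    account_id: str
    resource_id: str
    requested_cidr: str
    status: str = "PENDING_APPROVAL"


def extract_change_request(event: dict) -> Optional[ChangeRequest]:
    """Return a change request for a CreateTags event, or None if not relevant.

    The event shape is a simplified assumption modelled on a CloudTrail
    record delivered to a Lambda function.
    """
    detail = event.get("detail", {})
    if detail.get("eventName") != "CreateTags":
        return None
    params = detail.get("requestParameters", {})
    tags = {t["key"]: t["value"] for t in params.get("tagSet", {}).get("items", [])}
    resources = params.get("resourcesSet", {}).get("items", [])
    if ROUTE_TAG_KEY not in tags or not resources:
        return None
    return ChangeRequest(
        account_id=detail.get("recipientAccountId", "unknown"),
        resource_id=resources[0]["resourceId"],
        requested_cidr=tags[ROUTE_TAG_KEY],
    )


def review(request: ChangeRequest, approved: bool) -> ChangeRequest:
    """Record the network operator's decision on a pending request."""
    return replace(request, status="APPROVED" if approved else "DECLINED")
```

Keeping the decision logic free of AWS calls makes it easy to unit test; a real handler would wrap these functions with the DynamoDB and SNS calls.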

If you are interested, you can watch video from August 2019: https://pages.awscloud.com/AWS-Transit-Gateway-Reference-Architectures-for-Many-Amazon-VPCs_2019_0811-NET_OD.html

If you want to wait, I’m pretty sure that this re:Invent talk was also recorded and can be found on the AWS YouTube channel in a few weeks: https://www.youtube.com/user/AmazonWebServices

Fortifying web apps against bots and scrapers with AWS WAF – SEC357

Mr. Yuri Duchovny (Solution Architect, AWS) held the session. It was the most intensive session, with a lot to do and many architectural examples and usage scenarios on the demo screen. The AWS WAF service has got a shiny new UI in the AWS Console. AWS has also published a few new features in the last few weeks, e.g. managed rules that give more protection in a non-disruptive way. Before that, WAF itself did not have many predefined rules for protection: only XSS (cross-site scripting) and SQLi (SQL injection) were supported. All other rules needed to be configured manually, for example as regular expressions.

WAF is a service that should always be turned on for CloudFront distributions, Application Load Balancers (ALB) and API Gateway.

The workshop material is again public and can be found here: https://github.com/gtaws/ProtectWithWAF

Encryption options for AWS Direct Connect – NET405

Mr. Sohaib Tahir (Sr. Solutions Architect, AWS) from Seattle was the teacher in this session. It was more listening than doing because of the short time available. We attendees were a group of seven from the USA, Japan and Finland.

There were five options for encrypting a Direct Connect connection:

1. Private VIF (virtual interface) + application-layer TLS
2. Private VIF + virtual VPN appliances (can be in transit VPC)
3. Private VIF + detached VGW + AWS Site-to-site VPN (CloudHub functionality)
4. Public VIF + AWS Virtual Private Gateway (BGP, IPSec tunnel, BGP)
5. Public VIF + AWS Transit Gateway (BGP, IPSec tunnel, BGP) NEW!

It’s good to remember that a single VPN connection has a 1.25 Gbps limit, which can easily be hit with a DX connection and e.g. data-intensive migration jobs. The AWS recommendation is to use architecture number five if possible. It requires you to have your own Direct Connect connection, so you cannot use a shared-model connection from a third-party operator.
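To get a feel for what the 1.25 Gbps limit means in practice, here is a small back-of-the-envelope calculation. The 100 TB migration size and the 10 Gbps DX port speed are illustrative assumptions, not figures from the session, and the connection count assumes traffic could be spread evenly across several VPN connections.

```python
import math

VPN_LIMIT_GBPS = 1.25  # per-VPN-connection limit quoted in the session


def transfer_hours(data_tb: float, link_gbps: float) -> float:
    """Hours needed to move data_tb terabytes (decimal) at link_gbps."""
    bits = data_tb * 8e12  # 1 TB = 10^12 bytes = 8 * 10^12 bits
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


def vpn_connections_to_fill(dx_gbps: float) -> int:
    """Number of VPN connections needed to match a DX port of dx_gbps,
    assuming traffic can be spread evenly across them."""
    return math.ceil(dx_gbps / VPN_LIMIT_GBPS)


# Hypothetical 100 TB migration over one VPN connection vs. a full 10 Gbps port.
print(round(transfer_hours(100, VPN_LIMIT_GBPS), 1))  # 177.8 hours
print(round(transfer_hours(100, 10), 1))              # 22.2 hours
print(vpn_connections_to_fill(10))                    # 8
```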

Yesterday AWS published cross-region VPC connectivity via Transit Gateway. During the session Mr. Tahir started to demonstrate this new feature ad hoc, but we ran out of time.

My Monday at AWS re:Invent 2019

I started the day with breakfast at Denny’s. It was nice to have a typical (I think) American breakfast. Thanks to Mr. Heikki Hämäläinen for the company. By the way, all attendees from Solita are wearing the bright red hoodies shown in the picture, thanks to our Cloud Ambassador Anton Floor. The hoodie makes it a lot easier to spot a colleague in a crowded place. Okay, let’s start going through my actual sessions.

How NextRoll leverages AWS Batch for daily business operations – CMP311

The advertising company’s Tech Lead, Mr. Roozbeh Zabihollahi, briefly described their journey with the AWS Batch service. If I remember correctly, they use about 5000 CPU years, which is a huge amount of computing power. It was nice to hear that NextRoll allows its teams to choose quite freely which services they want to use. Nowadays Mr. Zabihollahi sees more and more teams looking into AWS Batch as a promising choice, rather than Hadoop or Spark.

Mr Zabihollahi believes that AWS Batch is good for several things:

AWS Batch is good for

If you are considering starting to use AWS Batch, you should be familiar with at least these challenges:

Mr. Steve Kendrex (Sr. Technical Product Manager, AWS) presented the road map of the AWS Batch service. Support for Fargate (a.k.a. serverless container service) is coming, but Steve could not provide details to a wide audience. My personal guess is that Spot support for Fargate is coming soon, which would provide a key cost-efficiency factor for batch operations.

Build self-service registration with facial recognition – ARC320

My first builder session this year was about integrating facial recognition into guest registration for an event. Four other attendees and I were guided through this interesting topic by Mr. Alan Newcomer (Solutions Architect, AWS). Mr. Newcomer used to live near Las Vegas, which was interesting to hear about.

Each builder session starts with a short queue for the right table, at which you have hopefully reserved a spot beforehand:

The hall has multiple tables, each with 7 chairs (one for the teacher and 6 for participants) and a screen for guidance purposes.

Typically the teacher provides a website which has all the information required to do the exercise. In addition, the teacher provides a unique password for each participant, e.g. for AWS Console login. After that each participant can start doing the exercise by themselves. The teacher provides help whenever needed. You need to keep a good pace all the time to be able to complete the whole exercise.

During the recognition session we built an application which had three main functionalities: user registration, RSVP one day before the event and finally registering the user at the event via facial recognition. You can actually look up the workshop material yourself here: http://regappworkshop.com/

Managing DNS across hundreds of VPCs – NET411

This was my second chalk talk today. It started very well because right at the beginning the audience heard real-life problems from different attendees. The chalk talk was guided by Mr. Matt Johnson (Manager, Solutions Architecture, WWPS, AWS) and Mr. Gavin McCullagh (Principal System Development Engineer, AWS). They did extremely well.

We were reminded that support for overlapping private zones was published recently. It enables autonomous, structured DNS management in a multi-account environment. For more information go to: https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-route-53-now-supports-overlapping-namespaces-for-private-hosted-zones/

During the session we looked at four different architectures for sharing DNS information with multiple VPCs (~accounts). Number four, “Share and Associate Zones and Rules”, was the most interesting; it suits a massive number of private DNS zones and VPCs. It has a hub account for outbound DNS traffic to the corporate network and it uses private zone association between VPCs. The association does not yet have native CloudFormation support, but there are several ways to handle it, e.g. using CloudFormation custom resources or a custom Lambda function.
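As a rough idea of what such a custom Lambda function might do for a cross-account association, here is a minimal sketch. It was not shown in the session. It assumes two boto3 Route 53 clients, one with credentials for the account that owns the private zone and one for the account that owns the VPC; the function name and parameters are made up for illustration.

```python
def associate_private_zone(zone_owner_r53, vpc_owner_r53,
                           hosted_zone_id: str, vpc_id: str, vpc_region: str) -> None:
    """Associate a VPC in one account with a private hosted zone in another.

    Both client arguments are assumed to be boto3 Route 53 clients, e.g.
    boto3.client("route53") created from the respective account's session.
    """
    vpc = {"VPCRegion": vpc_region, "VPCId": vpc_id}

    # 1. Zone owner account: authorise the foreign VPC to associate.
    zone_owner_r53.create_vpc_association_authorization(
        HostedZoneId=hosted_zone_id, VPC=vpc)

    # 2. VPC owner account: perform the actual association.
    vpc_owner_r53.associate_vpc_with_hosted_zone(
        HostedZoneId=hosted_zone_id, VPC=vpc)

    # 3. Zone owner account: the authorisation is no longer needed.
    zone_owner_r53.delete_vpc_association_authorization(
        HostedZoneId=hosted_zone_id, VPC=vpc)
```

Passing the clients in as arguments keeps the function independent of how cross-account credentials are obtained, which in practice would typically be an assumed role per account.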

One major feature request was that AWS should support DNS query logging (query and response) in the VPC. The audience wanted to receive the logging information to CloudWatch log groups. The logging is needed for security/audit and debugging purposes.

Processing AWS Ground Station data in AWS – NET409

My second builder session sounded very fancy: handling data from satellites. The attendees had very different levels of experience with AWS and with satellites, everything from one to five in both topics. After the session I needed to update my CV…

There are a few open satellites in the sky. AWS Ground Station can listen to them and deliver the received data to your AWS account. The data link between the Ground Station and your VPC is made via an elastic network interface (ENI).

In the example case we received a 15 Mbps stream for 15 minutes, which was the period during which the satellite was visible to the Ground Station’s antenna system. The stream from Ground Station always needs to be received by the Kratos DataDefender software, which parses the UDP traffic. The Ground Station traffic does not arrive in the right order and sometimes has missing pieces, which DataDefender handles.
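Two small illustrations of the numbers and the problem described above. The first function works out how much data one such pass produces. The second is only a toy version of the reordering idea, using made-up `(sequence_number, payload)` tuples; it is not how DataDefender works internally.

```python
def pass_volume_megabytes(rate_mbps: float, minutes: float) -> float:
    """Data volume of one satellite pass in megabytes (decimal)."""
    megabits = rate_mbps * minutes * 60
    return megabits / 8


def reorder(packets):
    """Sort (sequence_number, payload) tuples and report missing sequence numbers."""
    if not packets:
        return [], []
    ordered = sorted(packets, key=lambda p: p[0])
    seen = {seq for seq, _ in ordered}
    first, last = ordered[0][0], ordered[-1][0]
    missing = [seq for seq in range(first, last + 1) if seq not in seen]
    return ordered, missing


print(pass_volume_megabytes(15, 15))  # 1687.5, i.e. roughly 1.7 GB per pass

ordered, missing = reorder([(3, "c"), (1, "a"), (4, "d")])
print(ordered)  # [(1, 'a'), (3, 'c'), (4, 'd')]
print(missing)  # [2]
```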

The data stream was analyzed in a few phases via an S3 bucket and EC2 instances. The final product was a precise TIFF picture of the satellite’s view as it passed the Ground Station antenna. The resolution was about 1 megapixel per kilometer.

Nordics Customer Reception

The evening ended with the pleasant and well-organised Nordics Customer Reception at the Barrymore. Solita was one of the sponsors of the event. From the terrace we had a great view towards the Encore hotel:

Would you like to hear more about what happens at re:Invent 2019? Sign up to our team’s WhatsApp group to chat with them, and register for the What happens in Vegas won’t stay in Vegas webinar to hear a full summary after the event.

Kicking the tires of AWS Textract

Amazon Web Services' new ML/AI service Amazon Textract came to general availability and I gave it a quick test.

AWS has multiple services in the AI/ML field. These include, for example, Amazon Comprehend for text analysis, Amazon Forecast for predicting the future from a set of data and Amazon Rekognition for extracting information from pictures. Amazon Textract is a new service in this field and it was just announced to be generally available. Textract is a service which does Optical Character Recognition (OCR) on multiple file formats and stores the output in a more usable format, JSON.

At the time of release Amazon Textract can detect Latin-script characters from the standard English alphabet and ASCII symbols. It accepts PNG, JPEG and PDF as input files. I would say that there are enough input formats, but I would have wanted to see more languages available. Of course Finnish is not something that I expect to see anytime soon, or at all. Textract is currently available in three US regions and in Ireland in Europe.

Analyse test

Textract makes it easy to test what kind of results you can get with it. When you open the Textract service, you first see a sample document created by AWS. This helps you get started and gives some idea of how to use it. Documents can be uploaded directly from the console, which automatically creates an S3 bucket to store them.

Textract sample document

 

I did tests with multiple files and file formats to see how it performs but used one PDF document as an example for this post. The PDF I used was AWS Landing Zone immersion day information sheet because it was handily available and had text, table and image in it. On the left in the picture, we can see again the areas where Textract has identified content and on the right is the extraction. From this kind of clear and simple document it seems to have picked up everything easily. It took around 10 seconds for this document to be analysed.

Test document

 

I would say that Textract handled all the files I gave it without too much trouble. The view of the file and the places where it finds text do not always align, even though the text output is correct. This happened for example with my CV, where the visual representation was off in many places.

Visual analyse sample

Results

Outputs can also be downloaded directly from the console in a zip file and it will provide these four files.

  • apiResponse.json
  • tables.csv
  • keyValues.csv
  • rawText.txt

Tables.csv, keyValues.csv and rawText.txt are all quite clear. Tables holds all the tables and fields Textract found from the document and keyValues.csv holds form data. This is the table that was found in the document. It has been correctly read and put in table. Interestingly, it has also added empty columns for the long empty spaces between texts.

Test document table

 

rawText.txt contains the extracted text from the document in a raw format. It has all the text in unedited form, all the words just one after another.

H Automated Landing Zone Immersion Day Please join the AWS Nordics Partner team for an immersion day for the Automated Landing Zone. Learn how to set up an account structure according to best practices with the help of the ALZ solution. After you have performed this training, you will get access to the ALZ solution tools and materials sO you can use when setting up customer environments. This training will also be helpful for those of you interested in the AWS Control Tower service that will be available later this year. WHEN: April 1st 2019 (no joke) WHERE: AWS Office at Kungsgatan 49 in Stockholm Preliminary agenda 10:00 10:30 Welcome and Registration 10:30 10:40………

Textract also gives a full output of the process. This information is in JSON format and contains all the information about the findings. There is detailed information about what was found and where, along with a confidence percentage for each finding. This is a very large JSON document even for a small PDF, almost as big a file as the original PDF.

    {
      "BlockType": "WORD",
      "Confidence": 99.962646484375,
      "Text": "account",
      "Geometry": {
        "BoundingBox": {
          "Width": 0.0724315419793129,
          "Height": 0.012798813171684742,
          "Left": 0.448628693819046,
          "Top": 0.37925970554351807
        },
        "Polygon": [
          {
            "X": 0.448628693819046,
            "Y": 0.37925970554351807
          },
          {
            "X": 0.5210602283477783,
            "Y": 0.37925970554351807
          },
          {
            "X": 0.5210602283477783,
            "Y": 0.39205852150917053
          },
          {
            "X": 0.448628693819046,
            "Y": 0.39205852150917053
          }
        ]
      },
      "Id": "f1c9bdeb-f76a-44ff-8037-6cb746d5613d",
      "Page": 1
    },
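A block like the one above is straightforward to post-process. The sketch below filters words by confidence and converts the relative bounding box into pixel coordinates. The function names, the 90 % threshold and the 1000 × 1000 pixel page size are illustrative assumptions; the geometry values are ratios of the page width and height.

```python
def confident_words(blocks, min_confidence=90.0):
    """Return the text of WORD blocks whose confidence meets the threshold."""
    return [
        block["Text"]
        for block in blocks
        if block.get("BlockType") == "WORD"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def to_pixels(bounding_box, page_width, page_height):
    """Convert a relative bounding box (0..1 ratios) to pixel values."""
    return {
        "left": round(bounding_box["Left"] * page_width),
        "top": round(bounding_box["Top"] * page_height),
        "width": round(bounding_box["Width"] * page_width),
        "height": round(bounding_box["Height"] * page_height),
    }


# The WORD block from the sample output above, shortened to the fields used here.
block = {
    "BlockType": "WORD",
    "Confidence": 99.962646484375,
    "Text": "account",
    "Geometry": {
        "BoundingBox": {
            "Width": 0.0724315419793129,
            "Height": 0.012798813171684742,
            "Left": 0.448628693819046,
            "Top": 0.37925970554351807,
        }
    },
}

print(confident_words([block]))  # ['account']
print(to_pixels(block["Geometry"]["BoundingBox"], 1000, 1000))
# {'left': 449, 'top': 379, 'width': 72, 'height': 13}
```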

 

Conclusion

Textract is a needed addition to the AWS AI/ML service family and fills a gap in the analysis tools. Textract promises to read English from multiple file formats and seems to do that well. All tests with PDFs and pictures were successful. Of course one wouldn’t normally use the service like this, uploading single files manually. Textract is supported in the AWS CLI and in both the Java and Python SDKs. That makes it possible to have, for example, automatic triggers on an S3 bucket that launch Textract to do its thing when new files are uploaded. Overall it is a nice service which will probably be very useful for text analysis use cases.
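A minimal sketch of such a trigger, written as a function a Lambda handler could call for an S3 upload notification. It assumes a boto3 Textract client is passed in (for example `boto3.client("textract")`) and uses the synchronous `detect_document_text` call; multi-page PDFs would need the asynchronous API (`start_document_text_detection`) instead. The function name is made up for illustration.

```python
from urllib.parse import unquote_plus


def handle_s3_event(event, textract_client):
    """Run text detection for every object in an S3 event notification.

    Returns a dict mapping each object key to the list of detected text lines.
    """
    results = {}
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = textract_client.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        results[key] = [
            block["Text"]
            for block in response.get("Blocks", [])
            if block["BlockType"] == "LINE"
        ]
    return results
```

In a Lambda function the handler would simply create the client once outside the handler and call `handle_s3_event(event, client)`.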

AWS Summit Berlin 2019

My thoughts on the Berlin AWS Summit 2019

What is an AWS Summit?

AWS Summits are small, free events that happen in various cities around the world. They are “satellite” events of re:Invent, which takes place in Las Vegas every year in November. If you cannot attend re:Invent, you should definitely try to attend an AWS Summit.

Berlin AWS Summit

I have had the pleasure of attending the Berlin AWS Summit for 4 years in a row.

Werner Vogels

The event was a two-day event held on 26-27 February 2019 in Berlin. The first day was more focused on management and new cloud users, and the second day had more deep-dive technical sessions. The event started with a keynote held by Werner Vogels, CTO of Amazon. This year the Berlin AWS Summit seemed to be very focused on topics around Machine Learning and AI. I also think there were more people attending this year than in 2018 or 2017.

You will always find other sessions that are interesting to you, even if ML&AI are currently not on your radar. For example I attended the session about “Observability for Modern Applications” that showed how to use AWS X-Ray and App Mesh to monitor and control large scale microservices running in AWS EKS or similar. App Mesh is currently in public preview and it looks very interesting!

The partners

Every year there are a lot of stands where various partners showcase their products to passers-by. You can also participate in raffles at the cost of your email address (and the obvious marketing emails that will ensue). Most of them also hand out free swag such as stickers or pens.

Solita Oy is an AWS Partner, please check our qualifications on the AWS Partners page.

Differences to previous years

This year there was no AWS Certified lounge which was a surprise to me. It is a restricted area for people who have an active AWS Certification where they can network with other certified people. I hope it will return next year again.

 

Thank you for the event!

Thank you and goodbye

Choosing provider for cloud

Sticking with your old habits and misconceptions is dangerous. Choosing a cloud partner is something that should be done with care.

There is nowadays a plethora of cloud operators to choose from and almost everyone has their favourite. AWS is the oldest and probably has the most features and services, Azure is the go-to place when running Microsoft-related applications or workloads, and if you are looking into using AI or ML you go with Google. This has been a common misconception.

In reality choosing your cloud is not so black and white. Providers who came into the game a bit later than Amazon have been investing heavily in development and are catching up fast. Amazon hasn’t been resting on the AI or ML front either. And there is also Alibaba, the Amazon of China, which is now pushing hard into the West and seems to focus on AI and ML.

Relying on this kind of categorising is dangerous as cloud operator strengths could change quite quickly and it might limit your capability to operate efficiently.

This is where you need to focus. Map your main goals for using the cloud. Check the available options yourself, if your skill set is up to date with all of them. This might be almost impossible, as cloud providers are pushing out new services almost daily. So I highly recommend that you move on to the most important step and choose a partner to help you.

Choosing the right partner can bring serious cost savings and accelerate your development. Do your homework and spend some time benchmarking potential partners. Make sure your partner has enough real-life experience of building for and running in the cloud.