The aws_ext Python package contains some useful functions (built on top of boto3) for managing some AWS services. At the moment it offers only some utilities for the AWS Glue Data Catalog.
Installation
pip install aws_ext
Usage
import boto3
import aws_ext
session = boto3.session.Session()
GLUE
from aws_ext import glue_databases
glue_client = session.client("glue")
Extracting tables with (too) many versions
glue_databases.get_tables_with_many_versions(glue_client, database_name="mydb", threshold=1)
Deleting old table versions
glue_databases.delete_old_tables_versions(glue_client, database_name="mydb", keep=1, dryrun=True)
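To give an idea of what a helper like get_tables_with_many_versions might do underneath, here is a minimal sketch written directly against the boto3 Glue client. It is not the aws_ext implementation: the function name tables_with_many_versions, the returned dictionary shape, and the reading of threshold as "strictly more than this many versions" are all assumptions made for illustration. Only the Glue paginators get_tables and get_table_versions are real boto3 calls.

```python
def tables_with_many_versions(glue_client, database_name, threshold=1):
    """Return {table_name: version_count} for tables with more than
    `threshold` versions.

    Hypothetical helper, not part of aws_ext. `glue_client` is expected to
    behave like boto3.session.Session().client("glue").
    """
    result = {}
    tables_paginator = glue_client.get_paginator("get_tables")
    versions_paginator = glue_client.get_paginator("get_table_versions")

    for page in tables_paginator.paginate(DatabaseName=database_name):
        for table in page["TableList"]:
            name = table["Name"]
            # Count versions across all result pages for this table.
            count = sum(
                len(version_page["TableVersions"])
                for version_page in versions_paginator.paginate(
                    DatabaseName=database_name, TableName=name
                )
            )
            # Assumption: "many" means strictly more than the threshold.
            if count > threshold:
                result[name] = count
    return result
```

With a real client it would be called as tables_with_many_versions(session.client("glue"), "mydb", threshold=1). Taking the client as a parameter keeps the function easy to exercise with a stub instead of a live AWS account.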

June 25, 2021 · 1 min · 67 words · Matteo Redaelli

Using a GraphQL gateway for backend services (Active Directory, AWS and Qliksense Api samples)

Complex web sites read and write data from/to several backend systems using different interfaces (SQL, SOAP, REST, RPC, ...). But it can be simpler and more useful to create a single endpoint and interface for all the backends. With GraphQL the frontend applications get from the backends only the fields they need, instead of receiving the fixed list of fields provided by the SOAP/REST services. I played with GraphQL and Walmart's Lacinia, implementing one GraphQL backend for LDAP/Active Directory and one for the Qliksense Repository REST API....

October 11, 2020 · 2 min · 253 words · Matteo Redaelli

Using Terraform for managing Amazon Web Services infrastructure

In the last few days I tested Terraform (Use Infrastructure as Code to provision and manage any cloud, infrastructure, or service) for managing some resources in an AWS cloud environment. In this sample I’ll create and schedule a Lambda function. Create a file "" with the content: variable "aws_region" {default = "eu-west-1"} variable "aws_profile" {default = ""} variable "project" {default = "my_project"} variable "vpc" {default= "XXXXX"} variable "subnets" {default= "XXXX"} variable "aws_account" {default= "XXX"} variable "security_groups" {default= "XXXX"} # variable "db_redshift_host" {default= ""} variable "db_redshift_port" {default= ""} variable "db_redshift_name" {default= ""} variable "db_redshift_username" {default= ""} variable "db_redshift_password" {default= ""} Create a file lambda....

September 30, 2019 · 2 min · 365 words · Matteo Redaelli

Scheduling AWS EMR clusters resize

Below is a sample of how to schedule an Amazon Elastic MapReduce (EMR) cluster resize. It is useful if you have a cluster that is less used at night or on weekends. I used a Lambda function triggered by a CloudWatch rule. Here is my Python Lambda function: import boto3, json MIN=1 MAX=10 def lambda_handler(event, context): region = event["region"] ClusterId = event["ClusterId"] InstanceGroupId = event["InstanceGroupId"] InstanceCount = int(event['InstanceCount']) if InstanceCount >= MIN and InstanceCount <= MAX: client = boto3....
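The excerpt above is cut off right after the client is created, so the following is only a sketch of how such a handler could be completed, not the author's original code. It assumes the resize is done with the boto3 EMR call modify_instance_groups. The client_factory parameter, the renamed MIN_INSTANCES and MAX_INSTANCES constants, and the returned dictionaries are hypothetical additions for illustration and testing.

```python
MIN_INSTANCES = 1
MAX_INSTANCES = 10


def lambda_handler(event, context, client_factory=None):
    """Resize one EMR instance group to the count given in the event.

    `client_factory` is a hypothetical hook (not in the original post) that
    allows a stub client to be injected; by default boto3.client is used.
    """
    region = event["region"]
    cluster_id = event["ClusterId"]
    instance_group_id = event["InstanceGroupId"]
    instance_count = int(event["InstanceCount"])

    # Refuse requests outside the allowed range instead of calling AWS.
    if not MIN_INSTANCES <= instance_count <= MAX_INSTANCES:
        return {"resized": False, "InstanceCount": instance_count}

    if client_factory is None:
        import boto3  # imported lazily so the module loads without boto3
        client_factory = boto3.client

    client = client_factory("emr", region_name=region)
    client.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[
            {
                "InstanceGroupId": instance_group_id,
                "InstanceCount": instance_count,
            }
        ],
    )
    return {"resized": True, "InstanceCount": instance_count}
```

The scheduled rule would then pass a constant JSON event with the keys region, ClusterId, InstanceGroupId and InstanceCount, for example one rule that scales down in the evening and another that scales up in the morning.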

July 22, 2019 · 1 min · 136 words · Matteo Redaelli

AWS Lake Formation: the new Datalake solution proposed by Amazon

AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis. A data lake enables you to break down data silos and combine different types of analytics to gain insights and guide better business decisions. However, setting up and managing data lakes today involves a lot of manual, complicated, and time-consuming tasks....

November 29, 2018 · 2 min · 332 words · Matteo Redaelli