Friday, February 19, 2021

 AWS Athena Table auto partition using 'ALTER TABLE'


Hello Geeks,

The program below creates Athena table partitions automatically using the 'ALTER TABLE' command, which is much faster than 'MSCK REPAIR TABLE', especially when dealing with large tables. It uses an AWS Lambda function (a server-less solution) and DynamoDB (to store the latest partition information for each table).

So, let's start.

First we will create the DynamoDB table:

It's very straightforward: go to the AWS DynamoDB console and choose Create table.

> Table name* -> xxxxx  <this table name has to be provided as a Lambda env variable (DYNAMO_TABLE_NAME)>

> Primary key*

    -> tab_name  -> String <put any string like test> <this attribute is used as the primary/partition key>

The following attributes are written by the Lambda code for each item, so they do not need to be declared in the table schema:

> day -> Number (default 0)

> month -> Number (default 0)

> year -> Number (default 0)
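
If you prefer to create the table from code rather than the console, here is a minimal boto3 sketch; the table name 'athena_partition_tracker' is only an example and must match the DYNAMO_TABLE_NAME Lambda variable:

import boto3

dynamodb = boto3.client('dynamodb')

# 'athena_partition_tracker' is a placeholder name; use your own.
dynamodb.create_table(
    TableName='athena_partition_tracker',
    AttributeDefinitions=[{'AttributeName': 'tab_name', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'tab_name', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST'   # on-demand capacity; switch to provisioned if you prefer
)
# year/month/day are plain item attributes, so they do not need to be declared here.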

 --------------------------------------------

> AWS Lambda function

> Function name: any

> Runtime: Python 3.7

> IAM permissions required: AmazonAthenaFullAccess, CloudWatchFullAccess, AmazonS3FullAccess (or at least access to the Athena query result location)

 

--------------

import boto3
import os
import datetime
from botocore.exceptions import ClientError

# Boto3 clients/resources
athena_client = boto3.client('athena')
dynamodb = boto3.resource('dynamodb')

# Environment variables
athena_table_str = os.environ['ATHENA_TABLES']          # comma-separated list of Athena tables
athena_table_list = athena_table_str.split(',')
athena_db_name = os.environ['ATHENA_DB_NAME']
query_output = os.environ['QUERY_OUTPUT']                # S3 location for Athena query results
delta = int(os.environ['DELTA'])                         # number of daily partitions to add per run
dynamo_table_name = os.environ['DYNAMO_TABLE_NAME']

# Initial year, month and day (used only the first time a table is seen)
init_y = int(os.environ['INIT_YEAR'])
init_m = int(os.environ['INIT_MONTH'])
init_d = int(os.environ['INIT_DAY'])


def get_partition_date(tablename):
    """Return the stored partition pointer for a table, creating it with the
    initial values if the table has never been processed before."""
    dynamo_table = dynamodb.Table(dynamo_table_name)
    try:
        response = dynamo_table.get_item(Key={'tab_name': tablename})
        if 'Item' not in response:
            dynamo_table.put_item(
                Item={
                    'tab_name': tablename,
                    'year': init_y,
                    'month': init_m,
                    'day': init_d
                })
            response = dynamo_table.get_item(Key={'tab_name': tablename})
        return response['Item']
    except ClientError as e:
        print(e.response['Error']['Message'])
        raise


def lambda_handler(event, context):
    for t in athena_table_list:
        d_obj = get_partition_date(t)

        # DynamoDB returns numbers as Decimal, so cast them to int
        y = int(d_obj['year'])
        m = int(d_obj['month'])
        d = int(d_obj['day'])

        start_date = datetime.date(y, m, d)
        day_delta = datetime.timedelta(days=1)

        query_context = {'Database': athena_db_name}
        result_config = {'OutputLocation': query_output}

        try:
            # Add one partition per day, starting from the stored date.
            # "if not exists" makes re-runs harmless if a partition is already there.
            for i in range(delta):
                part_date = start_date + i * day_delta
                sql = "alter table {} add if not exists partition (created_dt='{}')".format(t, part_date)
                athena_client.start_query_execution(
                    QueryString=sql,
                    QueryExecutionContext=query_context,
                    ResultConfiguration=result_config)

            # Store the day after the last created partition as the next start date
            next_start = start_date + delta * day_delta
            dynamo_table = dynamodb.Table(dynamo_table_name)
            dynamo_table.put_item(
                Item={
                    'tab_name': t,
                    'year': next_start.year,
                    'month': next_start.month,
                    'day': next_start.day
                })
        except Exception as e:
            print(e)
            print('Error')
            raise

--------------

Required environment variables:-

ATHENA_DB_NAME: <the database name used/created in the Athena console>

ATHENA_TABLES: comma-separated list of Athena tables, e.g. table1,table2,table3,table4

DELTA: an integer, the number of days' partitions to create per run. Ex: 4

DYNAMO_TABLE_NAME: <create the DynamoDB table first and put its name here>

INIT_DAY: <integer value:> the day from which partition creation starts

INIT_MONTH: <integer value:> the month from which partition creation starts

INIT_YEAR: <integer value:> the year from which partition creation starts

QUERY_OUTPUT: Athena query result location <to find it, go to the Athena console, click Settings and look for "Query result location">
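
To sanity-check the function before deploying, you can set the variables and invoke the handler locally. A minimal sketch, assuming the code above is saved as lambda_function.py and your AWS credentials are configured; all values below are placeholders:

import os

# Placeholder values; replace with your own resources.
os.environ.update({
    'ATHENA_DB_NAME': 'my_athena_db',
    'ATHENA_TABLES': 'table1,table2',
    'DELTA': '4',
    'DYNAMO_TABLE_NAME': 'athena_partition_tracker',
    'INIT_YEAR': '2021', 'INIT_MONTH': '1', 'INIT_DAY': '1',
    'QUERY_OUTPUT': 's3://my-athena-results/auto-partition/'
})

import lambda_function  # the module reads the env variables at import time

lambda_function.lambda_handler({}, None)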

Sunday, November 17, 2019

How to Upgrade Amazon Aurora MySQL from 5.6 to 5.7

Upgrade AWS MySQL Aurora RDS version from 5.6 to 5.7 with minimal downtime


== Source : MySQL 5.6-compatible, Aurora 1.17.3
== Target : MySQL 5.7-compatible, Aurora 2.05.0

Full backup: we will use the latest backup snapshot of the source.
Incremental changes: DMS or native MySQL replication (native MySQL replication is recommended).


Kindly look into the prerequisites before proceeding with the migration. Just to list a few:
== Enable binary logs on the master (requires a reboot).

    --  enable binary logs by setting the "binlog_format" parameter in the cluster parameter group
    --  confirm by executing "show binary logs" on the source
 
== Increase the binary log retention on the master so that binlogs are not purged before they are replicated to the slave cluster.

    -- call mysql.rds_show_configuration;
    -- call mysql.rds_set_configuration('binlog retention hours', 240);     #10 Days 

== Ensure proper connectivity between the clusters. Replication is a "pull" process, and it may happen over the public network if the clusters are not in the same VPC.
    -- create the replication user on the source
mysql> CREATE USER 'repl_user'@'<domain_name>' IDENTIFIED BY '<password>';
mysql> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user'@'<domain_name>';

You may refer to the MySQL document below to identify the points to consider before setting up replication between different versions of MySQL.

[+] MySQL replication compatibility: https://dev.mysql.com/doc/refman/5.7/en/replication-compatibility.html

If you consider AWS DMS, you have to review the limitations of using a MySQL-based engine as source and target for DMS and see if they affect your use case.
[+] Limitations on Using a MySQL Database as a Source for AWS DMS : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html#CHAP_Source.MySQL.Limitations 

[+] Limitations on Using a MySQL-Compatible Database as a Target for AWS Database Migration Service : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.MySQL.html#CHAP_Target.MySQL.Limitations
 

Steps:-
Phase 1)
(i) Restore the latest snapshot of the source and select the required DB engine version (Aurora (MySQL 5.7) 2.05.0); also enable log exports (error log, general log).
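
If you prefer to script the restore, here is a minimal boto3 sketch; the cluster/instance identifiers, snapshot name, instance class and engine version string are placeholders to adapt to your environment:

import boto3

rds = boto3.client('rds')

# All identifiers below are placeholders.
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier='aurora57-target-cluster',
    SnapshotIdentifier='aurora56-source-latest-snapshot',
    Engine='aurora-mysql',                        # Aurora MySQL 5.7-compatible engine
    EngineVersion='5.7.mysql_aurora.2.05.0',
    EnableCloudwatchLogsExports=['error', 'general']
)

# The restored cluster has no instances yet; add at least one writer instance.
rds.create_db_instance(
    DBInstanceIdentifier='aurora57-target-instance-1',
    DBClusterIdentifier='aurora57-target-cluster',
    DBInstanceClass='db.r5.large',                # example instance class
    Engine='aurora-mysql'
)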

(ii) Once the restore is complete, set up native replication between source and target.
    -- collect the MySQL binlog file name and position; you can get this from the target's Logs & Events tab,
or find it in the error/general log file.

(iii) On the target cluster, point it at the source and start replication:

CALL mysql.rds_set_external_master ('mydbinstance.123456789012.us-east-1.rds.amazonaws.com', 3306, 'repl_user', '<password>', 'mysql-bin-changelog.000031', 107, 0);
CALL mysql.rds_start_replication;

107 --> binlog position
0   --> SSL disabled (1 == SSL enabled)

(iv) Verify replication on the target: show slave status\G
    
 
Phase 2)

    - Application stops writes.
    - Stop replication (execute the command below from the target, i.e. the 5.7 cluster):

        -- CALL mysql.rds_stop_replication;
    - Update the DB endpoint:
       -- rename the 5.6 endpoint to an "old" name and rename the 5.7 cluster to the same endpoint that is in the application config (this can be modified directly from the AWS RDS console; a reboot is needed). A boto3 sketch of the rename follows after this list.
    - Start the application.

    OR

    - Start the application with the new endpoint.
    - Stop replication.
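
A minimal boto3 sketch of the endpoint swap by renaming the cluster identifiers (the cluster endpoint DNS name is derived from the identifier); the identifiers are placeholders:

import boto3

rds = boto3.client('rds')

# Free up the old name first; wait for this rename to finish before reusing it.
rds.modify_db_cluster(
    DBClusterIdentifier='aurora56-source-cluster',
    NewDBClusterIdentifier='aurora56-source-cluster-old',
    ApplyImmediately=True
)

# Give the 5.7 cluster the name (and therefore the endpoint) the application already uses.
rds.modify_db_cluster(
    DBClusterIdentifier='aurora57-target-cluster',
    NewDBClusterIdentifier='aurora56-source-cluster',
    ApplyImmediately=True
)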

Phase 3)
    - Release (delete) the old 5.6 Aurora MySQL cluster.

Wednesday, May 11, 2016

How to reset timezone in Linux/Centos server:

There are two ways to do this:

1) cp /usr/share/zoneinfo/Singapore /etc/localtime
*2) ln -s /usr/share/zoneinfo/Singapore /etc/localtime
* When creating the soft link, make sure a localtime file does not already exist under /etc; remove or rename the existing localtime file first.
Update Confluence (wiki) page by node.js

Getting started.....

# mkdir  -p /usr/local/wiki_update_folder
Install confluence-api via npm:
# npm install confluence-api

# vi update_wiki_page_git_version.js

var child = require('child_process');
var Confluence = require("confluence-api");
var config = {
    username: "test-user",    // your confluence userid
    password: "test-pass",    // your confluence password
    baseUrl:  "https://<your_wiki_url>/wiki"
};

var confluence = new Confluence(config);
// Fetch the current page to get its version number, then push new content
// with the version incremented by one (Confluence requires this).
confluence.getContentByPageTitle("SPACE_NAME", "PAGE_TITLE", function(err, data) {
    var ver_no = data.results[0].version.number;
    ver_no = ver_no + 1;
    child.exec("date", function(err, stdout, stderr) {
        confluence.putContent("SPACE_NAME", "PAGE_ID", ver_no, "PAGE_TITLE", stdout, function(err, data) {});
    });
});



E.g.:
If you want to display a script's output in the wiki:
This can be very useful when you want to publish a status into Confluence, and it can be automated with cron or any other task scheduler.

var child = require('child_process');
var Confluence = require("confluence-api");
var config = {
    username: "testuser",
    password: "testpass",
    baseUrl:  "https://wiki-confluence-url/wiki"
};

var confluence = new Confluence(config);
confluence.getContentByPageTitle("WIKI_SPACE_NAME", "PAGE_TITLE", function(err, data) {
    var ver_no = data.results[0].version.number;
    ver_no = ver_no + 1;
    // Run the shell script and publish its output wrapped in <pre> tags.
    var shell_comm = "sh test.sh";
    child.exec(shell_comm, function(err, stdout, stderr) {
        confluence.putContent("WIKI_SPACE_NAME", "WIKI_PAGE_ID", ver_no, "PAGE_TITLE", "<pre>" + stdout + "</pre>", function(err, data) {});
    });
});

Read Recent Logs only

Hi All,

Sometimes we need to read only the recent entries from a log file that keeps growing; the source can be anything: system logs, application logs, database logs, device logs, etc.

My problem: read all recent slow queries from the MongoDB (NoSQL database) logs and get notified when a query breaches the threshold (400 ms).

Solution: it can be achieved with various scripting/programming languages like node.js; I opted for a Bash script.


#!/bin/bash

## Please update the values according to your system/environment.
## Set variables
logfilepath=/var/log/mongo                               # change the path as per your environment
name=mongod.log                                          # mongo log file name
logrecord=<path_to_capture_current_script_log>           # for recording this script's own log
MaxLLN_file=<file_to_hold_max_LLN_value>                 # for maintaining the Max Last Line Number
slow_query_log_record=<logged_only_slow_queries>         # file that accumulates only the slow queries
tmpfile=<temp_file_path>
mailnotification=<notification_file_to_be_monitored/displayed/forwarded>

## Set variables for mail
to='abc@xyz.com'
from='Sender:xxx'
sub='<subject_of_the_mail>'

## No need to change/update anything below. Please change cautiously, if needed.
## If the file holding the start line number exists, read it; otherwise start from line 1.
if [ -f $MaxLLN_file ];then
    echo -e "[`date +%F_%T`] $MaxLLN_file file exists, max LLN value is copied" >> $logrecord
    startlineno=`cat $MaxLLN_file`
else
    echo -e "[`date +%F_%T`] $MaxLLN_file doesn't exist so startlineno counter starts from 1" >> $logrecord
    startlineno=1
    echo 1 > $MaxLLN_file
fi
while true
do
    startlineno=`cat $MaxLLN_file`
    totallineno=`wc -l $logfilepath/$name|awk '{print $1}'`
    if [ $startlineno -le $totallineno ];then
        echo -e "[`date +%F_%T`] Slow log search starts between line $startlineno and $totallineno" >> $logrecord
        slowqueries_count=`sed -n "$startlineno","$totallineno"p $logfilepath/$name | grep -c 'ms'`
        if [ $slowqueries_count -ge 1 ];then
            echo -e "[`date +%F_%T`] $slowqueries_count slow queries found between line $startlineno and $totallineno" >> $logrecord
            sed -n "$startlineno","$totallineno"p $logfilepath/$name | grep 'ms' > $tmpfile
            cat $tmpfile >> $slow_query_log_record
            echo -e "Dear All,\n\nBelow are the slow query logs of Mongo server `hostname` (taking greater than or equal to 400ms)\n\n" > $mailnotification
            # Keep only the lines whose last field (duration in ms) is >= 400
            cat $tmpfile | while read line
            do
                if [ `echo $line | gawk '{print $NF-400}'` -ge 0 ];then echo $line >> $mailnotification
                fi
            done
            echo -e "\n\n Thanks & Best Regards,\nAnurag Bisht" >> $mailnotification
            echo "Subject: $sub" | cat - $mailnotification | sendmail -F"$from" -t "$to"
            echo > $mailnotification;
        else
            echo -e "[`date +%F_%T`] $slowqueries_count slow queries found between line $startlineno and $totallineno" >> $logrecord
        fi
        totallineno=`expr $totallineno + 1`
        echo $totallineno > $MaxLLN_file
        echo -e "[`date +%F_%T`] New search will start from `cat $MaxLLN_file`" >> $logrecord
        echo -e "---------------------------------" >> $logrecord
    fi
    sleep 60    # pause before the next scan so the loop does not spin constantly
done
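
For comparison, here is a minimal Python sketch of the same idea (remember how far the file has been read and scan only the new lines). The log path and the 400 ms threshold come from the Bash version above; the offset file path is an assumption, and it does not handle log rotation:

import os
import re
import time

LOG_FILE = '/var/log/mongo/mongod.log'        # same log as the Bash version
OFFSET_FILE = '/tmp/mongod_log.offset'        # placeholder file that remembers how far we have read
THRESHOLD_MS = 400

def read_offset():
    try:
        with open(OFFSET_FILE) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0

def write_offset(offset):
    with open(OFFSET_FILE, 'w') as f:
        f.write(str(offset))

while True:
    offset = read_offset()
    with open(LOG_FILE) as f:
        f.seek(offset)
        for line in f:
            # Match lines ending in a duration like "523ms"
            match = re.search(r'(\d+)ms\s*$', line)
            if match and int(match.group(1)) >= THRESHOLD_MS:
                print(line.rstrip())          # replace with the mail/alerting of your choice
        write_offset(f.tell())
    time.sleep(60)                            # scan for new lines once a minute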

Monday, July 27, 2015



If you have an open source MySQL database and want to move either all or some of the data into MongoDB (a NoSQL database), there may be other direct/indirect ways to do it, but you can also achieve it with the method below (add on) :)

Step 1.
Export your MySQL data (all or the required subset) as a CSV file.

select columns INTO OUTFILE '/path/to/csv'
   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
   LINES TERMINATED BY '\n' from table [where clause];

Step 2.
Import the CSV into MongoDB.

mongoimport -d dbname -c collname --type csv -f field1,field2,... --drop /path/to/csv


If your CSV file has a header row, add the --headerline option instead of -f.
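
If mongoimport is not an option, a minimal Python sketch of the same import using pymongo; the connection string, file path, database, collection and field names are all placeholders:

import csv
from pymongo import MongoClient

# Placeholder connection details, file path and field names.
client = MongoClient('mongodb://localhost:27017')
collection = client['dbname']['collname']

with open('/path/to/data.csv', newline='') as f:
    # If the CSV has no header row, pass the column names explicitly:
    # reader = csv.DictReader(f, fieldnames=['field1', 'field2'])
    reader = csv.DictReader(f)
    docs = [dict(row) for row in reader]

if docs:
    collection.drop()             # mirror mongoimport's --drop behaviour
    collection.insert_many(docs)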


Suppose you have a big table with more than 100 columns; how can you display those column names as a comma-separated list? Here is the way (note that GROUP_CONCAT output is truncated at group_concat_max_len, 1024 bytes by default, so you may need to raise it for very wide tables):

mysql> select GROUP_CONCAT(CONCAT("'",COLUMN_NAME,"'"))
from INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'your_table_name' AND
TABLE_SCHEMA = 'your_database_name';

Secure authentication in Mysql (For DBA Administration purpose)

Dear Folks,

For secure authentication to a MySQL database server, the "mysql_config_editor" tool (available from MySQL 5.6.6) lets you store authentication information in an encrypted login file, like below:


[devpcs-readonly]
user = dba
password = *****
host = 10.0.7.1
[devcpdb-readonly]
user = dba
password = *****
host = 10.0.2.1
[sitpcs-repl]
user = dba
password = *****
host = 172.31.2.14
[sitcpdb-repl]
user = dba
password = *****
host = 172.31.2.1

The details above can be printed with the command below:
mysql_config_editor print --all

Now let's see a demo of saving a new authentication connection:

shell> mysql_config_editor set --login-path=<name_to_which_you_want_to_connect> --host=<IP_of_remote/local_server> --user=username --password

Enter Password:xxxxxxxxx

E.g.
shell> mysql_config_editor set --login-path='uat-pcs-readonly' --host='10.22.0.1' --user='readonly' --password
Enter Password: xxxxxx

Now try to log in:

shell>mysql --login-path=uat-pcs-readonly 
mysql>

You can also use the --prompt option to avoid confusion when working with multiple mysql consoles.

Example-

shell>mysql --login-path=uat-pcs-readonly --prompt="\u@\h [\d@\p] mysql> " 
readonly@172.22.0.1 [(none)@3306] mysql>


In this way, we do not need to type or store the database password again and again.

Obj-
Do not save the MySQL password in a plain-text flat file (csv/txt/.sh/etc).

Pros-
This feature can be useful in many cases for DBA administration automation scripts.
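
For example, an automation script can run queries through the stored login path without ever embedding a password. A minimal Python sketch; the login path and query are just examples:

import subprocess

# 'uat-pcs-readonly' is the login path saved with mysql_config_editor above.
result = subprocess.run(
    ["mysql", "--login-path=uat-pcs-readonly", "-e", "SHOW DATABASES;"],
    capture_output=True, text=True, check=True
)
print(result.stdout)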

Cons-
Anybody with access to the login file can log into the MySQL database simply by passing the --login-path option, so do not store "root" credentials anywhere and restrict user privileges.

Reference-
http://dev.mysql.com/doc/refman/5.6/en/mysql-config-editor.html