Friday, February 19, 2021

 AWS Athena Table auto partition using 'ALTER TABLE'


Hello Geeks,

The program below creates Athena table partitions automatically using the 'ALTER TABLE' command, which is much faster than 'MSCK REPAIR TABLE', especially when dealing with large tables. It uses an AWS Lambda function (a server-less solution) and DynamoDB (to store the latest partition information for each table).

So, let's start.

First we will create the DynamoDB table:

It's very straightforward: go to the AWS DynamoDB console and choose Create table.

> Table name* -> xxxxx  <this table name has to be provided as a Lambda env variable (DYNAMO_TABLE_NAME)>

> Primary key*

    -> tab_name  -> String <put any string like test> <this attribute is used as the primary/partition key>

The following attributes are written by the Lambda code for each item, so they do not need to be declared in the table schema:

> day -> Number (default 0)

> month -> Number (default 0)

> year -> Number (default 0)
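
If you prefer to create the table from code rather than the console, here is a minimal boto3 sketch; the table name 'athena_partition_tracker' is only an example and must match the DYNAMO_TABLE_NAME Lambda variable:

import boto3

dynamodb = boto3.client('dynamodb')

# 'athena_partition_tracker' is a placeholder name; use your own.
dynamodb.create_table(
    TableName='athena_partition_tracker',
    AttributeDefinitions=[{'AttributeName': 'tab_name', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'tab_name', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST'   # on-demand capacity; switch to provisioned if you prefer
)
# year/month/day are plain item attributes, so they do not need to be declared here.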

 --------------------------------------------

> AWS Lambda function

> Function name: any

> Runtime: Python 3.7

> IAM permissions required: AmazonAthenaFullAccess, CloudWatchFullAccess, AmazonS3FullAccess (or at least access to the Athena query result location)

 

--------------

import boto3
import os
import datetime
from botocore.exceptions import ClientError

# Boto3 clients/resources
athena_client = boto3.client('athena')
dynamodb = boto3.resource('dynamodb')

# Environment variables
athena_table_str = os.environ['ATHENA_TABLES']          # comma-separated list of Athena tables
athena_table_list = athena_table_str.split(',')
athena_db_name = os.environ['ATHENA_DB_NAME']
query_output = os.environ['QUERY_OUTPUT']                # S3 location for Athena query results
delta = int(os.environ['DELTA'])                         # number of daily partitions to add per run
dynamo_table_name = os.environ['DYNAMO_TABLE_NAME']

# Initial year, month and day (used only the first time a table is seen)
init_y = int(os.environ['INIT_YEAR'])
init_m = int(os.environ['INIT_MONTH'])
init_d = int(os.environ['INIT_DAY'])


def get_partition_date(tablename):
    """Return the stored partition pointer for a table, creating it with the
    initial values if the table has never been processed before."""
    dynamo_table = dynamodb.Table(dynamo_table_name)
    try:
        response = dynamo_table.get_item(Key={'tab_name': tablename})
        if 'Item' not in response:
            dynamo_table.put_item(
                Item={
                    'tab_name': tablename,
                    'year': init_y,
                    'month': init_m,
                    'day': init_d
                })
            response = dynamo_table.get_item(Key={'tab_name': tablename})
        return response['Item']
    except ClientError as e:
        print(e.response['Error']['Message'])
        raise


def lambda_handler(event, context):
    for t in athena_table_list:
        d_obj = get_partition_date(t)

        # DynamoDB returns numbers as Decimal, so cast them to int
        y = int(d_obj['year'])
        m = int(d_obj['month'])
        d = int(d_obj['day'])

        start_date = datetime.date(y, m, d)
        day_delta = datetime.timedelta(days=1)

        query_context = {'Database': athena_db_name}
        result_config = {'OutputLocation': query_output}

        try:
            # Add one partition per day, starting from the stored date.
            # "if not exists" makes re-runs harmless if a partition is already there.
            for i in range(delta):
                part_date = start_date + i * day_delta
                sql = "alter table {} add if not exists partition (created_dt='{}')".format(t, part_date)
                athena_client.start_query_execution(
                    QueryString=sql,
                    QueryExecutionContext=query_context,
                    ResultConfiguration=result_config)

            # Store the day after the last created partition as the next start date
            next_start = start_date + delta * day_delta
            dynamo_table = dynamodb.Table(dynamo_table_name)
            dynamo_table.put_item(
                Item={
                    'tab_name': t,
                    'year': next_start.year,
                    'month': next_start.month,
                    'day': next_start.day
                })
        except Exception as e:
            print(e)
            print('Error')
            raise

--------------

Required environment variables:-

ATHENA_DB_NAME: <the database name used/created in the Athena console>

ATHENA_TABLES: comma-separated list of Athena tables, e.g. table1,table2,table3,table4

DELTA: an integer, the number of days' partitions to create per run. Ex: 4

DYNAMO_TABLE_NAME: <create the DynamoDB table first and put its name here>

INIT_DAY: <integer value:> the day from which partition creation starts

INIT_MONTH: <integer value:> the month from which partition creation starts

INIT_YEAR: <integer value:> the year from which partition creation starts

QUERY_OUTPUT: Athena query result location <to find it, go to the Athena console, click Settings and look for "Query result location">
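
To sanity-check the function before deploying, you can set the variables and invoke the handler locally. A minimal sketch, assuming the code above is saved as lambda_function.py and your AWS credentials are configured; all values below are placeholders:

import os

# Placeholder values; replace with your own resources.
os.environ.update({
    'ATHENA_DB_NAME': 'my_athena_db',
    'ATHENA_TABLES': 'table1,table2',
    'DELTA': '4',
    'DYNAMO_TABLE_NAME': 'athena_partition_tracker',
    'INIT_YEAR': '2021', 'INIT_MONTH': '1', 'INIT_DAY': '1',
    'QUERY_OUTPUT': 's3://my-athena-results/auto-partition/'
})

import lambda_function  # the module reads the env variables at import time

lambda_function.lambda_handler({}, None)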

Sunday, November 17, 2019

How to Upgrade Amazon Aurora MySQL from 5.6 to 5.7

Upgrade AWS MySQL Aurora RDS version from 5.6 to 5.7 with minimal downtime


== Source : MySQL 5.6-compatible, Aurora 1.17.3
== Target : MySQL 5.7-compatible, Aurora 2.05.0

Full backup: we will use the latest backup snapshot of the source.
Incremental changes: DMS or native MySQL replication (native MySQL replication is recommended).


Kindly look into the prerequisites before proceeding with the migration. Just to list a few:
== Enable binary logs on the master (requires a reboot).

    --  enable binary logs by setting the "binlog_format" parameter in the cluster parameter group
    --  confirm by executing "show binary logs" on the source
 
== Increase the binary log retention on the master so that binlogs are not purged before they are replicated to the slave cluster.

    -- call mysql.rds_show_configuration;
    -- call mysql.rds_set_configuration('binlog retention hours', 240);     #10 Days 

== Ensure proper connectivity between the clusters. Replication is a "pull" process, and it may happen over the public network if the clusters are not in the same VPC.
    -- create the replication user on the source
mysql> CREATE USER 'repl_user'@'<domain_name>' IDENTIFIED BY '<password>';
mysql> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user'@'<domain_name>';

You may refer to the MySQL document below to identify the points to consider before setting up replication between different versions of MySQL.

[+] MySQL replication compatibility: https://dev.mysql.com/doc/refman/5.7/en/replication-compatibility.html

If you consider AWS DMS, you have to review the limitations of using a MySQL-based engine as source and target for DMS and see if they affect your use case.
[+] Limitations on Using a MySQL Database as a Source for AWS DMS : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MySQL.html#CHAP_Source.MySQL.Limitations 

[+] Limitations on Using a MySQL-Compatible Database as a Target for AWS Database Migration Service : https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.MySQL.html#CHAP_Target.MySQL.Limitations
 

Steps:-
Phase 1)
(i) Restore the latest snapshot of the source and select the required DB engine version (Aurora (MySQL 5.7) 2.05.0); also enable log exports (error log, general log).
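
If you prefer to script the restore, here is a minimal boto3 sketch; the cluster/instance identifiers, snapshot name, instance class and engine version string are placeholders to adapt to your environment:

import boto3

rds = boto3.client('rds')

# All identifiers below are placeholders.
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier='aurora57-target-cluster',
    SnapshotIdentifier='aurora56-source-latest-snapshot',
    Engine='aurora-mysql',                        # Aurora MySQL 5.7-compatible engine
    EngineVersion='5.7.mysql_aurora.2.05.0',
    EnableCloudwatchLogsExports=['error', 'general']
)

# The restored cluster has no instances yet; add at least one writer instance.
rds.create_db_instance(
    DBInstanceIdentifier='aurora57-target-instance-1',
    DBClusterIdentifier='aurora57-target-cluster',
    DBInstanceClass='db.r5.large',                # example instance class
    Engine='aurora-mysql'
)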

(ii) Once the restore is complete, set up native replication between source and target.
    -- collect the MySQL binlog file name and position; you can get this from the target's Logs & Events tab,
or find it in the error/general log file.

(iii) On the target cluster, point it at the source and start replication:

CALL mysql.rds_set_external_master ('mydbinstance.123456789012.us-east-1.rds.amazonaws.com', 3306, 'repl_user', '<password>', 'mysql-bin-changelog.000031', 107, 0);
CALL mysql.rds_start_replication;

107 --> binlog position
0   --> SSL disabled (1 == SSL enabled)

(iv) Verify replication on the target: show slave status\G
    
 
Phase 2)

    - Application stops writes.
    - Stop replication (execute the command below from the target, i.e. the 5.7 cluster):

        -- CALL mysql.rds_stop_replication;
    - Update the DB endpoint:
       -- rename the 5.6 endpoint to an "old" name and rename the 5.7 cluster to the same endpoint that is in the application config (this can be modified directly from the AWS RDS console; a reboot is needed). A boto3 sketch of the rename follows after this list.
    - Start the application.

    OR

    - Start the application with the new endpoint.
    - Stop replication.
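
A minimal boto3 sketch of the endpoint swap by renaming the cluster identifiers (the cluster endpoint DNS name is derived from the identifier); the identifiers are placeholders:

import boto3

rds = boto3.client('rds')

# Free up the old name first; wait for this rename to finish before reusing it.
rds.modify_db_cluster(
    DBClusterIdentifier='aurora56-source-cluster',
    NewDBClusterIdentifier='aurora56-source-cluster-old',
    ApplyImmediately=True
)

# Give the 5.7 cluster the name (and therefore the endpoint) the application already uses.
rds.modify_db_cluster(
    DBClusterIdentifier='aurora57-target-cluster',
    NewDBClusterIdentifier='aurora56-source-cluster',
    ApplyImmediately=True
)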

Phase 3)
    - Release (delete) the old 5.6 Aurora MySQL cluster.

Wednesday, May 11, 2016

How to reset timezone in Linux/Centos server:

There are two ways to do this:

1) cp /usr/share/zoneinfo/Singapore /etc/localtime
*2) ln -s /usr/share/zoneinfo/Singapore /etc/localtime
* When creating the soft link, make sure a localtime file does not already exist under /etc; remove or rename the existing localtime file first.
Update Confluence (wiki) page by node.js

Getting started.....

# mkdir  -p /usr/local/wiki_update_folder
Install confluence-api via npm:
# npm install confluence-api

# vi update_wiki_page_git_version.js

var child = require('child_process');
var Confluence = require("confluence-api");
var config = {
    username: "test-user",    // your confluence userid
    password: "test-pass",    // your confluence password
    baseUrl:  "https://<your_wiki_url>/wiki"
};

var confluence = new Confluence(config);
// Fetch the current page to get its version number, then push new content
// with the version incremented by one (Confluence requires this).
confluence.getContentByPageTitle("SPACE_NAME", "PAGE_TITLE", function(err, data) {
    var ver_no = data.results[0].version.number;
    ver_no = ver_no + 1;
    child.exec("date", function(err, stdout, stderr) {
        confluence.putContent("SPACE_NAME", "PAGE_ID", ver_no, "PAGE_TITLE", stdout, function(err, data) {});
    });
});



E.g.:
If you want to display a script's output in the wiki:
This can be very useful when you want to publish a status into Confluence, and it can be automated with cron or any other task scheduler.

var child = require('child_process');
var Confluence = require("confluence-api");
var config = {
    username: "testuser",
    password: "testpass",
    baseUrl:  "https://wiki-confluence-url/wiki"
};

var confluence = new Confluence(config);
confluence.getContentByPageTitle("WIKI_SPACE_NAME", "PAGE_TITLE", function(err, data) {
    var ver_no = data.results[0].version.number;
    ver_no = ver_no + 1;
    // Run the shell script and publish its output wrapped in <pre> tags.
    var shell_comm = "sh test.sh";
    child.exec(shell_comm, function(err, stdout, stderr) {
        confluence.putContent("WIKI_SPACE_NAME", "WIKI_PAGE_ID", ver_no, "PAGE_TITLE", "<pre>" + stdout + "</pre>", function(err, data) {});
    });
});

Read Recent Logs only

Hi All,

Sometimes we need to read only the recent entries from a log file that keeps growing; the source can be anything: system logs, application logs, database logs, device logs, etc.

My problem: read all recent slow queries from the MongoDB (NoSQL database) logs and get notified when a query breaches the threshold (400 ms).

Solution: it can be achieved with various scripting/programming languages like node.js; I opted for a Bash script.


#!/bin/bash

## Please update the values according to your system/environment.
## Set variables
logfilepath=/var/log/mongo                               # change the path as per your environment
name=mongod.log                                          # mongo log file name
logrecord=<path_to_capture_current_script_log>           # for recording this script's own log
MaxLLN_file=<file_to_hold_max_LLN_value>                 # for maintaining the Max Last Line Number
slow_query_log_record=<logged_only_slow_queries>         # file that accumulates only the slow queries
tmpfile=<temp_file_path>
mailnotification=<notification_file_to_be_monitored/displayed/forwarded>

## Set variables for mail
to='abc@xyz.com'
from='Sender:xxx'
sub='<subject_of_the_mail>'

## No need to change/update anything below. Please change cautiously, if needed.
## If the file holding the start line number exists, read it; otherwise start from line 1.
if [ -f $MaxLLN_file ];then
    echo -e "[`date +%F_%T`] $MaxLLN_file file exists, max LLN value is copied" >> $logrecord
    startlineno=`cat $MaxLLN_file`
else
    echo -e "[`date +%F_%T`] $MaxLLN_file doesn't exist so startlineno counter starts from 1" >> $logrecord
    startlineno=1
    echo 1 > $MaxLLN_file
fi
while true
do
    startlineno=`cat $MaxLLN_file`
    totallineno=`wc -l $logfilepath/$name|awk '{print $1}'`
    if [ $startlineno -le $totallineno ];then
        echo -e "[`date +%F_%T`] Slow log search starts between line $startlineno and $totallineno" >> $logrecord
        slowqueries_count=`sed -n "$startlineno","$totallineno"p $logfilepath/$name | grep -c 'ms'`
        if [ $slowqueries_count -ge 1 ];then
            echo -e "[`date +%F_%T`] $slowqueries_count slow queries found between line $startlineno and $totallineno" >> $logrecord
            sed -n "$startlineno","$totallineno"p $logfilepath/$name | grep 'ms' > $tmpfile
            cat $tmpfile >> $slow_query_log_record
            echo -e "Dear All,\n\nBelow are the slow query logs of Mongo server `hostname` (taking greater than or equal to 400ms)\n\n" > $mailnotification
            # Keep only the lines whose last field (duration in ms) is >= 400
            cat $tmpfile | while read line
            do
                if [ `echo $line | gawk '{print $NF-400}'` -ge 0 ];then echo $line >> $mailnotification
                fi
            done
            echo -e "\n\n Thanks & Best Regards,\nAnurag Bisht" >> $mailnotification
            echo "Subject: $sub" | cat - $mailnotification | sendmail -F"$from" -t "$to"
            echo > $mailnotification;
        else
            echo -e "[`date +%F_%T`] $slowqueries_count slow queries found between line $startlineno and $totallineno" >> $logrecord
        fi
        totallineno=`expr $totallineno + 1`
        echo $totallineno > $MaxLLN_file
        echo -e "[`date +%F_%T`] New search will start from `cat $MaxLLN_file`" >> $logrecord
        echo -e "---------------------------------" >> $logrecord
    fi
    sleep 60    # pause before the next scan so the loop does not spin constantly
done
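
For comparison, here is a minimal Python sketch of the same idea (remember how far the file has been read and scan only the new lines). The log path and the 400 ms threshold come from the Bash version above; the offset file path is an assumption, and it does not handle log rotation:

import os
import re
import time

LOG_FILE = '/var/log/mongo/mongod.log'        # same log as the Bash version
OFFSET_FILE = '/tmp/mongod_log.offset'        # placeholder file that remembers how far we have read
THRESHOLD_MS = 400

def read_offset():
    try:
        with open(OFFSET_FILE) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0

def write_offset(offset):
    with open(OFFSET_FILE, 'w') as f:
        f.write(str(offset))

while True:
    offset = read_offset()
    with open(LOG_FILE) as f:
        f.seek(offset)
        for line in f:
            # Match lines ending in a duration like "523ms"
            match = re.search(r'(\d+)ms\s*$', line)
            if match and int(match.group(1)) >= THRESHOLD_MS:
                print(line.rstrip())          # replace with the mail/alerting of your choice
        write_offset(f.tell())
    time.sleep(60)                            # scan for new lines once a minute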

Monday, July 27, 2015



If you have an open source MySQL database and want to move either all or some of the data into MongoDB (a NoSQL database), there may be other direct/indirect ways to do it, but you can also achieve it with the method below (add on) :)

Step 1.
Export your MySQL data (all or the required subset) as a CSV file.

select columns INTO OUTFILE '/path/to/csv'
   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
   LINES TERMINATED BY '\n' from table [where clause];

Step 2.
Import the CSV into MongoDB.

mongoimport -d dbname -c collname --type csv -f field1,field2,... --drop /path/to/csv


If your CSV file has a header row, add the --headerline option instead of -f.
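
If mongoimport is not an option, a minimal Python sketch of the same import using pymongo; the connection string, file path, database, collection and field names are all placeholders:

import csv
from pymongo import MongoClient

# Placeholder connection details, file path and field names.
client = MongoClient('mongodb://localhost:27017')
collection = client['dbname']['collname']

with open('/path/to/data.csv', newline='') as f:
    # If the CSV has no header row, pass the column names explicitly:
    # reader = csv.DictReader(f, fieldnames=['field1', 'field2'])
    reader = csv.DictReader(f)
    docs = [dict(row) for row in reader]

if docs:
    collection.drop()             # mirror mongoimport's --drop behaviour
    collection.insert_many(docs)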


Suppose you have a big table with more than 100 columns; how can you display those column names as a comma-separated list? Here is the way (note that GROUP_CONCAT output is truncated at group_concat_max_len, 1024 bytes by default, so you may need to raise it for very wide tables):

mysql> select GROUP_CONCAT(CONCAT("'",COLUMN_NAME,"'"))
from INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'your_table_name' AND
TABLE_SCHEMA = 'your_database_name';

Secure authentication in Mysql (For DBA Administration purpose)

Dear Folks,

For secure authentication to a MySQL database server, the "mysql_config_editor" tool (available from MySQL 5.6.6) lets you store authentication information in an encrypted login file, like below:


[devpcs-readonly]
user = dba
password = *****
host = 10.0.7.1
[devcpdb-readonly]
user = dba
password = *****
host = 10.0.2.1
[sitpcs-repl]
user = dba
password = *****
host = 172.31.2.14
[sitcpdb-repl]
user = dba
password = *****
host = 172.31.2.1

The details above can be printed with the command below:
mysql_config_editor print --all

Now let's see a demo of saving a new authentication connection:

shell> mysql_config_editor set --login-path=<name_to_which_you_want_to_connect> --host=<IP_of_remote/local_server> --user=username --password

Enter Password:xxxxxxxxx

E.g.
shell> mysql_config_editor set --login-path='uat-pcs-readonly' --host='10.22.0.1' --user='readonly' --password
Enter Password: xxxxxx

Now try to log in:

shell>mysql --login-path=uat-pcs-readonly 
mysql>

You can also use the --prompt option to avoid confusion when working with multiple mysql consoles.

Example-

shell>mysql --login-path=uat-pcs-readonly --prompt="\u@\h [\d@\p] mysql> " 
readonly@172.22.0.1 [(none)@3306] mysql>


In this way, we do not need to type or store the database password again and again.

Obj-
Do not save the MySQL password in a plain-text flat file (csv/txt/.sh/etc).

Pros-
This feature can be useful in many cases for DBA administration automation scripts.
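
For example, an automation script can run queries through the stored login path without ever embedding a password. A minimal Python sketch; the login path and query are just examples:

import subprocess

# 'uat-pcs-readonly' is the login path saved with mysql_config_editor above.
result = subprocess.run(
    ["mysql", "--login-path=uat-pcs-readonly", "-e", "SHOW DATABASES;"],
    capture_output=True, text=True, check=True
)
print(result.stdout)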

Cons-
Anybody with access to the login file can log into the MySQL database simply by passing the --login-path option, so do not store "root" credentials anywhere and restrict user privileges.

Reference-
http://dev.mysql.com/doc/refman/5.6/en/mysql-config-editor.html