Wednesday, August 3, 2016

WebLogic and SaltStack. Part 1. Stop and Start applications

In my current project the goal is to use SaltStack as the default provisioning tool for all products and all environments. I'm not going to give a course on SaltStack here, but in short it is like Puppet, only newer, hotter, whatever. To get an idea, have a look at this excellent tutorial.

Purpose

The requirements were simple: we want to stop and start WebLogic applications from one central location. Once you get started with Salt and get a feeling for its power, you'll start daydreaming about automating all tedious WebLogic tasks: installation, configuration of domains, deployments, datasources, JMS resources, and so on. But let's start small.


Environment

Since you have completed the tutorial, you now know that each server runs an agent (the minion) which receives commands from the Salt master.

Salt-ssh

A minion is available for the most common operating systems, but not for Solaris, which is what we are using right now. Luckily the smart people of Salt came up with an agentless alternative: salt-ssh. With it you can run Salt as you normally would, without installing anything on the target server.

Roster

To use salt-ssh you need to add the server to the roster file at /etc/salt/roster.
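The roster is a plain YAML file; a minimal entry could look like this (the hostname, user and sudo setting below are examples, not the actual values from this environment):

```yaml
# /etc/salt/roster -- hostname and user are example values
sahwlst01:
  host: sahwlst01.example.com   # target server to reach over ssh
  user: wlstadm                 # ssh user on the target
  sudo: True                    # run the commands via sudo
```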

Accept ssh key

You need to accept the server's SSH key; the easiest way to do that is to use the -i option on the first salt-ssh command.
sudo salt-ssh -i 'sahwls*' test.ping
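If the key is accepted and the connection works, the reply looks roughly like this (the minion id followed by the result):

```
sahwlst01:
    True
```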

The State Module

Browsing through the pages with the state modules you will find a large number of ready-made modules to configure a wide variety of COTS applications like Apache, PostgreSQL, Splunk, etc., but no WebLogic. That is not (yet) available. Now you could write your own module, but all we want for the moment is stopping and starting, and writing a complete module for that seems a bit exaggerated.

The Alternative

So no module. But stopping and starting an application is just a couple of lines of Python plus a shell script that calls wlst.sh. So what if we could call that shell script using Salt?

Python and shell script

The alternative is to place a shell script and a Python script on the target server and have salt-ssh call the shell script. Let's first look at the shell script.

Shell script

manageApplication.sh. Environment-specific settings like port number, hostname and WebLogic password are stored in a separate properties file.
#!/bin/bash
#===================================================
# Script to start or stop an application
#
# Author: Norbert Terhorst
#
# 14-6-2016     Norbert Terhorst    Initial Version
#
#===================================================
echo 'Action:'${ACTION}' for application '${APPLICATION_NAME}' on '${ENVIRONMENT}
SCRIPT=$(readlink -f $0)
SCRIPT_PATH=$(dirname $SCRIPT)
. ${SCRIPT_PATH}/SetEnvVariables.sh
PROP_PATH=${SCRIPT_PATH}/properties; export PROP_PATH # Location of all properties defined values file
PROP_FILE="$PROP_PATH/has-${ENVIRONMENT}-domain.properties"; export PROP_FILE # Python variables properties load file
WLST_HOME="$MIDDLEWARE_HOME/oracle_common/common/bin"; # Location of WLST executable
VAL=$(${WLST_HOME}/wlst.sh -loadProperties ${PROP_FILE} ${SCRIPT_PATH}/manageApplication.py ${APPLICATION_NAME} ${ACTION})
S1=`expr index "$VAL" '#'` # position of the first '#', i.e. the start of the ### marker
S2=`expr index "$VAL" '*'` # position of the first '*', i.e. the start of the *** marker
#print this line last so saltstack can read the result
echo
echo ${VAL:$S1+2:S2-S1-3}
exit
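The substring arithmetic at the end of the shell script is easy to misread; as an illustration, the same marker extraction expressed in plain Python (a hypothetical helper, not part of the scripts above) would be:

```python
def extract_result(wlst_output):
    """Return the text between the '###' and '***' markers that
    manageApplication.py prints, mirroring the bash substring logic."""
    start = wlst_output.index('###') + 3  # first character after '###'
    end = wlst_output.index('***')        # position of the '***' marker
    return wlst_output[start:end]

print(extract_result("lots of WLST noise ###changed=yes comment='ok'***"))
```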
manageApplication.py. Salt likes to know whether the command executed successfully and whether a state has changed. To do this it looks at the last printed line for the following keys: changed and comment. The Python script retrieves the current state of the application, checks it against the requested action (stop or start) and decides whether to do something (changed=yes) or leave it as it is (changed=no).
import socket;
import os.path;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
# get state of application
def getState(applicationName):
  state=""
  domainConfig()
  cd('/AppDeployments/'+applicationName+'/Targets')
  myTargets = ls(returnMap='true')
  domainRuntime()
  cd('AppRuntimeStateRuntime')
  cd('AppRuntimeStateRuntime')
  print '\n\n'
  for targetinst in myTargets:
    curstate=cmo.getCurrentState(applicationName,targetinst)
    state = state + ";" + targetinst+":"+curstate
  return state
# main
print '';
print 'SETTING GLOBAL VARIABLES ...';
application_name = sys.argv[1];
action = sys.argv[2];
domain_home = os.getenv('DOMAIN_CONFIGURATION_HOME');
admserver_laddress=wladmladdress;
admserver_lport=wladmlport;
admuser=wladmusrname;
admpass=wladmusrpwd;
admserver_url='t3://' + admserver_laddress + ':' + admserver_lport;
print '';
print 'CONNECT TO ADMIN SERVER';
print '';
print 'USING USERNAME AND PASSWORD';
connect(admuser, admpass, admserver_url);
#get state before performing the action
beforeState=getState(application_name);
serverConfig();
myProgress=""
try:
  if action == 'stop':
        progress=stopApplication(application_name);
        myProgress=progress.getState();
  else:
        progress=startApplication(application_name);
        myProgress=progress.getState()
except:
  myProgress="failed to "+action
#get state after performing the action
afterState=getState(application_name);
disconnect()
#set reply. The shell script parses the print output of this script and makes sure that the following line is printed last, so that saltstack can interpret the result
if beforeState != afterState:
  print "###changed=yes comment='"+afterState+"'***"
else:
  print "###changed=no comment='"+myProgress+"'***"
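Salt expects that last line in the form changed=... comment='...'. As a rough sketch (an illustration only, not Salt's actual parser), such a line can be split into key/value pairs like this:

```python
import shlex

def parse_stateful_line(line):
    """Split a "changed=yes comment='...'" line into a dict.
    Illustration only; not Salt's real implementation."""
    result = {}
    for token in shlex.split(line):  # shlex keeps the quoted comment together
        key, _, value = token.partition('=')
        result[key] = value
    return result

print(parse_stateful_line("changed=yes comment='AdminServer:STATE_ACTIVE'"))
```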

Now, how to tie a shell script and salt together?

cmd.run

For the remote execution of scripts the cmd module can be used; for this scenario we use the run function, which can be called from a state file.

State file

For this case the file manageApplication.sls has been created.
weblogicapplication:
  cmd.run:
    - name: /data/deployment/software/scripts/wls12c/manageApplication.sh
    - runas: {{ weblogic_user }}
    - stateful: True
    - env:
      - ENVIRONMENT: {{ weblogic_environment }}
      - ACTION: {{ action }}
      - APPLICATION_NAME: {{ application_name }}

The value of the name keyword is the complete path to the script on the target server. The shell script uses three environment variables, which are set with the env statement. This state file is generic for all applications and actions. To achieve this, everything environment-specific is defined in pillars: weblogic_user and weblogic_environment are retrieved from pillar data in /srv/pillar, while the other two parameters, action and application_name, are passed on the command line.

pillar

The other two parameters are static for each environment and are stored in pillar files:

top.sls
development:
  'G@os:Solaris and G@nodename:sahwlsd*':
    - weblogic
test:
  'G@os:Solaris and G@nodename:sahwlst*':
    - weblogic

development/weblogic.sls
weblogic_environment: dev
weblogic_user: wlsdadm

test/weblogic.sls
weblogic_environment: tst
weblogic_user: wlstadm
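Assuming default file_roots and pillar_roots on the master (an assumption; your master configuration may differ), the files from this post would be laid out roughly like this:

```
/srv/salt/weblogic/manageApplication.sls
/srv/pillar/top.sls
/srv/pillar/development/weblogic.sls
/srv/pillar/test/weblogic.sls
```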


To apply this state, issue the following command:
sudo salt-ssh '<target server>' state.sls weblogic.manageApplication pillar='{"application_name":"<application name>","action":"<action>"}'

For example, to start the application 'myApplication' on the test environment, issue the following command:
sudo salt-ssh 'sahwlst*' state.sls weblogic.manageApplication pillar='{"application_name":"myApplication","action":"start"}'

BEA-141149 invalid attempt was made to connect to the Administration Server

The last few days I spent some time on an issue where a managed server on a remote machine was not able to connect to the admin server. The log in the admin console showed the following error:

####<Aug 2, 2016 8:13:47 AM CEST> <Error> <Management> <sahwlsdXXYYZZ.corp.XXYYZZ.com> <AdminServer> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <a31650b4-48fe-4b71-837a-80dfa6caf0d9-0000001e> <1470118427330> <[severity-value: 8] [rid: 0] [partition-id: 0] [partition-name: DOMAIN] > <BEA-141149> <An invalid attempt was made to connect to the Administration Server with a salt of n+Aj1CJZPev9gLWFjKvxQw== and a signature of o0BE/0lZi+TsHRrngSUCPrIgge6gFbfNznsdbirx1Tc=, likely due to private key mismatch.>
The domain consisted of one AdminServer and two managed servers. The first managed server ran on the same machine as the AdminServer and started fine. The second managed server, on another machine, failed to start.

Long story short: on both servers there was no active NTP daemon, resulting in a time difference of 6 minutes. And the timestamp is part of the encryption. Activating the NTP daemon fixed the issue.