Friday, 4 March 2016

Set you linux host name and domain

Recently I have created a new VM linux server on CloudSigma. The Server runs Fedora 22. In order to setup the hostname and network domain I have changed the following files:

[root@illumine ~]# cat  /etc/host
securepdf

[root@illumine ~]# cat /etc/hostname
securepdf

[root@illumine ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

178.XXX.XXX.132   securepdf.illumineit.com securepdf

Test it using ping:

[root@securepdf ~]# ping securepdf
PING securepdf.illumineit.com (178.XXX.XXX.132) 56(84) bytes of data.
64 bytes from securepdf.illumineit.com (178.XXX.XXX.132): icmp_seq=1 ttl=64 time=0.036 ms



The quest for the Holy Cloud.

The last 10 days I am struggling myself to choose a cloud provider.  

My selection criteria:

  • Free of charge for a trial use. No credit card registration.
  • Easy to use with what I know without having to invest on extra study
  • The resources the cloud provider offers for trial/free tries, like CPU, allowed Network bandwidth. The more resources offered, the best scoring for the cloud provider.
  • Technology used for Automation and VM provisioning. 

I tried several cloud providers by the order they appear on Google. 

First of all, I dumped Amazon Cloud Services only for one reason: I don not really want to put my Credit Card even before I have to pay for something just because the site asks it. If it was not Amazon behind the site- would you put your card? So no Amazon for me.

Second try with Openshift from Red Hat. I registered there and created a VM with Tomcat7/JBoss "cartridge". Cool - worked out easy and in about 10 minutes I managed to register and create a VM. However:  the machine has too many restrictions, for example you cannot add the packages you like with RPM or yum. Moreover, the Tomcat7 differs from the standard tomcat you download from Apache. When I tried to deploy one of my apps in the new machine there the deployment failed. Also, I did not like the approach of automation implementation with rhc tools. It reminded me some nightmares I had with Chef´s knife. 

My next try was with DigitalOcean. They do not have a free plan but instead they offer a voucher with discount. Again when I tried to register, after following the link in the confirmation email that was sent from them, I was redirected to their page asking for my Credit Card details again: "Thanks! Please add a credit card to activate your account." Thanks but no thanks guys. "There are other orange trees that also make oranges" as an old Greek piece of mind says.

Finally I got there:  cloudsigma.com.  No credit card requirement for a test drive of 7 days. So I created a VM with Fedora 22 in less than a minute. If you register with them you can run your instance for free for 7 days with a limitation about port 25 for email. They offer VNC client on their site to connect to the running VM. I also got connected using Putty and OpenSSL tools with a minimal configuration of the security keys. At some point, I could not find the Super user credentials for the VM but there was a message box with 24/7 online help even for the trial users. The operator responded instantly and gave me some hints.  The extra bonus for this cloud provider is the billing scheme they apply: they bill the usage of the resources not the resources. So you pay if you exceed your contract threshold per 5 minutes sampling. They utilize HTTP/HTTPS API for cloud management and Operations, a design that according to my opinion is the most flexible way to build your applications on top. 

From my quest for the holy cloud I think I made the correct decision with cloudsigma.com.

Tuesday, 15 July 2014

Pattern to Deliver Different Automation Templates per server group

The Problem: 
I have 3 groups servers that utilize different settings like LDAP, Apache config, Splunk. Each group has around 30 servers. Each of the configuration file for LDAP, Apache and Splunk does NOT have the same format, so a general automation Ruby template cannot be used for all three groups of servers. 

For example I cannot have a Splunk authentication.conf.erb for all groups like:
[default]

[Corporate AD]
bindDN = <%= @node['splunk']['ldap-bindDN'] %>
charset = utf8
bindDNpassword = <%= @node['splunk']['ldap-bindDNpassword'] %>
SSLEnabled = 0
port = 389
userBaseDN =  <%= @node['splunk']['ldap-userBaseDN'] %>
host =  <%= @node['splunk']['ldap-binddn'] %>

[authentication]
authType = LDAP
authSettings = <%= @node['splunk']['ldap-authSettings'] %>

# Here the splunk Stanga is always different for all 4 group of servers!!!
[roleMap_Corporate]
admin = wewvffsf3f
myreporting = 0110052012E
power = 0110052012E;0110052012E;0110052012E;0110052012E;

Question: 
How to apply automation for all four server groups by having templates of different formats ?

Solution:
 I give each server group a group id as an attribute:
node['splunk']['group-id'] = groupA or groupB or groupC 

Then in my Chef project I organize my templates folder as follows:

Contents of  my-chef-project/templates/default
  • groupA-authentication.conf.erb : describes LDAP settings for Group A
  • groupA-authorization.conf.erb : describes Splunk Authorization settings for Group A
  • groupB-authentication.conf.erb : describes LDAP settings for Group B
  • groupB-authorization.conf.erb : describes Splunk Authorization settings for Group B
  • groupC-authentication.conf.erb : describes LDAP settings for Group C
  • groupC-authorization.conf.erb : describes Splunk Authorization settings for Group C
Each of those templates is bare simple text without any parameters or anything else except perhaps node IP, hostname... See an example groupA-authentication.conf.erb :
[default]

[Corporate Settings]
bindDN = CN=splunk,OU=Services,OU=Company Page,OU=Resources,DC=illumine,DC=gr
SSLEnabled = 1
port = 437
host = ldap.illumine.com
client =  <%= @node['ip'] %>

[authentication]
authType = LDAP
authSettings = Corporate Settings

[roleMap_Corporate kl]
admin = nottellingya
blog = 0110341333450057252012E
puser = 0110003532234123412342012E;0110003532234123412342012E;0110003532234122412342012E

In my automated delivery chef recipe for any type of those templates I do something like the following chef ruby illustrates:

  template "/opt/splunk/etc/system/local/authentication.conf" do
    source "#{node['splunk']['group-id']}_authentication.conf.erb"
    owner 'splunk'
    group 'splunk'
    mode 0600
    variables()
    ignore_failure true
  end

Note that:

  • The template that is sourced is bound to the server´s group ID. 
  • Any server is the group will take the same group template. 
  • The parameter ignore_failure true denotes that if a template is not found for this group-id then no configuration is delivered and Chef automation will continue without brake.

Friday, 14 March 2014

Concurrent mode failure: Tuning JVM GC for Solr

The machine
I have an 8 CPU VM server with 32GB RAM running Solr. My JVM is 1.6.0_37 with the following JVM settings:
-Xms28g
-Xmx28g
-XX:NewSize=6g
-XX:MaxNewSize=6g
-XX:SurvivorRatio=4
-XX:PermSize=512m
-XX:MaxPermSize=512m
-XX:SoftRefLRUPolicyMSPerMB=500
-XX:+PrintCommandLineFlags
-XX:+HeapDumpOnOutOfMemoryError
-XX:+DumpGCHistoryOnOutOfMemory
-XX:+DumpDetailedClassStatisticOnOutOfMemory
-XX:HeapDumpPath=/opt/alfresco/tomcat/dumps
-verbose:gc
-Xloggc:/opt/alfresco/tomcat/dumps/gc-logs/gc-2014-03-13-10-00-07.log
-XX:+GCHistory
-XX:+CMSClassUnloadingEnabled
-XX:+DisableExplicitGC
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC

The reason for such huge heap is that the Solr data are about 130 GB and Sorl is heavily utilized from around 100 concurrent threads performing text search on documents.
I notice that Sorl application pauses for some time without responding. I discovered on the GC logs the following problem:
2014-03-17T14:29:23.438+0100: 7991.661: [GC2014-03-17T14:29:23.438+0100: 7991.661: [ParNew (promotion failed)
Desired survivor size 268435456 bytes, new threshold 15 (max 15)
- age   1:  125233576 bytes,  125233576 total
: 2621440K->2228808K(2621440K), 5.9399450 secs]2014-03-17T14:29:29.378+0100: 7997.601: [CMS2014-03-17T14:29:32.715+0100: 8000.938: [CMS-concurrent-sweep: 21.573/33.920 secs] [Times: user=118.80 sys=3.29, real=33.91 secs]
 (concurrent mode failure): 20465547K->11174034K(26214400K), 33.5774490 secs] 22674213K->11174034K(28835840K), [CMS Perm : 47873K->47648K(524288K)], 39.5176760 secs] [Times: user=46.19 sys=2.36, real=39.51 secs]

This issue is summarized in the official ORACLE documentation for JVM v6 as follows:

..a concurrent collection needs to be started at a time such that the collection can finish before the tenured generation becomes full; otherwise the application would observe longer pauses due to concurrent mode failure. There are several ways a concurrent collection can be started. 

See:  Concurrent Mode Failure

The message "concurrent mode failure" signifies that the concurrent collection of the tenured generation did not finish before the tenured generation became full. In other words, the new generation is filling up too fast, it is overflowing to tenured generation but the CMS could not clear out the tenured generation in the background. When a concurrent mode failure happens, the low pause collector does a stop-the-world (STW) collection. All the application threads are stopped, a different algorithm is used to collect the tenured generation (our particular flavor of a mark-sweep-compact), the applications threads are started again, and life goes on....

Seems that a concurrent mode failure is responsible for a "Stop the World" JVM pausing.
See also another wonderful Blog about the same issue here :

In order to treat problem we tune the following JVM flags:
-XX:CMSInitiatingOccupancyFraction=10
Indicating that a concurrent collection will start if the occupancy of the tenured generation exceeds 10% instead of 92% that is the default threshold.
-XX:CMSIncrementalSafetyFactor=100
Indicating to the JVM GC to start a concurrent collection at the next opportunity without any delay.
See also Oracles GC tunning instructions here

Tuesday, 28 January 2014

Overriding finalize() for reference count on active threads

The need for this small article came from the implementation of a custom thread controller. In my design I am using a ThreadGroup that holds threads that dynamically load classes and invoke methods of objects created on the fly.

The problem here comes with the ThreadGroup. I want to have the number of all threads in the group and we mean absolute number not something like myThreadGroup.activeCount() that shows only the active threads of the group. Normally ThreadGroup.activeCount() returns an estimate of the number of active threads in this thread group, so we cannot rely on the this.
One solution is to hold a container of references to all the new threads, but then I have to implement custom synchronized code for accessing the container and blah blah...

The solution I finally followed was to implement an object reference count on the parent class CustomPlugin. When a new object of any descendant class rooted from CustomPlugin, reference count is augmented. When a plugin exits execute() method, the curring thread exits run().
Plugin object reference terminated and destroyed by the JVM GC and the number of concurent plugins is decreased to something less than MAX_TOTAL_PLUGINS. Only then a new plugin can be created.

public abstract class CustomPlugin extends Thread implements Plugin{
 
 public static final int MAX_TOTAL_PLUGINS = 3;
 
 private static int refCount = 0;

 public CustomPlugin(String name) throws Exception{
    if(!allowLoad()){
       throw new Exception("Maximum plugin objects already loaded!");
    }
    Thread.currentThread().setName(name + "-" + Thread.currentThread().getId() );
    refCount++;
 }
 

 @Override
 protected void finalize(){
    try {
      super.finalize();
    } catch (Throwable e) {
      e.printStackTrace();
    }finally{
      refCount--; 
    }
 }

 public static boolean allowLoad() {
    return (refCount>=MAX_TOTAL_PLUGINS?false:true);
 }

 public abstract execute(String [] args);
}
Remember that you have only CustomPlugin.MAX_TOTAL_PLUGINS=3 available threads that you can use for your plugins, meaning that only MAX_TOTAL_PLUGINS can be used in total, regardless if their threads are sleeping! When a plugin exits execute() method, the curring thread exits run(). Plugin object reference terminated and destroyed by the JVM GC and the number of concurrent plugins is decreased to something less than MAX_TOTAL_PLUGINS. Only then a new plugin can be called.

Wednesday, 22 January 2014

Implementing 2-way SSL in Java using TLS and Self Signed Certificates part4

Debug the Client/Server Communication 

This part depends on the:
  • Communication credentials, or the Keystore/Trustore file created in Part-1
  • Server implemented in Part-2
  • Client implemented in Part-3

To start with we need to include on our JVM arguments on both client and server the following option:
-Djavax.net.debug=all

So for example to run the server we do:
$ java TwoWaySslServer -Djavax.net.debug=all

As a result, we have all the debugging information we need for network operations from the JVM and the imported classes, like SSLServerSocket.

Having our server in debug mode, we can observe that the Keystore (mysystem.jks) was loaded correctly if we notice the following private key log entry:

***
found key for : mysystem
chain [0] = [
[
  Version: V3
  Subject: CN=mysystem, L=Chalandri, ST=Athens, C=GR
  Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11

  Key:  Sun RSA public key, 1024 bits
  modulus: 127165120252867655295291767293201905001859144644390543231567027664179957281793248832544049047501722299712701237862474932181664724853946779349563166371011412260964043029373627517538842247060193170649833260910804612805979354599504164270912367917881965338674535760796311997608873587262396297225200721624071184029
  public exponent: 65537
  Validity: [From: Wed Jan 22 12:46:44 CET 2014,
               To: Mon Jan 22 12:46:44 CET 2024]
  Issuer: CN=mysystem, L=Chalandri, ST=Athens, C=GR
  SerialNumber: [    af1b0164 bd3095fa]

Certificate Extensions: 3
[1]: ObjectId: 2.5.29.35 Criticality=false
AuthorityKeyIdentifier [
KeyIdentifier [
0000: 03 B2 09 B4 89 36 32 E4   D6 A3 51 4A 5D 3B CD ED  .....62...QJ];..
0010: B2 00 77 EC                                        ..w.
]

]

[2]: ObjectId: 2.5.29.19 Criticality=false
BasicConstraints:[
  CA:true
  PathLen:2147483647
]

[3]: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 03 B2 09 B4 89 36 32 E4   D6 A3 51 4A 5D 3B CD ED  .....62...QJ];..
0010: B2 00 77 EC                                        ..w.
]
]

]
  Algorithm: [SHA256withRSA]
  Signature:
0000: 78 58 C9 F5 C0 42 2D 62   B5 0D 8A 79 6B 57 5A 85  xX...B-b...ykWZ.
0010: 8C 85 20 4D 7B B3 8A 0A   DF 83 D9 D1 5A FA F6 26  .. M........Z..&
0020: 53 56 DB FE B3 82 42 35   0C BF E8 BD 75 0A 18 7A  SV....B5....u..z
0030: D7 B0 36 E5 4E F9 82 FB   23 57 EC 23 3F D0 92 9E  ..6.N...#W.#?...
0040: 9C D6 FA 26 32 7C B6 4A   62 A4 4B AB F7 D3 64 7C  ...&2..Jb.K...d.
0050: 37 92 ED F2 2B 62 BC E7   A6 35 E6 87 67 9E BD 0D  7...+b...5..g...
0060: 97 5E 0F 31 A9 B1 AB 64   CC F9 4B 51 3E 90 7B 2F  .^.1...d..KQ>../
0070: E9 2E 23 E5 BC D3 DA 32   20 3B 6C 2C B8 E2 7C 6B  ..#....2 ;l,...k

]

If we don´t find our private key in the above debugging information, we need to check again the parameters:

  System.setProperty("javax.net.ssl.keyStore","mysystem.jks");
  System.setProperty("javax.net.ssl.keyStorePassword","welcome");

Similarly again in debugging mode, we can observe that the Trustore (mysystem.jks) was loaded correctly from server if we notice the following log entry:
***
trustStore is: mysystem.jks
trustStore type is : jks
trustStore provider is : 
init truststore
adding as trusted cert:
  Subject: CN=mysystem, L=Chalandri, ST=Athens, C=GR
  Issuer:  CN=mysystem, L=Chalandri, ST=Athens, C=GR
  Algorithm: RSA; Serial number: 0xaf1b0164bd3095fa
  Valid from Wed Jan 22 12:46:44 CET 2014 until Mon Jan 22 12:46:44 CET 2024

trigger seeding of SecureRandom
done seeding SecureRandom
SERVER
 socket class: class com.sun.net.ssl.internal.ssl.SSLServerSocketImpl
 Socker address = 0.0.0.0/0.0.0.0
 Socker port = 8095
 Need client authentication = true
 Want client authentication = false
 Use client mode = false

If we don´t find the self signed certificate key in the above debugging information, we need to check again the parameters:

System.setProperty("javax.net.ssl.trustStore","mysystem.jks");
System.setProperty("javax.net.ssl.trustStorePassword","welcome");

Some really interesting Exceptions you may come across while debugging your application:

javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
Caught on Server
Probably you have created the client socket with something like:
clientSocket = new Socket();
Instead of :
SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
clientSocket = (SSLSocket) factory.createSocket("localhost",port);


java.lang.IllegalStateException: KeyManagerFactoryImpl is not initialized
You have not initialized KeyManagerFactory object, meaning that you should call method init() with a valid KeyStore and Certificate Password:
  String keystoreFile = "/opt/mysystem/etc/mysystem.jks";
  String keystorePassword = "welcome";
  KeyStore ks = KeyStore.getInstance("JKS");
  ks.load(new FileInputStream(keystoreFile), keystorePassword );
  KeyManagerFactory kmf = KeyManagerFactory.getInstance("SunX509");
  kmf.init(ks,"myCertificatePassword");

java.security.cert.CertificateException: No X509TrustManager implementation avaiable
Check the trustrore. Ensure that property javax.net.ssl.trustStore is set appropriately.

javax.net.ssl.SSLHandshakeException: null cert chain
Received on server
Check that both Client and server share the trustore.
Check that trustore contains the same signed certificate

java.io.IOException: Keystore was tampered with, or password was incorrect
Check the Keystore password, system property javax.net.ssl.keyStorePassword

Implementing 2-way SSL in Java using TLS and Self Signed Certificates part3

Step 3: The Client (Get the complete code here)

The client also requires the Keystore/Trustore created in Part-1

Again in the client we have to do a couple of things similar to the server:

The first is to specify the Java Keystore/Trustore we created in  Part-1 of this article:

System.setProperty("javax.net.ssl.keyStore","mysystem.jks");
System.setProperty("javax.net.ssl.keyStorePassword","welcome");

System.setProperty("javax.net.ssl.trustStore","mysystem.jks");
System.setProperty("javax.net.ssl.trustStorePassword","welcome");

Similarly with the server side described in Part-2, we have to create the client socket as an SSLSocket:

SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();    
SSLSocket sslSock = (SSLSocket) factory.createSocket("localhost",8095);

The entire code of the client can be downloaded here.

Next article, Part-4, of this Blog series will assist you to debug the SSL/TLS client/server communication.