Running Baseline Phase For the First Time With Your Cloud

This section assumes that CBTOOL is already started and has successfully connected with your cloud.

Setting Up Parameters

In the baseline phase, application instances (AIs) for the two workloads, KMeans and YCSB, are each created five times. In each iteration, instances are provisioned, data is generated, the load generator is run, data is deleted, and then the instances are deleted. This is controlled by the following parameters:

iteration_count: 5
run_count: 1
destroy_ai_upon_completion: true

Thus, a total of 35 and 30 instances are created and destroyed for the YCSB and KMeans workloads, respectively.
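The totals above follow from simple multiplication. The sketch below assumes 7 instances per YCSB AI and 6 per KMeans AI; these per-AI counts are inferred from the totals (35 / 5 and 30 / 5) and are not stated explicitly in this section.

```python
# Instances per application instance (AI). These values are inferred from the
# totals quoted above (35 / 5 and 30 / 5); treat them as assumptions.
INSTANCES_PER_AI = {"YCSB": 7, "KMEANS": 6}


def total_instances(workload, iteration_count=5):
    """Instances created and destroyed for one workload over the baseline phase."""
    return INSTANCES_PER_AI[workload] * iteration_count


print(total_instances("YCSB"))    # 35
print(total_instances("KMEANS"))  # 30
```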

The creation of data, instantiation of the load generator, and deletion of data together comprise a run; the number of runs is controlled by the run_count parameter. If a tester knows that baseline results in their cloud will be worse than elasticity phase results (for example, due to performance isolation), they must set run_count to five or higher before starting a compliant run.

For a compliant run, iteration_count must be 5 and destroy_ai_upon_completion must be true.
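The two compliance conditions can be checked before starting a run. The following is a minimal sketch, not part of the benchmark kit: the function name compliance_problems is hypothetical, and it assumes the parameters have already been loaded from osgcloud_rules.yaml into a flat dictionary (the nesting of these keys in the real file is not shown here).

```python
def compliance_problems(params):
    """Return a list of problems; an empty list means both conditions hold.

    `params` is assumed to be a flat dict of already-parsed settings.
    """
    problems = []
    if params.get("iteration_count") != 5:
        problems.append("iteration_count must be 5")
    if params.get("destroy_ai_upon_completion") is not True:
        problems.append("destroy_ai_upon_completion must be true")
    return problems


settings = {"iteration_count": 5, "run_count": 1,
            "destroy_ai_upon_completion": True}
print(compliance_problems(settings))  # []
```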

Cloud Name

Make sure that the cloud name in osgcloud_rules.yaml matches the cloud name in the CBTOOL configuration:

cloud_name: MYOPENSTACK

The CBTOOL configuration file is located at ~/osgcloud/cbtool/configs/*_cloud_definitions.txt
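A quick way to catch a mismatch is to read the cloud_name value and look for it in the CBTOOL configuration text. The sketch below is an illustration only: both function names are hypothetical, and it makes no assumption about the exact syntax of the CBTOOL file beyond the cloud name appearing in it as a whole word.

```python
import re


def read_cloud_name(rules_text):
    """Extract the value of the first uncommented 'cloud_name:' line."""
    match = re.search(r"^\s*cloud_name:\s*(\S+)", rules_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def cloud_name_in_config(rules_text, config_text):
    """True if the cloud name from the rules text appears in the config text."""
    name = read_cloud_name(rules_text)
    if name is None:
        return False
    return re.search(r"\b%s\b" % re.escape(name), config_text) is not None
```

In practice, rules_text would be the contents of osgcloud_rules.yaml and config_text the contents of the matching *_cloud_definitions.txt file.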

YCSB Baseline Measurement

Preparation

Set the appropriate thread count for YCSB in the osgcloud_rules.yaml file.

For CentOS images, first uncomment the following line under the cassandra section:

#uncomment this for centos images
#cassandra_conf_path: /etc/cassandra/conf/cassandra.yaml

so that it reads:

#uncomment this for centos images
cassandra_conf_path: /etc/cassandra/conf/cassandra.yaml

For both CentOS and Ubuntu images, set the thread count:

thread_count: 8

The tester must determine the appropriate thread count for their own cloud. The default thread count is 8.

In general, the higher the thread count, the higher the throughput, until the AI reaches its capacity at some number of threads. Consequently, the scalability results of a cloud under test may be higher, provided there is no drastic decrease in the elasticity measurements.
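One way to pick a thread count from trial measurements is to take the smallest value whose throughput is close to the best observed. This is only a sketch of one possible heuristic, not a procedure prescribed by the benchmark; the function name and the 5% tolerance are assumptions.

```python
def smallest_saturating_thread_count(throughput_by_threads, tolerance=0.05):
    """Return the smallest thread count within `tolerance` of the best throughput.

    `throughput_by_threads` maps thread count -> measured throughput (ops/sec).
    The 5% default tolerance is an arbitrary choice for illustration.
    """
    if not throughput_by_threads:
        raise ValueError("no measurements supplied")
    best = max(throughput_by_threads.values())
    for threads in sorted(throughput_by_threads):
        if throughput_by_threads[threads] >= best * (1.0 - tolerance):
            return threads
```

For example, with made-up numbers (not real measurements) such as {4: 1000, 8: 1900, 16: 2000, 32: 2010}, the function returns 16, because 8 threads falls just short of 95% of the best value.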

Running

The usage of the YCSB baseline script is as follows:

usage: osgcloud_ycsb_baseline.py [-h] [--console_log_level CONSOLE_LOG_LEVEL]
                               [--runrules_yaml RUNRULES_YAML]
                               [--flush_log FLUSH_LOG] [--version] --exp_id
                               EXP_ID

It is run as follows:

python osgcloud_ycsb_baseline.py --exp_id SPECRUNID

where SPECRUNID indicates the run id that will be used across baseline
and elasticity + scalability phases.

By default, the script logs the run to a file. If you would like to show the run on the console, type the following:

python osgcloud_ycsb_baseline.py --exp_id SPECRUNID --console_log_level DEBUG

By default, the results for this experiment are present in:

~/results/SPECRUNID/perf/

If five iterations are run (as required for a compliant run), the tester should expect to find five directories starting with SPECRUNIDYCSB in the ~/results/SPECRUNID/perf directory.

The following files and directories will be present in that directory. The date/time in the file and directory names will match the date/time of your run:

baseline_SPECRUNID.yaml
osgcloud_ycsb_baseline_SPECRUNID-20150811233732UTC.log
SPECRUNIDYCSBBASELINE020150811233732UTC
SPECRUNIDYCSBBASELINE120150811233732UTC
SPECRUNIDYCSBBASELINE220150811233732UTC
SPECRUNIDYCSBBASELINE320150811233732UTC
SPECRUNIDYCSBBASELINE420150811233732UTC
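The presence of the five iteration directories can be verified with a short script. The sketch below assumes the naming pattern shown in the listing above (run id, workload name, then BASELINE); the function name is hypothetical.

```python
import os


def baseline_iteration_dirs(perf_dir, exp_id, workload):
    """List iteration directories for one workload ('YCSB' or 'KMEANS')."""
    # Prefix follows the listing above, e.g. SPECRUNIDYCSBBASELINE0...
    prefix = "%s%sBASELINE" % (exp_id, workload.upper())
    return sorted(
        name for name in os.listdir(perf_dir)
        if name.startswith(prefix)
        and os.path.isdir(os.path.join(perf_dir, name))
    )
```

For a compliant run, len(baseline_iteration_dirs(os.path.expanduser("~/results/SPECRUNID/perf"), "SPECRUNID", "YCSB")) would be expected to equal 5.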

K-Means Baseline Measurement

Preparation

The following parameters may be changed in osgcloud_rules.yaml depending on how Hadoop was set up in the instance image. The default values are shown below:

CentOS images:

java_home: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85-2.6.1.2.el7_1.x86_64
hadoop_home: /usr/local/hadoop
dfs_name_dir: /usr/local/hadoop_store/hdfs/namenode
dfs_data_dir: /usr/local/hadoop_store/hdfs/datanode

Ubuntu images:

java_home: /usr/lib/jvm/java-7-openjdk-amd64
hadoop_home: /usr/local/hadoop
dfs_name_dir: /usr/local/hadoop_store/hdfs/namenode
dfs_data_dir: /usr/local/hadoop_store/hdfs/datanode
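Before changing these values, it can help to confirm which of the configured directories actually exist inside the instance image. The sketch below is meant to be run on an instance, not on the benchmark harness; the function name is hypothetical, and whether every directory must already exist in your image is an assumption to verify against your own Hadoop setup.

```python
import os


def missing_directories(params):
    """Return the parameter names whose configured path is not a directory."""
    return sorted(key for key, path in params.items()
                  if not os.path.isdir(path))


# Default values for Ubuntu images, copied from the listing above.
ubuntu_defaults = {
    "java_home": "/usr/lib/jvm/java-7-openjdk-amd64",
    "hadoop_home": "/usr/local/hadoop",
    "dfs_name_dir": "/usr/local/hadoop_store/hdfs/namenode",
    "dfs_data_dir": "/usr/local/hadoop_store/hdfs/datanode",
}
print(missing_directories(ubuntu_defaults))
```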

Running

The usage of the KMeans baseline script is as follows:

usage: osgcloud_kmeans_baseline.py [-h] [--console_log_level CONSOLE_LOG_LEVEL]
                               [--runrules_yaml RUNRULES_YAML]
                               [--flush_log FLUSH_LOG] [--version] --exp_id
                               EXP_ID

It is run as follows:

python osgcloud_kmeans_baseline.py --exp_id SPECRUNID

where SPECRUNID indicates the run id that will be used across baseline
and elasticity + scalability phases.

By default, the script logs the run to a file. If you would like to show the run on the console, type the following:

python osgcloud_kmeans_baseline.py --exp_id SPECRUNID --console_log_level DEBUG
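If both baseline scripts are driven from another Python program, the command lines shown above can be assembled programmatically. This is a minimal sketch using only the script names and the --exp_id and --console_log_level options that appear in the usage text; the helper name baseline_command is hypothetical.

```python
# Script names as given in the usage text above.
BASELINE_SCRIPTS = {
    "ycsb": "osgcloud_ycsb_baseline.py",
    "kmeans": "osgcloud_kmeans_baseline.py",
}


def baseline_command(workload, exp_id, console_log_level=None):
    """Build the argument list for one baseline script invocation."""
    argv = ["python", BASELINE_SCRIPTS[workload], "--exp_id", exp_id]
    if console_log_level:
        argv += ["--console_log_level", console_log_level]
    return argv


print(" ".join(baseline_command("kmeans", "SPECRUNID", "DEBUG")))
```

The returned list can be passed to subprocess.call from the directory that contains the scripts. Using the same exp_id for both workloads matches the requirement that one run id is used across all phases.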

By default, the results for this experiment are present in:

~/results/SPECRUNID/perf/

If five iterations are run (as required for a compliant run), the tester should expect to find five directories starting with SPECRUNIDKMEANS in the ~/results/SPECRUNID/perf directory.

The following files and directories will be present in that directory. The date/time in the file and directory names will match the date/time of your run:

baseline_SPECRUNID.yaml
osgcloud_kmeans_baseline_SPECRUNID-20150811233302UTC.log
SPECRUNIDKMEANSBASELINE020150811233302UTC
SPECRUNIDKMEANSBASELINE120150811233302UTC
SPECRUNIDKMEANSBASELINE220150811233302UTC
SPECRUNIDKMEANSBASELINE320150811233302UTC
SPECRUNIDKMEANSBASELINE420150811233302UTC

Configuring Supporting Evidence Collection

Make sure that the supporting evidence parameters are set correctly in the osgcloud_rules.yaml file:

support_evidence:

    instance_user: cbuser
    instance_keypath: HOMEDIR/osgcloud/cbtool/credentials/cbtool_rsa
    support_script: HOMEDIR/osgcloud/driver/support_script/collect_support_data.sh
    cloud_config_script_dir: HOMEDIR/osgcloud/driver/support_script/cloud_config/

    ###########################################
    #  START instance support evidence flag is true
    # for public and private clouds. host flag
    # is true only for private clouds or for
    # those clouds where host information is
    # available.
    ###########################################
    instance_support_evidence: true
    host_support_evidence: false
    ###########################################
    # END
    ###########################################

instance_user indicates the Linux user that is used to SSH into the instance. It is also set in the cloud configuration text file for CBTOOL.

instance_keypath indicates the SSH key that is used to SSH into the instance. Make sure that the permissions of this file are set to 400 (chmod 400 KEYFILE).

support_script indicates the path of the script that is used to gather supporting evidence.

cloud_config_script_dir indicates the path where scripts relevant to gathering cloud configuration are present. These scripts differ from one cloud to the other.

instance_support_evidence indicates whether to collect supporting evidence from instances. This flag is ignored for simulated clouds. For testing of the baseline phase, it is recommended to set this flag to false.
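The permission requirement on the SSH key file can be checked before a run. The sketch below assumes a Linux host and a hypothetical helper name; it only reports the state of the file and does not change it.

```python
import os
import stat


def key_permissions_ok(key_path):
    """True if the file's permission bits are exactly 400 (owner read-only)."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    return mode == 0o400
```

If the check fails for the file referenced by instance_keypath, run chmod 400 on that file and check again.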