The University of Texas at Austin

 

EXPLORATORY CANDIDATE ALGORITHM PERFORMANCE
CHARACTERISTICS IN SINGLE AND COMMERCIAL SYMMETRIC
MULTIPROCESSING (SMP) ENVIRONMENTS FOR THE ADVANCED ENCRYPTION STANDARD (AES)

January 30, 1999

Researchers:

Larry Leibrock, Ph.D.

Gabe Eapen

Doug Dexter

 

Research Objectives: The purpose of this exploratory study was to develop an initial estimate of baseline performance of the NIST Advanced Encryption Standard (AES) candidate algorithms on both Pentium II™ and Pentium III™ systems. The study assumes that the AES baseline should depend on commercial-off-the-shelf systems of the kind typically available in most commercial and governmental settings. It therefore avoids any development environment that is not commercially available and relies on commonly used, commercially available software development and information technology production environments. The second purpose of the study was to develop an initial estimate of the baseline performance increase, if any, from using a commercial Pentium II symmetric multiprocessing system and a commercial Pentium III uniprocessor system.

The rationale for this type of study is to help characterize the range of expected performance in commercial implementations of the AES candidate codes in typical client/server enterprise settings.

Methodology: The AES codes were obtained from NIST. Each candidate algorithm was taken from the NIST CD-ROM containing the ANSI C and Java™ reference source code. Each ANSI C AES candidate source code was then compiled using Microsoft Visual C++® version 6.0. All compilation was done separately on the two systems under test (SUT). The first SUT was a Dell Precision 610 workstation with dual 450 MHz Intel Pentium II® processors and 512 MB of memory. The second SUT was a Dell Optiplex GX1p with a single 500 MHz Intel Pentium III processor and 512 MB of memory. The operating system for each was Microsoft Windows® NT 4 with Service Pack 4. The software development environment was Microsoft Visual C++® version 6.0. Specific details for each SUT configuration are listed below.

Each AES algorithm was tested using both the Monte Carlo Test and the Known Answer Test (MCTKAT), as required in the original NIST specification. This requirement introduces an inconsistency, however. NIST required every algorithm submission to include an MCTKAT test program written to the NIST specification, but the test programs are not uniform because each author wrote their own. This could explain the large variance in the measured timing of the algorithms. NIST should require the same test program (and other inputs) for each algorithm.
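To illustrate what a uniform test program might look like, the following minimal sketch runs every registered candidate through one shared driver with identical key, input block, and iteration count. It is not the NIST test program. The functions toy_xor and toy_rotate are hypothetical stand-ins invented for this example, not AES candidates, and the name time_candidates is likewise an assumption.

```python
import time
from typing import Callable, Dict

Cipher = Callable[[bytes, bytes], bytes]

# Hypothetical stand-ins for candidate block ciphers. Each maps a
# (key, 16-byte block) pair to a 16-byte block. They are NOT real AES
# candidates; they only give the shared driver something to call.
def toy_xor(key: bytes, block: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(block, key))

def toy_rotate(key: bytes, block: bytes) -> bytes:
    mixed = bytes((b + k) & 0xFF for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]

def time_candidates(candidates: Dict[str, Cipher], key: bytes,
                    block: bytes, iterations: int) -> Dict[str, float]:
    """Run every candidate on identical inputs; return elapsed seconds per name."""
    results = {}
    for name, encrypt in candidates.items():
        state = block
        start = time.perf_counter()
        for _ in range(iterations):
            state = encrypt(key, state)
        results[name] = time.perf_counter() - start
    return results

CANDIDATES = {"toy_xor": toy_xor, "toy_rotate": toy_rotate}
timings = time_candidates(CANDIDATES, key=bytes(range(16)),
                          block=bytes(16), iterations=10_000)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Because the loop, the inputs, and the timer are shared, any difference in the reported times would come from the candidate function itself rather than from differences between author-written drivers.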

AES Candidate Codes: Because of limited time and the short period available for presenting results, none of the AES candidate codes was optimized or reorganized for performance. While this is clearly a weakness of the study, the researchers assert that the test results can give an initial estimate of baseline performance for this class of systems.

Single Processor Timings: A single test case was derived and timed across all candidate codes. The Monte Carlo test program consisted of four million cycles through the candidate algorithm implementation, divided into four hundred groups of 10,000 iterations each. The times we collected are for a complete run of the Monte Carlo test (i.e., 400 x 10,000 iterations). We have not examined each test program to confirm that it actually performs the required iterations. Because these programs had already been submitted to NIST per its specification, in the interest of time we assumed that they were implemented correctly.
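The structure of one timed run can be sketched as follows. This is an illustration only, under stated assumptions: toy_encrypt is a hypothetical placeholder rather than an AES candidate, and the chaining of each output into the next input and the re-keying between groups are assumptions about how such a test is typically organized, not details taken from the submitted programs. The sketch also counts cipher calls, which is one simple way to confirm that a test program performs the required number of iterations.

```python
import time

GROUPS = 400          # outer loop count stated in the text
ITERATIONS = 10_000   # inner loop count stated in the text

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical placeholder cipher, not an AES candidate."""
    return bytes(((b ^ k) + 1) & 0xFF for b, k in zip(block, key))

def monte_carlo_run(encrypt, key, block, groups=GROUPS, iterations=ITERATIONS):
    """Return (final_block, total_calls, elapsed_minutes) for one complete run."""
    calls = 0
    start = time.perf_counter()
    for _ in range(groups):
        for _ in range(iterations):
            # Assumed chaining: each output becomes the next input.
            block = encrypt(key, block)
            calls += 1
        # Assumed re-keying between groups, derived from the last output.
        key = bytes(k ^ b for k, b in zip(key, block))
    elapsed_minutes = (time.perf_counter() - start) / 60.0
    return block, calls, elapsed_minutes

# Reduced-size demonstration; a full run would use the defaults (400 x 10,000).
final, calls, minutes = monte_carlo_run(
    toy_encrypt, bytes(range(16)), bytes(16), groups=4, iterations=100
)
print(calls, f"{minutes:.6f} min")
```

If the chaining assumption holds, each iteration depends on the previous one, so a single run is inherently sequential and would occupy only one processor unless separate runs are scheduled side by side.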

Multiprocessor Timings: The same single test case was timed across all candidate codes, as above.

Comparative Results: The PIII results display a clear performance increase for every timed candidate except Saferpls. What is not apparent is whether the increase is due to the performance advantage inherent in a 500 MHz PIII CPU as compared to a single 450 MHz Xeon CPU. Further tests should be completed using a single PII 450 Xeon SUT (with the same CPU cache) to determine whether both CPUs were effectively utilized in the SMP SUT. Additionally, steps should be taken to optimize the code for use on an SMP SUT. This would provide the clearest indication of any performance increase when comparing uniprocessor to multiprocessor performance.

The following performance data was derived.

These are the test results for the KAT and Monte Carlo tests provided with each algorithm.

AES Algorithm    SMP Time (min)    PIII Time (min)    Fastest    Time Difference (min)
Cast-256         14.52             13.15              PIII       1.37
Crypton          3.93              3.62               PIII       0.32
DEAL             8.35              7.57               PIII       0.78
DFC              6.60              6.17               PIII       0.43
E2               30.82             27.88              PIII       2.93
Frog             4.88              4.42               PIII       0.47
HPC              16.70             15.12              PIII       1.58
Loki97           8.97              5.35               PIII       3.62
Magenta          82.10             74.65              PIII       7.45
Mars             3.13              2.85               PIII       0.28
RC6              1.33              1.23               PIII       0.10
Rijndael         104.40            95.87              PIII       8.53
Saferpls         49.03             59.40              SMP        10.37
Serpent**        >>4 hrs           >>4 hrs            N/A        N/A
Twofish          17.18             15.55              PIII       1.63

** Test was terminated after 4 hours.
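As a rough check on the question raised under Comparative Results, the table values can be compared against the clock ratio of the two systems (500 MHz / 450 MHz, about 1.11). The sketch below computes, for each timed candidate, how many times faster the PIII run was than the SMP run. The times are copied from the table. The names TIMES, CLOCK_RATIO, and speed_ratio are introduced here for illustration. Since each figure comes from a single run, the output is indicative only.

```python
# Times in minutes, copied from the results table as (SMP, PIII).
# Serpent is omitted because its runs were terminated without a timing.
TIMES = {
    "Cast-256": (14.52, 13.15),
    "Crypton": (3.93, 3.62),
    "DEAL": (8.35, 7.57),
    "DFC": (6.60, 6.17),
    "E2": (30.82, 27.88),
    "Frog": (4.88, 4.42),
    "HPC": (16.70, 15.12),
    "Loki97": (8.97, 5.35),
    "Magenta": (82.10, 74.65),
    "Mars": (3.13, 2.85),
    "RC6": (1.33, 1.23),
    "Rijndael": (104.40, 95.87),
    "Saferpls": (49.03, 59.40),
    "Twofish": (17.18, 15.55),
}

CLOCK_RATIO = 500 / 450  # PIII clock over PII clock

def speed_ratio(smp_minutes: float, p3_minutes: float) -> float:
    """Return SMP time divided by PIII time (greater than 1 means PIII was faster)."""
    return smp_minutes / p3_minutes

ratios = {name: speed_ratio(smp, p3) for name, (smp, p3) in TIMES.items()}
above_clock = [name for name, r in ratios.items() if r > CLOCK_RATIO]

for name, r in ratios.items():
    print(f"{name:10s} {r:5.2f}")
print("Ratio exceeds clock ratio for:", above_clock)
```

Most ratios fall between roughly 1.07 and 1.11, close to the clock ratio, which is consistent with the test programs using only one of the two SMP processors. Loki97 (about 1.68) and Saferpls (about 0.83) depart from that pattern and would be natural candidates for repeated runs.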

Working Findings: The results are interesting in that performance on both the single-processor and multiprocessor systems was faster than initially estimated. However, some caution should be taken when reviewing these findings.

  1. We did not have time to run multiple cases or check the data set iterations across the systems or AES candidate codes.
  2. We did not have access or the necessary time to look for performance bottlenecks in the source code. We have some indication that certain codes would execute much faster if the code were redesigned to avoid 16-bit references.
  3. The degree of code parallelism is uncertain. We did experience some compilation problems with the development environment.
  4. The tentative findings of this initial research do not provide enough quantitative information to rule out any particular AES candidate due to computational "costs". Rather, the study points to the need for further work.
  5. Each MCTKAT test code should be reviewed to see if it is optimized for multiprocessor use.

The findings do indicate, however, that further work should explore the relative computational efficiency of commercial SMP servers. A better baseline comparison for computational efficiency may prove applicable to both hardware and software implementations of the AES candidates.

Future Directions: We intend to further develop the present baseline and research methodology. We hope this initial work will provide more knowledge and perhaps a better framework for analyzing prevailing commercial hardware implementations during Round 2 of the AES analysis. Further research is believed to have both technical and business relevance to the use of AES-based technology in emerging public key infrastructures (PKI). A second value of further work lies in developing a more rigorous crosscutting performance analysis across the AES candidate set.

Proprietary Rights: As this research paper or any digest of the submitted paper will be made available to the public, it does not contain any proprietary information. All NIST copyrights have been sent (via facsimile) to NIST.

Contact: Larry Leibrock, Ph.D., The University of Texas at Austin, M/S B6003, Austin, Texas 78712-1172; leibrock@mail.utexas.edu; telephone 512-471-1650 or fax 512-232-1831. http://niim.bus.utexas.edu