High performance computing systems for signal processing

  • Graham A. King

    Student thesis: Doctoral Thesis


    The submission begins by demonstrating that the conditions required for consideration under the University's research degrees regulations have been met in full. There then follows a commentary which explains the origin of the research theme and goes on to discuss the nature and significance of the work. This has been an extensive programme to devise new methods of improving the computational speed and efficiency required for the effective implementation of FIR and IIR digital filters and transforms. The problems are analysed, and initial experimental work is described which sought to quantify the performance obtainable from peripheral vector processors. For some classes of computation, especially in real time, it was necessary to turn to pure systolic array hardware engines, and a large number of innovations are suggested, both in array architecture and in the creation of a new hybrid opto-electronic adder capable of improving the performance of the array's processing elements. This significant and original research is extended further to a class of computation involving a bit sliced co-processor, for which a means of measuring system performance is developed and discussed. The contribution of the work has been evident in: software innovation for horizontal architecture microprocessors; improved multi-dimensional systolic array designs; completely new implementations of the processing elements in such arrays; and the details of co-processing architectures for bit sliced microprocessors. A further innovative contribution is the use of Read Only Memory in creating n-dimensional FIR or IIR filters and in executing the discrete cosine transform; this has enabled researchers to re-examine the case for pre-calculated (look-up table) systems, which previously used stored squares.
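One well-known way to use Read Only Memory in an FIR filter is distributed arithmetic: every possible sum of the filter coefficients is pre-calculated and stored, and each output is built up one input bit-plane at a time from table look-ups, shifts and additions, with no multiplier. The sketch below illustrates that general technique only, under stated assumptions (integer coefficients, B-bit two's-complement input samples, a single 2^K-word table for K taps). The names `build_rom` and `da_fir` are hypothetical, and this is not claimed to be the specific ROM architecture developed in the thesis.

```python
from typing import List, Sequence


def build_rom(w: Sequence[int]) -> List[int]:
    """Pre-calculate the look-up table: entry `addr` holds the sum of the
    coefficients w[k] whose bit k is set in `addr` (2**K entries for K taps)."""
    k_taps = len(w)
    return [sum(w[k] for k in range(k_taps) if (addr >> k) & 1)
            for addr in range(1 << k_taps)]


def da_fir(x: Sequence[int], w: Sequence[int], bits: int = 8) -> List[int]:
    """Hypothetical ROM-based (distributed arithmetic) FIR filter.

    Computes y[n] = sum_k w[k] * x[n-k] for `bits`-bit two's-complement
    integer samples using only table look-ups, shifts and additions.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= v <= hi for v in x):
        raise ValueError(f"samples must fit in {bits}-bit two's complement")
    rom = build_rom(w)
    k_taps = len(w)
    y = []
    for n in range(len(x)):
        # Contents of the tap delay line; samples before n = 0 are taken as zero.
        taps = [x[n - k] if n - k >= 0 else 0 for k in range(k_taps)]
        acc = 0
        for b in range(bits):
            # Bit b of every tap forms the ROM address. Python's >> on negative
            # ints is an arithmetic shift, so this yields two's-complement bits.
            addr = 0
            for k, v in enumerate(taps):
                addr |= ((v >> b) & 1) << k
            term = rom[addr] * (1 << b)
            # The sign bit carries weight -2**(bits-1) in two's complement.
            acc += -term if b == bits - 1 else term
        y.append(acc)
    return y


print(da_fir([3, -2, 5], [2, -1, 4]))  # [6, -7, 24]
```

The table grows as 2^K words for K taps, which is why practical designs of this kind usually split the taps across several smaller ROMs and add their outputs.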
The Read Only Memory work has suggested that Read Only Memory chips may be combined in a way architecturally similar to systolic array processing elements, which led to original concepts of pipelining for memory devices. The work is entirely coherent in that it applies these contributions to a set of common processes, producing a set of performance-graded and scalable solutions. To propose effective solutions it was necessary to demonstrate a solid underlying appreciation of the computational mechanics involved. Whilst the published papers within this submission assume such an understanding, two appendices are provided to demonstrate the essential groundwork that underpins them. The improved results obtained from the programme were of three kinds: execution time; theoretical clocking speeds and circuit areas; and speed-up ratios. In the investigations involving vector signal processors the issue was one of quantifying the performance bounds of the architecture when performing specific combinations of signal processing functions. An important aspect of this work was the optimisation achieved in programming the device: innovative techniques reduced the execution time for the complex combinational algorithms involved to below 10 milliseconds. Given the real-time constraints of typical applications and the aims of this research, the work evolved toward dedicated hardware solutions, and systolic arrays were thus a significant area of investigation. In such systems the criteria of merit are: a high regularity of architectural structure; data exchange only with nearest-neighbour processing elements; minimal global distribution functions such as power supplies and clock lines; minimal latency; minimal use of latches; the elimination of output adders; and higher-speed processing elements.
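Several of the criteria listed above (nearest-neighbour data exchange, no output adders, a regular structure) can be illustrated with a cycle-level simulation of a textbook weight-stationary systolic FIR array. This is an assumption for illustration, not one of the thesis's own array designs: cell i holds coefficient w[i], input samples move right through two registers per cell while partial sums move right through one register per cell, so every partial sum meets each required sample exactly once and the finished output leaves the last cell with no final adder tree. The function name `systolic_fir` is hypothetical.

```python
from typing import List, Sequence


def systolic_fir(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Cycle-by-cycle model of a weight-stationary systolic FIR array.

    Cell i stores w[i]. Each clock, every cell computes
        y_out = y_in + w[i] * x_in
    and passes y_out to its right-hand neighbour through one register, while
    x_in is passed right through two registers (x travels at half the speed
    of y). Only nearest-neighbour transfers occur and there is no output adder.
    Returns y[n] = sum_k w[k] * x[n-k] for n = 0 .. len(x)-1.
    """
    k_cells, n_samples = len(w), len(x)
    x_in = [0.0] * k_cells   # sample presented to each cell this cycle
    x_mid = [0.0] * k_cells  # first of the two x registers after each cell
    y_in = [0.0] * k_cells   # partial sum presented to each cell this cycle
    y = []
    # y[n] enters cell 0 at cycle n and leaves cell K-1 at cycle n + K - 1.
    for t in range(n_samples + k_cells - 1):
        x_in[0] = x[t] if t < n_samples else 0.0  # new sample (zeros to flush)
        y_in[0] = 0.0                             # each new output starts at zero
        y_out = [y_in[i] + w[i] * x_in[i] for i in range(k_cells)]
        if t >= k_cells - 1:
            y.append(y_out[-1])                   # completed output leaves the array
        # Clock edge: every register shifts one step to the right.
        new_x_in = [0.0] + x_mid[:-1]
        new_x_mid = x_in[:]
        y_in = [0.0] + y_out[:-1]
        x_in, x_mid = new_x_in, new_x_mid
    return y


print(systolic_fir([1, 1, 1], [1, 2]))  # [1.0, 3.0, 3.0]
```

The price of avoiding a broadcast input bus is latency: each output emerges K-1 cycles after its partial sum enters the first cell, and each sample spends two cycles per cell in flight.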
The programme has made original and significant contributions to the art of effective array design, culminating in systems calculated to clock at 100 MHz in 1 micron CMOS technology whilst using fewer transistors than contemporary implementations. The improvements vary by design but are of the order of a 30-100% speed advantage and 20-30% less silicon area (real estate). The third type of result was obtained when considering operations best executed by dedicated microcode running on bit sliced engines. The main issues in this part of the work were the development of effective interactions between host processors and the bit sliced processors used for computationally intensive and repetitive functions, together with the evaluation of the relative performance of new bit sliced microcode solutions. The speed-up obtained relative to a range of state-of-the-art microprocessors (68040, 80386, 32032) ranged from 2:1 to 8:1. The programme of research is represented by sixteen papers divided into three groups corresponding to the following stages in the work: problem definition and initial responses involving vector processors; the synthesis of higher performance solutions using dedicated hardware; and bit sliced solutions.
    Date of Award: 1996
    Original language: English
    Awarding Institution: Nottingham Trent University
