Instruction set compiled simulation A technique for fast and(3)


    template_inst = CustomizeTemplate(template, inst)
    newStr = "InstMemory[addr] = new template_inst"
    TempProgram = AppendInst(TempProgram, newStr)
  endfor
  DecodedProgram = Compile(TempProgram)
End
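The tail of this decoding loop can be sketched in C++ as plain source-text generation. This is only an illustration under stated assumptions: `CustomizedInst` and `buildTempProgram` are hypothetical names, the customized template is carried as a string, and the final `Compile` step (running the host C++ compiler on the generated text) is left out.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: one instruction after DetermineTemplate and CustomizeTemplate,
// i.e. its address plus the customized template written out as text.
struct CustomizedInst {
    std::uint32_t addr;
    std::string templateInst;
};

// Builds the temporary program: one "InstMemory[addr] = new <template>;" line
// per instruction. Compiling the resulting text is not modeled here.
std::string buildTempProgram(const std::vector<CustomizedInst>& insts) {
    std::ostringstream out;
    for (const CustomizedInst& ci : insts) {
        out << "InstMemory[" << ci.addr << "] = new " << ci.templateInst << ";\n";
    }
    return out.str();
}
```

Generating text first and compiling it afterwards is what lets the host compiler see every instruction's parameters as compile-time constants.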

Algorithm 2: DetermineTemplate
Inputs: Instruction inst (Binary), and Mask Table maskTable.
Output: Template.
Begin
  foreach entry in Mask Table
    if mask matches inst return template
  endfor
End
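A minimal C++ sketch of this lookup follows. It assumes that "mask matches inst" means `(inst & mask) == pattern`, which the text does not spell out; `MaskEntry`, `determineTemplate`, and returning the template as a name string are likewise hypothetical choices made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mask-table entry: an instruction belongs to a class when
// (inst & mask) == pattern.
struct MaskEntry {
    std::uint32_t mask;
    std::uint32_t pattern;
    std::string templateName;
};

// Returns the template name of the first matching entry, or an empty string
// when no entry matches.
std::string determineTemplate(std::uint32_t inst,
                              const std::vector<MaskEntry>& maskTable) {
    for (const MaskEntry& entry : maskTable) {
        if ((inst & entry.mask) == entry.pattern) {
            return entry.templateName;
        }
    }
    return "";
}
```

Because the first match wins, a real table would need its more specific masks listed before the more general ones.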

Algorithm 3: CustomizeTemplate
Inputs: Template template, Instruction inst (Binary).
Output: Customized Template with Parameter Values.
Begin
  switch instClassOf(inst)
    case Data Processing:
      switch (inst[31:28])
        case 1110: condition = Always; endcase
        case ....
        ...
      endswitch
      switch (inst[24:21])
        case 0100: opcode = ADD; endcase
        case ....
        ...
      endswitch
      ......
      return template
    endcase /* Data Processing */
    case Branch: ... endcase
    case LoadStore: ... endcase
    case Multiply: ... endcase
    case Multiply LoadStore: ... endcase
    case Software Interrupt: ... endcase
    case Swap: ... endcase
  endswitch
End
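The Data Processing branch of this algorithm might look as follows in C++. Only the two cases named in the pseudocode (condition 1110 and opcode 0100) are filled in; the `bits` helper, the `DataProcessingParams` struct, the `"Unhandled"` placeholder, and the mapping of bit 20 to `"True"`/`"False"` are assumptions made for this sketch.

```cpp
#include <cstdint>
#include <string>

// Extracts bits [hi:lo] (inclusive) of a 32-bit word.
// Assumes the field is narrower than 32 bits.
inline std::uint32_t bits(std::uint32_t inst, unsigned hi, unsigned lo) {
    return (inst >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

// Hypothetical container for the parameter values chosen for a template.
struct DataProcessingParams {
    std::string condition;
    std::string opcode;
    std::string updateFlags;
};

DataProcessingParams customizeDataProcessing(std::uint32_t inst) {
    DataProcessingParams p;
    switch (bits(inst, 31, 28)) {           // condition field
        case 0xE: p.condition = "Always"; break;   // 1110
        default:  p.condition = "Unhandled"; break;
    }
    switch (bits(inst, 24, 21)) {           // opcode field
        case 0x4: p.opcode = "Add"; break;         // 0100
        default:  p.opcode = "Unhandled"; break;
    }
    // S bit (bit 20): whether the instruction updates the flags.
    p.updateFlags = bits(inst, 20, 20) ? "True" : "False";
    return p;
}
```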

We illustrate the power of our technique to generate an optimized decoded instruction using a single data processing instruction. We show the binary as well as the assembly of the instruction below.

Binary: 1110|000|0100|0|0010|0001|01010|00|0|0011
        (cond|000|op|S|Rn|Rd|shift_immed|shift|0|Rm)

Assembly: ADD r1, r2, r3 LSL #10
          (op{<cond>}{S} Rd, Rn, Rm shift #<shift_immed>)
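The field layout above can be checked mechanically. The sketch below packs the example bit string into the 32-bit word 0xE0821503 and extracts each field with shifts and masks; the struct and function names are hypothetical.

```cpp
#include <cstdint>

// Fields of a data processing instruction with an immediate-shifted register
// operand, in the order cond|000|op|S|Rn|Rd|shift_immed|shift|0|Rm.
struct ShiftedRegFields {
    std::uint32_t cond, opcode, s, rn, rd, shiftImm, shift, rm;
};

ShiftedRegFields decodeFields(std::uint32_t inst) {
    ShiftedRegFields f;
    f.cond     = (inst >> 28) & 0xFu;   // bits 31:28
    f.opcode   = (inst >> 21) & 0xFu;   // bits 24:21
    f.s        = (inst >> 20) & 0x1u;   // bit 20
    f.rn       = (inst >> 16) & 0xFu;   // bits 19:16
    f.rd       = (inst >> 12) & 0xFu;   // bits 15:12
    f.shiftImm = (inst >> 7)  & 0x1Fu;  // bits 11:7
    f.shift    = (inst >> 5)  & 0x3u;   // bits 6:5
    f.rm       = inst & 0xFu;           // bits 3:0
    return f;
}

// The example word: 1110|000|0100|0|0010|0001|01010|00|0|0011.
const std::uint32_t kExampleInst = 0xE0821503u;
```

Decoding `kExampleInst` gives Rd = 1, Rn = 2, Rm = 3, a shift type of 00 and a shift amount of 10, which agrees with the assembly form ADD r1, r2, r3 LSL #10.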

The DetermineTemplate function returns the DataProcessing template (shown in Example 1) for this binary instruction. The CustomizeTemplate function generates the following customized template for the execute function.

void DataProcessing<... SftOper<...>>::execute()
{
  if (Always::execute()) {
    _dest = Add::execute(_src1, _sftOperand.getValue());
    if (False::execute()) {
      // Update Flags
      ...
    }
  }
}

After compilation using a C++ compiler, several optimizations occur on the execute() function. The Always::execute() function call is evaluated to true. Hence, the check is removed. Similarly, the function call False::execute() is evaluated to false. As a result the branch and the statements inside it are removed by the compiler. Finally, the two function calls Add::execute(), and
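The effect described here can be reproduced with a small, compilable C++ sketch. The class names `Always`, `False`, `Add`, and `DataProcessing` follow the text, but their bodies, the `ShiftLeftImm` operand class, the member names, and the `flagsUpdated` marker that stands in for the elided flag update are all assumptions made for illustration.

```cpp
#include <cstdint>

// Parameter classes: each exposes a static execute() with a trivial body.
struct Always { static bool execute() { return true; } };
struct False  { static bool execute() { return false; } };
struct Add {
    static std::uint32_t execute(std::uint32_t a, std::uint32_t b) { return a + b; }
};

// Hypothetical shifter operand: a register value shifted left by a constant.
template <unsigned Amount>
struct ShiftLeftImm {
    std::uint32_t reg;
    std::uint32_t getValue() const { return reg << Amount; }
};

template <class Cond, class Op, class UpdateFlags, class SftOper>
struct DataProcessing {
    std::uint32_t dest = 0;
    std::uint32_t src1 = 0;
    SftOper sftOperand{};
    bool flagsUpdated = false;

    void execute() {
        if (Cond::execute()) {                    // constant true for Always
            dest = Op::execute(src1, sftOperand.getValue());
            if (UpdateFlags::execute()) {         // constant false for False
                flagsUpdated = true;              // placeholder for the flag update
            }
        }
    }
};
```

With optimization enabled, a compiler will typically inline the static calls, so `DataProcessing<Always, Add, False, ShiftLeftImm<10>>::execute()` should reduce to a shift and an add with no remaining branches.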

However, if the instruction is modified then the modified binary is re-decoded. This decoding is similar to the one performed in the compile-time decoding of instructions except that it uses a pointer to an appropriate function. While we develop the templates for each class of instructions, we also develop one function for each class. The mask table mentioned in Section 3.1 maintains the mapping between a mask for every class of instruction and the function for that class. The decoding step during run time consults the mask table and determines the function pointer. It also updates the instruction memory with the decoded instruction, i.e., it writes the new function pointer in that address.

The execution process is very simple. It simply invokes the function using the pointer specified in the decoded instruction.

Since the number of instructions modified during run time is usually negligible, using a general unoptimized function for simulating them does not degrade the performance. It is important to note that since the engine is still very simple, we can easily use traditional interpretive techniques for executing modified instructions while the instruction set compiled technique can be used for the rest (the majority) of the instructions. Thus, our instruction set compiled simulation (IS-CS) technique combines the full flexibility of interpretive simulation with the speed of compiled simulation.
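The run-time re-decoding and the pointer-based execution step can be sketched as follows. All names here (`Cpu`, `DecodedInst`, `ClassEntry`, `redecode`, `step`) are hypothetical, the match rule `(word & mask) == pattern` is an assumption, and the general data processing function models only ADD with an LSL-immediate register operand.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Cpu { std::uint32_t regs[16] = {}; };

// A decoded instruction at run time: the raw word plus a pointer to the
// general (unoptimized) function for its class.
using ExecFn = void (*)(Cpu&, std::uint32_t);
struct DecodedInst { std::uint32_t raw; ExecFn fn; };

// Mask-table entry mapping an instruction class to its function.
struct ClassEntry { std::uint32_t mask; std::uint32_t pattern; ExecFn fn; };

// General data processing function (ADD Rd, Rn, Rm LSL #imm only).
void execDataProcessing(Cpu& cpu, std::uint32_t inst) {
    std::uint32_t rn = (inst >> 16) & 0xFu, rd = (inst >> 12) & 0xFu;
    std::uint32_t shiftImm = (inst >> 7) & 0x1Fu, rm = inst & 0xFu;
    cpu.regs[rd] = cpu.regs[rn] + (cpu.regs[rm] << shiftImm);
}

// Fallback for words that match no class; does nothing.
void execUnknown(Cpu&, std::uint32_t) {}

// Re-decode a modified word: consult the mask table and write the new
// function pointer into the instruction memory at that address.
void redecode(std::vector<DecodedInst>& instMemory, std::size_t index,
              std::uint32_t newWord, const std::vector<ClassEntry>& maskTable) {
    ExecFn fn = execUnknown;
    for (const ClassEntry& e : maskTable) {
        if ((newWord & e.mask) == e.pattern) { fn = e.fn; break; }
    }
    instMemory[index] = DecodedInst{newWord, fn};
}

// Execution: invoke the function through the stored pointer.
void step(Cpu& cpu, const std::vector<DecodedInst>& instMemory, std::size_t index) {
    const DecodedInst& d = instMemory[index];
    d.fn(cpu, d.raw);
}
```

The general function re-extracts its operand fields on every call, which is the cost the compile-time templates avoid for unmodified instructions.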

4. EXPERIMENTS

We evaluated the applicability of our IS-CS technique using various processor models. In this section, we present simulation results using a popular embedded processor, ARM7 [17], to demonstrate the usefulness of our approach.

4.1 Experimental Setup

The ARM7 processor is a RISC machine with a fairly complex instruction set. We used arm-linux-gcc for generating target binaries for ARM7. Performance results of the different generated simulators were obtained using a Pentium 3 at 1 GHz with 512 MB RAM running Windows 2000. The generated simulator code is compiled using the Microsoft Visual C++ compiler with all optimizations enabled. The same C++ compiler is used for compiling the decoded program as well.

In this section we show the results using two application programs: adpcm and jpeg. We chose these two benchmarks to be able to compare our simulator performance with previously published results [1].

The arm-linux-gcc compiler with the -static option generates approximately 50K instructions for the benchmarks. When all optimizations are enabled in the MS VC++ compiler, it takes about 15 minutes to compile and generate the decoded programs.

4.2 Results

Figure 4 shows the simulation performance using our technique. The results were generated using an ARM7 model. The first bar shows the simulation performance of our technique with the run-time program modification check enabled. Our technique can perform better if it is known prior to execution that the program is not self-modifying. The second bar represents the simulation performance of running the same benchmark with the run-time check disabled. We achieved up to 9% performance improvement by disabling the instruction modification detection and update mechanism.

Figure 4: Instruction Set Compiled Simulation

We are able to perform simulation at a speed of up to 12 MIPS using the P3 (1.0 GHz) host machine. To the best of our knowledge, the best previously reported performance of a simulator having the flexibility of interpretive simulation is that of JIT-CCS [1]. The JIT-CCS technique achieved a performance of up to 8 MIPS on an Athlon at 1.2 GHz with 768 MB RAM. Since we did not have access to a similar machine, our comparisons are based on results run on a slower machine (Pentium 3 at 1 GHz with 512 MB RAM) versus previous results [1] on a faster machine (Athlon at 1.2 GHz with 768 MB RAM). On the jpeg benchmark our IS-CS technique performs 40% better than the JIT-CCS technique. The same trend (30% improvement) is observed for the adpcm benchmark as well. Clearly, these are conservative numbers since our experiments were run on a slower machine.

Figure 5: Effect of Different Optimizations

There are two reasons for the superior performance of our technique: moving the time-consuming decoding out of the execution loop, and generating aggressively optimized code for each instruction. The effects of using these techniques are demonstrated in Figure 5. The first bar in the chart is the simulation performance of running the benchmarks on an ARM7 model of Simplescalar [3] that does not use any of these techniques. The second bar shows the effect of doing the decoding process at compile time and using function pointers during execution. The use of a function pointer in the decoded instruction is similar to [1]. We are able to achieve better results than JIT-CCS [1] even in this category because the JIT-CCS technique decodes each instruction during run time (at least once) while we decode during compile time. Besides, they use a software caching technique to reuse the decoded instruction but we do not. The last bar is our simulation approach that uses both techniques: compile-time decoding and templates that produce optimized code.

We have demonstrated that instruction set compiled simulation coupled with our instruction abstraction technique delivers the performance of compiled simulation while maintaining the flexibility of interpretive simulation. Our simulation technique delivers better performance than other simulators in this category, as demonstrated in this section.

5. SUMMARY

In this paper we presented a novel technique for instruction set simulation. Due to the simple interpretive simulation engine and optimized pre-decoded instructions, our instruction set compiled simulation (IS-CS) technique achieves the performance of compiled simulation while maintaining the flexibility of interpretive simulation. The performance can be further improved by disabling the run-time change detection, which is suitable for many applications that are not self-modifying.
