Journal information

  • Journal title: Microprocessors and microsystems
  • Chinese title: 微处理器和微系统
  • Frequency: 0.516
  • ISSN: 0141-9331
  • Publisher: -
1443 results
  • Predicting multi-level LRU cache misses using first-level cache stack distance histograms
    Abstract: In analytical cache modeling, stack distance theory is widely used to predict LRU cache behavior. Typically, stack distance histograms are collected by profiling memory references. However, the profiled memory references merely reflect instruction fetches and load/store executions, which represent only the memory accesses to first-level (L1) caches. That is why these traces cannot be applied directly to construct stack distance histograms for downstream (L2 and L3) caches.
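As a minimal sketch of the stack distance idea for a single cache level (not the multi-level method the paper develops), the following assumes a fully associative LRU cache and a trace of cache-line identifiers. The function names `stack_distance_histogram` and `predicted_misses` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def stack_distance_histogram(trace):
    """Map each LRU stack distance to the number of accesses that had it.

    The distance of an access is the number of distinct lines touched since
    the previous access to the same line. First-time accesses (cold misses)
    are recorded under float("inf").
    """
    stack = []        # LRU stack, most recently used line at the end
    hist = Counter()
    for line in trace:
        if line in stack:
            pos = stack.index(line)
            hist[len(stack) - 1 - pos] += 1
            stack.pop(pos)
        else:
            hist[float("inf")] += 1
        stack.append(line)
    return hist


def predicted_misses(hist, capacity_lines):
    """Misses of a fully associative LRU cache holding `capacity_lines` lines.

    An access hits exactly when its stack distance is below the capacity.
    """
    return sum(count for dist, count in hist.items() if dist >= capacity_lines)


trace = ["a", "b", "c", "a", "b", "d", "a"]
hist = stack_distance_histogram(trace)
print(predicted_misses(hist, 3))  # 4
print(predicted_misses(hist, 2))  # 7
```

One histogram answers the miss-count question for every capacity at once, which is what makes the approach attractive for analytical modeling. The trace here plays the role of the L1 reference stream; as the abstract notes, such a trace does not describe what an L2 or L3 cache actually sees.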
  • Mapping applications onto mesh NoCs using Tabu-search-based swarm optimization
    Abstract: A hybrid optimization scheme is presented that combines Tabu search, communication-volume-based core swapping and Discrete Particle Swarm Optimization (DPSO) for NoC (Network-on-Chip) mapping. The main goal of the optimization is to map an application core graph such that the overall communication latency of the NoC is minimal. It is assumed that the target NoC has a 2D-mesh topology. DPSO is used as the main optimization technique, where each swarm particle's move is influenced by the global and local best, previously visited search-space locations, and a deterministic method to reduce the communication volume of the existing mapping. We employ a Tabu list to discourage swarm particles from revisiting the explored search space and to propose an alternative to the intended movement direction. The methodology is tested on some multimedia applications as well as on large, randomly generated networks of synthetic core graphs. For larger applications, our hybrid scheme generates high-quality NoC mapping solutions compared with existing DPSO-based techniques.
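The abstract does not give the objective function, so the sketch below assumes a common proxy for communication latency on a 2D mesh: each edge's communication volume weighted by the Manhattan hop count between the tiles its two cores are mapped to. The names `manhattan_hops` and `communication_cost`, the row-major tile numbering, and the sample core graph are all illustrative assumptions, not taken from the paper.

```python
def manhattan_hops(tile_a, tile_b, mesh_width):
    """Hop count between two tiles of a 2D mesh numbered in row-major order."""
    ax, ay = tile_a % mesh_width, tile_a // mesh_width
    bx, by = tile_b % mesh_width, tile_b // mesh_width
    return abs(ax - bx) + abs(ay - by)


def communication_cost(core_graph, mapping, mesh_width):
    """Sum of volume * hops over all edges of the core graph.

    core_graph: {(src_core, dst_core): communication volume}
    mapping:    {core: tile index}
    """
    return sum(
        volume * manhattan_hops(mapping[src], mapping[dst], mesh_width)
        for (src, dst), volume in core_graph.items()
    )


# Hypothetical 3-core application on a 2x2 mesh.
core_graph = {("a", "b"): 10, ("b", "c"): 5}
print(communication_cost(core_graph, {"a": 0, "b": 3, "c": 1}, 2))  # 25
print(communication_cost(core_graph, {"a": 0, "b": 1, "c": 3}, 2))  # 15
```

A swap-based optimizer such as the one described would evaluate a function of this kind for each candidate mapping and keep the moves that lower it.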
  • Application-specific architectures for energy-efficient database query processing and optimization
    Abstract: Processing a continuously growing volume of data under increasing power restrictions has become a ubiquitous challenge in today's world. Besides parallel computing, a promising approach to improving the energy efficiency of current systems is to integrate specialized hardware. This paper presents two application-specific architectures to accelerate basic database operators frequently used in modern database systems: an extended instruction set based on a given Cadence Tensilica processor (ASIP) and a comparable application-specific integrated circuit (ASIC). The ASIP is implemented in a system-on-chip and manufactured in a 28 nm CMOS technology to allow measurements of performance and power consumption. Furthermore, the comparison with the ASIC blocks makes it possible to quantify the results of the ASIP approach in terms of throughput, area, and energy efficiency, as well as to discuss the capabilities and limitations of accelerating selected database operators.
  • Achieving trustworthy storage using SSDs with proprietary FTLs
    Abstract: In recent years we have seen increasing deployment of flash-based storage, such as SSDs, in mission-critical applications due to its fast read/write speed, small form factor, strong shock resistance, etc. SSDs use a middle layer called the flash translation layer (FTL) to maintain compatibility with traditional magnetic HDDs. Unlike a traditional HDD, where the host OS knows where and how to access data, an SSD uses the FTL to translate and implement all operations. Even worse, the FTL, which is considered one of the most important intellectual properties of flash-based storage, is often proprietary. This raises a serious security concern about design trustworthiness when the manufacturer, either accidentally or intentionally, implements those operations incorrectly or maliciously. We analyze the possible threats raised by these design trust issues, and propose simple yet effective schemes as countermeasures, with an overhead evaluation.
  • Novel design and analysis of a QCA XNOR-gate comparator
    Abstract: Quantum-dot cellular automata (QCA) is an emerging nanotechnology. It has attracted much interest for its potential for faster speed, smaller size, and lower power dissipation than conventional transistor-based technology. A QCA XNOR gate is proposed in this letter, and its reliability and AVG Energy Dissipation of Circuit (AVG EDC) are analyzed. Multi-bit comparators are implemented with the preferable XNOR gate proposed in this letter; they have lower complexity and Efficient Complexity than previous ones. Finally, detailed simulation results obtained with QCADesigner are presented.
  • Region-based dual-bank register allocation for reduced instruction encoding architectures
    Abstract: In embedded systems, small code size is important due to memory constraints. One technique to achieve a small code size is reducing the instruction encoding from 32-bit to 16-bit, as in the ARM THUMB or MIPS-16 architectures. This half-size encoding leads to shorter register operands, making fewer registers available for register allocation and causing more spills, although invisible registers can be used as spill locations via copies. We propose restructuring the original register file into dual banks, adding a bank toggle instruction for bank changes and inter-bank copies between the banks. We also propose an efficient dual-bank register allocation technique based on regions in the code to reduce spills. As a case study, we applied our banked register allocation model to the THUMB architecture. We found that the code size decreases by as much as 8% (5.8% on average) while the performance improves by as much as 11.1% (3.3% on average). Our results indicate that, for an embedded CPU that provides reduced encoding, it is better to organize the register file into dual banks for higher-quality register allocation than to use the invisible registers for spills.
  • An efficient trade-off between yield and energy consumption of eDRAM caches under process variations
    Abstract: eDRAM cells have been considered a promising alternative to conventional SRAM cells and have already been adopted in commercial processors. However, eDRAM cells need to be refreshed periodically, resulting in non-negligible energy and performance overhead. Moreover, under process variations, the retention time of eDRAM cells exhibits non-uniform distributions. This phenomenon affects both manufacturing yield and the eDRAM refresh burden. In this paper, we first analyze eDRAM module (cache) yield and retention time failure patterns under process variations. Based on our analysis, we find that most of the failing cache lines have only one faulty cell, and we propose a cost-efficient technique to save those one-cell failing cache lines. Our technique maintains a one-cell failing line (OFL) buffer which manages the status of the one-cell failing cache lines. By effectively curing one-cell failing lines, our technique improves manufacturing yield by up to 46.1% under identical refresh intervals. In addition, our technique can be used to loosen refresh intervals with comparable yield. By using the loosened refresh intervals, our technique reduces energy per instruction by up to 19.9% and improves performance by up to 1.3%. (C) 2017 Elsevier B.V. All rights reserved.
  • Decimal addition on FPGA based on a mixed BCD/excess-6 representation
    Abstract: Decimal arithmetic has regained attention in the field of computer arithmetic due to the decimal precision requirements of application domains such as finance, commerce and the Internet. In this paper, we propose a new decimal adder on FPGA based on a mixed BCD/excess-6 representation that improves on state-of-the-art decimal adders targeting high-end FPGAs. Using the proposed decimal adder, a multioperand adder and a mixed binary/decimal adder are also proposed. The results show that the new decimal adder is very efficient, improving the area and delay of previous state-of-the-art decimal adders, multioperand decimal addition and binary/decimal addition.
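The abstract does not describe the adder's internals. As background only, here is a software sketch of the textbook excess-6 trick: biasing one BCD operand by 6 makes the binary carry out of a 4-bit digit coincide with the decimal carry. The function names `bcd_digit_add` and `bcd_add` are hypothetical, and this models the arithmetic rather than the paper's FPGA mapping.

```python
def bcd_digit_add(a, b, carry_in=0):
    """Add two decimal digits (0-9) using an excess-6 biased operand.

    Returns (sum_digit, carry_out).
    """
    biased = (a + 6) + b + carry_in   # first operand held in excess-6 form
    carry_out = biased >> 4           # nibble overflow equals decimal carry
    nibble = biased & 0xF
    # With a carry, the wrap-around at 16 already removed the bias.
    # Without a carry, the bias of 6 is still present and is subtracted.
    digit = nibble if carry_out else nibble - 6
    return digit, carry_out


def bcd_add(x, y):
    """Add two non-negative integers digit by digit, least significant first."""
    xs = [int(c) for c in reversed(str(x))]
    ys = [int(c) for c in reversed(str(y))]
    width = max(len(xs), len(ys))
    xs += [0] * (width - len(xs))
    ys += [0] * (width - len(ys))
    digits, carry = [], 0
    for a, b in zip(xs, ys):
        digit, carry = bcd_digit_add(a, b, carry)
        digits.append(digit)
    if carry:
        digits.append(1)
    return int("".join(str(d) for d in reversed(digits)))


print(bcd_digit_add(7, 5))  # (2, 1)
print(bcd_add(275, 948))    # 1223
```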
  • Real-time embedded visible and infrared image registration for non-invasive skin cancer screening
    Abstract: We present an embedded system architecture that implements real-time multimodal registration to enable dual-camera spatio-temporal feature extraction in a skin cancer screening application. We test the system on a combination of visible and long-wave infrared image sequences, but it can easily be extended to setups operating in different sections of the spectrum. Image registration is performed by matching common features between each frame of a visible image sequence and each frame of an infrared image sequence to estimate a projective transformation between them. The parameters of this transformation are estimated recursively, online with the video, thus enabling image registration in real time. The algorithm is implemented using a combination of embedded software and dedicated hardware units on a heterogeneous reconfigurable system-on-a-chip. The hardware performs feature detection and extraction, while the software estimates the transformation parameters and maps each visible video frame onto the infrared image coordinates. Implemented on an FPGA, our prototype runs at 540 frames per second with a 135 MHz clock, consumes 1.8 W and utilizes 29% and 54% of the logic and multiplier resources of the chip, respectively.
  • Implementation of a building energy management system for residential demand response
    Abstract: Demand response is proposed as a solution for handling fluctuations in the power supply in a scenario with higher penetration of renewable energy sources. Although demand response already offers a positive business case in certain domains, it still lacks maturity in other areas, especially in the residential domain. This paper presents a comprehensive study of a novel building energy management system (BEMS) to strengthen the adoption of residential demand response. The proposed consumer-centric BEMS monitors the building's performance and its surroundings, interacts with the residents, optimally controls DERs and provides demand response to an aggregator. The BEMS is conceived with a multimodal objective: to exploit flexible consumption through demand response and to run the building in an energy-efficient manner. The system architecture and the hardware and software design are detailed. A prototype of the envisioned BEMS has been developed and deployed in a 12-storey residential building. The prototype's performance, scalability, data monitoring capabilities, interaction with the residents and controllability of DERs are demonstrated. Moreover, the study provides an estimate of the total flexibility potential of the testbed.
  • Periodic cache bypassing for high-performance GPUs based on early miss prediction
    Abstract: The hierarchical cache memories with which GPUs are equipped aim to manage irregular memory access patterns for general-purpose workloads. The level-1 data cache (L1D) of the GPU plays an important role by providing high-bandwidth, low-latency data accesses. Unfortunately, the GPU L1D may become a performance bottleneck because of challenges such as cache contention and resource congestion. These critical issues come from a large number of simultaneous requests from the SIMT cores to the limited-capacity L1D. We observe that many applications issue a large number of requests with a very low reuse probability, which degrades GPU performance. To overcome these challenges, we propose an efficient cache bypassing mechanism that periodically filters the access stream and makes an accurate bypassing decision to improve the efficiency of the L1D. The proposed technique uses a small amount of storage to save the tag array of the L1D for early miss prediction before it makes the bypassing decision. The experimental results reveal that the proposed technique significantly increases cache efficiency and GPU performance.
  • Coherence-based dual-microphone speech enhancement technique using FPGA
    Abstract: This paper presents the design and implementation of a dual-microphone coherence-based speech enhancement technique using a field programmable gate array (FPGA). For proper enhancement in a dual-microphone system, we need to estimate the time delay of arrival (TDOA) between the two microphone signals, after which the proposed speech enhancement algorithm is applied. We use a TDOA algorithm based on the phase transform to minimize the effect of reverberation when localizing the sound sources. A coherence-based technique, which requires no background noise estimation, is used for the speech enhancement process. In this way, we can achieve high localization accuracy as well as the capability of dealing with coherent noise. In the proposed system, the TDOA and speech enhancement processes are executed concurrently, exploiting the parallel logic blocks of the FPGA and thus greatly increasing the throughput of the system. We have implemented our design on a Spartan6 Lx45 FPGA device. A subjective evaluation of the proposed design was carried out with normal-hearing listeners using a comprehensibility listening test, and its performance was compared with existing state-of-the-art research. The objective evaluation of the proposed design also indicates a significant improvement over existing state-of-the-art research. The subjective and objective evaluations indicate that our proposed hardware offers a feasible solution for hearing aids and other hand-held devices.
  • Implementing a secure TLS coprocessor on an FPGA
    Abstract: In this paper we present a secure implementation architecture of a coprocessor for the TLSv1.2 protocol on an FPGA. Techniques were used that increase the resistance of the design to side-channel attacks and also protect the private key data from software-based attacks. The processor was implemented with a secure true random number generator which incorporates failure detection and thorough post-processing of the random bitstream. The design also includes hardware for signature generation and verification based on elliptic curve algorithms. The algorithms used for performing the elliptic curve arithmetic were chosen to provide resistance against SPA and DPA attacks. Implementations of the AES and SHA256 algorithms are also included in order to provide full hardware acceleration for a specific suite of the TLSv1.2 protocol. The design is analysed for area and speed on a Virtex 5 FPGA. (C) 2015 Elsevier B.V. All rights reserved.
  • High-speed, memory-efficient line-based VLSI architecture of a dual-mode inverse discrete wavelet transform for JPEG 2000 decoders
    Abstract: In this paper, a high-speed, memory-efficient VLSI architecture is presented for a dual-mode (9/7 lossy and 5/3 lossless filters) line-based inverse discrete wavelet transform (IDWT) to support a JPEG 2000 decoder. A new algorithm has been developed to reduce the critical path and the on-chip memory requirements of the proposed design. Multipliers are implemented with a shift-and-add technique to reduce the critical path to one adder delay (T_a) and achieve the maximum possible frequency for both modes. The on-chip memory requirements of the M x N 2-D lossy and lossless IDWT filters are 5N and 3N, respectively. The comparison of results shows that the proposed design surpasses previous line-based IDWT architectures, with lower on-chip memory requirements and the shortest critical path delay. The architecture supports a line-based approach, in which the input images are scanned line by line and vertical and horizontal filtering operations execute simultaneously to reconstruct the images. The proposed architecture is synthesized and implemented on a Xilinx xc4vfx100-12 device and offers a maximum frequency of 306 MHz for a 512 x 512 image. Power analysis performed using Synopsys Design Compiler with a UMC 90-nm CMOS process shows that it consumes 130 mW at 306 MHz. The implementation results show that the proposed architecture can even support digital cinema (image resolution: 4096 x 1080) with 3 levels of recomposition at 90 frames/s. (C) 2015 Elsevier B.V. All rights reserved.
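To illustrate why the 5/3 mode is lossless, here is a minimal 1-D sketch of the integer lifting form of the 5/3 filter: the inverse undoes the update and predict steps in reverse order, with the same integer rounding. It assumes an even-length input and symmetric boundary extension, and the names `forward_53` and `inverse_53` are hypothetical. It is a software model only and says nothing about the paper's line-based hardware schedule.

```python
def forward_53(x):
    """One level of the integer 5/3 lifting transform on an even-length list.

    Returns (s, d): low-pass and high-pass coefficients.
    """
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict: each odd sample minus the floor-average of its even neighbours.
    d = [odd[n] - ((even[n] + even[min(n + 1, half - 1)]) >> 1)
         for n in range(half)]
    # Update: each even sample plus a rounded quarter of neighbouring details.
    s = [even[n] + ((d[max(n - 1, 0)] + d[n] + 2) >> 2)
         for n in range(half)]
    return s, d


def inverse_53(s, d):
    """Exact inverse of forward_53: undo update, then undo predict."""
    half = len(s)
    even = [s[n] - ((d[max(n - 1, 0)] + d[n] + 2) >> 2)
            for n in range(half)]
    odd = [d[n] + ((even[n] + even[min(n + 1, half - 1)]) >> 1)
           for n in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


signal = [10, 12, 14, 13, 9, 7, 200, 3]
s, d = forward_53(signal)
print(inverse_53(s, d) == signal)  # True
```

Only additions and shifts are needed, which is consistent with the shift-and-add multiplier replacement the abstract mentions for shortening the critical path.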
  • Low-space-complexity and low-power semi-systolic multiplier architectures over GF(2^m) based on irreducible trinomials
    Abstract: This paper proposes three bit-serial and digit-serial semi-systolic GF(2^m) multipliers using the Progressive Product Reduction (PPR) technique. These architectures are obtained by converting the GF(2^m) multiplication algorithm into an iterative algorithm using systematic techniques for scheduling the computational tasks and mapping them to Processing Elements (PEs). Three different semi-systolic arrays were obtained. ASIC implementations of the proposed designs and of previously published schemes were used to verify the performance of the proposed designs. One proposed design has at least 29% lower area than previously published bit/digit-serial multipliers. This design also has at least 70% lower power than previously published bit/digit-serial multipliers. Another proposed design has at least 12% lower power-delay product (PDP) than previously published bit/digit-serial multipliers. This makes the proposed designs better suited to resource-constrained embedded applications. (C) 2015 Elsevier B.V. All rights reserved.
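For readers unfamiliar with the arithmetic being accelerated, the sketch below is a plain bit-serial software model of multiplication in GF(2^m) modulo an irreducible trinomial x^m + x^k + 1, reducing the partial product after every shift. It assumes field elements are stored as integers whose bit i is the coefficient of x^i; the name `gf2m_multiply` is hypothetical. It does not reproduce the paper's PPR scheduling or its semi-systolic arrays.

```python
def gf2m_multiply(a, b, m, k):
    """Multiply a and b in GF(2^m) modulo x^m + x^k + 1, MSB-first.

    The partial product is multiplied by x and reduced in every iteration,
    so it never grows beyond m bits.
    """
    mask = (1 << m) - 1
    result = 0
    for i in reversed(range(m)):
        result <<= 1                      # multiply the partial product by x
        if result >> m:                   # an x^m term appeared
            # Replace x^m by x^k + 1, which is what the trinomial dictates.
            result = (result & mask) ^ (1 << k) ^ 1
        if (b >> i) & 1:
            result ^= a                   # addition in GF(2) is XOR
    return result


# GF(2^4) with the irreducible trinomial x^4 + x + 1 (m=4, k=1).
print(gf2m_multiply(0b0010, 0b1000, 4, 1))  # 3, since x * x^3 = x^4 = x + 1
print(gf2m_multiply(0b0011, 0b0111, 4, 1))  # 9, (x+1)(x^2+x+1) = x^3 + 1
```

A trinomial keeps the reduction step to two XORs, which is one reason such polynomials are attractive for low-area hardware.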
  • Fuzzy-logic-based energy- and throughput-aware design space exploration for MPSoCs
    Abstract: Multicore architectures were introduced to mitigate the increase in power dissipation with clock frequency. Deeper pipelines, speculative threading and similar techniques for single-core systems could not bring much of a performance increase relative to their associated power overhead. For multicore architectures, however, scaling performance with the number of cores has always been a challenge. Amdahl's law shows that the theoretical maximum speedup of a multicore architecture falls well short of the number of cores. When only a small portion of the code is parallel, giving an application more cores may simply increase power dissipation rather than bring any performance advantage. There is therefore a need for an adaptive multicore architecture that can be tailored to the application in use for higher energy efficiency. In this paper, a fuzzy-logic-based design space exploration technique is presented that optimizes a multicore architecture according to the workload requirements in order to achieve an optimum balance between the throughput and energy of the system. (C) 2015 Elsevier B.V. All rights reserved.
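The Amdahl's law argument in this abstract can be made concrete with a few lines of arithmetic. The function name `amdahl_speedup` and the sample parallel fractions are illustrative choices, not figures from the paper.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.5, 0.9):
    for n in (4, 16, 64):
        print(f"p={p}, cores={n}: speedup {amdahl_speedup(p, n):.2f}")
```

With half of the code parallel, the speedup can never reach 2 however many cores are added, so the extra cores mostly add power dissipation. Even at 90% parallel code, 16 cores give a speedup of 6.4 rather than 16.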
  • Energy-efficient synonym data detection and coherence for virtual caches
    Abstract: The cache memory consumes a large proportion of the energy used by a processor. The translation lookaside buffer (TLB) accounts for 20-50% of the energy consumption of the on-chip cache. To reduce the energy consumption caused by TLB accesses, a virtual cache can be accessed directly with the virtual addresses issued by a processor. However, a virtual cache may give rise to the synonym problem. In this paper, we propose low-cost synonym detection hardware and a synonym data coherence mechanism. These reduce the energy consumption incurred by TLB lookups and maintain synonym data consistency in the virtual cache. The proposed synonym detection hardware efficiently reduces the number of blocks that must be looked up in a virtual cache, saving energy. In addition, the proposed synonym data coherence mechanism reduces the number of invalidated blocks in the virtual cache to prevent the destruction of cache locality. The simulation results show that our proposed energy-aware virtual cache consumes 51%, 27%, and 20% less energy than the traditional physical cache, the traditional virtual cache, and the synonym lookaside buffer (SLB), respectively. In addition, our proposed design shows almost the same static energy consumption as the SLB, and reduces static energy consumption by about 20% compared with the traditional physical cache and virtual cache. (C) 2015 Elsevier B.V. All rights reserved.
  • Multi-core SSL/TLS security processor architecture and its FPGA prototype with an automatic prioritization algorithm
    Abstract: In this paper, a pipelined architecture of a high-speed network security processor (NSP) for the SSL/TLS protocol is implemented on a system on chip (SoC), where the hardware information of all encryption, hashing and key exchange algorithms is stored on a Secure Digital (SD) card as bit files, in contrast to recent designs in which all of them are actually implemented in hardware. The SoC works as the NSP for the system (PC) that is running the application. The security algorithms are implemented through the SoC, which also provides the Ethernet communication interface. The NSP finds applications in e-commerce, virtual private networks (VPNs) and other fields that require data confidentiality.
  • Reliability and fault compensation of data processing in unreliable arithmetic processors
    Abstract: In logic circuits, such as those performing arithmetic operations in a processor system, arbitrary faults will become an increasingly significant concern. Modern manufacturing processes lead to lower reliability and higher vulnerability of software execution to soft errors. The correctness of certain results is especially important for safety-critical applications, whose reliability depends on the fault-free execution of every single instruction and on the dependencies between them. The more complex the software, the less reliable the outcome. There is, however, a contrary effect: if the probability of multiple faults increases, there is also a chance that two faults compensate each other and the result is correct again. This paper presents the basic ideas for such a reliability evaluation of a software data flow with arbitrary soft errors and the effect of fault compensation. Further, this evaluation makes it possible to compare different implementations of a data flow with respect to reliability. This is shown by comparing two different error codes as alternatives for coded data processing. (C) 2015 Elsevier B.V. All rights reserved.
  • Design and dynamic management of hierarchical NoCs
    Abstract: As the number of modules grows, the performance scalability of planar-topology Networks-on-Chip (NoCs) becomes limited by increasing hop distances, since long paths involve more routers. The growing hop distance affects both end-to-end network latency and overall network saturation. Hierarchical topologies provide routes with shorter hop distances and are therefore better suited to large systems. The introduction of hierarchical NoCs poses new challenges, as they should provide the shortest hop distance for as many source-destination pairs as possible, with minimal interference among packets, at the lowest hardware and system costs.
  • A fully pipelined FPGA-based architecture for real-time SIFT extraction
    Abstract: Image feature extraction constitutes a fundamental task in robotic vision applications. The Scale-Invariant Feature Transform (SIFT) has been widely used as a robust method for detecting and matching features. Nevertheless, the SIFT algorithm is computationally demanding, and its implementation in an embedded system requires a careful approach. In this paper, an optimized and fully pipelined architecture is proposed for real-time detection of SIFT keypoints and extraction of SIFT descriptors. The system is suitable for robotic vision applications and is pipelined on a pixel basis. The architecture is hosted in a medium-scale Cyclone IV FPGA device clocked at 21.7 MHz and is capable of extracting a feature with its descriptor at every clock cycle, i.e. in 46 ns. This processing speed is independent of the number of features detected in the input image and therefore represents a very high SIFT throughput, adequate for the most demanding SIFT-based robotic applications. The system can process 70 fps at VGA resolution while keeping power dissipation low. Moreover, the proposed implementation achieves high response and repeatability values, and its matching ability is directly comparable with floating-point software-based SIFT implementations. Design details are given for the combinational and RAM-based circuits forming the SIFT datapath. (C) 2015 Elsevier B.V. All rights reserved.
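The throughput figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch assumes that "pipelined on a pixel basis" means one pixel enters the pipeline per clock cycle and that VGA means 640 x 480 with no blanking overhead; both are assumptions, as are the names `clock_period_ns` and `max_frame_rate`.

```python
CLOCK_HZ = 21.7e6        # clock frequency quoted in the abstract
VGA_PIXELS = 640 * 480   # assumed VGA frame size


def clock_period_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def max_frame_rate(clock_hz, pixels_per_frame):
    """Upper bound on frames per second at one pixel per clock cycle."""
    return clock_hz / pixels_per_frame


print(round(clock_period_ns(CLOCK_HZ), 1))             # 46.1
print(round(max_frame_rate(CLOCK_HZ, VGA_PIXELS), 1))  # 70.6
```

Under these assumptions the 46 ns per feature and the 70 fps figure both follow directly from the 21.7 MHz clock.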
  • Design and evaluation of compact ISA extensions
    Abstract: The modern embedded market relies heavily on RISC processors. The code density of such processors directly affects memory usage, an expensive resource. Solutions to mitigate this issue include code compression techniques and ISA extensions with reduced instruction bit-width, such as Thumb2 and MicroMIPS. This paper proposes a 16-bit extension to the SPARC processor, the SPARC16. Additionally, we provide the first methodology for generating 16-bit ISAs and evaluate compression among different 16-bit extensions. SPARC16 programs can achieve better compression ratios than other extensions, attaining results as low as 67%. Moreover, SPARC16 reduces cache miss rates by up to 9%, requiring smaller caches than SPARC processors to achieve the same performance; the cache size reduction can reach a factor of 16. (C) 2015 Elsevier B.V. All rights reserved.
  • Improving cluster-based mesh FPGA architecture using a novel hierarchical interconnect topology and long routing wires
    Abstract: This paper presents an improved interconnect network for the Mesh of Clusters (MoC) Field-Programmable Gate Array (FPGA) architecture. The proposed architecture has a depopulated intra-cluster interconnect with a flexible Rent's parameter. It presents a new multi-level Switch Box (SB) interconnect which unifies a downward and an upward unidirectional network based on the Butterfly-Fat-Tree (BFT) topology. To improve the routability of the proposed MoC-based FPGA, long routing segments are introduced as a function of channel width, with adjustable span. Compared to the basic Versatile Place and Route (VPR) Mesh architecture, savings of 32% in area and 30% in power were achieved with the proposed MoC-based architecture. Based on analytical and experimental methods, we identified and explored architecture parameters that control the interconnect flexibility of the proposed MoC-based FPGA, such as Rent's parameter, cluster size, Look-Up-Table (LUT) size, and long-wire span and percentage. Experimental results show that an architecture with LUT size 4 and cluster arity 8 offers the best trade-off between power consumption and density. It can also be noted that, in general, a long-wire span of 4 and a percentage between 20% and 30% produce the most efficient results in terms of density and power. (C) 2015 Elsevier B.V. All rights reserved.
  • A scenario preprocessing approach for the reconfiguration of fault-tolerant NoC-based MPSoCs
    Abstract: The latest integrated circuit manufacturing technologies allow billions of transistors to be arranged on a single chip, enabling the chip to implement a complex parallel system, which requires a communications architecture with high scalability and a high degree of parallelism, such as a Network-on-Chip (NoC). These technologies are very close to their physical limits, which increases the number of faults in manufacturing and at runtime. Therefore, it is essential to provide a method for fault recovery that enables the NoC to operate in the presence of faults and still ensure deadlock-free routing. Preprocessing the most probable fault scenarios enables us to compute deadlock-free routings in advance, reducing the time for which the system must be interrupted when a fault occurs. This work proposes a technique that preprocesses fault scenarios based on forecasts of fault tendencies, performed with a fault threshold circuit operating in conjunction with high-level software. We propose methods for dissimilarity analysis of scenarios based on cross-correlation measurements of link fault matrices. Experimental results employing RTL simulation with synthetic traffic demonstrate the quality of the analytic metrics used to select the preprocessed scenarios. Furthermore, the experiments show the efficacy and efficiency of the proposed dissimilarity methods, quantifying the latency penalty when using the coverage scenarios approach. (C) 2015 Elsevier B.V. All rights reserved.
  • Design space exploration of device and architectural heterogeneity in chip multiprocessors
    Abstract: As we enter the deep submicron era, the number of transistors integrated on a die increases exponentially. While the additional transistors greatly boost processor performance, an undesirable side effect of this evolution is ever-rising power consumption and chip temperature. It is widely acknowledged that the shortage of power supplied to a processor will be a major obstacle to sustaining generational performance scaling if processor design follows the conventional approach. To utilize on-chip resources efficiently, computer architects need to consider new design paradigms that effectively leverage the advantages of modern semiconductor technology. In this paper, we address this issue by exploiting device heterogeneity and two-fold asymmetry in processor manufacturing. We conduct a thorough investigation of these design patterns from different evaluation perspectives, including performance, energy efficiency, and cost efficiency. Our observations can provide insightful guidance for the design of future processors. (C) 2015 Elsevier B.V. All rights reserved.
  • A generic energy optimization framework for heterogeneous platforms using scaling models
    Abstract: Mobile platforms are becoming highly heterogeneous, combining a powerful multiprocessor system-on-a-chip (MpSoC) with numerous other resources, including display, memory, power management IC, battery and wireless modems, in a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, and video, image, and audio processors. Platform energy consumption and responsiveness are two major considerations for mobile systems, since they determine battery life and user satisfaction, respectively. As a result, energy minimization approaches targeting mobile computing need to consider the platform at various levels of granularity. In this paper, we first present power consumption, response time, and energy consumption models for mobile platforms. Using these models, we optimize the energy consumption of baseline platforms under power, response time, and thermal constraints, with and without introducing new resources. Finally, we validate the proposed framework through experiments on Qualcomm's Snapdragon 800 Mobile Development Platforms. (C) 2015 Elsevier B.V. All rights reserved.
  • Multi-dimensional analysis of embedded system security
    Abstract: The primary goals of this paper are to analyze the security of embedded systems at different levels of abstraction and to propose a new procedure to assess and improve the security of embedded systems during various product life cycle phases. To achieve these goals, this paper introduces a new classification of embedded system attacks using a novel multi-dimensional representation, explores the possible threats to embedded systems, and proposes a new procedure to evaluate and improve the security of embedded systems during various product development phases. (C) 2016 Elsevier B.V. All rights reserved.
  • Improving the efficiency of functional verification based on test prioritization
    Abstract: Functional verification has become the key bottleneck that delays time-to-market in the embedded system design process. Simulation-based verification is the mainstream practice in functional verification due to its flexibility and scalability. In practice, the success of simulation-based verification depends heavily on the quality of the functional tests in use, which is usually evaluated by coverage metrics. Since test prioritization provides a way to simulate earlier the more important tests, which markedly improve the coverage metrics, we propose a test prioritization approach based on a clustering algorithm to reach a high coverage level earlier in the simulation process. The k-means algorithm, one of the most popular clustering algorithms and commonly used for test prioritization, has some shortcomings that affect the effectiveness of test prioritization. We therefore propose three enhanced k-means algorithms to overcome these shortcomings and improve the effectiveness of test prioritization. The functional tests in the simulation environment can then be ordered by test prioritization based on the enhanced k-means algorithms. Finally, the more important tests, which markedly improve the coverage metrics, can be selected and simulated early within the limited simulation time. Experimental results show that the enhanced k-means algorithms, especially the third one, are more accurate and efficient than the standard k-means algorithm for test prioritization. In comparison with simulating all the tests in random order, the more important tests selected by test prioritization based on the third enhanced k-means algorithm achieve almost the same coverage metrics in a shorter time, saving 90% of the simulation time. (C) 2015 Elsevier B.V. All rights reserved.
  • Machine-translated title: Anytime system-level verification via parallel random exhaustive hardware-in-the-loop simulation
    摘要:System level verification of cyber-physical systems has the goal of verifying that the whole (i.e., software + hardware) system meets the given specifications. Model checkers for hybrid systems cannot handle system level verification of actual systems. Thus, Hardware In the Loop Simulation (HILS) is currently the main workhorse for system level verification. By using model checking driven exhaustive HILS, System Level Formal Verification (SLFV) can be effectively carried out for actual systems.
  • Machine-translated title: A method for modeling the dependability of cyber-physical systems
    摘要:Cyber-Physical Systems (CPSs) represent a new generation of digital systems, where cyber entities and physical devices cooperate towards a set of common goals. The research presented in this paper aims to contribute to the development of CPSs by proposing: (1) an analysis methodology to model the CPS's behavior in terms of dependability; and (2) a CPS architecture with dependability facilities applicable in environmental monitoring, based on the Wireless Sensor Network, multi-agent and cloud computing technologies. The proposed methodology combines a primary dependability analysis technique with the representation of knowledge in order to support the development of CPSs capable to model the dependability at run-time. A dependability domain ontology has been implemented on a CPS case study based on this methodology and its effectiveness has been demonstrated, showing how the proposed approach is able to enhance system dependability. Also, the paper provides a detailed description of each architectural layer of the CPS case study, focusing on the wireless sensor node and on the intelligent decision system. (C) 2015 Elsevier B.V. All rights reserved.
  • Machine-translated title: High-speed AES design resistant to fault injection attacks
    摘要:To secure the Advanced Encryption Standard against physical attacks known as fault injection attacks, different countermeasures have been proposed. The AES is used in many embedded systems to provide security. It has become the default choice for security services in numerous applications. However, the natural and malicious injected faults reduce its robustness and may cause private information leakage. In this paper, we study the concurrent fault detection schemes for achieving a reliable AES implementation. We specifically propose a new fault detection scheme based on modification of the AES architecture. For this purpose, the round AES transformation is broken into two parts and a pipeline stage is inserted in between.
  • Machine-translated title: FPGA-based Extensive Cancellation Algorithm (ECA) architecture for Passive Bistatic Radar (PBR)
    摘要:Passive Bistatic Radar (PBR) exploits existing signals of opportunity from different sources such as Radio and TV signals. Extensive Cancellation Algorithm (ECA) has been proven to be a very effective way to mitigate the effects of direct signal, multipath and clutter echoes in PBR. Also, it is able to detect a moving target accurately when it comes to strong-clutter environment and long-range detection providing evidence for its robustness. However, ECA is a computationally intensive algorithm and will benefit from parallel processing and modern computational platforms such as Field Programmable Gate Arrays (FPGAs). This work involves transformation of ECA by exploring opportunities for parallel processing and elimination of any unnecessary computations and storages. ECA algorithm has been also implemented on FPGAs for high speed computation by exploiting parallel and pipelining approaches. A new software tool called Radar Signal Processing Tool (RSPT) has been developed. It allows the designer to auto-generate fully optimized VHDL representation of ECA by specifying many user input parameters through GUI. The produced VHDL code is independent of FPGA part. It is also appropriate for use with any future high performance FPGAs or ASICs to further cut down computation time. Moreover, it provides the designer a feedback on various performance parameters. This offers the designer an ability to make any adjustments to the ECA component until the desired performance of the overall System on Chip (SoC) is achieved. The computation time of our transformed/optimized algorithm has improved by a factor of 3.8. Its FPGA implementation offers a speed up of 18 over CPU. (C) 2015 Elsevier B.V. All rights reserved.
  • Machine-translated title: High-performance ST-Box based unified AES encryption/decryption architecture on FPGA
    摘要:In this paper, a unified Field Programmable Gate Array (FPGA) based Advanced Encryption Standard (AES) encryptor/decryptor design is presented by proposing a symmetric ST-Box structure. This structure fully utilizes high capacity (32 Kb) Block RAM (BRAM) by accommodating all encryption and decryption lookup operations within a single BRAM in the form of single integrated Look-Up-Table. This design also caters the inherent asymmetric nature of encryption and decryption coefficients for a unified hardware. Further the symmetry at BRAM output is maintained to use a single XOR network during both encryption and decryption. The performance of design is enhanced by proposing a duty-cycle based accessing technique. It explores the switching capabilities of BRAM and effectively minimizes the ON time of BRAM by changing duty-cycle of input clock. This enables us to access single BRAM 4 times per clock. Effectiveness of design is further measured by implementing it, in both iterative and pipelined architectures. Our proposed iterative design on Virtex-7 proved to be the smallest 128-bit unified AES core with 48.70% reduced resources and the best Throughput Per Slice (TPS) of 11.56. Similarly our pipelined design saved 59.01% area and has the highest throughput of 45.69 Gbps. (C) 2015 Elsevier B.V. All rights reserved.
  • Machine-translated title: Computational architectures for sonar array processing in autonomous rovers
    摘要:This paper presents design of novel embedded computational architectures for real time, in-motion mapping based on ultrasound sensors for use in resource constrained autonomous rovers. Autonomous rovers are a class of real time systems that are constrained for size, weight, on-board computational resources and power. Embedded computing architectures designed for implementing the mapping and navigational algorithms must optimize the use of these resources. In the process of map generation, raw sensor data obtained from an array of ultrasound sensors is filtered for sensor noise using probabilistic sensor model, and probabilistic data fusion methods are employed for spatial and temporal correlation of data for improving the map. In this paper, we present a System-on-Chip design based design space exploration of embedded computational architectures for implementation on field programmable gate arrays. We seek to exploit system level, region level and sensor level parallelism in the mapping algorithm for enhancing the throughput. Design space exploration is carried out by employing existing soft core processors, designing custom co-processors and data path modules and integrating them using parallel and pipelined data flow approaches. Results of mapping a test area on all the architectures are compared to characterize the performance and suitability of the proposed architectures. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: Protection of heterogeneous architectures on FPGAs: an approach based on hardware firewalls
    摘要:Embedded systems are parts of our daily life and used in many fields. They can be found in smart phones or in modern cars including GPS, light/rain sensors and other electronic assistance mechanisms. These systems may handle sensitive data (such as credit card numbers, critical information about the host system and so on) which must be protected against external attacks as these data may be transmitted through a communication link where attackers can connect to extract sensitive information or inject malicious code within the system. This work presents an approach to protect communications in multiprocessor architectures. This approach is based on hardware security enhancements acting as firewalls. These firewalls filter all data going through the system communication bus and an additional flexible cryptographic block aims to protect external memory from attacks. Benefits of our approach are demonstrated using a case study and some custom software applications implemented in a Field-Programmable Gate Array (FPGA). Firewalls implemented in the target architecture allow getting a low-latency security layer with flexible cryptographic features. To illustrate the benefit of such a solution, implementations are discussed for different MPSoCs implemented on Xilinx Virtex-6 FPGAs. Results demonstrate a reduction up to 33% in terms of latency overhead compared to existing efforts. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: Combining the Parabolic Synthesis methodology with Second-Degree Interpolation
    摘要:The Parabolic Synthesis methodology is an approximation methodology for implementing unary functions, such as trigonometric functions, logarithms and square root, as well as binary functions, such as division, in hardware. Unary functions are extensively used in baseband for wireless/wireline communication, computer graphics, digital signal processing, robotics, astrophysics, fluid physics, games and many other areas. For high-speed applications, as well as in low-power systems, software solutions are not sufficient and a hardware implementation is therefore needed. The Parabolic Synthesis methodology is a way to implement functions in hardware based on low complexity operations that are simple to implement in hardware. A difference in the Parabolic Synthesis methodology compared to many other approximation methodologies is that it is a multiplicative, in contrast to additive, methodology. To further improve the performance of Parabolic Synthesis based designs, the methodology is combined with Second-Degree Interpolation. The paper shows that the methodology provides a significant reduction in chip area, computation delay and power consumption with preserved characteristics of the error. To evaluate this, the logarithmic function was implemented, as an example, using the Parabolic Synthesis methodology in comparison to the Parabolic Synthesis methodology combined with Second-Degree Interpolation. To further demonstrate the feasibility of both methodologies, they have been compared with the CORDIC methodology. The comparison is made on the implementation of the fractional part of the logarithmic function with a 15-bit resolution. The designs implemented using the Parabolic Synthesis methodology - with and without the Second-Degree Interpolation - perform 4x and 8x better, respectively, than the CORDIC implementation in terms of throughput. In terms of energy consumption, the CORDIC implementation consumes 140% and 800% more energy, respectively. The chip area is also smaller in the case when the Parabolic Synthesis methodology combined with Second-Degree Interpolation is used. (C) 2016 Elsevier B.V. All rights reserved.
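  • Machine-translated title: Improving the security of the ASIC fabrication process against reverse engineering attacks using an automatic netlist encryption method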
    摘要:Reverse engineering is a serious threat to hardware security, especially when the attacker aims to extract functional behavior. In this paper, a new automated mechanism is proposed to encrypt the routing topology of the design, which hinders reverse engineering during the foundry/fabrication process. Moreover, new special standard cells (Wire Scrambling cells, or WS-cells) are proposed, together with an automatic design flow that inserts the WS-cells into the netlist with the aim of maximum obfuscation effectiveness and minimum overhead. The key feature of this mechanism is that it can be performed without detailed information about the functionality and structure of the design and hence can be automated easily. The methodology is implemented using an academic physical design framework (EduCAD). Experimental results show that reverse engineering can be hindered considerably at the cost of negligible overheads in area, power consumption and total wire length. (C) 2015 Elsevier B.V. All rights reserved.
  • Machine-translated title: Pipelining data-dependent tasks in FPGA-based multicore architectures
    摘要:In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications mainly consisting of producer/consumer tasks. This paper proposes fine- and coarse-grained data synchronization approaches to achieve pipelining execution of producer/consumer tasks in FPGA-based multicore architectures. Our approaches are able to speedup the overall execution of successive, data-dependent tasks, by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated with the producer/consumer tasks. We propose techniques to reduce the number of accesses to external memory in our fine-grained data synchronization approach. The experimental results show the feasibility of the approach in both in-order and out-of-order producer/consumer tasks. Moreover, the results using our approach reveal noticeable performance improvements for a number of benchmarks over a single core implementation without using task-level pipelining. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: Exploring a temperature-aware refresh scheme for 3D stacked eDRAM caches
    摘要:Recent studies have shown that embedded DRAM (eDRAM) is a promising approach for 3D stacked last level caches (LLCs) rather than SRAM due to its advantages over SRAM; (i) eDRAM occupies less area than SRAM due to its smaller bit cell size; and (ii) eDRAM has much less leakage power and access energy than SRAM, since it has much smaller number of transistors than SRAM. However, different from SRAM cells, eDRAM cells should be refreshed periodically in order to retain the data. Since refresh operations consume noticeable amount of energy, it is important to adopt appropriate refresh interval, which is highly dependent on the temperature. However, the conventional refresh method assumes the worst-case temperature for all eDRAM stacked cache banks, resulting in unnecessarily frequent refresh operations. In this paper, we propose a novel temperature-aware refresh scheme for 3D stacked eDRAM caches. Our proposed scheme dynamically changes refresh interval depending on the temperature of eDRAM stacked last-level cache (LLC). Compared to the conventional refresh method, our proposed scheme reduces the number of refresh operations of the eDRAM stacked LLC by 28.5% (on 32 MB eDRAM LLC), on average, with small area overhead. Consequently, our proposed scheme reduces the overall eDRAM LLC energy consumption by 12.5% (on 32 MB eDRAM LLC), on average. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: Reconfigurable cache for real-time MPSoCs: scheduling and implementation
    摘要:The shared cache in modern multi-core systems is considered one of the major factors that degrade system predictability and performance. How to manage the shared cache in real-time multi-core systems so as to optimize system performance while guaranteeing system predictability is an open issue. In this paper, we present a reconfigurable cache architecture that supports dynamic cache partitioning at the hardware level, and a framework that can exploit cache management for real-time MPSoCs. The proposed reconfigurable cache allows cores to dynamically allocate cache resources with minimal timing overhead while guaranteeing strict cache isolation among the real-time tasks. The cache management framework automatically determines the time-triggered schedule and cache configuration for each task to minimize cache misses while guaranteeing the real-time constraints. We evaluate the proposed framework with respect to different numbers of cores and cache modules and prototype the constructed MPSoCs on FPGA. Our experiments show that our automatic framework brings significant benefits over state-of-the-art cache management strategies when testing 27 benchmark programs on the constructed MPSoCs. (C) 2015 Elsevier B.V. All rights reserved.
  • Machine-translated title: Component-based design of cyber-physical applications with safety-critical requirements
    摘要:Cyber-physical systems typically involve large numbers of mobile autonomous devices that closely interact with each other and their environment. Standard design and development techniques often fail to effectively manage the complexity and dynamics of such systems. As a result, there is a strong need for new programing models and abstractions. Towards this, component-based design methods are a promising solution. However, existing such approaches either do not accurately model transitory interactions between components - which are typical of cyber-physical systems - or do not provide guarantees for real-time behavior which is essential in safety-critical applications. To overcome this problem, we present a component-based design technique based on DEECo (Dependable Emergent Ensembles of Components). The DEECo framework allows modeling large-scale dynamic systems by a set of interacting components and, in contrast to approaches from the literature, it provides mechanisms to describe transitory interactions between them. To allow reasoning about timing behavior at the component-description level, we characterize DEECo's closed-loop delay in the worst case, i.e., the maximum time needed to react to a change in the environment. Based on this, we incorporate real-time analysis into DEECo's design flow. This further allows us to analyze the system's robustness under unreliable communication and to design decentralized safety-preserving mechanisms. To illustrate the simplicity and usefulness of our approach, we present a case study consisting of an intelligent crossroad system. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: A fair energy allocation strategy for micro grids
    摘要:The design of smart grid systems has been proposed in several works in the literature. However, the actual commercial feasibility of smart grids or micro grids still faces several problems, including issues related to technical infrastructure and to market economy. One critical issue in the design of commercially feasible smart grids is how electricity trading can be performed fairly among micro grids. In existing works, this issue has been mainly addressed by using some sort of priority scheme among buyers and sellers in electricity trading, which does not guarantee a fair trading scheme. As a result, the commercial feasibility of existing works is at stake and they may not work as proposed. This work addresses this issue by proposing a Fair Energy Resource Allocation (FERA) method for smart grids. The proposed method has been implemented in a FIPA-compliant Multi-Agent System (MAS) based smart grid control system and evaluated against state-of-the-art round robin and priority based allocation methods. For trading among 30 micro grids, it is demonstrated that the proposed method results in a high fairness index of 96.22% even in the worst case, while the round robin scheme and the priority scheme result in a worst-case fairness index of only 57.8% and 11.29%, respectively. Thus, in the long term under different ratios of buyers and sellers, the proposed method is the only method that can achieve a very high fairness index in the worst case. Averaging over different ratios of buyers and sellers, the proposed method results in a fairness index of 99.57%, which is much higher than that achieved by the round robin method (84.04%) and the priority scheme (63.56%). As far as cost saving is concerned, based on the cost saving opportunity (CSO) metric, in the long term (10,000 rounds of trading), the proposed method results in a CSO of 51.48%, which is much higher than that of the other two methods: the round robin method results in 14.07% and the priority-based method in 34.44%. (C) 2016 Elsevier B.V. All rights reserved.
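  • Machine-translated title: Introduction to the special issue on modeling, design and implementation of intelligent reconfigurable systems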
    摘要:
  • Machine-translated title: Design of a novel low-power reversible binary incrementer using Quantum Dot Cellular Automata
    摘要:This paper demonstrates the design of a novel n-bit low-power reversible binary incrementer in Quantum Dot Cellular Automata (QCA). A comparison of the quantum cost of the quantum-gate-based approach and of the QCA-based design confirms that the QCA implementation is cost efficient. The power dissipated by the proposed circuit is estimated, showing that the circuit dissipates very little heat energy and is therefore suitable for reversible computing. All the circuits are evaluated in terms of logic gates, circuit density and latency, which confirms the faster operating speed at the nano scale. The reliability of the circuit under thermal randomness is explored, which describes the efficiency of the circuit. (C) 2015 Elsevier B.V. All rights reserved.
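  • Machine-translated title: Fast and accurate architectural vulnerability analysis of embedded processors using the Instruction Vulnerability Factor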
    摘要:Scaling silicon technology into the nano-scale era has brought higher integration, higher performance and lower power consumption, while reliability has become a serious challenge for integrated circuit technology. Reliability awareness has therefore become essential in the early stages of integrated circuit design. Since many modern chips struggle with a limited power budget, and traditional techniques such as N-Modular Redundancy (NMR) are not efficient for non-uniform fault tolerance, accurate analysis of the reliability of different hardware components or application parts is necessary. Transient and soft errors, which result from cosmic ray strikes and Process, Voltage and Temperature (PVT) variation, are known as the main sources of unreliability. Recently, the Architectural Vulnerability Factor (AVF) has been widely used for analyzing the reliability of processors. In this paper, we introduce a new metric named Instruction Vulnerability Factor (IVF), which is used for fast, accurate, and recurring AVF estimation. Special scenarios have been developed that enable us to use exhaustive fault injection for precise IVF calculation for a given processor instruction set. The IVF of a specific instruction considers the vulnerability of the pipeline stages while executing the instruction. Finally, a simple equation has been derived for AVF estimation based on the running instructions. Our experimental results, which are extracted by our Configurable Reliability Analysis Framework (CRAF), confirm the accuracy of the presented AVF estimation method. Moreover, IVF can be employed by reliability-aware compilation or online AVF estimation techniques. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: FPGA stereo matching unit based on fuzzy logic
    摘要:Stereo matching is one of the most used algorithms in real-time image processing applications such as positioning systems for mobile robots, three-dimensional building mapping and recognition, detection and three-dimensional reconstruction of objects. In order to improve the performance, stereo matching algorithms often have been implemented in dedicated hardware such as FPGA or GPU devices. In this paper an FPGA stereo matching unit based on fuzzy logic is described. The proposed algorithm consists of three stages. First, three similarity parameters inherent to each pixel contained in the input stereo pair are computed. Then, the similarity parameters are sent to a fuzzy inference system which determines a fuzzy-similarity value. Finally, the disparity value is defined as the index which maximizes the fuzzy similarity values (zero up to dmax). Dense disparity maps are computed at a rate of 76 frames per second for input stereo pairs of 1280 x 1024 pixel resolution and a maximum expected disparity equal to 15. The developed FPGA architecture provides reduction of the hardware resource demand compared to other FPGA-based stereo matching algorithms: near to 72.35% for logic units and near to 32.24% for bits of memory. In addition, the developed FPGA architecture increases the processing speed: near to 34.90% pixels per second and outperforms the accuracy of most of real-time stereo matching algorithms in the state of the art. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: Analysis of network-on-chip topologies for cost-efficient chip multiprocessors
    摘要:As chip multiprocessors accommodate a growing number of cores, they demand interconnection networks that simultaneously provide low latency, high bandwidth, and low power. Our goal is to provide a comprehensive study of the interactions between the interconnection network and the memory hierarchy to enable a better co-design of both components. We explore the implications of the interconnect choice on overall performance by comparing the behaviour of three topologies (mesh, torus, and ring) and their concentrated versions. Simply choosing the concentrated mesh over the ring improves performance by over 40% in a 64-core chip.
  • Machine-translated title: Reconfigurable multicast routing for networks-on-chip
    摘要:Several unicast and multicast routing protocols have been presented for MPSoCs. Multicast protocols in NoCs are used for cache coherency in distributed shared memory systems, replication, barrier synchronization, or clock synchronization. Unicast routing algorithms are not suitable for multicast, as they increase traffic, congestion and deadlock probability. Famous multicast schemes such as tree-based and path-based schemes have been proposed originally for multicomputers and recently adapted to NoCs. In this paper, we propose a switch tree-based multicast scheme, called STBA. This method supports tree construction with a minimum number of routers. Our evaluation results reveal that, for both synthetic and real traffic loads, the proposed scheme outperforms the baseline tree-based routing scheme in a conventional mesh by up to 41% and reduces power consumption by up to 29%. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: Processors for IoT applications: evaluating the design space and trade-offs
    摘要:Contemporary embedded systems require low-power solutions while still keeping a minimum performance level, and this is even more acute in the Internet of Things (IoT) domain, with its vast design space. This work proposes a configurable RISC processor associated to a design flow that includes a hardware synthesis flow and a software toolchain. This design flow is useful to explore design space and trade-offs of processor cores for IoT applications, by enabling multiple hardware configurations with variable degrees of complexity, while maintaining compatibility with the chosen instruction set architecture, which is itself configurable. Results rely on example designs targeting a 65 nm technology and post-mapped hardware simulations of two benchmarks sets, the CoreMark and Malardalen suites. These results indicate that substantial power savings can be obtained by tailoring the architecture to a given application class, while reducing hardware complexity and maintaining performance figures. Findings show that the proposed processor provides an interesting resource to target low-end and middle-sized IoT applications, while demonstrating that reducing hardware complexity usually leads to the best trade-off between performance and power. (C) 2016 Elsevier B.V. All rights reserved.
  • Machine-translated title: Architecture and data migration methodology for L1 cache design with hybrid SRAM and volatile STT-RAM configuration
    摘要:Spin-Transfer Torque RAM (STT-RAM) offers high circuit density and negligible leakage power. However, it suffers from long write latency and high write power consumption. It is therefore difficult to replace the entire SRAM with STT-RAM in the L1 cache, but we can relax the retention time of the STT-RAM cell to improve its write performance and replace some of the SRAM capacity to reduce leakage power. In this paper, we propose a locality-aware approach for L1 cache design with a hybrid SRAM and volatile STT-RAM configuration. Based on the principle of cache locality, a data block is first mapped to SRAM to reduce write latency and write energy, and is then moved to volatile STT-RAM to reduce leakage power consumption. After a time period in which a data block in the volatile STT-RAM is not accessed, we stop its refresh operations to further reduce power consumption. Experimental results show that, in comparison with the SRAM-only L1 cache configuration, our hybrid cache configuration and data migration methodology reduce energy consumption by about 15-20%, with a latency overhead of only about 5%. Also, compared with the STT-RAM-only L1 cache configuration, we reduce memory access latency by close to 20% with similar or even better energy consumption. (C) 2015 Elsevier B.V. All rights reserved.
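The migration policy described in the hybrid SRAM / volatile STT-RAM abstract above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' design: the abstract does not say when a block moves from SRAM to STT-RAM (LRU eviction from SRAM is assumed here), and the class name `HybridL1Sketch`, the capacities, and the `idle_limit` threshold are all hypothetical.

```python
from collections import OrderedDict


class HybridL1Sketch:
    """Toy model of a hybrid SRAM / volatile STT-RAM L1 cache (illustrative only)."""

    def __init__(self, sram_blocks=2, stt_blocks=4, idle_limit=3):
        self.sram = OrderedDict()      # block address -> None, kept in LRU order
        self.stt = {}                  # block address -> tick of last access
        self.sram_blocks = sram_blocks
        self.stt_blocks = stt_blocks
        self.idle_limit = idle_limit   # assumed threshold, not given in the abstract
        self.tick = 0

    def access(self, addr):
        """Access one block; return where it was found: 'sram', 'stt' or 'miss'."""
        self.tick += 1
        self._stop_refresh_of_idle_blocks()
        if addr in self.sram:
            self.sram.move_to_end(addr)   # mark as most recently used
            return "sram"
        if addr in self.stt:
            self.stt[addr] = self.tick    # block stays in STT-RAM; idle timer restarts
            return "stt"
        self._fill_sram(addr)             # new blocks are always mapped to SRAM first
        return "miss"

    def _fill_sram(self, addr):
        if len(self.sram) >= self.sram_blocks:
            victim, _ = self.sram.popitem(last=False)   # least recently used block
            self._migrate_to_stt(victim)                # assumption: migrate on eviction
        self.sram[addr] = None

    def _migrate_to_stt(self, addr):
        if len(self.stt) >= self.stt_blocks:
            oldest = min(self.stt, key=self.stt.get)    # drop the longest-idle block
            del self.stt[oldest]
        self.stt[addr] = self.tick

    def _stop_refresh_of_idle_blocks(self):
        # A volatile STT-RAM block that is no longer refreshed loses its data,
        # so the model simply drops blocks idle for more than idle_limit ticks.
        idle = [a for a, t in self.stt.items() if self.tick - t > self.idle_limit]
        for addr in idle:
            del self.stt[addr]


cache = HybridL1Sketch()
trace = ["A", "B", "C", "A", "B", "C", "B", "C", "A"]
results = [cache.access(block) for block in trace]
print(results)
```

In this trace, block A is first evicted from SRAM into STT-RAM, hit there once, and then dropped after it stays idle past the threshold, so the final access to A misses again. A real design would also model write-back of dirty blocks and the energy cost of each refresh, which this sketch leaves out.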