Graduands wanted!
FPGA Fault Tolerance in Particle Physics Experiments
The increasing use of SRAM-based reconfigurable architectures in important areas of research and development, such as particle accelerators and space applications, introduces new effects that have so far been only partially addressed. A well-known but nevertheless important problem of such systems is their susceptibility to radiation, which increases with particle flux and energy. Since these errors cannot be prevented by extensive shielding, our research focuses on intelligent algorithms combined with spatial and temporal redundancy to eliminate every effect that may cause a Single Event Functional Interrupt (SEFI), which leads to miscalculations and system halts.
We are looking for talented, dedicated students (BSc/MSc/Diploma/"Projektpraktikum") to work on the following research projects on fault-tolerant FPGA design:
- Development of fault-tolerant IP cores (MSc/Diploma)
To minimize chip failures in radiation-vulnerable setups, the entire static FPGA configuration is refreshed continuously during system runtime (scrubbing). By design, this method cannot recover dynamic data. All design logic blocks (IP cores) therefore have to include additional recovery methods, fallback states and backup paths. Moreover, they have to prevent Single Event Functional Interrupts (SEFIs) from halting the entire system. Reaching this level of fault tolerance may involve:
- dual modular redundancy (DMR) in the design of functional units
- triple modular redundancy (TMR) in the design for irrecoverable dynamic data
- parity/CRC error detection/correction in data paths and buses
- fault-tolerant state machines that detect invalid state transitions (Hamming-based state encoding in which neighbouring states have a fixed/minimal Hamming distance)
The project aims at maximum fault tolerance with minimum design size. It may build on and enhance suitable open-source IP cores.
Challenge: VHDL implementation of fault-tolerance techniques for a specific part of an FPGA system (DDR controller, Ethernet, ...).
Special Requirements: VHDL basics
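The Hamming-based state encoding mentioned in the list above can be modelled in software before it is written in VHDL. The following Java sketch is only an illustration: the state names, the 4-bit code words and the class name HammingFsmSketch are assumptions and are not taken from the project. All code words have an even number of set bits, so any two valid states differ in at least two bits and every single bit flip produces a word that is not a valid state.

```java
public class HammingFsmSketch {
    // Hypothetical 4-bit state codes (assumption, not taken from the project).
    // All code words have an even number of set bits, so any two differ in >= 2 bits.
    public static final int IDLE  = 0b0000;
    public static final int READ  = 0b0011;
    public static final int WRITE = 0b0101;
    public static final int DONE  = 0b0110;

    private static final int[] VALID = {IDLE, READ, WRITE, DONE};

    /** Number of bit positions in which a and b differ. */
    public static int hammingDistance(int a, int b) {
        return Integer.bitCount(a ^ b);
    }

    /** Smallest Hamming distance between any two distinct valid code words. */
    public static int minDistance() {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < VALID.length; i++) {
            for (int j = i + 1; j < VALID.length; j++) {
                min = Math.min(min, hammingDistance(VALID[i], VALID[j]));
            }
        }
        return min;
    }

    /** True if the state register holds one of the defined code words. */
    public static boolean isValid(int code) {
        for (int v : VALID) {
            if (v == code) {
                return true;
            }
        }
        return false;
    }

    /** Keeps a valid state; falls back to IDLE when the register content is invalid. */
    public static int recover(int code) {
        return isValid(code) ? code : IDLE;
    }

    public static void main(String[] args) {
        System.out.println(minDistance());            // 2
        int upset = READ ^ 0b0100;                    // simulate a single bit flip in the state register
        System.out.println(isValid(upset));           // false: the upset is detected
        System.out.println(recover(upset) == IDLE);   // true: fallback state is entered
    }
}
```

A minimum distance of 2 only detects single bit flips. Correcting them without a fallback state would need a minimum distance of at least 3, at the cost of more state bits.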
- Verification of a CPU IP core - MIPS R3000 (BSc)
We have created a fault-tolerant CPU in VHDL. Its functionality now has to be verified in all stages of operation. This means, for example, that all possible opcodes have to work in combination with each other and with every memory address (within the MIPS constraints). This requires establishing a dedicated test workflow.
Special Requirements: interest in processor design and VHDL is an advantage
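One possible building block of such a test workflow is a differential test that runs opcode combinations on the CPU and on a reference model and compares the results. The Java sketch below is an assumption-laden illustration, not the group's workflow: the class OpcodePairTestSketch, the Cpu interface and the choice of five register-register ALU instructions are hypothetical, and in a real setup the Cpu implementation would be an adapter to the VHDL simulation.

```java
import java.util.function.IntBinaryOperator;

public class OpcodePairTestSketch {
    /** Small illustrative subset of MIPS R3000 ALU instructions with a reference model each. */
    public enum Op {
        ADDU((x, y) -> x + y),   // wraps around on overflow, like Java int addition
        SUBU((x, y) -> x - y),
        AND((x, y) -> x & y),
        OR((x, y) -> x | y),
        XOR((x, y) -> x ^ y);

        public final IntBinaryOperator reference;

        Op(IntBinaryOperator reference) {
            this.reference = reference;
        }
    }

    /** Hypothetical stand-in for the CPU under test. */
    public interface Cpu {
        int execute(Op op, int rs, int rt);
    }

    /**
     * Runs every ordered opcode pair on every operand pair, feeding the result
     * of the first instruction into the second, and counts the mismatches
     * between the CPU under test and the reference model.
     */
    public static int runAllPairs(Cpu dut, int[] operands) {
        int mismatches = 0;
        for (Op first : Op.values()) {
            for (Op second : Op.values()) {
                for (int x : operands) {
                    for (int y : operands) {
                        int expected = second.reference.applyAsInt(first.reference.applyAsInt(x, y), y);
                        int actual = dut.execute(second, dut.execute(first, x, y), y);
                        if (expected != actual) {
                            mismatches++;
                        }
                    }
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        int[] operands = {0, 1, -1, Integer.MAX_VALUE, Integer.MIN_VALUE};
        Cpu good = (op, rs, rt) -> op.reference.applyAsInt(rs, rt);
        // Deliberately broken CPU model: XOR behaves like OR.
        Cpu faulty = (op, rs, rt) -> op == Op.XOR ? (rs | rt) : op.reference.applyAsInt(rs, rt);
        System.out.println(runAllPairs(good, operands));        // 0
        System.out.println(runAllPairs(faulty, operands) > 0);  // true
    }
}
```

The same loop structure extends to memory instructions by iterating over addresses as well, although exhaustive coverage quickly becomes too large and usually has to be replaced by boundary values and random sampling.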
- Design study on fault tolerance at application level (four-week "Projektpraktikum")
The top level of current system design is software development in high-level (possibly object-oriented) languages such as C, C++, Java and Perl. All of them assume a correctly working hardware platform on the underlying layers, in particular a fully functional processor for calculations and memory transfers. Current research has shown, especially for radiation-critical applications, that hardware Single Event Upsets (SEUs) may cause data corruption and miscalculations. Each of these high-level languages offers simple and advanced techniques and design guidelines for protecting methods and variables against data loss, data corruption and invalid data.
Possible fault-tolerance features in high-level languages include:
- add extra check routines, data paths and so on
- overload method signatures, e.g. extend increase(int i) to increase(int i1, int i2) with i1==i2
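- add mandatory assertions:
    public int getLength(String s) {
        // Reject a missing or corrupted reference early.
        // Note: Java evaluates assert statements only when the JVM runs with -ea.
        assert s != null;
        return s.length();
    }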
- create wrappers for each data type, e.g. replace int by r_int with built-in triple modular redundancy
- add runtime checkpoints for data validity
Challenge: find multiple ways to secure input and output data in different high-level languages.
Special Requirements: Interest in Programming Languages / Software Design
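Two of the features listed above, the duplicated-argument overload and the redundant wrapper type, can be sketched in Java as follows. This is a minimal illustration under stated assumptions: the class names RedundancySketch and RInt (modelled on the r_int idea) and the injectBitFlip test hook are hypothetical, and the bitwise majority vote is one common way to implement triple modular redundancy, not necessarily the one intended in the project.

```java
public class RedundancySketch {

    /** Hypothetical TMR wrapper for int: three copies and a bitwise majority vote. */
    public static final class RInt {
        private int a;
        private int b;
        private int c;

        public RInt(int value) {
            set(value);
        }

        public void set(int value) {
            a = value;
            b = value;
            c = value;
        }

        /** Returns the majority value and rewrites all three copies with it. */
        public int get() {
            int voted = (a & b) | (a & c) | (b & c);
            a = voted;
            b = voted;
            c = voted;
            return voted;
        }

        /** Test hook only: simulates an upset by flipping one bit in one copy (0, 1 or 2). */
        public void injectBitFlip(int copy, int bit) {
            int mask = 1 << bit;
            switch (copy) {
                case 0: a ^= mask; break;
                case 1: b ^= mask; break;
                default: c ^= mask; break;
            }
        }
    }

    /** Overload with a duplicated argument: both copies must agree before the value is used. */
    public static int increase(int i1, int i2) {
        if (i1 != i2) {
            throw new IllegalStateException("argument copies differ: " + i1 + " vs " + i2);
        }
        return i1 + 1;
    }

    public static void main(String[] args) {
        RInt length = new RInt(42);
        length.injectBitFlip(1, 3);           // corrupt one of the three copies
        System.out.println(length.get());     // 42, the two intact copies outvote the corrupted one
        System.out.println(increase(7, 7));   // 8
    }
}
```

The vote masks a bit flip in one copy; flips at the same bit position in two copies are not corrected. An optimizing compiler or JIT may also merge redundant copies, so whether the redundancy survives in the running program has to be checked for each language and toolchain.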
If you are interested in our fields of research and want to get involved, please let us know!
ALICE HLT Cluster and Detector Management
For managing large computer clusters such as the HLT cluster of the ALICE experiment at CERN, automated tools are required to minimize the need for manual interaction. This involves monitoring, management and optimization of hardware, operating systems and applications. Currently there are two main fields of research for which we offer theses and internships:
- Automated Cluster Management for providing fault tolerance in large distributed environments
- Application of virtualization techniques for the optimization of resource usage in distributed environments
Both projects are part of the current HLT installation at CERN and we always have new and interesting tasks for students.
If you are interested in these projects please contact us ...
ALICE DAQ & HLT Common RORC
A common Read-Out Receiver Card (RORC) for DAQ and HLT will be built to replace the currently used PCI-X cards. This card will have a fast PCI-Express interface and several optical links. Currently a Xilinx Virtex-6 evaluation board is used as the prototyping platform. Possible topics for theses / lab courses are:
- Linux Kernel Driver Development / Scatter-Gather-DMA Engine (C / Kernel-Hacking)
- FPGA PCI-Express DMA-Engine / Implementation & Performance Testing (Verilog / VHDL / C)
- Optics with Virtex-6 / DetectorDataLink (DDL) (Verilog / VHDL)
If you are interested in these topics please contact:
Heiko Engel
hengel@iri.uni-frankfurt.de
The CBM Read-Out Controller
The CBM experiment requires the readout of multiple detector frontend electronics. This is usually done with dedicated FPGA-based read-out controller (ROC) boards. Each detector typically uses different frontend electronics and therefore requires a different ROC. However, the interface in the other direction of the readout chain, towards the computing nodes, is largely the same. This led to the idea of a modular ROC design that separates the FPGA firmware into a readout logic module and a transport logic module. Much functionality can be reused, and only the interface to the frontend electronics needs to be exchanged. In addition, this modular approach allows the firmware to be developed efficiently by more than one designer: the frontend module and the transport module can be developed separately by different developers.
At the moment we provide two transport solutions (Optics and Ethernet) and the readout of two frontend chips (the nXYTER and the GET4). Support for a third frontend chip (the SPADIC) and a third transport solution (USB) is planned.
The ROCs will later be operated in a radiation environment. Since SRAM-based FPGAs are susceptible to radiation, the firmware has to implement special radiation mitigation techniques.
Since there are three slightly different read-out controller boards available (each equipped with a different FPGA), we support the firmware for three different target architectures. A fourth target architecture will be supported soon. Today we therefore provide 12 different firmware variants (2 transport solutions × 2 frontend chips × 3 target architectures), and in the future up to 36 (3 × 3 × 4).
In this context we offer some lab courses ("Projektpraktika") and are also looking for bachelor and master students:
1) "The USB transport module"
- The upcoming SysCore v3 board comes without the necessary connections for an Ethernet transport module. To enable small lab setups, we want to provide a data transport solution via USB. We are looking for a student to implement the CBMnet emulation over a USB connection.
- You should have some experience with the Xilinx development tools (ISE/EDK/...) or be willing to learn how to use them. Experience with the Linux operating system is welcome.
2) "Radiation tolerant Optics protocol"
- The Optics transport module will be used in a radiation environment. In addition to the radiation mitigation techniques at the hardware design level, the protocol itself should be designed to cope with radiation-induced faults. We are looking for a student to work in this field.
- You should have some C/C++ skills and experience with the Xilinx development tools (ISE/EDK/...) or be willing to learn how to use them. Experience with the Linux operating system is welcome.
3) "Program the Actel FPGA with the DirectC library running on a MicroBlaze softcore CPU"
- The SysCore-based read-out controller boards come with a second, rather small FPGA (Actel). This FPGA is flash-based, so its configuration is non-volatile and, importantly, not susceptible to radiation. However, this FPGA still needs to be reconfigured once in a while (e.g. when a firmware update is available). The task is to implement a solution that programs this second FPGA from software running on a softcore CPU on the first FPGA.
- The student should have some experience with the C programming language, embedded systems, the Xilinx development tools (ISE/EDK/...) and the Actel design software (Libero/...) or be willing to gain the necessary expertise in these fields.
If you are interested in one of the topics, please contact:
Sebastian Manz
manz@iri.uni-frankfurt.de
Last modified on 23 March 2012. E-mail: wwwwww@rz.uni-frankfurt.de