François Piednoël: exclusive interview for AT
New interview, this time with François Piednoël, one of the former performance gurus of Intel Corp., where he worked for 20 years. He is the father of products like the Intel Pentium 4 Extreme Edition, the Core 2 (Conroe), and Skulltrail. He is currently working to make the automotive industry «more intelligent».
Of course, as one of the two principal performance engineers, he was also involved in other Intel microarchitectures and technologies during his years at the company, such as Katmai, Prescott, Penryn, Nehalem, Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, and Kaby Lake, as well as the Intel Atom line. He also supported the development of CPU-Z, Android x86, and the Hyper-Threading multithreading implementation…
Architecnología: I always start interviews with this question: Who is François Piednoël? (Describe yourself, please)
François Piednoël: I’m somebody who likes solving complicated problems; life is a gigantic Rubik’s Cube. I am the former performance lead for Intel, and I am working on designing autonomous driving systems.
AT: When and how did you become passionate about technology? Since you were a child? Do you have a role model? Someone who has inspired you?
FP: I discovered electricity and electronics at the age of 10. At 12, I had already built a 220V system that automatically turned on external lights using transistors. By the age of 13, I owned a computer and published my first “listing” on the ZX81. Lord Sinclair is definitely the biggest influence at the beginning. Later, I got an Oric Atmos, then an 80286 PC, which I immediately started to overclock. By the age of 18, I had a very fast PC for the time, and I was upclocking all the subsystems, from the CPU to the display adapters… Then I bought a few books on the 80386, and I became obsessed with protected mode. I wrote my own system on top of DOS and then, without DOS, to run the preemptive mechanism.
AT: How did you join Intel? And what do you feel most proud of?
FP: While I was flying back from Seattle to Paris, an Intel man recognized me and offered to connect me with a manager in Munich. I was not happy with the weather in Seattle, so I agreed to go to Munich for an interview. I got to meet some amazingly smart engineers at Intel Munich, and I decided to make the jump to Intel. Later on, I started working on 3DMark, which allowed me to move back to the USA permanently, but with the good weather of the Bay Area 😉
AT: What exactly was your goal in the P4 EE project?
FP: You are kidding, right? The Pentium 4 EE was born in pain, out of my categorical refusal to lose. I hate losing, so when we finally got the Intel fellows in Oregon to accept that we were not going to win, a very focused vice president asked us if we could find a quick way to beat the Athlon FX53/57. The hard part was getting the fellows to accept that something would have to be done. Then we figured out, by trying, that we could take a future Xeon and turn it into a consumer product; it only needed some serious changes to the cache and prediction policies. The quality assurance people agreed to make the effort to re-qualify the EE processor with the tuning of the desktop part. Then a few buddies and I spent a week reprogramming boards and CPUs manually to have enough for the press in time, while production would start after the quality tests were done (4 weeks back then). We were able to stop a huge win for AMD, and it triggered the investigation into walking away from Netbust. The Oregon team was very allergic to this idea, so Paul left them out of the decision process for the upcoming replacement of the Pentium 4. A few people showed up with a code name, Timna, but this is a story for another time…
AT: What is the internal process for developing a microarchitecture like?
FP: This is a 100-person operation; this is why the management is so important. Understanding the consequences of architecture changes through performance projection methodology is critical. It used to be a problem for Intel fellows and principal engineers, but with Dadi Perlmutter being able to both manage and do this, a consolidation of power happened. When Dadi left, they forgot to re-separate the roles. I can’t comment much on the process or the timing, as they are Intel trade secrets. Just know that if managed well, it works great, as we proved for 20 years.
AT: Can you tell us an interesting anecdote about your work at Intel?
FP: Skulltrail was a dare between a vice president, a marketing lead, and me. It paid for itself entirely and made a profit, while being an $8,000 computer when fully equipped.
AT: 7nm, 5nm, 3nm, 2nm… process scaling is approaching its limit. What will happen next? New materials and structures?
FP: In my 20 years at Intel, plenty of people predicted the end of Moore’s law, and every single time, the research guys found a way to push the limits. I can come up with 5 ways to integrate more myself if I want to (I have written them down). What matters is the level of integration. For the moment, we are planar; soon, we will be talking about transistors laid out in 3D, on top of other transistors and metal layers. This should give us many years to go, on top of more shrinking of the process itself. I don’t think they will get to the end before I return to dust. (I plan to live long and prosper ;)).
AT: Lately there are many «refresh» products. I guess it’s more and more complicated to extract performance from a single core by varying only the microarchitecture. Could microarchitectures also have a certain limit?
FP: The x86 instruction streams have their limits: there is only a certain amount of parallelism you can extract from an instruction stream. ARM is a little different, but it has its limits too. By using compiler tricks, you can give the processor more room for parallelism in the instruction stream, and so decrease the time required to run a task. Profile-guided optimization (PGO) is one of them. It is already showing up on iOS, and I expect Microsoft and LLVM to follow on PC. There are plenty of latencies you can reduce when you are observing each iteration of the code running, while the programmer is working. Many benchmarks today do not take into consideration the modern way of compiling on iOS, for example, and they compare apples with bananas, like a version with PGO against one without it, and then draw conclusions about architecture performance… This field has become so complicated that only very few press sites are not being manipulated by marketing departments…
Despite what François says, and as I explain here, AMD has indeed been using neural networks (based on the perceptron) to improve branch prediction. Although some feasible hardware implementations had been presented previously, the first commercial one was the branch predictor of AMD’s Piledriver microarchitecture. It enables the predictor to learn from the past behavior of a branch instruction and of other instructions. In this way, the hit rate grows, which means better performance when deciding whether or not a branch is taken, and it avoids having to flush the pipeline after a misprediction.
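To make the idea of a perceptron-based predictor more concrete, here is a minimal sketch in Python. It follows the general scheme described in the academic literature on perceptron branch predictors and is not AMD's actual implementation. The class name, table size, history length, and training threshold are all assumptions chosen for illustration.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (illustrative only, not AMD's design)."""

    def __init__(self, history_length=8, table_size=64):
        self.table_size = table_size
        # Training threshold: a heuristic taken from published research on
        # perceptron predictors; treated here as an assumption.
        self.threshold = int(1.93 * history_length + 14)
        # One weight vector per table entry: a bias plus one weight per history bit.
        self.weights = [[0] * (history_length + 1) for _ in range(table_size)]
        # Global history of recent outcomes: +1 = taken, -1 = not taken.
        self.history = [-1] * history_length

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict 'taken' when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Learn only on a misprediction or when confidence is still low.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, h in enumerate(self.history):
                w[i + 1] += t * h
        self.history = self.history[1:] + [t]


def simulate(outcomes, pc=0x40, predictor=None):
    """Feed a sequence of branch outcomes; return one hit/miss flag per branch."""
    predictor = predictor or PerceptronPredictor()
    hits = []
    for taken in outcomes:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits


if __name__ == "__main__":
    # A branch that alternates taken / not taken.
    pattern = [i % 2 == 0 for i in range(300)]
    hits = simulate(pattern)
    print("hit rate over the last 100 branches:", sum(hits[-100:]) / 100)
```

Each weight records how strongly one past outcome correlates with the current branch, which is the "learning from the path taken previously" mentioned above. A misprediction in real hardware would cost a pipeline flush; here it only shows up as a miss in the returned list.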
AT: How can AI help with this? I mean to improve the performance of the processors.
FP: AMD has tried to use a pseudo-AI for prediction. I think they have given up on it; TAGE is definitely still beating AI algorithms at prediction. Keep in mind that machine learning inference always suffers from outliers, and because of that, it very quickly comes down to the mistakes you make when predicting. There was a remarkably interesting paper at Hot Chips 2020 about using machine learning to lay out your floor plan. I am already working on something that goes even further than that, where clusters are not rectangular anymore… You can dramatically decrease latencies by using such techniques.
AT: Why are so many vulnerabilities being found in CPUs? Is it simply an obsession with performance, or too few security resources? Or maybe there are now more eyes on it (after Meltdown and Spectre)?
FP: Spectre and Meltdown are new science. Now that you know the side effects of speculation, it is pretty easy to work around them. Just understand that the protection mechanisms of the 80386 have been the basis of the system since 1985. They have not been touched much, as they are pretty strong at running code reliably. I am convinced that only a fortification of the LDT and GDT protection management will lead to a hack-free platform. Without redoing the mechanism with full encryption of memory, hackers will always find ways to attack the software or the hardware. Making each process opaque to other processes with encryption is the only long-term way to solve this whole mess.
AT: I have seen interesting retro chip reverse engineering projects. Only with a die shot you can «decipher» the circuit, but… What can you get by analyzing a current die shot? I mean, you can identify where some parts are located, core-count,… Could you extract something more interesting than that?
FP: That activity is fairly dangerous to the national security of Western countries; I will refrain from assisting or commenting.
AT: And finally, do you think RISC-V will be «the next ARM»? Is the future of x86 threatened?
FP: RISC-V is remarkably interesting because of its license model. Performance-wise, they are pretty far behind, but they will play an important role: as NVIDIA is acquiring ARM, some vendors are going to start seriously funding RISC-V. As for the long term: for the moment, you know CPUs and GPUs, but many more categories will emerge to solve problems that are more than just embarrassingly parallel or single-thread limited. Inference and scoring are some examples, but computing the real world, with its inherent unpredictability, will create a new kind of need for processing. I believe the open nature of RISC-V positions it to be that platform.