jerf 8 years ago

This being the internet, let me preface this by saying these are honest questions, rather than attacks. I did try to read through the docs before asking, but I don't see the answers directly.

Especially on the FPGA side, how does this interact with all the features of Go that seem ill-suited to an FPGA implementation? Can I write functions that generate and consume closures? Where is my garbage going on the FPGA side and how is it collected? Or is the FPGA code being written only in a subset of Go?

I understand the idea of wrapping the primitives offered by the FPGA hardware itself into channels, but I'm unclear on how one can sensibly implement a Go runtime on top of that in the FPGA without making it too difficult to understand the cost model of your Go code.

  • marmaduke 8 years ago

    might be like OpenCL: a subset of C99 with some extensions already works for parallel hardware, including FPGAs, so why not a subset of Go targeting FPGAs?

    • jerf 8 years ago

      "A subset of Go" is certainly one sensible option, but I don't see it documented. The obvious subset that one could get with the least effort is a rather brutally cut down subset of Go, to the point that I'd consider claiming it to be "Go running on an FPGA" to be nearly a lie.

      On the other extreme, you've got a full FPGA on hand, so nothing stops them from bringing up a simple ARM CPU die and putting a bit of general-purpose RAM there, so it's theoretically possible that you could write general-purpose FPGA code and have that FPGA-CPU gloss over any runtime issues. But that's where I get my question about how a programmer would be expected to model the costs incurred by given constructs sensibly. (Presumably, if performance is not a big deal to the programmer, they're not using FPGAs at all, so I'm going in assuming performance is at least in the top 3 concerns of anyone who might use this, if not #1.)

      (Also I'm not 100% sure about the intersection of what ARM cores might be available vs. what Go runs on; IIRC Go does ARM but only very high-end ones. So, take the principle of my point rather than the literal text. With enough work they can do "anything" with the FPGA code.)

  • zhemao 8 years ago

    Under "Familiar Tooling", the webpage indicates that you will be able to use a "streamlined subset" of Go on their platform.

  • bogomipz 8 years ago

    >"Can I write functions that generate and consume closures?"

    Could you elaborate on why closures are ill-suited to FPGAs? Thanks.

    • pubby 8 years ago

      They require garbage collection.

      • jdblair 8 years ago

        If by closures you mean "anonymous functions that capture the state of in-scope values," then C++11 shows that a GC is not a requirement.

        • wahern 8 years ago

          C++ only supports capturing a variable with a lifetime at least as long as the closure. In other words, you can't _return_ a closure from a function that captures an automatic variable (i.e. "stack-allocated" variable), or pass the closure to another thread of control such that it outlives the original function invocation. AFAIK Go will automatically heap-allocate any variable that is captured. (Theoretically, it could attempt to prove it won't outlive the enclosing function(s).) Automatic heap allocation requires automated GC.
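          A minimal Go sketch of that behavior (illustrative code, not from any FPGA toolchain's docs): the returned closure captures a stack local, so escape analysis moves it to the heap, and only the GC ever reclaims it.

```go
package main

import "fmt"

// counter returns a closure that captures its local variable n.
// Because the closure outlives the call, Go's escape analysis must
// move n to the heap -- the automatic allocation described above,
// which normally relies on the GC to clean up.
func counter() func() int {
	n := 0 // would be stack-allocated if it didn't escape
	return func() int {
		n++
		return n
	}
}

func main() {
	next := counter()
	fmt.Println(next(), next(), next()) // 1 2 3
}
```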

          C++ has lots of syntactic sugar, but at the end of the day, like C and Rust it's a fundamentally pass-by-value language.

          Here's a good paper which describes some of the complexities and design decisions for a full and complete closure implementation that doesn't require decoration and compiler hinting:

            http://www.cs.tufts.edu/~nr/cs257/archive/roberto-ierusalimschy/closures-draft.pdf
rubenfiszel 8 years ago

Congrats!

We, a Stanford lab, are pursuing similar goals, but open source and from a Scala DSL, although our doc (http://spatial-lang.readthedocs.io/en/latest/tutorial/starti...) is not that up-to-date:

https://github.com/stanford-ppl/spatial-lang

  • eeks 8 years ago

    I'll RTFM later, but in the meantime: how do you compare to Chisel https://chisel.eecs.berkeley.edu ?

    • aseipp 8 years ago

      Spatial is built on Chisel, and from what I can tell, basically offers you an "SDSoC" experience for Scala.

      In Chisel, you write Scala that is compiled to Verilog(?) and you put it into your synthesis toolchain. But Chisel is mostly just that: it's an HDL, and not much more. And if you want to talk to an FPGA, especially from software, you still have to write another pile of glue that does interfacing to your device, over your peripherals, etc.

      Spatial gives you more on top of Chisel: you simply write a single program and say "accelerate this bit", and it generates both the hardware and software and glues it all together. You write the program once and the compiler generates all the glue for you, so the usage is more seamless.

      This is what "SDSoC" from Xilinx does, but for C/C++. You simply write a C program and annotate functions as "Accelerate" and it compiles both the hardware and software for you and generates the interconnections. Spatial is like that, for Scala.

      • rubenfiszel 8 years ago

        Our project is Spatial, not Spinal though :)

        • aseipp 8 years ago

          Sorry about that, fixed! I get confused, I use Clash/Haskell more than Scala, so this is all second hand :) (IIRC, Spinal is a Chisel fork which is probably where it came from...)

      • zhemao 8 years ago

        > In Chisel, you write Scala that is compiled to Verilog(?)

        Berkeley grad student here. The current Chisel compiler generates an intermediate representation that can be compiled to any target. Our Verilog backend is definitely the most developed, but we also have an interpreter that can directly simulate the IR.

        • aseipp 8 years ago

          Thanks! Is this a reference to FIRRTL that I've seen some light references to (e.g. Yosys's experimental FIRRTL backend)? I was under the impression FIR was somewhat new and wasn't sure if it was used in Chisel (yet).

          (To be clear, I figured you compiled to a generic IR before lowering onto some chosen HDL, regardless if it's FIR or not, but wasn't sure if Chisel had multiple HDL backends strictly speaking).

          • zhemao 8 years ago

            Yes, I'm talking about FIRRTL, which is used in the latest version of Chisel (Chisel 3).

            There aren't official releases of this version yet, but you can get snapshots (if you don't mind some API breakage now and then). It works fairly well now and we use it in all our RTL designs.

    • rubenfiszel 8 years ago

      We are not only concerned with FPGAs but also CGRAs (we are developing our own: Plasticine). For FPGAs, we target Chisel. Chisel is Verilog without the boilerplate. We aim to be more high-level by also generating all the control flow.

      • eeks 8 years ago

        Intuitively it seems easier to build strong inference heuristics for CGRA than FPGA given the more constrained (limited?) degrees of design freedom.

        I've always been wary of inference-based systems that target clocked logic. Most of them (especially the C-based ones) are not suited to produce efficient designs and usually require some heavy massaging of the original code to make the inference system happy.

      • petra 8 years ago

        Like with any academic work, the biggest challenge is a route to market. Do you plan on one (for your hardware tech)?

  • aseipp 8 years ago

    Shameless plug for a friend I know, but there's also Connectal, another open source project aiming to do the same thing, but for BlueSpec Verilog: https://github.com/cambridgehackers/connectal

    The real shame of course is that while Connectal is open source, BSV itself is not...

  • dnautics 8 years ago

    I've been playing around implementing this, but am stuck on sequential logic (I think it will have to use closures and function generators of some sort)...

    https://github.com/interplanetary-robot/Verilog.jl

    Wrote it in three days; although it's very young, some of its strengths are emission of human-readable Verilog and the ability to build the Verilog into C (using Verilator) and do continuous testing without ever leaving Julia.

mmastrac 8 years ago

Interesting. This is a neat approach to building a register-transfer language using some high-level bindings. Go (at a very high level) seems to have a pretty good match with the async nature of transistor-level bit-flow operations when you use channels. In theory you could map this to any language with strongly-typed async operations.
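As a rough sketch of that analogy (illustrative code only, not any vendor's actual API), each goroutine below acts like a pipeline stage and the channels act like the registered wires between stages:

```go
package main

import "fmt"

// double and addOne are "stages": each reads from an input channel,
// transforms the value, and writes to an output channel, much like
// registered logic between pipeline stages in RTL.
func double(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * 2
		}
	}()
	return out
}

func addOne(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v + 1
		}
	}()
	return out
}

func main() {
	src := make(chan int)
	go func() {
		defer close(src)
		for i := 1; i <= 3; i++ {
			src <- i
		}
	}()
	// Values "clock" through both stages in order: 3, 5, 7.
	for v := range addOne(double(src)) {
		fmt.Println(v)
	}
}
```

In principle each stage could become a block of combinational logic with a register on its output, which is presumably the kind of mapping being attempted here.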

You get the advantages of Go's type-checker, though you're probably limited to a very small subset of the language. Note that you won't be able to use a lot of third-party packages unless their translation is really good or the package code is very simple.

Docs seem to be available here (thanks to another comment in this thread): http://docs.reconfigure.io/welcome.html

I think the approach of mapping a high-level language to hardware is not necessarily novel, but using Go for it is.

I used a different approach to build a bit-flow graph in Java for a project in the past. Rather than map the whole language to the circuit, I created some APIs that would generate the graph and export it. It looks fairly similar to what you see here.

dhbx9 8 years ago

The website in itself does not give me an understanding of whatever the hell they're doing. I'm a computer engineering undergrad and I've done FPGAs before. I don't see what "code in Go and deploy FPGAs to the cloud" means. I think that putting some code and other actual use cases on the website would be nice.

Looking at some of the examples it seems to me that you'd still need to know hardware programming, memory etc. Now my comment seems very snarky, but I still think that it's a huge achievement to have gotten this far with this and I wish them luck! I just don't get the target user base.

zackmorris 8 years ago

This is great, I've waited 20 years for this (computer engineering degree, 1999). For all the naysayers: what has gone wrong with computing, why Moore's law no longer works, etc., is that we've gone from general purpose computing to proprietary narrow-use computing, thanks to Nvidia and others. VHDL and Verilog are basically assembly language and are not good paradigms for multicore programming.

The best languages to take advantage of chips that aren't compute-limited* are things like Erlang, Elixir, Go, MATLAB, R, Julia, Haskell, Scala, Clojure.. I could go on. Most of those are the assembly languages of functional programming and are also not really usable by humans for multicore programming.

I personally vote no confidence on any of this taking off until we have a Javascript-like language for concurrent programming. Go is the closest thing to that now, although Elixir or Clojure are better suited for maximum scalability because they are pure functional languages. I would give MATLAB a close second because it makes dealing with embarrassingly parallel problems embarrassingly easy. Most of the top rated articles on HN lately for AI are embarrassingly parallel or embarrassingly easy when you aren't compute-limited. We just aren't used to thinking in those terms.

* For now, let's call compute-limited any chip that can't give you 1000 cores per $100.

  • metaphor 8 years ago

    > VHDL and Verilog are basically assembly language and are not good paradigms for multicore programming.

    Err...I respectfully disagree. They're HDLs and more akin to hardware design than any traditional software abstraction, assembly included.

    • Taniwha 8 years ago

      It sort of depends on coding style - if you're hand-instantiating gates and wiring them in Verilog then yes, it is a lot like assembly ... on the other hand, if you're coding in a high-level way and having your design compiled to gates then no, it's not at all like assembly

      It also may depend on where you learned your craft - my decade as a logic designer seemed to show that people who started life as an EE and went straight into logic design coded at a lower level than people who started as programmers (who tended to be more productive as a result) ....

      • Scipio_Afri 8 years ago

        Does your comment imply that people who started as programmers tended to be more productive at FPGA design?

        • Taniwha 8 years ago

          Well (from a VERY VERY small sample) I think yes (I was referring more generally to people doing logic design in Verilog, not just FPGAs) - in my experience they tended to write at a higher level and knock out gates faster .... that left them with more time to optimise their designs

          But this is old data - the world changes

          • jnordwick 8 years ago

            In my experience, although limited, most software developers write terrible HDL code. They think too serially, step-wise, and aren't very good at parallelizing their code on the level necessary for good hardware design.

            The exceptions to this are those with good hardware sympathy. They can then carry over that detail-oriented thought process.

            • Taniwha 8 years ago

              Maybe, but I'm not talking about beginners here, I'm talking about people who already have years of experience

  • akira2501 8 years ago

    > The best languages to take advantage of chips that aren't compute-limited*

    FPGAs may or may not deserve that distinction, depending on your point of view... but even if you concede that, they're still heavily bandwidth limited. An SDRAM interface takes up quite a bit of floor space, especially if you want more than one FIFO to move data with -- and even then, you're still standing behind a relatively slow memory interface. There's SRAM on newer chips, but it's still too paltry an amount to really do anything close to general purpose computing on... especially with the languages you've mentioned.

    > why Moore's law no longer works

    Moore's law no longer works because we hit 4GHz in silicon; there's nowhere to go but sideways now, and that's true whether you're in dedicated or reconfigurable chips.

  • stephenmm 8 years ago

    Actually there is already a Scala version that is well defined. From the website https://chisel.eecs.berkeley.edu/ 'Chisel is an open-source hardware construction language developed at UC Berkeley that supports advanced hardware design using highly parameterized generators and layered domain-specific hardware languages.'

    Not sure why they wouldn't use it instead.

    • danellis 8 years ago

      It seems to be a particular phenomenon of the Go community that they want to reimplement everything in Go.

      • pawadu 8 years ago

        You could argue the same thing about Scala too...

  • zensavona 8 years ago

    I would argue that Elixir is far closer to being a "JavaScript-like language for concurrent programming" than Go, due to its dynamic typing and the relative freedom it affords in comparison to the others you mentioned (except for Clojure, which is actually quite similar in a lot of ways).

    Although it is technically a purely functional language, you can almost mutate variables (in reality it is creating a new immutable binding with the same name, which takes precedence):

      a = 1
      # a == 1
      a = 2
      # a == 2
    

    Concurrency feels very natural:

      # concurrent
      numbers = [1,2,3,4,5]
      doubles =
        numbers
        |> Enum.map(fn(n) -> Task.async(fn -> n * 2 end) end)
        |> Enum.map(&Task.await/1)
      # doubles == [2,4,6,8,10]
    
      # consecutive
      numbers = [1,2,3,4,5]
      doubles = numbers |> Enum.map(fn(n) -> n * 2 end)
      # doubles == [2,4,6,8,10]
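
    For comparison, a rough Go equivalent of the concurrent version (the function name is mine, purely illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// parallelDouble mirrors the Elixir Task.async/Task.await pipeline:
// one goroutine per element, results collected via a WaitGroup.
func parallelDouble(numbers []int) []int {
	doubles := make([]int, len(numbers))
	var wg sync.WaitGroup
	for i, n := range numbers {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			doubles[i] = n * 2
		}(i, n)
	}
	wg.Wait()
	return doubles
}

func main() {
	fmt.Println(parallelDouble([]int{1, 2, 3, 4, 5})) // [2 4 6 8 10]
}
```
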
    • runeks 8 years ago

      > Although it is technically a purely functional language [..]

      This (purity) stirred my interest, but as far as I can see it's incorrect. This[1] Wikipedia page on pure languages does not list Elixir, and the Elixir Wikipedia page itself does not mention purity at all.

      Can anyone clarify?

      [1] https://en.wikipedia.org/wiki/List_of_programming_languages_...

kutkloon7 8 years ago

Let me be a bit pessimistic and ignorant here.

People who want to use an FPGA should learn VHDL or Verilog. There have been a lot of projects to make C compile to VHDL/Verilog, and it's generally accepted that they do not work very well.

What is the advantage of using Go for the same purpose?

  • gluggymug 8 years ago

    As a long time HW FPGA guy, I think you might want to take a look at the C projects again. I don't know whether Go has any advantages but the concept of a higher level language for development is being used by the major FPGA companies.

    Both Xilinx and Altera have High Level Synthesis (HLS) tools. These use C or C++. If you know how FPGA work is generally done, you can separate the hype from the reality and you can understand how to use it for a real application.

    The vendors have lots of libraries for IP. You don't write RTL from scratch. It would take too long to verify. You tie IP together. It can be DSP or generic maths or a video codec thing. The VHDL is done for you.

    You write your algorithm in C++ in a particular format using compatible data types and calling HLS libraries. You run it all in C++ first and make sure it does exactly what you want in SW. This is where the algorithm is developed.

    THEN you fire up the HLS tool and a couple of hours of synthesizing later (lol) you get to load a bitstream onto a FPGA to verify it.

    Of course there can be problems in that translation. It takes good engineering to dive down into the design and find the issues.

    My current work does not touch any HLS. I am doing the VHDL stuff. But I know the algorithms all started from SW first. It always does. For the bulk of the work, verification, it is somewhat irrelevant whether it is manually converted to RTL or done via tools.

    • rthomas6 8 years ago

      Another HW FPGA guy here. Albeit one who has never used HLS. My concern with the whole idea of HLS is that it fails to take advantage of the parallelization capability of FPGAs, which in my opinion is one of the main reasons to use an FPGA in the first place. It sounds great for designs that are linear in nature, that is, putting data through a bunch of sequential processing blocks and then outputting some result. But for most of those cases, why not just use a processor + DSP SoC? Or even something like a Zynq? It will probably be faster.

      Seeing how FPGAs do not operate in a linear way the way that software does on a processor, why are we trying to make them work that way? It would make more sense to me to design a high-level synthesis language with a paradigm that is also not imperative: functional programming. Like, for example, how would this kind of C code even be synthesized in hardware?:

        A = 5;
        B_out = A + 3;
        A = 6;
        C_out = A;
      

      "A" is used as two different things, which is totally fine when the code is run sequentially, which must be what happens when code like this is synthesized. But that's wasteful on an FPGA, because B_out and C_out don't actually have any dependence on each other and could be computed concurrently, which is what would happen if we used VHDL to do something similar. We need a high-level synthesis language that describes a system which solves the algorithm we want, the same way VHDL does, except with more abstraction capabilities. In my opinion this could be a functional language.
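      One way to make that independence explicit is a single-assignment rewrite; here's a quick sketch (in Go only for familiarity -- the renaming is the point, not the language):

```go
package main

import "fmt"

// Single-assignment rewrite of the C fragment above: giving each
// value of A its own name makes it obvious that bOut and cOut share
// no data dependence, so hardware could compute them concurrently.
func main() {
	a0 := 5
	bOut := a0 + 3 // depends only on a0
	a1 := 6
	cOut := a1 // depends only on a1
	fmt.Println(bOut, cOut) // 8 6
}
```

This is essentially what SSA-based compilers do internally, which is one reason HLS tools can recover some parallelism from sequential-looking code.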

      • gluggymug 8 years ago

        I agree about the parallelism but you have to understand the design methodology.

        Your example is somewhat pointless. The code is written to create the HW not the other way around. I can't feed it just any crap.

        You want parallelism you have to code it.

        Zynq would actually be what I use! You start with SW. The ARM core is not that quick. You will use the FPGA to accelerate the tough parts. You may think you will have throughput issues but you have options via the high performance AXI ports. Your FPGA modules access the data in memory via DMAs.

        Once you KNOW that the part of the algorithm you need to accelerate actually suits FPGAs, you grab the HLS and start coding.

        You have to look at some of the libraries to understand what abstraction level you are working at: https://www.xilinx.com/products/design-tools/vivado/integrat...

        Video, matrices, linear algebra, encoders/decoders. Etc. I can string them together in the same way I would string HDL IP.

        The advantage is I can run the algorithm in C++ first and test it all, under the assumption that the HLS library has the equivalent HW version for synthesis.

        There is still a lot of HW work involved. For instance, in your example with A used twice: one module would calculate B_out by reading A prior to changing its value, then you would have to start the C_out module. You would need a way to coordinate the two modules to share the same memory at A. But they would be running in parallel, just not started at the same time.

  • enjenye 8 years ago

    Agreed. However, MyHDL (www.myhdl.org) allows for programming using Python, and works very well: I haven't written any VHDL (other than at top level) for years now. It would be interesting to see how far they can take it using Go.

rob_reconfigure 8 years ago

Some great questions (and some other really exciting projects mentioned, too!)

It's early days for us at reconfigure.io, we're just working with a few core early users at the moment and we'll be bringing more examples, benchmarks and increased access over time.

  • jnordwick 8 years ago

    How are you dealing with GC issues? Is there a paper or docs?

mhh__ 8 years ago

Are they using soft cores on an FPGA or actually synthesizing a design specific to one's Go code? Am I blind, or is the website not very clear?

  • quadrature 8 years ago

    You're not blind; I was expecting some code samples to make this clear. I assume it's actually synthesizing a specific design, because otherwise it isn't really noteworthy, but it's not clear.

ereyes01 8 years ago

This isn't quite the same thing, but it kind of reminds me of Altera's Nios soft processor: https://www.altera.com/products/processors/overview.html

In one of my previous lives doing embedded development, we were able to program the FPGA using pretty plain-looking C on the Nios, which just dedicated a portion of the FPGA's gates to running a simple, ARM-like processor.

It was cool for us software dudes because we could just do general-purpose computing (mostly) on the FPGA, and the verilog folks would wire it up for us to work right. It's not the cheapest way to design a product, but the stuff I worked on had crazy high profit margins, so it was a fair trade-off for better productivity.

cosinetau 8 years ago

FPGAs are worth studying in case anyone doesn't really know them. Right out of uni, I once got an interview question about FPGAs, and I didn't know what they were. The interviewer really looked down his nose at me after I told him that.

  • dsacco 8 years ago

    Unless you were interviewing for a position that required that sort of knowledge, your interviewer was just being an asshole.

    Understanding of FPGAs is what I'd consider specialized enough that most people shouldn't be required to demonstrate it in a technical interview.

    • metaphor 8 years ago

      For CS majors, you're probably spot on. For recent EE grads, no excuses. Heterogeneous compute isn't the future: it's the now, ready or not.

    • cosinetau 8 years ago

      I knew who I was dealing with before he said anything. ;)

  • analog31 8 years ago

    Ask HN: What's an inexpensive set-up for learning? I've done a lot of work with microcontrollers, but never FPGAs. Is there a cheap development board with software, that I can try out at home?

    • cosinetau 8 years ago

      I'm not up on a lot of the hardware stuff. Raspberry Pi? Beagleboard?

      • analog31 8 years ago

        Got 'em. Those are microcontrollers. So to elaborate a bit more: a microcontroller (MCU) is a single integrated circuit containing a microprocessor CPU plus a fair amount of input/output hardware. These range in sophistication from 8-bit controllers costing less than a dollar and consuming very little electrical current, all the way up to processors capable of running a cellphone and the like. There are dozens of MCUs in your house, your car, and the gadgets that you carry around.

        A field-programmable gate array (FPGA) is a bunch of logic gates on a chip that can be "wired" together in almost arbitrary ways. So in a sense it's more primitive than an MCU. The "field programmable" part is that the wiring pattern is programmed on a desktop computer and fed into the IC. This allows creating combinational or sequential logic functions that execute extremely quickly and often in parallel. In addition to simple logic gates, modern FPGAs offer other kinds of "cells" such as memory registers.
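        As a toy software model of that idea (a sketch, not how any real toolchain represents it), a 2-input lookup table -- the basic FPGA logic cell -- is just a 4-entry truth table indexed by its inputs:

```go
package main

import "fmt"

// A toy model of an FPGA logic cell: a 2-input LUT is a 4-entry
// truth table indexed by its inputs. "Programming" the FPGA amounts
// to filling in these tables and choosing how cells are wired together.
type lut2 [4]bool

func (l lut2) eval(a, b bool) bool {
	i := 0
	if a {
		i |= 1 // input a selects bit 0 of the index
	}
	if b {
		i |= 2 // input b selects bit 1 of the index
	}
	return l[i]
}

func main() {
	and := lut2{false, false, false, true} // only index 3 (both high) is true
	xor := lut2{false, true, true, false}
	fmt.Println(and.eval(true, true), xor.eval(true, false)) // true true
}
```
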

        Ironically, people have created wiring patterns that implement a complete microprocessor on an FPGA.

        This is the extent of my own knowledge, based on product descriptions but no actual experience doing anything with an FPGA.

      • wott 8 years ago

        So you would fail that interview again :-)

        • cosinetau 8 years ago

          No. I would not take a second interview from them.

    • alunaryak 8 years ago

      Terasic makes good Altera boards such as the $100 DE0-Nano series, and Digilent makes a decent Xilinx Artix-based dev board for around the same price. They both have decent community support, but just be aware that setting up the dev environment and getting all of the USB cable permissions sorted out (especially on linux) can take time. It's worth sticking with, though, because FPGAs are a blast to mess around with. The sheer compute power available to you is fun.

      [1] http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=E...

      [2] http://store.digilentinc.com/arty-artix-7-fpga-development-b...

    • chx 8 years ago

      I have an Embedded Micro Mojo v3 I do not use. It was $75 but if you pay shipping from Vancouver BC to wherever you are, it's yours for free. chx1975 gmail

ris 8 years ago

This looks awwwwfully nonfree...

  • metaphor 8 years ago

    In what sense? Target is an FPGA, so the point strikes me as moot.

    • ris 8 years ago

      Investing time in writing your complex code in their Go dialect may leave you in shackles and at the mercy of their pricing plans.

      • metaphor 8 years ago

        I completely agree with your point.

        To wit, pushing the performance of any FPGA with <insert_favorite_hdl_here> will inevitably result in a high degree of technical debt and/or vendor lock-in, e.g. instantiating device-specific hard IP.

        At the end of the day, we--as developers--aren't quite at that point where we can have our cake and eat it too, making this solution yet another product lifecycle trade-off decision.

blacksmythe 8 years ago
  >> expensive, hard-to-source hardware engineering skills.

If only that were true.

Lots of hardware engineers have moved into software.

  • lumost 8 years ago

    If it were more practical to source hardware skills and develop FPGA components, there are quite a few companies and academic institutions interested in exploring direct hardware acceleration. In particular, seeing examples of matrix operations or graph traversals carried out on an FPGA would be an excellent starting point for driving adoption.

gaelow 8 years ago

This is cool, but where are the metrics and examples?

deepnotderp 8 years ago

Someone needs a toolchain with Chisel as the target.

  • hobo_mark 8 years ago

    Why not target verilog directly? Chisel is just a verilog generator in the end.

    • vasilia 8 years ago

      Because of the ML and neural network hype. No one wants to learn VHDL or Verilog; they want OpenCL or something similar.

      • alunaryak 8 years ago

        I hear what you're saying, but Chisel is a lot closer to an RTL language like Verilog than it is to OpenCL.

bonoetmalo 8 years ago

This post is peak 2017 Hacker News