That's a fairly specialized chip and requires a bunch of custom software. The only way it can run apps unmodified is if the math libraries have been customized for this chip. If the performance is there, people will buy it.
For a minute I thought maybe it was RISC-V with a big vector unit, but it's way different from that.
The quote at the end of the posted Reuters article (not the one you’re responding to) says that it doesn’t require extensive code modifications. So is the “custom software” standard for the target customers of NextSilicon?
Companies often downplay the amount of software modifications necessary to benefit from their hardware platform's strengths because quite often, platforms that cannot run software out of the box lose out compared to those that can.
In the past, by the time special chips were completed and mature, the developers of "mainstream" CPUs had typically caught up in speed, which is why we do not see any "transputers" (e.g. Inmos T800), LISP machines (Symbolics XL1200, TI Explorer II), or other odd architectures like the Connection Machine CM-2 around anymore.
For example, when Richard Feynman was hired to work on the Connection Machine, he had to write a parallel version of BASIC first before he could write any programs for the computer they were selling:
https://longnow.org/ideas/richard-feynman-and-the-connection...
This may also explain failures like Bristol-based CPU startup Graphcore, which was acquired by Softbank, but for less money than the investors had put in: https://sifted.eu/articles/graphcore-cofounder-exits-company...
Yeah, it's an unfortunate overlap.
The Mill-Core, in NextSilicon terminology, is the software-defined "configuration" of the chip, so to speak: it represents the swaths of the application deemed worthy of acceleration, as expressed on the custom HW.
So really, the Mill-Core is, in a way, the expression of the customer's code.
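To make that a bit more concrete, here is a purely hypothetical sketch (my own illustration, not NextSilicon's actual toolchain or API; the function name hot_kernel is made up) of the kind of hot loop such a configuration would capture:

    // Hypothetical illustration only -- not NextSilicon's toolchain or API.
    // The idea: a profiler/JIT identifies a hot loop like this one, lifts its
    // dataflow graph (loads, a multiply, an add, a store), and configures the
    // reconfigurable fabric so each operation maps onto an ALU and the graph
    // edges become wiring. That configuration, roughly, would be the
    // "Mill-Core" for this piece of the customer's code.
    #include <cstddef>

    void hot_kernel(const double* a, const double* b, double* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            // On a CPU this runs one iteration at a time; on a dataflow
            // fabric the multiply/add units are laid out once and the
            // array elements are streamed through them.
            out[i] = a[i] * b[i] + a[i];
        }
    }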
I can't access the page directly, because my browser doesn't leak enough identifying information to convince Reuters I'm not a bot, but an actual bot is perfectly capable of accessing the page.
The other company I can think of focusing on FP64 is Fujitsu with its A64FX processor. This is an ARM64 chip with really meaty SIMD to get 3 TFLOPS of FP64.
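For context, the kind of code that SIMD width pays off on is just an ordinary FP64 streaming loop like the sketch below (a generic example of mine, nothing A64FX-specific in the source; the function name triad is made up, and the expectation is simply that an SVE-aware compiler vectorizes it):

    // Generic FP64 "triad" loop -- the sort of code a wide SVE unit like the
    // A64FX's is built to chew through. Nothing here is chip-specific; the
    // chip's advantage is how many double-precision lanes it runs per cycle.
    #include <cstddef>
    #include <vector>

    void triad(std::vector<double>& a, const std::vector<double>& b,
               const std::vector<double>& c, double scalar) {
        const std::size_t n = a.size();
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = b[i] + scalar * c[i];  // one FP64 fused multiply-add per element
        }
    }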
I guess it is hard to compare chip for chip, but the question is: if you are building a supercomputer (and we ignore pressure to buy sovereign), which is better bang for the buck on representative workloads?
Even if the hardware is really good, the software should be even better if they want to succeed.
Support for operating systems, compilers, programming languages, etc.
This is why a Raspberry Pi is still so popular even though there are a lot of cheaper alternatives with theoretically better performance. The software support is often just not as good.
The implication wasn't to use the Raspberry Pi toolchain, just that toolchains are required and are a critical part of developing for new hardware. The Intel/AMD toolchain they will be competing with is even more mature than the Raspberry Pi's. And toolchain availability and ease of use make a huge difference whether you are developing for supercomputers or embedded systems. From the article:
"It uses technology called RISC-V, an open computing standard that competes with Arm Ltd and is increasingly being used by chip giants such as Nvidia and Broadcom."
So the fact that Raspberry Pi tooling is better than the imitators' and that it has maintained a significant market share lead is relevant. Market share isn't just about performance and price; it's also about ease of use and the network effects that come with popularity.
I was an architect on the Anton 2 and 3 machines - the systolic arrays that computed pairwise interactions were a significant component of the chips, but there were also an enormous number of fairly normal looking general-purpose (32-bit / 4-way SIMD) processor cores that we just programmed with C++.
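For readers who haven't seen it, "pairwise interactions" boils down to the classic all-pairs loop sketched below (a toy example of mine, not Anton's code; the Particle struct and pairwise_forces name are made up). The systolic arrays exist precisely to stream particle pairs through this structure:

    // Toy sketch of the pairwise-interaction pattern (Lennard-Jones-style),
    // NOT Anton's actual code. The point is the all-pairs structure: every
    // particle interacts with every other particle, which is what the
    // systolic arrays on Anton accelerate.
    #include <cstddef>
    #include <vector>

    struct Particle { double x, y, z, fx = 0, fy = 0, fz = 0; };

    void pairwise_forces(std::vector<Particle>& p) {
        for (std::size_t i = 0; i < p.size(); ++i) {
            for (std::size_t j = i + 1; j < p.size(); ++j) {
                const double dx = p[i].x - p[j].x;
                const double dy = p[i].y - p[j].y;
                const double dz = p[i].z - p[j].z;
                const double r2 = dx * dx + dy * dy + dz * dz;
                const double inv_r6 = 1.0 / (r2 * r2 * r2);
                // Simplified Lennard-Jones force magnitude divided by r
                const double f = (48.0 * inv_r6 * inv_r6 - 24.0 * inv_r6) / r2;
                p[i].fx += f * dx; p[i].fy += f * dy; p[i].fz += f * dz;
                p[j].fx -= f * dx; p[j].fy -= f * dy; p[j].fz -= f * dz;
            }
        }
    }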
I spent a lot of time on systolic arrays to compute cryptocurrency PoW (BLAKE2 specifically). It’s an interesting problem and I learned a lot, but made no progress. I’ve often wondered if anyone has done the same?
If there really is enough market demand for this kind of processor, it seems like someone like NEC, who still makes vector processors, would be better poised than a startup rolling RISC-V.
So, a Systolic Array[1] spiced up with a pinch of control flow and a side of compiler cleverness? At least that's the impression I get from the servethehome article linked upthread. I wasn't able to find non-marketing better-than-sliced-bread technical details from 3 minutes of poking at your website.
[1]: https://en.wikipedia.org/wiki/Systolic_array
I can see why systolic arrays come to mind, but this is different.
While there are indeed many ALUs connected to each other in both a systolic array and a data-flow chip, data-flow is usually more flexible (at the cost of complexity) and the ALUs can be thought of as residing on some shared fabric.
Systolic arrays often (always?) have a predefined communication pattern and are often used in problems where data that passes through them is also retained in some shape or form.
For NextSilicon, the ALUs are reconfigured and rewired to express the application (or parts of it) on the parallel data-flow accelerator.
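To make the "predefined communication pattern" point concrete, here is a tiny software emulation (my own sketch, not anyone's hardware) of an output-stationary systolic array doing a 3x3 matrix multiply. Every cell only ever talks to the same left/top neighbours, which is exactly the rigidity a reconfigurable data-flow fabric avoids:

    // Tiny software emulation of an output-stationary systolic array
    // computing C = A * B for N = 3. Each cell (i, j) takes an A value from
    // the left, a B value from above, accumulates a*b locally, and passes
    // the values on. That hard-wired neighbour-to-neighbour pattern is the
    // "predefined communication" mentioned above.
    #include <cstdio>

    constexpr int N = 3;

    int main() {
        const double A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        const double B[N][N] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};

        double a_reg[N][N] = {};  // value currently held, flowing rightwards
        double b_reg[N][N] = {};  // value currently held, flowing downwards
        double c_acc[N][N] = {};  // stationary accumulator in each cell

        for (int t = 0; t <= 3 * (N - 1); ++t) {
            // Update from bottom-right to top-left so each cell reads its
            // neighbours' values from the *previous* cycle.
            for (int i = N - 1; i >= 0; --i) {
                for (int j = N - 1; j >= 0; --j) {
                    // Edge cells are fed from outside with the usual
                    // diagonal skew; inner cells read fixed neighbours.
                    const double a_in = (j == 0)
                        ? ((t - i >= 0 && t - i < N) ? A[i][t - i] : 0.0)
                        : a_reg[i][j - 1];
                    const double b_in = (i == 0)
                        ? ((t - j >= 0 && t - j < N) ? B[t - j][j] : 0.0)
                        : b_reg[i - 1][j];
                    c_acc[i][j] += a_in * b_in;
                    a_reg[i][j] = a_in;
                    b_reg[i][j] = b_in;
                }
            }
        }

        for (int i = 0; i < N; ++i) {
            for (int j = 0; j < N; ++j) std::printf("%6.1f", c_acc[i][j]);
            std::printf("\n");
        }
    }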
My understanding is no, if I understand what people mean by systolic arrays.
GreenArray processors are complete computers with their own memory and running their own software. The GA144 chip has 144 independently programmable computers with 64 words of memory each. You program each of them, including external I/O and routing between them, and then you run the chip as a cluster of computers.
[1] https://greenarraychips.com
Text on the front page of the NS website* leads me to think you have a fancy compiler: "Intelligent software-defined hardware acceleration". Sounds like Cerebras to my non-expert ears.
* https://www.nextsilicon.com
NEC doesn't really make vector processors anymore. My company installed a new supercomputer built by NEC, and the hardware itself is actually Gigabyte servers running AMD Instinct MI300A, with NEC providing the installation, support, and other services.
https://www.nec.com/en/press/202411/global_20241113_02.html
You can indeed, and should, assume there is a heavy JIT component to it.
At the same time, it is important to note that this is geared for already highly parallel code.
In other words, while the JIT can be applied to all code in principle, the nature of accelerator HW is that it makes sense where embarrassingly parallel workloads are present.
Having said that, NextSilicon != GPU, so it takes a different approach to accelerating said parallel code.
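As a generic example (mine, not NextSilicon's; the function name map_kernel is made up), "embarrassingly parallel" here just means loops like this, where every iteration is independent and a runtime can farm them out however it likes:

    // Every iteration is independent, so a JIT/runtime is free to spread
    // this across CPU threads (OpenMP shown, needs -fopenmp), a GPU, or a
    // dataflow fabric without changing the algorithm at all.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    void map_kernel(const std::vector<double>& in, std::vector<double>& out) {
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(in.size()); ++i) {
            out[i] = 2.0 * std::sqrt(in[i]) + 1.0;  // no cross-iteration dependencies
        }
    }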
In a way, this is not new; it’s pretty much what Annapurna did: they took ARM and got serious with it, creating the first high-performance ARM CPUs. Then they got acqui-hired by Amazon and the rest is history ;)
Stop using Apple, or Google, or Amazon, or Intel, or Broadcom, or Nvidia then. All have vast hardware development activities in that one country you don't like.
[1] https://www.servethehome.com/nextsilicon-maverick-2-brings-d...
The main product/architecture discussed has nothing to do with vector processors or RISC-V.
It's a new, fundamentally different data-flow processor.
Hopefully we will improve in explaining what we do and why people may want to care.