The point around Julia's performance is often in the context of mathematical com...

vvanders · on Oct 14, 2017

Sure but mathematical problems don't exist in isolation. I used to do a ton of 3D graphics work with matrices, vectors, etc. We definitely couldn't use something that didn't have the right support for data layout and good runtime semantics.

C/Rust/C++ fit that very well so I wanted to understand how similar claims were made for a GC'd language. C# for instance can work with value types and was wondering is Julia has similar constructs.

andyferris · on Oct 14, 2017

Just thought I'd chime in that there is a StaticArrays.jl package that lets you make arrays which are stack allocated like value types in other languages, and you can make arrays of static arrays that layout exactly like arrays of structs in C/C++, and can be memory mapped from disk and so-on. (Such arrays of static arrays do eventually get GC'd, but that is not necessarily very frequent). They also support the same set of abstractions as other arrays including performant (SIMD) math/linear algebra that in my (biased) opinion is perfect for 3D graphics.

(As an aside, we also take this further with abstractions of 3x3 rotation matrices using a variety of internal representations such as quaternions in Rotations.jl, and allow composing coordinate transformations in CoordianteTransformations.jl, etc, etc).

Touching on the original authors post about the package ecosystem, I feel it can't be understated enough how easily different packages can compose and be used together in Julia. I like to say there is a product rule here where functionality/productivity grows as the square of the number of packages/features you can use together naturally (regardless of programming language). We can use static arrays, rotations, coordinate transformations, etc in combination with differential equation packages to simulate a system or with an optimization package to optimize a coordinate transformation for SLAM given certain measurements, and so-on, at full run-time speed but comparatively little user effort.

omginternets · on Oct 14, 2017

In the case of research, data-science and -- generally speaking -- "scientific computing", mathematical problems pretty much do exist in isolation.

This is the niche that Julia wants to occupy.

vvanders · on Oct 14, 2017

If that were the case I wouldn't have an adjacent subthread telling me that they are pretty much the same ;).

To be clear, I have no beef with Julia, I'm sure it's a fantastic language. I take issue with people thinking they can get the same level of performance without explicitly controlling their memory access patterns/allocations.

98% of developers will never need it in their careers but when you do there is no substitute.

CyberDildonics · on Oct 15, 2017

Why don't you think you can control those things? Make an array, loop through linearly, just like C. Avoid allocations in inner loops, just like C.

xfer · on Oct 15, 2017

I agree, a lot of the performance problems have to with allocation in any language, gc'd or not. I still can't believe people making the same argument against gc'd languages even when highly performant jvm exists.

vvanders · on Oct 15, 2017

It's not an issue of allocation, it's an issue of allocation location to get better cache locality.

Take the JVM, a good majority of ORM databases uses Sun.misc.unsafe to do manual native memory allocation. Not because it's faster(because it isn't faster than bumping the nursery pointer).

They do it so they can control where disparate datatypes live in memory so that as a cache line is read in the prefetcher is already pulling in the next cache line.

Things like this will get you a 10-50x performance increase, which in some cases you absolutely need it.

xfer · on Oct 15, 2017

Sure, but like you said, 98% don't need it and there is indeed a substitute when you need it.

CyberDildonics · on Oct 15, 2017

I don't see too many reasonable arguments that garbage collection is slower. Taking up more memory, having pauses and requiring the same amount of thought as modern C++ are all arguments I've heard, which is my experience with Julia (sans the pauses since I haven't done something interactive yet).

xfer · on Oct 15, 2017

Requiring same amount of thought as modern C++ is not simply true, as soon as you start dealing with cyclic data structures/shared data(reference counting takes more memory and slower than a well implemented gc), that model breaks down.

Pauses is indeed a problem, which then requires you to manually tune the gc to your settings.

CyberDildonics · on Oct 17, 2017

That's not my experience and I have dealt with all of those things to a fairly heavy degree.

> reference counting takes more memory and slower than a well implemented gc

This is a ridiculous cliche at this point. It might be true if every memory allocation was reference counted, but in C++ (and julia) almost everything winds up on the stack. What doesn't wind of up on the stack is usually being dealt with using ownership and move semantics. The number of reference counted variables in my C++ programs is usually 0 unless they are being shared across threads. Not only that, but within a thread, move semantics means that the reference count doesn't need to be touched.

While there may be 'conventional wisdom' I have implemented non-trivial software in both C++11 and Julia, and optimizing memory allocations happens in both and Julia required more thought. Julia made data structures and general functions easier to write so it wasn't as if it was a net negative, but memory allocation wise I don't feel the garbage collector made things any easier.

Then on top of that you have the myth of the 'well implemented GC'. Java, C#, D, Go and Julia are all languages where this seems to be a constant struggle. After significant R+D some are there, but if you look at D and Julia, it is a constant user complaint.

As for cyclic data structures, I'm not sure why I would do that in the first place and I'm doubly unsure why I would do it with pointers and fragmented memory allocations.

_8ca6 · on Oct 14, 2017

Value types are also addressed in the article

vvanders · on Oct 14, 2017

From what I can find[1] it looks like while Julia has compact values it doesn't have what I'd traditionally call value types. Specifically value types that live on the stack unless they are a member of a reference type(which is what C# does).

Looking at the performance docs[2] this is pretty clear in that array types(which looks like how Julia does Matrices) get allocated on the heap. You can clearly see the performance impact in the docs(1.95s vs 0.08s) that this type of behavior has.

While you can preallocate(which the docs suggest and is the only path for GC'd languages) it's not an ideal solution. If your type is smaller than a single cache line, like in the example above, you've just flushed a whole cache line just to bring in that one pre-allocated value. You also run into the issue if you don't know exactly how many values you need upfront which leads to pooling. In that scenario your pool may be large enough that you're bouncing between cache lines on different pooled objects.

This is the type of thing that you really need control over if you want performance "as good as C". Anything less will be a compromise. There's also the whole class of zero-cost abstractions that can get from C++/Rust which leverage all the above to great effect. That lets you get things like nom[3] which gives you high level semantics + productivity while maintaining parity with C.

None of this is academic, these are all optimizations I've used on shipping products that went out to millions of users. In each case we had fixed hardware with a limited execution budget and the 5-20x improvements we made were critical to us shipping a product that people wanted to use.

[1] https://discourse.julialang.org/t/how-to-know-if-object-memo...

[2] https://docs.julialang.org/en/release-0.4/manual/performance...

[3] https://github.com/Geal/nom_benchmarks/tree/master/http / https://github.com/Geal/nom

StefanKarpinski · on Oct 14, 2017

The distinction in Julia isn't between value versus reference types (which have fundamentally incompatible semantics), it's immutable types (declared using `struct`) versus mutable types (declared using `mutable struct`). Immutable types are generally stack allocated and need not even be fully materialized, whereas mutable types are typically heap allocated and fully materialized. The built-in array-type is mutable and can change size. As you say, these can be preallocated and modified in place with a rich collection of in-place, mutating algorithms in the standard library, but sometimes that's not quite enough. If you want fixed-size, stack-allocated arrays, you can use the StaticArrays package [1], which provides precisely such types. Aside from immutability, stack allocation and amazing performance, StaticArrays look and behave just like built-in arrays: one of the basic premises of Julia is to allow you to implement types like this and get the exact feature/performance tradeoff you need.

[1] https://github.com/JuliaArrays/StaticArrays.jl

vvanders · on Oct 14, 2017

Uf, I really don't like intermixing mutability with allocation location. Those seem like two completely separate concerns.

One thing that was really common for us to do was to instance a weighted graph(something like this[1]) per-actor. This means that you might have 10-300 floating point values in a block indexed by the node they interact with. It was really common to see one, maybe two values change on a per-frame basis. With the constraint above I'm now copying 300 floating point values every time any node changes which would be brutal for performance. Or I'd take a potential cache miss each time I touched the array which could be even worse if it was a reference type.

To be clear, I'm not saying Julia isn't really good at what it does. My complaint stems from the fact that you can't claim performance good as/better than C without having all these tools at your disposal.

We haven't even gotten into things like restrict[2] where things like Rust's ownership model let you get it for free[3].

[1] https://docs.unrealengine.com/latest/INT/Engine/Rendering/Ma...

[2] https://en.wikipedia.org/wiki/Restrict

[3] https://doc.rust-lang.org/nomicon/aliasing.html

StefanKarpinski · on Oct 14, 2017

> I really don't like intermixing mutability with allocation location. Those seem like two completely separate concerns.

They're not. The semantics of value types and reference types are different in the presence of mutation. So if you want uniform object semantics in a language, then objects that can be implemented as values must be immutable. There are many languages that have kept these independent and bifurcated their type system instead (Java, C#), but it's been a source a great deal of pain and frustration (e.g. Java's `int` versus `Integer` awkwardness).

Fortunately, there's nothing you can do with mutation that you can't do just as well by modifying and replacing an immutable value in a mutable cell – the compiler implements them the same way. Wrap an immutable in a mutable `Ref` and voila, you've got something equivalent to a mutable value type without exposing completely incompatible semantics in the language.

vvanders · on Oct 14, 2017

> They're not. The semantics of value types and reference types are different in the presence of mutation.

I think C#'s overloaded term for "value-types" may have caused you to misinterpret the above. Let me be more clear, if a value lives on the stack or the heap is completely separable from if it is mutable or not, that's my objection.

> ... just as well by modifying and replacing an immutable value in a mutable cell

In terms of "correctness", sure. In terms of performance see my 300+ float block example above. I shouldn't have to copy, modify, copy when I can just mutate in-place(and be explicit about that rather than relying on language semantics).

Also I can seem to find any mention in the docs about Ref's semantics aside from a passing mention in the FFI section.

anon_342njlkesr · on Oct 15, 2017

I had the same problem; my problem was not the giant number of reads, but rather that a naive replacement needs a read and has a dependency on the write-back; hence, a scatter where you modify immutable structures by replacing certain fields induces a stall on cache-miss.

In my case, julia/llvm was smart enough to figure out that the read and write can be eliminated. Hence, the julia code that replaces an immutable with a copy where only few fields are changed generates the same @code_native as the obvious evil construction (figure out where the field is stored; unsafe_store! to the pointer).

But I guess this optimization is unreliable, or at least it is not well documented when this optimization is guaranteed to happen. So the situation is not optimal, but also not as catastrophic as you would have guessed without reading the generated native code.

StefanKarpinski · on Oct 17, 2017

Right, so the way to address this is to provide guarantees that this kind of optimization /will/ occur and providing syntax for making writing "pseudo-mutating" code more convenient. There's a PR [1] for the latter, but it's been shelved while we focus on getting 1.0 out the door instead of adding new features. Optimization guarantees + convenient syntax provides everything you need without trashing the semantics of the language by bifurcating the type system into two incompatible kinds of values.

[1] https://github.com/JuliaLang/julia/pull/21912

KenoFischer · on Oct 14, 2017

For values with compiler-visible scoped lifetime, the compiler will automatically promote them to stack variables. There is currently no way to enforce this happening, but it would be perfectly possible to add such an annotation.

Regarding preallocation, you tend to want to avoid dynamic memory allocation in high performance applications anyway, so whether you do that in C++ or in Julia, doesn't really make too much of a difference. It is true this is a little harder to control in GC'ed languages than in languages where you have to do memory management manually, but the effect is about the same. Julia provides tools to figure out where you're using dynamic memory allocation and those tools will certainly improve in the future.

One thing I think is under appreciated from the performance perspective though is how easily julia lets you express data layout transformation to take better advantage of the cache hierarchy. I touched on this a bit in my JuliaCon presentation [1].

[1] https://youtu.be/uecdcADM3hY?t=32m19s

bjourne · on Oct 14, 2017

> For values with compiler-visible scoped lifetime, the compiler will automatically promote them to stack variables. There is currently no way to enforce this happening, but it would be perfectly possible to add such an annotation.

Other languages such as Java have that optimization too. But it is quite limited because it can only be applied to data whose size is known at compile-time. There is also the problem that if some data allocated in a procedure is passed to another procedure, it can't be allocated on the stack because it can escape.

A programmer working in a low-level language can be smart and choose the best locations (stack or heap) for each piece of data. A language that does not allow explicit stack access (like C does), can never make as optimal decisions.

vvanders · on Oct 14, 2017

> Regarding preallocation, you tend to want to avoid dynamic memory allocation in high performance applications anyway, so whether you do that in C++ or in Julia, doesn't really make too much of a difference.

Yes! Which is why it shouldn't take me digging through 3 different documents and still getting it partly wrong!

C/C++/Rust make this easy by annotating the type modifiers(&/*/Box/etc) with how the object lives in my runtime system. Rust even gets bonus points for giving me aliasing information as well(&mut vs &).

ChrisRackauckas · on Oct 15, 2017

You just do @. and you're done?

KenoFischer · on Oct 14, 2017

> although I am certain KenoFischer would

Sure would.