Some Insights from a Julia Developer (stochasticlifestyle.com)
255 points by ptype on Oct 14, 2017 | 236 comments


It's great and all, but I can't justify switching languages for minor improvements over Python + numpy/scipy. I'd be abandoning:

  * My deep knowledge and experience with Python
  * My entire codebase
  * The ability to work on projects with colleagues who don't also switch
  * The certainty that when I leave my current job, someone will be able to pick up after me
  * Zero-based indexing
I've started to do some work in Rust when it makes sense, since there's occasionally a compelling case and it's substantially different from Python. Incremental advances in programming languages just aren't worth it, and knowing that other people are probably coming to the same conclusion means I can't expect a serious community to ever arise around Julia.


Honest question: did you read the post at all? Even the quick summary addresses this.

Chris' whole point is that the biggest benefits of switching to Julia will be felt by the folks that are developing the packages and libraries for others to use. He's advocating that the best way to get you to want to switch isn't incremental language-level features, but rather it's first-in-class domain-specific packages.

This is something that will take time, but Julia's language-level features are uniquely positioned to enable the development of such packages. Chris is extremely productive (and definitely an outlier), but in less than two years he managed to coordinate and build a first-in-class ecosystem for differential equations.

So he's not advocating for you to switch at all — he's advocating for folks to build the packages (like his) that will get you to want to switch.


Agreed. A relevant excerpt:

> But for end users throwing together a 100 line script for a data analysis? I don't think that this crowd will actually see as much of a difference between other scripting languages if the packages in the other languages they are using are sufficiently performant (this isn't always true, but let's assume it is). To people who aren't "pros" in the language, it will probably look like it just has a different syntax. It will be a little faster than vectorized code in other languages if code is in type-stable functions, but most of the differences a user will notice will come from the mixture of features and performance of packages. Because of this, I am not sure if marketing the features of the language is actually the best way to approach the general audience. The general audience will be convinced Julia is worthwhile only by the package offering.

(For context, this was after several hundred words of compelling, detailed examples of Julia's language-level advantages).

EDIT: the author then goes on to claim

> that "lack of packages" isn't really a problem with the ecosystem: you can find a great package that does what you're looking for.

with several examples, pointing instead to poor discoverability, uniqueness, distribution, and branding as the real problems.

He also recommends more practical, use-case-specific how-to guides.


My experience is that Julia promotes lucid and error-free code vs R (especially) and Python. For me this matters more and more, due in part to the business impact of my 100-line scripts but more significantly due to the impact of other people's 100-line scripts.


This is now happening. The DifferentialEquations.jl library in Julia seems to be best in class.

More importantly: I can imagine writing algorithmic improvements to it for my use case, which is where I hit a wall with Python + scipy/numpy/numba.

I can't go and use my Python skills to teach a scipy routine about some feature I need.


This is where Julia really shines. Hopefully as more researchers realize they can quickly make customized/specialized implementations for their projects, it'll help reinforce momentum. It's awesome being able to quickly dig into a library, grok the algorithm and tweak it. Actually could make for some fun hackathons!


I did read the post.

> but rather it's first-in-class domain-specific packages.

You'll note I didn't cite lack of libraries as a reason I'm reluctant to use Julia. I agree that a solid ecosystem is critical for adoption, but my point is that, for me, it's not sufficient to overcome the human/time/cost factors.


In my opinion, this is not about abandoning, but supplementing. Python is good at many things, as is Rust, as is Julia.

Stuff like the ODE library that Chris has, or JuMP (for mathematical programming) that Miles Lubin, Iain Dunning and Joey Huchette wrote, or a number of other packages are simply not available elsewhere. The Celeste project, for example, achieved 1.6 PetaFlop/sec of compute rate on half a million cores.

Programming languages are abstractions. Some abstractions are better for certain kinds of problems. Many of the amazing libraries that are available in Julia would simply not be easy to do otherwise.

Think about it - if these were incremental advances and no serious community would ever form - then Chris would have never written his packages. Julia has seen over a million downloads. In my biased opinion, it is more than an incremental advance. I think the right approach is to understand where those advances are and apply it to the right kind of problems.

There is a balance to be struck between the two extremes of nothing new will ever get adopted, and everything should adopt anything new and shiny that comes out.


Python's PuLP & Pyomo are pretty similar to JuMP.


The JuMP paper [1] compares JuMP with all the major commercial and open source options in this area, including Pyomo. Pyomo is orders of magnitude slower than JuMP and doesn't scale as well so the comparison gets worse as problems get larger. (Pyomo had the worst performance of all the systems compared.) JuMP is the only open source option that has performance like commercial systems. Throw in JuMP's improved expressiveness and usability over Pyomo and JuMP's incredibly broad solver support [2] and there's really no contest. These days, your best option in Python is probably to call JuMP via pyjulia.

[1] https://arxiv.org/pdf/1508.01982.pdf

[2] http://jump.readthedocs.io/en/latest/installation.html#getti...
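
To give a flavor of the modeling macros being benchmarked, here's a minimal sketch in current JuMP syntax (GLPK is just an open-source stand-in solver, and the problem itself is made up):

    using JuMP, GLPK

    m = Model(GLPK.Optimizer)
    @variable(m, 0 <= x[1:1000] <= 1)                    # a block of bounded variables
    @constraint(m, [i in 1:999], x[i] + x[i+1] <= 1.5)   # a family of constraints from one macro
    @objective(m, Max, sum(x))
    optimize!(m)
    objective_value(m)
The macros are what build the problem instance, which is the construction step the paper measures.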


Not sure how much performance matters here. You're just farming it off to CPLEX or Gurobi anyway. If performance really matters, you'll do this part in C++.


Unless you're solving toy problems, it matters quite a bit. In mathematical optimization, constructing the problem tends to be just as expensive as solving the problem is – sometimes more so. The existence of expensive commercial systems like AMPL and GAMS that only exist to express optimization problems demonstrates that this is a non-trivial issue that people are willing to pay money for. Using C++ APIs to solvers is extremely painful, inflexible, error-prone, and completely locks you into a specific solver.


> Unless you're solving toy problems, it matters quite a bit. In mathematical optimization, constructing the problem tends to be just as expensive as solving the problem is – sometimes more so.

With all due respect, this is just false. I can construct a million-dimensional linear program for a large network problem in a few seconds in Python. Solving it with CPLEX can take minutes.

I have a Ph.D. in computational mathematics and do this for a living. I write lots of C, and still almost exclusively use Python or Lua for problem formulation because the performance just doesn't matter.

I'm certain there are some cases where it's true. One I've encountered is doing real-time solution of MILP for adaptive path planning. Here the problems are of moderate size and you want to reconstruct them from geometry data every 10ms or so. Here Python will indeed bite you in the problem formulation stage.


Solver licenses are extremely expensive (~100k in production), so switching solvers isn't very common. Going to C++ helps a bit with I/O before it gets to the solver if you really need it, but a lot of people do just fine with things like AIMMS, a proprietary high-level modeling language similar in performance to Python. Maybe Julia + JuMP is nice if you want to remove I/O performance barriers without the C++ pain, and with more flexibility, but I don't see JuMP being talked about much in the industry currently. If you don't mind me asking, what is your experience in this area and what domain are you working in?


Stefan is one of the co-creators of the Julia language, not really an operations research person. He must have misread the results in the JuMP paper where Pyomo outperforms CVX and Yalmip on several problem classes.


You're right, I was looking at the lqcp results and misread the other results. Still, the main takeaway is that of the open source options for expressing OR problems, JuMP is the only one in the same class as the commercial tools. If you've got a problem where problem construction isn't hard, cool; often that's not the case.


Can you share an example of calling JuMP directly via pyjulia? I was under the impression that Julia macros are not currently callable from Python via pyjulia.


Any time 1-based indexing is mentioned as a shortcoming of Julia, I'm reminded of PG's 'Blub paradox' [1].

> As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

Julia has offset array indexing (for essentially no additional cost), which is an amazingly useful feature! See e.g. [2]. Think of this feature as blurring the distinction between data (accessing an array) and computation (calling a function). The fact is that arrays as they are used (a collection at contiguous memory locations) often carry more information than being just a dumb list of arbitrary data, and it's very convenient to expose that in their interface.

[1]: http://www.paulgraham.com/avg.html

[2]: https://julialang.org/blog/2017/04/offset-arrays


I've seen the claim that Julia has offset arrays at essentially no additional cost a few times now, and I'm just not buying it.

As in, I don't see how it's possible without stretching the meaning of "no additional cost". I'm familiar with displaced arrays and the like in Common Lisp, so I get that you can do offsets and things without using much additional memory, and I feel it's a small jump from that to arbitrary indexing.

I mean, sure, if you're programming with constants or common patterns or things like that, a compiler could theoretically figure out and adjust the references itself. Or you can use macrology (I presume, since Julia is so close to being Lisp in MATLAB clothing, there are no issues with this in theory). Or you can come up with some kind of functional wrapper to adjust offset indexes back to the base case...

But if you're actually using numbers returned from run-time calculations to index N-quadrillion times into various arrays, and you're swapping between a standard and an arbitrary/offset index measure, which is about the only time I can see you'd actually care about the relative efficiency of repeated arbitrary indexing, then it can't be a pre-compiled thing. The offsets between the actual numbers you're getting back and the actual index into the arrays have to be calculated in an additional step at run time.

Now it's true that's likely to be only an addition or multiplication operation for each one (and if it isn't, the run-time application of another transformation just cements my point), but perhaps my notion of "essentially no additional cost" is more strict...


This is easily testable:

    julia> using BenchmarkTools
           using OffsetArrays
           A = rand(1000)
           O = OffsetArray(A, 0:999)
           function naive_sum(A)
               s = zero(eltype(A))
               @unsafe for i in eachindex(A) # @unsafe will eventually be folded into @inbounds
                   s += A[i]
               end
               return s
           end
    naive_sum (generic function with 1 method)
    
    julia> @btime naive_sum($A)
      1.025 μs (0 allocations: 0 bytes)
    514.0960505118594
    
    julia> @btime naive_sum($O)
      1.023 μs (0 allocations: 0 bytes)
    514.0960505118594
Yes, there's an extra operation or two in there, but it's effectively free on modern CPUs due to caching/op latencies/pipelining/etc.


The performance is the same because the core loop in both cases is identical:

    loop:
        vaddsd	(%rcx), %xmm0, %xmm0
        addq	$8, %rcx
        addq	$-1, %rax
        jne	loop
There's more setup required for offset arrays, but otherwise there's no difference. I'm not sure why arrays having an offset would be any more work: array indexing in general involves adding an index to a base pointer; having an offset effectively just changes what the base pointer value is.


I didn't mean to seem petty, it's just that going back and forth between zero-based and one-based languages added extra mental overhead (admittedly this was with Fortran, so there may have been other issues).


The way I think of it is that different indexing schemes suit different problems. I want to think carefully about the problem domain and use the most convenient convention. For example, when my array stores a time series, I would like the index to correspond to timestamps (and still be performant, so long as my timestamps can be efficiently mapped to memory locations, which is true for affine transformations, for example). When another array stores the Fourier transform of that time series, I would like to access elements by the frequency they correspond to. That stops me from making annoying indexing errors (eg: off-by-1), because the data structure's interface maps nicely to the problem domain. I find that much easier than the cognitive cost that comes with trying to shoehorn a single convention onto every situation. But it's difficult to appreciate that when thinking of language constructs divorced from specific problem domains, as one typically does when studying data structures and/or algorithms.


Fourier transforms are a great example: when I started with Julia, this was the only time I missed 0-based indexing.

Now there is the awesome FFTViews.jl package which goes one better with wrap-around indexing: https://github.com/JuliaArrays/FFTViews.jl
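
A minimal sketch of the wrap-around behavior (assuming FFTW provides `fft`, plus the `FFTView` wrapper from the package above):

    using FFTW, FFTViews

    x = rand(8)
    F = FFTView(fft(x))   # the spectrum, indexed by frequency bin
    F[0]                  # the DC component
    F[-1] == F[7]         # true: negative frequencies wrap around the end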



I am so sold I want to learn right now.


I think it crucial to offer at this point the important Julia package TwoBasedIndexing.jl :-)

https://github.com/simonster/TwoBasedIndexing.jl


Honestly, given how much work I do in R, and how ingrained the assumption of 1-based indexing is in my field, this is a massive Julia selling point for the exact same mental-overhead reasons you mention.


Wait a minute. Having an arbitrary offset is one thing. Settling on 1 as a standard is another one!


The improvements are not minor; they are massive. Citing Rust shows that the advantage of Julia has not been explained well enough. Julia allows you to write code as performant as Rust's with a much smaller investment in learning. You cite your concern about spending time learning something new; that makes no sense considering the high learning curve and complexity of Rust compared to Julia. Julia is quite fast to learn and has far more overlap with Python than Rust does, which makes the transition easy. You can call Python code from Julia and use many of the same tools, like the notebook.


Have a citation on the performance vs Rust?

I'm skeptical any GC'd language can approach C/Rust unless it has explicit mechanisms to do data layout for using the cache/prefetcher to the fullest degree.


The point around Julia's performance is often in the context of mathematical computing. The two languages are designed for very different use cases.

Julia is designed to make it extremely easy to write high-performance mathematical programs. Rust is designed to make it easy to do systems programming. Both are high performance for the things they are designed to do. Just like I wouldn't do systems programming in Julia (although I am certain KenoFischer would), I wouldn't want to do mathematical computing in Rust.


Sure, but mathematical problems don't exist in isolation. I used to do a ton of 3D graphics work with matrices, vectors, etc. We definitely couldn't use something that didn't have the right support for data layout and good runtime semantics.

C/Rust/C++ fit that very well, so I wanted to understand how similar claims were made for a GC'd language. C# for instance can work with value types, and I was wondering if Julia has similar constructs.


Just thought I'd chime in that there is a StaticArrays.jl package that lets you make arrays which are stack allocated like value types in other languages, and you can make arrays of static arrays that are laid out exactly like arrays of structs in C/C++, and can be memory mapped from disk and so on. (Such arrays of static arrays do eventually get GC'd, but that is not necessarily very frequent.) They also support the same set of abstractions as other arrays, including performant (SIMD) math/linear algebra that in my (biased) opinion is perfect for 3D graphics.
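
For a flavor of the API, a minimal sketch (`SVector` and `@SMatrix` are the package's exported types/macros):

    using StaticArrays

    v = SVector(1.0, 2.0, 3.0)            # fixed-size, stack-allocated 3-vector
    M = @SMatrix rand(3, 3)               # 3x3 static matrix
    w = M * v                             # unrolled, SIMD-friendly product, no heap allocation

    pts = rand(SVector{3,Float64}, 10^6)  # one contiguous block, like a C array of structs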

(As an aside, we also take this further with abstractions of 3x3 rotation matrices using a variety of internal representations such as quaternions in Rotations.jl, and allow composing coordinate transformations in CoordinateTransformations.jl, etc, etc).

Touching on the original author's post about the package ecosystem, I feel it can't be overstated how easily different packages can compose and be used together in Julia. I like to say there is a product rule here where functionality/productivity grows as the square of the number of packages/features you can use together naturally (regardless of programming language). We can use static arrays, rotations, coordinate transformations, etc in combination with differential equation packages to simulate a system, or with an optimization package to optimize a coordinate transformation for SLAM given certain measurements, and so on, at full run-time speed but comparatively little user effort.


In the case of research, data-science and -- generally speaking -- "scientific computing", mathematical problems pretty much do exist in isolation.

This is the niche that Julia wants to occupy.


If that were the case I wouldn't have an adjacent subthread telling me that they are pretty much the same ;).

To be clear, I have no beef with Julia, I'm sure it's a fantastic language. I take issue with people thinking they can get the same level of performance without explicitly controlling their memory access patterns/allocations.

98% of developers will never need it in their careers but when you do there is no substitute.


Why don't you think you can control those things? Make an array, loop through linearly, just like C. Avoid allocations in inner loops, just like C.
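
A minimal sketch of that style (the function is illustrative; `@inbounds` elides bounds checks):

    function dot_product(a::Vector{Float64}, b::Vector{Float64})
        s = 0.0
        @inbounds for i in eachindex(a, b)   # one linear pass over contiguous memory
            s += a[i] * b[i]
        end
        return s                             # nothing in the loop touches the heap
    end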


I agree, a lot of the performance problems have to do with allocation in any language, GC'd or not. I still can't believe people are making the same argument against GC'd languages even when the highly performant JVM exists.


It's not an issue of allocation, it's an issue of allocation location, to get better cache locality.

Take the JVM: a good majority of ORM databases use sun.misc.Unsafe to do manual native memory allocation. Not because it's faster (it isn't faster than bumping the nursery pointer).

They do it so they can control where disparate datatypes live in memory, so that as a cache line is read in, the prefetcher is already pulling in the next cache line.

Things like this will get you a 10-50x performance increase, which in some cases you absolutely need.


Sure, but like you said, 98% don't need it and there is indeed a substitute when you need it.


I don't see too many reasonable arguments that garbage collection is slower. Taking up more memory, having pauses and requiring the same amount of thought as modern C++ are all arguments I've heard, which matches my experience with Julia (sans the pauses, since I haven't done something interactive yet).


Requiring the same amount of thought as modern C++ is simply not true: as soon as you start dealing with cyclic data structures/shared data (reference counting takes more memory and is slower than a well-implemented GC), that model breaks down.

Pauses are indeed a problem, which then requires you to manually tune the GC to your settings.


That's not my experience and I have dealt with all of those things to a fairly heavy degree.

> reference counting takes more memory and slower than a well implemented gc

This is a ridiculous cliche at this point. It might be true if every memory allocation was reference counted, but in C++ (and Julia) almost everything winds up on the stack. What doesn't wind up on the stack is usually being dealt with using ownership and move semantics. The number of reference-counted variables in my C++ programs is usually 0 unless they are being shared across threads. Not only that, but within a thread, move semantics means that the reference count doesn't need to be touched.

While there may be 'conventional wisdom', I have implemented non-trivial software in both C++11 and Julia, and optimizing memory allocations happens in both; Julia required more thought. Julia made data structures and general functions easier to write, so it wasn't as if it was a net negative, but memory-allocation-wise I don't feel the garbage collector made things any easier.

Then on top of that you have the myth of the 'well implemented GC'. Java, C#, D, Go and Julia are all languages where this seems to be a constant struggle. After significant R+D some are there, but if you look at D and Julia, it is a constant user complaint.

As for cyclic data structures, I'm not sure why I would do that in the first place and I'm doubly unsure why I would do it with pointers and fragmented memory allocations.


Value types are also addressed in the article.


From what I can find[1], it looks like while Julia has compact values, it doesn't have what I'd traditionally call value types. Specifically, value types that live on the stack unless they are a member of a reference type (which is what C# does).

Looking at the performance docs[2], this is pretty clear in that array types (which look like how Julia does matrices) get allocated on the heap. You can clearly see in the docs the performance impact that this type of behavior has (1.95s vs 0.08s).

While you can preallocate (which the docs suggest, and is the only path for GC'd languages), it's not an ideal solution. If your type is smaller than a single cache line, like in the example above, you've just flushed a whole cache line just to bring in that one preallocated value. You also run into an issue if you don't know exactly how many values you need upfront, which leads to pooling. In that scenario your pool may be large enough that you're bouncing between cache lines on different pooled objects.

This is the type of thing that you really need control over if you want performance "as good as C". Anything less will be a compromise. There's also the whole class of zero-cost abstractions that you can get from C++/Rust, which leverage all the above to great effect. That lets you get things like nom[3], which gives you high-level semantics + productivity while maintaining parity with C.

None of this is academic, these are all optimizations I've used on shipping products that went out to millions of users. In each case we had fixed hardware with a limited execution budget and the 5-20x improvements we made were critical to us shipping a product that people wanted to use.

[1] https://discourse.julialang.org/t/how-to-know-if-object-memo...

[2] https://docs.julialang.org/en/release-0.4/manual/performance...

[3] https://github.com/Geal/nom_benchmarks/tree/master/http / https://github.com/Geal/nom


The distinction in Julia isn't between value versus reference types (which have fundamentally incompatible semantics), it's immutable types (declared using `struct`) versus mutable types (declared using `mutable struct`). Immutable types are generally stack allocated and need not even be fully materialized, whereas mutable types are typically heap allocated and fully materialized. The built-in array-type is mutable and can change size. As you say, these can be preallocated and modified in place with a rich collection of in-place, mutating algorithms in the standard library, but sometimes that's not quite enough. If you want fixed-size, stack-allocated arrays, you can use the StaticArrays package [1], which provides precisely such types. Aside from immutability, stack allocation and amazing performance, StaticArrays look and behave just like built-in arrays: one of the basic premises of Julia is to allow you to implement types like this and get the exact feature/performance tradeoff you need.

[1] https://github.com/JuliaArrays/StaticArrays.jl
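
A minimal illustration of the two declarations (standard Julia syntax):

    struct Point            # immutable: a value, can live on the stack
        x::Float64
        y::Float64
    end

    mutable struct Counter  # mutable: has identity, typically heap-allocated
        n::Int
    end

    p = Point(1.0, 2.0)
    # p.x = 3.0             # error: fields of an immutable can't be reassigned
    p = Point(3.0, p.y)     # instead, bind a replacement value

    c = Counter(0)
    c.n += 1                # fields of a mutable struct update in place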


Uf, I really don't like intermixing mutability with allocation location. Those seem like two completely separate concerns.

One thing that was really common for us to do was to instance a weighted graph (something like this[1]) per actor. This means that you might have 10-300 floating point values in a block indexed by the node they interact with. It was really common to see one, maybe two values change on a per-frame basis. With the constraint above, I'm now copying 300 floating point values every time any node changes, which would be brutal for performance. Or I'd take a potential cache miss each time I touched the array, which could be even worse if it was a reference type.

To be clear, I'm not saying Julia isn't really good at what it does. My complaint stems from the fact that you can't claim performance good as/better than C without having all these tools at your disposal.

We haven't even gotten into things like restrict[2], which Rust's ownership model lets you get for free[3].

[1] https://docs.unrealengine.com/latest/INT/Engine/Rendering/Ma...

[2] https://en.wikipedia.org/wiki/Restrict

[3] https://doc.rust-lang.org/nomicon/aliasing.html


> I really don't like intermixing mutability with allocation location. Those seem like two completely separate concerns.

They're not. The semantics of value types and reference types are different in the presence of mutation. So if you want uniform object semantics in a language, then objects that can be implemented as values must be immutable. There are many languages that have kept these independent and bifurcated their type system instead (Java, C#), but it's been a source of a great deal of pain and frustration (e.g. Java's `int` versus `Integer` awkwardness).

Fortunately, there's nothing you can do with mutation that you can't do just as well by modifying and replacing an immutable value in a mutable cell – the compiler implements them the same way. Wrap an immutable in a mutable `Ref` and voila, you've got something equivalent to a mutable value type without exposing completely incompatible semantics in the language.
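
A minimal sketch of that pattern:

    struct P                # an immutable value type
        x::Float64
        y::Float64
    end

    r = Ref(P(1.0, 2.0))    # mutable cell holding the immutable value
    r[] = P(3.0, r[].y)     # "mutate" by swapping in a replacement value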


> They're not. The semantics of value types and reference types are different in the presence of mutation.

I think C#'s overloaded term for "value types" may have caused you to misinterpret the above. Let me be more clear: whether a value lives on the stack or the heap is completely separable from whether it is mutable or not; that's my objection.

> ... just as well by modifying and replacing an immutable value in a mutable cell

In terms of "correctness", sure. In terms of performance, see my 300+ float block example above. I shouldn't have to copy, modify, copy when I can just mutate in place (and be explicit about that rather than relying on language semantics).

Also, I can't seem to find any mention in the docs of Ref's semantics aside from a passing mention in the FFI section.


I had the same problem; my problem was not the giant number of reads, but rather that a naive replacement needs a read and has a dependency on the write-back; hence, a scatter where you modify immutable structures by replacing certain fields induces a stall on cache-miss.

In my case, Julia/LLVM was smart enough to figure out that the read and write can be eliminated. Hence, the Julia code that replaces an immutable with a copy where only a few fields are changed generates the same @code_native as the obvious evil construction (figure out where the field is stored; unsafe_store! to the pointer).

But I guess this optimization is unreliable, or at least it is not well documented when this optimization is guaranteed to happen. So the situation is not optimal, but also not as catastrophic as you would have guessed without reading the generated native code.


Right, so the way to address this is to provide guarantees that this kind of optimization /will/ occur and to provide syntax that makes writing "pseudo-mutating" code more convenient. There's a PR [1] for the latter, but it's been shelved while we focus on getting 1.0 out the door instead of adding new features. Optimization guarantees + convenient syntax provide everything you need without trashing the semantics of the language by bifurcating the type system into two incompatible kinds of values.

[1] https://github.com/JuliaLang/julia/pull/21912


For values with compiler-visible scoped lifetime, the compiler will automatically promote them to stack variables. There is currently no way to enforce this happening, but it would be perfectly possible to add such an annotation.

Regarding preallocation, you tend to want to avoid dynamic memory allocation in high performance applications anyway, so whether you do that in C++ or in Julia doesn't really make too much of a difference. It is true this is a little harder to control in GC'ed languages than in languages where you have to do memory management manually, but the effect is about the same. Julia provides tools to figure out where you're using dynamic memory allocation, and those tools will certainly improve in the future.
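
For instance (both macros are standard Julia; `--track-allocation` is a standard CLI flag):

    x = rand(1000)
    @time sum(x)        # prints time plus allocation count and bytes
    @allocated sum(x)   # just the bytes allocated by the call
    # for per-line detail, run julia --track-allocation=user and inspect the *.mem files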

One thing I think is underappreciated from the performance perspective, though, is how easily Julia lets you express data layout transformations to take better advantage of the cache hierarchy. I touched on this a bit in my JuliaCon presentation [1].

[1] https://youtu.be/uecdcADM3hY?t=32m19s


> For values with compiler-visible scoped lifetime, the compiler will automatically promote them to stack variables. There is currently no way to enforce this happening, but it would be perfectly possible to add such an annotation.

Other languages such as Java have that optimization too. But it is quite limited because it can only be applied to data whose size is known at compile-time. There is also the problem that if some data allocated in a procedure is passed to another procedure, it can't be allocated on the stack because it can escape.

A programmer working in a low-level language can be smart and choose the best locations (stack or heap) for each piece of data. A language that does not allow explicit stack access (like C does) can never make decisions that are as optimal.


> Regarding preallocation, you tend to want to avoid dynamic memory allocation in high performance applications anyway, so whether you do that in C++ or in Julia, doesn't really make too much of a difference.

Yes! Which is why it shouldn't take me digging through 3 different documents and still getting it partly wrong!

C/C++/Rust make this easy by annotating the type modifiers (&/*/Box/etc.) with how the object lives in my runtime system. Rust even gets bonus points for giving me aliasing information as well (&mut vs &).


You just do @. and you're done?
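
For reference, `@.` turns a whole expression into a fused broadcast that writes into a preallocated output, e.g. (a minimal sketch):

    x = rand(1000)
    y = similar(x)          # preallocate the output once
    @. y = 3x^2 + 2x + 1    # one fused loop into y, no temporaries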


> although I am certain KenoFischer would

Sure would.


The language page shows micro-benchmarks where Julia is comparable to C in performance, when optimized. I've seen various users of the language report similar experience for packages they've built. Since C and Rust are similar in performance, I would assume Julia is close to Rust in performance.

You can also easily verify this yourself by looking at assembly code dumps for JITed Julia functions. You can see that the amount of code would be comparable to that of C for many common scenarios.
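
For example, with the standard reflection macro:

    f(x, y) = x + 2y
    @code_native f(1.0, 2.0)   # prints the machine code the JIT generated for these argument types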

As for cache and data layout: the claim that a GC'd language can't control data layout and cache usage is simply wrong. The best-known example is probably Go. You can nest structs to create contiguous blocks of memory in Go, just like in C. You can also create arrays of structs as contiguous blocks of memory.

Just because Java and C# don't work that way doesn't mean all GC'd languages are like that.

A struct of concrete types in Julia is just like a struct in C in the way it is packed in memory. An array of structs in Julia or a nesting of structs will have similar memory layout as in C.

So I know this sounds really odd, but Julia, despite being a dynamic language, thus allows more optimal memory layout than a statically typed language such as Java.
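
A quick way to see this (using modern Julia's `isbitstype`; older versions spelled it `isbits`):

    struct Vec3
        x::Float64
        y::Float64
        z::Float64
    end

    isbitstype(Vec3)              # true: plain bits, no headers or pointers
    sizeof(Vec3)                  # 24, same as the equivalent C struct
    a = Vector{Vec3}(undef, 10)   # one contiguous 240-byte block, like Vec3 a[10] in C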

What is frustrating with promoting Julia is that it simply sounds way too good to be true, so people dismiss the language before they have actually read up on it. I advise you to read up a bit more on the details of how the language works. It will then become clear why it can pull off stuff which seems far too good to be true.


Julia's website has some benchmarks: https://julialang.org/benchmarks/

C is the leftmost dot, Julia is just to the right.


Just to note, these benchmarks are biased. This is a competition between highly-tuned-to-the-processor Julia with a tuned BLAS vs other languages' out-of-the-box binaries. Additionally, the benchmark tests are written inefficiently in the comparison languages. But even then JavaScript comes out on top on some benchmark tests.


That's because the JIT compiler in the Julia standard library selects the BLAS for you. As a programmer you don't have to optimize beyond using idiomatic type strategies. You could, say, write a Galois field type and use the \ solve operator to solve a Reed-Solomon erasure using old-fashioned LU decomposition (automatically without pivoting, even), and then turn around and use the same \ operator with floats one line down and it will give you highly optimized BLAS.


Yes, these benchmarks do tend to be a lightning-rod for controversy. You have two complaints:

1. Yes, some languages do not make it easy to install and link against a well-tuned BLAS. The benchmarks were very recently updated, and you can see the level of effort required to get everything installed and working properly. This was all done in the open on the discourse message board.

2. Yes, the benchmarks involve testing very specific language-level features like iteration, recursion, and IO. And they're testing the ability of specific languages to do these things. Yes, of course in Python you'd look towards a vectorized library call instead of a `for` loop, but then you're not testing Python itself anymore.


How come there are no entries for Julia on the Benchmarks Game? http://benchmarksgame.alioth.debian.org/ Given Julia's good performance, it is a little strange. :)


Inclusion is by selection of the admin [1,2]. There are Julia implementations available for most of the benchmarks [3] so perhaps eventually they will be included.

[1] https://alioth.debian.org/forum/message.php?msg_id=182495&gr...

[2] http://benchmarksgame.alioth.debian.org/play.html#languagex

[3] https://github.com/JuliaLang/julia/tree/master/test/perf/sho...


Interesting! Does that mean the Benchmarks Game is not open for new languages? The faq entry is kind of ambiguous.


It means -- "If you're interested in something not shown on the benchmarks game website then please take the program source code and the measurement scripts and publish your own measurements."

Like this guy did -- https://pybenchmarks.org/


Only one of the benchmarks is BLAS-dependent, and the point is to make sure that it is easy to get a tuned BLAS in that system.

If these benchmarks have resulted in all the languages focusing on a fast BLAS, I think that is fantastic.


According to that JavaScript is faster than both Julia and C by a significant margin in iter_mandelbrot ;).


Yup, it's really amazing what millions of hours of development time will do! That benchmark in particular is a little deceiving — and we take those benchmarks very seriously! See this issue for more details: https://github.com/JuliaLang/julia/issues/24097


Performance is not everything; people are obsessed with it, and it's a mistake.

- libraries

- community

- tools

are much more important.


You're overlooking the context; that's a bigger mistake.


Is Julia really that much more performant than numpy?


In my opinion that is the wrong question to ask. The right question is: how fast will my code be? NumPy has been heavily optimised and is written in C, not Python.

Take a look at the link below, comparing a simple sum in different languages (among them Python and NumPy). The power of Julia is that there is no privileged code. Your code will be as fast as the base library.

http://nbviewer.jupyter.org/github/alanedelman/18.337_2017/b...


Yes to this. When you're calling some big fast package (be it some field-specific thing, or just matrix algebra) the language you're in doesn't matter that much. In terms of speed, but also in terms of effort -- you spend most of your time reading this field-specific package's manual.

What's liberating is that for the things which aren't covered by such packages, there's no huge penalty for just writing it yourself. In speed but in effort too -- it's quicker to write the loop than to figure out how to make np.einsum do what you have in mind.


I find this benchmark a bit disorganized, but it seems that the "hand-written" Julia function is as slow as the "hand-written" C function.

The C function is of course naive. I'm pretty sure that a hand-unrolled loop would be faster.


You are absolutely right, an optimised C program would be as fast as or faster than the NumPy implementation.


I've been using Julia for a year or two now, and over that time my enthusiasm for the language has waned, not grown.

Here's the standard pitch for Julia: C-like performance with R/Python/MATLAB-like expressivity.

This is undeniably true. For someone coming from R/Python/MATLAB, you will be able to program as easily in Julia, if not more easily and more cleanly, and get much better speed.

However, what I've come to realize now is that you can get something similar from Kotlin, Nim, and Crystal, but those are much more general-purpose, and often are more consistent in their performance. They don't have the same numerical libraries at the moment, but they have a much bigger use-case target, and for something like Kotlin, a lot more leverage from the community.

Rust is a lot lower-level, but as the numerical resources grow (and they inevitably will) and layers of abstraction are added, what you will probably get is something very similar to Julia in expressivity but more extensible downward, and probably more performant. Rust has a ton of room to grow, and has many good role models from C++ (e.g., Eigen).

This also isn't even touching on PyPy and things of that sort.

I like Julia and still reach for it when I'm starting a project. Often it's worked great, but other times I have to go back to R or Python for some library that calls C, because the Julia library, while usable in a way that pure R or Python isn't, is still not actually as fast as C. Sometimes "on the order of magnitude of C" is not actually good enough, and you need C. Also, if you poke around for benchmarks that aren't posted by the Julia developers themselves, you'll see different conclusions about the real-world performance of Julia, so there's that.

I guess I read articles like the linked post, and I end up feeling like they're misleading. Is Julia better for data analysis than R or Python? Yes, almost certainly if you were starting from scratch and not calling any products of outside languages. Is Julia better than R or Python + C as it is in the real world? Not sure, because you still have to call C anyway at some point if you're being honest. Is it better than Kotlin, Nim, or Rust? I'm even less sure. If Julia wants to maintain long-term competitiveness, it has to reach out in the opposite direction of those languages, from numerics -> general purpose computing, because they are inevitably moving from general purpose computing -> numerics, and they are bringing a lot with them.


What problems are you solving with Julia? The way I see it, Kotlin, Nim, and Crystal are completely different languages aimed at a different space.

e.g. Kotlin is being used for Android apps, which is worlds away from what MATLAB and R users do.


The same argument applied to the numpy/scipy stack when it was new. R, SAS and a whole slew of other offerings were available at that time.

This view of yours, as applied to something new, is as old as the hills; progress and shifts take place regardless of it. Small mercies, actually, but some are still writing and maintaining COBOL applications and earning extremely respectable salaries from it.

All you are really saying is you will not be an early adopter. That's OK, there will be others. Your non-adoption is unlikely to be a shattering loss for you or for Julia.

Regarding whether improvements are minor or significant that's a matter of opinion.


> * My entire codebase

Julia has great interoperability with Python (check out PyCall.jl) and many other languages. You don't need to abandon your old codebase.
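
For instance, a minimal sketch of mixing the two (PyCall's `pyimport`; recent versions support the dot-call syntax shown):

    using PyCall

    np = pyimport("numpy")          # load an installed Python module
    x = np.linspace(0, 2pi, 100)    # call into NumPy; the result converts to a Julia array
    y = sum(sin.(x))                # and mix it freely with native Julia code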

With regards to your other points, a lot of that is subjective and hard to argue about, but for me Julia has drastically changed the way I approach writing scientific code and is not just an incremental improvement upon the status quo.


Interesting. How has it changed your approach?


At work, we had some code that was written in Python. We needed it to exercise a drive (and it had to be I/O-bound for the metrics to make sense). I couldn't believe it when, after several weeks of coworkers using it, I noticed the CPU was pegged at 100%... I rewrote it in Julia (which was much more terse and debuggable) and the CPU load dropped to 30%.


All of the above were arguments for not switching away from MATLAB 10 years ago, when I switched to Python + NumPy.


Julia supports indexing with arbitrary bases, and it generally works ecosystem-wide.


Oh, it does? I've taken to saying that "zero-based indexing – that's how Julia broke my heart", because at the time, I didn't get the impression that there was anything else in sight. How do I index based on zero?


If you have a particular algorithm that is better expressed using 0-based indexing, use https://github.com/JuliaArrays/OffsetArrays.jl
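
A minimal sketch (`axes` is the modern spelling; older Julia used `indices`):

    using OffsetArrays

    A = OffsetArray(rand(5), 0:4)   # reindex an ordinary Vector to start at 0
    A[0]                            # the first element
    axes(A, 1)                      # 0:4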


It's not just something particular I need it for; it's everything.


If you read the OP again carefully, you'll see that you can use offset arrays for every array you make and suffer no performance penalty, because the offset is compiled away. And the development overhead is a single library import; I replace core language data structures all the time in any language for features I want, and this is no different.


See my other comment in the thread for the thing I've always wondered: How can you compile away an arbitrary calculation that might need to be made at run time?

(assuming we're not just arguing over arbitrary-array indexing to input predefined constants at the REPL/code level).


Could it be that the offset calculation to find the start of the array is needed in any case?


My thinking is this:

To access an array via index, you basically have to take the reference to the array, the index (i), and generally multiply (i) by the space set aside for each element (e).

i * e

If you've offset the array, you have to take the offset index (o), and call a mapping function (m) that maps (o) to (i), so the result can be multiplied by (e). Though it should be said there is another theoretical alternative: you can figure out a way to map (o) directly to (i * e) without the intermediate operation or mapping (o) to (i).

Where (i) and (o), and the relationship between them, is known at compile time, you can theoretically get the compiler to magic that operation away.

But where (o) is only known at run time, you have to first fetch/generate (o), apply (m) to map it back to (i), then multiply (i) by (e) before knowing how to access an array.

If you fetch/generate (i) at run-time, you don't need the map operation (m); you just multiply (i) by (e) and there's your array reference. But otherwise there's an operation (m) sitting around whose cost must be paid.

You can't handwave it away via reliance on cpu parallel instructions or anything like that because its by definition a serial operation: you can't multiply by (e) until after you've got (i), you can't get (i) until you've applied (m), and you can't apply (m) until after you've fetched (o).

Now where the cost of (m) is sufficiently low relative to other operations, or where the number of array accesses are low, the two may appear sufficiently similar and offset array access might appear practically costless. But the point where we generally care about these things is because we're doing them a very very large number of times on a sufficiently large number of indexes that can't be known or optimised away ahead of time.

And it is worth pointing out that addition operations and things are remarkably cheap in general. But if you're accessing these arrays themselves in order to do addition operations or something similar on their contents, then it may still take up a relatively high proportional amount of time relative to the entirety of the program/operation.

Avoiding theoretical discussion on the possibility of functions (m) that map (o) directly to (i * e) while being equal or cheaper than (i * e) in cost, there is another possibility that might appear in practice...

If the original array access operations are not optimal (or slowed down due to other issues, such as suboptimal caching or memory access), then it may be possible for both offsets and array access to be equal in practice. If indexing by (i) and indexing by (o) are both seeing an additional step between the getting of the index and multiplying by element size, or if there is sufficient lag because the CPU isn't doing work optimally, then in profiling and practice, both would appear to be running at the same speed. But this wouldn't imply you've optimised offset indexes; it implies your regular indexes and operations aren't optimised enough already... that's why it appears you can get something for free.


Instead of storing the start address of the array, store a suitably offset address, right? That amounts to an addition that has to be performed when memory is allocated to the data structure, which may be at compile-time or at run-time.
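
Concretely, for first index f and element size e:

    addr(o) = base + (o - f)*e
            = (base - f*e) + o*e
The subtraction folds into the stored base pointer once, at construction, so every access afterwards is the same multiply-add as in the 0-based case.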


Yeeesh, that sounds like a maintenance nightmare...


Why? You don't have to write a package any differently to make it work with arbitrary indices.


Because it's inconsistent.


I also found this effect super annoying when I switched to using a car - having to sell my carriage, get rid of my stable, sell or rent all my horses. All my connections in remote stables were suddenly useless. And I had to sell off all that hay! And oil, I had to get lots of oil! And it's nowhere to be found! Outrageous, I say.


Zero-based indexing? I find it not too bad to switch between languages that use zero- & one-based indexing.


I don't have problems with it, but I also never had an issue with Python's whitespace, or languages without semicolons, or languages which prefer to use begin/end blocks. I don't get why it's a big deal.

The one exception to that is when switching between languages. Sometimes one forgets which language one is in and a syntax error will result. Like using the dot operator to concatenate strings in Python or leaving semicolons off the end of PHP statements.


It's not that one is incapable of doing it. It's that one way is wrong and the other is right!


The argument of zero vs one-based offset is not very new, and since there's a record of musings from Dijkstra on the topic[1], I feel obliged to mention it.

Note: Although I'd side with EWD on this one (for now, at least), I don't intend to reference it as an "appeal to authority", but as an example of someone producing a reasonable, properly worded and thought-through argument on the topic, which I've not really seen since. I'd be very interested in what similar arguments would look like from the other perspective. If someone would like to convince me, feel free to give me a link to some reasoning :)

Also: I personally find "it's always been like this" to be a terrible argument in either case. That's the reason we weren't able to standardize on one system of units, for example. In the Swiss army, there's a saying that is used to explain the various idiotic decisions, and the bogus explanations thereof, that a soldier has to endure:

  Ist so weil ist so, bleibt so weil war so.
Liberally translated:

  Things are like that because they are like that.
  All stays that way because it was that way.
[1]: EWD831, https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


Right, so zero-based is wrong unless one is using a language close to assembly.


Which one is "right" and which one is "wrong" (those are strong words I chose somewhat tongue-in-cheek) is a matter of opinion.

I have my opinion, and that is that zero is the proper number to start enumerations on, not just in programming, but for everything.


Sure, but why would zero be the proper number to start counting at outside of zero-based programming languages?

Everyone starts counting objects at 1. Zero means an absence of objects to count or list. I count that there are 7 cars in the lot. The first one is 1. If there were 0 cars, I would not have any to enumerate over. If I make a list, I always number the first one as 1 (or A). If there was nothing on my list, I would have nothing (0) to number.

Word processors start list numbering at 1. Printers number the first page as 1. Kids are taught to count starting with one, using one finger. I feel like this is only an argument because we have zero-based indexing in mainstream programming languages, although maybe there are some mathematicians who think otherwise?


> Sure, but why would zero be the proper number to start counting at outside of zero-based programming languages?

I don't take programming languages as a reference point. It's the other way around. I try to figure out what the most natural, principled, and elegant way would be, and I would base my programming language on that, if I were to design one.

> Zero means an absence of objects to count or list. I count that there are 7 cars in the lot. The first one is 1. If there were 0 cars, I would not have any to enumerate over. If I make a list, I always number the first one as 1 (or A). If there was nothing on my list, I would have nothing (0) to number.

We should distinguish between two uses of the natural numbers, I think:

0. _Ordinal numbers_, for assigning labels to things. "This is car 3."

1. _Cardinal numbers_, for saying how many things there are. "There are 7 cars."

The relationship between the two would be that the number of cars is the number that is next in line when you have called out the indices (0, 1, 2, 3, 4, 5, 6) of all the cars.

> Word processors start list numbering at 1. Printers number the first page as 1. Kids are taught to count starting with one, using one finger. I feel like this is only an argument because we have zero-based indexing in mainstream programming languages

What is the most common isn't what determines, for me, what is the most proper. I will play along and use 1-based indexing when I have to, but in my heart I know how I think it should _really_ be.

> maybe there are some mathematicians who think otherwise?

Yes, though starting numberings on 1 is still common (maybe the most common). Zero is at least recognised as a natural number nowadays. Historically, this wasn't the case. Apparently, Peano started on 1 in his axioms for the natural numbers [0], and zero, as far as I understand, hardly had the status of being a number at all during much of Greek and Roman antiquity [1].

[0] https://en.wikipedia.org/wiki/Peano_axioms#Formulation

[1] https://en.wikipedia.org/wiki/0#Classical_antiquity


> The relationship between the two would be that the number of cars is the number that is next in line when you have called out the indices (0, 1, 2, 3, 4, 5, 6) of all the cars.

But who thinks that way? I just see 7 cars, not an indexed listing. And anyway, the 0 indexed car doesn't exist.


He who finds zero-based indexing to be the proper way and wants to shape his thinking thereafter!


No, zero is the right number to start an offset, not an enumeration. No one starts counting from zero in real life.


> No one starts counting from zero in real life.

The question for me is what one _should_ do, not what is usually done.


You will want to switch as soon as you have to confront the Global Interpreter Lock (GIL).


I've confronted it plenty of times, and for my use cases it's not really an issue. Many numpy functions release the GIL, so by keeping my performance-critical stuff in numpy it's just not an issue for me. I sometimes split tasks up with multiprocessing, which is a GIL workaround to be sure, but the memory footprint of extra interpreters is negligible compared to the data I work with.


Whenever I see Julia mentioned, I like to link to this blog post by Graydon Hoare (creator of Rust). https://graydon2.dreamwidth.org/189377.html


That is a great tour of the forces driving the evolution of languages. My reservation with Julia in the past was its widely varying performance. The OP clarifies how this happens and that these cases are not dark mysteries but rather the contrary. Here's a jewel from the Graydon post:

> PHP initially implemented its loops by fseek() in the source code


What would you say is the takeaway point for this discussion?


Quoting the conclusion:

————

Julia, like Dylan and Lisp before it, is a Goldilocks language. Done by a bunch of Lisp hackers who seriously know what they're doing.

It is trying to span the entire spectrum of its target users' needs, from numerical inner loops to glue-language scripting to dynamic code generation and reflection. And it's doing a very credible job at it. Its designers have produced a language that seems to be a strict improvement on Dylan, which was itself excellent. Julia's multimethods are type-parametric. It ships with really good multi-language FFIs, green coroutines and integrated package management. Its codegen is LLVM-MCJIT, which is as good as it gets these days.

To my eyes, Julia is one of the brightest spots in the recent language-design landscape; it's working in a space that really needs good languages right now; and it's reviving a language-lineage that could really do with a renaissance. I'm excited for its future.


This is a bit broader a point than the original post of "why you should use Julia", but bear with me. Basically, most languages (and even more generally, most tools) assume that you'll have external ways of dealing with their limitations. For example, we use slow but productive languages in conjunction with fast but tedious languages. Or we have a DSL for testing. Or we have a 3rd-party package manager. The assumption that these external tools will be there is generally true, plus it makes the language designer's job easier, so it's generally a no-brainer for them to make it. However, this pushes the complexity onto the system and the user, and creates redundancy.

There have been some attempts to forego this assumption and make a language that can do everything you need. This has led to the creation of Lisp, Forth, and Smalltalk, among others. (If you've ever wondered about the fanaticism generally associated with these languages, this is a major part of it.) Julia is attempting to be such a language, and in my opinion does a pretty good job of it.


The packages-first attitude feels significant to me. A language that “users” enjoy but package developers also enjoy seems important. I hadn’t thought about language choice from a heavily package-development weighted perspective before. It seems obvious in retrospect though, which is probably a sign of something cool.

A version of this would be: how can good package development be as easy as possible, and how can package use be as easy as possible?

I haven’t done any serious work in Julia mainly because the python libraries are mature, good, and performant enough. I can’t speak for everyone, but for end users in science labs library support is perhaps the biggest consideration for language choice.


It is.

Many of the language flamewars we have tend to skip over the ecosystem.

Which is much more relevant than any language design issue.

Especially given that choosing a language is usually a consequence of working with a specific tool, unless one is open to facing some hurdles.


Sometimes, in my darker moments, I have the terrifying thought that one of the reasons that many users like R and made it popular (apart from the historical context of its now many libraries and being the main free version of statistical software), is specifically that it isn't robust and sensibly designed from a programming/analytical perspective.

You can download a package, type in a preset command on a preset thing, and 95% of the time (it's R, so it's only ever 95% of the time), you get back an answer/number.

And if an answer is what you're after, that's where the considerations stop.

Was there an edge case? Is R's answer really correct? Did it wallop something in your search path? Do you have your profile set up differently from the author? Do all the dependent packages clash and silently depend on how and in what version you loaded them? Has R coerced something in the background, or pattern-matched your typo'd variable to something it shouldn't? Somewhere in your code did you hit one of the thousands of gotchas?

Who cares? It gave me an answer. I can give it to my boss or put it in a paper.

I personally am all for faster, safer, stricter (while still being dynamic) languages. And I've tried to explain to many users that languages that produce errors when you do something you shouldn't are not your enemies, they're your friends. But I see every day that many people aren't living that philosophy; they'd rather have an answer than the right answer or a robust one, despite what they'll tell you in plain English.

With respect to Julia, someone like me might like it (well, if they got rid of the MATLAB syntax and just went back to the Lisp they copied and relabeled as 'Julia' :P).

But perhaps many users don't want a faster, more robust language. Perhaps, many of them are comfortable with a simple but wrong answer. And for that purpose, I can't see where Julia wins out relative to R.

As I said though, these are thoughts I have in my darker moments...


I think your fears are misplaced.

R users love R because of ease of use. R does sometimes ignore ugly corner cases in favor of that ease of use (though I'm skeptical this damages the validity of the answer in anything like 5% of cases), but that's a side effect.

sklearn and pandas are great, but you still simply have to be a programmer to use them, or at least much closer to a programmer than many statisticians want to be. Allowing non-programmers to do things like

   data <- read.csv(file='blah.csv')
   fit1 <- lm(outcome ~ var1 + var2, data)

   anova(fit1)
   summary(fit1)
to read data, run a linear model, and get an anova and p-values is amazing, and massively widens the scope of people to whom these tools are available.

The omission of most quoting, the magic inference of column names, etc. all make R much easier to use.


In many ways, I think we're actually in agreement :p

Free. Quick. Interactive. 4 lines. Done. What's the value proposition of Julia compared to that? Now you can argue that Julia will allow better type specification and specialisation, etc, etc. But that's not what's valued or used in your example.

Now, my mindset/domain forces me to engage with this, and I freak out at the implicitness/assumptions inherent in such code, especially with some of R's additional design choices.

In my mind, having code that crashes/errors out when something ambiguous/dangerous is encountered would be a "good thing". But that's not what's valued in the above example.

And while Julia would allow someone to program in such a way as to specialise or raise these errors and concerns, what is valued is the ability to not think about them. And if you don't think about them, or even as a library designer, if you want to program in such a way that your user primarily doesn't want to think about them, then you're effectively very close to reinventing/reimplementing a new "R", with all its costs and benefits.

I'm just making this observation explicit, even though I actually do value the things the Julia crowd seems to value, but I'm not sure I'm in the majority, and I'm not sure you can push that down into the libraries without exposing concerns most users don't want to be exposed to.


DWIM (Do What I Mean) at the UI level seems a good feature to aspire to. The sanity check could (and should) be done using an independent method (say, manually checking a couple of things, or is that too outdated?).


Julia code tends to be extremely terse. I had to listen on unix domain sockets and spit out an aggregated result to an upstream analytics engine, and I accomplished the whole thing (including spawning worker threads to listen to potentially multiple clients posting to the domain sockets) in about 65 lines of code. It's also very readable. I don't have any uptime guarantees or anything (and it requires a fairly heavyweight Julia container - or a Julia runtime installation - atm), so I wouldn't say it's an optimal solution moving forward, but as a glue piece I was able to show a demo and get to the point where I was debugging upstream and downstream components.
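
For flavor, the skeleton of that kind of glue looks roughly like this (a hedged sketch with a hypothetical socket path, not the original code; in Julia 1.0+ these functions live in the Sockets stdlib):

  server = listen("/tmp/agg.sock")         # unix domain socket
  results = Channel{String}(128)           # aggregation queue
  @async while true
      client = accept(server)
      @async for line in eachline(client)  # one lightweight task per client
          put!(results, line)
      end
  end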


I don't think Julia is going after R's niche. It's going after MATLAB and sklearn/pandas/numpy.

So my $0.02 on Julia's value prop: free, performant, scalable, capable. Easy to use is there, but the others matter a bit more.


Hell, I'm a programmer (currently writing low-level I/O access to a Linux block device) and I shudder when thinking of having to learn sklearn and pandas.


They're great, really! I'm not a hater. It's just you have to be at least an ok programmer to use them, because (and I know this isn't a word) the programmery-ness of what you're doing just pokes through a lot.


Having literally spent all day fighting R, I’m looking forward to getting back on the Julia side of things. Time to port more code.


Everything sounds great about Julia but it's lacking sufficient critical mass to develop useful packages to make scientists and data analysts effective.

At the moment the bottleneck in our scientific computing and data analysis workflow is not waiting for code to run but rather quickly implementing, evaluating, and iterating different models on datasets.


When I last looked at Julia, the language wasn't yet stable. It's not fun developing packages for a language that changes from release to release, so I don't expect their ecosystem to stand a chance until they get to v1.0. All the advantages for package developers listed in this article are moot while the language remains a moving target.

It was said that v1.0 was due in early/mid 2017, but it looks like it's still a ways off.

There is (or was) a window of opportunity for Julia to steal mindshare in academia from MATLAB and R, but it feels like Python is beating them to the punch. Of course, Python had a 21 year head start.


Talking to people in the community, my impression was that 1.0 would appear when it appeared, and that they felt that, given the difficulties faced by other languages and the work required to produce a language fit for purpose in the modern context, a gestation time of more than a couple of years was reasonable.

In terms of community I think that the economics community and private equity firms in particular seem to have picked Julia up enthusiastically.

Having said that, I suspect that it will be three years before Julia becomes mainstream, and then it will be a minority choice for five or six more - if it is the big success it deserves to be!


I was going to add that Python has been "getting there" since 1995, but I see you have already edited your comment :-)


There are over 1500 packages (pkg.julialang.org) in Julia now, with the ability to call Python and R, and any C or Fortran library ever written. Julia packages tend to be more Julian of course, and feel natural if you are using Julia.

I'd say most of the basic stuff is there, but a few things remain. What kinds of things are you looking for that Julia doesn't have?


What makes you say it lacks critical mass? I don't think that's accurate, but of course it takes time. Implementation time is important, yes - but Julia is not just a fast-to-run language, which is essentially the point - it is also a fast-to-implement language.


At least in the field of computational biology most people use R or python because of the wealth of statistical packages and biology-specific packages (biomaRt) that makes implementing and testing models on biological data much faster than in Julia.

It's always this problem, if everybody is using R, it is hard to migrate to Julia. But in order to migrate to Julia you need people to stop using R and switch to Julia.


Is there no Julia wrapper around biomaRt? Would it be challenging to create and maintain?

(I don't mean to imply that _you_ should be the one to do it - I'm genuinely curious)


Julia is my go-to language for numerical work.

Compared to other solutions I've used like python + numpy + pandas, Matlab, Mathcad, heh even Excel, Julia is a breath of fresh air. Fast, clean, powerful.


The feature to observe the underlying AST is amazing!


And ccall makes it trivial and a delight to call subroutines written in C++ with Eigen and OpenMP.
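
For anyone who hasn't seen it, a ccall is just (name, library), a return type, and argument types - no wrapper-generation step. A minimal sketch against C's libm (an extern "C" symbol exported from a C++ library works the same way):

  # Call cos from libm directly; this compiles down to a plain C call.
  y = ccall((:cos, "libm"), Cdouble, (Cdouble,), 1.0)   # 0.5403...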


Strangely, the only mention of Cython is to point out that we have had fewer developers than Julia: "as evidenced by the over 500 committers to just the Base language, more than projects like Cython has ever had!"


I don't find it strange: I wrote this to say why I like using Julia and point out what the community is missing, not as a comparison to every other JIT in existence. But if you want to know why I gave up on Cython, I'll lay it out for you. I tried it almost 2 years ago because some documents in a course had IPython notebooks which used it. So I did some standard scientific computing stuff like write some Runge-Kutta methods and yes its speed was fine (that was a pretty big part of the blog post: if you try hard and use the right tools you'll get pretty much the same performance anywhere). That's not the problem at all.

The problem was extending it to be more widely useful in my own research. I wanted to make those same compiled functions also work with complex numbers to integrate spectral discretizations of a stochastic PDE (instead of the finite difference one from before). I found some SO posts like:

https://stackoverflow.com/questions/30054019/complex-numbers... https://stackoverflow.com/questions/27906862/complex-valued-...

At that point it stopped looking like Python at all. I always found the SciPy syntax a little verbose since I had used a lot of MATLAB before (but I wanted to make this project not require a license to run) (this QuantEcon cheatsheet is a good demonstration of what syntax is like in my domain: https://cheatsheets.quantecon.org/). But to make this kind of "complex or not" logic work in compiled Cython, I resorted to conditional compilation (http://cython.readthedocs.io/en/latest/src/userguide/languag...). These days I understand that what I created was essentially a multiple dispatch mechanism.
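
To make that concrete, here's a hedged sketch of what dispatch buys in Julia (hypothetical names, not my actual solver code): one generic method, and the compiler specializes it per element type.

  # One generic Euler step; Julia compiles a specialization per number type.
  euler_step(u, t, dt, f) = u + dt * f(u, t)

  decay(u, t) = -u
  euler_step(1.0, 0.0, 0.01, decay)           # Float64
  euler_step(1.0 + 0.5im, 0.0, 0.01, decay)   # Complex{Float64}
  euler_step(big"1.0", 0.0, 0.01, decay)      # BigFloat, arbitrary precision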

Anyways, at around that time I started experimenting with other tools, especially Julia, because I really was getting frustrated whenever I had to "go beyond doubles" and write something that was extendable instead of a one-use script. Maybe there's some tricks I was missing, but I found it really hard in Python and MATLAB. Soon after, my PhD adviser and I started arguing about whether one of the properties in the simulation's solution was due to floating point errors. I couldn't convince him, so I wanted to write this integrator so it was fast and compiled, but allowed arbitrary precision so that way I could prove that it still existed even with very high precision. I couldn't find a page which explained how to do high precision arithmetic in Cython or Numba, so I completely gave up. Needless to say, I decided to re-write a small portion of this in Julia and it worked really well, pretty much instantly. Then I was digging around the Julia package listing and Viral pointed me to a big opportunity (https://github.com/JuliaDiffEq/ODE.jl/issues/64) and I have been developing a lot of Julia differential equation solvers ever since.

Obviously YMMV, but after computing a lot without a "first-choice language" (between R, Python, C, MATLAB, Mathematica) for quite a while, I stuck with Julia because I didn't have issues when I hit less standard tasks, and I found it very easy to contribute fixes to other people's projects because it was just Julia code. This stochastic PDE integrator story is just one (significant) project that led me in this direction.


> I couldn't find a page which explained how to do high precision arithmetic in Cython or Numba, so I completely gave up.

When that happened to me, I wrote Sage (http://sagemath.org).


Well... It is strange to omit Cython, because that's what the package authors actually use :). It's a huge part of the answer to the question you pose about how package development works in Python. Cython isn't perfect, but it works very well.

To answer the specific question you had about multiple dispatch: Cython's had fused types for a while now, although I guess they weren't so prominent in the docs when you were trying to solve your problem.

The other problem with your argument is, I think you have the dynamics slightly backwards. Complicated libraries will go where the users are, not the other way around. Maybe my NLP library spaCy would've been easier to write in Nim or D. This didn't matter. I wrote it in Cython because I saw that the userbase would be in Python. This has proven correct. It also has the effect of drawing slightly more users to Python, continuing the feedback loop. But each library's marginal impact on the developer community is small, so you're never going to get the marginal package author to switch language for the sake of convenience. What good is it to have a convenient experience developing a package nobody will use?


The problem is, I needed the library. I didn't have the time and resources to build it in Python, while it was a night project for a week to build it in Julia. As a methods researcher I'm going to use the tool that doesn't impede my research but allows me to distribute robust and performant implementations, which after trying lot of other tools I've found is Julia.


From your list of "first-choice languages", C++ is conspicuously missing. That seems rather peculiar, as the type of genericity you are praising Julia for has been one of the core concepts of generic programming in C++ since basically the late nineties with the standardization of the STL (and it's become an increasingly emphasized part of the language throughout its evolution in the last 10 years or so, see C++11 and later). The example you mention, numerical integrators using different number types (complex, arbitrary precision, intervals, etc.), is the type of stuff which would seem a perfect fit for a template-based implementation.


I learned C because of classes in MPI and I tried to go back to it after years of Python and MATLAB but the amount of boilerplate code and the workflow slowed me down too much. C++ is definitely a fine choice if you are a great programmer but I don't find it easy at all to prototype or maintain codebases in languages like that. YMMV


I would say though that if I didn't go the Julia route I would probably be using C++. Again, in the article I wrote that for package development, I see Julia as a more productive C++ as opposed to a faster Python. I'm using it in a way that is all based around generic algorithms that can statically compile well, so it's essentially C++ template magic but with a lot less code (no headers) and I can prototype separate implementations in the REPL as I go (in Juno, highlight and do Ctrl+enter and it replaces the function in the package with the new definition). I think the other clear choice in this space is actually D. But between Julia, C++, and D, I like Julia because I found it really easy to program things and have it work the first time.


I looked at your blog post on solving differential equations and it looks pretty attractive. I will install Julia and play around with this differential equation solver. At the moment R and python are awkward with differential equations, and I don't want to be married to matlab.


Too bad they somehow thought it was a good idea to make the syntax resemble MATLAB of all languages. Perhaps most of the nausea inducing warts could be worked around with some kind of transcompilation, although some semantic issues, such as one-based indexing, would remain.

It'll be a sad day if Julia starts to get such popularity that high quality libraries will be Julia-only.


I really don't get the indexing religious wars. I "grew" up on 0-indexed languages, C/C++, assembly, Python, Ruby, Java, Objective-C, Swift etc.

Yet I've never felt problems using 1-based languages whether Lua or Julia.

To me I just switch my mental mode to think I am doing math. I am used to mathematics using 1-indexing. And I am not a physicist or mathematician, so it is not like I am steeped in this tradition.

Of all the things one might object to in a language, I don't grasp why this carries so much weight for many people.


One-based indexing is not a semantic issue, it's a language-design decision you may disagree with.


Julia 0.5 introduced support for any indexing scheme you care to invent.

1-based, 0-based, 20-based...

https://docs.julialang.org/en/latest/devdocs/offset-arrays/
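
Concretely, with the OffsetArrays.jl package (a small hedged example, assuming the package is installed):

  using OffsetArrays
  v = OffsetArray(collect(1:5), 0:4)   # five elements, indexed 0 through 4
  v[0]   # 1
  v[4]   # 5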


Should array indices start at 0 or 1? My compromise of 0.5 was rejected without, I thought, proper consideration.

-Stan Kelly-Bootle


That is an ugly hack, mainly because it was not in the original language design.


It is in the original language design, which is all about type-genericity and separation of implementation from interface with zero-cost abstractions. Creating an array with a different `getindex` dispatch is a great example of what Julia's type system was made to do! The standard library chose 1-based, contiguous, fixed-dimension dynamic arrays, but that's not the right choice for every problem.

As evidence of this, check out JuliaArrays (https://github.com/JuliaArrays) which is a whole Github organization devoted to the development of alternative array types, like StaticArrays (which are stack-allocated immutable arrays) or CatViews (arrays which are non-contiguous and constructed from views of multiple different arrays). The nice thing about Julia is that, if packages are written to work with generic types, they can natively (and efficiently) work with these "non-standard" array types, making them easy to integrate into the scientific ecosystem.


Why ugly? It looks similar to what I know in the Pascal family, where indexes can be ranges or enumerations.


It is ugly because 1) 1-based indexing and x-based indexing are treated differently; 2) x-based indexing has to use more complex syntax, which in effect discourages the use of non-1 indexing; 3) this strategy introduces potential pitfalls (e.g. implementing length and size for non-1 indexed arrays). A cleaner design, I guess, would be to specify the index range on declaration, like Pascal static arrays, and use low(A):high(A) for iteration rather than 1:length(A). This, however, complicates 1-based use cases.

Generally, I don't think there is a good way to achieve flexible indexing without causing troubles somewhere, so I don't think Julia has really solved the problem.


1) No, they are just different dispatches to getindex. 2) No, iteration is through indices(A) or eachindex(A), etc., which are the preferred way of iterating anyways. You shouldn't do 1:length(A) which is a MATLABism that works but I would say isn't good Julia. 3) Defining new dispatches for length and size is a pretty standard use of the language?

"Non-standard" arrays with non-standard indexing already work in lots of packages. It could be better (that's one of the things that I am advocating for), but it's not a language tooling issue whenever it's a problem, it was the developer going `::Array` and thus requiring a contiguous 1-base index array where other AbstractArrays would actually work.


@attractivechaos I don't know how to reply to your last reply, so I'll do it here. 1:length(A) is bad because it's using a standard construction for intervals of numbers, but using it for indices. We don't want to get rid of it because 1:5 or 0:0.2:1 is something that is very common and necessary, but I don't see how to tell one that they should instead use eachindex(A) except through proper docs. 1:length(A) is so common in MATLAB though that I am sure people will carry it over, and I'll PR to their library to fix it. I'm not sure how to fix a knowledge issue like that.

You're not understanding generic types and their relation to (1). There's only one way to access an array: getindex. That's the function that's called with A[i]. However, you can use an immutable to put a thin (zero-cost) wrapper over an array, and define dispatches to getindex to do whatever you need it to do. So it's implicit syntactically because the user just does A[i], but it's explicit because the user has to choose a different type. getindex is then usually inlined and compiled according to the type, making it a thin abstraction over the implementation.
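
A minimal sketch of that wrapper idea (illustrative only, not a full AbstractArray implementation):

  struct Reversed{T}
      data::Vector{T}
  end
  Base.getindex(r::Reversed, i::Int) = r.data[end - i + 1]

  r = Reversed([10, 20, 30])
  r[1]   # 30: same A[i] syntax, a different getindex dispatch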

There are iterators which don't have a size or length. You can write generic algorithms which require an AbstractArray which HasLength and query at compile time for things like that and throw appropriate errors (those are called traits).

There is still a lot of development to do here, but the basics like this are pretty much solved except when new users treat Julia like MATLAB, but I'm not sure how anyone could control for that.


You can click on the timestamp, and there will be a reply link there. I think reply links are hidden for a little bit of time after posting, but I'm not sure why.


This is an anti-flame war feature, designed to let people cool off before replying (fast paced discussions were often contentious before that was introduced).


I guess so. This is a well-thought-out feature. I like it.


On 2), if you think 1:length(A) is bad, why not forbid it from the beginning (e.g. use low(A):high(A) instead)? To find an x-based array's length, why not just length(A), instead of length(linearindices(A))? Decisions like these are remedies for immature early design. Also, what if I use length() on an x-based array? Abort, or a wrong number silently? On 1), having two different ways to access arrays, the most fundamental data type, is already worrying enough. On 3), the page says "don't implement size or length". That is very uncommon in most other mainstream languages.

Julia has potential to become a great general-purpose programming language, but this indexing issue will practically limit it to the numerical computing community. Perhaps achieving that is already good enough.


The `length(linearindices)` and no-`size` rules are for a short transitional period while packages are ported to the framework that doesn't rely on 1-based indexing.


One of the most mindboggling things about the recurrent 0- vs 1-based indexing discussion is how incredibly rarely that difference is ever used programmatically in Julia. Most of the large Julia packages are programmed in a way that doesn't care whether the array is 0- or 1-based. It is important in some other languages, and then I just think people are happy that this is something that everybody can agree to disagree on. I don't think the discussion is very productive though.


Hmm... I wanted to learn more about 0-indexed arrays, but after looking around, I still have not figured out how to declare 0-indexed arrays without extending the AbstractArray interface or using another package.


I don't have any particular feelings toward one or the other (it is a convention, get over it), but I think that zero-based indexing is just an artifact of C that stuck around.

In C, the array syntax is "mostly" just syntactic sugar for pointer arithmetic.

When you do "a[n]=value;" this is equivalent to "*(a+n) = value;". To get the nth cell of an array, you just add "n" to your base pointer "a".

Array indexing, therefore, is consistent with the pointer arithmetic.

That said, and funnily enough, Fortran, which is much older than C, has 1-based indexing (by default, but you can configure 0-based indexing, if I remember correctly).


Zero based indexing is not a C artifact. Here's Dijkstra writing about it in '82:

https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...


Note how, in the PDF version [0], Dijkstra numbers the pages starting on zero (handwriting, upper right corner), but whoever created the PDF disregarded its message and did numbering starting on one (lower right corner). :-)

[0] https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

Edit: Fixed link. Hacker News apparently doesn't understand the delimiting of a URL with "<" and ">". >-( https://tools.ietf.org/html/rfc3986#appendix-C


There's the very real possibility that zero based indexing is in fact a Yacht Racing artifact.

http://exple.tive.org/blarg/2013/10/22/citation-needed/


> The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.

I don't buy it. Wouldn't people want their programs to run fast regardless of this?


In C, arrays are given an offset from the pointer, not an index; that is why they start at 0.


Which happens to be common across many languages outside the C universe.

Some languages, like the Algol ones, even have user defined indexing.

So it is neither 0 nor 1, but rather whatever the min value of the index happens to be.


Like every other language-design decision?


What syntax would you like instead, for numerical programming?

* Python seems not so different, but once you add numpy then it involves a lot of ugly `np.arcsin(np.sqrt([...]))` type of things.

* R looks horrifyingly ugly, but I haven't written anything in it

Coming from Mathematica 1-based is comforting, and it also matches the way people write mathematics on paper which is nice.


I mostly use Python, although it has some syntax issues too (e.g. no real lambdas; I'd prefer no parentheses for function calls, etc.). I don't see why that's a numpy problem. If you just do "from numpy import *" you can pollute your namespace with hundreds of symbols, just like e.g. MATLAB.

R not only looks ugly, but is quite horrible on the inside too. For example, the variable scoping is one of the most insane I have ever seen.

But in general, I think having special purpose languages is mostly a bad idea. Firstly, domain-specific languages isolate different fields of science: if engineers use MATLAB and statisticians use R, the transfer of progress between these fields is hindered. Secondly, almost all special purpose languages for scientific work encourage horrible programming practices, which typically leads to practice in the field lagging years or even decades behind scientific advances, because usable implementations are scarce or tied to a domain-specific language.


Could you clarify what's insane about the variable scoping in R? I'm a bit too close to R so I'm afraid I'm oblivious.


I use a lot of R. The thing that drives me insane is when you create a function such as multxy <- function (x){x * y} and then call multxy(6): R will not return an error as long as y is in the parent environment. That's pretty insane.


I don't use R, so I thought you were referring to dynamic scoping, which I do think it's horrifying.

But it seems R uses lexical scoping - that is, the y is captured at function definition time. That's extremely common and quite useful, in my opinion.


R documentation and users SAY that R has lexical scoping, which can be really confusing for people reading about the topic.

But if you're coming from something like Common Lisp (which is my point of reference), you soon discover that "Lexical Scoping" as implemented there and "Lexical Scoping" as implemented in R are two different things.

In R (paraphrasing) you generally have a search path of environments, and at run time R will find a variable by going from the innermost to the outermost environment looking for variables of that name. When it finds one, that's the variable R uses. What R does do is capture the path of environments that were in effect at the time of a function's creation. R has to do many of these variable lookups at runtime, which is also a factor that makes it so slow.

So what you get is a scoping scheme that, if I were being short, I'd call "actually more like dynamic scoping with environment search paths", and if I were being flippant, "pretty much dynamic scoping dressed up to look like lexical scoping without any of the benefits/distinctions of either. If someone changes the applicable environments or variables later, R will just keep on humming along, pointing at new variable references that align with whatever variable name the original author of the code used, as long as that variable exists somewhere on the applicable search path"


A search through linked environment objects corresponding to nested lexical scopes is a valid, correct implementation of lexical scoping.

It could be that R somehow gets some aspect of it wrong but you haven't so far presented evidence of that. (Maybe it's something you know a lot about and so it's obvious to you.)


R uses a "sort of" lexical scoping which causes some "interesting" things. Eg a symbol may be simultaneously in global and local scope within the same function, which can be (ab)used to create a variable that's randomly scoped[0]. I'd say this is sort of a "dynamically lexical scope".

[0] http://andrewgelman.com/2014/01/29/stupid-r-tricks-random-sc...


The article is not convincing me that the language is obeying any poorly designed requirements.

The function f contains a free reference to a variable called a. This is satisfied in some sort of global environment.

Later, f is called from a function in which there is a local a bound to 100. Of course, this a which is local to that function is invisible to the free reference in f, which continues to refer to the global a that contains 10.

It could be there is some problem in R, but whatever that is, this article isn't exposing it in a convincing way.

Same thing in Common Lisp (using CLISP):

  [1]> (defun f (x) (* a x))
  F
  [2]> (setf a 10)
  10
  [3]> (f 3)
  30
  [4]> (defun g (y) (let ((a 100)) (f y)))
  G
  [5]> (g 3)
  30
The surprise is supposed to be that `(g 3)` doesn't produce 100. Why should it?

(It could produce 100 if the symbol a were marked for binding as a special variable; but it isn't. Thus it's just a free variable in f resolved in the global environment, and a lexical binding in g).


Ah, fair enough, that's definitively weird.


Sorry for being obtuse, but doesn't Python do the same thing? I just tried in ipython

  y = 2
  def mult(x):
      return x * y

  mult(6) # returns 12


R has changed a ton in the last three years. It's pretty clean these days.


I have zero understanding of this rant. 0-based indexing is due to programming iteration. I don't use pandas because it is 0-based, and that was a mistake.

Domain-specific languages for statistics and "math" have traditionally been 1-based. R is 1-based, and S before that. The fact that Pandas went with 0-based was a huge disappointment for many. I use R.


> 0 based is due to programming iteration.

I think it makes sense for enumeration in general to begin at 0. It's a mapping from the natural numbers after all, and zero is the most natural number!

What makes you think of programming iteration as a motivation for zero-based indexing?


Enumeration should generally begin at 1. In common language, you don't say "this person won the 0th place trophy", you say "this person won the 1st place trophy". It's address offsetting that should begin at 0.

C strongly encourages you to think of arrays as simply address+offset pointers, so it absolutely should start with 0.


> you don't say "this person won the 0th place trophy"

I think it would make sense to say it. The reason I don't is that people around me would misunderstand me if I did. (Around some people, it works, though!)


If someone told me they got the "0th place trophy", I would take that to be a colorful way of saying they didn't win anything, as 0 means nothing, or the absence of trophies, in this case.


That's just an artefact of being used to a certain way, isn't it? For me, it's about finding fundamental reasons for one convention or another, not about the practicalities.


The reason for not starting at zero for enumeration is that zero means a lack of things to count or list. Therefore, we should start with one, since that's the first number where we would have reason to enumerate.


Why not say someone won the -1th place trophy? Where does it start?


I start counting at 1 for everything when not programming, so I don't understand this. I always understood zero-based indexing to be primarily a hardware driven concern, with C being fairly close to assembly, and so many languages ended up copying C's syntax.


> To quote wikipedia on Zero-based "Zero-based numbering or index origin = 0[1][2] is a way of numbering in which the initial element of a sequence is assigned the index 0, rather than the index 1 as is typical in everyday non-mathematical/non-programming circumstances."

With 0-based numbering: index 0 = the 1st element, index 1 = the 2nd.

With 1-based numbering: 0 = nothing, 1 = the 1st element.

That seems important for mathematics.


Also, head and tail in Pandas are 1-based! That drove me crazy.

print(users[users.age > 25].head(3))

Returns 3 elements and not 4!


Is it possible that the argument to `head` is the _number_ of elements to return, and that the choice of indexing therefore doesn't enter the picture?


Actually, even if the argument (3 in this case) were an index, I would expect it to be the index of the first element _not_ to fetch, like how intervals typically are written with an inclusive lower bound and an exclusive upper bound as in "2 ≤ i < 13" [0]. That would make the result you're getting consistent with zero-indexing, but not with one-indexing.

Of course, I don't know what Pandas is like outside this particular example, or what Pandas even is. A database thing?

[0] https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF


What's wrong with the syntax? I think Julia has really nice syntax.


Is there a good use case for Julia outside the "math" community, when your alternatives wouldn't be R or Numpy, but Ruby or C#?


Perhaps it's a borderline case, but I've used Julia to make a "big data" database server, backed by a mmap'ed column store.

Because the compiler's available at run-time, queries can be compiled into efficient kernels that can run in parallel over multi-gigabyte arrays orders of magnitude faster than MySQL or Postgres on the same hardware.

Without Julia, I would have had to find some other way to compile queries into machine code at run-time, which for me would have ruled out all the languages you listed. In practical terms I would have abandoned the project.
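
The core trick is something like this (a toy sketch, nothing like the actual server):

  # Build a predicate from a parsed query at run time; eval yields a real
  # Julia function that gets JIT-compiled before the scan.
  predicate = eval(:(x -> 3 < x < 10))
  col = rand(1:100, 10_000_000)   # stand-in for the mmap'ed column
  count(predicate, col)           # the hot loop runs as machine code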


Excellent use case.

>Without Julia, I would have had to find some other way to compile queries into machine code at run-time

Note that you could also do it in Common Lisp.

But I agree, Julia is very powerful, perhaps one of the most powerful recent languages.


For the most part, no. The language and community are significantly math oriented. However, if you're someone who likes to learn languages just to broaden their horizons a little bit, Julia might be a good choice. It's one of the few languages in common use which has multimethods (the other major one being Common Lisp). Actually, in general Julia is surprisingly Lispy for a language with Algol-family syntax.


> Actually, in general Julia is surprisingly Lispy for a language with Algol-family syntax.

It's less surprising if you know there's still a version of the Julia REPL that works with s-expressions: https://youtu.be/dK3zRXhrFZY?t=5m58s


I use Julia for shell scripting. Remember how many ended up using Ruby for this? Well, Julia works even better. I don't write big programs in Julia. It is just a kind of everyday tool. I am a C++/Swift/Objective-C developer professionally.


One issue I have with Julia is that it is fast the second time you run your code. And developing code is mainly running things just once. In practice you might find yourself waiting on compilation a lot.


That all depends on factoring. If running your code once means evaluating some function a hundred times, then that's no issue.
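
The effect in miniature (nothing beyond Base; timings illustrative):

  f(x) = sum(sin, x)      # some small workload
  @time f(rand(10^6))     # first call: includes JIT compilation
  @time f(rand(10^6))     # second call: just the run time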


"is a good generic type-stable function for any number which has zero and + defined. That's quite unique: I can put floating point numbers in here, symbolic expressions from SymEngine.jl, ArbFloats from the user-defined ArbFloats.jl fast arbitrary precision library, etc."

Is it really unique? Isn't that just a Monoid (e.g. in Haskell, Scala, ...) or am I wrong?


I think that unique is an overstatement there, but in the context it's a comparison to other numerical computing systems, where this kind of support for generic code is less common.


Yes, it was an overstatement. But, while I do see it can be done with these Monoids, I haven't come across these kinds of numerical linear algebra libraries that allow generic code in other languages (here's an example of what I find in Haskell that requires Double: https://hackage.haskell.org/package/hmatrix-0.14.0.1/docs/Nu...)


"...the majority of programmers are not developers."

Could someone please explain the difference between the two terms? I've always used them synonymously.


Thanks for bringing this up. It may be more of a domain-specific use of the terms. In science and math, most of the people who are "programming" are not really that deep into programming. Most people go to a workshop on "here's how to use R to do some regressions", and they will know a small portion of the language and usually rely on some libraries to handle most of the heavy lifting. For example, the size of the community that actually develops code for analysis in bioinformatics pales in comparison to the size of the community that is doing "bench" biology and performing the analyses. There is a clear distinction because in many cases the training is very different (CS and math backgrounds, vs more science backgrounds) and the "average day" and research focus are different. You can swap out biology with pretty much any other field and see the same split and this is the "package user" vs "package developer" split in technical computing.


I've been pushing a notion like this when people think about (especially) languages used in scientific computing. There's something close to three types of users:

1. Developers. People interested in software development that just happens to be scientific in use. Basically anyone who's considered writing a package.

2. "The Fuzzy Middle Category". People who are pretty good at a language, know some software development concepts (testing, etc.) but whose primary focus isn't developing packages as much as it is answering questions, usually with a number of packages glued together with some interstitial code.

3. Users. "The steps I use to read a .csv file into R and then run a logistic regression are..."

They are, essentially, invoking software commands, rather than programming.

Now clearly, these three classes are blurry and people can move between them.


Yes, I think most folks do. He's making a distinction between programmers that are consumers of APIs vs. developers that create and maintain APIs.


Probably he means "developers" as in professional programmers.


> Julia's JiT is not like other JiTs, and it helps package development

Julia's JIT is a plain method JIT, the easy kind. He doesn't describe the pros and cons of method JITs vs tracing JITs. In short, method JITs explode in memory usage and forbid expensive optimizations. The advantages are, as described, ease of working with, reproducing, and debugging. Most JITs start as a simple method JIT and then advance to tracing JITs, especially with performance-oriented languages with a lot of vectorization potential.

It's a great language. But the JIT will be improved sooner or later, especially with the memory-expensive type optimizations.


I'm wondering about this part: "using this strategy Julia actually can produce static binaries like compiled C or Fortran code."

Is this something that works now, something planned, or just speculation?


It does work. It is not user friendly just yet, and compiler work needs to be done - but yes, you can get binaries and shared libraries today.

https://github.com/JuliaComputing/static-julia


The blog post highlights this as something planned. It does sort of work already, but not completely.


I really like Julia overall but I'm undecided whether it's good as a general purpose language. Right now I'm working on a few-thousand-line project and sometimes wish Julia were more like Swift:

- Writing code with Nullables is cumbersome and verbose compared to Swift.

- The object.method() notation is sometimes more readable, especially in more complex expressions. Plus, in an IDE it works well with completion.

- The ordering of types in a file matters.

- Overall, Swift code looks a tiny bit more readable, cleaner.

- No interfaces.


I get what you mean; whichever one I'm working with, I kind of miss the other. For larger code bases Swift feels "safer", but Julia tends to feel more enjoyable and faster to try things out with.

But yeah I REALLY wish the Julia guys can come up with a nice way of dealing with Nullable.

But I actually don't miss the `object.method()` notation. I find it so much more straightforward to compose things functionally when everything is a plain function. I feel methods just add complexity to a language. For readability I tend to use the form `x |> f |> g |> h` if I need to make a call like `h(g(f(x)))` easier to read.

I agree completion is a bit nicer with `object.method()` but function completion works quite well in the REPL and Juno. I use `methodswith()` to locate relevant functions for a type.

In some ways I think it is actually easier to deal with than in Swift, as Swift base classes have so many methods you can't find the stuff you are interested in quickly.

I have kind of wanted Swift to be the solution for everything, but I see that the API design philosophy for Swift makes it difficult to turn it into a language which is as nice for data science as Julia. There is no focus on making stuff like matrix classes, multidimensional arrays, shell integration etc.

Also I don't quite like that Swift ties me so much to an IDE. Writing Swift code with a plain text editor and a REPL isn't as nice as doing it with Julia.


My company decided to use Julia as a general purpose language for a number of projects around 2.5-1.5 years ago (I've been there 18 months). I firmly believe this choice had an opportunity cost somewhere in the $1-2M range for a sub-25-person company. The number of bugs in the language, which have no doubt since improved, and the lack of libraries, which leads to serious NIH (because really it's just NI yet), were costly, and frankly dangerous in the context of security, and then generally with regards to reproducibility. At one point we had a tool which leaned heavily on macros and required a version of the language which no one had on their machine any more and which had no download links anywhere on the internet. Luckily they are consistent in their version uploads, and altering the download address to that version happened to line up with what's on their server, but running a company on something like that is tenuous at best.


Yeah, the language design of Julia is brilliant (multiple dispatch, typing, LLVM use, zero-cost abstractions, @code_native to see why your code is slow).

This allows you to write fast code in Julia, which is impossible in Python (you can call into very fast C/Fortran libraries with nice bindings, though).

On the other hand, I really hate the syntax. One-based array indexing (ok, minor), blocks ending with "end", and most importantly unicode support.

Really, who thought that it was a good idea to allow symbol names that are unreachable on a standard US keyboard? WTF? This makes it supremely inconvenient to use libraries that happen to export symbols with crazy names. If you must, specify a name mangling scheme and let people who want to see crazy symbols use an IDE. Seriously, am I supposed to use a hex-editor when auditing julia code?

I totally agree with the author that package discovery and uniformity are a big problem. Indeed, this is even moving in the wrong direction: More and more functionality is removed from Base and put into external packages. I would prefer a more "batteries included" style, with uniform quality and documentation for standard functionality (splitting into different namespaces is good, though), python-style.

Second non-cosmetic problem is that the language documentation is atrocious, both from a completeness and a pedagogical viewpoint. Python is again the ideal to aspire to.

But the point of the article stands: it is easy to write fast Julia code, and the Julia (non-C/C++) parts of the Julia core form a nice tutorial for what "good" Julia code looks like. If a function is not well documented, look at the source code. In Python, you quickly run into the C wall (the function is really implemented in C and the way the wrappers work is really non-uniform); in Julia this is rarer, and the wrappers tend to be much easier to pierce (ok, Julia does a ccall, figure out the code of the target; the wrappers are human-generated and readable and don't come out of a complex build environment).

But yeah, you shouldn't use julia for systems programming until a good standard way for unmanaged code has been fixed, if ever (mixing types that are garbage-collected and types that are programmer-memory-managed, you want garbage-collected types for performance-irrelevant parts and programmer-managed types whenever precise memory alignment matters for your performance). Oh, and the nullpointer support sucks big time.

My personal hope is that the python->julia interface improves. Then python users will be able to profit from fast, easily developed julia packages.


I greatly prefer Julia syntax. `end` makes the code blocks stand out more easily than }. I find it easier to see the indentation at a glance than when just dealing with a single thin character.

Also I like that Julia prefers short words over special characters. C/C++ use far too many special characters.

That might run counter to my delight at unicode support. I think it is quite nice to be able to write mathematical code using the same symbols as used in mathematics. I use the Julia REPL to write this, where you get LaTeX completions for unicode characters.

"Second non-cosmetic problem is that the language documentation is atrocious, both from a completeness and pedagogical viewpoint. Pyhton is again the ideal to aspire to."

I could not disagree more. Here is an account of Julia vs Python from mostly a pedagogical point of view: https://medium.com/@Jernfrost/python-vs-julia-observations-e...

A few points. Documentation is often easier to read. E.g. look at the example of the `print` function.

Functions usually have more sensible names in Julia and you don't have to guess which package they are hidden in for common things such as working with strings, paths and arrays. Part of this is due to Julia's multiple dispatch, which allows reusing the same function name for related functionality, while Python is forced to invent unique function names too often, even when it doesn't make sense.

I'd say Julia adheres to the Python zen of least surprise. Checking if a collection is empty can be done with `isempty()`, which is similar to a number of other languages. Python in contrast treats empty lists as boolean objects which are false when empty. How is that obvious?


Re unicode:

When writing latex, you don't want to encounter a unicode greek sigma. You want \sigma, as six 7-bit-ASCII chars. Same in Julia; sure, define a display mode that translates certain things into unicode for people like you, but keep compatibility with code editors/people/tools that do not understand unicode. In short: I want it to be possible to use a dumb text editor, not an IDE/word-processor. (Ok, the text editor will need to understand UTF-8, because unicode string literals are really important; but please never go beyond a small white-listed subset of 7-bit ASCII for language tokens.)


LuaTeX, which is displacing LaTeX, is doing just that, increasingly using unicode directly. I've never used a plain editor on my Mac which DIDN'T handle unicode. In fact I use a plain text editor to write Julia code with unicode symbols. Anyway, it is a tiny portion of the code and the Julia public API never forces you to use unicode. You can always choose a non-unicode variant of the API.

Not using UTF8 in this day and age is just begging for problems.

At least on my Mac you get OS-wide tools for working with unicode.


The convention in Julia is to not have unicode in user-facing APIs. That way you're not forcing unicode on anyone. All of the big libraries and the Base module do this. So you can use unicode for your variables to match your paper if you want, but no library (that I know of) is going to force you to have unicode support in your terminal.
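
Base itself follows this pattern; the unicode operators are aliases for ASCII names, e.g.:

  1 ∈ [1, 2, 3]      # true; typed as \in<TAB> in the REPL
  in(1, [1, 2, 3])   # the same function under its ASCII name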


Just a response to a few comments, since you talk with confidence but could maybe do with a closer look:

1. You end blocks with `end` in Julia.

2. You can use indexing with any base, not just 1 - with no performance penalty.

3. The Julia REPL comes with LaTeX completions, making it very easy to just type e.g. \sigma and get the sigma sign.

4. The package system is moving in both directions - functionality is split into modules for interoperability, but gathered into batteries-included metapackages again. Typing `using DifferentialEquations` will load all 60 small DifferentialEquations packages (centrally documented), but you can easily plug in alternatives if you like.


Re 2: Yes, but most of the language is geared towards 1-based indexing, e.g. ranges like a:b being inclusive at the right. Hence, using a zero-based variant of array only makes sense if your code really profits from zero-indexing.

Btw, the question should not be "is zero-based slower"; I know that Julia has zero-cost abstractions. The question is whether it is faster, because of the avoided decrement instructions for the underlying pointer arithmetic: say, M=Matrix{Int64}(4,N); then @inbounds x=M[i,j] corresponds to x=unsafe_load(pointer(M), i + (j-1)*4). And the internal C code needs an extra decrement on i, because array data pointers use the C convention of pointing to the first element.

Re 4: This is a problem with discoverability and dependencies. Say, my code needs a heap-queue.

Sure, heaps exist, e.g. in DataStructures.jl. But there is no canonical documentation / default choice in the base language, like e.g. heapq in Python.

And if you want to figure out how to use custom orderings for your heap, you end up reading the source code of all 2-3 heap implementations in DataStructures.jl (which use different, incompatible APIs), because the documentation fails to provide examples. Older Julia versions came with heaps included.

Re 3: The REPL is not everything. By all means, define a global name-mangling scheme so that (with default options) symbol names are latex-displayed in the REPL or IJulia. I mean, when using a monospaced font you don't want to see a greek sigma in latex either! You want the code, i.e. \sigma, and you want someone (not yourself) to write a word processor that inline-displays weird characters for your grandma.

Don't understand me wrong, I like julia; it is just that every single "mostly inconsequential" / matter-of-taste design decision pisses me off, whereas the really important parts are awesome.

Re 1: Maybe I haven't written enough fortran in my life, but both python-significant-whitespace and C/java-style curly braces make a lot of sense and are very easy to parse and highlight (for text editors). But sure, I understand that this is matter of taste, and very cosmetic.

Bonus problem: By default, multidimensional arrays access is such that the first index is fast for iterations. This is different from C layout, and means that A[i,j] corresponds to A[j][i] if you switch out Matrix and Vector{Vector}. I know that fortran did it that way, and I think this was a mistake in fortran already.
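
For reference, the layout rule in loop form (first index innermost, so memory is walked sequentially):

  A = rand(1000, 1000)
  s = 0.0
  for j in 1:size(A, 2), i in 1:size(A, 1)   # j outer, i inner
      s += A[i, j]
  end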


>Sure, heaps exist, e.g. in datastructures.jl. But there is no canonical documentation / default choice in the base language, like e.g. heapq in python. And if you want to figure out how to use custom orderings for your heap you end up reading the source code of all 2-3 heap implementations in datastructures.jl (which use different incompatible APIs), because the documentation fails at providing examples. Older julia versions came with heaps included.

If you watch Stefan's 1.0 talk (https://www.youtube.com/watch?v=qHpaztMu_Uw) you'll see that one of the reasons we want 1.0 out is so that this kind of non-breaking work can get priority. 1.0 is about doing all of the breaking changes to the language. 1.x releases, and package work, are about actually using it. It's known that DataStructures.jl needs work, which is why it's even mentioned in the talk as a 1.x improvement goal.


Great article! Coming from C++, I agree Julia feels a lot like "C++ done right", i.e. where C++ forces you to jump through hoops with the verbose template syntax, Julia does generics by default. Also, it's great not having to wonder for every function argument declaration if you should add *, &, && or any of the const variants.


I tried julia last year, and it was a nightmare of version skew. Has it improved in that regard at all?


If you need something stable, wait until a bit after 1.0, like you would with any Windows release. Right now it's expected that things will change and break with language updates, but the next one is 1.0, which is the "we stop breaking things now" release.


That's what I was told about 0.4 -> 0.5 last year.


By who? Someone who isn't part of the development team? There was an issue set for v0.5 called "Arraypocalypse" that was meant to change a ton of things related to arrays, and that was known months (a year?) before v0.5 was out. Semver is used for a reason: pre-1.0 is all breaking.


People on gitter, IIRC... I'm sure there's a record of it somewhere. Half the packages I wanted to use were still on 0.4, half were on 0.5, and the combination was incompatible.


Julia is really fun to work with. I really love the REPL, multiple dispatch, and the way you can easily introspect code all the way down to native assembly.
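
The whole introspection ladder is in Base (toy example):

  g(x) = x^2 + 1
  @which g(1.0)          # which method dispatch selects
  @code_typed g(1.0)     # type-inferred IR
  @code_native g(1.0)    # the generated machine code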


"i like julia too"



