7

One of the stated reasons that structs can be more performant than classes is that no ARC is needed for structs. But suppose we have the following struct in Swift:

struct Point {
    var x: Float
    var y: Float
    mutating func scale(_ a: Float) {
        x *= a
        y *= a
    }
}

var p1 = Point(x:1, y:1)
var p2 = p1 //p1 and p2 point to the same data internally
p1.scale(2) //copy on mutate, p1 and p2 now have distinct copies

Now I believe that various copies of a struct will actually point to the same data on the stack until a mutation forces a copy. This means either we have to keep track of how many objects reference a given memory address on the stack so we know if a mutation must form a copy, or we simply form a copy on every mutation. The former seems inefficient and the latter seems identical to ARC. What am I missing?

EDIT: I understand the difference between value semantics and reference semantics. But as an optimization, Swift does not actually create a new copy of the data until a mutation is made. This avoids unnecessary copies. See https://www.hackingwithswift.com/example-code/language/what-is-copy-on-write . Because of this, I am not sure how structs can avoid some form of ARC. I noticed that this article claims that copy on write is only done for arrays and dictionaries. I am not sure whether that is true, but if so, my question still stands for arrays.

gloo
  • 8
    Why do you think `p2` will internally point to the same data as `p1`? Yes, they will contain *equivalent data* before you mutate one copy, but by assigning the struct to another variable you are already creating a copy – see the docs: [Classes and Structures – Structures and Enumerations Are Value Types](https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/ClassesAndStructures.html#//apple_ref/doc/uid/TP40014097-CH13-ID88) – amon Apr 09 '17 at 16:23
  • 1
    Swift employs copy on write as an optimization – gloo Apr 10 '17 at 16:23
  • 2
    @amon the Swift compiler will do its best to optimise away unnecessary copies of value types, e.g. using copy-on-write. The semantics of value types are that they create copies on assignment, but that isn't necessarily what happens in the emitted code. gloo's question I think was more about how the book-keeping to manage this works and how it's made efficient enough to be a win over ARC/reference types. – GoatInTheMachine Apr 10 '17 at 16:54
  • 2
    @gloo The website you quoted says: “*Warning: copy on write is a feature specifically added to Swift arrays and dictionaries; you don't get it for free in your own data types.*” There is no way how value types can do COW without using some pointer indirection, which defeats the purpose of value types. Arrays and Dicts probably use pointers internally, which allows them to share data. – amon Apr 10 '17 at 17:22
  • @GoatInTheMachine I don't doubt that the Swift compiler is clever and I see how Swift is designed to allow for many optimization opportunities. But I don't see how COW can ever work for value types, and doubt that Apple has done the impossible. COW implies some indirection but the point of value types is not having such indirection. Of course COW is not the only way to avoid unnecessary copies, aliasing of immutable data would work as well. If Swift COWs some value types (and could you kindly point out documentation of this?), those special types probably use pointers internally. – amon Apr 10 '17 at 17:36
  • @amon - Swift is built on LLVM. LLVM can perform whole-program escape and modification analysis on parameters passed by reference between parts of a program. LLVM can very easily identify a large proportion of cases where it would be safe to use a pointer to a value type. Swift was designed by one of the primary developers of LLVM; if any language is going to take advantage of LLVM's capabilities in its design, it'll be Swift. – Jules Jan 06 '18 at 21:25

4 Answers

4

You are starting with a misconception. Point is a struct, and structs are value types. The assignment to p2 creates a copy. p1 and p2 don't point anywhere.
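A quick check of this, reusing the Point struct from the question (a minimal sketch; the printed values follow directly from value semantics):

```swift
struct Point {
    var x: Float
    var y: Float
    mutating func scale(_ a: Float) {
        x *= a
        y *= a
    }
}

var p1 = Point(x: 1, y: 1)
let p2 = p1        // p2 is an independent value from here on
p1.scale(2)        // mutates p1 only

print(p1.x, p1.y)  // 2.0 2.0
print(p2.x, p2.y)  // 1.0 1.0
```

Whether the compiler physically copies the two Floats at the assignment or keeps them in registers is not observable; only the independence of the two values is.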

That's exactly the difference between struct and class. Classes are reference types. If you had a class, the assignment to p2 would just create a second reference to the same object that p1 points to (and count the reference).

You are saying: "There is no way how value types can do COW without using some pointer indirection, which defeats the purpose of value types". The purpose of value types is the semantics of value types. Implementation details don't matter.

But we need to go further. When you copy a struct containing two Floats, the two Floats get copied. When you copy a struct containing two references to objects, the object references get copied, so you now have two structs holding references to the same objects. If you replace a reference in the copy, the original struct is not modified. If you modify an object through the reference in the copy, that is the same object the original refers to, so the change is visible through the original struct as well.
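The two cases can be sketched as follows. The names Counter and Wrapper are hypothetical, made up for this illustration:

```swift
final class Counter {            // hypothetical reference type
    var count = 0
}

struct Wrapper {                 // hypothetical struct holding a reference
    var counter: Counter
}

let original = Wrapper(counter: Counter())
var copy = original              // copies the reference, not the Counter object

copy.counter.count = 5           // modifies the shared object
print(original.counter.count)    // 5

copy.counter = Counter()         // replaces the reference in the copy only
copy.counter.count = 9
print(original.counter.count)    // 5
```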

Arrays are structs, but the struct contains a reference to an object containing the actual data (that's an implementation detail of the Swift library's implementation of arrays). That reference gets copied, so at this point the copied array refers to the exact same data object. That's fine as long as nobody tries to modify the array. When that data object is modified, the accessor checks how many references to the data object there are, and if there is more than one, then a copy of the data object is made at that time.
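One way to watch this from the outside is to compare the address of the element storage before and after a mutation. This is only a sketch: storageAddress is a hypothetical helper, the address is used for comparison only and never dereferenced, and whether two arrays share storage is an implementation detail of the standard library rather than a documented guarantee.

```swift
// Hypothetical helper: address of the array's element storage, for comparison only.
func storageAddress(_ array: [Int]) -> UnsafeRawPointer? {
    return array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

let a = [1, 2, 3]
var b = a                        // copies the array struct; storage is expected to be shared
let sharedBefore = storageAddress(a) == storageAddress(b)

b.append(4)                      // mutation: b needs its own storage now
let sharedAfter = storageAddress(a) == storageAddress(b)

print(sharedBefore, sharedAfter)
print(a, b)                      // [1, 2, 3] [1, 2, 3, 4]
```

With current Swift toolchains the first line is expected to print true false, but only the second line is guaranteed by the language.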

The observable behaviour of arrays is as if all the items were copied when the array struct is copied. But the implementation is a lot faster because some clever code only actually copies the data when needed. If you want the same optimisation for your own structs, you have to write the code to do it. It has nothing whatsoever to do with the compiler, it's all part of the Swift library implementation of an array.
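A minimal sketch of writing that code yourself, assuming a hypothetical BoxedPoint struct that keeps its fields in a hypothetical PointStorage class. The only library call used is the standard library's isKnownUniquelyReferenced. Note that the storage object is an ordinary class instance, so it is reference counted: a copy-on-write struct does not avoid ARC, it relies on it.

```swift
final class PointStorage {               // hypothetical heap-allocated storage
    var x: Float
    var y: Float
    init(x: Float, y: Float) {
        self.x = x
        self.y = y
    }
}

struct BoxedPoint {                      // hypothetical copy-on-write value type
    private var storage: PointStorage

    init(x: Float, y: Float) {
        storage = PointStorage(x: x, y: y)
    }

    var x: Float { storage.x }
    var y: Float { storage.y }

    mutating func scale(_ a: Float) {
        // Copy the storage only if another BoxedPoint also refers to it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = PointStorage(x: storage.x, y: storage.y)
        }
        storage.x *= a
        storage.y *= a
    }
}

var s1 = BoxedPoint(x: 1, y: 1)
let s2 = s1                              // copies one reference; storage is shared
s1.scale(2)                              // storage is shared, so s1 copies before writing

print(s1.x, s1.y)                        // 2.0 2.0
print(s2.x, s2.y)                        // 1.0 1.0
```

For a struct of two Floats this is slower than plain copying; the indirection only pays off when the stored data is large.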

gnasher729
  • Under the hood structs actually do reference the same data. Only once a mutation is performed is a copy made. This is for optimization purposes. But because of this I am not sure how they can avoid some form of ARC. – gloo Apr 10 '17 at 16:26
  • 4
    @gloo: Not for the huge majority of structs. – gnasher729 Apr 10 '17 at 19:48
  • Swift uses copy-on-write a lot, which is the reason for @gloo's hypothesis – Luc-Olivier May 22 '21 at 17:32
3

@amon's link is a full answer. To quote:

A value type is a type whose value is copied when it is assigned to a variable or constant, or when it is passed to a function.

By contrast, for classes:

Unlike value types, reference types are not copied when they are assigned to a variable or constant, or when they are passed to a function. Rather than a copy, a reference to the same existing instance is used instead.

The copy-by-value semantics of struct types tend to trip up users familiar with languages that are solely or primarily copy-by-reference. To really understand this behavior it's worth looking at C (as Swift is heavily connected to Objective-C, itself built on C). The basics are found in the C reference:

object: region of data storage in the execution environment, the contents of which can represent values.

A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.

In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand

Putting these three definitions together tells us why C structs are not copy-on-write. If we have in C

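struct foo { int value; };   /* a struct needs at least one member in standard C */

void bar(void) {
    struct foo a = {1};      /* placeholder initial values */
    struct foo b = {2};
    b = a;                   /* copies the value stored in a into b's storage */
}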

In this case, a and b are values in regions of data storage (note: not references or pointers to those regions but the values in those regions). The expression b = a replaces the value stored in b's region with that in a's region, i.e. the value of a gets "copied" and assigned into b.

So C's behavior here is a natural result of its specification and abstract machine model (which was designed to be very amenable to efficient implementation on standard hardware). Objective-C inherited this behavior, and Swift follows Objective-C in this respect.

walpen
  • I don't think this answers the question. Swift optimises away many copies that would happen on assignment. The question is how is tracking change done efficiently enough to make structs beat out classes. – GoatInTheMachine Apr 10 '17 at 16:57
3

A Swift struct only needs to know whether its data is uniquely referenced; it does not need full-fledged ARC. For example, on the line

var p2 = p1

the compiler could theoretically just flip a uniquely-referenced flag of p1 from on to off. Then, when a write is made, the compiler knows to copy.

Suppose p2 is deallocated before this happens. Now the copy is redundant, but the compiler doesn't know that, so it still makes the copy. That's why the uniquely-referenced approach is only an approximation: it falls between full ARC and always copying.

Note: the Swift compiler doesn't actually work this way. Based on @Alexander's comment and this answer, the copy-on-write behavior is implemented in Swift code, not as a compiler optimization. User-defined structs are simply copied every time.

As to your point about efficiency: yes, technically copying every time is less efficient. When you have a lot of copying and mutating, at some point it becomes more efficient to switch to classes. That's why Swift gives you both options.

JSquared
-1

Note that COW/copy-on-write is an optimization. For (tiny?) structs, the compiler optimizer may determine that simply copying all the struct data is smaller and/or faster than inserting and/or executing COW or ARC-like code. However, the semantics of class objects don't allow this.

Older answer:

With COW structs, you have to copy a struct in memory just before the very first time you modify it after getting it or after passing it along to someone else. That's a 1 bit flag per struct per user, and a potential one-time copy per user, which can be optimized away in some cases. After that, the data in the struct is in your private memory and no one else can see it or change it under you.

Anyone else who tries to modify that struct under you should have made their own private copy first, and modified that instead, invisible to you.

If no one writes, no copies or tracking needs to be done (other than a 1 bit "clean" flag per user, which may or may not be optimize-awayable). Even if the 1 bit flag per user can't be optimized away, it's a lot simpler than reference counting/tracking.

hotpaw2
  • 2
    That is most definitely _not_ how structs in Swift work. The few structs that do that (arrays, strings) contain a pointer to a reference counted object and use a method that returns whether a reference counted object is held by a single reference or multiple references. – gnasher729 Apr 17 '17 at 12:00
  • Not sure what to believe. Many people say COW for all structs in Swift and many say only arrays. Wish I could get an official answer from Chris Lattner. – gloo Apr 19 '17 at 04:40
  • 2
    @gloo CoW is a specific type of behavior. Structs don't, in general, have this behavior, but they *can*. CoW behavior is implemented for Array, Set, Dictionary, and String, which *happen* to be structs. It's very easy to verify yourself that assigning a struct always produces a copy (just look at the addresses in the debugger). – Alexander May 19 '17 at 23:08