Should I use a list or an array?

Question

I'm working on a windows form to calculate UPC for item numbers.

I successfully create one that will handle one item number/UPC at a time, now I want to expand and do it for multiple item numbers/UPCs.

I have started and tried using a list, but I keep getting stuck. I created a helper class:

public class Codes
{
    private string incrementedNumber;
    private string checkDigit;
    private string wholeNumber;
    private string wholeCodeNumber;
    private string itemNumber;

    public Codes(string itemNumber, string incrementedNumber, string checkDigit, string wholeNumber, string wholeCodeNumber)
    {
        this.incrementedNumber = incrementedNumber;
        this.checkDigit = checkDigit;
        this.wholeNumber = wholeNumber;
        this.wholeCodeNumber = wholeCodeNumber;
        this.itemNumber = itemNumber;
    }

    public string ItemNumber
    {
        get { return itemNumber; }
        set { itemNumber = value; }
    }

    public string IncrementedNumber
    { 
        get { return incrementedNumber; }
        set { incrementedNumber = value; } 
    }

    public string CheckDigit
    {
        get { return checkDigit; }
        set { checkDigit = value; }
    }

    public string WholeNumber
    {
        get { return wholeNumber; }
        set { wholeNumber = value; }
    }

    public string WholeCodeNumber
    {
        get { return wholeCodeNumber; }
        set { wholeCodeNumber = value; }
    }

}

Then I got started on my code, but the issue is that the process is incremental, meaning I get the item number from a gridview via checkboxes and put them in the list. Then I get the last UPC from the database, strip the checkdigit, then increment the number by one and put it in the list. Then I calculate the checkdigit for the new number and put that in the list. And here I already get an Out of Memory Exception. Here is the code I have so far:

List<Codes> ItemNumberList = new List<Codes>();


    private void buttonSearch2_Click(object sender, EventArgs e)
    {
        //Fill the datasets
        this.immasterTableAdapter.FillByWildcard(this.alereDataSet.immaster, (textBox5.Text));
        this.upccodeTableAdapter.FillByWildcard(this.hangtagDataSet.upccode, (textBox5.Text));
        this.uPCTableAdapter.Fill(this.uPCDataSet.UPC);
        string searchFor = textBox5.Text;
        int results = 0;
        DataRow[] returnedRows;
        returnedRows = uPCDataSet.Tables["UPC"].Select("ItemNumber = '" + searchFor + "2'");
        results = returnedRows.Length;
        if (results > 0)
        {
            MessageBox.Show("This item number already exists!");
            textBox5.Clear();
            //clearGrids();
        }
        else
        {
            //textBox4.Text = dataGridView1.Rows[0].Cells[1].Value.ToString();
            MessageBox.Show("Item number is unique.");
        }
    }

    public void checkMarks()
    {

        for (int i = 0; i < dataGridView7.Rows.Count; i++)
        {
            if ((bool)dataGridView7.Rows[i].Cells[3].FormattedValue)
            {
                {
                    ItemNumberList.Add(new Codes(dataGridView7.Rows[i].Cells[0].Value.ToString(), "", "", "", ""));
                }
            }
        }
    }

    public void multiValue1()
    {
        _value = uPCDataSet.UPC.Rows[uPCDataSet.UPC.Rows.Count - 1]["UPCNumber"].ToString();//get last UPC from database
        _UPCNumber = _value.Substring(0, 11);//strip out the check-digit
        _UPCNumberInc = Convert.ToInt64(_UPCNumber);//convert the value to a number

        for (int i = 0; i < ItemNumberList.Count; i++)
        {
            _UPCNumberInc = _UPCNumberInc + 1;
            _UPCNumberIncrement = Convert.ToString(_UPCNumberInc);//assign the incremented value to a new variable
            ItemNumberList.Add(new Codes("", _UPCNumberIncrement, "", "", ""));//**here I get the OutOfMemoreyException**
        }

        for (int i = 0; i < ItemNumberList.Count; i++)
        {
            long chkDigitOdd;
            long chkDigitEven;
            long chkDigitSubtotal;
            chkDigitOdd = Convert.ToInt64(_UPCNumberIncrement.Substring(0, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(2, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(4, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(6, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(8, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(10, 1));
            chkDigitOdd = (3 * chkDigitOdd);
            chkDigitEven = Convert.ToInt64(_UPCNumberIncrement.Substring(1, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(3, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(5, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(7, 1)) + Convert.ToInt64(_UPCNumberIncrement.Substring(9, 1));
            chkDigitSubtotal = (300 - (chkDigitEven + chkDigitOdd));
            _chkDigit = chkDigitSubtotal.ToString();
            _chkDigit = _chkDigit.Substring(_chkDigit.Length - 1, 1);
            ItemNumberList.Add(new Codes("", "",_chkDigit, "", ""));
        }

Is this the right way to go about it, using a list, or should I be looking at a different way?

Using a list is fine. Iterating the list while adding to it is a surefire way to blow up your code, and indicates a problem in your logic (or code writing). Also, this is your bug and not likely to help future visitors. Voting to close. — Telastyn, Dec 19 '13 at 15:56
As a side note, all these private fields (in your `Code` class) are redundant and nothing but noise really, `{ get; private set; }` would be enough. — Konrad Morawski, Dec 19 '13 at 15:58
Is the question title for this really accurate? This doesn't really seem like a *list vs array* question, so much as a *How can I improve my implementation* question. That being said, if you're **adding** or **removing** elements, you want a list (or other flexible data structure). Arrays are only really good when you know exactly how many elements you need at the start. — KChaloux, Dec 19 '13 at 15:59
@KChaloux, that's pretty much what I wanted to know. Is a list the right way to go about this, or should I have looked at a different way to pursue this? So it seems a list is a good way, I just have to adjust my logic. — campagnolo_1, Dec 19 '13 at 16:07
@Telastyn, I wasn't so much asking to improve my code as to show what I'm trying to do. — campagnolo_1, Dec 19 '13 at 16:08

score 74 · Accepted Answer · edited Jun 17 '15 at 17:27

I'll expand my comment:

... if you're adding or removing elements, you want a list (or other flexible data structure). Arrays are only really good when you know exactly how many elements you need at the start.

A Quick Breakdown

Arrays are good when you have a fixed number of elements that is unlikely to change, and you wish to access it in a non-sequential fashion.

Fixed Size
Fast Access - O(1)
Slow Resize - O(n) - needs to copy every element to a new array!

Linked-Lists are optimized for quick additions and removals at either end, but are slow to access in the middle.

Variable Size
Slow Access at middle - O(n)
- Needs to traverse each element starting from the head in order to reach the desired index
Fast Access at Head - O(1)
Potentially fast access at Tail
- O(1) if a reference is stored at the tail end (as with a doubly-linked list)
- O(n) if no reference is stored (same complexity as accessing a node in the middle)

Array Lists (such as List<T> in C#!) are a mixture of the two, with fairly fast additions and random access. List<T> will often be your go-to collection when you're not sure what to use.

Uses an array as a backing structure
Is smart about its resizing - allocates the double of its current space when it runs out of it.
- This leads to O(log n) resizes, which is better than resizing every time we add/remove
Fast Access - O(1)

How Arrays Work

Most languages model arrays as contiguous data in memory, of which each element is the same size. Let's say we had an array of ints (shown as [address: value], using decimal addresses because I'm lazy)

[0: 10][32: 20][64: 30][96: 40][128: 50][160: 60]

Each of these elements is a 32-bit integer, so we know how much space it takes up in memory (32 bits!). And we know the memory address of the pointer to the first element.

It's trivially easy to get to the value of any other element in that array:

Take the address of the first element
Take the offset of each element (its size in memory)
Multiply the offset by the desired index
Add your result to the address of the first element

Let's say our first element is at '0'. We know our second element is at '32' (0 + (32 * 1)), and our third element is at 64 (0 + (32 * 2)).

The fact that we can store all these values next to each other in memory means our array is as compact as it can possibly be. It also means that all our elements need to stay together for things to continue working!

As soon as we add or remove an element, we need to pick up everything else, and copy them over to some new place in memory, to make sure there are no gaps between elements, and everything has enough room. This can be very slow, especially if you're doing it every time you want to add a single element.

Linked Lists

Unlike arrays, Linked Lists don't need all their elements to be next to each other in memory. They are composed of nodes, that store the following info:

Node<T> {
    Value : T      // Value of this element in the list
    Next : Node<T> // Reference to the node that comes next
}

The list itself keeps a reference to the head and tail (first and last nodes) in most cases, and sometimes keeps track of its size.

If you want to add an element to the end of the list, all you need to do is get the tail, and change its Next to reference a new Node containing your value. Removing from the end is equally simple - just dereference the Next value of the preceding node.

Unfortunately, if you have a LinkedList<T> with 1000 elements, and you want element 500, there's no easy way to jump right to the 500th element like there is with an array. You need to start at the head, and keep going to the Next node, until you've done it 500 times.

This is why adding and removing from a LinkedList<T> is fast (when working at the ends), but accessing the middle is slow.

Edit: Brian points out in the comments that Linked Lists have a risk of causing a page fault, due to not being stored in contiguous memory. This can be hard to benchmark, and can make Linked Lists even a bit slower than you might expect given just their time complexities.

Best of Both Worlds

List<T> compromises for both T[] and LinkedList<T> and comes up with a solution that's reasonably fast and easy to use in most situations.

Internally, List<T> is an array! It still has to jump through the hoops of copying its elements when resizing, but it pulls some neat tricks.

For starters, adding a single element doesn't usually cause the array to copy. List<T> makes sure there's always enough room for more elements. When it runs out, instead of allocating a new internal array with just one new element, it will allocate a new array with several new elements (often twice as many as it currently holds!).

Copy operations are expensive, so List<T> cuts down on them as much as possible, while still allowing fast random access. As a side effect, it may end up wasting slightly more space than a straight-up array or linked list, but it's usually worth the tradeoff.

TL;DR

Use List<T>. It's normally what you want, and it seems to be correct for you in this situation (where you're calling .Add()). If you're unsure of what you need, List<T> is a good place to start.

Arrays are good for high-performance, "I know I need exactly X elements" things. Alternatively, they're useful for quick, one-off "I need to group these X things I've already defined together so I can loop over them" structures.

There are a number of other collection classes. Stack<T> is like a linked list that only operates from the top. Queue<T> works as a first-in-first-out list. Dictionary<T, U> is an unordered, associative mapping between keys and values. Play with them and get to know the strengths and weaknesses of each. They can make or break your algorithms.

This is brilliant! Thank you so much for going through the trouble and pointing out all the different possibilities. — campagnolo_1, Dec 19 '13 at 17:16
In some cases, there may be advantages to using a combination of an array and an `int` indicating the number of usable elements. Among other things, it's possible to bulk-copy multiple elements from one array to another, but copying between lists generally requires processing elements individually. Further, array elements may be passed by `ref` to things like `Interlocked.CompareExchange`, while list items cannot. — supercat, Dec 19 '13 at 17:35
I wish I could give more than one upvote. I knew the use-case differences, and how arrays/linked lists worked, but I never knew or thought about how `List<>` worked under the hood. — Bobson, Dec 19 '13 at 20:32
@Bobson I'm not an authority, so I hope my understanding is correct :p As far as I know, `List` is very similar to the old `ArrayList` class, except it's generic, rather than storing `object`s. — KChaloux, Dec 19 '13 at 20:38
Could you note that the properties you state for Linked Lists `O(1) for tail and head`, `O(m) middle`, are properties of doubly linked lists and not traditional (singly) linked lists? — James, Dec 20 '13 at 01:13
Adding a single element to a List is amortized O(1); efficiency of adding elements is not normally sufficient justification to use a linked list (and a circular List allows you to add to front AND back in amortized O(1)). Linked lists have a lot of performance idiosyncrasies. For example, not being contiguously stored in memory means iterating over an entire linked list is more likely to trigger a page fault...and this is hard to benchmark. The bigger justification for using a Linked List is when you want to concatenate two lists (can be done in O(1)) or add elements to the middle. — Brian, Dec 20 '13 at 04:18
I should clarify. When I said circular list, I meant a circular array list, not a circular linked list. The correct term would be deque (double-ended queue). They're often implemented pretty much the same way as a List (array under the hood), with one exception: There is an internal integer value, "first" which indicates which index of the array is the first element. To add an element to the back, you just subtract 1 from "first" (wrapping around to the array's length if necessary). To access an element, you just access `(index+first)%length`. — Brian, Dec 20 '13 at 04:27
There are a few things you can't do with a List that you can do with a plain array, for example, passing an index element as a ref parameter. — Ian Goldby, Apr 30 '15 at 14:27

score 6 · Answer 2 · edited Apr 12 '17 at 07:31

6

While KChaloux answer is great, I would like to point out another consideration: List<T> is a lot more powerful than an Array. The methods of List<T> are very useful in a lot of circumstances - an Array doesn't have these methods and you may spend a lot of time to implement workarounds.

So, from a development perspective I nearly always use List<T> because when there are additional requirements, they are often far easier to implement when you are using a List<T>.

This leads to a final problem: My code (I don't know about yours) contains 90% List<T>, so Arrays are not really fitting in. When I pass them around, I have to call their .toList() method and convert them to a List - this is annoying and is so slow that any performance gain from using an Array is lost.

edited Apr 12 '17 at 07:31

Community

1

answered Dec 20 '13 at 07:12

Christian Sauer

1,269
1
9
16

That's true, this is another good reason to use List - it provides a lot more functionality for you built directly in to the class. – KChaloux Dec 20 '13 at 13:40
1

Doesn't LINQ level the field by adding a lot of that functionality for any IEnumerable (including an array)? Is there anything left in modern C# (4+) that you can only do with a List and not an array? – Dave Oct 29 '14 at 14:46
1

@Dave Extending the Array/ List seems such a thing. Also, I would say the syntax to construct / handle a List are much nicer than for arrays. – Christian Sauer Oct 29 '14 at 15:32

score 2 · Answer 3 · answered Sep 05 '16 at 19:23

Noone mentioned this part though: "And here I already get an Out of Memory Exception." Which is entirely due to

for (int i = 0; i < ItemNumberList.Count; i++)
{
    ItemNumberList.Add(new item());
}

It's plain to see why. I don't know whether you meant to add to a different list, or should just store ItemNumberList.Count as a variable before the loop to get your desired result, but this is simply broken.

Programmers.SE is for "...interested in conceptual questions about software development...", and the other answers treated it as such. Try http://codereview.stackexchange.com instead, where this question would fit. But even there it is horrid, as we can only assume this code starts at _Click, which has no call to multiValue1 where you say the error happens.