9 Replies - 34865 Views - Last Post: 24 December 2008 - 09:58 AM

#1 George2

Find unique strings for a string array

Posted 22 December 2008 - 10:16 PM

Hello everyone,

I have a string array that may contain duplicate strings. Is there a built-in or smart way to remove the duplicates and produce a string array that contains only the unique ones?

For example, if the input array is {"abc", "bcd", "abc"}, the unique output array is {"abc", "bcd"}.

thanks in advance,
George

Replies To: Find unique strings for a string array

#2 PsychoCoder

Posted 22 December 2008 - 10:30 PM

First, an ArrayList is much more efficient than a string array, then using Generics is far more efficient than an ArrayList. For an ArrayList I would do something like this

public static ArrayList UniqueValues(string value)
{
    ArrayList values = new ArrayList();

    //make sure "value" doesnt already exist
    if (!(values.Contains(value)))
    {
        //since we've made it this far we can add it
        values.Add(value);
    }
    return values;
}



For Generics
public static List<string> UniqueValues(string value)
{
    List<string> values = new List<string>();

    //make sure "value" doesn't already exist
    if (!(values.Contains(value)))
    {
        //since we've made it this far we can add it
        values.Add(value);
    }
    return values;
}



Now granted, in your situation your ArrayList or generic collection would have to be a form-level global so you can add to it as you go; these examples create a new object each time.

Now, if you're in a position where you have to use a string array (such as a homework assignment), here is an example of passing a string array to a method and ensuring that the value being passed doesn't already exist in the new string array

public static string[] UniqueValues(string[] value)
{
    //create a string array the length of the values
    //being passed to it
    string[] values = new string[value.Length];

    //loop through the initial array being passed
    //to the method
    for (int i = 0; i < value.Length; i++)
    {
        //make sure that "newValue" doesnt already
        //exist in our new string array
        if(!(values[i] == value[i]))
        {
            //add it to the new array
            values[i] = value[i];
        }
    }
    return values;
}



Hope that helps :)

#3 n8wxs

Posted 22 December 2008 - 10:33 PM

Use an ArrayList. Copy the first string in your array to the ArrayList. Then, for each additional string in your array, use the ArrayList.Contains() method to check whether the string matches a previous string. If not, copy it; otherwise skip it and go on to the next string.

See ArrayList Methods
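
The steps described above could be sketched roughly like this. The class and method names (ArrayListUnique, GetUnique) are made up for illustration. There is no special case for the first string, because Contains on an empty ArrayList simply returns false.

```csharp
using System;
using System.Collections;

public static class ArrayListUnique
{
    // Hypothetical helper: returns the distinct strings of input in first-seen order.
    public static string[] GetUnique(string[] input)
    {
        ArrayList seen = new ArrayList();
        foreach (string s in input)
        {
            // Contains performs a linear search using Equals,
            // so strings are matched by value, not by reference.
            if (!seen.Contains(s))
            {
                seen.Add(s);
            }
        }
        // ToArray(Type) returns System.Array, so cast it back to string[].
        return (string[])seen.ToArray(typeof(string));
    }

    public static void Main()
    {
        string[] unique = GetUnique(new string[] { "abc", "bcd", "abc" });
        Console.WriteLine(string.Join(", ", unique)); // prints: abc, bcd
    }
}
```

Because Contains scans the whole list each time, the work grows roughly with the square of the array length, which is probably fine for small arrays.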

#4 Martyr2

Posted 22 December 2008 - 10:46 PM

I would go another direction and actually use a hashtable (or, generically, a HashSet). The beauty of that is that you can add every string to it and a duplicate never ends up stored twice: assigning through a Hashtable indexer overwrites the existing key, and HashSet<T>.Add ignores a value that is already present. So once you loop through and add to the hashtable/hashset, the duplicates will no longer exist.

// Items with duplicates
String[] values = { "item1", "item2", "item1", "item3", "item2" };

// Create a hashset of strings
System.Collections.Generic.HashSet<String> hash = new System.Collections.Generic.HashSet<String>();

// Loop through values and add to hashset
foreach (String val in values) {
     hash.Add(val);
}

// Now loop through the hashset to show you no more duplicates
foreach (String hval in hash)
{
     MessageBox.Show(hval);
}



As you will notice, the message box then shows you item1, item2 and item3. No duplicates. This is because each value is hashed when it is added; a duplicate hashes to the same bucket and compares equal to the value already stored there, so HashSet<T>.Add leaves the set unchanged (and returns false) instead of storing it again.
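
A small sketch of how HashSet<T>.Add reports duplicates through its return value. The helper name CountRejected is hypothetical. Note that HashSet<T> ships with .NET 3.5 (System.Core), so this assumes that version or later.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetAddDemo
{
    // Hypothetical helper: counts how many Add calls were rejected as duplicates.
    public static int CountRejected(string[] values)
    {
        HashSet<string> set = new HashSet<string>();
        int rejected = 0;
        foreach (string val in values)
        {
            // Add returns false and leaves the set unchanged
            // when an equal value is already present.
            if (!set.Add(val))
            {
                rejected++;
            }
        }
        return rejected;
    }

    public static void Main()
    {
        string[] values = { "item1", "item2", "item1", "item3", "item2" };
        Console.WriteLine(CountRejected(values)); // prints: 2
    }
}
```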

Enjoy!

"At DIC we be hashset tossing code ninjas... and no we do not do hash. Period." :snap:

#5 eclipsed4utoo

Posted 23 December 2008 - 06:44 AM

PsychoCoder, on 23 Dec, 2008 - 12:30 AM, said:

First, an ArrayList is much more efficient than a string array


I would like to correct this. A string array is more efficient than an ArrayList because with the string array, the objects remain the same type(string). With an ArrayList, when you add a string to the list, it is converted to an OBJECT datatype, and when you loop through the list "foreach string", those OBJECT datatypes are converted back to strings. This boxing/unboxing issue with the ArrayList causes a hit on performance more than it would if using a string array.

Now an ArrayList(and Generic List) are easier to use and have more ways of manipulating the data than arrays, but the ArrayList is certainly not more efficient than an array.

#6 beatles1692

Posted 23 December 2008 - 07:27 AM

Hi
I prefer this solution:

var values = new[] { "abc", "acd", "abc" };
var distinctValues = (from value in values select value).Distinct();
// or even better (an alternative to the line above)
var simplerDistinctValues = values.Distinct();




You need a using directive for the System.Linq namespace.
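
Since the original question asks for a string array, here is a minimal sketch of turning the Distinct() result back into one. Distinct() returns a lazy IEnumerable<string>, so ToArray() is needed to materialize it. The wrapper name GetUnique is an assumption for illustration, and this requires .NET 3.5 or later.

```csharp
using System;
using System.Linq;

public static class LinqUnique
{
    // Hypothetical wrapper around the LINQ call shown above.
    public static string[] GetUnique(string[] input)
    {
        // Distinct() uses the default equality comparer (value equality for strings);
        // ToArray() evaluates the query and copies the results into a new array.
        return input.Distinct().ToArray();
    }

    public static void Main()
    {
        string[] unique = GetUnique(new[] { "abc", "acd", "abc" });
        Console.WriteLine(unique.Length); // prints: 2
    }
}
```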

This post has been edited by beatles1692: 23 December 2008 - 07:29 AM


#7 baavgai

Posted 23 December 2008 - 07:42 AM

eclipsed4utoo, on 23 Dec, 2008 - 07:44 AM, said:

This boxing/unboxing issue with the ArrayList causes a hit on performance more than it would if using a string array.


You are assuming considerable overhead in boxing; I'm not sure I'd agree.

In any case, I'd prefer this method, for the original poster:
string[] GetUnique(string[] list) {
	List<string> uList = new List<string>();
	foreach (string s in list) {
		if (!uList.Contains(s)) { uList.Add(s); }
	}
	return uList.ToArray();
}



Here, we're using a generic List of string, presumably with fewer "boxing" issues than ArrayList. We're also taking advantage of the built-in ToArray method, which will feed us back an array based on the generic type automatically.

While I agree with the Hashtable method or Dictionary method, my feeling is that the price of a unique key lookup is being paid somewhere, so I'm not sure if it's particularly more effective. In some languages that practically run on associative arrays, it probably would be best, though.

#8 George2

Posted 24 December 2008 - 01:23 AM

Thanks PsychoCoder,

1.

PsychoCoder, on 22 Dec, 2008 - 09:30 PM, said:

First, an ArrayList is much more efficient than a string array, then using Generics is far more efficient than an ArrayList. For an ArrayList I would do something like this


Could you describe what you mean by "efficient", please? And why do you think ArrayList is more efficient than a string array, and generics more efficient than ArrayList?

2.

PsychoCoder, on 22 Dec, 2008 - 09:30 PM, said:

Now granted, in your situation your ArrayList or generic collection would have to be a form-level global so you can add to it as you go; these examples create a new object each time.


What do you mean by "form level global"? Do you mean a general solution which could be used in the future?

By the new object created, do you mean new List<string>() and new ArrayList()?

3. I think the following code is wrong. You need two loops. :-)

PsychoCoder, on 22 Dec, 2008 - 09:30 PM, said:

public static string[] UniqueValues(string[] value)
{
    //create a string array the length of the values
    //being passed to it
    string[] values = new string[value.Length];

    //loop through the initial array being passed
    //to the method
    for (int i = 0; i < value.Length; i++)
    {
        //make sure that "newValue" doesnt already
        //exist in our new string array
        if(!(values[i] == value[i]))
        {
            //add it to the new array
            values[i] = value[i];
        }
    }
    return values;
}



Hope that helps :)


regards,
George
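
For reference, a two-loop version using only arrays might look like the sketch below. This is one possible reading of the "two loops" remark, not code from any poster; the names ArrayOnlyUnique and GetUnique are hypothetical. The inner loop checks the values kept so far, and the result is trimmed to the number of unique entries so it carries no trailing nulls.

```csharp
using System;

public static class ArrayOnlyUnique
{
    // Hypothetical helper: de-duplicates using plain arrays only.
    public static string[] GetUnique(string[] input)
    {
        // Worst case every value is unique, so the buffer matches the input length.
        string[] buffer = new string[input.Length];
        int count = 0;

        for (int i = 0; i < input.Length; i++)
        {
            bool found = false;
            // Inner loop: compare against the values already kept.
            for (int j = 0; j < count; j++)
            {
                if (buffer[j] == input[i]) // == compares strings by value
                {
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                buffer[count++] = input[i];
            }
        }

        // Copy only the filled part into a correctly sized array.
        string[] result = new string[count];
        Array.Copy(buffer, result, count);
        return result;
    }

    public static void Main()
    {
        string[] unique = GetUnique(new string[] { "abc", "bcd", "abc" });
        Console.WriteLine(string.Join(", ", unique)); // prints: abc, bcd
    }
}
```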

I like your method, thanks n8wxs!

n8wxs, on 22 Dec, 2008 - 09:33 PM, said:

Use an ArrayList. Copy the first string in your array to the ArrayList. Then, for each additional string in your array, use the ArrayList.Contains() method to check whether the string matches a previous string. If not, copy it; otherwise skip it and go on to the next string.

See ArrayList Methods


regards,
George


Thanks Martyr2,

Martyr2, on 22 Dec, 2008 - 09:46 PM, said:

I would go another direction and actually use a hashtable (or, generically, a HashSet). The beauty of that is that you can add every string to it and a duplicate never ends up stored twice: assigning through a Hashtable indexer overwrites the existing key, and HashSet<T>.Add ignores a value that is already present. So once you loop through and add to the hashtable/hashset, the duplicates will no longer exist.

// Items with duplicates
String[] values = { "item1", "item2", "item1", "item3", "item2" };

// Create a hashset of strings
System.Collections.Generic.HashSet<String> hash = new System.Collections.Generic.HashSet<String>();

// Loop through values and add to hashset
foreach (String val in values) {
     hash.Add(val);
}

// Now loop through the hashset to show you no more duplicates
foreach (String hval in hash)
{
     MessageBox.Show(hval);
}



As you will notice, the message box then shows you item1, item2 and item3. No duplicates. This is because each value is hashed when it is added; a duplicate hashes to the same bucket and compares equal to the value already stored there, so HashSet<T>.Add leaves the set unchanged (and returns false) instead of storing it again.

Enjoy!

"At DIC we be hashset tossing code ninjas... and no we do not do hash. Period." :snap:


Your method works!

regards,
George


Thanks eclipsed4utoo,

eclipsed4utoo, on 23 Dec, 2008 - 05:44 AM, said:

PsychoCoder, on 23 Dec, 2008 - 12:30 AM, said:

First, an ArrayList is much more efficient than a string array


I would like to correct this. A string array is more efficient than an ArrayList because with the string array, the objects remain the same type(string). With an ArrayList, when you add a string to the list, it is converted to an OBJECT datatype, and when you loop through the list "foreach string", those OBJECT datatypes are converted back to strings. This boxing/unboxing issue with the ArrayList causes a hit on performance more than it would if using a string array.

Now an ArrayList(and Generic List) are easier to use and have more ways of manipulating the data than arrays, but the ArrayList is certainly not more efficient than an array.


I disagree with you; I think a string does not need boxing/unboxing. It is not a value type but a reference type. Any comments?

regards,
George


Sorry, beatles1692!

beatles1692, on 23 Dec, 2008 - 06:27 AM, said:

Hi
I prefer this solution:

var values = new[] { "abc", "acd", "abc" };
var distinctValues = (from value in values select value).Distinct();
// or even better (an alternative to the line above)
var simplerDistinctValues = values.Distinct();


You need a using directive for the System.Linq namespace.


I need to use .Net version 3.0. LINQ is from .Net 3.5.

regards,
George
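
Given the .NET 3.0 constraint, one option that avoids LINQ is a Dictionary used purely as a membership test next to a List that keeps the original order. Both types have been in System.Collections.Generic since .NET 2.0. This is only a sketch: the names DictionaryUnique and GetUnique are hypothetical, and it assumes the input has no null entries, since Dictionary keys cannot be null.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryUnique
{
    // Hypothetical helper: hash-based de-duplication without LINQ.
    // Assumes no element of input is null.
    public static string[] GetUnique(string[] input)
    {
        // The bool value is unused; only the keys matter.
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        List<string> ordered = new List<string>();

        foreach (string s in input)
        {
            // ContainsKey is a hash lookup, not a scan of everything kept so far.
            if (!seen.ContainsKey(s))
            {
                seen.Add(s, true);
                ordered.Add(s);
            }
        }
        return ordered.ToArray();
    }

    public static void Main()
    {
        string[] unique = GetUnique(new string[] { "abc", "bcd", "abc" });
        Console.WriteLine(string.Join(", ", unique)); // prints: abc, bcd
    }
}
```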


Hi baavgai,

baavgai, on 23 Dec, 2008 - 06:42 AM, said:

eclipsed4utoo, on 23 Dec, 2008 - 07:44 AM, said:

This boxing/unboxing issue with the ArrayList causes a hit on performance more than it would if using a string array.


You are assuming considerable overhead in boxing; I'm not sure I'd agree.

In any case, I'd prefer this method, for the original poster:
string[] GetUnique(string[] list) {
	List<string> uList = new List<string>();
	foreach (string s in list) {
		if (!uList.Contains(s)) { uList.Add(s); }
	}
	return uList.ToArray();
}



Here, we're using a generic List of string, presumably with fewer "boxing" issues than ArrayList. We're also taking advantage of the built-in ToArray method, which will feed us back an array based on the generic type automatically.

While I agree with the Hashtable method or Dictionary method, my feeling is that the price of a unique key lookup is being paid somewhere, so I'm not sure if it's particularly more effective. In some languages that practically run on associative arrays, it probably would be best, though.


Sorry, I disagree with you. Boxing/unboxing applies to value types, but string is a reference type, so no boxing/unboxing is needed. Please feel free to correct me if I am wrong.

regards,
George

#9 eclipsed4utoo

Posted 24 December 2008 - 09:48 AM

George2, on 24 Dec, 2008 - 03:23 AM, said:

2.

PsychoCoder, on 22 Dec, 2008 - 09:30 PM, said:

Now granted, in your situation your ArrayList or generic collection would have to be a form-level global so you can add to it as you go; these examples create a new object each time.


What do you mean by "form level global"? Do you mean a general solution which could be used in the future?

By the new object created, do you mean new List<string>() and new ArrayList()?


This is what he means...

// ....using statements....

namespace WindowsFormsApplication4
{
	public partial class Form1 : Form
	{
		//This is a form level global variable.
		//It is accessible to all methods and events of the form.
		List<string> listOfStrings = new List<string>();

		public Form1()
		{
			InitializeComponent();
		}

		private void Form1_Load(object sender, EventArgs e)
		{
			listOfStrings.Add("some new string");
		}
	}
}
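
The same idea without the Windows Forms dependency, as a rough sketch: the list lives in a field, so it survives between calls and values can be added one at a time as they arrive. The class name UniqueCollector and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class UniqueCollector
{
    // Instance-level field: it lives as long as the object does,
    // much like the form-level list in the example above.
    private readonly List<string> values = new List<string>();

    // Adds value only if it is not already stored; returns true when it was added.
    public bool AddIfMissing(string value)
    {
        if (values.Contains(value))
        {
            return false;
        }
        values.Add(value);
        return true;
    }

    // Returns a copy of the unique values collected so far.
    public string[] ToArray()
    {
        return values.ToArray();
    }

    public static void Main()
    {
        UniqueCollector collector = new UniqueCollector();
        Console.WriteLine(collector.AddIfMissing("abc")); // prints: True
        Console.WriteLine(collector.AddIfMissing("abc")); // prints: False
        Console.WriteLine(collector.ToArray().Length);    // prints: 1
    }
}
```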



#10 baavgai

Posted 24 December 2008 - 09:58 AM

George2, on 24 Dec, 2008 - 02:23 AM, said:

Sorry, I disagree with you. Boxing/unboxing applies to value types, but string is a reference type, so no boxing/unboxing is needed. Please feel free to correct me if I am wrong.


You are correct, of course. My nomenclature was a play on the poster I was referencing and not accurate.

What I should have described is the type obfuscation that comes from referencing an instance through its parent type. In searching for equality this should be irrelevant, since Equals is a base method of all objects and can be safely called without casting. The only overhead would come from casting from object to string, and that should be minimal.

Hope this makes sense. It's not really relevant to the code I offered, but you asked.
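
To make the boxing point concrete, here is a small sketch with made-up method names. Storing a string in an ArrayList keeps the very same reference, and Equals can be called on it through object without a cast, whereas converting an int to object allocates a new box each time.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    // A string is a reference type: the ArrayList stores the reference itself.
    public static bool SameReferenceAfterStore(string s)
    {
        ArrayList list = new ArrayList();
        list.Add(s);
        return object.ReferenceEquals(s, list[0]);
    }

    // Equals is virtual on object, so the string comparison runs without a cast.
    // Assumes s is not null.
    public static bool EqualsWithoutCast(string s)
    {
        object stored = s;
        string copy = new string(s.ToCharArray()); // same characters, separate instance
        return stored.Equals(copy);
    }

    // An int is a value type: each conversion to object creates a separate box.
    public static bool IntIsBoxedSeparately(int n)
    {
        object first = n;
        object second = n;
        return !object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(SameReferenceAfterStore("abc")); // prints: True
        Console.WriteLine(EqualsWithoutCast("abc"));       // prints: True
        Console.WriteLine(IntIsBoxedSeparately(42));       // prints: True
    }
}
```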