Generating Substrings

  • (3 Pages)
  • +
  • 1
  • 2
  • 3

31 Replies - 1578 Views - Last Post: 14 July 2013 - 12:09 AM Rate Topic: -----

#1 salazar  Icon User is offline

  • D.I.C Addict

Reputation: 86
  • View blog
  • Posts: 538
  • Joined: 26-June 13

Generating Substrings

Posted 03 July 2013 - 07:23 PM

Does anyone know of a more optimal yet simple way to generate all the substrings of a string?

set<string> generate_all_substrings(string s)
{
	
	set<string> substrings;
	string substring = "";
	string::iterator iter;
	for(iter = s.begin(); iter != s.end(); ++iter, substring = "")
	{
		for(string::iterator iter_2 = iter; iter_2 != s.end(); ++iter_2)
		{
			substring += *iter_2;
			substrings.insert(substring);	
		}		
		substrings.insert(substring);				
	}
	return substrings;	
}



Is This A Good Question/Topic? 0
  • +

Replies To: Generating Substrings

#2 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3162
  • View blog
  • Posts: 9,541
  • Joined: 05-May 12

Re: Generating Substrings

Posted 03 July 2013 - 08:04 PM

Yes. Use string::substr() instead.
http://en.cppreferen...c_string/substr

By building up the string one character at at time in your inner loop, your are incurring the cost of reallocating a string each character added if it is a naive implementation of the string class.

This post has been edited by Skydiver: 03 July 2013 - 08:05 PM

Was This Post Helpful? 1
  • +
  • -

#3 David W  Icon User is offline

  • DIC supporter
  • member icon

Reputation: 275
  • View blog
  • Posts: 1,764
  • Joined: 20-September 08

Re: Generating Substrings

Posted 04 July 2013 - 04:47 PM

Sometimes just getting the code,

'just to work as desired',

may result in NOT optimal code, (at first) ... :)

You may like to add some permutations to the original string to get all possible subsets
(if order is considered)...



#include <iostream>
#include <iomanip>
#include <string>
#include <set>

#include <algorithm> // re. next_permutation( beginPtr, endPtr );

// main //

    // using a C++ string //
    
    string s = "abcd";
    set< string > substrings;
    generate_all_substrings( substrings, s );
    while( next_permutation( s.begin(), s.end() ) )
    {
        generate_all_substrings( substrings, s );
    }
    print_set( substrings );
    
    substrings.clear();
    cout << endl;


This post has been edited by David W: 04 July 2013 - 04:50 PM

Was This Post Helpful? 1
  • +
  • -

#4 salazar  Icon User is offline

  • D.I.C Addict

Reputation: 86
  • View blog
  • Posts: 538
  • Joined: 26-June 13

Re: Generating Substrings

Posted 04 July 2013 - 10:55 PM

View PostSkydiver, on 03 July 2013 - 08:04 PM, said:

Yes. Use string::substr() instead.
http://en.cppreferen...c_string/substr

By building up the string one character at at time in your inner loop, your are incurring the cost of reallocating a string each character added if it is a naive implementation of the string class.


Would you kind enough to give your interpretation of how string::substring() is implemented?

Wouldn't the substring have to do the same thing I am doing in my for loop? Does it use a more efficient algorithm for getting the substring?
Was This Post Helpful? 0
  • +
  • -

#5 salazar  Icon User is offline

  • D.I.C Addict

Reputation: 86
  • View blog
  • Posts: 538
  • Joined: 26-June 13

Re: Generating Substrings

Posted 04 July 2013 - 11:02 PM

View PostDavid W, on 04 July 2013 - 04:47 PM, said:

You may like to add some permutations to the original string to get all possible subsets
(if order is considered)...


Yeah, it is nice if you can get it to work at all. But it does not mean it is optimal.

I only want to generate the substrings as they appear, not considering the different orders of the string, however.
Does this work when not considering permutations?
Was This Post Helpful? 0
  • +
  • -

#6 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3162
  • View blog
  • Posts: 9,541
  • Joined: 05-May 12

Re: Generating Substrings

Posted 05 July 2013 - 05:15 AM

View Postsalazar, on 05 July 2013 - 01:55 AM, said:

Would you kind enough to give your interpretation of how string::substring() is implemented?

Wouldn't the substring have to do the same thing I am doing in my for loop? Does it use a more efficient algorithm for getting the substring?

If were to implement substr(), the core of my code would allocate the space needed and then copy the characters across instead of allocate one character, copy, allocate one character, copy, etc.

I've not looked at the GCC implementation of substr(), but here's the Microsoft implementation which is based on the Dinkumware implementation:
	_Myt substr(size_type _Off = 0, size_type _Count = npos) const
		{	// return [_Off, _Off + _Count) as new string
		return (_Myt(*this, _Off, _Count, get_allocator()));
		}

	basic_string(const _Myt& _Right, size_type _Roff, size_type _Count,
		const _Alloc& _Al)
		: _Mybase(_Al)
		{	// construct from _Right [_Roff, _Roff + _Count) with allocator
		_Tidy();
		assign(_Right, _Roff, _Count);
		}

	_Myt& assign(const _Myt& _Right,
		size_type _Roff, size_type _Count)
		{	// assign _Right [_Roff, _Roff + _Count)
		if (_Right.size() < _Roff)
			_Xran();	// _Roff off end
		size_type _Num = _Right.size() - _Roff;
		if (_Count < _Num)
			_Num = _Count;	// trim _Num to size

		if (this == &_Right)
			erase((size_type)(_Roff + _Num)), erase(0, _Roff);	// substring
		else if (_Grow(_Num))
			{	// make room and assign new stuff
			_Traits::copy(this->_Myptr(),
				_Right._Myptr() + _Roff, _Num);
			_Eos(_Num);
			}
		return (*this);
		}

	static _Elem *__CLRCALL_OR_CDECL copy(_Elem *_First1, const _Elem *_First2,
		size_t _Count)
		{	// copy [_First2, _First2 + _Count) to [_First1, ...)
		return (_Count == 0 ? _First1
			: (_Elem *)_CSTD memcpy(_First1, _First2, _Count));
		}



_Grow() is what does the memory allocation, and as you can see, copy just copies all the data in one go.

There are a lot of smart people who have contributed to GCC's standard library. I would be very surprised if the implementation is not optimized as well.
Was This Post Helpful? 1
  • +
  • -

#7 salazar  Icon User is offline

  • D.I.C Addict

Reputation: 86
  • View blog
  • Posts: 538
  • Joined: 26-June 13

Re: Generating Substrings

Posted 05 July 2013 - 01:35 PM

Quote

If were to implement substr(), the core of my code would allocate the space needed and then copy the characters across instead of allocate one character, copy, allocate one character, copy, etc.

I didn't realize that I was allocating every time. Wow, that's cool.
Took me a little bit of time to understand, but I think I get the main flow of the code. Okay, thanks for this info, I'll try this and so how it goes.

This post has been edited by salazar: 05 July 2013 - 01:36 PM

Was This Post Helpful? 0
  • +
  • -

#8 salazar  Icon User is offline

  • D.I.C Addict

Reputation: 86
  • View blog
  • Posts: 538
  • Joined: 26-June 13

Re: Generating Substrings

Posted 06 July 2013 - 01:34 PM

It's still not efficient enough.
Was This Post Helpful? 0
  • +
  • -

#9 jimblumberg  Icon User is offline

  • member icon


Reputation: 3845
  • View blog
  • Posts: 11,735
  • Joined: 25-December 09

Re: Generating Substrings

Posted 06 July 2013 - 01:56 PM

Quote

It's still not efficient enough.

Not efficient enough for what?

Post your current code and describe exactly what your trying to accomplish. And if you have a problem, clearly state the problem.


Jim
Was This Post Helpful? 0
  • +
  • -

#10 vividexstance  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 630
  • View blog
  • Posts: 2,107
  • Joined: 31-December 10

Re: Generating Substrings

Posted 06 July 2013 - 02:06 PM

I'm just wondering how you're testing it to see if it's efficient or not?
Was This Post Helpful? 0
  • +
  • -

#11 #define  Icon User is offline

  • Duke of Err
  • member icon

Reputation: 1276
  • View blog
  • Posts: 4,395
  • Joined: 19-February 09

Re: Generating Substrings

Posted 06 July 2013 - 06:35 PM

Do you want speed, memory, or code optimization/efficiency?

You could pass the set as a reference parameter.

void generate_all_substrings(set<string> &substr, string str)



Your function could just generate one sub-string at a time, which might be more efficient in certain situations.
Was This Post Helpful? 0
  • +
  • -

#12 salazar  Icon User is offline

  • D.I.C Addict

Reputation: 86
  • View blog
  • Posts: 538
  • Joined: 26-June 13

Re: Generating Substrings

Posted 08 July 2013 - 11:36 PM

It's part of program i'm writing as a programming competition practice problem. On this program, I keep exceeding the time limit for execution. I'm trying to find a faster way that isn't too complex.

#include <algorithm>
#include <iostream>
#include <set>
using namespace std;
set<string> generate_all_substrings(string s)
{	
	set<string> substrings;
	string substring = "";
	size_t size = s.size();
	for(int start = 0; start < size; ++start, substring = "")
	{
		for(int count = 1; count <= size; ++count)
		{
			substring = s.substr(start, count);
			substrings.insert(substring);	
		}				
	}
	return substrings;
	
}


Was This Post Helpful? 0
  • +
  • -

#13 Skydiver  Icon User is offline

  • Code herder
  • member icon

Reputation: 3162
  • View blog
  • Posts: 9,541
  • Joined: 05-May 12

Re: Generating Substrings

Posted 09 July 2013 - 12:12 AM

Have you run a profiler to verify that where the majority of your time is spent?
Was This Post Helpful? 3
  • +
  • -

#14 salazar  Icon User is offline

  • D.I.C Addict

Reputation: 86
  • View blog
  • Posts: 538
  • Joined: 26-June 13

Re: Generating Substrings

Posted 09 July 2013 - 12:30 AM

What's a profiler?
Was This Post Helpful? 0
  • +
  • -

#15 jimblumberg  Icon User is offline

  • member icon


Reputation: 3845
  • View blog
  • Posts: 11,735
  • Joined: 25-December 09

Re: Generating Substrings

Posted 09 July 2013 - 12:36 AM

I agree with Skydiver it is possible that this function is not the source of the problem. It would be helpful to see the complete program.

Also I would recommend that you create the set<> instance in your calling function and pass that instance by reference into your function.

void generate_all_substrings(set<string>& substrings, string& s)


Passing the parameters by reference will avoid the copying that is being done in your current function.

Jim

This post has been edited by jimblumberg: 09 July 2013 - 12:36 AM

Was This Post Helpful? 2
  • +
  • -

  • (3 Pages)
  • +
  • 1
  • 2
  • 3