Super String Documentation

Overview

Super string is a set of souped-up string classes with fancy query, replacement, and type conversion functions. Super string has two variants: a mutable form (super_string) and an immutable form (const_super_string).

The immutable form is provided as an experiment based on discussions from the Boost list (see References). Because it never changes, it is thread safe and can optimize some operations. However, it is slower on some kinds of operations. See Performance for more details.

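Some of the key functions provided, all of which appear in the code examples below, include to_upper, trim, trim_left, append, insert_at, contains_regex, replace_all_regex, replace_first, and split_regex.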

Code Examples

#include "super_string/super_string.hpp"
//...

  super_string s("    (456789) [123]  2006-10-01    abcdef   ");
  s.to_upper();
  cout << s << endl;
  
  s.trim();  //lop off the whitespace on both sides
  cout << s << endl;
  
  double dbl = 1.23456;
  s.append(dbl);  //append any streamable type 
  s+= "  ";
  cout << s << endl;
  
  date d(2006, Jul, 1);
  s.insert_at(28, d);  //insert any streamable type
  cout << s << endl;
  
  //find the yyyy-mm-dd date format
  if (s.contains_regex("\\d{4}-\\d{2}-\\d{2}")) {
    //replace parens around digits with square brackets [the digits]
    s.replace_all_regex("\\(([0-9]+)\\)", "__[$1]__");
    cout << s << endl;
    
    
    //split the string on white space to process parts
    super_string::string_vector out_vec;
    unsigned int count = s.split_regex("\\s+", out_vec);
    if (count) {
      for(unsigned int i=0; i < out_vec.size(); ++i) {
        out_vec[i].replace_first("__","");  //get rid of first __ in string
        cout << i << "  " << out_vec[i] << endl;
      }
    }
  }
  
  //wide strings too...
  wsuper_string ws(L"   hello world ");
  ws.trim_left();
  wcout << ws << endl;
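
The append(dbl) and insert_at(28, d) calls above accept any streamable type. One plausible way to get that behaviour, sketched here with only the standard library, is to format the value through a std::ostringstream first. The helper names append_streamable and insert_streamable_at are hypothetical and are not part of super_string:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: format any type that supports operator<< and
// append the resulting text to the target string.
template <class T>
void append_streamable(std::string& target, const T& value) {
    std::ostringstream out;
    out << value;
    target += out.str();
}

// Hypothetical helper: same idea, but insert the text at position pos.
// std::string::insert throws std::out_of_range if pos > target.size().
template <class T>
void insert_streamable_at(std::string& target, std::size_t pos, const T& value) {
    std::ostringstream out;
    out << value;
    target.insert(pos, out.str());
}
```

With these helpers, append_streamable(s, 1.5) adds the text "1.5" to the end of s, using the default stream formatting for the type.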

In the immutable form, a mutating function does not change the string it is called on; it returns a new string, which must be assigned to a target. In this example the target is always the same string.

#include "super_string/const_super_string.hpp"
//...

  //const_super_string is immutable
  const_super_string s("    (456789) [123]  2006-10-01    abcdef   ");
  s = s.to_upper(); //"    (456789) [123]  2006-10-01    ABCDEF   "
  cout << s << endl;
  
  s = s.trim();  //"(456789) [123]  2006-10-01    ABCDEF"
  cout << s << endl;
  
  double dbl = 1.23456;
  s = s.append(dbl);  //"(456789) [123]  2006-10-01    ABCDEF1.23456"
  cout << s << endl;
  
  //find the yyyy-mm-dd date format
  if (s.contains_regex("\\d{4}-\\d{2}-\\d{2}")) {
    //replace parens around digits with square brackets [the digits]
    s = s.replace_all_regex("\\(([0-9]+)\\)", "__[$1]__");
    cout << s << endl;
    
    
    //split the string on white space to process parts
    const_super_string::string_vector out_vec;
    unsigned int count = s.split_regex("\\s+", out_vec);
    if (count) {
      for(unsigned int i=0; i < out_vec.size(); ++i) {
        out_vec[i] = out_vec[i].replace_first("__","");  //get rid of first __ in string (result must be assigned back)
        cout << i << "  " << out_vec[i] << endl;
      }
    }
  }
  
  //wide strings too...
  wconst_super_string ws(L"   hello world ");
  ws = ws.trim_left();  //assign the result back
  wcout << ws << endl;
  
  return 0;

}
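
The difference between the two forms can be sketched without the library. In the following illustration, trim_in_place and trimmed are hypothetical free functions over std::string, not super_string members. The first changes its argument; the second leaves its argument alone and returns a new value, which is why the const_super_string calls are written as s = s.trim().

```cpp
#include <string>

// Mutable style: the argument itself is changed.
void trim_in_place(std::string& s) {
    const char* ws = " \t\n\r";
    s.erase(0, s.find_first_not_of(ws));   // npos here erases the whole string
    s.erase(s.find_last_not_of(ws) + 1);   // npos + 1 wraps to 0, which is safe on an empty string
}

// Immutable style: the argument is untouched and a new value is returned.
std::string trimmed(const std::string& s) {
    std::string copy(s);
    trim_in_place(copy);
    return copy;
}
```

Calling trimmed(s) and discarding the result has no effect on s, which is the mistake the immutable style makes easy to write.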

Library Goals

This library has the following main goals:

Functional Requirements

Non-Functional Requirements

Overall, this class is mostly a convenience wrapper around functions available in boost.string_algo and boost.regex.
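
As an illustration of how thin such a wrapper can be, here is a sketch of a split-on-regex helper. It uses std::regex from the C++11 standard library in place of boost.regex, and the function name split_on_regex is hypothetical. The real split_regex may treat empty tokens and its return value differently.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `input` wherever `separator_regex` matches,
// store the non-empty pieces in `out`, and return how many were stored.
std::size_t split_on_regex(const std::string& input,
                           const std::string& separator_regex,
                           std::vector<std::string>& out) {
    out.clear();
    const std::regex separator(separator_regex);
    // Submatch index -1 selects the text between matches.
    std::sregex_token_iterator it(input.begin(), input.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0) {   // skip empty pieces, e.g. before a leading separator
            out.push_back(it->str());
        }
    }
    return out.size();
}
```

The caller only sees two strings and a vector; the iterator types and the regex object stay inside the helper.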

Building Super String

Files:

Test code:

Example code:

Docs

Design Decisions

Why a new string type when functional interfaces work fine?

The main rationale for a new type is to integrate and simplify the interface to regex and the other libraries. There are two dimensions to the cleaner code: 1) library user code is clearer, and 2) the interface to complex generic libraries is simplified.

Here's an example of how the type-based interface results in clearer client code:

  std::string s1("foo");
  std::string s2("bar");
  std::string s3("foo");
  //The next line makes me go read the docs again, every time
  replace_all(s1,s2,s3); //which string is modified exactly?
 or
  s1.replace_all(s2, s3); //obvious which string is modified here

Another reason for super_string is the simplification of documentation. Generic libraries have many template parameters, which often makes it difficult to focus on the user documentation. Just take regex_replace as a case in point:

template <class OutputIterator, class BidirectionalIterator, class traits,
class charT>
OutputIterator regex_replace(OutputIterator out,
                            BidirectionalIterator first,
                            BidirectionalIterator last,
                            const basic_regex<charT, traits>& e,
                            const basic_string<charT>& fmt,
                            match_flag_type flags = match_default);

template <class traits, class charT>
basic_string<charT> regex_replace(const basic_string<charT>& s,
                             const basic_regex<charT, traits>& e,
                             const basic_string<charT>& fmt,
                             match_flag_type flags = match_default);

My first reaction when I read this is: wow, interesting, but how do I use it? It's hard even for an experienced guy like me to see the forest for the template trees here. So I scroll down to the example and start reading the example code. OK, now I see it, and I can go back, consume it, ponder more... then realize, OK, I guess it's the second signature because I'm using a std::string... now I can go write some code. (Of course, I normally don't do it like this, because I go and look up some regex code I've already written.)

Now let's compare the JavaString.replaceAll short description:

  String replaceAll(String regex, String replacement)
           Replaces each substring of this string that matches the given
           regular expression with the given replacement.

Wow, OK, I don't need to see the example code; I can write code now. I might need to read more about the regex string rules, but no biggie, they follow expected conventions. After 2 minutes I'm testing code.

Of course, JavaString.replaceAll is just lame compared to what regex can do. But, you know, it covers most of what I use for typical day to day string processing. It's clean, easy, fast -- I can focus on other parts of my app rather than the template parameters for the string function.

Now let's examine the hastily created *pre-alpha* super_string docs:

template<class char_type>
void 
basic_super_string< char_type >::replace_all_regex(
                             const base_string_type & match_regex,
                             const base_string_type & replace_format)

Replace all instances of match_regex with replace_format.

     super_string s("(abc)3333()(456789) [123] (1) (cde)");

     //replace parens around digits with #--the digits--#
     s.replace_all_regex("\\(([0-9]+)\\)", "#--$1--#");

     //s == "(abc)3333()#--456789--# [123] #--1--# (cde)"

Right from the start there's only one signature and only one template parameter to document -- char_type is pretty easy to understand and doesn't really require explanation -- but the docs would be nicer without even that distraction. The context is string processing, so I don't have to explain that the regex function can work on vector<char> or whatever sequence I want. I've also ditched a couple of function parameters -- always going for the regex defaults. So super_string is more like JavaString -- very limited compared to full-up regex or string_algo, but easier to document and use for common cases.
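
The narrowing of the interface can be sketched in a few lines. The example below uses std::regex_replace from the C++11 standard library rather than boost.regex, and simple_string is a hypothetical stand-in, not the actual basic_super_string implementation. It exposes a single two-argument member and always uses the default match flags.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for a string type with a single, simple
// replace_all_regex member; not the real basic_super_string.
class simple_string {
public:
    explicit simple_string(const std::string& s) : text_(s) {}

    // Replace every match of match_regex with replace_format, in place.
    // The default (ECMAScript) format rules apply, so $1 refers to the
    // first capture group.
    void replace_all_regex(const std::string& match_regex,
                           const std::string& replace_format) {
        text_ = std::regex_replace(text_, std::regex(match_regex), replace_format);
    }

    const std::string& str() const { return text_; }

private:
    std::string text_;
};
```

The iterator-based overload and the flags argument are hidden on purpose; a caller who needs them can still reach for the regex library directly.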

Most of the explanation comes from this discussion thread from the Boost list.

Why use basic_string since its interface is a "mess"?

Because, like it or not, basic_string is part of the C++ standard library and is used widely. super_string isn't an attempt to redo basic_string, but rather to extend it with some very common functions that aren't easy with basic_string.

Isn't the interface of super_string too "fat"?

Yes, super_string has a large interface. However, all of the functions in the interface are typical needs for string processing. The super_string interface is similar in size to, or even smaller than, the interfaces of some other string classes or libraries.

Put another way, just because a function isn't a member of a class doesn't mean it isn't part of the programmer interface. If you consider the size of the interface defined by boost string algorithms, boost regex, and boost format, the super_string interface is small in comparison.

Performance

Included in the package are some performance tests that give a rough comparison of the immutable const_super_string against the mutable super_string. As expected, const_super_string is slightly faster for some non-mutating operations: in the run below it is about 2% faster on the contains regex test and about 6% faster on the split regex test. For mutating functions it is a bit slower: about 14% on the append test and about 28% on the trim test.

The following is the output of the performance program on Linux compiled with gcc-4.0 with -O3.

500000 iterations of const append test: 0 --> 00:00:02.455757
500000 iterations of const append test: 1 --> 00:00:02.525834
500000 iterations of const append test: 2 --> 00:00:02.515612
500000 iterations of const append test: 3 --> 00:00:02.483702
500000 iterations of const append test: 4 --> 00:00:02.512949
500000 iterations of const append test: 5 --> 00:00:02.508725
500000 iterations of const append test: 6 --> 00:00:02.534567
500000 iterations of const append test: 7 --> 00:00:02.513080
500000 iterations of const append test: 8 --> 00:00:02.515201
500000 iterations of const append test: 9 --> 00:00:02.506260
const append test --> 10 trials 500000 iterations/trial  total elapsed: 00:00:25.071687
500000 iterations of mutable append test: 0 --> 00:00:02.136063
500000 iterations of mutable append test: 1 --> 00:00:02.190057
500000 iterations of mutable append test: 2 --> 00:00:02.149970
500000 iterations of mutable append test: 3 --> 00:00:02.185335
500000 iterations of mutable append test: 4 --> 00:00:02.188219
500000 iterations of mutable append test: 5 --> 00:00:02.206448
500000 iterations of mutable append test: 6 --> 00:00:02.274351
500000 iterations of mutable append test: 7 --> 00:00:02.192235
500000 iterations of mutable append test: 8 --> 00:00:02.193230
500000 iterations of mutable append test: 9 --> 00:00:02.188578
mutable append test --> 10 trials 500000 iterations/trial  total elapsed: 00:00:21.904486
1000000 iterations of const trim test: 0 --> 00:00:01.484179
1000000 iterations of const trim test: 1 --> 00:00:01.488741
1000000 iterations of const trim test: 2 --> 00:00:01.489407
1000000 iterations of const trim test: 3 --> 00:00:01.498976
1000000 iterations of const trim test: 4 --> 00:00:01.501453
1000000 iterations of const trim test: 5 --> 00:00:01.501348
1000000 iterations of const trim test: 6 --> 00:00:01.503969
1000000 iterations of const trim test: 7 --> 00:00:01.512235
1000000 iterations of const trim test: 8 --> 00:00:01.501978
1000000 iterations of const trim test: 9 --> 00:00:01.495517
const trim test --> 10 trials 1000000 iterations/trial  total elapsed: 00:00:14.977803
1000000 iterations of mutable trim  test: 0 --> 00:00:01.160393
1000000 iterations of mutable trim  test: 1 --> 00:00:01.160549
1000000 iterations of mutable trim  test: 2 --> 00:00:01.165937
1000000 iterations of mutable trim  test: 3 --> 00:00:01.173893
1000000 iterations of mutable trim  test: 4 --> 00:00:01.173779
1000000 iterations of mutable trim  test: 5 --> 00:00:01.182267
1000000 iterations of mutable trim  test: 6 --> 00:00:01.173212
1000000 iterations of mutable trim  test: 7 --> 00:00:01.172464
1000000 iterations of mutable trim  test: 8 --> 00:00:01.171245
1000000 iterations of mutable trim  test: 9 --> 00:00:01.159949
mutable trim test --> 10 trials 1000000 iterations/trial  total elapsed: 00:00:11.693688
100000 iterations of const contains regex test: 0 --> 00:00:02.701513
100000 iterations of const contains regex test: 1 --> 00:00:02.729457
100000 iterations of const contains regex test: 2 --> 00:00:02.704867
100000 iterations of const contains regex test: 3 --> 00:00:02.700814
100000 iterations of const contains regex test: 4 --> 00:00:02.702312
100000 iterations of const contains regex test: 5 --> 00:00:02.699262
100000 iterations of const contains regex test: 6 --> 00:00:02.703698
100000 iterations of const contains regex test: 7 --> 00:00:02.703122
100000 iterations of const contains regex test: 8 --> 00:00:02.704925
100000 iterations of const contains regex test: 9 --> 00:00:02.694748
const contains regex test --> 10 trials 100000 iterations/trial  total elapsed: 00:00:27.044718
100000 iterations of mutable contains regex test: 0 --> 00:00:02.781685
100000 iterations of mutable contains regex test: 1 --> 00:00:02.759013
100000 iterations of mutable contains regex test: 2 --> 00:00:02.762457
100000 iterations of mutable contains regex test: 3 --> 00:00:02.761785
100000 iterations of mutable contains regex test: 4 --> 00:00:02.761454
100000 iterations of mutable contains regex test: 5 --> 00:00:02.761979
100000 iterations of mutable contains regex test: 6 --> 00:00:02.760832
100000 iterations of mutable contains regex test: 7 --> 00:00:02.763009
100000 iterations of mutable contains regex test: 8 --> 00:00:02.760816
100000 iterations of mutable contains regex test: 9 --> 00:00:02.760578
mutable contains regex test --> 10 trials 100000 iterations/trial  total elapsed: 00:00:27.633608
100000 iterations of const split regex test: 0 --> 00:00:01.636014
100000 iterations of const split regex test: 1 --> 00:00:01.636228
100000 iterations of const split regex test: 2 --> 00:00:01.635880
100000 iterations of const split regex test: 3 --> 00:00:01.632247
100000 iterations of const split regex test: 4 --> 00:00:01.637720
100000 iterations of const split regex test: 5 --> 00:00:01.691922
100000 iterations of const split regex test: 6 --> 00:00:01.682272
100000 iterations of const split regex test: 7 --> 00:00:01.657319
100000 iterations of const split regex test: 8 --> 00:00:01.654802
100000 iterations of const split regex test: 9 --> 00:00:01.652675
const split regex test --> 10 trials 100000 iterations/trial  total elapsed: 00:00:16.517079
100000 iterations of mutable split regex test: 0 --> 00:00:01.766323
100000 iterations of mutable split regex test: 1 --> 00:00:01.766020
100000 iterations of mutable split regex test: 2 --> 00:00:01.739266
100000 iterations of mutable split regex test: 3 --> 00:00:01.766605
100000 iterations of mutable split regex test: 4 --> 00:00:01.767521
100000 iterations of mutable split regex test: 5 --> 00:00:01.739031
100000 iterations of mutable split regex test: 6 --> 00:00:01.767348
100000 iterations of mutable split regex test: 7 --> 00:00:01.765335
100000 iterations of mutable split regex test: 8 --> 00:00:01.740199
100000 iterations of mutable split regex test: 9 --> 00:00:01.765060
mutable split regex test --> 10 trials 100000 iterations/trial  total elapsed: 00:00:17.582708
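
The output above follows a simple pattern: each test runs a fixed number of iterations per trial, reports the time for each trial, then reports the total. The performance program itself is not shown here, so the following is only a sketch of such a harness using std::chrono. The names run_trials and trial_result are hypothetical, and the sketch prints seconds rather than the hh:mm:ss format seen above.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical result record: one elapsed time per trial, in seconds.
struct trial_result {
    std::vector<double> per_trial_seconds;
    double total_seconds = 0.0;
};

// Run `body` `iterations` times per trial, for `trials` trials,
// printing one line per trial and a summary line at the end.
trial_result run_trials(const std::string& name,
                        std::size_t trials,
                        std::size_t iterations,
                        const std::function<void()>& body) {
    using clock = std::chrono::steady_clock;
    trial_result result;
    for (std::size_t t = 0; t < trials; ++t) {
        const clock::time_point start = clock::now();
        for (std::size_t i = 0; i < iterations; ++i) {
            body();
        }
        const std::chrono::duration<double> elapsed = clock::now() - start;
        result.per_trial_seconds.push_back(elapsed.count());
        result.total_seconds += elapsed.count();
        std::cout << iterations << " iterations of " << name << ": " << t
                  << " --> " << elapsed.count() << " s\n";
    }
    std::cout << name << " --> " << trials << " trials " << iterations
              << " iterations/trial  total elapsed: " << result.total_seconds
              << " s\n";
    return result;
}
```

Repeating each test over several trials, as the original output does, makes it easy to see how much the timings vary from run to run before comparing the totals.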

Change History

version 1 -- Uploaded July 1, 2006

version 2

References


Generated on Sun Jul 9 15:43:03 2006 for SuperString by  doxygen 1.4.6