STLAlgorithmExtensions/CopyIf

copy_if is a much-requested and often-implented algorithm which really should have been in the Standard library initially.

In order to support Input Iterators, which can only be dereferenced once, we have to copy each element so we can run the predicate against it, _and_ copy it into the destination if necessary. This is inefficient for forward+ iterators, so we specialize. Also, to support compilers (like MSVC) which don't support partial specialization, and so don't have iterator traits for pointers, we specialize on pointers to be random access iterators, too.

 namespace helper
 {
   template<typename IteratorTag?>
   struct CopyIfHelper2?
   {
     template<typename InputIterator?,typename OutputIterator?,
              typename Predicate>
     inline static void copy_if(InputIterator? first,InputIterator? last,OutputIterator? dest,Predicate pred)
     {
       for(;first!=last;
           ++first)
       {
         if(pred(*first))
         {
           *dest=*first;
           ++dest;
         }
       }
     }
   };

   template<>
   struct CopyIfHelper2?<std::input_iterator_tag>
   {
     template<typename InputIterator?,typename OutputIterator?,
              typename Predicate>
     inline static void copy_if(InputIterator? first,InputIterator? last,OutputIterator? dest,Predicate pred)
     {
       for(;first!=last;
           ++first)
       {
         typename std::iterator_traits<InputIterator?>::value_type value(*first);
         if(pred(value))
         {
           *dest=value;
           ++dest;
         }
       }
     }
   };

   template<bool isPtr>
   struct CopyIfHelper?
   {
     template<typename InputIterator?,typename OutputIterator?,
              typename Predicate>
     inline static void copy_if(InputIterator? first,InputIterator? last,OutputIterator? dest,Predicate pred)
     {
       typedef typename std::iterator_traits<InputIterator?>::iterator_category TagType?;

       CopyIfHelper2?<TagType?>::copy_if(first,last,dest,pred);
     }
   };

   template<>
   struct CopyIfHelper?<true>
   {
     template<typename InputValue?,typename OutputIterator?,
              typename Predicate>
     inline static void copy_if(InputValue?* first,InputValue?* last,OutputIterator? dest,Predicate pred)
     {
       CopyIfHelper2?<std::random_access_iterator_tag>::copy_if(first,last,dest,pred);
     }
   };

 template<typename InputIterator?,typename OutputIterator?,
          typename Predicate>
 inline void copy_if(InputIterator? first,InputIterator? last,OutputIterator? dest,Predicate pred)
 {
   helper::CopyIfHelper?<boost::is_pointer<InputIterator?>::value>::copy_if(first,last,dest,pred);
 }

Never mind my last idiot post. Brain freeze. Sorry!

OK, I'm better now -- I've been using a different library where dereferencing an iterator DOES increment it, so I got confused.

Nevertheless, I still have a question/suggestion:

I don't see where the specification of an Input Iterator states that it can be dereferenced only once. The current value is lost when it gets incremented, but in its current position, it can be dereferenced multiple times. In particular, looking at the SGI remove_copy_if, it looks roughly like this (note two dereferences of __first):

 template <class _InputIter?, class _OutputIter?, class _Predicate>
 _OutputIter? remove_copy_if(_InputIter? __first, _InputIter? __last,
                            _OutputIter? __result, _Predicate __pred) {
   for ( ; __first != __last; ++__first)
     if (!__pred(*__first)) {
       *__result = *__first;
       ++__result;
     }
   return __result;
 }

it appears to me that copy_if can be written more simply as follows, without any of the helper objects, or worrying about the iterator category:

template <typename InputIterator, typename OutputIterator, typename Predicate>
OutputIterator copy_if(InputIterator first, InputIterator last, OutputIterator dest,
                       Predicate p)
{
    for (first != last; ++first)
    {
        if (p(*first))
        {
            *dest = *first;
            ++dest;
        }
    }
    return dest;
}

My interpretation of the Input Iterator requirements is that I can fulfil the reqs by writing an iterator like so:

struct InputIterator
{
    typedef XXX value_type;

    value_type get_and_move(); // gets value and moves forward, like fgetc

    InputIterator():
        dereferenced(false)
    {}

    value_type operator*()
    {
        value_type result=get_and_move();
        dereferenced=true;
        return result;
    }
    InputIterator& operator++()
    {
        if(!dereferenced)
        {
            get_and_move();
        }
        else
        {
            dereferenced=false;
        }
    }

    bool dereferenced;
};

It is for this reason, that I have filed a defect report about remove_copy_if, and have posted a bug report for STLport about the issue.

The committee have responded to my defect report to say "there is no basis for the assumption that an InputIterator? cannot be dereferenced twice", so the iterator I gave as an example above is not valid. Therefore CopyIfHelper2? can be done-away with. In fact, we don't need either of the helpers, since we don't need to check the iterator tag, so we're left with the standard implementation:

    template<typename InputIterator,typename OutputIterator,
      typename Predicate>
    inline void copy_if(InputIterator first,InputIterator last,OutputIterator dest,Predicate pred)
    {
 for(;first!=last;
     ++first)
	{
	    if(pred(*first))
	    {
		*dest=*first;
		++dest;
	    }
	}
    }

I much prefer having a copy_if that allows you to copy when matching a certain value OR when a predicate returns true. Is there any reason not to allow a general version of copy_if as above (except Predicate is essentially Type) and a specialised version for an unary predicate?

The generalisation is (minor modification from above):

    // generalised copy_if to copy when matching a value
    template<typename InputIterator,typename OutputIterator, typename Type>
    inline void copy_if(InputIterator first,InputIterator last,OutputIterator dest,Type value)
    {
        for(; first!=last; ++first)
        {
            if(value == *first))
            {
                *dest=*first;
                ++dest;
            }
        }
    }

Essentially the specialisation becomes


    // specialised copy_if for an unary predicate
    template
    < 
        typename InputIterator
    ,   typename OutputIterator
    ,   typename UnaryPredicateReturnType
    ,   typename UnaryPredicateArgumentType
    >
    inline void copy_if(InputIterator first,InputIterator last,OutputIterator dest,
                        UnaryPredicateReturnType( * const pred )( UnaryPredicateArgumentType ))
    {
        for(; first!=last; ++first)
        {
            if(pred(*first))
            {
                *dest=*first;
                ++dest;
            }
        }
    }

Having versions of algorithms for values and predicates seems to be a common requirement. In the Standard Library we have remove and remove_if, for example. Any reason why they couldn't use the same name, and the technique given above?

Because a predicate is not necessarily a function pointer, but can be a function object. There may be an ambiguity issue too.

Yep, you're right, and I take back the above proposal. I tried it with binders and it fell over. Nevertheless, I think it's handy having algorithms that match on a value, and also ones that match on a predicate. But copy_if and copy_if_pred seems a bit unwieldy. Do we do without one, or have two functions?

Disclaimer: This site not officially maintained by Boost Developers