Searching a string that contains nulls

  • (3 Pages)
  • +
  • 1
  • 2
  • 3

37 Replies - 1729 Views - Last Post: 23 June 2019 - 09:02 AM Rate Topic: -----

#31 Skydiver   User is offline

  • Code herder
  • member icon

Reputation: 6985
  • View blog
  • Posts: 23,753
  • Joined: 05-May 12

Re: Searching a string that contains nulls

Posted 20 June 2019 - 05:53 PM

Quote

If you use malloc(3)/calloc(3), you need to call free(3) for the allocated memories. In contrast, you don't need to free each memory chunks in memory pool. You just call apr_pool_destroy() for the memory pool and it frees all the memory chunks.


But by not freeing even within the memory pool, won't you exhaust the space within the pool itself? Is this why the next comment within that same page was:

Quote

REMARK: By default, memory pool manager never returns allocated memory back to the system. If a program runs for a long time, it would have problem. I recommend you to specify the upper limit as follows:


I was originally thinking that each web request would be essentially the "session" that could mark the lifetime of the pool as indicated also in that tutorial, but what I thought I understood from the other thread as well as this thread was that the session could get pretty big before it ends.

Anyway, I learned something new.
Was This Post Helpful? 0
  • +
  • -

#32 german-one   User is offline

  • D.I.C Head
  • member icon

Reputation: 22
  • View blog
  • Posts: 57
  • Joined: 05-August 18

Re: Searching a string that contains nulls

Posted 21 June 2019 - 12:37 AM

Quote

OK, I may need to deploy this new version sooner rather than later then. This is the original code that was delivered:

And you still don't know how to implement it?
static char *read_file(request_rec *r, const char *filename) {
  FILE *f = fopen(filename, "rb");
  if (!f) {
    return NULL;
  }

  fseek(f, 0, SEEK_END);
  long length = ftell(f);
  fseek(f, 0, SEEK_SET);
  char *buffer = apr_pcalloc(r->pool, length + 1);

  if (buffer == NULL || fread(buffer, 1, length, f) < length) {
    fclose(f);
    return NULL;
  }

  fclose(f);

  for (char* begin = buffer, *end = buffer + length, *found = NULL; begin < end; begin = ++found)
  {
    found = (char*)memchr(begin, 0, end - begin);
    if (found == NULL)
      break;

    *found = ' ';
  }

  return buffer;
}




Quote

the session could get pretty big before it ends.

Also my understanding. I would have thought twice before I used this memory pool library. Seems like it has the habit to bloat processes.
Was This Post Helpful? 0
  • +
  • -

#33 ArtificialSoldier   User is offline

  • D.I.C Lover
  • member icon

Reputation: 2351
  • View blog
  • Posts: 7,185
  • Joined: 15-January 14

Re: Searching a string that contains nulls

Posted 21 June 2019 - 11:16 AM

Quote

And you still don't know how to implement it?

I understand how to implement it, I'm pointing out that the original code, running right now on all 7 of our production servers, was delivered like that, where the buffer is the length of the file, and that, because of that, it would probably be a good idea for me to get the new code into production soon.

Note also that Apache does recycle its worker processes. Right now, on one of our shared servers, the Apache child process using the most memory is using 0.07% of the available memory.

It looks like right now I have Apache configured to recycle a child process after it serves 10,000 requests.
Was This Post Helpful? 0
  • +
  • -

#34 german-one   User is offline

  • D.I.C Head
  • member icon

Reputation: 22
  • View blog
  • Posts: 57
  • Joined: 05-August 18

Re: Searching a string that contains nulls

Posted 21 June 2019 - 12:11 PM

Seems like I did sound rude. Sorry, that was not my intention.

Quote

Right now, on one of our shared servers, the Apache child process using the most memory is using 0.07% of the available memory.

OK, then it's certainly not a problem.
Was This Post Helpful? 0
  • +
  • -

#35 german-one   User is offline

  • D.I.C Head
  • member icon

Reputation: 22
  • View blog
  • Posts: 57
  • Joined: 05-August 18

Re: Searching a string that contains nulls

Posted 22 June 2019 - 03:12 PM

I wrote a MemMem function for you (analogous to strstr). It avoids the zero byte replacement and thus, you don't have to run over the memory block twice.
It lacks a smart search algorithm (such as Boyer-Moore or similar) but I think I can't hit the performance of memchr and memcmp if I implement it myself.
#include <stdio.h>
#include <string.h>

/** \brief Finds the first occurrence of the target byte sequence in the block of memory pointed to by buffer.
 *
 *  \param  buffer      Pointer to the block of memory where the search is performed.
 *  \param  bufferSize  Number of bytes in buffer to be analyzed.
 *  \param  target      Pointer to the block of memory to search for.
 *  \param  targetSize  Number of bytes in target to be searched.
 *
 *  \return Pointer to the first byte in buffer where the target sequence is found, or NULL if the target sequence is not found. */
void *MemMem(const void *buffer, size_t bufferSize, const void *target, size_t targetSize)
{
  if (buffer == NULL || target == NULL || targetSize > bufferSize || bufferSize == 0 || targetSize == 0)
    return NULL;

  for (
    const unsigned char *begin = (const unsigned char*)buffer,
                        *end = begin + bufferSize - (targetSize - 1),
                        *found = NULL;
    begin < end;
    begin = ++found
  )
  {
    found = (const unsigned char*)memchr(begin, *(const unsigned char*)target, end - begin);
    if (found == NULL)
      return NULL;

    if (memcmp(found, target, targetSize) == 0)
      return (void*)found;
  }

  return NULL;
}


int main(void)
{
  const char buffer[] = {'\1', '\0', '\2', '\3', '\4', '\0', '\1', '\2', '\3', '\4'},
             target[] = {'\0', '\1', '\2', '\3', '\4'};

  char *found = (char*)MemMem(buffer, sizeof(buffer), target, sizeof(target));
  if (found == NULL)
    fputs("Target not found.\n", stderr);
  else
    printf("buffer   %p\nfound at %p (offset: %td)\n", buffer, found, found - buffer);

  return 0;
}



Output:
buffer   000000000060FE16
found at 000000000060FE1B (offset: 5)


I'm not sure if it is able to outperform the former approach. Just give it a go.

This post has been edited by german-one: 22 June 2019 - 03:16 PM

Was This Post Helpful? 0
  • +
  • -

#36 Skydiver   User is offline

  • Code herder
  • member icon

Reputation: 6985
  • View blog
  • Posts: 23,753
  • Joined: 05-May 12

Re: Searching a string that contains nulls

Posted 22 June 2019 - 06:16 PM

Based on my understanding of the other thread, there are actually multiple string searches that occur over the same buffer. Perhaps Rabin-Karp or Aho-Corasick will be better algorithms to use since these two are geared for matching multiple strings simultaneously.

But then again, it comes down to the question of is the increase in performance worth the cost of developer time to do initial implementation, and then the time for some new developer to understand the could and be able to do maintenance down the road. Some of the maintenance cost can be pre-paid by having extensive unit tests up front that not only verify correctness, but as well as can detect performance degradation in case the later developers goof.
Was This Post Helpful? 0
  • +
  • -

#37 german-one   User is offline

  • D.I.C Head
  • member icon

Reputation: 22
  • View blog
  • Posts: 57
  • Joined: 05-August 18

Re: Searching a string that contains nulls

Posted 23 June 2019 - 03:15 AM

The main problem is that I failed every time when I tried to beat the performance of the optimized standard functions. As soon as you have to do byte-wise operations this will be almost impossible. Those functions are written in Assembly and my understanding is that they process a chunk of memory with a native size (4 or maybe 8 bytes) at once. That's the reason why I think it's better to use the library functions.

The above function uses memchr to find the first byte of the search sequence and memcmp to determine if all bytes match at the found position. These functions are optimized as described.

I thought about to change this function a little. Once the first byte was found you don't need to compare it again using memcmp. But even that would add complexity because now you have to handle the special case if targetSize is 1. You save the comparison of only one byte in a function that compares several bytes at once. Not worth it ...

EDIT: If I move the special case out of the loop it might not be too bad though.
/** \brief Finds the first occurrence of the target byte sequence in the block of memory pointed to by buffer.
 *
 *  \param  buffer      Pointer to the block of memory where the search is performed.
 *  \param  bufferSize  Number of bytes in buffer to be analyzed.
 *  \param  target      Pointer to the block of memory to search for.
 *  \param  targetSize  Number of bytes in target to be searched.
 *
 *  \return Pointer to the first byte in buffer where the target sequence is found, or NULL if the target sequence is not found. */
void *MemMem(const void *buffer, size_t bufferSize, const void *target, size_t targetSize)
{
  if (buffer == NULL || target == NULL || targetSize > bufferSize || bufferSize == 0 || targetSize == 0)
    return NULL;

  // Special case: If targetSize is 1 then a single call of memchr returns the the position of the first occurrence of the byte that target points to.
  if (targetSize == 1)
    return (void*)memchr(buffer, *(const unsigned char*)target, bufferSize);

  const unsigned char *begin = (const unsigned char*)buffer;      // start position where the search is performed
  const unsigned char *end = begin + bufferSize - (--targetSize); // position where searching using memchr stops (Note: from now on targetSize is the size of targetTail)
  const unsigned char *targetHead = (const unsigned char*)target; // points to the first byte in target
  const unsigned char *targetTail = targetHead + 1;               // points to the second byte in target
  while (begin < end)
  {
    const unsigned char *found = (const unsigned char*)memchr(begin, *targetHead, end - begin); // search the first byte of target
    if (found == NULL)
      return NULL;

    begin = found + 1;
    if (memcmp(begin, targetTail, targetSize) == 0) // compare the next bytes in target
      return (void*)found;
  }

  return NULL;
}


This post has been edited by german-one: 23 June 2019 - 04:26 AM

Was This Post Helpful? 1
  • +
  • -

#38 Skydiver   User is offline

  • Code herder
  • member icon

Reputation: 6985
  • View blog
  • Posts: 23,753
  • Joined: 05-May 12

Re: Searching a string that contains nulls

Posted 23 June 2019 - 09:02 AM

View Postgerman-one, on 23 June 2019 - 06:15 AM, said:

The main problem is that I failed every time when I tried to beat the performance of the optimized standard functions. As soon as you have to do byte-wise operations this will be almost impossible. Those functions are written in Assembly and my understanding is that they process a chunk of memory with a native size (4 or maybe 8 bytes) at once. That's the reason why I think it's better to use the library functions.

And they also take advantage of vectorization of operations if the CPU supports it. I've not kept up with current trends of compiler optimizations, but I've not seen or heard much about C/C++ compilers taking advantage of CPU vector support, except in the simplest of cases when a loop is unrolled and the compiler detects that what you are trying to do can be applied in parallel as a vector.
Was This Post Helpful? 1
  • +
  • -

  • (3 Pages)
  • +
  • 1
  • 2
  • 3