7 Replies - 5004 Views - Last Post: 24 July 2008 - 12:33 AM

#1 adw888

  • New D.I.C Head

Reputation: 7
  • Posts: 24
  • Joined: 10-July 08

Ensuring Data Alignment

Posted 22 July 2008 - 04:05 PM

For a time-critical part of an application, I am using an optimised memcpy routine written by AMD, which can be found on page 198 (page 178 of the printed document) of this PDF document. This is working well and has yielded measurable gains in performance.

However, the document says (emphasis mine):

Quote

Data alignment is strongly recommended for good performance, but this code can handle non-aligned blocks.

So my question is simply: how can I ensure that my source and destination char arrays are aligned?
Is it good enough that they will always have a size that is a multiple of 64, or is there something more I need to do?

Thanks in advance. :)


Replies To: Ensuring Data Alignment

#2 perfectly.insane

  • D.I.C Addict

Reputation: 70
  • Posts: 644
  • Joined: 22-March 08

Re: Ensuring Data Alignment

Posted 22 July 2008 - 07:12 PM

Looking at this, I think your data should be QWORD aligned (meaning the addresses should be multiples of 8). And as you've already stated, your size should be a multiple of 64 bytes, as the routine falls back to a non-MMX copy for any remainder smaller than that. Whether or not you're getting QWORD-aligned addresses will depend on how you're allocating the memory (i.e. the malloc implementation). I haven't investigated the details; your allocator might already do this, but then again it might give addresses that are only DWORD aligned (which would, I think, cause a short 4-byte copy at the beginning before the MMX copy begins).

One thing to note, though, is that having blocks that are 64-byte multiples does not guarantee optimal performance (in theory, not necessarily in practice). If you have 68 bytes to copy and round up to 128, you're copying 60 more bytes than necessary. I'm not sure whether switching from the MMX copy to another method just to copy the last few bytes (in this case, multiple movsd instructions, but in some cases the rep movsb instruction, which is infamous for being slow) is faster or slower.
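To make the arithmetic in the 68-byte example concrete, here is a small sketch of the two options: copy whole 64-byte blocks and leave a tail for slower code, or pad the request up to the next multiple of 64. The helper names (`block_bytes`, `tail_bytes`, `rounded_up`, `padding`) are hypothetical and are not part of AMD's routine; the 64-byte block size is taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE ((size_t)64)  /* block size of the fast copy loop, per the thread */

/* Bytes that can go through the 64-byte block loop. */
static size_t block_bytes(size_t n) { return n / BLOCK_SIZE * BLOCK_SIZE; }

/* Bytes left over for the slower tail copy. */
static size_t tail_bytes(size_t n) { return n % BLOCK_SIZE; }

/* Alternative: pad the request up to the next multiple of 64. */
static size_t rounded_up(size_t n)
{
    assert(n <= SIZE_MAX - (BLOCK_SIZE - 1));  /* avoid overflow when adding */
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
}

/* Extra bytes copied because of that padding. */
static size_t padding(size_t n) { return rounded_up(n) - n; }
```

For 68 bytes this gives 64 bytes in whole blocks plus a 4-byte tail, or 60 bytes of padding if the request is rounded up to 128; sizes such as 2048 or 4096 need neither.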

This post has been edited by perfectly.insane: 22 July 2008 - 07:25 PM


#3 adw888

  • New D.I.C Head

Reputation: 7
  • Posts: 24
  • Joined: 10-July 08

Re: Ensuring Data Alignment

Posted 23 July 2008 - 10:07 AM

perfectly.insane, on 23 Jul, 2008 - 03:12 AM, said:

Looking at this, I think your data should be QWORD aligned (meaning, the addresses should be in multiples of 8). ...

Firstly, thank you for your reply. :^:

I am allocating the memory using a standard malloc:
char * bData = (char *) malloc(lDataSize);
where lDataSize is always a multiple of 64.
I'll have a look at the addresses it gives me, but I'm guessing that it will not automatically ensure they are QWORD aligned.
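A quick way to inspect those addresses is to test the low bits of the pointer. A minimal sketch, assuming the alignment being checked is a power of two; `is_aligned` is a hypothetical helper name, and converting the pointer to `uintptr_t` (from `<stdint.h>`) is the usual way to get at its numeric value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if ptr lies on a multiple of align bytes.
   align is assumed to be a power of two (8 for QWORD, 64 for a 64-byte boundary). */
static int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (uintptr_t)(align - 1)) == 0;
}
```

For example, `is_aligned(bData, 8)` checks the malloc result for QWORD alignment, and `is_aligned(bData, 64)` for a 64-byte boundary.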

perfectly.insane, on 23 Jul, 2008 - 03:12 AM, said:

One thing to note though is that having blocks that are 64-byte multiples does not guarantee optimal performance ...

In this part of the application, it will always be copying a multiple of 64 bytes, typically 2048 bytes or 4096 bytes.

#4 NickDMax

  • Can grep dead trees!

Reputation: 2246
  • Posts: 9,236
  • Joined: 18-February 07

Re: Ensuring Data Alignment

Posted 23 July 2008 - 01:33 PM

Setting alignment is a compiler specific operation.

Often it is done with something like "#pragma align", so you would check your compiler documentation for the proper setting for 64-byte alignment.

You may also wish to look at "#pragma pack", which deals with how structs and unions are packed (i.e. their internal padding).
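As an illustration of what "#pragma pack" changes, here is a sketch comparing the same struct with and without packing. The `#pragma pack(push, 1)` / `#pragma pack(pop)` form is accepted by MSVC, GCC and Clang, but it is still a compiler extension rather than standard C. With a 4-byte `int`, the unpacked struct typically gets 3 bytes of padding after `c`; the packed one gets none. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: the compiler normally pads after c so that i is aligned. */
struct Unpacked {
    char c;
    int  i;
};

/* Packed layout: no padding between members. */
#pragma pack(push, 1)
struct Packed {
    char c;
    int  i;
};
#pragma pack(pop)

/* Compile-time check that packing really removed the padding on this compiler. */
static_assert(sizeof(struct Packed) == sizeof(char) + sizeof(int), "packed struct has padding");

/* Offset of i in the packed struct (directly after the 1-byte c). */
static size_t packed_offset_of_i(void) { return offsetof(struct Packed, i); }

/* Padding the default layout adds compared with the packed one. */
static size_t padding_removed(void) { return sizeof(struct Unpacked) - sizeof(struct Packed); }
```

Note that this only affects the layout inside the struct; it says nothing about where a struct or buffer starts in memory.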

Note that these settings are COMPILER specific, so the exact setting depends on your compiler, and your compiler may use a different mechanism or may not offer any mechanism for controlling alignment at all.
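For statically allocated buffers, C11 (which postdates this thread) added a standard alignment specifier, `alignas` from `<stdalign.h>`. A sketch, assuming a compiler that supports 64-byte alignment for static objects (support for alignments larger than `alignof(max_align_t)` is implementation-defined); the buffer names and the 4096-byte size are illustrative only. Before C11, the compiler-specific spellings include GCC's `__attribute__((aligned(64)))` and MSVC's `__declspec(align(64))`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical source and destination buffers, each starting on a 64-byte boundary.
   Pre-C11 alternatives: __attribute__((aligned(64))) (GCC) or __declspec(align(64)) (MSVC). */
static alignas(64) char src_buf[4096];
static alignas(64) char dst_buf[4096];

/* Nonzero if both buffers really are 64-byte aligned. */
static int buffers_aligned(void)
{
    return (uintptr_t)src_buf % 64 == 0 && (uintptr_t)dst_buf % 64 == 0;
}
```

This only helps for buffers whose size is known at compile time; memory from malloc is not affected.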

#5 perfectly.insane

  • D.I.C Addict

Reputation: 70
  • Posts: 644
  • Joined: 22-March 08

Re: Ensuring Data Alignment

Posted 23 July 2008 - 02:38 PM

I would think that these settings only apply to statically allocated items. I wouldn't expect them to have any effect on dynamically allocated items, since their addresses are decided by library calls, not the compiler (though I suppose one could select different versions of malloc via preprocessor directives, if one were to implement that). #pragma pack/__attribute__((packed)) should be independent of library functions, though, as struct layout is a compile-time matter in all cases.
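For dynamically allocated buffers, then, the alignment has to come from the allocator itself. A minimal sketch using POSIX `posix_memalign`, a library call not mentioned in the thread; C11's `aligned_alloc` and MSVC's `_aligned_malloc`/`_aligned_free` serve the same purpose on their respective platforms. `alloc_aligned` is a hypothetical wrapper name, and the alignment passed to `posix_memalign` must be a power of two and a multiple of `sizeof(void *)`.

```c
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign in strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n bytes whose address is a multiple of align.
   Returns NULL on failure; release the result with the ordinary free(). */
static char *alloc_aligned(size_t n, size_t align)
{
    void *p = NULL;
    if (posix_memalign(&p, align, n) != 0)
        return NULL;
    return p;
}
```

For the case in this thread, `alloc_aligned(lDataSize, 64)` would replace the plain `malloc(lDataSize)` call.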

#6 perfectly.insane

  • D.I.C Addict

Reputation: 70
  • Posts: 644
  • Joined: 22-March 08

Re: Ensuring Data Alignment

Posted 23 July 2008 - 02:58 PM

I'm thinking that this simple test might help you determine the alignment of malloc:

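#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    /* Two small allocations; their addresses hint at malloc's granularity and alignment. */
    void *a = malloc(1);
    void *b = malloc(1);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return 1;
    }

    /* uintptr_t holds a whole pointer; unsigned int would truncate it on 64-bit systems. */
    uintptr_t p1 = (uintptr_t)a;
    uintptr_t p2 = (uintptr_t)b;

    printf("p1 = 0x%08" PRIxPTR "\n", p1);
    printf("p2 = 0x%08" PRIxPTR "\n", p2);

    /* The second block is not guaranteed to come after the first, so take the absolute gap. */
    uintptr_t gap = (p2 > p1) ? p2 - p1 : p1 - p2;
    printf("Possible granularity for malloc() is: %" PRIuPTR " bytes\n", gap);

    /* The lowest set bit of either address is the largest power of two dividing both,
       an upper bound on the alignment malloc guarantees. */
    for (unsigned n = 0; n < sizeof(uintptr_t) * 8; n++)
    {
        if (((p1 | p2) >> n) & 1) {
            printf("Possible alignment for malloc() is: %" PRIuPTR " bytes\n", (uintptr_t)1 << n);
            break;
        }
    }

    free(a);
    free(b);
    return 0;
}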



Which yields for me:
p1 = 0x003d4d50
p2 = 0x003d4dd0
Possible granularity for malloc() is: 128 bytes
Possible alignment for malloc() is: 16 bytes

16 bytes, otherwise known as a paragraph, seems plausible. Perhaps others may know for sure though.

This post has been edited by perfectly.insane: 23 July 2008 - 02:59 PM


#7 NickDMax

  • Can grep dead trees!

Reputation: 2246
  • Posts: 9,236
  • Joined: 18-February 07

Re: Ensuring Data Alignment

Posted 23 July 2008 - 04:15 PM

Ah, sorry, I misunderstood. As far as I know, malloc aligns to at least a multiple of 8, and in the past it was generally paragraph aligned (16 bytes?).

I found this

Quote

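#include <stdlib.h>
#include <stdint.h>

/* Allocate size bytes aligned to byteAlign (assumed to be a power of two no smaller than sizeof(void*)). */
void* AlignedMalloc(size_t size, size_t byteAlign)
{
	/* Over-allocate: room for the alignment slack plus a hidden pointer to the original block. */
	void *mallocPtr = malloc(size + byteAlign + sizeof(void*));
	if (mallocPtr == NULL)
		return NULL;

	/* Move to the first multiple of byteAlign that leaves at least sizeof(void*) bytes in front of it. */
	uintptr_t ptrInt = (uintptr_t)mallocPtr;
	ptrInt = (ptrInt + byteAlign + sizeof(void*)) / byteAlign * byteAlign;

	/* Stash the original pointer just before the aligned block so AlignedFree can find it. */
	*(((void**)ptrInt) - 1) = mallocPtr;

	return (void*)ptrInt;
}

void AlignedFree(void *ptr)
{
	/* Recover the pointer malloc returned and release the whole block. */
	if (ptr != NULL)
		free(*(((void**)ptr) - 1));
}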
from here.

The logic of it seems to make sense. Didn't look too close.
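On the rounding step in the quoted code: `(ptrInt + byteAlign + sizeof(void*)) / byteAlign * byteAlign` picks the first multiple of `byteAlign` strictly above `ptrInt + sizeof(void*)`, which leaves room for the hidden pointer and works for any positive `byteAlign`. When the alignment is a power of two, the round-up is commonly written with a bit mask instead. A sketch; `align_up` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (unchanged if already aligned).
   Assumes align is a power of two, so align - 1 masks exactly the low bits. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

Used in place of the division, `align_up(ptrInt + sizeof(void*), byteAlign)` gives the first aligned address at or after the pointer slot, so the block still fits within the `size + byteAlign + sizeof(void*)` bytes that `AlignedMalloc` requests.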

#8 adw888

  • New D.I.C Head

Reputation: 7
  • Posts: 24
  • Joined: 10-July 08

Re: Ensuring Data Alignment

Posted 24 July 2008 - 12:33 AM

perfectly.insane, on 23 Jul, 2008 - 10:58 PM, said:

I'm thinking that this simple test might help you determine the alignment of malloc: ...

Thanks; I get granularity 56 and alignment 8 using that test.

NickDMax, on 24 Jul, 2008 - 12:15 AM, said:

ah, sorry I misunderstood. I know that malloc always aligns to a multiple of 8 and in the past it was generally paragraph aligned (16bytes?).

I found this ...

The logic of it seems to make sense. Didn't look too close.

Thanks; I tried that code out using the test perfectly.insane wrote and it seems to work very well.

Thank you both again for all the help. :)
