5 Replies - 332 Views - Last Post: 21 November 2018 - 04:17 PM

#1 h4nnib4l

Naming convention for date partitions in Azure blob

Posted 12 November 2018 - 05:45 PM

We have daily exports from a variety of hosted (SaaS) solutions, as well as from partner systems on JV projects, that we receive via FTP. I'm using Microsoft Logic Apps to monitor the drop folders and copy new files to blob storage for staging/archive before copying them to an Azure Data Lake Store (HDFS) using Azure Data Factory.

I had planned to partition the daily files using the naming convention *root*/*system*/*project*/2018/11/12, like we do in the data lake. However, much to my disappointment, I found out that Azure blob storage requires 3+ characters in container names, which prevents having folders named for the numeric month and day. I've considered several alternatives - prepending a 0 to the front of the month and day number, throwing an extra char into the name (11m/12d, for example), or even using something like **/2018/11-12 - but I honestly don't like any of them.
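For comparison, the three alternatives above can be sketched side by side. This is only an illustration in Python, not anything from our pipeline: the function name `partition_candidates` and the dictionary keys are made up for this example, and padding to a width of three is an assumption about what "prepending a 0" would need to look like so that single-digit months and days also reach three characters.

```python
from datetime import date


def partition_candidates(day: date) -> dict[str, str]:
    """Build the date part of the path under each naming alternative.

    Hypothetical helper: the key names are illustrative only.
    """
    return {
        # Zero-fill month and day to three characters (assumed width).
        "zero_padded": f"{day.year}/{day.month:03d}/{day.day:03d}",
        # Add a unit character after a two-digit month and day.
        "suffixed": f"{day.year}/{day.month:02d}m/{day.day:02d}d",
        # Collapse month and day into a single folder level.
        "combined": f"{day.year}/{day.month:02d}-{day.day:02d}",
    }


candidates = partition_candidates(date(2018, 11, 12))
for name, path in candidates.items():
    print(f"{name}: {path}")
```

Every folder name produced this way is at least three characters long, including for dates such as 5 January.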

We've considered Azure file storage, but fundamentally I think that blob (cool) makes the most sense for archival, particularly since we're looking to archive exactly as we received them in case there are questions about what we received from JV partners, but will likely never access them after the initial copy and ADF pipeline run.

Has anyone else dealt with something similar? Any better ideas?

Replies To: Naming convention for date partitions in Azure blob

#2 modi123_1

Posted 12 November 2018 - 05:49 PM

I haven't had to deal with that situation, but my first instinct is just to slap a 0 in front.

#3 h4nnib4l

Posted 12 November 2018 - 06:00 PM

Fair enough. My initial hesitation with that solution was that rebuilding a DateTime from the folder names would require a bit of extra parsing of the path, but I wasn't really thinking that through. Regardless of the strategy, I'd have to parse out the names (likely with a Regex), and then use int.TryParse() on the appropriate sections, which has no issue with the leading 0s.
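A rough sketch of that parsing step, written in Python rather than C# purely for illustration. The pattern `PARTITION_RE`, the function `partition_date`, and the sample paths are all hypothetical, and the layout assumed is *year*/*month*/*day* with optional leading zeros. Python's `int()` plays the role of `int.TryParse()` here and likewise ignores leading zeros.

```python
import re
from datetime import date
from typing import Optional

# Assumed layout: .../<4-digit year>/<month>/<day>/..., month and day
# possibly zero-padded (for example 011 and 012).
PARTITION_RE = re.compile(
    r"/(?P<year>\d{4})/(?P<month>\d{1,3})/(?P<day>\d{1,3})(?:/|$)"
)


def partition_date(path: str) -> Optional[date]:
    """Return the date encoded in the folder names, or None if absent/invalid."""
    match = PARTITION_RE.search(path)
    if match is None:
        return None
    try:
        # int() accepts leading zeros, so "011" parses as 11.
        return date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:
        # Digits were present but do not form a real calendar date.
        return None


print(partition_date("root/crm/projx/2018/011/012/export.csv"))
```

Returning `None` for both a missing match and an impossible date mirrors the "try" style of `int.TryParse()`; a stricter variant could raise instead.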

#4 Skydiver

Posted 12 November 2018 - 06:41 PM

Do you really need the granularity of being able to group by month and by day? Is blob storage like FAT32, which slows down when there are more than X directory entries within a directory? If not, then I think "MM-dd" should be good enough (it also removes one directory level to navigate). And if you really don't need to distinguish the month and day, why not simply use a zero-padded Julian day (day of the year)?
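As a minimal sketch of the day-of-year idea, assuming a *year*/*day-of-year* layout: the function names below are hypothetical, and Python's `%j` directive is used because it produces and parses the three-digit, zero-padded day of the year.

```python
from datetime import date, datetime


def day_of_year_segment(day: date) -> str:
    """Format a date as <year>/<zero-padded day of year>."""
    return day.strftime("%Y/%j")


def from_day_of_year(segment: str) -> date:
    """Recover the calendar date from a <year>/<day of year> segment."""
    return datetime.strptime(segment, "%Y/%j").date()


segment = day_of_year_segment(date(2018, 11, 12))
print(segment, from_day_of_year(segment))
```

The day folder is always three characters wide, at the cost of names that are harder to read at a glance than a month and day.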

#5 h4nnib4l

Posted 14 November 2018 - 10:31 AM

It's really more of a design consideration. At one end of the spectrum, we can prepend the full date to the front of the file name and keep it all in one bucket. At the other end, we manage that organization with folders. We got the folder model from Microsoft in a consulting engagement for our data lake solution, so that's the model I had in mind when I engaged this blob staging/archival problem space. At the end of the day, engaging the data from software is just a matter of pattern matching the path based on the implemented solution. More than anything, I'm just looking for input (like I'm getting) on how others have or would solve this problem so that we're not just breathing our own air.

#6 cfoley

Posted 21 November 2018 - 04:17 PM

I would do this:

*root*/*system*/*project*/2018-11-12

If you feel that the list of directories would get too long, I might be tempted to do this:

*root*/*system*/*project*/2018/2018-11-12

The year is in twice but I prefer that to an awkward combination of month and day.
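A small sketch of how those two layouts might be built and read back, in Python for illustration. The function names and the `group_by_year` flag are made up for this example, and "crm" and "projx" are placeholder values for *system* and *project*.

```python
from datetime import date


def iso_partition(root: str, system: str, project: str, day: date,
                  group_by_year: bool = False) -> str:
    """Build the folder path with a single ISO 8601 date folder at the end."""
    parts = [root, system, project]
    if group_by_year:
        # Optional extra level so one directory never lists every day.
        parts.append(f"{day.year:04d}")
    parts.append(day.isoformat())  # e.g. 2018-11-12
    return "/".join(parts)


def date_from_partition(path: str) -> date:
    """Read the date back from the last folder name, with no regex needed."""
    return date.fromisoformat(path.rstrip("/").rsplit("/", 1)[-1])


flat = iso_partition("root", "crm", "projx", date(2018, 11, 12))
nested = iso_partition("root", "crm", "projx", date(2018, 11, 12), group_by_year=True)
print(flat, nested, date_from_partition(nested))
```

Because the last folder is a complete date, the names sort chronologically as plain strings and the year folder can be added or dropped without changing how the date is parsed.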