10 Replies - 852 Views - Last Post: 15 September 2013 - 04:24 PM

#1 leadfirelf

the DocX format...

Posted 13 September 2013 - 03:02 PM

As much as I've been into programming these past few years, I have yet to figure out how to convert a Microsoft .docx file to the older .doc format. I was hoping it would be as simple as extracting the archive, but after doing this I found a butt load of XML files and nothing I could really work with. I had originally thought the .docx was just a compressed version of the document, so I was hoping to find a way to extract the .doc from it.
Thanks for the help!
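
For anyone poking at the extracted archive: a .docx is a ZIP package of XML parts, and the body text normally lives in `word/document.xml`. There is no .doc hidden inside it; .doc is a separate binary format, so unzipping alone won't produce one. Below is a minimal Python sketch of reading the package, assuming the usual package layout. The helper names `list_parts` and `paragraph_texts` are made up for illustration, and this only reads text; it does not convert anything.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main WordprocessingML elements (w:p, w:r, w:t).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx_path):
    """Return the names of every part stored in the .docx ZIP package."""
    with zipfile.ZipFile(docx_path) as package:
        return package.namelist()


def paragraph_texts(docx_path):
    """Return the plain text of each paragraph in word/document.xml.

    Assumes the main document part is at word/document.xml, which is
    the usual location but not guaranteed for every package.
    """
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        pieces = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs
```

Calling `list_parts("report.docx")` on a real file (the file name here is hypothetical) shows the same pile of XML you get from unzipping by hand; `paragraph_texts` then pulls the readable text out of it. All formatting is ignored.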


Replies To: the DocX format...

#2 andrewsw

Posted 13 September 2013 - 03:08 PM

Can you explain what your intention is in more detail? Otherwise, to convert from docx to doc I would just open the document in Word (2007 or later) and use Save As, picking the Word 97-2003 Document (.doc) type. There are also a number of apps to assist:

http://www.doc.investintech.com/



#3 leadfirelf

Posted 13 September 2013 - 03:12 PM

Well, honestly, it's just curiosity. If everyone could simply use an API to do the conversion, that wouldn't be much fun at all, or at least not in my eyes.

#4 jon.kiparsky

Posted 13 September 2013 - 03:53 PM

Apache has a number of Java libraries (the Apache POI project) useful for dealing with the befunged formats provided by Microsoft. You might look at one or two of those.

I'm sorry if this spoils your fun, but unless you want to reverse engineer the formats for yourself, this is probably the easiest way to do it programmatically. If you really do want to reverse engineer the formats for yourself... well, I guess you'd start by saving an empty file and looking at what appears on disk, then add a character and see what changes, and so forth.

Not my idea of fun, but if that's your kink, it's not going to do any harm to any man nor to any man's daughter.
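
A rough Python sketch of that save, change, compare loop, assuming you have already saved two versions of the same document. The helper names `byte_changes` and `diff_files` are hypothetical, as are any file names you pass in. It leans on `difflib.SequenceMatcher` from the standard library, which is fine for small files but slow on large ones.

```python
from difflib import SequenceMatcher
from pathlib import Path


def byte_changes(before: bytes, after: bytes):
    """Return (tag, offset, old_bytes, new_bytes) for each region that differs.

    tag is 'replace', 'delete' or 'insert'; offset is the position in
    `before` where the change starts.
    """
    # autojunk=False stops difflib from ignoring frequently repeated bytes.
    matcher = SequenceMatcher(None, before, after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, i1, before[i1:i2], after[j1:j2]))
    return changes


def diff_files(path_before, path_after):
    """Compare two saved files, e.g. an empty document and one with a single character."""
    return byte_changes(
        Path(path_before).read_bytes(),
        Path(path_after).read_bytes(),
    )
```

For .docx the raw bytes are compressed, so comparing the files directly tells you very little; it is more informative to unzip both versions and compare the XML parts. For the binary .doc format, a byte-level comparison like this is closer to what the experiment needs.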

#5 leadfirelf

Posted 13 September 2013 - 03:58 PM

Well, thank you, Jon. I will certainly look into this when I get home. (At work currently.)

#6 cfoley

Posted 14 September 2013 - 05:27 PM

Don't count on being able to do a good job of it yourself. If OpenOffice's support for .doc over the years is anything to go by, working with that format is difficult, and they supposedly have lots of good people working on the problem.

#7 Skydiver

Posted 15 September 2013 - 08:15 AM

Off on a tangent...

Which makes the demise of WordPerfect even sadder. The WordPerfect file format made a lot more sense for its era.

Back on topic...
The new Office file formats, even though more complicated, are actually designed with a lot of thought behind them. Sadly, they feel over-engineered because of their complexity and how hard it is to get even the simplest things done. On the other hand, they do allow maximum flexibility.

#8 jon.kiparsky

Posted 15 September 2013 - 08:58 AM

Skydiver, on 15 September 2013 - 10:15 AM, said:

Which make the demise of WordPerfect even sadder.


Surely the sadder thing is the survival of Word, no?

#9 andrewsw

Posted 15 September 2013 - 09:10 AM

Off topic: It is hard to think of an application that is more complex than Word (?). If we count games as 'applications', they can be much larger and involve complex graphics and mathematics, but overall I still think Word is more complex.

#10 Skydiver

Posted 15 September 2013 - 04:14 PM

jon.kiparsky, on 15 September 2013 - 11:58 AM, said:

Skydiver, on 15 September 2013 - 10:15 AM, said:

Which make the demise of WordPerfect even sadder.


Surely the sadder thing is the survival of Word, no?


Too true. I never realized how much I hated Word until this month, when I took over updating status reports. Woe betide anyone who accidentally breaks the formatting on a paragraph or table, because sometimes the only way to fix things is to go back to a backup, try editing again, and hope not to make the same mistake. With WordPerfect, it is a simple Alt-F3 to reveal the formatting codes and fix them up as needed.

#11 andrewsw

Posted 15 September 2013 - 04:24 PM

The nearest thing Word has is Shift-F1 for Reveal Formatting, but it is not the same.