stefano
January 17, 2008, 18:35:24
Hi.
My goal is to recognize the format of a file to determine whether it is an Office document (Word, Excel, etc.) or not. I also need to extract metadata and textual content.
I cannot use the file extension (.doc, .rtf, etc.) to determine the file format because my file has a random name.
I'm evaluating TX Text Control .NET Server 13.0. It works fine, but it seems that I have to know the file format before calling the Load() method. So I have no choice but to catch the exceptions and try to load the file several times in sequence, once per candidate format. Here is my C# code:
TXTextControl.LoadSettings loadSettings = new LoadSettings();
ServerTextControl file = new ServerTextControl();
if (file.Create()) {
    try {
        file.Load(url, StreamType.MSWord, loadSettings);
        // it is a Word document
    } catch {
        try {
            file.Load(url, StreamType.RichTextFormat, loadSettings);
            // it is an RTF document
        } catch {
            try {
                file.Load(url, StreamType.AdobePDF, loadSettings);
                // it is a PDF document
            } catch {
                // unknown document: rethrow the last exception
                throw;
            }
        }
    } finally {
        file.Dispose();
    }
}
My question is: is there a simpler (and faster) way to determine the stream type of a file?
Thank you in advance
Regards
Stefano Babayantz
Tera Digital Publishing
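One possible way to avoid the repeated Load() attempts, sketched here only as an illustration, is to look at the first few bytes of the file before choosing a stream type. The FormatSniffer class and SniffedFormat enum below are hypothetical names, not part of TX Text Control; the byte signatures are the well-known ones for OLE2 compound files (legacy .doc/.xls/.ppt), RTF, PDF and ZIP containers (.docx and similar). Mapping the result to a StreamType value is left to the caller.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a TX Text Control API): guesses a file's format
// from its leading "magic" bytes instead of its extension.
public enum SniffedFormat { Unknown, OleCompound, Rtf, Pdf, ZipContainer }

public static class FormatSniffer
{
    // OLE2 compound file header: legacy Word, Excel and PowerPoint files.
    private static readonly byte[] Ole = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    private static readonly byte[] Rtf = { 0x7B, 0x5C, 0x72, 0x74, 0x66 }; // "{\rtf"
    private static readonly byte[] Pdf = { 0x25, 0x50, 0x44, 0x46, 0x2D }; // "%PDF-"
    private static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };       // "PK" 03 04

    // Classifies a buffer holding the first bytes of a file.
    public static SniffedFormat Detect(byte[] header)
    {
        if (StartsWith(header, Ole)) return SniffedFormat.OleCompound;
        if (StartsWith(header, Rtf)) return SniffedFormat.Rtf;
        if (StartsWith(header, Pdf)) return SniffedFormat.Pdf;
        if (StartsWith(header, Zip)) return SniffedFormat.ZipContainer;
        return SniffedFormat.Unknown;
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    public static SniffedFormat DetectFile(string path)
    {
        byte[] buffer = new byte[8];
        int total = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            int read;
            while (total < buffer.Length &&
                   (read = fs.Read(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }
        }
        byte[] header = new byte[total];
        Array.Copy(buffer, header, total);
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With this, a single call such as FormatSniffer.DetectFile(path) would tell which one Load() call is worth trying. Note the limits of the approach: an OLE2 header only says the file is some legacy Office document, not whether it is Word or Excel, and a ZIP header does not prove the file is an Office Open XML document, so the Load() call should still be wrapped in a try/catch.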