%0 Journal Article
%T Content Classification of Multimedia Documents using Partitions of Low-Level Features
%A Leopold, Edda
%A Kindermann, Jörg
%J JVRB - Journal of Virtual Reality and Broadcasting
%D 2007
%V 3(2006)
%N 6
%@ 1860-2037
%F leopold2007
%X Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94% corresponding to 50% - 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.
%L 004
%K Audio-visual content classification
%K integration of modalities
%K speech recognition
%K support vector machines
%R 10.20385/1860-2037/3.2006.6
%U http://nbn-resolving.de/urn:nbn:de:0009-6-7607
%U http://dx.doi.org/10.20385/1860-2037/3.2006.6