Citation and metadata
Recommended citation
Edda Leopold and Jörg Kindermann, Content Classification of Multimedia Documents using Partitions of Low-Level Features. JVRB - Journal of Virtual Reality and Broadcasting, 3(2006), no. 6. (urn:nbn:de:0009-6-7607)
Download Citation
Endnote
%0 Journal Article
%T Content Classification of Multimedia Documents using Partitions of Low-Level Features
%A Leopold, Edda
%A Kindermann, Jörg
%J JVRB - Journal of Virtual Reality and Broadcasting
%D 2007
%V 3(2006)
%N 6
%@ 1860-2037
%F leopold2007
%X Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50%-84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.
%L 004
%K Audio-visual content classification
%K integration of modalities
%K speech recognition
%K support vector machines
%R 10.20385/1860-2037/3.2006.6
%U http://nbn-resolving.de/urn:nbn:de:0009-6-7607
%U http://dx.doi.org/10.20385/1860-2037/3.2006.6
BibTeX
@Article{leopold2007,
  author   = "Leopold, Edda and Kindermann, J{\"o}rg",
  title    = "Content Classification of Multimedia Documents using Partitions of Low-Level Features",
  journal  = "JVRB - Journal of Virtual Reality and Broadcasting",
  year     = "2007",
  volume   = "3(2006)",
  number   = "6",
  keywords = "Audio-visual content classification; integration of modalities; speech recognition; support vector machines",
  abstract = "Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, ``video words'' based on low-level color features (color moments, color correlogram and color wavelet), and ``audio words'' based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62{\%} and 94{\%}, corresponding to 50{\%}-84{\%} above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.",
  issn     = "1860-2037",
  doi      = "10.20385/1860-2037/3.2006.6",
  url      = "http://nbn-resolving.de/urn:nbn:de:0009-6-7607"
}
RIS
TY  - JOUR
AU  - Leopold, Edda
AU  - Kindermann, Jörg
PY  - 2007
DA  - 2007//
TI  - Content Classification of Multimedia Documents using Partitions of Low-Level Features
JO  - JVRB - Journal of Virtual Reality and Broadcasting
VL  - 3(2006)
IS  - 6
KW  - Audio-visual content classification
KW  - integration of modalities
KW  - speech recognition
KW  - support vector machines
AB  - Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50%-84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.
SN  - 1860-2037
UR  - http://nbn-resolving.de/urn:nbn:de:0009-6-7607
DO  - 10.20385/1860-2037/3.2006.6
ID  - leopold2007
ER  -
Wordbib
<?xml version="1.0" encoding="UTF-8"?>
<b:Sources SelectedStyle="" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
  <b:Source>
    <b:Tag>leopold2007</b:Tag>
    <b:SourceType>ArticleInAPeriodical</b:SourceType>
    <b:Year>2007</b:Year>
    <b:PeriodicalTitle>JVRB - Journal of Virtual Reality and Broadcasting</b:PeriodicalTitle>
    <b:Volume>3(2006)</b:Volume>
    <b:Issue>6</b:Issue>
    <b:Url>http://nbn-resolving.de/urn:nbn:de:0009-6-7607</b:Url>
    <b:Url>http://dx.doi.org/10.20385/1860-2037/3.2006.6</b:Url>
    <b:Author>
      <b:Author>
        <b:NameList>
          <b:Person><b:Last>Leopold</b:Last><b:First>Edda</b:First></b:Person>
          <b:Person><b:Last>Kindermann</b:Last><b:First>Jörg</b:First></b:Person>
        </b:NameList>
      </b:Author>
    </b:Author>
    <b:Title>Content Classification of Multimedia Documents using Partitions of Low-Level Features</b:Title>
    <b:Comments>Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50%-84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.</b:Comments>
  </b:Source>
</b:Sources>
ISI
PT Journal
AU Leopold, E
   Kindermann, J
TI Content Classification of Multimedia Documents using Partitions of Low-Level Features
SO JVRB - Journal of Virtual Reality and Broadcasting
PY 2007
VL 3(2006)
IS 6
DI 10.20385/1860-2037/3.2006.6
DE Audio-visual content classification; integration of modalities; speech recognition; support vector machines
AB Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50%-84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.
ER
MODS
<mods>
  <titleInfo>
    <title>Content Classification of Multimedia Documents using Partitions of Low-Level Features</title>
  </titleInfo>
  <name type="personal">
    <namePart type="family">Leopold</namePart>
    <namePart type="given">Edda</namePart>
  </name>
  <name type="personal">
    <namePart type="family">Kindermann</namePart>
    <namePart type="given">Jörg</namePart>
  </name>
  <abstract>Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50%-84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.</abstract>
  <subject>
    <topic>Audio-visual content classification</topic>
    <topic>integration of modalities</topic>
    <topic>speech recognition</topic>
    <topic>support vector machines</topic>
  </subject>
  <classification authority="ddc">004</classification>
  <relatedItem type="host">
    <genre authority="marcgt">periodical</genre>
    <genre>academic journal</genre>
    <titleInfo>
      <title>JVRB - Journal of Virtual Reality and Broadcasting</title>
    </titleInfo>
    <part>
      <detail type="volume">
        <number>3(2006)</number>
      </detail>
      <detail type="issue">
        <number>6</number>
      </detail>
      <date>2007</date>
    </part>
  </relatedItem>
  <identifier type="issn">1860-2037</identifier>
  <identifier type="urn">urn:nbn:de:0009-6-7607</identifier>
  <identifier type="doi">10.20385/1860-2037/3.2006.6</identifier>
  <identifier type="uri">http://nbn-resolving.de/urn:nbn:de:0009-6-7607</identifier>
  <identifier type="citekey">leopold2007</identifier>
</mods>
Full Metadata
| Bibliographic Citation | JVRB, 3(2006), no. 6. |
|---|---|
| Title | Content Classification of Multimedia Documents using Partitions of Low-Level Features (eng) |
| Author | Edda Leopold, Jörg Kindermann |
| Language | eng |
| Abstract | Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, usual text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities word analogues are generated: sequences of syllables for speech, “video words” based on low-level color features (color moments, color correlogram and color wavelet), and “audio words” based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50%-84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video. |
| Subject | Audio-visual content classification, integration of modalities, speech recognition, support vector machines |
| Classified Subjects | DDC 004 |
| Rights | DPPL |
| URN | urn:nbn:de:0009-6-7607 |
| DOI | https://doi.org/10.20385/1860-2037/3.2006.6 |