{"id":84,"date":"2021-12-05T18:21:18","date_gmt":"2021-12-05T18:21:18","guid":{"rendered":"https:\/\/wp.coventry.domains\/e2create\/?page_id=84"},"modified":"2021-12-06T20:01:17","modified_gmt":"2021-12-06T20:01:17","slug":"ramfem","status":"publish","type":"page","link":"https:\/\/wp.coventry.domains\/e2create\/ramfem\/","title":{"rendered":"Raw Music from Free Movements"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p>Raw Music from Free Movements (RAMFEM) is a deep learning architecture that translates pose sequences into audio waveforms. The architecture combines a sequence-to-sequence model that generates audio encodings and an adversarial autoencoder that decodes these encodings into raw audio. RAMFEM constitutes an attempt to design a digital music instrument by starting from the creative decisions a dancer makes when translating music into movement and then reversing these decisions for the purpose of generating music from movement. An important aspect of RAMFEM&#8217;s capability to learn from and recreate existing relationships between movement and music is its operation in the raw audio domain. Because of this, RAMFEM can be applied to any recordings of movement and music, capture their correlations, and subsequently recreate the acoustic characteristics of the music through embodied gestures.<\/p>\n\n\n\n<p>This project has been realised in collaboration with Kivan\u00e7 Tatar, at that time an independent musician and researcher in Vancouver, Canada. 
A detailed description of the project has been <a href=\"https:\/\/www.researchgate.net\/profile\/Daniel-Bisig\/publication\/353447404_Raw_Music_from_Free_Movements_Early_Experiments_in_Using_Machine_Learning_to_Create_Raw_Audio_from_Dance_Movements\/links\/6106a47c169a1a0103cd2ba9\/Raw-Music-from-Free-Movements-Early-Experiments-in-Using-Machine-Learning-to-Create-Raw-Audio-from-Dance-Movements.pdf?_sg%5B0%5D=52jKme6uNLcq1lec3UUcCVe1EMRh-BrdOzdyGOAYBJbsCb96Ptwlq-dMmmw_UPmpusDiJK34rNrTMooeNcmcoA.mkECxdWspwszCi7iCs2xBS9P474FThsoRA35U6VZw1eR6vsxrURPUf1ghMwC-TcDeMVLUCSMA2IEHuwzKLNDFg.eb6jQCylq2FJ-pWNLGiVKnfSGUCOUqhGtp5-xR7rGGYv6o_lZvGdOmwOieh5lDD6My7tV6NqFWr_oEYYEXfg2A&amp;_sg%5B1%5D=Sg898BM1Jd3b3ahI0JWBe55LDQrFX1MqfX_gxd_3R_goFK0ZYpLBJykWtXDahDlqcPOnrEfXPumuWyk1KIX1-eTkedgdMlIn7fSvdp_ZbAHZ.mkECxdWspwszCi7iCs2xBS9P474FThsoRA35U6VZw1eR6vsxrURPUf1ghMwC-TcDeMVLUCSMA2IEHuwzKLNDFg.eb6jQCylq2FJ-pWNLGiVKnfSGUCOUqhGtp5-xR7rGGYv6o_lZvGdOmwOieh5lDD6My7tV6NqFWr_oEYYEXfg2A&amp;_sg%5B2%5D=bWZoGEBOu92u6-bxJsU8hfjnOHGnQlcHoksou9dvD9jW-jf0hoFWUlyOOwo4HtKZ2FUkcEtz3bn7jlQ.m-Tt91LzV9ivB82wYfalWiy3D_spNeOPplCYiN__q3Lwbxe6nwbfFZaGtijct_UT6o3tRq0N1Pj46PyyLhdcNg&amp;_iepl=\" data-type=\"page\">published<\/a>. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Machine Learning Model<\/h2>\n\n\n\n<p>The current architecture of RAMFEM consists of three components: an adversarial autoencoder (AAE), a sequence to sequence transducer (Seq2Seq), and an audio concatenation mechanism. The source code, trained models, and audio and motion capture data required for testing and training are available <a href=\"https:\/\/github.coventry.ac.uk\/ad5041\/RawMusicFromFreeMovements\">online<\/a>.<\/p>\n\n\n\n<p>The AAE in RAMFEM encodes and decodes short audio waveforms into and from latent vectors. The Seq2Seq takes a sequence of poses as input and translates them into a sequence of audio encodings. These encodings are passed to an audio decoder which transforms them into waveforms.  
The audio concatenation mechanism takes a sequence of waveforms, applies a Hanning window as an amplitude envelope to each of them, and then concatenates them with a 50% overlap to create the final audio sequence.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Model_Pipeline-996x1024.png\" alt=\"\" class=\"wp-image-139\" width=\"409\" height=\"420\" srcset=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Model_Pipeline-996x1024.png 996w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Model_Pipeline-292x300.png 292w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Model_Pipeline-768x789.png 768w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Model_Pipeline.png 1329w\" sizes=\"auto, (max-width: 409px) 100vw, 409px\" \/><figcaption>RAMFEM Processing Pipeline. 
RAMFEM takes as input a short sequence of dance poses and produces as output a sequence of audio windows, which are blended together using an amplitude envelope.<\/figcaption><\/figure><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"324\" src=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/seq2seq_autoencoder-1024x324.png\" alt=\"\" class=\"wp-image-138\" srcset=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/seq2seq_autoencoder-1024x324.png 1024w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/seq2seq_autoencoder-300x95.png 300w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/seq2seq_autoencoder-768x243.png 768w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/seq2seq_autoencoder-1536x486.png 1536w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/seq2seq_autoencoder-1568x496.png 1568w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/seq2seq_autoencoder.png 1931w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>RAMFEM Model Architecture. The model consists of several neural networks that form part of the sequence-to-sequence transducer (left side) and the adversarial autoencoder (right side).<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Dataset<\/h2>\n\n\n\n<p>Two different datasets were employed for training: an improvisation dataset and a sonification dataset. The improvisation dataset consists of pose sequences and audio that were recorded while a dancer was freely improvising to a given piece of music. 
The dancer is an expert with a specialisation in contemporary dance and improvisation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Results<\/h2>\n\n\n\n<p>Audio generated by the model trained on the sonification dataset when it is presented with the original movement sequence used for sonification.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"533\" height=\"1024\" src=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Soni_Ori-533x1024.jpg\" alt=\"\" class=\"wp-image-144\" srcset=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Soni_Ori-533x1024.jpg 533w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Soni_Ori-156x300.jpg 156w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Soni_Ori.jpg 597w\" sizes=\"auto, (max-width: 533px) 100vw, 533px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-embed is-type-video is-provider-vimeo wp-block-embed-vimeo wp-embed-aspect-1-1 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"seq2seq_soni_dataset_orig_mov_14000.mp4\" src=\"https:\/\/player.vimeo.com\/video\/653817526?h=b6cece375a&amp;dnt=1&amp;app_id=122963\" width=\"500\" height=\"500\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><\/p>\n<\/div>\n<\/div>\n\n\n\n<p>Audio generated by the model trained on the improvisation dataset 
when it is presented with the original movement sequence used for sonification. <\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"532\" height=\"1024\" src=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Impro_Ori-532x1024.jpg\" alt=\"\" class=\"wp-image-149\" srcset=\"https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Impro_Ori-532x1024.jpg 532w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Impro_Ori-156x300.jpg 156w, https:\/\/wp.coventry.domains\/e2create\/wp-content\/uploads\/sites\/1833\/2021\/12\/RAMFEM_Spectrograms_Impro_Ori.jpg 579w\" sizes=\"auto, (max-width: 532px) 100vw, 532px\" \/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-embed is-type-video is-provider-vimeo wp-block-embed-vimeo wp-embed-aspect-1-1 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"seq2seq_impro_dataset_orig_mov_14000.mp4\" src=\"https:\/\/player.vimeo.com\/video\/653825208?h=859e6b7628&amp;dnt=1&amp;app_id=122963\" width=\"360\" height=\"360\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>Audio generated by the model trained on the sonification dataset when it is presented with a different movement sequence.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-vimeo wp-block-embed-vimeo wp-embed-aspect-1-1 wp-has-aspect-ratio\"><div 
class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"seq2seq_soni_dataset_new_mov_14000.mp4\" src=\"https:\/\/player.vimeo.com\/video\/653826437?h=aeaa9fed18&amp;dnt=1&amp;app_id=122963\" width=\"360\" height=\"360\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>Audio generated by the model trained on the improvisation dataset when it is presented with a different movement sequence. <\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-vimeo wp-block-embed-vimeo wp-embed-aspect-1-1 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"seq2seq_impro_dataset_new_mov_14000_1.mp4\" src=\"https:\/\/player.vimeo.com\/video\/653827626?h=90dd9b27a2&amp;dnt=1&amp;app_id=122963\" width=\"360\" height=\"360\" frameborder=\"0\" allow=\"autoplay; fullscreen; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Summary Raw Music from Free Movements (RAMFEM) is a deep learning architecture that translates pose sequences into audio waveforms. The architecture combines a sequence-to-sequence model generating audio encodings and an adversarial autoencoder that generates raw audio from audio encodings. 
RAMFEM constitutes an attempt to design a digital music instrument by starting from the creative decisions&hellip; <a class=\"more-link\" href=\"https:\/\/wp.coventry.domains\/e2create\/ramfem\/\">Continue reading <span class=\"screen-reader-text\">Raw Music from Free Movements<\/span><\/a><\/p>\n","protected":false},"author":2154,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","footnotes":""},"class_list":["post-84","page","type-page","status-publish","hentry","entry"],"_links":{"self":[{"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/pages\/84","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/users\/2154"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/comments?post=84"}],"version-history":[{"count":6,"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/pages\/84\/revisions"}],"predecessor-version":[{"id":150,"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/pages\/84\/revisions\/150"}],"wp:attachment":[{"href":"https:\/\/wp.coventry.domains\/e2create\/wp-json\/wp\/v2\/media?parent=84"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}