A webcam-based grid music controller, prototyped in Max/MSP/Jitter.
Motivation
Music therapy (MT) has been shown to be effective in treating and rehabilitating gross and fine motor function in people living with a variety of injuries and movement disorders []. Abundant research has highlighted long-term positive effects of MT interventions on upper limb motor function, with several studies underlining benefits for stroke survivors []. Clients of neurologic music therapy (NMT) also see significant long-term improvements in sensory-motor abilities and neural connectivity due to entrainment. Music therapists use musical components such as rhythm, which can “entrain movement patterns” in the body and brain of people who live with movement disorders. For this reason, rhythm-based activities are often employed to help people improve their gait [], balance [], coordination [] and movement accuracy [], among other outcomes.
However, these methods of treatment typically require the client to be physically present to engage in embodied interaction with the music and the therapist. This can make it difficult for people who live far away or have mobility challenges to attend therapy in person frequently, which is why telehealth has become a popular way of accessing therapy remotely. This study explores how a gestural music interface can be integrated into the music therapist’s workflow with remote clients, with attention to embodied interaction and how well it translates to telepresence.
Methodology
This study will use a prebuilt prototype of a webcam-based program that acts as a musical instrument played through touchless arm and hand movements. The prototype will serve as a design probe [] to elicit discussion about the challenges and opportunities of its implementation and design, thus following a research through design (RtD) paradigm.
The prototype was built using the software Max/MSP/Jitter. It is a computer-based platform with four key features:
- Webcam screen: This shows the webcam feed on a screen divided into a 3x3 grid. Each of the 9 resulting squares can trigger a sound when it detects movement.
- Motion Sensitivity: A number from 1 to 100 that defines how strongly the webcam-facing user needs to move to trigger sound output. A value of 1 sets the maximum sensitivity, triggering sound with the slightest motion. As the number increases, a greater degree of motion is required to trigger sound.
- Arming the instrument: Three separate toggles switch the audio, the video (webcam feed) and the instrument (which allows video to trigger sound) on and off.
- Choosing instrument kit: This is the collection of sounds that will be triggered. In the prototype, the author included three settings: Standard Drum Kit (snare, hi-hat, cymbal, toms, etc.), Latin Percussion (bongos, congas, etc.) and Vibraphone notes in the key of C major.
Each feature is numbered and presented as a sequence of steps to make the prototype ready to play. Fig. 1 shows a screenshot of the prototype user interface (left) and its underlying Max/MSP/Jitter patch (right).
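To make the interaction concrete, the grid, sensitivity and arming features described above can be approximated outside Max/MSP/Jitter. The following is a minimal Python sketch, not the prototype's actual patch. It assumes that motion is detected by differencing two consecutive grayscale frames, that the motion score of each square is scaled to 0-100, and that the sensitivity number is used directly as a threshold on that score. The function names and the vibraphone note numbers are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale image: rows of pixel values, 0-255

# Hypothetical kit: nine MIDI notes from the C major scale, one per square,
# starting at middle C (60). The prototype's real note layout is not specified.
VIBRAPHONE_C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72, 74]


def cell_motion(prev: Frame, curr: Frame,
                rows: int = 3, cols: int = 3) -> List[List[float]]:
    """Mean absolute pixel change per grid square, scaled to 0-100."""
    height, width = len(curr), len(curr[0])
    totals = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for y in range(height):
        r = y * rows // height          # grid row this pixel belongs to
        for x in range(width):
            c = x * cols // width       # grid column this pixel belongs to
            totals[r][c] += abs(curr[y][x] - prev[y][x])
            counts[r][c] += 1
    return [
        [100.0 * totals[r][c] / (255.0 * counts[r][c]) if counts[r][c] else 0.0
         for c in range(cols)]
        for r in range(rows)
    ]


def triggered_cells(prev: Frame, curr: Frame, sensitivity: int,
                    instrument_on: bool = True) -> List[int]:
    """Row-major indices (0-8) of squares whose motion reaches the threshold.

    Assumption: sensitivity (1-100) is compared directly with the 0-100
    motion score, so 1 reacts to very slight motion and 100 to almost none.
    """
    if not 1 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 1 and 100")
    if not instrument_on:               # instrument toggle off: video is ignored
        return []
    motion = cell_motion(prev, curr)
    cols = len(motion[0])
    return [r * cols + c
            for r, row in enumerate(motion)
            for c, score in enumerate(row)
            if score >= sensitivity]


if __name__ == "__main__":
    previous = [[0] * 6 for _ in range(6)]
    current = [[0] * 6 for _ in range(6)]
    current[2][2] = 51                  # faint change in the centre square
    cells = triggered_cells(previous, current, sensitivity=1)
    print([VIBRAPHONE_C_MAJOR[i] for i in cells])
```

In this sketch a faint change in the centre square triggers a note at sensitivity 1 but not at sensitivity 50, which mirrors the behaviour described for the Motion Sensitivity control. A real implementation would also need debouncing so that sustained motion does not retrigger a sound on every frame.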

Research Procedure
The study follows an exploratory approach, wherein the prototype is sent to participants a week in advance so they can experience the interaction first-hand and explore the features to become familiar with the design. No tasks are prescribed, for two main reasons: 1) board-certified music therapists (MT-BCs) possess experiential and tacit knowledge about MT and different client health conditions, all of which can help situate the design in context-relevant use cases; and 2) this is a digital music interface for open-ended play, originally intended for MT-BCs to appropriate and repurpose for individualized MT interventions, according to each client's needs.
The research activities will consist of:
- A digital “packet”, which will contain a short demographic and job characteristics survey, as well as a description of the music interface. The music interface prototype will also be included, in a self-contained format, so participants do not need to download or install specialized software.
- A follow-up semi-structured interview, where participants and researchers will critically discuss the benefits and challenges of this technology and its larger impact on MT practice.
Expected Outcomes
From this study we expect the following data:
- Demographic information: Music therapist demographic data, client characteristics, health conditions frequently treated.
- Interface assessment: Quantitative data (ratings) from adapted design heuristics.
- Impact discussion: Interview transcripts with data covering identified use cases; perceived benefits of and concerns about the prototype; perspectives on clinical efficacy and acceptability; future design directions; and implications for telehealth.