This documentation presents tools developed to construct a web demonstrator of multiple transcription systems.
The user must be able to see, in real time, the transcriptions synchronized with the original video (it can also be an audio recording), with the differences between the transcriptions highlighted. The comparison is made between one transcription chosen as the reference (the first one given) and the others, which are called hypotheses. The first transcription is therefore shown unmodified, whereas the others have information added to their content that helps the user distinguish the words in each hypothesis that have to be inserted, substituted by another word (which comes from the reference) or simply deleted so that the hypothesis matches the reference. The comparison is made with an adaptation of the DTW algorithm, which calculates the best alignment between each hypothesis and the reference.
A plugin is also available to show the distribution of the speaking time between the speakers (diarization viewer), with an interactive colored bar. The developer has to provide the transcription files and the video for the demonstration.
All times are given in seconds unless stated otherwise.
The plugins are implemented in JavaScript using the "AngularJS" framework. They also use the "Restangular" service. Some elements of the page use the "Bootstrap" framework.
Two files are provided to make them work: "services.js" and "main.css".
To use these plugins, the developer has to add the services contained in "services.js" to their own services and add (or adapt) the styles contained in "main.css" to their own css styles.
To work, the plugins need several transcription files (.ctm) for each transcription system, a segment file (.seg) for the analysis of the transcriptions, and a video or audio file (from which the transcriptions were made) with which the transcription displays are synchronized in real time.
The structures that have to be used are described below:
TODO: description of a ctm file and where the files have to be placed by the developer
The information extracted from the ctm files will be stored in a json format. Here is a description of the resulting json data:
data in the json file:
[ transcription_0, transcription_1, … , transcription_n ]
There must be at least one transcription (but at least two are necessary to start a comparison).
Each transcription object has the following fields:
"system" is the name of the transcription system (string).
"content" is an array of word objects: [wordObject_0, wordObject_1, … , wordObject_n]
Each word object has the following fields:
"start" is the time when the word begins. It can be a string as well as a float (it will be parsed anyway).
"word" is the actual word (string).
"spk" is an object describing the speaker, with the following fields:
"id" is the name of the speaker (string).
"gender" has the value "m" for male and "f" for female.
Here is an example of the json file content:
{"system": "SYST1", "content": [{"start":"0","word":"Hello","spk":{"id":"S1","gender":"m"}},{"start":"0.3","word":"World","spk":{"id":"S1","gender":"m"}}, … ]},
{"system": "SYST2", "content": [{"start":"0","word":"Cello","spk":{"id":"S1","gender":"m"}},{"start":"0.2","word":"Word","spk":{"id":"S1","gender":"m"}}, … ]}
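As an illustration of the layout described above, here is a minimal sketch of how the json data could be checked and the "start" values converted to numbers. The function name normalizeTranscriptions is hypothetical and is not part of "services.js"; only the field names come from the example above.

```javascript
// Hypothetical helper (not part of services.js): checks the json layout
// described above and converts every "start" value to a number.
function normalizeTranscriptions(data) {
  if (!Array.isArray(data) || data.length < 1) {
    throw new Error("expected an array with at least one transcription");
  }
  return data.map(function (transcription) {
    return {
      system: String(transcription.system),
      content: transcription.content.map(function (w) {
        var start = parseFloat(w.start); // accepts "0.3" as well as 0.3
        if (isNaN(start)) {
          throw new Error("invalid start time for word: " + w.word);
        }
        return {
          start: start,
          word: w.word,
          spk: { id: w.spk.id, gender: w.spk.gender }
        };
      })
    };
  });
}

// Sample data taken from the example above (without the elided words).
var sampleTranscriptions = [
  { system: "SYST1", content: [
    { start: "0", word: "Hello", spk: { id: "S1", gender: "m" } },
    { start: "0.3", word: "World", spk: { id: "S1", gender: "m" } }
  ] },
  { system: "SYST2", content: [
    { start: "0", word: "Cello", spk: { id: "S1", gender: "m" } },
    { start: 0.2, word: "Word", spk: { id: "S1", gender: "m" } }
  ] }
];
var normalizedTranscriptions = normalizeTranscriptions(sampleTranscriptions);
```

After this step every "start" is a number, so later time comparisons do not depend on whether the file stored strings or floats.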
The segment file (.seg) is a text document that contains the time boundaries of the sentences that were used during the transcription process. These boundaries are necessary to compare the transcriptions. The segment file has to be named "sentence_bounds.seg" and stored in "/assets/files".
It has the following structure:
video_title sentence0_start sentence0_end concatenation_of_title_and_start_and_end
video_title sentence1_start sentence1_end concatenation_of_title_and_start_and_end
video_title sentence2_start sentence2_end concatenation_of_title_and_start_and_end
The times are in centiseconds.
Here is an example:
myVideo 12500 12730 myVideo-125.00-127.30-other-informations
myVideo 12731 12945 myVideo-127.31-129.45-other-informations
What matters for the comparison are the second and third values on each line (the times in centiseconds).
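The segment format above can be read with a few lines of code. The sketch below is only an illustration: parseSegments is a hypothetical name, and converting the bounds to seconds is an assumption made so that they can be compared with the word start times of the json data.

```javascript
// Hypothetical helper: turns the text of "sentence_bounds.seg" into an array
// of [start, end] pairs. The conversion to seconds is an assumption.
function parseSegments(segText) {
  return segText
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      var fields = line.split(/\s+/);
      // fields[1] and fields[2] are the sentence start and end in centiseconds
      return [parseInt(fields[1], 10) / 100, parseInt(fields[2], 10) / 100];
    });
}

var segText =
  "myVideo 12500 12730 myVideo-125.00-127.30-other-informations\n" +
  "myVideo 12731 12945 myVideo-127.31-129.45-other-informations\n";
var sentenceBounds = parseSegments(segText);
console.log(JSON.stringify(sentenceBounds)); // [[125,127.3],[127.31,129.45]]
```

The first and fourth fields of each line are ignored, since only the two time values matter for the comparison.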
First of all, the plugin puts the json data in an object in order to use them. Then, when the comparison is made, it simply adds to this json object further information about the words that differ between the first transcription (reference: index 0) and the others (hypotheses: index>0).
The computation of the comparison can be long. The plugin is designed not to freeze the browser, but if the developer retrieves the enhanced json data and puts them on the server, they can be used directly and the user will not have to wait for the computation. The enhanced json data have to be retrieved and saved in "/assets/files" under the name "enhanced-transcription.json". When the user visits the web page, the results appear directly if the file is found on the server; if not, the calculation is made.
Two different controllers have been built in this project. The first one is for a transcription comparator: it displays several transcriptions, shows the differences between the reference and the hypotheses, and also provides an interactive speaker bar for the first transcription (reference). The second one takes care of one transcription only and provides a speaker bar too (it is referred to in this documentation as a diarization viewer). It performs no DTW comparison.
The service Controller from the controllerServices is dedicated to the initialization of the controllers. It attaches specific instances to a root scope, as well as useful functions that are then available in the html pages.
The instances manipulated are transcriptionData, which contains the important information built from the seg and ctm files given by the developer, and speakerBar, which contains the information built from transcriptionData and which handles the interactive bar.
The transcription comparator has to call the service :
is the integer value representing the number of words displayed at the same time.
is an array of color names. One color is attributed to each speaker on the bar: the colors are given, in the order of the array, to the speakers sorted by decreasing speaking time. If there are fewer speakers in the transcriptions than colors given, only the first colors are used. If there are more speakers than colors given, the last color is shared by the speakers who talk the least.
The speaker bar will automatically represent the first transcription (index 0 in the json data array).
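The color attribution rule described above can be sketched as follows. This is an illustration under assumptions, not the code of "services.js": the function name assignColors and the totalTime field are hypothetical, and the colors array is assumed to be non-empty.

```javascript
// Hypothetical sketch of the color attribution rule. Each speaker is
// described here by { id, totalTime }, where totalTime is an assumed field
// holding the total speaking time in seconds.
function assignColors(speakers, colors) {
  // Sort a copy by decreasing speaking time.
  var sorted = speakers.slice().sort(function (a, b) {
    return b.totalTime - a.totalTime;
  });
  return sorted.map(function (speaker, index) {
    // Speakers beyond the number of colors all share the last color.
    var colorIndex = Math.min(index, colors.length - 1);
    return { id: speaker.id, totalTime: speaker.totalTime, color: colors[colorIndex] };
  });
}

var coloredSpeakers = assignColors(
  [
    { id: "S1", totalTime: 10 },
    { id: "S2", totalTime: 50 },
    { id: "S3", totalTime: 5 },
    { id: "S4", totalTime: 30 }
  ],
  ["red", "green", "blue"]
);
console.log(coloredSpeakers.map(function (s) { return s.id + ":" + s.color; }).join(" "));
// S2:red S4:green S1:blue S3:blue
```

With fewer speakers than colors, the same function simply never reaches the last entries of the array.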
The diarization viewer has to call the service :
has the same meaning as previously.
As we have seen before, the json data are an array of transcriptions. A diarization viewer's controller handles one transcription only (its goal is to give a graphical representation of the different speakers for one transcription). numTranscription is the index in the array of the transcription that will be used.
represents the same thing as before.
The tooltips must be allowed in the web pages by inserting this directive in the project:
.directive('tooltip', function () {
return {
link: function(scope, element, attrs)
Here is the description of the different tags to place in the pages.
Remember to indicate the right controller:
<div ng-controller="YourTranscriptionCtrl">
User messages:
Some messages are provided to inform the user on what is happening.
<div id="progressBar">
  <div class="progress progress-striped active">
    <div id="progressBarContent" class="bar"></div>
  </div>
  <p>Dtw Calculation ({{transcriptionsData.progressBarContent[0].style.width}})</p>
</div>
<div id="calculationOverAlert" class="alert alert-success">
  <button type="button" class="close" data-dismiss="alert">×</button>
  <h4>Calculation Over!</h4>
  You can <button class=" btn btn-success btn-large" ng-click="transcriptionsData.copyTranscription()">get the transcriptions json data</button> (with the comparison information added).
</div>
<div id="outTranscriptionAlert" class="alert alert-error">
  <button type="button" class="close" data-dismiss="alert">×</button>
  <h4>Warning, you are out of the {{transcriptionsData.displayedTranscriptions[0].id}} transcription !</h4>
  {{transcriptionsData.message}}
  <button class=" btn btn-danger btn-large" ng-click="startVideo(transcriptionsData.fullTranscription[0].content[0].start)">{{transcriptionsData.clickableMessage}}</button>
</div>
The ids "progressBar", "calculationOverAlert" and "outTranscriptionAlert" must be present so that those elements can be updated.
The progress bar represents the percentage of the DTW calculation done and is updated automatically. transcriptionsData.progressBarContent[0].style.width is the actual percentage. The bar appears when there is a DTW calculation (the enhanced json file is not found) and disappears when it is over.
The "calculationOverAlert" message appears when the DTW calculation is over (if the enhanced json file was not found). The enhanced json data can be recovered by clicking on the button connected to transcriptionsData.copyTranscription().
The "outTranscriptionAlert" message appears when the video is currently outside the transcribed part. transcriptionsData.message composes a complete message indicating whether the video is outside the transcribed part (the first transcription is also taken as the reference for the limits). transcriptionsData.clickableMessage contains the time when the transcription starts and is meant to be bound to the action startVideo(transcriptionsData.fullTranscription[0].content[0].start), which sets the video to the start of the transcription. The start chosen here is that of the reference.
is the array created from the json files (or found in the enhanced transcription json file if it exists).
The developer can access the name of the transcription system n°i like this:
<span class="title">{{transcriptionsData.displayedTranscriptions[0].id}} Transcription</span>
Displayed Transcription:
The transcriptions are displayed part by part in synchronization with the video (a whole transcription is too big to be displayed at once). The developer can access these transcription pieces with transcriptionsData.displayedTranscriptions[i] (for the transcription n°i). Here is an example to show the displayed part of the reference:
<span class="italic">{{transcriptionsData.displayedTranscriptions[0].message}}</span>
<article id="content0">
  <p>
    <span style="cursor: pointer;"
          ng-repeat="jsonWord in transcriptionsData.displayedTranscriptions[0].transcription"
          ngModel="transcriptionsData.displayedTranscriptions[0].transcription"
          data-start="{{jsonWord.start}}"
          ng-click="moveVideo($event)"
          rel="tooltip"
          tooltip="'start: '+jsonWord.start+' seconds'"
          class="{{jsonWord.wordClass}}">
      {{jsonWord.word}}
    </span>
  </p>
</article>
And here is an example to show the displayed part of a hypothesis (transcription n°i with i>0), which is slightly different because of the ng-mouseenter and ng-mouseleave fields:
<span class="italic">{{transcriptionsData.displayedTranscriptions[i].message}}</span>
<article id="contenti">
  <p>
    <span style="cursor: pointer;"
          ng-repeat="jsonWord in transcriptionsData.displayedTranscriptions[i].transcription"
          ngModel="transcriptionsData.displayedTranscriptions[i].transcription"
          data-start="{{jsonWord.start}}"
          ng-click="moveVideo($event)"
          ng-mouseenter="transcriptionsData.showCorespondingWordInReferenceWord(jsonWord)"
          ng-mouseleave="transcriptionsData.hideCorespondingWordInReferenceWord(jsonWord)"
          rel="tooltip"
          tooltip="'start: '+jsonWord.start+' seconds'"
          class="{{jsonWord.wordClass}}">
      {{jsonWord.word}}
    </span>
  </p>
</article>
The code of the displayed transcription must be inside of a container having the id "contenti" where i is the index of the transcription (content0 for the transcription n°0).
transcriptionsData.displayedTranscriptions is an array of displayedTranscription objects, each of which groups the information of the displayed part of a transcription (one for each complete transcription). The word objects (same structure as those of the original json file) are accessible via transcriptionsData.displayedTranscriptions[i].transcription.
The data-start and ng-click attributes are important because they allow the user to click on a word to set the video to the start of this word.
Two more fields are present in the hypothesis displays. As mentioned before, information has been added to the words that have to be deleted, inserted (a word that comes from the reference) or substituted (by a word from the reference). In the insertion and substitution cases, it can be interesting for the user to see which word it corresponds to in the reference. That is what the transcriptionsData.showCorespondingWordInReferenceWord(jsonWord) and transcriptionsData.hideCorespondingWordInReferenceWord(jsonWord) functions are for. When the user points the mouse at an inserted or substituted word in a hypothesis, the style of the corresponding word in the reference changes. When the mouse leaves it, the original style is restored.
A tooltip can be configured on each word.
transcriptionsData.displayedTranscriptions[i].message is a message which is empty if there is a displayed part at this moment for the transcription n°i. If there is not, the message says so.
The developer must place the video somewhere in the page like this:
<video width="512" height="288" id="mediafile" controls preload>
  <source type="video/mp4" src="http://your-video.mp4" />
  <source type="video/webm" src="http://your-video.webm" />
</video>
It is important that the video (or audio) has the id "mediafile"; otherwise the functions used by the controller will not be able to detect it.
It can be useful to give some explanations to the users. The developer can use something like this, for example with four transcription systems:
<span class="title">Caption</span> <br><br>
<p>The comparisons between the transcriptions are made with a Dynamic time warping (DTW) algorithm which measures similarity between two transcriptions sentence by sentence: a reference and a hypothesis. We can then determine the modifications that should be done in the hypothesis so it matches with the reference.</p>
<p>We choose the {{transcriptionsData.displayedTranscriptions[0].id}} transcription as a reference for the comparisons. The {{transcriptionsData.displayedTranscriptions[3].id}},{{transcriptionsData.displayedTranscriptions[1].id}} and {{transcriptionsData.displayedTranscriptions[2].id}} transcriptions are the hypotheses.<br>
Here is the caption of the modifications that must be applied to the hypothesis ({{transcriptionsData.displayedTranscriptions[3].id}},{{transcriptionsData.displayedTranscriptions[1].id}} or {{transcriptionsData.displayedTranscriptions[2].id}} transcription) so it matches with the reference ({{transcriptionsData.displayedTranscriptions[0].id}} transcription) :</p>
<p>_ <span class="{{transcriptionsData.substitutionStyle}}">hypothesisWord(>>referenceWord) </span> : hypothesisWord (from {{transcriptionsData.displayedTranscriptions[3].id}},{{transcriptionsData.displayedTranscriptions[1].id}} or {{transcriptionsData.displayedTranscriptions[2].id}} transcription) must be substituted with referenceWord (from {{transcriptionsData.displayedTranscriptions[0].id}} transcription) to match.<br>
_ <span class="{{transcriptionsData.suppressionStyle}}">hypothesisWord </span> : hypothesisWord (from {{transcriptionsData.displayedTranscriptions[3].id}},{{transcriptionsData.displayedTranscriptions[1].id}} or {{transcriptionsData.displayedTranscriptions[2].id}} transcription) must be deleted to match.<br>
_ <span class="{{transcriptionsData.insertionStyle}}">referenceWord </span> : referenceWord (from {{transcriptionsData.displayedTranscriptions[0].id}}) must be inserted to match.</p>
<p>Other information:</p>
<p>_ <span class="{{transcriptionsData.showStyle}}">referenceWord </span> : When your mouse is over an inserted or substituted word in a hypothesis, the corresponding word in the reference is highlighted.<br>
_ <span class="untreatedDtw">word </span> : This word has not been treated by a DTW because it does not belong to any sentence.</p>
Interactive speaker bar:
The interactive speaker bar is composed of a clickable bar to navigate in the video (this bar also shows the distribution of the speakers), a timer and a popover which gives some information when the mouse is over the bar. Here is the way the developer should insert the bar in the page (for the transcription n°i=0, because it automatically corresponds to the reference):
<div id="popover" class="popover">
  <div class="bloc">
    <h4>Speaker Bar Info</h4>
    <span>{{speakerBar.popoverText}}</span>
  </div>
</div>
<div id="canvasicontainer">
  <canvas class="canvas" id="canvasi" ng-click="speakerBar.clickUpdate($event)" ng-mousemove="speakerBar.openPopover($event)" ng-mouseleave="speakerBar.closePopover()">
    <p>updates are necessary</p>
  </canvas>
</div>
<p><span id="progressTimei">--:--</span></p>
As before, the canvas tag must have the id "canvasi" and the timer the id "progressTimei", where i is the index of the transcription. The canvas must be in a container that has the id "canvasicontainer": this allows the bar to be sized automatically.
The canvas must be bound to the clickUpdate function, which is called when the user clicks on it.
The popover is in a container with the id "popover". Its content is speakerBar.popoverText, a message automatically updated depending on the position of the mouse on the bar. ng-mousemove="speakerBar.openPopover($event)" and ng-mouseleave="speakerBar.closePopover()" serve to bind the popover to the bar.
The message "updates are necessary" informs the user when the browser does not support canvas.
Interactive speaker list:
An interactive speaker list is available. The user can see the different speakers and their respective colors, and the list also shows which speaker is currently talking. Besides, if the user clicks on the colored rectangle of a speaker, the video is set to the first time this speaker talks. Here is the way the caption can be made:
<!-- Main Speaker --> <span class="bold">{{speakerBar.mainSpeakersTitle}}</span> <div class="container-fluid"> <div class="row-fluid"> <div class="span6"> <ul class="nav nav-pills nav-stacked"> <li ng-repeat="speaker in speakerBar.mainSpeakers.slice(0,speakerBar.mainSpeakers.length / 2+speakerBar.mainSpeakers.length % 2)" ngModel="speakerBar.mainSpeakers.slice(0,speakerBar.mainSpeakers.length / 2+speakerBar.mainSpeakers.length % 2)" class="{{speaker.speakingStatus}}" > <a style="text-decoration:none;"><button class=" btn btn-large" style="background:{{speaker.color}};" ng-click="speaker.moveVideoToSpeechStart()" rel="tooltip" tooltip="'first speech: '+speaker.giveFirstSpeechTimeString()"> </button> id: <span class="badge badge">{{speaker.spkId}}</span> , gender: <span class="badge badge">{{speaker.gender}}</span> , total speech time= <span class="badge badge">{{speaker.giveTotalTimeString()}}</span> </a> </li> </ul> </div> <div class="span6"> <ul class="nav nav-pills nav-stacked"> <li ng-repeat="speaker in speakerBar.mainSpeakers.slice(speakerBar.mainSpeakers.length / 2+speakerBar.mainSpeakers.length % 2,speakerBar.mainSpeakers.length)" ngModel="speakerBar.mainSpeakers.slice(speakerBar.mainSpeakers.length / 2+speakerBar.mainSpeakers.length % 2,speakerBar.mainSpeakers.length)" class="{{speaker.speakingStatus}}" > <a style="text-decoration:none;"><button class=" btn btn-large" style="background:{{speaker.color}};" ng-click="speaker.moveVideoToSpeechStart()" rel="tooltip" tooltip="'first speech: '+speaker.giveFirstSpeechTimeString()"> </button> id: <span class="badge badge">{{speaker.spkId}}</span> , gender: <span class="badge badge">{{speaker.gender}}</span> , total speech time= <span class="badge badge">{{speaker.giveTotalTimeString()}}</span> </a> </li> </ul> </div> </div> </div> <!-- Secondary Speaker --> <span class="bold">{{speakerBar.secondarySpeakersTitle}}</span> <div class="container-fluid"> <div class="row-fluid"> <div class="span6"> <ul class="nav 
nav-pills nav-stacked"> <li ng-repeat="speaker in speakerBar.secondarySpeakers.slice(0,speakerBar.secondarySpeakers.length / 2+speakerBar.secondarySpeakers.length % 2)" ngModel="speakerBar.secondarySpeakers.slice(0,speakerBar.secondarySpeakers.length / 2+speakerBar.secondarySpeakers.length % 2)" class="{{speaker.speakingStatus}}" > <a style="text-decoration:none;"><button class=" btn btn-large" style="background:{{speaker.color}};" ng-click="speaker.moveVideoToSpeechStart()" rel="tooltip" tooltip="'first speech: '+speaker.giveFirstSpeechTimeString()"> </button> id: <span class="badge badge">{{speaker.spkId}}</span> , gender: <span class="badge badge">{{speaker.gender}}</span> , total speech time= <span class="badge badge">{{speaker.giveTotalTimeString()}}</span> </a> </li> </ul> </div> <div class="span6"> <ul class="nav nav-pills nav-stacked"> <li ng-repeat="speaker in speakerBar.secondarySpeakers.slice(speakerBar.secondarySpeakers.length / 2+speakerBar.secondarySpeakers.length % 2,speakerBar.secondarySpeakers.length)" ngModel="speakerBar.secondarySpeakers.slice(speakerBar.secondarySpeakers.length / 2+speakerBar.secondarySpeakers.length % 2,speakerBar.secondarySpeakers.length)" class="{{speaker.speakingStatus}}" > <a style="text-decoration:none;"><button class=" btn btn-large" style="background:{{speaker.color}};" ng-click="speaker.moveVideoToSpeechStart()" rel="tooltip" tooltip="'first speech: '+speaker.giveFirstSpeechTimeString()"> </button> id: <span class="badge badge">{{speaker.spkId}}</span> , gender: <span class="badge badge">{{speaker.gender}}</span> , total speech time= <span class="badge badge">{{speaker.giveTotalTimeString()}}</span> </a> </li> </ul> </div> </div>
</div>
The developer can access the data of the different speakers (their color, id, gender, speaking periods, speaking status) via speakerBar.speakers, which is an array of speakerData objects. But here, it is more interesting to use speakerBar.mainSpeakers and speakerBar.secondarySpeakers, which both contain a subset of speakerBar.speakers. speakerBar.mainSpeakers are the speakers who talk the most and have their own color, while speakerBar.secondarySpeakers are the speakers who talk the least and share the same color. It all depends on the colors the developer gave when initializing the controller. It is thus possible to separate the main speakers from the others in the caption. The speakers are split in two columns in this example.
speaker.speakingStatus corresponds to a css (bootstrap css) style that differs depending on whether the speaker is currently speaking or not, and it is frequently updated.
The colored button has to be bound to the function speaker.moveVideoToSpeechStart(), which sets the video to the moment when the speaker talks for the first time.
Titles are available (speakerBar.mainSpeakersTitle and speakerBar.secondarySpeakersTitle) as well as a tooltip message (tooltip="'first speech: '+speaker.giveFirstSpeechTimeString()").
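The two-column split in the example above relies on slice with the bound "length / 2 + length % 2", which is not an integer when the list has an odd length; slice truncates it, so the first column receives the extra speaker. The sketch below shows the same split with an explicit rounding. The function name splitInTwoColumns is hypothetical and is not part of the plugins.

```javascript
// Hypothetical helper: splits a list in two columns, the first column
// receiving the extra item when the length is odd.
function splitInTwoColumns(items) {
  var half = Math.ceil(items.length / 2);
  return [items.slice(0, half), items.slice(half)];
}

// Same bound as the one written in the ng-repeat expressions above.
function templateBound(length) {
  return length / 2 + length % 2;
}

var columns = splitInTwoColumns(["S1", "S2", "S3", "S4", "S5"]);
console.log(JSON.stringify(columns)); // [["S1","S2","S3"],["S4","S5"]]
```

Computing the two columns once in the controller, instead of repeating the slice expression in each ng-repeat, would be one way to shorten the template.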
Remember to indicate the right controller again:
<div ng-controller="YourSpeakerCtrl">
User messages:
are still present in this controller. The elements concerning the DTW calculation are not necessary.
Same thing too.
Same thing too.
object is the same as previously and can be used in the same way (to display the transcription), except that the comparison information has not been calculated and is therefore not present in this object.
Displayed Transcription:
Quite the same thing, except that the developer should not place the class attribute in the span tag where ng-repeat is. Indeed, this wordClass is part of the information added by the comparison. It differs depending on whether the word must be substituted, inserted or deleted. Besides, if a part of a transcription has not been treated by the DTW (if it is outside any sentence bounds), the corresponding words have the wordClass "untreatedDtw" (which should appear as grey text). As we have seen, the comparison is not made in the diarization controller, so the developer should not place the class attribute unless all the text is meant to appear in grey. However, the "none" class (a default style) may be used.
The ng-mouseenter and ng-mouseleave attributes are not necessary either.
Interactive speaker bar and list:
Same thing too, except that here i can be greater than 0.
The css file contains styles used by the plugins.
current : The displayed part of a transcription is progressively highlighted as the video plays. This style is given to a highlighted word.
none : This style is given to words that do not have to be modified (which means they have been treated by the DTW).
untreatedDtw : This style is given to words that have not been treated by the DTW (grey text).
canvas : The style of the speaker bar (shape, shadows...)
bold, italic, title and bloc (for certain containers).
popover styles are defined for the speaker bar popover.
The service file is composed of several modules.
_ function getNextWords: returns the next list of words to display on the screen
  parameters:
    - transcription: the complete transcription concerned
    - nextWordToDisplay: the index of the next word to display in this transcription (in the array transcription.content)
    - step: the number of words displayed at a time
  - return: { "words": an array of wordObject, "currentWordEnd": the index of the last word to display, "currentWordStart": the index of the first word to display, "nextWordToDisplay": the index of the first word in the next display, "nextTimeToDisplay": the start of the first word in the next display }
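To make the returned object more concrete, here is a minimal sketch of a function with the same parameters and the same result fields. It is not the code of "services.js": the name getNextWordsSketch is hypothetical, and the behavior at the end of the transcription (nextTimeToDisplay left undefined) is an assumption.

```javascript
// Hypothetical sketch of getNextWords: returns the slice of words to display
// next, together with the indexes described above.
function getNextWordsSketch(transcription, nextWordToDisplay, step) {
  var content = transcription.content;
  var first = nextWordToDisplay;
  var next = Math.min(first + step, content.length); // first word of the next display
  return {
    words: content.slice(first, next),
    currentWordStart: first,
    currentWordEnd: next - 1,
    nextWordToDisplay: next,
    // Assumption: undefined once the end of the transcription is reached.
    nextTimeToDisplay: next < content.length ? parseFloat(content[next].start) : undefined
  };
}

var demoTranscription = {
  system: "SYST1",
  content: [
    { start: "0", word: "Hello" },
    { start: "0.3", word: "World" },
    { start: "0.9", word: "again" }
  ]
};
var firstDisplay = getNextWordsSketch(demoTranscription, 0, 2);
var secondDisplay = getNextWordsSketch(demoTranscription, firstDisplay.nextWordToDisplay, 2);
```

Calling the function again with the returned nextWordToDisplay walks through the transcription one displayed part at a time.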
_ function instance: returns an instance of the DtwTranscription class. This class contains information concerning a DTW between two transcriptions.
  parameters:
    - hypothesis: a subset of the complete hypothesis transcription content
    - indexStartHyp: the index of the first word in the complete transcription content
    - reference: a subset of the complete reference transcription content
    - indexStartRef: the index of the first word in the complete transcription content
  class PointDtwTranscription: groups the information of a point in the DTW matrix
    parameters:
      - dtw: the corresponding DtwTranscription
      - cost: the cost calculated by the DTW on this point
      - operation: the corresponding operation (substitution, suppression, insertion)
      - matrixLine: the line of the point
      - matrixCol: the column of the point
    instance variables:
      - cost: the cost calculated by the DTW on this point
      - operation: the corresponding operation (substitution, suppression, insertion)
      - indexFullHyp: the index of the corresponding word in the complete hypothesis transcription
      - indexFullRef: the index of the corresponding word in the complete reference transcription
  instance variables:
    - indexStartHyp: the index of the first word in the complete transcription content
    - indexStartRef: the index of the first word in the complete transcription content
    - hypothesis: a subset of the complete hypothesis transcription content
    - reference: a subset of the complete reference transcription content
    - iM: the maximum line index of the DTW matrix
    - jM: the maximum column index of the DTW matrix
    - matrix: the DTW matrix
  methods:
    - calculate: fills the matrix with points
    - givePath: returns the shortest path
      - return: an array of PointDtwTranscription
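The two steps named above (filling a cost matrix, then reading back the cheapest path) can be sketched as follows. This is not the adaptation used by the plugins, whose details are not given in this documentation: it is a standard word-level alignment with a unit cost for each substitution, suppression and insertion, and the names alignWords, hyp and ref are hypothetical.

```javascript
// Sketch of a word alignment between a hypothesis and a reference
// (arrays of strings). Unit costs are an assumption.
function alignWords(hypothesis, reference) {
  var n = hypothesis.length, m = reference.length;
  var cost = [], i, j;
  // Step 1 ("calculate"): fill the cost matrix.
  for (i = 0; i <= n; i++) {
    cost.push(new Array(m + 1).fill(0));
    cost[i][0] = i; // i suppressions
  }
  for (j = 0; j <= m; j++) cost[0][j] = j; // j insertions
  function diagonalCost(a, b) {
    return cost[a - 1][b - 1] + (hypothesis[a - 1] === reference[b - 1] ? 0 : 1);
  }
  for (i = 1; i <= n; i++) {
    for (j = 1; j <= m; j++) {
      cost[i][j] = Math.min(diagonalCost(i, j), cost[i - 1][j] + 1, cost[i][j - 1] + 1);
    }
  }
  // Step 2 ("givePath"): walk back from the last cell to the first one.
  var path = [];
  i = n; j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && cost[i][j] === diagonalCost(i, j)) {
      var same = hypothesis[i - 1] === reference[j - 1];
      path.push({ operation: same ? "none" : "substitution", hyp: hypothesis[i - 1], ref: reference[j - 1] });
      i--; j--;
    } else if (i > 0 && cost[i][j] === cost[i - 1][j] + 1) {
      // The hypothesis word has no counterpart: it must be deleted.
      path.push({ operation: "suppression", hyp: hypothesis[i - 1], ref: null });
      i--;
    } else {
      // The reference word is missing from the hypothesis: it must be inserted.
      path.push({ operation: "insertion", hyp: null, ref: reference[j - 1] });
      j--;
    }
  }
  return { cost: cost[n][m], path: path.reverse() };
}

var alignment = alignWords(["Cello", "Word"], ["Hello", "World"]);
console.log(alignment.cost); // 2 (two substitutions)
```

Running one alignment per sentence, rather than on the whole transcription, keeps each matrix small, which is consistent with the sentence-by-sentence comparison described in this documentation.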
_ function instance: returns an instance of the TranscriptionData class. This class contains information concerning the transcriptions.
  parameters:
    - transcriptionTable: an array which contains the json data of the different transcriptions (extracted from the json data file)
    - globalStep: a step that will be used for each displayed transcription
  class DisplayedTranscription: groups the information of a displayed part of a transcription
    parameters:
      - step: the number of words currently displayed
      - id: the name of the transcription system
    instance variables:
      - message: a message useful when the displayed part is outside the transcribed part
      - id: the id of the transcription system
      - nextWordToDisplay: the index of the next word to display in the complete transcription
      - currentHighlightedIndex: the index of the currently highlighted word in the displayed part
      - currentWordStart: the index of the first word displayed in the complete transcription
      - currentWordEnd: the index of the last word displayed in the complete transcription
      - step: the number of words displayed at a time
      - nextTimeToDisplay: the start of the next word to display
      - transcription: an array of wordObject to display
  class WordToAdd: represents a word object that will have to be inserted in a transcription (they are inserted at the end because of the shift)
    parameters:
      - wordObject: the wordObject to insert
      - position: the index where it should be inserted
    instance variables:
      - wordObject: the wordObject to insert
      - position: the index where it should be inserted
  instance variables:
    - fullTranscription: the array of complete transcriptions coming from the json file
    - globalStep: the step for all the displayedTranscription
    - displayedTranscriptions: an array of DisplayedTranscription (one for each complete transcription)
    - message: a message to inform the user when the video is outside the transcription
    - clickableMessage: the clickable part of the message
    - progressBarContent: the progress bar content in the web page
    - progressBar: the progress bar in the web page
    - outTranscriptionAlert: the "outTranscriptionAlert" in the web page
    - calculationOverAlert: the "calculationOverAlert" in the web page
    - insertionStyle: the style used for insertion
    - suppressionStyle: the style used for suppression
    - substitutionStyle: the style used for substitution
    - showStyle: the style used to show a correspondence in the reference
  methods:
    - updateDisplayedTranscription: updates the content of a displayed transcription
      - transcriptionNum: the number of the displayed transcription to update
    - timeUpdateDisplay: updates the display of the transcriptions at a specific time
      - currentTime: the time for the update
    - seekingUpdateDisplay: updates the display when seeking in the media
      - seekingTime: the time sought
    - initDisplay: initializes the displayed transcriptions
      - timeStart: the time when the video is started
    - addWords: adds all the words in the complete hypothesis transcriptions
      - wordsToAddInTranscriptions: an array of arrays (one for each transcription except the reference) of WordToAdd objects
    - updateTranscriptionsWithDtw: calculates the DTW between the reference and the hypotheses and puts the resulting information in the transcriptions
      - segments: an array of bounds that delimit the sentences used in the DTWs. Each bound is an array of size 2 which contains the start and the end of a sentence
      - refresh: a function to refresh the web page content during the calculation process
    - showCorespondingWordInReferenceWord: changes the style of a word in the reference (function used when the user points the mouse at a substituted or inserted word in the hypothesis)
      - word: a wordObject which belongs to one of the hypotheses
    - hideCorespondingWordInReferenceWord: restores the style of the reference word when the user's mouse leaves the substituted or inserted word in the hypothesis
      - word: a wordObject which belongs to one of the hypotheses
    - adjustTranscriptions: adds/modifies information in the transcriptions and adjusts the hypothesis transcriptions to the reference before anything starts
    - copyTranscription: opens a modal window allowing the user to get the json transcription data with the DTW information added
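The segments parameter of updateTranscriptionsWithDtw delimits the sentences on which the DTWs are run, and the words outside every sentence are the ones that end up with the "untreatedDtw" style. The sketch below illustrates one possible way to sort the words of a transcription into sentences. It is an assumption, not the code of "services.js": the name groupWordsBySentence is hypothetical, the bounds are assumed to be in seconds and inclusive, and a word is assigned to a sentence according to its start time only.

```javascript
// Hypothetical sketch: for each sentence, collects the indexes of the words
// whose start time falls inside its bounds; the other indexes are reported
// as untreated.
function groupWordsBySentence(content, segments) {
  var sentences = segments.map(function () { return []; });
  var untreated = [];
  content.forEach(function (word, index) {
    var time = parseFloat(word.start);
    var found = -1;
    for (var s = 0; s < segments.length; s++) {
      if (time >= segments[s][0] && time <= segments[s][1]) {
        found = s;
        break;
      }
    }
    if (found >= 0) {
      sentences[found].push(index);
    } else {
      untreated.push(index); // candidate for the "untreatedDtw" style
    }
  });
  return { sentences: sentences, untreated: untreated };
}

var grouped = groupWordsBySentence(
  [
    { start: "125.1", word: "Hello" },
    { start: "126", word: "World" },
    { start: "128", word: "again" },
    { start: "200", word: "bye" }
  ],
  [[125, 127.3], [127.31, 129.45]]
);
console.log(JSON.stringify(grouped)); // {"sentences":[[0,1],[2]],"untreated":[3]}
```

Each array of indexes can then be turned into the subsets of hypothesis and reference words that one DTW works on.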
_ function instance: It returns an instance of the SpeakerBar class. This class contains information concerning the speaker bar for the diarization.
transcription: the complete transcription that the bar will describe
transcriptionNum: the number of this transcription to identify the bar elements in the page.
colors: an array of colors that will be used to identify the speakers -
SpeakerData: regroups the information of a single speaker
spk: a spk object
color: the color (string) attributed to this speaker -
instance variables
spkId: the id of the speaker
gender: the gender of the speaker (string)
color: the color to identify the speaker
speakingPeriods: an array of size 2 arrays that contains start and ends of a speaking period
speakingStatus: a string determining if the speaker is actually speaking or not (corespond to a css style) -
giveTotalTime: gives the sum of the speaker speech
- return: the total amount of time cumulated in speakingPeriods
giveTotalTimeString: gives a string representing the sum of the speaker speech
- return: a string representing the total time of speech
addSpeakingPeriod: add a new speaking period
start: the start of the period
end: the end of the period
moveVideoToSpeechStart: moves the video to the moment when the speaker speaks for the first time
instance variables
transcriptionNum: the number of the transcription
transcription: the complete transcription
timeStart: the moment when the video start
timeEnd: the moment when the video end
canvas: the canvas found in the html page
canvasContainer: the canvas container found in the html page
context: the context of the canvas
timer: the timer found in the html page
canvasWidth: the width of the canvas
canvasHeight: the height of the canvas
duration: the duration of the transcription
colors: the array of colors
speakers: an array of SpeakerData objects
mainSpeakers: a sub-array of speakers which contains those who talk the most
secondarySpeakers: a sub-array of speakers which contains those who talk the least
secondarySpeakersTitle: a title used only if there are secondary speakers (those who share the same color)
mainSpeakersTitle: a title for the main speakers
this.grd: a gradient for the speaker bar coloration
contextCopy: a copy of the speaker bar context once it is colored, so that the bar does not have to be drawn again for each update
popoverText: the information given in the popover
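One possible way to obtain mainSpeakers and secondarySpeakers is sketched below. The helper name `splitSpeakers` is hypothetical, and the rule it applies is an assumption: speakers are ranked by total speaking time, and when there are more speakers than colors, the last color is shared by all secondary speakers. The actual criterion used in "services.js" may differ.

```javascript
// Hypothetical helper (NOT from services.js): splits speakers into main and
// secondary groups. Assumptions: each speaker has a giveTotalTime() method,
// and the last entry of `colors` is shared by all secondary speakers.
function splitSpeakers(speakers, colors) {
  // Rank speakers from the most to the least talkative (on a copy).
  var sorted = speakers.slice().sort(function (a, b) {
    return b.giveTotalTime() - a.giveTotalTime();
  });
  // If every speaker can have its own color, nobody is secondary.
  var mainCount = sorted.length <= colors.length
    ? sorted.length
    : Math.max(colors.length - 1, 0);
  var mainSpeakers = sorted.slice(0, mainCount);
  var secondarySpeakers = sorted.slice(mainCount);
  mainSpeakers.forEach(function (speaker, i) {
    speaker.color = colors[i];
  });
  secondarySpeakers.forEach(function (speaker) {
    speaker.color = colors[colors.length - 1];
  });
  return { mainSpeakers: mainSpeakers, secondarySpeakers: secondarySpeakers };
}
```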
updateSpeakers: fills the speakers array with SpeakerData objects corresponding to the transcription
setColor: sets the current color to fill the canvas
color: the color to use
drawSegment: draws a segment in the canvas
start: the moment when the period to draw begins
width: the duration of the period to draw
drawSpeakers: draws all the speakers in the bar
timeUpdate: updates the bar to reflect a specific time
currentTime: the time to update the bar to
clickUpdate: moves the video to the time corresponding to the spot the user clicked on the bar
event: an $event object
initialize: initializes the speaker bar for the first time
openPopover: opens the popover which describes the bar
event: an $event object
closePopover: closes the popover
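Both drawSegment and clickUpdate rely on a conversion between a time and a horizontal position in the canvas. The sketch below shows one plausible linear mapping based on timeStart, duration and canvasWidth. The function names `timeToX` and `xToTime` are hypothetical and are not taken from "services.js".

```javascript
// Hypothetical helpers (NOT from services.js) illustrating a linear mapping
// between a time in seconds and a horizontal pixel offset in the speaker bar.

// Converts a time to an x offset (in pixels) inside the canvas.
function timeToX(time, timeStart, duration, canvasWidth) {
  return ((time - timeStart) / duration) * canvasWidth;
}

// Inverse mapping: converts an x offset (e.g. from a click) back to a time.
function xToTime(x, timeStart, duration, canvasWidth) {
  return timeStart + (x / canvasWidth) * duration;
}

console.log(timeToX(30, 0, 120, 600)); // 150
console.log(xToTime(150, 0, 120, 600)); // 30
```

With such a mapping, a period starting at `start` and lasting `width` seconds would cover the pixels from `timeToX(start, ...)` to `timeToX(start + width, ...)`.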
_ function search: performs a binary search
items: an array in which we search something
value: the value to find
accessFunction: the function to apply on each item to extract the values we search on
return: the index of the value if found. Otherwise a negative error code is returned: -2 if the value searched is lower than all the items given, -3 if it is greater than all of them, and -1 in the other cases.
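The behavior described for search can be sketched as follows. This is an illustration, not the code from "services.js": it assumes that items is sorted in ascending order of the values extracted by accessFunction, and that an empty array falls under the "other cases" (-1).

```javascript
// Minimal sketch of the documented binary search (NOT the code from
// services.js). Assumption: `items` is sorted in ascending order of
// accessFunction(item).
function search(items, value, accessFunction) {
  if (items.length === 0) {
    return -1; // assumption: an empty array is one of the "other cases"
  }
  if (value < accessFunction(items[0])) {
    return -2; // value is lower than every item
  }
  if (value > accessFunction(items[items.length - 1])) {
    return -3; // value is greater than every item
  }
  var low = 0;
  var high = items.length - 1;
  while (low <= high) {
    var mid = Math.floor((low + high) / 2);
    var current = accessFunction(items[mid]);
    if (current === value) {
      return mid;
    }
    if (current < value) {
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return -1; // value is within range but not present
}

var words = [{ start: 0 }, { start: 1.5 }, { start: 3 }];
var byStart = function (word) { return word.start; };
console.log(search(words, 1.5, byStart)); // 1
console.log(search(words, 2, byStart));   // -1
console.log(search(words, -1, byStart));  // -2
console.log(search(words, 10, byStart));  // -3
```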
- function get: gets a file from assets/files
fileId: the object {fileId: 'your.file'} to access assets/files/your.file
return: the file
_ function get: parses the content of a seg file in assets/files into a usable array
segfileContent: the content of the seg file obtained with File.get
return: an array of {"start": moment when the sentence starts, "end": moment when the sentence ends} objects
_ function startVideo: starts the video at a specific time and initializes the display of the corresponding transcriptionsData
timeStart: the time at which to start the video
transcriptionsData: a TranscriptionsData object to initialize
_ function moveVideo: moves the video to the time corresponding to the word the user clicked on in the page
event: an $event object
_ function moveVideoTo: moves the video to a specific time
time: the time to move the video to
_ function giveCurrentTime: returns the current time of the video
- return: the current time
_ function giveDuration: returns the duration of the video
- return: the duration
_ function format: gives a string representation of a time in seconds (rounded down)
time: the time to represent
return: a string representation
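A formatting function of this kind can be sketched as below. The output layout is an assumption: this documentation does not specify it, so the sketch uses "m:ss" and expects a non-negative time.

```javascript
// Minimal sketch of a time formatter (NOT the code from services.js).
// Assumptions: `time` is a non-negative number of seconds and the desired
// output layout is "m:ss".
function format(time) {
  var totalSeconds = Math.floor(time); // rounded down, as documented
  var minutes = Math.floor(totalSeconds / 60);
  var seconds = totalSeconds % 60;
  return minutes + ':' + (seconds < 10 ? '0' : '') + seconds;
}

console.log(format(75.9)); // "1:15"
console.log(format(5));    // "0:05"
```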
_ function getElementPosition: gives the absolute position of an element in the page
element: the element in the page
return: an {"x": abscissa, "y": ordinate} object giving the coordinates of the top-left corner of the element
_ function getMousePosition: gives the coordinates of the mouse position
event: an $event object
return: an {"x": abscissa, "y": ordinate} object giving the coordinates of the mouse pointer
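The two position helpers can be approximated with standard DOM properties, as in the sketch below. It is only an illustration under assumptions: the element position is obtained by summing `offsetLeft` and `offsetTop` along the `offsetParent` chain (which ignores scrolling inside intermediate containers), and the event is assumed to expose `pageX` and `pageY`.

```javascript
// Minimal sketches (NOT the code from services.js) built on standard DOM
// properties: offsetLeft, offsetTop, offsetParent, pageX and pageY.

// Sums the offsets of the element and of its offset parents to get the
// position of its top-left corner relative to the page.
function getElementPosition(element) {
  var x = 0;
  var y = 0;
  var current = element;
  while (current) {
    x += current.offsetLeft;
    y += current.offsetTop;
    current = current.offsetParent;
  }
  return { x: x, y: y };
}

// Assumption: the event object exposes pageX and pageY.
function getMousePosition(event) {
  return { x: event.pageX, y: event.pageY };
}
```

Subtracting the x coordinate of the canvas from the x coordinate of the mouse gives the horizontal offset of a click inside the speaker bar.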
i. Controller (uses the services Video, File, Restangular, SentenceBoundaries, TranscriptionsData and SpeakerBar)
_ function initializeTranscriptionComparisonCtrl: initializes the controller for the transcription comparator
scope: a $scope object
globalStep: the step used for every displayed transcription
colors: an array of the colors used to represent the speakers
_ function initializeDiarizationCtrl: initializes the controller for the diarization viewer
scope: a $scope object
globalStep: the step used for every displayed transcription
transcriptionNum: the number of the transcription used for this bar
colors: an array of the colors used to represent the speakers