CSU-K-Toolkit

This is the CSU-K toolkit for the 2nd edition of the Spoken CALL Shared Task (ST2). It contains scripts, models, and other data that were used to develop our two CSU-K ST2 systems, which you can find here:

There are also papers (not officially published) that explain the toolkit and our DNN-based system in more detail:

Citation

If you use parts of this toolkit (including the papers), please cite our Interspeech 2018 paper. See https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1000.html.

Cite as: Jülg, D., Kunstek, M., Freimoser, C.P., Berkling, K., Qian, M. (2018) The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task. Proc. Interspeech 2018, 2359-2363, DOI: 10.21437/Interspeech.2018-1000.

@inproceedings{Jülg2018,
  author={Dominik Jülg and Mario Kunstek and Cem Philipp Freimoser and Kay Berkling and Mengjie Qian},
  title={The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2359--2363},
  doi={10.21437/Interspeech.2018-1000},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1000}
}

Script descriptions

Order	Script name	Input	Output	Notes
1, c	generateDistanceVectors.py	Doc2Vec model, reference grammar, RecResult	X-vectors as CSV
a	generateDoc2VecModel.py	Directory with training text	A Doc2Vec model
b	split_file.py	Training CSV, modulator	Train and Test	You can use another split script
d	makeLabelsFromCsv.py	Train or Test CSV	Y-vectors as CSV
opt	splitBasedOnIdFile.py	Text file containing IDs, file you want to split	Two files, one containing only the IDs from the input text file	Alternative to b
opt	split_csv_by_mod_10.py	Training CSV	Train and Test, where Test is 10% of Train	Alternative to b
opt	makeLabelsFromCsv_v2.py	Test CSV from ST2	Y-vectors as CSV
help	validateUniqIds.py	As many files as you like	IDs that are shared	Checks whether train and test are cleanly separated
help	extractBasedOnId.py	Text file containing IDs, some file	Two files formatted like the second input file, one containing only the IDs from the text file	Use if you have messed something up
help	merge_sc1_training_data_with_asr_output.py	Train CSV, ASR, ASR_Data	sc1 train entries with the ASR output as rec_result
help	merge_sc2_training_data_with_asr_output.py	Test CSV, ASR, ASR_Data	sc2 train entries with the ASR output as rec_result
help	normalize_sc1_test.py	ABC CSV files, output file	CSV file formatted like sc1_train
help	normalize_sc2_abc.py	Test CSV, output file	CSV file formatted like sc1_train
help	get_intersecting_ids.py	Any number of CSV files	Intersecting IDs
help	fix_id_in_csv.py	Data CSV, output file	Same as input but with fixed IDs
exp	generateWord2VecModel.py	Directory with training text	A Word2Vec model
exp	makeMeaningLabelsFromCsv.py	Data CSV	File with meaning labels
exp	makeGrammerLabelsFromCsv.py	Data CSV	File with grammar labels
exp	split_ABC_intelligent.py	Data CSV A, B and C	Several files needed by ASR: scp files, spk2utt, utt2spk and text	Use with caution. You should already have most of those files
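
To make the split and validation steps in the table more concrete, here is a minimal sketch of a modulo-based train/test split followed by a check for IDs shared between the resulting sets. It is not the code of split_csv_by_mod_10.py or validateUniqIds.py. The function names `split_by_mod` and `shared_ids`, the column name `Id`, the comma delimiter, and the rule "every 10th row goes to Test" are all assumptions made for illustration.

```python
import csv
import io


def split_by_mod(rows, modulus=10):
    """Send every row whose position is divisible by `modulus` to test.

    Assumption: the split is by row position; the real script may differ.
    """
    train, test = [], []
    for index, row in enumerate(rows):
        (test if index % modulus == 0 else train).append(row)
    return train, test


def shared_ids(*row_sets, id_key="Id"):
    """Return the IDs that occur in more than one of the given row sets.

    Assumption: each row is a dict with an ID column named `id_key`.
    """
    seen, shared = set(), set()
    for rows in row_sets:
        ids = {row[id_key] for row in rows}
        shared |= seen & ids
        seen |= ids
    return shared


if __name__ == "__main__":
    # Made-up in-memory CSV standing in for a training file.
    text = "Id,RecResult\n" + "".join(f"{i},utterance {i}\n" for i in range(20))
    rows = list(csv.DictReader(io.StringIO(text)))

    train, test = split_by_mod(rows)
    print(len(train), len(test))

    # An empty result means train and test do not share any ID.
    print(shared_ids(train, test))
```

Running a check like `shared_ids` after any split is a cheap way to confirm that no test entry leaked into the training data before the X- and Y-vectors are generated.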

