I have a dataset of 737 different articles belonging to 5 categories such as cricket, volley, .... The vocabulary for the documents is already given and contains 4613 distinct words.

vocab[:5]
>> ['claxton', 'hunt', 'first', 'major', 'medal']

I also have labels for all 737 documents:

labels[:10]
>> [0,0,0,1,1,1,3,3,4,0]

I have also been given the features in this form:

feat
>>
1 1 7.0
1 58 2.0
1 59 1.0
2 182 1.0
3 25 1.0
3 26 1.0
3 34 1.0

So I think each row is in the form vocab_index, doc_id, frequency.

How can I use this data for clustering with K-means? In other words, how can I transform it into a form that K-means can take as input?
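For reference, here is a minimal sketch of the kind of transformation I have in mind. It assumes that the rows really are (vocab_index, doc_id, frequency) triples, that both indices are 1-based, and that SciPy and scikit-learn are acceptable tools. The helper name `triples_to_matrix` is my own, the TF-IDF step is an optional choice rather than a requirement, and only the seven sample rows shown above are used, so it runs with 3 clusters instead of 5.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer


def triples_to_matrix(triples, n_docs, n_terms):
    """Build a sparse (n_docs x n_terms) count matrix.

    Assumes each triple is (vocab_index, doc_id, frequency) with
    1-based indices; this layout is a guess, not confirmed by the data.
    """
    triples = np.asarray(triples, dtype=float)
    term_idx = triples[:, 0].astype(int) - 1  # 1-based -> 0-based column
    doc_idx = triples[:, 1].astype(int) - 1   # 1-based -> 0-based row
    counts = triples[:, 2]
    return csr_matrix((counts, (doc_idx, term_idx)), shape=(n_docs, n_terms))


# The seven sample rows from `feat` shown above.
sample = [
    (1, 1, 7.0), (1, 58, 2.0), (1, 59, 1.0), (2, 182, 1.0),
    (3, 25, 1.0), (3, 26, 1.0), (3, 34, 1.0),
]

# One row per document, one column per vocabulary word.
X = triples_to_matrix(sample, n_docs=737, n_terms=4613)

# Optional: reweight raw counts with TF-IDF before clustering.
X_tfidf = TfidfTransformer().fit_transform(X)

# With the full data this would be n_clusters=5 (one per category);
# the tiny sample only has a few distinct rows, so 3 is used here.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
pred = km.fit_predict(X_tfidf)
```

With the full `feat` data the same steps would apply with `n_clusters=5`, and `pred` could then be compared against `labels`. Is this the right direction?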
