This matrix shows the frequency of each word across all three documents after tokenization and stopword removal. Each column corresponds to a document, and each row corresponds to a word found anywhere in the corpus. Each value in the matrix is the number of times a given term appears in a given document: if term w occurs n times in document d, then [w, d] = n. For example, document 1 uses 'red' twice, so [red, d1] = 2.
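Constructing such a matrix can be sketched in a few lines of plain Python. The toy corpus below is a hypothetical stand-in (loosely echoing the 'red', 'violets', 'blue' and 'Moses' vocabulary mentioned in this section), not the actual documents behind the matrix shown here:

```python
from collections import Counter

# Hypothetical toy corpus: already lowercased, tokenized, stopwords removed.
docs = {
    "d1": "roses red violets red blue".split(),
    "d2": "moses parted red sea".split(),
    "d3": "sea green sky".split(),
}

# Vocabulary: every word appearing anywhere in the corpus (one row per word).
vocab = sorted({w for words in docs.values() for w in words})

# Document-term counts: dtm[d] is the count vector for document d over vocab.
counts = {d: Counter(words) for d, words in docs.items()}
dtm = {d: [counts[d][w] for w in vocab] for d in docs}

print(dtm["d1"][vocab.index("red")])  # 'red' occurs twice in d1, so [red, d1] = 2
```

Each document's vector has one entry per vocabulary word, so words a document never uses simply get a count of zero.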
From the document-term matrix, LSA produces a document-document matrix and a term-term matrix. If the document-term matrix has dimensions of d documents by w words, then the document-document matrix is d by d and the term-term matrix is w by w. Each value in the document-document matrix indicates the number of words two documents have in common. Each value in the term-term matrix indicates the number of documents in which two terms co-occur.3
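Both derived matrices come from multiplying the document-term matrix by its own transpose. The sketch below uses a hypothetical binary incidence matrix (1 if a term appears in a document, 0 otherwise), since the "words in common" and "documents where two terms co-occur" readings hold exactly in that binary case:

```python
# Hypothetical binary document-term matrix A (3 documents x 4 terms).
# Columns: blue, red, sea, violets; 1 means the term appears in the document.
A = [
    [1, 1, 0, 1],  # d1
    [0, 1, 1, 0],  # d2
    [1, 0, 1, 0],  # d3
]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    # Plain-Python matrix product: result[i][j] = dot(row i of X, column j of Y).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

doc_doc = matmul(A, transpose(A))    # d x d: entry [i][j] = terms shared by docs i and j
term_term = matmul(transpose(A), A)  # w x w: entry [i][j] = docs where terms i and j co-occur

print(doc_doc[0][1])    # d1 and d2 share one term ('red')
print(term_term[1][2])  # 'red' and 'sea' co-occur in one document (d2)
```

With raw counts instead of binary entries, the same products give weighted overlap scores (dot products) rather than literal shared-word counts.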
Data sparsity, which can lead to model overfitting, occurs when the majority of values in a dataset are null (that is, empty). It arises regularly when constructing document-term matrices, in which each distinct word is a separate row and vector space dimension: any one document typically lacks most of the words that appear in the other documents. Indeed, the example document-term matrix used here contains several words, such as Moses, violets and blue, that appear in only one document. Text preprocessing techniques such as stopword removal, stemming and lemmatization can help reduce sparsity, but LSA offers a more targeted approach.
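One way to make sparsity concrete is to measure the fraction of zero cells. The matrix below is a hypothetical small document-term matrix, yet even at this size most entries are already zero:

```python
# Hypothetical document-term count matrix (3 documents x 9 terms).
# Columns: blue, green, moses, parted, red, roses, sea, sky, violets.
dtm = [
    [1, 0, 0, 0, 2, 1, 0, 0, 1],  # d1
    [0, 0, 1, 1, 1, 0, 1, 0, 0],  # d2
    [0, 1, 0, 0, 0, 0, 1, 1, 0],  # d3
]

n_cells = len(dtm) * len(dtm[0])
n_zero = sum(value == 0 for row in dtm for value in row)
sparsity = n_zero / n_cells

print(f"{n_zero} of {n_cells} cells are zero ({sparsity:.0%})")
```

In real corpora with vocabularies of tens of thousands of words, this fraction is typically far higher, which is why document-term matrices are usually stored in sparse formats.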