I have one dataframe with shape (4237, 19) and another with shape (4237, 6). I need to combine them column-wise, so the resulting dataframe should have shape (4237, 25), but I am getting (5524, 25) instead. The same thing happens with the test set: (1816, 19) becomes (3058, 25). I can't figure out where the extra rows come from.
This is the code I used:
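import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# x_train and x_test are existing dataframes that each have a "social_media" text column
social_media_vectorizer = CountVectorizer(lowercase=True)
train_social_media_vector = social_media_vectorizer.fit_transform(x_train["social_media"].values.astype("U"))
test_social_media_vector = social_media_vectorizer.transform(x_test["social_media"].values.astype("U"))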
print(x_train.shape)
print(x_test.shape)
train_social_media_df = pd.DataFrame(train_social_media_vector.todense(), columns=social_media_vectorizer.get_feature_names_out())
test_social_media_df = pd.DataFrame(test_social_media_vector.todense(), columns=social_media_vectorizer.get_feature_names_out())
x_train = pd.concat([x_train, train_social_media_df], axis=1)
x_test = pd.concat([x_test, test_social_media_df], axis=1)
print("="*100)
print(x_train.shape)
print(x_test.shape)
print("="*100)
print(social_media_vectorizer.vocabulary_)
Result:
(4237, 19)
(1816, 19)
====================================================================================================
(5524, 25)
(3058, 25)
====================================================================================================
{'facebook': 0, 'linkedin': 2, 'twitter': 4, 'instagram': 1, 'youtube': 5, 'producthunt': 3}
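For reference, the same symptom can be reproduced on a toy example. This is only a sketch under an assumption that is not shown in the code above: that x_train and x_test came out of a shuffled split and so still carry their original row labels, while the dataframes built from the vectorizer output get a fresh 0..n-1 index. The names left and right are hypothetical stand-ins, not the real data. pd.concat with axis=1 aligns rows by index label rather than by position, so labels that do not match on both sides produce extra rows filled with NaN.

```python
import pandas as pd

# Hypothetical stand-in for x_train: 4 rows whose index labels are not 0..3,
# as would be the case after a shuffled split of a larger dataframe.
left = pd.DataFrame({"a": [10, 20, 30, 40]}, index=[0, 5, 2, 9])

# Stand-in for the vectorizer output: a new dataframe with the default index 0..3.
right = pd.DataFrame({"facebook": [1, 0, 1, 0]})

# concat(axis=1) does an outer join on index labels, so the result has one row
# per distinct label (0, 1, 2, 3, 5, 9) instead of one row per position.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)

# Giving the right-hand dataframe the same index as the left one pairs rows by position.
right_aligned = right.copy()
right_aligned.index = left.index
aligned = pd.concat([left, right_aligned], axis=1)
print(aligned.shape)
```

If that assumption holds here, comparing x_train.index with train_social_media_df.index before the concat should show whether the labels differ.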