Clustering with k-Means

Clustering is an unsupervised machine learning method that groups similar observations in a dataset so that its structure is easier to understand and work with. One such algorithm, k-means, partitions the data into k groups by learning where the group centers lie. Real-world applications include fake news identification, fantasy league stat analysis, insurance fraud detection, and customer or market segmentation.

To perform a k-means analysis, complete the following:

Access the UCI Machine Learning Repository at https://archive.ics.uci.edu/datasets. Note: About 120 of its datasets are suitable for clustering tasks. For this part of the exercise, choose two of these datasets, each with at least 10 attributes and 10,000 instances.

  • Ensure that the datasets are suitable for clustering using these methods.
  • You may search for data in other repositories, such as Data.gov or Kaggle.
  • For your selected datasets, build a k-means clustering model.

    1. Start by choosing the number of clusters. Discuss how you would find the optimal number of clusters that best fits the dataset.
    2. Randomly pick k centroids (the points that will be the centers of your clusters) in d-dimensional space; they need not be points from your dataset. Try to place them near the data but apart from one another.
    3. Assign each data point to the closest centroid, using the Euclidean distance to measure closeness. This will form your k clusters.
    4. Move each centroid to the average location of the data points assigned to it.
    5. Repeat the preceding two steps until the assignments do not change or change very little.
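
The five steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not a prescribed solution: the function names `kmeans` and `wcss`, the toy `points`, and the choice to seed the centroids with k randomly sampled data points (one permissible way to "pick k centroids") are all assumptions made for the example. For the full-size datasets in this exercise, a library implementation such as scikit-learn's `KMeans` would likely be the practical choice.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Sketch of the iterative procedure on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 2 (assumed variant): use k distinct data points as initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 5: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def wcss(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels, wcss(points, centroids, labels))
```

Stopping only when the assignments are exactly unchanged is the strictest form of step 5; a tolerance on centroid movement is a common relaxation when "change very little" is acceptable.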

    Note: A key objective is to minimize the within-cluster variation, defined as the sum of squared Euclidean distances between each item and its cluster's centroid.
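
Written out, the objective in the note takes the following form. The symbols are notation assumed here for illustration: $C_j$ is the set of points assigned to cluster $j$, $\mu_j$ is that cluster's centroid, and $x_i$ is a data point in d-dimensional space.

$$
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
$$

For a fixed set of centroids, assigning every point to its nearest centroid cannot increase $J$, and for fixed assignments, setting each $\mu_j$ to the cluster mean cannot increase it either. This is why alternating the two steps converges, although possibly to a local rather than a global minimum.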

    1. Explain the dataset and the type of information you wish to gain by applying a clustering method.
    2. Explain the k-means algorithm and how you will be using it in your analysis (list the steps, the intuition behind the mathematical representation, and address its assumptions).
    3. Import the necessary libraries, then read the dataset into a data frame and perform initial statistical exploration.
    4. Clean the data and address unusual phenomena (e.g., outliers); use illustrative diagrams and plots and explain them.
    5. Formulate two questions that can be answered by performing a clustering analysis using k-means.
    6. Use the elbow method to find the optimal number of clusters for your chosen dataset. Justify your chosen (final) value of k.
    7. Perform k-means analysis. Explain the intuition behind each mathematical step.
    8. Interpret the results in the context of the questions you asked.
    9. Discuss how you minimized the variation within the clusters.
    10. Validate your model. Then, explain the results.
    11. Include all mathematical formulas used and graphs representing the final outcomes.
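
As one possible starting point for the elbow method in item 6, the sketch below fits k-means for several values of k and records the within-cluster sum of squares for each. It is a sketch under stated assumptions: the helper names `kmeans_wcss` and `elbow_curve`, the synthetic three-blob `points`, and the use of five random restarts per k are all invented for the example, and the compact k-means inside it is a plain-Python stand-in for whatever library the notebook actually uses (scikit-learn's `KMeans`, for instance, exposes the same quantity as `inertia_`).

```python
import math
import random


def kmeans_wcss(points, k, rng, max_iter=100):
    """Run one k-means fit and return its within-cluster sum of squares."""
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def elbow_curve(points, k_values, n_init=5, seed=0):
    """Best (lowest) WCSS over n_init random restarts, for each candidate k."""
    rng = random.Random(seed)
    return {
        k: min(kmeans_wcss(points, k, rng) for _ in range(n_init))
        for k in k_values
    }


# Hypothetical synthetic data: three Gaussian blobs in 2-D.
data_rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
    for cx, cy in centers
    for _ in range(30)
]

curve = elbow_curve(points, range(1, 7))
for k, value in curve.items():
    print(k, round(value, 1))
```

Plotting these values against k and looking for the point where the curve stops dropping steeply gives the "elbow". Because the total always tends to shrink as k grows, the justification for the final k should rest on where the improvement levels off, not on the smallest value.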

    Prepare a comprehensive technical report as a Jupyter notebook, including all code, code comments, outputs, plots, and analysis. Make sure the project documentation contains the following:

    a) Problem statement

    b) Algorithm of the solution

    c) Analysis of the findings

    d) References
