Exploring the Amazon Product Co-Purchasing Network: A Social Network Analysis Approach
INTRODUCTION:
The Amazon product co-purchasing network is a large dataset that records which products customers tend to buy together on Amazon.com. In this project report, we use it to explore the co-purchasing relationships between products sold on Amazon and to investigate the structure of the underlying graph.
Specifically, we aim to answer the following research questions:
- What is the overall structure of the Amazon product co-purchasing network?
- What are the most frequently co-purchased products on Amazon?
- Are there any communities or clusters of products that tend to be co-purchased together?
- How can we use the co-purchasing network to make recommendations for new products to Amazon customers?
To answer these questions, we use a combination of network analysis, graph mining, and machine learning techniques. We first visualize the co-purchasing network using various graph visualization tools. We then identify the key structural properties of the network, such as degree distribution, clustering coefficient, and centrality measures. Finally, we use community detection algorithms to identify clusters of products that are frequently co-purchased together.
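The structural measures above can be illustrated with a minimal, standard-library Python sketch. It assumes edges are (i, j) pairs in the co-purchase direction described in the dataset section. The in/out-degree distributions come directly from the edge list. In-degree is used here as a simple proxy for "most frequently co-purchased". The report does not name its community detection algorithm, so label propagation is swapped in purely as an illustration. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (i, j): product i is frequently co-purchased with product j


def degree_distribution(edges: List[Edge]) -> Tuple[Counter, Counter]:
    """Return (in-degree distribution, out-degree distribution) as {k: node count}."""
    indeg: Counter = Counter()
    outdeg: Counter = Counter()
    nodes: Set[int] = set()
    for i, j in edges:
        outdeg[i] += 1
        indeg[j] += 1
        nodes.update((i, j))
    # Counter returns 0 for missing keys, so nodes without in/out edges count as k = 0
    return Counter(indeg[n] for n in nodes), Counter(outdeg[n] for n in nodes)


def top_in_degree(edges: List[Edge], k: int) -> List[Tuple[int, int]]:
    """Products that most often appear as the target j of an edge i -> j."""
    return Counter(j for _, j in edges).most_common(k)


def label_propagation(edges: List[Edge], max_rounds: int = 20, seed: int = 0) -> Dict[int, int]:
    """Asynchronous label propagation on the undirected version of the graph."""
    nbrs: Dict[int, Set[int]] = defaultdict(set)
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    labels = {n: n for n in nbrs}  # every product starts in its own community
    order = list(nbrs)
    rng = random.Random(seed)
    for _ in range(max_rounds):
        rng.shuffle(order)
        changed = False
        for n in order:
            counts = Counter(labels[m] for m in nbrs[n])
            best = max(counts.values())
            if counts[labels[n]] == best:
                continue  # keep the current label if it is already among the most common
            # otherwise adopt the most common neighbour label (smallest id on ties)
            labels[n] = min(lbl for lbl, c in counts.items() if c == best)
            changed = True
        if not changed:
            break  # no label changed in a full pass: converged
    return labels
```

Keeping the current label whenever it is already a most-common choice guarantees termination. Each change strictly increases the number of edges whose endpoints share a label, and that number is bounded by the edge count. Neo4j's Graph Data Science plugin also provides label propagation and other community detection algorithms, which would be the natural choice at full scale.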
ABOUT DATASET:
The network was collected by crawling the Amazon website. It is based on the “Customers Who Bought This Item Also Bought” feature of the Amazon website: if a product i is frequently co-purchased with a product j, the graph contains a directed edge from i to j.
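The edge definition maps naturally onto an adjacency map in which adj[i] is the set of products j with an edge i -> j. The following is a minimal loader sketch. It assumes the data is a plain-text edge list with one "FromNodeId ToNodeId" pair per line and "#" comment lines, as in SNAP-style downloads. The function name and file name are hypothetical.

```python
from typing import Dict, Iterable, Set


def load_directed_edges(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Parse a whitespace-separated 'FromNodeId ToNodeId' edge list.

    adj[i] holds every product j with a directed edge i -> j, i.e. the products
    frequently co-purchased with i. Lines starting with '#' are treated as comments.
    """
    adj: Dict[int, Set[int]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        src, dst = (int(tok) for tok in line.split()[:2])
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set())  # keep products that only receive edges
    return adj


# Usage (the file name is an assumption; adjust to where the edge list was saved):
# with open("amazon0302.txt") as f:
#     adj = load_directed_edges(f)
#     print(len(adj), sum(len(js) for js in adj.values()))
```

If the download is still gzip-compressed, gzip.open(path, "rt") from the standard library can be passed in place of open. On the full file, the node and edge counts printed by the usage example should be compared against the published statistics.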
The data was collected on March 2, 2003.
Dataset statistics:
- Nodes: 262111
- Edges: 1234877
- Nodes in largest weakly connected component (WCC): 262111 (1.000)
- Edges in largest WCC: 1234877 (1.000)
- Nodes in largest strongly connected component (SCC): 241761 (0.922)
- Edges in largest SCC: 1131217 (0.916)
- Average clustering coefficient: 0.4198
- Number of triangles: 717719
- Fraction of closed triangles: 0.09339
- Diameter (longest shortest path): 32
- 90-percentile effective diameter: 1
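Two of these statistics are easy to confuse. The average clustering coefficient is the mean, over all nodes, of the fraction of a node's neighbour pairs that are themselves linked. The fraction of closed triangles, read here as the global ratio of closed to all connected triples, is dominated by high-degree nodes with many unlinked neighbours. That difference is one plausible reason why it is much lower than the average. The following sketch computes both on the undirected version of a small graph, along with the diameter and the 90-percentile effective diameter. It assumes the published values were computed with direction ignored and that degree-1 nodes contribute a clustering coefficient of 0. The effective diameter here is the smallest distance d covering at least 90% of connected node pairs. Published tools usually interpolate and sample instead, so this exact all-pairs BFS version is only practical for small graphs. All names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def undirected(adj: Graph) -> Graph:
    """Drop edge direction (and self-loops) so i -> j and j -> i become one link."""
    und: Graph = {n: set() for n in adj}
    for i, outs in adj.items():
        for j in outs:
            if i != j:
                und[i].add(j)
                und.setdefault(j, set()).add(i)
    return und


def triangle_stats(und: Graph) -> Tuple[int, float, float]:
    """Return (triangles, average clustering coefficient, fraction of closed triples)."""
    closed = 0       # linked neighbour pairs summed over nodes (each triangle counted 3 times)
    triples = 0      # all neighbour pairs summed over nodes (connected triples)
    local_sum = 0.0  # sum of local clustering coefficients
    for nb in und.values():
        k = len(nb)
        pairs = k * (k - 1) // 2
        links = sum(1 for a, b in combinations(nb, 2) if b in und[a])
        closed += links
        triples += pairs
        local_sum += links / pairs if pairs else 0.0  # degree < 2 contributes 0
    avg_cc = local_sum / len(und) if und else 0.0
    frac_closed = closed / triples if triples else 0.0
    return closed // 3, avg_cc, frac_closed


def distance_stats(und: Graph, q: float = 0.9) -> Tuple[int, int]:
    """Exact BFS from every node: (diameter, q-percentile effective diameter)."""
    hist: Counter = Counter()  # hist[d] = number of ordered connected pairs at distance d
    for s in und:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            n = queue.popleft()
            for m in und[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    queue.append(m)
        hist.update(d for d in dist.values() if d > 0)
    if not hist:
        return 0, 0
    total, running = sum(hist.values()), 0
    for d in sorted(hist):
        running += hist[d]
        if running >= q * total:
            return max(hist), d  # smallest d covering at least q of all connected pairs
    return max(hist), max(hist)
```

For example, a triangle 0-1-2 with an extra node 3 attached to node 2 has one triangle. Its average clustering coefficient is (1 + 1 + 1/3 + 0) / 4 = 7/12, while its fraction of closed triples is 3/5. The single hub-like node 2 pulls the two measures apart even in this tiny graph.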
PLATFORMS/TECHNOLOGIES:
- Gephi
- Python
- Neo4j
- Apache Spark
- Graph Data Science Plugin
- APOC Plugin
WORK:
This section summarizes our contributions to the project, from selecting and researching the dataset to setting up Neo4j and its plugins for the analysis.
Contributions:
Dataset selection from the Stanford SNAP website:
We selected this dataset from the Stanford Network Analysis Project (SNAP) collection. It was collected by crawling the Amazon website and is based on the “Customers Who Bought This Item Also Bought” feature: if a product i is frequently co-purchased with a product j, the graph contains a directed edge from i to j.
Dataset research:
The Amazon Product Co-Purchasing Network dataset provides a valuable resource for researchers and data scientists to study consumer behavior, develop recommendation systems, and improve marketing strategies for e-commerce platforms.
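As an illustration of how the directed co-purchase edges could feed a recommendation system, here is a minimal neighbourhood-voting sketch. Products that the items in a customer's basket point to receive one vote each. Products two hops away receive a weaker vote, and items already in the basket are excluded. This heuristic, the hop2_weight of 0.5, and the function name are all hypothetical choices for this sketch. It is not Amazon's method, nor necessarily the approach this project adopts.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def recommend(adj: Dict[int, Set[int]], basket: Iterable[int], k: int = 3,
              hop2_weight: float = 0.5) -> List[int]:
    """Rank products reachable from the basket via co-purchase edges i -> j."""
    owned = set(basket)
    scores: Counter = Counter()
    for item in owned:
        for j in adj.get(item, ()):
            scores[j] += 1.0  # direct co-purchase edge item -> j
            for m in adj.get(j, ()):
                scores[m] += hop2_weight  # co-purchased with a co-purchase (weaker signal)
    for item in owned:
        scores.pop(item, None)  # never recommend something already in the basket
    # highest score first; ties broken by product id for a stable ordering
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:k]]
```

In practice, the weights would be tuned, or replaced by a learned model, and evaluated against held-out co-purchase edges.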
Related work research:
Related work research for exploring the Amazon Product Co-Purchasing Network with a social network analysis approach involves reviewing previous studies that used social network analysis to study e-commerce networks, co-purchasing behavior, and product recommendations, along with the network analysis, visualization, and machine learning techniques they applied. This review establishes a foundation for the research, helps identify key research questions and open issues, and ensures that the work is relevant and original and contributes to existing knowledge in the field.
Neo4j setup and configuration with a small dataset to test the installed plugins:
Neo4j is a graph database management system that stores data as nodes and relationships and queries it with the Cypher query language, which suits highly connected data such as a co-purchasing network. Setting up and configuring Neo4j for a small dataset is a relatively simple process. Here are the steps you can follow:
- Download and Install Neo4j: The first step is to download and install Neo4j on your local machine. You can download the latest version of Neo4j from the official website (https://neo4j.com/download/). Follow the instructions to install Neo4j on your machine.
- Start Neo4j: Once Neo4j is installed, start the server from the Neo4j Desktop application or by running the “neo4j console” command in your terminal or command prompt.
- Create a New Graph Database: Once the server is started, you can create a new graph database by clicking on the “New” button in the Neo4j desktop application or by running the following command in the Neo4j console:
- Install Plugins: If you want to test any specific plugins, you can install them using the Neo4j desktop application or by adding the plugin’s JAR file to the “plugins” directory in the Neo4j installation folder.
- Configure Neo4j: You can configure Neo4j by modifying the “neo4j.conf” file in the Neo4j installation folder. This file contains various configuration options, including memory allocation, security settings, and database settings.
- Test the Setup: Once a small test dataset is loaded and the plugins are installed, test the setup by running queries on the test data and calling functions or procedures from the installed plugins.
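A minimal smoke test for the steps above might look like the following sketch. It makes four assumptions: the official neo4j Python driver is installed (pip install neo4j), the server listens on the default Bolt port 7687, you log in as the default neo4j user, and the APOC and Graph Data Science plugins are in the plugins directory. It loads a few test edges in batches with UNWIND and MERGE, then calls apoc.version() and gds.version(). If either plugin is missing or not allowed, the corresponding query raises an error. The Product label, the CO_PURCHASED_WITH relationship type, and all function names are hypothetical. Depending on the Neo4j version, plugin procedures may also need to be allowed in neo4j.conf, for example through the dbms.security.procedures.unrestricted setting.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical label and relationship names chosen for this sketch.
LOAD_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (a:Product {id: row.src}) "
    "MERGE (b:Product {id: row.dst}) "
    "MERGE (a)-[:CO_PURCHASED_WITH]->(b)"
)

# Each query fails if the corresponding plugin is not installed or not allowed.
PLUGIN_CHECKS = {
    "APOC": "RETURN apoc.version() AS version",
    "GDS": "RETURN gds.version() AS version",
}


def batches(edges: List[Tuple[int, int]], size: int) -> Iterator[List[Dict[str, int]]]:
    """Split (src, dst) edges into lists of parameter maps for the $rows parameter."""
    for start in range(0, len(edges), size):
        yield [{"src": s, "dst": d} for s, d in edges[start:start + size]]


def smoke_test(uri: str, user: str, password: str,
               edges: List[Tuple[int, int]], batch_size: int = 1000) -> Dict[str, str]:
    """Load test edges, then return the reported version of each plugin."""
    from neo4j import GraphDatabase  # imported lazily so the helpers above work without the driver

    versions: Dict[str, str] = {}
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for batch in batches(edges, batch_size):
                session.run(LOAD_QUERY, rows=batch).consume()
            for name, query in PLUGIN_CHECKS.items():
                versions[name] = session.run(query).single()["version"]
    return versions


# Usage (password is a placeholder):
# print(smoke_test("bolt://localhost:7687", "neo4j", "<password>", [(0, 1), (0, 2), (1, 2)]))
```

Batching the edges keeps each transaction small. The same load query can later be reused for the full dataset, although an index or uniqueness constraint on the product id would then be advisable so that MERGE stays fast.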
CONCLUSION:
In conclusion, the Amazon product co-purchasing network, built from the “Customers Who Bought This Item Also Bought” feature, records which products are frequently bought together. Its structure, its clusters of co-purchased products, and its directed co-purchase links make it a valuable resource for studying consumer behavior and for developing product recommendation systems.