This is day 4 of my participation in the August Update Challenge; see the event page "8月更文挑战" for details.
NO.3 CORRECT TEXT
Problem Scenario 84: In continuation of the previous question (NO.2), please accomplish the following activities.
-
Select all products whose product code is null
-
Select all products whose name starts with Pen, ordering the results by price in descending order
-
Select all products whose name starts with Pen, ordering the results by price in descending order and quantity in ascending order
-
Select the top 2 products by price
Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution:
Step 1: Select all products whose product code is null
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val results = sqlContext.sql("SELECT * FROM products WHERE code IS NULL")
results.show()
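Note that `WHERE code = NULL` would never match any row, because comparisons with NULL evaluate to unknown in SQL's three-valued logic; `IS NULL` is required. A minimal plain-Python sketch of the same filter, using hypothetical in-memory rows (not the exam's actual products table) with `None` standing in for SQL NULL:

```python
# Hypothetical rows standing in for the products table.
products = [
    {"name": "Pen Red", "code": "P1"},
    {"name": "Notebook", "code": None},
]

# Equivalent of WHERE code IS NULL: keep only rows whose code is absent.
no_code = [p for p in products if p["code"] is None]
```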
Step 2: Select all products whose name starts with Pen, ordering the results by price in descending order
val results = sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC")
results.show()
Step 3: Select all products whose name starts with Pen, ordering the results by price in descending order and quantity in ascending order
val results = sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC, quantity ASC")
results.show()
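A common pitfall here: in SQL, `DESC` binds only to the column it immediately follows, so "price descending, then quantity ascending" must be written `ORDER BY price DESC, quantity ASC` (writing `ORDER BY quantity, price DESC` would sort by quantity ascending first). A plain-Python sketch of that compound ordering, using hypothetical data:

```python
# Hypothetical pen products; sort by price descending, then quantity ascending,
# mirroring ORDER BY price DESC, quantity ASC.
pens = [
    {"name": "Pen Red",  "price": 3.0, "quantity": 5},
    {"name": "Pen Blue", "price": 3.0, "quantity": 2},
    {"name": "Pen Ink",  "price": 1.0, "quantity": 9},
]
ordered = sorted(pens, key=lambda p: (-p["price"], p["quantity"]))
```

Negating the price in the sort key gives descending order on that field while quantity stays ascending.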
Step 4: Select the top 2 products by price
val results = sqlContext.sql("SELECT * FROM products ORDER BY price DESC LIMIT 2")
results.show()
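`ORDER BY price DESC LIMIT 2` is simply "sort descending, keep the first two rows". The same idea in plain Python, on hypothetical data:

```python
# Hypothetical products; take the two most expensive,
# mirroring ORDER BY price DESC LIMIT 2.
products = [
    {"name": "Pen Red",  "price": 3.0},
    {"name": "Eraser",   "price": 0.5},
    {"name": "Notebook", "price": 5.0},
]
top2 = sorted(products, key=lambda p: p["price"], reverse=True)[:2]
```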
NO.4 CORRECT TEXT
Problem Scenario 4: You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.categories
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities.
Import the single table categories (subset data) into a Hive managed table, where category_id is between 1 and 22
Note: 1. My database username and password differ from the exam environment. 2. In the exam the table already exists; here I first create some test data:
create table categories (id int, name varchar(12));
insert into categories values (1,'yee'),(2,'xun'),(3,'zhang'),(4,'ya'),(5,'guang'),(6,'zyg'),(7,'ya'),(8,'guang'),(9,'zyg'),(10,'ya'),(11,'guang'),(12,'zyg'),(13,'ya'),(14,'guang'),(15,'zyg'),(16,'ya'),(17,'guang'),(18,'zyg'),(19,'ya'),(20,'guang'),(21,'zyg'),(22,'ya'),(23,'guang'),(24,'zyg'),(25,'ya'),(26,'guang'),(27,'zyg'),(28,'ya'),(29,'guang'),(30,'zyg');
Answer:
See the explanation for the step-by-step solution and configuration.
Explanation:
Solution:
Step 1: Import the single table categories (subset data)
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba \
--password cloudera --table categories --where "category_id between 1 and 22" --hive-import -m 1
The sqoop command in my test environment: sqoop import --connect jdbc:mysql://manager:3306/retail_db --username root --password yeexun123 --table categories --where "id between 1 and 22" --hive-import -m 1. Part of the execution output follows:
Note: This command creates a managed table, and its content is written to the following directory.
/user/hive/warehouse/categories
The directory here comes from my HDFS configuration and differs from the exam environment's directory.
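One detail worth remembering for the `--where` clause: SQL's BETWEEN is inclusive on both ends, so `category_id between 1 and 22` imports ids 1 through 22 — 22 of the 30 test rows. A plain-Python sketch of that predicate:

```python
# BETWEEN 1 AND 22 is inclusive on both ends.
ids = list(range(1, 31))  # mirrors the 30 test rows inserted above
imported = [i for i in ids if 1 <= i <= 22]
```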
Step 2: Check whether the table was created (in Hive)
show tables;
select * from categories;